[tt] CHE: Data Deluge From Collider Prompts Next Big Information Revolution

Premise Checker <checker at panix.com> on Sun Sep 14 11:17:49 CEST 2008

Data Deluge From Collider Prompts Next Big Information Revolution
The Chronicle of Higher Education, 8.9.12
http://chronicle.com/weekly/v55/i03/03a01501.htm

By RICHARD MONASTERSKY

When the Large Hadron Collider revs up to full capacity near Geneva, it 
will generate about 15 million gigabytes of data each year -- enough to 
fill a stack of DVDs more than two miles high.
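
The DVD comparison can be checked with a short back-of-envelope calculation. The sketch below is not from the article: it assumes single-layer discs holding 4.7 gigabytes, each 1.2 millimeters thick, stacked with no cases or gaps.

```python
import math

# Figure taken from the article.
ANNUAL_DATA_GB = 15_000_000

# Assumptions (not stated in the article): single-layer DVDs, bare discs.
DVD_CAPACITY_GB = 4.7
DVD_THICKNESS_MM = 1.2

# Exact conversion: 1 mile = 1,609.344 meters.
MM_PER_MILE = 1_609_344


def dvd_stack_miles(data_gb: float,
                    capacity_gb: float = DVD_CAPACITY_GB,
                    thickness_mm: float = DVD_THICKNESS_MM) -> float:
    """Return the height, in miles, of the DVD stack needed to hold data_gb."""
    discs = math.ceil(data_gb / capacity_gb)  # a partly filled disc still counts
    return discs * thickness_mm / MM_PER_MILE


if __name__ == "__main__":
    print(f"Stack height: {dvd_stack_miles(ANNUAL_DATA_GB):.2f} miles")
```

Under those assumptions the stack comes to roughly 2.4 miles, consistent with "more than two miles high."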

So much information will be pouring out that it will equal about 1 percent 
of the total data produced each year throughout the world, says François 
Grey, head of communications for information technology at CERN, the 
European particle-physics laboratory where the collider is located.

The collider project will need to sort and store every single bit and then 
make it all available to physicists on every continent except Antarctica.

To meet this grand challenge, CERN has built up the LHC Computing Grid to 
handle the data and provide access for the 7,000 scientists from 500 
universities and laboratories around the world who are participating in 
the experiment.

Often called the Grid, the distributed computing network will eventually 
link up 100,000 processors. About 20 percent of those CPUs sit in long 
rows of racks at CERN, with the rest spread around the globe at national 
labs and universities.

The computing facilities are distributed like the branches of a tree, with 
CERN as the main trunk, or Tier 0. It sends copies of all of the collider 
data to 11 major limbs called Tier 1 facilities.

The United States has two of these, at Brookhaven National Laboratory and 
at Fermi National Accelerator Laboratory, which each serve one of the 
major teams of researchers involved in the collider project.

The bulk of the computing power is spread out beyond these limbs, among 
250 smaller branches called Tier 2 centers.
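
As a rough illustration, the tree described above can be written down as a small data structure. The site counts and the 20 percent figure come from the article; the class and function names are hypothetical, and the sketch says nothing about how the Grid software actually models its sites.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Tier:
    level: int   # 0 is the trunk, higher numbers are farther out
    sites: int   # number of facilities at this level
    role: str


# Counts as reported in the article.
GRID_TIERS: List[Tier] = [
    Tier(0, 1, "CERN: main trunk, sends out copies of all collider data"),
    Tier(1, 11, "major limbs that receive copies of the data"),
    Tier(2, 250, "smaller branches holding the bulk of the computing power"),
]

TOTAL_PROCESSORS = 100_000   # eventual target, per the article
CERN_SHARE_PERCENT = 20      # "about 20 percent", per the article


def processors_at_cern() -> int:
    """Approximate number of processors housed at CERN itself."""
    return TOTAL_PROCESSORS * CERN_SHARE_PERCENT // 100


def processors_elsewhere() -> int:
    """Approximate number of processors spread across the other sites."""
    return TOTAL_PROCESSORS - processors_at_cern()


def total_sites() -> int:
    """Total number of facilities across all tiers."""
    return sum(tier.sites for tier in GRID_TIERS)
```

With these figures, about 20,000 processors sit at CERN and about 80,000 are spread across the other sites.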

The University of Texas at Arlington is the lead institution for one of 
the Tier 2 centers in the United States. The university has devoted 1,000 
processors and 500,000 gigabytes of storage to the project, says Kaushik 
De, a professor of physics there and the center's coordinator.

When a physicist at a university wants to analyze some collider data, she 
submits her job through her computer at her institution. The LHC Grid 
software then goes out looking for the data, the programs, and the 
computing power she needs for the job.

The request might land at a local Tier 2 facility or it might travel to a 
Tier 1 halfway around the world. Once the available processors have 
finished the analysis, the Grid sends back the results to her own 
computer. "The best analogy for the Grid is a farming cooperative," says 
Mr. Grey. "By sharing resources, we can use them more efficiently."
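
The matching step described above can be sketched in a few lines. This is a toy model under stated assumptions, not the LHC Grid software: the `Site` and `Job` classes, the site names, and the rule of picking the eligible site with the most free processors are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Site:
    name: str
    tier: int
    free_processors: int
    datasets: Set[str] = field(default_factory=set)


@dataclass
class Job:
    dataset: str             # data the analysis needs
    processors_needed: int   # computing power the analysis needs


def choose_site(job: Job, sites: List[Site]) -> Optional[Site]:
    """Pick a site that holds the job's data and has enough free processors.

    Assumed policy: among eligible sites, take the one with the most free
    processors. Returns None if no site qualifies.
    """
    eligible = [
        s for s in sites
        if job.dataset in s.datasets
        and s.free_processors >= job.processors_needed
    ]
    if not eligible:
        return None
    return max(eligible, key=lambda s: s.free_processors)


if __name__ == "__main__":
    # Hypothetical sites and dataset names, for illustration only.
    sites = [
        Site("tier2-local", tier=2, free_processors=40, datasets={"run-A"}),
        Site("tier1-remote", tier=1, free_processors=500, datasets={"run-A", "run-B"}),
    ]
    chosen = choose_site(Job(dataset="run-A", processors_needed=50), sites)
    print(chosen.name if chosen else "no site available")
```

In this example the local Tier 2 site holds the data but has too few free processors, so the job travels to the remote Tier 1 site, as in the article's scenario.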

Unlike the World Wide Web, which was developed at CERN, the idea of a grid 
for distributed computing was conceived by researchers in the United 
States in the 1990s. The fields of astronomy, biomedicine, and earth 
sciences are already using computing grids, as are companies like IBM and 
Hewlett-Packard.

But the LHC Grid will be the biggest test of this strategy yet, says Mr. 
Grey. "It's really putting the grid into practice."
