A joint Fermilab/SLAC publication

Achievement unlocked: 100 petabytes of data


Experiments at the Large Hadron Collider reached a milestone in data collection just before the accelerator’s last collisions for the next two years.

Photo: 100 petabytes (credit: CERN)

A collective library of every written word, in every language, would contain about 50 petabytes of data. Today, just before the Large Hadron Collider smashed its last proton beams in advance of a two-year shutdown, scientists there announced their experiments had recorded double that amount.

The accelerator, located on the border of Switzerland and France, sends two beams of protons in opposite directions around a 17-mile ring, bringing them into collision at four points. Six detectors—two multipurpose and four optimized to monitor specific phenomena—collect data from what happens in these collisions.

When parts of the proton beams collide, their energy shifts momentarily into mass, forming short-lived particles that pass through or decay within the detectors, leaving signatures of their presence. Scientists design computer programs tailored to pick the most interesting collisions from among the noise. Out of the 600 million collisions produced by the LHC every second, only a few prove interesting enough to keep.

A copy of the data from those interesting collisions is recorded and kept at CERN. Another copy is split among participating laboratories and universities, which provide the computing power scientists and students need to analyze pieces of the data.

“Every experiment has at least two copies of the data they decide to keep,” says Alberto Pace, the leader of CERN’s data management group. The first copy is recorded to tape at CERN; a second is recorded at remote sites outside the laboratory.

Since the LHC’s first collisions in November 2009, the rate at which it produces data, and at which its experiments record that data, has grown exponentially. Initially, each experiment was able to send about one gigabyte of data per second to be recorded. That’s about 1,000 MP3s every second. Now, each can send about six gigabytes per second. And they have plans to go even further.
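To put those rates in perspective, here is a rough back-of-the-envelope sketch. It assumes decimal units (1 gigabyte = 10^9 bytes, 1 petabyte = 10^15 bytes) and, unrealistically, a single experiment recording without pause at its peak rate. The function names are illustrative and not part of any CERN software.

```python
# Back-of-the-envelope arithmetic for the recording rates quoted above.
# Assumptions (not from the article): decimal units and uninterrupted
# recording at the peak rate, which real experiments do not sustain.

GB = 10**9               # bytes in a gigabyte (decimal)
TB = 10**12              # bytes in a terabyte (decimal)
PB = 10**15              # bytes in a petabyte (decimal)
SECONDS_PER_DAY = 86_400


def bytes_per_day(rate_gb_per_s):
    """Bytes recorded in one day at a constant rate given in GB/s."""
    return rate_gb_per_s * GB * SECONDS_PER_DAY


def days_to_record(total_pb, rate_gb_per_s):
    """Days needed to record total_pb petabytes at a constant rate in GB/s."""
    return total_pb * PB / bytes_per_day(rate_gb_per_s)


initial_tb_per_day = bytes_per_day(1) / TB   # at the initial 1 GB/s
current_tb_per_day = bytes_per_day(6) / TB   # at the current 6 GB/s
days_for_100_pb = days_to_record(100, 6)     # 100 PB at a sustained 6 GB/s
implied_mp3_mb = GB / 1000 / 10**6           # MP3 size implied by "1,000 per GB"

print(f"1 GB/s is about {initial_tb_per_day:.1f} TB per day")
print(f"6 GB/s is about {current_tb_per_day:.1f} TB per day")
print(f"100 PB at 6 GB/s takes about {days_for_100_pb:.0f} days")
print(f"Implied MP3 size: {implied_mp3_mb:.0f} MB")
```

Under these simplified assumptions, a single experiment writing at six gigabytes per second around the clock would need roughly half a year to reach 100 petabytes. In practice the total was accumulated by several experiments over more than three years.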

“We’ll improve the system during the long shutdown,” Pace says. “We want to make sure that, if the experiments need a higher data transfer rate, it’ll be possible.”

The beams will continue to circulate without collisions until Feb. 16, when the LHC will power off for a two-year shutdown. Repairs during that period will prepare the accelerator to reach higher energies, with renovation and maintenance work gradually covering the entire circumference of the machine.

But those 100 petabytes of LHC data will give scientists not working on the shutdown plenty to do while they wait for more data. “It’s no problem filling two years,” says Steven Goldfarb, a physicist on the ATLAS experiment. “By the time it’s over, we’ll be wishing we had one more month.”

Note: This article has been corrected to state that LHC experiments store back-up copies of their data at remote locations, not on the CERN site.
