Achievement unlocked: 100 petabytes of data

Experiments at the Large Hadron Collider reached a milestone in data collection just before the accelerator’s last collisions for the next two years.

Photo of 100 petabytes
Photo by CERN

A collective library of every written word, in every language, would contain about 50 petabytes of data. Today, just before the Large Hadron Collider smashed its last proton beams in advance of a two-year shutdown, scientists there announced their experiments had recorded double that amount.

The accelerator, located on the border of Switzerland and France, sends two beams of protons in opposite directions around a 17-mile ring, bringing them into collision at four points. Six detectors—two multipurpose and four optimized to monitor specific phenomena—collect data from what happens in these collisions.

When parts of the proton beams collide, their energy shifts momentarily into mass, forming short-lived particles that pass through or decay within the detectors, leaving signatures of their presence. Scientists design computer programs tailored to pick the most interesting collisions from among the noise. Out of the 600 million collisions produced by the LHC every second, only a few prove interesting enough to keep.

A copy of the data from those interesting collisions is recorded and kept at CERN. Another copy is split among participating laboratories and universities, which provide the computing power scientists and students need to analyze pieces of the data.

“Every experiment has at least two copies of the data they decide to keep,” says Alberto Pace, the leader of CERN’s data management group. The first copy is recorded to tape at CERN; a second is recorded at remote sites outside the laboratory.

Since the LHC’s first collisions in November 2009, the rate at which it produces data, and at which its experiments record that data, has risen sharply. Initially, each experiment was able to send about one gigabyte of data per second to be recorded. That’s about 1,000 MP3s every single second. Now, each can send about six gigabytes per second. And they have plans to go even further.
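For readers who want to check the arithmetic behind those figures, here is a rough back-of-the-envelope sketch. It assumes decimal units (one gigabyte is a billion bytes) and an MP3 of about one megabyte, which is the size the comparison above implies. The function names are illustrative only, and the last calculation is hypothetical: it asks how long a single stream running nonstop at six gigabytes per second would take to record 100 petabytes, which is not how the LHC actually operates.

```python
# Back-of-the-envelope arithmetic for the data rates quoted in the article.
# Assumptions (not from CERN): decimal units and a ~1 MB MP3 file.
GB = 10**9   # bytes in a gigabyte (decimal)
PB = 10**15  # bytes in a petabyte (decimal)
MP3_SIZE = 10**6  # assumed size of one MP3, in bytes

def mp3s_per_second(rate_bytes_per_s, mp3_size_bytes=MP3_SIZE):
    """How many MP3-sized files fit into one second of data at this rate."""
    return rate_bytes_per_s // mp3_size_bytes

def days_to_record(total_bytes, rate_bytes_per_s):
    """Days needed to record total_bytes at a constant, uninterrupted rate."""
    seconds = total_bytes / rate_bytes_per_s
    return seconds / 86_400  # seconds in a day

initial_rate = 1 * GB  # about one gigabyte per second at the start
current_rate = 6 * GB  # about six gigabytes per second now

print(mp3s_per_second(initial_rate))   # MP3s per second at the initial rate
print(mp3s_per_second(current_rate))   # MP3s per second at the current rate
print(round(days_to_record(100 * PB, current_rate), 1))  # days for 100 PB
```

Under these assumptions, one gigabyte per second works out to 1,000 MP3s per second, and a single uninterrupted six-gigabyte-per-second stream would need a little over six months to reach 100 petabytes.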

“We’ll improve the system during the long shutdown,” Pace says. “We want to make sure that, if the experiments need a higher data transfer rate, it’ll be possible.”

The beam will continue to run with no collisions until Feb. 16, when the LHC will power off for a two-year shutdown. During that time, the machine will undergo repairs, renovations and maintenance work that will gradually cover its entire circumference, preparing it to reach higher energies.

But those 100 petabytes of LHC data will give scientists not working on the shutdown plenty to do while they wait for more data. “It’s no problem filling two years,” says Steven Goldfarb, a physicist on the ATLAS experiment. “By the time it’s over, we’ll be wishing we had one more month.”

Note: This article has been corrected to state that LHC experiments store back-up copies of their data at remote locations, not on the CERN site.