Physicist Alexx Perloff, a graduate student at Texas A&M University on the CMS experiment, is using data from the first run of the Large Hadron Collider for his thesis, which he plans to complete this year. When all is said and done, it will have taken Perloff a year and a half to collect the computing time necessary to analyze all the information he needs—not unusual for a thesis.
But with the computing tools LHC scientists are using now, he estimates he could have finished his particular kind of analysis in about three weeks, the equivalent of having 26 times the computing resources. Although Perloff is only one scientist working on the LHC, his experience shows the great leaps scientists have made in LHC computing by democratizing their data, becoming more responsive to popular demand and improving their analysis software.
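The factor of 26 follows from the two durations quoted above. As a quick check, assuming a 52-week year (the names below are illustrative only):

```python
# Back-of-the-envelope check of the "26 times" figure.
# Assumption: a year is counted as 52 weeks.
WEEKS_PER_YEAR = 52


def speedup(old_weeks: float, new_weeks: float) -> float:
    """Ratio of the old turnaround time to the new one."""
    return old_weeks / new_weeks


old_weeks = 1.5 * WEEKS_PER_YEAR  # a year and a half of accumulated computing time
new_weeks = 3                     # estimated turnaround with the current tools
factor = speedup(old_weeks, new_weeks)
print(f"Equivalent increase in computing resources: {factor:.0f}x")
```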
A deluge of data
Scientists estimate the current run of the LHC could create up to 10 times more data than the first one. CERN already routinely stores 6 gigabytes (6 billion bytes) per second, up from 1 gigabyte per second in the first run.
The second run of the LHC is more data-intensive because the accelerator itself is more intense: The collision energy is 60 percent greater, resulting in “pile-up” or more collisions per proton bunch. Proton bunches are also injected into the ring closer together, resulting in more collisions per second.
On top of that, the experiments have upgraded their triggers, which automatically choose which of the millions of particle events per second to record. The CMS trigger will now record more than twice as much data per second as it did in the previous run.
Had CMS and ATLAS scientists relied only on adding more computers to make up for the data hike, they would likely have needed about four to six times more computing power in CPUs and storage than they used in the first run of the LHC. To avoid such a costly expansion, they found smarter ways to share and analyze the data.
Flattening the hierarchy
Over a decade ago, network connections were less reliable than they are today, so the Worldwide LHC Computing Grid was designed to have different levels, or tiers, that controlled data flow.
All data recorded by the detectors goes through the CERN Data Centre, known as Tier-0, where it is initially processed, then to a handful of Tier-1 centers in different regions across the globe. During the last run, the Tier-1 centers served Tier-2 centers, which were mostly the smaller university computing centers where the bulk of physicists do their analyses.
“The experience for a user on Run I was more restrictive,” says Oliver Gutsche, assistant head of the Scientific Computing Division for Science Workflows and Operations at Fermilab, the US Tier-1 center for CMS. “You had to plan well ahead.”
Now that the network has proved reliable, a new model, initiated in Run I and now fully in place for Run II, “flattens” the hierarchy, enabling a user at any ATLAS or CMS Tier-2 center to access data from any of their centers in the world. Through a separate upgrade known as data federation, users can also open a file from another computing center through the network, enabling them to view the file without going through the process of transferring it from center to center.
Another significant upgrade affects the network stateside. Through its Energy Sciences Network, or ESnet, the US Department of Energy increased the bandwidth of the transatlantic network that connects the US CMS and ATLAS Tier-1 centers to Europe. A high-speed network, ESnet transfers data 15,000 times faster than the average home internet connection.
Dealing with the rush
One of the thrilling things about being a scientist on the LHC is that when something exciting shows up in the detector, everyone wants to talk about it. The downside is everyone also wants to look at it.
“When data is more interesting, it creates high demand and a bottleneck,” says David Lange, CMS software and computing co-coordinator and a scientist at Lawrence Livermore National Laboratory. “By making better use of our resources, we can make more data available to more people at any time.”
To avoid bottlenecks, ATLAS and CMS are now making data accessible by popularity.
“For CMS, this is an automated system that makes more copies when popularity rises and reduces copies when popularity declines,” Gutsche says.
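The article does not describe how the CMS system works internally. As a rough illustration of the idea Gutsche describes, here is a minimal sketch that adds copies of a dataset when it is in demand and removes them when interest drops. The function names, the threshold of 100 accesses per copy and the limits on the number of copies are all hypothetical.

```python
import math


def target_replicas(recent_accesses: int,
                    accesses_per_copy: int = 100,  # hypothetical load one copy can serve
                    min_copies: int = 1,           # hypothetical floor: never drop the last copy
                    max_copies: int = 10) -> int:  # hypothetical cap on storage use
    """Number of copies a dataset should have for its recent demand."""
    wanted = math.ceil(recent_accesses / accesses_per_copy)
    return max(min_copies, min(max_copies, wanted))


def rebalance(current: dict, popularity: dict) -> dict:
    """Return {dataset: change in copies}; positive means replicate, negative means delete."""
    changes = {}
    for name, accesses in popularity.items():
        delta = target_replicas(accesses) - current.get(name, 0)
        if delta != 0:
            changes[name] = delta
    return changes


# Made-up example: one dataset in high demand, one cooling off, one stable.
current = {"hot": 2, "cold": 4, "steady": 3}
popularity = {"hot": 750, "cold": 20, "steady": 300}
changes = rebalance(current, popularity)
print(changes)
```

In this sketch the popular dataset gains copies, the unpopular one is trimmed to a single copy and the stable one is left alone.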
Improving the algorithms
One of the greatest recent gains in computing efficiency for the LHC relied on the physicists who dig into the data. By working closely with physicists, software engineers edited the algorithms that describe the physics playing out in the LHC, thereby significantly improving processing time for reconstruction and simulation jobs.
“A huge amount of effort was put in, primarily by physicists, to understand how the physics could be analyzed while making the computing more efficient,” says Richard Mount, a senior research scientist at SLAC National Accelerator Laboratory, who was ATLAS computing coordinator during the recent LHC upgrades.
CMS tripled the speed of event reconstruction and halved simulation time. Similarly, ATLAS quadrupled reconstruction speed.
Algorithms that determine data acquisition on the upgraded triggers were also improved to better capture rare physics events and filter out the background noise of routine (and therefore uninteresting) events.
“More data” has been the drumbeat of physicists since the end of the first run, and now that it’s finally here, LHC scientists and students like Perloff can pick up where they left off in the search for new physics—anytime, anywhere.