A joint Fermilab/SLAC publication
Illustration: Theory search vs. anomaly detection
Illustration by Sandbox Studio, Chicago with Ana Kova

Blink and it’s gone


Fast electronics and artificial intelligence are helping physicists capture data and decide what to keep and what to throw away.

The nucleus of the atom was discovered a century ago thanks to scientists who didn’t blink.

Working in pitch darkness at the University of Manchester between 1909 and 1913, research assistants Hans Geiger and Ernest Marsden peered through microscopes to count flashes of alpha particles on a fluorescent screen. The task demanded total concentration, and the scientists could count accurately for only about a minute before fatigue set in. The physicist and science historian Siegmund Brandt wrote that Geiger and Marsden maintained their focus by ingesting strong coffee and “a pinch of strychnine.” 

Modern particle detectors use sensitive electronics instead of microscopes and rat poison to observe particle collisions, but now there’s a new challenge. Instead of worrying about blinking and missing interesting particle interactions, physicists worry about accidentally throwing them away.  

The Large Hadron Collider at CERN produces collisions at a rate of 40 million per second, generating enough data to fill more than 140,000 one-terabyte storage drives every hour. Capturing every one of those events is impossible, so the electronics have to make some tough choices.

“We need to be constantly looking. We can’t close our eyes.”

To decide which collisions to retain for analysis and which to discard, physicists use specialized selection systems called triggers. The trigger is the only component that observes every collision. In about half the time it takes a human to blink, the CMS experiment's triggers have processed the data and discarded 99.9975% of it.
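Those two figures pin down the trigger's output rate. A quick back-of-the-envelope check (using the 40-million-collisions-per-second and 99.9975% numbers quoted above):

```python
collision_rate = 40_000_000          # LHC collisions per second
fraction_kept = 1 - 0.999975         # CMS triggers discard 99.9975% of events

# Events surviving the trigger each second
events_kept_per_second = round(collision_rate * fraction_kept)
print(events_kept_per_second)        # 1000

# Data volume: ~140,000 TB per hour works out to roughly 1 MB per collision
bytes_per_hour = 140_000 * 1e12
bytes_per_collision = bytes_per_hour / 3600 / collision_rate
print(f"{bytes_per_collision / 1e6:.1f} MB per collision")
```

In other words, only about a thousand events per second make it to permanent storage.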

Depending on how a trigger is programmed, it could be the first to capture evidence of new phenomena—or to lose it.  

“Once we lose the data, we lose it forever,” says Georgia Karagiorgi, a professor of physics at Columbia University and the US project manager for the data acquisition system for the Deep Underground Neutrino Experiment. “We need to be constantly looking. We can’t close our eyes.”

The challenge of deciding in a split second which data to keep, some scientists say, could be met with artificial intelligence. 

A numbers game

Discovering new subatomic phenomena often requires amassing a colossal dataset, most of it uninteresting. 

Geiger and Marsden learned this the hard way. Working under the direction of Ernest Rutherford, the two scientists sought to reveal the structures of atoms by sending streams of alpha particles through sheets of gold foil and observing how the particles scattered. They found that for about every 8000 particles that passed straight through the foil, one particle would bounce away as though it had collided with something solid. That was the atom’s nucleus, and its discovery sent physics itself on a new trajectory.

By the standards of today's physics, Geiger and Marsden's 1-in-8000 odds look like a safe bet. The Higgs boson is thought to appear in only one out of every 5 billion collisions at the LHC. And scientists have only a small window of time in which to catch one.

“At CMS we have a massive amount of data,” says Princeton physicist Isobel Ojalvo, who has been heavily involved in upgrading the CMS trigger system. “We’re only able to store that data for about three and a half [millionths of a second] before we make decisions about keeping it or throwing it away.”

Illustration: Iceberg of discarded data (something lost)

Illustration by Sandbox Studio, Chicago with Ana Kova

The triggers will soon need to get even faster. In the LHC’s Run 3, set to begin in March 2022, the total number of collisions will equal that of the two previous runs combined. The collision rate will increase dramatically during the LHC’s High-Luminosity era, which is scheduled to begin in 2027 and continue through the 2030s. That’s when the collider's luminosity, a measure of how tightly the crossing beams are packed with particles, is set to increase tenfold over its original design value.

Collecting this data is important because in the coming decade, scientists will intensify their searches for phenomena that are just as mysterious to today’s physicists as atomic nuclei were to Geiger and Marsden. 

A new physics

In 2012, the Higgs boson became the last confirmed elementary particle of the Standard Model, the equation that succinctly describes all known forms of matter and predicts with astonishing accuracy how they interact. 

But there are strong signs that the Standard Model, which has guided physics for nearly 50 years, won’t have the last word. In April, for instance, preliminary results from the Muon g-2 experiment at the US Department of Energy’s Fermi National Accelerator Laboratory offered tantalizing hints that the muon may be interacting with a force or particle the Standard Model doesn’t include. Identifying these phenomena and many others may require a new understanding.

“Given that we have not seen [beyond the Standard Model] physics yet, we need to revolutionize how we collect our data to enable processing data rates at least an order of magnitude higher than achieved thus far,” says MIT physicist Mike Williams, who is a member of the Institute for Research and Innovation in Software for High-Energy Physics, IRIS-HEP, funded by the National Science Foundation.

Physicists agree that future triggers will need to be faster, but there’s less consensus on how they should be programmed. 

“How do we make discoveries when we don’t know what to look for?” asks Peter Elmer, executive director and principal investigator for IRIS-HEP. “We don’t want to throw anything away that might hint at new physics.”

There are two different schools of thought, Ojalvo says. 

The more conservative approach is to search for signatures that match theoretical predictions. “Another way,” she says, “is to look for things that are different from everything else.”

This second option, known as anomaly detection, would scan not for specific signatures, but for anything that deviates from the Standard Model, something that artificial intelligence could help with.

“In the past, we guessed the model and used the trigger system to pick those signatures up,” Ojalvo says.

But “now we’re not finding the new physics that we believe is out there,” Ojalvo says. “It may be that we cannot create those interactions in present-day colliders, but we also need to ask ourselves if we’ve turned over every stone.”

Instead of searching one by one for signals predicted by each theory, physicists could deploy an unsupervised machine-learning algorithm to a collider's trigger system, Ojalvo says. They could train the algorithm only on the collisions it observes, without reference to any other dataset. Over time, the algorithm would learn to distinguish common collision events from rare ones. The approach would not require knowing in advance what any new signals might look like, and it would avoid bias toward one theory or another.
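A minimal sketch of that idea, using synthetic data in place of real collision events and a simple Mahalanobis-distance score in place of a production trigger algorithm (the feature values and thresholds here are illustrative, not drawn from any experiment):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for trigger-level event features (e.g. energies,
# hit multiplicities). "Common" events cluster; a few outliers are mixed in.
common = rng.normal(loc=0.0, scale=1.0, size=(10_000, 4))
rare = rng.normal(loc=6.0, scale=1.0, size=(5, 4))
events = np.vstack([common, rare])

# Unsupervised step: estimate the distribution of the events seen so far,
# with no labels and no reference to any theoretical signal model.
mean = events.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(events, rowvar=False))

# Mahalanobis distance as an anomaly score: how far each event sits
# from the bulk of the data the model was "trained" on.
diff = events - mean
scores = np.sqrt(np.einsum('ij,jk,ik->i', diff, cov_inv, diff))

# Keep only the most atypical ~0.1% of events for offline analysis.
threshold = np.quantile(scores, 0.999)
kept = np.where(scores > threshold)[0]
```

A real trigger-level implementation would use a far faster model (often an autoencoder compiled onto FPGAs), but the logic is the same: score each event by how much it deviates from everything seen before, and save only the outliers.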

MIT physicist Philip Harris says that recent advances in artificial intelligence are fueling a growing interest in this approach—but that advocates of “theoryless searches” remain a minority in the physics community. 

More generally, says Harris, using AI for triggers can create opportunities for more innovative ways to acquire data. “The algorithm will be able to recognize the beam conditions and adapt [its] choices,” he says. “Effectively, it can change itself.”

Programming triggers calls for tradeoffs between efficiency, breadth, accuracy and feasibility. “All of this is wonderful in theory,” says Karagiorgi. “It’s all about hardware resource constraints, power resource constraints, and, of course, cost.”

“Thankfully,” she adds, “we don’t need strychnine.”