Six questions physicists ask when evaluating scientific claims

In 2011, scientists on the OPERA experiment announced an observation they couldn’t explain: neutrinos that seemed to be traveling faster than the speed of light.

Their analysis methods were solid and their numbers statistically significant. But very few physicists believed that these neutrinos were actually joyriding past the universe’s ultimate speed limit. “I don’t recall ever having a conversation with a serious scientist who thought it was likely to be correct,” theorist Matt Strassler wrote about OPERA’s findings on his blog, Of Particular Significance.

The skeptical scientists were right. A few months after the announcement, the experimenters identified the culprits: a fiber-optic cable that had not been screwed in properly, plus calibration errors in their timing system.

The goal of science is to search for truth through impartial analysis. But according to University of Texas at Austin professor Peter Onyisi, scientists must also consider the human element.

“There is a significant amount of human interpretation in the sciences,” Onyisi says. “We try to put things in this picture with absolute rules to remove the human judgment. But the human judgment is actually very important.”

Here are six questions physicists ask themselves when judging the merit and meaning of a scientific claim.

1. Where did the data come from?

In 2004, academics studying female fertility published a scientific paper concluding that after a year of trying, one in three women between the ages of 35 and 39 will not be able to get pregnant.

The findings were widely covered. They led psychologist Jean Twenge to fret that she might have missed her chance to start a family—that is, until she read the scientific paper and discovered the source of their data: women living in rural France between the years 1670 and 1830.

“In other words, millions of women are being told when to get pregnant based on statistics from a time before electricity, antibiotics or fertility treatment,” Twenge wrote in a 2013 article in The Atlantic. “Most people assume these numbers are based on large, well-conducted studies of modern women, but they are not.”

According to Onyisi, the source of the data provides vital context for scientific findings. “Your data has to be representative of what you want to represent,” he says.

Extrapolating results from one population to others can lead to misconceptions, or to tools and treatments that are ineffective for large subsets of the population. Many facial recognition tools, for example, work well for white faces but are markedly less accurate for people of color.

“If you train your machine-learning algorithms on a specific subset of faces that’s not representative of the population at large, your tool will learn to distinguish that type of person, but not everyone,” Onyisi says.

For their part, physicists are very careful to consider all relevant subatomic processes when developing their Monte Carlo collision simulations, which serve as a check on the data coming out of their experiments. Overlooking a relevant background process could lead to misinterpreting the experimental data or missing important physics signals.
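A toy calculation shows how leaving out one background process can manufacture a fake signal. It uses made-up numbers, not any experiment's real simulation. Here `background_a` and `background_b` are hypothetical expected counts from two simulated processes, and the excess in each bin is measured with the rough Gaussian approximation (observed - expected) / sqrt(expected).

```python
import math

# Hypothetical expected event counts in five bins of some measured quantity,
# as two separate simulations of background processes might predict them.
background_a = [120.0, 90.0, 60.0, 35.0, 20.0]
background_b = [5.0, 15.0, 30.0, 15.0, 5.0]  # smaller process, easy to overlook

full_prediction = [a + b for a, b in zip(background_a, background_b)]

# Idealized "data": exactly what both backgrounds predict, with no new physics.
observed = list(full_prediction)


def excess_in_sigma(observed, expected):
    """Per-bin excess of data over prediction, in units of sqrt(expected).

    This is a rough Gaussian approximation to Poisson significance.
    """
    return [(n - mu) / math.sqrt(mu) for n, mu in zip(observed, expected)]


complete_model = excess_in_sigma(observed, full_prediction)
incomplete_model = excess_in_sigma(observed, background_a)  # process b forgotten

print([round(z, 1) for z in complete_model])    # no excess in any bin
print([round(z, 1) for z in incomplete_model])  # a fake "bump" peaking in the middle bin
```

With both backgrounds included, the data match the prediction in every bin. If `background_b` is dropped, the middle bin shows an apparent excess of almost 4 sigma that comes entirely from the incomplete model.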

2. How was the data collected and handled?

Even if researchers source their data from a representative population, they still risk accidentally influencing the results through the process of doing the experiment.

“It’s very hard to do things that are completely unbiased,” Onyisi says.

Researchers try to remove bias by performing studies in which certain information is hidden until the very end. Physicists will even build their analyses using simulated data to make sure their desire for discovery is not influencing how they set up the search.

“Once we have the whole analysis procedure defined, documented, reviewed and approved, we ‘open the box’ and look at our signal region,” says Stephane Willocq, a professor at the University of Massachusetts, Amherst. “We try to minimize experimental bias. We’d like to eliminate it completely, but we know that at the end of the day, there may be some level of bias.”
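The blinding procedure Willocq describes can be sketched as a small Python class. This is a sketch under simple assumptions, not how ATLAS or any other experiment actually implements blinding. `BlindedDataset`, `sidebands` and `open_the_box` are hypothetical names, and the "signal region" here is just a window in one measured quantity.

```python
from dataclasses import dataclass, field


@dataclass
class BlindedDataset:
    """Toy blind analysis: events inside the signal region stay hidden
    until the analysis procedure has been frozen. All names are hypothetical."""

    values: list          # one measured quantity per event (e.g. a mass)
    signal_region: tuple  # (low, high) edges of the window being searched
    unblinded: bool = field(default=False, init=False)

    def _in_signal_region(self, x):
        low, high = self.signal_region
        return low <= x < high

    def sidebands(self):
        """Events outside the signal region: always available for tuning the analysis."""
        return [x for x in self.values if not self._in_signal_region(x)]

    def open_the_box(self, procedure_frozen):
        """Reveal the signal region, but only once the procedure is final."""
        if not procedure_frozen:
            raise PermissionError("signal region stays blinded until the procedure is frozen")
        self.unblinded = True
        return [x for x in self.values if self._in_signal_region(x)]


# Usage with made-up values.
data = BlindedDataset(values=[102.0, 115.5, 124.8, 126.1, 133.0, 141.7], signal_region=(120.0, 130.0))
print(data.sidebands())                          # tune cuts and background estimates here
print(data.open_the_box(procedure_frozen=True))  # only after review and approval
```

Analysts can study the sidebands as often as they like. The events inside the window become visible only after the procedure has been declared final.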

Physicists are always grappling with how they can account for bias and build additional checks and tests into their analyses.

“It helps to have a great sense of self-doubt,” Onyisi says. “It’s a little bit of a joke, but we are always asking ourselves, ‘How could I have possibly done this wrong?’”

3. How exceptional is the data?

According to Willocq, the amount of data scientists need to claim a discovery is closely tied to how irrefutable the data is.

“You could make a discovery with one event,” Willocq says. “What matters is whether your discovery can be easily mimicked by something else.”

In 2016, scientists with the Laser Interferometer Gravitational-Wave Observatory (LIGO) announced the discovery of gravitational waves emanating from a collision of two black holes. Because LIGO scientists had a thorough understanding of the ways their detector could be fooled and had ruled out those possibilities, they could claim a discovery based on this single event.
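A small Poisson calculation can make Willocq's point about single events more concrete. The sketch assumes that an experiment knows its expected background count for a given search. The background values below are hypothetical and are not LIGO's. The smaller the expected background, the less likely it is that background alone produces even one candidate event.

```python
import math


def prob_background_at_least(n_observed, expected_background):
    """Poisson probability that background alone produces n_observed or more events."""
    below = sum(
        math.exp(-expected_background) * expected_background**k / math.factorial(k)
        for k in range(n_observed)
    )
    # Computing the tail as 1 - (lower terms) is fine for this illustration,
    # though it loses precision for extremely small probabilities.
    return 1.0 - below


# Hypothetical expected background counts, not taken from any real experiment.
for b in (1.0, 0.01, 1e-7):
    p = prob_background_at_least(1, b)
    print(f"expected background {b:g}: chance of at least one fake event = {p:.2g}")
```

If the expected background is one event, a single observed event means little. If it is one in ten million, a single clean event is very hard to explain away.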

But not all signals are as clear as the gravitational waves observed by LIGO. A couple of decades ago, the DAMA experiment observed what could be evidence of dark matter particles—an increased amount of activity in its detector during the months when the Earth is likely moving the fastest through our galaxy’s cloud of dark matter. DAMA has continued to see this signal to this day. But because scientists can think of causes for the signal other than dark matter, most are waiting for some other form of confirmation before accepting the dark-matter interpretation of DAMA’s results.

According to Willocq, eliminating all the other possibilities is where the real work comes in. “You need your experimental tools to be sharp enough to distinguish the signal from the background,” he says. “And you also need to ask, are the tools appropriate for the question you are trying to answer?”

4. Are the results statistically significant?

In late 2015, both the ATLAS and CMS experiments saw something unexpected: a bump in their data around 750 GeV. The theory community jumped on these puzzling results and published around 500 papers speculating whether this bump could be the first evidence of a new particle. Over the next few months, both experiments quadrupled their datasets. And the bump disappeared.
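More data settles the question for the following reason. Take a simple counting experiment, which is an assumption here since real analyses use more elaborate fits. Both a genuine signal and its background grow in proportion to the amount of data, so the expected significance, roughly signal / sqrt(background), grows like the square root of the data. Four times the data should therefore roughly double a real effect's significance, while a chance fluctuation does not keep growing. The counts below are hypothetical.

```python
import math


def expected_significance(signal, background):
    """Simple counting estimate: signal divided by the background's Poisson spread."""
    return signal / math.sqrt(background)


# Hypothetical counts for a genuine new particle in the original dataset.
signal, background = 20.0, 100.0

for scale in (1, 4):  # the original dataset, then four times as much data
    z = expected_significance(scale * signal, scale * background)
    print(f"{scale}x data: expected significance about {z:.1f} sigma")
```

In this toy case, a real effect expected at 2 sigma should grow to about 4 sigma with four times the data.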

“When we’re dealing with low-frequency occurrences, we can see impressive fluctuations,” Onyisi says, “but that doesn’t mean we’re seeing something new.”

For instance, a mayor might cite a 7% increase in robberies as evidence of a new crime wave. “But if that increase is from 100 to 107, statistically speaking, that’s the amount of variation we can expect to see year to year,” Onyisi says.
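Onyisi's robbery example reduces to a short calculation if yearly counts are treated as Poisson-distributed. That is an assumption made here for illustration. The typical spread around a count of about 100 is sqrt(100) = 10, so an increase of 7 is less than one standard deviation. The function name `poisson_z` is hypothetical.

```python
import math


def poisson_z(observed, expected):
    """Approximate significance of a count relative to its expectation.

    For Poisson counts the standard deviation is sqrt(expected), so the excess
    in units of that spread is (observed - expected) / sqrt(expected).
    This Gaussian approximation is reasonable when expected is large.
    """
    return (observed - expected) / math.sqrt(expected)


# The mayor's numbers: 100 robberies last year, 107 this year.
print(f"{poisson_z(107, 100):.1f} sigma")        # 0.7 sigma: ordinary variation
# The same 7% rise at a much larger scale.
print(f"{poisson_z(10_700, 10_000):.1f} sigma")  # 7.0 sigma
```

The same percentage means something very different at larger numbers. Going from 10,000 to 10,700 is a 7-sigma jump, because random fluctuations shrink relative to the count as the count grows.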

Scientists use statistical analysis to distinguish natural variation in an ordinary process from the genuine influence of something new.

“Statistical uncertainty is very well understood,” Onyisi says. “There is a very impressive set of mathematical theories behind it. You can easily calculate how often you should see a certain result just due to chance.”

Physicists won’t claim a new discovery until a result passes the 5-sigma threshold: the probability that ordinary statistical fluctuations alone, with nothing new going on, would produce a signal at least that strong is about 1 in 3.5 million. (The 750 GeV bump was only around 2.1 sigma, and it turned out to be just an unlucky 1-in-50 statistical fluctuation.)
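The odds attached to a given number of "sigma" come from the tail of the normal distribution. This sketch assumes the one-sided convention common in particle physics and uses only Python's standard `math.erfc`.

```python
import math


def one_sided_p_value(n_sigma):
    """Chance that a normally distributed quantity fluctuates upward by n_sigma or more."""
    return 0.5 * math.erfc(n_sigma / math.sqrt(2))


for n_sigma in (2.1, 5.0, 6.0):
    p = one_sided_p_value(n_sigma)
    print(f"{n_sigma} sigma: p = {p:.2e}, about 1 in {1 / p:,.0f}")
```

This gives about 1 in 3.5 million for 5 sigma and about 1 in 56 for 2.1 sigma, in line with the 1-in-50 figure above. It gives roughly 1 in a billion for 6 sigma, the significance of the OPERA neutrino result.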

5. How significant is the significance?

But statistical significance isn’t the end-all, be-all of scientific discovery.

“People tend to confuse statistical significance with real world significance,” Onyisi says.

The “faster-than-light neutrinos” result had a statistical significance of 6 sigma. It was highly unlikely to be a statistical fluctuation, but it was also highly unlikely to have proved Einstein wrong.

The physics community is always grappling with how to interpret its statistically significant results. For instance, scientists are extremely confident in their numerous statistically significant measurements of the Higgs boson. But they cannot yet rule out models that extend the Standard Model and allow for a composite Higgs: a Higgs boson that is actually made of smaller constituent parts.

“Even when something is statistically significant, the interpretations can still differ,” Onyisi says. “Our accumulated knowledge of science is filtered through all these previous experiments and disagreements. We only arrive at a consensus after all the alternative explanations have been ruled out.”

6. Were the results confirmed by an independent experiment?

Even the most rigorous experimenters can make mistakes, which is why independent confirmation is key.

“As physicists, we are very cautious,” Willocq says. “We understand the limitations of the experimental process. There are always uncertainty and assumptions, and we want to see further studies before we make definitive statements.”

Even though it was clear that LIGO’s results constituted the discovery of gravitational waves, that clarity would have come into question if the Virgo gravitational-wave observatory in Italy hadn’t also seen gravitational waves when it turned on.

Independent confirmation is so important that the LHC research program includes two similar experiments, ATLAS and CMS, that explore some of the same phenomena. Because the two detectors are designed differently, the experiments serve as independent checks on each other’s results. Shortly after the start-up of the LHC, both experiments worked on independent searches for the Higgs boson. In 2012, the two experiments independently announced the discovery of a new Higgs-like boson.

According to Willocq, a new major discovery is not where the process of scientific inquiry ends, but where the next chapter begins.

“We haven’t closed the book on the Higgs boson,” he says. “It’s actually quite the opposite; the Higgs is probably the most exciting particle for us to study for the next few decades. It’s totally unlike anything else discovered so far and a very rich area of exploration.”