Data scientists face off in LSST machine-learning competition

Data scientists trained computers to pick out useful information from LSST’s hi-res snapshots of the universe.

A rendering of the Large Synoptic Survey Telescope’s dome
LSST Collaboration

A new telescope will take a sequence of hi-res snapshots with the world’s largest digital camera, covering the entire visible night sky every few days—and repeating the process for an entire decade. That presents a big data challenge: What’s the best way to rapidly and automatically identify and categorize all of the stars, galaxies and other objects captured in these images?

To help solve this problem, the scientific collaboration that is working on this Large Synoptic Survey Telescope project launched a competition among data scientists to train computers on how to best perform this task. The Photometric LSST Astronomical Time-Series Classification Challenge, or PLAsTiCC, hosted on the Kaggle.com platform, provided a simulated data set for 3 million objects and tasked participants with identifying which of 15 classifications was the best fit for each object.

Kyle Boone, a UC Berkeley graduate student who has been working on computer algorithms in support of the Nearby Supernova Factory experiment and Supernova Cosmology Project efforts at the US Department of Energy’s Lawrence Berkeley National Laboratory, devoted some of his spare time to the international machine-learning challenge in late 2018 while also working toward his PhD.

“As I worked on job applications, I started playing around with this competition to teach myself more about machine learning,” Boone says. Participants could submit their code up to five times per day to check its performance on a leaderboard for 1 million objects in the test set. The competition ran from September 28, 2018, to December 17, 2018, and Boone was up against 1,383 other competitors on 1,093 teams.

“During the last few weeks I worked really hard on it,” he says, adding that he devoted all of his evenings and weekends to intense coding.

“My results started to become competitive, and I rushed to implement all of the different ideas that I was coming up with. It was fun, and several teams were neck-and-neck until the end. I learned a lot about how to tune machine-learning algorithms. There are a lot of little ‘knobs’ you can tweak to get that extra 1 percent performance.”

While giving a science talk on the final day of the competition, Boone received a text from his fiancée. “She messaged me and said, ‘Congratulations.’ That was pretty exciting,” he says. He won $12,000 for his first-place finish. He also participated in a second, more open-ended phase of the competition, which is driving toward more applicable solutions for categorizing the objects that LSST will see. That latest round concluded January 15.

Renée Hložek, an assistant professor of astrophysics at the University of Toronto in Canada who led the Kaggle challenge, says, “It is really refreshing to see how combinations of approaches lead to really innovative and novel solutions.

“We have big plans for the next iterations of PLAsTiCC, since there are many ways in which the real LSST data will be even more challenging than our current simulations.”

She notes that PLAsTiCC was created through a collaboration between two science groups working on LSST: the Transient and Variable Stars Collaboration (TVS) and the Dark Energy Science Collaboration (DESC).

Gautham Narayan, a Lasker Data Science Fellow at the Space Telescope Science Institute who is a member of TVS and DESC and served as a host for the LSST Kaggle competition, says that the solutions submitted by PLAsTiCC competitors all had different strengths and weaknesses.

“We’re looking at their submissions to see if we can do even better,” he says. It may be possible to mix and match the different solutions to develop an improved code.

“Machine learning is advancing so fast,” he says. “The numbers are staggering to behold.”

Boone says, “The competition really motivated people to think outside the box and come up with new ideas. There were a lot of very interesting ideas that I don’t think have ever been tried before. I think that combining all of the best models is going to give a huge boost and be very useful for LSST.”
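One common way to combine several classifiers, offered here only as a rough sketch and not as the method the PLAsTiCC team or any competitor used, is to average the class probabilities that each model assigns to an object. The function name `average_ensemble`, the two toy models and their numbers are all hypothetical.

```python
def average_ensemble(prob_list, weights=None):
    """Combine per-model class probabilities by (weighted) averaging.

    prob_list: one entry per model; each entry is a list of rows,
               one row of class probabilities per object.
    weights:   optional per-model weights (hypothetical; equal by default).
    """
    n_models = len(prob_list)
    if weights is None:
        weights = [1.0] * n_models
    total = sum(weights)
    weights = [w / total for w in weights]  # normalize weights to sum to 1

    n_objects = len(prob_list[0])
    n_classes = len(prob_list[0][0])
    combined = []
    for i in range(n_objects):
        # Weighted mean of each class probability across models
        row = [
            sum(w * model[i][k] for w, model in zip(weights, prob_list))
            for k in range(n_classes)
        ]
        norm = sum(row)
        combined.append([p / norm for p in row])  # renormalize each row
    return combined


# Toy data: two made-up models, two objects, three classes
model_a = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]]
model_b = [[0.5, 0.4, 0.1], [0.2, 0.2, 0.6]]

combined = average_ensemble([model_a, model_b])
# Index of the most probable class for each object
best_class = [row.index(max(row)) for row in combined]
```

Averaging tends to help when the individual models make different kinds of mistakes, which fits the picture of solutions with different strengths and weaknesses.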

In his work at Berkeley Lab, Boone analyzes data taken from telescopes to understand all of the properties of Type Ia supernovae, and develops new models that can provide accurate distance measurements even for distant supernovae. Type Ia supernovae are used as so-called “standard candles” for measuring distances in the universe based on their luminosity, but these measurements can be affected by the size of the galaxy in which the supernovae reside.
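The standard-candle idea can be illustrated with the textbook distance-modulus relation, m − M = 5 log10(d / 10 parsecs), where m is how bright an object appears and M is its intrinsic brightness. The sketch below is a simplified illustration, not Boone's model: it ignores redshift, dust and host-galaxy corrections, and the peak absolute magnitude of −19.3 is a commonly quoted approximate value used here as an assumption.

```python
def distance_parsecs(apparent_mag, absolute_mag):
    """Invert the distance modulus m - M = 5 * log10(d / 10 pc) for d."""
    return 10 ** ((apparent_mag - absolute_mag) / 5 + 1)


# Assumed, approximate peak absolute magnitude of a Type Ia supernova
M_TYPE_IA = -19.3

# Hypothetical supernova observed at peak apparent magnitude 15.7
d_pc = distance_parsecs(15.7, M_TYPE_IA)
d_mpc = d_pc / 1e6  # convert parsecs to megaparsecs
```

With these illustrative numbers the supernova would lie roughly 100 megaparsecs away. Because the distance depends exponentially on M, even a small shift in the assumed intrinsic brightness, such as one tied to the host galaxy, changes the inferred distance.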

Boone says he hopes to apply his programming work for the LSST competition to his work at Berkeley Lab. “It’s very relevant to my own research,” he says, adding that he plans to prepare a scientific paper based on the machine-learning code he wrote for the competition.

Editor's note: A version of this article was originally published by Berkeley Lab.