EP-IT Data science seminars

PHYSTAT seminar: Supervised Learning with Biased Training Data and Applications to Supernova Type Ia Cosmology

by Roberto Trotta




It is often the case that the data set used to train a classifier or regression model is not a statistically representative sample of the unlabelled "test set" on which the model is to be deployed -- a problem that in statistics goes under the name of "covariate shift". This lack of representativeness is almost ubiquitous in applied machine learning, and it is common in astrophysics and cosmology, where instrumental selection effects generically make it easier (and hence more probable) to observe brighter objects. In an astrophysical setting, covariate shift can be investigated with realistic numerical simulations, making problems in this domain an ideal testing ground for possible statistical solutions. Addressing covariate shift is necessary in order to avoid pernicious biases in the classification/regression on the test set, and to improve accuracy when extrapolating into parts of the data space that suffer from a lack of training data.

In this talk, I will present the basics of supernova type Ia cosmology, one of the main probes used to constrain the nature of dark energy. I will then focus on the problem of covariate shift in supernova type Ia classification from photometric data, a timely and important problem in view of the thousands of supernova candidates per year expected from, e.g., the Vera Rubin Observatory. I will present a solution based on propensity-score partitioning of feature space, called STACCATO, and demonstrate its ability to outperform other methods, in some cases achieving performance comparable to what would be obtained with unbiased training data.

The seminar will be held remotely only.

Password: 152560

Organized by

M. Girone, M. Elsing, L. Moneta, M. Pierini
Event co-organised with the PHYSTAT Committee