The term “Big Data” emphasizes data quantity, not quality. What is the effective sample size once we account for the deterioration of data quality caused by, for example, selection bias in COVID-19 testing or non-response bias in 2016 US election polling? This talk provides an answer to such questions, based on the concept of the data defect index (ddi) developed in Meng (2018). It will also briefly discuss the application of ddi to the 2020 US election, as reported in Isakov and Kuriwaki (2020).
Xiao-Li Meng (2018) “Statistical paradises and paradoxes in big data (I): Law of large populations, big data paradox, and the 2016 US presidential election”, Annals of Applied Statistics 12, 685.
 M. Isakov and S. Kuriwaki (2020) “Towards Principled Unskewing: Viewing 2020 Election Polls Through a Corrective Lens from 2016”, Harvard Data Science Review, https://hdsr.mitpress.mit.edu/pub/cnxbwum6/release/3
Xiao-Li Meng is Professor of Statistics at Harvard. He was Chair of its Statistics Department and is the Founding Editor-in-Chief of the Harvard Data Science Review. His research interests include ‘Statistical theory for data science’ and ‘Signal extraction and uncertainty estimates’. He is a very popular lecturer.
M. Girone, M. Elsing, L. Moneta, M. Pierini
Event co-organised with the PHYSTAT Committee