Description
One of the major goals of the Belle II experiment is the search for rare decay processes, which manifest as tiny signals on top of large backgrounds. Measuring such delicate signals with the highest possible precision requires not only large datasets from the experiment itself, but typically even larger simulated datasets for the development of the analyses.
Since running the full analysis software over these datasets for every analysis is computationally wasteful, centrally produced, preselected subsets of collision events (so-called skims) are essential for efficient data access. However, producing skimmed simulated datasets is itself computationally inefficient: the entire simulation chain, including the expensive detector simulation and reconstruction algorithms, must be run even for events that the skim will later discard.
To remedy this issue, we present a method that uses machine learning to predict, before the expensive steps of the simulation, whether an event will be selected by a skim, so that wasteful computation can be skipped for events that would be discarded. In particular, a transformer-based neural network architecture is employed in conjunction with importance sampling to avoid biases in the data selection.
This contribution will highlight the development and validation of our approach, its integration into the Belle II production software, and its future potential in the face of ever-growing data challenges.
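To illustrate the importance-sampling idea described above, the following is a minimal sketch, not the actual Belle II implementation: the classifier score, the working-point threshold, and the `keep_floor` parameter are all hypothetical. A small fraction of predicted-fail events is still simulated and upweighted by the inverse of its keep probability, so weighted distributions remain unbiased in expectation.

```python
import random

def sampling_decision(p_pass, threshold=0.5, keep_floor=0.05):
    """Return (simulate?, event_weight) for one generated event.

    p_pass     -- hypothetical classifier score: predicted probability
                  that the event would pass the skim (0..1)
    threshold  -- assumed working point above which we always simulate
    keep_floor -- fraction of predicted-fail events still simulated;
                  these are upweighted by 1/keep_floor so that the
                  weighted sample stays unbiased (importance sampling)
    """
    if p_pass >= threshold:
        return True, 1.0               # simulate with unit weight
    if random.random() < keep_floor:
        return True, 1.0 / keep_floor  # rescued event, upweighted
    return False, 0.0                  # skip expensive simulation

# Toy check with uniform random scores: the weighted event count
# matches the true event count in expectation, i.e. skipping
# introduces no bias, while far fewer events are simulated.
random.seed(42)
scores = [random.random() for _ in range(20000)]
total_weight = 0.0
simulated = 0
for s in scores:
    keep, w = sampling_decision(s)
    if keep:
        simulated += 1
        total_weight += w
print(f"simulated {simulated} of {len(scores)} events")
print(f"weighted / true event ratio: {total_weight / len(scores):.2f}")
```

The essential design choice is the nonzero keep probability for predicted-fail events: setting it to zero would maximize savings but make misclassified events unrecoverable, whereas a small floor trades a little extra computation for an unbiased, reweightable sample.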