Machine learning methods are increasingly employed in high-energy physics research. The interest in ML for physics stems from the very large volume of data produced by particle detectors, which makes it impossible for researchers to analyze it in real time. A further issue is that detector data is unlabeled, i.e. the distinction between signal and background is not available. For this reason, machine learning models are first trained on simulated data, for which the underlying data generation process is known. However, discrepancies between the simulated data and the real-world detector data may limit the usefulness of the trained model. We propose a method that leverages both simulated data, for which a ground truth is available in the form of signal/background labels, and unlabeled real-world data. Our technique is related to the domain adaptation problem in machine learning, which aims to maximize performance on a "target domain" (e.g. distinguishing different breeds of dogs), for which labels are not available, by adapting a model trained on a related but not identical "source domain" (e.g. distinguishing different breeds of cats), for which training labels are available. Critically, we employ a low-memory, high-performance binary neural network that is trained to minimize the discrepancy between simulated and real-world data. We present the information-theoretic properties of our model and study its accuracy and throughput on a real-world case study.