Speaker
Description
Foundation models have revolutionized natural language processing, demonstrating exceptional capabilities in handling sequential data. Their ability to generalize across tasks and datasets offers promising applications in high energy physics (HEP). However, collider physics data, unlike language, combines continuous and discrete data types, including four-vectors, particle IDs, charges, etc. Additionally, the particles in an event form an unordered set, so any representation of them should be permutation invariant, which is fundamentally different from the ordered sequences of natural language. To address these challenges, we investigate various embedding schemes and techniques that introduce physical biases into the framework. Our findings provide valuable insights into the incorporation of foundation models into the HEP domain.
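As a purely illustrative sketch (not the embedding scheme studied in this work), the two challenges above can be made concrete in a few lines of NumPy: continuous four-vectors pass through a linear projection, discrete particle IDs and charges are looked up in embedding tables, and the per-particle tokens are sum-pooled so that the jet-level result does not depend on particle order. All names, sizes, and the randomly initialised parameters (`N_PID`, `D_MODEL`, `W_cont`, `E_pid`, `E_charge`, `embed_particles`, `embed_jet`) are assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

N_PID = 8      # hypothetical number of particle-ID classes
D_MODEL = 16   # hypothetical embedding width

# Hypothetical parameters, randomly initialised for illustration only.
W_cont = rng.normal(size=(4, D_MODEL))       # projects (E, px, py, pz)
E_pid = rng.normal(size=(N_PID, D_MODEL))    # lookup table for particle IDs
E_charge = rng.normal(size=(3, D_MODEL))     # lookup table for charge in {-1, 0, +1}


def embed_particles(four_vectors, pids, charges):
    """Return one D_MODEL-dimensional token per particle."""
    cont = four_vectors @ W_cont                 # continuous features: linear projection
    disc = E_pid[pids] + E_charge[charges + 1]   # discrete features: table lookups
    return cont + disc


def embed_jet(four_vectors, pids, charges):
    """Sum-pool the particle tokens; a sum is independent of particle order."""
    return embed_particles(four_vectors, pids, charges).sum(axis=0)


# Toy jet with 5 particles (random values, not physical data).
four_vectors = rng.normal(size=(5, 4))
pids = rng.integers(0, N_PID, size=5)
charges = rng.integers(-1, 2, size=5)

# Shuffling the particles should leave the jet embedding unchanged.
perm = rng.permutation(5)
jet = embed_jet(four_vectors, pids, charges)
jet_shuffled = embed_jet(four_vectors[perm], pids[perm], charges[perm])
```

Sum pooling is only one simple way to obtain permutation invariance; attention without positional encodings is another common choice, and the sketch makes no claim about which is preferable for jets.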
Significance
Although foundation models are already widely used in NLP, their application in HEP remains an open area of research. The HEP community is currently investigating how to encode collider physics data so that a single representation can serve as a basis for a variety of tasks. Our work on jet physics contributes studies and insights at this frontier.