15–19 Sept 2025
CERN
Europe/Zurich timezone

Large Physics Model: a foundation model for HEP data reconstructions and analysis with CMS data

15 Sept 2025, 10:35
5m
500/1-001 - Main Auditorium (CERN)

2. Optimal AI deployment for Online Data Processing
Cutting Edge AI for Offline Data Processing

Speaker

Sebastian Wuchterl (CERN)

Description

Train foundation models on various tasks using supervised and unsupervised techniques (e.g., the approach used to train large language models such as ChatGPT):
- Jet level: Allow a jet algorithm to self-discover patterns and physics properties in unlabelled data, yielding a pre-trained model that is not biased by discrepancies between data and Monte Carlo (MC) simulation. The final goal is a foundation model for jets that can be finetuned for different tasks with minimal data/MC disagreement, allowing for better post-calibration performance.
- Event level: Develop an event view starting from local clusters in the detector, performing the Particle Flow reconstruction task and adapting this model to various tasks. Develop a set of task-specific algorithms from this main algorithm, through a tuning workflow that could run on a local cluster.
- Analysis level: Learn a high-level object-based event representation of CMS events, to provide a foundation model for data analysis that can be finetuned for better event selection or related analysis tasks such as object assignment.

Supervised and unsupervised approaches will be studied.
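
As a rough illustration of the pre-train-then-finetune pattern described for the jet level, the sketch below pre-trains a small encoder on unlabelled toy data and then fits a task head on a handful of labelled examples. Everything in it is an assumption made for illustration: the toy "jets" are random correlated feature vectors, masked-feature reconstruction is just one possible self-supervised objective (the abstract does not say which objective will be used), the encoder is a single linear layer, and all function names are hypothetical.

```python
import numpy as np

def make_toy_jets(rng, n_jets=2000, n_features=8, n_latent=2, noise=0.1):
    """Toy stand-in for unlabelled jets: correlated features driven by a few latent factors."""
    latent = rng.normal(size=(n_jets, n_latent))
    mixing = rng.normal(size=(n_latent, n_features))
    x = latent @ mixing + noise * rng.normal(size=(n_jets, n_features))
    x = (x - x.mean(axis=0)) / x.std(axis=0)  # standardise each feature
    return x, latent

def pretrain(x, rng, hidden=4, mask_prob=0.3, lr=0.05, steps=1500):
    """Self-supervised step (assumed objective): hide random features, reconstruct them."""
    n, d = x.shape
    mask = (rng.random((n, d)) < mask_prob).astype(float)  # 1 = hidden from the encoder
    x_in = x * (1.0 - mask)
    w_enc = 0.1 * rng.normal(size=(d, hidden))
    w_dec = 0.1 * rng.normal(size=(hidden, d))
    history = []
    for _ in range(steps):
        z = x_in @ w_enc
        err = (z @ w_dec - x) * mask  # error on hidden entries only
        history.append(float((err ** 2).sum() / mask.sum()))
        grad_out = 2.0 * err / mask.sum()
        grad_dec = z.T @ grad_out
        grad_enc = x_in.T @ (grad_out @ w_dec.T)
        w_enc -= lr * grad_enc
        w_dec -= lr * grad_dec
    return w_enc, history

def fit_linear_probe(x_small, y_small, w_enc):
    """Task adaptation: least-squares head on top of the frozen pre-trained encoder."""
    z = x_small @ w_enc
    z1 = np.hstack([z, np.ones((len(z), 1))])  # append a bias column
    head, *_ = np.linalg.lstsq(z1, y_small, rcond=None)
    return head

def predict(x, w_enc, head):
    z = x @ w_enc
    scores = np.hstack([z, np.ones((len(z), 1))]) @ head
    return np.where(scores > 0, 1.0, -1.0)

rng = np.random.default_rng(0)
x, latent = make_toy_jets(rng)
w_enc, history = pretrain(x, rng)  # no labels used here

# Hypothetical downstream task: classify the sign of one latent factor
# using only 100 labelled examples.
labels = np.where(latent[:, 0] > 0, 1.0, -1.0)
head = fit_linear_probe(x[:100], labels[:100], w_enc)
accuracy = float(np.mean(predict(x[100:], w_enc, head) == labels[100:]))
print(f"masked loss: {history[0]:.3f} -> {history[-1]:.3f}, probe accuracy: {accuracy:.3f}")
```

The point of the split is that the encoder never sees labels, so in a real setting it could be trained directly on recorded data rather than on simulation; only the small head depends on labelled (typically simulated) examples.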

CERN group / Experiment

EP-CMG

Working area: Area 2: Optimal AI deployment for Online Data Processing
Project goals: Redefine how events are reconstructed in HEP
Timeline: 3 years
Available person power: 0
Additional person power request: 3 PhD students, 2 fellows
Is this an already ongoing activity? No
Indicative hardware resource needs: O(TB) of fast storage (SSD) plus many powerful GPUs (O(10-100) H100 or A100 GPUs, with at least 6 months of training time per year)
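
To give a sense of scale for the GPU request, the short calculation below converts it into GPU-hours. It is a back-of-the-envelope sketch under stated assumptions, not a figure from the proposal: "6 months per year" is taken as half of a 365-day year of continuous running, utilisation is assumed to be 100%, and the 3-year project timeline is used for the total. The function name is hypothetical.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 h, ignoring leap years

def gpu_hours(n_gpus, training_fraction=0.5, years=3):
    """GPU-hours for n_gpus running a given fraction of each year (assumed full utilisation)."""
    return n_gpus * training_fraction * HOURS_PER_YEAR * years

for n in (10, 100):  # lower and upper end of the O(10-100) request
    print(f"{n} GPUs: {gpu_hours(n, years=1):,.0f} GPU-h per year, "
          f"{gpu_hours(n):,.0f} GPU-h over 3 years")
```

Under these assumptions the request corresponds to roughly 44 thousand to 440 thousand GPU-hours per year.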
