Description
Building on the promising results from integrating ML-based particle flow (MLPF) into CMS offline reconstruction in a Run 3 setup (CMS-PFT-25-001), we now aim to extend MLPF and integrate it with the Phase-2 TICL reconstruction as a plug-in. In Run 3, we demonstrated that a small transformer model can reconstruct events containing on the order of 5,000 tracks and clusters into final-state particles within a few tens of milliseconds on a modern inference GPU (e.g. NVIDIA L4), while also improving jet performance relative to the baseline PF algorithm. For Phase-2, we will evaluate to what extent this approach can be applied in an online reconstruction setting, where the initial local reconstruction is provided by the existing clustering algorithms in TICL. By analogy with the Run 3 setup, the inputs will be local clusters in detector layers (e.g. reconstructed tracksters), while the targets will be simulation-based particles (e.g. simulated tracksters). Performance will be assessed with the standard TICL and PF metrics: particle-level efficiencies, fake rates and resolutions measured against simulation; jet resolution, matching efficiency and fake rate; and missing transverse energy (MET) resolution. Finally, since the Run 3 model already supports pileup (PU) mitigation through an additional PU-probability output node, we will also explore extending this capability to Phase-2.
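
To make the particle-level metrics above concrete, the following is a minimal sketch of how an efficiency, a fake rate and a per-particle response could be computed once reconstructed particles are matched to simulated ones. The `Particle` class, the greedy ΔR matching and the `max_dr=0.1` threshold are illustrative assumptions only; they are not the association used by the standard TICL or PF validation, which defines its own matching criteria.

```python
import math
from dataclasses import dataclass


@dataclass
class Particle:
    """Hypothetical minimal particle record used only for this sketch."""
    pt: float   # transverse momentum (GeV)
    eta: float  # pseudorapidity
    phi: float  # azimuthal angle (rad)


def delta_r(a, b):
    """Angular distance in the (eta, phi) plane, with the phi difference wrapped."""
    dphi = (a.phi - b.phi + math.pi) % (2 * math.pi) - math.pi
    return math.hypot(a.eta - b.eta, dphi)


def match(sim, reco, max_dr=0.1):
    """Greedy one-to-one matching: closest pairs first, each particle used once.

    Returns a list of (sim_index, reco_index) pairs. The threshold is an
    assumed value, not a TICL or PF default.
    """
    candidates = sorted(
        (delta_r(s, r), i, j)
        for i, s in enumerate(sim)
        for j, r in enumerate(reco)
    )
    used_sim, used_reco, pairs = set(), set(), []
    for dr, i, j in candidates:
        if dr > max_dr:
            break  # candidates are sorted, so all remaining pairs are too far apart
        if i in used_sim or j in used_reco:
            continue
        used_sim.add(i)
        used_reco.add(j)
        pairs.append((i, j))
    return pairs


def metrics(sim, reco, max_dr=0.1):
    """Return (efficiency, fake_rate, response) for one collection of particles.

    efficiency: fraction of simulated particles with a matched reconstructed one
    fake_rate:  fraction of reconstructed particles without a simulated match
    response:   reco pt / sim pt for each matched pair (input to a resolution)
    """
    pairs = match(sim, reco, max_dr)
    efficiency = len(pairs) / len(sim) if sim else float("nan")
    fake_rate = 1 - len(pairs) / len(reco) if reco else float("nan")
    response = [reco[j].pt / sim[i].pt for i, j in pairs]
    return efficiency, fake_rate, response
```

In practice these quantities would be accumulated over many events and binned in pt and eta, and the resolution would be taken from the width of the response distribution in each bin.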
CERN group / Experiment
EP-CMG
| Working area | Area 1: Cutting Edge AI for Offline Data Processing |
|---|---|
| If Other, please specify | Area 2 |
| Project goals | - Integrate MLPF into the TICL chain for CMS Phase-2 online reconstruction as a plug-in; - Determine the potential improvements in Phase-2 reconstruction physics performance (particle-level, jet-level and event-level) from ML-based particle flow on top of TICL tracksters; - Determine the potential improvements in Phase-2 reconstruction computational throughput from the ML-based, GPU-native approach. |
| Timeline | - Y1Q1 - Q2: Develop familiarity with the TICL framework, including data exploration and event visualization, and set up a first pass of the target definition for training; - Y1Q3 - Q4: Perform the first integration pass by postprocessing events into an ML-ready format (e.g. Parquet files) and testing on small samples or particle-gun samples; - Y2Q1 - Q2: Carry out large-scale training and testing on physics samples (e.g. ttbar, QCD), leveraging the approach and code established for the Run 3 setup; - Y2Q3 - Q4: Conduct additional validation and optimization, including hyperparameter tuning and evaluation against key performance metrics (e.g. jet resolution), following the infrastructure built after iterating with JME in the Run 3 setup. |
| Available person power | 0.8 FTE |
| Additional person power request | 1 Fellow |
| Is this an already ongoing activity? | No |
| Indicative hardware resource needs | Access to a GPU cluster with an LCG-like software stack, CVMFS access and fast storage for the full duration of the project. |