Description
As more and more ML models are used in production, for example in real-time data processing and simulation, infrastructure for reliable and fast turnaround of model retraining and deployment becomes crucial. To this end, the centralized CI/CD infrastructure and model storage within LHCb need to be developed further, as current solutions do not scale well. In addition, user-friendliness needs to be taken into account by designing the infrastructure with interoperability between the different use cases in mind, from production-level real-time data processing to analysts working on n-tuples at the local level. Furthermore, the larger the models become, the larger the training datasets will be; to make sure data access scales well, the resource needs and potential solutions must be identified.
CERN group / Experiment
LHCb
| Working area | Area 5: Infrastructure for AI Deployment |
|---|---|
| If Other, please specify | Area 2 and Area 4 |
| Project goals | Establish a centralized training and deployment pipeline infrastructure for production-level LHCb software, potentially using existing frameworks such as MLflow. Ensure interoperability with different systems within LHCb, from real-time data processing to analysts working with n-tuples at the local level. Develop model storage solutions within the LHCb software deployment infrastructure. Identify resource needs and potential solutions for scalable data access. |
| Timeline | 1 year |
| Available person power | 0.1 FTE |
| Additional person power request | 2 |
| Is this an already ongoing activity? | No |
| Indicative hardware resource needs | There is existing infrastructure at LHCb that can be used, but identifying further resource needs is part of this project. |