Speaker
Maurizio Pierini
(CERN)
Description
Study techniques for compressing LLMs. This could become relevant for deploying CERN-specific LLMs (e.g., a chatbot) while minimizing the resources needed for inference.
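As an illustration of what "compression" can mean here, the sketch below shows two widely used techniques, magnitude pruning and symmetric 8-bit quantization, applied to a toy list of weights. It is a minimal example under stated assumptions: the function names (`magnitude_prune`, `quantize_int8`, `dequantize`) and the toy weights are hypothetical, and it does not represent the pQuant API or the specific methods this project will study.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the fraction `sparsity` of weights with the smallest magnitude."""
    k = int(sparsity * len(weights))
    if k == 0:
        return list(weights)
    # The k-th smallest absolute value serves as the pruning threshold.
    threshold = sorted(abs(w) for w in weights)[k - 1]
    return [0.0 if abs(w) <= threshold else w for w in weights]


def quantize_int8(weights):
    """Symmetric per-tensor quantization to integers in [-127, 127]."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Map the integer codes back to approximate floating-point weights."""
    return [v * scale for v in q]


# Toy weights (hypothetical values, for illustration only).
weights = [0.9, -0.05, 0.3, -0.7, 0.01, 0.2]
pruned = magnitude_prune(weights, sparsity=0.5)
codes, scale = quantize_int8(pruned)
restored = dequantize(codes, scale)
```

With these inputs, half of the weights are set to zero and the remainder are stored as small integers plus a single scale factor. The reconstruction error of each surviving weight is bounded by half the scale, which is the kind of accuracy-versus-resources trade-off a benchmark would measure.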
CERN group / Experiment
EP-CMG, KT group
| Working area | Area 2: Optimal AI deployment for Online Data Processing |
|---|---|
| Project goals | Benchmark existing techniques, develop new ones, and implement them in pQuant |
| Timeline | 1 year |
| Available person power | 3 technical students |
| Additional person power request | none |
| Is this an already ongoing activity? | Yes |
| Indicative hardware resource needs | Access to a GPU cluster with multiple A100s, an LCG-like software stack, cvmfs access, and fast-access disk for the full duration of the project |
Author
Maurizio Pierini
(CERN)