15–19 Sept 2025
CERN
Europe/Zurich timezone

LLM compression

16 Sept 2025, 17:30
5m
40/S2-A01 - Salle Anderson (CERN)

2. Optimal AI deployment for Online Data Processing
Large Language Models-based assistants

Speaker

Maurizio Pierini (CERN)

Description

Study techniques to compress LLMs. This could become relevant for deploying CERN-specific LLMs (e.g., a chatbot) while minimizing the resources needed for inference.
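
As an illustration of what "compressing" a model can mean in practice, the following is a minimal sketch of one common technique, symmetric post-training quantization of weights to 8-bit integers. It is not taken from the project and does not use pQuant's API; the function names `quantize_int8` and `dequantize` and the sample weights are hypothetical, chosen only for this example. Storing one signed byte per weight instead of a 32-bit float reduces weight memory by roughly a factor of four, at the cost of a bounded rounding error.

```python
from typing import List, Tuple


def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor quantization of float weights to int8 codes.

    Hypothetical helper for illustration only; not part of pQuant.
    """
    max_abs = max((abs(w) for w in weights), default=0.0)
    # One scale for the whole tensor; fall back to 1.0 if all weights are zero.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest integer code and clip to the symmetric int8 range.
    codes = [max(-127, min(127, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes: List[int], scale: float) -> List[float]:
    """Map int8 codes back to approximate float weights."""
    return [c * scale for c in codes]


if __name__ == "__main__":
    weights = [0.42, -1.27, 0.003, 0.9, -0.55]  # made-up values
    codes, scale = quantize_int8(weights)
    restored = dequantize(codes, scale)
    max_err = max(abs(a - b) for a, b in zip(weights, restored))
    print("codes:", codes)
    print("scale:", scale)
    print("max reconstruction error:", max_err)
```

With round-to-nearest, the reconstruction error of each weight is at most half the scale, which is one simple quantity a benchmark of compression techniques could report alongside accuracy and memory footprint.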

CERN group / Experiment

EP-CMG, KT group

Working area: Area 2: Optimal AI deployment for Online Data Processing
Project goals: Benchmark existing techniques, develop new ones, and implement them in pQuant
Timeline: 1 year
Available person power: 3 technical students
Additional person power request: None
Is this an already ongoing activity? Yes
Indicative hardware resource needs: Access to a GPU cluster (multiple A100s) with an LCG-like software stack, CVMFS access, and fast-access disk for the full duration of the project

Author
