24–27 Mar 2025
CERN
Europe/Zurich timezone
There is a live webcast for this event.

LLMs in Production: RAG pipelines and beyond

24 Mar 2025, 11:40
1h
31/3-004 - IT Amphitheatre (CERN)

Data Science, Machine Learning, and AI

Speaker

Jack Charlie Munday (CERN)

Description

Running Large Language Models (LLMs) in production presents many complexities that extend far beyond the choice of model. Key challenges include:

  • How do you address knowledge staleness (i.e. the model having been trained on outdated or irrelevant information)?
  • How do you balance cost optimisation with model latency?
  • How do you reduce bias and factual hallucinations?

A widely adopted approach to addressing these challenges is Retrieval-Augmented Generation (RAG).

RAG pipelines implement a two-tiered approach (Retrieval & Generation), in which models are supplied with domain-specific information before generating their response to a question. Through these techniques, off-the-shelf LLMs can be applied to a much wider range of domains than they were originally trained for.
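
As a flavour of the two tiers, the Python sketch below ranks candidate documents against a query and injects the best matches into the prompt. The embed() function is a toy bag-of-characters embedding standing in for a real embedding model, and the assembled prompt would then be sent to whichever LLM the deployment uses; it is an illustrative assumption, not a specific implementation covered in the lecture.

    import numpy as np

    def embed(text: str) -> np.ndarray:
        """Toy bag-of-characters embedding; a real pipeline would call an embedding model."""
        vec = np.zeros(256)
        for ch in text.lower():
            vec[ord(ch) % 256] += 1.0
        return vec / (np.linalg.norm(vec) + 1e-9)

    def retrieve(query: str, documents: list[str], k: int = 3) -> list[str]:
        """Retrieval tier: rank candidate documents by cosine similarity to the query."""
        q = embed(query)
        ranked = sorted(documents, key=lambda d: float(embed(d) @ q), reverse=True)
        return ranked[:k]

    def build_prompt(query: str, documents: list[str]) -> str:
        """Generation tier input: retrieved domain-specific context is placed before the question."""
        context = "\n\n".join(retrieve(query, documents))
        return (
            "Answer the question using only the context below.\n\n"
            f"Context:\n{context}\n\n"
            f"Question: {query}"
        )

    # The assembled prompt is then sent to an off-the-shelf LLM for generation.

In production the toy pieces are typically replaced by a dedicated vector database for the retrieval tier and an embedding model served alongside the LLM.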

In this lecture, we will explore how to improve the adaptability of LLMs without fine-tuning, covering RAG and related architectures, entropy-based approaches such as entropix that enable self-reasoning and context-aware sampling, and the challenges of applying these techniques in a production context.
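
To make the sampling idea concrete, the heavily simplified sketch below uses the entropy of the next-token distribution to decide whether to decode greedily or to sample at a higher temperature. The 0.5 nat threshold and the temperature rule are illustrative assumptions and do not reproduce the actual entropix algorithm.

    import numpy as np

    def entropy(probs: np.ndarray) -> float:
        """Shannon entropy (in nats) of the next-token distribution."""
        p = probs[probs > 0]
        return float(-(p * np.log(p)).sum())

    def adaptive_sample(logits: np.ndarray, rng: np.random.Generator) -> int:
        """Entropy-aware sampling, heavily simplified: decode greedily when the
        model is confident (low entropy), sample at a higher temperature when it
        is not. Thresholds are illustrative, not those used by entropix."""
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        h = entropy(probs)
        if h < 0.5:                       # confident: take the argmax token
            return int(np.argmax(probs))
        temp = 1.0 + min(h, 3.0)          # uncertain: flatten the distribution
        scaled = np.exp((logits - logits.max()) / temp)
        scaled /= scaled.sum()
        return int(rng.choice(len(scaled), p=scaled))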

Number of lecture hours: 1
Number of exercise hours: 0 (no exercises)
Attended school: tCSC 2023 (Split)

Author

Jack Charlie Munday (CERN)
