Inverted CERN School of Computing 2025

Name: Inverted CERN School of Computing 2025
Start: 2025-03-24T09:00:00+01:00
End: 2025-03-27T18:15:00+01:00
Location: CERN


Europe/Zurich timezone

There is a live webcast for this event.

CERN School of Computing

Computing.School@cern.ch

Understanding Large Language Models and their Applications in Code Generation

24 Mar 2025, 10:15

31/3-004 - IT Amphitheatre (CERN)


Data Science, Machine Learning, and AI

Andrea Valenzuela Ramirez (CERN)

Lecture

The field of Deep Learning has reached a turning point in recent years, driven by the emergence and rapid development of Large Language Models (LLMs). These models have not only redefined standards in Natural Language Processing (NLP) but are also increasingly being integrated into applications and services because of their natural language capabilities.

Interest in using LLMs for coding has grown rapidly, and several companies have worked to extend their natural language capabilities to code generation. This trend has exposed several unresolved challenges in applying LLMs to coding, but it has also spurred the development of AI-powered code generation tools such as GitHub Copilot.

In this lecture, we will introduce Large Language Models and explore the related terminology and core components. We will examine the process of creating an LLM from scratch and highlight the requirements for developing domain-specific LLMs, such as those for code generation. We will then review the first uses of these technologies in code generation and discuss their limitations in this domain. We will also explore strategies and architectural approaches that improve the performance of LLMs, both in general and for code generation specifically.

We will conclude by addressing the ethical concerns surrounding LLMs, such as the authorship and ownership of AI-generated code. Finally, we will explore other applications of LLMs in science, particularly within the High Energy Physics community and at CERN.

Hands-on

I think it could be interesting (but optional) to have a hands-on session, including: 

Some exercises to interact with an LLM from Python (either from a small pre-trained model in PyTorch or through API calls).
Some exercises to explore domain-specific models for code generation.
Hands-on exploration and testing of LLM components.
Prompt engineering strategies or small-scale fine-tuning to improve the performance of a model.
Simplified strategies such as Retrieval-Augmented Generation (RAG) could also be implemented.
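
As an illustration of what a simplified Retrieval-Augmented Generation exercise might look like, the sketch below retrieves the most relevant snippet from a tiny corpus with bag-of-words cosine similarity and places it in a prompt. It is not taken from the lecture materials: the `DOCUMENTS` corpus, the function names, and the prompt wording are all hypothetical, and the final step of sending the prompt to a model is deliberately left out.

```python
import math
import re
from collections import Counter

# Hypothetical mini-corpus (an assumption for illustration only);
# a real exercise would use actual documentation snippets.
DOCUMENTS = [
    "Use numpy.linspace to create evenly spaced values over an interval.",
    "The json module can load a JSON string into a Python dictionary.",
    "A pandas DataFrame can be filtered with a boolean mask.",
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query, documents, k=1):
    """Return the k documents most similar to the query."""
    query_vec = Counter(tokenize(query))
    scored = [(cosine_similarity(query_vec, Counter(tokenize(d))), d)
              for d in documents]
    # Sort by score only; ties keep their original corpus order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [d for _, d in scored[:k]]


def build_prompt(query, documents, k=1):
    """Assemble a prompt that puts the retrieved context before the question."""
    context = "\n".join(f"- {d}" for d in retrieve(query, documents, k))
    return (
        "Answer the question using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {query}\n"
        "Answer:"
    )


if __name__ == "__main__":
    print(build_prompt("How do I load a JSON string in Python?", DOCUMENTS))
```

In practice, the word-count vectors would typically be replaced by learned embeddings, and the assembled prompt would be passed to a model, either a small pre-trained one or one reached through an API.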

Number of lecture hours: 1
Number of exercise hours: 1
Attended school: CSC 2023 (Tartu)


iCSC2025-CodeGeneration.pdf
