EP-IT Data Science Seminars
Entropy is all you need? The quest for best tokens and the new physics of LLMs
by
Europe/Zurich
40/S2-D01 - Salle Dirac (CERN)
Description
LLMs currently generate text one token at a time with fixed hyperparameters. In September 2024, OpenAI released o1, a largely undocumented model that relies on expanded pre-generation strategies to enhance the quality of even small models ("inference scaling"). Since then, two anonymous researchers have publicly released a new method of token selection: Entropix leverages entropy as a metric of token uncertainty to constrain the model to explore different generation strategies. This seminar will present an emerging field of post-training research that holds the promise of significantly enhancing the quality of very small models (including a 300-million-parameter model we adapted to Entropix). We will also discuss the even more speculative impact of this research direction on pretraining, by letting the model adapt different forms of training or optimizer strategies on the fly.
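The core idea described above can be sketched in a few lines: compute the Shannon entropy of the next-token distribution and branch on it to decide how to sample. This is a minimal illustrative sketch, not the actual Entropix implementation; the threshold value and the strategy names (`greedy`, `explore`) are assumptions for illustration.

```python
import math

def softmax(logits):
    """Convert raw next-token logits into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    """Shannon entropy (in nats) of a probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def select_strategy(logits, threshold=1.0):
    """Pick a generation strategy based on token uncertainty.

    Low entropy: the model is confident, so decode greedily.
    High entropy: the model is uncertain, so switch to a more
    exploratory sampling strategy. Threshold is illustrative.
    """
    h = entropy(softmax(logits))
    return ("greedy" if h < threshold else "explore", h)

# A sharply peaked distribution yields low entropy (greedy decoding);
# a flat one yields high entropy (exploratory sampling).
print(select_strategy([10.0, 0.0, 0.0]))   # confident → greedy
print(select_strategy([0.0, 0.0, 0.0, 0.0]))  # uncertain → explore
```

The point of the gate is that a single scalar, computed for free from logits the model already produces, can steer when to spend extra inference-time compute.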
SPEAKER'S BIO:
Dr. Pierre-Carl Langlais is a researcher in artificial intelligence and co-founder of Pleias, a French lab specialised in the training of Small Language Models for document processing and other intensive corporate use cases. A long-time advocate for open science, Pierre-Carl is an admin on Wikipedia and has co-written an influential report for the European Commission on non-commercial open access publishing. In 2024, he coordinated the release of Common Corpus, the largest open dataset available for the training of large language models.
Coffee will be served at 10:30.
Organised by
M. Girone, M. Elsing, L. Moneta, M. Pierini
Contact
Webcast
There is a live webcast for this event