May 4 – 8, 2026
CERN
Europe/Zurich timezone

The One Token Model: A Multi-Layer Framework for the Granular Estimation of AI Inference Energy

May 7, 2026, 12:10 PM
25m
500/1-001 - Main Auditorium (CERN)

Speaker

Mr Mathieu Francois (Co-Founder & CEO, Antarctica)

Description

The integration of Large Language Models (LLMs) into research workflows introduces a largely opaque source of energy consumption and carbon emissions. Existing approaches to estimating AI energy consumption rely on time-based heuristics or static hardware profiling, which fail to capture the non-deterministic nature of generative inference. Variations in prompt design, quantization, and decoding strategies can lead to significant fluctuations in energy use, limiting the effectiveness of current sustainability assessments.

This paper introduces the One Token Model (OTM), a unified framework that redefines energy measurement through output-normalized attribution, expressed as Joules per token. OTM integrates telemetry across three layers: infrastructure dynamics, model architecture, and inference behavior.

We validate OTM through a real-time monitoring system that quantifies the marginal energy cost of individual inference requests. By enabling fine-grained, comparable measurements across systems, OTM supports energy-aware optimization and promotes more sustainable, transparent research computing practices.
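As a rough illustration of output-normalized attribution, the sketch below computes Joules per token for a single request from sampled power readings. It is not the OTM implementation: the function names, the trapezoidal integration, the idle-power subtraction used to approximate marginal energy, and the sample values are all assumptions made for this example.

```python
from typing import Sequence


def energy_joules(timestamps_s: Sequence[float], power_w: Sequence[float]) -> float:
    """Integrate sampled power (watts) over time (seconds) with the trapezoidal rule."""
    if len(timestamps_s) != len(power_w):
        raise ValueError("timestamps and power samples must have the same length")
    if len(timestamps_s) < 2:
        raise ValueError("need at least two samples to integrate")
    total = 0.0
    for i in range(1, len(timestamps_s)):
        dt = timestamps_s[i] - timestamps_s[i - 1]
        if dt < 0:
            raise ValueError("timestamps must be non-decreasing")
        total += 0.5 * (power_w[i] + power_w[i - 1]) * dt
    return total


def joules_per_token(
    timestamps_s: Sequence[float],
    power_w: Sequence[float],
    output_tokens: int,
    idle_power_w: float = 0.0,
) -> float:
    """Marginal energy of one request divided by the tokens it generated.

    Assumption: marginal energy is approximated as total energy minus a
    constant idle baseline over the request duration.
    """
    if output_tokens <= 0:
        raise ValueError("output_tokens must be positive")
    duration_s = timestamps_s[-1] - timestamps_s[0]
    marginal_j = energy_joules(timestamps_s, power_w) - idle_power_w * duration_s
    return marginal_j / output_tokens


# Hypothetical samples (not measurements): three power readings over two seconds.
example = joules_per_token(
    timestamps_s=[0.0, 1.0, 2.0],
    power_w=[100.0, 300.0, 100.0],
    output_tokens=50,
    idle_power_w=100.0,
)
print(f"{example:.1f} J/token")
```

With these made-up numbers the request draws 400 J in total, 200 J of which is the assumed idle baseline, so the 50 generated tokens cost 4.0 J each. Normalizing by output tokens rather than wall-clock time is what makes requests of different lengths comparable.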

Authors

Mr Mathieu Francois (Co-Founder & CEO, Antarctica)
Mr Kumail Amiruddin (Co-Founder & COO, Antarctica)
Mr Raj Banerjee (Vice President, Research and Development - Antarctica Global)
