NGT - Openlab "Optimising Floating Point Precision" Workshop

Name: NGT - Openlab "Optimising Floating Point Precision" Workshop
Start: 2025-07-01T13:00:00+02:00
End: 2025-07-02T18:00:00+02:00
Location: CERN

1 Jul 2025, 13:00 → 2 Jul 2025, 18:00 Europe/Zurich

40/S2-B01 - Salle Bohr (CERN)

Alex Lasa Lamarca, Axel Naumann (CERN), Jacob Friedrich Finkenrath (CERN), Maria Girone (CERN), Mariana Velho (CERN), Stefan Roiser (CERN), Vasiliki Batsari

Description

Scientific applications in high energy physics depend in many areas on floating point operations in single, double or even higher precision.

With the upcoming runs at the LHC, both the amount of data and the precision required for its processing will increase significantly, and with them the computing resource requirements. It has already been shown that the throughput of several physics applications can be significantly improved by the use of computing accelerators such as GPUs. In view of this shift towards a heterogeneous execution environment, the use of high precision floating point operations for algorithmic data processing deserves dedicated attention, with a special focus on the projected evolution of future GPU architectures.

This workshop provides a forum to discuss the efficient use of those floating point operations in the context of compute accelerators and will touch on topics such as:

- The future evolution of hardware accelerators for high precision floating point operations
- Emulation of higher precision floating point operations on compute accelerators
- Tools and techniques to estimate and evaluate the precision of floating point operations
- Algorithmic approaches for leveraging lower precision floating point operations

The workshop will feature selected talks from hardware vendors and developers, computer scientists, physicists and mathematicians on the above topics and provide ample time for discussions.

The deadline for registration is June 20th, 2025.

Contact

alex.lasa.lamarca@cern.ch

mariana.velho@cern.ch

vasiliki.batsari@cern.ch

Participants

153

Tuesday 1 July
- 14:00 → 14:10
  
  Welcome Session 10m 40/S2-B01 - Salle Bohr
  
  Speaker: Stefan Roiser (CERN)
  
  20250601-WS-Float.pdf
- 14:10 → 14:40
  
  Problem statement from experiments and SFT 30m 40/S2-B01 - Salle Bohr
  
  Speaker: Vincenzo Innocente (CERN)
  
  Arith@CERN2025.pdf
- 14:40 → 14:50
  
  Problem statement from Theory Department 10m 40/S2-B01 - Salle Bohr
  
  Speaker: Jacob Finkenrath
  
  HFloat_at_TH1.pdf
- 14:50 → 15:00
  
  Discussion - Questions 10m 40/S2-B01 - Salle Bohr
  
- 15:00 → 15:30
  
  Coffee 30m Restaurant 1
  
- 15:30 → 16:15
  
  Floating Point Emulation in NVIDIA Math Libraries 45m 40/S2-B01 - Salle Bohr
  
  The trends in computer architecture, primarily driven by AI-based applications (most recently, large language models), have led to a rapid increase in the reduced- and mixed-precision computing capabilities of GPUs. These processors demonstrate an outsized power-efficiency (FLOPS/watt) advantage over systems almost exclusively focused upon native single- and double-precision arithmetic. Thus, there is a great deal of motivation to leverage these capabilities, through the use of various mixed-precision algorithms and emulation techniques, to facilitate greater scientific computing throughput without sacrificing accuracy. We'll touch upon a number of these approaches and present real-world case studies that provide compelling evidence in support of this path to increasing the science per watt of supercomputers.
  
  Speaker: Samuel Rodriguez (NVIDIA)
  
  cern-talk.pdf
  
  Samuel_Rodriguez.mp4
- 16:15 → 16:45
  
  Advancing AI with AMD: Open Source, Sovereign Innovation, and the Latest in CPU & GPU Performance 30m 40/S2-B01 - Salle Bohr
  
  Speakers: Joerg Roskowetz, Sayed Maudodi
  
  AMD_CERN_workshop_July_1st_2025.pdf
  
  Joerg_Roskowetz.mp4
- 16:45 → 17:05
  
  VXP: Extended Precision Accelerator 20m 40/S2-B01 - Salle Bohr
  
  In this talk, we propose a RISC-V-based accelerator aimed at extended precision computing for scientific computing applications. Furthermore, we show how it can help improve the convergence of iterative solvers in real use cases. Lastly, we present details about our hardware implementations and results obtained on real silicon prototypes.
  
  Speaker: Eric Guthmuller (CEA France)
  
  20250701_CERN_workshop_EG.pdf
  
  20250701_CERN_workshop_EG.pptx
  
  Eric_Guthmuller.mp4
- 17:05 → 17:25
  
  Extended Precision in Convex Optimisation 20m 40/S2-B01 - Salle Bohr
  
  Semidefinite Programming is a matrix-form generalisation of linear programming, and is typically tackled using Interior Point Methods. These methods are iterative, and at each step a matrix inversion needs to be performed. For small or sparse matrices, direct methods like sparse Cholesky factorisation are used. For larger dense matrices, like the ones that arise in convex relaxations of combinatorial problems, Krylov methods like Conjugate Gradient (CG) seem a better approach. We show how, as the dual-primal central trajectory approaches the feasible set and the tentative solution becomes rank-deficient, increasing the precision accelerates the convergence (in terms of the number of CG iterations).
  
  Speaker: David Herrera-Marti (CEA France)
  
  20250701_CERN_workshop.pdf
  
  David_Herrera_Marti.mp4
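The link between working precision and the number of CG iterations mentioned in the abstract above can be explored with a small experiment. The following is a minimal sketch and not the speaker's setup: it runs plain Conjugate Gradient on a Hilbert matrix (a stand-in ill-conditioned test problem chosen here for illustration, not a semidefinite programme) and uses Python's `decimal` module to vary the number of significant digits. The function name `cg_iterations`, the right-hand side and the tolerance are assumptions.

```python
from decimal import Decimal, localcontext


def cg_iterations(n, digits, tol=Decimal("1e-12"), max_iter=200):
    """Run CG on the n x n Hilbert matrix with `digits` significant decimal
    digits and return the number of iterations needed to reduce the
    (recursively updated) relative residual below `tol`."""
    with localcontext() as ctx:
        ctx.prec = digits
        A = [[Decimal(1) / Decimal(i + j + 1) for j in range(n)] for i in range(n)]
        b = [Decimal(1)] * n
        x = [Decimal(0)] * n
        r = b[:]                      # residual for the start vector x = 0
        p = r[:]                      # first search direction
        rr = sum(ri * ri for ri in r)
        bb = rr
        for k in range(1, max_iter + 1):
            Ap = [sum(aij * pj for aij, pj in zip(row, p)) for row in A]
            alpha = rr / sum(pi * qi for pi, qi in zip(p, Ap))
            x = [xi + alpha * pi for xi, pi in zip(x, p)]
            r = [ri - alpha * qi for ri, qi in zip(r, Ap)]
            rr_new = sum(ri * ri for ri in r)
            if rr_new <= tol * tol * bb:
                return k
            p = [ri + (rr_new / rr) * pi for ri, pi in zip(r, p)]
            rr = rr_new
        return max_iter


if __name__ == "__main__":
    for digits in (16, 32, 60):
        print(digits, "digits:", cg_iterations(6, digits), "iterations")
```

In exact arithmetic CG terminates after at most n steps, so the high-precision runs show how far rounding errors delay convergence at lower precision.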
- 17:25 → 17:45
  
  Discussion - Questions 20m 40/S2-B01 - Salle Bohr
  
- 17:45 → 19:00
  
  Networking Cocktail 1h 15m 40/S2-B01 - Salle Bohr
  
Wednesday 2 July
- 09:00 → 09:30
  
  Using physics knowledge to improve numerical stability 30m 40/S2-B01 - Salle Bohr
  
  The numerically stable evaluation of scattering matrix elements near the infrared limit of gauge theories is of great importance for the success of collider physics experiments. We present a novel algorithm that utilizes double precision arithmetic and reaches higher precision than a naive quadruple precision implementation at smaller computational cost. The method is based on physics-driven modifications to propagators, vertices and external polarizations. [https://arxiv.org/abs/2406.07671]
  
  Authors: E. Bothmann (speaker), J. M. Campbell, S. Höche, M. Knobbe
  
  Speaker: Enrico Bothmann (CERN)
  
  bothmann-numerical-stability-2025-07-02.pdf
  
  Enrico_Bothmann.mp4
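The general idea of reformulating an expression so that double precision suffices can be illustrated with a textbook case. This is not the algorithm of the talk or the cited paper, only a generic sketch: the invariant mass squared of two massless particles is 2 E1 E2 (1 - cos θ), and for nearly collinear momenta the factor 1 - cos θ loses all significant digits when evaluated directly. The function names below are assumptions chosen for illustration.

```python
import math


def one_minus_cos_naive(theta):
    """Direct evaluation: suffers catastrophic cancellation for small theta."""
    return 1.0 - math.cos(theta)


def one_minus_cos_stable(theta):
    """Algebraically identical form 2*sin^2(theta/2), free of cancellation."""
    half = math.sin(0.5 * theta)
    return 2.0 * half * half


if __name__ == "__main__":
    theta = 1e-9  # nearly collinear opening angle
    print("naive :", one_minus_cos_naive(theta))
    print("stable:", one_minus_cos_stable(theta))
```

The stable form reaches full double precision accuracy without resorting to a wider number format.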
- 09:30 → 10:00
  
  Double-double for virtual amplitude evaluation 30m 40/S2-B01 - Salle Bohr
  
  Two-loop virtual amplitudes are one of the key ingredients of an NNLO cross-section calculation. In this talk I would like to describe the precision requirements of evaluating such amplitudes via sector decomposition and quasi-Monte Carlo integration, and to report on satisfying them using a double-double floating point implementation within pySecDec on CPU and GPU.
  
  Based on: 2402.03301, 2305.19768, and related work.
  
  Speaker: Vitaly Magerya (CERN)
  
  magerya-dd.pdf
  
  Vitaly_Magerya.mp4
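For readers unfamiliar with the double-double format named in the abstract above, the following is a minimal sketch of the underlying arithmetic. It is not pySecDec code: a number is stored as an unevaluated sum hi + lo of two doubles, and the classic error-free transformations (Knuth's two-sum, Dekker's product with Veltkamp splitting) recover the rounding error of each operation. The tuple representation and the function names are assumptions made for illustration.

```python
from fractions import Fraction


def two_sum(a, b):
    """Return (s, err) with s = fl(a + b) and s + err == a + b exactly."""
    s = a + b
    bb = s - a
    err = (a - (s - bb)) + (b - bb)
    return s, err


def split(a):
    """Veltkamp splitting of a double into two non-overlapping halves."""
    c = 134217729.0 * a  # 2**27 + 1
    hi = c - (c - a)
    return hi, a - hi


def two_prod(a, b):
    """Return (p, err) with p = fl(a * b) and p + err == a * b exactly
    (barring overflow and underflow)."""
    p = a * b
    a_hi, a_lo = split(a)
    b_hi, b_lo = split(b)
    err = ((a_hi * b_hi - p) + a_hi * b_lo + a_lo * b_hi) + a_lo * b_lo
    return p, err


def dd_add(x, y):
    """Add two double-double numbers given as (hi, lo) tuples."""
    s, e = two_sum(x[0], y[0])
    e += x[1] + y[1]
    return two_sum(s, e)


def dd_mul(x, y):
    """Multiply two double-double numbers given as (hi, lo) tuples."""
    p, e = two_prod(x[0], y[0])
    e += x[0] * y[1] + x[1] * y[0]
    return two_sum(p, e)


def exact_value(x):
    """Exact rational value of a double-double, useful for checking."""
    return Fraction(x[0]) + Fraction(x[1])


if __name__ == "__main__":
    tiny = dd_add((1.0, 0.0), (1e-20, 0.0))
    print("double-double:", dd_add(tiny, (-1.0, 0.0))[0])
    print("plain double :", (1.0 + 1e-20) - 1.0)
```

Each double-double operation costs on the order of ten to twenty double operations, which is the trade-off against native quadruple precision that such implementations have to weigh.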
- 10:00 → 10:30
  
  An overview of mixed precision strategies for scientific computing 30m 40/S2-B01 - Salle Bohr
  
  The increasing support of lower precision arithmetics in hardware provides new opportunities for high performance scientific computing. However, even though low precision arithmetics can provide significant speed, communication, and energy benefits, their use in scientific computing poses the challenge of preserving the accuracy and stability of the computation. To address this issue, a variety of mixed precision algorithms that combine low and high precisions have emerged. In this talk I will give an overview of mixed precision algorithms in numerical linear algebra, with a focus on recent advances to accelerate the solution of linear systems.
  
  Speaker: Theo Mary (Computer Lab of Paris 6 (Lip6))
  
  main_CERN.pdf
  
  Theo_Mary.mp4
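One of the best-known mixed precision algorithms for linear systems is iterative refinement: solve cheaply in low precision, then correct the solution using residuals computed in high precision. The sketch below is a generic illustration and is not taken from the talk. Binary32 arithmetic is emulated in pure Python by rounding every operation through `struct`, the solver is Gaussian elimination without pivoting (adequate only for the diagonally dominant example used here), and all function names are assumptions.

```python
import struct


def f32(x):
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def solve_f32(A, b):
    """Gaussian elimination without pivoting, every operation rounded to binary32."""
    n = len(b)
    M = [[f32(v) for v in row] + [f32(bi)] for row, bi in zip(A, b)]
    for k in range(n):
        for i in range(k + 1, n):
            factor = f32(M[i][k] / M[k][k])
            for j in range(k, n + 1):
                M[i][j] = f32(M[i][j] - f32(factor * M[k][j]))
    x = [0.0] * n
    for i in reversed(range(n)):
        acc = M[i][n]
        for j in range(i + 1, n):
            acc = f32(acc - f32(M[i][j] * x[j]))
        x[i] = f32(acc / M[i][i])
    return x


def residual(A, b, x):
    """Residual b - A x evaluated in binary64."""
    return [bi - sum(aij * xj for aij, xj in zip(row, x)) for row, bi in zip(A, b)]


def refine(A, b, steps=5):
    """Low-precision solve followed by high-precision residual corrections.
    A production code would factorise once and reuse the factors."""
    x = solve_f32(A, b)
    for _ in range(steps):
        r = residual(A, b, x)       # high precision
        d = solve_f32(A, r)         # low precision correction
        x = [xi + di for xi, di in zip(x, d)]
    return x


if __name__ == "__main__":
    A = [[4.0, 1.0, 2.0], [1.0, 5.0, 1.0], [2.0, 1.0, 6.0]]
    b = [1.0, 2.0, 3.0]
    for name, x in (("binary32 only", solve_f32(A, b)), ("refined", refine(A, b))):
        print(name, max(abs(r) for r in residual(A, b, x)))
```

The expensive part (the solve) stays in low precision, while only the residual, which costs a matrix-vector product, is evaluated in high precision.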
- 10:30 → 10:50
  
  Adaptive Floating-Point Quantization for Efficient Neural Networks 20m 40/S2-B01 - Salle Bohr
  
  The rapid growth of deep learning models, particularly Large Language Models (LLMs), which have increased their parameter counts nearly tenfold annually since 2018, has intensified the need for more efficient, power-aware deployment strategies. Quantization is a widely adopted technique for reducing the computational and memory footprint of neural networks by lowering numerical precision.
  This work investigates a floating-point quantization approach to adaptively reduce bitwidths for weights and activations while preserving model accuracy. A quantization-oriented methodology is presented, which analyzes the distribution of tensor values to guide the design of custom floating-point formats. Experimental results on Recurrent Neural Networks demonstrate that this approach achieves an average 3.5× reduction in bit usage, with only a 0.5% drop in top-1 accuracy, using quantization-aware training (QAT).
  Building on this work, a follow-up contribution extended the AMD/Xilinx deployment flow by enabling support for arbitrary floating-point formats in the Quantized Neural Network format QONNX, complementing the existing support in the QAT library Brevitas and completing the quantization path toward hardware acceleration with the AMD FPGA NN library FINN.
  
  Speaker: Nicolo Ghielmetti (CERN)
  
  float-workshop-openlab.pdf
  
  Nicolo_Ghielmetti.mp4
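To make the notion of a custom floating-point format concrete, the following sketch rounds a value to a small format with a chosen number of exponent and mantissa bits. It is a generic illustration, not the Brevitas or QONNX implementation: the function name `quantize_fp` is hypothetical, and the format conventions (IEEE-like bias, top exponent code reserved, subnormals supported, saturation instead of infinity) are assumptions.

```python
import math


def quantize_fp(x, exp_bits, man_bits):
    """Round x to the nearest value representable with 1 sign bit,
    `exp_bits` exponent bits and `man_bits` explicit mantissa bits."""
    if x == 0.0:
        return 0.0
    bias = 2 ** (exp_bits - 1) - 1
    e_max = bias          # assumption: top exponent code reserved, as in IEEE 754
    e_min = 1 - bias      # exponent of the smallest normal number
    max_value = (2.0 - 2.0 ** -man_bits) * 2.0 ** e_max
    _, e = math.frexp(x)  # x = m * 2**e with 0.5 <= |m| < 1
    exponent = max(e - 1, e_min)         # below e_min the spacing stays fixed (subnormals)
    step = 2.0 ** (exponent - man_bits)  # spacing of representable values near x
    q = round(x / step) * step           # round to nearest, ties to even
    return math.copysign(min(abs(q), max_value), x)  # saturate on overflow


if __name__ == "__main__":
    for value in (0.3, -0.3, 1000.0, 2.0 ** -8):
        print(value, "->", quantize_fp(value, exp_bits=4, man_bits=3))
```

Analysing the distribution of tensor values, as described in the abstract, then amounts to choosing `exp_bits` and `man_bits` (and possibly the bias) so that the range and resolution match the data.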
- 10:50 → 11:05
  
  Discussion - Questions 15m 40/S2-B01 - Salle Bohr
  
- 11:05 → 11:35
  
  Coffee 30m Restaurant 1
  
- 11:35 → 12:05
  
  Mixed precision ab initio tensor network state methods adapted for NVIDIA Blackwell technology via emulated FP64 arithmetic 30m 40/S2-B01 - Salle Bohr
  
  We present an overview of recent advances in tensor network state (TNS) methods that have the potential to radically broaden their scope of application for strongly correlated quantum many-body systems. Novel mathematical models for hybrid multiNode-multiGPU parallelization on high-performance computing (HPC) infrastructures will be discussed. Scaling analysis on NVIDIA DGX-A100 and DGX-H100 platforms, reaching quarter-petaflops performance on a single node, will also be presented. Finally, we discuss cutting-edge performance results via mixed precision spin-adapted ab initio Density Matrix Renormalization Group (DMRG) electronic structure calculations utilizing the Ozaki scheme for emulating FP64 arithmetic using 8-bit integer logic. By approximating the underlying matrix and tensor algebra via a finite number of INT8 slices, we demonstrate for chemical benchmark systems that chemical accuracy can be reached even with mixed precision arithmetic. We also show that, due to its variational nature, DMRG provides an ideal tool to benchmark accuracy domains and performance of new hardware developments and related numerical libraries. Detailed numerical error analysis and performance assessment are also presented for subcomponents of the DMRG algebra by interpolating systematically between double and single precision. Our analysis paves the way for utilization of state-of-the-art Blackwell technology in tree-like tensor network state calculations, opening new research directions in materials science and beyond.
  
  Speaker: Ors Legeza (Wigner Research Centre for Physics, Hungary)
  
  cern2025_v01.pdf
  
  Ors_Legeza.mp4
- 12:05 → 12:20
  
  TNL: Numerical Library for Modern Parallel Architectures 15m 40/S2-B01 - Salle Bohr
  
  TNL (www.tnl-project.org) is a collection of building blocks that facilitate the development of efficient numerical solvers and HPC algorithms. It is implemented in C++ using modern programming paradigms in order to provide a flexible and user-friendly interface similar to, for example, the C++ Standard Template Library. TNL provides native support for modern hardware architectures such as multicore CPUs, GPUs, and distributed systems, which can be managed via a unified interface. In our presentation, we will demonstrate the main features of the library together with efficiency of the implemented algorithms and data structures.
  
  Speaker: Thomas Oberhuber (Czech Technical University in Prague)
  
  cern-tnl.pdf
  
  Thomas_Oberhuber.mp4
  
  video_ikem_qcrit_normal_flux.mp4
- 12:20 → 12:35
  
  Float32 Expansions – A Possible Answer for Scientific Computing in the Era of AI-Driven GPU Development 15m 40/S2-B01 - Salle Bohr
  
  In recent years, the emergence of large language models has led GPU vendors to prioritize performance improvements for lower-precision arithmetic, often at the expense of continued development for Float64. Meanwhile, scientific computing has increasingly relied on GPGPU acceleration, where double precision is still essential. Multi-word expansions for single-precision floating point numbers may offer a viable alternative—providing comparable or even superior precision while achieving better performance than native double precision. In this talk, we will present results using a CUDA-enabled, templated, and ported version of the QD library within the TNL framework, applied to existing numerical algorithms.
  
  Speaker: František Stloukal (Czech Technical University in Prague)
  
  Frantisek_Stloukal.mp4
  
  Optimising_floating_point_precision___CERN___TNL-4.pdf
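The multi-word expansions described in the abstract above can be illustrated with the two-word case, where a value is held as an unevaluated sum of two single-precision numbers. The sketch below is not the QD or TNL code: binary32 arithmetic is emulated in pure Python by rounding each operation through `struct`, and the names `to_pair`, `pair_add` and `to_float64` are assumptions.

```python
import math
import struct


def f32(x):
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def two_sum32(a, b):
    """Error-free addition in binary32: s + err equals a + b exactly."""
    s = f32(a + b)
    bb = f32(s - a)
    err = f32(f32(a - f32(s - bb)) + f32(b - bb))
    return s, err


def to_pair(x):
    """Represent a binary64 value as hi + lo with both parts in binary32."""
    hi = f32(x)
    lo = f32(x - hi)
    return hi, lo


def pair_add(x, y):
    """Add two float32 pairs; a simple variant without the extra
    renormalisation step that more accurate versions use."""
    s, e = two_sum32(x[0], y[0])
    e = f32(e + f32(x[1] + y[1]))
    return two_sum32(s, e)


def to_float64(p):
    """Collapse a pair back into a single binary64 value."""
    return p[0] + p[1]


if __name__ == "__main__":
    total = pair_add(to_pair(math.pi), to_pair(math.e))
    print("float32 only :", f32(f32(math.pi) + f32(math.e)))
    print("float32 pair :", to_float64(total))
    print("float64      :", math.pi + math.e)
```

A pair carries roughly 48 significand bits, slightly fewer than the 53 of native double precision; longer expansions (three or four words) are needed to match or exceed it.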
- 12:35 → 13:05
  
  Discussion - Questions 30m 40/S2-B01 - Salle Bohr
  
- 13:05 → 14:20
  
  Lunch 1h 15m Restaurant 1
  
- 14:20 → 14:50
  
  Floating-Point Error Estimation Using Automatic Differentiation 30m 40/S2-B01 - Salle Bohr
  
  Floating-point errors highlight the inherent limitations of finite-precision computing, and if left unaddressed, they can lead to severe consequences. In high-precision applications, accurately quantifying these uncertainties is essential. Various approaches have been explored to tackle floating-point errors, including increasing numerical precision, employing compensation algorithms, and applying both statistical and non-statistical estimation techniques. One widely used method for dynamic error estimation is Automatic Differentiation (AD). However, current AD-based tools often require manual code annotations or modifications. Additionally, AD tools based on operator overloading typically necessitate repeated gradient computations across different inputs and inherit the inefficiencies of the operator overloading approach.
  In this work, we introduce a customizable approach for leveraging AD to automatically generate source code that estimates floating-point uncertainties in C/C++ applications using Clad. Our framework, CHEF-FP, supports automatic error annotation and allows integration with user-defined error models. We also share our progress in extending this approach to GPU-based applications.
  
  Speaker: Vassil Vasilev (Princeton University)
  
  ChefFP.pdf
  
  Vassil_Vasilev.mp4
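The principle behind AD-based error estimation is a first-order Taylor model: the output error is approximated by the sum over variables of |∂f/∂x| times the perturbation of x. The sketch below illustrates this for the case of demoting inputs to binary32. It is not CHEF-FP or Clad: those tools generate source code via source transformation, whereas this sketch uses a small operator-overloading dual-number class in Python purely for brevity. The class `Dual`, the function `estimate_demotion_error` and the example `model` are assumptions.

```python
import struct


def f32(x):
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


class Dual:
    """A value together with its partial derivatives w.r.t. named inputs."""

    def __init__(self, value, grad=None):
        self.value = value
        self.grad = grad or {}

    @staticmethod
    def _lift(other):
        return other if isinstance(other, Dual) else Dual(float(other))

    def __add__(self, other):
        other = Dual._lift(other)
        keys = set(self.grad) | set(other.grad)
        grad = {k: self.grad.get(k, 0.0) + other.grad.get(k, 0.0) for k in keys}
        return Dual(self.value + other.value, grad)

    __radd__ = __add__

    def __mul__(self, other):
        other = Dual._lift(other)
        keys = set(self.grad) | set(other.grad)
        grad = {k: self.grad.get(k, 0.0) * other.value
                   + self.value * other.grad.get(k, 0.0) for k in keys}
        return Dual(self.value * other.value, grad)

    __rmul__ = __mul__


def estimate_demotion_error(func, inputs):
    """First-order estimate of the output error if every input were stored
    in binary32: sum of |df/dx * (x - float32(x))| over all inputs."""
    duals = {name: Dual(v, {name: 1.0}) for name, v in inputs.items()}
    out = func(**duals)
    return sum(abs(out.grad.get(name, 0.0) * (v - f32(v)))
               for name, v in inputs.items())


def model(x, y):
    """Example function under analysis (hypothetical)."""
    return x * y + x


if __name__ == "__main__":
    inputs = {"x": 0.1, "y": 0.3}
    print("estimated:", estimate_demotion_error(model, inputs))
    print("observed :", abs(model(f32(0.1), f32(0.3)) - model(0.1, 0.3)))
```

This sketch only accounts for the demotion of inputs; a fuller error model would also attribute an error contribution to each intermediate variable.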
- 14:50 → 15:20
  
  Precision auto-tuning and control of accuracy in high performance simulations 30m 40/S2-B01 - Salle Bohr
  
  In the context of high performance computing, new architectures, becoming more and more parallel, offer higher floating-point computing power. Thus, the size of the problems considered (and with it, the number of operations) increases, becoming a possible cause for increased uncertainty. As such, estimating the reliability of a result at a reasonable cost is of major importance for numerical software. In this talk we present an overview of different approaches for accuracy analysis (guaranteed or probabilistic ones) and the related software. We also describe methods to improve the accuracy of results. We present the principles of Discrete Stochastic Arithmetic (DSA), which enables one to estimate rounding errors in simulation codes. DSA can be used to control the accuracy of programs in half, single, or double precision via the CADNA library, and also in arbitrary precision via the SAM library. Thanks to DSA, the accuracy estimation and the detection of numerical instabilities can be performed in parallel codes on CPU and on GPU. Most numerical simulations are performed in double precision, and this can be costly in terms of computing time, memory transfer and energy consumption. We present tools for floating-point auto-tuning that aim at reducing the numerical formats used in simulation programs.
  
  Speaker: Fabienne Jézéquel (LIP6, Sorbonne Université)
  
  Fabienne_Jezequel.mp4
  
  slides_FJezequel.pdf
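The principle of Discrete Stochastic Arithmetic can be sketched in a few lines: run the same computation three times with random rounding (each inexact result is rounded up or down with equal probability) and estimate the number of reliable decimal digits from the spread of the three results using Student's t distribution. The code below is a toy illustration and not the CADNA API; only addition is instrumented, the function names are assumptions, and `math.nextafter` requires Python 3.9 or later.

```python
import math
import random


def two_sum(a, b):
    """Return (s, err) with s = fl(a + b) and s + err == a + b exactly."""
    s = a + b
    bb = s - a
    err = (a - (s - bb)) + (b - bb)
    return s, err


def random_round_add(a, b, rng):
    """Addition with random rounding: if a + b is not representable, return
    one of its two neighbouring doubles, each with probability 1/2."""
    s, err = two_sum(a, b)
    if err == 0.0:
        return s
    neighbour = math.nextafter(s, math.inf if err > 0.0 else -math.inf)
    return rng.choice((s, neighbour))


def significant_digits(samples):
    """Estimated number of exact decimal digits of the mean of 3 samples."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((x - mean) ** 2 for x in samples) / (n - 1)
    if mean == 0.0:
        return 0.0
    if var == 0.0:
        return 15.95  # all samples agree: about 53 bits
    t = 4.303  # Student's t for 2 degrees of freedom at 95% confidence
    return max(0.0, math.log10(math.sqrt(n) * abs(mean) / (math.sqrt(var) * t)))


def run(rng):
    """Sum 0.1 a thousand times, then subtract 100 (catastrophic cancellation)."""
    total = 0.0
    for _ in range(1000):
        total = random_round_add(total, 0.1, rng)
    return total, random_round_add(total, -100.0, rng)


if __name__ == "__main__":
    rng = random.Random(2025)
    runs = [run(rng) for _ in range(3)]
    print("digits in the sum      :", significant_digits([r[0] for r in runs]))
    print("digits in the sum - 100:", significant_digits([r[1] for r in runs]))
```

The sum itself is reported as accurate to many digits, while the difference from 100 is flagged as having almost no reliable digits, which is the kind of numerical instability such tools are designed to detect.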
- 15:20 → 15:50
  
  Experiences with CADNA and the MadGraph5 Event Generator 30m 40/S2-B01 - Salle Bohr
  
  This talk presents a summer student project that explored the numerical stability of MadGraph5 using CADNA. It focuses on how CADNA’s warning system and its ability to quantify floating-point precision were used to assess whether MadGraph5 can operate reliably with single-precision floating-point numbers.
  
  Speaker: Stephan Hageboeck (CERN)
  
  CADNA Madgraph.pdf
  
  Stephan_Hageboeck.mp4
- 15:50 → 16:20
  
  Coffee 30m Restaurant 1
  
- 16:20 → 16:50
  
  Emulating Matrix Multiplication Using Mixed-Precision Computation 30m 40/S2-B01 - Salle Bohr
  
  This talk introduces a method for emulating matrix multiplication through mixed-precision computation. As exemplified by the Matrix Engine on GPUs, low-precision arithmetic can be performed significantly faster than conventional FP32 or FP64 operations. We present Ozaki Scheme I and II, which leverage low-precision arithmetic to achieve accuracy comparable to standard FP64, and discuss their numerical performance.
  
  Speaker: Katsuhisa Ozaki (Shibaura Institute of Technology)
  
  Katsuhisa_Ozaki.mp4
  
  OZAKI_slide_CERN.pdf
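The idea of emulating an FP64 matrix product with low-precision integer arithmetic can be sketched as follows. This is a simplified illustration in the spirit of the slice-based approach, not the speaker's algorithm: rows of A and columns of B are scaled by powers of two, each entry is cut into 7-bit integer slices that would fit an INT8 matrix engine, all slice products are computed exactly in integers, and the partial results are recombined. Python integers stand in for the INT32 accumulators and for the final high-precision accumulation; the function names are assumptions.

```python
import math

SLICE_BITS = 7  # each slice fits into a signed 8-bit integer


def split_into_slices(x, num_slices):
    """Cut x (with |x| < 1) into integers d_0, d_1, ... with |d_p| <= 127
    such that x is approximately sum(d_p * 2**(-7 * (p + 1)))."""
    slices = []
    r = x
    for _ in range(num_slices):
        r *= 2 ** SLICE_BITS  # exact: scaling by a power of two
        d = int(r)            # truncation toward zero
        slices.append(d)
        r -= d                # exact remainder, |r| < 1
    return slices


def sliced_matmul(A, B, num_slices):
    """Approximate A @ B using only products of small integer slices."""
    n, k, m = len(A), len(B), len(B[0])
    # Power-of-two scaling so that every scaled entry has magnitude below 1.
    row_exp = [max(math.frexp(v)[1] for v in row) for row in A]
    col_exp = [max(math.frexp(B[l][j])[1] for l in range(k)) for j in range(m)]
    A_sl = [[split_into_slices(math.ldexp(A[i][l], -row_exp[i]), num_slices)
             for l in range(k)] for i in range(n)]
    B_sl = [[split_into_slices(math.ldexp(B[l][j], -col_exp[j]), num_slices)
             for j in range(m)] for l in range(k)]
    C = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            total = 0
            for p in range(num_slices):
                for q in range(num_slices):
                    # Exact integer dot product of one slice pair.
                    dot = sum(A_sl[i][l][p] * B_sl[l][j][q] for l in range(k))
                    total += dot << (SLICE_BITS * (2 * num_slices - 2 - p - q))
            shift = row_exp[i] + col_exp[j] - 2 * SLICE_BITS * num_slices
            C[i][j] = math.ldexp(float(total), shift)
    return C


if __name__ == "__main__":
    A = [[1 / 3, 2.5], [-7.25, 1e-3]]
    B = [[0.1, 4.0], [3.3, -2.0]]
    for slices in (2, 4, 8):
        print(slices, "slices:", sliced_matmul(A, B, slices))
```

The number of slices controls the accuracy: a few slices give a fast, low-accuracy product, while enough slices to cover the 53-bit significand reproduce double precision accuracy at the cost of many more integer products.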
- 16:50 → 17:10
  
  Discussion - Questions 20m 40/S2-B01 - Salle Bohr
  
- 17:10 → 17:20
  
  Closing Session 10m 40/S2-B01 - Salle Bohr
  
