In this talk, we propose a RISC-V-based accelerator aimed at extended-precision computing for scientific applications. Furthermore, we show how it can help improve the convergence of iterative solvers in real use cases. Lastly, we present details of our hardware implementations and results obtained on real silicon prototypes.
Semidefinite Programming is a matrix-form generalisation of linear programming and is typically tackled using Interior Point Methods. These methods are iterative in nature, and at each step a matrix inversion must be performed. For small or sparse matrices, direct methods such as sparse Cholesky factorisation are used. For larger dense matrices, such as those that arise in convex...
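As background for the direct-solve step the abstract alludes to, here is a minimal sketch: factorise a symmetric positive-definite system once with Cholesky and reuse the factors for cheap solves. The setup and names are illustrative, not taken from the talk.

```python
# Illustrative only: a dense Cholesky solve, as used inside one
# interior point step. SciPy wraps the standard LAPACK routines.
import numpy as np
from scipy.linalg import cho_factor, cho_solve

rng = np.random.default_rng(0)
n = 200
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)        # symmetric positive definite by construction
b = rng.standard_normal(n)

c, lower = cho_factor(A)           # O(n^3/3) flops, done once
x = cho_solve((c, lower), b)       # two cheap triangular solves per right-hand side

print(np.linalg.norm(A @ x - b))   # residual check
```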
The increasing hardware support for lower-precision arithmetic provides new opportunities for high-performance scientific computing. However, even though low-precision arithmetic can bring significant speed, communication, and energy benefits, its use in scientific computing poses the challenge of preserving the accuracy and stability of the computation. To address this issue, a...
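One well-established technique in this space (not necessarily the one this talk proposes) is mixed-precision iterative refinement: do the expensive solve in low precision and recover double-precision accuracy through cheap high-precision residual corrections. A minimal NumPy sketch, with all names ours:

```python
import numpy as np

def ir_solve(A, b, iters=5):
    """Solve Ax = b: solve in float32, refine residuals in float64."""
    # (a real implementation would factorise A32 once with LU and reuse the factors)
    A32 = A.astype(np.float32)
    x = np.linalg.solve(A32, b.astype(np.float32)).astype(np.float64)
    for _ in range(iters):
        r = b - A @ x                                    # residual in double precision
        d = np.linalg.solve(A32, r.astype(np.float32))   # cheap low-precision correction
        x = x + d.astype(np.float64)
    return x

rng = np.random.default_rng(1)
n = 300
A = rng.standard_normal((n, n)) + n * np.eye(n)   # well conditioned by construction
b = rng.standard_normal(n)
x = ir_solve(A, b)
print(np.linalg.norm(A @ x - b) / np.linalg.norm(b))  # ~1e-16 vs ~1e-7 for pure FP32
```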
The rapid growth of deep learning models, particularly Large Language Models (LLMs), whose parameter counts have increased nearly tenfold annually since 2018, has intensified the need for more efficient, power-aware deployment strategies. Quantization is a widely adopted technique for reducing the computational and memory footprint of neural networks by lowering numerical...
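To make the term concrete, here is a hedged sketch of quantization in its simplest form, symmetric per-tensor int8 post-training quantization; the function names and the scale choice are ours, not from the talk.

```python
import numpy as np

def quantize_int8(w):
    """Map float32 weights to int8 with a single symmetric scale."""
    scale = np.max(np.abs(w)) / 127.0                        # per-tensor scale
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.random.default_rng(2).standard_normal(1024).astype(np.float32)
q, s = quantize_int8(w)
err = np.max(np.abs(w - dequantize(q, s)))
print(f"max abs quantization error: {err:.3e} (step {s:.3e})")
```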
An overview of recent advances in tensor network state (TNS) methods is presented that have the potential to radically broaden their scope of application for strongly correlated quantum many-body systems. Novel mathematical models for hybrid multi-node, multi-GPU parallelization on high-performance computing (HPC) infrastructures will be discussed. Scaling analysis on NVIDIA DGX-A100 and...
TNL (www.tnl-project.org) is a collection of building blocks that facilitate the development of efficient numerical solvers and HPC algorithms. It is implemented in C++ using modern programming paradigms in order to provide a flexible and user-friendly interface similar to, for example, the C++ Standard Template Library. TNL provides native support for modern hardware architectures such as...
In recent years, the emergence of large language models has led GPU vendors to prioritize performance improvements for lower-precision arithmetic, often at the expense of continued development of Float64. Meanwhile, scientific computing has increasingly relied on GPGPU acceleration, where double precision remains essential. Multi-word expansions of single-precision floating-point numbers...
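As background on such expansions: they are built from error-free transformations such as Knuth's TwoSum, which returns both the rounded sum of two floats and the exact rounding error, so a pair of single-precision words can carry nearly double the precision. A float32 NumPy sketch (generic textbook code, not the speakers'):

```python
import numpy as np

def two_sum(a, b):
    """Knuth's TwoSum: s + e == a + b exactly, with s = fl(a + b)."""
    a, b = np.float32(a), np.float32(b)
    s = a + b
    bv = s - a                      # the part of b that made it into s
    e = (a - (s - bv)) + (b - bv)   # exact rounding error of the addition
    return s, e

s, e = two_sum(np.float32(1.0), np.float32(1e-8))
print(s, e)                  # s == 1.0, e recovers the ~1e-8 that was lost
print(float(s) + float(e))   # viewing the two-word pair in double precision
```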
Floating-point errors highlight the inherent limitations of finite-precision computing, and if left unaddressed, they can lead to severe consequences. In high-precision applications, accurately quantifying these uncertainties is essential. Various approaches have been explored to tackle floating-point errors, including increasing numerical precision, employing compensation algorithms, and...
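For concreteness, the best known of the compensation algorithms mentioned is Kahan's compensated summation, which carries each addition's rounding error forward instead of discarding it. A small float32 demonstration (illustrative, not from the talk):

```python
import numpy as np

def naive_sum(xs):
    s = np.float32(0.0)
    for x in xs:
        s = s + x                # rounding error discarded at every step
    return s

def kahan_sum(xs):
    s = np.float32(0.0)
    c = np.float32(0.0)          # running compensation for lost low bits
    for x in xs:
        y = x - c
        t = s + y                # low-order bits of y may be lost here...
        c = (t - s) - y          # ...but are recovered into c
        s = t
    return s

xs = np.full(10**5, np.float32(0.1))
print(naive_sum(xs), kahan_sum(xs), 0.1 * 10**5)  # naive drifts; Kahan stays close
```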
In the context of high-performance computing, new architectures, which are becoming more and more parallel, offer ever higher floating-point computing power. As a result, the size of the problems considered (and, with it, the number of operations) increases, becoming a possible source of increased uncertainty. Estimating the reliability of a result at a reasonable cost is therefore of major importance for numerical...
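One family of techniques with this goal is stochastic arithmetic (CESTAC/Discrete Stochastic Arithmetic, on which tools such as CADNA are built). The caricature below conveys the idea only: rerun a computation under random perturbations at the rounding level and estimate how many digits the runs share. It is our illustration, not how CADNA is implemented.

```python
import numpy as np

def estimate_digits(f, x, u=2.0**-53, samples=3, seed=0):
    """Rough CESTAC-style estimate of the significant digits of f(x)."""
    rng = np.random.default_rng(seed)
    runs = np.array([f(x * (1.0 + u * rng.uniform(-1, 1, x.shape)))
                     for _ in range(samples)])
    mean, std = runs.mean(), runs.std(ddof=1)
    digits = np.log10(abs(mean) / std) if std > 0 else 15.9
    return mean, max(digits, 0.0)

# The textbook-unstable one-pass variance formula loses most digits:
naive_var = lambda v: (np.sum(v * v) - np.sum(v) ** 2 / v.size) / (v.size - 1)
x = np.linspace(1e8, 1e8 + 1.0, 1000)
mean, digits = estimate_digits(naive_var, x)
print(f"naive variance ~ {mean:.3e}, ~{digits:.1f} significant digits")
print(np.var(x, ddof=1))   # stable reference value (~0.0836)
```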
This talk presents a summer student project that explored the numerical stability of MadGraph5 using CADNA. It focuses on how CADNA’s warning system and its ability to quantify floating-point precision were used to assess whether MadGraph5 can operate reliably with single-precision floating-point numbers.
This talk introduces a method for emulating matrix multiplication through mixed-precision computation. As exemplified by the Matrix Engine on GPUs, low-precision arithmetic can be performed significantly faster than conventional FP32 or FP64 operations. We present Ozaki Scheme I and II, which leverage low-precision arithmetic to achieve accuracy comparable to standard FP64, and discuss their...
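To give a flavour of the approach, here is a much-simplified sketch (the real Ozaki schemes pick split widths from a careful error analysis that we do not reproduce): decompose FP64 matrices into slices with few significand bits, multiply the slices on the fast low-precision unit, and accumulate the partial products in FP64. NumPy's float32 matmul stands in for a GPU matrix engine; all parameter choices are ours.

```python
import numpy as np

def split_slices(A, bits=8, slices=7):
    """Error-free decomposition of A (float64) into low-bit FP32 slices."""
    parts, rest = [], A.copy()
    for _ in range(slices):
        mu = np.max(np.abs(rest))
        if mu == 0.0:
            break
        grid = 2.0 ** (np.ceil(np.log2(mu)) - bits)
        hi = np.round(rest / grid) * grid    # keep ~`bits` leading bits, exactly
        parts.append(hi.astype(np.float32))  # few bits => exact in FP32
        rest = rest - hi                     # exact remainder
    return parts

def matmul_ozaki_like(A, B, bits=8, slices=7):
    # With bits=8 and n=64, each FP32 slice product below is exact.
    As, Bs = split_slices(A, bits, slices), split_slices(B, bits, slices)
    C = np.zeros((A.shape[0], B.shape[1]))
    for Ai in As:
        for Bj in Bs:
            C += (Ai @ Bj).astype(np.float64)  # FP32 product, FP64 accumulate
    return C

rng = np.random.default_rng(4)
A, B = rng.standard_normal((64, 64)), rng.standard_normal((64, 64))
ref = A @ B                                              # FP64 reference
print(np.max(np.abs(A.astype(np.float32) @ B.astype(np.float32) - ref)))
print(np.max(np.abs(matmul_ozaki_like(A, B) - ref)))     # orders of magnitude closer
```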