





Plan de Recuperación, Transformación y Resiliencia





#### Volodymyr Svintozelskyi

Arantza De Oyanguren Campos, Brij Kishor Jashal, Jiahui Zhuo, Valerii Kholoimov, <u>Volodymyr Svintozelskyi</u>, Luca Fiorini, Álvaro Fernández and Alberto Valero

IFIC, Univ. of Valencia and CSIC (ES), Tata Institute of Fundamental Research (TIFR)

#### Overview

- Real-time analysis and DAQ system at LHCb
- High-Low project at Valencia
- Hardware power consumption studies
- FPGAs as a way to reduce consumption?
- Consumption per HLT1 algorithm: can we detect unoptimized code?
- Future power consumption (WLCG example)

## Real-time analysis at LHCb

- No hardware-based trigger in Run 3
- Allen: the LHCb high-level trigger 1 (HLT1) application on GPUs
- Fast detector reconstruction on O(500) NVIDIA RTX A5000 GPUs




- More than 250 algorithms with a high level of parallelization
- Algorithms are executed in a predefined sequence
- Allen is used as an example, but the conclusions apply more generally



#### LHCb DAQ system for Run 3



### The HIGH-LOW project at Valencia

Design of High-Performance Algorithms for Low-Power Sustainable Hardware for LHC Experiments and Their Upgrades

- Cross-experiment project: ATLAS and LHCb
- PIs: Luca Fiorini (ATLAS) and Arantza Oyanguren (LHCb)
- Funded by the Spanish Ministry of Science and Innovation (TED2021-130852B-I00)
- About 10 people (physicists + engineers + students)

<u>Aim:</u> Benchmarking new hardware architectures and developing fast, highly efficient algorithms with reduced power consumption



#### HIGH-LOW hardware:

- Rack K RETEX LOGIC-2 A600 42U F1000 PH
- APC Metered Rack PDU ZeroU 2G AP8
- Switch D-Link DXS-1210-28T, 24 x 10GbE
- T10G Dual Xeon Scalable HPC 10xGPU PCIe server
  - 2 x Intel<sup>®</sup> Xeon<sup>®</sup> Gold 5318Y, 24 cores, 2.10 GHz base / 3.40 GHz turbo
  - 8 x 32GB DDR4 3200MHz ECC REG
  - 1 x SSD Samsung 990 PRO 2TB M.2 NVMe 2280 PCIe 4.0
  - 1 x Broadcom MegaRAID 9560-16i PCIe controller
  - 8 x 10TB SAS 12Gb/s 7200 rpm HDD
  - 1 x NVIDIA<sup>®</sup> RTX<sup>™</sup> A5000 24GB GDDR6 ECC
  - 1 x NVIDIA<sup>®</sup> RTX<sup>™</sup> A6000 Ada Generation 48GB GDDR6 ECC
  - 1 x HBA Broadcom N210GBT Dual 10GbE RJ45




# Hardware power efficiency

What is the effect of faster hardware on overall power consumption?



The LHCb decision to run HLT1 on GPUs may have saved up to a factor of 10 in energy consumption!
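A convenient figure of merit for comparing architectures is energy per event: average power draw divided by throughput. The sketch below is a minimal illustration of that arithmetic; the power and throughput numbers are purely hypothetical placeholders, not measurements from this study. It shows how a faster device can draw more power and still spend less energy per event.

```python
def energy_per_event_j(avg_power_w: float, throughput_hz: float) -> float:
    """Energy per processed event in joules: E = P / R."""
    if throughput_hz <= 0:
        raise ValueError("throughput must be positive")
    return avg_power_w / throughput_hz


# Hypothetical example numbers (NOT measured values):
cpu_node = energy_per_event_j(avg_power_w=450.0, throughput_hz=1_500.0)
gpu_node = energy_per_event_j(avg_power_w=700.0, throughput_hz=10_000.0)

print(f"CPU node: {cpu_node * 1e3:.1f} mJ/event")
print(f"GPU node: {gpu_node * 1e3:.1f} mJ/event")
print(f"CPU/GPU energy ratio: {cpu_node / gpu_node:.2f}")
```

Here the hypothetical GPU node draws more power but, because its throughput is much higher, ends up with a lower energy cost per event.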

### Hardware utilisation

A higher number of GPU streams makes the CPU heat up faster  $\rightarrow$  higher consumption



\* Some Allen algorithms must run on the CPU, which heats it up and makes the cooling fans ramp up

#### FPGAs for less consumption?

- Offloading some computation tasks to FPGAs
- Real-time reconstruction on FPGAs ("artificial retina")
- VELO clustering is already implemented in FPGAs for Run 3!
- Tracking in development for Run 5 (~2030)




#### The same idea can be extended to FPGAs:



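Seeding algorithm that builds tracklets in the last LHCb tracking detector (SciFi), implemented on FPGAs:

Throughput increases by 30%  $\rightarrow$  saving of 6.2 mW  $\cdot$  s (6.2 mJ) per event

(at a 30 MHz event rate: $6.2\,\mathrm{mJ} \times 30\,\mathrm{MHz} \approx 186\,\mathrm{kW}$)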
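Converting a per-event energy saving into a sustained power saving is a unit check: joules per event times events per second gives watts, so the result is a power (kW), not kW/s. A short check using only the two numbers quoted on this slide:

```python
# Energy saved per event, as quoted: 6.2 mW·s = 6.2 mJ
saving_per_event_j = 6.2e-3
event_rate_hz = 30e6  # 30 MHz event rate

# J/event × events/s = J/s = W
power_saving_w = saving_per_event_j * event_rate_hz
print(f"Sustained power saving: {power_saving_w / 1e3:.0f} kW")
```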

Use hybrid systems to benefit from the strengths of each architecture


Sustainability of real-time analysis at 5 TB/s data rate - Sustainable HEP 2024

## Per-algorithm power consumption

- Control the number of algorithms executed in the sequence
- Run the power consumption test for the first N and the first N+1 algorithms
- The difference is the additional consumption due to algorithm *N*
- Expected to be proportional to the algorithm's total execution time
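The differential method above can be sketched as follows. The function names, and the assumption that each run yields one total energy measurement (in joules, for the same input sample), are hypothetical; the actual test harness is not shown on the slides. Repeated runs give a statistical error, and the errors of the two independent sets of runs are added in quadrature.

```python
import math
import statistics


def mean_and_sem(values: list[float]) -> tuple[float, float]:
    """Mean and standard error of the mean over repeated runs."""
    mean = statistics.fmean(values)
    if len(values) < 2:
        return mean, float("nan")
    return mean, statistics.stdev(values) / math.sqrt(len(values))


def algorithm_energy(
    runs_first_n: list[float], runs_first_n_plus_1: list[float]
) -> tuple[float, float]:
    """Energy attributed to algorithm N: E(first N+1) - E(first N).

    Inputs are hypothetical per-run energy measurements (J); the two
    sets are assumed independent, so their errors add in quadrature.
    """
    m0, e0 = mean_and_sem(runs_first_n)
    m1, e1 = mean_and_sem(runs_first_n_plus_1)
    return m1 - m0, math.hypot(e0, e1)


# Hypothetical measurements, three runs each (NOT real data):
delta, err = algorithm_energy([100.2, 99.8, 100.0], [103.1, 102.9, 103.0])
print(f"algorithm N: {delta:.2f} ± {err:.2f} J")
```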






#### Per-algorithm power consumption

- Due to time limitations, only selected values of N were studied
- The HIGH-LOW machine is a shared resource, so tests were run overnight to minimize interference from other users (some interference remains)
- Multiple runs are needed to estimate the statistical error
- No anomaly is detected → no significant performance drops?
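Since each algorithm's energy is expected to scale with its execution time, one possible anomaly check is to compute every algorithm's effective power (energy divided by time) and flag those far from the sequence-wide median. This is a sketch under stated assumptions: per-algorithm energies and times are already available, and the algorithm names, numbers and the 50% threshold are hypothetical. A flagged algorithm is only a candidate for closer inspection, not proof of unoptimized code.

```python
import statistics


def flag_anomalies(
    measurements: dict[str, tuple[float, float]], rel_threshold: float = 0.5
) -> list[str]:
    """Return algorithms whose energy/time ratio (effective power, W)
    deviates from the median ratio by more than rel_threshold.

    measurements maps name -> (energy_j, exec_time_s).
    """
    ratios = {name: e / t for name, (e, t) in measurements.items()}
    typical = statistics.median(ratios.values())
    return [
        name
        for name, r in ratios.items()
        if abs(r - typical) / typical > rel_threshold
    ]


# Hypothetical per-algorithm data (NOT real Allen measurements):
data = {
    "alg_a": (3.0, 0.010),
    "alg_b": (6.2, 0.020),
    "alg_c": (2.9, 0.010),
    "alg_d": (9.0, 0.010),
}
print(flag_anomalies(data))
```

The median is used instead of the mean so that a single outlier does not shift the reference value it is compared against.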



#### Estimation of WLCG CPU requirements

In millions of HS06, for ATLAS and CMS

A similar prediction needs to be made for the trigger systems



# Summary

- Move towards heterogeneous computing systems
- Use the best and most efficient hardware available (weighed against cost in €)
- Optimize the utilization of the hardware
- Keep an eye on your power consumption!
- Plan to extend Allen to more hardware architectures (ARM, Intel GPUs)
- Need to make predictions for the trigger system consumption for HL-LHC
- Work is still ongoing 😂

#### Thanks for your attention!