

**MINISTERIO DE CIENCIA, INNOVACIÓN Y UNIVERSIDADES**



**Funded by** the European Union **NextGenerationEU**





"This work is supported by the Ministerio de Ciencia, Innovación y Universidades with Next Generation funds and the Plan de Recuperación, Transformación y Resiliencia (project TED2021–130852B–100)"

## Porting MADGRAPH to FPGA

Héctor Gutiérrez<sup>1</sup>, Luca Fiorini, Alberto Valero, Arantza Oyanguren, Francisco Hervas, Carlos Vico, Javier Fernandez, Santiago Folgueras, Pelayo Leguina

<sup>1</sup>Instituto de Física Corpuscular (CSIC-UV)

2nd Computing Challenges workshop (COMCHA), October 2nd, 2024





### Universitat de València





#### **Contact:** Hector.Gutierrez@ific.uv.es

## Index



- MADGRAPH CPU
- MADGRAPH GPU
- MADGRAPH FPGA
- Future implementations



# MADGRAPH\_aMC@NLO CPU

#### What is MADGRAPH?

"MadGraph5\_aMC@NLO is a framework that aims at providing all the elements necessary for SM and BSM phenomenology, such as the computation of cross sections, the generation of hard events and their matching with event generators. Processes can be simulated to LO accuracy for any user-defined Lagrangian, and to NLO accuracy in the case of QCD (Quantum Chromodynamics) corrections to SM processes. Matrix elements at the tree and one-loop level can also be obtained."








## MADGRAPH4GPU



Credits to: Madgraph5 aMC@NLO for GPUs group

# Setup (CPU + GPU)







#### 13TH GEN INTEL(R) CORE(TM) i7-13700H 2.40 GHz



#### **GEFORCE RTX 3050 GIGABYTE**

## **Results CPU vs GPU**



## What is an FPGA?





- FPGA (Field Programmable Gate Array)
- DSP (Digital Signal Processing)
- LUT (LookUp Table)
- FF (Flip-Flop)
- BRAM (Block RAM)







## HOW TO PROGRAM IN HLS

Figure: acceleration platform diagram. Labelled blocks: AXI interfaces, global memory, DMA engine, XRT API, XRT, drivers, PCIe.

- Host development is similar to regular software development.
- C/C++ and the OpenCL API are used to:
  - Manage tasks on the FPGA
  - Transfer data
  - Program the FPGA in real time, optimizing its resources

- FPGA application development is more complex, often using low-level languages like Verilog or VHDL.
- The Vitis environment allows using C/C++/OpenCL C to design functions (kernels).
- Kernels are automatically converted into RTL using High-Level Synthesis (HLS).
- Once RTL is generated, Vitis manages:
  - Synthesis
  - Mapping
  - Creation of the bitstream (packaged in an xclbin file) to program the FPGA.
- Developing applications for Alveo involves two parts:
  - Programming the host (runs on x86 processors).
  - Programming the FPGA (accelerates specific functions).
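The kernel side of this workflow can be sketched as follows. This is a minimal illustration, not code from the project: the kernel name `scale_kernel`, its arguments and the helper `run_on_host` are hypothetical. The `HLS INTERFACE` and `HLS PIPELINE` pragmas are the usual Vitis HLS way of mapping pointer arguments to global memory and requesting a pipelined loop; a regular C++ compiler ignores them, so the same source can be checked in software first.

```cpp
#include <cassert>
#include <vector>

// Hypothetical kernel: multiplies every input value by a constant factor.
// extern "C" keeps the symbol name unmangled so the Vitis linker can find it.
extern "C" void scale_kernel(const double *in, double *out, double factor, int n) {
// Pointer arguments become AXI master ports that read and write global memory.
#pragma HLS INTERFACE m_axi port = in bundle = gmem0
#pragma HLS INTERFACE m_axi port = out bundle = gmem1
// Scalars and the start/done control go through an AXI4-Lite register interface.
#pragma HLS INTERFACE s_axilite port = factor
#pragma HLS INTERFACE s_axilite port = n
#pragma HLS INTERFACE s_axilite port = return
    for (int i = 0; i < n; ++i) {
// Ask HLS to start a new loop iteration every clock cycle when possible.
#pragma HLS PIPELINE II = 1
        out[i] = factor * in[i];
    }
}

// Software stand-in for the host program. On real hardware this direct call
// would be replaced by buffer transfers and a kernel launch through XRT.
std::vector<double> run_on_host(const std::vector<double> &input, double factor) {
    const int n = static_cast<int>(input.size());
    assert(n >= 0);  // guards against overflow in the narrowing cast
    std::vector<double> output(input.size());
    scale_kernel(input.data(), output.data(), factor, n);
    return output;
}
```

Calling `run_on_host({1.0, 2.0, 3.5}, 2.0)` exercises the kernel as plain software, which is roughly what C simulation does before synthesis.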

## MADGRAPH FPGA

#### **Process:** e- e+ > μ- μ+

- top -> check\_sa.cpp (IN: NUMBER\_OF\_EVENTS, IN: RANDOM\_EVENTS, OUT: MOMENTA, OUT: MATRIX\_ELEMENT)
  - Rambo -> Momenta
  - SigmaKin -> Matrix Element
- Rambo(Energy, masses, weight, masses\_size, random inputs)
  - Obtain Momenta
- SigmaKin
  - InitProc() -> SetIndependentCouplings
  - SetParameters & SetDependentCouplings
  - Calculate wavefunctions
  - Calculate matrix of the process
    - Obtain Matrix Element
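The flow above (random numbers in, momenta and one matrix element per event out) can be sketched in plain C++. Everything here is a simplified stand-in and an assumption, not the generated MadGraph code: `two_body_momenta` replaces Rambo with massless 2 -> 2 kinematics in the centre-of-mass frame, and `qed_matrix_element` replaces SigmaKin with the textbook spin-averaged photon-exchange result 2e⁴(t² + u²)/s² for a massless lepton pair. The names `me_calc_sketch`, `Vec4` and the two-random-numbers-per-event layout are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Four-momentum stored as (E, px, py, pz).
struct Vec4 { double e, x, y, z; };

static Vec4 add(const Vec4 &a, const Vec4 &b) { return {a.e + b.e, a.x + b.x, a.y + b.y, a.z + b.z}; }
static Vec4 sub(const Vec4 &a, const Vec4 &b) { return {a.e - b.e, a.x - b.x, a.y - b.y, a.z - b.z}; }
// Minkowski square with metric (+, -, -, -).
static double square(const Vec4 &a) { return a.e * a.e - a.x * a.x - a.y * a.y - a.z * a.z; }

const double kPi = 3.14159265358979323846;

// ASSUMPTION: stand-in for the Rambo step. p[0], p[1] are the incoming beams
// along z; p[2], p[3] are the outgoing back-to-back massless particles.
void two_body_momenta(double energy, double r_cos, double r_phi, Vec4 p[4]) {
    const double half = 0.5 * energy;
    const double c = 2.0 * r_cos - 1.0;  // cos(theta), uniform in [-1, 1]
    const double s = std::sqrt(1.0 - c * c);
    const double phi = 2.0 * kPi * r_phi;
    p[0] = {half, 0.0, 0.0, half};
    p[1] = {half, 0.0, 0.0, -half};
    p[2] = {half, half * s * std::cos(phi), half * s * std::sin(phi), half * c};
    p[3] = {half, -p[2].x, -p[2].y, -p[2].z};
}

// ASSUMPTION: stand-in for SigmaKin. Photon exchange only, massless fermions.
double qed_matrix_element(const Vec4 p[4], double alpha) {
    const double e2 = 4.0 * kPi * alpha;  // electric charge squared
    const double s = square(add(p[0], p[1]));
    const double t = square(sub(p[0], p[2]));
    const double u = square(sub(p[0], p[3]));
    return 2.0 * e2 * e2 * (t * t + u * u) / (s * s);
}

// Top-level loop mirroring the slide: for each event, momenta first, then the
// matrix element. Each event writes 16 doubles (4 particles x 4 components).
void me_calc_sketch(int number_of_events, const double *random_events, double energy,
                    double alpha, double *momenta, double *matrix_element) {
    assert(number_of_events >= 0);
    for (int ev = 0; ev < number_of_events; ++ev) {
        Vec4 p[4];
        two_body_momenta(energy, random_events[2 * ev], random_events[2 * ev + 1], p);
        for (int i = 0; i < 4; ++i) {
            double *out = momenta + 16 * ev + 4 * i;
            out[0] = p[i].e; out[1] = p[i].x; out[2] = p[i].y; out[3] = p[i].z;
        }
        matrix_element[ev] = qed_matrix_element(p, alpha);
    }
}
```

Keeping the per-event work inside one loop with flat input and output arrays is the shape that maps most directly onto a pipelined HLS kernel with memory-mapped buffers.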

| Input            | Value |
|------------------|-------|
| NUMBER_OF_EVENTS | 0     |
| RANDOM_EVENTS    | 0     |





### MADGRAPH FPGA

Figure: module hierarchy of the synthesized design, with me\_calc as the top function. Modules visible in the figure: me\_calc\_Pipeline\_VITIS\_LOOP\_153\_1, pow\_generic\_double\_s, sin\_or\_cos\_double\_s, sigmaKin, calculate\_wavefunc, matrix\_1\_epem\_mupmum, ixxxxx, oxxxxx, FFV1P0\_3, FFV2\_3, FFV4\_0.

`double *rambo(double et, double *xm, double &wt, int num_part, double *rand_numbers)`

RAMBO, "ra(ndom) m(omenta) b(eautifully) o(rganized)": a democratic multi-particle phase space generator. This is version 1.0, written by R. Kleiss and adjusted by Hans Kuijf; weights are logarithmic (20-08-90).

- num\_part = number of particles
- xm = particle masses (dim = nexternal - nincoming)
- rand\_numbers = random event
- wt = weight of the event
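To make the role of the random inputs concrete, here is a minimal sketch of the massless part of the RAMBO algorithm: isotropic massless momenta are generated from four random numbers per particle, then boosted and rescaled so that they sum to (et, 0, 0, 0). It is an assumption-laden illustration rather than the function shown on the slide: the name `rambo_massless` and its signature are hypothetical, and the mass handling and the event weight `wt` are left out.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <vector>

// Returns n massless four-momenta (E, px, py, pz) with total energy et and
// zero total three-momentum. rand_numbers must hold 4 * n values in (0, 1).
std::vector<std::array<double, 4>> rambo_massless(double et, int n, const double *rand_numbers) {
    assert(n >= 2);  // a single massless momentum has no rest frame
    const double twopi = 2.0 * 3.14159265358979323846;
    std::vector<std::array<double, 4>> q(n), p(n);
    std::array<double, 4> r = {0.0, 0.0, 0.0, 0.0};  // sum of all q

    // Step 1: isotropic directions, energies distributed as E * exp(-E).
    for (int i = 0; i < n; ++i) {
        const double c = 2.0 * rand_numbers[4 * i] - 1.0;
        const double s = std::sqrt(1.0 - c * c);
        const double f = twopi * rand_numbers[4 * i + 1];
        const double e = -std::log(rand_numbers[4 * i + 2] * rand_numbers[4 * i + 3]);
        q[i] = {e, e * s * std::cos(f), e * s * std::sin(f), e * c};
        for (int k = 0; k < 4; ++k) r[k] += q[i][k];
    }

    // Step 2: parameters of the boost to the rest frame of the sum, plus scaling.
    const double rmas = std::sqrt(r[0] * r[0] - r[1] * r[1] - r[2] * r[2] - r[3] * r[3]);
    const double b[3] = {-r[1] / rmas, -r[2] / rmas, -r[3] / rmas};
    const double g = r[0] / rmas;
    const double a = 1.0 / (1.0 + g);
    const double x = et / rmas;

    // Step 3: transform every q into the final momentum p.
    for (int i = 0; i < n; ++i) {
        const double bq = b[0] * q[i][1] + b[1] * q[i][2] + b[2] * q[i][3];
        p[i][0] = x * (g * q[i][0] + bq);
        for (int k = 1; k < 4; ++k) {
            p[i][k] = x * (q[i][k] + b[k - 1] * (q[i][0] + a * bq));
        }
    }
    return p;
}
```

For a two-particle final state this consumes eight random numbers per event; energy-momentum conservation and masslessness of each output momentum are easy properties to check in a C simulation test bench.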

#### `double CPPProcess::sigmaKin(double p16[16])`

- pow\_generic\_double\_s
- sin\_or\_cos\_double\_s
- FFV4\_3
- ixxxxx


## FPGA Resources

| Resource | Used        | Available | Utilization (%) |
|----------|-------------|----------|----------------|
| LUT      | 404715      | 1759631  | 23             |
| FF       | 549021      | 3660140  | 15             |
| DSP      | 5218        | 12424    | 42             |
| BRAM     | 22          | 3280     | 0.6            |

Frequency: 121.95 MHz
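The percentage column and the clock period implied by the reported frequency follow from simple arithmetic, sketched below. The helper names `utilization_percent` and `clock_period_ns` are hypothetical and only illustrate the calculation.

```cpp
#include <cassert>
#include <cmath>

// Percentage of a resource in use: 100 * used / available.
double utilization_percent(double used, double available) {
    assert(available > 0.0);
    return 100.0 * used / available;
}

// Clock period in nanoseconds for a frequency given in MHz: T = 1000 / f.
double clock_period_ns(double frequency_mhz) {
    assert(frequency_mhz > 0.0);
    return 1000.0 / frequency_mhz;
}
```

For example, `utilization_percent(5218, 12424)` gives about 42 for the DSP row, and `clock_period_ns(121.95)` gives about 8.2 ns per clock cycle.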

# Setup (CPU + FPGA)







#### 13TH GEN INTEL(R) CORE(TM) i7-13700H 2.40 GHz



#### **ALVEO U250**



## **Results CPU vs GPU vs FPGA**

#### Process: e- e+ > μ- μ+



#### Advantages and disadvantages of HLS for this application

#### Advantages:

- Increased efficiency and productivity
- Better for prototyping
- Reduced development time
- Enables developers to program the FPGA in high-level languages

#### Disadvantages:

- Additional translation time (the C/C++ code must first be converted to RTL)
- Resulting design is usually slower than hand-written RTL
- Less control over memory resources

## FUTURE IMPLEMENTATIONS

- Write the code in VHDL for the simple process
- Create the code for a complex process
- Create the new version for all LO processes
- Study the implementation of BSM processes and NLO
- Create a version that combines all three implementations (CPU + GPU + FPGA)


