



## Flexible Hough Transform FPGA Implementation for the ATLAS Event Filter

F.Alfonsi<sup>1</sup> on behalf of the ATLAS TDAQ Community

<sup>1</sup> Istituto Nazionale di Fisica Nucleare Bologna

fabrizio.alfonsi@bo.infn.it



BACKGROUND AND MOTIVATION

Several machine and detector upgrades are taking place in view of the High-Luminosity LHC (HL-LHC) running, which is scheduled to start in 2029.

The number of simultaneous collisions per bunch crossing (pile-up) is expected to increase from an average of 60 in Run 2 to 140–200, and the peak luminosity is planned to reach 5–7.5 × 10<sup>34</sup> cm<sup>-2</sup> s<sup>-1</sup> [1].

[1] ATLAS Collaboration, Technical Design Report for the Phase-II Upgrade of the ATLAS TDAQ System, ATLAS-TDR-029.
[2] ATLAS Collaboration, Technical Design Report for the Phase-II Upgrade of the ATLAS Trigger and Data Acquisition System – Event Filter Tracking Amendment, ATLAS-TDR-029-ADD-1.



This new physics environment requires upgrades of the experiments. ATLAS will upgrade its detector and, consequently, its Trigger and Data Acquisition (TDAQ) system, which must reach an output rate of 10 kHz after the Event Filter (EF), compared with the current 1.5 kHz, while operating in real time. To achieve this, a farm based on hardware accelerators could join the CPU-based Processor Farm to speed up tracking in the innermost detectors [2].



EVENT FILTER TRACKING

Event Filter Tracking is an ATLAS project that aims to build an architecture based on commodity hardware to join the EF, exploring and exploiting FPGAs, GPUs and high-performance CPUs. The project is studying the performance of the Hough Transform (HT) tracking algorithm for use at trigger level. To exploit it, hit positions are expressed in the ATLAS polar coordinates, i.e. by the radius $r$ and the azimuthal angle $\phi$.



Figure: generic example of the HT algorithm. Two points on the left are connected in the space on the right thanks to the change of coordinate system.
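The change of coordinates can be illustrated with a minimal Python sketch, assuming the linearized relation $\phi_0 = \phi + r \cdot qA/p_t$ given further on. Each hit $(r, \phi)$ becomes a straight line in the $(qA/p_t, \phi_0)$ plane, and the lines of two hits from the same track cross at that track's parameters. The function names, radii and track parameters are hypothetical values chosen only for illustration.

```python
# Minimal sketch of the Hough Transform change of coordinates, assuming the
# linearized relation phi0 = phi + r * qA/p_t (written qa_pt below).

def hit_line(r: float, phi: float):
    """HT line of one hit (r, phi): phi0 as a function of qA/p_t."""
    return lambda qa_pt: phi + r * qa_pt


def intersect(hit1, hit2):
    """Crossing point of the HT lines of two hits at different radii.

    phi1 + r1 * x = phi2 + r2 * x  =>  x = (phi2 - phi1) / (r1 - r2)
    """
    (r1, phi1), (r2, phi2) = hit1, hit2
    qa_pt = (phi2 - phi1) / (r1 - r2)
    return qa_pt, hit_line(r1, phi1)(qa_pt)


# Hypothetical track parameters, used only to generate two hits on the track
# (phi = phi0 - r * qA/p_t is the same relation solved for phi).
true_qa_pt, true_phi0 = 0.0005, 0.40
hits = [(r, true_phi0 - r * true_qa_pt) for r in (300.0, 600.0)]
print(intersect(*hits))  # recovers the track parameters up to rounding
```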

A flexible HT implementation on FPGA has been developed as a candidate for (raw) particle tracking in the Event Filter.

The FPGA boards used for the implementation tests are the Xilinx commercial demonstrator boards VC709 and VCU1525. Implementation studies on the Alveo U250 are ongoing.



## Flexible Hough Transform

Goal: a versatile HT that balances computing time against occupied resources through high parallelization and a high clock rate.

ALGORITHM APPROACH



A 2D histogram in ($qA/p_t$, $\phi_0$), called the "accumulator" (center of the image on the left) and made of several bins, is the core of the algorithm. The HT operations are:

- Accumulator filling, applying the original HT formula in parallel across all the accumulator bins for each incoming cluster.
- Extraction of candidate tracks from the accumulator, applying the HT formula again, in parallel, across all the clusters of the event.
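The two steps above can be sketched in Python. This is a minimal illustration, not the firmware: it assumes the form $\phi_0 = \phi + r \cdot qA/p_t$, a toy accumulator with hypothetical sizes and ranges, and a per-bin bit mask of hit layers as bookkeeping. The loop over $qA/p_t$ columns stands in for the parallel evaluation done in hardware, and all function and constant names are hypothetical.

```python
# Sketch of accumulator filling and road extraction, assuming:
#  - the form phi0 = phi + r * qA/p_t (one qA/p_t column at a time);
#  - a toy accumulator size and ranges (hypothetical values);
#  - a per-bin bit mask of hit layers (a bookkeeping choice, not the firmware's).

N_QPT, N_PHI0 = 16, 12            # hypothetical bin counts
QPT_MIN, QPT_MAX = -0.001, 0.001  # hypothetical qA/p_t range
PHI0_MIN, PHI0_MAX = 0.3, 0.5     # hypothetical phi0 range (rad)


def qpt_center(i):
    """Centre of qA/p_t column i."""
    return QPT_MIN + (i + 0.5) * (QPT_MAX - QPT_MIN) / N_QPT


def phi0_bin(phi0):
    """phi0 row index, or -1 if phi0 falls outside the accumulator."""
    if not PHI0_MIN <= phi0 < PHI0_MAX:
        return -1
    j = int((phi0 - PHI0_MIN) / (PHI0_MAX - PHI0_MIN) * N_PHI0)
    return min(j, N_PHI0 - 1)


def fill(clusters):
    """clusters: list of (layer, r, phi). Returns acc[i][j] = layer bit mask."""
    acc = [[0] * N_PHI0 for _ in range(N_QPT)]
    for layer, r, phi in clusters:
        for i in range(N_QPT):  # evaluated in parallel in hardware
            j = phi0_bin(phi + r * qpt_center(i))
            if j >= 0:
                acc[i][j] |= 1 << layer
    return acc


def find_roads(acc, threshold):
    """Bins crossed by lines from at least `threshold` distinct layers."""
    return [(i, j) for i in range(N_QPT) for j in range(N_PHI0)
            if bin(acc[i][j]).count("1") >= threshold]


def road_clusters(clusters, road):
    """Apply the HT formula again to keep the clusters whose line crosses `road`."""
    i, j = road
    return [c for c in clusters if phi0_bin(c[2] + c[1] * qpt_center(i)) == j]


# Usage: a hypothetical track through the centre of bin (5, 6), plus one noise hit.
qpt = qpt_center(5)
phi0 = PHI0_MIN + 6.5 * (PHI0_MAX - PHI0_MIN) / N_PHI0
event = [(k, 100.0 * (k + 1), phi0 - 100.0 * (k + 1) * qpt) for k in range(8)]
event.append((0, 100.0, 0.1))
roads = find_roads(fill(event), threshold=7)
print((5, 6) in roads, len(road_clusters(event, (5, 6))))  # True 8
```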

The developed architecture makes it possible to choose which form of the HT formula to use, depending on the best performance achieved or on the requirements to meet:

- $qA/p_t = (\phi_0 - \phi)/r$;
- $\phi_0 = \phi + r \cdot qA/p_t$;

where $q$ is the particle charge, $p_t$ its transverse momentum, $A$ a constant proportional to the magnetic field, and $\phi_0$ the azimuthal angle of the track. A candidate track (a "road") is selected where a minimum number of lines drawn in the accumulator overlap, so that the required number of layers is reached. As an example of this concept, shown in the left picture, the bin quintuplet 6-7-8-7-6 is searched for.
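The two forms of the formula are the same relation solved for different variables. In a binned implementation, the first form naturally fixes a $\phi_0$ row and computes $qA/p_t$, while the second fixes a $qA/p_t$ column and computes $\phi_0$. A minimal round-trip check in Python, with hypothetical function names and values (and $r \neq 0$):

```python
def qa_pt_from_phi0(phi0, r, phi):
    """Form 1: qA/p_t = (phi0 - phi) / r, for a fixed phi0 row (r must be non-zero)."""
    return (phi0 - phi) / r


def phi0_from_qa_pt(qa_pt, r, phi):
    """Form 2: phi0 = phi + r * qA/p_t, for a fixed qA/p_t column."""
    return phi + r * qa_pt


# Hypothetical hit and accumulator row: going through both forms returns phi0.
r, phi, phi0 = 450.0, 0.35, 0.42
x = qa_pt_from_phi0(phi0, r, phi)
print(x, phi0_from_qa_pt(x, r, phi))
```

Which form is cheaper in hardware depends on which accumulator axis is scanned in parallel and on the division by $r$ required by the first form.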

FIRMWARE DESIGN

To reach a high clock frequency, the firmware design can split the implemented FPGA components into several clock domains, all with the same period. The implemented blocks can then be treated as independent circuits in the FPGA, so they can be placed more freely and achieve better timing performance.





The design features a double storage structure for all the necessary information, including a second accumulator and a second store for the event clusters. This allows two events to be processed concurrently, so the firmware can be seen as two macro-blocks working separately. The left image shows a simulation in which the output stream of the firmware for one event is active while the input of the next event has already been acquired.
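The double storage structure behaves like a ping-pong buffer. The following simplified Python model assumes that each event takes exactly one step to write and one step to read out, which is not true of the real firmware; it only shows how two banks (hypothetically numbered 0 and 1) let event $k$ be acquired while event $k-1$ is streamed out.

```python
def ping_pong_schedule(n_events):
    """Simplified model of the double storage structure (two banks, 0 and 1).

    Returns one tuple per step: (step, write_bank, write_event, read_bank, read_event).
    Event k is written into bank k % 2 while the other bank streams out event k - 1,
    so a bank is only overwritten after its previous event has been read out.
    One event per step is an assumption; real input/output durations differ.
    """
    bank_content = [None, None]  # event id currently stored in each bank
    steps = []
    for k in range(n_events + 1):
        # Output side: read the event written at the previous step, if any.
        read_bank = (k - 1) % 2 if k > 0 else None
        read_event = bank_content[read_bank] if k > 0 else None
        # Input side: write the next event into the other bank.
        write_bank = k % 2 if k < n_events else None
        write_event = k if k < n_events else None
        if write_bank is not None:
            bank_content[write_bank] = write_event
        steps.append((k, write_bank, write_event, read_bank, read_event))
    return steps


for step in ping_pong_schedule(3):
    print(step)
# (0, 0, 0, None, None)
# (1, 1, 1, 0, 0)
# (2, 0, 2, 1, 1)
# (3, None, None, 0, 2)
```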

ALGORITHM AND FIRMWARE PERFORMANCE

| Resource | Available | Utilization % |
|----------|-----------|---------------|
| LUT      | 1728000   | 16.78         |
| LUTRAM   | 791040    | 1.49          |
| FF       | 3456000   | 18.78         |
| BRAM     | 2688      | 20.35         |
| DSP      | 12288     | 17.77         |
| IO       | 676       | 0.30          |
| BUFG     | 1344      | 1.64          |
| MMCM     | 16        | 31.25         |
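Since the utilization table gives rounded percentages, absolute counts can only be estimated by multiplying back. A minimal Python sketch, with hypothetical names, using the values from the table (for example, 31.25 % of 16 MMCMs is 5):

```python
# Available resources and utilization (%) as listed in the resource table.
AVAILABLE = {"LUT": 1728000, "LUTRAM": 791040, "FF": 3456000, "BRAM": 2688,
             "DSP": 12288, "IO": 676, "BUFG": 1344, "MMCM": 16}
UTILIZATION_PCT = {"LUT": 16.78, "LUTRAM": 1.49, "FF": 18.78, "BRAM": 20.35,
                   "DSP": 17.77, "IO": 0.30, "BUFG": 1.64, "MMCM": 31.25}


def approx_used(resource):
    """Approximate number of used elements, estimated from a rounded percentage."""
    return round(AVAILABLE[resource] * UTILIZATION_PCT[resource] / 100)


for name in AVAILABLE:
    print(f"{name:7s} ~{approx_used(name)}")
```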

The current studies focus on the performance in the barrel region of the ITk, applying the same HT configuration in terms of accumulator and other parameters. The results shown here are for an accumulator of **168 bins along qA/p<sub>t</sub> and 48 bins along φ<sub>0</sub>**, using 8 of the 13 available layers with a road-activation threshold of 7 layers.
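The configuration above fixes the accumulator size and how many layers a road may miss. A minimal Python sketch of that arithmetic follows; the storage estimate assumes, hypothetically, one bit per used layer in every bin, which is not necessarily how the firmware stores the accumulator.

```python
N_QPT_BINS, N_PHI0_BINS = 168, 48   # accumulator size quoted above
N_LAYERS, LAYER_THRESHOLD = 8, 7    # layers used and road-activation threshold

n_bins = N_QPT_BINS * N_PHI0_BINS
# Hypothetical storage model: one bit per used layer in every bin.
bits_per_accumulator = n_bins * N_LAYERS
# A road can still be activated if this many of the used layers have no hit.
missing_layers_tolerated = N_LAYERS - LAYER_THRESHOLD

print(n_bins, bits_per_accumulator, missing_layers_tolerated)  # 8064 64512 1
```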

The left table shows the implementation results on the Alveo U250 FPGA card, including the occupied resources. The image below it shows that the timing constraints are met for a **frequency of 400 MHz**. The estimated processing times for the accumulator building and the cluster extrapolation depend on the average number of input clusters in the most populated layer and on the average number of roads extracted per event:

- accumulator building: 3000 ns;
- cluster extrapolation: 2700–4500 ns.
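At 400 MHz one clock period is 2.5 ns, so the quoted processing times can be converted into clock cycles. A plain unit conversion in Python, with hypothetical names:

```python
CLOCK_MHZ = 400
PERIOD_NS = 1000 / CLOCK_MHZ  # 2.5 ns per clock cycle


def ns_to_cycles(t_ns):
    """Convert a processing time in ns to clock cycles at CLOCK_MHZ."""
    return t_ns / PERIOD_NS


print(ns_to_cycles(3000))                      # 1200.0
print(ns_to_cycles(2700), ns_to_cycles(4500))  # 1080.0 1800.0
```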

| Timing metric                | Value   |
|------------------------------|---------|
| Worst Negative Slack (WNS)   | 0.01 ns |
| Total Negative Slack (TNS)   | 0 ns    |
| Number of Failing Endpoints  | 0       |
| Total Number of Endpoints    | 1367922 |
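A positive WNS of 0.01 ns at a 2.5 ns period means the constraints are met with very little margin. A common rule of thumb estimates the highest achievable frequency as $1/(T - \mathrm{WNS})$; this is only a first-order estimate, since place and route at a different target would give a different result. A Python sketch with a hypothetical function name:

```python
def estimated_fmax_mhz(target_mhz, wns_ns):
    """First-order estimate of the highest clock frequency meeting timing.

    Shortest feasible period ~ target period - WNS (positive WNS = slack left).
    Ignores the fact that place and route at a new target would differ.
    """
    period_ns = 1000 / target_mhz
    return 1000 / (period_ns - wns_ns)


print(round(estimated_fmax_mhz(400, 0.01), 1))  # 401.6
```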

| Region (η) | Muon efficiency | Pion efficiency |
|------------|-----------------|-----------------|
| 0.1 - 0.3 | > 96.5 % | > 90 %             |
| 0.7 - 0.9 | > 97 %   | > 82 %             |

The table above summarizes the preliminary physics performance in two barrel regions of the ITk detector, $\eta \in [0.1, 0.3]$ and $\eta \in [0.7, 0.9]$, both for the same region $\phi \in [0.3, 0.5]$ rad. These results cover the whole momentum range studied, $q/p_t \in [-1.0571758563701927, 1.0571758563701927]$. Muon and pion tracking were studied. The efficiencies give the percentage of truth tracks found.
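The efficiencies above are fractions of truth tracks found. A minimal Python sketch computes such an efficiency together with the standard binomial uncertainty, which helps when comparing values such as > 90 % and > 82 %. The counts are hypothetical, not taken from the study, and the criterion for matching a truth track to a road is not defined here.

```python
import math


def efficiency_pct(n_found, n_truth):
    """Percentage of truth tracks found (matching criterion not specified here)."""
    return 100.0 * n_found / n_truth


def binomial_error_pct(n_found, n_truth):
    """Standard binomial uncertainty on the efficiency, in percentage points."""
    p = n_found / n_truth
    return 100.0 * math.sqrt(p * (1.0 - p) / n_truth)


# Hypothetical counts, not taken from the study.
print(efficiency_pct(965, 1000), round(binomial_error_pct(965, 1000), 2))  # 96.5 0.58
```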

## Conclusions and Future Steps

The ATLAS experiment is investigating commodity solutions for Event Filter Tracking, including the possible use of FPGA-based accelerators. An FPGA implementation of a flexible Hough Transform algorithm is proposed as a feasible and well-performing solution for fast tracking in HEP experiments. The firmware design has been defined and consolidated, and it can run at 400 MHz with compact resource utilization. Preliminary tracking performance studies are promising, showing > 95 % efficiency for reconstructing single muons with $p_t > 1$ GeV. The next steps in the coming months are to complete the barrel-region studies and to start studying the performance over the full detector.