A High Performance Multi-Core FPGA Implementation for 2D Pixel Clustering for the ATLAS Fast TracKer (FTK) Processor

3 Jun 2014, 11:40
20m
Administratiezaal (Beurs van Berlage)

Administratiezaal

Beurs van Berlage

Oral Data-processing: 3c) Embedded software III.c Embedded Software

Speaker

Calliope-louisa Sotiropoulou (Aristotle Univ. of Thessaloniki (GR))

Description

The high performance multi-core 2D pixel clustering FPGA implementation used for the input system of the ATLAS Fast TracKer (FTK) processor is presented. The input system for the FTK processor will receive data from the Pixel and micro-strip detectors read out drivers (RODs) at 760Gbps, the full rate of level 1 triggers. Clustering is required as a method to reduce the high rate of the received data before further processing, as well as to determine the cluster centroid for obtaining obtain the best spatial measurement. Our implementation targets the pixel detectors and uses a 2D-clustering algorithm that takes advantage of a moving window technique to minimize the logic required for cluster identification. The design is fully generic and the cluster detection window size can be adjusted for optimizing the cluster identification process. Τhe implementation can be parallelized by instantiating multiple cores to identify different clusters independently thus exploiting more FPGA resources. This flexibility makes the implementation suitable for a variety of demanding image processing applications. The implementation is robust against bit errors in the input data stream and drops all data that cannot be identified. In the unlikely event of missing control words, the implementation will ensure stable data processing by inserting the missing control words in the data stream. The 2D pixel clustering implementation is developed and tested in both single flow and parallel versions. The first parallel version with 16 parallel cluster identification engines is presented. The input data from the RODs are received through S-Links and a single data stream is also required by the processing units that follow the clustering implementation. Data parallelizing (demultiplexing) and serializing (multiplexing) modules are introduced in order to accommodate the parallelized version and restore the data stream to a single flow afterwards. The results of the first hardware tests of the single flow implementation on the custom FTK input mezzanine (IM) board are presented. We report on the integration of 16 parallel engines in the same FPGA, the resulting performances and the first parallel version hardware tests. The parallel 2D-clustering implementation has sufficient processing power to meet the specification for the Pixel layers of ATLAS, for up to 80 overlapping pp collisions that correspond to the maximum LHC luminosity planned until 2022.

Primary author

Calliope-louisa Sotiropoulou (Aristotle Univ. of Thessaloniki (GR))

Presentation materials