Speaker
Description
The DoWnstream Tracker (DWT) is a system of interconnected FPGAs that will reconstruct, in the upcoming LHC Run 4, track stubs from the LHCb tracking subdetector located downstream of the magnet (SciFi).
Based on the highly parallelisable "Artificial Retina" architecture, the DWT aims to accelerate the LHCb reconstruction in the GPU-based High Level Trigger 1 by injecting track "primitives" that speed up its combinatorial tasks.
Presented here is the work on the DWT interface with the current LHCb DAQ chain, from the design to the tests performed at the LHCb Coprocessor TestBed facility.
Summary (500 words)
With the luminosity increasing up to $1.5×10^{34}$ cm$^{-2}$s$^{-1}$ for LHC Run 5 (a factor 7.5 higher with respect to the current Run 3), the LHCb Collaboration is seeking new heterogeneous data-taking solutions, with the intent of maintaining its current ability to reconstruct events at the average LHC bunch crossing rate of 30 MHz.
Already for Run 4, a system of ~100 interconnected FPGAs, reconstructing tracks with the highly parallelisable "Artificial Retina" architecture, has been approved for construction: the DoWnstream Tracker (DWT). The purpose of this system is twofold: providing acceleration to the LHCb event reconstruction in Run 4, and serving as an R&D effort in view of the upcoming LHC Run 5. The $\mathcal{O}(n)$ time complexity of this architecture, in fact, makes it desirable for high-luminosity scenarios.
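To illustrate where the linear scaling comes from, the following is a minimal software sketch of a retina-style response computation: track-parameter space is discretised into cells, each incoming hit excites only a bounded neighbourhood of cells with a Gaussian weight, and track candidates are the local maxima of the accumulated response. The 1-D track model, the cell granularity and all numerical values are hypothetical; the real DWT implements this in FPGA firmware.

```c
/* Sketch of an "Artificial Retina" response computation (software model;
 * values and the 1-D parametrisation are illustrative only).
 * Compile with: cc retina.c -lm */
#include <math.h>
#include <stdio.h>

#define N_CELLS 64   /* cells discretising the track-parameter axis */
#define SIGMA   0.5  /* hypothetical receptor width */

static double cell_param[N_CELLS]; /* track parameter of each cell */
static double response[N_CELLS];   /* accumulated retina response */

int main(void)
{
    for (int c = 0; c < N_CELLS; ++c)
        cell_param[c] = (double)c;  /* trivial 1-D mapping */

    /* Hypothetical hits; in the DWT these would be SciFi clusters. */
    double hits[] = {10.2, 9.8, 10.1, 40.5, 40.4};
    int n_hits = sizeof hits / sizeof hits[0];

    /* Each hit excites only a bounded neighbourhood of cells, which is
     * why the total work grows linearly with the number of hits instead
     * of combinatorially with hit pairings. */
    for (int h = 0; h < n_hits; ++h) {
        int centre = (int)(hits[h] + 0.5);
        for (int c = centre - 2; c <= centre + 2; ++c) {
            if (c < 0 || c >= N_CELLS)
                continue;
            double d = hits[h] - cell_param[c];
            response[c] += exp(-d * d / (2.0 * SIGMA * SIGMA));
        }
    }

    /* Track candidates are local maxima above a threshold. */
    for (int c = 1; c < N_CELLS - 1; ++c)
        if (response[c] > response[c - 1] && response[c] > response[c + 1]
            && response[c] > 1.0)
            printf("track candidate at cell %d (response %.2f)\n",
                   c, response[c]);
    return 0;
}
```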
A copy of the clusters coming from the Scintillating Fibre (SciFi) tracking subdetector of LHCb will be sent to the DWT, which will then inject track-stub primitives into the first stage of the High Level Trigger (HLT1), implemented on GPUs.
The system is therefore an intermediate step between detector readout and event building: not only must the DWT operate at the bunch crossing rate of 30 MHz, but its latencies must be contained as well.
As the current LHCb DAQ system was not designed with this in mind, the DWT is hosted on a separate cluster of servers. Its interface with the LHCb DAQ comprises four steps: (1) reading out the detector as an additional consumer (alongside the existing consumer dedicated to event building) and dispatching the data via InfiniBand to the DWT servers; (2) receiving the SciFi clusters and loading the data onto the FPGAs; (3) extracting the track "primitives" from the FPGAs and sending them back via InfiniBand to the DAQ servers dedicated to event building (EBs); (4) receiving the primitives on the EBs and injecting them into the normal LHCb chain towards HLT1.
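As an illustration of the InfiniBand legs of this chain (steps 1 and 3), the snippet below posts one message, for instance a batch of primitives, on an already-connected queue pair using the standard libibverbs API. Connection setup, error handling and the actual DWT message format are omitted, and the function name and arguments are hypothetical; this is a sketch of the mechanism, not the production code.

```c
/* Post one buffer on a connected InfiniBand queue pair and wait for its
 * completion (libibverbs). All names here are illustrative. */
#include <infiniband/verbs.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

int send_primitives(struct ibv_pd *pd, struct ibv_qp *qp,
                    struct ibv_cq *cq, void *buf, size_t len)
{
    /* Register the buffer so the HCA may DMA from it; a real system
     * would register long-lived buffers once, not per message. */
    struct ibv_mr *mr = ibv_reg_mr(pd, buf, len, IBV_ACCESS_LOCAL_WRITE);
    if (!mr)
        return -1;

    struct ibv_sge sge = {
        .addr   = (uintptr_t)buf,
        .length = (uint32_t)len,
        .lkey   = mr->lkey,
    };
    struct ibv_send_wr wr, *bad_wr = NULL;
    memset(&wr, 0, sizeof wr);
    wr.opcode     = IBV_WR_SEND;
    wr.sg_list    = &sge;
    wr.num_sge    = 1;
    wr.send_flags = IBV_SEND_SIGNALED;  /* request a completion entry */

    if (ibv_post_send(qp, &wr, &bad_wr))
        goto fail;

    /* Busy-poll the completion queue; a production system would batch
     * and overlap many sends to sustain the required throughput. */
    struct ibv_wc wc;
    int n;
    while ((n = ibv_poll_cq(cq, 1, &wc)) == 0)
        ;
    if (n < 0 || wc.status != IBV_WC_SUCCESS)
        goto fail;

    ibv_dereg_mr(mr);
    return 0;
fail:
    ibv_dereg_mr(mr);
    return -1;
}
```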
In addition to this, the detector alignment needs to be dispatched to all FPGAs, on top of the monitoring and control routines needed to operate the system.
The development of this interface has already started at the LHCb Coprocessor TestBed facility, and each of the aforementioned steps has posed its own challenges. Step (1) required modifying the current readout-board driver so that two consumers can access the detector data from a circular buffer with safe concurrency. Step (2) called for the development and optimisation of DMA transfers to the FPGA, using descriptor handling internal to the custom FPGA firmware to lower the transfer latency and enhance data integrity at the maximum PCIe Gen4 throughput. Steps (1), (2) and (3) all needed careful optimisation of the NUMA bindings in order to achieve maximum throughput with multiple streams inside the same server without interference. Illustrative sketches of these three points are shown below.
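For step (1), the pattern implied by "two consumers with safe concurrency" can be sketched as a single-producer, multi-consumer ring buffer in which each consumer keeps its own read cursor and the producer may overwrite a slot only once the slowest consumer has passed it. This is a user-space model with C11 atomics and invented names; the real readout driver lives in the kernel and differs in detail.

```c
/* Single-producer / two-consumer ring buffer sketch: the readout board
 * driver produces, while the event-building consumer and the DWT
 * dispatch consumer read the same data independently. */
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define RING_SIZE   1024  /* must be a power of two */
#define N_CONSUMERS 2     /* event builder + DWT */

struct ring {
    uint64_t slot[RING_SIZE];           /* stand-in for readout fragments */
    _Atomic uint64_t head;              /* next slot the producer writes */
    _Atomic uint64_t tail[N_CONSUMERS]; /* next slot each consumer reads */
};

/* Producer: claim a slot only if the slowest consumer is done with it. */
bool ring_push(struct ring *r, uint64_t fragment)
{
    uint64_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    for (int c = 0; c < N_CONSUMERS; ++c) {
        uint64_t tail = atomic_load_explicit(&r->tail[c],
                                             memory_order_acquire);
        if (head - tail == RING_SIZE)
            return false;  /* would overrun the slowest consumer */
    }
    r->slot[head & (RING_SIZE - 1)] = fragment;
    atomic_store_explicit(&r->head, head + 1, memory_order_release);
    return true;
}

/* Consumer c: each consumer advances its own cursor, so the two readers
 * never contend with each other, only with the producer. */
bool ring_pop(struct ring *r, int c, uint64_t *fragment)
{
    uint64_t tail = atomic_load_explicit(&r->tail[c], memory_order_relaxed);
    uint64_t head = atomic_load_explicit(&r->head, memory_order_acquire);
    if (tail == head)
        return false;      /* nothing new for this consumer */
    *fragment = r->slot[tail & (RING_SIZE - 1)];
    atomic_store_explicit(&r->tail[c], tail + 1, memory_order_release);
    return true;
}
```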
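For step (2), descriptor handling internal to the FPGA firmware typically means the host only fills a descriptor (bus address, length, flags) in a ring shared with the firmware and rings a doorbell register, while the descriptor engine inside the FPGA fetches and executes transfers on its own. The register offsets, field layout and names below are invented purely for illustration.

```c
/* Hypothetical host-side view of an FPGA-managed DMA descriptor ring.
 * Layout, offsets and names are illustrative, not the DWT firmware's. */
#include <stdint.h>

struct dma_desc {
    uint64_t bus_addr; /* physical/IOVA address of the host buffer */
    uint32_t length;   /* transfer size in bytes */
    uint32_t flags;    /* e.g. interrupt-on-completion */
} __attribute__((packed));

#define DESC_RING_LEN   256
#define DOORBELL_OFFSET 0x40  /* hypothetical BAR0 register offset */

void post_dma(volatile struct dma_desc *ring, volatile uint32_t *bar0,
              uint32_t *head, uint64_t bus_addr, uint32_t length)
{
    uint32_t idx = *head % DESC_RING_LEN;
    ring[idx].bus_addr = bus_addr;
    ring[idx].length   = length;
    ring[idx].flags    = 0;
    __sync_synchronize();  /* descriptor visible before the doorbell */
    bar0[DOORBELL_OFFSET / 4] = ++(*head); /* firmware fetches from here */
}
```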
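For step (3)'s NUMA tuning, the basic pattern is to run each stream's thread on the NUMA node closest to its readout card or HCA and to allocate its buffers on that same node, so that concurrent streams in one server do not cross the inter-socket interconnect. A minimal sketch with libnuma (link with -lnuma) follows; in practice the node number per device would be read from sysfs, and the function name here is hypothetical.

```c
/* Bind the calling thread and its buffer to one NUMA node (libnuma). */
#include <numa.h>
#include <stdlib.h>

void *alloc_stream_buffer(int node, size_t size)
{
    if (numa_available() < 0)
        return malloc(size);  /* no NUMA support: plain fallback */

    numa_run_on_node(node);   /* restrict the thread to this node's CPUs */
    return numa_alloc_onnode(size, node); /* memory local to that node */
}
```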
In this talk, we present the status and test results of the described interface, together with the development of the DWT integration with the LHCb Run Control and monitoring system.