Speaker
Description
Data-driven readout architectures produce unsorted streams of data packets with variable latency. Reconstructing an event frame, defined as grouping packets from the same time window, requires a sorting operation whose complexity grows with the occupancy and with the distance between the packets' source and the sorting stage.
This contribution presents an on-chip bucket sorting module for high-throughput applications, processing up to 200 Gbps with <1% packet loss. Implemented for the LA-Picopix ASIC in a 28 nm technology with dual-port SRAMs, the module achieves a power consumption below 250 mW and an area footprint below 7 mm², operating at 320 MHz with fully triplicated control logic.
Summary (500 words)
Modern pixel detectors in high-energy physics and imaging systems require readout architectures capable of handling extreme particle rate densities on the order of GHz/cm², producing hundreds of Gbps of data packets, while adhering to stringent power and area constraints. To address these challenges, the CERN EP R&D WP5 on IC Technologies is developing solutions to increase the processing capabilities available directly on-chip.
Traditional frame-based readout systems struggle with the high frequency and small size of events in pixel detectors used for tracking applications. In contrast, data-driven readout architectures are more data-efficient and better suited to high hit rates, but they produce unsorted streams of pixel packets. This results in variable packet latencies, as the time from pixel hit to ASIC output depends on the hit rate, pixel location, buffer states, and arbitration mechanisms. Sorting packets by event tag at the output stage simplifies downstream processing, reduces off-chip bandwidth requirements, and groups data more compactly.
This contribution presents the packet sorting module developed for the LA-Picopix ASIC, a multi-purpose detector ASIC proposed for the LHCb Velopix2 upgrade, featuring ~25 ps time resolution and front-end clustering.
The chosen architecture employs simplified bucket sorting to maximize data throughput and minimize power consumption, continuously grouping packets by their time of generation. The module contains several accumulation buckets, each corresponding to a given value of the chosen sorting tag, such as the event tag. Buckets collect packets within a time window defined by the accumulation latency; once the window closes, the bucket is emptied and reassigned to collect a new event tag. Incoming packets are either written to the bucket collecting their tag or discarded if no suitable bucket is available.
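For illustration, the Python sketch below mimics this accumulation scheme; the bucket count, bucket capacity and eviction policy are placeholder values for the example, not the LA-Picopix parameters.

from collections import deque

class BucketSorter:
    """Behavioural sketch of the simplified bucket sorting scheme:
    packets are grouped by event tag into accumulation buckets and
    discarded when no suitable bucket is available."""

    def __init__(self, n_buckets=8, capacity=32):
        self.n_buckets = n_buckets   # number of accumulation buckets
        self.capacity = capacity     # packets per bucket
        self.buckets = {}            # event tag -> list of packets
        self.order = deque()         # tags in order of window opening
        self.dropped = 0             # packets lost (no bucket, or bucket full)

    def push(self, tag, packet):
        """Write the packet to the bucket collecting its tag, or drop it."""
        if tag not in self.buckets:
            if len(self.order) == self.n_buckets:
                self.dropped += 1    # no free bucket for this tag
                return
            self.buckets[tag] = []
            self.order.append(tag)
        bucket = self.buckets[tag]
        if len(bucket) < self.capacity:
            bucket.append(packet)
        else:
            self.dropped += 1        # bucket already full

    def close_oldest(self):
        """Close the oldest accumulation window: read out the bucket and
        free it so it can be reassigned to a new event tag."""
        if not self.order:
            return None, []
        tag = self.order.popleft()
        return tag, self.buckets.pop(tag)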
The design space exploration was performed using the ESL methodology integrated in the PixESL framework. A high-level model of the sorter module was integrated into the LA-Picopix model and simulated with physics Monte Carlo data. This approach estimates the system's latency and event-size distributions, which drive the module's sizing under a data-loss target below 1%.
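As a toy analogue of such a sizing study (not the PixESL flow itself), the sketch below reuses the BucketSorter above and replays synthetic traffic to estimate the packet-loss fraction for a given sizing; the traffic model and eviction cadence are invented for illustration only.

import random

def estimate_loss(n_buckets, capacity, n_events=10000, mean_packets=20):
    """Crude loss estimate for a given sorter sizing using synthetic
    traffic; the actual study replays physics Monte Carlo data through
    the LA-Picopix high-level model."""
    sorter = BucketSorter(n_buckets, capacity)
    total = 0
    for tag in range(n_events):
        n_packets = max(1, int(random.expovariate(1.0 / mean_packets)))
        total += n_packets
        for i in range(n_packets):
            sorter.push(tag, (tag, i))
        # Close the oldest window once all buckets are in use.
        if len(sorter.order) == sorter.n_buckets:
            sorter.close_oldest()
    return sorter.dropped / total

# Example sweep to pick a bucket count for a <1% loss target:
# for nb in (16, 32, 48, 64):
#     print(nb, estimate_loss(nb, capacity=128))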
This high-level study led to a 60-bucket design, where each bucket can store from 128 to 384 packets, depending on the mode of operation. The input bandwidth reaches up to 184 Gbps at 320 MHz, while the output bandwidth is limited to 100 Gbps by the ASIC's output channel.
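As a back-of-the-envelope check of these figures (the 64-bit packet width below is an assumed value for illustration, not the LA-Picopix packet format):

# Per-cycle input budget implied by the quoted numbers.
input_bw_gbps = 184
clock_mhz = 320
bits_per_cycle = input_bw_gbps * 1e9 / (clock_mhz * 1e6)   # 575 bits per cycle
packet_bits = 64                                            # assumed packet width
packets_per_cycle = bits_per_cycle / packet_bits            # ~9 packets per cycle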
The design is implemented in a 28 nm CMOS technology, where the buckets are based on dual-port foundry SRAMs. All control logic is protected by triple modular redundancy to prevent control corruption during operation. The data path, by contrast, is left unprotected because of the area and power overhead and the routability problems that triplication would introduce.
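The triplicated control logic relies on the standard 2-of-3 majority vote; a minimal software analogue of that voting operation, purely for illustration:

def tmr_vote(a, b, c):
    """Bitwise 2-of-3 majority vote used in triple modular redundancy:
    a single corrupted copy is outvoted by the other two."""
    return (a & b) | (a & c) | (b & c)

# One upset copy does not change the voted value.
assert tmr_vote(0b1010, 0b1010, 0b0110) == 0b1010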
The physical implementation achieves timing closure at 320 MHz in all corners at a nominal supply of 0.9 V, with a power consumption below 250 mW under maximum load. The layout measures approximately 7 × 1 mm² to fit the LA-Picopix periphery floorplan.
These results demonstrate the feasibility of on-chip packet sorting for high-throughput, data-driven applications such as the LHCb VELO detector upgrade.