Description
As detector technologies improve, increases in resolution, channel count and overall size create immense bandwidth challenges for the data acquisition system, long data-center compute times and growing data-storage costs. Much of the raw data contains no useful information and can be significantly reduced with veto and compression systems as well as online analysis.
We design integrated systems combining digitizers (ADC/TDC), encoders, communication interfaces and embedded machine learning to analyze and reduce data at the source, near or on the detectors. These systems aim to minimize latency and power consumption and maximize throughput while keeping accuracy as high as possible.
As the final system requires all these modules to work seamlessly together, we built a DAQ testbench to validate the data flow from the detector to the compute nodes. The testbench is built around an Arbitrary Waveform Generator that emulates the digital or analog signal from the detector. This setup measures the performance of the entire system and finds any chokepoints or unstable elements. The measured performance metrics include maximum throughput, total latency, average and maximum power, and the accuracy of the applied algorithms compared to the expected output.
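As a minimal sketch of how such a run could be scored offline, the snippet below computes the metrics named above (throughput, average/maximum latency, average/maximum power, accuracy against a reference output). The `Run` record and `score_run` function are illustrative placeholders, not the actual testbench software.

```python
# Illustrative sketch only: names and record layout are assumptions,
# not the real testbench code.
from dataclasses import dataclass

@dataclass
class Run:
    n_events: int          # events pushed through the DAQ chain
    duration_s: float      # wall-clock duration of the run
    latencies_us: list     # per-event end-to-end latency (microseconds)
    power_w: list          # sampled power draw (watts)
    outputs: list          # algorithm outputs read back from hardware
    expected: list         # reference outputs computed offline

def score_run(run: Run) -> dict:
    """Summarize one testbench run into the metrics listed in the abstract."""
    matches = sum(o == e for o, e in zip(run.outputs, run.expected))
    return {
        "throughput_evt_per_s": run.n_events / run.duration_s,
        "latency_avg_us": sum(run.latencies_us) / len(run.latencies_us),
        "latency_max_us": max(run.latencies_us),
        "power_avg_w": sum(run.power_w) / len(run.power_w),
        "power_max_w": max(run.power_w),
        "accuracy": matches / len(run.expected),
    }
```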
We are currently testing DAQ systems for two applications:
1) The CookieBox, an attosecond angular-streaking detector used for X-ray pulse-shape recovery, generating ~800 GB/s. This system requires microsecond latency to apply a veto on downstream detectors. The complete embedded system includes an ADC, an FIR filter, a peak-finder algorithm, an optimized quantizer, a neural network that measures the signal characteristics, and an Ethernet interface to the compute node. The neural network was improved over its previous implementation and currently operates at 0.14 µs latency with a theoretical maximum throughput of 6.67 million events per second on a Virtex VCU128 board. Final assembly of the full detector-to-compute-node system for testing is underway.
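To make the front end of this chain concrete, here is a minimal Python sketch of an FIR filter feeding a threshold peak finder, the two stages that precede the quantizer and neural network. The tap values and threshold are toy placeholders, not the coefficients deployed on the VCU128.

```python
# Toy model of the FIR + peak-finder front end; taps and threshold
# are illustrative assumptions.

def fir(samples, taps):
    """Direct-form FIR: y[n] = sum_k taps[k] * x[n-k]."""
    out = []
    for n in range(len(samples)):
        acc = 0.0
        for k, t in enumerate(taps):
            if n - k >= 0:
                acc += t * samples[n - k]
        out.append(acc)
    return out

def find_peaks(samples, threshold):
    """Indices of local maxima above threshold: candidate events
    handed to the downstream quantizer / neural-network stages."""
    return [i for i in range(1, len(samples) - 1)
            if samples[i] > threshold
            and samples[i] >= samples[i - 1]
            and samples[i] > samples[i + 1]]
```

On hardware both stages would be pipelined fixed-point logic; the Python form only shows the data dependencies between stages.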
2) The billion-pixel X-ray camera for use in synchrotrons, XFEL facilities and pulsed-power facilities, generating up to 15 TB/s. The goal is to compress the camera image in place with no loss of information. To achieve a high compression ratio with very low latency, we train neural networks to emulate the ISTA algorithm, which accelerates the processing of each patch and uses operations directly compatible with hardware. This encoding is followed by DEFLATE compression. The network compresses each 6×6-pixel patch in 1.01 µs with an 87:1 ratio when implemented on a ZYNQ ZCU104 running at 100 MHz. When optimized with a latency strategy, the network achieves 101:1 compression in 0.89 µs. Work is underway to process larger patches using decomposed matrices.
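For readers unfamiliar with the algorithm being emulated, below is a minimal sketch of plain ISTA (gradient step on the reconstruction error followed by soft-thresholding) producing a sparse code for a patch, with the codes then quantized and DEFLATE-compressed via `zlib`. The dictionary `A`, step size, λ and quantization scale are toy assumptions; a trained network replaces the shared matrices of these iterations with learned per-layer weights.

```python
# Toy ISTA sparse coding + DEFLATE; all parameters are illustrative.
import zlib

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def transpose(M):
    return [list(col) for col in zip(*M)]

def soft_threshold(v, t):
    """Shrinkage nonlinearity: the only nonlinear op in each iteration."""
    return [(abs(x) - t) * (1 if x > 0 else -1) if abs(x) > t else 0.0
            for x in v]

def ista(y, A, lam=0.1, step=0.5, n_iter=20):
    """Fixed-iteration ISTA for a sparse code x with A x ≈ y (patch)."""
    At = transpose(A)
    x = [0.0] * len(A[0])
    for _ in range(n_iter):
        r = [a - b for a, b in zip(matvec(A, x), y)]   # residual A x - y
        g = matvec(At, r)                              # gradient of 0.5||Ax-y||^2
        x = soft_threshold([xi - step * gi for xi, gi in zip(x, g)],
                           step * lam)
    return x

def deflate_codes(codes, scale=100):
    """Quantize the sparse codes and DEFLATE them (zlib)."""
    q = bytes(int(round(c * scale)) & 0xFF for c in codes)
    return zlib.compress(q, 9)
```

The fixed iteration count and matrix-vector structure are what make the unrolled form map directly onto hardware multipliers and a pipelined datapath.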
The DAQ testbench will let us both qualify best-case performance and understand the throughput limitations and power consumption of these complex systems, and hopefully increase buy-in from potential users currently limited by data rates.