# High Rate Camera Systems

## Our Approach for Control and Data

## **Control Path**


matthew.hart@stfc.ac.uk technologysi.stfc.ac.uk

## Applications

We have designed and built the LOKI board as a local control system for detector systems. Designed to be cheap, small and low power, it can be located in the detector head. An on-board MPSoC running Linux hosts an instance of ODIN-Control, which takes care of the basics of sensor configuration and operation, such as clocks, SPI, I2C, resets and triggers. Accessed via a web-based GUI, it is quick to get up and running.

The MPSoC is a Trenz Zynq UltraScale+ module. This has programmable FPGA space for more complex sequencing and synchronous controls.



Top: The C100 CMOS sensor. Bottom: Samtec Firefly next to retimer ICs.





Data from the ASIC/sensor: 4-14 Gbps Aurora over optical links to the Alpha Data ADM-PCIE-9V5 (e.g. Hexitec-MHz has 20 lanes at ~4 Gbps; the Baby-D test structure has 2 lanes at ~14 Gbps).

STFC has now designed several ASICs and CMOS sensors with high-speed serializers for data output. Hexitec-MHz is an 80x80 pixel, 1 million fps system for X-ray spectroscopy. C100 is a 2k x 2k CMOS sensor running at 2k fps for electron microscopy. Baby-D is a test structure with a frame rate of 534 kHz and is our first major step towards a new X-ray sensor with high frame rate and huge dynamic range.
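As a rough illustration of what these figures imply for the links, the sketch below works out the pixel rate and the payload budget per pixel for Hexitec-MHz from the numbers quoted on this poster. The function names are illustrative only, the ~4 Gbps lane rate is approximate, and the only overhead considered is the 64B/66B encoding itself (framing and clock compensation are ignored).

```python
"""Back-of-envelope link budget for a serial sensor readout (illustrative)."""

# 64B/66B encoding carries 64 payload bits in every 66 line bits.
AURORA_64B66B_EFFICIENCY = 64 / 66


def pixel_rate(cols: int, rows: int, frames_per_second: float) -> float:
    """Pixels per second produced by the sensor."""
    return cols * rows * frames_per_second


def payload_rate_bps(lanes: int, line_rate_bps: float) -> float:
    """Aggregate payload rate across all lanes after 64B/66B overhead."""
    return lanes * line_rate_bps * AURORA_64B66B_EFFICIENCY


def bits_per_pixel_budget(cols: int, rows: int, frames_per_second: float,
                          lanes: int, line_rate_bps: float) -> float:
    """Upper bound on payload bits available for each pixel value."""
    return (payload_rate_bps(lanes, line_rate_bps)
            / pixel_rate(cols, rows, frames_per_second))


if __name__ == "__main__":
    # Hexitec-MHz: 80x80 pixels, 1 Mfps, 20 lanes at roughly 4 Gbps.
    px = pixel_rate(80, 80, 1e6)
    budget = bits_per_pixel_budget(80, 80, 1e6, lanes=20, line_rate_bps=4e9)
    print(f"{px / 1e9:.1f} Gpixel/s, up to {budget:.1f} payload bits per pixel")
```

The same functions can be reused for other sensors by substituting their pixel counts, frame rates and lane counts.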



Zynq UltraScale+ Module

LOKI Control Board in use with the Baby-D test structure

### Data Path

Data output from the front-end ICs is transported optically to a rack room or desktop system, where the links are combined into images, processed and saved.

This could be 100 metres or more from the experiment, where power and space are more readily available.

100G links are used to transfer data to the next level, with scope for this to be distributed by a 100G switch to multiple nodes.

#### 1<sup>st</sup> Level FPGA – Frame Building

An off-the-shelf board with large amounts of optical IO is used as a receiver of the serial Aurora 64B/66B data streams from the ASICs.

The purpose of this board is to combine the relatively modest 5-15 Gbps streams into full images and then send them back out on 100G Ethernet to be stored or further processed.

The board we've chosen to use here is an Alpha Data ADM-PCIE-9V5 with a Virtex UltraScale+ device. Originally designed for low-latency trading, this board has 4 QSFP-DD cages, giving capacity for up to 32 input lanes, or fewer when some of that capacity is used for data out on Ethernet.
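The lane trade-off described above can be sketched as simple arithmetic. This is an illustration under stated assumptions, not the actual firmware allocation: it assumes 8 lanes per QSFP-DD cage and that each 100G Ethernet output occupies 4 lanes (as in a 4 x 25G link); the function name is hypothetical.

```python
QSFP_DD_LANES = 8        # electrical lanes per QSFP-DD cage
LANES_PER_100G_LINK = 4  # assumption: 100G Ethernet built from 4 x 25G lanes


def input_lanes_available(cages: int = 4, ethernet_links_out: int = 0) -> int:
    """Lanes left for sensor input after reserving lanes for Ethernet output."""
    total = cages * QSFP_DD_LANES
    used = ethernet_links_out * LANES_PER_100G_LINK
    if used > total:
        raise ValueError("more output lanes requested than the board provides")
    return total - used


if __name__ == "__main__":
    for links in range(3):
        lanes = input_lanes_available(ethernet_links_out=links)
        print(f"{links} x 100G out -> {lanes} input lanes")
```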

#### 2<sup>nd</sup> Level FPGA – Frame Processing

Another off-the-shelf board can optionally be used to apply some processing or feature extraction to the 100G image stream.

Where the science allows, the purpose of this board is to reduce the data from raw frames to useful information. In spectroscopy this can be done by finding photons and adding their energy to a spectral histogram. In electron microscopy we'd look to find the sub-pixel centre of mass and combine many events into a single higher-resolution image.

Left: Classifications of shared pixel events to be identified (HOZ, VERT, DIAG1, DIAG2). Right: Real-time generated spectrum from a pixel in Hexitec-MHz following event finding and energy summing across pixels.
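To make the event finding, energy summing and centre-of-mass ideas concrete, here is a minimal software sketch. It is not the FPGA implementation: it assumes a frame held as a list of rows, a single non-negative threshold, and that pixels touching horizontally, vertically or diagonally belong to the same event. The names `find_events` and `accumulate_spectrum` are hypothetical.

```python
from collections import deque


def find_events(frame, threshold):
    """Group touching above-threshold pixels (8-connected) into events.

    Returns a list of (summed_energy, row_centroid, col_centroid) tuples.
    Assumes threshold >= 0 so every event has a positive summed energy.
    """
    rows, cols = len(frame), len(frame[0])
    seen = [[False] * cols for _ in range(rows)]
    events = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or frame[r][c] <= threshold:
                continue
            # Flood fill outwards from this seed pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            total = row_moment = col_moment = 0.0
            while queue:
                y, x = queue.popleft()
                energy = frame[y][x]
                total += energy
                row_moment += energy * y
                col_moment += energy * x
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and not seen[ny][nx]
                                and frame[ny][nx] > threshold):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            # Energy-weighted centre of mass gives a sub-pixel position.
            events.append((total, row_moment / total, col_moment / total))
    return events


def accumulate_spectrum(histogram, events, bin_width):
    """Add each event's summed energy to a spectral histogram in place."""
    for energy, _, _ in events:
        index = int(energy // bin_width)
        if 0 <= index < len(histogram):
            histogram[index] += 1
    return histogram


if __name__ == "__main__":
    frame = [
        [0, 0, 0, 0],
        [0, 30, 20, 0],   # one photon shared horizontally across two pixels
        [0, 0, 0, 0],
        [0, 0, 0, 60],    # one photon contained in a single pixel
    ]
    events = find_events(frame, threshold=5)
    print(events)
    print(accumulate_spectrum([0] * 8, events, bin_width=10))
```

Summing the shared event recovers its full energy for the spectrum, while the centroid places it between the two pixels that recorded it.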



The board we've chosen for the first applications here is the Xilinx Alveo U55C, selected for its modest price/performance while retaining 2x QSFP cages to allow data in and out in the Ethernet domain.

#### Data Storage

Data is received into a PC node on a 100G NIC (not pictured). Using DPDK (Data Plane Development Kit), we have developed an approach that maximizes throughput from network to disk to get sustained throughput from the 100G links. Sustained rates of over 90 Gbps are possible with this implementation (left).

> Write performance of Highpoint SSD7540 PCIe card: randwrite | 256k batch size | 8x Samsung 980 Pro NVMe





Data here can be accessed over PCIe or sent back out on Ethernet to a NIC.

Alveo U55C

#### PC Architecture

The overall data path concept is scalable in building blocks of 100G links.

A dual-CPU system works well to handle a system with 2x 100G links, which satisfies the needs of the Hexitec-MHz system.

We have a few test beds built as desktop machines as represented here. For the next larger scale system we will use a 4U chassis and motherboard designed for GPU processing. This has capacity for many PCIe devices to scale up the channel count for bigger pixel counts or faster sensors.



Data is saved to HDF5 files. For best performance we have been using NVMe RAID arrays. Tests with a Highpoint carrier containing 8x Samsung 980 units running continuously showed sustained rates of 16 GB/s (right).

In the development phase of these new projects the aim is to be able to save a few minutes of raw data, enough to characterise the sensors and do some meaningful experiments. We are now testing with 16 TB arrays of 8x 2 TB NVMe drives.
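As a rough check of the "few minutes" target, the sketch below converts the quoted link rate and array size into a recording time and compares the link rate with the measured write rate. It is a back-of-envelope estimate only: it assumes decimal units (1 TB = 10^12 bytes), that the full 16 TB is usable, and it ignores file format and file system overhead. The function names are illustrative.

```python
def record_seconds(array_bytes: float, data_rate_bps: float) -> float:
    """Seconds of continuous recording before the array is full."""
    return array_bytes / (data_rate_bps / 8)


def can_sustain(data_rate_bps: float, write_rate_bytes_per_s: float) -> bool:
    """True if the storage write rate keeps up with the incoming data rate."""
    return data_rate_bps / 8 <= write_rate_bytes_per_s


if __name__ == "__main__":
    rate_bps = 90e9        # sustained rate quoted for one 100G link
    array_bytes = 16e12    # 8 x 2 TB NVMe drives
    write_rate = 16e9      # measured sustained write rate in bytes/s
    minutes = record_seconds(array_bytes, rate_bps) / 60
    print(f"{rate_bps / 8 / 1e9:.2f} GB/s in, about {minutes:.1f} minutes to fill")
    print("storage keeps up:", can_sustain(rate_bps, write_rate))
```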

## **Lessons Learned and Tips**

Though the principle looks straightforward, in practice DAQ can be really tough. Here are some things we've learnt to make it easier.

#### Serializer Receivers and Channel Bonding

- Keep in mind that some hardware is designed for rates of 20G and above. With ASICs running below 10G, you sometimes need to turn off default clock recovery circuits that are designed for higher-rate signals.
- The Aurora channel bonding process can time out if you have not modified it to suit your needs and you are slow to go through the states.
- If available, a high-speed scope is a really helpful bit of kit for debugging early data issues before going to the FPGA.

#### Signal Integrity – Interconnect, Retimers and Power

- At >10G a normal bond wire can start to impede the signal. A ribbon bond is ideal for higher rates, but it's also possible to use double bonds to lower the impedance. Keep in mind you might need bigger pads.
- We have had success with and without retimer ICs. If you do need them, it can be hard to find units that run slowly enough. We have used the Texas Instruments DS110DF410, and the DS280DF810 for faster systems.
- Serializer power supplies need to be stable. Keep them separate on both VDD and GND, as you might for an analog section.
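The bond wire point in the list above can be illustrated with a simple inductive reactance estimate. The numbers here are assumptions, not measurements from our boards: a common rule of thumb of roughly 1 nH per mm of bond wire, a 1 mm wire, and the NRZ Nyquist frequency (half the bit rate) as the frequency of interest. Mutual inductance between the two wires of a double bond is left as a parameter and defaults to zero, which is optimistic.

```python
import math


def inductive_reactance_ohms(inductance_h: float, frequency_hz: float) -> float:
    """Reactance X = 2 * pi * f * L of an ideal inductor."""
    return 2 * math.pi * frequency_hz * inductance_h


def parallel_pair_inductance(inductance_h: float, mutual_h: float = 0.0) -> float:
    """Inductance of two identical wires in parallel: (L + M) / 2."""
    return (inductance_h + mutual_h) / 2


if __name__ == "__main__":
    bit_rate = 14e9              # roughly the Baby-D lane rate
    nyquist = bit_rate / 2       # fundamental of a 1010... NRZ pattern
    single = 1e-9                # assumed: ~1 nH for a 1 mm bond wire
    double = parallel_pair_inductance(single)
    print(f"single bond: {inductive_reactance_ohms(single, nyquist):.1f} ohm")
    print(f"double bond: {inductive_reactance_ohms(double, nyquist):.1f} ohm")
```

With these assumed values a single bond is already comparable to a 50 ohm line impedance, which is consistent with normal bond wires starting to matter above 10G.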



Texas Instruments DS280DF810 retimer IC sitting between the Baby-D ASIC and the Samtec Firefly transceiver

Authors: Matthew Hart (UKRI STFC RAL), Adam Davis (UKRI STFC RAL), Ben Cline (UKRI STFC RAL), Dominic Banks (UKRI STFC RAL), Ivan Church (UKRI STFC RAL), Joseph Nobes (UKRI STFC RAL), Matt Roberts (UKRI STFC RAL), Matthew Veale (UKRI STFC RAL), Nicola Guerrini (UKRI STFC RAL), Thomas Gardiner (UKRI STFC RAL), Tim Nicholls (UKRI STFC RAL), William Ian Helsby (UKRI STFC Daresbury Lab), Craig Macwaters (UKRI STFC RAL)



Science and Technology Facilities Council