
15–18 Oct 2024
Purdue University
America/Indiana/Indianapolis timezone

An open platform for in-situ high-speed computer vision with hls4ml

16 Oct 2024, 15:20
5m
Steward Center 306 (Third floor) (Purdue University)

128 Memorial Mall Dr, West Lafayette, IN 47907
Lightning 5 min talk + poster (Lightning talks)

Speaker

Ryan Forelli (Northwestern University)

Description

Low latency machine learning inference is vital for many high-speed imaging applications across various scientific domains. From analyzing fusion plasma [1] to rapid cell-sorting [2], there is a need for fast in-situ inference in experiments operating in the kHz to MHz range. External PCIe accelerators are often unsuitable for these experiments due to the associated data transfer overhead, high inference latencies, and increased system complexity. We have therefore developed a framework that streamlines deploying standard streaming hls4ml neural networks and integrating them into the existing data readout paths and hardware of these applications [3]. This enables a wide range of high-speed intelligent imaging applications on off-the-shelf hardware.

Typically, dedicated PCIe machine vision devices, so-called frame grabbers, are paired with high-speed cameras to handle high throughputs, and a protocol such as CoaXPress is used to transmit the raw camera data between the systems over fiber or copper. Many frame grabbers implement this protocol, as well as additional pixel preprocessing stages, on FPGAs due to their flexibility and relatively low cost compared to ASICs. Some manufacturers, such as Euresys, have enabled easy access to their frame grabber FPGAs’ firmware reference design. This reference design, aptly named CustomLogic [4], allows the user to implement custom image processing functions on the available portion of the FPGA. Moreover, open-source co-design workflows like hls4ml enable easy translation and deployment of neural networks to FPGA devices, and have demonstrated latencies on the order of nanoseconds to microseconds [5]. Successful applications using a variety of FPGA accelerators have been demonstrated in many domains, including particle physics and materials science. We provide the necessary wrappers, support files, and instructions to integrate an hls4ml model onto a frame grabber device with a few lines of code.

We will present two comprehensive tutorials, developed in collaboration with Euresys, demonstrating the full quantization-aware training-to-deployment and benchmarking process, in addition to hls4ml’s advanced feature set. We will also discuss and explore existing and potential applications. This work ultimately provides a convenient framework for performing in-situ inference on frame grabbers for high-speed imaging applications.

References
[1] Wei, Y., Forelli, R. F., Hansen, C., Levesque, J. P., Tran, N., Agar, J. C., Di Guglielmo, G., Mauel, M. E., Navratil, G. A., “Low latency optical-based mode tracking with machine learning deployed on FPGAs on a tokamak,” Review of Scientific Instruments 95(7), 073509 (2024), https://doi.org/10.1063/5.0190354.
[2] Nitta, N., Sugimura, T., Isozaki, A., Mikami, H., Hiraki, K., Sakuma, S., et al., “Intelligent Image-Activated Cell Sorting,” Cell 175(1), 266-276.e13 (2018), https://doi.org/10.1016/j.cell.2018.08.028.
[3] hls4ml-frame-grabbers, GitHub repository, https://github.com/fastmachinelearning/hls4ml-frame-grabbers.
[4] Euresys, “CustomLogic,” https://www.euresys.com/en/CustomLogic, Euresys S.A., Seraing, Belgium (2021).
[5] Duarte, J., Han, S., Harris, P., Jindariani, S., Kreinar, E., Kreis, B., Ngadiuba, J., Pierini, M., Rivera, R., Tran, N., Wu, Z., “Fast inference of deep neural networks in FPGAs for particle physics,” J. Instrum. 13, P07027 (2018), https://doi.org/10.1088/1748-0221/13/07/P07027.

Primary author

Ryan Forelli (Northwestern University)
