Description
The rapid development of very intense X-ray sources and faster detectors makes it possible to perform entirely new experiments, such as studying very fast processes in biology and materials science. However, it also leads to growing volumes of data that must be stored and processed; for example, a 4-megapixel detector recording 100,000 frames per second produces around 800 gigabytes of data per second. To give experimenters fast feedback and to reduce storage demands, it is necessary to develop a data processing pipeline that converts raw data into meaningful images on the fly and performs appropriate data reduction.
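The data rate quoted above follows from a simple product. The sketch below assumes that each pixel value is stored as 2 bytes (16 bits); the bit depth is not stated in the text.

```latex
\[
4\times10^{6}\,\tfrac{\text{pixels}}{\text{frame}}
\times 10^{5}\,\tfrac{\text{frames}}{\text{s}}
\times 2\,\tfrac{\text{bytes}}{\text{pixel}}
= 8\times10^{11}\,\tfrac{\text{bytes}}{\text{s}}
\approx 800\,\text{GB/s}
\]
```

A different bit depth scales the result proportionally.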
Firstly, we are working on speeding up data processing with hardware acceleration. Vendors now offer FPGA accelerator cards for data centre servers with built-in 100 Gigabit Ethernet links and high-bandwidth memory; these cards could receive and buffer detector data directly before processing it. Vendors also provide software tools for programming these FPGAs in more conventional languages (e.g. C++ with OpenCL) rather than hardware description languages. The processing will consist of image correction (which is largely fixed for a given detector), followed by a first stage of relatively generic data reduction, such as standard lossless data compression.
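The two processing stages can be illustrated with a minimal host-side C++ sketch. It is not an FPGA kernel and not the project's actual code. It assumes a common correction model for integrating detectors: subtract a per-pixel dark offset ("pedestal") and multiply by a per-pixel gain. The `Calibration` struct and the functions `correct_frame`, `rle_encode` and `rle_decode` are hypothetical names. Run-length encoding appears here only as the simplest example of a lossless method; the text does not say which compression scheme is used.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical per-pixel calibration constants (assumed correction model).
struct Calibration {
    std::vector<float> pedestal;  // dark offset in ADU, e.g. from dark frames
    std::vector<float> gain;      // conversion factor, e.g. photons per ADU
};

// Image correction: subtract the pedestal, then scale by the gain, per pixel.
std::vector<float> correct_frame(const std::vector<std::uint16_t>& raw,
                                 const Calibration& cal) {
    if (raw.size() != cal.pedestal.size() || raw.size() != cal.gain.size())
        throw std::invalid_argument("calibration size does not match frame");
    std::vector<float> out(raw.size());
    for (std::size_t i = 0; i < raw.size(); ++i)
        out[i] = (static_cast<float>(raw[i]) - cal.pedestal[i]) * cal.gain[i];
    return out;
}

// Lossless example: store each run of identical values as (value, length).
using Runs = std::vector<std::pair<std::uint16_t, std::uint32_t>>;

Runs rle_encode(const std::vector<std::uint16_t>& data) {
    Runs runs;
    for (std::uint16_t v : data) {
        if (!runs.empty() && runs.back().first == v)
            ++runs.back().second;   // extend the current run
        else
            runs.emplace_back(v, 1u);  // start a new run
    }
    return runs;
}

// Exact inverse of rle_encode: expand every run back into pixel values.
std::vector<std::uint16_t> rle_decode(const Runs& runs) {
    std::vector<std::uint16_t> data;
    for (const auto& [value, length] : runs)
        data.insert(data.end(), length, value);
    return data;
}
```

Run-length encoding only pays off when frames contain long stretches of identical values, such as empty detector regions. General-purpose lossless compressors exploit a wider range of redundancy, so this example shows the round-trip property (decode(encode(x)) == x) rather than a realistic compression ratio.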
Secondly, we are developing new methods for data reduction, particularly using machine learning. For example, in free-electron laser (FEL) experiments such as serial crystallography or serial particle imaging, a large fraction of images are unusable because the beam misses the sample, and rejecting these images before they are written to disk would greatly reduce the data volume [1]. With machine learning, the distinction between useful and unusable images can be learned from simulated data or previous experiments rather than explicitly programmed, which could allow data to be rejected during an experiment without expert intervention. For this task, we have tested a range of “conventional” machine learning methods based on feature detection algorithms from computer vision, as well as deep learning methods. These methods show good success rates on existing datasets, and work is continuing to ensure that they generalize to new, unfamiliar datasets.
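The general idea of learning a keep/reject decision from labelled examples can be shown with a deliberately small C++ sketch. The feature and model below are assumptions and are not the methods described above. The single hand-crafted feature is the fraction of pixels above a threshold, a crude stand-in for detecting diffraction peaks. The classifier is a one-feature logistic regression trained by batch gradient descent. The names `bright_pixel_fraction`, `HitClassifier` and `keep`, the threshold and the training values are all hypothetical.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical feature: fraction of pixels brighter than `threshold`.
double bright_pixel_fraction(const std::vector<std::uint16_t>& frame,
                             std::uint16_t threshold) {
    std::size_t n = 0;
    for (std::uint16_t v : frame)
        if (v > threshold) ++n;
    return frame.empty() ? 0.0 : static_cast<double>(n) / frame.size();
}

// One-feature logistic regression: p(hit | x) = 1 / (1 + exp(-(w*x + b))).
struct HitClassifier {
    double w = 0.0;
    double b = 0.0;

    double probability(double x) const {
        return 1.0 / (1.0 + std::exp(-(w * x + b)));
    }

    // Batch gradient descent on the mean log-loss; labels: 1 = hit, 0 = miss.
    void train(const std::vector<double>& xs, const std::vector<int>& labels,
               double learning_rate, int epochs) {
        for (int e = 0; e < epochs; ++e) {
            double grad_w = 0.0, grad_b = 0.0;
            for (std::size_t i = 0; i < xs.size(); ++i) {
                double err = probability(xs[i]) - labels[i];
                grad_w += err * xs[i];
                grad_b += err;
            }
            w -= learning_rate * grad_w / xs.size();
            b -= learning_rate * grad_b / xs.size();
        }
    }

    // Keep the frame (write it to disk) only if it is predicted to be a hit.
    bool keep(double feature) const { return probability(feature) >= 0.5; }
};
```

In practice the labelled training features would come from simulated data or previous experiments, as described above. Realistic feature-based or deep learning classifiers use far richer inputs than a single number per frame.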
[1] M. Wiedorn et al., Nature Communications 9(1) (2018), 1–11.
We acknowledge funding from the Helmholtz IVF project InternLabs-0011 (HIREX) and the Helmholtz Innovationspool project Data-X.