Nov 4 – 8, 2019
Adelaide Convention Centre
Australia/Adelaide timezone

Feasibility tests of RoCE for the cluster-based event building in LHCb

Nov 7, 2019, 3:30 PM
1h
Hall F (Adelaide Convention Centre)

Poster Track 1 – Online and Real-time Computing Posters

Speaker

Rafal Dominik Krawczyk (CERN)

Description

This paper evaluates the use of RDMA over Converged Ethernet (RoCE) for the Run 3 LHCb event building at CERN. The detector's acquisition system will collect partial event data from approximately 1000 separate detector streams, with a total estimated throughput of 40 terabits per second. Full events will be assembled for subsequent processing and data selection in the filtering farm of the online trigger. As a result, high-throughput inter-node transfers over a combination of 100 and 25 Gigabit-per-second links will be an essential feature of the system. The cluster's data exchange mechanism must therefore rely on transmission protocols that place only a light load on host memory.
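The figures above can be turned into a rough sizing estimate. The sketch below is a back-of-envelope calculation only: it assumes the 40 Tb/s aggregate is spread evenly over the ~1000 streams and that links run at 100% utilization with no protocol or framing overhead. The function names are hypothetical and do not come from the LHCb software.

```python
import math

# Back-of-envelope sizing for the event-building network.
# Assumptions (not from the abstract): load is spread evenly across
# streams, and links are fully utilized with no protocol overhead.

TOTAL_THROUGHPUT_GBPS = 40_000  # 40 Tb/s aggregate, expressed in Gb/s
NUM_STREAMS = 1000              # approximate number of detector streams


def per_stream_rate_gbps(total_gbps: float, streams: int) -> float:
    """Average data rate each detector stream must sustain, in Gb/s."""
    return total_gbps / streams


def min_links(total_gbps: float, link_gbps: float) -> int:
    """Lower bound on the number of links of a given speed needed
    to carry the aggregate throughput at 100% utilization."""
    return math.ceil(total_gbps / link_gbps)


if __name__ == "__main__":
    avg = per_stream_rate_gbps(TOTAL_THROUGHPUT_GBPS, NUM_STREAMS)
    print(f"average per stream: {avg:.0f} Gb/s")
    for speed in (100, 25):
        n = min_links(TOTAL_THROUGHPUT_GBPS, speed)
        print(f"{speed} Gb/s links needed (lower bound): {n}")
```

Under these assumptions each stream averages about 40 Gb/s, and the aggregate needs at least 400 links at 100 Gb/s or 1600 at 25 Gb/s. A real design would add headroom for overhead and traffic bursts.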
In this work, RoCE, a high-throughput, kernel-bypass, Ethernet-based protocol, is benchmarked as a candidate technology for the event-building network. CPU and memory bandwidth utilization for RoCE-based data transmissions is investigated and discussed. RoCE is compared with the InfiniBand protocol. Preliminary performance results obtained with the selected network hardware supporting the protocol are discussed. Relevant utilization and interoperability issues are detailed, along with lessons learned along the way.

Primary authors

Rafal Dominik Krawczyk (CERN), Tommaso Colombo (CERN), Niko Neufeld (CERN), Flavio Pisani (Universita e INFN, Bologna (IT)), Sebastien Valat (CERN)
