10–14 Oct 2016
San Francisco Marriott Marquis
America/Los_Angeles timezone

ALICE HLT Run2 performance overview

10 Oct 2016, 11:45
15m
GG A+B (San Francisco Marriott Marquis)

Oral, Track 1: Online Computing

Speaker

Mikolaj Krzewicki (Johann-Wolfgang-Goethe Univ. (DE))

Description

M. Krzewicki for the ALICE collaboration

The ALICE High Level Trigger (HLT) is an online reconstruction and data compression system used in the ALICE experiment at CERN. Unique among the LHC experiments, it makes extensive use of modern coprocessor technologies, such as general-purpose graphics processing units (GPGPUs) and field-programmable gate arrays (FPGAs), in the data flow.
Real-time data compression is performed by a cluster finder algorithm implemented on FPGA boards, followed by optimisation and Huffman encoding stages. The resulting data, instead of raw clusters, are stored and used in the subsequent offline processing. For Run 2 and beyond, the compression scheme is being extended to provide higher compression ratios.
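As an illustration of the Huffman encoding stage mentioned above, the following is a minimal, generic sketch in Python. It is not the HLT implementation: the function names (`huffman_code`, `encode`, `decode`) and the toy input are assumptions made for this example only, and the real system operates on cluster data whose format is not described here.

```python
import heapq
from collections import Counter


def huffman_code(symbols):
    """Build a prefix code {symbol: bitstring} from a sequence of symbols."""
    freq = Counter(symbols)
    if not freq:
        return {}
    if len(freq) == 1:
        # Degenerate case: a single distinct symbol still needs one bit.
        return {next(iter(freq)): "0"}
    # Heap entries are (weight, tie_breaker, {symbol: partial code}).
    # The integer tie_breaker keeps the dicts from ever being compared.
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w0, _, c0 = heapq.heappop(heap)  # two lightest subtrees
        w1, _, c1 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c0.items()}
        merged.update({s: "1" + c for s, c in c1.items()})
        heapq.heappush(heap, (w0 + w1, next_id, merged))
        next_id += 1
    return heap[0][2]


def encode(symbols, code):
    """Concatenate the code words of all symbols into one bitstring."""
    return "".join(code[s] for s in symbols)


def decode(bits, code):
    """Invert encode(); relies on the code being prefix-free."""
    inverse = {v: k for k, v in code.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:
            out.append(inverse[current])
            current = ""
    return out
```

Frequent symbols receive short code words, so a skewed value distribution compresses well; a uniform distribution gains little.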
Track finding is performed using a cellular automaton and a Kalman filter algorithm on GPGPU hardware, where CUDA, OpenCL and OpenMP (for CPU support) technologies can be used interchangeably.
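To make the Kalman filter step concrete, here is a small, hedged sketch in Python that fits a straight-line track through position measurements on successive detector layers. Everything in it is an assumption for illustration: the two-parameter state (position and slope), the absence of process noise, and the function name `kalman_line_fit`. The HLT tracker itself runs on GPGPU hardware with a more complete track model, and the cellular automaton seeding is not shown.

```python
def kalman_line_fit(xs, measurements, meas_var, prior_var=1e6):
    """Fit y = a + b*x with a Kalman filter; xs must be non-empty.

    The state is (y, slope) at the current layer. Returns the state
    at the last layer after all measurements have been absorbed.
    """
    y, slope = 0.0, 0.0
    # Symmetric 2x2 covariance stored as p00, p01, p11 (vague prior).
    p00, p01, p11 = prior_var, 0.0, prior_var
    x_prev = xs[0]
    for x, m in zip(xs, measurements):
        dx = x - x_prev
        # Prediction: propagate state and covariance to the next layer.
        y = y + slope * dx
        p00 = p00 + 2.0 * dx * p01 + dx * dx * p11
        p01 = p01 + dx * p11
        # Update: only the position y is measured, with variance meas_var.
        s = p00 + meas_var
        k0, k1 = p00 / s, p01 / s  # Kalman gain
        residual = m - y
        y += k0 * residual
        slope += k1 * residual
        # Covariance update (I - K H) P, using the pre-update p01.
        p11 = p11 - k1 * p01
        p01 = (1.0 - k0) * p01
        p00 = (1.0 - k0) * p00
        x_prev = x
    return y, slope
```

With a very large prior variance the result approaches an ordinary least-squares line fit, but the filter processes one measurement at a time, which is what makes it attractive for following a track layer by layer.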
In the context of the readout system upgrade, the HLT framework was optimised to fully handle the increased data and event rates resulting from the time projection chamber (TPC) readout upgrade and the increased LHC luminosity.
Online calibration of the TPC using the HLT's online tracking capabilities was deployed. To this end, offline calibration code was adapted to run both online and offline, and the HLT framework was extended accordingly. The performance of this scheme is important for Run 3 related developments. Besides being an important exercise for Run 3, online calibration can reduce the computing workload during the offline calibration and reconstruction cycle already in Run 2.
A new multi-part messaging approach was developed, which at the same time serves as a test bed for the new data flow model of the O2 system, where further development of this concept is ongoing.
This messaging technology, here based on ZeroMQ, was used to implement the calibration feedback loop on top of the existing, graph-oriented HLT transport framework.
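One way to picture a multi-part message is as a flat sequence of byte frames in which each data block is preceded by a small descriptive header. The sketch below, in Python, is only an assumption about such a layout: the header/payload pairing, the JSON encoding, and the field names `type` and `origin` are hypothetical and not taken from the HLT or O2 code. A list of byte frames like the one produced here is the shape that pyzmq's `send_multipart` accepts, but no ZeroMQ socket is used in the example.

```python
import json


def pack_multipart(blocks):
    """Flatten (header_dict, payload_bytes) pairs into a list of frames."""
    frames = []
    for header, payload in blocks:
        frames.append(json.dumps(header, sort_keys=True).encode("utf-8"))
        frames.append(payload)
    return frames


def unpack_multipart(frames):
    """Rebuild the (header_dict, payload_bytes) pairs from a frame list."""
    if len(frames) % 2 != 0:
        raise ValueError("expected alternating header and payload frames")
    return [
        (json.loads(frames[i].decode("utf-8")), frames[i + 1])
        for i in range(0, len(frames), 2)
    ]


# Hypothetical example: two blocks travelling in a single message.
example_blocks = [
    ({"type": "calibration", "origin": "TPC"}, b"\x01\x02\x03"),
    ({"type": "histogram", "origin": "monitor"}, b"\x10\x20"),
]
example_frames = pack_multipart(example_blocks)
```

Keeping the header in its own frame lets a receiver inspect what a block contains and route or skip it without touching the payload.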
Utilising the online reconstruction of many detectors, a new asynchronous monitoring scheme was developed to allow real-time monitoring of the physics performance of the ALICE detector, again making use of the new messaging scheme for both internal and external communication.
The spare compute resource, a development cluster consisting of older HLT infrastructure, is run as a Tier-2 GRID site using an OpenStack-based setup, contributing as many resources as feasible depending on the data-taking conditions. In periods of inactivity during shutdowns, both the production and development clusters contribute significant computing power to the ALICE GRID.

Primary Keyword: Trigger
Secondary Keyword: DAQ
Tertiary Keyword: Distributed data handling

Author

Mikolaj Krzewicki (Johann-Wolfgang-Goethe Univ. (DE))
