Description
The second stage of the LHCb software trigger application consists of an
up-front event reconstruction followed by approximately 500 independent
event selections. Monitoring the performance of the reconstruction and
selections is crucial for detecting and addressing issues as soon as possible
after they appear. To this end, each process produces approximately $3 \times 10^3$
monitoring histograms, resulting in $1.35 \times 10^8$ histograms that need to
be aggregated. To achieve this, a tiered architecture of monitoring
infrastructure tasks has been put in place, which separately propagates
histogram descriptions and histogram increments using ØMQ. As this separation
is not possible with ROOT, the implementation is based on standard containers
and a few custom classes serialized using boost::serialization.
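
The separation of histogram descriptions from histogram increments can be illustrated with a minimal sketch. It uses only standard containers and leaves out the ØMQ transport and the boost::serialization step entirely. All type and member names (`HistoDescription`, `HistoIncrement`, `Aggregator`) are hypothetical and are not taken from the LHCb code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical identifier type; the real scheme is not described in the text.
using HistoId = std::uint32_t;

// Sent once per histogram: everything needed to book it.
struct HistoDescription {
    std::string name;
    std::size_t nBins;
    double xMin;
    double xMax;
};

// Sent repeatedly: only the change in bin contents since the last message.
struct HistoIncrement {
    HistoId id;
    std::vector<double> binDeltas;
};

// One aggregation task in a tier: sums increments arriving from many sources.
class Aggregator {
public:
    // Book zeroed bins the first time a description arrives; repeats are ignored.
    void describe(HistoId id, const HistoDescription& d) {
        if (descriptions_.emplace(id, d).second) {
            sums_[id].assign(d.nBins, 0.0);
        }
    }

    // Add an increment; returns false if no description has been seen yet.
    bool add(const HistoIncrement& inc) {
        auto it = sums_.find(inc.id);
        if (it == sums_.end()) return false;
        assert(inc.binDeltas.size() == it->second.size());
        for (std::size_t i = 0; i < inc.binDeltas.size(); ++i) {
            it->second[i] += inc.binDeltas[i];
        }
        return true;
    }

    // Current summed bin contents; throws std::out_of_range for unknown ids.
    const std::vector<double>& bins(HistoId id) const { return sums_.at(id); }

private:
    std::unordered_map<HistoId, HistoDescription> descriptions_;
    std::unordered_map<HistoId, std::vector<double>> sums_;
};
```

In this sketch a description is sent once and every later message carries only bin deltas, so the recurring traffic does not repeat names or axis ranges. An aggregator in a higher tier could forward its own sums in the same increment form.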
As a result of the asynchronous processing, different nodes may process
data from different data-taking intervals. This results in up to $5 \times 10^{5}$
ROOT histograms, belonging to 300 data-taking intervals, that need to be
written to a single file per interval every few minutes. Improvements in
ROOT's parallelization have allowed the code to be simplified, but some
bottlenecks remain.
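
The bookkeeping implied by asynchronous processing, where increments for several data-taking intervals arrive interleaved and each interval is written to its own file, might be sketched as follows. This is an illustration under stated assumptions, not the actual implementation: the names `RunNumber`, `IntervalIncrement`, `IntervalStore` and the file-name pattern are hypothetical, and plain vectors stand in for ROOT histograms, so no ROOT I/O is performed.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical identifier for one data-taking interval.
using RunNumber = std::uint32_t;

// An increment tagged with the interval its data belongs to.
struct IntervalIncrement {
    RunNumber run;
    std::string histoName;
    std::vector<double> binDeltas;
};

// Keeps a separate set of histograms for every interval seen so far.
class IntervalStore {
public:
    using Histograms = std::map<std::string, std::vector<double>>;

    // Route the increment to its interval, creating entries on first use.
    void add(const IntervalIncrement& inc) {
        auto& bins = store_[inc.run][inc.histoName];
        if (bins.size() < inc.binDeltas.size()) {
            bins.resize(inc.binDeltas.size(), 0.0);
        }
        for (std::size_t i = 0; i < inc.binDeltas.size(); ++i) {
            bins[i] += inc.binDeltas[i];
        }
    }

    // Number of intervals currently held in memory.
    std::size_t intervalCount() const { return store_.size(); }

    // Total number of histograms across all intervals.
    std::size_t histogramCount() const {
        std::size_t n = 0;
        for (const auto& entry : store_) n += entry.second.size();
        return n;
    }

    // Contents destined for one interval's file; throws for unknown intervals.
    const Histograms& interval(RunNumber run) const { return store_.at(run); }

    // Assumed naming scheme giving a single output file per interval.
    static std::string fileName(RunNumber run) {
        return "monitoring_run_" + std::to_string(run) + ".root";
    }

private:
    std::map<RunNumber, Histograms> store_;
};
```

A periodic writer could loop over the stored intervals and write each one to the file returned by `fileName`. Because intervals do not share state in this layout, they could in principle be written concurrently.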