3–7 Nov 2008
Ettore Majorana Foundation and Centre for Scientific Culture
Europe/Zurich timezone

ATLAS Handling Problematic Events in Quasi Real-Time

3 Nov 2008, 17:00 (25 min)
Ettore Majorana Foundation and Centre for Scientific Culture

Via Guarnotta, 26 - 91016 ERICE (Sicily) - Italy Tel: +39-0923-869133 Fax: +39-0923-869226 E-mail: hq@ccsem.infn.it
Parallel Talk
1. Computing Technology for Physics Research

Speaker

Hegoi Garitaonandia (NIKHEF)

Description

The ATLAS experiment at CERN will require about 4000 CPUs for the online data acquisition (DAQ) system. When the DAQ system experiences software errors, such as event selection algorithm problems, crashes or timeouts, its fault-tolerance mechanism routes the corresponding event data to the so-called debug stream. During first-beam commissioning and early data taking, a large fraction of events is expected to end up in this stream. To identify DAQ problems as early as possible and to shorten the turn-around time for fixing them, the debug stream must be processed promptly. We have adopted a quasi real-time approach: an automated system analyzes the contents of the debug stream and provides a fine-grained classification of the errors. A high percentage of error events are caused by transient online problems, and many of those events are recovered by feeding them to an independent system that reruns the trigger software. To stay flexible with respect to computing-power requirements, we added an abstraction layer over the computing backend, so that either the Grid or dedicated resources can be used. Using cosmic-ray runs, we validated the automatic error analysis and recovery procedure.
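The Python sketch below illustrates two ideas from the description. It classifies debug-stream events by error type, then re-runs only the presumably transient ones through a computing backend hidden behind an abstract interface. It is not the ATLAS implementation. All names (`DebugEvent`, `Backend`, `LocalBackend`, `process_debug_stream`) are hypothetical, as are the error labels and the choice of which kinds count as transient. A real Grid backend would submit jobs through Grid middleware rather than call a local function.

```python
from abc import ABC, abstractmethod
from collections import Counter
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical error kinds treated as transient, i.e. worth re-running the
# trigger software on. This selection is an illustrative assumption.
TRANSIENT_KINDS = {"timeout", "crash"}


@dataclass
class DebugEvent:
    event_id: int
    error_kind: str  # hypothetical label, e.g. "timeout", "crash", "algorithm_error"


class Backend(ABC):
    """Abstraction over the computing resources used for re-running events.

    A Grid or dedicated-farm backend would implement the same interface.
    """

    @abstractmethod
    def submit(self, events: List[DebugEvent]) -> List[bool]:
        """Re-run the trigger on each event; return True where it succeeded."""


class LocalBackend(Backend):
    """Runs a user-supplied re-run function in-process (useful for testing)."""

    def __init__(self, rerun: Callable[[DebugEvent], bool]):
        self.rerun = rerun

    def submit(self, events: List[DebugEvent]) -> List[bool]:
        return [self.rerun(e) for e in events]


def process_debug_stream(
    events: List[DebugEvent], backend: Backend
) -> Tuple[Counter, List[int], List[int]]:
    # 1. Classify: count debug-stream events per error kind.
    counts = Counter(e.error_kind for e in events)
    # 2. Select presumably transient errors and re-run them on the backend.
    transient = [e for e in events if e.error_kind in TRANSIENT_KINDS]
    results = backend.submit(transient)
    recovered = [e.event_id for e, ok in zip(transient, results) if ok]
    # 3. Everything not recovered is left for manual inspection.
    recovered_set = set(recovered)
    unrecovered = [e.event_id for e in events if e.event_id not in recovered_set]
    return counts, recovered, unrecovered
```

In this sketch, `LocalBackend` can be swapped for a Grid-backed class to change where the re-runs execute. The classification logic stays untouched, which is the kind of flexibility the abstraction layer is meant to provide.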
