Speaker
Dr David Mason (FNAL)
Description
The CMS infrastructure for processing, storing and analyzing data is based on worldwide distributed tiers of computing resources. Monitoring and troubleshooting all parts of the computing infrastructure, and importantly the experiment-specific data flows and workflows running on it, is essential to guarantee timely delivery of processed data to the physicists. This is especially important during startup and commissioning, when the software and the computing systems are not yet completely well behaved.
This talk will present the operation, monitoring and troubleshooting of the global CMS data and workflow infrastructure from the Fermilab Remote Operation Center (ROC). It will emphasize the remote operation protocols and procedures developed during the first cosmics data-taking periods. The talk will point out the problems of being physically separated from the infrastructure and the detector operation. It will stress the importance of a well-designed infrastructure with a multitude of communication possibilities, from phone calls to state-of-the-art video connections. It will also point out the advantage of being able to provide operational support outside European working hours without putting too much load on the shift personnel. Overall, the talk will describe the success story of remote operation from Fermilab and give recipes for similar future projects.
Presentation type (oral | poster): oral
Author
Dr David Mason (FNAL)