10–14 Oct 2016
San Francisco Marriott Marquis
America/Los_Angeles timezone

Efficient monitoring of CRAB3 jobs at CMS

11 Oct 2016, 15:30
1h 15m
San Francisco Marriott Marquis

Poster Track 7: Middleware, Monitoring and Accounting Posters A / Break

Speaker

Marco Mascheroni (Fermi National Accelerator Lab. (US))

Description

CRAB3 is a tool used by more than 500 users all over the world for distributed Grid analysis of CMS data. Users can submit sets of Grid jobs with similar requirements (tasks) with a single user request. CRAB3 uses a client-server architecture, where a lightweight client, a server, and ancillary services work together and are maintained by CMS operators at CERN.

As with most complex software, good monitoring tools are crucial for efficient use and long-term maintainability. This work gives an overview of the monitoring tools developed to ensure the CRAB3 server and infrastructure are functional, help operators debug user problems, and minimize overhead and operating cost.

CRABMonitor is a JavaScript-based web page dedicated to monitoring system operation. It gathers results from multiple CRAB3 APIs exposed via a REST interface. CRAB3 operators use it to inspect the details of submitted jobs, task status, configuration and parameters, user code, and log files. Links to relevant data are provided when a problem must be investigated. The software is largely JavaScript developed in-house to maximize flexibility and maintainability, although jQuery is also used.
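The aggregation step described above, where one page combines the answers of several REST endpoints, can be illustrated with a minimal sketch. It is written in Python for brevity, whereas CRABMonitor itself is JavaScript, and it is not the authors' implementation. The endpoint names (`status`, `config`, `logs`), the `fetch` callback, and the sample payloads are all hypothetical, since the abstract does not list the real CRAB3 API routes.

```python
from typing import Any, Callable, Dict, Iterable

# Hypothetical endpoint names; the real CRAB3 REST routes are not given in the abstract.
ENDPOINTS = ("status", "config", "logs")


def gather_task_view(
    task: str,
    fetch: Callable[[str, str], Dict[str, Any]],
    endpoints: Iterable[str] = ENDPOINTS,
) -> Dict[str, Any]:
    """Query each endpoint for one task and merge the answers into a single view."""
    view: Dict[str, Any] = {"task": task, "errors": {}}
    for name in endpoints:
        try:
            view[name] = fetch(name, task)
        except Exception as exc:
            # Keep partial results so an operator still sees what did respond.
            view["errors"][name] = str(exc)
    return view


def fake_fetch(endpoint: str, task: str) -> Dict[str, Any]:
    """Stand-in for an HTTP GET; returns made-up payloads for illustration only."""
    canned = {
        "status": {"state": "running", "jobs": 3},
        "config": {"splitting": "by-file"},
    }
    if endpoint not in canned:
        raise RuntimeError(f"no data for {endpoint}")
    return canned[endpoint]


view = gather_task_view("example_task", fake_fetch)
```

Passing the fetch function in as a parameter keeps the merging logic independent of any particular HTTP client, and collecting failures under `errors` means one unavailable endpoint does not hide the data returned by the others.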

In addition to CRABMonitor, a range of Kibana dashboards developed to provide real-time monitoring of system operations will also be presented.

Primary Keyword: Distributed workload management
Secondary Keyword: Monitoring

Primary authors

Emilis Antanas Rupeika (Vilnius University (LT))
Marco Mascheroni (Fermi National Accelerator Lab. (US))

Co-authors

Diego Ciangottini (Universita e INFN, Perugia (IT))
Eric Vaandering (Fermi National Accelerator Lab. (US))
Jadir Marra Da Silva (UNESP - Universidade Estadual Paulista (BR))
Jose Hernandez (CIEMAT)
Justas Balcas (California Institute of Technology (US))
Stefano Belforte (Universita e INFN, Trieste (IT))
