Description
CRAB3 is a tool used by more than 500 users all over the world for distributed Grid analysis of CMS data. Users can submit sets of Grid jobs with similar requirements (tasks) with a single user request. CRAB3 uses a client-server architecture, where a lightweight client, a server, and ancillary services work together and are maintained by CMS operators at CERN.
As with most complex software, good monitoring tools are crucial for efficient use and long-term maintainability. This work gives an overview of the monitoring tools developed to ensure the CRAB3 server and infrastructure are functional, help operators debug user problems, and minimize overhead and operating cost.
CRABMonitor is a JavaScript-based page dedicated to monitoring system operation. It gathers results from multiple CRAB3 APIs exposed via a REST interface. CRAB3 operators use it to inspect the details of submitted jobs, task status, configuration and parameters, user code, and log files. Links to relevant data are provided when a problem must be investigated. The software is largely JavaScript developed in-house to maximize flexibility and maintainability, although jQuery is also used.
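As an illustration of gathering results from several REST endpoints into one operator view, the following is a minimal sketch and not the CRABMonitor code itself. The endpoint names (`status`, `config`, `logs`), the URL layout, and the helper names `mergeResults` and `gatherTaskView` are assumptions made for the example; the abstract does not specify the actual CRAB3 API paths. The fetch function is passed in as a parameter so the sketch can run without network access.

```javascript
// Hypothetical endpoint names: the real CRAB3 REST paths are not given in the abstract.
const ENDPOINTS = ["status", "config", "logs"];

// Merge per-endpoint outcomes (in the shape produced by Promise.allSettled)
// into a single view: successful payloads go to `data`, failures to `errors`.
function mergeResults(taskName, names, settled) {
  const view = { task: taskName, data: {}, errors: {} };
  names.forEach((name, i) => {
    const outcome = settled[i];
    if (outcome.status === "fulfilled") {
      view.data[name] = outcome.value;
    } else {
      const reason = outcome.reason;
      view.errors[name] = String(reason && reason.message ? reason.message : reason);
    }
  });
  return view;
}

// Query every endpoint concurrently. `fetchJson` is an injected function
// (url -> Promise of parsed JSON), so one failing endpoint does not hide the others.
async function gatherTaskView(fetchJson, baseUrl, taskName, names = ENDPOINTS) {
  const settled = await Promise.allSettled(
    names.map((name) =>
      fetchJson(`${baseUrl}/${name}?task=${encodeURIComponent(taskName)}`)
    )
  );
  return mergeResults(taskName, names, settled);
}
```

Using `Promise.allSettled` rather than `Promise.all` is a design choice for this sketch: a monitoring page is most useful when part of the system is misbehaving, so a partial view that records which endpoint failed is preferable to no view at all.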
In addition to CRABMonitor, a range of Kibana dashboards developed to provide real-time monitoring of system operations will also be presented.
Primary Keyword (Mandatory) | Distributed workload management |
---|---|
Secondary Keyword (Optional) | Monitoring |