CHEP 2018 Conference, Sofia, Bulgaria

Name: CHEP 2018 Conference, Sofia, Bulgaria
Start: 2018-07-09T08:00:00+03:00
End: 2018-07-13T13:00:00+03:00
Location: Sofia, Bulgaria

9–13 Jul 2018

Sofia, Bulgaria

Europe/Sofia timezone

Contact us

Using AWS Athena analytics to monitor pilot job health on WLCG compute sites

10 Jul 2018, 14:45

15m

Hall 7 (National Palace of Culture)

Hall 7

National Palace of Culture

presentation Track 3 – Distributed computing T3 - Distributed computing

Peter Love (Lancaster University (GB))

ATLAS Distributed Computing (ADC) uses the pilot model to submit jobs to Grid computing resources. This model isolates the resource from the workload management system (WMS) and helps to avoid running jobs on faulty resources. A minor side-effect of this isolation is that the faulty resources are neglected and not brought back into production because the problems are not visible to the WMS. In this paper we describe a method to analyse logs from the ADC resource provisioning system (AutoPyFactory) and provide monitoring views which target poorly performing resources and help diagnose the issues in good time. Central to this analysis is the use of Amazon Web Services (AWS) to provide an inexpensive and stable analytics platform. In particular we use the AWS Athena service as an SQL query interface for logging data stored in the AWS S3 service. We describe details of the data handling pipeline and services involved leading to a summary of key metrics suitable for ADC operations.

Peter Love (Lancaster University (GB)) Thomas George Hartland Wesley Frost

aws-athena-love

CHEP 2018 Conference, Sofia, Bulgaria

Contact us

Using AWS Athena analytics to monitor pilot job health on WLCG compute sites

Hall 7

National Palace of Culture

Speaker

Description

Authors

Presentation materials

Choose timezone

CHEP 2018 Conference, Sofia, Bulgaria

Contact us

Speaker

Description

Authors

Presentation materials