17–21 Oct 2016
LBNL
US/Pacific timezone

Platform Providing Network Awareness to ATLAS and Beyond

17 Oct 2016, 15:50
25m
Building 50 Auditorium (LBNL)

Berkeley, CA 94720
Security & Networking

Speaker

Ilija Vukotic (University of Chicago (US))

Description

With the change of the ATLAS computing model from hierarchical to dynamic, processing tasks are dispatched to sites based not only on the availability of resources but also on network conditions along the path between compute and storage, which may be topologically and/or geographically distant. We describe a system developed to collect, store, analyze and provide timely access to network conditions for ATLAS sites, which has also been generalized for broader use. We describe the data we collect from four different sources, which give orthogonal views of network performance and utilization. The pre-existing ATLAS Distributed Computing Analytics platform is used for data transport and storage. The platform provides interactive monitoring dashboards and serves as a backend to an alarm and alert system that we have developed for site operators. A co-located Jupyter service is used to perform in-depth interactive data analysis, train different machine learning algorithms and test models on historical data. We discuss how the derived knowledge is used by ATLAS for network anomaly detection, job scheduling and data brokering.
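
As a rough illustration of the kind of check an alarm system for network anomalies might run, the sketch below flags measurements that deviate strongly from a trailing window of recent values. The abstract does not state which method is used, so this is only an assumed z-score approach. The function name, the window size, the threshold and the sample packet-loss values are all hypothetical.

```python
from statistics import mean, stdev


def flag_anomalies(samples, window=5, threshold=3.0):
    """Return indices of samples that deviate from the mean of the
    preceding `window` samples by more than `threshold` standard deviations.

    Illustrative only: the method and parameters are assumptions,
    not taken from the talk.
    """
    flagged = []
    for i in range(window, len(samples)):
        history = samples[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # Flat history: any change at all counts as a deviation.
            if samples[i] != mu:
                flagged.append(i)
            continue
        if abs(samples[i] - mu) / sigma > threshold:
            flagged.append(i)
    return flagged


# Hypothetical packet-loss fractions for one source-destination pair.
packet_loss = [0.010, 0.012, 0.009, 0.011, 0.010, 0.011, 0.150, 0.010]
print(flag_anomalies(packet_loss))  # prints [6]
```

A trailing window keeps the baseline local to recent conditions. Its drawback is that a single spike inflates the spread of the following windows, so a production system would likely need a more robust baseline.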

Primary author

Ilija Vukotic (University of Chicago (US))