Description
Throughout the first year of LHC Run 2, ATLAS Cloud Computing has undergone
a period of consolidation, characterized by building upon previously established systems,
with the aim of reducing operational effort, improving robustness, and reaching higher scale.
This paper describes the current state of ATLAS Cloud Computing.
Cloud activities are converging on a common contextualization approach
for virtual machines, and cloud resources are sharing
monitoring and service discovery components.
We describe the integration of Vac resources, streamlined usage of the High
Level Trigger cloud for simulation and reconstruction, extreme scaling on Amazon EC2,
and procurement of commercial cloud capacity in Europe. Building on the previously
established monitoring infrastructure, we have deployed a real-time
monitoring and alerting platform which coalesces data from multiple
sources, provides flexible visualization via customizable dashboards,
and issues alerts and takes corrective action in response to
problems. Finally, a versatile analytics platform for data mining of
log files is being used to analyze benchmark data and to diagnose
and gain insight into job errors.
Primary Keyword (Mandatory) | Cloud technologies |
---|---|
Secondary Keyword (Optional) | Monitoring |
Tertiary Keyword (Optional) | Computing models |