Within the ATLAS detector, the Trigger and Data Acquisition system is responsible for the online processing of data streamed from the detector during collisions at the Large Hadron Collider (LHC) at CERN. The online farm is composed of ~4000 servers processing the data read out from ~100 million detector channels through multiple trigger levels. The capability to monitor the ongoing data taking and all the applications involved is essential for debugging and prompt intervention, and thus for efficient data taking. The basis of the current web service architecture was designed at the beginning of ATLAS operation (Run 1). It was intended primarily to serve static content from Network-attached Storage and prioritized strict security, using separate web servers for internal (ATLAS Technical and Control Network, ATCN) and external (CERN General Purpose Network and public internet) access. Since then, it has become necessary to complement the static content with an increasing number of dynamic web-based User Interfaces (UIs), which provide new functionality and replace legacy desktop UIs. These are typically served by applications running on virtual machines (VMs) inside ATCN and made accessible externally via chained reverse HTTP proxies. As the trend towards Web UIs continues, the current design has shown its limits, and its increasing complexity has become an issue for maintenance and growth. It is therefore necessary to review the overall web services architecture for ATLAS, taking into account the current needs and those of the upcoming LHC Run 3.
In this paper, we present our investigation and roadmap for redesigning the web services system to better operate and monitor the ATLAS detector, while maintaining the security of critical services, such as the Detector Control System, and preserving the separation of remote monitoring and on-site control in accordance with ATLAS policies.