Description
Norwegian contributions to the WLCG consist of computing and storage resources in Bergen and Oslo for the ALICE and ATLAS experiments. The increasing scale and complexity of Grid site infrastructure and operation require the integration of national WLCG resources into larger shared installations. Traditional HPC resources often come with restrictions on software, administration, and accessibility. Furthermore, expensive HPC infrastructure such as fast interconnects is barely used by Grid workloads.
As a cost-efficient solution, the Norwegian Grid resources are operated as two platforms within NREC, the Norwegian Research and Education Cloud, which is a cloud computing service operated by the Universities of Oslo and Bergen. It aims to provide easily accessible computing and storage infrastructure for national academic and scientific applications.
By using cloud technology instead of traditional HPC resources, WLCG installations benefit from a high degree of accessibility, flexibility, and scalability while the service provider ensures reliable and secure operation of infrastructure and network.
Orchestration of the virtual instances follows the Infrastructure-as-a-Service (IaaS) paradigm and is implemented as declarative configuration files in Terraform. All custom host configuration, software deployment, and cluster configuration are written in YAML and deployed using Ansible.
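As a rough illustration of how provisioned instances might be handed over to the configuration step, the following Python sketch groups a list of instance records by role and emits them in the JSON layout that Ansible accepts from dynamic inventory scripts. It is not taken from the actual site setup: the instance names, roles, addresses, and the `build_inventory` helper are all hypothetical.

```python
import json
from collections import defaultdict

# Hypothetical instance records, e.g. as they might be exported after an
# IaaS provisioning step. Names, roles and addresses are invented.
INSTANCES = [
    {"name": "arc-ce-01", "role": "compute_element", "ip": "192.0.2.10"},
    {"name": "worker-01", "role": "worker", "ip": "192.0.2.21"},
    {"name": "worker-02", "role": "worker", "ip": "192.0.2.22"},
]


def build_inventory(instances):
    """Group instances by role using Ansible's dynamic-inventory JSON layout."""
    groups = defaultdict(lambda: {"hosts": []})
    hostvars = {}
    for inst in instances:
        # Each role becomes an inventory group listing its host names.
        groups[inst["role"]]["hosts"].append(inst["name"])
        # Per-host variables go under the special "_meta" key.
        hostvars[inst["name"]] = {"ansible_host": inst["ip"]}
    inventory = dict(groups)
    inventory["_meta"] = {"hostvars": hostvars}
    return inventory


if __name__ == "__main__":
    print(json.dumps(build_inventory(INSTANCES), indent=2))
```

Grouping by role means that adding a worker only requires a new instance record; playbooks targeting the `worker` group pick it up without further changes.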
This concept allows for the delivery of high-quality WLCG services with key features such as: fixed and opportunistic computing resources; ARC and JAliEn grid middleware; Slurm and HTCondor backends; Ceph disk storage integrated into NeIC NDGF dCache; integrated tape storage; monitoring and alerting based on the Prometheus/Grafana ecosystem; a setup fully controlled by the site admin; scalable extension; quick failover and recovery.
This presentation describes the capabilities of the Norwegian Research and Education Cloud and the strategy for provisioning Grid computing and storage using the IaaS approach. It highlights details on cluster management and monitoring as a service, flexible cluster orchestration, and scalability and performance studies.