Conveners
T7 - Clouds, virtualization and containers: S1
- Andrew McNab (University of Manchester)
T7 - Clouds, virtualization and containers: S3
- Fabio Hernandez (IN2P3/CNRS Computing Centre)
T7 - Clouds, virtualization and containers: S5
- Martin Sevior (University of Melbourne (AU))
T7 - Clouds, virtualization and containers: S7
- Dave Dykstra (Fermi National Accelerator Lab. (US))
The WLCG unites resources from over 169 sites spread across the world, and the number is expected to grow in the coming years. However, setting up and configuring new sites to support WLCG workloads is still not a straightforward task and often requires significant assistance from WLCG experts. A survey presented at CHEP 2016 revealed a strong wish among site admins for a reduction of overheads...
The ATLAS experiment at the LHC relies on a complex and distributed Trigger and Data Acquisition (TDAQ) system to gather and select particle collision data. The High Level Trigger (HLT) component of the TDAQ system is responsible for executing advanced selection algorithms, reducing the data rate to a level suitable for recording to permanent storage. The HLT functionality is provided by a...
The cloud computing paradigm allows scientists to elastically grow or shrink computing resources as requirements demand, so that resources only need to be paid for when necessary. The challenge of integrating cloud computing into the distributed computing frameworks used by HEP experiments has led to many different solutions in past years; however, none of these solutions offers a complete,...
IceCube is a cubic-kilometer neutrino detector located at the South Pole. CVMFS is a key component of IceCube's Distributed High Throughput Computing analytics workflow for sharing 500 GB of software across datacenters worldwide. Building the IceCube software suite across multiple platforms and deploying it into CVMFS has until recently been a manual, time-consuming task that doesn't fit well...
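A minimal sketch of how such a build-and-deploy step can be scripted against a CVMFS stratum-0, assuming the standard cvmfs_server transaction/publish workflow; the repository name, install prefix and build command below are illustrative placeholders rather than IceCube's actual tooling:

#!/usr/bin/env python3
"""Minimal sketch of an automated CVMFS publish step. Repository name,
paths and build command are illustrative placeholders."""
import subprocess
import sys

REPO = "icecube.opensciencegrid.org"   # placeholder repository name
TARGET = f"/cvmfs/{REPO}/py3-v4"       # placeholder install prefix

def run(cmd):
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

def publish(build_cmd):
    # Open a writable transaction on the stratum-0
    run(["cvmfs_server", "transaction", REPO])
    try:
        # Install the freshly built software into the repository tree
        run(build_cmd + ["--prefix", TARGET])
        # Seal and publish the new repository revision
        run(["cvmfs_server", "publish", REPO])
    except subprocess.CalledProcessError:
        # Roll back the open transaction if anything failed
        run(["cvmfs_server", "abort", "-f", REPO])
        sys.exit(1)

if __name__ == "__main__":
    publish(["./install_icecube_software.sh"])  # placeholder build script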
Reducing time and cost through increased setup and operational efficiency is key nowadays when exploiting private or commercial clouds. In turn, this means that reducing the learning curve, as well as the operational cost of managing community-specific services running on distributed environments, has become key to success and sustainability, even more so for communities seeking to exploit...
In the framework of the H2020 INDIGO-DataCloud project we have implemented an advanced solution for the automatic deployment of digital data repositories based on Invenio, the digital library framework developed by CERN. Exploiting cutting-edge technologies, like Docker and Apache Mesos, and standard interfaces like TOSCA, we are able to provide a service that simplifies the process of creating...
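As a hedged illustration of this deployment flow, the sketch below posts a trimmed TOSCA document to a PaaS orchestrator REST endpoint; the endpoint URL, token handling, node type and template contents are assumptions made for illustration, not the actual INDIGO-DataCloud API:

"""Sketch of submitting a TOSCA template to a PaaS orchestrator.
Endpoint, token and template are illustrative placeholders."""
import requests

ORCHESTRATOR = "https://orchestrator.example.org/deployments"  # placeholder URL
TOKEN = "..."  # bearer token, obtained separately

# Trimmed TOSCA document describing an Invenio deployment (illustrative only)
tosca_template = """
tosca_definitions_version: tosca_simple_yaml_1_0
topology_template:
  node_templates:
    invenio_server:
      type: tosca.nodes.indigo.Container.Application.Docker  # placeholder type
      properties:
        image: invenio/invenio:latest
"""

resp = requests.post(
    ORCHESTRATOR,
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={"template": tosca_template, "parameters": {}},
)
resp.raise_for_status()
print("Deployment id:", resp.json().get("uuid"))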
The CERN OpenStack Cloud provides over 200,000 CPU cores to run data processing analyses for the Large Hadron Collider (LHC) experiments. Delivering these services with high performance and reliable service levels, while at the same time ensuring continuously high resource utilization, has been one of the major challenges for the CERN Cloud engineering team.
Several optimizations like...
The CERN OpenStack cloud has been delivering a wide variety of services to its 3000 customers since it entered production in 2013. Initially, standard resources such as Virtual Machines and Block Storage were offered. Today, the cloud offering includes advanced features such as Container Orchestration (for Kubernetes, Docker Swarm mode, Mesos/DCOS clusters), File Shares and Bare Metal, and...
The Simulation at Point1 (Sim@P1) project was built in 2013 to take advantage of the ATLAS Trigger and Data Acquisition High Level Trigger (HLT) farm. The HLT farm provides more than 2,000 compute nodes, which are critical to ATLAS during data taking. When ATLAS is not recording data, this large compute resource is used to generate and process simulation data for the experiment. The Sim@P1...
The primary goal of the online cluster of the Compact Muon Solenoid (CMS) experiment at the Large Hadron Collider (LHC) is to build event data from the detector and to select interesting collisions in the High Level Trigger (HLT) farm for offline storage. With more than 1100 nodes and a capacity of about 600 kHEPSpec06, the HLT machines represent up to 40% of the combined Tier0/Tier-1...
With the development of cloud computing, clouds are increasingly applied in the field of high-energy physics. OpenStack is generally considered the future of cloud computing. However, in OpenStack the resource allocation model assigns a fixed number of resources to each group. This is not well suited to scientific computing such as high-energy physics applications, whose demands for...
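To make the limitation concrete, the following sketch shows the kind of dynamic rebalancing a more elastic model implies: shifting unused cores from an idle project's quota to a busy one. The project names, thresholds and JSON field names are assumptions, and driving this through the openstack CLI is only one possible approach:

"""Sketch of dynamic quota rebalancing between two OpenStack projects.
Project names, thresholds and field names are illustrative assumptions."""
import json
import subprocess

def openstack(*args):
    """Run an openstack CLI command and parse its JSON output."""
    out = subprocess.check_output(["openstack", *args, "-f", "json"])
    return json.loads(out)

def core_usage(project):
    # Absolute limits report used and allowed cores for a project
    limits = {row["Name"]: row["Value"]
              for row in openstack("limits", "show", "--absolute",
                                   "--project", project)}
    return limits.get("totalCoresUsed", 0), limits.get("maxTotalCores", 0)

def rebalance(idle_project, busy_project, chunk=64):
    used_idle, quota_idle = core_usage(idle_project)
    used_busy, quota_busy = core_usage(busy_project)
    # Move a chunk of cores only if the idle project is clearly not using them
    if quota_idle - used_idle >= chunk and used_busy >= 0.9 * quota_busy:
        subprocess.check_call(["openstack", "quota", "set",
                               "--cores", str(quota_idle - chunk), idle_project])
        subprocess.check_call(["openstack", "quota", "set",
                               "--cores", str(quota_busy + chunk), busy_project])

if __name__ == "__main__":
    rebalance("group-a", "group-b")   # placeholder project names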
To improve hardware utilization and save manpower in system management, we have over the last few years migrated most of the web services in our institute (Institute of High Energy Physics, IHEP) to a private cloud built upon OpenStack. However, cyber security attacks have progressively become a serious threat to the cloud. Therefore, a detection and monitoring system for cyber security threats is...
The HEP group at the University of Victoria operates a distributed cloud computing system for the ATLAS and Belle II experiments. The system uses private and commercial clouds in North America and Europe that run OpenStack, OpenNebula or commercial cloud software. It is critical that we record accounting information to give credit to cloud owners and to verify our use of commercial resources...
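A minimal sketch of the kind of per-cloud aggregation such accounting requires, summing benchmark-normalised CPU and wall time from VM usage records; the record format, field names and HS06 normalisation are assumptions made for illustration:

"""Sketch of aggregating VM usage records into per-cloud accounting.
Record format and benchmark normalisation are assumptions, not the
actual system described in the talk."""
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class VMRecord:
    cloud: str            # e.g. "cc-east", "aws-us-west" (placeholder names)
    cores: int
    wall_hours: float
    cpu_hours: float
    hs06_per_core: float  # benchmark score used to normalise across clouds

def aggregate(records):
    """Sum benchmark-normalised wall and CPU time per cloud."""
    totals = defaultdict(lambda: {"wall_hs06h": 0.0, "cpu_hs06h": 0.0})
    for r in records:
        totals[r.cloud]["wall_hs06h"] += r.wall_hours * r.cores * r.hs06_per_core
        totals[r.cloud]["cpu_hs06h"] += r.cpu_hours * r.hs06_per_core
    return dict(totals)

if __name__ == "__main__":
    records = [
        VMRecord("cc-east", 8, 24.0, 180.0, 10.5),
        VMRecord("aws-us-west", 4, 12.0, 45.0, 11.2),
    ]
    for cloud, t in aggregate(records).items():
        print(cloud, f"wall={t['wall_hs06h']:.0f} HS06-hours",
              f"cpu={t['cpu_hs06h']:.0f} HS06-hours")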
Virtualization is a commonly used solution for exploiting opportunistic computing resources in the HEP field, as it provides the unified software and OS layer that HEP computing tasks require on top of heterogeneous opportunistic resources. However, there is always a performance penalty with virtualization, especially for short jobs, which are the norm in volunteer computing...
This talk shares our recent experience in providing a data analytics platform based on Apache Spark for High Energy Physics, the CERN accelerator logging system and infrastructure monitoring. The Hadoop Service has started to expand its user base to researchers who want to perform analyses with big data technologies. Among many frameworks, Apache Spark is currently getting the most...
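A minimal PySpark sketch of the style of analysis such a platform serves, aggregating logged device readings by hour; the input path and column names are illustrative placeholders:

"""Minimal PySpark sketch. Input path and column names are placeholders."""
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (SparkSession.builder
         .appName("accelerator-logging-example")
         .getOrCreate())

# Hypothetical Parquet dataset of logged device readings on HDFS
readings = spark.read.parquet("hdfs:///project/logging/readings.parquet")

# Hourly average and maximum per device, a typical monitoring aggregation
summary = (readings
           .withColumn("hour", F.date_trunc("hour", "timestamp"))
           .groupBy("device", "hour")
           .agg(F.avg("value").alias("avg_value"),
                F.max("value").alias("max_value")))

summary.orderBy("device", "hour").show(20)
spark.stop()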
SWAN (Service for Web-based ANalysis) is a CERN service that allows users to perform interactive data analysis in the cloud, in a "software as a service" model. It is built upon the widely-used Jupyter notebooks, allowing users to write - and run - their data analysis using only a web browser. By connecting to SWAN, users have immediate access to storage, software and computing resources that...
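For illustration, a notebook cell of the kind a SWAN user might run, reading a small dataset from their EOS home area and histogramming one column; the file path and column name are placeholders, assuming the user's EOS storage is visible in the session as the service describes:

# A notebook cell a SWAN user might run. File path and column name
# are illustrative placeholders.
import pandas as pd
import matplotlib.pyplot as plt

# EOS/CERNBox storage is reachable from the SWAN session under /eos
df = pd.read_csv("/eos/user/j/jdoe/dimuon_masses.csv")

plt.hist(df["mass_GeV"], bins=100, range=(70, 110))
plt.xlabel("Dimuon invariant mass [GeV]")
plt.ylabel("Events")
plt.title("Quick look at the dimuon spectrum")
plt.show()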
In recent years, public clouds have undergone a large transformation. Nowadays, cloud providers compete in delivering specialized, scalable and fault-tolerant services where resource management is completely on their side. This computing model, called serverless computing, is very attractive for users who do not want to worry about OS-level management, security patches and scaling resources.
Our...
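A minimal sketch of the serverless model in the AWS Lambda handler style: a stateless function the provider scales and patches on the user's behalf. The event fields and processing are placeholders, unrelated to the specific use case of the talk:

"""Sketch of a stateless function in the AWS Lambda handler style,
illustrating the serverless model. Event fields are placeholders."""
import json

def handler(event, context):
    # The provider scales instances of this function automatically;
    # no OS, patching or capacity planning is exposed to the user.
    records = event.get("records", [])
    processed = [
        {"id": r.get("id"), "value": 2 * r.get("value", 0)}
        for r in records
    ]
    return {
        "statusCode": 200,
        "body": json.dumps({"count": len(processed), "results": processed}),
    }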
This contribution reports on the experience acquired from using the Oracle Cloud Infrastructure (OCI) as an Infrastructure as a Service (IaaS) within the distributed computing environments of the LHC experiments. The bare metal resources provided in the cloud were integrated using existing deployment and computer management tools. The model used in earlier cloud exercises was adapted to the...
Field-programmable gate arrays (FPGAs) have largely been used in communication and high-performance computing, and given the recent advances in big data and emerging trends in cloud computing (e.g., serverless [18]), FPGAs are increasingly being introduced into these domains (e.g., Microsoft’s datacenters [6] and Amazon Web Services [10]). To address these domains’ processing needs, recent...
CERN's batch and grid services are mainly focused on High Throughput Computing (HTC) for LHC data processing. However, part of the user community requires High Performance Computing (HPC) for massively parallel applications across many cores on MPI-enabled infrastructure. This contribution addresses the implementation of HPC infrastructure at CERN for Lattice QCD application development, as...
During 2017, support for Docker and Singularity containers was added to the Vac system, in addition to its long-standing support for virtual machines. All three types of "logical machine" can now be run in parallel on the same pool of hypervisors, using container or virtual machine definitions published by experiments. We explain how CernVM-FS is provided to containers by the hypervisors, to...
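As a hedged illustration of handing CernVM-FS into a container, the sketch below bind-mounts the host's /cvmfs into a Singularity container before running a payload; the image path and payload command are placeholders, not Vac's actual implementation:

"""Sketch of launching an experiment-defined container with the host's
/cvmfs bind-mounted inside. Image and payload are placeholders."""
import subprocess

IMAGE = "/cvmfs/some-experiment.example/containers/worker.img"   # placeholder
PAYLOAD = ["/cvmfs/some-experiment.example/bin/run-pilot"]        # placeholder

def run_in_singularity(image, payload):
    # The hypervisor mounts CVMFS once; --bind makes it visible read-only
    # inside the container at the same path.
    cmd = ["singularity", "exec", "--bind", "/cvmfs:/cvmfs:ro", image] + payload
    subprocess.run(cmd, check=True)

if __name__ == "__main__":
    run_in_singularity(IMAGE, PAYLOAD)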
Virtualization and containers have become the go-to solutions for simplified deployment, elasticity and workflow isolation. These benefits are especially pronounced for containers, which dispense with the resource overhead associated with VMs and are applicable in all cases where virtualization of the full hardware stack is not considered necessary. Containers are also simpler to set up and maintain...
During 2017, LHCb created Docker and Singularity container definitions which allow sites to run all LHCb DIRAC workloads in containers as "black boxes". This parallels LHCb's previous work to encapsulate the execution of DIRAC payload jobs in virtual machines, and we explain how these three types of "logical machine" are related in LHCb's case and how they differ, in terms of architecture,...
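A sketch of the "black box" idea from a site's perspective: the site starts an experiment-published container with /cvmfs available and some resource limits, and the image's entrypoint fetches and runs payloads on its own. The image name and options are placeholders, not LHCb's actual definitions:

"""Sketch of running an experiment-published container as a "black box"
from the site's point of view. Image, limits and binds are placeholders."""
import subprocess

def run_black_box(image="experiment/worker:latest", cores=8, mem_gb=16):
    # The site only provides resources and /cvmfs; the image's default
    # entrypoint fetches and runs payload jobs on its own.
    cmd = [
        "docker", "run", "--rm",
        "--cpus", str(cores),
        "--memory", f"{mem_gb}g",
        "-v", "/cvmfs:/cvmfs:ro",
        image,
    ]
    subprocess.run(cmd, check=True)

if __name__ == "__main__":
    run_black_box()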