News and updates about the organisation, facilities and technologies of the IN2P3 Computing Center.
News from CERN since the HEPiX Spring 2019 workshop.
Presentation of recent developments at Brookhaven National Laboratory's (BNL) Scientific Data & Computing Center (SDCC).
Diamond Light Source (DLS) is an X-ray synchrotron on the Rutherford Appleton Laboratory site in Oxfordshire, UK. This site report covers the latest developments at DLS in compute, storage and cloud computing, as well as the infrastructure that underpins them.
News and developments from the Nordics
A status update on distributed cloud system development and applications at ASGC, Taiwan, as well as work on operational efficiency, will be reported.
I will give an update on the status and plans of the US ATLAS SWT2 Center.
This is the PIC report for HEPiX Autumn 2019 at Nikhef.
The MALT project is a unique opportunity to consolidate all the current CERN telephony services (commercial PBX-based analogue/IP and proprietary IP) into a single IP-based, cost-effective service, built on top of existing open source components and local developments, adapted to CERN users' needs, well integrated into the local environment and truly multi-platform. This presentation describes...
Moving users’ data is never an easy process. When you migrate to a different file system and need to meet users’ needs on Windows, Linux and Mac for more than 15 000 accounts, the magic recipe becomes as difficult as the one for making perfect macaroons!
In this presentation, you will learn key facts on how we handle this complex data migration.
The e-mail service is considered a critical collaboration system. I will share our experience at CERN regarding the technical and organizational challenges of migrating 40 000 mailboxes from Microsoft Exchange to a free and open source software solution: Kopano.
CERN Web Services are in the process of consolidating web site and web application hosting services using container orchestration.
The Kubernetes Operator pattern has gained a lot of traction recently. It applies Kubernetes principles to custom applications.
I will present how we leverage the Operator pattern in container-based web hosting services to automate the provisioning and management...
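As a hedged sketch of the Operator pattern mentioned above, the snippet below uses the kopf Python framework to react to a custom resource; the API group, kind and status fields are hypothetical placeholders, not the actual CRDs of the CERN web hosting service.

```python
import kopf

# Toy operator: watch a hypothetical "websites" custom resource and
# provision it when it is created. Group/version/plural are illustrative only.
@kopf.on.create("webservices.example.org", "v1", "websites")
def create_site(spec, name, namespace, logger, **kwargs):
    # A real operator would create here the Deployment/Service/Ingress
    # objects needed to serve the requested site.
    logger.info(f"Provisioning website {name} in {namespace}: {spec}")
    return {"phase": "Provisioned"}  # stored in the resource's status
```

Such a handler would be started with `kopf run <file>.py` against a cluster where the corresponding CRD has been registered.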
We will present an update on our site since the Spring 2019 report, covering our changes in software, tools and operations.
Some of the details to cover include our use of backfilling jobs via BOINC with cgroups, work with our ELK stack at AGLT2, updates on Bro/MISP at the UM site and information about our newest hardware purchases and deployed middleware.
We conclude with a summary of...
A short, standard site presentation so that people get to know us a bit, as we will be hosting the autumn 2020 HEPiX meeting.
If there is no room left in the agenda, that is fine; a slot of just 2-5 minutes would also be fine.
Updates of the KEK projects, including SuperKEKB and J-PARC as well as on the KEK computing research center from the last HEPiX workshop, will be presented.
This will be a quick update on what is happening at NERSC.
The computing center of IHEP has been supporting several HEP experiments for many years; LHCb is the newest experiment we started supporting this year. We recently upgraded AFS, HTCondor and EOS at IHEP. The presentation covers their current status and future plans.
An update on what's going on at the INFN-T1 site.
In the past, migrating from one Windows version to the latest one needed a full reinstallation of every single workstation, with all the inconveniences this represents for both users and IT staff.
For Windows 10, Microsoft claimed that the in-place upgrade works fine. How true is this statement?
This presentation will cover real-life feedback.
An update on CERN Linux support distributions and services.
An update on the CentOS community and CERN involvement will be given.
We will discuss updates from the Software Collections, Virtualization and OpenStack SIGs, as well as future plans regarding alternative architectures (ARM SoCs, etc.) and CentOS 8.
The successful series of HTCondor workshops in Europe, started in 2014, continued in 2019 with a workshop held from 24 to 27 September at the European Commission's Joint Research Centre in Ispra, Lombardy, Italy. We will give a short report on this workshop.
In this talk we present an HTC cluster which was set up at Bonn University in 2017/2018. On this fully puppetised cluster, all jobs run inside Singularity containers. Job management is handled by HTCondor, which nicely shields the container setup from the users: they only have to choose the desired OS via a job parameter from an offered collection of container images. The...
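As a rough illustration of such a per-job OS choice, the sketch below submits a job through the HTCondor Python bindings and attaches a custom ClassAd attribute selecting a container image. The attribute name ("MyOS"), the image label and the file names are hypothetical; the actual parameter used at Bonn is not given in the abstract.

```python
import htcondor

# Minimal submit sketch: a custom ClassAd attribute ("+MyOS") carries the
# desired container OS; the site's HTCondor/Singularity configuration would
# map it to one of the offered images. All names are illustrative only.
sub = htcondor.Submit({
    "executable": "analysis.sh",
    "request_cpus": "1",
    "request_memory": "2 GB",
    "+MyOS": '"CentOS7"',
    "output": "job.out",
    "error": "job.err",
    "log": "job.log",
})

schedd = htcondor.Schedd()
with schedd.transaction() as txn:
    cluster_id = sub.queue(txn)
print("Submitted cluster", cluster_id)
```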
The goal of the HTCondor team is to develop, implement, deploy, and evaluate mechanisms and policies that support High Throughput Computing (HTC) on large collections of distributively owned computing resources. Increasingly, the work performed by the HTCondor developers is being driven by its partnership with the High Energy Physics (HEP) community.
This talk will present recent changes...
The talk provides an overview of the DESY configurations for HTCondor. It focuses on features we need for user registry integration, node maintenance operations and fair share / quota handling. We are working on Docker, Jupyter and GPU integration into our smooth and transparent operating model setup.
In this talk we will provide details about the scalability limits of the HTCondor file transfer mechanism: how they depend on latency and job finish rate, and how it compares with pure HTTP transfer.
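For intuition only, the back-of-envelope model below estimates how a fixed per-transfer latency cost caps the sustainable job finish rate; it is a generic illustration, not a description of HTCondor internals, and all numbers are made up.

```python
def max_finish_rate(concurrent_streams, rtt_s, setup_round_trips, output_mb, bandwidth_mb_s):
    """Completions per second if each finishing job pays a latency cost of
    setup_round_trips * RTT plus the payload transfer time (toy model)."""
    per_transfer_s = setup_round_trips * rtt_s + output_mb / bandwidth_mb_s
    return concurrent_streams / per_transfer_s

# e.g. 100 concurrent transfers, 150 ms RTT, 5 setup round trips,
# 10 MB of output per job, 100 MB/s per stream
print(round(max_finish_rate(100, 0.150, 5, 10.0, 100.0), 1), "jobs/s")
```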
BEIJING-LCG2 is one of the WLCG Tier 2 grid sites. In this talk I will introduce how to run a Tier 2 grid site, including deployment, configuration, monitoring, security, troubleshooting, and VO support.
The benchmarking and accounting of compute resources in WLCG needs to be revised in view of the adoption by the LHC experiments of heterogeneous computing resources based on x86 CPUs, GPUs and FPGAs.
After evaluating several alternatives for the replacement of HS06, the HEPiX Benchmarking WG has chosen to focus on the development of a HEP-specific suite based on actual software workloads of the...
In this presentation we'll discuss the design architecture of the HEP Workload benchmark containers, and the proposed replacement for HEPSPEC06, which is based on these containers. We'll also highlight the development efforts which have been completed thus far, and the tooling being used by the project. Finally we'll detail our plan for extending the existing container benchmark suite to...
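As a hedged sketch of how per-workload results could be combined into a single benchmark number, the snippet below computes a weighted geometric mean of container workload scores; whether the proposed HEPSPEC06 replacement uses exactly this aggregation is an assumption, not something stated in the abstract.

```python
import math

def combined_score(workload_scores, weights=None):
    """Weighted geometric mean of per-workload scores (illustrative aggregation)."""
    if weights is None:
        weights = [1.0] * len(workload_scores)
    total_w = sum(weights)
    log_sum = sum(w * math.log(s) for s, w in zip(workload_scores, weights))
    return math.exp(log_sum / total_w)

# Three hypothetical containerised-workload scores
print(round(combined_score([12.3, 8.7, 15.1]), 2))
```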
WLCG relies on the network as a critical part of its infrastructure and therefore needs to guarantee effective network usage and prompt detection and resolution of any network issues, including connection failures, congestion and traffic routing. The OSG Networking Area is a partner of the WLCG effort and is focused on being the primary source of networking information for its partners and...
to be filled soon
The information security threats currently faced by WLCG sites are both sophisticated and highly profitable for the actors involved. Evidence suggests that targeted organisations take on average more than six months to detect a cyber attack, with more sophisticated attacks being more likely to pass undetected.
An important way to mount an appropriate response is through the use of a...
This presentation provides an update on the global security landscape since the last HEPiX meeting. It describes the main vectors of risks and compromises in the academic community including lessons learnt, presents interesting recent attacks while providing recommendations on how to best protect ourselves. It also covers security risks management in general, as well as the security aspects of...
High Energy Physics (HEP) experiments have greatly benefited from a strong relationship with Research and Education (R&E) network providers and, thanks to projects such as LHCOPN/LHCONE and REN contributions, have enjoyed significant capacities and high performance networks for some time. RENs have been able to continually expand their capacities to over-provision the networks relative to...
In August 2018, we upgraded our campus network. We replaced core switches, border routers and distribution switches to provide 1G/10G connectivity with authentication to end nodes. We introduced new firewall sets to segment internal subnets into several groups and migrated all access control lists from the core switches to the internal firewalls.
We report our migration and operation history of...
The transition of WLCG central and storage services to dual-stack IPv4/IPv6 is progressing well, thus enabling the use of IPv6-only CPU resources as agreed by the WLCG Management Board. More and more WLCG data transfers now take place over IPv6. During this year, the HEPiX IPv6 working group has not only been chasing and supporting the transition to dual-stack services, but has also been...
We describe the software tool-set being implemented in the context of the NOTED [1] project to better exploit WAN bandwidth for Rucio and FTS data transfers, how it has been developed and the results obtained.
The first component is a generic data-transfer broker that interfaces with Rucio and FTS. It identifies data transfers for which network reconfiguration is both possible and beneficial,...
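A hedged sketch of the kind of decision such a transfer broker might make is shown below: group queued transfers by source/destination pair and flag pairs whose volume makes a network reconfiguration worthwhile. The record structure, threshold and site names are purely illustrative and are not taken from the NOTED design.

```python
from collections import defaultdict

def candidate_links(queued_transfers, threshold_tb=100.0):
    """Return (src, dst, queued TB) pairs exceeding the reconfiguration threshold."""
    volume = defaultdict(float)
    for t in queued_transfers:           # t = {"src": ..., "dst": ..., "size_tb": ...}
        volume[(t["src"], t["dst"])] += t["size_tb"]
    return [(src, dst, v) for (src, dst), v in volume.items() if v >= threshold_tb]

print(candidate_links([
    {"src": "SITE_A", "dst": "SITE_B", "size_tb": 150.0},
    {"src": "SITE_A", "dst": "SITE_C", "size_tb": 20.0},
]))
```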
The NSF funded SAND project was created to leverage the rich network-related dataset being collected by OSG and WLCG, including perfSONAR metrics, LHCONE statistics, HTCondor and FTS transfer metrics and additional SNMP data from some ESnet equipment. The goal is to create visualizations, analytics and user-facing alerting and alarming related to the research and education networks used by...
Due to the amount of data expected from the experiments during RUN3, the CERN Computer Center network has to be upgraded. This presentation will explain all the ongoing work around the Computer Center network: change of router models to provide higher 100G port density, links upgrade between the experiments and the Computer Center (CDR links), expected closure of the Wigner Computer Center and...
Network overview concerning the new LHCb containers located at LHC Point 8.
A total of 184 switches have been installed, connected to 4 different routers.
A new DWDM line system will be used to connect the IT datacentre extension in the LHCb containers.
This presentation will cover how CERN is proposing to provide the computing capacity needed for the LHC experiments for RUN3 and for RUN4. It will start with some history on the failed attempt to have a second Data Centre ready for RUN3, then describe the solution adopted for RUN3 instead and finally the current plans for RUN4.
The Open Compute Project (OCP) is an organization that shares designs for data centre products among companies.
Its mission is to design and enable the delivery of the most efficient server, storage and data centre hardware designs for scalable computing.
The project was started in 2011 and today includes about 200 members.
This talk will give a report from the 2019 OCP Global Summit,...
The monitoring infrastructure used at the computing centre at DESY, Zeuthen aged over the years and showed more and more deficits in many areas. In order to cope with current challenges, we decided to build up a new monitoring infrastructure designed from scratch using different open source products like Prometheus, ElasticSearch, Grafana, etc. The talk will give an overview of our...
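As a minimal sketch of the kind of component such a Prometheus-based stack relies on, the snippet below exposes a single site metric with the prometheus_client library; the metric name, port and sampled value are assumptions for illustration, not details of the DESY setup.

```python
import random
import time
from prometheus_client import Gauge, start_http_server

# Illustrative exporter: Prometheus scrapes http://<host>:9100/metrics
queue_length = Gauge("batch_queue_idle_jobs", "Number of idle jobs in the batch queue")

if __name__ == "__main__":
    start_http_server(9100)
    while True:
        queue_length.set(random.randint(0, 500))  # replace with a real probe
        time.sleep(30)
```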
A number of co-located meetings were held at Fermilab in early September in the area of Federated Identities and AAI (Authentication and Authorisation Infrastructures) for Physics, including a F2F meeting of the WLCG Authorization Working Group and a mini-FIM4R meeting. This talk gives a high-level overview of these meetings and related recent progress in this area.
Presentation on SciTokens, a distributed authorization framework, and work to integrate distributed authorization technologies such as SciTokens and OAuth 2.0 into HTCondor.
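Since a SciToken is a JWT whose scope claim carries capabilities such as read:/some/path, a consuming service can inspect it as sketched below. Claim names follow the SciTokens/JWT conventions, but a real deployment must verify the token signature against the issuer's keys (e.g. with the scitokens library) rather than skipping verification as this illustration does.

```python
import jwt  # PyJWT

def allows(token_str, needed_scope):
    """Check whether a (not yet verified!) token carries the requested scope."""
    claims = jwt.decode(token_str, options={"verify_signature": False})
    return needed_scope in claims.get("scope", "").split()

# e.g. allows(raw_token, "read:/store/user") -> True or False
```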
The BNL SDCC (Scientific Data and Computing Center) recently enabled an SSO authentication strategy using Keycloak, supporting various SSO authentication protocols (SAML/OIDC/OAuth) and allowing multiple authentication options under one umbrella, including Kerberos, AD (Active Directory) and Federated Identity Authentication via CILogon with InCommon and social provider login. This solution...
I'll be showing collection and presentation tools for monitoring an InfiniBand (IB) network, and discussing the ideas behind some of the collection decisions.
The increase in the scale of LHC computing during Run 3 and Run 4 (HL-LHC) will certainly require radical changes to the computing models and the data processing of the LHC experiments. The working group established by WLCG and the HEP Software Foundation to investigate all aspects of the cost of computing and how to optimise them has continued producing results and improving our understanding...
An update on the CERN Database on Demand service, which hosts more than 800 databases for the CERN user community supporting different open source systems such as MySQL, PostgreSQL and InfluxDB.
We will present the current status of the platform and the future plans for the service.
The ATLAS Experiment is storing detector and simulation data in raw and derived data formats across more than 150 Grid sites world-wide: currently, in total about 200 PB of disk storage and 250 PB of tape storage is used.
Data have different access characteristics due to various computational workflows. Raw data is only processed about once per year, whereas derived data are accessed...
The STFC CASTOR tape service is responsible for the management of over 80PB of data, including 45PB generated by the LHC experiments for the RAL Tier-1. In the last few years there have been several disruptive changes that have necessitated, or are necessitating, significant changes to the service. At the end of 2016, Oracle, which provided the tape libraries, drives and media, announced they were leaving the...
The IT storage group at CERN provides tape storage to its users in the form of three services, namely TSM, CASTOR and CTA. Both TSM and CASTOR have been running for several decades whereas CTA is currently being deployed for the very first time. This deployment is for the LHC experiments starting with ATLAS this year. This contribution describes the current status of tape storage at CERN...
In this contribution the evolution of the CERN storage services and their applications will be presented.
The CERN IT Storage group's main mandate is to provide storage for Physics data: to this end an update will be given about CASTOR and EOS, with a particular focus on the ongoing migration from CASTOR to CTA, its successor.
More recently, the Storage group has focused on providing...
CephFS has been used as the shared file system of the HTC cluster for physicists of various fields at Bonn University since the beginning of 2018. The cluster uses IP over InfiniBand. High performance for sequential reads is achieved even though erasure coding and on-the-fly compression are employed. CephFS is complemented by CernVM-FS for software packages and containers which come with many...
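A hedged sketch of how an erasure-coded, compressed CephFS data pool can be set up is given below, wrapping standard Ceph CLI commands; the EC profile parameters, pool names and compression settings are assumptions for illustration and are not the actual Bonn configuration.

```python
import subprocess

# Standard Ceph commands for an EC data pool with on-the-fly compression
# (names and parameters illustrative only).
commands = [
    ["ceph", "osd", "erasure-code-profile", "set", "hpc_ec", "k=4", "m=2"],
    ["ceph", "osd", "pool", "create", "cephfs_data_ec", "128", "erasure", "hpc_ec"],
    ["ceph", "osd", "pool", "set", "cephfs_data_ec", "allow_ec_overwrites", "true"],
    ["ceph", "osd", "pool", "set", "cephfs_data_ec", "compression_mode", "aggressive"],
    ["ceph", "osd", "pool", "set", "cephfs_data_ec", "compression_algorithm", "lz4"],
    ["ceph", "fs", "add_data_pool", "cephfs", "cephfs_data_ec"],
]
for cmd in commands:
    subprocess.run(cmd, check=True)
```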
As one of the main data centres in France, the IN2P3 Computing Center (CC-IN2P3, https://cc.in2p3.fr) provides several High Energy Physics and Astroparticle Physics experiments with different storage systems that cover the different needs expressed by these experiments.
The quantity of data stored at CC-IN2P3 is growing exponentially. In 2019, about two billion files are stored. By 2030, this...
The Scientific Data & Computing Center (SDCC) at BNL is responsible for accommodating the diverse requirements for storing and processing petabyte-scale data generated by ATLAS, Belle II, PHENIX, STAR, Simons, etc. This talk presents the current operational status of the main storage services supported in SDCC, summarizes our experience in operating largely distributed systems, optimizing in...
CERN runs a private OpenStack Cloud with ~300K cores, ~3K users and several OpenStack services.
CERN users can build services from a pool of compute and storage resources using OpenStack APIs such as Ironic, Nova, Magnum, Cinder and Manila.
At this scale, CERN cloud operators face a number of operational challenges in order to offer these services in a stable manner.
In this talk, you...
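As a hedged illustration of the self-service use of such a cloud, the snippet below creates a virtual machine through the openstacksdk Python API; the cloud name, image, flavor and network are hypothetical placeholders rather than actual CERN resources.

```python
import openstack

# Connect using a named cloud from clouds.yaml (placeholder name).
conn = openstack.connect(cloud="mycloud")

image = conn.compute.find_image("CC7 - x86_64")      # placeholder image
flavor = conn.compute.find_flavor("m2.medium")        # placeholder flavor
network = conn.network.find_network("my-network")     # placeholder network

server = conn.compute.create_server(
    name="demo-vm",
    image_id=image.id,
    flavor_id=flavor.id,
    networks=[{"uuid": network.id}],
)
server = conn.compute.wait_for_server(server)
print(server.name, server.status)
```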
The Large High Altitude Air Shower Observatory (LHAASO) experiment of IHEP is located in Daocheng, Sichuan province (at an altitude of 4410 m); it generates a huge amount of data and requires massive storage and large computing power.
This contribution will introduce the current status of the LHAASO computing platform at Daocheng, focusing on virtualization technologies such as Docker...
The need for effective distributed data storage has been apparent since the beginning of the LHC, and this topic has become particularly vital in the light of the preparation for the HL-LHC run and the emergence of data-intensive projects in other domains such as nuclear and astroparticle physics.
The LHC experiments have started R&D within the DOMA project, and we report the recent results...
The Joint Genome Institute (JGI) is part of the US Department of Energy and serves the scientific community with access to high-throughput, high-quality sequencing, DNA synthesis, metabolomics and analysis capabilities. With the ever increasing complexity of analysis workflows, and the demand for burstable compute, it became necessary to be able to shift those workloads between sites. In this...
We will provide an update on the SLATE project (https://slateci.io), an NSF funded effort to securely enable service orchestration in Science DMZ (edge) networks across institutions. The Kubernetes-based SLATE service provides a step towards a federated operations model, allowing innovation of distributed platforms, while reducing operational effort at resource providing sites.
The...
The High Performance Computing (HPC) domain aims to optimize code in order to use the latest multicore and parallel technologies, including specific processor instructions. In this computing framework, portability and reproducibility are key concepts. A way to handle these requirements is to use Linux containers. These "light virtual machines" make it possible to encapsulate applications together with their...
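As a minimal, hedged example of this approach, the snippet below runs an application inside a Singularity container image from Python; the image and application names are hypothetical.

```python
import subprocess

# Execute a (hypothetical) application inside a Singularity image so that it
# runs with the same software environment on any host providing Singularity.
subprocess.run(
    ["singularity", "exec", "my_hpc_app.sif", "./run_simulation", "--input", "data.in"],
    check=True,
)
```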