HEPiX Spring 2022 online Workshop

Europe/Zurich
Peter van der Reest, Tony Wong
Description

HEPiX Spring 2022 online Workshop

The HEPiX forum brings together worldwide Information Technology staff, including system administrators, system engineers, and managers from High Energy Physics and Nuclear Physics laboratories and institutes, to foster a learning and sharing experience between sites facing scientific computing and data challenges.

Participating sites include BNL, CERN, DESY, FNAL, IHEP, IN2P3, INFN, IRFU, JLAB, KEK, LBNL, NDGF, NIKHEF, PIC, RAL, SLAC, TRIUMF, many other research labs and numerous universities from all over the world.

Participants
  • Abel Cabezas Alonso
  • Abhishek Lekshmanan
  • Achim Gsell
  • Adrian Marszalik
  • Ahmed KHOUDER
  • Ajit Mohapatra
  • Akram Khan
  • Al Lilianstrom
  • Alba Vendrell Moya
  • ALBERT ROSSI
  • Aleksandra Wardzinska
  • Alexander Finch
  • Alexander Trautsch
  • Alexandr Zaytsev
  • Alison Packer
  • Alison Peisker
  • Andre Ihle
  • Andrea Chierici
  • Andrea Sciabà
  • Andreas Haupt
  • Andreas Joachim Peters
  • Andreas Klotz
  • Andreas Petzold
  • Andreas Wagner
  • Andrei Dumitru
  • Andreu Pacheco Pages
  • Andrew Bohdan Hanushevsky
  • Andrew Pickford
  • Anil Panta
  • Anirudh Goel
  • Antonín Dvořák
  • Aresh Vedaee
  • Artur Il Darovic Gottmann
  • Attilio De Falco
  • Bas Kreukniet
  • Bastian Neuburger
  • Benjamin Mare
  • Benjamin Smith
  • Benoit DELAUNAY
  • Benoit Million
  • Bertrand SIMON
  • Birgit Lewendel
  • Bo Jayatilaka
  • Bonnie King
  • Brian Davies
  • Bruno Hoeft
  • Bryan Hess
  • Caio Costa
  • Carles Acosta-Silva
  • Carlos Perez Dengra
  • Carmelo Pellegrino
  • Cedric Caffy
  • chaoqi guo
  • Chris Prosser
  • Christian Wolbert
  • Christine Apfel
  • Christoph Beyer
  • Christopher Hollowell
  • Christopher Huhn
  • Dagmar Adamova
  • Daniel Fischer
  • Daniel Juarez
  • Daria Phoebe Brashear
  • Dario Graña
  • David Cohen
  • David Crooks
  • David Kelsey
  • David Southwick
  • Dejan Lesjak
  • Denis Pugnere
  • Dennis van Dok
  • Derek Feichtinger
  • Devin Bougie
  • Di Qing
  • Diego Morenza Vazquez
  • Dino Conciatore
  • Dirk Hagemann
  • Dirk Jahnke-Zumbusch
  • Dmitry Litvintsev
  • Domenico Giordano
  • Dorin Lobontu
  • Doris Ressmann
  • Doug Benjamin
  • Edith Knoops
  • Edmar Stiel
  • Edoardo Martelli
  • Eduard Cuba
  • Elena Gazzarrini
  • Elena Planas
  • Elisabet Carrasco Santos
  • Elizabeth Sexton-Kennedy
  • Emil Kleszcz
  • Enrico Bocchi
  • Eric Fede
  • Eric Grancher
  • Eric Vaandering
  • Eric Yen
  • Esther Accion
  • Eva Dafonte Perez
  • Evelina Buttitta
  • Fabien WERNLI
  • Fabrice Le Goff
  • FEDERICO CALZOLARI
  • Felix.hung-te Lee
  • Fons Rademakers
  • Francisco Centeno
  • Frederik Ferner
  • Frederique Chollet
  • Gang Chen
  • Garhan Attebury
  • George Patargias
  • Gerard Hand
  • German Cancio
  • Gerry Seidman
  • Gianfranco Sciacca
  • Gino Marchetti
  • Giuseppe Lo Presti
  • Glenn Cooper
  • Go Iwai
  • Gonzalo Menendez Borge
  • Gonzalo Merino Arevalo
  • Grzegorz Sułkowski
  • Götz Waschk
  • Harald Falkenberg
  • Helge Meinhard
  • Helmut Kreiser
  • Hironori Ito
  • Horst Severini
  • Ian Collier
  • Ilona Neis
  • Ingo Ebel
  • Ivo Camargo
  • Jacek Chodak
  • Jack Henschel
  • Jahson BABEL
  • Jakub Granieczny
  • James Acris
  • James Adams
  • James Simone
  • James Thorne
  • James Walder
  • Jan Hornicek
  • Jan Iven
  • Jan van Eldik
  • Jaroslav Kalus
  • Javier Cacheiro López
  • Jayaditya Gupta
  • Jean-Michel Barbet
  • Jeff Derbyshire
  • Jeffrey Altman
  • Jerome Pansanel
  • Jingyan Shi
  • Jiri Chudoba
  • Joao Afonso
  • Joao Pedro Lopes
  • Joaquim Santos
  • John Gordon
  • Jordi Casals
  • Jordi Salabert
  • Jorge Camarero Vera
  • Jose Caballero Bejar
  • Jose Carlos Luna
  • Jose Flix Molina
  • joshua kitenge
  • José Fernando Mandeur Díaz
  • João Marques
  • Joël Surget
  • Juan Manuel Guijarro
  • Julien Leduc
  • Jürgen Hannappel
  • Karim El Aammari
  • Karl Amrhein
  • Kars Ohrenberg
  • Katy Ellis
  • Kees de Jong
  • Klaus Steinberger
  • Klemens Noga
  • Konstantin Olchanski
  • Krzysztof Oziomek
  • Kyle Pidgeon
  • Laura Hild
  • Laurent Caillat-Vallet
  • Lea Morschel
  • Lei Wang
  • Leslie Groer
  • Liam Atherton
  • Liviu Valsan
  • Lorena Lobato Pardavila
  • Lubos Kopecky
  • Luca Mascetti
  • Ludovic DUFLOT
  • Maarten Litmaath
  • Maciej Pawlik
  • Manfred Alef
  • Manuel Giffels
  • Manuel Reis
  • Marco Mambelli
  • Marcus Ebert
  • Marek Szuba
  • Maria Arsuaga Rios
  • Marina Sahakyan
  • Markus Magdziorz
  • Markus Schelhorn
  • Martin Bly
  • martin flemming
  • Martin Gasthuber
  • Mary Hester
  • Matt Doidge
  • Matthew Heath
  • Matthew Snyder
  • Matthias Jochen Schnepf
  • Mattias Wadenstein
  • Mattieu Puel
  • Maurizio De Giorgi
  • Max Fischer
  • Michael Davis
  • Michael Leech
  • Michaela Barth
  • Michal Kamil Simon
  • Michal Strnad
  • Michel Jouvin
  • Michel Salim
  • Michele Michelotto
  • Miguel Angel Valero Navarro
  • Mihai Patrascoiu
  • Milan Danecek
  • Milos Lokajicek
  • Miltiadis Gialousis
  • Mouflier Guillaume
  • Mozhdeh Farhadi
  • Murray Collier
  • Mwai Karimi
  • Nadine Neyroud
  • Nikos Tsipinakis
  • Ofer Rind
  • Oleg Sadov
  • Oliver Freyermuth
  • Oliver Keeble
  • Olivier Restuccia
  • Onno Zweers
  • Owen Synge
  • Pablo Saiz
  • Pascal Paschos
  • Patrick Ihle
  • Patrick Riehecky
  • Patryk Lason
  • Peter Gronbech
  • Peter van der Reest
  • Peter Wienemann
  • Petya Vasileva
  • Pierre Emmanuel BRINETTE
  • Pierre-Francois Honore
  • Preslav Konstantinov
  • Qiulan Huang
  • Rafael Arturo Rocha Vidaurri
  • Ramon Escribà
  • Randall Sobie
  • Richard Bachmann
  • Richard Parke
  • Robert Appleyard
  • Robert Frank
  • Robert Illingworth
  • Robert Vasek
  • Roberto Valverde Cameselle
  • Robin Hofsaess
  • Ron Trompert
  • Rose Cooper
  • Ruben Domingo Gaspar Aparicio
  • Ryu Sawada
  • sabah salih
  • Samuel Ambroj Pérez
  • Samuel Bernardo
  • Saroj Kandasamy
  • Sebastian Lopienski
  • Sebastien Gadrat
  • Sergey Chelsky
  • Sergi Puso Gallart
  • Shawn Mc Kee
  • Shun Emilio Morishima Morelos
  • Simon George
  • Sokratis Papadopoulos
  • Sophie Catherine Ferry
  • Stefan Dietrich
  • Stefan Lueders
  • Stefan Piperov
  • Stefano Dal Pra
  • Stephan Wiesand
  • Svenja Meyer
  • Thomas Bellman
  • Thomas Birkett
  • Thomas Hartland
  • Thomas Hartmann
  • Thomas Kress
  • Thomas Roth
  • Tigran Mkrtchyan
  • Tim Bell
  • Tim Skirvin
  • Tim Wetzel
  • Timothy Noble
  • Tina Friedrich
  • Tom Dack
  • Tomas Lindén
  • Tomoaki Nakamura
  • Tony Cass
  • Tony Wong
  • Tristan Sullivan
  • Trivan Pal
  • Troy Dawson
  • Ulrich Schwickerath
  • Vanessa Acín Portella
  • Vanessa HAMAR
  • Vicky HUANG
  • Victor Mendoza
  • Vincent Garonne
  • Volodymyr Savchenko
  • Wade Hong
  • Wassef Karimeh
  • Wayne Salter
  • Wei Yang
  • Werner Sun
  • Xuantong Zhang
  • Yaosong Cheng
  • Yingzi Wu
  • Yujiang Bi
  • Yujun Wu
  • Zacarias Benta
  • Zhechka Toteva
  • Zhihua Dong
    • Miscellaneous: Welcome & Logistics Online workshop


      Convener: Peter van der Reest
    • Site Reports Online workshop


      • 2
        PIC report

        This is the PIC report for the HEPiX Spring 2022 Workshop.

        Speaker: Jose Flix Molina (Centro de Investigaciones Energéticas Medioambientales y Tec. (ES))
      • 3
        CERN Site Report

        News from CERN since the last HEPiX workshop.

        Speaker: Andrei Dumitru (CERN)
      • 4
        ASGC site report

        ASGC site report

        Speaker: Felix.hung-te Lee (Academia Sinica (TW))
      • 5
        KEK Site Report

        The KEK Central Computer System (KEKCC) is a computer service and facility that provides large-scale computer resources, including Grid and Cloud computing systems, and essential IT services such as e-mail and web services.

        Following the procurement policy for large-scale computer systems requested by the Japanese government, we replace the entire KEKCC every four or sometimes five years. The current system replaced the previous one and has been in operation since September 2020; its decommissioning will start in early 2024.

        During roughly 20 months of operating the current system, we have decommissioned some legacy Grid services, such as LFC, and migrated some Grid services to a newer operating system, CentOS 7. In this talk, we share our experiences and challenges with the Grid services introduced in the KEKCC, and highlight some issues to be addressed in the future.

        Speaker: Go Iwai (KEK)
      • 6
        IHEP Site Report

        Site report on computing platform updates and support-system development at IHEP during the past half year.

        Speaker: Yaosong Cheng
    • 10:25 AM
      Coffee Break
    • Storage & File Systems Online workshop


      • 7
        ANTARES: The new tape archive service at RAL Tier-1

        The new tape archive service, ANTARES (A New Tape ArchivE for STFC), at RAL Tier-1 went into production on 4th March, 2022. The service is provisioned with EOS-CTA, developed at CERN. The EOS cluster, a “thin” SSD buffer, manages incoming namespace requests and CTA provides the tape back-end system responsible for the scheduling and execution of tape archival and retrieval operations. In this talk, we summarise the almost two years’ worth of effort to set up and test ANTARES, describe the procedure followed to carry out the migration of RAL Tier-1 data from CASTOR and discuss the service’s performance during the last two WLCG-wide tape challenges.

        Speaker: George Patargias (STFC)
      • 8
        A new Ceph deployment using Cephadm at RAL

        Increasing user demand for file-based storage, provided by the STFC Cloud at RAL, has motivated the creation of a new shared file system service based on OpenStack Manila. The service will be backed by a new all-SSD Ceph cluster, ‘Arided’, deployed using the Cephadm orchestrator. This talk will provide a brief overview of our experience deploying a test instance of this service using a containerised Ceph cluster, and of the potential administrative benefits of doing so.

        Speaker: Kyle Pidgeon
    • Miscellaneous: Welcome & Logistics Online workshop


      Convener: Tony Wong
    • Site Reports Online workshop


    • Networking & Security Online workshop


      • 14
        zkpolicy: ZooKeeper Policy Audit Tool

        Interest in big data solutions based on the Hadoop, Kafka and Spark ecosystem is constantly growing in the HEP community, in particular for use cases related to data analytics and data warehousing. Many distributed services use ZooKeeper for coordination and metadata storage. However, on many occasions this service is either deployed insecurely or easily drifts into a vulnerable setup.

        In this context, we developed zkpolicy, an open-source tool for ZooKeeper metadata auditing and policy enforcement.
        The tool validates the ownership and ACLs of the information stored in this metadata service and can align them with a pre-defined policy. zkpolicy is currently used in production by the IT department at CERN, improving security and enforcing best practices for the central Kafka and Hadoop services.

        In this presentation, I will present the zkpolicy tool, the motivation for its development and use cases at CERN and beyond.

        Speaker: Emil Kleszcz (CERN)
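A minimal, stand-alone sketch of the kind of check a policy-audit tool like zkpolicy performs: compare the ACLs found on ZooKeeper znodes against a pre-defined policy. The znode snapshot and policy entries here are hypothetical illustration data; a real tool would fetch ACLs with a ZooKeeper client.

```python
# Policy: znodes under a given prefix must carry exactly these ACL entries,
# expressed as (scheme, id, permissions) tuples. Values are illustrative.
POLICY = {
    "/config": {("sasl", "kafka", "cdrwa"), ("world", "anyone", "r")},
    "/brokers": {("sasl", "kafka", "cdrwa")},
}

def audit(znode_acls):
    """Return the znodes whose ACLs deviate from the matching policy prefix."""
    violations = []
    for path, acls in znode_acls.items():
        for prefix, expected in POLICY.items():
            if path == prefix or path.startswith(prefix + "/"):
                if set(acls) != expected:
                    violations.append(path)
                break
    return violations

# Hypothetical snapshot: /brokers/ids is world-writable -- a violation.
snapshot = {
    "/config/topics": {("sasl", "kafka", "cdrwa"), ("world", "anyone", "r")},
    "/brokers/ids": {("world", "anyone", "cdrwa")},
}
print(audit(snapshot))  # -> ['/brokers/ids']
```

An enforcement mode would follow the same walk but rewrite each deviating znode's ACL to the expected set instead of only reporting it.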
      • 15
        Computer Security Landscape Update

        This presentation provides an update on the global security landscape since the last HEPiX meeting. It describes the main vectors of risks and compromises in the academic community, including lessons learned, and presents interesting recent attacks while providing recommendations on how best to protect ourselves.

        Speaker: Daniel Fischer (CERN)
      • 16
        Collaborative incident response and threat intelligence

        The threat faced by the research and education sector from determined and well-resourced cyber attackers has been growing in recent years and is now acute. A vital means of better protecting ourselves is to share threat intelligence - key Indicators of Compromise of ongoing incidents, including network observables and file hashes - with trusted partners. We must also deploy the technical means to actively use this intelligence in the defence of our facilities, including a robust, fine-grained source of network monitoring. The combination of these elements, along with storage, visualisation and alerting, is called a Security Operations Centre (SOC).

        We report on recent progress of the SOC WG, mandated to create reference designs for these SOCs, with particular attention to work being carried out at multiple 100Gb/s sites to deploy these technologies and a proposal to leverage passive DNS in order to further assist sites of various sizes to improve their security stance.

        We discuss the plans for this group for the coming year and the importance of acting together as a community to defend against these attacks.

        Speaker: Dr David Crooks (UKRI STFC)
    • Miscellaneous: Group Photo Online workshop


    • 10:20 AM
      Coffee Break
    • End-User IT Services & Operating Systems Online workshop


      • 17
        The new CERN Web Services Portal

        CDA-WF provides a central hosting infrastructure for websites and web applications, as well as central web services for collaborative development and projects. In view of the ongoing consolidation of the hosting infrastructure on a common platform, the next generation of OpenShift called OKD4, the new CERN Web Services Portal was designed and developed to facilitate the management of websites and web applications. In addition to providing a modern and user-friendly interface, it also features improved service recommendations, classifying tools into categories to help users navigate the current portfolio.

        Speaker: Aleksandra Wardzinska (CERN)
    • Computing & Batch Services Online workshop


      • 18
        Rebalancing the HTCondor fairshare for mixed workloads

        The INFN Tier-1 data centre is the main Italian computing site for scientific communities in high-energy physics and astroparticle research. Access to the resources is arbitrated by an HTCondor batch system, which is in charge of balancing overall usage by several competing user groups according to their agreed quotas. The workloads submitted to the computing cluster are highly heterogeneous, and the batch system must take a vast set of different requirements into account in order to provide user groups with a satisfactory fair share of the available resources. To prevent or reduce usage disparities, a system that self-adjusts imbalances has been developed and is being used with satisfactory results. This work explains how and when fair-share implementations can fall short of optimal performance and describes a general method to improve them. Results of the current solution are presented and possible further developments are discussed.

        Speaker: Stefano Dal Pra (Universita e INFN, Bologna (IT))
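An illustrative sketch of the general idea behind self-adjusting fair share (not the INFN implementation, and the numbers are invented): periodically compare each group's observed demand with its nominal quota and shift part of the unused share from under-using groups to groups exceeding their quota.

```python
NOMINAL = {"cms": 0.40, "atlas": 0.40, "astro": 0.20}   # agreed shares
DEMAND  = {"cms": 0.55, "atlas": 0.30, "astro": 0.05}   # observed demand

def rebalance(nominal, demand, step=0.5):
    """Shift a fraction `step` of unused share to groups exceeding quota.

    The total of the effective shares stays equal to the nominal total.
    """
    slack = sum(max(nominal[g] - demand[g], 0.0) for g in nominal)
    excess = sum(max(demand[g] - nominal[g], 0.0) for g in nominal)
    effective = {}
    for g in nominal:
        if demand[g] < nominal[g]:           # donor: lend part of the slack
            effective[g] = nominal[g] - step * (nominal[g] - demand[g])
        elif excess > 0:                     # taker: receive a proportional cut
            extra = step * slack * (demand[g] - nominal[g]) / excess
            effective[g] = nominal[g] + extra
        else:
            effective[g] = nominal[g]
    return effective

eff = rebalance(NOMINAL, DEMAND)
# cms grows beyond 0.40, astro shrinks towards its demand; total stays 1.0
print({g: round(q, 3) for g, q in eff.items()})
```

Running such a correction on every scheduling cycle lets quotas track persistent demand patterns while bounded `step` keeps the adjustment gradual.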
    • Networking & Security Online workshop


      • 19
        Update from the HEPiX IPv6 working group

        During the last six months the HEPiX IPv6 working group has continued to encourage the deployment of dual-stack IPv4/IPv6 services. We also recommend dual-stack clients (worker nodes etc.). Many data transfers today happen over IPv6, but it is still true that many do not! This talk will present our recent activities, including our investigations into the reasons behind the ongoing use of IPv4, as well as planning for the move to an IPv6-only core WLCG.

        Speaker: David Kelsey (Science and Technology Facilities Council STFC (GB))
      • 20
        Research Networking Technical WG Status and Plans

        The high-energy physics community, along with the WLCG sites and Research and Education (R&E) networks are collaborating on network technology development, prototyping and implementation via the Research Networking Technical working group (RNTWG). As the scale and complexity of the current HEP network grows rapidly, new technologies and platforms are being introduced that greatly extend the capabilities of today’s networks. With many of these technologies becoming available, it’s important to understand how we can design, test and develop systems that could enter existing production workflows while at the same time changing something as fundamental as the network that all sites and experiments rely upon.

        In this talk we’ll give an update on the Research Networking Technical working group activities, challenges and recent updates. In particular we’ll focus on the flow labeling and packet marking technologies (scitags), tools and approaches that have been identified as important first steps for the work of the group.

        Speaker: Shawn Mc Kee (University of Michigan (US))
    • Miscellaneous: Group Photo Online workshop


    • 4:55 PM
      Coffee Break
    • Computing & Batch Services Online workshop


      • 21
        Status and prospects of the WLCG HEP-SCORE deployment task force

        We will report on the status and the future plans of the WLCG HEP-SCORE deployment task force.

        Speaker: Helge Meinhard (CERN)
      • 22
        Benchmarking Working Group activities

        The HEPiX Benchmarking Working Group has been very active in the past months to find a replacement for HS06. The WG is working in close contact with the WLCG HEPscore deployment task force. This talk will focus on the technical aspects of the new benchmark and on the Benchmarking Suite framework, in particular the analysis of the latest results.

        Speaker: Dr Michele Michelotto (Universita e INFN, Padova (IT))
      • 23
        HEPCloud, an elastic virtual cluster from heterogeneous computing resources

        Historically, high energy physics computing has been performed on large purpose-built computing systems. These began as single-site compute facilities, but have evolved into the distributed computing grids used today.
        The current computing landscape is more heterogeneous because of the elevated capacity and capability of commercial clouds and the push of funding agencies toward supercomputers. Both add new complications. Commercial cloud resources are highly virtualized and customizable but need to be managed. High Performance Computers are each one of a kind with different access rules and restrictions, like limited network connectivity or complex access patterns.
        HEPCloud is a single managed portal that allows more scientists, experiments, and projects to use more resources to extract more science. Its goal is to provide cost-effective access by optimizing usage across all available types of computing resources and by elastically expanding the resource pool on short notice (e.g. by renting temporary resources on commercial clouds).
        The Fermilab HEPCloud facility has been used successfully in production for over three years, and 2021 saw a big ramp-up, especially for CMS, which used all of its Frontera quota six months ahead of expiry and consumed a 90M NERSC-hour bonus after exhausting its allocation.
        The Decision Engine is the software at the heart of HEPCloud, deciding where and how much to provision. It is an open-source project (https://github.com/HEPCloud/decisionengine), and version 2.0 was recently released - a release we consider ready for wider adoption: it features simplified installation and configuration, fully Python 3 code with strict coding best practices, and a revised architecture with robust message passing between the decision-making components.

        Speaker: Marco Mambelli (Fermilab (US))
    • End-User IT Services & Operating Systems Online workshop


      • 24
        Tracking Kernel Rate of Change

        How fast is the CentOS Stream 8 kernel moving? How do we tell? What can we learn from this information?

        Speaker: Patrick Riehecky (Fermi National Accelerator Lab. (US))
    • Storage & File Systems Online workshop


      • 25
        EOS Report, Evolution & Strategy

        The presentation will summarize highlights from the 6th EOS workshop and discuss the evolution of EOS services and the development roadmap during Run 3.

        Speaker: Andreas Joachim Peters (CERN)
      • 26
        IO Shaping in EOS

        EOS services are used by large user communities and in many cases are exposed and operated as a very large shared resource, even though the criticality of individual IO activities varies. To give operators handles to shape data access by activity, we have recently added support for direct IO, IO priorities, bandwidth policies and filesystem stream overload protection. For metadata access, EOS provides user-specific configurable thread pools and metadata operation frequency limits.
        The presentation will discuss how these can be used and configured for production services.

        Speaker: Andreas Joachim Peters (CERN)
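Bandwidth policies of the kind described above are commonly built on token-bucket throttling. This stand-alone sketch (not EOS code; the rates are invented) shows the mechanism: an activity may consume bytes only as fast as tokens accrue, with a bounded burst allowance.

```python
class TokenBucket:
    """Allow `rate` bytes/s sustained, with bursts up to `capacity` bytes."""

    def __init__(self, rate, capacity):
        self.rate, self.capacity = rate, capacity
        self.tokens, self.last = capacity, 0.0   # start with a full bucket

    def allow(self, nbytes, now):
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if nbytes <= self.tokens:
            self.tokens -= nbytes
            return True
        return False                              # caller must wait / retry

bucket = TokenBucket(rate=100e6, capacity=50e6)   # 100 MB/s, 50 MB burst
print(bucket.allow(40e6, now=0.0))   # burst fits          -> True
print(bucket.allow(40e6, now=0.1))   # only 20 MB refilled -> False
print(bucket.allow(40e6, now=0.5))   # bucket refilled     -> True
```

Keeping one bucket per activity (or per IO-priority class) is what turns this into a per-activity shaping handle rather than a global cap.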
      • 27
        Third-party-copy transfer service status of JUNO experiment

        The Jiangmen Underground Neutrino Observatory (JUNO) is an under-construction neutrino experiment located in Jiangmen, China, which is expected to generate about 3 PB of experimental data per year. JUNO plans to distribute these data to all JUNO collaborators from four main data centers in China, France, Italy and Russia.
        A distributed data management system with third-party-copy (TPC) data transfer support has been developed for the JUNO experiment. This talk will report our status and experience with the third-party-copy service in the JUNO distributed system, including HTTP-TPC deployments at data centers with different storage systems, token-based data authentication with macaroons and SciTokens, and operational test results for the data transfer service. A system developed for monitoring TPC performance across all JUNO data centers will also be introduced in this talk.

        Speaker: Xuantong Zhang (Chinese Academy of Sciences (CN))
    • 10:15 AM
      Coffee Break
    • Storage & File Systems Online workshop


      • 28
        bulkrequests: a simple tool for managing file QoS on top of dCache REST API

        bulkrequests is a small tool that communicates with dCache through its REST API. It arises from the need to query and modify, in bulk, the QoS and locality of files stored on tape - for example, to pin or unpin a set of files to/from disk as required. It was designed to cover this need in a simple way through a command-line tool while awaiting the new dCache bulk REST API, which will incorporate the processing of this type of request. The tool is based on an existing development called dcacheclient (https://github.com/neicnordic/dcacheclient), supports the same authentication methods, and in particular uses the namespace section to query and change QoS and locality. It will be reformulated to use the new bulk REST API when it becomes available.

        Speaker: Dario Graña (IATE - CONICET)
      • 29
        CERN’s Run 3 Tape Infrastructure

        LHC Run 3 is imposing unprecedented data rates on the tape infrastructure at CERN T0. Here we report on the nature of the challenge in terms of performance and reliability, on the hardware we have procured, and how it is deployed, configured and managed. We share details of our experience with the technology selected, a mix of IBM and SpectraLogic libraries and Enterprise and LTO drives. In particular, LTO-9 is a new technology and we cover low level details including media initialisation and its native Recommended Access Order (RAO). We conclude with an outlook on the likely evolution of the infrastructure.

        Speaker: Richard Bachmann (CERN)
      • 30
        The CERN Tape Archive (CTA) - running Tier 0 tape

        During the ongoing long shutdown, all elements in LHC data-taking have been upgraded. As the last step in the T0 data-taking chain, the CERN Tape Archive (CTA) has done its homework and redesigned its full architecture in order to match LHC Run 3 data rates.

        This contribution will give an overview of the CTA service and how it has been deployed in production. We discuss the measures taken to assess and improve its performance and efficiency against various workflows, especially the latest data challenges realised on T0 tape endpoints. We illustrate the monitoring and alerting which is required to maintain performance and reliability during operations, and discuss the outlook for service evolution.

        Speaker: Julien Leduc (CERN)
    • IT Facilities & Business Continuity Online workshop


      • 31
        Next business day, or whenever we can

        We've operated data center hardware from various major vendors in the last two decades. For most systems we took out expensive support contracts for three to five years so defective hardware (memory, hard drives, motherboards) would be replaced in one business day.
        In recent years we have been noticing a considerable drop in the quality of delivering this support, where suppliers were unable to fulfill their obligations on time for various reasons.
        We'll discuss the probable causes behind this decline, the implications for our operations, and the way we can address this going forward.

        Speaker: Mr Dennis van Dok
    • Storage & File Systems Online workshop


      • 32
        dCache integration with CTA

        The ever-increasing amount of data produced by modern scientific facilities like EuXFEL or the LHC puts high pressure on the data management infrastructure at the laboratories. This includes poorly shareable resources of archival storage, typically tape libraries. To achieve maximal efficiency of the available tape resources, a deep integration between hardware and software components is required.

        The CERN Tape Archive (CTA) is an open-source storage management system developed by CERN to manage LHC experiment data on tape. Although today CTA's primary target is the CERN Tier-0, the data management group at DESY considers CTA a main alternative to commercial HSM systems.

        dCache has a flexible tape interface which allows connectivity to any tape system. There are two ways that a file can be migrated to tape: either dCache calls a tape-system-specific copy command, or it interacts via an in-dCache tape-system-specific driver. The latter has been shown (by the TRIUMF and KIT Tier-1s) to provide better resource utilization and efficiency. Together with the CERN Tape Archive team, we are working on a seamless integration of CTA into dCache.

        This presentation will show the design of dCache-CTA integration, current status and first test results at DESY.

        Speaker: Mr Tigran Mkrtchyan (DESY)
      • 33
        EOS and XCache data access performance for LHC analysis at CERN

        Physics analysis is done at CERN in several different ways, using both interactive and batch resources and EOS for data storage. In order to understand if and how the CERN computer centre should change the way analysis is supported for Run3, we performed several performance studies on two fronts: measuring the performance and utilisation levels of EOS with respect to the current analysis workloads, and looking at the performance of different storage configurations, including SSD-based and HDD-based XCache instances, with respect to specific, I/O intensive analysis workloads from ATLAS and CMS. The collected results indicate that the current infrastructure is adequate and works well below saturation, and that specific needs can be fulfilled by dedicated high performance/throughput servers. We expect this type of studies to continue and the CERN infrastructure to adapt to the evolving needs of the LHC analysis community.

        Speaker: Dr Andrea Sciabà (CERN)
      • 34
        Open Source Erasure Coding Technologies

        This presentation will provide a short overview and comparison of four available Open Source erasure coding technologies for storage (MINIO, RADOS, EOS, XRootd EC) in the context of the Erasure Coding Working Group.

        Speaker: Andreas Joachim Peters (CERN)
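A toy illustration of the erasure-coding idea the talk compares across implementations: split data into k chunks, add parity, and reconstruct after losing one chunk. The systems named above use richer codes (typically Reed-Solomon); a single XOR parity block (k+1 encoding, tolerating one loss) keeps the sketch short.

```python
from functools import reduce

def xor(a, b):
    """Bytewise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

def encode(chunks):
    """Append one XOR parity chunk to k equal-sized data chunks."""
    return chunks + [reduce(xor, chunks)]

def reconstruct(stripe, lost):
    """Recover the chunk at index `lost` by XOR-ing all survivors."""
    survivors = [c for i, c in enumerate(stripe) if i != lost]
    return reduce(xor, survivors)

data = [b"AAAA", b"BBBB", b"CCCC"]         # k = 3 data chunks
stripe = encode(data)                       # 4 chunks on 4 "disks"
assert reconstruct(stripe, 1) == b"BBBB"    # disk 1 lost, data recovered
```

The storage overhead here is 1/k extra capacity for one-failure tolerance; Reed-Solomon generalises this to m parity chunks surviving any m losses, which is the trade-off space the compared technologies expose.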
    • 5:15 PM
      Coffee Break
    • Storage & File Systems Online workshop


      • 35
        XRootD object storage: native EC-based file store and S3 proxy

        Over recent years we have observed the increasing importance of object storage in the WLCG community. In this contribution we report on our effort to accommodate object storage use cases within XRootD, a software framework that is a critical component for data access and management at WLCG sites. First, we introduce a high-performance erasure-coding (EC) based file storage module, motivated by the ALICE O2 use case and compatible with any type of XRootD backend storage. Furthermore, we discuss the XRootD proxy for S3 storage and the native XRootD EC-based file store, which provide the WLCG-required data-transfer-node (DTN) facilities such as third-party copy, checksum query, VOMS authentication and access-token support.

        Speaker: Michal Kamil Simon (CERN)
      • 36
        Introducing PostgreSQL Table Partitioning to dCache

        Database systems have been known to deliver impressive performance for large classes of workloads. Nevertheless, database systems with mammoth data sets or high-throughput applications can challenge the capacity of a single server. High query rates can exhaust the CPU capacity of the server, and working-set sizes larger than the system's RAM stress the I/O capacity of the disk drives. This presentation will show how we use PostgreSQL table partitioning to take the edge off some of these issues and improve performance.

        Speaker: Mwai Karimi
    • IT Facilities & Business Continuity Online workshop


      • 37
        SDCC Transition to the New Data Center

        The BNL Computing Facility Revitalization (CFR) project aimed at repurposing the former National Synchrotron Light Source (NSLS-I) building (B725) on the BNL site as a new data center for the Scientific Data and Computing Center (SDCC). The CFR project finished the design phase in the first half of 2019, completed the construction phase by the end of FY2021, and entered the early occupancy phase in Jun-Aug 2021. Occupancy of the B725 data center with production CPU and DISK resources for the ATLAS experiment at the LHC at CERN, the STAR, PHENIX and sPHENIX experiments at the RHIC collider at BNL, and the Belle II experiment at KEK (Japan) started in 2021Q4 and ramped up in 2022Q1 to the level of 40 racks populated with equipment in the B725 Main Data Hall (MDH). At the same time, two library rows in the B725 Tape Room were populated with IBM TS4500 tape libraries serving the ATLAS and sPHENIX experiments. The occupancy of the B725 MDH is expected to further increase to 70 racks by the end of FY2022. The new HPC clusters and storage systems of the BNL Computational Science Initiative (CSI) are to be deployed in the B725 data center starting from early FY2023 as well. The transition of the SDCC data center environment - using the B725 data center to host the majority of CPU and DISK resources, and leaving the old (B515-based) data center to host predominantly TAPE resources - is expected to continue until the end of FY2023. In this talk I summarize the main design features of the new SDCC data center, report on how the transition to B725 data center occupancy was carried out in the 2021Q4-2022Q1 time frame, and highlight the plans for scaling up the occupancy and infrastructure utilization of both the old and new data centers up to FY2026.

        Speaker: Alexandr Zaytsev (Brookhaven National Laboratory (US))
    • Grid, Cloud & Virtualisation Online workshop

      Online workshop

      • 38
        CERN Cloud Infrastructure - operations and service update

        CERN's private OpenStack cloud offers more than 300,000 cores to over 3,500 users, with services for compute, multiple storage types, bare metal, container clusters, and more.
        The CERN Cloud Team constantly works on improving these services while maintaining the stability and availability that many IT services and experiment workflows depend on.
        This talk will cover the challenges of, and our approach to, high availability, live migration of VMs, and monitoring of live-migration executions.
        It will also give an update on the evolution of the cloud service over the past year and the plans for the upcoming year.

        Speaker: Jayaditya Gupta (CERN)
      • 39
        Anomaly Detection System for the CERN Cloud Monitoring

        As CERN cloud service managers, one of our tasks is to make sure that the desired computational power is delivered to all users of our scientific community. This task is accomplished by monitoring the utilization metrics of each hypervisor and reacting to alarms in case of server saturation to mitigate the interference between VMs.

        In order to maximize the efficiency of our cloud infrastructure and to reduce the monitoring effort for service managers, we have developed an Anomaly Detection System that leverages unsupervised machine learning methods for time series metrics. Moreover, adopting ensemble strategies, we combine traditional and deep learning approaches.

        This contribution presents the design of our Anomaly Detection System, the algorithms used, and their performance in the daily operation of the CERN cloud. The analytics pipeline relies on open-source tools and frameworks adopted at CERN, such as PyOD, TensorFlow, Spark, Apache Airflow, Grafana and Elasticsearch.
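        As an illustration of the simplest "traditional" end of such an ensemble (the window size, threshold, and metric values below are invented for the example and are not the system's actual configuration), a rolling z-score detector over a hypervisor utilization metric could look like:

```python
import statistics

def zscore_anomalies(series, window=12, threshold=3.0):
    """Flag indices whose value deviates more than `threshold` standard
    deviations from the mean of the preceding `window` samples."""
    anomalies = []
    for i in range(window, len(series)):
        past = series[i - window:i]
        mu = statistics.fmean(past)
        sigma = statistics.pstdev(past)
        if sigma > 0 and abs(series[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies

# Fabricated CPU utilisation samples (%): steady load, then a spike.
cpu = [40, 41, 39, 42, 40, 41, 40, 39, 41, 40, 42, 41, 98]
print(zscore_anomalies(cpu))  # flags the final sample
```

        Deep-learning detectors (e.g. autoencoders on the same windows) would replace the mean/σ model while keeping the same flag-and-alert interface, which is what makes ensemble voting across detectors straightforward.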

        Speaker: Antonin Dvorak (Czech Academy of Sciences (CZ))
    • 9:50 AM
      Coffee Break
    • Basic IT Services Online workshop

      Online workshop

      • 40
        Transcoding as a Service

        As part of the modernization of the Weblecture service, a new Transcoding infrastructure has been put in place, based on the FOSS product Opencast [1], to cover the needs of the Weblecture and CDS services.

        In this talk we will explain the work done to adapt Opencast to CERN workloads: extending the metadata to operate with Indico, encoding profiles, visualization using the default Opencast player based on Paella [3], intro/outro handling and trimming, the infrastructure the service runs on, future directions of the Opencast project and, last but not least, how to access the TaaS [2] service, illustrated with CDS as a use case.

        [1] https://opencast.org/
        [2] https://taas.docs.cern.ch/
        [3] https://paellaplayer.upv.es/

        Speakers: Miguel Angel Valero Navarro (Valencia Polytechnic University (ES)), Ruben Domingo Gaspar Aparicio (CERN)
      • 41
        Databases @ DESY

        DESY has relied on a central database service based on Oracle for decades.
        With APEX, this service gained additional momentum in application development, and an unlimited license agreement followed.

        However, not all applications support Oracle, and users are looking for alternatives; the pressure from users is considerable.
        To affirm the importance of databases and not lose relevance in the database business, the service must change.

        DESY has therefore opted for a database group within IT that also covers other database systems.

        Speaker: Christine Apfel (DESY)
    • Board meeting (closed session) Online workshop

      Online workshop

    • Grid, Cloud & Virtualisation Online workshop

      Online workshop

      • 42
        Getting FTS at CERN ready for LHC Run3

        The File Transfer Service (FTS) is responsible for distributing the majority of the LHC data across the WLCG infrastructure. FTS schedules and executes data transfers, maximizing the use of available network and storage resources whilst easing the complexity of the grid environment by masking the details of the different underlying transfer protocols and storage endpoints.

        The FTS service is used by more than 30 experiments within the WLCG. In 2021, FTS transferred more than one billion files across various WLCG sites, adding up to more than one exabyte of data. With Run 3 rapidly approaching, the CERN service has shifted its focus to service consolidation, aiming for increased reliability, ease of operation, and built-in service-health monitoring. The software stack has been modernized to facilitate this consolidation; most notably, all Python 2 components have been replaced by their Python 3 counterparts.

        This presentation will share the lessons learnt and the improvements accomplished whilst preparing the FTS service for LHC Run 3. In particular the presentation will cover the new log-based monitoring service and the new database deployment strategy. An overview of the software improvements will also be given.

        Speaker: Joao Pedro Lopes
      • 43
        Updates on the Integration of the JLAB Computing and Storage resources with the OSG Cyberinfrastructure in support of collaborative research

        Several enhancements have been introduced in the Jefferson Lab infrastructure to increase the robustness of the existing integration with computing pools for a number of collaborations doing experimental research in High Energy Physics. JLab has provisioned access, entry, and execution points which allow users from the multiple collaborations at the facility to submit HTCondor jobs to various pools and which accept jobs submitted from other facilities to run in its computing farm. Jefferson Lab has completed infrastructure enhancements in support of multi-VO Open Science Grid operations for CLAS12, EIC, GlueX, and MOLLER. Two networks were established for grid-facing services: a Science DMZ network for data transfer nodes outside the firewall, and a science portals network for less data-intensive services that benefit from application-layer firewalling. The Lab's existing 2x10 Gbit ESnet connections are being upgraded to 2x100 Gbit in 2022, which will enable end-to-end flows supporting reconstruction in addition to simulations. With the system and network upgrades in place, work is in progress on the infrastructure for SciTokens, which is essential for authorization and authentication using federated identities. Work at present involves CILogon, OSG, and JLab, and aims at using SciTokens in HTCondor jobs to support VO-differentiated access to storage on the Science DMZ, both through the Open Science Data Federation (OSDF) and to dedicated storage resources.
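        For context, SciTokens are JWT-based bearer tokens whose claims carry capability-style scopes. A self-contained sketch of that structure (the issuer URL and scopes below are fabricated for illustration; a real client must also verify the RS256 signature against the issuer's published public key):

```python
import base64
import json

def b64url(data: dict) -> str:
    """Base64url-encode a JSON object without padding, JWT-style."""
    raw = json.dumps(data, separators=(",", ":")).encode()
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()

def decode_segment(seg: str) -> dict:
    """Decode one dot-separated JWT segment back into a dict."""
    seg += "=" * (-len(seg) % 4)  # restore the stripped padding
    return json.loads(base64.urlsafe_b64decode(seg))

# Assemble an example token: header.payload.signature (signature omitted).
header = {"alg": "RS256", "typ": "JWT"}
claims = {
    "iss": "https://example.org/scitokens",       # hypothetical issuer
    "scope": "read:/store write:/store/user",     # capability scopes
    "sub": "jlab-user",
}
token = b64url(header) + "." + b64url(claims) + ".signature"

payload = decode_segment(token.split(".")[1])
print(payload["scope"])  # prints "read:/store write:/store/user"
```

        The `scope` claim is what allows VO-differentiated storage access: the storage endpoint grants only the paths and operations the token names, rather than everything the user's identity could reach.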

        Speakers: Mr Bryan Hess (Jefferson Lab), Dr Paschalis Paschos
    • 4:50 PM
      Coffee Break
    • Basic IT Services Online workshop

      Online workshop

      • 44
        Moving from Elasticsearch to OpenSearch at CERN

        The centralised Elasticsearch service has been running at CERN for over six years, providing the search and analytics engine for numerous CERN users and supporting various aspects of the High Energy Physics community. The service has been based on the open-source version of Elasticsearch, surrounded by a set of external open-source plugins offering security, multi-tenancy, extra visualization types and more. Motivated by the recent license change of Elasticsearch and by the streamlined deployment of the feature-rich OpenSearch project as a 100% open-source environment, the decision was taken to migrate the service at CERN towards it. This presentation covers the motivation, design and implementation of this change, the current state, and the future plans of the service.

        Speaker: Mr Sokratis Papadopoulos (Ministere des affaires etrangeres et europeennes (FR))
    • Miscellaneous: Workshop wrap-up Online workshop

      Online workshop

      Convener: Tony Wong