HEPiX Spring 2022 online Workshop

Europe/Zurich
Peter van der Reest, Tony Wong
Description

HEPiX Spring 2022 online Workshop

The HEPiX forum brings together worldwide Information Technology staff, including system administrators, system engineers, and managers from High Energy Physics and Nuclear Physics laboratories and institutes, to foster a learning and sharing experience between sites facing scientific computing and data challenges.

Participating sites include BNL, CERN, DESY, FNAL, IHEP, IN2P3, INFN, IRFU, JLAB, KEK, LBNL, NDGF, NIKHEF, PIC, RAL, SLAC, TRIUMF, many other research labs and numerous universities from all over the world.

Participants
  • Abel Cabezas Alonso
  • Abhishek Lekshmanan
  • Achim Gsell
  • Adrian Marszalik
  • Ahmed KHOUDER
  • Ajit Mohapatra
  • Akram Khan
  • Al Lilianstrom
  • Alba Vendrell Moya
  • ALBERT ROSSI
  • Aleksandra Wardzinska
  • Alexander Finch
  • Alexander Trautsch
  • Alexandr Zaytsev
  • Alison Packer
  • Alison Peisker
  • Andre Ihle
  • Andrea Chierici
  • Andrea Sciabà
  • Andreas Haupt
  • Andreas Joachim Peters
  • Andreas Klotz
  • Andreas Petzold
  • Andreas Wagner
  • Andrei Dumitru
  • Andreu Pacheco Pages
  • Andrew Bohdan Hanushevsky
  • Andrew Pickford
  • Anil Panta
  • Anirudh Goel
  • Antonín Dvořák
  • Aresh Vedaee
  • Artur Il Darovic Gottmann
  • Attilio De Falco
  • Bas Kreukniet
  • Bastian Neuburger
  • Benjamin Mare
  • Benjamin Smith
  • Benoit DELAUNAY
  • Benoit Million
  • Bertrand SIMON
  • Birgit Lewendel
  • Bo Jayatilaka
  • Bonnie King
  • Brian Davies
  • Bruno Hoeft
  • Bryan Hess
  • Caio Costa
  • Carles Acosta-Silva
  • Carlos Perez Dengra
  • Carmelo Pellegrino
  • Cedric Caffy
  • chaoqi guo
  • Chris Prosser
  • Christian Wolbert
  • Christine Apfel
  • Christoph Beyer
  • Christopher Hollowell
  • Christopher Huhn
  • Dagmar Adamova
  • Daniel Fischer
  • Daniel Juarez
  • Daria Phoebe Brashear
  • Dario Graña
  • David Cohen
  • David Crooks
  • David Kelsey
  • David Southwick
  • Dejan Lesjak
  • Denis Pugnere
  • Dennis van Dok
  • Derek Feichtinger
  • Devin Bougie
  • Di Qing
  • Diego Morenza Vazquez
  • Dino Conciatore
  • Dirk Hagemann
  • Dirk Jahnke-Zumbusch
  • Dmitry Litvintsev
  • Domenico Giordano
  • Dorin Lobontu
  • Doris Ressmann
  • Doug Benjamin
  • Edith Knoops
  • Edmar Stiel
  • Edoardo Martelli
  • Eduard Cuba
  • Elena Gazzarrini
  • Elena Planas
  • Elisabet Carrasco Santos
  • Elizabeth Sexton-Kennedy
  • Emil Kleszcz
  • Enrico Bocchi
  • Eric Fede
  • Eric Grancher
  • Eric Vaandering
  • Eric Yen
  • Esther Accion
  • Eva Dafonte Perez
  • Evelina Buttitta
  • Fabien WERNLI
  • Fabrice Le Goff
  • FEDERICO CALZOLARI
  • Felix.hung-te Lee
  • Fons Rademakers
  • Francisco Centeno
  • Frederik Ferner
  • Frederique Chollet
  • Gang Chen
  • Garhan Attebury
  • George Patargias
  • Gerard Hand
  • German Cancio
  • Gerry Seidman
  • Gianfranco Sciacca
  • Gino Marchetti
  • Giuseppe Lo Presti
  • Glenn Cooper
  • Go Iwai
  • Gonzalo Menendez Borge
  • Gonzalo Merino Arevalo
  • Grzegorz Sułkowski
  • Götz Waschk
  • Harald Falkenberg
  • Helge Meinhard
  • Helmut Kreiser
  • Hironori Ito
  • Horst Severini
  • Ian Collier
  • Ilona Neis
  • Ingo Ebel
  • Ivo Camargo
  • Jacek Chodak
  • Jack Henschel
  • Jahson BABEL
  • Jakub Granieczny
  • James Acris
  • James Adams
  • James Simone
  • James Thorne
  • James Walder
  • Jan Hornicek
  • Jan Iven
  • Jan van Eldik
  • Jaroslav Kalus
  • Javier Cacheiro López
  • Jayaditya Gupta
  • Jean-Michel Barbet
  • Jeff Derbyshire
  • Jeffrey Altman
  • Jerome Pansanel
  • Jingyan Shi
  • Jiri Chudoba
  • Joao Afonso
  • Joao Pedro Lopes
  • Joaquim Santos
  • John Gordon
  • Jordi Casals
  • Jordi Salabert
  • Jorge Camarero Vera
  • Jose Caballero Bejar
  • Jose Carlos Luna
  • Jose Flix Molina
  • joshua kitenge
  • José Fernando Mandeur Díaz
  • João Marques
  • Joël Surget
  • Juan Manuel Guijarro
  • Julien Leduc
  • Jürgen Hannappel
  • Karim El Aammari
  • Karl Amrhein
  • Kars Ohrenberg
  • Katy Ellis
  • Kees de Jong
  • Klaus Steinberger
  • Klemens Noga
  • Konstantin Olchanski
  • Krzysztof Oziomek
  • Kyle Pidgeon
  • Laura Hild
  • Laurent Caillat-Vallet
  • Lea Morschel
  • Lei Wang
  • Leslie Groer
  • Liam Atherton
  • Liviu Valsan
  • Lorena Lobato Pardavila
  • Lubos Kopecky
  • Luca Mascetti
  • Ludovic DUFLOT
  • Maarten Litmaath
  • Maciej Pawlik
  • Manfred Alef
  • Manuel Giffels
  • Manuel Reis
  • Marco Mambelli
  • Marcus Ebert
  • Marek Szuba
  • Maria Arsuaga Rios
  • Marina Sahakyan
  • Markus Magdziorz
  • Markus Schelhorn
  • Martin Bly
  • martin flemming
  • Martin Gasthuber
  • Mary Hester
  • Matt Doidge
  • Matthew Heath
  • Matthew Snyder
  • Matthias Jochen Schnepf
  • Mattias Wadenstein
  • Mattieu Puel
  • Maurizio De Giorgi
  • Max Fischer
  • Michael Davis
  • Michael Leech
  • Michaela Barth
  • Michal Kamil Simon
  • Michal Strnad
  • Michel Jouvin
  • Michel Salim
  • Michele Michelotto
  • Miguel Angel Valero Navarro
  • Mihai Patrascoiu
  • Milan Danecek
  • Milos Lokajicek
  • Miltiadis Gialousis
  • Mouflier Guillaume
  • Mozhdeh Farhadi
  • Murray Collier
  • Mwai Karimi
  • Nadine Neyroud
  • Nikos Tsipinakis
  • Ofer Rind
  • Oleg Sadov
  • Oliver Freyermuth
  • Oliver Keeble
  • Olivier Restuccia
  • Onno Zweers
  • Owen Synge
  • Pablo Saiz
  • Pascal Paschos
  • Patrick Ihle
  • Patrick Riehecky
  • Patryk Lason
  • Peter Gronbech
  • Peter van der Reest
  • Peter Wienemann
  • Petya Vasileva
  • Pierre Emmanuel BRINETTE
  • Pierre-Francois Honore
  • Preslav Konstantinov
  • Qiulan Huang
  • Rafael Arturo Rocha Vidaurri
  • Ramon Escribà
  • Randall Sobie
  • Richard Bachmann
  • Richard Parke
  • Robert Appleyard
  • Robert Frank
  • Robert Illingworth
  • Robert Vasek
  • Roberto Valverde Cameselle
  • Robin Hofsaess
  • Ron Trompert
  • Rose Cooper
  • Ruben Domingo Gaspar Aparicio
  • Ryu Sawada
  • sabah salih
  • Samuel Ambroj Pérez
  • Samuel Bernardo
  • Saroj Kandasamy
  • Sebastian Lopienski
  • Sebastien Gadrat
  • Sergey Chelsky
  • Sergi Puso Gallart
  • Shawn Mc Kee
  • Shun Emilio Morishima Morelos
  • Simon George
  • Sokratis Papadopoulos
  • Sophie Catherine Ferry
  • Stefan Dietrich
  • Stefan Lueders
  • Stefan Piperov
  • Stefano Dal Pra
  • Stephan Wiesand
  • Svenja Meyer
  • Thomas Bellman
  • Thomas Birkett
  • Thomas Hartland
  • Thomas Hartmann
  • Thomas Kress
  • Thomas Roth
  • Tigran Mkrtchyan
  • Tim Bell
  • Tim Skirvin
  • Tim Wetzel
  • Timothy Noble
  • Tina Friedrich
  • Tom Dack
  • Tomas Lindén
  • Tomoaki Nakamura
  • Tony Cass
  • Tony Wong
  • Tristan Sullivan
  • Trivan Pal
  • Troy Dawson
  • Ulrich Schwickerath
  • Vanessa Acín Portella
  • Vanessa HAMAR
  • Vicky HUANG
  • Victor Mendoza
  • Vincent Garonne
  • Volodymyr Savchenko
  • Wade Hong
  • Wassef Karimeh
  • Wayne Salter
  • Wei Yang
  • Werner Sun
  • Xuantong Zhang
  • Yaosong Cheng
  • Yingzi Wu
  • Yujiang Bi
  • Yujun Wu
  • Zacarias Benta
  • Zhechka Toteva
  • Zhihua Dong
    • Miscellaneous: Welcome & Logistics Online workshop


      Convener: Peter van der Reest
    • Site Reports Online workshop


      • 2
        PIC report

        This is the PIC report for the HEPiX Spring 2022 Workshop.

        Speaker: Jose Flix Molina (Centro de Investigaciones Energéticas Medioambientales y Tec. (ES))
      • 3
        CERN Site Report

        News from CERN since the last HEPiX workshop.

        Speaker: Andrei Dumitru (CERN)
      • 4
        ASGC site report

        ASGC site report

        Speaker: Felix.hung-te Lee (Academia Sinica (TW))
      • 5
        KEK Site Report

        The KEK Central Computer System (KEKCC) is a computer service and facility that provides large-scale computer resources, including Grid and Cloud computing systems, and essential IT services such as e-mail and web services.

        Following the procurement policy for large-scale computer systems requested by the Japanese government, we replace the entire KEKCC every four or sometimes five years. The current system replaced the previous one and has been in operation since September 2020; its decommissioning will start in early 2024.

        During roughly 20 months of operating the current system, we have decommissioned some legacy Grid services, such as LFC, and migrated some Grid services to a newer operating system, CentOS 7. In this talk, we share our experiences and challenges with the Grid services introduced in the KEKCC, and highlight some issues to be addressed in the future.

        Speaker: Go Iwai (KEK)
      • 6
        IHEP Site Report

        Site report on computing platform updates and support-system development at IHEP during the past half year.

        Speaker: Yaosong Cheng
    • 10:25 AM
      Coffee Break
    • Storage & File Systems Online workshop


      • 7
        ANTARES: The new tape archive service at RAL Tier-1

        The new tape archive service, ANTARES (A New Tape ArchivE for STFC), at RAL Tier-1 went into production on 4th March, 2022. The service is provisioned with EOS-CTA, developed at CERN. The EOS cluster, a “thin” SSD buffer, manages incoming namespace requests and CTA provides the tape back-end system responsible for the scheduling and execution of tape archival and retrieval operations. In this talk, we summarise the almost two years’ worth of effort to set up and test ANTARES, describe the procedure followed to carry out the migration of RAL Tier-1 data from CASTOR and discuss the service’s performance during the last two WLCG-wide tape challenges.

        Speaker: George Patargias (STFC)
      • 8
        A new Ceph deployment using Cephadm at RAL

        Increasing user demand for file-based storage, provided by the STFC Cloud at RAL, has motivated the creation of a new shared file system service based on OpenStack Manila. The service will be backed by a new all-SSD Ceph cluster, ‘Arided’, deployed using the Cephadm orchestrator. This talk will provide a brief overview of our experience deploying a test instance of this service using a containerised Ceph cluster, and of the potential administrative benefits of doing so.

        Speaker: Kyle Pidgeon
    • Miscellaneous: Welcome & Logistics Online workshop


      Convener: Tony Wong
    • Site Reports Online workshop


    • Networking & Security Online workshop


      • 14
        zkpolicy: ZooKeeper Policy Audit Tool

        Interest in big data solutions based on the Hadoop, Kafka and Spark ecosystem is constantly growing in the HEP community, in particular for use cases related to data analytics and data warehousing. Many distributed services use ZooKeeper for coordination and metadata storage. However, on many occasions this service is either deployed insecurely or easily drifts into a vulnerable setup.

        In this context, we developed zkpolicy, an open-source tool for ZooKeeper metadata auditing and policy enforcement.
        The tool validates the ownership and ACLs of the information stored in this metadata service and can align them with a pre-defined policy. zkpolicy is currently used in production by the IT department at CERN, improving security and enforcing best practices for the central Kafka and Hadoop services.

        In this presentation, I will present the zkpolicy tool, the motivation for its development and use cases at CERN and beyond.

        Speaker: Emil Kleszcz (CERN)
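A minimal, stand-alone sketch of the kind of check a policy-audit tool like zkpolicy performs: compare the ACLs found on ZooKeeper znodes against a pre-defined policy. The znode snapshot and policy entries here are hypothetical illustration data; a real tool would fetch ACLs with a ZooKeeper client.

```python
# Policy: znodes under a given prefix must carry exactly these ACL entries,
# expressed as (scheme, id, permissions) tuples. Values are illustrative.
POLICY = {
    "/config": {("sasl", "kafka", "cdrwa"), ("world", "anyone", "r")},
    "/brokers": {("sasl", "kafka", "cdrwa")},
}

def audit(znode_acls):
    """Return the znodes whose ACLs deviate from the matching policy prefix."""
    violations = []
    for path, acls in znode_acls.items():
        for prefix, expected in POLICY.items():
            if path == prefix or path.startswith(prefix + "/"):
                if set(acls) != expected:
                    violations.append(path)
                break
    return violations

# Hypothetical snapshot: /brokers/ids is world-writable -- a violation.
snapshot = {
    "/config/topics": {("sasl", "kafka", "cdrwa"), ("world", "anyone", "r")},
    "/brokers/ids": {("world", "anyone", "cdrwa")},
}
print(audit(snapshot))  # -> ['/brokers/ids']
```

An enforcement mode would follow the same walk but rewrite each deviating znode's ACL to the expected set instead of only reporting it.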
      • 15
        Computer Security Landscape Update

        This presentation provides an update on the global security landscape since the last HEPiX meeting. It describes the main vectors of risks and compromises in the academic community, including lessons learned, and presents interesting recent attacks while providing recommendations on how best to protect ourselves.

        Speaker: Daniel Fischer (CERN)
      • 16
        Collaborative incident response and threat intelligence

        The threat faced by the research and education sector from determined and well-resourced cyber attackers has been growing in recent years and is now acute. A vital means of better protecting ourselves is to share threat intelligence - key Indicators of Compromise of ongoing incidents, including network observables and file hashes - with trusted partners. We must also deploy the technical means to actively use this intelligence in the defence of our facilities, including a robust, fine-grained source of network monitoring. The combination of these elements, along with storage, visualisation and alerting, is called a Security Operations Centre (SOC).

        We report on recent progress of the SOC WG, mandated to create reference designs for these SOCs, with particular attention to work being carried out at multiple 100Gb/s sites to deploy these technologies and a proposal to leverage passive DNS in order to further assist sites of various sizes to improve their security stance.

        We discuss the plans for this group for the coming year and the importance of acting together as a community to defend against these attacks.

        Speaker: Dr David Crooks (UKRI STFC)
    • Miscellaneous: Group Photo Online workshop


    • 10:20 AM
      Coffee Break
    • End-User IT Services & Operating Systems Online workshop


      • 17
        The new CERN Web Services Portal

        CDA-WF provides a central hosting infrastructure for websites and web applications, as well as central web services for collaborative development and projects. In view of the ongoing consolidation of the hosting infrastructure on a common platform, the next generation of OpenShift called OKD4, the new CERN Web Services Portal was designed and developed to facilitate the management of websites and web applications. In addition to providing a modern and user-friendly interface, it also features improved service recommendations, classifying tools into categories to help users navigate the current portfolio.

        Speaker: Aleksandra Wardzinska (CERN)
    • Computing & Batch Services Online workshop


      • 18
        Rebalancing the HTCondor fairshare for mixed workloads

        The INFN Tier-1 data centre is the main Italian computing site for scientific communities in high-energy physics and astroparticle research. Access to the resources is arbitrated by an HTCondor batch system, which is in charge of balancing overall usage by several competing user groups according to their agreed quotas. The workloads submitted to the computing cluster are highly heterogeneous, and the batch system must take a vast set of different requirements into account in order to provide user groups with a satisfactory fair share of the available resources. To prevent or reduce usage disparities, a system that self-adjusts imbalances has been developed and is being used with satisfactory results. This work explains how and when fair-share implementations can fall short of optimal performance and describes a general method to improve them. Results of the current solution are presented and possible further developments are discussed.

        Speaker: Stefano Dal Pra (Universita e INFN, Bologna (IT))
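An illustrative sketch of the general idea behind self-adjusting fair share (not the INFN implementation, and the numbers are invented): periodically compare each group's observed demand with its nominal quota and shift part of the unused share from under-using groups to groups exceeding their quota.

```python
NOMINAL = {"cms": 0.40, "atlas": 0.40, "astro": 0.20}   # agreed shares
DEMAND  = {"cms": 0.55, "atlas": 0.30, "astro": 0.05}   # observed demand

def rebalance(nominal, demand, step=0.5):
    """Shift a fraction `step` of unused share to groups exceeding quota.

    The total of the effective shares stays equal to the nominal total.
    """
    slack = sum(max(nominal[g] - demand[g], 0.0) for g in nominal)
    excess = sum(max(demand[g] - nominal[g], 0.0) for g in nominal)
    effective = {}
    for g in nominal:
        if demand[g] < nominal[g]:           # donor: lend part of the slack
            effective[g] = nominal[g] - step * (nominal[g] - demand[g])
        elif excess > 0:                     # taker: receive a proportional cut
            extra = step * slack * (demand[g] - nominal[g]) / excess
            effective[g] = nominal[g] + extra
        else:
            effective[g] = nominal[g]
    return effective

eff = rebalance(NOMINAL, DEMAND)
# cms grows beyond 0.40, astro shrinks towards its demand; total stays 1.0
print({g: round(q, 3) for g, q in eff.items()})
```

Running such a correction on every scheduling cycle lets quotas track persistent demand patterns while bounded `step` keeps the adjustment gradual.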
    • Networking & Security Online workshop


      • 19
        Update from the HEPiX IPv6 working group

        During the last six months the HEPiX IPv6 working group has continued to encourage the deployment of dual-stack IPv4/IPv6 services. We also recommend dual-stack clients (worker nodes etc.). Many data transfers today happen over IPv6, but it is still true that many do not! This talk will present our recent activities, including our investigations into the reasons behind the ongoing use of IPv4, as well as planning for the move to an IPv6-only core WLCG.

        Speaker: David Kelsey (Science and Technology Facilities Council STFC (GB))
      • 20
        Research Networking Technical WG Status and Plans

        The high-energy physics community, along with the WLCG sites and Research and Education (R&E) networks are collaborating on network technology development, prototyping and implementation via the Research Networking Technical working group (RNTWG). As the scale and complexity of the current HEP network grows rapidly, new technologies and platforms are being introduced that greatly extend the capabilities of today’s networks. With many of these technologies becoming available, it’s important to understand how we can design, test and develop systems that could enter existing production workflows while at the same time changing something as fundamental as the network that all sites and experiments rely upon.

        In this talk we’ll give an update on the Research Networking Technical working group activities, challenges and recent updates. In particular we’ll focus on the flow labeling and packet marking technologies (scitags), tools and approaches that have been identified as important first steps for the work of the group.

        Speaker: Shawn Mc Kee (University of Michigan (US))
    • Miscellaneous: Group Photo Online workshop


    • 4:55 PM
      Coffee Break
    • Computing & Batch Services Online workshop


      • 21
        Status and prospects of the WLCG HEP-SCORE deployment task force

        We will report on the status and the future plans of the WLCG HEP-SCORE deployment task force.

        Speaker: Helge Meinhard (CERN)
      • 22
        Benchmarking Working Group activities

        The HEPiX Benchmarking Working Group has been very active in the past months to find a replacement for HS06. The WG is working in close contact with the WLCG HEPscore deployment task force. This talk will focus on the technical aspects of the new benchmark and on the Benchmarking Suite framework, in particular the analysis of the latest results.

        Speaker: Dr Michele Michelotto (Universita e INFN, Padova (IT))
      • 23
        HEPCloud, an elastic virtual cluster from heterogeneous computing resources

        Historically, high energy physics computing has been performed on large purpose-built computing systems. These began as single-site compute facilities, but have evolved into the distributed computing grids used today.
        The current computing landscape is more heterogeneous because of the elevated capacity and capability of commercial clouds and the push of funding agencies toward supercomputers. Both add new complications. Commercial cloud resources are highly virtualized and customizable but need to be managed. High Performance Computers are each one of a kind with different access rules and restrictions, like limited network connectivity or complex access patterns.
        HEPCloud is a single managed portal that allows more scientists, experiments, and projects to use more resources to extract more science. Its goal is to provide cost-effective access by optimizing usage across all available types of computing resources and by elastically expanding the resource pool on short notice (e.g. by renting temporary resources on commercial clouds).
        The Fermilab HEPCloud facility has been used successfully in production for over three years, and 2021 saw a big ramp-up, especially for CMS, which used all of its Frontera quota six months ahead of expiry and consumed a 90M NERSC-hour bonus after exhausting its allocation.
        The Decision Engine is the software at the heart of HEPCloud, deciding where and how much to provision. It is an open-source project (https://github.com/HEPCloud/decisionengine), and version 2.0 was recently released - a release we consider ready for wider adoption: it features simplified installation and configuration, fully Python 3 code with strict coding best practices, and a revised architecture with robust message passing between the decision-making components.

        Speaker: Marco Mambelli (Fermilab (US))
    • End-User IT Services & Operating Systems Online workshop


      • 24
        Tracking Kernel Rate of Change

        How fast is the CentOS Stream 8 kernel moving? How do we tell? What can we learn from this information?

        Speaker: Patrick Riehecky (Fermi National Accelerator Lab. (US))
    • Storage & File Systems Online workshop


      • 25
        EOS Report, Evolution & Strategy

        The presentation will summarize highlights from the 6th EOS workshop and discuss the evolution of EOS services and the development roadmap during Run 3.

        Speaker: Andreas Joachim Peters (CERN)
      • 26
        IO Shaping in EOS

        EOS services are used by large user communities and in many cases are exposed and operated as a very large shared resource, even though the criticality of individual IO activities varies. To give operators handles to shape data access by activity, we have recently added support for direct IO, IO priorities, bandwidth policies and filesystem stream overload protection. For metadata access, EOS provides user-specific configurable thread pools and metadata operation frequency limits.
        The presentation will discuss how these can be used and configured for production services.

        Speaker: Andreas Joachim Peters (CERN)
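Bandwidth policies of the kind described above are commonly built on token-bucket throttling. This stand-alone sketch (not EOS code; the rates are invented) shows the mechanism: an activity may consume bytes only as fast as tokens accrue, with a bounded burst allowance.

```python
class TokenBucket:
    """Allow `rate` bytes/s sustained, with bursts up to `capacity` bytes."""

    def __init__(self, rate, capacity):
        self.rate, self.capacity = rate, capacity
        self.tokens, self.last = capacity, 0.0   # start with a full bucket

    def allow(self, nbytes, now):
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if nbytes <= self.tokens:
            self.tokens -= nbytes
            return True
        return False                              # caller must wait / retry

bucket = TokenBucket(rate=100e6, capacity=50e6)   # 100 MB/s, 50 MB burst
print(bucket.allow(40e6, now=0.0))   # burst fits          -> True
print(bucket.allow(40e6, now=0.1))   # only 20 MB refilled -> False
print(bucket.allow(40e6, now=0.5))   # bucket refilled     -> True
```

Keeping one bucket per activity (or per IO-priority class) is what turns this into a per-activity shaping handle rather than a global cap.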
      • 27
        Third-party-copy transfer service status of JUNO experiment

        The Jiangmen Underground Neutrino Observatory (JUNO) is an under-construction neutrino experiment located in Jiangmen, China, which is expected to generate about 3 PB of experimental data per year. JUNO plans to distribute these data to all JUNO collaborators from four main data centers in China, France, Italy and Russia.
        A distributed data management system with third-party-copy (TPC) data transfer support has been developed for the JUNO experiment. This talk will report our status and experience with the third-party-copy service in the JUNO distributed system, including HTTP-TPC deployments at data centers with different storage systems, token-based data authentication with macaroons and SciTokens, and operational test results for the data transfer service. A system developed for monitoring TPC performance across all JUNO data centers will also be introduced in this talk.

        Speaker: Xuantong Zhang (Chinese Academy of Sciences (CN))
    • 10:15 AM
      Coffee Break
    • Storage & File Systems Online workshop


      • 28
        bulkrequests: a simple tool for managing file QoS on top of dCache REST API

        bulkrequests is a small tool that communicates with dCache through its REST API. It arises from the need to query and modify, in bulk, the QoS and locality of files stored on tape - for example, to pin or unpin a set of files to/from disk as required. It was designed to cover this need in a simple way through a command-line tool while awaiting the new dCache bulk REST API, which will incorporate the processing of this type of request. The tool is based on an existing development called dcacheclient (https://github.com/neicnordic/dcacheclient), supports the same authentication methods, and in particular uses the namespace section to query and change QoS and locality. It will be reformulated to use the new bulk REST API when it becomes available.

        Speaker: Dario Graña (IATE - CONICET)
      • 29
        CERN’s Run 3 Tape Infrastructure

        LHC Run 3 is imposing unprecedented data rates on the tape infrastructure at CERN T0. Here we report on the nature of the challenge in terms of performance and reliability, on the hardware we have procured, and how it is deployed, configured and managed. We share details of our experience with the technology selected, a mix of IBM and SpectraLogic libraries and Enterprise and LTO drives. In particular, LTO-9 is a new technology and we cover low level details including media initialisation and its native Recommended Access Order (RAO). We conclude with an outlook on the likely evolution of the infrastructure.

        Speaker: Richard Bachmann (CERN)
      • 30
        The CERN Tape Archive (CTA) - running Tier 0 tape

        During the ongoing long shutdown, all elements in LHC data-taking have been upgraded. As the last step in the T0 data-taking chain, the CERN Tape Archive (CTA) has done its homework and redesigned its full architecture in order to match LHC Run 3 data rates.

        This contribution will give an overview of the CTA service and how it has been deployed in production. We discuss the measures taken to assess and improve its performance and efficiency against various workflows, especially the latest data challenges realised on T0 tape endpoints. We illustrate the monitoring and alerting which is required to maintain performance and reliability during operations, and discuss the outlook for service evolution.

        Speaker: Julien Leduc (CERN)
    • IT Facilities & Business Continuity Online workshop


      • 31
        Next business day, or whenever we can

        We've operated data center hardware from various major vendors in the last two decades. For most systems we took out expensive support contracts for three to five years so defective hardware (memory, hard drives, motherboards) would be replaced in one business day.
        In recent years we have been noticing a considerable drop in the quality of delivering this support, where suppliers were unable to fulfill their obligations on time for various reasons.
        We'll discuss the probable causes behind this decline, the implications for our operations, and the way we can address this going forward.

        Speaker: Mr Dennis van Dok
    • Storage & File Systems Online workshop


      • 32
        dCache integration with CTA

        The ever-increasing amount of data produced by modern scientific facilities like EuXFEL or the LHC puts high pressure on the data management infrastructure at the laboratories. This includes poorly shareable resources of archival storage, typically tape libraries. To achieve maximal efficiency of the available tape resources, a deep integration between hardware and software components is required.

        The CERN Tape Archive (CTA) is an open-source storage management system developed by CERN to manage LHC experiment data on tape. Although today CTA's primary target is the CERN Tier-0, the data management group at DESY considers CTA a main alternative to commercial HSM systems.

        dCache has a flexible tape interface which allows connectivity to any tape system. There are two ways that a file can be migrated to tape: either dCache calls a tape-system-specific copy command, or it interacts via an in-dCache tape-system-specific driver. The latter has been shown (by the TRIUMF and KIT Tier-1s) to provide better resource utilization and efficiency. Together with the CERN Tape Archive team, we are working on a seamless integration of CTA into dCache.

        This presentation will show the design of dCache-CTA integration, current status and first test results at DESY.

        Speaker: Mr Tigran Mkrtchyan (DESY)
      • 33
        EOS and XCache data access performance for LHC analysis at CERN

        Physics analysis is done at CERN in several different ways, using both interactive and batch resources and EOS for data storage. In order to understand if and how the CERN computer centre should change the way analysis is supported for Run3, we performed several performance studies on two fronts: measuring the performance and utilisation levels of EOS with respect to the current analysis workloads, and looking at the performance of different storage configurations, including SSD-based and HDD-based XCache instances, with respect to specific, I/O intensive analysis workloads from ATLAS and CMS. The collected results indicate that the current infrastructure is adequate and works well below saturation, and that specific needs can be fulfilled by dedicated high performance/throughput servers. We expect this type of studies to continue and the CERN infrastructure to adapt to the evolving needs of the LHC analysis community.

        Speaker: Dr Andrea Sciabà (CERN)
      • 34
        Open Source Erasure Coding Technologies

        This presentation will provide a short overview and comparison of four available Open Source erasure coding technologies for storage (MINIO, RADOS, EOS, XRootd EC) in the context of the Erasure Coding Working Group.

        Speaker: Andreas Joachim Peters (CERN)
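A toy illustration of the erasure-coding idea the talk compares across implementations: split data into k chunks, add parity, and reconstruct after losing one chunk. The systems named above use richer codes (typically Reed-Solomon); a single XOR parity block (k+1 encoding, tolerating one loss) keeps the sketch short.

```python
from functools import reduce

def xor(a, b):
    """Bytewise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

def encode(chunks):
    """Append one XOR parity chunk to k equal-sized data chunks."""
    return chunks + [reduce(xor, chunks)]

def reconstruct(stripe, lost):
    """Recover the chunk at index `lost` by XOR-ing all survivors."""
    survivors = [c for i, c in enumerate(stripe) if i != lost]
    return reduce(xor, survivors)

data = [b"AAAA", b"BBBB", b"CCCC"]         # k = 3 data chunks
stripe = encode(data)                       # 4 chunks on 4 "disks"
assert reconstruct(stripe, 1) == b"BBBB"    # disk 1 lost, data recovered
```

The storage overhead here is 1/k extra capacity for one-failure tolerance; Reed-Solomon generalises this to m parity chunks surviving any m losses, which is the trade-off space the compared technologies expose.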
    • 5:15 PM
      Coffee Break
    • Storage & File Systems Online workshop


      • 35
        XRootD object storage: native EC-based file store and S3 proxy

        Over recent years we have observed the increasing importance of object storage in the WLCG community. In this contribution we report on our effort to accommodate object storage use cases within XRootD, a software framework that is a critical component for data access and management at WLCG sites. First, we introduce a high-performance erasure-coding (EC) based file storage module, motivated by the ALICE O2 use case and compatible with any type of XRootD backend storage. Furthermore, we discuss the XRootD proxy for S3 storage and the native XRootD EC-based file store, which provide the WLCG-required data-transfer-node (DTN) facilities such as third-party copy, checksum query, VOMS authentication and access-token support.

        Speaker: Michal Kamil Simon (CERN)
      • 36
        Introducing PostgreSQL Table Partitioning to dCache

        Database systems have been known to deliver impressive performance for large classes of workloads. Nevertheless, database systems with mammoth data sets or high-throughput applications can challenge the capacity of a single server. High query rates can exhaust the CPU capacity of the server, and working-set sizes larger than the system's RAM stress the I/O capacity of the disk drives. This presentation will show how we use PostgreSQL table partitioning to take the edge off some of these issues and improve performance.

        Speaker: Mwai Karimi
    • IT Facilities & Business Continuity Online workshop


      • 37
        SDCC Transition to the New Data Center

        The BNL Computing Facility Revitalization (CFR) project aimed at repurposing the former National Synchrotron Light Source (NSLS-I) building (B725) on the BNL site as a new data center for the Scientific Data and Computing Center (SDCC). The CFR project finished the design phase in the first half of 2019, completed the construction phase by the end of FY2021, and entered the early occupancy phase in Jun-Aug 2021. Occupancy of the B725 data center with production CPU and DISK resources for the ATLAS experiment at the LHC at CERN, the STAR, PHENIX and sPHENIX experiments at the RHIC collider at BNL, and the Belle II experiment at KEK (Japan) started in 2021Q4 and ramped up in 2022Q1 to the level of 40 racks populated with equipment in the B725 Main Data Hall (MDH). At the same time, two library rows in the B725 Tape Room were populated with IBM TS4500 tape libraries serving the ATLAS and sPHENIX experiments. The occupancy of the B725 MDH is expected to further increase to 70 racks by the end of FY2022. The new HPC clusters and storage systems of the BNL Computational Science Initiative (CSI) are to be deployed in the B725 data center starting from early FY2023 as well. The transition of the SDCC data center environment - using the B725 data center to host the majority of CPU and DISK resources, and leaving the old (B515-based) data center to host predominantly TAPE resources - is expected to continue until the end of FY2023. In this talk I summarize the main design features of the new SDCC data center, report on how the transition to B725 data center occupancy was carried out in the 2021Q4-2022Q1 time frame, and highlight the plans for scaling up the occupancy and infrastructure utilization of both the old and new data centers up to FY2026.

        Speaker: Alexandr Zaytsev (Brookhaven National Laboratory (US))
    • Grid, Cloud & Virtualisation Online workshop

      Online workshop

      • 38
        CERN Cloud Infrastructure - operations and service update

        CERN's private OpenStack cloud offers more than 300,000 cores to over 3,500 users, with services for compute, multiple storage types, bare metal, container clusters, and more.
        The CERN Cloud Team constantly works on improving these services while maintaining the stability and availability that many IT services and experiment workflows depend on.
        This talk will cover the challenges of, and our approach to, high availability, live migration of VMs, and monitoring of live-migration executions.
        It will also give an update on the evolution of the cloud service over the past year and the plans for the upcoming year.

        Speaker: Jayaditya Gupta (CERN)
      • 39
        Anomaly Detection System for the CERN Cloud Monitoring

        As CERN cloud service managers, one of our tasks is to make sure that the desired computational power is delivered to all users of our scientific community. This task is accomplished by monitoring the utilization metrics of each hypervisor and reacting to alarms in case of server saturation to mitigate the interference between VMs.

        In order to maximize the efficiency of our cloud infrastructure and to reduce the monitoring effort for service managers, we have developed an Anomaly Detection System that leverages unsupervised machine learning methods for time series metrics. Moreover, adopting ensemble strategies, we combine traditional and deep learning approaches.

        This contribution presents the design of our Anomaly Detection System, the algorithms used, and their performance in the daily operation of the CERN cloud. The analytics pipeline relies on open-source tools and frameworks adopted at CERN, such as PyOD, TensorFlow, Spark, Apache Airflow, Grafana and Elasticsearch.
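        As an illustration of the simplest "traditional" end of such an ensemble (the window size, threshold, and metric values below are invented for the example and are not the system's actual configuration), a rolling z-score detector over a hypervisor utilization metric could look like:

```python
import statistics

def zscore_anomalies(series, window=12, threshold=3.0):
    """Flag indices whose value deviates more than `threshold` standard
    deviations from the mean of the preceding `window` samples."""
    anomalies = []
    for i in range(window, len(series)):
        past = series[i - window:i]
        mu = statistics.fmean(past)
        sigma = statistics.pstdev(past)
        if sigma > 0 and abs(series[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies

# Fabricated CPU utilisation samples (%): steady load, then a spike.
cpu = [40, 41, 39, 42, 40, 41, 40, 39, 41, 40, 42, 41, 98]
print(zscore_anomalies(cpu))  # flags the final sample
```

        Deep-learning detectors (e.g. autoencoders on the same windows) would replace the mean/σ model while keeping the same flag-and-alert interface, which is what makes ensemble voting across detectors straightforward.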

        Speaker: Antonin Dvorak (Czech Academy of Sciences (CZ))
    • 9:50 AM
      Coffee Break
    • Basic IT Services Online workshop

      Online workshop

      • 40
        Transcoding as a Service

        As part of the modernization of the Weblecture service, a new Transcoding infrastructure has been put in place, based on the FOSS product Opencast [1], to cover the needs of the Weblecture and CDS services.

        In this talk we will explain the work done to adapt Opencast to CERN workloads: extending the metadata to operate with Indico, encoding profiles, visualization using the default Opencast player based on Paella [3], intro/outro handling and trimming, the infrastructure the service runs on, future directions of the Opencast project and, last but not least, how to access the TaaS [2] service, illustrated with CDS as a use case.

        [1] https://opencast.org/
        [2] https://taas.docs.cern.ch/
        [3] https://paellaplayer.upv.es/

        Speakers: Miguel Angel Valero Navarro (Valencia Polytechnic University (ES)), Ruben Domingo Gaspar Aparicio (CERN)
      • 41
        Databases @ DESY

        DESY has relied on a central database service based on Oracle for decades.
        With APEX, this service gained additional momentum in application development, and an unlimited license agreement followed.

        However, not all applications support Oracle, and users are looking for alternatives; the pressure from users is considerable.
        To affirm the importance of databases and not lose relevance in the database business, the service must change.

        DESY has therefore opted for a database group within IT that also covers other database systems.

        Speaker: Christine Apfel (DESY)
    • Board meeting (closed session) Online workshop

      Online workshop

    • Grid, Cloud & Virtualisation Online workshop

      Online workshop

      • 42
        Getting FTS at CERN ready for LHC Run3

        The File Transfer Service (FTS) is responsible for distributing the majority of the LHC data across the WLCG infrastructure. FTS schedules and executes data transfers, maximizing the use of available network and storage resources whilst easing the complexity of the grid environment by masking the details of the different underlying transfer protocols and storage endpoints.

        The FTS service is used by more than 30 experiments within the WLCG. In 2021, FTS transferred more than one billion files across various WLCG sites, adding up to more than one exabyte of data. With Run 3 rapidly approaching, the CERN service has shifted its focus to service consolidation, aiming for increased reliability, ease of operation, and built-in service-health monitoring. The software stack has been modernized to facilitate this consolidation; most notably, all Python 2 components have been replaced by their Python 3 counterparts.

        This presentation will share the lessons learnt and the improvements accomplished whilst preparing the FTS service for LHC Run 3. In particular the presentation will cover the new log-based monitoring service and the new database deployment strategy. An overview of the software improvements will also be given.

        Speaker: Joao Pedro Lopes
      • 43
        Updates on the Integration of the JLAB Computing and Storage resources with the OSG Cyberinfrastructure in support of collaborative research

        Several enhancements have been introduced in the Jefferson Lab infrastructure to increase the robustness of the existing integration with computing pools for a number of collaborations doing experimental research in High Energy Physics. JLab has provisioned access, entry, and execution points which allow users from the multiple collaborations at the facility to submit HTCondor jobs to various pools and which accept jobs submitted from other facilities to run in its computing farm. Jefferson Lab has completed infrastructure enhancements in support of multi-VO Open Science Grid operations for CLAS12, EIC, GlueX, and MOLLER. Two networks were established for grid-facing services: a Science DMZ network for data transfer nodes outside the firewall, and a science portals network for less data-intensive services that benefit from application-layer firewalling. The Lab's existing 2x10 Gbit ESnet connections are being upgraded to 2x100 Gbit in 2022, which will enable end-to-end flows supporting reconstruction in addition to simulations. With the system and network upgrades in place, work is in progress on the infrastructure for SciTokens, which is essential for authorization and authentication using federated identities. Work at present involves CILogon, OSG, and JLab, and aims at using SciTokens in HTCondor jobs to support VO-differentiated access to storage on the Science DMZ, both through the Open Science Data Federation (OSDF) and to dedicated storage resources.
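        For context, SciTokens are JWT-based bearer tokens whose claims carry capability-style scopes. A self-contained sketch of that structure (the issuer URL and scopes below are fabricated for illustration; a real client must also verify the RS256 signature against the issuer's published public key):

```python
import base64
import json

def b64url(data: dict) -> str:
    """Base64url-encode a JSON object without padding, JWT-style."""
    raw = json.dumps(data, separators=(",", ":")).encode()
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()

def decode_segment(seg: str) -> dict:
    """Decode one dot-separated JWT segment back into a dict."""
    seg += "=" * (-len(seg) % 4)  # restore the stripped padding
    return json.loads(base64.urlsafe_b64decode(seg))

# Assemble an example token: header.payload.signature (signature omitted).
header = {"alg": "RS256", "typ": "JWT"}
claims = {
    "iss": "https://example.org/scitokens",       # hypothetical issuer
    "scope": "read:/store write:/store/user",     # capability scopes
    "sub": "jlab-user",
}
token = b64url(header) + "." + b64url(claims) + ".signature"

payload = decode_segment(token.split(".")[1])
print(payload["scope"])  # prints "read:/store write:/store/user"
```

        The `scope` claim is what allows VO-differentiated storage access: the storage endpoint grants only the paths and operations the token names, rather than everything the user's identity could reach.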

        Speakers: Mr Bryan Hess (Jefferson Lab), Dr Paschalis Paschos
    • 4:50 PM
      Coffee Break
    • Basic IT Services Online workshop

      Online workshop

      • 44
        Moving from Elasticsearch to OpenSearch at CERN

        The centralised Elasticsearch service has been running at CERN for over six years, providing the search and analytics engine for numerous CERN users and supporting various aspects of the High Energy Physics community. The service has been based on the open-source version of Elasticsearch, surrounded by a set of external open-source plugins offering security, multi-tenancy, extra visualization types and more. Motivated by the recent license change of Elasticsearch and by the streamlined deployment of the feature-rich OpenSearch project as a 100% open-source environment, the decision was taken to migrate the service at CERN towards it. This presentation covers the motivation, design and implementation of this change, the current state, and the future plans of the service.

        Speaker: Mr Sokratis Papadopoulos (Ministere des affaires etrangeres et europeennes (FR))
    • Miscellaneous: Workshop wrap-up Online workshop

      Online workshop

      Convener: Tony Wong