HEPiX Spring 2023 Workshop

Asia/Taipei
1F Conference Room (Research Center for Environmental Changes (RCEC), Academia Sinica)

1F Conference Room

Research Center for Environmental Changes (RCEC), Academia Sinica

128 Academia Road, Section 2 Nankang, Taipei 11529 Taiwan 25°2′45″N 121°36′37″E
Peter van der Reest, Tomoaki Nakamura
Description

HEPiX Spring 2023 at ASGC, Taipei, Taiwan

 

The HEPiX forum brings together worldwide Information Technology staff, including system administrators, system engineers, and managers from High Energy Physics and Nuclear Physics laboratories and institutes, to foster a learning and sharing experience between sites facing scientific computing and data challenges.

Participating sites include BNL, CERN, DESY, FNAL, IHEP, IN2P3, INFN, IRFU, JLAB, KEK, LBNL, NDGF, NIKHEF, PIC, RAL, SLAC, TRIUMF, as well as many other research labs and numerous universities from all over the world.

The workshop is hosted by the Academia Sinica Grid Computing Centre (ASGC), Taipei, Taiwan.

Co-located events

The International Symposium on Grids and Clouds 2023, beginning on Sunday, March 19th, is also organized by ASGC.

    • 9:00 AM
      Registration 1F Conference Room

    • Miscellaneous 1F Conference Room

      Conveners: Peter van der Reest, Tomoaki Nakamura
    • Site Reports 1F Conference Room

      Conveners: Andreas Petzold (KIT - Karlsruhe Institute of Technology (DE)), Dr Sebastien Gadrat (CCIN2P3 - Centre de Calcul (FR))
      • 3
        ASGC site report

        Speakers: Felix Hung-Te Lee (Academia Sinica (TW)), Ms Jingya You (ASGC)
      • 4
        CERN site report

        News from CERN since the last HEPiX workshop. This talk gives a general update on services in the CERN IT department.

        Speaker: Jarek Polok (CERN)
      • 5
        PIC report

        This is the PIC report for the HEPiX Spring 2023 Workshop.

        Speaker: Jose Flix Molina (CIEMAT - Centro de Investigaciones Energéticas Medioambientales y Tec. (ES))
      • 6
        KEK Site Report

        The KEK Central Computer System (KEKCC) is a computer service and facility that provides large-scale computer resources, including Grid and Cloud computing systems and essential IT services, such as e-mail and web services.

        Following the procurement policy for large-scale computer systems requested by the Japanese government, we replace the entire system once every four or sometimes five years. The current system replaced the previous one and has been in production since September 2020; its decommissioning is planned to begin in Q3 of 2024.

        During about 30 months of operation of the current system, we have decommissioned some legacy Grid services, such as LFC, and migrated some Grid services to a newer operating system, CentOS 7. In this talk, we share our experiences and challenges regarding the Grid services provided by the KEKCC. We will also review the ongoing activity to enable Grid services in a token-only environment.

        Speaker: Go Iwai (KEK)
    • 11:30 AM
      Coffee break 1F Conference Room

    • Site Reports 1F Conference Room

      Conveners: Andreas Petzold (KIT - Karlsruhe Institute of Technology (DE)), Dr Sebastien Gadrat (CCIN2P3 - Centre de Calcul (FR))
      • 7
        IHEP Site Report

        Site report from IHEP, CAS, covering the status of computing platform construction, Grid services, networking, storage and more since the last workshop report.

        Speaker: Xuantong Zhang (Chinese Academy of Sciences (CN))
      • 8
        DESY site report

        An overview of developments at DESY.

        Speaker: Peter van der Reest
      • 9
        FZU (Prague) Site Report

        The usual site report.

        Speaker: Jiri Chudoba (Czech Academy of Sciences (CZ))
      • 10
        AGLT2 Site Report Spring 2023

        We will present an update on our site since the Fall 2021 report, covering our changes in software, tools and operations.

        Details covered include our recent hardware purchases, our network upgrades, and our preparations to select and implement our next operating system and associated provisioning systems. We will also discuss our work with Elasticsearch and our efforts to implement the WLCG Security Operations Center components. We conclude with a summary of what has worked, the problems we encountered, and directions for future work.

        Speaker: Shawn Mc Kee (University of Michigan (US))
    • 1:00 PM
      Lunch break 1F Conference Room

    • Basic IT Services & End User Services 1F Conference Room

      Conveners: Dennis van Dok (Nikhef), Erik Mattias Wadenstein (University of Umeå (SE)), Jingyan Shi (IHEP)
      • 11
        Dynamic Deployment of Data Collection and Analysis Stacks at CSCS

        Complexity and scale of systems are increasing rapidly, and the amount of related monitoring and accounting data grows accordingly.

        Managing this vast amount of data is a challenge that CSCS solved by introducing a Kubernetes cluster dedicated to dynamically deploying data collection and analysis stacks comprising Elastic Stack, Kafka and Grafana, both for internal usage and for external customers' use cases.

        This service has proved crucial at CSCS for providing correlation of events and meaningful insights from event-related data: bridging the gap between computational workload and resource status enables failure diagnosis, telemetry and effective collection of accounting data.

        Currently the main production Elastic Stack at CSCS handles more than 200 billion online documents. The integrated environment, from data collection to visualization, lets internal and external users produce their own powerful dashboards and monitoring displays that are fundamental for their data analysis needs.

        Speaker: Mr Dino Conciatore (CSCS (Swiss National Supercomputing Centre))
      • 12
        Monitoring Windows Server Infrastructure using Open Source products

        More than 500 servers are actively managed by the Windows Infrastructure team on the CERN site. These servers run critical services for the laboratory, such as controlling some of the accelerator's most critical systems through Terminal Servers, managing all CERN users and computers registered in Active Directory, hosting accelerator designs in DFS storage, and enabling engineering software licensing. Full-time visibility of their state is critical for smooth operation of the laboratory. In 2021, in the context of replacing Microsoft System Center Configuration Manager as an in-depth Windows host monitoring system, a project was launched to implement a lightweight open-source Icinga2 ecosystem. This presentation will describe the implementation of this system and the technical choices and configurations made to transparently deploy and manage the Icinga2 infrastructure across the Windows Infrastructure at CERN.

        Speaker: Mr Pablo Martin Zamora (CERN)
      • 13
        The unified identity authentication platform for IHEP

        The Institute of High Energy Physics (IHEP) of the Chinese Academy of Sciences is a comprehensive research base in China engaged in high-energy physics research, advanced accelerator physics and technology research, development and utilization, and advanced ray technology and its applications.
        The Single Sign-On (SSO) system at IHEP has more than 22,000 users, about 3,200 computing cluster (AFS) users, more than 150 web applications, and more than 10 client applications. As IHEP has developed, international cooperation has become more and more frequent, which motivated the creation of the IHEP SSO system.
        The IHEP SSO system integrates all personnel systems and AFS user accounts. It has realised access to the Chinese identity federation CARSI and the international federation eduGAIN, providing not only unified account management within the institute but also, progressively, authentication for domestic universities and international organisations.

        Speaker: Qi Luo (Computing Center, Institute of High Energy Physics, CAS)
    • 3:45 PM
      Coffee break 1F Conference Room

    • Basic IT Services & End User Services 1F Conference Room

      Conveners: Dennis van Dok (Nikhef), Erik Mattias Wadenstein (University of Umeå (SE)), Jingyan Shi (IHEP)
      • 14
        Status of CERN Authentication and Authorisation

        Authentication and Authorisation is the core service for securing access to computing resources at any large-scale organisation. At CERN we handle around 25,000 logins per day from 35,000 individual users, granting them access to more than 9,000 applications and websites that use the organisation's Single Sign-On (SSO). To achieve this, we have built an Identity and Access Management platform based on open-source and commercial software. CERN also has many different needs and use cases, which had to be addressed by adapting or extending existing solutions and protocols; these include machine-to-machine automated authentication, CLI access and two-factor authentication (2FA). We will describe our authentication landscape and focus on key challenges that we hope will be relevant for other communities.

        Speaker: Asier Aguado Corman (CERN)
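        As an illustration of the machine-to-machine authentication mentioned in the abstract above, the sketch below shows a generic OAuth2 client-credentials token request in Python. It is not CERN's implementation; the token endpoint, client ID and secret are placeholders for whatever a given SSO deployment exposes.

        import requests

        # Placeholder token endpoint of a Keycloak-style SSO; adjust host and realm.
        TOKEN_URL = "https://auth.example.org/realms/example/protocol/openid-connect/token"

        def get_service_token(client_id: str, client_secret: str) -> str:
            """Exchange client credentials for a bearer access token (standard OAuth2 flow)."""
            resp = requests.post(
                TOKEN_URL,
                data={
                    "grant_type": "client_credentials",
                    "client_id": client_id,
                    "client_secret": client_secret,
                },
                timeout=10,
            )
            resp.raise_for_status()
            return resp.json()["access_token"]

        if __name__ == "__main__":
            token = get_service_token("my-service", "s3cr3t")  # placeholder credentials
            # Send as "Authorization: Bearer <token>" to any API protected by the SSO.
            print(token[:20], "...")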
      • 15
        Federated ID and Token Transition Status

        We present the current status of and roadmap for federated ID via COmanage Registry at CILogon, as well as progress on implementing token-based services at the Brookhaven National Laboratory Scientific Data and Computing Center.

        Speaker: Robert Hancock
    • 6:30 PM
      Welcome Reception
    • 9:00 AM
      Registration 1F Conference Room

    • Grid, Cloud & Virtualisation and Operating Systems 1F Conference Room

      Conveners: Andreas Haupt (DESY), Ian Collier (Science and Technology Facilities Council STFC (GB)), Tomoaki Nakamura
      • 16
        Cloud Infrastructure Update: Operations, Campaigns and Evolution

        CERN has been running an OpenStack-based private cloud infrastructure in production since 2013. This presentation will give a status update on the deployment and then dive into specific topics, such as the 12-month effort to replace the network control plane for 4,400 virtual machines, or the live-migration machinery used for interventions and reboot campaigns.

        Speaker: Maryna Savchenko (CERN)
      • 17
        Providing ARM and GPU resources in the CERN Private Cloud Infrastructure

        The CERN Cloud Infrastructure service has recently commissioned a set of ARM and GPU nodes as hypervisors. This presentation will cover the steps required to prepare the provisioning of ARM-based VMs: the creation of multi-arch Docker images for our GitLab pipelines, the preparation of ARM user images, and adaptations to the PXE and Ironic setup to manage this additional architecture. We will also give an overview of Cloud GPU resources and explain the differences between PCI passthrough, vGPU and Multi-Instance GPU.

        Speaker: Maryna Savchenko (CERN)
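        For illustration only, the hedged sketch below shows how an ARM-based VM might be requested through the OpenStack SDK once aarch64 images and flavors are offered, as described in the abstract above; the cloud, image, flavor and network names are placeholders rather than actual CERN resource names.

        import openstack

        # Connect using credentials from a local clouds.yaml entry (placeholder name).
        conn = openstack.connect(cloud="mycloud")

        # Boot a VM from a hypothetical aarch64 image onto a flavor mapped to ARM hosts.
        server = conn.create_server(
            name="arm-test-vm",
            image="almalinux9-aarch64",   # placeholder aarch64 image name
            flavor="m2.medium",           # placeholder flavor
            network="my-network",         # placeholder network
            wait=True,
        )
        print(server.name, server.status)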
      • 18
        Kubernetes cluster for Helmholtz users

        The Helmholtz Association's federated IT platform HIFIS enables the individual Helmholtz centres to share IT services and resources for the benefit of all users in the association. For that purpose, a central service catalog - the Helmholtz cloud portal - lists all these services so that scientists, technicians and administrators can make use of them. In this context, DESY offers access to a Kubernetes platform managed by Rancher for all users, with the clear purpose of letting them test and try their application deployments on Kubernetes without having to pay for resources at a commercial provider.

        Using Kubernetes as a development and deployment platform for web-based applications has become a de-facto standard in industry as well as in the cloud-based open-source community. We observe that many useful services and tools can easily be deployed on Kubernetes if one has a cluster at hand, which is not as commonplace at the moment as it could be. By offering Kubernetes to the Helmholtz Association's members, DESY hopes to contribute to a more widespread adoption of modern cloud-based workflows in science and its surroundings. The abstraction layer Kubernetes offers makes for more reusable software in the long run, which would be beneficial to the whole scientific community.

        In our presentation at the HEPiX workshop, we will show how we deploy our Kubernetes clusters using Rancher and other tools, and which applications we deem necessary to achieve basic usability of the clusters. A major part of the presentation will cover the integration of the clusters with the Helmholtz AAI, resource management, and integration with development workflows. Finally, we will highlight use cases from different Helmholtz centres that already make use of our clusters, and how they gained access to and were introduced to the platform.

        Speaker: Tim Wetzel
    • 11:15 AM
      Coffee break 1F Conference Room

    • Grid, Cloud & Virtualisation and Operating Systems 1F Conference Room

      Conveners: Andreas Haupt (DESY), Ian Collier (Science and Technology Facilities Council STFC (GB)), Tomoaki Nakamura
      • 19
        Fully automated: Updates on the Continuous Integration for supported Linux distributions at CERN

        Historically, the release processes for supported CERN Linux distributions involved tedious manual procedures that were prone to human error. In addition, as a knock-on effect of the turmoil created in 2020 by the CentOS Linux 8 end-of-life announcement, the CERN Linux team is now required to support an increasing number of Linux distributions.
        To cope with this additional workload (currently 8 Linux distributions: CC7, CS8, CS9, RHEL7, RHEL8, RHEL9, ALMA8, ALMA9), our team has adopted full-scale automation, testing and continuous integration, all whilst significantly reducing the need for human intervention.
        Automation can now be found in every part of our process: cloud and Docker image building, baseline testing, CERN-specific testing and full-stack functional testing. For this we use a combination of GitLab CI capabilities, Koji, the OpenStack Nova and Ironic central services, Nomad, and a healthy dose of Python and Bash. Test suites now cover unmanaged, managed (Puppet), virtual and physical machines, which allows us to certify that our next image release continues to meet the needs of the organization.

        Speaker: Ben Morrice (CERN)
      • 20
        Update on the Linux Strategy for CERN (and WLCG)

        This presentation will be a follow-up to the presentation and “Linux Strategy” BoF in Umeå and will summarise the latest evolution of the strategy for Linux at CERN (and WLCG): a recap of the situation (e.g. the issues with Stream or the changes to RHEL licensing), a presentation of the agreed strategy, as well as insights into the decision-making process (in particular for the choice of AlmaLinux as the EL rebuild).

        Speaker: Ben Morrice (CERN)
    • Site Reports 1F Conference Room

      Conveners: Andreas Petzold (KIT - Karlsruhe Institute of Technology (DE)), Dr Sebastien Gadrat (CCIN2P3 - Centre de Calcul (FR))
    • Photo session 1F Conference Room

    • 1:15 PM
      Lunch break 1F Conference Room

    • Grid, Cloud & Virtualisation and Operating Systems 1F Conference Room

      Conveners: Andreas Haupt (DESY), Ian Collier (Science and Technology Facilities Council STFC (GB)), Tomoaki Nakamura
      • 22
        The WLCG Journey at CSCS: from Piz Daint to Alps

        The Swiss National Supercomputing Centre (CSCS), in close collaboration with the Swiss Institute of Particle Physics (CHIPP), provides the Worldwide LHC Computing Grid (WLCG) project with cutting-edge HPC and HTC resources. These are reachable through a number of Computing Elements (CEs) that, along with a Storage Element (SE), characterise CSCS as a Tier-2 Grid site. The current flagship system, an HPE Cray XC named Piz Daint, has been the platform where all the computing requirements for the Tier-2 have been met for the last 6 years. With the commissioning of the future flagship infrastructure, an HPE Cray EX referred to as Alps, CSCS is gradually moving the computational resources to the new environment. The Centre has been investing heavily in the concept of Infrastructure as Code (IaC) and is embracing the multi-tenancy paradigm for its infrastructure. As a result, the project leverages modern approaches and technologies borrowed from the cloud to perform a complete re-design of the service. During this process, Kubernetes, Harvester, Rancher and ArgoCD have played a leading role, providing CSCS with enhanced flexibility in terms of the orchestration of clusters and applications. This contribution describes the journey, design choices, and challenges encountered along the way to implementing the new WLCG platform, from which other projects, such as the Cherenkov Telescope Array (CTA) and the Square Kilometre Array (SKA), also profit.

        Speaker: Dr Riccardo Di Maria (CERN)
      • 23
        An Update from the SLATE Project

        We will provide an update on the SLATE project (https://slateci.io), an NSF funded effort to securely enable service orchestration in Science DMZ (edge) networks across institutions. The Kubernetes-based SLATE service provides a step towards a federated operations model, allowing innovation of distributed platforms, while reducing operational effort at resource providing sites. The SLATE project is in its last year and is working to wrap up while also preparing for what comes next.

        The presentation will cover our recent efforts, including transitioning to Kubernetes (k8s) 1.24 and adding OpenTelemetry, revising our build and update system, and augmenting our catalog of applications in preparation for future possibilities.

        Speaker: Shawn Mc Kee (University of Michigan (US))
    • Board Meeting (closed session) 1F Conference Room

    • 9:00 AM
      Registration 1F Conference Room

    • Networking & Security 1F Conference Room

      Conveners: David Kelsey (Science and Technology Facilities Council STFC (GB)), Shawn Mc Kee (University of Michigan (US))
      • 24
        CERN Computer Centre(s) Network evolution: achievements during the last 3 years and expected next steps

        During HEPiX Fall 2019, a CERN presentation explained our plans to prepare the Computer Centre network for LHC Run 3. This year’s presentation will explain what has been achieved and the forthcoming steps. Points to be covered include:
        - the current CERN data centre network architecture,
        - how we handled a full data centre network migration during the COVID-19 lockdown period,
        - how the connections between the main data centre and other CERN sites evolved (including a 2.4 Tbps link for the ALICE O2 setup, and a total of 2.1 Tbps of connectivity to containers located at the LHCb site),
        - the new tools and features we introduced (Zero Touch Provisioning for Juniper switches, VLAN support up to the ToR),
        - the issues we faced and how we handled them (a bug affecting DHCPv6, hardware delivery delays),
        - data centre network plans for 2023, and
        - the network setup and features to be deployed in the new Prévessin Computer Centre.

        Speaker: Vincent Ducret (CERN)
      • 25
        CERN Computer Centre(s) Network evolution: achievements during the last 3 years and expected next steps
        Speaker: Vincent Ducret (CERN)
    • Basic IT Services & End User Services 1F Conference Room

      Conveners: Dennis van Dok (Nikhef), Erik Mattias Wadenstein (University of Umeå (SE)), Jingyan Shi (IHEP)
      • 26
        Token based solutions for SSH with OIDC

        OIDC (OpenID Connect) is widely used for transforming our digital infrastructures (e-Infrastructures, HPC, storage, cloud, ...) into the token-based world.

        OIDC is an authentication protocol that allows users to be authenticated with an external, trusted identity provider. Although typically meant for web-based applications, there is an increasing need to integrate shell-based services.

        This contribution gives an overview of several tools, each of which addresses a specific aspect of using tokens on the command line in production services:

        oidc-agent is the tool for obtaining OIDC access tokens on the command line. It focuses on security while providing ease of use at the same time. The agent runs on a user's workstation or laptop and is well integrated with the graphical user interfaces of several operating systems, such as Linux, macOS, and Windows. Advanced features include agent forwarding, which allows users to securely obtain access tokens on remote machines to which they are logged in.

        mytoken is both a server software and a new token type. Mytokens allow obtaining access tokens for long time spans, of up to multiple years. It introduces the concepts of "capabilities" and "restrictions" to limit the power of long-lived tokens. It is designed to solve difficult use cases such as computing jobs that are queued for hours before they run for days. Running (and storing the output of) such a job is straightforward, reasonably secure, and fully automatable using mytoken.

        pam-ssh-oidc is a PAM module that allows accepting access tokens in the Unix pluggable authentication system. This enables the use of access tokens, for example, in SSH sessions or in other Unix applications such as su. Our PAM module allows verification of the access token via OIDC or via third-party REST interfaces.

        motley-cue is a REST-based service that works together with pam-ssh-oidc to validate access tokens. In addition to validating access tokens, motley-cue may - depending on the enabled features - perform additional useful steps in the "SSH via OIDC" use case. These include:
        - authorisation based on VO membership,
        - authorisation based on identity assurance,
        - dynamic user creation,
        - one-time-password generation (in case the access token is too long for the SSH client used),
        - account provisioning via a plugin-based system (interfacing with local Unix accounts, LDAP accounts, and external REST interfaces), and
        - account blocking (by authorised administrators in case of a security incident).

        mccli is a client-side tool that enables clients that normally do not support OIDC access tokens to use them. Currently, ssh, sftp and scp are the supported protocols.

        The oidc-plugin for PuTTY makes use of the new PuTTY plugin interface to use access tokens for authentication whenever an SSH server supports it. The plugin interfaces with oidc-agent for Windows to obtain tokens.

        The combination of the tools presented allows creative new ways of using the new token-based AAIs with old and new tools. Given enough time, this contribution will include live demos for all of the presented tools.

        Speaker: Marcus Hardt (Karlsruhe Institute of Technology)
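        To make the "SSH via OIDC" flow described in the abstract above concrete, here is a minimal, hypothetical sketch (not the authors' own tooling): it fetches an access token from a locally running oidc-agent via its oidc-token command and presents it in place of a password to an SSH server protected by pam-ssh-oidc/motley-cue. The host name, user name and oidc-agent account short name are placeholders.

        import subprocess
        import paramiko  # third-party SSH client library

        def get_access_token(account: str = "myprovider") -> str:
            """Ask a running oidc-agent for a fresh access token (placeholder account name)."""
            result = subprocess.run(
                ["oidc-token", account], check=True, capture_output=True, text=True
            )
            return result.stdout.strip()

        def ssh_with_token(host: str, user: str) -> None:
            token = get_access_token()
            client = paramiko.SSHClient()
            client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
            # pam-ssh-oidc accepts the access token where a password would normally go.
            client.connect(host, username=user, password=token, look_for_keys=False)
            _, stdout, _ = client.exec_command("id")
            print(stdout.read().decode())
            client.close()

        if __name__ == "__main__":
            ssh_with_token("login.example.org", "someuser")  # placeholder host and user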
    • 11:15 AM
      Coffee break 1F Conference Room

    • Networking & Security 1F Conference Room

      Conveners: David Kelsey (Science and Technology Facilities Council STFC (GB)), Shawn Mc Kee (University of Michigan (US))
      • 27
        HEPiX IPv6 Working Group - Update

        The transition of WLCG storage services to dual-stack IPv6/IPv4 is nearing completion. Monitoring of data transfers shows that many happen over IPv6 today, but it is still true that many do not! The agreed endpoint of the WLCG transition to IPv6 remains the deployment of IPv6-only services, thereby removing the complexity and security concerns of operating dual stacks. The HEPiX IPv6 working group is investigating the obstacles to the use of IPv6 in WLCG. This talk will present our recent activities, including investigations into the reasons behind the ongoing use of IPv4.

        Speaker: Bruno Heinrich Hoeft (KIT - Karlsruhe Institute of Technology (DE))
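        As an illustration of the kind of check such investigations involve (a generic sketch, not the working group's tooling), the snippet below tests whether a service endpoint publishes an AAAA record and accepts a TCP connection over IPv6; host and port are placeholders.

        import socket

        def ipv6_reachable(host: str, port: int = 443, timeout: float = 5.0) -> bool:
            """Return True if at least one IPv6 address of `host` accepts a TCP connection."""
            try:
                infos = socket.getaddrinfo(host, port, socket.AF_INET6, socket.SOCK_STREAM)
            except socket.gaierror:
                return False  # no AAAA record published
            for family, socktype, proto, _, sockaddr in infos:
                try:
                    with socket.socket(family, socktype, proto) as sock:
                        sock.settimeout(timeout)
                        sock.connect(sockaddr)
                        return True
                except OSError:
                    continue  # try the next address
            return False

        if __name__ == "__main__":
            print("www.example.org IPv6 reachable:", ipv6_reachable("www.example.org"))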
      • 28
        Status and Plans for the Research Networking Technical WG

        The high-energy physics community, along with the WLCG sites and Research and Education (R&E) networks have been collaborating on network technology development, prototyping and implementation via the Research Networking Technical working group (RNTWG) since early 2020.

        As the scale and complexity of the current HEP network grows rapidly, new technologies and platforms are being introduced that greatly extend the capabilities of today’s networks. With many of these technologies becoming available, it’s important to understand how we can design, test and develop systems that could enter existing production workflows while at the same time changing something as fundamental as the network that all sites and experiments rely upon.

        In this talk we’ll give an update on the Research Networking Technical working group activities, challenges and recent updates. In particular we’ll focus on the flow labeling and packet marking technologies (scitags), the new effort on packet pacing and related tools and approaches that have been identified as important first steps for the work of the group.

        Speaker: Shawn Mc Kee (University of Michigan (US))
      • 29
        perfSONAR Global Monitoring and Analytics Framework Update

        WLCG relies on the network as a critical part of its infrastructure and therefore needs to guarantee effective network usage and prompt detection and resolution of any network issues, including connection failures, congestion and traffic routing. The IRIS-HEP/OSG-LHC Networking Area is a partner of the WLCG effort and is focused on being the primary source of networking information for its partners and constituents. We will report on the changes and updates that have occurred since the last HEPiX meeting.

        We will cover the status of, and plans for, the evolution of the WLCG/OSG perfSONAR infrastructure, as well as the new, associated applications that analyze and alert upon the metrics that are being gathered.

        Speaker: Shawn Mc Kee (University of Michigan (US))
    • 1:00 PM
      Lunch break 1F Conference Room

    • Show Us Your ToolBox 1F Conference Room

      Convener: Peter van der Reest
    • Networking & Security 1F Conference Room

      • 30
        Computer Security Landscape Update

        This presentation provides an update on the global security landscape since the last HEPiX meeting. It describes the main vectors of risk and compromise in the academic community, including lessons learnt, and presents interesting recent attacks while providing recommendations on how best to protect ourselves.

        Speaker: Liviu Valsan (CERN)
    • 6:30 PM
      Gala dinner
    • 9:00 AM
      Registration 1F Conference Room

    • Storage and Filesystems 1F Conference Room

      Conveners: Ofer Rind (Brookhaven National Laboratory), Peter van der Reest
      • 31
        Migrating DPM to the federated NDGF-T1 dCache at the UNIBE-LHEP ATLAS Tier-2

        Following the DPM EOL announcement, we have considered options for transitioning to a supported Grid storage solution. Our choice has been to integrate our site storage in Bern with the NDGF-T1 distributed dCache environment. In this presentation we outline the options considered, motivate our choice, and give a summary of the technical implementation as experienced from the remote-site point of view.

        Speaker: Francesco Giovanni Sciacca (Universitaet Bern (CH))
      • 32
        Operating a federated dCache system

        Over the years, we have built a federated dCache over multiple sites. As of late, we have integrated a couple of new sites, as well as improved our automation and monitoring. This talk will focus on the current state of dCache deployment, administration, and monitoring at NDGF-T1.

        Speaker: Erik Mattias Wadenstein (University of Umeå (SE))
      • 33
        Status of CERN Tape Archive operations during Run3

        The CERN Tape Archive (CTA) is CERN’s physics data archival storage solution for Run-3. Since 2020, CTA has been progressively ramping up to serve all the LHC and non-LHC tape workflows which were previously handled by CASTOR. 2022 marked a very successful initial Run-3 data taking campaign on CTA, reaching the nominal throughput of 10 GB/s per experiment and setting new monthly records of archived data volume.

        In this presentation, we review key production service lessons learnt during the beginning of Run-3. The transition of Tier-0 tape from a Hierarchical Storage Model (HSM) to a pure tape endpoint has enabled multiple optimisations, initially targeting low-latency archival write performance of DAQ data. We will discuss how the redesign of tape workflows has led to gains in terms of performance and automation, and present several ongoing activities which improve production feedback to the experiments. Finally, we will present upcoming challenges for CTA, as tape storage is becoming “warmer” in all workflows.

        Speaker: Julien Leduc (CERN)
    • 11:15 AM
      Coffee break 1F Conference Room

    • Storage and Filesystems 1F Conference Room

      Conveners: Ofer Rind (Brookhaven National Laboratory), Peter van der Reest
      • 34
        CERN Storage Services Status and Outlook for 2023

        CERN Storage Services are key enablers for CERN IT workflows: from scientific collaboration and analysis on CERNBox, to LHC data taking with CTA and EOS, to fundamental storage for cloud-native and virtualized applications.

        On the physics storage side, 2022 was marked by the absence of a Heavy Ion run. The energy crisis resulted in a shorter LHC run than anticipated, and the impact will continue to be felt during 2023. Consequently, the proton run will target a higher integrated luminosity for most experiments, and there is a likelihood of an extended Heavy Ion run. We will review the impact of the physics planning changes on the storage infrastructure and how CERN storage is preparing for the resulting increased data rates in 2023, notably the ALICE storage workflows through EOS ALICE O2 and EOSCTA ALICE.

        We will review progress in HTTP protocol activities for T0 transfers as well as synergies on transverse functionalities, such as monitoring.

        Physics storage successfully migrated to EOS5 during 2022; 2023 is the migration year for all the other services running on EOS software, notably the EOSCTA and CERNBox infrastructures. We summarise plans and evolution for CERNBox, building on the refreshed platform reported at the last meeting.

        CERN maintains a large Ceph installation and we report on recent evolutions, including investments in delivering a high(er) available service and in hardening upstream features for backups and recovery.

        Last but not least, we will report on the storage group migration strategy out of CentOS7 which will be implemented in 2023.

        Speaker: Julien Leduc (CERN)
      • 35
        Archive Storage for Project sPHENIX

        How we make use of tape systems to support big data processing for Project sPHENIX.
        Project sPHENIX has a projected data volume of 650 PB through 2026. Using the tape system smartly will provide us with fast and reliable storage and lower the storage cost.

        Speaker: Mr T. Chou (Brookhaven National Lab)
    • 12:35 PM
      Lunch break 1F Conference Room

    • Computing and Batch Services 1F Conference Room

      Conveners: Michel Jouvin (Université Paris-Saclay (FR)), Dr Michele Michelotto (Universita e INFN, Padova (IT))
      • 36
        HEPscore: a new benchmark for WLCG compute resources

        The HEPiX CPU Benchmark Working Group has developed a new CPU benchmark, called HEPScore, based on HEP applications. HEPScore will replace the HEPSpec06 benchmark that is currently used by the WLCG for accounting and resource pledges. The new benchmark will be based on contributions by many WLCG experiments and will be able to run on x86 and ARM processor systems. We present the results that led to the current candidate for the HEPScore benchmark, which is expected to be released for production use in April 2023. We will briefly describe the transition plan for migrating from the current HEPSpec06 benchmark to HEPScore in 2023 and 2024. In addition, the current interest in reducing electricity consumption and minimizing the carbon footprint of HEP computing has focused the community on producing workloads that can run on ARM processors. We highlight some of the early results of these studies and the effort by the Working Group to include power utilization information in the summary information.

        Speaker: Randall Sobie (University of Victoria (CA))
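        Purely as an illustration of how a composite benchmark of this kind aggregates its inputs (the workload names and numbers below are invented, and the real HEPScore configuration may differ), a geometric mean over per-workload scores can be computed as follows:

        import math

        def composite_score(workload_scores: dict) -> float:
            """Aggregate per-workload scores into a single figure via a geometric mean."""
            logs = [math.log(score) for score in workload_scores.values()]
            return math.exp(sum(logs) / len(logs))

        if __name__ == "__main__":
            scores = {                      # invented example numbers
                "atlas-gen": 12.3,
                "cms-reco": 10.8,
                "lhcb-sim": 11.5,
                "belle2-gen-sim-reco": 13.1,
            }
            print(f"composite score: {composite_score(scores):.2f}")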
      • 37
        Job Accounting for HTCondor with Heterogeneous Systems

        While ARC-CE and other systems allow for a single HEPSPEC06 value in their configuration that is used when reporting accounting information to APEL/EGI, sites usually have more than one kind of system. In such heterogeneous systems an average needs to be used, which cannot reflect the real CPU usage of jobs, especially when running jobs for different VOs and with different run times. In such cases it would be better to use a HEPSPEC06 value per job, reflecting the real system where the job ran.
        In this presentation we will show an easy solution that stores a HEPSPEC06 value together with the job information in HTCondor's job history and uses the Condor job history to report to APEL. While there may be other solutions out there, developed over the last years, we think this may still be useful for other sites too. This solution may also be interesting once the new HEPScore benchmark is used for accounting, where one could run parts of it within a short time, for example at boot time of a VM.

        Speaker: Dr Marcus Ebert (University of Victoria)
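        A hedged sketch of the idea described in the abstract above (not the presenters' actual implementation): if each startd advertises a HEPSPEC06-per-core figure and the job ad records it (for example through SYSTEM_JOB_MACHINE_ATTRS), the HTCondor Python bindings can scale each history record individually instead of applying one site-wide average. The attribute name used below is an assumption.

        import htcondor

        def hs06_walltime_hours(constraint: str = "JobStatus == 4", limit: int = 1000) -> float:
            """Sum HEPSPEC06-weighted wall-clock hours over recent finished jobs."""
            schedd = htcondor.Schedd()
            total = 0.0
            for ad in schedd.history(
                constraint,
                projection=["RemoteWallClockTime", "RequestCpus",
                            "MachineAttrHEPSPEC_PER_CORE0"],  # assumed attribute name
                match=limit,
            ):
                walltime_h = float(ad.get("RemoteWallClockTime", 0)) / 3600.0
                cpus = int(ad.get("RequestCpus", 1))
                hs06_per_core = float(ad.get("MachineAttrHEPSPEC_PER_CORE0", 0))
                total += walltime_h * cpus * hs06_per_core
            return total

        if __name__ == "__main__":
            print(f"HS06-weighted wall-clock hours: {hs06_walltime_hours():.1f}")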
      • 38
        Cloudscheduler V.2 from an HTCondor admin's point of view

        We developed a new version of Cloudscheduler, on which we have reported before from a technical point of view. Cloudscheduler is a system that manages VM resources on demand on local and remote compute clouds depending on job requirements and makes those VMs available to HTCondor pools. Via cloud-init and YAML files, VMs can be provisioned depending on the needs of a VO.
        In this presentation, we will focus on our experience with the new Cloudscheduler running HTCondor jobs for HEP Grid workloads (ATLAS and Belle II), astronomy (DUNE) and HEP non-Grid jobs (BaBar) in a distributed HTCondor environment, from a user and administrator point of view. We will show how it integrates with an existing HTCondor system, how it can be used to extend an existing pool with cloud resources when needed (for example in times of high demand or during downtimes of bare-metal worker nodes), and how the system usage is monitored.

        Speaker: Dr Marcus Ebert (University of Victoria)
    • 3:20 PM
      Coffee break 1F Conference Room

    • Computing and Batch Services 1F Conference Room

      Conveners: Michel Jouvin (Université Paris-Saclay (FR)), Dr Michele Michelotto (Universita e INFN, Padova (IT))
      • 39
        Towards energy efficient compute clusters at DESY

        Environmental and political constraints have made energy usage a top priority. As scientific computing draws significant power, sites have to adapt to the changing conditions and need to optimize their clusters' utilization and energy consumption.
        We present the current status of our endeavour to make DESY's compute clusters more energy efficient. With a broad mix of compute users with various use cases, as well as other power consumers on site, DESY faces a number of different power usage profiles, but also the opportunity to integrate these into a more power-efficient holistic approach. For example, age-dependent load shedding of worker nodes combined with opportunistic utilization of unallocated compute resources will allow for a breathing adaptation to the dynamic supply of green energy from offshore wind farms.

        Speakers: Andreas Haupt (DESY), Christoph Beyer, Kai Leffhalm (Deutsches Elektronen-Synchrotron (DE)), Krunoslav Sever (Deutsches Elektronen-Synchrotron DESY), Thomas Hartmann (Deutsches Elektronen-Synchrotron (DE)), Yves Kemp
      • 40
        Expand local cluster to the worker node of the remote site

        The Large High Altitude Air Shower Observatory (LHAASO) is a large-scale astrophysics experiment led by China. All the experiment data is stored in the local EOS file system at the Institute of High Energy Physics (IHEP) and processed by the IHEP local HTCondor cluster. Since the volume of experiment data has increased rapidly, the CPU cores of the local cluster are no longer sufficient to support the data processing.
        As the LHAASO collaboration groups' resources are geographically distributed and most of them are of limited scale, with low stability and a lack of human support, it is difficult to integrate them via the Grid. We designed and developed a system to expand the LHAASO local cluster to remote sites. The system keeps the IHEP cluster as the main cluster and extends it to the worker nodes of the remote site based on HTCondor startd automatic cluster joining. LHAASO jobs are submitted to the IHEP cluster and dispatched to the remote worker nodes by the system. We classified LHAASO jobs into several types and wrapped them with dedicated scripts so that the jobs have no direct access to the IHEP local file system. The user's token is wrapped and transferred with the job to the remote worker nodes. About 125 worker nodes with 4k CPU cores at the remote site have joined the IHEP LHAASO cluster so far and have produced 700 TB of simulation data in 6 months.

        Speaker: Jingyan Shi (IHEP)
    • Miscellaneous 1F Conference Room
