KEK is promoting various accelerator science projects by fully utilizing the electron accelerator in Tsukuba and the proton accelerator in Tokai.
These projects require a large amount of data processing, and our central computing system, KEKCC, plays a key role in their success. KEKCC also operates as part of the Grid system, which is essential to the Belle II project.
We...
Evolution of the NCG-INGRID-PT site and future perspectives.
New developments in the distributed Nordic Tier-1 and its participating sites.
An update on recent developments at the Scientific Computing and Data Facilities (SCDF) at BNL.
An update on recent advancements at the US ATLAS SouthWest Tier2 Center (UTA/OU).
This report introduces the LHCb Tier-2 site at Lanzhou University (LZU-T2), which is a major new computing resource designed to support the LHCb experiment. It is part of the Worldwide LHC Computing Grid, which distributes data processing and storage across a network of international computing centers. The LZU-T2 site plays a critical role in processing, analyzing, and storing the vast amounts...
News from CERN since the last HEPiX workshop. This talk gives a general update from services in the CERN IT department.
EOS is an open-source storage system developed at CERN that is used as the main platform to store LHC data. The architecture of the EOS system has evolved over the years to accommodate ever more diverse use-cases and performance requirements coming both from the LHC experiments as well as from the user community running their analysis workflows on top of EOS. In this presentation, we discuss...
We report on our experience with the production backup orchestration via “cback”, a tool developed at CERN and used to back up our primary mounted filesystem offerings: EOS (eosxd) and Ceph (CephFS). In a storage system that handles non-reproducible data, a robust backup and restore system is essential for effective disaster recovery and business continuity. When designing a backup solution,...
Even though CERN IT recently commissioned the Prévessin Data Centre (PDC), doubling the organization’s hosting capacity in terms of electricity and cooling, the 50-year-old Meyrin Data Centre (MDC) remains indispensable due to its strategic geographical location and unique electrical power resilience. The Meyrin Data Centre (Building 513) retains an essential role for the CERN Tier-0 Run 4...
On the 30th of June 2024, the end of CentOS 7 support marked a new era for the operation of the multi-petabyte distributed disk storage system used by CERN physics experiments. The EOS infrastructure at CERN comprises approximately 1000 disk servers and 50 metadata management nodes. Their transition from CentOS 7 to Alma 9 was not as straightforward as anticipated.
This presentation...
Traditional filesystems organize data into directories based on a single criterion, such as the starting date of the experiment, experiment name, beamline ID, measurement device, or instrument. However, each file within a directory can belong to multiple logical groups, such as a special event type, experiment condition, or part of a selected dataset. dCache, a storage system designed to...
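The idea of one file belonging to several logical groups at once can be sketched with a simple tag index (an illustration of the concept only, not dCache's actual interface; the paths and tag names are made up):

```python
from collections import defaultdict

class TagIndex:
    """Multiple logical 'views' over the same files, unlike a single
    directory tree where each file lives in exactly one place."""
    def __init__(self):
        self._by_tag = defaultdict(set)

    def add(self, path, *tags):
        for tag in tags:
            self._by_tag[tag].add(path)

    def view(self, tag):
        return sorted(self._by_tag[tag])

idx = TagIndex()
idx.add("/data/run1/f001.h5", "beamline-A", "calibration")
idx.add("/data/run1/f002.h5", "beamline-A", "selected-dataset")
print(idx.view("beamline-A"))   # both files appear in this view
print(idx.view("calibration"))  # only f001 appears here
```

A storage system exposing such views lets each community browse by the criterion that matters to it (beamline, event type, dataset membership) without duplicating the data.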
The NVMe HDD Specification was released back in 2022, but only very early Engineering Demo Units have been created so far, from a single source. That said, the market demand is definitely growing, and the industry must pay attention to the potential TCO and storage stack optimizations that a unified NVMe storage interface could offer. In this session, we will go over the TCO analysis details...
In this presentation we try to give an update on CPU, GPU and AI accelerators in the market today.
The objective of this talk is to share the tentative plan for an energy-efficiency status review by the TechWatch WG. Progress on the primary tasks will be shared: reviewing and understanding industry, market, and technology trends; efforts of WLCG and sites; and the strategy spanning measurement, data collection, analysis, modeling, and estimation. Through this report, the...
Nikhef has recently renovated its building and upgraded almost everything to the latest standards, including the audio/video setup in the new meeting rooms.
This talk will give an insight into the process, from choosing technologies and tendering to installation, testing, and getting everything working. What went wrong and what did not. How you would think that 4K 60Hz is easy these days. Why...
The Cherenkov Telescope Array Observatory (CTAO) is a next-generation ground-based gamma-ray astronomical observatory under construction on two sites, one in the Northern and one in the Southern hemisphere. CTAO telescopes use the atmosphere as a giant detector of high-energy particles. CTAO data contain "events" of extensive air showers of high-energy particles. Most of the showers are induced by charged...
The progress and status of the IHEP site since the last HEPiX workshop.
The Cherenkov Telescope Array Observatory (CTAO) is the next-generation gamma-ray telescope facility, currently under construction.
The CTAO recently reached a set of crucial milestones: it has been established as a European Research Infrastructure Consortium (ERIC), all four Large-Sized Telescopes at the northern site of the Observatory reached key construction milestones, and the first...
The infrastructure monitoring helps to control and monitor in real time the servers and applications involved in the operation of the WLCG Tier-1 center GridKa, including the online and tape storage, the batch system, and the GridKa network.
Monitoring data such as server metrics (CPU, memory, disk, network), storage operations (I/O statistics), or visualizing real-time sensor data such as...
The Swiss National Supercomputing Centre (CSCS) is committed to sustainable high-performance computing. This talk will explore how CSCS leverages lake water for efficient cooling, significantly reducing energy consumption. Additionally, we will discuss the reuse of waste heat to support local infrastructure, demonstrating a practical and efficient approach to sustainability in supercomputing.
This study presents analyses of natural job drainage and power reduction patterns in the PIC Tier-1 data center, which uses HTCondor for workload scheduling. By examining historical HTCondor logs from 2023 and 2024, we simulate natural job drainage: the pattern in which jobs conclude without external intervention. These findings provide...
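As an illustration of what such a simulation involves (a minimal sketch with toy data, not PIC's actual analysis), the drainage curve can be derived from job start/end intervals: freeze new job starts at a cutoff time and count how many already-running jobs remain at each step afterwards.

```python
from datetime import datetime, timedelta

def drainage_profile(jobs, cutoff, step=timedelta(hours=1), horizon=24):
    """Given (start, end) job intervals, count how many jobs that were
    running at `cutoff` are still running at each later step -- the
    natural drainage curve if no new jobs are started."""
    running = [(s, e) for s, e in jobs if s <= cutoff < e]
    return [sum(1 for s, e in running if e > cutoff + h * step)
            for h in range(horizon + 1)]

# Toy example: three jobs with different lifetimes
t0 = datetime(2024, 1, 1)
jobs = [
    (t0 - timedelta(hours=2), t0 + timedelta(hours=1)),
    (t0 - timedelta(hours=5), t0 + timedelta(hours=3)),
    (t0 + timedelta(hours=1), t0 + timedelta(hours=4)),  # starts after cutoff, ignored
]
print(drainage_profile(jobs, t0, horizon=4))  # [2, 1, 1, 0, 0]
```

In a real study the intervals would come from parsed HTCondor history records, and the occupancy counts would be weighted by cores or power draw per slot.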
The CERN Cloud Infrastructure Service provides access to large compute and storage resources for the laboratory, including virtual and physical machines, volumes, file shares, load balancers, etc., across two different data centres. With the recent addition of the Prévessin Data Centre, one of the main objectives of the CERN IT Department is to ensure that all services have up-to-date procedures...
Extending the data presented at the last few HEPiX workshops, we present new measurements on the energy efficiency (HEPScore/Watt) of the recently available AmpereOne-ARM and AMD Turin-x86 machines.
The Technology Watch Working Group, established in 2018 to take a close look at the evolution of the technology relevant to HEP computing, has resumed its activities after a long pause. In this report, we provide an overview of the hardware technology landscape and some recent developments, highlighting the impact on the HEP computing community.
DESY operates the IDAF (Interdisciplinary Data and Analysis Facility) for all science branches: high energy physics, photon science, and accelerator R&D and operations.
The NAF (National Analysis Facility) is an integrated part and has acted as an analysis facility for the German ATLAS and CMS communities as well as the global Belle II community since 2007.
This presentation will show the current...
Transition of German University Tier-2 Resources to HPC Compute and Helmholtz Storage
The March 2022 perspective paper of the German Committee for Elementary Particle Physics proposes a transformation of the provision of computing resources in Germany. In preparation for the HL-LHC, the German university Tier 2 centres are to undergo a transition towards a more resource-efficient and...
Radio astronomers are engaged in an ambitious new project to detect faster, fainter, and more distant astrophysical phenomena using thousands of individual radio receivers linked through interferometry. The expected deluge of data (up to 300 PB per year) poses a significant computational challenge that requires rethinking and redesigning the state-of-the-art data analysis pipelines.
The SKA Observatory is expected to produce up to 600 petabytes of scientific data per year, which would set a new record for data generation within the field of observational astronomy. The SRCNet infrastructure is designed to handle these large volumes of astronomy data, which requires a global network of distributed regional centres for the data- and compute-intensive astronomy use...
The University of Victoria operates a scientific OpenStack cloud for Canadian researchers, and the CA-VICTORIA-WESTGRID-T2 grid site for the ATLAS experiment at CERN. We are shifting both of these service offerings towards a Kubernetes-based approach. We have exploited the batch capabilities of Kubernetes to run grid computing jobs and replace the conventional grid computing elements by...
This presentation provides a detailed overview of the hyper-converged cloud infrastructure implemented at the Swiss National Supercomputing Centre (CSCS). The main objective is to provide a detailed overview of the integration between Kubernetes (RKE2) and ArgoCD, with Rancher acting as a central tool for managing and deploying RKE2 clusters infrastructure-wide.
Rancher is used for direct...
This presentation will explain the network design implemented in the CERN Prévessin Datacentre (built in 2022/2023, in production since February 2024). We will show how, starting from an empty building, the current network best practices could be adopted (and partly adapted to match the specific requirements in terms of interconnection with the rest of the CERN network). We will also provide...
High-Performance Computing (HPC) environments demand extreme speed and efficiency, making cybersecurity particularly challenging. The need to implement security controls without compromising performance presents a unique dilemma: how can we ensure robust protection while maintaining computational efficiency?
This presentation will give an insight into real-world challenges and measures...
The deployment of an Endpoint Detection & Response (EDR) solution at CERN has been a project aimed at enhancing the security posture of endpoint devices. In this presentation we’ll share our infrastructure's architecture and how we rolled out the solution. We will also see how we addressed and overcame challenges on multiple fronts, from administrators' fears to fine-tuning detections and...
This presentation aims to give an update on the global security landscape from the past year. The global political situation has introduced a novel challenge for security teams everywhere. What's more, the worrying trend of data leaks, password dumps, ransomware attacks and new security vulnerabilities shows no sign of slowing down.
We present some interesting cases that CERN and the wider HEP...
The storage needs of CERN’s OpenStack cloud infrastructure are fulfilled by Ceph, which provides diverse storage solutions including volumes with Ceph RBD, file sharing through CephFS, and S3 object storage via Ceph RadosGW. The integration between storage and compute resources is possible thanks to a close collaboration between the OpenStack and Ceph teams. In this talk we review the architecture...
This presentation will start with the evolution of the tape technology market in recent years and the expectations from the INSIC roadmap.
From there, with LHC now in the middle of Run 3, we will reflect on the evolution of our capacity planning vs. increasing storage requirements of the experiments. We will then describe our current tape hardware setup and present our experience with...
The CERN Tape Archive (CTA) software is used for physics archival at CERN and other scientific institutes. CTA’s Continuous Integration (CI) system has been around since the inception of the project, but over time several limitations have become apparent. The migration from CERN CentOS 7 to Alma 9 introduced even more challenges. The CTA team took this as an opportunity to make significant...
The CERN Tape Archive (CTA) is CERN’s Free and Open Source Software system for data archival to tape. Across the Worldwide LHC Computing Grid (WLCG), the tape software landscape is quite heterogeneous, but we are entering a period of consolidation. A number of sites have reevaluated their options and have chosen CTA for their tape archival storage needs. To facilitate this, the CTA team have...
Storage Technology Outlook
The rapid growth of data has outpaced traditional hard disk drive (HDD) scaling, leading to challenges in cost, capacity, and sustainability. This presentation examines the trends in storage technologies highlighting the evolving role of tape technology in archive solutions. Unlike HDDs, tape continues to scale without hitting fundamental physics barriers, offering...
The most common mechanical failures in today's modern HDDs in the datacenter are no longer due to motor/actuator failures or head crashes. The great majority of these failures are due to writer head degradation with time and heat, a small minority to reader failures, and a very small number of failures are due to other causes. The scope of this presentation is to present and discuss the...
The Benchmarking Working Group (WG) has been actively advancing the HEP Benchmark Suite to meet the evolving needs of the Worldwide LHC Computing Grid (WLCG). This presentation will provide a comprehensive status report on the WG’s activities, highlighting the intense efforts to enhance the suite’s capabilities with a focus on performance optimization and sustainability.
In response to...
The performance score per CPU core — corepower — reported annually by WLCG sites is a critical metric for ensuring reliable accounting, transparency, trust, and efficient resource utilization across experiment sites. It is therefore essential to compare the published CPU corepower with the actual runtime corepower observed in production environments. Traditionally, sites have reported annual...
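For a heterogeneous farm, the published site corepower is typically a capacity-weighted average of HEPScore per core over the node types. A minimal sketch (the farm composition and scores below are illustrative, not real WLCG figures):

```python
def site_corepower(nodes):
    """Weighted-average HEPScore per core across a heterogeneous farm.
    Each entry: (number_of_nodes, cores_per_node, hepscore_per_node)."""
    total_score = sum(n * score for n, cores, score in nodes)
    total_cores = sum(n * cores for n, cores, score in nodes)
    return total_score / total_cores

# Hypothetical farm of two node generations
farm = [
    (100, 64, 1024.0),   # 100 nodes, 64 cores, HEPScore 1024 -> 16.0/core
    (50, 128, 1664.0),   # 50 nodes, 128 cores, HEPScore 1664 -> 13.0/core
]
print(site_corepower(farm))  # 14.5
```

Comparing this published figure with a runtime value would mean recomputing the same average weighted by the cores actually delivering wall-clock time in production, rather than by installed capacity.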
The Nordugrid Advanced Resource Connector Middleware (ARC) will manifest itself as ARC 7 this spring, after a long release preparation process. ARC 7 represents a significant advancement in the evolution of the Advanced Resource Connector Middleware, building upon elements introduced in the ARC 6 release from 2019, and refined over the subsequent years.
This new version consolidates...
MTCA starterkits next step evolution
In this presentation, you will learn more about the powerBridge starterkits. The starterkits from powerBridge include MTCA.0 Rev. 3 changes as well as exciting new products, including payload cards, and are available in different sizes and flavours. They allow an easy jumpstart for new MTCA users.
More than 10,000 Windows devices are managed by the Windows team and delegated administrators at CERN. Ranging from workstations on which scientists run heavy simulation software, to security-hardened desktops in the administrative sector and Windows Servers that manage some of the most critical systems in the Organisation – today these systems are managed using a unified MDM solution named...
The operation of the Large Hadron Collider (LHC) is critically dependent on several hundred Front-End Computers (FECs) that manage all facets of its internals. These custom systems could not be upgraded during the second long shutdown (LS2), and with the coinciding end of life of EL7 on 30.06.2024, this posed a significant challenge to the successful operation of Run 3.
This presentation...
As CERN prepares for the third Long Shutdown (LS3), its evolving Linux strategy is critical to maintaining the performance and reliability of its infrastructure. This presentation will outline CERN’s roadmap for Linux leading up to LS3, highlighting the rollout of RHEL and AlmaLinux 10 to ensure stability and adaptability within the Red Hat ecosystem. In parallel, we will discuss efforts to...
This talk provides an overview of SUSE’s open-source solutions for modern data centers. We will discuss how SUSE technologies support various workloads while leveraging open-source flexibility and security.
Topics include:
- openSUSE Linux – A secure and open Linux system designed for
high-performance workloads.
- Harvester Project – An open-source alternative for virtualization,
...
The HEPiX IPv6 Working Group has been encouraging the deployment of IPv6 in WLCG and elsewhere for many years. At the last HEPiX meeting in November 2024 we reported on the status of our GGUS ticket campaign for WLCG sites to deploy dual-stack computing elements and worker nodes. Work on this has continued. We have also continued to monitor the use of IPv4 and IPv6 on the LHCOPN, with the aim...
The high-energy physics community, along with the WLCG sites and Research and Education (R&E) networks have been collaborating on network technology development, prototyping and implementation via the Research Networking Technical working group (RNTWG) since early 2020. The group is focused on three main areas: Network visibility, network optimization and network control and management....
The WLCG Network Throughput Working Group along with its collaborators in OSG, R&E networks and the perfSONAR team have collaboratively operated, managed and evolved a network measurement platform based upon the deployment of perfSONAR toolkits at WLCG sites worldwide.
This talk will focus on the status of the joint WLCG and IRIS-HEP/OSG-LHC infrastructure, including the resiliency and...
The Single Sign-On (SSO) service at CERN has undergone a significant evolution over recent years, transitioning from a Puppet-hosted solution to a Kubernetes-based infrastructure. Since September 2023, the current team has focused on cementing SSO as a stable and reliable cornerstone of CERN's IT services. Effort was concentrated on implementing best practices in service management - a mid...
This talk introduces the network architecture design of HEPS, including the general network, the production network, and the data center network.
The operational status of all network components will also be described.
The tenth European HTCondor workshop took place at Nikhef in Amsterdam last autumn and, as always, covered most if not all aspects of up-to-date high-throughput computing.
Here is a short summary of the parts of general interest, if you like :)
In the realm of High Throughput Computing (HTC), managing and processing large volumes of accounting data across diverse environments and use cases presents significant challenges. AUDITOR addresses this issue by providing a flexible framework for building accounting pipelines that can adapt to a wide range of needs.
At its core, AUDITOR serves as a centralized storage solution for...
In recent years, GPUs have become increasingly interesting for particle physics. Therefore, GridKa provides some GPU machines to the Grid and to the particle physics institute at KIT.
Since GPU usage and provisioning differ from CPUs, some development on the provider and user side is necessary.
The provided GPUs allow the HEP community to use GPUs in the Grid environment and develop solutions for...
At Nikhef, we've based much of our "fairness" policy implementation around user, group, and job-class (e.g. queue) "caps", that is, upper limits on the number of simultaneous jobs (or used cores). One of the main use cases for such caps is to prevent one or two users from acquiring the whole cluster for days at a time, blocking all other usage.
When we started using HTCondor, there...
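To make the cap idea concrete, here is a minimal admission check (an illustration of the concept only, not HTCondor's or Nikhef's actual mechanism; all names and cap values are made up):

```python
def may_start(user, group, running, caps):
    """A new job for `user` in `group` may start only if neither the
    per-user nor the per-group cap on simultaneously running jobs
    would be exceeded. `running` maps (user, group) -> job count."""
    user_running = sum(n for (u, g), n in running.items() if u == user)
    group_running = sum(n for (u, g), n in running.items() if g == group)
    return (user_running < caps.get(("user", user), caps["default_user"])
            and group_running < caps.get(("group", group), caps["default_group"]))

caps = {"default_user": 2, "default_group": 3, ("user", "alice"): 1}
running = {("alice", "atlas"): 1, ("bob", "atlas"): 2}
print(may_start("alice", "atlas", running, caps))  # False: alice is at her cap
print(may_start("carol", "atlas", running, caps))  # False: group at its cap of 3
```

A real scheduler evaluates such limits continuously and counts cores rather than jobs, but the blocking behaviour the abstract describes reduces to checks of this shape.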
Developments in microprocessor technology have confirmed the trend towards higher core-counts and decreased amount of memory per core, resulting in major improvements in power efficiency for a given level of performance. Per node core-counts have increased significantly over the past five years for the x86_64 architecture, which is dominating in the LHC computing environment, and the higher...
Many efforts have tried to combine the HPC and QC fields, proposing integrations between quantum computers and traditional clusters. Despite these efforts, the problem is far from solved, as quantum computers face a continuous evolution. Moreover, nowadays, quantum computers are scarce compared to the traditional resources in the HPC clusters: managing the access from the HPC nodes is...
Since its launch in 2011, [CC-IN2P3's computer history museum][1] has been visited by 13,000 people. It is home to more than 1000 artefacts, among which are [France's first web server][2] and a mysterious French micro-computer called the [CHADAC][3].
We will demonstrate through our experience and several examples that physical and digital preservation of IT infrastructure components while...
With an increasing focus on green computing, and with the high luminosity LHC fast approaching, we need every bit of extra throughput that we can get. In this talk, I'll be exploring my old ATLAS analysis code, as an example of how improvements to end-user code can significantly improve performance. Not only does this result in a more efficient utilisation of the available resources, it also...
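A typical example of such an end-user improvement (a generic sketch of the pattern, not the actual ATLAS code from the talk) is replacing a per-event Python loop with a vectorized NumPy expression; the cut values and variable names below are placeholders:

```python
import numpy as np

def select_loop(pt, eta, pt_min=25.0, eta_max=2.5):
    """Per-event selection written as an interpreted Python loop."""
    out = []
    for i in range(len(pt)):
        if pt[i] > pt_min and abs(eta[i]) < eta_max:
            out.append(i)
    return out

def select_vectorized(pt, eta, pt_min=25.0, eta_max=2.5):
    """Same selection as a single pass in compiled NumPy code,
    typically orders of magnitude faster on large arrays."""
    return np.nonzero((pt > pt_min) & (np.abs(eta) < eta_max))[0]

rng = np.random.default_rng(42)
pt = rng.exponential(30.0, 1_000_000)
eta = rng.uniform(-5.0, 5.0, 1_000_000)
# Both implementations select the same events
assert select_loop(pt[:1000], eta[:1000]) == list(select_vectorized(pt[:1000], eta[:1000]))
```

The speedup comes from moving the inner loop from the interpreter into compiled array kernels, which also tends to improve memory locality and hence energy per event.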
CC-IN2P3 provides storage and computing resources to around 2,700 users. Other services reach an even larger community, such as GitLab and its 10,000 users. It is therefore vital for CC-IN2P3 to provide accurate user documentation.
In this presentation, we'll give experience feedback from five years of managing the CC-IN2P3 user documentation. We will begin by outlining the reasons...
RAL makes use of the XRootD Cluster Management System to manage our XRootD server frontends for disk-based storage (ECHO).
In this session, I'll give an overview of our configuration, the custom scripts used, and observations on its interaction with different setups.
JUNO is an international collaborative neutrino experiment located in Kaiping City, southern China. The JUNO experiment employs a WLCG-based distributed computing system for official data production. The JUNO distributed computing sites are from China, Italy, France, and Russia. To monitor the operational status of the distributed computing sites and other distributed computing services, as...
At KIT we operate more than 800 hosts to run the Large Scale Data Facility (LSDF) and the WLCG Tier-1 center GridKa. Our configuration management efforts aim for reliable, consistent and reproducible host deployment, which allows for unattended mass deployment of stateless machines like the GridKa compute farm. In addition, our approach supports efficient patch management to tackle security...
The IHEP computing platform faces new requirements in data analysis, including limited access to login nodes, increasing demand for code debugging tools, and efficient data access for collaborative workflows. We have developed an Interactive aNalysis workbench (INK), a web-based platform leveraging the HTCondor cluster. This platform transforms traditional batch-processing resources into a...
Grafana dashboards are easy to make but hard to maintain. Since changes can be made easily, the questions that remain are: how to avoid changes that overwrite other work? How to keep track of changes? And how to communicate these to users? Another question that pops up frequently is how to apply certain changes consistently across multiple visualizations and dashboards. One partial solution is...
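One common approach to the consistency problem is to keep the dashboard JSON models in version control and apply bulk changes with a script rather than by hand. A minimal sketch (the dashboard model below is heavily simplified; the real Grafana schema has many more fields, and `set_datasource` is a hypothetical helper, not part of any Grafana API):

```python
import copy

def set_datasource(dashboard, new_ds):
    """Return a copy of a (simplified) Grafana dashboard JSON model
    with the datasource of every panel set to `new_ds` -- one way to
    apply a change consistently instead of editing panels one by one."""
    dash = copy.deepcopy(dashboard)
    for panel in dash.get("panels", []):
        panel["datasource"] = new_ds
    return dash

dash = {"title": "Node metrics",
        "panels": [{"title": "CPU", "datasource": "old"},
                   {"title": "Memory", "datasource": "old"}]}
updated = set_datasource(dash, "prometheus-prod")
print([p["datasource"] for p in updated["panels"]])  # ['prometheus-prod', 'prometheus-prod']
```

Because the transformation is a pure function over the JSON model, the same edit can be replayed over every dashboard file in a repository, reviewed as a diff, and tracked like any other code change.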
A High-Performance Computing (HPC) center typically consists of various domains. From the physical world (hardware, power supplies, etc.) up to highly abstracted and virtualized, dynamic execution environments (cloud infrastructures, software, and service dependencies, central services, etc.). The tools used to manage those different domains are as heterogeneous as the domains themselves....
For its operations, CERN depends on an extensive range of applications, achievable only through the use of diverse technologies, including more than one relational database management system (RDBMS). This presentation provides an overview of CERN’s Microsoft SQL Server (MSSQL) infrastructure, highlighting how we manage servers and design solutions for large-scale databases with different...
The Port d'Informació Científica (PIC) provides advanced data analysis services to a diverse range of scientific communities.
This talk will detail the status and evolution of PIC's Big Data Analysis Facility, centered around its Hadoop platform. We will describe the architecture of the Hadoop cluster and the services running on top, including CosmoHub, a web application that exemplifies...
Developing and managing computing systems is complex due to rapidly changing technology, evolving requirements during development, and ongoing maintenance throughout their lifespan. Significant post-deployment maintenance includes troubleshooting, patching, updating, and modifying components to meet new features or security needs. Investigating unusual events may involve reviewing system...
A key stepping stone in promoting diversity and accessibility at CERN consists in providing users with subtitles for all CERN-produced multimedia content. Subtitles not only enhance accessibility for individuals with impairments and non-native speakers but also make what would otherwise be opaque content fully searchable. The “Transcription and Translation as a Service” (TTaaS) project [1]...