On 30 June 2024, the end of CentOS 7 support marked a new era for the operation of the multi-petabyte distributed disk storage system used by the CERN physics experiments. The EOS infrastructure at CERN comprises approximately 1000 disk servers and 50 metadata management nodes. Their transition from CentOS 7 to Alma 9 was not as straightforward as anticipated.
This presentation...
This presentation will explain the network design implemented in the CERN Prévessin Datacentre (built in 2022/2023, in production since February 2024). We will show how, starting from an empty building, current network best practices could be adopted (and partly adapted to match the specific requirements in terms of interconnection with the rest of the CERN network). We will also provide...
News from CERN since the last HEPiX workshop. This talk gives a general update from services in the CERN IT department.
This presentation will start with the evolution of the tape technology market in recent years and the expectations from the INSIC roadmap.
From there, with LHC now in the middle of Run 3, we will reflect on the evolution of our capacity planning vs. increasing storage requirements of the experiments. We will then describe our current tape hardware setup and present our experience with...
The performance score per CPU core — corepower — reported annually by WLCG sites is a critical metric for ensuring reliable accounting, transparency, trust, and efficient resource utilization across experiment sites. It is therefore essential to compare the published CPU corepower with the actual runtime corepower observed in production environments. Traditionally, sites have reported annual...
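The comparison described above can be illustrated with a minimal sketch. All numbers, field names, and the slot layout below are invented placeholders, not real WLCG accounting data; the idea is simply to derive a runtime per-core score from benchmark jobs and compare it against the published value.

```python
# Hypothetical sketch: compare a site's published corepower with the
# corepower observed from benchmark jobs run in production slots.
# All values here are illustrative, not real site measurements.

def runtime_corepower(slot_scores, cores_per_slot):
    """Average per-core score derived from per-slot benchmark scores."""
    per_core = [s / cores_per_slot for s in slot_scores]
    return sum(per_core) / len(per_core)

def deviation(published, observed):
    """Relative deviation of the observed runtime corepower vs the published value."""
    return (observed - published) / published

# Made-up example: published corepower of 15.0 per core, three 8-core
# benchmark slots scoring slightly below expectation.
observed = runtime_corepower([112.0, 118.4, 109.6], cores_per_slot=8)
print(round(observed, 2))                    # average per-core score
print(round(deviation(15.0, observed), 3))   # fraction above/below published
```

A persistent negative deviation would flag a site whose published corepower overstates what jobs actually see at runtime.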
We report on our experience with the production backup orchestration via “cback”, a tool developed at CERN and used to back up our primary mounted filesystem offerings: EOS (eosxd) and Ceph (CephFS). In a storage system that handles non-reproducible data, a robust backup and restore system is essential for effective disaster recovery and business continuity. When designing a backup solution,...
Developments in microprocessor technology have confirmed the trend towards higher core counts and less memory per core, resulting in major improvements in power efficiency for a given level of performance. Per-node core counts have increased significantly over the past five years for the x86_64 architecture, which dominates the LHC computing environment, and the higher...
EOS is an open-source storage system developed at CERN that is used as the main platform to store LHC data. The architecture of the EOS system has evolved over the years to accommodate ever more diverse use-cases and performance requirements coming both from the LHC experiments as well as from the user community running their analysis workflows on top of EOS. In this presentation, we discuss...
The CERN Tape Archive (CTA) software is used for physics archival at CERN and other scientific institutes. CTA’s Continuous Integration (CI) system has been around since the inception of the project, but over time several limitations have become apparent. The migration from CERN CentOS 7 to Alma 9 introduced even more challenges. The CTA team took this as an opportunity to make significant...
Grafana dashboards are easy to make but hard to maintain. Since changes can be made so easily, the questions that remain are: how to avoid changes that overwrite other work, how to keep track of changes, and how to communicate them to users. Another question that pops up frequently is how to apply certain changes consistently across multiple visualizations and dashboards. One partial solution is...
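One way to frame the overwrite problem: Grafana dashboards are JSON documents carrying a version counter, which supports optimistic locking (a save based on a stale version can be rejected rather than silently clobbering other work). The sketch below mimics that check for dashboards tracked outside Grafana, e.g. in a Git repository; it is an illustration, not Grafana's actual implementation.

```python
# Illustrative sketch (not the Grafana codebase): detect whether saving an
# edited dashboard would overwrite changes made since it was exported, using
# the JSON "version" counter as an optimistic-locking token.

def would_overwrite(stored: dict, incoming: dict) -> bool:
    """True if 'incoming' was edited from an older or equal version of 'stored'
    and differs from it, i.e. saving it would clobber someone else's work."""
    return (incoming.get("version", 0) <= stored.get("version", 0)
            and stored != incoming)

stored = {"title": "Job efficiency", "version": 7, "panels": ["cpu", "mem"]}
edited = {"title": "Job efficiency", "version": 7, "panels": ["cpu"]}  # stale edit
print(would_overwrite(stored, edited))  # True: the save would clobber version 7
```

An edit based on the current version would carry `version: 8` after saving and pass the check; a stale one is flagged for a manual merge instead.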
The Benchmarking Working Group (WG) has been actively advancing the HEP Benchmark Suite to meet the evolving needs of the Worldwide LHC Computing Grid (WLCG). This presentation will provide a comprehensive status report on the WG’s activities, highlighting the intense efforts to enhance the suite’s capabilities with a focus on performance optimization and sustainability.
In response to...
The Technology Watch Working Group, established in 2018 to take a close look at the evolution of the technology relevant to HEP computing, has resumed its activities after a long pause. In this report, we provide an overview of the hardware technology landscape and some recent developments, highlighting the impact on the HEP computing community.
The storage needs of CERN’s OpenStack cloud infrastructure are fulfilled by Ceph, which provides diverse storage solutions including volumes with Ceph RBD, file sharing through CephFS, and S3 object storage via Ceph RadosGW. The integration between storage and compute resources is possible thanks to a close collaboration between the OpenStack and Ceph teams. In this talk we review the architecture...
This presentation provides a detailed overview of the hyper-converged cloud infrastructure implemented at the Swiss National Supercomputing Centre (CSCS). The main objective is to provide a detailed overview of the integration between Kubernetes (RKE2) and ArgoCD, with Rancher acting as a central tool for managing and deploying RKE2 clusters infrastructure-wide.
Rancher is used for direct...
DESY operates the IDAF (Interdisciplinary Data and Analysis Facility) for all science branches: high energy physics, photon science, and accelerator R&D and operations.
The NAF (National Analysis Facility) is an integrated part, and acts as an analysis facility for the German ATLAS and CMS community as well as the global BELLE II community since 2007.
This presentation will show the current...
The progress and status of the IHEP site since the last HEPiX workshop.
The operation of the Large Hadron Collider (LHC) is critically dependent on several hundred Front-End Computers (FECs) that manage all facets of its internals. These custom systems could not be upgraded during the second long shutdown (LS2), and with the coinciding end of life of EL7 on 30 June 2024, this posed a significant challenge to the successful operation of Run 3.
This presentation...
The SKA Observatory is expected to produce up to 600 petabytes of scientific data per year, which would set a new record in data generation within the field of observational astronomy. The SRCNet infrastructure is meant for handling these large volumes of astronomy data, which requires a global network of distributed regional centres for the data- and compute-intensive astronomy use...
Developing and managing computing systems is complex due to rapidly changing technology, evolving requirements during development, and ongoing maintenance throughout their lifespan. Significant post-deployment maintenance includes troubleshooting, patching, updating, and modifying components to meet new features or security needs. Investigating unusual events may involve reviewing system...
This study presents analyses of natural job drainage and power reduction patterns in the PIC Tier-1 data center, which uses HTCondor for workload scheduling. By examining historical HTCondor logs from 2023 and 2024, we simulate natural job drainage behaviour in order to understand when jobs naturally conclude without external intervention. These findings provide...
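The drainage simulation can be sketched in a few lines. This is a toy model under stated assumptions (we only know the remaining runtimes of currently running jobs, extracted from hypothetical log fields), not PIC's actual analysis code: with no new job starts, occupancy simply decays as jobs finish, and we ask when it drops below a target fraction.

```python
# Minimal natural-drainage model: given remaining runtimes (hours) of the
# currently running jobs on a node, find the time at which occupancy falls
# to or below a target fraction of the node's slots, assuming no new starts.

def drain_time(remaining_runtimes, total_slots, target_fraction):
    """Hours until the number of running jobs <= target_fraction * total_slots."""
    ends = sorted(remaining_runtimes)
    target = total_slots * target_fraction
    running = len(ends)
    if running <= target:
        return 0.0
    for t in ends:
        running -= 1               # one more job concludes at time t
        if running <= target:
            return t
    return ends[-1]

# Made-up example: 6 running jobs on an 8-slot node, drain to <=25% occupancy.
print(drain_time([1.0, 2.5, 3.0, 4.0, 8.0, 12.0], total_slots=8,
                 target_fraction=0.25))  # 4.0 hours
```

Aggregated over a farm's historical logs, the same calculation yields the distribution of how long a power-reduction window would take to open naturally.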
New developments in the distributed Nordic Tier-1 and its participant sites.
We introduce the network architecture design of HEPS, including the general network, the production network, and the data center network.
The running status of all network parts will also be described.
RAL makes use of the XRootD Cluster Management System to manage our XRootD server frontends for disk-based storage (ECHO). In this session, I'll give an overview of our configuration, the custom scripts used, and observations on its behaviour in different setups.
Although CERN IT recently commissioned the Prévessin Data Centre (PDC), doubling the organization’s hosting capacity in terms of electricity and cooling, the 50-year-old Meyrin Data Centre (MDC) remains indispensable due to its strategic geographical location and unique electrical power resilience. The Meyrin Data Centre (Building 513) retains an essential role for the CERN Tier-0 Run 4...
As CERN prepares for the third Long Shutdown (LS3), its evolving Linux strategy is critical to maintaining the performance and reliability of its infrastructure. This presentation will outline CERN’s roadmap for Linux leading up to LS3, highlighting the rollout of RHEL and AlmaLinux 10 to ensure stability and adaptability within the Red Hat ecosystem. In parallel, we will discuss efforts to...
The Single Sign-On (SSO) service at CERN has undergone a significant evolution over recent years, transitioning from a Puppet-hosted solution to a Kubernetes-based infrastructure. Since September 2023, the current team has focused on cementing SSO as a stable and reliable cornerstone of CERN's IT services. Effort was concentrated on implementing best practices in service management - a mid...
Many efforts have tried to combine the HPC and QC fields, proposing integrations between quantum computers and traditional clusters. Despite these efforts, the problem is far from solved, as quantum computers continue to evolve. Moreover, quantum computers remain scarce compared to the traditional resources in HPC clusters: managing access from the HPC nodes is...
The tenth European HTCondor Workshop took place at Nikhef in Amsterdam last autumn and, as always, covered most if not all aspects of up-to-date high-throughput computing.
This talk gives a short summary of the parts of general interest.
The CERN Tape Archive (CTA) is CERN’s Free and Open Source Software system for data archival to tape. Across the Worldwide LHC Computing Grid (WLCG), the tape software landscape is quite heterogeneous, but we are entering a period of consolidation. A number of sites have reevaluated their options and have chosen CTA for their tape archival storage needs. To facilitate this, the CTA team have...
The CERN Cloud Infrastructure Service provides the laboratory with access to large compute and storage resources, including virtual and physical machines, volumes, fileshares, loadbalancers, etc., across two different datacenters. With the recent addition of the Prevessin Data Center, one of the main objectives of the CERN IT Department is to ensure that all services have up-to-date procedures...
The objective of this talk is to share the tentative plan for the energy efficiency status review by the TechWatch WG. We will report on the progress of the primary tasks, such as reviewing and understanding industry, market, and technology trends and the efforts of WLCG and its sites, as well as the strategy spanning measurement, data collection, analysis, modelling, and estimation. Through this report, the...
The HEPiX IPv6 Working Group has been encouraging the deployment of IPv6 in WLCG and elsewhere for many years. At the last HEPiX meeting in November 2024 we reported on the status of our GGUS ticket campaign for WLCG sites to deploy dual-stack computing elements and worker nodes. Work on this has continued. We have also continued to monitor the use of IPv4 and IPv6 on the LHCOPN, with the aim...
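The protocol-share monitoring mentioned above boils down to a simple tally. As a toy illustration (the addresses below are documentation-range placeholders, and real monitoring works on flow data rather than a flat address list), peer addresses from transfer logs can be classified with the standard library:

```python
# Toy illustration: classify peer addresses as IPv4 or IPv6 to estimate the
# protocol share on a link -- the same kind of tally the working group
# derives from LHCOPN monitoring, here reduced to a flat address list.

import ipaddress

def protocol_share(addrs):
    """Return the fraction of IPv4 vs IPv6 addresses in a sample."""
    counts = {"IPv4": 0, "IPv6": 0}
    for a in addrs:
        ip = ipaddress.ip_address(a)
        counts["IPv6" if ip.version == 6 else "IPv4"] += 1
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()}

# Addresses from the IPv4/IPv6 documentation ranges, purely for illustration.
sample = ["192.0.2.10", "2001:db8::1", "2001:db8::2", "198.51.100.7"]
print(protocol_share(sample))  # {'IPv4': 0.5, 'IPv6': 0.5}
```

Tracking this fraction over time is what lets the group quantify progress towards IPv6-only data transfers.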
Extending the data presented at the last few HEPiX workshops, we present new measurements on the energy efficiency (HEPScore/Watt) of the recently available AmpereOne-ARM and AMD Turin-x86 machines.
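The figure of merit itself is straightforward: benchmark score divided by average power draw during the run. The sketch below uses invented placeholder numbers, not the measurements presented in the talk:

```python
# Hedged sketch with placeholder values: HEPScore/Watt is simply the
# benchmark score divided by the machine's average power draw during the run.

def hepscore_per_watt(score, avg_watts):
    """Energy efficiency figure of merit: score per watt."""
    return score / avg_watts

# Hypothetical machines (numbers are illustrative, not real measurements):
machines = {"arm-node": (2300.0, 420.0), "x86-node": (2600.0, 560.0)}
for name, (score, watts) in machines.items():
    print(name, round(hepscore_per_watt(score, watts), 2))
```

Comparing this ratio across architectures at matched performance levels is what drives the ARM-vs-x86 efficiency discussion.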
In this presentation we give an update on the CPU, GPU, and AI accelerators in the market today.
More than 10,000 Windows devices are managed by the Windows team and delegated administrators at CERN. Ranging from workstations on which scientists run heavy simulation software, to security-hardened desktops in the administrative sector and Windows Servers that manage some of the most critical systems in the Organisation – today these systems are managed using a unified MDM solution named...