We present an update on the changes at our site since the last report, covering advancements, developments, roadblocks and achievements in areas including WLCG, Unix, Windows and infrastructure.
We will present a follow-up on the activities ongoing at CC-IN2P3 since the last site report, given in Fall 2022.
The KEK Central Computer System (KEKCC) provides large-scale computing resources, including Grid computing systems and essential IT services, to support the many research activities at KEK. We will report on the current status of KEKCC in this presentation.
An update on the WLCG and scientific computing services, technology and resources at ASGC.
News from CERN since the last HEPiX workshop. This talk gives a general update from services in the CERN IT department.
An update on recent developments at the Scientific Data & Computing Center (SDCC) at BNL.
IHEP operates a comprehensive infrastructure comprising an HTC cluster, an HPC cluster, and a WLCG grid site, dedicated to facilitating data processing for over 20 experiments. Additionally, ongoing research in AI and quantum computing (QC) is actively pursued.
Recently, we expanded our local HTC cluster by integrating it with 250 worker nodes from a remote Slurm cluster through gliding job slots. This enhancement...
We will present an update on our site since the Spring 2023 report, covering our changes in software, tools and operations.
The three primary areas we report on are our performance evaluation of ZFS versus Dell RAID systems, the plans and status of our transition from EL7 to RHEL9, and the work to deploy an operational WLCG Security Operations Center implementation.
We conclude with a...
AlmaLinux has been chosen by many across the world as the replacement for CentOS Linux, but there is still a lot of confusion around AlmaLinux's governance and build pipeline. Given AlmaLinux's newness and the instability in the greater enterprise Linux ecosystem, a strong understanding of how and where AlmaLinux OS started, where it is today, and where we expect to go in the future...
The recent turmoil in the Red Hat ecosystem and the corresponding uncertainties it created in the HEP community have triggered the CERN Linux team to review their options for a multi-year Linux strategy. This presentation will summarise the state of Linux at CERN and discuss options moving forward as input to the Linux-themed discussion at this HEPiX meetup.
The CernVM File System (CVMFS) provides the software distribution backbone for High Energy and Nuclear Physics experiments and many other scientific communities in the form of a globally available, shared, read-only filesystem. Recently, however, CVMFS has found major adoption beyond academia: in particular Jump Trading, an algorithmic and high-frequency trading firm, now uses CVMFS for...
The CERN Storage and Data Management group is responsible for ensuring that all data produced by physics experiments at CERN is safely stored and reliably accessible by the user community. The 2023 Run-3 period, and especially the Heavy Ion Run, pushed the previous records in terms of data volume and transfer rates delivered by the main LHC experiments even further. The targets anticipated by the data...
Despite the growing number of flash-based data storage systems, the usage of spinning disks (HDDs) for large on-line data storage systems is still advantageous. Measurements of the read-write behaviour of a cluster file system using external storage controllers backed by HDDs are presented. Contrary to the commonly expected balanced read and write rates, or even read rates slightly...
Most data center data is stored on hard drives. But can they remain competitive, and what is the role and future of hard drives and of the different hard drive technologies available? The challenge is efficiently scaling storage infrastructure while optimizing for write/read performance, TCO, and sustainability goals.
In this session, we will explore how areal density and the latest hard...
Managing a data center poses multifaceted challenges, with monitoring emerging as a pivotal aspect for ensuring stability and service quality assurance. Since the first implementation of a monitoring system in the Green IT Cube data center at GSI in 2016, comprising RRDtool and Ganglia, ongoing efforts have been dedicated to enhancing monitoring capabilities. The initial system's tight...
The INFN Information System project was established in 2001 with the aim of computerizing and standardizing the administrative processes of the Institute and gradually moving towards dematerialization and digitization of documents. During these two decades the aim of the project has been accomplished by a series of web applications (what we call sysinfo apps) serving INFN researchers,...
In the last 20 years, CERN’s Live Streaming service [1] has been a pivotal communication tool connecting CERN users and the High Energy Physics (HEP) community in real time. From its initial stages, employing Real Media technologies and Flash, to its present state, integrating cutting-edge technologies like HTTP Live Streaming (HLS), the service has been instrumental in fostering global...
The Institute for Experimental Particle Physics (ETP) at the Karlsruhe Institute of Technology has access to several computing and storage resources. Besides the local resources such as worker nodes and storage, the ETP has access to the HPC cluster NEMO in Freiburg and to the Throughput Optimized Analysis System (TOpAS) cluster and Grid storage at the WLCG-Tier1 GridKa.
Hence, we use a...
We will give an overview of the status of our compute clusters, what has changed over the past year and where we plan to go. The migration to EL9 will be used for an overall update of Condor & Jupyter, including a renovation & rewrite of the current configuration and some enhancements drawing on past experience of running the NAF.
With the recent developments in ARM technology and ongoing efforts by experiments to integrate it into their workflows, there is increasing interest in getting Tier2 sites to obtain ARM kit in future procurements for testing and potential pledging. Here we present tests conducted by Glasgow on a variety of next-generation CPUs to strengthen the case for future heterogeneous computing...
KM3NeT is a research infrastructure currently under construction in the Mediterranean Sea.
It consists of two neutrino detectors: ARCA for studying astrophysical sources and ORCA for studying neutrino properties.
Currently 15% of the infrastructure is operational.
The output of the entire infrastructure will eventually amount to a data rate of 100 Gbps, and a data volume of 500 TB per...
The adoption of HEPScore23 as replacement for HS06 in April 2023 marked a significant milestone for the WLCG community. One year after that change, we conduct a thorough review of the experience, lessons learned, and areas for improvement. In addition, triggered by community feedback and demand, the Benchmarking WG has started a new development effort to expand the Benchmark Suite...
The HEP Benchmark Suite has been expanded beyond assessing only the CPU execution speed of a server via HEPScore23. In fact, the suite now incorporates metrics such as machine load, memory usage, memory swap and, notably, power consumption. In this report we detail the ongoing studies enabled by these new features.
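As a rough illustration of the idea (not the actual HEP Benchmark Suite code), host-level metrics such as machine load and memory/swap usage can be sampled alongside a benchmark run from the Linux /proc filesystem. Function names and the sampling scheme here are hypothetical:

```python
import time

def read_loadavg(path="/proc/loadavg"):
    """Return the 1-minute load average as a float."""
    with open(path) as f:
        return float(f.read().split()[0])

def read_meminfo(path="/proc/meminfo"):
    """Return selected memory/swap counters (in kB) as a dict."""
    wanted = {"MemTotal", "MemAvailable", "SwapTotal", "SwapFree"}
    info = {}
    with open(path) as f:
        for line in f:
            key, _, rest = line.partition(":")
            if key in wanted:
                info[key] = int(rest.split()[0])  # values are in kB
    return info

def sample_metrics(duration_s=2.0, interval_s=1.0):
    """Collect (timestamp, load, meminfo) samples while a benchmark runs."""
    samples = []
    end = time.time() + duration_s
    while time.time() < end:
        samples.append((time.time(), read_loadavg(), read_meminfo()))
        time.sleep(interval_s)
    return samples
```

Power consumption would require an additional source (e.g. IPMI or RAPL counters), which this sketch omits.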
With the advent of new species of ARM architecture on the market, and increasing developments by Intel/AMD to match the power-savings by ARM, it can be difficult for Grid sites to decide which machines to target in future procurements. While cost is an important factor, sites are increasingly able to make at least part of their choices on sustainability grounds. Obtaining test machines and...
In an era defined by the exponential rise of artificial intelligence and data analytics, data is more valuable than ever, propelling the expansion of data centers to accommodate vast datasets. However, this growth comes with a sobering reality: the significant energy consumption of data centers, often rivaling that of entire nations. The need to create a sustainable and scalable...
The Euregio Meuse-Rhine border region between Belgium, the Netherlands and Germany is a potential site for the Einstein Telescope. In late October of 2023, Nikhef was asked to organise backup and archiving of seismic survey data. This talk covers how the seismic data was being shared without being backed up, the quick fix to back up the existing data, and some custom Python code being used as a...
CERNBox is an innovative scientific collaboration platform, built solely from open-source components to meet the unique requirements of scientific workflows. Used at CERN for the last decade, the service serves 35K users and seamlessly integrates with batch farms and Jupyter-based services.
Following the presentations given at the CS3 Workshop 2024[1] and CERN Storage...
ESS is getting ready for its next major milestone, which we call 'beam on dump,' where we will commission the full LINAC at the end of this year.
Although ESS is not yet completed, we have already built most of the IT infrastructure to support the control system for the accelerator, target, and neutron instruments.
System experts, operators, and beam physicists are already requesting to...
Data centers are at the forefront of managing vast amounts of data, relying on three primary storage mediums: Flash/SSD, Hard Drives, and Tape. This presentation deals with the special characteristics of the individual technologies and examines the question of whether the forthcoming advances will lead to one technology being displaced by the other. Central to our discussion is the examination...
The Grand Unified Token (GUT)-profile working group is trying to create a single OAuth2 token profile to replace the main token profiles: SciTokens, WLCG and AARC. These token profiles are being used by infrastructures and collaborations such as LIGO, HTCondor, WLCG, EGI etc. for a "new" authentication method, replacing the current X.509-based authentication. All these profiles share various...
In previous HEPiX meetings we have presented on the strategic direction of the Security Operations Centre working group, focused on building reference designs for sites to deploy the capability to actively use threat intelligence with fine-grained network monitoring and other tools. This work continues in an environment where the cybersecurity risk faced by research and education, notably from...
This presentation provides an update on the global security landscape since the last HEPiX meeting. It describes the main vectors of risks and compromises in the academic community including lessons learnt, presents interesting recent attacks while providing recommendations on how to best protect ourselves.
Given the importance of the network to WLCG, it is important to guarantee effective network usage and prompt detection and resolution of any network issues, including connection failures, congestion and traffic routing. This talk will focus on the status and plans for the joint WLCG and IRIS-HEP/OSG-LHC effort to operate a global perfSONAR deployment and develop associated network metric...
The high-energy physics community, along with the WLCG sites and Research and Education (R&E) networks have been collaborating on network technology development, prototyping and implementation via the Research Networking Technical working group (RNTWG) since early 2020.
In this talk we’ll give an update on the Research Networking Technical working group activities, challenges and recent...
A robust computing infrastructure is essential for the success of scientific collaborations. However, smaller collaborations often lack the resources to establish and maintain such an infrastructure, resulting in a fragmented analysis environment with varying solutions for different members. This fragmentation can lead to inefficiencies, hinder reproducibility, and create collaboration...
After a brief introduction to the Muon Alignment optical system and the dataflow for the optical lines, I will illustrate our infrastructure, which is based on microservices in Java (mainly for access to the Oracle DB) and C++ (for the alignment algorithm itself), deployed on a dedicated K8s cluster at CERN.
High-performance digital technology has entered a new era in recent years with the arrival of exascale. Anticipated by experts for more than ten years, exascale is supposed to respond to increasingly varied uses that go far beyond traditional numerical simulation. Data processing and AI are shaking up the HPC landscape and have implications at all levels, from hardware to software, at the...
By modelling the life cycle emissions for a given unit of scientific computing under various scenarios of hardware replacement and computing facilities (including the emissions from the local power generation mix), we can find optimal computing hardware replacement cycles in order to minimize carbon emissions.
The majority of this work was presented at ISGC on March 28th:...
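The core of such a model can be sketched in a few lines: amortise the embodied (manufacturing) emissions over the hardware lifetime and add the operational emissions from the local power mix, then compare replacement cycles. This is a deliberately simplified illustration with made-up numbers; it ignores the performance-per-watt gains of newer hardware, which the full model must also account for:

```python
HOURS_PER_YEAR = 8760

def emissions_per_unit_compute(embodied_kgco2e, power_kw,
                               carbon_intensity_kgco2e_per_kwh,
                               lifetime_years):
    """kgCO2e per delivered compute-year over the hardware's lifetime.

    Embodied emissions are amortised over the lifetime; operational
    emissions scale with the carbon intensity of the local power mix.
    """
    operational = (power_kw * HOURS_PER_YEAR * lifetime_years
                   * carbon_intensity_kgco2e_per_kwh)
    return (embodied_kgco2e + operational) / lifetime_years

# Illustrative comparison of replacement cycles (all numbers hypothetical).
for lifetime in (3, 5, 7):
    e = emissions_per_unit_compute(embodied_kgco2e=1500, power_kw=0.5,
                                   carbon_intensity_kgco2e_per_kwh=0.2,
                                   lifetime_years=lifetime)
    print(f"{lifetime}-year cycle: {e:.0f} kgCO2e per compute-year")
```

In this stripped-down form, longer cycles always win because only the embodied share shrinks; the trade-off studied in the talk appears once newer, more efficient hardware is modelled.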
As the UK’s journey towards NetZero accelerates, we need robust information to inform both strategic and operational decisions, from policy development and funding allocation to hardware procurement, code optimisation and job scheduling.
The UKRI Digital Research Infrastructure NetZero Scoping Project published its technical report and recommendations in August 2023 [1] and funded the...
At our site we have varied the datacenter inlet temperature between 23 and 25 °C while monitoring the effects on total system power usage and temperature. In this talk I will give an overview of the results and findings, how we collected all the relevant information, and how we visualize it in a useful format.
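A minimal sketch of the kind of aggregation involved, grouping power readings by inlet-temperature setpoint and summarising each group. The sample data is entirely hypothetical; real readings would come from the facility's sensor collection:

```python
from collections import defaultdict
from statistics import mean

# Hypothetical samples: (inlet_temperature_C, total_system_power_kW).
samples = [
    (23, 410.2), (23, 408.7), (23, 411.5),
    (24, 402.1), (24, 403.9),
    (25, 396.4), (25, 398.0), (25, 397.2),
]

# Group readings by temperature setpoint.
by_setpoint = defaultdict(list)
for temp_c, power_kw in samples:
    by_setpoint[temp_c].append(power_kw)

# Summarise mean power per setpoint.
for temp_c in sorted(by_setpoint):
    readings = by_setpoint[temp_c]
    print(f"{temp_c} C: mean {mean(readings):.1f} kW over {len(readings)} samples")
```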
The Technology Watch Working Group, established in 2018 to take a close look at the evolution of the technology relevant to HEP computing, has resumed its activities after a long pause. In this first official report after such pause, we describe our goals, how the group is organized and the first results of our work.
This presentation provides a detailed overview of the hyper-converged cloud infrastructure implemented at the Swiss National Supercomputing Centre (CSCS). The main objective is to provide a detailed overview of the integration between Kubernetes (RKE2) and ArgoCD, with Rancher acting as a central tool for managing and deploying RKE2 clusters infrastructure-wide.
Rancher is used for direct...
In preparation for the increasing computing needs of the HL-LHC, the IT department has built a new datacenter in Prevessin. This additional capacity will also enable IT services to prepare their Business Continuity and Disaster Recovery plans.
With those two in mind, the cloud service has prepared a new deployment fully decoupled from the existing in Meyrin. This new setup allows users to...
The Space-based multi-band astronomical Variable Objects Monitor (SVOM) is a French-Chinese mission dedicated to the study of the most distant explosions of stars, the gamma-ray bursts. This talk will cover a brief overview of the whole mission infrastructure before focusing on the French Scientific Ground Segment (FSGS) infrastructure. The FSGS relies on a micro-services architecture with a...
The cloud service provides resources to the whole CERN community in two datacentres (Meyrin and Prevessin). The deployment of the new datacenter in Prevessin allowed us to reconsider all the design choices we made for Meyrin. In the networking area, we are increasing the flexibility and adding even more options to users by offering Software Defined Networking. In this talk, we will explain the...
Windows 10 is dead, long live Windows 11! Device compatibility, hardware replacement plans, strategies for upgrade campaigns, and privacy are all discussed and illustrated with examples based on 10000 CERN PCs. Generative AI (Copilot) is also coming to the OS layer. Are you ready?
The HEPiX IPv6 Working Group has been encouraging the deployment of IPv6 in WLCG for many years. At the last HEPiX meeting in Canada we reported that more than 97% of all LHC experiment Tier-2 storage services are IPv6-capable. Since then, we have turned our attention to compute services and have launched a GGUS ticket campaign for WLCG sites to deploy dual-stack computing elements and worker...
The end of life of CentOS 7 accelerates the transition from VOMS proxies to OAuth tokens as the means to convey authorization information on a Grid/Cloud infrastructure. As a consequence, the VOMS and VOMS-Admin services will be abandoned in favor of INDIGO-IAM (or equivalent products) for the management of VO membership and the issuance of proxies and tokens.
In this contribution we present...
In recent times, experiment analysis frameworks and physics data formats of the LHC experiments have been evolving in a direction that makes interactive analysis with short turnaround times much more feasible. In parallel, many sites have set up Analysis Facilities to provide users with tools and interfaces to computing and storage that are optimised for interactive analysis. At CERN we...
More than 2000 users per day rely on the remote desktop access service from outside CERN, supporting remote work capabilities in the organization.
To improve the operation of the service, a new self-service solution for remote desktop access was launched in 2023 that empowers users to manage their devices set up for remote access. The solution includes parallelization, caching, and automated...
Based on extensive experience in system maintenance and advanced artificial intelligence technology, we have designed the IHEP computing platform's intelligent operations and maintenance system. Its primary goal is to ensure optimal utilization and efficiency of computing resources.
This system automatically detects user jobs that cause anomalies in computing services and dynamically adjusts...
Nix is a tool for packaging software with a heavy focus on reproducibility. NixOS is a Linux distribution based on the Nix package manager.
This talk is a series of demonstrations of what Nix and NixOS can do for you. Depending on the time, here is what I'm going to show off:
- A presentation of the Nix model
- Its advantages in terms of supply-chain security
- Building minimal,...
An important aspect of IT security is the management, controlled sharing and storage of sensitive data such as passwords or API tokens. In this talk we present how HashiCorp Vault is used at DESY to address this challenge and how the system is integrated into workflows like certificate management and the existing IT infrastructure such as Puppet and GitLab. As secret management is a critical...