The HEPiX forum brings together worldwide Information Technology staff, including system administrators, system engineers, and managers from High Energy Physics and Nuclear Physics laboratories and institutes, to foster a learning and sharing experience between sites facing scientific computing and data challenges.
Participating sites include BNL, CERN, DESY, FNAL, IHEP, IN2P3, INFN, IRFU, JLAB, KEK, LBNL, NDGF, NIKHEF, PIC, RAL, SLAC, TRIUMF, as well as many other research labs and numerous universities from all over the world.
Site report from IHEP covering developments since the last workshop report, including the status of computing platform construction, grid services, network, and storage.
This is the PIC report for the HEPiX Autumn 2021 Workshop.
A brief update on what's going on at INFN-T1
News from the lab
Diamond Light Source is a Synchrotron Light Source based at the RAL site. This is a summary of what Diamond has been up to in cloud, storage and compute, as well as a few extras.
CentOS Stream is a great place to develop for what's next in RHEL. But what if I want to have a hybrid infrastructure? How should I think about compatibility between CentOS Stream and released versions of RHEL?
As part of the changing Linux landscape, we now have to support more Linux distributions with a higher release cadence than ever before. In order to adapt to these changes, we must automate the entire process to remove the human bottlenecks from the equation.
In this presentation, we will discuss how CERN automates the release of new packages for CentOS Linux 8, CentOS Stream 8 and soon CentOS Stream 9. We produce daily snapshots of upstream content, modify it when necessary, promote packages to production and notify users of changes automatically.
Back in 2019, CERN Linux Support had to run tedious manual procedures to maintain CERN’s distro releases: SLC6, CERN CentOS 7, Red Hat 6 and 7. Since then, we have added CentOS 8, CentOS Stream 8, Red Hat 8, and we may be adding other Red Hat rebuilds soon. Given the growing number of supported distros, our team has been increasingly adopting automation and continuous integration in order to deal with all the extra load while reducing the need for human intervention.
Automation can now be found in every part of our process: cloud and Docker image building, baseline testing, CERN-specific testing and full-stack functional testing. For this, we use a combination of GitLab CI capabilities, Koji, OpenStack Nova, OpenStack Ironic central services and a healthy dose of Python and Bash. Test suites now cover unmanaged, managed, virtual or physical machines so we can certify that our next image release continues to meet the needs of the organization.
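To make the daily flow concrete, here is a minimal sketch of a "snapshot, test, promote, notify" step, assuming a plain rsync-based snapshot and a symlink-based promotion; the repository paths, upstream URL and test hook are illustrative assumptions, not CERN's actual tooling.

```python
# Illustrative sketch (not CERN's actual tooling): the shape of a daily
# "snapshot -> test -> promote -> notify" step for a distro repository.
# Paths, the upstream URL and the test hook are placeholder assumptions.
import datetime
import subprocess

UPSTREAM = "rsync://mirror.example.org/centos-stream-9/"   # placeholder upstream
SNAPSHOTS = "/srv/repo/snapshots"
PRODUCTION = "/srv/repo/production/cs9"

def snapshot_and_promote(run_tests):
    today = datetime.date.today().isoformat()
    snapshot = f"{SNAPSHOTS}/cs9-{today}"
    # 1. Take today's snapshot of the upstream content.
    subprocess.run(["rsync", "-a", UPSTREAM, snapshot], check=True)
    # 2. Run the site-specific test suite against the snapshot.
    if not run_tests(snapshot):
        raise RuntimeError(f"snapshot {snapshot} failed testing; not promoting")
    # 3. Promote: atomically repoint the production symlink to the tested snapshot.
    subprocess.run(["ln", "-sfn", snapshot, PRODUCTION], check=True)
    # 4. Notify users of the change (placeholder for the real notification step).
    print(f"Promoted {snapshot} to production; notifying users...")
```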
In December 2020, Red Hat and CentOS announced the early end-of-life of CentOS 8 (scheduled for December 2021) and its replacement with CentOS Stream 8. Unlike CentOS, which is a clone of RHEL, CentOS Stream is a forward-distribution of RHEL, containing package updates before they are released in RHEL, or that may never be included in RHEL at all. In this talk, we discuss the history and current status of Linux use at BNL's Scientific Data and Computing Center (SDCC), as well as our future plans for this operating system.
In conjunction with our proposal to CERN and the broader HEPiX community, I wanted to share the basis for the proposal and provide an opportunity for questions.
In this talk, we report on recent updates and the status of the KEK central computing system, Grid services, and the international network situation in Japan since the previous HEPiX workshop.
We present an update of the changes at our site since the last report. Advancements, developments, roadblocks and achievements concerning various aspects of the site, including WLCG, Unix, Windows and infrastructure, will be presented.
The Helmholtz-based platform HIFIS builds and sustains an IT infrastructure connecting all Helmholtz research fields and centres.
The services provided by HIFIS include a secure and easy-to-use collaborative environment with efficiently accessible IT services from anywhere. HIFIS further supports Research Software Engineering (RSE) with a high level of quality, visibility and sustainability.
In this talk, we present the scope, current implementation status, as well as exemplary services of HIFIS. The interplay of the multiple modular and decentralized HIFIS components, including the Helmholtz Cloud Portal, the Authentication and Authorization Infrastructure (Helmholtz AAI), intermediary components such as the Helmholtz Cloud Agent and cloud services provided at various sites will be showcased.
How can we turn a "chore" into a community of makers, in a scientific Organization that doesn't enforce strict standards? Drupal has a long history at CERN. Site builders of >1k unique websites take advantage of a highly automated but bespoke infrastructure that automates website operations: provisioning, backup, cloning, updating and deletion. Relying on direct support through tickets worked, until Drupal 7 migrations overloaded the support lines, while engineers constantly came and went, taking know-how with them. Technical debt started weighing us down.
Does this sound familiar?
To overcome these service challenges, we came up with a Technical and a Human solution. We built a new, cloud-native infrastructure on Kubernetes and a standard "CERN Drupal Distribution", fully open source and available. Onboarding newcomers becomes simpler thanks to the standard technologies. Patiently, we also grew a Drupal Community at CERN that brought site builders together to absorb knowledge in a social network and rely less on a single expert.
At HEPiX we'll share with you our journey through these problems and our responses, describing them in detail. We hope to give you both a reference frame and concrete solutions.
A summary of the annual "European" HTCondor workshop, recently held online.
We will present an update on our site since the Fall 2019 report, covering our changes in software, tools and operations. In addition, we will cover significant changes that are underway at both the University of Michigan and Michigan State University sites.
We conclude with a summary of what has worked and what problems we encountered and indicate directions for future work.
Summary
Update on AGLT2 including changes in software, hardware and site configurations and summary of status and future work.
HIFIS: VO Federation for EFSS
Following the first rough ideas on Virtual Organisation (VO; a Community AAI [1] based group of any size) based Enterprise File Sync&Share (EFSS) Federation [2], which were presented by HIFIS [3] at the CS3 Conference 2021, we have since moved further along, working on a first implementation. During summer 2021, we clarified the use case and identified the basic technical architecture for this future VO Federation app in Nextcloud.
When users who are distributed across multiple institutes want to collaborate within a Virtual Organisation, they currently have two options. One is to use the OCM protocol [4] to share files and folders with the individual VO members based on remote EFSS instances. This causes considerable effort on the sharer's side, as they need to keep track of whom they have shared which content with. The second option is for all VO members to convene on one institution's local EFSS instance, which causes many redundant accounts and confusion on the users' side, especially as they need to know where to log in to work on a specific project and have no central entry point for all of their projects on their local EFSS instance.
We want to tackle this issue by enabling users to use federated shares with entire VOs instead of individual users. This way, every user within a VO will receive the share, no matter which EFSS instance they are based on. Updates to VO membership will also be communicated between federation members, resulting in new VO members automatically receiving existing VO shares and former VO members losing access to VO shares. Based on a new interface between EFSS and Community AAI, this whole process is also planned to be GDPR compliant. To ensure that this interface will also work with other Community AAIs, we are collaborating with AARC to create an AARC guideline with the aim of standardizing the interface specifications.
While the initial implementation is set to be done within a Nextcloud environment, the new features will be based on the CS3 APIs [5] and will consequently be ready to be implemented by other EFSS vendors as well.
[1] AARC Blueprint for Community AAIs: https://aarc-project.eu/architecture/
[2] 2021 CS3 Contribution: https://indico.cern.ch/event/970232/contributions/4157924/
[3] HIFIS Website: https://hifis.net/
[4] OCM Project documentation: https://wiki.geant.org/display/OCM/Open+Cloud+Mesh
[5] CS3 APIs GitHub page: https://github.com/cs3org/cs3apis; the CS3 APIs are implemented in the REVA middleware: https://reva.link/
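To make the intended behaviour concrete, the following is a minimal sketch, purely illustrative and not the actual VO Federation app or the CS3 APIs, of reconciling a VO-level share against the VO's current membership so that joining members gain access and departing members lose it:

```python
# Illustrative sketch only (not the actual VO Federation app or CS3 APIs):
# reconcile a VO-level share against the VO's current membership so that new
# members gain access and former members lose it, whatever instance they use.
from dataclasses import dataclass, field

@dataclass
class VoShare:
    resource: str                                   # e.g. a folder on the sharer's EFSS instance
    vo: str                                         # VO identifier from the Community AAI
    granted_to: set = field(default_factory=set)    # users currently holding the share

def reconcile(share, current_members, grant, revoke):
    """Bring per-user grants in line with the VO membership list."""
    for user in current_members - share.granted_to:
        grant(share.resource, user)            # new VO member -> receives the share
    for user in share.granted_to - current_members:
        revoke(share.resource, user)           # left the VO -> loses access
    share.granted_to = set(current_members)

# Example: a membership update delivered by the Community AAI (invented names)
share = VoShare("/projects/climate-data", "vo:helmholtz.climate",
                {"alice@desy.de", "bob@kit.edu"})
reconcile(share, {"alice@desy.de", "carol@hzdr.de"},
          grant=lambda res, u: print(f"grant  {res} -> {u}"),
          revoke=lambda res, u: print(f"revoke {res} -> {u}"))
```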
We present a technical report on the new ATLAS Analysis Facility hosted at the University of Chicago. Designed to support both traditional batch computing and novel analysis frameworks, this facility represents a shift in how future clusters will be deployed, federated, monitored and operated. We will describe the "cloud native" underpinnings of the facility (Kubernetes, GitOps, and Rook), how we leverage federated service deployment via SLATE, and how we incorporate technologies from IRIS-HEP such as ServiceX and Coffea Casa. The flexibility offered by these technologies allows the resource to be configured for the IRIS-HEP Scalable Systems Laboratory, providing a declarative platform for analysis and data delivery systems development.
The Benchmarking WG will present its semestral activity report. For the past few years, the group's activity has focused on the delivery of a new benchmark for HEP, HEPscore, together with a set of software tools that enable the WLCG community to seamlessly run and collect benchmark results (HEP Benchmark Suite) and to maintain the HEP reference applications needed for benchmarking purposes (HEP Workloads).
This report will update on the recent software developments as well as on the feedback collected from early adopters of the whole toolkit.
In addition, benchmark results obtained on tens of CPU models will be presented, comparing HS06 with HEPscore_beta, a prototype of HEPscore based on LHC Run 2 workloads.
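As an illustration of how per-workload results can be folded into a single figure, here is a minimal sketch assuming the composite score is a weighted geometric mean of workload scores normalised to a reference machine; the workload names, weights and numbers are invented for the example.

```python
# Toy aggregation of per-workload scores into a single composite benchmark
# value, assuming a weighted geometric mean (workload names, reference scores
# and weights are invented for this example).
import math

def composite_score(scores, reference, weights=None):
    """Geometric mean of scores/reference ratios, optionally weighted."""
    workloads = list(scores)
    weights = weights or {w: 1.0 for w in workloads}
    total_w = sum(weights[w] for w in workloads)
    log_sum = sum(weights[w] * math.log(scores[w] / reference[w]) for w in workloads)
    return math.exp(log_sum / total_w)

measured  = {"gen-sim": 14.2, "digi-reco": 7.9, "analysis": 23.5}   # events/s, machine under test
reference = {"gen-sim": 10.0, "digi-reco": 5.0, "analysis": 20.0}   # events/s, reference machine
print(round(composite_score(measured, reference), 3))                # ~1.381
```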
Status and plans of the task force
CNAF Tier-1, composed of almost 1000 worker nodes and nearly 40000 cores, completed its migration to HTCondor more than one year ago. After adapting the existing monitoring tools (built with Sensu, InfluxDB and Grafana) to work with the new batch system, an effort has started to collect a richer, more "condor-oriented" set of metrics that provide better insight into the pool status.
Moreover, we developed a similar tool for bare-metal information collection, giving sysadmins a global view of hardware (IPMI) events on the farm.
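As an illustration of the kind of collection involved, the sketch below queries an HTCondor collector through the Python bindings and formats per-slot metrics as InfluxDB line protocol; the pool address and measurement schema are assumptions, not CNAF's actual configuration.

```python
# Hypothetical sketch: query the HTCondor collector for per-slot metrics and
# emit them as InfluxDB line-protocol records (pool name and measurement keys
# are illustrative, not CNAF's actual setup).
import time
import htcondor  # HTCondor Python bindings

def collect_slot_metrics(pool="condor-cm.example.org"):
    collector = htcondor.Collector(pool)
    slots = collector.query(
        htcondor.AdTypes.Startd,
        projection=["Machine", "State", "Activity", "Cpus", "Memory"],
    )
    now_ns = time.time_ns()
    lines = []
    for ad in slots:
        lines.append(
            "condor_slot,machine={machine},state={state} "
            "cpus={cpus}i,memory_mb={mem}i {ts}".format(
                machine=ad.get("Machine", "unknown"),
                state=ad.get("State", "Unknown"),
                cpus=ad.get("Cpus", 0),
                mem=ad.get("Memory", 0),
                ts=now_ns,
            )
        )
    return lines

if __name__ == "__main__":
    # Print line-protocol records; in production these would be pushed to InfluxDB.
    for line in collect_slot_metrics():
        print(line)
```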
In this contribution we are going to report on the latest results of the R&D activity aimed at preparing the EOS ALICE O2 storage cluster for the extremely demanding requirements of LHC Run 3.
Taking into consideration the latest upgrades of the LHC and of the ALICE detectors, the data throughput from the ALICE Data Acquisition system is expected to increase significantly, reaching a 100 GB/s data rate during heavy-ion collisions.
During this talk we will present the roadmap to meeting these demands with the EOS ALICE O2 storage instance, including software improvements, the storage nodes' hardware setup, operating system tweaks, and how we have overcome some of the hurdles in the process.
Given the anticipated increase in the amount of scientific data, it is widely accepted that primarily disk based storage will become prohibitively expensive. Tape based storage, on the other hand, provides a viable and affordable solution for the ever increasing demand for storage space. Coupled with a disk caching layer that temporarily holds a small fraction of the total data volume to allow for low latency access, it turns tape based systems into active archival storage (write once, read many) that imposes additional demands on data flow optimization compared to traditional backup setups (write once, read never). In order to preserve the lifetime of tapes and minimize the inherently higher access latency, different tape usage strategies are being evaluated.
As an important disk storage system for scientific data that transparently handles tape access, dCache is making efforts to contribute to tape recall optimization by introducing a high-level tape recall request scheduling component within its SRM implementation. This presentation will include first experiences with this component on scientific data in a production environment.
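As a rough illustration of the idea behind recall scheduling (not dCache's actual implementation), pending requests can be batched per tape volume and ordered by on-tape position before being released to the tape system:

```python
# Illustrative sketch (not dCache code): group pending recall requests by tape
# volume and release them tape-by-tape, so each mounted tape is read in order
# of file position instead of in request-arrival order.
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class RecallRequest:
    path: str
    tape: str        # volume label the file lives on
    position: int    # file position on the tape (e.g. block offset)

def schedule_recalls(requests):
    """Return requests batched per tape and sorted by on-tape position."""
    by_tape = defaultdict(list)
    for req in requests:
        by_tape[req.tape].append(req)
    schedule = []
    # Mount the tapes with the most pending requests first to maximise drive usage.
    for tape, reqs in sorted(by_tape.items(), key=lambda kv: -len(kv[1])):
        schedule.append((tape, sorted(reqs, key=lambda r: r.position)))
    return schedule
```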
EOS is the open-source distributed storage technology developed in the CERN IT Department and used at the Large Hadron Collider (LHC). EOS has been operated in production for more than 10 years and now manages over half an exabyte of disk storage for both LHC and non-LHC experiments. Since its first deployment in 2010, the software has evolved considerably, catering to growing data volumes and user requirements. After four major version releases, the next version, EOS 5 (codenamed Diopside), is getting ready for deployment. This presentation will give an overview of EOS, the major features this release brings, a roadmap of expected future features, and how all of these contribute towards the next LHC Run.
While WLCG may be considered at the vanguard of data-intensive scientific research, many other scientific communities are finding their data storage requirements growing beyond their current capabilities. Simultaneously, with science increasingly involving broad collaborations, the ability to support and manage scientists from different institutes is becoming essential.

At DESY, we have developed and deployed a community storage solution. The primary users are the Helmholtz scientific community, which includes a broad spectrum of fundamental research activity. The service is not restricted to Helmholtz users; for example, through the ESCAPE project, we also support other communities such as SKA, CTA, LSST and FAIR.

In this talk we will detail the primary use-cases we wish to address with this service, the powerful features that support them, and the technologies that underpin those features. We will also describe the current status of the storage and our plans for the future.
This presentation provides an update on the global security landscape since the last HEPiX meeting. It describes the main vectors of risks to, and compromises in, the academic community, including lessons learnt, and presents interesting recent attacks while providing recommendations on how to best protect ourselves.
The COVID-19 pandemic has introduced a novel challenge for security teams everywhere by expanding the attack surface to include everyone's personal devices / home networks and causing a shift to new, risky software for a remote-first working environment. It was also a chance for attackers to get creative by taking advantage of the fear and confusion to devise new tactics and techniques.
What's more, the worrying trend of data leaks, password dumps, ransomware attacks and new security vulnerabilities does not seem to slow down.
This talk is based on contributions and input from the CERN Computer Security Team.
Prompted by a question after the subject came up in the site reports: what measures do sites have planned to increase users' security awareness?
To counteract the spread of the COVID-19 virus as much as possible, a device called the Proximeter has been developed at CERN, which sends contact-tracing information via an IoT network. In this respect, some hurdles had to be overcome in terms of data protection, and compromises had to be made concerning the network and the actual protocol.
This presentation reports on the ongoing migration of archival library and tape technology at the Scientific Data and Computing Center (SDCC). With the planned transition from Oracle libraries to IBM libraries, we deployed our first IBM TS4500 library with ~20k tape slots in early 2021. This talk discusses our experience with the IBM TS4500, our continued transition to these libraries, and our experience with new tape technologies.
CASTOR was used as CERN's primary archival storage system for the last two decades, including Run-1 and Run-2 of the LHC. For Run-3, CASTOR has been replaced by the CERN Tape Archive (CTA). At the end of Run-2, there were 340 Petabytes of data stored in CASTOR, which had to be migrated to CTA during Long Shutdown 2. Over 90% of this data is an active archive — the custodial copy of physics data belonging to the four LHC experiments and around a dozen smaller experiments at CERN. The migration and switch from CASTOR to CTA had to be accomplished with minimal interruption to experiment activities; to further complicate the problem, each experiment has a slightly different workflow and data management stack. This presentation will describe our experiences and lessons learned during the two-year period of the migration.
The RX protocol inherited from IBM AFS is incapable of filling a network pipe with a single RPC when the pipe's bandwidth-delay product exceeds 44 1/4 KB. On a 1 Gbit/sec pipe with a 1 ms RTT, the maximum theoretical throughput is 360 Mbit/sec with a maximum window size of 44 KB. The RX ACK packet format provides for a theoretical maximum window of 65535 packets, but the Selective Acknowledgment (SACK) table is limited to 255 packets. With 255 packets, the maximum window size is 351 KB, for a maximum throughput of 2.875 Gbit/sec with a 1 ms RTT. Increase the RTT to 8 ms and the maximum throughput is once again reduced to 360 Mbit/sec. The RTT on cross-Atlantic commodity internet pipes often exceeds 110 ms, which reduces the theoretical throughput to 28 Mbit/sec.
This presentation will describe AuriStor's proposed RX Extended SACK Table protocol extension, prior efforts at extending the maximum window size, and preliminary results on real-world networks using AuriStor's prototype supporting maximum windows up to 8192 packets or 11MB.
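The figures above follow from the window-limited throughput bound, throughput ≤ window size / RTT, independent of the link's raw bandwidth; a small worked example (exact values depend on the assumed window and packet sizes):

```python
# Worked example of the window-limited throughput figures quoted above:
# throughput <= window_size / RTT, independent of the link's raw bandwidth.
def window_limited_throughput_mbps(window_bytes, rtt_seconds):
    """Maximum throughput (Mbit/s) achievable with a fixed window over a given RTT."""
    return window_bytes * 8 / rtt_seconds / 1e6

KB = 1024
print(window_limited_throughput_mbps(44.25 * KB, 0.001))  # ~362 Mbit/s: 44.25 KB window, 1 ms RTT
print(window_limited_throughput_mbps(351 * KB, 0.001))    # ~2875 Mbit/s: 255-packet (351 KB) window, 1 ms RTT
print(window_limited_throughput_mbps(351 * KB, 0.008))    # ~359 Mbit/s at 8 ms RTT
print(window_limited_throughput_mbps(351 * KB, 0.110))    # ~26 Mbit/s at 110 ms RTT (order of the quoted figure)
```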
The Belle II experiment is a detector coupled to the SuperKEKB electron-positron collider, designed to collect 50 times the data produced by the previous generation of B-factories. Processing and analyzing these high volumes of data in a timely fashion requires an efficient interconnection of software and computing resources. In this talk, the analysis model of the Belle II experiment is presented, summarizing the distributed computing technologies adopted and the current challenges.
A retrospective view of the transition from the Vidyo to the Zoom platform.
As Atlassian is changing its product offering, DESY has started to evaluate alternatives to the products from the firm's portfolio.
For repositories and CI/CD software development, a suitable replacement for Jira Software/Bitbucket/Bamboo was found in GitLab.
This talk will outline the pilot project, the current production system and the offering we make to users to help them during migration.
In late 2020, CC-IN2P3 decided to look for a service desk ticketing system as an alternative to OTRS.
During last winter we carried out a search for a new candidate and chose Zammad.
We transitioned to Zammad over the summer and are now decommissioning OTRS.
This talk will present the features offered by Zammad and the lessons learned from that transition.
Last year we deployed a new automated backup and recovery setup for the Database on Demand (DBOD) service at CERN. This setup comprises two services:
1. Backup service - Stores encrypted and zipped backups of all DBOD production instances to EOS storage.
2. Restore service - Continuously restores and tests connectivity of snapshots of all production instances.
This presentation aims to shed light on the structure and operation of this setup.
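As a rough sketch of the restore service's logic (placeholder instance names and stubbed-out steps, not the actual DBOD code):

```python
# Hypothetical sketch of the restore service's logic (placeholder names and
# stubbed-out steps; not the actual DBOD code): restore the latest backup of
# each production instance to a scratch server, then verify connectivity.
INSTANCES = ["accounting_db", "inventory_db"]        # placeholder instance names

def latest_snapshot(instance):
    """Locate the most recent encrypted backup on EOS (placeholder path scheme)."""
    return f"/eos/backups/{instance}/latest.tar.gz.enc"

def restore_to_scratch(snapshot):
    """Decrypt, unpack and restore the snapshot onto a scratch server (stubbed out here)."""
    print(f"restoring {snapshot} to scratch-host")
    return True

def verify_connectivity(instance):
    """Check that the restored instance accepts connections (stubbed out here)."""
    print(f"connecting to restored copy of {instance} on scratch-host")
    return True

for name in INSTANCES:
    ok = restore_to_scratch(latest_snapshot(name)) and verify_connectivity(name)
    print(f"{name}: restore verification {'OK' if ok else 'FAILED'}")
```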
The Weblecture service is in charge of capturing, processing and delivering CERN productions: e-learning, computing seminars, conferences, outreach events, and CERN-related communication (e.g. DG, HR, Staff Association). It is tightly linked to the Webcast and Videoconference services, from which most AV content is provided. The service delivers AV content in different formats so our community can watch it on different platforms, e.g. on a mobile or desktop device.
The talk presents the past status running on a legacy stack and the work done to provide the same and expanded functionality: new capture agents such as Epiphan with improved capabilities, better and faster transcoding (also provided as a service to other units), a unified video player for both conferences and web lectures, keeping backward compatibility, the new CERN SSO, etc.
The new architecture has been built around in-house development and the integration of two FOSS projects, the Paella player [1] and Opencast [2].
[1] https://paellaplayer.upv.es/
[2] https://opencast.org/
As the scale and complexity of the current HEP network grow rapidly, new technologies and platforms are being introduced that greatly extend the capabilities of today’s networks. With many of these technologies becoming available, it’s important to understand how we can design, test and develop systems that could enter existing production workflows while at the same time changing something as fundamental as the network that all sites and experiments rely upon. In this talk we’ll give an update on the Research Networking Technical working group activities, challenges and recent updates.
In particular, we'll focus on the packet marking technologies (scitags), tools and approaches that have been identified, and we will discuss the status of the implementation and plans for the near-term future.
WLCG relies on the network as a critical part of its infrastructure and therefore needs to guarantee effective network usage and prompt detection and resolution of any network issues, including connection failures, congestion and traffic routing.
The OSG Networking Area is a partner of the WLCG effort and is focused on being the primary source of networking information for its partners and constituents. We will report on the changes and updates that have occurred since the last HEPiX meeting.
The primary areas to cover include the status of and plans for the WLCG/OSG perfSONAR infrastructure, the WLCG Throughput Working Group and the activities in the IRIS-HEP and the now completed SAND projects.
Increasing use of cloud resources, and other developments in new workflows, have posed an important question as to which certificate providers are most appropriate for different use cases. Certificate authorities under discussion include Let's Encrypt but also the CAs of commercial cloud providers. We discuss how this question is being posed, the key stakeholders involved (sites, operations, experiments, and identity management and security experts), and the requirements that need to be included in this discussion. We report on the formation of a new WLCG task force on this topic as we approach the first meeting of that group.
During this year the HEPiX IPv6 working group has continued to encourage the deployment of dual-stack IPv4/IPv6 services. We also recommend dual-stack clients (worker nodes etc). Many data transfers are happening today over IPv6. This talk will present our recent work including the ongoing planning for moving to an IPv6-only core WLCG.
The threat faced by the research and education sector from determined and well-resourced attackers has been growing in recent years and is now acute. We must act together as a community to defend against these attacks. A vital means of achieving this is to share threat intelligence - key indicators of compromise of an ongoing incident including network locations and file hashes - with trusted partners. We must couple this with a robust, fine-grained source of network monitoring. The combination of these elements along with storage, visualisation and alerting is called a Security Operations Centre. The WLCG SOC working group has been pursuing an interconnected network of SOC-equipped sites for several years. We report here on recent progress, including new deployments against multiple 100Gb/s sites, and future plans for the coming year.
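As a toy illustration of the indicator-of-compromise matching at the heart of such threat-intelligence sharing (indicator values are invented; real deployments rely on dedicated threat-intelligence platforms and fine-grained network monitoring rather than this simple loop):

```python
# Toy illustration of indicator-of-compromise matching in a SOC pipeline
# (indicator values are invented placeholders).
shared_iocs = {
    "ips": {"203.0.113.66"},                     # network locations reported by trusted partners
    "sha256": {"aaaabbbbccccdddd" * 4},          # invented 64-character placeholder hash
}

def check_event(event):
    """Return alert strings for any event fields that match shared indicators."""
    alerts = []
    if event.get("dst_ip") in shared_iocs["ips"]:
        alerts.append(f"connection to known-bad address {event['dst_ip']}")
    if event.get("file_sha256") in shared_iocs["sha256"]:
        alerts.append(f"known-bad file hash {event['file_sha256']}")
    return alerts

event = {"dst_ip": "203.0.113.66", "file_sha256": None}
for alert in check_event(event):
    print("ALERT:", alert)
```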
DESY's current user registry is about to retire. During its use, boundary conditions have shifted, the user groups have become more heterogeneous, and, in addition to collaborations with stable memberships, the photon science community has brought a higher rate of fluctuation of IT users. The successor incorporates those changes and delivers functionality based on Oracle's database stack (database, APEX, ORDS).