HEPiX Spring 2021 online Workshop
The HEPiX forum brings together worldwide Information Technology staff, including system administrators, system engineers, and managers from High Energy Physics and Nuclear Physics laboratories and institutes, to foster a learning and sharing experience between sites facing scientific computing and data challenges.
Participating sites include BNL, CERN, DESY, FNAL, IHEP, IN2P3, INFN, IRFU, JLAB, KEK, LBNL, NDGF, NIKHEF, PIC, RAL, SLAC, TRIUMF, as well as many other research labs and numerous universities from all over the world.
08:00
→
08:10
Miscellaneous: Welcome & Logistics Online workshop
Convener: Peter van der Reest (Deutsches Elektronen-Synchrotron DESY) -
08:10
→
09:25
Site Reports Online workshop
-
08:10
KEK Site Report 15m
The KEK Central Computer System (KEKCC) is a computer service and facility that provides large-scale computer resources, including Grid and Cloud computing systems and common IT services, such as e-mail and web services.
Following the procurement policy for large-scale computer systems requested by the Japanese government, we replace the entire KEKCC every four, or sometimes five, years. The current system replaced the previous one and has been in operation since September 2020; its decommissioning will start in early 2024.
In this talk, we share our experiences and challenges in introducing Grid services in the KEKCC. In particular, we report on several improvements for the Belle II experiment. We also highlight some issues that we will have to address in the future.
Speaker: Go Iwai (KEK) -
08:25
ASGC site report 15m
ASGC site report
Speaker: Felix.hung-te Lee (Academia Sinica (TW)) -
08:40
CERN Site Report 15m
News from CERN since the previous HEPiX workshop.
Speaker: Andrei Dumitru (CERN) -
08:55
RAL Site Report 15m
An update on developments and some plans at the RAL Tier1
Speaker: Martin Bly (STFC-RAL) -
09:10
PIC report 15m
This is the PIC report for HEPiX Spring 2021 Workshop
Speaker: Jose Flix Molina (Centro de Investigaciones Energéticas Medioambientales y Tecnológicas)
09:25
→
09:45
Coffee Break 20m
-
09:45
→
11:00
End-User IT Services & Operating Systems Online workshop
-
09:45
Daisy: Data analysis integrated software system for X-ray experiments 25m
Daisy (Data Analysis Integrated Software System) has been designed for the analysis and visualization of X-ray experiments. To address the extensive range of requirements of the Chinese radiation facilities community, from purely algorithmic problems to scientific computing infrastructure, Daisy sets up a cloud-native platform to support on-site data analysis services with fast feedback and interaction. The plug-in based application is convenient for processing the expected high-throughput data flow in parallel at next-generation facilities such as the High Energy Photon Source (HEPS). The objectives, functionality and architecture of Daisy are described in this contribution.
Speaker: Haolai Tian (Institute of High Energy Physics) -
10:10
OnlyOffice and Collabora Online Experience at CERN 25m
Collaboration features are nowadays a key aspect for efficient team work with productivity tools. During 2020, CERN has deployed OnlyOffice and Collabora Online solutions and monitored their usage in CERNBox.
This presentation will focus on technical aspects of deploying and maintaining OnlyOffice and Collabora Online within CERN and their integration with CERNBox. It will also give an overview of our user community and the main challenges we face when interacting with these applications.
Speaker: Maria Alandes Pradillo (CERN) -
10:35
Windows desktop service from computer to user centric IT 25m
Over the last decades we mainly focused our MS Windows management policy on hardening machines: we wanted to control and manage how and when security updates were deployed, and how software could be installed, licensed and monitored on a machine. But times have changed, IT has evolved, and users can now be empowered and regain their freedom. Let's see together which solutions we put in place to reduce control over Windows users while maintaining a secure and user-friendly experience. Being part of a Windows Active Directory domain is no longer the alpha and omega of a Windows-based computer.
Speaker: Sebastien Dellabella (CERN)
16:00
→
16:10
Miscellaneous: Welcome & Logistics Online workshop
Convener: Tony Wong (Brookhaven National Laboratory) -
16:10
→
17:25
Site Reports Online workshop
-
16:10
INFN-T1 site report 15m
A short presentation on what's going on at INFN-T1 site
Speaker: Mr Andrea Chierici (Universita e INFN, Bologna (IT)) -
16:25
BNL Site Report 15m
An update on BNL activities since the Fall 2020 workshop
Speaker: Costin Caramarcu (Brookhaven National Laboratory (US)) -
16:40
Diamond Light Source Site Report 15m
Diamond Light Source is a Synchrotron Light Source based at the RAL site. This is a summary of what Diamond has been up to in cloud, storage and compute, as well as a few extras.
Speaker: Frederik Ferner -
16:55
GSI site report 15m
News and status report from GSI
Speaker: Mr Christopher Huhn -
17:10
Canadian ATLAS Tier-1 site report 15m
News and updates from the Canadian ATLAS Tier-1 centre over the past years. The presentation will cover the site configuration and tools used, how we operate a 'federated' Tier-1 centre, and how we improve CPU utilization.
Speaker: Di Qing (TRIUMF (CA))
17:25
→
17:45
Coffee Break 20m
-
17:45
→
19:00
End-User IT Services & Operating Systems Online workshop
-
17:45
Linux at CERN: current status and future 25m
CERN has historically used Red Hat derived Linux distributions, favoured for their relative stability and long life cycle. In December 2020, the CentOS board announced that the end-of-life for CentOS Linux 8 would be changed from a 10-year life cycle to 2 years.
This talk focuses on what CERN will be doing in the short term to adapt to this announcement, and what the Linux future could look like after the end-of-life of CentOS Linux 8.
Speaker: Ben Morrice (CERN) -
18:10
Red Hat Products and Programs 25m
A "just the facts" look at the products and programs Red Hat offers, followed by a Question and Answer session.
In the past six months, Red Hat has made some dramatic announcements. We are aware that these announcements affect how the High Energy Physics community does computing. We want you, HEPiX, to make the best-informed decisions as you decide your next steps forward. This presentation will hopefully clear up questions, and stand as a reference, while you have your discussions.
Speaker: Troy Dawson -
18:35
Building up and migrating to a FOSS-focused e-mail service at CERN 25m
This talk is an update on CERN's project to build up a new e-mail service at CERN, focused on Free and Open Source Software, and to migrate all of its users. As presented at HEPiX Autumn 2019, CERN has been working on migrating away from Microsoft Exchange since Spring 2018. However, in early Spring 2020, in the middle of the migration to its first pilot, CERN had to stop and redesign its solution. After briefly explaining the reasons for this change, this talk will focus on the design of the newly adopted solution, presenting the new architecture and implementation in detail. Then the migration process, from both the previous pilot and production services, will be discussed. This talk will conclude with some early results, both in terms of infrastructure and usability.
Speaker: Vincent Brillault (CERN)
08:00
→
09:15
Network & Security Online workshop
-
08:00
CERN central DHCP service: Migration from ISC DHCP to Kea 25m
The presentation discusses the change of the DHCP software used for the CERN central DHCP service, namely the migration from ISC DHCP to Kea. It outlines the motivation behind the replacement of ISC DHCP and describes the main steps of the transition process. It covers the translation of the current CERN ISC DHCP configuration, testing the new Kea configuration, and the implementation of the software for generating the Kea configuration based on data from the CERN central database. The focus is on the main differences between ISC DHCP and Kea and the problems linked to the transition.
Speaker: Maria Hrabosova (CERN) -
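As a rough illustration of the configuration-generation step mentioned in the abstract above (host records are hard-coded here; the actual CERN database schema and generator are not described in the abstract), the following Python sketch renders host data into a minimal kea-dhcp4 configuration with per-host reservations:

    import json

    # Hypothetical host records, standing in for data pulled from a central network database.
    hosts = [
        {"hostname": "node001", "mac": "aa:bb:cc:dd:ee:01", "ip": "192.0.2.10"},
        {"hostname": "node002", "mac": "aa:bb:cc:dd:ee:02", "ip": "192.0.2.11"},
    ]

    # Minimal kea-dhcp4 configuration with one subnet and per-host reservations.
    kea_config = {
        "Dhcp4": {
            "interfaces-config": {"interfaces": ["eth0"]},
            "subnet4": [
                {
                    "subnet": "192.0.2.0/24",
                    "reservations": [
                        {"hw-address": h["mac"], "ip-address": h["ip"], "hostname": h["hostname"]}
                        for h in hosts
                    ],
                }
            ],
        }
    }

    with open("kea-dhcp4.conf", "w") as f:
        json.dump(kea_config, f, indent=2)

Since Kea reads its configuration as JSON, this kind of automated generation is straightforward compared with templating ISC DHCP's bespoke configuration syntax.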
08:25
Computer Security Update 25m
This presentation provides an update on the global security landscape since the last HEPiX meeting. It describes the main vectors of risks to and compromises in the academic community including lessons learnt, presents interesting recent attacks while providing recommendations on how to best protect ourselves. It also covers security risks management in general, as well as the security aspects of the current hot topics in computing and around computer security.
The COVID-19 pandemic has introduced a novel challenge for security teams everywhere by expanding the attack surface to include everyone's personal devices / home networks and causing a shift to new, risky software for a remote-first working environment. It was also a chance for attackers to get creative by taking advantage of the fear and confusion to devise new tactics and techniques. What's more, the worrying trend of data leaks, password dumps, ransomware attacks and new security vulnerabilities does not seem to slow down.
This talk is based on contributions and input from the CERN Computer Security Team.
Speaker: Liviu Valsan (CERN) -
08:50
IPv6-only on WLCG - update from the IPv6 working group 25m
The transition of WLCG storage and central services to dual-stack IPv4/IPv6 has gone well, thus enabling the use of IPv6-only CPU resources as mandated by the WLCG Management Board. Many WLCG data transfers now take place over IPv6. The dual-stack deployment does, however, result in a networking environment which is much more complex than when using just IPv4 or just IPv6. In recent months the HEPiX IPv6 working group has continued to encourage the use of IPv6 as the primary networking protocol for WLCG data transfers and, to complete the transition to IPv6, is considering the removal of the IPv4 protocol in more places. We will present our recent work and future plans.
Speaker: Dr Andrea Sciabà (CERN)
09:15
→
09:35
Coffee Break 20m
-
09:35
→
11:05
Computing & Batch Services Online workshop
-
09:35
Unchaining JupyterHub: Running notebooks on resources without inbound connectivity 25m
JupyterLab has become an increasingly popular platform for rapid prototyping, teaching algorithms or sharing small analyses in a self-documenting manner.
However, it is commonly operated using dedicated cloud-like infrastructures (e.g. Kubernetes) which often need to be maintained in addition to existing HTC systems. Furthermore, federation of resources or opportunistic usage are not possible due to a requirement of direct inbound connectivity to the execute nodes.
This talk presents a new, open development in the context of the JupyterHub batchspawner:
By extending the existing functionality to leverage the connection broker of the HTCondor batch system, the requirement for inbound connectivity to the execute nodes can be dropped and only outbound connectivity to the Hub is needed. Combined with a container runtime leveraging user namespaces, unprivileged CVMFS and the HTCondor file transfer mechanism, notebooks can not only be executed directly on existing local HTC systems, but also on opportunistically usable resources such as HPC centres or clouds via an overlay batch system.
The presented prototype paves the way towards a federation of heterogeneous and distributed resources behind a single point of entry.
Speaker: Dr Oliver Freyermuth (University of Bonn (DE)) -
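To make the described setup more concrete, here is a rough jupyterhub_config.py sketch (not the authors' code) of spawning notebooks through HTCondor with the batchspawner package, where the execute node only needs outbound connectivity to the Hub; hostnames and resource values are placeholders:

    # jupyterhub_config.py -- illustrative sketch, assuming the 'batchspawner' package.
    c = get_config()  # provided when JupyterHub loads this file

    # Spawn single-user notebook servers as HTCondor jobs.
    c.JupyterHub.spawner_class = 'batchspawner.CondorSpawner'

    # Address the spawned notebook uses to call back to the Hub: only outbound
    # connectivity from the execute node to this endpoint is required.
    c.JupyterHub.hub_connect_url = 'http://jupyterhub.example.org:8081'

    # Resource requests handed to condor_submit by the spawner (values illustrative).
    c.CondorSpawner.req_memory = '2048'
    c.CondorSpawner.req_nprocs = '2'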
10:00
Dynamic integration of opportunistic compute resources 25m
Exploitation of heterogeneous opportunistic resources is an important ingredient to fulfil the computing requirements of large HEP experiments in the future. Potential candidates for integration are Tier 3 centres, idling cores in HPC centres, cloud resources, etc. To make this work, it is essential to choose a technology which offers an easy integration of those resources into the computing infrastructure of the experiments. We present such an approach based on COBalD/TARDIS, HTCondor, CVMFS and modern virtualization technology as core components. The challenging part of dynamically integrating and subsequently removing resources with fluctuating availability and utilization is undertaken by the COBalD/TARDIS resource manager.
Speaker: Peter Wienemann (University of Bonn (DE)) -
10:25
Moving to HTCondor (and fighting covid in the middle) 25m
In March 2020, INFN-T1 started the process of moving all the Worker Nodes managed by LSF to the HTCondor batch system, which had been set up and tested in the previous months and was considered ready to handle the workload of the whole computing cluster. On March 20, while in the middle of the migration process, a sudden request came to provide 50% of our computing power for a period of one month to the "Sibylla Biotech" research project, to study the folding of the protein ACE2, present on the membrane of human cells, which allows the virus to enter. We report on how the migration process was organized and how it was possible to handle such a request, quite distant from our usual use cases. Finally, we report on our experience so far with HTCondor and HTCondor-CE.
Speaker: Stefano Dal Pra (Universita e INFN, Bologna (IT))
14:30
→
16:00
Birds of a Feather (BoF) session: Linux Discussion
Convener: Frank Wuerthwein (Univ. of California San Diego (US))
-
16:00
→
17:40
Network & Security Online workshop
-
16:00
WLCG Authorization WG Update 25m
Since 2017, the Worldwide LHC Computing Grid (WLCG) has been working towards enabling Token based authentication and authorisation throughout its entire middleware stack. Following the publication of the WLCGv1.0 Token Schema in 2019, middleware developers have been able to enhance their services to consume and validate OAuth2.0 tokens and process the authorization information they convey. This talk will present a status update of the WLCG Authorization Working Group and provide an overview of the token based authorization model. We will put the progress for WLCG in context with larger efforts by the Research and Education sector.
Speaker: Hannah Short (CERN) -
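As a small, self-contained illustration of the token model (issuer, subject and scope values are made up; the authoritative definitions are in the WLCG v1.0 schema), the sketch below mints a JWT locally with PyJWT and shows how a service inspects its scope-based capabilities:

    import jwt  # PyJWT

    # For a self-contained example the token is minted locally; in WLCG it would
    # come from the VO's token issuer and carry a verifiable signature.
    claims = {
        "iss": "https://token-issuer.example.org",
        "sub": "user-1234",
        "wlcg.ver": "1.0",
        "scope": "storage.read:/ storage.create:/store/user compute.read",
    }
    token = jwt.encode(claims, "not-a-real-key", algorithm="HS256")

    # A relying service decodes the token and checks the granted capabilities.
    # Signature verification is skipped here; real services must verify against
    # the issuer's published keys.
    decoded = jwt.decode(token, options={"verify_signature": False})
    scopes = decoded.get("scope", "").split()
    if any(s.startswith("storage.read:") for s in scopes):
        print("token grants read access to storage")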
16:25
WLCG/OSG Network Activities, Status and Plans 25m
WLCG relies on the network as a critical part of its infrastructure and therefore needs to guarantee effective network usage and prompt detection and resolution of any network issues, including connection failures, congestion and traffic routing. The OSG Networking Area is a partner of the WLCG effort and is focused on being the primary source of networking information for its partners and constituents. We will report on the changes and updates that have occurred since the last HEPiX meeting.
The primary areas to cover include the status of and plans for the WLCG/OSG perfSONAR infrastructure, the WLCG Throughput Working Group and the activities in the IRIS-HEP and SAND projects.
Speaker: Shawn Mc Kee (University of Michigan (US)) -
16:50
Research Networking Technical WG Update 25m
As the scale and complexity of the current HEP network grows rapidly, new technologies and platforms are being introduced that greatly extend the capabilities of today’s networks. With many of these technologies becoming available, it’s important to understand how we can design, test and develop systems that could enter existing production workflows while at the same time changing something as fundamental as the network that all sites and experiments rely upon. In this talk we’ll give an update on the Research Networking Technical working group's recent activities, updates from R&E network providers as well as plans for the near-term future.
In particular we'll focus on the packet marking technologies, tools and approaches that have been identified, and we are going to discuss the status of the prototypes and possible future directions.
Speaker: Marian Babik (CERN) -
17:15
Cybersecurity Framework for Research/Education Organizations 25m
The Trusted CI Framework provides a structure for organizations to establish, improve, and evaluate their cybersecurity programs. The framework empowers organizations to confront their cybersecurity challenges from a mission-oriented, programmatic, and full organizational lifecycle perspective.
The Trusted CI Framework is structured around 4 Pillars that support a cybersecurity program: Mission Alignment, Governance, Resources, and Controls. Composing these pillars are 16 Musts that identify the concrete, critical requirements for a competent cybersecurity program. The 4 Pillars and the 16 Musts make up the “Framework Core,” which is designed to be applicable in any environment and for any organization.
On March 1, Trusted CI published the first Framework Implementation Guide (FIG). This FIG is designed for use by research cyberinfrastructure operators, including, but not limited to, major research facilities, research computing centers, and major computational resources supporting research computing. It has been reviewed by our Framework Advisory Board, a diverse group hailing from the Research and Higher Education communities, and their comments have been incorporated in the published document. The FIG chapters provide roadmaps toward a mature cybersecurity program and advice on potential challenges. Tools and templates are provided to assist cybersecurity program implementation.
This session provides an overview of the Trusted CI Framework and the guidance and tools in the FIG.
Speaker: Bob Cowles (BrightLite Information Security)
17:40
→
18:00
Coffee Break 20m
-
18:00
→
19:15
Computing & Batch Services Online workshop
-
18:00
The WLCG HEP-SCORE deployment task force 25m
Following up on the work of the HEPiX benchmarking working group, WLCG launched a task force primarily charged with concretely proposing a successor to HEP-SPEC 06 as the standard benchmark for CPU resources in WLCG. We will present an overview of the mandate and composition of the task force and will report on status and plans.
Speaker: Helge Meinhard (CERN) -
18:25
HEP Benchmarks: updates and demo 25m
For the past two years the HEPiX Benchmarking Working Group has been developing a benchmark based on actual software workloads of the High Energy Physics community, called HEPscore. This approach, based on container technologies, is designed to provide a benchmark that is better correlated with the actual throughput of the experiment production workloads. In addition, the procedures to run and collect benchmark results have been reviewed and implemented in a tool called HEP Benchmark Suite. HEPscore v1.0 and HEP Benchmark Suite v2.0 have been released. This contribution will highlight the major functionalities recently introduced and offer a demo to show the ease of use.
Speaker: Domenico Giordano (CERN) -
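To illustrate the aggregation idea only (the actual workload list, weights and normalisation are defined by the working group, not here), a HEPscore-like figure can be thought of as a geometric mean over per-workload scores:

    from math import prod

    # Hypothetical normalised per-workload scores; the real workload set differs.
    workload_scores = {"gen-sim-A": 1.12, "digi-reco-B": 0.95, "analysis-C": 1.04}

    scores = list(workload_scores.values())
    aggregate = prod(scores) ** (1.0 / len(scores))  # geometric mean
    print(f"aggregate score: {aggregate:.3f}")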
18:50
New institutional resources at BNL 25m
BNL's first institutional cluster is reaching its end of life, and BNL has started the process of replacing its capabilities with new resources. This presentation reviews the historical usage of existing resources and describes the replacement process, including timelines, composition and plans for expansion of the user community that will use the new resources.
Speaker: Tony Wong (Brookhaven National Laboratory)
08:00
→
09:15
Storage & File Systems Online workshop
-
08:00
The design of Data Management System at HEPS 25m
According to the estimated data rates, we predict that 24 PB of raw experimental data will be produced per month from 14 beamlines at the first stage of the High Energy Photon Source (HEPS), and the volume of experimental data will be even greater with the completion of over 90 beamlines at the second stage in the future. To make sure that the huge amount of data collected at HEPS is accurate, available and accessible, an effective data management system (DMS) is a crucial piece of the IT systems deployment. We have designed a DMS for HEPS which is responsible for automating the organization, transfer, storage, distribution and sharing of the data produced from experiments. First, the general situation of HEPS is introduced. Second, the architecture and data flow of the HEPS DMS are described from the perspective of facility users and IT, and the key techniques implemented in this system are introduced. Furthermore, the progress and the effect of the DMS deployed as a testbed at beamline 1W1A of BSRF are shown.
Speaker: Hao Hu (Institute of High Energy Physics) -
08:25
CTA production experience 25m
The CERN Tape Archive (CTA) is the tape back-end to EOS and the replacement for CASTOR as the Run 3 physics archival system.
The EOSCTA service entered production at CERN during summer 2020 and since then the 4 biggest LHC experiments have been migrated.
This talk will outline the challenges and the experience we accumulated during the CTA service production ramp-up, as well as an updated overview of the next milestones towards the final Run 3 deployment.
Speaker: Julien Leduc (CERN) -
08:50
Distribution of container images: From tiny deployments to massive analysis on the grid 25m
In recent years, containers became the de-facto standard to package and distribute modern applications and their dependencies. A crucial role in the container ecosystem is played by container registries (specialized repositories meant to store and distribute container images) which have seen an ever-increasing need for additional storage and network capacity to withstand the demand from users. The HEP community also demonstrates an increasing interest, with scientists encapsulating their analysis workflow and code inside a container image. The analysis is first validated on a small dataset and minimal hardware resources to then run at scale on the massive computing capacity provided by the grid.
CERN IT offers a centralized GitLab Container Registry based on S3 storage. This registry is tightly integrated with code repositories hosted on CERN GitLab and allows for building and publishing images via CI/CD pipelines. Plans are to complement the GitLab Registry with Harbor, the CNCF container registry, which provides advanced capabilities including security scans of uploaded images, non-blocking garbage collection of unreferenced blobs, and proxying/replication from/to other registries.
In the context of HEP, the CernVM File System (CVMFS) has recently introduced support for the ingestion and distribution of container images. It implements file-level deduplication and an optimized distribution and caching mechanism that overcome the limitations of the push-pull model used by traditional registries, ultimately making the distribution of containers more efficient across the WLCG resources. A prototype integration between Harbor and CVMFS has been developed to provide end-users with a unified management portal for their container images while supporting the large-scale analysis scenarios typical of the HEP world.
Speaker: Enrico Bocchi (CERN)
09:15
→
09:35
Coffee Break 20m
-
09:35
→
10:00
Storage & File Systems Online workshop
-
09:35
Ceph at RAL 25m
The Rutherford Appleton Laboratory runs three production Ceph clusters providing: Object Storage to the LHC experiments and many others; RBD storage underpinning the STFC OpenStack Cloud; and CephFS for local users of the ISIS neutron source. The requirements and hardware for these clusters are very different, yet they are underpinned by the same storage technology. This talk will cover the status of Ceph at RAL, operations and lessons learnt during lockdown, as well as some of our future plans to continually improve the service offered.
Speaker: Mr Morgan Robinson (Science and Technology Facilities Council)
10:00
→
10:50
IT Facilities & Business Continuity Online workshop
-
10:00
HARRY: Aggregate hardware usage metrics to optimise procurement of computing resources 25m
Procuring new IT equipment for the CERN data centre requires optimizing the computing power and storage capacity while minimizing the costs. In order to achieve this, understanding how the existing hardware resources are used in production is key. To that end, leveraging traditional monitoring data seems to be the way to go.
This presentation will explain how we extract interesting signals from the hardware monitoring metrics and how we use them as a feedback loop when tendering for new hardware. It will describe the underlying software project, named HARRY, based on open source tools. Finally, it will show how we use HARRY as a long-term trending tool for capacity planning purposes.
Speaker: Herve Rousseau (CERN) -
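A toy sketch of the kind of signal extraction described above (host names, samples and the 30% threshold are invented; the real HARRY pipeline is not detailed in the abstract): aggregate per-node CPU utilisation and flag persistently under-used capacity as input for the next tender:

    from statistics import mean

    # Hypothetical monitoring samples: (hostname, cpu_utilisation_percent).
    samples = [
        ("node001", 35.0), ("node001", 80.0), ("node002", 15.0),
        ("node002", 20.0), ("node003", 95.0), ("node003", 90.0),
    ]

    per_host = {}
    for host, cpu in samples:
        per_host.setdefault(host, []).append(cpu)

    # Persistently low average utilisation is a useful capacity-planning signal.
    for host, values in sorted(per_host.items()):
        avg = mean(values)
        status = "under-used" if avg < 30.0 else "ok"
        print(f"{host}: avg CPU {avg:.1f}% ({status})")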
10:25
Evolving the Monitoring and Operations services at RAL [to overcome the challenges of 2020] 25m
STFC's Scientific Computing Department, based at RAL, runs an ever increasing number of services to support the High Energy Physics, Astronomy and Space Science communities. RAL's monitoring and operations services were already struggling to scale to meet these demands, and the global pandemic highlighted the importance of these systems as home working was enforced. This talk will cover the work that has been completed in the last year to upgrade our exception handling monitoring, on-call services and ticket system, as well as the significant improvements that were made to our time series monitoring service.
Speaker: Mr Christos Nikitas (STFC)
14:30
→
16:00
Birds of a Feather (BoF) session: DPM Discussion
Convener: Jose Flix Molina (Centro de Investigaciones Energéticas Medioambientales y Tecnológicas)
-
16:00
→
17:15
Storage & File Systems Online workshop
-
16:00
LHC Run 3 tape infrastructure plans 25m
The CERN IT-ST-TAB section will outline the tape infrastructure hardware plans for the upcoming LHC Run 3 period. This presentation will discuss the expected configuration of the tape libraries, tape drives and the necessary quantity of tape media.
Speaker: Vladimir Bahyl (CERN) -
16:25
Small-file aggregation for dCache tape interface 25m
Since 2015 a so-called Small File Service has been deployed at DESY to pack small files into containers before writing them to tape. As existing detectors have been updated to run at higher trigger rates and new beamlines have become operational, the number of arriving files has increased drastically, bringing the packing service to its limits. To cope with the increased file arrival rate, the Small File Service is being redesigned for better resource utilization and scalability.
Speakers: Mr Tigran Mkrtchyan (DESY), Ms Svenja Meyer (DESY) -
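A minimal sketch of the packing idea (paths, the 1 GiB threshold and the tar format are illustrative assumptions, not the DESY implementation): accumulate small files and write them into a single container once enough data has arrived, so that tape sees few large objects instead of many small ones:

    import tarfile
    from pathlib import Path

    PACK_THRESHOLD = 1 * 1024**3  # pack once ~1 GiB has accumulated (illustrative)

    def pack_small_files(incoming_dir: str, container_path: str) -> list[Path]:
        """Write accumulated small files into one container destined for tape."""
        files = [f for f in sorted(Path(incoming_dir).glob("*")) if f.is_file()]
        if sum(f.stat().st_size for f in files) < PACK_THRESHOLD:
            return []  # not enough data yet, keep accumulating
        with tarfile.open(container_path, "w") as tar:
            for f in files:
                tar.add(f, arcname=f.name)
        return files  # caller can flush the container to tape and clean up

    # Example: pack_small_files("/srv/incoming", "/srv/containers/pack-0001.tar")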
16:50
XRootD5: what's in it for you? 25m
With the latest major release (5.0.0), the XRootD framework introduced not only a multitude of architectural improvements and functional enhancements, but also a TLS based, secure version of the xroot/root data access protocol (a prerequisite for supporting access tokens). In this contribution we discuss all the ins and outs of the xroots/roots protocol, including the importance of asynchronous I/O for ensuring low latencies and high throughput. Furthermore, we report on other developments finalized in release 5, such as the SciTokens authorization plug-in, extended attribute support, a universal VOMS attribute extractor and many more.
Speaker: Michal Kamil Simon (CERN)
17:15
→
17:35
Coffee Break 20m
-
17:35
→
18:25
Storage & File Systems Online workshop
-
17:35
Magnetic Tape for Mass Storage in HEP 25m
Storage technology has changed over the decade, as has the role of storage in experimental research. Traditionally, magnetic tape has been the technology of choice for archival and narrowly targeted near-line storage. In recent years there has been a push to have tape play a larger role in near-line storage. In this presentation, the economics of tape are examined in light of changing requirements and technology evolution in disk and tape.
Speaker: Shigeki Misawa (Brookhaven National Laboratory (US)) -
18:00
The HEPiX Erasure Coding Working Group 25m
One of the recommendations to come out of the HSF / WLCG Workshop in November 2020 was to create an Erasure Coding Working Group. Its purpose is to help solve some of the data challenges that will be encountered during HL-LHC by enabling sites to store data more efficiently and robustly using Erasure Coding techniques. The working group aims:
- to provide a forum to allow sites to identify the best underlying storage for their use cases;
- to provide recommendations on how to configure storage to effectively use Erasure Coding;
- to work with VOs to ensure that their workflows run efficiently when accessing data stored via Erasure Coding.
Speakers: Alastair Dewhurst (Science and Technology Facilities Council STFC (GB)), Andreas Joachim Peters (CERN), Shigeki Misawa (Brookhaven National Laboratory (US))
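A worked example of why erasure coding matters for storing data more efficiently (parameters are illustrative; the group's recommendations will depend on the storage system and use case): the raw-storage overhead of a k+m layout versus plain replication:

    def overhead(data_chunks: int, parity_chunks: int) -> float:
        """Raw bytes stored per byte of user data for a k+m layout."""
        return (data_chunks + parity_chunks) / data_chunks

    # Keeping three full copies stores 3 bytes per user byte, while an 8+3
    # erasure-coded layout stores ~1.38 and still survives the loss of any 3 chunks.
    print("3x replication overhead:", overhead(1, 2))            # 3.0
    print("EC 8+3 overhead:        ", round(overhead(8, 3), 2))  # 1.38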
18:25
→
19:15
IT Facilities & Business Continuity Online workshop
-
18:25
SDCC Operations During Transition to the New Data Center 25m
The BNL Computing Facility Revitalization (CFR) project is aimed at repurposing the former National Synchrotron Light Source (NSLS-I) building (B725), located on the BNL site, as a new data center for the Scientific Data and Computing Center (SDCC). The CFR project finished the design phase in the first half of 2019 and entered the construction phase in the second half of 2019, which is currently projected to finish in the May-June 2021 timeframe. Occupancy of the B725 data center is expected to begin in June 2021 for the CPU, DISK and TAPE resources of the ATLAS experiment at the LHC at CERN, and in July 2021 for all other collaborations supported by the SDCC Facility, including the STAR, PHENIX and sPHENIX experiments at the RHIC collider at BNL and the Belle II experiment at KEK (Japan), hence before the end of FY2021. The new HPC clusters and storage systems of the BNL Computational Science Initiative (CSI) are expected to be deployed in the B725 data center starting from early FY2022. The migration of IT equipment and services to the new data center will start with the installation of the new central network equipment and the deployment of fiber and copper Ethernet connectivity infrastructure in B725, followed by the installation of the new tape library for the BNL ATLAS Tier-1 site in 2021Q3. This transition period is expected to continue until the end of FY2023, at which stage the majority of CPU and DISK resources hosted by the SDCC Facility are expected to be located on the floor of B725, and only TAPE resources will remain split between the old and the new data centers. In this talk I will highlight the main design features of the new SDCC data center, summarize the preparation activities already underway in our existing data center since FY2018 that are needed to ensure a smooth transition into the period of B515 and B725 data center inter-operation starting in 2021Q3, discuss plans to migrate a subset of IT equipment between the old and the new data centers in CY2021 and to gradually replace IT equipment hosted in the old data center during the CY2021-2024 period, and show the expected state of occupancy and infrastructure utilization for both data centers up to FY2026.
Speaker: Mr Alexandr Zaytsev (Brookhaven National Laboratory (US)) -
18:50
Supporting a new Light Source at Brookhaven 25m
In this presentation we give an overview of SDCC's new support for the National Synchrotron Light Source 2 (NSLS-II) at Brookhaven National Lab. This includes the operational changes needed in order to adapt to the needs of BNL's photon science community.
Speaker: William Strecker-Kellogg (Brookhaven National Lab)
08:15
→
08:30
Site Reports Online workshop
-
08:15
IHEP Site Report 15m
Site report on computing platform updates and support system developments at IHEP during the past half year.
Speaker: Ran Du
08:30
→
08:55
Network & Security Online workshop
-
08:30
LoRaWAN and proximeters against Covid 25m
The SARS-CoV-2 virus, the cause of the better known COVID-19 disease, has greatly altered our personal and professional lives. Many people are now expected to work from home, but this is not always possible and, in such cases, it is the responsibility of the employer to implement protective measures. One simple such measure is to require that people maintain a distance of 2 metres, but this places responsibility on employees and leads to two problems: firstly, the likelihood that safety distances are not maintained, and secondly, that someone who becomes infected does not remember with whom they may have been in contact. To address both problems, CERN has developed the “proximeter”, a device that, when worn by employees, detects when they are in close proximity to others. Information about any such close contacts is sent securely over a Low Power Wide Area Network (LPWAN) and stored in a manner that respects confidentiality and privacy requirements. In the event that an employee becomes infected with COVID-19, CERN can thus identify all the possible contacts and so prevent the spread of the virus. We discuss here the details of the proximeter device, the LPWAN infrastructure deployed at CERN, the communication mechanisms and the protocols used to respect the confidentiality of personal data.
Speaker: Christoph Merscher (CERN)
08:55
→
09:20
Grid, Cloud & Virtualisation Online workshop
-
08:55
Enterprise Cyber-Physical Edge Virtualization Engine (EVE) Project 25m
The Linux Foundation's FOSS project EVE (Edge Virtualization Engine, www.lfedge.org/projects/eve/) provides a flexible foundation for IoT edge deployments with a choice of any hardware, application and cloud. The mission of the project is to develop an open source, light-weight virtualization engine for IoT edge gateways and edge servers with built-in security. EVE acts as an operating system and aims to do for the swarms of edge devices what Android did for mobile, by creating an open edge computing engine enabling the development, orchestration and security of cloud-native and legacy applications on distributed edge compute nodes, supporting containers and clusters (Docker and Kubernetes), virtual machines and unikernels. The EVE runtime can be deployed on any bare metal hardware (e.g. x86, Arm, GPU) or within a VM to provide consistent system and orchestration services. EVE contains enhanced virtualization engines (KVM or Xen) and containerd, enabling running virtual machines, Docker containers and edge-containers, which are similar to Docker Swarm/Kubernetes pods. The scope of the project includes software development under an OSI-approved open source license supporting the mission, including documentation, testing, integration and the creation of other artifacts that aid the development, deployment, operation or adoption of the open source software project.
Speaker: Mr Oleg Sadov (ITMO University)
09:20
→
09:45
Grid, Cloud & Virtualisation Online workshop
-
09:20
CERN Cloud Infrastructure status update 25m
CERN's private OpenStack cloud offers more than 300,000 cores to over 3,400 users who can programmatically access resources like compute, multiple storage types, bare metal, container clusters, and more.
The CERN Cloud Team constantly works on improving these services while maintaining the stability and availability that are critical for many services in IT and the experiment workflows.
This talk will cover the updates made to the CERN Cloud over the past months, its current status and further plans.
Speaker: Patrycja Ewa Gorniak (Ministere des affaires etrangeres et europeennes (FR))
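As an illustration of the programmatic access mentioned above (the cloud name, image, flavor and network names are placeholders, not CERN's actual values), a short openstacksdk sketch:

    import openstack

    # 'mycloud' is a hypothetical entry in clouds.yaml holding the credentials.
    conn = openstack.connect(cloud="mycloud")

    # List the servers currently running in the project.
    for server in conn.compute.servers():
        print(server.name, server.status)

    # Create a small VM (image/flavor/network names are illustrative).
    server = conn.compute.create_server(
        name="test-vm",
        image_id=conn.compute.find_image("CC7 - x86_64").id,
        flavor_id=conn.compute.find_flavor("m2.small").id,
        networks=[{"uuid": conn.network.find_network("my-network").id}],
    )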
09:45
→
10:05
Coffee Break 20m
-
10:05
→
10:55
Basic IT Services Online workshop
-
10:05
Setting up a PGPool II Cluster 25m
Databases have to fulfil a variety of requirements in an operational system. They should be highly available, redundant, suffer minimal downtime during maintenance/upgrade work and be easily recoverable in case of critical system failure.
All of these requirements can be realized with a PGPool II cluster that uses PostgreSQL backends. The high availability of the backends is provided by the PGPool II frontend, while the redundancy is provided by the PostgreSQL backends. Backups of the backends can be managed via pgBackRest. To further reduce downtime after critical system failures, all nodes are puppetized, allowing for easy re-installation if necessary.
This talk will introduce the setup and commissioning phase of the described components and pitfalls identified during our tests.
Speaker: Michael Hubner (University of Bonn (DE)) -
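A small sketch of how such a cluster can be monitored from the client side (host, port and credentials are placeholders): PGPool II answers its own SHOW commands over a normal PostgreSQL connection, so backend status can be polled with psycopg2:

    import psycopg2

    # Connect to the PGPool II frontend exactly like a normal PostgreSQL server.
    conn = psycopg2.connect(host="pgpool.example.org", port=9999,
                            dbname="postgres", user="monitor", password="secret")
    with conn.cursor() as cur:
        cur.execute("SHOW pool_nodes")   # PGPool-specific status command
        for row in cur.fetchall():
            print(row)                   # node id, hostname, port, status, role, ...
    conn.close()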
10:30
CERNphone update 25m
Since the last HEPiX, CERNphone has evolved from an internal pilot to a widely growing service with hundreds of users across the Organization. In this presentation, we will cover the current deployment of the mobile clients and the status of the upcoming desktop application. We will also describe advanced use cases such as team calls for handling piquet services and replacing shared office phones, call delegation and transfers, and how they were implemented in the back-end.
Speaker: German Cancio (CERN)
14:30
→
15:55
Board meeting (closed session) Online workshop
-
16:00
→
16:25
Storage & File Systems Online workshop
-
16:00
The CERN-Solid code investigation project 25m
In this talk we shall introduce the Solid project, launched by Sir Tim Berners-Lee in 2016 as a set of open standards aiming to re-decentralize the Web and empower users' control over their own data. Solid adds standards, missing from the original Web specifications, that give users back ownership of their data (private, shared and public), choice of the storage where these data reside, and control over who has access to them. A brief overview of the Solid specifications, existing implementations and the test suites will be presented. We will then explain our CERN-Solid project, in which a Solid Proof of Concept (PoC) is being developed in the form of Indico extensions. Indico being a very popular application that does not wish to store users' personal data, the PoC consists of implementing two extensions (for users' comments in Indico meetings and personal data in conference registrations) whose data are held in the users' Solid pods and not in Indico. The development design, architecture and software choices will be explained.
Speaker: Jan Schill
16:25
→
17:15
Grid, Cloud & Virtualisation Online workshop
-
16:25
Anomaly Detection in the CERN Cloud Infrastructure 25m
Anomaly Detection in the CERN OpenStack cloud is a challenging task due to the large scale of the computing infrastructure and the large volume of data to monitor.
The current solution to spot anomalous server machines in the cloud infrastructure relies on a threshold-based alarming system carefully set by the system managers on the performance metrics of each infrastructure component. The goal of this work is to explore fully automated and unsupervised machine learning solutions in the Anomaly Detection field. We exploit the current state-of-the-art solutions, including both traditional Anomaly Detection and Deep Anomaly Detection approaches.
This contribution will first describe the end-to-end data analytics pipeline that has been implemented to digest the large amount of monitoring data and expose anomalies to the system managers. The pipeline uses open source tools and frameworks such as Spark, Apache Airflow, Kubernetes, Grafana and Elasticsearch. In addition, the performance of the aforementioned Anomaly Detection algorithms will be discussed.
Speaker: Domenico Giordano (CERN) -
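A self-contained sketch of the unsupervised approach on synthetic per-host metrics (the production pipeline with Spark, Airflow, etc. is only summarised in the abstract; data and parameters here are invented), using scikit-learn's IsolationForest:

    import numpy as np
    from sklearn.ensemble import IsolationForest

    rng = np.random.default_rng(42)

    # Synthetic per-host features: [cpu_load, memory_used_fraction].
    normal = rng.normal(loc=[0.5, 0.6], scale=[0.1, 0.1], size=(500, 2))
    anomalous = np.array([[0.99, 0.98], [0.05, 0.97]])
    features = np.vstack([normal, anomalous])

    # Fully unsupervised: no labels, the model learns what "normal" looks like.
    model = IsolationForest(contamination=0.01, random_state=0).fit(features)
    flags = model.predict(features)  # -1 marks an anomaly, 1 marks normal

    print("hosts flagged as anomalous:", int((flags == -1).sum()))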
16:50
FTS3: Data Movement Service in containers deployed in OKD 25m
The File Transfer Service (FTS3) is a data movement service developed at CERN which is used to distribute the majority of the Large Hadron Collider's data across the Worldwide LHC Computing Grid (WLCG) infrastructure. At Fermilab, we have deployed a couple of FTS3 instances for Intensity Frontier experiments (e.g. DUNE) to transfer data in America and Europe, using a container-based strategy.
During this talk, we are going to present the two different configurations currently running at Fermilab, comparing and contrasting a Docker-based OKD deployment against a traditional RPM-based deployment and giving an overview of the possible issues encountered. In addition, we discuss our method of certificate management and maintenance utilizing Kubernetes cronjobs.
Speaker: Lorena Lobato Pardavila (Fermi National Accelerator Lab. (US))
17:15
→
17:35
Coffee Break 20m
-
17:35
→
18:00
Grid, Cloud & Virtualisation Online workshop
-
17:35
Shoal - a dynamic squid cache publishing and advertising tool 25m
Shoal is a squid cache publishing and advertising tool designed to work in fast-changing environments, consisting of three components: the shoal-server, the shoal-agent, and the shoal-client.
The purpose of shoal is to have a continually updated list of squid caches. Each squid runs shoal-agent, which uses AMQP messages to publish its existence and the load of the squid to the central shoal-server. The shoal-server keeps a list of squids in memory and removes any squid which has not sent it a message recently. The IPs of all squid servers are geo-referenced. Clients contact the shoal-server using a REST interface to retrieve an ordered list of the nearest squids.
While the initial version was based on Python2, we updated the code to be compatible with Python3. We also used the opportunity to make large improvements to the functionality of the different components, especially testing of the squids to be used and ordering squids not only by geo-location but also by their accessibility; the nearest squid may not be the best squid to use. In addition we also simplified the configuration of the shoal-client and shoal-agent, which now largely configure themselves.
Speaker: Dr Marcus Ebert (University of Victoria)
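A rough sketch of the client side described above (the REST path, response fields and port are assumptions for illustration; consult the shoal documentation for the actual interface): fetch the ordered squid list from the shoal-server and point HTTP clients at the best one:

    import os
    import requests

    # Endpoint path and JSON layout are assumed for illustration only.
    SHOAL_SERVER = "http://shoal.example.org/nearest"

    resp = requests.get(SHOAL_SERVER, timeout=5)
    resp.raise_for_status()
    squids = resp.json()  # assumed to be ordered best-first by the server

    if squids:
        best = squids[0]
        proxy = f"http://{best['hostname']}:{best.get('squid_port', 3128)}"
        os.environ["http_proxy"] = proxy  # e.g. picked up by CVMFS/Frontier clients
        print("using squid:", proxy)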
18:00
→
18:50
Basic IT Services Online workshop
-
18:00
CERN Authentication and Authorization 25m
CERN is redesigning its authentication and authorization infrastructures around open source software, such as Keycloak for the Single Sign On service and FreeIPA for the LDAP backend.
The project, which is part of the larger CERN MALT initiative, was first introduced at the HEPiX Autumn/Fall 2018 Workshop.
This talk will provide an overview of the new services, which are now in a production or pre-production phase, and discuss the migration strategies.
Speaker: Paolo Tedesco (CERN) -
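For illustration of the Single Sign-On flow (realm, client and URLs are placeholders; the standard Keycloak OpenID Connect token endpoint layout is assumed), a client can obtain and use a bearer token like this:

    import requests

    TOKEN_URL = "https://auth.example.org/auth/realms/myrealm/protocol/openid-connect/token"

    # Client-credentials grant for a service account (identifiers are placeholders).
    resp = requests.post(TOKEN_URL, data={
        "grant_type": "client_credentials",
        "client_id": "my-service",
        "client_secret": "REPLACE_ME",
    })
    resp.raise_for_status()
    access_token = resp.json()["access_token"]

    # The bearer token is then presented to applications protected by the SSO.
    headers = {"Authorization": f"Bearer {access_token}"}
    print(requests.get("https://api.example.org/protected", headers=headers).status_code)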
18:25
A Unified approach towards Multi-factor Authentication (MFA) 25m
With more applications and services deployed in the BNL SDCC that rely on authentication services, adoption of Multi-factor Authentication (MFA) became inevitable. While web applications can be protected by Keycloak (an open source single sign-on solution led by Red Hat) with its MFA feature, other service components within the facility rely on FreeIPA (an open source identity management solution led by Red Hat) for MFA authentication. While this satisfies cyber security requirements, it creates a situation where users need to manage multiple tokens, with the choice of token depending on what they access. Not only is this a major irritation for users, it also adds a burden for staff members who manage user tokens. To tackle these challenges, a solution needed to be found that provides a unified way of managing tokens. In this presentation, we elaborate on a solution that was explored and implemented at the SDCC, and on plans to extend its capabilities and flexibility for future application integrations.
Speaker: Masood Zaran (Brookhaven National Laboratory)
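One way to see why a single token can, in principle, serve both systems: Keycloak and FreeIPA both accept standard RFC 6238 time-based one-time passwords, so one shared secret produces codes valid for either verifier. A small pyotp illustration (the secret value is fake):

    import pyotp

    # A single base32 secret provisioned once (value is fake / illustrative).
    secret = "JBSWY3DPEHPK3PXP"
    totp = pyotp.TOTP(secret)

    # Any verifier implementing RFC 6238 with the same secret accepts this code.
    code = totp.now()
    print("current one-time code:", code)
    print("valid?", totp.verify(code))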
18:50
→
19:10
Workshop wrap-up 20m
Speaker: Tony Wong (Brookhaven National Laboratory)