The KEK Central Computer System (KEKCC) is a computer service and facility that provides large-scale computer resources, including Grid and Cloud computing systems and common IT services, such as e-mail and web services.
Following the Japanese government's procurement policy for large-scale computer systems, we replace the entire KEKCC every four or sometimes five years...
News from CERN since the previous HEPiX workshop.
An update on developments and some plans at the RAL Tier-1.
This is the PIC report for the HEPiX Spring 2021 Workshop.
Daisy (Data Analysis Integrated Software System) has been designed for the analysis and visualization of X-ray experiments. To address the Chinese radiation facilities community's extensive range of requirements, from purely algorithmic problems to scientific computing infrastructure, Daisy sets up a cloud-native platform to support on-site data analysis services with fast feedback and...
Collaboration features are nowadays a key aspect of efficient teamwork with productivity tools. During 2020, CERN deployed the OnlyOffice and Collabora Online solutions and monitored their usage in CERNBox.
This presentation will focus on technical aspects of deploying and maintaining OnlyOffice and Collabora Online within CERN and their integration with CERNBox. It will also give an...
Over the last decades we focused our MS Windows management policy mainly on hardening machines: we wanted to control and manage how and when security updates were deployed, and how software could be installed, licensed and monitored on a machine. But times have changed, IT has evolved, and users can now be empowered and regain their freedom. Let's see together which solutions we put in place to...
A short presentation on what's going on at the INFN-T1 site.
An update on BNL activities since the Fall 2020 workshop
Diamond Light Source is a Synchrotron Light Source based at the RAL site. This is a summary of what Diamond has been up to in cloud, storage and compute, as well as a few extras.
News and updates from the Canadian ATLAS Tier-1 center over the past years. The presentation will cover the site configuration and tools used, how we operate a 'federated' Tier-1 center, and how we improve CPU utilization.
CERN has historically used Red Hat-derived Linux distributions, favored for their relative stability and long life cycle. In December 2020, the CentOS board announced that the end of life for CentOS Linux 8 would be changed from a 10-year life cycle to 2 years.
This talk focuses on what CERN will be doing in the short-term to adapt to this announcement, and what the Linux future could look like...
A "just the facts" look at the products and programs Red Hat offers, followed by a Question and Answer session.
In the past six months, Red Hat has made some dramatic announcements. We are aware that these announcements affect how the High Energy Physics community does computing. We want you, HEPiX, to make the best-informed decisions as you decide your next steps forward. This...
This talk is an update on CERN's project to build a new e-mail service at CERN, focused on Free and Open Source Software, and to migrate all of its users. As presented at HEPiX Autumn 2019, CERN has been working on migrating off Microsoft Exchange since spring 2018. However, in early spring 2020, in the middle of the migration...
The presentation discusses the change of the DHCP software used for the CERN central DHCP service, namely the migration from ISC DHCP to Kea. It outlines the motivation behind the replacement of ISC DHCP and describes the main steps of the transition process. It covers the translation of the current CERN ISC DHCP configuration, testing the new Kea configuration, and the implementation of the...
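To make the translation step concrete, here is a minimal, hypothetical sketch (not CERN's actual tooling) of converting one ISC DHCP host reservation into the JSON reservation format that Kea expects; the host name and addresses are invented:

    # Hypothetical sketch: translate an ISC DHCP 'host' block into a Kea
    # DHCPv4 reservation. In a real Kea config this dict would sit under
    # Dhcp4 -> subnet4 -> reservations.
    import json
    import re

    ISC_HOST_RE = re.compile(
        r"host\s+(?P<name>\S+)\s*{\s*"
        r"hardware\s+ethernet\s+(?P<mac>[0-9a-fA-F:]+);\s*"
        r"fixed-address\s+(?P<ip>[0-9.]+);\s*}")

    def isc_host_to_kea(isc_block: str) -> dict:
        """Convert one ISC 'host' block into a Kea v4 reservation dict."""
        m = ISC_HOST_RE.search(isc_block)
        if m is None:
            raise ValueError("unrecognised ISC host block")
        return {"hostname": m.group("name"),
                "hw-address": m.group("mac").lower(),
                "ip-address": m.group("ip")}

    isc = 'host pcfoo01 { hardware ethernet AA:BB:CC:DD:EE:FF; fixed-address 10.0.0.42; }'
    print(json.dumps(isc_host_to_kea(isc), indent=2))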
This presentation provides an update on the global security landscape since the last HEPiX meeting. It describes the main vectors of risks to and compromises in the academic community including lessons learnt, presents interesting recent attacks while providing recommendations on how to best protect ourselves. It also covers security risks management in general, as well as the security aspects...
The transition of WLCG storage and central services to dual-stack IPv4/IPv6 has gone well, thus enabling the use of IPv6-only CPU resources as mandated by the WLCG Management Board. Many WLCG data transfers now take place over IPv6. The dual-stack deployment does however result in a networking environment which is much more complex than when using just IPv4 or just IPv6. During recent months...
JupyterLab has become an increasingly popular platform for rapid prototyping, teaching algorithms or sharing small analyses in a self-documenting manner.
However, it is commonly operated using dedicated cloud-like infrastructures (e.g. Kubernetes), which often need to be maintained in addition to existing HTC systems. Furthermore, federation of resources or opportunistic usage are not...
Exploitation of heterogeneous opportunistic resources is an important ingredient to fulfil the computing requirements of large HEP experiments in the future. Potential candidates for integration are Tier 3 centres, idling cores in HPC centres, cloud resources, etc. To make this work, it is essential to choose a technology which offers an easy integration of those resources into the computing...
In March 2020, INFN-T1 started the process of moving all the Worker Nodes managed by LSF to the HTCondor batch system, which had been set up and tested in the previous months and was considered ready to handle the workload of the whole computing cluster. On March 20, while in the middle of the migration process, a sudden request came to provide 50% of our computing power for a period of one month...
Since 2017, the Worldwide LHC Computing Grid (WLCG) has been working towards enabling Token based authentication and authorisation throughout its entire middleware stack. Following the publication of the WLCGv1.0 Token Schema in 2019, middleware developers have been able to enhance their services to consume and validate OAuth2.0 tokens and process the authorization information they...
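As a rough illustration of what consuming such tokens involves, here is a minimal Python sketch using PyJWT with an RSA-signed token; the audience string follows the published WLCG token schema, but the key, scope and function names are illustrative assumptions:

    # Illustrative sketch of WLCG-style OAuth2.0 token validation with PyJWT.
    import jwt  # pip install PyJWT

    def authorise(token: str, issuer_public_key: str, required_scope: str) -> bool:
        """Verify signature, expiry and audience, then check the scope claim."""
        claims = jwt.decode(
            token,
            issuer_public_key,
            algorithms=["RS256"],
            audience="https://wlcg.cern.ch/jwt/v1/any",  # per the WLCG token schema
        )
        # Scopes are a space-separated list, e.g. "storage.read:/ storage.modify:/data"
        return required_scope in claims.get("scope", "").split()

    # e.g. authorise(bearer_token, issuer_key_pem, "storage.read:/")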
WLCG relies on the network as a critical part of its infrastructure and therefore needs to guarantee effective network usage and prompt detection and resolution of any network issues, including connection failures, congestion and traffic routing. The OSG Networking Area is a partner of the WLCG effort and is focused on being the primary source of networking information for its partners and...
As the scale and complexity of the current HEP network grows rapidly, new technologies and platforms are being introduced that greatly extend the capabilities of today’s networks. With many of these technologies becoming available, it’s important to understand how we can design, test and develop systems that could enter existing production workflows while at the same time changing something as...
The Trusted CI Framework provides a structure for organizations to establish, improve, and evaluate their cybersecurity programs. The framework empowers organizations to confront their cybersecurity challenges from a mission-oriented, programmatic, and full organizational lifecycle perspective.
The Trusted CI Framework is structured around 4 Pillars that support a cybersecurity program:...
Following up on the work of the HEPiX Benchmarking Working Group, WLCG launched a task force primarily tasked with concretely proposing a successor to HEP-SPEC06 as the standard benchmark for CPU resources in WLCG. We will present an overview of the mandate and composition of the task force and will report on status and plans.
For the past two years, the HEPiX Benchmarking Working Group has been developing HEPscore, a benchmark based on actual software workloads of the High Energy Physics community. This approach, based on container technologies, is designed to provide a benchmark that is better correlated with the actual throughput of the experiments' production workloads. In addition, the procedures to run and collect...
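The exact scoring recipe is defined by the working group; purely as an illustration, a composite benchmark of this kind can combine per-workload scores with a geometric mean, as in this sketch (workload names and numbers invented):

    # Illustrative only: one plausible way to fold per-workload container
    # scores into a single figure (HEPscore's exact recipe may differ).
    from math import prod

    def composite_score(workload_scores: dict) -> float:
        """Geometric mean of the individual container workload scores."""
        scores = list(workload_scores.values())
        return prod(scores) ** (1.0 / len(scores))

    print(composite_score({"atlas-sim": 12.1, "cms-reco": 9.8, "lhcb-gen": 14.3}))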
BNL's first institutional cluster is reaching its end of life, and BNL has started the process of replacing its capabilities with new resources. This presentation reviews the historical usage of existing resources and describes the replacement process, including timelines, composition, and plans for expanding the user community that will use the new resources.
According to the estimated data rates, we predict that 24 PB of raw experimental data will be produced per month from 14 beamlines at the first stage of the High Energy Photon Source (HEPS), and the volume of experimental data will be even greater with the completion of over 90 beamlines at the second stage in the future. To make sure that the huge amount of data collected at HEPS is accurate, available and...
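A back-of-the-envelope check of what the quoted volume implies as a sustained rate (the 30-day month and decimal petabytes are our assumptions, not HEPS figures):

    # Rough sanity check of the quoted 24 PB/month from 14 beamlines.
    PB = 1e15                                  # bytes, decimal petabyte
    monthly_volume = 24 * PB
    seconds_per_month = 30 * 24 * 3600         # assume a 30-day month
    aggregate_rate = monthly_volume / seconds_per_month   # ~9.3 GB/s in total
    per_beamline = aggregate_rate / 14                    # ~0.66 GB/s each
    print(f"{aggregate_rate/1e9:.1f} GB/s total, {per_beamline/1e9:.2f} GB/s per beamline")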
The CERN Tape Archive (CTA) is the tape back-end to EOS and the replacement for CASTOR as the Run 3 physics archival system.
The EOSCTA service entered production at CERN during summer 2020, and since then the four biggest LHC experiments have been migrated.
This talk will outline the challenges and the experience we accumulated during the CTA service production ramp-up, as well as an updated overview of the...
In recent years, containers became the de-facto standard to package and distribute modern applications and their dependencies. A crucial role in the container ecosystem is played by container registries (specialized repositories meant to store and distribute container images) which have seen an ever-increasing need for additional storage and network capacity to withstand the demand from users....
The Rutherford Appleton Laboratory runs three production Ceph clusters providing: Object Storage to the LHC experiments and many others; RBD storage underpinning the STFC OpenStack Cloud; and CephFS for local users of the ISIS neutron source. The requirements and hardware for these clusters are very different, yet they are underpinned by the same storage technology. This talk will cover the status...
Procuring new IT equipment for the CERN data centre requires optimizing the computing power and storage capacity while minimizing the costs. In order to achieve this, understanding how the existing hardware resources are used in production is key. To that end, leveraging traditional monitoring data seems to be the way to go.
This presentation will explain how we extract...
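As a purely hypothetical sketch of the kind of aggregation such an extraction involves, raw monitoring records can be reduced to a per-hardware-model utilisation summary; the field names here are invented:

    # Hypothetical sketch: summarise monitoring records into mean CPU
    # utilisation per hardware model (record layout invented for illustration).
    from collections import defaultdict

    def mean_utilisation(records: list) -> dict:
        """records: [{'model': 'XYZ', 'cpu_util': 0.73}, ...] -> model -> mean."""
        sums, counts = defaultdict(float), defaultdict(int)
        for rec in records:
            sums[rec["model"]] += rec["cpu_util"]
            counts[rec["model"]] += 1
        return {m: sums[m] / counts[m] for m in sums}

    print(mean_utilisation([{"model": "A", "cpu_util": 0.7},
                            {"model": "A", "cpu_util": 0.9}]))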
STFC's Scientific Computing Department, based at RAL, runs an ever-increasing number of services to support the High Energy Physics, Astronomy and Space Science communities. RAL's monitoring and operations services were already struggling to scale to meet these demands, and the global pandemic highlighted the importance of these systems as home working was enforced. This talk will cover the...
CERN IT-ST-TAB section will outline the tape infrastructure hardware plans for the upcoming LHC Run 3 period. This presentation will discuss the expected configuration of the tape libraries, tape drives and the necessary quantity of tape media.
Since 2015, a so-called Small File Service has been deployed at DESY to pack small files into containers before writing to tape. As existing detectors have been upgraded to run at higher trigger rates and new beamlines have become operational, the number of arriving files has increased drastically, bringing the packing service to its limits. To cope with the increased file arrival rate, the Small File...
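Conceptually, the packing step amounts to bundling many small files into one larger container file before tape migration, as in this simplified sketch (not DESY's actual implementation; the size cap is an invented parameter):

    # Conceptual sketch: pack small files into one container file for tape.
    import tarfile
    from pathlib import Path

    def pack_small_files(directory: str, container: str, max_size: int = 10**9) -> list:
        """Bundle files from `directory` into `container` up to ~max_size bytes."""
        packed, total = [], 0
        with tarfile.open(container, "w") as tar:
            for path in sorted(Path(directory).iterdir()):
                if not path.is_file():
                    continue
                size = path.stat().st_size
                if total + size > max_size:
                    break  # a real service would start the next container here
                tar.add(path, arcname=path.name)
                packed.append(path.name)
                total += size
        return packed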
With the latest major release (5.0.0), the XRootD framework introduced not only a multitude of architectural improvements and functional enhancements, but also a TLS-based, secure version of the xroot/root data access protocol (a prerequisite for supporting access tokens). In this contribution we discuss all the ins and outs of the xroots/roots protocol, including the importance of...
Storage technology has changed over the decade, as has the role of storage in experimental research. Traditionally, magnetic tape has been the technology of choice for archival and narrowly targeted nearline storage. In recent years there has been a push to have tape play a larger role in nearline storage. In this presentation, the economics of tape are examined in light of...
One of the recommendations to come out of the HSF/WLCG Workshop in November 2020 was to create an Erasure Coding Working Group. Its purpose is to help solve some of the data challenges that will be encountered during the HL-LHC by enabling sites to store data more efficiently and robustly using erasure coding techniques (a toy illustration follows the list below). The working group aims to:
- provide a forum to allow sites to...
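The toy sketch below illustrates the principle behind erasure coding: with a single XOR parity block, any one lost data block can be rebuilt; production erasure codes such as Reed-Solomon generalize this to tolerate several simultaneous losses:

    # Toy erasure-coding illustration: 3 data blocks + 1 XOR parity block
    # survive the loss of any single block.
    def xor_blocks(blocks: list) -> bytes:
        out = bytearray(len(blocks[0]))
        for block in blocks:
            for i, b in enumerate(block):
                out[i] ^= b
        return bytes(out)

    data = [b"AAAA", b"BBBB", b"CCCC"]                  # equal-sized data blocks
    parity = xor_blocks(data)                           # one parity block
    recovered = xor_blocks([data[0], data[2], parity])  # rebuild lost block 1
    assert recovered == data[1]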
The BNL Computing Facility Revitalization (CFR) project is aimed at repurposing the former National Synchrotron Light Source (NSLS-I) building (B725), located on the BNL site, as a new data center for the Scientific Data and Computing Center (SDCC). The CFR project finished the design phase in the first half of 2019 and entered the construction phase in the second half of 2019, which is...
In this presentation we give an overview of SDCC's new support for the National Synchrotron Light Source 2 (NSLS-II) at Brookhaven National Lab. This includes the operational changes needed in order to adapt to the needs of BNL's photon science community.
A site report on computing platform updates and support systems development at IHEP during the past half year.
The SARS-CoV-2 virus, the cause of the better-known COVID-19 disease, has greatly altered our personal and professional lives. Many people are now expected to work from home, but this is not always possible and, in such cases, it is the responsibility of the employer to implement protective measures. One simple such measure is to require that people maintain a distance of 2 metres, but this...
The Linux Foundation's FOSS project EVE (Edge Virtualization Engine, www.lfedge.org/projects/eve/) provides a flexible foundation for IoT edge deployments with a choice of any hardware, application and cloud. The project's mission is to develop an open-source, lightweight virtualization engine for IoT edge gateways and edge servers with built-in security. EVE acts as...
CERN's private OpenStack cloud offers more than 300,000 cores to over 3,400 users, who can programmatically access resources like compute, multiple storage types, bare metal, container clusters, and more.
The CERN Cloud Team constantly works on improving these services while maintaining the stability and availability that are critical for many services in IT and the experiment workflows.
This talk...
Databases have to fulfil a variety of requirements in an operational system. They should be highly available, redundant, suffer minimal downtime during maintenance/upgrade work, and be easily recoverable in case of critical system failure.
All of these requirements can be realized with a Pgpool-II cluster that uses PostgreSQL instances as backends. The high availability of the backends is provided...
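From the client's point of view, applications talk only to the Pgpool-II endpoint, which hides backend failover behind a single address; a minimal sketch of that pattern (hostname, database and credentials invented; 9999 is Pgpool-II's default listening port):

    # Sketch of the client-side view of a Pgpool-II fronted cluster: connect
    # to the pool endpoint and retry while a standby is being promoted.
    import time
    import psycopg2  # pip install psycopg2-binary

    def connect_with_retry(attempts: int = 5):
        for i in range(attempts):
            try:
                return psycopg2.connect(
                    host="pgpool.example.org", port=9999,  # Pgpool-II endpoint
                    dbname="appdb", user="app", password="secret")
            except psycopg2.OperationalError:
                time.sleep(2 ** i)  # back off during failover/promotion
        raise RuntimeError("database unavailable")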
Since the last HEPiX, CERNphone has evolved from an internal pilot to a widely growing service with hundreds of users across the Organization. In this presentation, we will cover the current deployment of the mobile clients and the status of the upcoming desktop application. We will also describe advanced use cases such as team calls for handling piquet services and replacing shared office...
In this talk we shall introduce the Solid project, launched by Sir Tim Berners-Lee in 2016 as a set of open standards aiming to re-decentralize the Web and empower users' control over their own data. Solid includes standards, missing from the original Web specifications, that give users back ownership of their data (private, shared, and public) and choice of the storage where these data...
Anomaly Detection in the CERN Openstack Cloud is a challenging task due to the large scale of the computing infrastructure and the large volume of data to monitor.
The current solution to spot anomalous server machines in the cloud infrastructure relies on a threshold-based alarming system carefully set by the system managers on the performance metrics of each infrastructure component. The...
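A minimal sketch of such a threshold-based check (the metric names and limits are illustrative; in production they are tuned per component by the system managers):

    # Illustrative threshold-based alarm: flag any metric above its limit.
    THRESHOLDS = {"cpu_load": 0.95, "mem_used": 0.90, "io_wait": 0.30}

    def anomalous(metrics: dict) -> list:
        """Return the names of the metrics that exceed their thresholds."""
        return [name for name, limit in THRESHOLDS.items()
                if metrics.get(name, 0.0) > limit]

    print(anomalous({"cpu_load": 0.97, "mem_used": 0.42, "io_wait": 0.10}))
    # -> ['cpu_load']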
The File Transfer Service (FTS3) is a data movement service developed at CERN that is used to distribute the majority of the Large Hadron Collider's data across the Worldwide LHC Computing Grid (WLCG) infrastructure. At Fermilab, we have deployed a couple of FTS3 instances for Intensity Frontier experiments (e.g. DUNE) to transfer data in America and Europe, using a container-based strategy...
Shoal is a squid cache publishing and advertising tool designed to work in fast-changing environments. It consists of three components: the shoal-server, the shoal-agent, and the shoal-client. The purpose of shoal is to maintain a continually updated list of squid caches. Each squid runs a shoal-agent, which uses AMQP messages to publish its existence and the load of the squid to the...
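For flavour, here is a rough sketch of what a shoal-agent-style heartbeat could look like using the pika AMQP library; the exchange name, routing key and message format are our assumptions, not shoal's actual wire format:

    # Rough sketch of an AMQP status heartbeat in the spirit of shoal-agent.
    import json
    import socket
    import pika  # pip install pika

    def publish_squid_status(amqp_host: str, load: float) -> None:
        connection = pika.BlockingConnection(pika.ConnectionParameters(host=amqp_host))
        channel = connection.channel()
        channel.exchange_declare(exchange="shoal", exchange_type="topic")
        body = json.dumps({"hostname": socket.getfqdn(), "load": load})
        channel.basic_publish(exchange="shoal", routing_key="squid.status", body=body)
        connection.close()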
CERN is redesigning its authentication and authorization infrastructure around open source software, such as Keycloak for the Single Sign-On service and FreeIPA for the LDAP backend.
The project, which is part of the larger CERN MALT initiative, was first introduced at the HEPiX Autumn/Fall 2018 Workshop.
This talk will provide an overview of the new services, which are now in a production...
With more applications and services deployed at the BNL SDCC that rely on authentication services, the adoption of Multi-Factor Authentication (MFA) became inevitable. While web applications can be protected by Keycloak (an open-source single sign-on solution directed by Red Hat) with its MFA feature, other service components within the facility rely on FreeIPA (an open-source identity management...