Introduction to Barcelona and PIC
The journal Computing and Software for Big Science was launched 18 months ago as a peer-reviewed journal, with strong participation from the HEP community. This presentation will review the journal's status after this initial period.
Site report, news and ongoing activities at the Swiss National Supercomputing Centre T2 site (CSCS-LCG2) running ATLAS, CMS and LHCb.
- Brief description of the site: location, size, and resource plans until 2020
- Complexities of the collaboration between the 4 parties
- Next steps after LHConCRAY and Tier-0 spillover tests
We will present an update on AGLT2, focusing on changes since the Spring 2018 report. The primary topics include the status of IPv6, an update on VMware, dCache, and our Bro/MISP deployment. We will also describe new equipment purchases and personnel changes at our site.
All projects hosted at KEK proceeded actively in 2018. The SuperKEKB/Belle II experiment succeeded in observing its first collisions in April 2018 and accumulated data continuously until July. The J-PARC accelerator has also delivered various kinds of beams in parallel. Most of the experimental data are transferred to and stored on the KEK central computer system (KEKCC). In this...
News from PIC since the HEPiX Spring 2018 workshop at Madison, Wisconsin, USA.
News from CERN since the HEPiX Spring 2018 workshop at University of Wisconsin-Madison, Madison, USA.
The Scientific Data & Computing Center (SDCC) at Brookhaven National Laboratory (BNL) serves the computing needs of experiments at RHIC, while also serving as the US ATLAS Tier-1 and the Belle II Tier-1 facility. This presentation provides an overview of the BNL SDCC, highlighting significant developments since the last HEPiX meeting at the University of Wisconsin-Madison.
News and updates from the distributed NDGF Tier1 site.
JLab high-performance and experimental physics computing environment updates since the Fall 2016 meeting, with recent and upcoming hardware procurements for compute nodes, including Skylake CPUs and Volta and/or Intel KNL accelerators; our Supermicro storage; Lustre status; 12 GeV computing status; integration of offsite resources; and Data Center modernization.
CERN's networks comprise approximately 400 routers and 4000 switches from multiple vendors and from different generations, fulfilling various purposes (campus network, datacentre network, and dedicated networks for the LHC accelerator and experiments control).
To ensure the reliability of the networks, the IT Communication Systems group has developed in-house Perl-based software called...
In recent years, Wi-Fi has become the primary Internet connection method in campus networking. Wireless access is no longer an additional service or merely an interesting technology for conference and meeting rooms: support for mobility is now expected. In 2016, CERN launched a global Wi-Fi renewal project across its campus. The subject of the project is...
WLCG relies on the network as a critical part of its infrastructure and therefore needs to guarantee effective network usage and prompt detection and resolution of any network issues, including connection failures, congestion and traffic routing. The OSG Networking Area is a partner of the WLCG effort and is focused on being the primary source of networking information for its partners and...
The transition of WLCG central and storage services to dual-stack IPv4/IPv6 is progressing well, thus enabling the use of IPv6-only CPU resources as agreed by the WLCG Management Board and presented by us at previous HEPiX meetings.
All WLCG Tier 1 data centres have IPv6 connectivity and much of their storage is now accessible over IPv6. The LHC experiments have also requested all WLCG Tier 2...
High Energy Physics (HEP) experiments have greatly benefited from a strong relationship with Research and Education (R&E) network providers and, thanks to projects such as LHCOPN/LHCONE and REN contributions, have enjoyed significant capacity and high-performance networks for some time. RENs have been able to continually expand their capacities to over-provision the networks relative to...
After several years of operating equipment from a single manufacturer in the CERN campus network, the time has come to renew it. The CERN Communication Systems group is preparing the introduction of a new manufacturer, with some changes based on requirements received from users at CERN.
In recent years we have observed a significant increase of interest in Internet of Things (IoT) devices. Equipment of this kind is more and more often considered a valuable and useful tool in industry. IoT thus comprises a wide variety of devices and use cases, and such devices can be connected to the network using many different access methods.
As the CERN network team, we would like to...
PDSF, the Parallel Distributed Systems Facility, has been in continuous operation since 1996, serving high energy physics research. The cluster is a Tier-1 site for STAR, a Tier-2 site for ALICE and a Tier-3 site for ATLAS.
This site report will describe lessons learned and challenges met running containerized software stacks using Shifter, as well as upcoming changes to systems management and...
Update on computing at LAL
Site report for SURFsara, formerly known as SARA, part of the Dutch Tier-1 site.
We will give an overview of the site and share our experience with the following topics: the update to the current release of DPM, HTCondor configuration, and Foreman installation and setup.
The Tokyo regional analysis center, located at the International Center for Elementary Particle Physics of the University of Tokyo, supports the ATLAS VO as one of the WLCG Tier-2 sites.
It provides 10,000 CPU cores and 10 PB of disk storage, including local resources dedicated to ATLAS-Japan members.
All hardware is supplied under a three-year rental, and the current contract will finish at the...
In the CERN IT agile infrastructure, Puppet is the key asset for the automated configuration management of the more than forty thousand machines in the CERN Computer Centre. These virtual and physical machines run a variety of services that need to interact with each other at run time.
These needs triggered the creation of the CERNMegas project, which automates the communication...
In Belgium, the 'Université catholique de Louvain' (UCLouvain) hosts a Tier-2 WLCG site. The computing infrastructure has recently been merged with the General Purpose cluster of the university. During that merge, the deployment process for the compute nodes has been re-thought, using a combination of three open-source software tools: Cobbler, Ansible and Salt. Those three tools work together...
The current authentication schemes used at CERN are based on Kerberos for desktop and terminal access, and on Single Sign-On (SSO) tokens for web-based applications.
Authorization is managed through LDAP groups, leading to privacy concerns and requiring a CERN account to make the mapping to a group possible.
This scenario is completely separate from WLCG, where authentication...
This talk will provide an overview of the recent changes in architecture and development procedures in use to manage all CERN networks (campus, experiments and technical networks).
These include:
- the migration from 20-year-old PL/SQL code to Java using a modern microservices architecture,
- the migration from multiple Git repositories to a single one in order to simplify organization and...
CERN has been using ITIL Service Management methodologies and ServiceNow since early 2011. What began as a joint project between just the Information Technology and General Services Departments has spread to most of CERN, and all departments are now represented fully or partially in the CERN Service Catalogue.
We will present a summary of the current situation...
An update on CERN Linux support distributions and services.
An update on the CentOS community and CERN involvement will be given.
We will discuss updates from the Software Collections, Virtualization and OpenStack SIGs.
We will present our Anaconda plugin and the evolution of the locmap tool.
A brief status report on the alternative-architecture work (aarch64, ppc64le, etc.) done by the community will be given.
In this presentation, we will report on Indico's usage within the High Energy Physics community, with a particular focus on the adoption of Indico 2.x.
We will also go over the most recent developments in the project, including the new Room Booking interface in version 2.2, as well as plans for the future.
Over the years, CERN activities and services have increasingly relied on commercial software and solutions to deliver core services, often encouraged by attractive financial conditions granted in recognition of CERN's academic, non-profit and research status. Once a solution is installed, widespread and heavily used, the leverage that attracted CERN service managers to it tends...
Introduced in 2016, the "PaaS for Web Applications" service aims to provide CERN users with a modern environment for web application hosting, following the Platform-as-a-Service paradigm. The service leverages the OpenShift (now OKD) container orchestrator.
We will provide a quick overview of the project, its use cases and how it evolved over its more than two years of production phase....
The BNL Scientific Data and Computing Center (SDCC) has been developing a user analysis portal based on Jupyterhub and leveraging the large scale computing resources available at SDCC. We present the current status of the portal and issues of growing and integrating the user base.
The IHEP computing platform runs a WLCG Tier-2 site and a local HTCondor cluster of 12,000 cores with ~9 PB of storage. The talk will cover the optimization of the local HTCondor cluster, the plans for the HPC cluster, the progress made on the IHEP campus network, and the new functions provided to users.
Private and public cloud infrastructures have become a reality in recent years. Science is looking at these solutions to extend the computing and storage facilities available to research projects that are becoming bigger and bigger. This is the aim of Helix Nebula Science Cloud (HNSciCloud), a project led by CERN, in which we submitted two use cases for the astrophysics projects MAGIC...
This presentation introduces Function as a Service (FaaS) and how it can be used in the data center. We will show how NERSC is using FaaS to add functionality on top of the data we collect. We will also present some basic Python for performing these functions and for connecting to Elastic both natively and via the API, along the lines of the sketch below.
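A minimal sketch of what such a connection could look like, assuming a hypothetical endpoint elastic.example.org and index datacenter-metrics (placeholders, not NERSC's actual setup), using the elasticsearch-py client natively and plain HTTP for the REST API:

    import json

    import requests
    from elasticsearch import Elasticsearch

    # Hypothetical placeholders, not NERSC's real endpoint or index.
    HOST = "http://elastic.example.org:9200"
    doc = {"metric": "node_load1", "value": 0.42}

    # Native access through the elasticsearch-py client.
    es = Elasticsearch([HOST])
    es.index(index="datacenter-metrics", body=doc)

    # The same operation via the plain REST API, as a small FaaS
    # function body might do when no client library is installed.
    requests.post(
        HOST + "/datacenter-metrics/_doc",
        headers={"Content-Type": "application/json"},
        data=json.dumps(doc),
    )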
Evolving the current data storage, management and access models is the main challenge in WLCG, and certainly in scientific computing, for the coming years. HL-LHC resource needs will exceed what funding agencies can provide by an order of magnitude. Forthcoming experiments in particle physics, cosmology and astrophysics also foresee data volumes of similar magnitude. The concepts of storage federations,...
The report introduces the design of the IHEP SDN campus network, which aims at separating the control plane from the data-forwarding plane. The newly designed SDN network, integrated with the IHEP authentication system, has achieved 802.1X-based user access control. It can obtain network information from the data-forwarding plane through the controller and provide more network management...
This presentation provides an update on the global security landscape since the last HEPiX meeting. It describes the main vectors of risks to and compromises in the academic community including lessons learnt, presents interesting recent attacks while providing recommendations on how to best protect ourselves. It also covers security risks management in general, as well as the security aspects...
The processing of personal data is inevitable within a work context and CERN aims to follow and observe best practices as regards the collection, processing and handling of this type of data. This talk aims to give an overview on how CERN and especially the IT department is implementing Data Protection.
Trusted CI, the National Science Foundation's Cybersecurity Center of Excellence, is in the process of updating their Guide (published in 2014) and recasting it as a framework for establishing and maintaining a cybersecurity program for open science projects. The framework is envisioned as being appropriate for a wide range of projects, both in terms of scale and life cycle. The Framework is...
AMD returned to the server CPU market in 2017 with the release of their EPYC line of CPUs, based on the Zen microarchitecture. In this presentation, we'll provide an overview of the AMD EPYC CPU architecture, and how it differs from Intel's Xeon Skylake. We'll also present performance and cost comparisons between EPYC and Skylake, with an emphasis on use in HEP/NP computing environments.
The HEPiX Benchmarking Working Group is working on a new 'long-running' benchmark for measuring installed capacities, intended to replace the currently used HS06. This presentation will show the current status.
The classic workflow of an experiment at a synchrotron facility starts with the users physically coming to the facility with their samples; they analyze those samples with the beamline equipment and finally go back to their institutions with a huge amount of data on a portable hard disk.
Data reduction and analysis are done mostly at the user's scientific institution. As data...
This is a report on the recently held workshop at BNL on Central Computing Facilities support for Photon Sciences, with participation from various Light Source facilities from Europe and the US.
Scaling an OpenMP or MPI application on modern TurboBoost-enabled CPUs is getting harder and harder. Using some simple 'openssl' commands, however, it is possible to adjust OpenMP benchmarking results to correct for the TurboBoost frequencies of modern Intel and AMD CPUs. In this talk I will explain how to achieve better OpenMP scaling numbers and will show how a non-root user can determine...
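As a rough illustration of the underlying idea (a sketch under assumptions, not the presenter's actual procedure): a non-root user can estimate how much the effective per-core frequency drops as more cores become busy using `openssl speed -multi`, and a measured OpenMP speedup can then be divided by that factor. The choice of sha256 and the output parsing are assumptions:

    import subprocess

    def aggregate_throughput(n_procs, algo="sha256"):
        """Run `openssl speed -multi N <algo>` and return the combined
        throughput (in 1000s of bytes per second, largest block size)."""
        out = subprocess.run(
            ["openssl", "speed", "-multi", str(n_procs), algo],
            capture_output=True, text=True, check=True,
        ).stdout
        # The combined summary line looks like:
        #   sha256  123456.78k  ...  987654.32k
        for line in reversed(out.splitlines()):
            if line.startswith(algo):
                return float(line.split()[-1].rstrip("k"))
        raise RuntimeError("no summary line found in openssl speed output")

    # TurboBoost raises the clock when few cores are busy, so per-core
    # throughput at N cores is below the single-core value.  A measured
    # OpenMP speedup at N threads can be divided by this factor to
    # remove the frequency effect from the scaling curve.
    base = aggregate_throughput(1)
    for n in (2, 4, 8, 16):
        factor = aggregate_throughput(n) / (n * base)
        print(f"{n:2d} cores: effective per-core frequency factor {factor:.3f}")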
Predictions of the requirements for LHC computing in Run 3 and Run 4 (HL-LHC) over the course of the next 10 years show a considerable gap between required and available resources, assuming budgets will globally remain flat at best. This will require some radical changes to the computing models for the data processing of the LHC experiments. The use of large scale general purpose...
PDSF, the Parallel Distributed Systems Facility, has been in continuous operation since 1996 serving high-energy and nuclear physics research. It is currently a tier-1 site for STAR, a tier-2 site for ALICE, and a tier-3 site for ATLAS. We are in the process of migrating the PDSF workload from the existing commodity cluster to the Cori Cray XC40 system. Docker containers enable running the...
With the demands of LHC computing, coupled with pressure on the traditional resources available, we need to find new sources of compute power. We have described, at HEPiX and elsewhere, how we have started to explore running batch workloads on storage servers at CERN and on public cloud resources. Since the summer of 2018, ATLAS & LHCb have started to use a pre-production service on storage...
Report from the kick-off meeting
The HSF/WLCG cost and performance modeling working group was established in November 2017 and has since made considerable progress in understanding the performance factors of the LHC applications, estimating the required computing and storage resources, and modeling the cost of the infrastructure and its evolution for the WLCG sites. This contribution provides an update on the recent...
This is an overview of status and plans for the procurements of compute/storage for CERN data centres and some recent adaptations to better benefit from technology trends. The talk will also cover our workflow for hardware repairs as well as status and plans of our ongoing efforts in establishing inventories for deployed assets and spare parts. It will also cover some recent hardware issues...
This talk is about the proposed superfacility structure that NERSC is working toward: the next model of HPC computing, combining all aspects of the data center, infrastructure, WAN and the experiments. We are in the early stages of defining and standing up such a facility.
The CERN IT Storage group operates multiple distributed storage systems and it is responsible for the support of the CERN storage infrastructure, ranging from the physics data of the LHC and non-LHC experiments to users' files.
This talk will summarise our experience and the ongoing work in evolving our infrastructure, focusing on some of the most important areas.
EOS is the high-performance...
In 2019 and 2020 the LHC will undergo its long shutdown.
Besides technical interventions on the accelerator itself, we plan to review our practices and upgrade our Oracle databases.
In this session I will show what goals we have on the Oracle Database side and how we intend to achieve them.
The CERN IT-Storage group Analytics and Development section is responsible for the development of Data Management solutions for Disk Storage and Data Transfer. These solutions include EOS - the high-performance CERN IT distributed storage for High-Energy Physics, DPM - a system for managing disk storage at small and medium sites, and FTS - the service responsible for distributing the majority...
In this talk, I will give an update on the ATLAS data carousel R&D project, focusing mainly on the recent tape performance tests at the Tier-1 sites. The main topics to be covered include:
1) the overall throughput delivered by the tape system
2) the overall throughput delivered to the end user (Rucio)
3) any issues/bottlenecks observed in the various service layers (tape, SRM, FTS, Rucio)
4)...
The NSF-funded OSiRIS project (http://www.osris.org), which is creating a multi-institutional storage infrastructure based upon Ceph and SDN, is entering its fourth year. We will describe the project, its science-domain users and use cases, and the technical and non-technical challenges the project has faced. We will conclude by outlining the plans for the remaining two years of the project...
Fujifilm is the world's leading manufacturer of magnetic tapes (LTO, 3592 and T10000). More than 80% of the storage capacity delivered on tape comes from Fujifilm's manufacturing and assembly plants in Odawara (Japan) and Bedford (USA). Fujifilm is working in partnership with IBM on the development of the next tape generations: a roadmap has been established, describing the tape formats that will...
FlexiRemap® Technology: a patented design, built from the ground up to replace disk-based RAID in flash storage arrays. It offers faster performance, longer SSD lifespan, and more advanced data protection than traditional RAID technology.
CERN has purchased LTO-8 tape drives that we have started to use with LTO-8 and LTO-7 type M media. While LTO is attractive in terms of cost/TB, it lacks functionality for fast positioning available on enterprise drives, such as a high-resolution tape directory and drive-assisted file access ordering (aka RAO). In this contribution, we will outline our experiences with LTO so far, describe...
High performance computing (HPC) environments continually test the limits of technology and require peak performance from their equipment—including storage. Slow overall writing of data and long seek times between file reads due to non-consecutive files, library partitioning, or laborious loading mechanisms often plague tape library efficiencies for large tape users managing massive sets of...
Short report on the workshop held at RAL in September, and an outlook to the next workshop
The language Rust has many potential benefits for the physics community. This talk explains why I think this is true in general and for HEP.
This will be a general introduction to the Rust programming language, as a replacement for C/C++ and a way to extend Python.
CTA is designed to replace CASTOR as the CERN Tape Archive solution, in order to face the scalability and performance challenges arriving with LHC Run 3.
This presentation will give an overview of the initial software deployment on production grade infrastructure. We discuss its performance against various workloads: from artificial stress tests to production condition data transfer sessions with...
CERN's Backup Service hosts around 11 PB of data in more than 2.1 billion files. We have over 500 clients that back up or restore an average of 80 TB of data each day. At the current growth rate, we expect to reach about 13 PB by the end of 2018.
In this contribution we review the impact of the latest changes of the backup infrastructure. We will see how these optimizations helped us to...