Conveners
Track 9 – Exascale Science: HPC facilities
- Fabio Hernandez (IN2P3 / CNRS computing centre)
Track 9 – Exascale Science: Strategies by experiments and organizations
- Wei Yang (SLAC National Accelerator Laboratory (US))
Track 9 – Exascale Science: Porting applications to HPCs
- Steven Farrell (Lawrence Berkeley National Lab (US))
Track 9 – Exascale Science: Scheduling, computing environment
- Fabio Hernandez (IN2P3 / CNRS computing centre)
Track 9 – Exascale Science: Software environment, quantum algorithms, others
- Steven Farrell (Lawrence Berkeley National Lab (US))
The INFN Tier-1 located at CNAF in Bologna (Italy) is a major center of the WLCG e-Infrastructure, supporting the 4 major LHC collaborations and more than 30 other INFN-related experiments.
After multiple tests of elastically expanding CNAF compute power via Cloud resources (provided by Azure, by Aruba and in the framework of the HNSciCloud project), but also building on the experience...
The NSF-funded Scalable CyberInfrastructure for Artificial Intelligence and Likelihood Free Inference (SCAILFIN) project aims to develop and deploy artificial intelligence (AI) and likelihood-free inference (LFI) techniques and software using scalable cyberinfrastructure (CI) built on top of existing CI elements. Specifically, the project has extended the CERN-based REANA framework, a...
A number of technology R&D activities have been launched in Europe in an effort to close the gap with traditional HPC providers such as the USA and Japan, and more recently with emerging ones such as China.
The EU HPC strategy, funded through the EuroHPC initiative, rests on two pillars: the first targets the procurement and hosting of two to three commercial pre-Exascale systems, in order...
The Dutch science funding organization NWO is in the process of drafting requirements for the procurement of a future high-performance compute facility. To investigate the requirements for this facility to potentially support high-throughput workloads in addition to traditional high-performance workloads, a broad range of HEP workloads are being functionally tested on the current facility. The...
We present recent work in supporting deep learning for particle physics and cosmology at NERSC, the US Dept. of Energy mission HPC center. We describe infrastructure and software to support both large-scale distributed training across (CPU and GPU) HPC resources and for productive interfaces via Jupyter notebooks. We also detail plans for accelerated hardware for deep learning in the future...
High Performance Computing (HPC) centers are the largest facilities available for science. They are centers of expertise for computing scale and local connectivity and represent unique resources. The efficient usage of HPC facilities is critical to the future success of production processing campaigns of all Large Hadron Collider (LHC) experiments. A substantial amount of R&D investigations...
High Energy Physics (HEP) experiments will enter a new era with the start of the HL-LHC program, in which the required computing resources will surpass current capacities by large factors. Anticipating this scenario, funding agencies from participating countries are encouraging the HEP collaborations to consider the rapidly developing High Performance Computing (HPC) international...
The High-Luminosity LHC will provide an unprecedented volume of complex collision events. The desire to keep as many of the "interesting" events as possible for investigation by analysts implies a major increase in the scale of the compute, storage and networking infrastructure required for the HL-LHC experiments. An updated computing model is required to facilitate the timely publication of accurate physics...
High Performance Computing (HPC) supercomputers are expected to play an increasingly important role in HEP computing in the coming years. While HPC resources are not necessarily the optimal fit for HEP workflows, computing time at HPC centers on an opportunistic basis has already been available to the LHC experiments for some time, and it is also possible that part of the pledged computing...
Predictions of the LHC computing requirements for Run 3 and Run 4 (HL-LHC) over the next 10 years show a considerable gap between required and available resources, assuming budgets globally remain flat at best. This will require some radical changes to the computing models for the data processing of the LHC experiments. Concentrating computational resources in fewer...
MPI-learn and MPI-opt are libraries for large-scale training and hyper-parameter optimization of deep neural networks. The two libraries, based on the Message Passing Interface (MPI), allow these tasks to be performed on GPU clusters through different kinds of parallelism. Their main characteristic is flexibility: the user has complete freedom in building her own model,...
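The data-parallel mode such libraries rely on can be pictured as an MPI all-reduce of gradients across workers. The following C++ sketch is illustrative only; it is not the MPI-learn/MPI-opt API, and the gradient buffer is a placeholder.

```cpp
// Minimal sketch of MPI data-parallel gradient averaging (illustrative only,
// not the MPI-learn/MPI-opt API). Build with: mpicxx -o allreduce_demo demo.cpp
#include <mpi.h>
#include <vector>
#include <cstdio>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank = 0, size = 1;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    // Placeholder: each rank would compute gradients on its own data shard.
    std::vector<float> grad(1024, static_cast<float>(rank));

    // Sum the gradients across all ranks, then divide to obtain the average.
    MPI_Allreduce(MPI_IN_PLACE, grad.data(), static_cast<int>(grad.size()),
                  MPI_FLOAT, MPI_SUM, MPI_COMM_WORLD);
    for (float& g : grad) g /= size;

    if (rank == 0) std::printf("averaged gradient[0] = %f\n", grad[0]);
    MPI_Finalize();
    return 0;
}
```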
The CERN IT department has maintained different HPC facilities over the past five years, one on Windows and the other on Linux, as the bulk of the computing facilities at CERN run under Linux. The Windows cluster has been dedicated to engineering simulations and analysis problems. It qualifies as a High Performance Computing (HPC) cluster thanks to its powerful hardware and low-latency...
The upcoming generation of exascale HPC machines will all have most of their computing power provided by GPGPU accelerators. In order to be able to take advantage of this class of machines for HEP Monte Carlo simulations, we started to develop a Geant pilot application as a collaboration between HEP and the Exascale Computing Project. We will use this pilot to study and characterize how the...
Covariance matrices are used for a wide range of applications in particle physics, including Kalman filtering for tracking, as well as Principal Component Analysis and other dimensionality reduction techniques. The covariance matrix contains the covariance and variance measures between all pairs of data dimensions, leading to high computational cost.
By using a novel decomposition...
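For reference on why this cost grows quickly, a direct covariance computation over N samples in D dimensions requires O(N·D²) operations. Below is a minimal C++ sketch of that naive baseline, not of the decomposition presented in the contribution.

```cpp
// Naive covariance matrix of N samples in D dimensions: O(N*D*D) work.
// Assumes at least two samples and a fixed dimensionality D across samples.
#include <vector>
#include <cstddef>

std::vector<double> covariance(const std::vector<std::vector<double>>& x) {
    const std::size_t n = x.size(), d = x.front().size();
    std::vector<double> mean(d, 0.0), cov(d * d, 0.0);

    for (const auto& row : x)                 // per-dimension means
        for (std::size_t j = 0; j < d; ++j) mean[j] += row[j] / n;

    for (const auto& row : x)                 // accumulate centered outer products
        for (std::size_t i = 0; i < d; ++i)
            for (std::size_t j = 0; j < d; ++j)
                cov[i * d + j] += (row[i] - mean[i]) * (row[j] - mean[j]) / (n - 1);

    return cov;                               // row-major D x D matrix
}
```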
Detailed simulation is one of the most expensive tasks, in terms of time and computing resources, for High Energy Physics experiments. The need for simulated events will increase dramatically for next-generation experiments, such as the ones that will run at the High-Luminosity LHC. The computing model must evolve and, in this context, alternative fast simulation solutions are being studied....
In view of the increasing computing needs for the HL-LHC era, the LHC experiments are exploring new ways to access, integrate and use non-Grid compute resources. Accessing and making efficient use of Cloud and supercomputer (HPC) resources present a diversity of challenges. In particular, network limitations on the compute nodes in HPC centers prevent CMS experiment pilot jobs from connecting to...
ATLAS distributed computing is allowed to opportunistically use resources of the Czech national HPC center IT4Innovations in Ostrava. The jobs are submitted via an ARC Compute Element (ARC-CE) installed at the grid site in Prague. Scripts and input files are shared between the ARC-CE and the shared file system located at the HPC, via sshfs. This basic submission system has worked there since...
The ATLAS experiment is using large High Performance Computers (HPCs) and fine-grained simulation workflows (Event Service) to produce fully simulated events efficiently. ATLAS has developed a new software component (Harvester) that provides resource provisioning and workload shaping. In order to run effectively on the largest HPC machines, ATLAS developed the Yoda-Droid software to...
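The fine-grained distribution of work on an HPC machine can be pictured as a generic MPI manager/worker pattern, sketched below in C++. This is not the actual Yoda-Droid implementation; rank 0 playing the manager role and the integer "event range" tokens are assumptions made only for this example.

```cpp
// Sketch of a manager/worker event-range dispatch over MPI (illustrative of the
// pattern only, not the Yoda-Droid code). Requires at least two MPI ranks.
#include <mpi.h>
#include <cstdio>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank = 0, size = 1;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    const int n_ranges = 100, STOP = -1;

    if (rank == 0) {                      // manager: hands out event ranges on request
        int next = 0, dummy = 0;
        MPI_Status st;
        for (int done = 0; done < size - 1; ) {
            MPI_Recv(&dummy, 1, MPI_INT, MPI_ANY_SOURCE, 0, MPI_COMM_WORLD, &st);
            int assignment = (next < n_ranges) ? next++ : STOP;
            if (assignment == STOP) ++done;
            MPI_Send(&assignment, 1, MPI_INT, st.MPI_SOURCE, 0, MPI_COMM_WORLD);
        }
    } else {                              // worker: requests work until told to stop
        int request = rank, range = 0;
        while (true) {
            MPI_Send(&request, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
            MPI_Recv(&range, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            if (range == STOP) break;
            std::printf("rank %d simulating event range %d\n", rank, range);
        }
    }
    MPI_Finalize();
    return 0;
}
```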
The Solenoidal Tracker at RHIC (STAR) is a multi-nationally supported experiment located at Brookhaven National Lab and is currently the only remaining experiment running at RHIC. The raw physics data captured from the detector amount to tens of PBytes per data acquisition campaign, which places STAR well within the definition of a big data science experiment. The production of the...
Over the last few years, many physics experiments have migrated their computations from customized, locally managed computing clusters to multi-tenant HPC systems that are orders of magnitude larger and often optimized for highly parallelizable, long-runtime computations. Historically, physics simulations and analysis workflows were designed for single-core CPUs with abundant RAM, plenty of local...
The Square Kilometre Array (SKA) project is an international effort to build the world’s largest radio telescope, led by the SKA Organisation based at the Jodrell Bank Observatory near Manchester, UK. The SKA will conduct transformational science to improve our understanding of the Universe and the laws of fundamental physics, monitoring the sky in unprecedented detail and mapping it hundreds...
The software pipeline for ASKAP has been developed to run on the Galaxy supercomputer as a succession of MPI-enabled, coarsely parallelised applications. We have been using OpenACC to develop more finely grained parallel applications within the current code base that can utilise GPU accelerators when they are present, thereby eliminating the overhead of maintaining two versions of the software...
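The incremental annotation approach described here can be illustrated with a hypothetical loop (not taken from the ASKAP code base): a single OpenACC directive lets the same C++ source run on the GPU when an accelerator is present and fall back to the CPU otherwise.

```cpp
// Hypothetical example of incremental OpenACC offload: the annotated loop runs
// on the GPU when an accelerator is available, otherwise on the host, so a
// single code base is maintained. Compile e.g. with: nvc++ -acc example.cpp
#include <vector>

void scale_and_add(std::vector<float>& out, const std::vector<float>& in, float w) {
    const int n = static_cast<int>(out.size());
    float*       o  = out.data();
    const float* in_ = in.data();
    // Offload the loop and the array sections it touches.
    #pragma acc parallel loop copy(o[0:n]) copyin(in_[0:n])
    for (int k = 0; k < n; ++k)
        o[k] += w * in_[k];
}
```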
In November 2018, the KISTI-5 supercomputer was launched. It is a heterogeneous 25.3 PF Cray 3112-AA000T machine with Intel Xeon Phi KNL (Knights Landing) 7250 processors, each with 68 cores. The goal of this presentation is to discuss the application and usage of the Intel KNL-based KISTI-5 system for physics beyond the Standard Model.
The world is made of dark...
Within the fusion radiation transport community, the de facto standard simulation codebase has for many years been, and still is, MCNP. MCNP suffers from very few community-perceived drawbacks, having widely validated and verified physics, a large user base and a simple interface, but the main issue in the age of democratised computing access is its prohibitive licence conditions. Thus, if we need to be able...
The open-source ROCm platform for GPU computing provides a uniform framework supporting both NVIDIA and AMD GPUs, as well as the possibility of porting CUDA code to ROCm-compatible code. We will present the porting progress of the overlap fermion inverter (GWU-code) based on Thrust, and also of a general inverter package, QUDA.
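As a rough illustration of the porting pattern (a toy kernel, not the GWU-code or QUDA), the CUDA runtime API maps almost one-to-one onto HIP, which is what makes such ports tractable.

```cpp
// Toy HIP kernel illustrating the CUDA-to-HIP mapping (cudaMalloc -> hipMalloc,
// cudaMemcpy -> hipMemcpy, ...); not taken from the GWU-code or QUDA ports.
// Compile with hipcc on either AMD or NVIDIA back ends.
#include <hip/hip_runtime.h>
#include <vector>
#include <cstdio>

__global__ void axpy(int n, float a, const float* x, float* y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] += a * x[i];
}

int main() {
    const int n = 1 << 20;
    std::vector<float> hx(n, 1.0f), hy(n, 2.0f);
    float *dx = nullptr, *dy = nullptr;
    hipMalloc(&dx, n * sizeof(float));
    hipMalloc(&dy, n * sizeof(float));
    hipMemcpy(dx, hx.data(), n * sizeof(float), hipMemcpyHostToDevice);
    hipMemcpy(dy, hy.data(), n * sizeof(float), hipMemcpyHostToDevice);

    axpy<<<(n + 255) / 256, 256>>>(n, 3.0f, dx, dy);   // same launch syntax as CUDA

    hipMemcpy(hy.data(), dy, n * sizeof(float), hipMemcpyDeviceToHost);
    std::printf("y[0] = %f\n", hy[0]);                 // expect 5.0
    hipFree(dx);
    hipFree(dy);
    return 0;
}
```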