Conveners
Parallel (Track 9): Analysis facilities and interactive computing
- Marta Czurylo (CERN)
- Nick Smith (Fermi National Accelerator Lab. (US))
Parallel (Track 9): Analysis facilities and interactive computing
- Enric Tejedor Saavedra (CERN)
- Nicole Skidmore (University of Warwick)
Description
Analysis facilities and interactive computing
Experiment analysis frameworks, physics data formats and expectations of scientists at the LHC have been evolving towards interactive analysis with short turnaround times. Several sites in the community have reacted by setting up dedicated Analysis Facilities, providing tools and interfaces to computing and storage resources suitable for interactive analysis. It is expected that this demand...
The National Analysis Facility at DESY has been in production for nearly 15 years. Over various stages of development, experience gained in continuous operations has been fed back and integrated into the evolving NAF. As a "living" infrastructure, one fundamental constituent of the NAF is the close contact between NAF users, NAF admins and storage admins & developers. Since the...
The anticipated surge in data volumes generated by the LHC in the coming years, especially during the High-Luminosity LHC phase, will reshape how physicists conduct their analysis. This necessitates a shift in programming paradigms and techniques for the final stages of analysis. As a result, there's a growing recognition within the community of the need for new computing infrastructures...
We explore the adoption of cloud-native tools and principles to forge flexible and scalable infrastructures, aimed at supporting analysis frameworks being developed for the ATLAS experiment in the High Luminosity Large Hadron Collider (HL-LHC) era. The project culminated in the creation of a federated platform, integrating Kubernetes clusters from various providers such as Tier-2 centers,...
This work presents the contribution of the Spanish Tier-1 and Tier-2 sites to the computing of the ATLAS experiment at the LHC during the Run 3 period. The Tier-1 and Tier-2 GRID infrastructures, encompassing data storage, processing, and involvement in software development and computing tasks for the experiment, will undergo updates to enhance efficiency and visibility within the experiment.
The...
The analysis of data collected by the ATLAS and CMS experiments at CERN, ahead of the next high-luminosity phase of the LHC, requires flexible and dynamic access to large amounts of data, as well as an environment capable of dynamically accessing distributed resources. An interactive high throughput platform, based on a parallel and geographically distributed back-end, has been developed in...
Scientific computing relies heavily on powerful tools like Julia and Python. While Python has long been the preferred choice in High Energy Physics (HEP) data analysis, there’s a growing interest in migrating legacy software to Julia. We explore language interoperability, focusing on how Awkward Array data structures can connect Julia and Python. We discuss memory management, data buffer...
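As a minimal illustration of the buffer-level exchange such interoperability relies on, the sketch below (Python side only; the hand-off to Julia, e.g. via AwkwardArray.jl, is assumed and not shown) decomposes an Awkward Array into a language-agnostic form plus flat buffers and rebuilds it as the receiving side would:

import awkward as ak

# Sketch of the buffer-level view that cross-language sharing relies on.
array = ak.Array([[1.1, 2.2], [], [3.3]])

# Decompose into a language-agnostic form (JSON-like metadata), a length,
# and a dict of flat, contiguous buffers.
form, length, buffers = ak.to_buffers(array)
for name, buffer in buffers.items():
    print(name, buffer.nbytes, "bytes")

# The receiving side rebuilds the identical array from the same pieces,
# without reinterpreting the event structure by hand.
roundtrip = ak.from_buffers(form, length, buffers)
assert ak.to_list(roundtrip) == ak.to_list(array)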
During the ESCAPE project, the pillars of a pilot analysis facility were built following a bottom-up approach, in collaboration with all the partners of the project. As a result, the CERN Virtual Research Environment (VRE) initiative proposed a workspace that facilitates access to the data in the ESCAPE Data Lake, a large-scale data management system built on Rucio, along with the...
The ROOT framework provides various implementations of graphics engines tailored for different platforms, along with specialized support for batch mode. Over time, as technology evolves and new versions of X11 or Cocoa are released, maintaining the functionality of the corresponding ROOT components becomes increasingly challenging. The TWebCanvas class in ROOT represents an attempt to unify all...
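From the user side, switching to the web-based rendering looks roughly like the PyROOT sketch below (histogram and canvas names are illustrative; the exact argument accepted by SetWebDisplay depends on the ROOT version and environment):

import ROOT

# Ask ROOT to use its web-based graphics backend, so the canvas is rendered
# through TWebCanvas in a browser instead of the native X11/Cocoa engines.
# An empty string selects the default; "chrome", "firefox", "server", "off"
# are alternatives in recent ROOT versions.
ROOT.gROOT.SetWebDisplay("")

h = ROOT.TH1F("h", "Web-rendered histogram;x;entries", 50, -3.0, 3.0)
h.FillRandom("gaus", 10000)

c = ROOT.TCanvas("c", "TWebCanvas example", 800, 600)
h.Draw()
c.Update()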
Over the last few years, an increasing number of sites have started to offer access to GPU accelerator cards, but in many places they remain underutilised. The experiment collaborations are gradually increasing the fraction of their code that can exploit GPUs, driven in many cases by the development of specific reconstruction algorithms to exploit the HLT farms when data is not being taken....
The success and adoption of machine learning (ML) approaches to solving HEP problems has been widespread and fast. As useful a tool as ML has been to the field, the growing number of applications, larger datasets, and increasing complexity of models create a demand for both more capable hardware infrastructure and cleaner methods of reproducibility and deployment. We have developed a prototype...
Machine Learning (ML) is driving a revolution in the way scientists design, develop, and deploy data-intensive software. However, the adoption of ML presents new challenges for the computing infrastructure, particularly in terms of provisioning and orchestrating access to hardware accelerators for development, testing, and production.
The INFN-funded project AI_INFN ("Artificial Intelligence...
The ROOT software package provides the data format used in High Energy Physics by the LHC experiments. It offers a data analysis interface called RDataFrame, which has proven to adapt well to the requirements of modern physics analyses. However, with the increasing amount of data collected by the LHC experiments, the challenge of performing an efficient analysis grows. One of the solutions to ease this...
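For context, a minimal RDataFrame analysis in Python looks roughly like the sketch below (file, tree, and column names are placeholders); the same declarative chain is what a distributed RDataFrame backend would execute across many workers:

import ROOT

# Declarative analysis: filters, defined columns and booked results form a
# computation graph; the event loop runs lazily when a result is first used.
df = ROOT.RDataFrame("Events", "data.root")

hist = (
    df.Filter("nMuon >= 1", "at least one muon")
      .Define("leading_mu_pt", "Muon_pt[0]")
      .Histo1D(("leading_mu_pt", "Leading muon p_{T};p_{T} [GeV];events",
                100, 0.0, 200.0),
               "leading_mu_pt")
)

canvas = ROOT.TCanvas()
hist.Draw()
canvas.SaveAs("leading_mu_pt.png")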
The ATLAS experiment is currently developing columnar analysis frameworks which leverage the Python data science ecosystem. We describe the construction and operation of the infrastructure necessary to support demonstrations of these frameworks, with a focus on those from IRIS-HEP. One such demonstrator aims to process the compact ATLAS data format PHYSLITE at rates exceeding 200 Gbps. Various...
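A rough sketch of the columnar access pattern such demonstrators build on, using uproot and Awkward Array (the file path is a placeholder and the branch names are assumed PHYSLITE-style names):

import uproot
import awkward as ak

# Sketch only: placeholder file, assumed PHYSLITE branch naming,
# ATLAS momenta stored in MeV.
with uproot.open("DAOD_PHYSLITE.example.root") as f:
    tree = f["CollectionTree"]

    # Read one jagged column (electron pT per event) as an Awkward Array.
    el_pt = tree["AnalysisElectronsAuxDyn.pt"].array(library="ak")

    # Columnar selection: count electrons above 25 GeV in each event.
    n_selected = ak.sum(el_pt > 25_000.0, axis=1)
    print(n_selected[:10])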
As a part of the IRIS-HEP “Analysis Grand Challenge” activities, the Coffea-casa AF team executed a “200 Gbps Challenge”. One of the goals of this challenge was to provide a setup for execution of a test notebook-style analysis on the facility that could process a 200 TB CMS NanoAOD dataset in 20 minutes.
We describe the solutions we deployed at the facility to execute the challenge tasks....
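A simplified sketch of the notebook-style scale-out pattern targeted by the challenge, assuming a dask-awkward/uproot stack (file names, branch names, and the cluster set-up are placeholders; on the facility the Client would point at its Dask gateway rather than a local cluster):

import uproot
import dask_awkward as dak
import awkward as ak
from dask.distributed import Client

# Local cluster here for the sketch; a facility deployment would connect
# to a shared scheduler instead.
client = Client()

# Lazily map NanoAOD files to dask-awkward arrays; nothing is read yet.
events = uproot.dask({
    "nanoaod_part1.root": "Events",
    "nanoaod_part2.root": "Events",
})

# Build a small task graph: select muons above 30 GeV and count them.
muon_pt = events["Muon_pt"]
n_good = dak.num(muon_pt[muon_pt > 30.0], axis=1)

# Trigger the distributed computation and aggregate the result.
print(ak.sum(n_good.compute()))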
In the data analysis pipeline for LHC experiments, a key aspect is the step in which small groups of researchers—typically graduate students and postdocs—reduce the smallest, common-denominator data format down to a small set of specific histograms suitable for statistical interpretation. Here, we will refer to this step as “analysis” with the recognition that in other contexts, “analysis”...
China’s High-Energy Photon Source (HEPS), the first national high-energy synchrotron radiation light source, is under design and construction. The HEPS computing center is the principal provider of high-performance computing and data resources and services for scientific experiments at HEPS. The mission of the HEPS scientific computing platform is to accelerate scientific discovery for the...
We have created a Snakemake computational analysis workflow corresponding to the IRIS-HEP Analysis Grand Challenge (AGC) example studying ttbar production channels in the CMS open data. We describe the extensions to the AGC pipeline that allowed porting of the notebook-based analysis to Snakemake. We discuss the applicability of the Snakemake multi-cascading paradigm for running...