Conveners
T3 - Distributed computing: Experiment Frameworks and HPC
- David Cameron (University of Oslo (NO))
T3 - Distributed computing: Facilities
- Julia Andreeva (CERN)
T3 - Distributed computing: Testing, Monitoring and Accounting
- Julia Andreeva (CERN)
T3 - Distributed computing: Performance Optimization, Security and Federated Identity
- David Cameron (University of Oslo (NO))
T3 - Distributed computing: Experiment Frameworks and Operational Experiences (1)
- Hannah Short (CERN)
T3 - Distributed computing: Experiment Frameworks and Operational Experiences (2)
- Julia Andreeva (CERN)
T3 - Distributed computing: Computing Models and Future Views
- David Cameron (University of Oslo (NO))
The IceCube Neutrino Observatory is a neutrino detector located at the South Pole. Here we present experiences acquired when using HTCondor to run IceCube’s GPU simulation worksets on the Titan supercomputer. Titan is a large supercomputer geared for High Performance Computing (HPC). Several factors make it challenging to use Titan for IceCube’s High Throughput Computing (HTC) workloads: (1) Titan...
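As an aside from the contribution itself, the sketch below shows what a minimal GPU job submission through the HTCondor Python bindings can look like; the wrapper script name, resource requests and job count are purely illustrative assumptions.

    # Minimal sketch of submitting GPU jobs via the HTCondor Python bindings
    # (newer Schedd.submit API). Executable and resource values are
    # illustrative placeholders, not IceCube's actual configuration.
    import htcondor

    submit_description = htcondor.Submit({
        "executable": "run_gpu_simulation.sh",        # hypothetical wrapper script
        "arguments": "--dataset 12345 --job $(ProcId)",
        "request_gpus": "1",
        "request_cpus": "1",
        "request_memory": "4GB",
        "output": "sim_$(ProcId).out",
        "error": "sim_$(ProcId).err",
        "log": "sim.log",
    })

    schedd = htcondor.Schedd()
    schedd.submit(submit_description, count=100)  # queue 100 GPU jobs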
Predictions of the requirements for LHC computing in Run 3 and Run 4 (HL-LHC) over the course of the next 10 years show a considerable gap between required and available resources, assuming budgets will globally remain flat at best. This will require some radical changes to the computing models for the data processing of the LHC experiments. The use of large scale computational...
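For a rough sense of scale, the back-of-the-envelope comparison below contrasts capacity growth under a flat budget with faster-growing demand; both growth rates are illustrative assumptions, not figures from the contribution.

    # Illustrative flat-budget projection: capacity grows only through
    # technology improvement while demand grows faster. All numbers are
    # assumptions made for the sake of the example.
    years = 10
    tech_gain_per_year = 0.20      # assumed yearly performance gain at fixed cost
    demand_growth_per_year = 0.40  # assumed yearly growth in required resources

    capacity = (1 + tech_gain_per_year) ** years    # ~6.2x after 10 years
    demand = (1 + demand_growth_per_year) ** years  # ~28.9x after 10 years

    print(f"capacity growth: {capacity:.1f}x, demand growth: {demand:.1f}x, "
          f"shortfall factor: {demand / capacity:.1f}x")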
Many HEP experiments are moving beyond experimental studies to making large-scale production use of HPC resources at NERSC, including the Knights Landing architecture on the Cori supercomputer. These include ATLAS, ALICE, Belle II, CMS, LSST-DESC, and STAR, among others. Achieving this has involved several different approaches and has required innovations on both the NERSC and the experiments’ sides....
The Titan supercomputer at Oak Ridge National Laboratory prioritizes the scheduling of large leadership class jobs, but even when the supercomputer is fully loaded and large jobs are standing in the queue to run, 10 percent of the machine remains available for a mix of smaller jobs, essentially ‘filling in the cracks’ between the very large jobs. Such utilisation of the computer resources is...
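The "filling in the cracks" idea can be pictured as a greedy backfill: given a gap of idle nodes available for a limited time, select small jobs that fit. The sketch below is a generic illustration with made-up job sizes, not Titan's actual scheduling logic.

    # Greedy backfill sketch: pick small jobs that fit into the gap of idle
    # nodes left while large leadership-class jobs wait in the queue.
    from dataclasses import dataclass

    @dataclass
    class Job:
        name: str
        nodes: int         # nodes requested
        walltime_h: float  # requested walltime in hours

    def backfill(jobs, free_nodes, gap_hours):
        """Return the jobs that fit into the current gap."""
        selected = []
        for job in sorted(jobs, key=lambda j: j.nodes):
            if job.nodes <= free_nodes and job.walltime_h <= gap_hours:
                selected.append(job)
                free_nodes -= job.nodes
        return selected

    queue = [
        Job("small_sim_001", nodes=50, walltime_h=1.5),
        Job("small_sim_002", nodes=200, walltime_h=2.0),
        Job("leadership_run", nodes=5000, walltime_h=12.0),
    ]
    print([j.name for j in backfill(queue, free_nodes=1800, gap_hours=3.0)])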
PanDA executes millions of ATLAS jobs a month on Grid systems with more than 300k cores. Currently, PanDA is compatible with only a few HPC resources due to different edge services and operational policies, does not implement the pilot paradigm on HPC, and does not dynamically optimize resource allocation among queues. We integrated the PanDA Harvester service and the RADICAL-Pilot (RP) system...
The FabrIc for Frontier Experiments (FIFE) project within the Fermilab Scientific Computing Division is charged with integrating offline computing components into a common computing stack for the non-LHC Fermilab experiments, supporting experiment offline computing, and consulting on new, novel workflows. We will discuss the general FIFE onboarding strategy, the upgrades and enhancements in...
HEPCloud is rapidly becoming the primary system for provisioning compute resources for all Fermilab-affiliated experiments. In order to reliably meet peak demands of the next generation of High Energy Physics experiments, Fermilab must either plan to locally provision enough resources to cover the forecasted need, or find ways to elastically expand its computational capabilities. Commercial...
The amount of data to be processed by experiments in high energy physics will increase tremendously in the coming years. For the first time in history, the expected technology advance itself will not be sufficient to cover the arising gap between required and available resources, assuming the current flat-budget hardware procurement strategy is maintained. This leads to...
LZ is a Dark Matter experiment based at the Sanford Underground Research Facility. It is currently under construction and aims to start data taking in 2020. Its computing model is based on two data centres, one in the USA (USDC) and one in the UK (UKDC), both holding a complete copy of its data. During stable periods of running both data centres plan to concentrate on different aspects of...
Computing in the field of high energy physics requires the use of heterogeneous computing resources and technologies, such as grid, high performance computing, cloud computing and big data analytics, for data processing and analysis. The core of the distributed computing environment at the Joint Institute for Nuclear Research is the Multifunctional Information and Computing Complex (MICC). It includes...
LHC@home has provided computing capacity for simulations under BOINC since 2005. Following the introduction of virtualisation with BOINC to run HEP Linux software in a virtual machine on volunteer desktops, initially on test BOINC projects such as Test4Theory and ATLAS@home, all CERN applications distributed to volunteers have been consolidated under a single LHC@home BOINC project....
The Edinburgh (UK) Tier-2 computing site has provided CPU and storage resources to the Worldwide LHC Computing Grid (WLCG) for close to 10 years. Unlike other sites, resources are shared amongst members of the hosting institute rather than being exclusively provisioned for Grid computing. Although this unconventional approach has posed challenges for troubleshooting and service delivery, there...
The volunteer computing project ATLAS@Home has been providing a stable computing resource for the ATLAS experiment since 2013. It has recently undergone some significant developments and as a result has become one of the largest resources contributing to ATLAS computing, by expanding its scope beyond traditional volunteers and into exploitation of idle computing power in ATLAS data centres....
While the WLCG and EGI have both made significant progress towards solutions for storage space accounting, one area that is still quite exploratory is that of dataset accounting. This type of accounting would enable resource centre and research community administrators to report on dataset usage to the data owners, data providers, and funding agencies. Eventually decisions could be made about...
The OSG has long maintained a central accounting system called Gratia. It uses small probes on each computing and storage resource in order to collect usage. The probes report to a central collector which stores the usage in a database. The database is then queried to generate reports. As the OSG aged, the size of the database grew very large. It became too large for the database technology to...
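To picture the probe-to-collector flow described above, a minimal sketch follows: a probe posts a usage record to a collector endpoint, and the collector side aggregates records into a per-VO summary. The record fields, endpoint URL and aggregation are hypothetical illustrations, not the actual Gratia schema or API.

    # Hypothetical sketch of a usage probe reporting to a central collector
    # and of a simple per-VO aggregation; not the real Gratia interfaces.
    import json
    from collections import defaultdict
    from urllib import request

    COLLECTOR_URL = "https://collector.example.org/records"  # placeholder

    def report_usage(record):
        """Send one usage record to the collector as JSON."""
        req = request.Request(
            COLLECTOR_URL,
            data=json.dumps(record).encode(),
            headers={"Content-Type": "application/json"},
        )
        request.urlopen(req)

    def summarise(records):
        """Aggregate wall hours per VO, the kind of query a report would run."""
        totals = defaultdict(float)
        for rec in records:
            totals[rec["vo"]] += rec["wall_hours"]
        return dict(totals)

    print(summarise([
        {"site": "SiteA", "vo": "cms", "wall_hours": 8.0},
        {"site": "SiteB", "vo": "atlas", "wall_hours": 4.0},
    ]))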
HammerCloud is a testing service and framework to commission, run continuous tests or on-demand large-scale stress tests, and benchmark computing resources and components of various distributed systems with realistic full-chain experiment workflows.
HammerCloud, used by the ATLAS and CMS experiments in production, has been a useful service to commission both compute resources and various...
Belle II is an asymmetric-energy e+e- collider experiment at KEK, Japan. Belle II aims to reveal physics beyond the Standard Model with a data set of about 5×10^10 BB̄ pairs and will start physics runs in 2018. In order to store such a huge amount of data, including simulation events, and analyze it in a timely manner, Belle II adopts a distributed computing model with DIRAC...
ALICE (A Large Ion Collider Experiment) is preparing for a major upgrade of the detector, readout system and computing for LHC Run 3. A new facility called O2 (Online-Offline) will play a major role in data compression and event processing. To efficiently operate the experiment, we are designing a monitoring subsystem, which will provide a complete overview of the O2 overall health, detect...
The WLCG Information System (IS) is an important component of this huge heterogeneous distributed infrastructure. Considering the evolution of LHC computing towards the high luminosity era, and analyzing the experience accumulated by the computing operations teams and the limitations of the current information system, the WLCG IS evolution task force came up with the proposal to develop Computing Resource...
During 2017 LHCb developed the ability to interrupt Monte Carlo simulation jobs and cause them to finish cleanly with the events simulated so far correctly uploaded to grid storage. We explain how this functionality is supported in the Gaudi framework and handled by the LHCb simulation framework Gauss. By extending DIRAC, we have been able to trigger these interruptions when running...
The CERN ATLAS experiment grid workflow system routinely manages 250 to 500 thousand concurrently running production and analysis jobs to process simulation and detector data. In total more than 300 PB of data is distributed over more than 150 sites in the WLCG. At this scale small improvements in the software and computing performance and workflows can lead to significant resource usage...
Hundreds of physicists analyse data collected by the Compact Muon Solenoid (CMS) experiment at the Large Hadron Collider (LHC) using the CMS Remote Analysis Builder (CRAB) and the CMS GlideinWMS global pool to exploit the resources of the Worldwide LHC Computing Grid. Efficient use of such an extensive and expensive resource is crucial. At the same time the CMS collaboration is committed to...
ATLAS Distributed Computing (ADC) uses the pilot model to submit jobs to Grid computing resources. This model isolates the resource from the workload management system (WMS) and helps to avoid running jobs on faulty resources. A minor side-effect of this isolation is that the faulty resources are neglected and not brought back into production because the problems are not visible to the WMS. In...
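One simple way to picture bringing such resources back is a periodic probe that sends a lightweight test job to each excluded queue and re-enables it once the test succeeds. The sketch below is a generic illustration of that idea with a stubbed submission hook, not the actual ADC implementation.

    # Generic sketch: probe excluded queues with a test job and report which
    # ones can be brought back into production. submit_test_job() is a
    # hypothetical stand-in for a real WMS submission.
    import random

    def submit_test_job(queue):
        """Placeholder returning whether a lightweight test job succeeded."""
        return random.random() > 0.5  # stand-in for a real job outcome

    def recheck(excluded_queues):
        """Return the queues whose test job succeeded."""
        return {q for q in excluded_queues if submit_test_job(q)}

    excluded = {"SITE_A_QUEUE", "SITE_B_QUEUE", "SITE_C_QUEUE"}
    print("can be re-enabled:", recheck(excluded))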
Federated identity management (FIM) is an arrangement that can be made among multiple organisations that lets subscribers use the same identification data to obtain access to the secured resources of all organisations in the group. In many research communities there is an increasing interest in a common approach to FIM as there is obviously a large potential for synergies. FIM4R [1] provides a...
The modern security landscape for distributed computing in High Energy Physics (HEP) includes a wide range of threats employing different attack vectors. The nature of these threats is such that the most effective method for dealing with them is to work collaboratively, both within the HEP community and with partners further afield - these can, and should, include institutional and campus...
X.509 is the dominant security infrastructure used in WLCG. Although this technology has worked well, it has some issues. One is that, currently, a delegated proxy can do everything the parent credential can do. A stolen "production" proxy could be used from any machine in the world to delete all data owned by that VO on all storage systems in the grid. Generating a delegated X.509...
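As a side illustration of working with such credentials (not part of the contribution), the sketch below reads a proxy file with the 'cryptography' package and prints its remaining lifetime, one of the few mitigations for a stolen proxy being its limited validity; the proxy path is an assumption.

    # Sketch: inspect the remaining lifetime of an X.509 proxy certificate.
    # The proxy path is a hypothetical example.
    import datetime
    from cryptography import x509

    PROXY_PATH = "/tmp/x509up_u1000"  # hypothetical proxy location

    with open(PROXY_PATH, "rb") as f:
        pem = f.read()

    # A proxy file bundles several PEM blocks; the first certificate is the
    # proxy certificate itself.
    begin = pem.index(b"-----BEGIN CERTIFICATE-----")
    end = pem.index(b"-----END CERTIFICATE-----") + len(b"-----END CERTIFICATE-----")
    cert = x509.load_pem_x509_certificate(pem[begin:end])

    remaining = cert.not_valid_after - datetime.datetime.utcnow()
    print("subject:", cert.subject.rfc4514_string())
    print("proxy valid for another", remaining)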
The European Open Science Cloud (EOSC) aims to enable trusted access to services and the re-use of shared scientific data across disciplinary, social and geographical borders. The EOSC-hub will realise the EOSC infrastructure as an ecosystem of research e-Infrastructures leveraging existing national and European investments in digital research infrastructures. EGI Check-in and EUDAT B2ACCESS...
The LHC Computing Grid was a pioneering integration effort that managed to unite computing and storage resources all over the world, thus making them available to the experiments at the Large Hadron Collider. During a decade of LHC computing, Grid software has learned to effectively utilise different types of computing resources, such as classic computing clusters, clouds and high performance computers. While the...
The CERN ATLAS experiment successfully uses a worldwide computing infrastructure to support the physics program during LHC Run 2. The grid workflow system PanDA routinely manages 250 to 500 thousand concurrently running production and analysis jobs to process simulation and detector data. In total more than 300 PB of data is distributed over more than 150 sites in the WLCG and handled by the...
IceCube is a cubic kilometer neutrino detector located at the South Pole. IceProd is IceCube’s internal dataset management system, keeping track of where, when, and how jobs run. It schedules jobs from submitted datasets to HTCondor, keeping track of them at every stage of the lifecycle. Many updates have happened in recent years to improve stability and scalability, as well as increase...
The DIRAC project is developing interware to build and operate distributed computing systems. It provides a development framework and a rich set of services for both Workload and Data Management tasks of large scientific communities. DIRAC is adopted by a growing number of collaborations, including LHCb, Belle II, the Linear Collider, and CTA.
The LHCb experiment will be upgraded during the...
In recent years the LHC delivered a record-breaking luminosity to the CMS experiment, making it a challenge to successfully handle all the demands for efficient data and Monte Carlo processing. In the presentation we will review the major issues in managing such requests and how we were able to address them. Our main strategy relies on increased automation and dynamic workload and data...
The Xenon Dark Matter experiment is looking for non-baryonic particle Dark Matter in the universe. The demonstrator is a dual-phase time projection chamber (TPC), filled with a target mass of ~2000 kg of ultra-pure liquid xenon. The experimental setup is operated at the Laboratori Nazionali del Gran Sasso (LNGS).
We present here a full overview of the computing scheme for data distribution...
The Production and Distributed Analysis (PanDA) system has been successfully used in the ATLAS experiment as a data-driven workload management system. The PanDA system has proven to be capable of operating at the Large Hadron Collider data processing scale over the last decade including the Run 1 and Run 2 data taking periods. PanDA was originally designed to be weakly coupled with the WLCG...
A goal of the LSST (Large Synoptic Survey Telescope) project is to conduct a 10-year survey of the sky that is expected to deliver 200 petabytes of data after it begins full science operations in 2022. The project will address some of the most pressing questions about the structure and evolution of the universe and the objects in it. It will require a large number of simulations to understand the...
The Cherenkov Telescope Array (CTA) is the next-generation instrument in the field of very high energy gamma-ray astronomy. It will be composed of two arrays of Imaging Atmospheric Cherenkov Telescopes, located at La Palma (Spain) and Paranal (Chile). The construction of CTA has just started with the installation of the first telescope on site at La Palma and the first data expected by the end...
The Jiangmen Underground Neutrino Observatory (JUNO) is a multipurpose neutrino experiment which will start in 2020. To speed up JUNO data processing on multicore hardware, the JUNO software framework is introducing parallelization based on TBB. To support JUNO multicore simulation and reconstruction jobs in the near future, a new workload scheduling model has to be explored and implemented in...
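A toy view of the scheduling question multicore jobs raise is packing payloads of different core counts onto a node, as in the sketch below; it is a generic illustration, not JUNO's actual scheduling model.

    # Generic sketch of greedily packing single-core and multicore payloads
    # onto one node with a fixed core count; payload sizes are illustrative.
    def pack_node(total_cores, payload_cores):
        """Return the payloads (core counts) that fit on the node."""
        placed, free = [], total_cores
        for cores in sorted(payload_cores, reverse=True):
            if cores <= free:
                placed.append(cores)
                free -= cores
        return placed

    print(pack_node(16, [8, 8, 4, 1, 1]))  # -> [8, 8] on a 16-core node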
The CMS Submission Infrastructure Global Pool, built on GlideinWMS and HTCondor, is a worldwide distributed dynamic pool responsible for the allocation of resources for all CMS computing workloads. Matching the continuously increasing demand for computing resources by CMS requires the anticipated assessment of its scalability limitations. Extrapolating historical usage trends, by LHC Run III...
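A minimal sketch of such an extrapolation of historical usage is shown below, with a simple linear fit over made-up yearly core counts; the numbers are purely illustrative, not actual Global Pool data.

    # Illustrative linear extrapolation of pool size from fictitious yearly
    # averages of running cores; not actual CMS Global Pool measurements.
    import numpy as np

    years = np.array([2015, 2016, 2017, 2018])
    avg_running_cores = np.array([90_000, 120_000, 160_000, 200_000])  # fictitious

    slope, intercept = np.polyfit(years, avg_running_cores, 1)
    for year in (2021, 2023):  # around the start of LHC Run III
        projected = slope * year + intercept
        print(f"{year}: ~{projected / 1000:.0f}k cores (naive linear trend)")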
The HL-LHC program has seen numerous extrapolations of its needed computing resources that each indicate the need for substantial changes if the desired HL-LHC physics program is to be supported within the current level of computing resource budgets. Drivers include large increases in event complexity (leading to increased processing time and analysis data size) and trigger rates needed (5-10...
The LHCb experiment will be upgraded for data taking in the LHC Run 3. The foreseen trigger output bandwidth of a few GB/s will result in datasets of tens of PB per year, which need to be efficiently streamed and stored offline for low-latency data analysis. In addition, simulation samples up to two orders of magnitude larger than those currently simulated are envisaged, with big...
The Production and Distributed Analysis system (PanDA) for the ATLAS experiment at the Large Hadron Collider has seen big changes over the past couple of years to accommodate new types of distributed computing resources: clouds, HPCs, volunteer computers and other external resources. While PanDA was originally designed for fairly homogeneous resources available through the Worldwide LHC...
The ALICE experiment will undergo an extensive detector and readout upgrade for the LHC Run 3 and will collect a 10 times larger data volume than today. This will translate into an increase in the required CPU resources worldwide as well as higher data access and transfer rates. JAliEn (Java ALICE Environment) is the new Grid middleware designed to scale out horizontally and satisfy the ALICE...
The increase in the scale of LHC computing expected for Run 3 and even more so for Run 4 (HL-LHC) over the course of the next ten years will most certainly require radical changes to the computing models and the data processing of the LHC experiments. Translating the requirements of the physics programmes into computing resource needs is an extremely complicated process and subject to...
With the increase in power and reduction in cost of GPU-accelerated processors, a corresponding interest in their use in the scientific domain has grown. OSG users are no different and they have shown an interest in accessing GPU resources via their usual workload infrastructures. Grid sites that have these kinds of resources also want to make them available on the grid. In this talk, we discuss...