24th International Conference on Computing in High Energy & Nuclear Physics

Name: 24th International Conference on Computing in High Energy & Nuclear Physics
Start: 2019-11-04T08:00:00+10:30
End: 2019-11-08T13:00:00+10:30
Location: Adelaide Convention Centre

4–8 Nov 2019

Adelaide Convention Centre

Australia/Adelaide timezone

Contact us

Session

Track 3 – Middleware and Distributed Computing

4 Nov 2019, 11:00

Riverbank R3 (Adelaide Convention Centre)

Riverbank R3

Adelaide Convention Centre

Track 3 – Middleware and Distributed Computing: Workload Management & Cost Model

Catherine Biscarat (L2I Toulouse, IN2P3/CNRS (FR))

Track 3 – Middleware and Distributed Computing: HTC Sites and Related Services

Stefan Roiser (CERN)

Track 3 – Middleware and Distributed Computing: HPCs and Related Services

James Letts (Univ. of California San Diego (US))

Track 3 – Middleware and Distributed Computing: Operations & Monitoring

Stefan Roiser (CERN)

Track 3 – Middleware and Distributed Computing: Security & Collaboration

Catherine Biscarat (L2I Toulouse, IN2P3/CNRS (FR))

Track 3 – Middleware and Distributed Computing: Infrastructure & Identity

James Letts (Univ. of California San Diego (US))

Track 3 – Middleware and Distributed Computing: Information Systems, BOINC & User Analysis

Tomoe Kishimoto (University of Tokyo (JP))

There are no materials yet.

173. The DIRAC interware: current, upcoming and planned capabilities and technologies

Federico Stagni (CERN)

04/11/2019, 11:00

Track 3 – Middleware and Distributed Computing

Oral

Efficient access to distributed computing and storage resources is mandatory for the success of current and future High Energy and Nuclear Physics Experiments. DIRAC is an interware to build and operate distributed computing systems. It provides a development framework and a rich set of services for the Workload, Data and Production Management tasks of large scientific communities. A single...

579. Automated and Distributed Monte Carlo Generation for GlueX

Thomas Britton (JLab)

04/11/2019, 11:15

Track 3 – Middleware and Distributed Computing

Oral

MCwrapper is a set of system that manages the entire Monte Carlo production workflow for GlueX and provides standards for how that Monte Carlo is produced. MCwrapper was designed to be able to utilize a variety of batch systems in a way that is relatively transparent to the user, thus enabling users to quickly and easily produce valid simulated data at home institutions worldwide. ...

383. Production processing and workflow management software evaluation in the DUNE collaboration

Dr Kenneth Richard Herner (Fermi National Accelerator Laboratory (US))

04/11/2019, 11:30

Track 3 – Middleware and Distributed Computing

Oral

The Deep Underground Neutrino Experiment (DUNE) will be the world’s foremost neutrino detector when it begins taking data in the mid-2020s. Two prototype detectors, collectively known as ProtoDUNE, have begun taking data at CERN and have accumulated over 3 PB of raw and reconstructed data since September 2018. Particle interaction within liquid argon time projection chambers are challenging to...

73. Evolution of the CMS Global Submission Infrastructure for the HL-LHC Era

Antonio Perez-Calero Yzquierdo (Centro de Investigaciones Energéti cas Medioambientales y Tecno)

04/11/2019, 11:45

Track 3 – Middleware and Distributed Computing

Oral

Efforts in distributed computing of the CMS experiment at the LHC at CERN are now focusing on the functionality required to fulfill the projected needs for the HL-LHC era. Cloud and HPC resources are expected to be dominant relative to resources provided by traditional Grid sites, being also much more diverse and heterogeneous. Handling their special capabilities or limitations and maintaining...

311. Efficient Iterative Calibration on the Grid using iLCDirac

Andre Sailer (CERN)

04/11/2019, 12:00

Track 3 – Middleware and Distributed Computing

Oral

Software tools for detector optimization studies for future experiments need to be efficient and reliable. One important ingredient of the detector design optimization concerns the calorimeter system. Every change of the calorimeter configuration requires a new set of overall calibration parameters which in its turn requires a new calorimeter calibration to be done. An efficient way to perform...

296. New developments in cost modeling for the LHC computing

Andrea Sciabà (CERN)

04/11/2019, 12:15

Track 3 – Middleware and Distributed Computing

Oral

The increase in the scale of LHC computing during Run 3 and Run 4 (HL-LHC) will certainly require radical changes to the computing models and the data processing of the LHC experiments. The working group established by WLCG and the HEP Software Foundation to investigate all aspects of the cost of computing and how to optimise them has continued producing results and improving our understanding...

71. Lightweight site federation for CMS support

Antonio Delgado Peris (Centro de Investigaciones Energéti cas Medioambientales y Tecno)

04/11/2019, 14:00

Track 3 – Middleware and Distributed Computing

Oral

There is a general trend in WLCG towards the federation of resources, aiming for increased simplicity, efficiency, flexibility, and availability. Although general, VO-agnostic federation of resources between two independent and autonomous resource centres may prove arduous, a considerable amount of flexibility in resource sharing can be achieved, in the context of a single WLCG VO, with a...

158. Provision and use of GPU resources for distributed workloads via the Grid

Dr Daniel Peter Traynor (Queen Mary University of London (GB))

04/11/2019, 14:15

Track 3 – Middleware and Distributed Computing

Oral

The Queen Mary University of London WLCG Tier-2 Grid site has been providing GPU resources on the Grid since 2016. GPUs are an important modern tool to assist in data analysis. They have historically been used to accelerate computationally expensive but parallelisable workloads using frameworks such as OpenCL and CUDA. However, more recently their power in accelerating machine learning,...

404. A Lightweight Job Submission Frontend and its Toolkits – HepJob

Xiaowei Jiang (IHEP（中国科学院高能物理研究所）)

04/11/2019, 14:30

Track 3 – Middleware and Distributed Computing

Oral

In a HEP Computing Center, at least 1 batch systems are used. As an example, at IHEP, we’ve used 3 batch systems, PBS, HTCondor and Slurm. After running PBS as local batch system for 10 years, we replaced it by HTCondor (for HTC) and Slurm (for HPC). During that period, problems came up on both user and admin sides.

On user side, the new batch systems bring a set of new commands, which...

571. Dirac-based solutions for JUNO production system

Xiaomei Zhang (Chinese Academy of Sciences (CN))

04/11/2019, 14:45

Track 3 – Middleware and Distributed Computing

Oral

The Jiangmen Underground Neutrino Observatory (JUNO) is a multipurpose neutrino experiment, which plans to take about 2PB raw data each year starting from 2021. The experiment data plans to be stored in IHEP and have another copy in Europe (CNAF, IN2P3, JINR data centers). MC simulation tasks are expected to be arranged and operated through a distributed computing system to share efforts among...

500. Scalable processing for storage events and automation of scientific workflows

Michael Schuh (Deutsches Elektronen-Synchrotron DESY)

04/11/2019, 15:00

Track 3 – Middleware and Distributed Computing

Oral

Low latency, high throughput data processing in distributed environments is a key requirement of today's experiments. Storage events facilitate synchronisation with external services where the widely adopted request-response pattern does not scale because of polling as a long-running activity. We discuss the use of an event broker and stream processing platform (Apache Kafka) for storage...

183. Harnessing the power of supercomputers using the PanDA Pilot 2 in the ATLAS Experiment

Paul Nilsson (Brookhaven National Laboratory (US))

05/11/2019, 11:00

Track 3 – Middleware and Distributed Computing

Oral

The unprecedented computing resource needs of the ATLAS experiment have motivated the Collaboration to become a leader in exploiting High Performance Computers (HPCs). To meet the requirements of HPCs, the PanDA system has been equipped with two new components; Pilot 2 and Harvester, that were designed with HPCs in mind. While Harvester is a resource-facing service which provides resource...

314. IceProd Supercomputer Mode: How IceCube Production Runs on Firewalled Clusters

David Schultz (University of Wisconsin-Madison)

05/11/2019, 11:15

Track 3 – Middleware and Distributed Computing

Oral

For the past several years, IceCube has embraced a central, global overlay grid of HTCondor glideins to run jobs. With guaranteed network connectivity, the jobs themselves transferred data files, software, logs, and status messages. Then we were given access to a supercomputer, with no worker node internet access. As the push towards HPC increased, we had access to several of these...

347. Scheduling, deploying and monitoring 100 million tasks

Prof. Andreas Wicenec (International Centre of Radio Astronomy Research)

05/11/2019, 11:30

Track 3 – Middleware and Distributed Computing

Oral

The SKA will enable the production of full polarisation spectral line cubes at a very high spatial and spectral resolution. Performing a back-of-the-evelope estimate gives you the incredible amount of around 75-100 million tasks to run in parallel to perform a state-of-the-art faceting algorithm (assuming that it would spawn off just one task per facet, which is not the case). This simple...

215. Managing the ATLAS Grid through Harvester

Fernando Harald Barreiro Megino (University of Texas at Arlington)

05/11/2019, 11:45

Track 3 – Middleware and Distributed Computing

Oral

ATLAS Computing Management has identified the migration of all resources to Harvester, PanDA’s new workload submission engine, as a critical milestone for Run 3 and 4. This contribution will focus on the Grid migration to Harvester.
We have built a redundant architecture based on CERN IT’s common offerings (e.g. Openstack Virtual Machines and Database on Demand) to run the necessary Harvester...

527. Nordugrid ARC cache: Efficiency gains on HPC and cloud resources

Maiken Pedersen (University of Oslo (NO))

05/11/2019, 12:00

Track 3 – Middleware and Distributed Computing

Oral

The WLCG is today comprised of a range of different types of resources such as cloud centers, large and small HPC centers, volunteer computing as well as the traditional grid resources. The Nordic Tier 1 (NT1) is a WLCG computing infrastructure distributed over the Nordic countries. The NT1 deploys the Nordugrid ARC CE, which is non-intrusive and lightweight, originally developed to cater for...

534. Reusing distributed computing software and patterns for midscale collaborative science

Benedikt Riedel (University of Wisconsin-Madison), Benedikt Riedel (University of Wisconsin-Madison)

05/11/2019, 12:15

Track 3 – Middleware and Distributed Computing

Oral

Many of the challenges faced by the LHC experiments (aggregation of distributed computing resources, management of data across multiple storage facilities, integration of experiment-specific workflow management tools across multiple grid services) are similarly experienced by "midscale" high energy physics and astrophysics experiments, particularly as their data set volumes are increasing at...

79. Operational Intelligence

Alessandro Di Girolamo (CERN)

05/11/2019, 14:00

Track 3 – Middleware and Distributed Computing

Oral

In the near future, large scientific collaborations will face unprecedented computing challenges. Processing and storing exabyte datasets require a federated infrastructure of distributed computing resources. The current systems have proven to be mature and capable of meeting the experiment goals, by allowing timely delivery of scientific results. However, a substantial amount of interventions...

59. Big data solutions for CMS computing monitoring and analytics

Federica Legger (Universita e INFN Torino (IT))

05/11/2019, 14:15

Track 3 – Middleware and Distributed Computing

Oral

The CMS computing infrastructure is composed by several subsystems that accomplish complex tasks such as workload and data management, transfers, submission of user and centrally managed production requests. Till recently, most subsystems were monitored through custom tools and web applications, and logging information was scattered in several sources and typically accessible only by experts....

212. Implementation of ATLAS Distributed Computing monitoring dashboards using InfluxDB and Grafana

Thomas Beermann (University of Innsbruck (AT))

05/11/2019, 14:30

Track 3 – Middleware and Distributed Computing

Oral

For the last 10 years, the ATLAS Distributed Computing project has based its monitoring infrastructure on a set of custom designed dashboards provided by CERN-IT. This system functioned very well for LHC Runs 1 and 2, but its maintenance has progressively become more difficult and the conditions for Run 3, starting in 2021, will be even more demanding; hence a more standard code base and more...

146. Automatic log analysis with NLP for the CMS workflow handling

Lukas Layer (Universita e sezione INFN di Napoli (IT))

05/11/2019, 14:45

Track 3 – Middleware and Distributed Computing

Oral

The central Monte-Carlo production of the CMS experiment utilizes the WLCG infrastructure and manages daily thousands of tasks, each up to thousands of jobs. The distributed computing system is bound to sustain a certain rate of failures of various types, which are currently handled by computing operators a posteriori. Within the context of computing operations, and operation intelligence, we...

567. Easy-to-use data schema management scheme for RDBMS that includes the utilization of the column-store features

Tadashi Murakami (KEK)

05/11/2019, 15:00

Track 3 – Middleware and Distributed Computing

Oral

Relational database (RDB) and its management system (RDBMS) offer many advantages to us, such as a rich query language, maintainability gained from a concrete schema, robust and reasonable backup solutions such as differential backup, and so on. Recently, some of RDBMS has supported column-store features that offer data compression with a high level of both data size and query performance....

407. Monitoring distributed computing beyond the traditional time-series histogram

Peter Love (Lancaster University (GB))

05/11/2019, 15:15

Track 3 – Middleware and Distributed Computing

Oral

In this work we review existing monitoring outputs and recommend some novel alternative approaches to improve the comprehension of large volumes of operations data that are produced in distributed computing. Current monitoring output is dominated by the pervasive use of time-series histograms showing the evolution of various metrics. These can quickly overwhelm or confuse the viewer due to the...

87. WLCG Authorisation; from X.509 to Tokens

Andrea Ceccanti (Universita e INFN, Bologna (IT))

05/11/2019, 16:30

Track 3 – Middleware and Distributed Computing

Oral

The WLCG Authorisation Working Group formed in July 2017 with the objective to understand and meet the needs of a future-looking Authentication and Authorisation Infrastructure (AAI) for WLCG experiments. Much has changed since the early 2000s when X.509 certificates presented the most suitable choice for authorisation within the grid; progress in token based authorisation and identity...

290. Harnessing the power of threat intelligence for WLCG cybersecurity

David Crooks (Science and Technology Facilities Council STFC (GB))

05/11/2019, 16:45

Track 3 – Middleware and Distributed Computing

Oral

The information security threats currently faced by WLCG sites are both sophisticated and highly profitable for the actors involved. Evidence suggests that targeted organisations take on average more than six months to detect a cyber attack, with more sophisticated attacks being more likely to pass undetected.

An important way to mount an appropriate response is through the use of a...

313. A New Authorization System for IceCube Applications

David Schultz (University of Wisconsin-Madison)

05/11/2019, 17:00

Track 3 – Middleware and Distributed Computing

Oral

As part of a modernization effort at IceCube, a new unified authorization system has been developed to allow access to multiple applications with a single credential. Based on SciTokens and JWT, it allows for the delegation of specific accesses to cluster jobs or third party applications on behalf of the user. Designed with security in mind, it includes short expiration times on access...

433. Computer security in 2019

Sebastian Lopienski (CERN)

05/11/2019, 17:15

Track 3 – Middleware and Distributed Computing

Oral

In this talk, the speaker will present the computer security risk landscape as faced by academia and research organisations; look into various motivations behind attacks; and explore how these threats can be addressed. This will be followed by details of several types of vulnerabilities and incidents recently affecting HEP community, and lessons learnt. The talk will conclude with a outlook...

449. Building an IRIS trust framework

David Crooks (Science and Technology Facilities Council STFC (GB))

05/11/2019, 17:30

Track 3 – Middleware and Distributed Computing

Oral

IRIS is the co-ordinating body of a UK science eInfrastructure and is a collaboration between UKRI-STFC, its resource providers and representatives from the science activities themselves. We document the progress of an ongoing project to build a security policy trust framework suitable for use across the IRIS community.

The EU H2020-funded AARC projects addressed the challenges involved in...

91. China-EU scientific cooperations on JUNO distributed computing

Giuseppe Andronico (Universita e INFN, Catania (IT))

05/11/2019, 17:45

Track 3 – Middleware and Distributed Computing

Oral

The Jiangmen Underground Neutrino Observatory (JUNO) is an underground 20 kton liquid scintillator detector being built in the south of China and expected to start data taking in late 2021. The JUNO physics program is focused on exploring neutrino properties, by means of electron anti-neutrinos emitted from two nuclear power complexes at a baseline of about 53km. Targeting an unprecedented...

353. DIRACOS: a cross platform solution for grid tools

Christophe Haen (CERN)

07/11/2019, 11:00

Track 3 – Middleware and Distributed Computing

Oral

DIRACOS is a project aimed to provide a stable base layer of dependencies, on top of which the DIRAC middleware is running. The goal was to produce a coherent environment for grid interaction and streamline the operational overhead. Historically the DIRAC dependencies were grouped in two bundles; Externals containing Python and standard binary libraries, and the LCGBundle which contained all...

90. CERN's Identity and Access Management, a journey to Open Source

Hannah Short (CERN)

07/11/2019, 11:15

Track 3 – Middleware and Distributed Computing

Oral

Until recently, CERN had been considered eligible for academic pricing of Microsoft products. Now, along with many other research institutes, CERN has been disqualified from this educational programme and faces a 20 fold increase in license costs. CERN’s current Authentication and Authorisation Infrastructure comprises Microsoft services all the way down from the web Single-Sign-On to the...

431. Beyond X.509: token-based authentication and authorization in practice

Andrea Ceccanti (Universita e INFN, Bologna (IT))

07/11/2019, 11:30

Track 3 – Middleware and Distributed Computing

Oral

One of the key challenges identified by the HEP R&D roadmap for software and computing is the ability to integrate heterogeneous resources in support of the computing needs of HL-LHC. In order to meet this objective, a flexible Authentication and Authorization Infrastructure (AAI) has to be in place, to allow the secure composition of computing and storage resources provisioned across...

319. Distributing User Code with the CernVM FileSystem

Kenneth Richard Herner (Fermi National Accelerator Laboratory (US))

07/11/2019, 11:45

Track 3 – Middleware and Distributed Computing

Oral

The CernVM FileSystem (CVMFS) is widely used in High Throughput Computing to efficiently distributed experiment code. However, the standard CVMFS publishing tools are designed for a small group of people from each experiment to maintain common software, and the tools don't work well for the majority of users that submit jobs related to each experiment. As a result, most user code, such as code...

473. GOCDB - new communities, new requirements, new architecture

Greg Corbett (STFC)

07/11/2019, 12:00

Track 3 – Middleware and Distributed Computing

Oral

GOCDB is the official repository for storing and presenting EGI and WLCG topology and resource information. It is a definitive information source, with the emphasis on user communities to maintain their own data. It is intentionally designed to have no dependencies on other operational tools for information.

In recent years, funding sources and user communities have evolved and GOCDB is...

504. Migrating INFN-T1 from CREAM-CE/LSF to HTCondor-CE/HTCondor

Stefano Dal Pra (Universita e INFN, Bologna (IT))

07/11/2019, 12:15

Track 3 – Middleware and Distributed Computing

Oral

The INFN Tier-1 datacentre provides computing resources to several HEP and Astrophysics experiments. These are organized in Virtual Organizations submitting jobs to our computing facilities through Computing Elements, acting as Grid interfaces to the Local Resource Manager. We are phasing-out our current LRMS (IBM/Platform LSF 9.1.3) and CEs (CREAM) set to adopt HTCondor as a replacement for...

292. Grid Information Systems: Past, Present and Future

Markus Schulz (CERN)

07/11/2019, 14:00

Track 3 – Middleware and Distributed Computing

Oral

Grid information systems enable the discovery of resources in a Grid computing infrastructure and provide further information about their structure and state.
The original concepts for a grid information system were defined over 20 years ago and the GLUE 2.0 information model specification was published 10 years ago.
This contribution describes the current status and highlights the changes...

464. Evolution of the WLCG Information Infrastructure

Julia Andreeva (CERN)

07/11/2019, 14:15

Track 3 – Middleware and Distributed Computing

Oral

The WLCG project aimed to develop, build, and maintain a global computing facility for storage and analysis of the LHC data. While currently most of the LHC computing resources are being provided by the classical grid sites, over last years the LHC experiments have been using more and more public clouds and HPCs, and this trend will certainly increase. The heterogeneity of the LHC computing...

505. CRIC: Computing Resource Information Catalogue as a unified topology system for a large scale, heterogeneous and dynamic computing infrastructure

Alexey Anisenkov (Budker Institute of Nuclear Physics (RU))

07/11/2019, 14:30

Track 3 – Middleware and Distributed Computing

Oral

CRIC is a high-level information system which provides flexible, reliable and complete topology and configuration description for a large scale distributed heterogeneous computing infrastructure. CRIC aims to facilitate distributed computing operations for the LHC experiments and consolidate WLCG topology information. It aggregates information coming from various low-level information sources...

68. Exploiting CRIC to streamline the configuration management of glideinWMS factories for CMS support

James Letts (Univ. of California San Diego (US))

07/11/2019, 14:45

Track 3 – Middleware and Distributed Computing

Oral

GlideinWMS is a workload management and provisioning system that allows sharing computing resources distributed over independent sites. Based on the requests made by glideinWMS Frontends, a dynamically sized pool of resources is created by glideinWMS pilot Factories via pilot job submission to resource sites' computing elements. More than 400 computing elements (CE) are currently serving more...

460. Distributed data analysis with ROOT RDataFrame

Enric Tejedor Saavedra (CERN)

07/11/2019, 15:00

Track 3 – Middleware and Distributed Computing

Oral

Widespread distributed processing of big datasets has been around for more than a decade now thanks to Hadoop, but only recently higher-level abstractions have been proposed for programmers to easily operate on those datasets, e.g. Spark. ROOT has joined that trend with its RDataFrame tool for declarative analysis, which currently supports local multi-threaded parallelisation. However,...

182. Adapting ATLAS@Home to trusted and semi-trusted resources

David Cameron (University of Oslo (NO))

07/11/2019, 15:15

Track 3 – Middleware and Distributed Computing

Oral

ATLAS@Home is a volunteer computing project which enables members of the public to contribute computing power to run simulations of the ATLAS experiment at CERN. The computing resources provided to ATLAS@Home increasingly come not only from traditional volunteers, but from data centres or office computers at institutes associated to ATLAS. The design of ATLAS@Home was built around not giving...

Building timetable...

24th International Conference on Computing in High Energy & Nuclear Physics

Contact us

Session

Track 3 – Middleware and Distributed Computing

Riverbank R3

Adelaide Convention Centre

Conveners

Track 3 – Middleware and Distributed Computing: Workload Management & Cost Model

Track 3 – Middleware and Distributed Computing: HTC Sites and Related Services

Track 3 – Middleware and Distributed Computing: HPCs and Related Services

Track 3 – Middleware and Distributed Computing: Operations & Monitoring

Track 3 – Middleware and Distributed Computing: Security & Collaboration

Track 3 – Middleware and Distributed Computing: Infrastructure & Identity

Track 3 – Middleware and Distributed Computing: Information Systems, BOINC & User Analysis

Presentation materials

Choose timezone

24th International Conference on Computing in High Energy & Nuclear Physics

Contact us

Conveners

Track 3 – Middleware and Distributed Computing: Workload Management & Cost Model

Track 3 – Middleware and Distributed Computing: HTC Sites and Related Services

Track 3 – Middleware and Distributed Computing: HPCs and Related Services

Track 3 – Middleware and Distributed Computing: Operations & Monitoring

Track 3 – Middleware and Distributed Computing: Security & Collaboration

Track 3 – Middleware and Distributed Computing: Infrastructure & Identity

Track 3 – Middleware and Distributed Computing: Information Systems, BOINC & User Analysis

Presentation materials