4–8 Nov 2019
Adelaide Convention Centre
Australia/Adelaide timezone

Track 3 – Middleware and Distributed Computing

T3
4 Nov 2019, 11:00
Riverbank R3 (Adelaide Convention Centre)

Conveners

Track 3 – Middleware and Distributed Computing: Workload Management & Cost Model

  • Catherine Biscarat (L2I Toulouse, IN2P3/CNRS (FR))

Track 3 – Middleware and Distributed Computing: HTC Sites and Related Services

  • Stefan Roiser (CERN)

Track 3 – Middleware and Distributed Computing: HPCs and Related Services

  • James Letts (Univ. of California San Diego (US))

Track 3 – Middleware and Distributed Computing: Operations & Monitoring

  • Stefan Roiser (CERN)

Track 3 – Middleware and Distributed Computing: Security & Collaboration

  • Catherine Biscarat (L2I Toulouse, IN2P3/CNRS (FR))

Track 3 – Middleware and Distributed Computing: Infrastructure & Identity

  • James Letts (Univ. of California San Diego (US))

Track 3 – Middleware and Distributed Computing: Information Systems, BOINC & User Analysis

  • Tomoe Kishimoto (University of Tokyo (JP))

  1. Federico Stagni (CERN)
    04/11/2019, 11:00
    Track 3 – Middleware and Distributed Computing
    Oral

    Efficient access to distributed computing and storage resources is mandatory for the success of current and future High Energy and Nuclear Physics Experiments. DIRAC is an interware to build and operate distributed computing systems. It provides a development framework and a rich set of services for the Workload, Data and Production Management tasks of large scientific communities. A single...

  2. Thomas Britton (JLab)
    04/11/2019, 11:15
    Track 3 – Middleware and Distributed Computing
    Oral

    MCwrapper is a set of systems that manages the entire Monte Carlo production workflow for GlueX and provides standards for how that Monte Carlo is produced. MCwrapper was designed to be able to utilize a variety of batch systems in a way that is relatively transparent to the user, thus enabling users to quickly and easily produce valid simulated data at home institutions worldwide. ...

  3. Dr Kenneth Richard Herner (Fermi National Accelerator Laboratory (US))
    04/11/2019, 11:30
    Track 3 – Middleware and Distributed Computing
    Oral

    The Deep Underground Neutrino Experiment (DUNE) will be the world’s foremost neutrino detector when it begins taking data in the mid-2020s. Two prototype detectors, collectively known as ProtoDUNE, have begun taking data at CERN and have accumulated over 3 PB of raw and reconstructed data since September 2018. Particle interactions within liquid argon time projection chambers are challenging to...

  4. Antonio Perez-Calero Yzquierdo (Centro de Investigaciones Energéticas Medioambientales y Tecno)
    04/11/2019, 11:45
    Track 3 – Middleware and Distributed Computing
    Oral

    Efforts in distributed computing of the CMS experiment at the LHC at CERN are now focusing on the functionality required to fulfill the projected needs for the HL-LHC era. Cloud and HPC resources are expected to be dominant relative to resources provided by traditional Grid sites, being also much more diverse and heterogeneous. Handling their special capabilities or limitations and maintaining...

  5. Andre Sailer (CERN)
    04/11/2019, 12:00
    Track 3 – Middleware and Distributed Computing
    Oral

    Software tools for detector optimization studies for future experiments need to be efficient and reliable. One important ingredient of the detector design optimization concerns the calorimeter system. Every change of the calorimeter configuration requires a new set of overall calibration parameters which in its turn requires a new calorimeter calibration to be done. An efficient way to perform...

  6. Andrea Sciabà (CERN)
    04/11/2019, 12:15
    Track 3 – Middleware and Distributed Computing
    Oral

    The increase in the scale of LHC computing during Run 3 and Run 4 (HL-LHC) will certainly require radical changes to the computing models and the data processing of the LHC experiments. The working group established by WLCG and the HEP Software Foundation to investigate all aspects of the cost of computing and how to optimise them has continued producing results and improving our understanding...

  7. Antonio Delgado Peris (Centro de Investigaciones Energéticas Medioambientales y Tecno)
    04/11/2019, 14:00
    Track 3 – Middleware and Distributed Computing
    Oral

    There is a general trend in WLCG towards the federation of resources, aiming for increased simplicity, efficiency, flexibility, and availability. Although general, VO-agnostic federation of resources between two independent and autonomous resource centres may prove arduous, a considerable amount of flexibility in resource sharing can be achieved, in the context of a single WLCG VO, with a...

  8. Dr Daniel Peter Traynor (Queen Mary University of London (GB))
    04/11/2019, 14:15
    Track 3 – Middleware and Distributed Computing
    Oral

    The Queen Mary University of London WLCG Tier-2 Grid site has been providing GPU resources on the Grid since 2016. GPUs are an important modern tool to assist in data analysis. They have historically been used to accelerate computationally expensive but parallelisable workloads using frameworks such as OpenCL and CUDA. However, more recently their power in accelerating machine learning,...

  9. Xiaowei Jiang (IHEP, Institute of High Energy Physics, Chinese Academy of Sciences)
    04/11/2019, 14:30
    Track 3 – Middleware and Distributed Computing
    Oral

    In a HEP computing centre, at least one batch system is used. As an example, at IHEP we have used three batch systems: PBS, HTCondor and Slurm. After running PBS as the local batch system for 10 years, we replaced it with HTCondor (for HTC) and Slurm (for HPC). During that period, problems came up on both the user and admin sides.

    On the user side, the new batch systems bring a set of new commands, which...

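A unified front-end of the kind described in this abstract can be pictured as a thin translation layer over the native submission tools. The sketch below is an illustration only: the job-description fields and the dispatch logic are invented for this example, not IHEP's actual tool.

```python
# Sketch of a unified submit command that translates one job description
# into the native submission command of the configured batch system.
# The field names and dispatch table are illustrative assumptions.

def build_submit_command(system, job):
    """Return the argv list for the underlying batch system."""
    if system == "htcondor":
        # HTCondor takes a submit description file.
        return ["condor_submit", job["submit_file"]]
    if system == "slurm":
        # Slurm takes the script directly, with resources as flags.
        return ["sbatch",
                "--ntasks=%d" % job.get("cores", 1),
                "--time=%s" % job.get("walltime", "01:00:00"),
                job["script"]]
    if system == "pbs":
        return ["qsub",
                "-l", "nodes=1:ppn=%d" % job.get("cores", 1),
                job["script"]]
    raise ValueError("unknown batch system: %s" % system)

job = {"script": "run_analysis.sh", "submit_file": "job.sub", "cores": 8}
print(build_submit_command("slurm", job))
# -> ['sbatch', '--ntasks=8', '--time=01:00:00', 'run_analysis.sh']
```

With such a layer, users keep one command set while the admins swap the batch system underneath.
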
  10. Xiaomei Zhang (Chinese Academy of Sciences (CN))
    04/11/2019, 14:45
    Track 3 – Middleware and Distributed Computing
    Oral

    The Jiangmen Underground Neutrino Observatory (JUNO) is a multipurpose neutrino experiment, which plans to take about 2 PB of raw data each year starting from 2021. The experiment data are planned to be stored at IHEP, with another copy in Europe (CNAF, IN2P3 and JINR data centers). MC simulation tasks are expected to be arranged and operated through a distributed computing system to share efforts among...

  11. Michael Schuh (Deutsches Elektronen-Synchrotron DESY)
    04/11/2019, 15:00
    Track 3 – Middleware and Distributed Computing
    Oral

    Low latency, high throughput data processing in distributed environments is a key requirement of today's experiments. Storage events facilitate synchronisation with external services where the widely adopted request-response pattern does not scale because of polling as a long-running activity. We discuss the use of an event broker and stream processing platform (Apache Kafka) for storage...

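The event-driven pattern this abstract contrasts with polling can be reduced to a publish/subscribe core: consumers register interest in a topic and are pushed each storage event. The toy broker below only illustrates that pattern; Apache Kafka's real model (partitioned topics, consumer groups, persisted logs) is far richer, and the topic and event names here are invented.

```python
# Minimal illustration of event-driven synchronisation: consumers
# subscribe to a topic and are pushed each storage event, instead of
# repeatedly polling the storage system for state changes.
from collections import defaultdict

class ToyBroker:
    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, event):
        # Push the event to every subscriber of this topic.
        for callback in self._subscribers[topic]:
            callback(event)

broker = ToyBroker()
seen = []
# React to file-close events, e.g. to trigger downstream processing.
broker.subscribe("storage.file_closed", seen.append)
broker.publish("storage.file_closed",
               {"path": "/data/run1/evt.root", "size": 2048})
print(seen)
```

The point is that no consumer runs a long-lived polling loop; work starts only when an event arrives.
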
  12. Paul Nilsson (Brookhaven National Laboratory (US))
    05/11/2019, 11:00
    Track 3 – Middleware and Distributed Computing
    Oral

    The unprecedented computing resource needs of the ATLAS experiment have motivated the Collaboration to become a leader in exploiting High Performance Computers (HPCs). To meet the requirements of HPCs, the PanDA system has been equipped with two new components, Pilot 2 and Harvester, that were designed with HPCs in mind. While Harvester is a resource-facing service which provides resource...

  13. David Schultz (University of Wisconsin-Madison)
    05/11/2019, 11:15
    Track 3 – Middleware and Distributed Computing
    Oral

    For the past several years, IceCube has embraced a central, global overlay grid of HTCondor glideins to run jobs. With guaranteed network connectivity, the jobs themselves transferred data files, software, logs, and status messages. Then we were given access to a supercomputer, with no worker node internet access. As the push towards HPC increased, we had access to several of these...

  14. Prof. Andreas Wicenec (International Centre of Radio Astronomy Research)
    05/11/2019, 11:30
    Track 3 – Middleware and Distributed Computing
    Oral

    The SKA will enable the production of full polarisation spectral line cubes at a very high spatial and spectral resolution. A back-of-the-envelope estimate gives the incredible amount of around 75–100 million tasks to run in parallel to perform a state-of-the-art faceting algorithm (assuming that it would spawn off just one task per facet, which is not the case). This simple...

  15. Fernando Harald Barreiro Megino (University of Texas at Arlington)
    05/11/2019, 11:45
    Track 3 – Middleware and Distributed Computing
    Oral

    ATLAS Computing Management has identified the migration of all resources to Harvester, PanDA’s new workload submission engine, as a critical milestone for Run 3 and 4. This contribution will focus on the Grid migration to Harvester.
    We have built a redundant architecture based on CERN IT’s common offerings (e.g. Openstack Virtual Machines and Database on Demand) to run the necessary Harvester...

  16. Maiken Pedersen (University of Oslo (NO))
    05/11/2019, 12:00
    Track 3 – Middleware and Distributed Computing
    Oral

    The WLCG today comprises a range of different types of resources, such as cloud centres, large and small HPC centres and volunteer computing, as well as the traditional grid resources. The Nordic Tier 1 (NT1) is a WLCG computing infrastructure distributed over the Nordic countries. The NT1 deploys the NorduGrid ARC CE, which is non-intrusive and lightweight, originally developed to cater for...

  17. Benedikt Riedel (University of Wisconsin-Madison)
    05/11/2019, 12:15
    Track 3 – Middleware and Distributed Computing
    Oral

    Many of the challenges faced by the LHC experiments (aggregation of distributed computing resources, management of data across multiple storage facilities, integration of experiment-specific workflow management tools across multiple grid services) are similarly experienced by "midscale" high energy physics and astrophysics experiments, particularly as their data set volumes are increasing at...

  18. Alessandro Di Girolamo (CERN)
    05/11/2019, 14:00
    Track 3 – Middleware and Distributed Computing
    Oral

    In the near future, large scientific collaborations will face unprecedented computing challenges. Processing and storing exabyte datasets require a federated infrastructure of distributed computing resources. The current systems have proven to be mature and capable of meeting the experiment goals, by allowing timely delivery of scientific results. However, a substantial amount of interventions...

  19. Federica Legger (Universita e INFN Torino (IT))
    05/11/2019, 14:15
    Track 3 – Middleware and Distributed Computing
    Oral

    The CMS computing infrastructure is composed of several subsystems that accomplish complex tasks such as workload and data management, transfers, and submission of user and centrally managed production requests. Until recently, most subsystems were monitored through custom tools and web applications, and logging information was scattered across several sources, typically accessible only by experts....

  20. Thomas Beermann (University of Innsbruck (AT))
    05/11/2019, 14:30
    Track 3 – Middleware and Distributed Computing
    Oral

    For the last 10 years, the ATLAS Distributed Computing project has based its monitoring infrastructure on a set of custom designed dashboards provided by CERN-IT. This system functioned very well for LHC Runs 1 and 2, but its maintenance has progressively become more difficult and the conditions for Run 3, starting in 2021, will be even more demanding; hence a more standard code base and more...

  21. Lukas Layer (Universita e sezione INFN di Napoli (IT))
    05/11/2019, 14:45
    Track 3 – Middleware and Distributed Computing
    Oral

    The central Monte-Carlo production of the CMS experiment utilizes the WLCG infrastructure and manages thousands of tasks daily, each comprising up to thousands of jobs. The distributed computing system is bound to sustain a certain rate of failures of various types, which are currently handled by computing operators a posteriori. Within the context of computing operations and operational intelligence, we...

  22. Tadashi Murakami (KEK)
    05/11/2019, 15:00
    Track 3 – Middleware and Distributed Computing
    Oral

    Relational databases (RDBs) and their management systems (RDBMSs) offer many advantages, such as a rich query language, the maintainability gained from a concrete schema, and robust, reasonable backup solutions such as differential backup. Recently, some RDBMSs have added column-store features that offer data compression, improving both storage footprint and query performance....

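The column-store advantage mentioned in the abstract above comes largely from compressing homogeneous columns. A rough sketch: values within a single column are often repetitive, so even simple run-length encoding shrinks them dramatically. The sample column below is invented for illustration.

```python
# Why column stores compress well: a column holds values of one type,
# often with low cardinality, so run-length encoding (RLE) collapses
# long runs of identical values into (value, count) pairs.
from itertools import groupby

def rle(values):
    """Run-length encode a sequence into (value, count) pairs."""
    return [(v, len(list(g))) for v, g in groupby(values)]

# A 'status' column from a hypothetical job-accounting table.
status_column = ["done"] * 5000 + ["failed"] * 30 + ["done"] * 2000
encoded = rle(status_column)
print(encoded)   # [('done', 5000), ('failed', 30), ('done', 2000)]
print(len(status_column), "values ->", len(encoded), "runs")
```

A row store interleaves this column with the other fields of each row, destroying the runs, which is why the same compression schemes gain much less there.
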
  23. Peter Love (Lancaster University (GB))
    05/11/2019, 15:15
    Track 3 – Middleware and Distributed Computing
    Oral

    In this work we review existing monitoring outputs and recommend some novel alternative approaches to improve the comprehension of large volumes of operations data that are produced in distributed computing. Current monitoring output is dominated by the pervasive use of time-series histograms showing the evolution of various metrics. These can quickly overwhelm or confuse the viewer due to the...

  24. Andrea Ceccanti (Universita e INFN, Bologna (IT))
    05/11/2019, 16:30
    Track 3 – Middleware and Distributed Computing
    Oral

    The WLCG Authorisation Working Group formed in July 2017 with the objective to understand and meet the needs of a future-looking Authentication and Authorisation Infrastructure (AAI) for WLCG experiments. Much has changed since the early 2000s when X.509 certificates presented the most suitable choice for authorisation within the grid; progress in token based authorisation and identity...

  25. David Crooks (Science and Technology Facilities Council STFC (GB))
    05/11/2019, 16:45
    Track 3 – Middleware and Distributed Computing
    Oral

    The information security threats currently faced by WLCG sites are both sophisticated and highly profitable for the actors involved. Evidence suggests that targeted organisations take on average more than six months to detect a cyber attack, with more sophisticated attacks being more likely to pass undetected.

    An important way to mount an appropriate response is through the use of a...

  26. David Schultz (University of Wisconsin-Madison)
    05/11/2019, 17:00
    Track 3 – Middleware and Distributed Computing
    Oral

    As part of a modernization effort at IceCube, a new unified authorization system has been developed to allow access to multiple applications with a single credential. Based on SciTokens and JWT, it allows for the delegation of specific accesses to cluster jobs or third party applications on behalf of the user. Designed with security in mind, it includes short expiration times on access...

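JWT-based credentials of the kind described above carry their claims in a base64url-encoded payload. The stdlib-only sketch below shows how such a payload can be inspected for expiry and scopes; the claim values are invented, and note that it deliberately skips signature verification, which any real deployment (SciTokens included) must perform against the issuer's keys.

```python
# Decode the payload of a JWT and check expiry and scopes.
# NOTE: this sketch does NOT verify the token's signature; production
# code must validate it against the issuer's published keys.
import base64
import json
import time

def decode_payload(token):
    """Return the claims of a JWT without verifying the signature."""
    payload_b64 = token.split(".")[1]
    # base64url requires padding to a multiple of 4 characters.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(payload_b64))

def is_usable(claims, required_scope, now=None):
    """True if the token is unexpired and grants the required scope."""
    now = time.time() if now is None else now
    return (claims.get("exp", 0) > now
            and required_scope in claims.get("scope", "").split())

# Build a toy (unsigned) token for demonstration.
header = base64.urlsafe_b64encode(b'{"alg":"none"}').rstrip(b"=").decode()
claims = {"sub": "user1", "scope": "storage.read compute",
          "exp": 2_000_000_000}
body = base64.urlsafe_b64encode(json.dumps(claims).encode()).rstrip(b"=").decode()
token = header + "." + body + "."

decoded = decode_payload(token)
print(is_usable(decoded, "storage.read", now=1_700_000_000))  # True
```

Short expiry times plus narrowly scoped claims are what make delegating a token to a cluster job or third-party service acceptably safe.
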
  27. Sebastian Lopienski (CERN)
    05/11/2019, 17:15
    Track 3 – Middleware and Distributed Computing
    Oral

    In this talk, the speaker will present the computer security risk landscape faced by academia and research organisations, look into various motivations behind attacks, and explore how these threats can be addressed. This will be followed by details of several types of vulnerabilities and incidents recently affecting the HEP community, and lessons learnt. The talk will conclude with an outlook...

  28. David Crooks (Science and Technology Facilities Council STFC (GB))
    05/11/2019, 17:30
    Track 3 – Middleware and Distributed Computing
    Oral

    IRIS is the co-ordinating body of a UK science eInfrastructure and is a collaboration between UKRI-STFC, its resource providers and representatives from the science activities themselves. We document the progress of an ongoing project to build a security policy trust framework suitable for use across the IRIS community.

    The EU H2020-funded AARC projects addressed the challenges involved in...

  29. Giuseppe Andronico (Universita e INFN, Catania (IT))
    05/11/2019, 17:45
    Track 3 – Middleware and Distributed Computing
    Oral

    The Jiangmen Underground Neutrino Observatory (JUNO) is an underground 20 kton liquid scintillator detector being built in the south of China and expected to start data taking in late 2021. The JUNO physics program is focused on exploring neutrino properties, by means of electron anti-neutrinos emitted from two nuclear power complexes at a baseline of about 53 km. Targeting an unprecedented...

  30. Christophe Haen (CERN)
    07/11/2019, 11:00
    Track 3 – Middleware and Distributed Computing
    Oral

    DIRACOS is a project that aims to provide a stable base layer of dependencies on top of which the DIRAC middleware runs. The goal is to produce a coherent environment for grid interaction and to streamline the operational overhead. Historically, the DIRAC dependencies were grouped in two bundles: Externals, containing Python and standard binary libraries, and the LCGBundle, which contained all...

  31. Hannah Short (CERN)
    07/11/2019, 11:15
    Track 3 – Middleware and Distributed Computing
    Oral

    Until recently, CERN had been considered eligible for academic pricing of Microsoft products. Now, along with many other research institutes, CERN has been disqualified from this educational programme and faces a 20-fold increase in license costs. CERN’s current Authentication and Authorisation Infrastructure comprises Microsoft services all the way down from the web Single Sign-On to the...

  32. Andrea Ceccanti (Universita e INFN, Bologna (IT))
    07/11/2019, 11:30
    Track 3 – Middleware and Distributed Computing
    Oral

    One of the key challenges identified by the HEP R&D roadmap for software and computing is the ability to integrate heterogeneous resources in support of the computing needs of HL-LHC. In order to meet this objective, a flexible Authentication and Authorization Infrastructure (AAI) has to be in place, to allow the secure composition of computing and storage resources provisioned across...

  33. Kenneth Richard Herner (Fermi National Accelerator Laboratory (US))
    07/11/2019, 11:45
    Track 3 – Middleware and Distributed Computing
    Oral

    The CernVM File System (CVMFS) is widely used in High Throughput Computing to efficiently distribute experiment code. However, the standard CVMFS publishing tools are designed for a small group of people from each experiment to maintain common software, and they do not work well for the majority of users who submit jobs related to each experiment. As a result, most user code, such as code...

  34. Greg Corbett (STFC)
    07/11/2019, 12:00
    Track 3 – Middleware and Distributed Computing
    Oral

    GOCDB is the official repository for storing and presenting EGI and WLCG topology and resource information. It is a definitive information source, with the emphasis on user communities to maintain their own data. It is intentionally designed to have no dependencies on other operational tools for information.

    In recent years, funding sources and user communities have evolved and GOCDB is...

  35. Stefano Dal Pra (Universita e INFN, Bologna (IT))
    07/11/2019, 12:15
    Track 3 – Middleware and Distributed Computing
    Oral

    The INFN Tier-1 datacentre provides computing resources to several HEP and astrophysics experiments. These are organized in Virtual Organizations submitting jobs to our computing facilities through Computing Elements, which act as Grid interfaces to the Local Resource Manager. We are phasing out our current LRMS (IBM/Platform LSF 9.1.3) and CE (CREAM) set, adopting HTCondor as a replacement for...

  36. Markus Schulz (CERN)
    07/11/2019, 14:00
    Track 3 – Middleware and Distributed Computing
    Oral

    Grid information systems enable the discovery of resources in a Grid computing infrastructure and provide further information about their structure and state.
    The original concepts for a grid information system were defined over 20 years ago and the GLUE 2.0 information model specification was published 10 years ago.
    This contribution describes the current status and highlights the changes...

  37. Julia Andreeva (CERN)
    07/11/2019, 14:15
    Track 3 – Middleware and Distributed Computing
    Oral

    The WLCG project aims to develop, build, and maintain a global computing facility for the storage and analysis of LHC data. While most LHC computing resources are currently provided by the classical grid sites, over the last few years the LHC experiments have been using more and more public clouds and HPCs, and this trend will certainly increase. The heterogeneity of the LHC computing...

  38. Alexey Anisenkov (Budker Institute of Nuclear Physics (RU))
    07/11/2019, 14:30
    Track 3 – Middleware and Distributed Computing
    Oral

    CRIC is a high-level information system which provides flexible, reliable and complete topology and configuration description for a large scale distributed heterogeneous computing infrastructure. CRIC aims to facilitate distributed computing operations for the LHC experiments and consolidate WLCG topology information. It aggregates information coming from various low-level information sources...

  39. James Letts (Univ. of California San Diego (US))
    07/11/2019, 14:45
    Track 3 – Middleware and Distributed Computing
    Oral

    GlideinWMS is a workload management and provisioning system that allows sharing computing resources distributed over independent sites. Based on the requests made by glideinWMS Frontends, a dynamically sized pool of resources is created by glideinWMS pilot Factories via pilot job submission to resource sites' computing elements. More than 400 computing elements (CE) are currently serving more...

  40. Enric Tejedor Saavedra (CERN)
    07/11/2019, 15:00
    Track 3 – Middleware and Distributed Computing
    Oral

    Widespread distributed processing of big datasets has been around for more than a decade thanks to Hadoop, but only recently have higher-level abstractions been proposed for programmers to easily operate on those datasets, e.g. Spark. ROOT has joined that trend with its RDataFrame tool for declarative analysis, which currently supports local multi-threaded parallelisation. However,...

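The declarative style this abstract refers to, composing filters and defined columns while deferring execution until a terminating action, can be mimicked in a few lines of plain Python. The toy class below only demonstrates the pattern; it is not ROOT's RDataFrame API, and the event data are invented.

```python
# Toy illustration of declarative, lazily-evaluated analysis in the
# spirit of ROOT's RDataFrame: transformations are recorded, not run,
# until a terminating action (here, Count) triggers one pass over data.
class ToyFrame:
    def __init__(self, rows, ops=()):
        self._rows = rows
        self._ops = list(ops)

    def Filter(self, predicate):
        # Record the cut; nothing is evaluated yet.
        return ToyFrame(self._rows, self._ops + [("filter", predicate)])

    def Define(self, name, func):
        # Record a new derived column; nothing is evaluated yet.
        return ToyFrame(self._rows, self._ops + [("define", name, func)])

    def _materialize(self):
        # Single pass applying the recorded operations in order.
        for row in self._rows:
            row = dict(row)
            keep = True
            for op in self._ops:
                if op[0] == "filter":
                    if not op[1](row):
                        keep = False
                        break
                else:
                    row[op[1]] = op[2](row)
            if keep:
                yield row

    def Count(self):
        return sum(1 for _ in self._materialize())

events = [{"pt": 12.0}, {"pt": 48.5}, {"pt": 33.1}]
df = (ToyFrame(events)
      .Define("pt2", lambda r: r["pt"] ** 2)
      .Filter(lambda r: r["pt2"] > 1000))
print(df.Count())  # 2
```

Because the chain is just recorded operations, a backend is free to run the single materialising pass with threads, or, as the abstract suggests, to distribute it.
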
  41. David Cameron (University of Oslo (NO))
    07/11/2019, 15:15
    Track 3 – Middleware and Distributed Computing
    Oral

    ATLAS@Home is a volunteer computing project which enables members of the public to contribute computing power to run simulations of the ATLAS experiment at CERN. The computing resources provided to ATLAS@Home increasingly come not only from traditional volunteers, but from data centres or office computers at institutes associated to ATLAS. The design of ATLAS@Home was built around not giving...
