CHEP 2016 Abstracts

Europe/Zurich
San Francisco

Marriott Marquis Hotel
  • Monday 10 October
    • 08:00 08:05
      PODIO - Applying plain-old-data for defining physics data models 5m
      Event data models (EDMs) are at the core of every HEP experiment’s software framework, essential for the communication between different algorithms in the data processing chain as well as for efficient I/O. Based on experience from the LHC and Linear Collider communities, where the existing solutions partly suffer from overly complex data models with deep object hierarchies or unfavourable I/O performance, a new EDM toolkit for future particle physics experiments has been developed in the context of the AIDA2020 EU programme. PODIO is a C++ library that addresses these problems by supporting the automatic creation and efficient handling of HEP event data models. It is based on the idea of employing plain-old-data (POD) structures wherever possible, while avoiding deep object hierarchies and virtual inheritance. At the same time it provides the high-level interface that physicist developers need, such as support for inter-object relations and automatic memory management, as well as a (ROOT-assisted) Python interface. To simplify the creation of efficient data models, PODIO employs code generation from a simple YAML-based markup language. In order to support the usage of modern computing hardware, PODIO was developed from the start with concurrency in mind and provides basic support for vectorization technologies. This contribution presents the PODIO design, first experience in the context of the FCC and LC software projects, as well as performance figures when using ROOT as storage backend. (A schematic illustration of the POD-based layout is sketched after this entry.)
      Speaker: Benedikt Hegner (CERN)
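      As a rough illustration of the POD-based layout described above, the following minimal C++ sketch (not PODIO's actual generated code; all names are hypothetical) shows the separation between a plain-old-data struct holding the persistent payload and a thin, non-virtual handle class providing the user-level interface on top of contiguous storage.

      // Hypothetical illustration (not PODIO's generated code): the persistent part
      // of an object is a plain-old-data struct, while a thin, non-virtual handle
      // class offers the convenient user-level interface on top of it.
      #include <cstddef>
      #include <vector>

      struct HitData {          // POD: trivially copyable, no virtual functions
        double energy;
        double x, y, z;
      };

      class Hit {               // lightweight handle pointing into a collection
      public:
        Hit(std::vector<HitData>* data, std::size_t index)
            : m_data(data), m_index(index) {}
        double energy() const { return (*m_data)[m_index].energy; }
        void setEnergy(double e) { (*m_data)[m_index].energy = e; }
      private:
        std::vector<HitData>* m_data;   // contiguous storage, friendly to I/O and SIMD
        std::size_t m_index;
      };

      int main() {
        std::vector<HitData> hits{{1.2, 0., 0., 10.}, {3.4, 1., 2., 12.}};
        Hit h(&hits, 1);
        h.setEnergy(4.2);       // modifies the underlying POD in place
        return 0;
      }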
    • 08:05 08:10
      Towards more common build tools - experience with using spack in HEP 5m
      Software development in high energy physics follows the paradigm of open-source software (OSS). Experiments as well as the theory community heavily rely on software developed outside of the field. The number of such third-party software packages (so-called "externals") used within a given context can easily exceed 100 interdependent packages. Creating a consistent and working stack out of hundreds of packages on a variety of platforms is a non-trivial task. Within the field, multiple technical solutions (so-called build tools) exist to configure and build those stacks. Furthermore, quite often software has to be ported to new platforms and operating systems, and subsequently patches to the individual externals need to be created. This is a manual and time-consuming task, requiring a very particular kind of expert knowledge. None of this work is experiment specific. For this reason the HSF packaging working group evaluated various HEP and non-HEP tools. The HPC tool spack was identified as a very promising candidate for an experiment-independent build tool. This contribution summarizes the build tool evaluations, presents the first experience with using spack in HEP and the required extensions to it, and discusses its potential for HEP-wide adoption.
      Speakers: Benedikt Hegner (CERN), Brett Viren (Brookhaven National Laboratory), Elizabeth Sexton-Kennedy (Fermi National Accelerator Lab. (US))
    • 08:15 08:20
      The FCC software - how to keep SW experiment independent 5m
      The Future Circular Collider software effort aims to support all experiments that target the hadron-hadron, electron-positron or electron-hadron collider options. As such, the framework has to be independent of the detector layout and the collider configuration. The project aims at using existing software packages that are experiment independent. Other packages that are close to being experiment independent, such as the LHCb simulation framework or the ATLAS tracking software, are modified in order to combine the efforts of the communities. At the same time, new technologies are developed with this independence in mind, so that they can also be used outside of the FCC software project: the Python data analysis front-end is decoupled from the main software stack, depending only on the event data model. The event data model itself is generated from configuration files, which allows customisation and enables parallelisation by supporting a corresponding data layout. The contribution will give a concise overview of the FCC software project and highlight developments that can be of use to other HEP experiments, such as the experiment-independent event data model library, a fast and full simulation framework and the tracking package.
      Speakers: Anna Zaborowska (Warsaw University of Technology (PL)), Benedikt Hegner (CERN), Joschka Lingemann (CERN), Valentin Volkl (University of Innsbruck (AT))
    • 08:20 08:25
      The integrated fast and full simulation framework of FCC 5m
      Software for the new generation of experiments, like those at the Future Circular Collider, should by design efficiently exploit the available computing resources, especially in terms of parallel execution. The simulation package of the FCC Common Software Framework, FCCSW, makes use of the parallel data processing framework provided by Gaudi, with a careful integration of external packages commonly used in HEP for simulation: Geant4 and Delphes. The geometry description is provided by DD4hep. Using the Geant4 toolkit for full simulation, which takes into account all physics processes and transports the particles through matter, is CPU-intensive and time-consuming. At the early stage of detector design, and for some physics studies, such accuracy is not needed. Therefore, the overall response of the detector may be simulated in a parametric way. Geant4 provides the tools to define a parametrisation, which for tracking detectors is performed by smearing the particle space-momentum coordinates and for calorimeters by reproducing the particle showers. The parametrisation may come either from external sources or from the full simulation (being detector-dependent but also more accurate). The tracker resolutions may be derived from measurements of existing detectors or from external tools, for instance tkLayout, which is used in CMS tracker performance studies. For the calorimeters, the longitudinal and radial shower profiles can be parametrised using the GFlash library. The Geant4 fast simulation can be applied to any type of particle in any region of the detector. The possibility to run both full and fast simulation in Geant4 creates an opportunity for an interplay, performing the CPU-consuming full simulation only for the regions and particles of interest. FCCSW also incorporates the Delphes framework for fast simulation studies in a multipurpose detector. Phenomenological studies may be performed in an idealised geometry model, simulating the overall response of the detector. Having Delphes inside FCCSW allows users to create analysis tools that may be used for full simulation studies as well. This presentation will show the status of the simulation package of the FCC common software framework. (A minimal sketch of a Geant4 fast-simulation model follows this entry.)
      Speaker: Anna Zaborowska (Warsaw University of Technology (PL))
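      To make the fast/full interplay described above more concrete, here is a minimal sketch of a Geant4 fast-simulation model using the G4VFastSimulationModel interface. The class name, particle choice, trigger threshold and trivial "deposit all energy" response are illustrative assumptions, not the actual FCCSW parametrisation.

      // Minimal sketch of a Geant4 fast-simulation model (illustrative only, not the
      // actual FCCSW parametrisation): when the trigger fires, the full tracking of
      // the particle is replaced by a parametrised detector response.
      #include "G4VFastSimulationModel.hh"
      #include "G4Electron.hh"
      #include "G4FastTrack.hh"
      #include "G4FastStep.hh"

      class SimpleCaloFastSimModel : public G4VFastSimulationModel {
      public:
        explicit SimpleCaloFastSimModel(const G4String& name)
            : G4VFastSimulationModel(name) {}

        // Which particle types this model may handle.
        G4bool IsApplicable(const G4ParticleDefinition& particle) override {
          return &particle == G4Electron::ElectronDefinition();
        }

        // Per-track decision whether to take over from the full simulation.
        G4bool ModelTrigger(const G4FastTrack& fastTrack) override {
          return fastTrack.GetPrimaryTrack()->GetKineticEnergy() > 100. /* MeV */;
        }

        // Parametrised response: here we simply kill the track and deposit its
        // energy; a realistic model would reproduce a shower profile (e.g. GFlash).
        void DoIt(const G4FastTrack& fastTrack, G4FastStep& fastStep) override {
          fastStep.KillPrimaryTrack();
          fastStep.ProposeTotalEnergyDeposited(
              fastTrack.GetPrimaryTrack()->GetKineticEnergy());
        }
      };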
    • 08:25 08:30
      SWAN: a Service for Web-Based Data Analysis in the Cloud 5m
      SWAN is a novel service to perform interactive data analysis in the cloud. SWAN allows users to write and run their data analyses with only a web browser, leveraging the widely-adopted Jupyter notebook interface. The user code, executions and data live entirely in the cloud. SWAN makes it easier to produce and share results and scientific code, access scientific software, produce tutorials and demonstrations as well as preserve analyses. Furthermore, it is also a powerful tool for non-scientific data analytics. The SWAN backend combines state-of-the-art software technologies, like Docker containers, with a set of existing IT services such as user authentication, virtual computing infrastructure, mass storage, file synchronisation and sharing, specialised clusters and batch systems. In this contribution, the architecture of the service and its integration with the aforementioned CERN services are described. SWAN acts as a "federator of services" and the reasons why this feature boosts the existing CERN IT infrastructure are reviewed. Furthermore, the main characteristics of SWAN are compared to similar products offered by commercial and free providers. Use-cases extracted from workflows at CERN are outlined. Finally, the experience and feedback acquired during the first months of its operation are discussed.
    • 08:30 08:35
      Expressing Parallelism in ROOT 5m
      The need for processing the ever-increasing amount of data generated by the LHC experiments in a more efficient way has motivated ROOT to further develop its support for parallelism. Such support is being tackled both for shared-memory and distributed-memory environments. The concrete forms of the aforementioned parallelism are multi-threading, multi-processing and cluster-wide executions. In the area of multi-threading, we discuss the new implicit parallelism and related interfaces, as well as the new building blocks to safely operate with ROOT objects in a multi-threaded environment (see the sketch after this entry). Regarding multi-processing, we review the new MultiProc framework, comparing it with similar tools (e.g. the multiprocessing module in Python). Finally, as an alternative to PROOF for cluster-wide executions, we present ongoing efforts to integrate ROOT with state-of-the-art distributed data processing technologies like Spark, both in terms of programming model and runtime design (with EOS as one of the main components). For all the levels of parallelism, we discuss, based on real-life examples and measurements, how our proposals can increase the productivity of scientists.
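      A minimal sketch of the implicit parallelism mentioned above: enabling ROOT's implicit multi-threading before reading a tree, so that operations which support it (such as branch decompression in TTree::GetEntry) can run on a thread pool. The file and tree names are placeholders.

      // Minimal sketch: switch on ROOT's implicit multi-threading and read a tree.
      // "data.root" and "events" are placeholder names.
      #include "TROOT.h"
      #include "TFile.h"
      #include "TTree.h"

      int main() {
        // Let ROOT use a pool of worker threads for tasks that support implicit
        // parallelism; optionally pass the desired number of threads.
        ROOT::EnableImplicitMT();

        TFile* file = TFile::Open("data.root");
        TTree* tree = file ? static_cast<TTree*>(file->Get("events")) : nullptr;
        if (tree) {
          for (Long64_t i = 0; i < tree->GetEntries(); ++i) {
            tree->GetEntry(i);   // branch decompression may run in parallel
          }
        }
        delete file;
        return 0;
      }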
    • 08:35 08:40
      The New ROOT Interface: Jupyter Notebooks 5m
      Notebooks represent an exciting new approach that will considerably facilitate collaborative physics analysis. They are a modern and widely-adopted tool to express computational narratives comprising, among other elements, rich text, code and data visualisations. Several notebook flavours exist, although one of them has been particularly successful: the Jupyter open source project. In this contribution we demonstrate how the ROOT framework is integrated with the Jupyter technology, reviewing features such as an unprecedented integration of the Python and C++ languages and interactive data visualisation with JavaScript ROOT (an example notebook cell is sketched after this entry). In this context, we show the potential of the complete interoperability of ROOT with other analysis ecosystems such as SciPy. We discuss through examples and use-cases how the notebook approach boosts the productivity of physicists, engineers and non-coding lab scientists. Opportunities in the field of outreach, education and open-data initiatives are also reviewed.
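      A sketch of what a cell in a ROOT C++ notebook might contain (names are illustrative). Since the ROOT libraries are preloaded by the kernel, no includes or main function are needed, and drawing the canvas renders the plot inline in the browser, optionally with interactive JavaScript ROOT graphics (e.g. via the %jsroot magic).

      // Contents of one C++ notebook cell (illustrative):
      TH1F h("h", "Gaussian toy;x;entries", 64, -4, 4);  // book a histogram
      h.FillRandom("gaus", 10000);   // fill with 10k samples from a unit Gaussian
      TCanvas c;                     // the canvas is what gets displayed
      h.Draw();
      c.Draw();                      // in a notebook this renders the plot inline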
    • 08:40 09:00
      Status and Evolution of ROOT 20m
      With ROOT 6 in production in most experiments, ROOT has changed gear during the past year: the development focus on the interpreter has been redirected into other areas. This presentation will summarize the developments that have happened in all areas of ROOT, for instance concurrency mechanisms, the serialization of C++11 types, new graphics palettes, new "glue" packages for multivariate analyses, and the state of the Jupyter and JavaScript interfaces and language bindings. It will lay out the short-term plans for ROOT 5 and ROOT 6 and try to forecast the future evolution of ROOT, for instance with respect to more robust interfaces and a fundamental change in the graphics and GUI system.
    • 09:00 09:20
      New Directions in the CernVM File System 20m
      The CernVM File System today is commonly used to host and distribute application software stacks. In addition to this core task, recent developments expand the scope of the file system into two new areas. Firstly, CernVM-FS emerges as a good match for container engines to distribute the container image contents. Compared to native container image distribution (e.g. through the Docker registry), CernVM-FS massively reduces the network traffic for image distribution. This has been shown, for instance, by a prototype integration of CernVM-FS into Mesos developed by Mesosphere, Inc. We present possible paths for a smooth integration with Docker and the necessary changes to the CernVM-FS server to support the typical container image workflow and lifecycle. Secondly, CernVM-FS recently raised interest as an option for the distribution of experiment data file catalogs. While very powerful tools are in use for accessing data files in a distributed and scalable manner, finding the file names is typically done through a central, experiment-specific SQL database. A name space on CernVM-FS can benefit in particular from the existing, scalable infrastructure, from the POSIX interface and from the end-to-end content verification. For this use case, we outline the necessary modifications to the CernVM-FS server in order to provide a generic, distributed namespace that supports billions of files and thousands of writes per second.
      Speaker: Jakob Blomer (CERN)
    • 09:20 09:40
      Validation of Electromagnetic Physics Models for Parallel Computing Architectures in the GeantV project 20m
      High-energy particle physics (HEP) has advanced greatly over recent years, and current plans for the future foresee even more ambitious targets and challenges. Amongst the many computer technology R&D areas, simulation of particle detectors stands out as the most time-consuming part of HEP computing. An intensive R&D and programming effort is required to exploit the new opportunities offered by technological developments, in order to support the scientific progress and the corresponding increasing demand of computing power necessary for future experimental HEP programs. The GeantV project aims at narrowing the gap between the performance of the existing HEP detector simulation software and the ideal performance achievable by exploiting the latest advances in computer technology. The project has developed a particle detector simulation prototype capable of transporting particles in parallel through complex geometries, profiting from instruction-level parallelism (SIMD and SIMT) and task-level parallelism (multithreading), following both the multi-core and the many-core opportunities. We present preliminary validation results concerning the electromagnetic (EM) physics models developed for parallel computing architectures within the GeantV project. In order to exploit the potential of vectorization and accelerators and to make the physics models effectively parallelizable, alternative sampling techniques have been implemented and tested. Some of these techniques introduce intervals and discrete tables (a generic illustration of table-based sampling follows this entry). We identify artefacts that are introduced by different discrete sampling techniques and determine the energy range in which these methods provide an acceptable approximation. We introduce a set of automated statistical analyses in order to verify the vectorized models by checking their consistency with the corresponding Geant4 models and to validate them against experimental data. The validation presented here is part of a larger effort, involving CERN, Fermilab and SLAC, towards the common development of a new physics validation framework designed for various particle physics detector simulation packages, and focuses on its extension for GeantV.
      Speaker: Marilena Bandieramonte (CERN)
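      As a generic illustration of the table-based sampling techniques referred to above (not GeantV code), the sketch below samples an exponential distribution from a pre-computed, discretised inverse-CDF table with linear interpolation; the finite table granularity is exactly the kind of artefact such a validation has to quantify. The table size and distribution are arbitrary choices.

      // Generic illustration (not GeantV code): sampling a continuous distribution
      // from a pre-computed, discretised inverse-CDF table. The coarser the table,
      // the larger the sampling artefacts that a validation must quantify.
      #include <cmath>
      #include <cstdio>
      #include <random>
      #include <vector>

      int main() {
        // Tabulate the inverse CDF of an exponential distribution with unit mean.
        const int nBins = 128;                    // table granularity (assumption)
        std::vector<double> invCdf(nBins + 1);
        for (int i = 0; i <= nBins; ++i) {
          double u = static_cast<double>(i) / nBins;
          invCdf[i] = -std::log(1.0 - std::min(u, 0.999999));  // F^{-1}(u)
        }

        // Sampling: draw u uniformly, then interpolate linearly inside the table.
        std::mt19937 rng(42);
        std::uniform_real_distribution<double> uni(0.0, 1.0);
        double sum = 0.0;
        const int nSamples = 100000;
        for (int s = 0; s < nSamples; ++s) {
          double u = uni(rng) * nBins;
          int bin = static_cast<int>(u);
          double frac = u - bin;
          sum += (1.0 - frac) * invCdf[bin] + frac * invCdf[bin + 1];
        }
        std::printf("sample mean = %f (expected ~1 up to table artefacts)\n",
                    sum / nSamples);
        return 0;
      }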
    • 09:40 10:00
      Stochastic optimisation of GeantV code by use of genetic algorithms 20m
      GeantV simulation is a complex system based on the interaction of different modules needed for detector simulation, which include transportation (a heuristically managed mechanism of sets of predefined navigators), scheduling policies, physics models (cross-sections and reaction final states) and a geometrical modeler library with geometry algorithms. The GeantV project is recasting the simulation framework to get maximum benefit from SIMD/MIMD computational architectures and massively parallel systems. This involves finding the appropriate balance between several aspects influencing computational performance (floating-point performance, usage of off-chip memory bandwidth, specification of the cache hierarchy, etc.) and a large number of program parameters that have to be optimized to achieve the best speedup of the simulation. This optimisation task can be treated as a "black-box" optimization problem, which requires searching for the optimal set of parameters using only point-wise function evaluations. The goal of this study is to provide a mechanism for optimizing complex systems (high energy physics particle transport simulations) with the help of genetic algorithms and evolution strategies as a tuning process for massive coarse-grain parallel simulations (a minimal genetic-algorithm sketch follows this entry). One of the described approaches is based on the introduction of a specific multivariate analysis operator that can be used in the case of resource-expensive or time-consuming evaluations of fitness functions, in order to speed up the convergence of the "black-box" optimization problem.
      Speaker: Oksana Shadura (National Technical Univ. of Ukraine "Kyiv Polytechnic Institute")
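      For illustration, a minimal, generic genetic-algorithm loop for black-box parameter tuning is sketched below; the fitness function, parameter count and GA settings are placeholders, and in the real use case each fitness evaluation would correspond to an expensive simulation run.

      // Minimal, generic genetic-algorithm sketch for black-box parameter tuning
      // (illustrative only; the fitness function and all settings are placeholders).
      #include <algorithm>
      #include <cstdio>
      #include <random>
      #include <utility>
      #include <vector>

      using Genome = std::vector<double>;

      // Placeholder fitness: in the real use case this would be an expensive
      // simulation run returning e.g. a throughput figure to maximise.
      double fitness(const Genome& g) {
        double f = 0.0;
        for (double x : g) f -= (x - 0.3) * (x - 0.3);  // optimum at x = 0.3
        return f;
      }

      int main() {
        const int nParams = 4, popSize = 30, nGenerations = 50;
        std::mt19937 rng(1);
        std::uniform_real_distribution<double> uni(0.0, 1.0);
        std::normal_distribution<double> gauss(0.0, 0.05);

        std::vector<Genome> pop(popSize, Genome(nParams));
        for (auto& g : pop)
          for (auto& x : g) x = uni(rng);        // random initial population

        for (int gen = 0; gen < nGenerations; ++gen) {
          // One (expensive) fitness evaluation per genome, then rank best first.
          std::vector<std::pair<double, int>> scored(popSize);
          for (int i = 0; i < popSize; ++i) scored[i] = {fitness(pop[i]), i};
          std::sort(scored.rbegin(), scored.rend());

          // Keep the best half as parents, refill the rest with mutated crossovers.
          std::vector<Genome> next(popSize, Genome(nParams));
          for (int i = 0; i < popSize / 2; ++i) next[i] = pop[scored[i].second];
          for (int i = popSize / 2; i < popSize; ++i) {
            const Genome& p1 = next[rng() % (popSize / 2)];
            const Genome& p2 = next[rng() % (popSize / 2)];
            for (int j = 0; j < nParams; ++j)
              next[i][j] = (uni(rng) < 0.5 ? p1[j] : p2[j]) + gauss(rng);  // mutate
          }
          pop = next;
        }
        std::printf("best fitness after %d generations: %f\n", nGenerations,
                    fitness(pop[0]));
        return 0;
      }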
    • 10:00 10:20
      GeantV phase 2: developing the particle transport library 20m
      After an initial R&D stage of prototyping portable performance for particle transport simulation, the GeantV project has reached a new phase in which the different components, such as kernel libraries, scheduling, geometry and physics, are developing rapidly. The increase in complexity is accelerated by the multiplication of demonstrator examples and tested platforms, while trying to maintain a balance between code stability and new developments. While some of the development efforts, such as the geometry and vector core libraries, are becoming available to the HEP community, GeantV is moving to the demonstrator stage in order to validate and extend its previous performance achievements on a variety of HEP detector setups. A strategy for adding native support for fast simulation has been foreseen for both framework- and user-defined parametrisations. This will allow fast simulation to be integrated naturally within the GeantV parallel workflow, without the need to run any additional programs. We will present the current status of the project and its most recent results and benchmarks, giving a perspective on the future usage of the software.
      Speaker: Andrei Gheata (CERN)
    • 10:20 10:40
      New Machine Learning Developments in ROOT 20m

      ROOT provides advanced statistical methods needed by the LHC experiments to analyze their data. These include machine learning tools for classification, regression and clustering. TMVA, a toolkit for multivariate analysis in ROOT, provides these machine learning methods. We will present new developments in TMVA, including parallelisation, deep-learning neural networks, new features and additional interfaces to external machine learning packages. We will show the modular design of the new version of TMVA, its cross-validation and hyperparameter tuning capabilities, feature engineering and deep learning. We will further describe new parallelisation features, including multi-threading, multi-processing and cluster parallelisation, and present GPU support for intensive machine learning applications such as deep learning. (A short example of training a classifier with the new TMVA interfaces follows this entry.)
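      As an illustration of the modular Factory/DataLoader interface mentioned above, the following sketch books and trains a single BDT classifier. It assumes a ROOT version that provides TMVA::DataLoader; the file, tree, variable and function names are placeholders.

      // Sketch of training a classifier with the TMVA Factory/DataLoader interface
      // (file, tree and variable names are placeholders).
      #include "TCut.h"
      #include "TFile.h"
      #include "TTree.h"
      #include "TMVA/DataLoader.h"
      #include "TMVA/Factory.h"
      #include "TMVA/Tools.h"
      #include "TMVA/Types.h"

      void trainBDT() {
        TMVA::Tools::Instance();

        TFile* input  = TFile::Open("train.root");               // placeholder input
        TTree* sig    = static_cast<TTree*>(input->Get("signal"));
        TTree* bkg    = static_cast<TTree*>(input->Get("background"));
        TFile* output = TFile::Open("TMVAOutput.root", "RECREATE");

        TMVA::Factory factory("TMVAClassification", output,
                              "!V:!Silent:AnalysisType=Classification");
        TMVA::DataLoader loader("dataset");       // modular data handling
        loader.AddVariable("pt", 'F');
        loader.AddVariable("eta", 'F');
        loader.AddSignalTree(sig, 1.0);
        loader.AddBackgroundTree(bkg, 1.0);
        loader.PrepareTrainingAndTestTree("", "SplitMode=Random:NormMode=NumEvents");

        factory.BookMethod(&loader, TMVA::Types::kBDT, "BDT",
                           "NTrees=400:MaxDepth=3:BoostType=AdaBoost");
        factory.TrainAllMethods();
        factory.TestAllMethods();
        factory.EvaluateAllMethods();

        output->Close();
      }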