ACAT 2016

UTFSM, Valparaíso (Chile)

Avenida España 1680, Valparaíso Chile

17th International workshop on Advanced Computing and Analysis Techniques in physics research (ACAT)

The ACAT Workshop series has a long tradition starting in 1990 (Lyon, France), and takes place at intervals of a year and a half. Formerly these workshops were known under the name AIHENP (Artificial Intelligence for High Energy and Nuclear Physics). These workshops mainly aim to bring together experimental and theoretical high energy physicists and computer scientists in order to exchange knowledge and experience in computing system architectures, algorithms for data analysis, and algorithms and extended calculations in high energy physics. In addition, contributions in theoretical high energy physics (including lattice calculations), nuclear physics, astrophysics, condensed matter physics, seismology, and other fields are very welcome.

The 17th edition of ACAT aims to once again bring together computer science researchers and practitioners, and researchers from particle and nuclear physics, astronomy and astrophysics and accelerator science, to explore and confront the boundaries of computing, automated data analysis and theoretical calculation technologies. It will create a forum for exchanging ideas among the fields and will explore and promote cutting-edge computing, data analysis and theoretical calculation technologies in fundamental physics research.

  • International Advisory and Coordination Committee (IACC): Denis Perret-Gallix
  • Local Organizing Committee (LOC): Luis Salinas
  • Scientific Program Committee (SPC): Federico Carminati

Important dates

  • Abstracts submission deadline - 15 November 2015 
  • Contribution acceptance announced - 25 November 2015   
  • Registration opens - 30 July 2015 
  • Early registration by - 10 December 2015 
  • Proceedings submission by - 31 March 2016 (extended from 29 February 2016)

Participants

  • Sofian Teber
  • Aleksandrs Aleksejevs
  • Alex Rogozhnikov
  • Alexander Kryukov
  • Alexandre Vaniachine
  • Alexei Klimentov
  • Alexey Baskakov
  • Alexis Pompili
  • Andreas von Manteuffel
  • Andrei Davydychev
  • Andrei Gheata
  • Andrei Kataev
  • Andrej Arbuzov
  • Arantxa Ruiz Martinez
  • Ariel Schwartzman
  • Chao Li
  • Christian Bogner
  • Claudio Esteban Torres
  • Daniele Bonacorsi
  • David Britton
  • Davide Cieri
  • Dzmitry Makatun
  • Enric Tejedor Saavedra
  • Federico Carminati
  • Fedor Prokoshin
  • Fons Rademakers
  • Gang Chen
  • Gionata Luisoni
  • Gorazd Cvetic
  • Gordon Watts
  • Graeme Stewart
  • Greg Corbett
  • Gregory Bell
  • Gudrun Heinrich
  • Henrikh Baghramyan
  • Igor Kondrashuk
  • Ivan Kisel
  • Jens Hoff
  • Jerome Lauret
  • Johannes Albrecht
  • Jorge Ibsen
  • Jose Seixas
  • Joseph Boudreau
  • Juan Eduardo Ramirez Vargas
  • Juan Guillermo Pavez Sepulveda
  • Juergen Reuter
  • Kiyoshi Kato
  • Konrad Meier
  • Lennart Johnsson
  • Liliana Teodorescu
  • Lorenzo Moneta
  • Lucio Anderlini
  • Luis Salinas
  • Manuel Giffels
  • Marcel Rieger
  • Maria Grigoryeva
  • Markus Fasel
  • Martin Ritter
  • Maxim Malyshev
  • Maxim Potekhin
  • Michael Poat
  • Michel Cure
  • Michele Selvaggi
  • Mikhail Kompaniets
  • Milos Lokajicek
  • Naoki Kimura
  • Nikita Kazeev
  • Niko Neufeld
  • Omar Andres Zapata Mesa
  • Oscar Castillo-Felisola
  • Paola Arce
  • Patricia Mendez Lorenzo
  • Pere Mato Vila
  • Peter Elmer
  • Peter Wegner
  • Radja Boughezal
  • Raquel Pezoa Rivera
  • Renato Quagliani
  • Ricardo Oyarzun
  • Roberto Leon
  • Rudolf Fruhwirth
  • Ryan Mackenzie White
  • Sameh Mannai
  • Sebastien Wertz
  • Sebastián Mancilla
  • Sergey Kulagin
  • Simon David Badger
  • Simon Ernesto Cardenas Zarate
  • Soon Yung Jun
  • Stanislav Poslavsky
  • Sudarshan Paramesvaran
  • Sudhir Raniwala
  • Takahiro Ueda
  • Tatiana Likhomanenko
  • Teng Li
  • Thomas Hahn
  • Thomas James Stevenson
  • Toby Burnett
  • Tomoyori Katsuaki
  • Tony Johnson
  • Viacheslav Bunichev
  • Vladyslav Shtabovenko
  • Wolfgang Waltenberger
  • Xia Dongmei
  • Xiaobin Ji
  • Xing-Tao Huang
  • York Schröder
  • Zdenek Hubacek
    • Registration
    • Welcome
    • Plenary I
      • 1
        Christian Bogner — Generalizations of polylogarithms for Feynman integrals
        Speaker: Christian Bogner
    • 10:00 AM
      Coffee Break
    • Plenary I
    • 12:45 PM
      Lunch Break
    • Track 1: Computing Technology for Physics Research
      Convener: Niko Neufeld (CERN)
      • 6
        Reducing the energy consumption of scientific computing resources on demand
        The Rutherford Appleton Laboratory (RAL) data centre provides large-scale High Performance Computing facilities for the scientific community. It currently consumes approximately 1.5 MW, and this has risen by 25% in the past two years. RAL has been investigating leveraging preemption in the Tier 1 batch farm to save power. HEP experiments are increasingly using jobs that can be killed, in order to take advantage of opportunistic CPU resources or novel cost models such as Amazon's spot pricing. Additionally, energy providers offer schemes with financial incentives to reduce power consumption at peak times. Under normal operating conditions, 3% of the batch farm capacity is wasted due to draining machines. By using preemptable jobs, nodes can be made available rapidly to run multicore jobs without this wasted resource. The use of preemptable jobs has been extended so that, at peak times, machines can be hibernated quickly to save energy. This paper describes the implementation of the above and demonstrates that RAL could in future take advantage of such energy-saving schemes.
        Speaker: Mr Greg Corbett (STFC - Rutherford Appleton Lab. (GB))
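The node-selection idea in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not RAL's HTCondor configuration: the `Node` and `Job` types and both function names are hypothetical. A node can be emptied immediately, either for hibernation or for a multicore job, only if every job it runs is preemptable, so no draining is needed.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Job:
    job_id: str
    preemptable: bool  # True if the job tolerates being killed at any time


@dataclass
class Node:
    name: str
    jobs: List[Job] = field(default_factory=list)


def can_free_immediately(node: Node) -> bool:
    """A node can be emptied without draining if all its jobs are preemptable."""
    return all(job.preemptable for job in node.jobs)


def nodes_to_hibernate(nodes: List[Node], count: int) -> List[str]:
    """Pick up to `count` nodes that can be freed at once.

    Idle nodes come first, then nodes running the fewest preemptable jobs,
    so that as little work as possible is lost when jobs are killed.
    """
    candidates = [n for n in nodes if can_free_immediately(n)]
    candidates.sort(key=lambda n: len(n.jobs))  # stable sort keeps input order on ties
    return [n.name for n in candidates[:count]]


if __name__ == "__main__":
    farm = [
        Node("wn01", [Job("a", True), Job("b", True)]),
        Node("wn02", [Job("c", False)]),  # non-preemptable: would need draining
        Node("wn03", []),
        Node("wn04", [Job("d", True)]),
    ]
    print(nodes_to_hibernate(farm, 2))  # ['wn03', 'wn04']
```

Ordering candidates by job count is just one plausible policy (it minimises the number of killed jobs); a production system would presumably also weigh factors such as job age.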
      • 7
        C++ Software Quality in the ATLAS experiment: Tools and Experience
        The ATLAS experiment at CERN uses about six million lines of code and currently has about 420 developers whose background is largely from physics. In this paper we explain how C++ code quality is managed using a range of tools from compile time through to run-time testing, and reflect on the great progress made in the last year, largely through the use of static analysis tools such as Coverity®, an industry-standard tool which enables quality comparison with general open source C++ code. Other tools, including cppcheck, Include-What-You-Use and run-time 'sanitizers', are also discussed.
        Speaker: Graeme Stewart (University of Glasgow (GB))
      • 3:15 PM
        Coffee break
      • 8
        Dynamic provisioning of a HEP computing infrastructure on a shared hybrid HPC system
        The Institut für Experimentelle Kernphysik (EKP) at KIT is a member of the CMS and Belle II experiments, located at the LHC and the SuperKEKB accelerators, respectively. These detectors share the requirement that enormous amounts of measurement data must be processed and analyzed, and that a comparable amount of simulated events is required to compare experimental results with theory predictions. Nowadays, funding agencies encourage research groups to participate in shared HPC cluster models, where scientists from different domains use the same hardware to increase synergies. This shared usage proves challenging for high-energy physics (HEP) groups, due to their specialized software setup, which includes a custom OS (often Scientific Linux), libraries and applications. To overcome this hurdle, the EKP and the data center team of the University of Freiburg have developed a system to enable the HEP use case on a shared HPC cluster. To achieve this, an OpenStack-based virtualization layer is installed on top of a bare-metal cluster. While other user groups can run their batch jobs via the Moab workload manager directly on bare metal, HEP users can request virtual machines with a specialized machine image which contains a dedicated operating system and software stack. Contrary to similar installations, this hybrid setup requires no static partitioning of the cluster into a physical and a virtualized segment. A seamless integration with the jobs sent by other user groups honors the fairshare policies of the cluster. The developed thin integration layer between OpenStack and Moab can be adapted to other batch servers and virtualization systems, making the concept also applicable for other cluster operators. This contribution will report on the concept and implementation of an OpenStack-virtualized cluster used for HEP workflows.
While the full cluster will be installed in spring 2016, a test-bed setup with 800 cores has been used to study the overall system performance and dedicated HEP jobs were run in a virtualized environment over many weeks. Furthermore, the dynamic integration of the virtualized worker nodes, depending on the workload at the institute's computing system, will be described.
        Speaker: Konrad Meier (Albert-Ludwigs-Universität Freiburg)
      • 9
        Cluster Optimization with Evaluation of Memory and CPU usage via Cgroups of ATLAS & GridPP workloads running at a Tier-2
        Modern Linux kernels include a feature set, called Cgroups, that enables the control and monitoring of system resources. Cgroups have been enabled on a production HTCondor pool located at the Glasgow site of the UKI-SCOTGRID distributed Tier-2. A system has been put in place to collect and aggregate metrics extracted from Cgroups on all worker nodes within the Condor pool. From this aggregated data, memory and CPU usage footprints are extracted, from which the resource usage of each type of ATLAS and GridPP workload can be obtained and studied. This system has been used to identify broken payloads, real-world memory usage, job efficiencies, etc. The system has been running in production for one year, and a large amount of data has been collected. From these statistics we can see the difference between the memory originally requested and the real-world memory usage of different types of jobs. These results were used to reduce the amount of memory requested from the batch system (for scheduling purposes), and an increase in cluster utilisation at around the 10% level was observed. By analysing overall real-world job performance, we have been able to increase the utilisation of the Glasgow site of the UKI-SCOTGRID distributed Tier-2.
        Speaker: Prof. David Britton (University of Glasgow)
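Turning per-job cgroup metrics into a memory and CPU footprint can be sketched as shown here. It reads the cgroup v1 accounting files `memory.max_usage_in_bytes` (peak memory, bytes) and `cpuacct.usage` (CPU time, nanoseconds). How HTCondor lays out the per-job cgroup directories is not given in the abstract, so the directory arguments are assumptions. The `suggested_request_mb` helper is a hypothetical policy (a quantile of observed peaks plus a margin), not the Glasgow site's actual rule.

```python
import math
import os
from typing import Dict, List


def read_int(path: str) -> int:
    """Read a single integer from a cgroup control file."""
    with open(path) as f:
        return int(f.read().strip())


def job_footprint(memory_dir: str, cpuacct_dir: str,
                  wall_seconds: float, cores: int) -> Dict[str, float]:
    """Summarise one job's cgroup v1 accounting files."""
    # Peak memory charged to the cgroup (includes page cache), in bytes.
    peak_bytes = read_int(os.path.join(memory_dir, "memory.max_usage_in_bytes"))
    # Total CPU time consumed by all tasks in the cgroup, in nanoseconds.
    cpu_ns = read_int(os.path.join(cpuacct_dir, "cpuacct.usage"))
    cpu_seconds = cpu_ns / 1e9
    return {
        "peak_memory_mb": peak_bytes / 2**20,
        "cpu_seconds": cpu_seconds,
        # Fraction of the allocated core-time that was actually used.
        "cpu_efficiency": cpu_seconds / (wall_seconds * cores),
    }


def suggested_request_mb(peaks_mb: List[float], quantile: float = 0.95,
                         margin: float = 1.1) -> float:
    """Hypothetical policy: request a high quantile of observed peaks plus a margin."""
    ordered = sorted(peaks_mb)
    index = max(0, math.ceil(quantile * len(ordered)) - 1)
    return ordered[index] * margin
```

Aggregating `peak_memory_mb` per workload type and comparing it with the requested memory is the kind of comparison the abstract describes. Using a quantile rather than the maximum would trade a small risk of memory overruns for tighter packing.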
      • 10
        Electromagnetic Physics Models for Parallel Computing Architectures
        The recent advent of hardware architectures characterized by many-core or accelerated processors has opened up new opportunities for parallel programming models using SIMD or SIMT. To meet the ever-increasing computing performance needs of future HEP experimental programs, the GeantV project was initiated in 2012 to exploit both the vector capability of mainstream CPUs and the multi-threading capabilities of coprocessors, including NVIDIA GPUs and Intel Xeon Phi. The major objectives of GeantV cover all levels of parallelism, managed by a concurrent task scheduler, for processing multiple particles in a highly parallel manner with vectorized geometry and physics algorithms. In this paper we describe the implementation of portable physics models of electromagnetic processes that can be used in common across hybrid computing platforms. Preliminary performance evaluation and validation results of the new vector physics models on both CPUs and coprocessors will also be presented.
        Speaker: Soon Yung Jun (Fermi National Accelerator Lab. (US))
      • 11
        The LHCb trigger and its upgrade
        The current LHCb trigger system consists of a hardware level, which reduces the LHC bunch-crossing rate of 30 MHz to 1 MHz, the rate at which the entire detector is read out. In a second level, implemented in a farm of 20k parallel-processing CPUs, the event rate is reduced to 12.5 kHz. In the High Level Trigger, events are buffered locally on the farm nodes, which gives time to perform run-by-run detector calibrations. These allow publication-quality reconstruction to be run in the trigger system, as demonstrated by the first LHCb analyses performed and published solely on trigger data. Special attention is given to the use of multivariate analyses in the High Level Trigger and their importance in controlling the output rate. The LHCb experiment plans a major upgrade of the detector and DAQ system in the LHC shutdown of 2018. For this upgrade, a purely software-based trigger system is being developed, which will have to process the full 30 MHz of inelastic collisions delivered by the LHC. We review the performance of the LHCb trigger system during Run II of the LHC, focusing on the High Level Trigger. The upgrade trigger system will also be discussed.
        Speaker: Johannes Albrecht (Technische Universitaet Dortmund (DE))
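The rates quoted in this abstract imply the rejection factors worked out below. The last line is only an order-of-magnitude illustration. It assumes the 1 MHz input is shared evenly over the 20k CPUs and ignores the local buffering described above.

```latex
\begin{aligned}
\text{hardware level:}\quad & \frac{30\ \text{MHz}}{1\ \text{MHz}} = 30 \\
\text{software level:}\quad & \frac{1\ \text{MHz}}{12.5\ \text{kHz}} = 80 \\
\text{overall:}\quad & 30 \times 80 = 2400 \\
\text{naive budget:}\quad & \frac{2\times 10^{4}\ \text{CPUs}}{10^{6}\ \text{events/s}}
  = 2\times 10^{-2}\ \text{CPU\,s} \approx 20\ \text{ms of CPU time per event}
\end{aligned}
```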
      • 5:25 PM
      • 12
        New technologies for HEP - the CERN openlab
        Speaker: Fons Rademakers (CERN)
    • Track 2: Data analysis - Algorithms and Tools
      • 13
        Data Mining as a Service (DMaaS)
        Data Mining as a Service (DMaaS) is a software and computing infrastructure that allows interactive mining of scientific data in the cloud. It allows users to run advanced data analyses by leveraging the widely adopted Jupyter notebook interface. Furthermore, the system makes it easier to share results and scientific code, access scientific software, produce tutorials and demonstrations, and preserve the analyses of scientists. To use DMaaS, the user connects to the service with a web browser. Once authenticated, the user is presented with an interface based on Jupyter notebooks, where she can write and execute data analyses, see their results inline (text, graphics) and combine them with explanations of what she is doing, all in the same document. When finished, notebooks can be saved and shared with colleagues, who can review, modify and re-run them. The DMaaS service is entirely hosted in the cloud. All user analyses are executed on a virtualised infrastructure, which assigns each user a private container isolated from the rest. Similarly, the input and output data of the analyses, as well as the notebook documents themselves, reside in cloud storage. Access to and usage of this infrastructure are protected by the necessary security components, which ensure that every user is granted resources according to her credentials and permissions. This presentation describes how a first pilot of the DMaaS service is being deployed at CERN, starting from the notebook interface, which has been fully integrated with the ROOT analysis framework in order to provide all the tools scientists need to run their analyses. Additionally, we characterise the service backend, which combines a set of IT services such as user authentication, virtual computing infrastructure, mass storage, file synchronisation, conference management tools, development portals and batch systems.
The added value of combining these categories of services is discussed. To conclude, the experience gained while implementing DMaaS within CERN's portfolio of production services is reviewed, focussing on the opportunities offered by the CERNBox synchronisation service and its massive storage backend, EOS.
        Speaker: Enric Tejedor Saavedra (CERN)
      • 14
        Upgrading the ATLAS Fast Calorimeter Simulation
        Many physics and performance studies with the ATLAS detector at the Large Hadron Collider require very large samples of simulated events, and producing these using the full GEANT4 detector simulation is highly CPU intensive. Often, a very detailed detector simulation is not needed, and in these cases fast simulation tools can be used to reduce the calorimeter simulation time by a few orders of magnitude. In ATLAS, a fast simulation of the calorimeter systems, called Fast Calorimeter Simulation (FastCaloSim), was developed. It provides a parametrized simulation of the particle energy response at the calorimeter read-out cell level. It is interfaced to the standard ATLAS digitization and reconstruction software, and it can be tuned to data more easily than GEANT4. The original version of FastCaloSim was very important in LHC Run 1, with several billion events simulated. An improved parametrisation is being developed to address shortcomings of the original version. It incorporates developments in geometry and physics lists from the last five years and benefits from knowledge acquired with Run 1 data. It makes use of statistical techniques such as principal component analysis, and a neural network parametrisation, to optimise the amount of information stored in the ATLAS simulation infrastructure. In this talk, we will review the latest developments of the new FastCaloSim parametrisation.
        Speaker: Zdenek Hubacek (Czech Technical University (CZ))
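Principal component analysis, one of the techniques named in this abstract, can be sketched with NumPy as below. This is a minimal example on toy data and is not the ATLAS parametrisation. The "energy fractions in four layers" and the single hidden "shower depth" variable are invented purely for illustration. PCA rotates correlated inputs into uncorrelated components ordered by variance, so a few leading components can carry most of the information that has to be stored.

```python
import numpy as np


def pca_fit(x: np.ndarray):
    """Return the mean, principal axes (as rows) and variances of x, shape (n_samples, n_features)."""
    mean = x.mean(axis=0)
    # SVD of the centred data: rows of vt are the principal axes,
    # ordered by decreasing singular value.
    _, s, vt = np.linalg.svd(x - mean, full_matrices=False)
    variances = s**2 / (len(x) - 1)
    return mean, vt, variances


def pca_transform(x: np.ndarray, mean: np.ndarray, axes: np.ndarray) -> np.ndarray:
    """Project samples onto the principal axes (decorrelated coordinates)."""
    return (x - mean) @ axes.T


def pca_inverse(z: np.ndarray, mean: np.ndarray, axes: np.ndarray) -> np.ndarray:
    """Map principal-component coordinates back to the original variables."""
    return z @ axes + mean


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy per-layer energy fractions in four calorimeter layers, driven by one
    # hidden "shower depth" variable t plus small noise (illustrative only).
    t = rng.normal(size=(1000, 1))
    base = np.array([0.1, 0.5, 0.3, 0.1])
    slope = np.array([-0.05, -0.05, 0.05, 0.05])
    fractions = base + slope * t + 0.005 * rng.normal(size=(1000, 4))
    mean, axes, var = pca_fit(fractions)
    print("explained variance ratio:", var / var.sum())
```

On data like this the first component dominates, so keeping only it (and setting the others to zero before `pca_inverse`) would give a compact approximate description of the toy showers.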
      • 15
        Parallel 4-Dimensional Cellular Automaton Track Finder for the CBM Experiment
        The future heavy-ion experiment CBM (FAIR/GSI, Darmstadt, Germany) will focus on the measurement of very rare probes at interaction rates up to 10 MHz, with a data flow of up to 1 TB/s. The beam will provide a free stream of beam particles without bunch structure. This requires full online event reconstruction and selection not only in space but also in time, so-called 4D event building and selection. This is the task of the First-Level Event Selection (FLES). The main module of the FLES reconstruction and selection package is the Cellular Automaton (CA) based track finder. The CA algorithm consists of several logical parts. First comes a short initialization (2% of the total execution time), in which the hit information is prepared for tracking. The main and most time-consuming part, the triplet construction, takes 90.4% of the sequential execution time. Constructing tracks out of the triplets takes about 4%, and preparing the information for the next iteration takes a further 3.4%. All steps of the algorithm were parallelized inside the time-slice, using different sources of parallelism at each step. In the initialization part, hits are processed in parallel, split into portions and stored in the grid data structure. In the triplet construction part, portions of hits are processed to obtain triplets as well as their neighboring relations. In the next part, the track candidate construction, these triplets serve as the source of parallelism, yielding a track candidate for each triplet with a high level. In the track competition part, the candidates are processed in parallel to reveal common hits and to choose the best ones according to their χ² value. In the final stage, portions of hits are checked in parallel in order to remove hits tagged as used from the grid structure and to prepare the input for the next track-set search iteration.
We describe all stages of the CA track finder in detail and present results of tests on a many-core computer.
        Speaker: Prof. Ivan Kisel (FIAS, Goethe University, Frankfurt am Main)
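The track-competition and hit-removal stages in this abstract can be sketched sequentially as follows. In this simplified, single-threaded version, candidates are accepted in order of increasing χ² and rejected if they share a hit with an already accepted track. The CBM implementation processes candidates in parallel and also uses triplet levels, which are not modelled. The `Candidate` type and function names are hypothetical.

```python
from typing import Iterable, List, NamedTuple, Set, Tuple


class Candidate(NamedTuple):
    hits: Tuple[int, ...]  # indices of the hits used by this track candidate
    chi2: float            # fit quality; smaller is better


def compete(candidates: List[Candidate]) -> Tuple[List[Candidate], Set[int]]:
    """Greedy track competition.

    Candidates are visited in order of increasing chi2; a candidate is
    accepted only if none of its hits is already used by an accepted track.
    Returns the accepted tracks and the set of hits tagged as used.
    """
    used: Set[int] = set()
    accepted: List[Candidate] = []
    for cand in sorted(candidates, key=lambda c: c.chi2):
        if used.isdisjoint(cand.hits):
            accepted.append(cand)
            used.update(cand.hits)
    return accepted, used


def remaining_hits(all_hits: Iterable[int], used: Set[int]) -> List[int]:
    """Hits left over as input for the next track-search iteration."""
    return [h for h in all_hits if h not in used]


if __name__ == "__main__":
    cands = [Candidate((1, 2, 3), 1.2),
             Candidate((3, 4, 5), 0.8),   # best chi2, wins hit 3
             Candidate((6, 7, 8), 2.0)]
    tracks, used = compete(cands)
    print(tracks)
    print(remaining_hits(range(1, 10), used))  # [1, 2, 9]
```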
      • 3:15 PM
        Coffee Break
      • 16
        GPUs for statistical data analysis in HEP: a performance study of GooFit on GPUs vs Roofit on CPUs
        In order to test the computing capabilities of GPUs with respect to traditional CPU cores, a high-statistics toy Monte Carlo technique has been implemented in both the ROOT/RooFit and GooFit frameworks, with the purpose of estimating the statistical significance of the structure observed by CMS close to the kinematical boundary of the J/ψφ invariant mass in the three-body decay B⁺ → J/ψ φ K⁺. GooFit is an open data-analysis tool, under development, that interfaces ROOT/RooFit to the CUDA platform on NVIDIA GPUs. The optimized GooFit application running on GPUs hosted by servers in the Bari Tier-2 provides striking speed-ups with respect to the RooFit application parallelised on multiple CPUs by means of the PROOF-Lite tool. The considerable resulting speed-up, obtained when comparing concurrent GooFit processes allowed by the CUDA Multi-Process Service with a RooFit/PROOF-Lite process with multiple CPU workers, is presented and discussed in detail. By means of GooFit it has also been possible to explore the behaviour of a likelihood-ratio test statistic in different situations in which Wilks' theorem may or may not apply, depending on whether its regularity conditions are satisfied.
        Speaker: Alexis Pompili (Universita e INFN, Bari (IT))
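The toy Monte Carlo significance technique in this abstract can be illustrated on a much simpler model. This sketch assumes a Poisson counting experiment with known background `b`, not the CMS J/ψφ fit or any GooFit/RooFit code. Background-only pseudo-experiments give the distribution of the one-sided likelihood-ratio statistic q₀, and the p-value is the fraction of toys with q₀ at least as large as observed. This is compared with the asymptotic half-χ² formula, p = 1 - Φ(√q₀), which applies when the signal strength sits on the boundary of its allowed range. For small counts the two differ because the data are discrete.

```python
import math
import random


def q0(n: int, b: float) -> float:
    """One-sided likelihood-ratio statistic for a Poisson count n with known
    background b, with the signal yield constrained to be non-negative."""
    if n <= b:
        return 0.0  # best-fit signal is zero: no evidence for an excess
    return 2.0 * (n * math.log(n / b) - (n - b))


def poisson(rng: random.Random, mean: float) -> int:
    """Knuth's multiplication method; adequate for small means."""
    limit = math.exp(-mean)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def toy_p_value(n_obs: int, b: float, n_toys: int = 20000, seed: int = 1) -> float:
    """Fraction of background-only toys with q0 at least as large as observed."""
    rng = random.Random(seed)
    q_obs = q0(n_obs, b)
    hits = sum(q0(poisson(rng, b), b) >= q_obs for _ in range(n_toys))
    return hits / n_toys


def asymptotic_p_value(n_obs: int, b: float) -> float:
    """Asymptotic approximation p = 1 - Phi(sqrt(q0)) (half chi-squared, 1 dof)."""
    return 0.5 * math.erfc(math.sqrt(q0(n_obs, b)) / math.sqrt(2.0))


if __name__ == "__main__":
    for n_obs in (8, 10, 12):
        print(n_obs, toy_p_value(n_obs, 5.0), asymptotic_p_value(n_obs, 5.0))
```

Because q₀ grows monotonically with n above b, the toy p-value here converges to the Poisson tail probability P(N ≥ n_obs). The value of a fast toy generator, which is what the GPU study targets, is that it avoids relying on the asymptotic formula where its conditions are doubtful.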
      • 17
        The Matrix Element Method at the LHC: status and prospects for RunII
        The Matrix Element reweighting Method (MEM) is a powerful multivariate method that makes maximal use of the experimental and theoretical information available to an analysis. Several applications of the MEM at LHC experiments are discussed, such as searches for rare processes and measurements of properties of the Standard Model Higgs boson. The MadWeight phase-space generator, which allows a fast and automated computation of MEM weights for any user-specified process, is briefly reviewed. A new implementation of the MEM in C++, MEM++, is presented. MEM++ builds on the changes of variables used by MadWeight to accelerate the convergence of the calculations, while aiming at much improved modularity and maintainability, easing the use of the MEM for high-statistics data analyses. As examples of this modularity, the possibility to efficiently compute several weights in parallel (propagation of systematic uncertainties such as the Jet Energy Scale, variations of theoretical parameters) and the straightforward implementation of the Differential MEM (DMEM) are discussed.
        Speaker: Sebastien Wertz (Universite Catholique de Louvain (UCL) (BE))
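For reference, the MEM weight is commonly written in the generic textbook form below. The specific changes of variables used by MadWeight and MEM++ are not shown. Here x denotes the reconstructed observables of an event, y a parton-level configuration, q₁ and q₂ the incoming momentum fractions with parton densities f, 𝓜_α the matrix element under hypothesis α, W the transfer function and σ_α the normalizing cross section.

```latex
P(x \mid \alpha) = \frac{1}{\sigma_\alpha} \int d\Phi(y)\, dq_1\, dq_2\,
  f(q_1)\, f(q_2)\, \lvert \mathcal{M}_\alpha(y) \rvert^2\, W(x \mid y),
\qquad
\mathcal{L}(\alpha) = \prod_{i=1}^{N} P(x_i \mid \alpha)
```

Computing several weights in parallel, as mentioned in the abstract, corresponds to evaluating this integral for several variants of W (for example shifted jet energy scales) or of the theory parameters in 𝓜_α on the same events.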
      • 18
        Status and new developments in Delphes 3
        A status report on recent developments of the DELPHES C++ fast detector simulation framework will be given. New detector cards for the LHCb detector and prototypes for future e+e- (ILC, FCC-ee) and 100 TeV p-p (FCC-hh) colliders have been designed. The particle-flow algorithm has been optimised for high-multiplicity environments such as high-luminosity and boosted regimes. In addition, several new features such as photon conversions/bremsstrahlung and vertex reconstruction including timing information have been included. State-of-the-art pile-up treatment and jet filtering/boosted techniques (such as PUPPI, SoftKiller, SoftDrop, Trimming, N-subjettiness, etc.) have been added. Finally, Delphes has been fully interfaced with the Pythia8 event generator, allowing a complete event generation/detector simulation sequence within the framework.
        Speaker: Michele Selvaggi (Universite Catholique de Louvain (UCL) (BE))
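One of the pile-up techniques listed above, SoftKiller, has a simple core idea, sketched below. This is a simplified illustration, not the Delphes module or the reference implementation, and the default grid size and rapidity range are illustrative choices. The (y, φ) plane is divided into patches, the highest particle pT in each patch is recorded, and every particle below the median of those maxima is removed.

```python
import math
import statistics
from typing import List, Tuple

Particle = Tuple[float, float, float]  # (pt, rapidity y, azimuth phi)


def softkiller(particles: List[Particle], grid_size: float = 0.4,
               y_max: float = 2.5) -> List[Particle]:
    """Simplified SoftKiller pile-up removal.

    The region |y| < y_max is split into roughly square (y, phi) patches of
    side grid_size. The cut pt_cut is the median over patches of the largest
    particle pt in each patch (empty patches count as 0), and every particle
    with pt below pt_cut is discarded.
    """
    n_y = max(1, round(2 * y_max / grid_size))
    n_phi = max(1, round(2 * math.pi / grid_size))
    max_pt = [[0.0] * n_phi for _ in range(n_y)]
    for pt, y, phi in particles:
        if abs(y) >= y_max:
            continue  # outside the grid: does not enter the median
        iy = min(int((y + y_max) / (2 * y_max) * n_y), n_y - 1)
        iphi = min(int((phi % (2 * math.pi)) / (2 * math.pi) * n_phi), n_phi - 1)
        max_pt[iy][iphi] = max(max_pt[iy][iphi], pt)
    pt_cut = statistics.median(v for row in max_pt for v in row)
    return [p for p in particles if p[0] >= pt_cut]
```

The median is used because a hard jet populates only a few patches, so it barely moves the cut, while uniformly distributed soft pile-up raises the maximum in almost every patch.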
    • Track 3: Computations in theoretical Physics: Techniques and Methods
      • 19
        Automation of analytical calculations in particle physics and gravity with Redberry CAS
        With the increasing complexity of HEP problems, the performance of computer algebra systems (CASs) may become a bottleneck in real calculations. For example, multiloop calculations in the SM may involve thousands of diagrams and require a huge amount of Dirac algebra and related simplifications in order to prepare expressions for further numerical analysis; calculations in field theory and gravity involve a huge amount of sophisticated tensor algebra, simplifications of Riemann monomials, etc. Redberry is a free high-performance CAS written in Java and focused on the needs of (quantum) field theory. My talk will cover two applications: (1) loop calculations in the SM with a FeynArts+Redberry+FIRE pipeline, and (2) deriving Feynman rules and calculating one-loop counterterms in gravitational theories. One of the advantages of Redberry is that calculations can easily be distributed over several threads or machines. I will give an example of how the calculation of an SM process can be distributed in the Amazon EC2 cloud.
        Speaker: Stanislav Poslavsky (IHEP, Protvino)
      • 20
        FeynCalc 9.0.0
        We present version 9.0.0 of the Mathematica package FeynCalc, an open source tool for symbolic evaluation of Feynman diagrams and algebraic calculations in quantum field theory. This talk will focus on the highlights of the new version, which include improved tensor decomposition and partial fractioning routines for loop integrals. We also provide some examples of seamless interfacing of FeynCalc with other Mathematica packages for perturbative calculations using the FeynHelpers add-on.
        Speaker: Vladyslav Shtabovenko (TUM)
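As a generic illustration of what partial fractioning does for loop integrals (this is not FeynCalc syntax or output), two propagators that carry the same loop momentum k but different masses can be separated as follows.

```latex
\frac{1}{(k^2-m_1^2)(k^2-m_2^2)}
  = \frac{1}{m_1^2-m_2^2}\left[\frac{1}{k^2-m_1^2}-\frac{1}{k^2-m_2^2}\right],
\qquad m_1^2 \neq m_2^2
```

After integration over k, such a two-propagator integral therefore reduces to a difference of one-propagator (tadpole) integrals.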
      • 21
        FormCalc 9 and Extensions
        New features in FormCalc 9 are presented, most notably the ability to combine (almost) arbitrary kinematics in one code and significant updates to the driver programs. Also, a new method for scripting calculations is presented, first used in the implementation of the two-loop $\mathcal{O}(\alpha_t^2)$ Higgs-mass corrections in FeynHiggs.
        Speaker: Thomas Hahn (MPI f. Physik)
      • 3:15 PM
        Coffee break
      • 22
        Computational tools for multiloop calculations and their application to the Higgs boson production cross section
        Computing the Higgs boson production cross section to N$^3$LO precision is a highly challenging task which demands a high degree of automation. This talk will cover two Mathematica packages that were written in that context but can also be applied to other processes: MT and TopoID. The package MT is capable of computing the convolution integrals that enter the infrared counterterms to partonic cross sections. The package TopoID is capable of analyzing a given process and generating computer algebra code to perform large parts of its calculation, namely the reduction of the amplitude to scalar master integrals.
        Speaker: Dr Jens Hoff (Deutsches Elektronen-Synchrotron (DESY))
      • 23
        Calculating four-loop massless propagators with Forcer
        We present Forcer, a new FORM program for the calculation of four-loop massless propagators. The basic framework is similar to that of the Mincer program for three-loop massless propagators: the program reduces Feynman integrals to a set of master integrals in a parametric way. To overcome the unavoidable complexity of the program structure at the four-loop level, most of the code was automatically generated or made with computer-assisted derivations. The correctness of the program has been checked by recomputing some quantities known in the literature. Finally, we show a number of new results.
        Speaker: Takahiro Ueda (Nikhef)
      • 24
        Automation of NLO processes and decays and POWHEG matching in WHIZARD
        We give a status report on the automation of next-to-leading order processes within the Monte Carlo event generator WHIZARD, using GoSam and OpenLoops as providers of one-loop matrix elements. To deal with divergences, WHIZARD uses automated FKS subtraction, and the phase space for singular regions is generated automatically. NLO examples for both scattering and decay processes, with a focus on e+e- processes, are shown. Also, first NLO studies of observables for collisions of polarized lepton beams, e.g. at the ILC, will be presented. Furthermore, the automatic matching of the fixed-order NLO amplitudes with emissions from the parton shower within the POWHEG formalism inside WHIZARD will be discussed. We also present results for top pairs at threshold in lepton collisions, including matching between a resummed threshold calculation and fixed-order NLO. This allows the investigation of more exclusive differential observables.
        Speaker: Juergen Reuter (DESY Hamburg, Germany)
      • 25
        Numerical multi-loop calculations: tools and applications
        In higher order calculations, multi-dimensional parameter integrals, containing also various types of singularities, are ubiquitous. We will present the program SecDec, which allows to isolate the singularities and numerically calculate their coefficients in a process-independent way. Therefore it can be used as a building block for the automation of higher order corrections beyond next-to-leading order, as a tool to provide numerical results for the master integrals occurring in an amplitude. We report on new features of the program which are devised towards this aim and give some illustrations, in particular for two-loop integrals with several mass scales.
        Speaker: Gudrun Heinrich (MPP Munich)
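SecDec is based on sector decomposition. The standard two-dimensional example below, which is not taken from the SecDec documentation, shows how splitting the integration domain turns an overlapping singularity at x = y = 0 into a factorized pole in ε (take ε < 0 for convergence). The remaining t-integral is finite and can be expanded in ε and evaluated numerically. This is the kind of step such programs automate.

```latex
\begin{aligned}
I &= \int_0^1 dx \int_0^1 dy\, (x+y)^{-2-\epsilon}
   = 2 \int_0^1 dx \int_0^x dy\, (x+y)^{-2-\epsilon}
   && \text{(symmetry } x \leftrightarrow y) \\
  &= 2 \int_0^1 dx\, x^{-1-\epsilon} \int_0^1 dt\, (1+t)^{-2-\epsilon}
   && (y = x\,t,\ dy = x\,dt) \\
  &= -\frac{2}{\epsilon} \int_0^1 dt\, (1+t)^{-2-\epsilon}
   = -\frac{1}{\epsilon} + \mathcal{O}(\epsilon^0)
   && \Bigl(\textstyle\int_0^1 \frac{dt}{(1+t)^2} = \frac{1}{2}\Bigr)
\end{aligned}
```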
    • Group Photo
    • 7:00 PM
      Welcome Cocktail
    • Plenary II
      • 26
        Ariel Schwartzman — Image Processing, Computer Vision, and Deep Learning: new approaches to the analysis and physics interpretation of LHC events
        Speaker: Ariel Gustavo Schwartzman (SLAC National Accelerator Laboratory (US))
      • 27
        Radja Boughezal — Scalable NNLO Phenomenology
        Speaker: Radja Boughezal (Argonne National Laboratory)
    • 10:00 AM
      Coffee Break
    • Plenary II
      • 28
        Jorge Ibsen — The ALMA Software: an end-to-end system to operate ALMA
      • 29
        Pete Elmer — Software and Data Citation in High Energy Physics - Current Practices and Ideas for the Future (CANCELED)
        Speaker: Peter Elmer (Princeton University (US))
      • 30
        Andreas von Manteuffel — New Methods for Multi-Loop Feynman Integrals
        Speaker: Andreas von Manteuffel (University of Mainz)
    • Group Photo
    • 12:50 PM
      Lunch Break
    • Track 1: Computing Technology for Physics Research
      • 31
        Evaluating Federated Data Infrastructure in Russian Academic Cloud for LHC experiments and Data Intensive Science
        The Large Hadron Collider (LHC), operating at the international CERN Laboratory in Geneva, Switzerland, is leading Big Data driven scientific explorations. Experiments at the LHC explore the fundamental nature of matter and the basic forces that shape our universe. Computing models for the High Luminosity LHC era anticipate growth in storage needs by at least an order of magnitude, which will require new approaches to data storage organization and data handling. In our project we address the fundamental problem of designing an architecture that integrates distributed heterogeneous disk resources for LHC experiments and other data-intensive science applications and provides access to data from heterogeneous computing facilities. We have prototyped a federated storage for Russian T1 and T2 centers located in Moscow, St. Petersburg and Gatchina, as well as a Russian/CERN federation. We have conducted extensive tests of the underlying network infrastructure and storage endpoints with synthetic performance measurement tools as well as with HENP-specific workloads, including ones running on supercomputing platforms, cloud computing and the Grid for the ALICE and ATLAS experiments. We will present our current accomplishments in running LHC data analysis remotely and locally, to demonstrate our ability to efficiently use federated data storage experiment-wide within National Academic facilities for High Energy and Nuclear Physics, as well as for other data-intensive science applications such as bio-informatics.
        Speakers: Alexei Klimentov (Brookhaven National Laboratory (US)), Andrey Kirianov (B.P. Konstantinov Petersburg Nuclear Physics Institute - PNPI), Andrey Zarochentsev (St. Petersburg State University (RU)), Dimitrii Krasnopevtsev (National Research Nuclear University MEPhI (RU))
      • 32
        Data Locality via Coordinated Caching for Distributed Processing
        With the increasing data volumes of the second LHC run, analysis groups have to handle unprecedented amounts of data. This pushes many compute clusters that rely on network-based storage to their limits. In contrast, data-locality-based processing enables infrastructure to scale out practically indefinitely. However, data locality frameworks and infrastructure often add severe constraints and requirements. To address this, we have developed an approach of adding coordinated caches to existing compute clusters. Since the data stored locally is volatile and selected dynamically, only a fraction of local storage space is required. Our approach allows the degree of data locality to be selected freely. It may be used in conjunction with large network bandwidths, providing only highly used data locally to reduce peak loads. Alternatively, local storage may be scaled up to perform data analysis even with low network bandwidth. To prove the applicability of our approach, we have developed a prototype implementing all required functionality. It integrates seamlessly into batch systems, requiring practically no adjustments by users. We have been actively using this prototype on a test cluster for HEP analyses. Specifically, it has been integral to our jet energy calibration analyses for CMS during Run 2. The system has proven to be easily usable while providing substantial performance improvements. Having confirmed the applicability for our use case, we have investigated the design in a more general way. Simulations show that many infrastructure setups can benefit from our approach. For example, it may enable us to dynamically provide data locality in opportunistic cloud resources. The experience gained from our prototype enables us to realistically assess the feasibility of general production use.
        Speaker: Max Fischer (KIT - Karlsruhe Institute of Technology (DE))
      • 33
        LHCb data processing optimization using Event Index
        Experiments in high energy physics routinely require processing and storing massive amounts of data. The LHCb Event Index is an indexing system for high-level event parameters. Its primary function is to quickly select subsets of events. This paper discusses applications of the Event Index to optimizing the data processing pipeline. The processing and storage capacity is limited and divided among different physics studies according to an expert-assigned physics value. The selection pipeline consists of analyst-written algorithms (triggers and stripping lines). An event passes the selection if any of the algorithms finds it useful. Because some events pass more than one algorithm, adjusting the rates requires guesswork and has to be done over several iterations. In other words, finding the optimal balance between the different algorithms is an unnecessary, time-consuming burden for the operator. With access to the set of per-event decisions, the Event Index can be used to optimize the selection procedure, relieving the algorithm authors from adjusting the parameters manually and achieving better overall efficiency. From the implementation point of view, the Event Index is based on Apache Lucene indices distributed over multiple shards on multiple nodes. The data is stored in a problem-neutral format, so the system can easily be adapted to new tasks.
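The overlap problem can be illustrated with a small Python sketch. It assumes that per-event decisions are available as the set of selection lines that accepted each event. This is a hypothetical representation, not the Event Index storage format. Because events overlap, per-line rates do not add up to the total output rate, and switching a line off only saves its exclusive rate.

```python
def selection_rates(decisions):
    """decisions: one set per event, holding the names of the lines that accepted it.
    Returns the total accept fraction, per-line fractions and exclusive fractions."""
    n = len(decisions)
    lines = sorted(set().union(*decisions))
    total = sum(1 for d in decisions if d) / n
    per_line = {line: sum(1 for d in decisions if line in d) / n for line in lines}
    exclusive = {line: sum(1 for d in decisions if d == {line}) / n for line in lines}
    return total, per_line, exclusive


def rate_without(decisions, line):
    """Total accept fraction if `line` were switched off."""
    return sum(1 for d in decisions if d - {line}) / len(decisions)


# Toy sample of four events; the line names A and B are made up.
events = [{"A"}, {"A", "B"}, {"B"}, set()]
total, per_line, exclusive = selection_rates(events)
```

In this toy sample, lines A and B each accept half of the events, yet together they accept three quarters. Removing A reduces the output by only a quarter. An optimizer with access to per-event decisions can account for such overlaps directly, instead of tuning each line by hand.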
        Speaker: Nikita Kazeev (Yandex School of Data Analysis (RU))
      • 3:15 PM
        Coffee break
      • 34
        The software system for the Control and Data Acquisition for the Cherenkov Telescope Array
        The Cherenkov Telescope Array (CTA), the next-generation ground-based very-high-energy gamma-ray observatory, is defining new areas beyond those related to physics: it is also creating new demands on the control and data acquisition system. CTA will consist of two installations, one in each hemisphere, containing tens of telescopes of different sizes. The ACTL (array control and data acquisition) system will consist of the hardware and software necessary to control and monitor the CTA array, as well as to time-stamp, read out, filter and store the scientific data at aggregated rates of a few GB/s. The ACTL system must implement a flexible software architecture that permits the simultaneous automatic operation of multiple sub-arrays of telescopes with minimal personnel effort on site. In addition, ACTL must be able to modify the observation schedule on timescales of a few tens of seconds, to account for changing environmental conditions or to prioritize incoming scientific alerts from time-critical transient phenomena such as gamma-ray bursts. This contribution summarizes the status of the development of the software architecture and the main design choices and plans.
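The requirement to re-prioritize observations within tens of seconds can be sketched with a priority queue in which science alerts outrank scheduled targets. This is a toy illustration under our own assumptions: the class name, the numeric priorities and the FIFO order within a priority level are all invented here. It is not the ACTL scheduler design.

```python
import heapq
import itertools


class ObservationQueue:
    """Toy observation queue: a lower number means a higher priority."""

    def __init__(self):
        self._heap = []
        self._order = itertools.count()  # tie-breaker: FIFO within one priority

    def add(self, target, priority):
        heapq.heappush(self._heap, (priority, next(self._order), target))

    def next_target(self):
        """Remove and return the highest-priority pending target."""
        return heapq.heappop(self._heap)[2]


queue = ObservationQueue()
queue.add("survey field 1", priority=5)
queue.add("survey field 2", priority=5)
queue.add("GRB alert", priority=0)  # a transient alert arrives mid-schedule
```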
        Speakers: Dr Matthias Fuessling (DESY), Peter Wegner (DESY)
      • 35
        ATLAS FTK: a very complex custom parallel supercomputer
        In the LHC environment of ever-increasing pile-up, advanced data analysis techniques are implemented in order to increase the rate of relevant physics processes with respect to background processes. The Fast TracKer (FTK) is a hardware-level track-finding implementation designed to deliver full-scan tracks with $p_{T}$ above 1 GeV to the ATLAS trigger system for every L1 accept (at a maximum rate of 100 kHz). To achieve this performance, a highly parallel system was designed, and it is now being installed in ATLAS. At the beginning of 2016 it will provide tracks to the trigger system in a region covering the central part of the ATLAS detector, and during the year its coverage will be extended to the full detector. The system relies on matching hits from the silicon tracking detectors against 1 billion patterns stored in specially designed ASICs (Associative Memory, AM06). In a first stage, coarse-resolution hits are matched against the patterns, and the accepted hits then undergo track fitting implemented in FPGAs. Tracks above the 1 GeV threshold are delivered to the High Level Trigger within about 100 $\mu$s. The resolution of the FTK tracks is close to the offline tracking resolution, which will allow reliable detection of primary and secondary vertices at trigger level and improved trigger performance for $b$-jets and $\tau$ leptons. This contribution will give an overview of the FTK system architecture and present the status of the system's commissioning. A brief look at the expected performance of the FTK will also be given.
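The first stage of this two-stage approach (coarse pattern matching before track fitting) can be sketched in Python. Hits are converted to coarse "superstrip" IDs. A stored pattern fires when its superstrips are found in at least a minimum number of layers. The superstrip width, the number of layers and the majority threshold are illustrative assumptions. The AM chips compare all patterns in parallel in hardware rather than in a loop.

```python
def to_superstrip(position, width):
    """Coarse-resolution ID of a hit position within one layer."""
    return int(position // width)


def match_patterns(hits_per_layer, patterns, width, min_layers):
    """Return the indices of patterns ("roads") matched in >= min_layers layers.

    hits_per_layer: list (one entry per layer) of hit positions.
    patterns: list of tuples, one superstrip ID per layer.
    """
    fired = [{to_superstrip(h, width) for h in layer} for layer in hits_per_layer]
    roads = []
    for index, pattern in enumerate(patterns):
        matched = sum(1 for layer, ss in enumerate(pattern) if ss in fired[layer])
        if matched >= min_layers:
            roads.append(index)
    return roads


# Toy event with four layers and a three-pattern bank (all values made up).
hits = [[12.0], [25.5], [31.0, 78.0], [44.0]]
bank = [(1, 2, 3, 4), (1, 2, 7, 9), (5, 6, 7, 8)]
```

Only the hits belonging to the returned roads would be passed on to the fitting stage. Requiring fewer than all layers tolerates detector inefficiencies at the cost of more roads.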
        Speaker: Naoki Kimura (Aristotle Univ. of Thessaloniki (GR))
      • 36
        GeantV: from CPU to accelerators
        The GeantV project aims to research and develop the next-generation simulation software describing the passage of particles through matter, targeting not only modern CPU architectures but also more exotic resources such as GPGPUs, Intel® Xeon Phi, Atom or ARM, which can no longer be ignored for HEP computing. While the proof-of-concept GeantV prototype has been engineered mainly for CPU threads, we foresaw from the early stages a bridge to such accelerators, materialized in the form of architecture- and technology-specific backend templates. This approach makes it possible not only to abstract basic types such as scalar/vector, but also to formalize generic computation kernels that transparently use library- or device-specific constructs based on Vc, CUDA, Cilk+ or Intel intrinsics. While the main goal of this approach is performance and access to functionality, it also brings, as a bonus, insulation of the core application and algorithms from the technology layer, keeping our application maintainable in the long term and adaptable to changes on the backend side. The talk will present the first results of basket-based GeantV geometry navigation on the Intel® Xeon Phi KNC architecture, as well as the work done to make the transport NUMA-aware. We will present a detailed scalability and vectorization study conducted using Intel performance tools, as well as our preliminary conclusions on the use of accelerators for GeantV transport. We will also describe the current work and preliminary results on using the GeantV transport kernel on GPUs.
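The backend-template idea can be illustrated, loosely, in Python. A geometry kernel is written once against a small backend interface and runs unchanged on scalars or on a vector type. GeantV itself does this with C++ templates over Vc, CUDA and similar backends. The Python classes below (`Vec`, `ScalarBackend`, `VectorBackend`) and the chosen kernel are hypothetical stand-ins for that mechanism.

```python
import math
import operator


class Vec:
    """Minimal stand-in for a SIMD vector: element-wise arithmetic on a list."""

    def __init__(self, values):
        self.values = list(values)

    def _apply(self, other, op):
        if isinstance(other, Vec):
            return Vec(op(a, b) for a, b in zip(self.values, other.values))
        return Vec(op(a, other) for a in self.values)

    def __add__(self, other):
        return self._apply(other, operator.add)

    def __mul__(self, other):
        return self._apply(other, operator.mul)

    def __rsub__(self, other):  # scalar - Vec
        return Vec(other - a for a in self.values)


class ScalarBackend:
    @staticmethod
    def sqrt(x):
        return math.sqrt(x)


class VectorBackend:
    @staticmethod
    def sqrt(v):
        return Vec(math.sqrt(a) for a in v.values)


def safety_inside_sphere(backend, x, y, z, radius):
    """Generic kernel: distance from an inside point to the sphere's surface.
    The same source works for one point (floats) or a basket of points (Vec)."""
    return radius - backend.sqrt(x * x + y * y + z * z)
```

The kernel contains no backend-specific code. Supporting a new device would only require a new backend class, which mirrors the insulation of algorithms from the technology layer described above.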
        Speaker: Andrei Gheata (CERN)
      • 37
        Software and Data Citation in High Energy Physics - Current Practices and Ideas for the Future
        High Energy Physics (HEP) is well known as a "Big Data" science, but it should also be seen as a "Big Software" enterprise. For example, to support the activities of the Large Hadron Collider at the European Laboratory for Particle Physics (CERN), tens of millions of lines of code have been written by thousands of researchers and engineers over the past 20 years. The wider scientific community has been investigating the development of standards for software and data citation. For software, such standards can help with the attribution of credit to individuals for their contributions, and also provide metrics for assessing the impact of specific software. In addition, emerging expectations regarding data and software preservation, and the reproducibility of scientific results, require greater attention to the software and data samples used. In this presentation, we will review current practices and initiatives for software and data citation and attribution in HEP. We will then explore how ideas being discussed in the wider scientific community could be applied in HEP and what could be gained in the process.
        Speaker: Peter Elmer (Princeton University (US))
    • Track 3: Computations in theoretical Physics: Techniques and Methods
      • 38
        Fun with higher-loop Feynman diagrams
        In high-energy physics experiments performed at current colliders such as the LHC, the flood of precision data requires matching theoretical efforts in order to extract the underlying event's structure. To this end, in this talk I will showcase a few techniques and results related to investigations of the structure of higher-loop Feynman integrals, which provide one of the basic building blocks of high-precision perturbative calculations in elementary particle physics. I will discuss new results at the current (five-)loop frontier, pointing out some interesting links to different areas of mathematics such as graph theory and number theory.
        Speaker: York Schröder (UBB Chillán)
      • 39
        Geometrical splitting and reduction of Feynman diagrams
        A geometrical approach to the calculation of N-point Feynman diagrams is reviewed. It is shown that the geometrical splitting yields useful connections between Feynman integrals with different momenta and masses. It is demonstrated how these results can be used to reduce the number of variables in the occurring functions.
        Speaker: Andrei Davydychev
      • 40
        Alternative method of Reduction of the Feynman Diagrams to a set of Master Integrals
        We propose a new set of Master Integrals which can be used as a basis for multiloop calculations in any massless gauge field theory. In these theories we consider three-point Feynman diagrams with an arbitrary number of loops. The corresponding multiloop integrals may be decomposed in terms of this set of Master Integrals. We construct a new reduction procedure, which we apply to perform this decomposition.
        Speaker: Igor Kondrashuk (UBB)
      • 3:15 PM
        Coffee break
      • 41
        Six loop beta function in $\phi^4$ model
        Using the R* operation, integration by parts (IBP) and integration of hyperlogarithms, we calculate the 6-loop beta function of the $\phi^4$ model. One of the remarkable features of this result is that the 6-loop term contains multiple zeta values. We discuss different aspects of this calculation, as well as series resummation and predictions for the 6-loop term based on the 1979 paper by Kazakov, Shirkov and Tarasov.
        Speaker: Mikhail Kompaniets (St. Petersburg State University (RU))
      • 42
        Evaluation of the bottom quark mass from Upsilon(1S)
        We present an extraction of the $\overline{\mathrm{MS}}$ mass of the bottom quark from the Upsilon(1S) system. We account for the leading renormalon effects in the extraction. We work in the renormalon-subtracted scheme in order to control the divergence of the perturbation series arising from the pole-mass renormalon, and we carefully take charm-quark effects into account.
        Speaker: Dr Gorazd Cvetic (UTFSM, Valparaiso)
      • 43
        Higgs boson production in association with jets in gluon-gluon fusion
        After the discovery of a Higgs boson during Run I at the LHC, Higgs physics has entered an era of precision measurements. Among the different production channels, gluon-gluon fusion (ggf) is the largest, and it also constitutes an irreducible background to the very important vector boson fusion process. A precise knowledge of the ggf channel is therefore fundamental. In this talk I will present detailed results for the production of a Standard Model Higgs boson in association with up to 3 jets, and the techniques that made this computation possible.
        Speaker: Gionata Luisoni (CERN)
      • 44
        The flavour dependence for the four-loop QCD correction to the relation between pole and running heavy quark masses
        Semi-analytical expressions for the flavour dependence of the $O(\alpha_s^4)$ QCD correction to the relation between pole and running heavy-quark masses are obtained using the least-squares method. The results are compared with estimates obtained with the help of different approaches. The asymptotic structure of the presented perturbative series is discussed. The necessity of performing extra analytical calculations to decrease the theoretical uncertainties is emphasized. This should be of particular importance for the determination of the top-quark mass.
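At four loops, the coefficient of the pole/running mass relation is a cubic polynomial in the number of light flavours $n_l$. Its flavour dependence can therefore be obtained by fitting such a polynomial. A minimal least-squares sketch in Python follows, solving the normal equations by Gaussian elimination. The input values are synthetic, generated from made-up coefficients; they are not the published four-loop results.

```python
def polyfit_least_squares(xs, ys, degree):
    """Fit y = sum_k c_k x^k by solving the normal equations (A^T A) c = A^T y."""
    n = degree + 1
    ata = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    aty = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    # Forward elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(ata[r][col]))
        ata[col], ata[pivot] = ata[pivot], ata[col]
        aty[col], aty[pivot] = aty[pivot], aty[col]
        for row in range(col + 1, n):
            factor = ata[row][col] / ata[col][col]
            for k in range(col, n):
                ata[row][k] -= factor * ata[col][k]
            aty[row] -= factor * aty[col]
    # Back substitution.
    coeffs = [0.0] * n
    for row in range(n - 1, -1, -1):
        s = sum(ata[row][k] * coeffs[k] for k in range(row + 1, n))
        coeffs[row] = (aty[row] - s) / ata[row][row]
    return coeffs


# Synthetic example: values generated from made-up coefficients c0..c3.
n_l_values = [0, 1, 2, 3, 4, 5]
made_up = [10.0, -2.0, 0.5, -0.01]
samples = [sum(c * n ** k for k, c in enumerate(made_up)) for n in n_l_values]
fit = polyfit_least_squares(n_l_values, samples, degree=3)
```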
        Speaker: Dr Andrei Kataev (Institute for Nuclear Research of the Russian Academy of Sciences)
    • Track 2: Data analysis - Algorithms and Tools
      • 45
        Development of Machine Learning Tools in ROOT
        ROOT, a data analysis framework, provides advanced statistical methods needed by the LHC experiments to analyze their data. These include machine learning tools for classification, regression and clustering, which are provided by TMVA, a toolkit for multivariate analysis within ROOT. We will present recent developments in TMVA and new interfaces between ROOT/TMVA and other well-known statistical tools based on R and Python. We will show a new modular design of TMVA that gives users a lot of flexibility, as well as novel features for cross-validation, variable selection and parallelism.
        Speakers: Lorenzo Moneta (CERN), Omar Andres Zapata Mesa (Metropolitan Institute of Technology)
      • 46
        Support Vector Machines and generalisation in HEP
        We review the concept of support vector machines (SVMs) and discuss examples of their use in a number of scenarios. One benefit of SVM algorithms, compared with neural networks and decision trees, is that they can be less susceptible to overfitting than those algorithms are to overtraining. This issue is related to the generalisation of a multivariate algorithm (MVA), a problem that has often been overlooked in particle physics. We discuss cross-validation and how it can be used to improve the generalisation of an MVA in the context of High Energy Physics analyses. The examples presented use the Toolkit for Multivariate Analysis (TMVA) based on ROOT and describe our improvements to the SVM functionality and new tools introduced for cross-validation within this framework.
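k-fold cross-validation can be sketched in a few lines of Python. The events are split into k disjoint folds. Each fold is held out once for evaluation while the classifier is trained on the rest. The spread of the k scores then indicates how well the method generalises. The toy threshold classifier and all function names are hypothetical and independent of the TMVA implementation.

```python
import random


def k_fold_indices(n_events, k, seed=0):
    """Shuffle event indices and deal them into k disjoint folds."""
    indices = list(range(n_events))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(xs, labels, k, train, evaluate, seed=0):
    """Train on k-1 folds, score on the held-out fold; return one score per fold."""
    scores = []
    for test_fold in k_fold_indices(len(xs), k, seed):
        held_out = set(test_fold)
        train_idx = [i for i in range(len(xs)) if i not in held_out]
        model = train([xs[i] for i in train_idx], [labels[i] for i in train_idx])
        scores.append(evaluate(model,
                               [xs[i] for i in test_fold],
                               [labels[i] for i in test_fold]))
    return scores


def train_midpoint(xs, labels):
    """Toy 1D classifier: cut halfway between the signal and background means."""
    sig = [x for x, y in zip(xs, labels) if y == 1]
    bkg = [x for x, y in zip(xs, labels) if y == 0]
    return (sum(sig) / len(sig) + sum(bkg) / len(bkg)) / 2


def accuracy(cut, xs, labels):
    """Fraction of events classified correctly by `x > cut`."""
    return sum((x > cut) == (y == 1) for x, y in zip(xs, labels)) / len(xs)


xs = [0.1, 0.2, 0.3, 0.4, 0.9, 1.0, 1.1, 1.2]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
scores = cross_validate(xs, labels, k=4, train=train_midpoint, evaluate=accuracy)
```

Every event is used for testing exactly once, so no event contributes to the score of a model that was trained on it.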
        Speaker: Thomas James Stevenson (University of London (GB))
      • 47
        Approximating Decomposed Likelihood Ratios using Machine Learning
        In High Energy Physics and many other fields, likelihood ratios are a key tool for reporting results from an experiment. Evaluating the likelihood ratio requires the likelihood function. However, it is common in HEP to have complex simulations that describe the distribution of the data without providing a likelihood that can be evaluated directly. These simulations produce high-dimensional observations by emulating the underlying physics of the process. In this setting, evaluating the likelihood is commonly impossible or computationally expensive. We show how this problem can be solved by using discriminative classifiers to construct an equivalent version of the likelihood ratio that can be easily evaluated. We also show how this can be used to approximate the likelihood ratio when the underlying distribution is a weighted sum of probability distributions (e.g. a signal plus background model). We demonstrate how the results can be considerably improved by decomposing the test and applying a set of classifiers pairwise to the components of the mixture model, and how this can be used to estimate the unknown coefficients of the model (e.g. the signal contribution). Finally, we present an application of the method to the estimation of non-SM coupling constants of the Higgs boson based on an effective field theory (EFT) approach, using a recently developed morphing method.
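The two ingredients can be checked numerically in a short Python sketch. First, suppose a classifier is trained to separate samples from $p_a$ and $p_b$ (in equal proportions) and outputs $s(x) = p_a(x)/(p_a(x)+p_b(x))$. Then $p_a/p_b = s/(1-s)$. Second, for mixtures $p(x|\theta) = \sum_i w_i(\theta)\,p_i(x)$, the ratio decomposes into pairwise component ratios:
$$\frac{p(x|\theta_0)}{p(x|\theta_1)} = \sum_i \Big[\sum_j \frac{w_j(\theta_1)}{w_i(\theta_0)}\,\frac{p_j(x)}{p_i(x)}\Big]^{-1}.$$
To keep the sketch self-contained, the trained classifiers are replaced by their ideal (Bayes-optimal) outputs, computed from known 1D Gaussians. The component shapes and weights are made up.

```python
import math


def gauss(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def signal(x):
    return gauss(x, 0.0, 1.0)


def background(x):
    return gauss(x, 2.0, 1.5)


def ideal_classifier(p_a, p_b, x):
    """Stand-in for a trained classifier separating p_a (label 1) from p_b (label 0)."""
    return p_a(x) / (p_a(x) + p_b(x))


def ratio_from_score(s):
    """Likelihood-ratio trick: s = p_a / (p_a + p_b)  =>  p_a / p_b = s / (1 - s)."""
    return s / (1.0 - s)


def decomposed_ratio(x, components, w0, w1):
    """p(x|theta0) / p(x|theta1) for mixtures, built only from pairwise ratios p_j/p_i."""
    total = 0.0
    for i, p_i in enumerate(components):
        if w0[i] == 0:
            continue  # a component absent under theta0 contributes nothing
        inner = sum(w1[j] / w0[i] * ratio_from_score(ideal_classifier(p_j, p_i, x))
                    for j, p_j in enumerate(components))
        total += 1.0 / inner
    return total


components = [signal, background]
w0, w1 = [0.3, 0.7], [0.1, 0.9]  # made-up signal/background fractions
```

Each pairwise classifier only has to separate two simple components rather than two full mixtures, which is the motivation for decomposing the test.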
        Speaker: Juan Guillermo Pavez Sepulveda (Federico Santa Maria Technical University (CL))
      • 3:15 PM
        Coffee Break
      • 48
        Density Estimation Trees as fast non-parametric modelling tools
        Density Estimation Trees (DETs) are decision trees trained on a multivariate dataset to estimate its probability density function. While not competitive with kernel techniques in terms of accuracy, they are extremely fast, embarrassingly parallel and relatively small when stored to disk. These properties make DETs appealing given the increasingly resource-expensive outlook for LHC data analysis. Possible applications may include selection optimization, fast simulation and fast detector calibration. In this contribution I describe the basics of the algorithm and a hybrid, multi-threaded implementation relying on RooFit for the training and on plain C++ for the evaluation of the density estimate. A set of applications under discussion within the LHCb Collaboration is also briefly illustrated.
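A one-dimensional toy version shows why evaluation is so cheap. Training recursively splits the domain. Evaluating the density is then a walk down the tree to a leaf, whose estimate is the leaf's event count divided by the total count and the leaf width. For simplicity, this Python sketch splits at the median of the events in each node. DETs as usually defined instead choose splits that minimise an integrated-squared-error loss. The sketch is not the RooFit-based implementation described above.

```python
def build_det(points, low, high, min_leaf=2):
    """Recursively split [low, high) at the median; stop when a node is small."""
    pts = sorted(points)
    if len(pts) < 2 * min_leaf:
        return ("leaf", low, high, len(pts))
    cut = pts[len(pts) // 2]
    if cut <= low:  # no useful split (e.g. repeated values at the edge)
        return ("leaf", low, high, len(pts))
    left = [p for p in pts if p < cut]
    right = [p for p in pts if p >= cut]
    return ("node", cut,
            build_det(left, low, cut, min_leaf),
            build_det(right, cut, high, min_leaf))


def det_density(tree, x, n_total):
    """Descend to the leaf containing x; density = count / (N * leaf width)."""
    while tree[0] == "node":
        _, cut, left, right = tree
        tree = left if x < cut else right
    _, low, high, count = tree
    return count / (n_total * (high - low))


sample = [i / 100 for i in range(100)]  # evenly spaced toy sample on [0, 1)
tree = build_det(sample, 0.0, 1.0)
```

For this uniform toy sample, the estimated density is close to 1 everywhere in [0, 1). Each evaluation costs only a few comparisons, one per tree level.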
        Speaker: Lucio Anderlini (Universita e INFN, Firenze (IT))
      • 49
        Boosted Decision Tree Reweighter
        Machine learning tools are commonly used in high energy physics (HEP) nowadays. In most cases, these are classification models based on ANNs or BDTs, which are used to select "signal" events from data. These classification models are usually trained using Monte Carlo (MC) simulated events. A frequently used method in HEP analyses is reweighting MC to reduce the discrepancy between real processes and simulation. Typically this is done via the so-called "histogram division" approach. While very simple, this method has strong limitations in its applications. Recently, classification ML tools were successfully applied to this problem [1]. In sociology, too, ML-based survey reweighting is used to reduce non-response bias [2]. In my talk I will present a novel reweighting method, a modification of the BDT algorithm that alters the procedures of boosting and decision tree building. This method outperforms known reweighting approaches and makes it possible to reweight dozens of variables. When compared on the same problems, it requires less data. The other part of my talk is devoted to the proper use of reweighting in physics analysis, in particular to correctly measuring the quality of reweighting. [1] Martschei, D., et al. "Advanced event reweighting using multivariate analysis." Journal of Physics: Conference Series, Vol. 368, No. 1, IOP Publishing, 2012. [2] Kizilcec, R. "Reducing non-response bias with survey reweighting: Applications for online learning researchers." Proceedings of the First ACM Conference on Learning @ Scale, ACM, 2014.
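For reference, the baseline "histogram division" approach can be sketched in Python for a single variable. Each MC event receives, as its weight, the ratio of the normalised data and MC histograms in its bin. The bin edges, helper names and toy samples are illustrative assumptions. The method's limitation is visible in its structure: with several variables, the number of bins grows exponentially and most bins become sparsely populated.

```python
import bisect


def bin_index(value, edges):
    """Index of the bin containing value, or None if outside [edges[0], edges[-1])."""
    i = bisect.bisect_right(edges, value) - 1
    return i if 0 <= i < len(edges) - 1 else None


def histogram(values, edges):
    counts = [0] * (len(edges) - 1)
    for v in values:
        i = bin_index(v, edges)
        if i is not None:
            counts[i] += 1
    return counts


def histogram_division_weights(mc, data, edges):
    """Per-event MC weights: normalised data histogram / normalised MC histogram."""
    h_mc, h_data = histogram(mc, edges), histogram(data, edges)
    n_mc, n_data = sum(h_mc), sum(h_data)
    ratio = [(d / n_data) / (m / n_mc) if m > 0 else 0.0
             for m, d in zip(h_mc, h_data)]
    weights = []
    for v in mc:
        i = bin_index(v, edges)
        weights.append(ratio[i] if i is not None else 0.0)
    return weights


edges = [0.0, 1.0, 2.0]
mc = [0.5, 0.5, 0.5, 1.5]  # toy MC: too many events in the first bin
data = [0.5, 1.5]
weights = histogram_division_weights(mc, data, edges)
```

After reweighting, the toy MC has equal weighted content in both bins, matching the data shape.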
        Speaker: Aleksei Rogozhnikov (Yandex School of Data Analysis (RU))
      • 50
        Ring-shaped Calorimetry Information for a Neural e/$\gamma$ Identification with ATLAS Detector
        After the successful operation of the Large Hadron Collider, which resulted in the discovery of the Higgs boson, a new data-taking period (Run 2) has started. For the first time, collisions are produced at a centre-of-mass energy of 13 TeV. The luminosity is foreseen to increase, reaching values as high as $10^{34}\,\mathrm{cm}^{-2}\,\mathrm{s}^{-1}$ already in 2015. These changes in experimental conditions provide a favourable environment for possible key new-physics findings. ATLAS is the largest LHC detector and was designed for general-purpose physics studies. Many potential physics channels have electrons or photons in their final states. Efficient studies of these channels require precise measurement and identification of such particles. The identification task consists of disentangling these particles (signal) from collimated hadronic jets (background). The reported work concerns the identification process based on calorimetric quantities. We propose the use of ring-shaped calorimetry information, which exploits the shower-shape development throughout the calorimeter. This information is fed into a multivariate discriminator, currently an artificial neural network, responsible for hypothesis testing. The proposal is considered both for the offline reconstruction, performed after data storage, and for the online trigger, which reduces the storage rate to viable levels while preserving collision events containing the desired signals. Specifically, this ring description of calorimeter data may be used in the ATLAS High-Level Trigger as a calorimeter-based preselection at the first step of the trigger chain.
Preliminary studies on Monte Carlo suggest that the fake rate can be reduced by as much as 50% relative to the current methods used in the High-Level Trigger, allowing high-latency reconstruction algorithms such as tracking to run over regions of interest at a later stage of the trigger.
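As an illustration of the ring-shaped description, the Python sketch below sums the energies of one calorimeter layer in concentric square rings of cells around the hottest cell. It then normalises the resulting vector, which could serve as input to a neural network. The square-ring geometry, the normalisation by the total ring energy and the omission of the $\phi$ wrap-around are simplifying assumptions, not details of the ATLAS implementation.

```python
def ring_sums(cells, n_rings):
    """cells: 2D list of cell energies (rows ~ eta, columns ~ phi) for one layer.
    Ring r collects the cells at Chebyshev distance r from the hottest cell."""
    n_rows, n_cols = len(cells), len(cells[0])
    hot_i, hot_j = max(((i, j) for i in range(n_rows) for j in range(n_cols)),
                       key=lambda ij: cells[ij[0]][ij[1]])
    rings = [0.0] * n_rings
    for i in range(n_rows):
        for j in range(n_cols):
            r = max(abs(i - hot_i), abs(j - hot_j))
            if r < n_rings:
                rings[r] += cells[i][j]
    total = sum(rings)
    # Normalise (one possible choice) so the features describe the shower shape
    # rather than its overall energy scale.
    return [e / total for e in rings] if total > 0 else rings


layer = [[1.0, 1.0, 1.0],
         [1.0, 10.0, 1.0],
         [1.0, 1.0, 1.0]]
features = ring_sums(layer, n_rings=2)
```

A narrow electromagnetic shower concentrates its energy in the first rings, whereas a hadronic jet spreads it outwards. This difference is what the discriminator can learn from such features.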
        Speaker: Joao Victor Da Fonseca Pinto (Univ. Federal do Rio de Janeiro (BR))
      • 51
        Multivariate Analysis for particle identification in a High-Granularity Semi-Digital Hadronic Calorimeter for ILC Experiments
        The Semi-Digital Hadronic CALorimeter (SDHCAL), using Glass Resistive Plate Chambers (GRPCs), is one of the two hadronic calorimeter options proposed by the ILD (International Large Detector) project for future experiments at the International Linear Collider (ILC). It is a sampling calorimeter with 48 layers. Each layer has a size of 1 m² and is finely segmented into cells of 1 cm², ensuring the high granularity required for the application of the Particle Flow Algorithm (PFA). The PFA improves the jet energy resolution, which is a cornerstone of the ILC experiments. The SDHCAL electronics provide a 2-bit readout and are equipped with a power-pulsing mode, which reduces power consumption and thus heating-related problems. The performance of the SDHCAL technological prototype was tested successfully in several beam tests at CERN during 2012, 2014 and 2015. The pion test-beam data taken at CERN suffer from significant contamination by muons and electrons, which must be drastically reduced in order to study the hadronic showers and reconstruct their energy. For this purpose, a selection based on simple cuts on topological variables is applied to single out the pions. Furthermore, to achieve better particle identification, several multivariate methods provided by the TMVA toolkit were tested on Monte Carlo simulation. The main classification methods used to separate the signal from the background events were the Neural Network method, with the Multilayer Perceptron class (MLP), and the Boosted Decision Tree method. A comparison of MVA-based cuts with the traditional cuts will be shown and discussed. Preliminary tests indicate that this technique is promising and could be reliably applied to real data analysis.
        Speaker: Mrs Sameh Mannai (Université Catholique de Louvain, Belgium)
    • Plenary III
    • 10:00 AM
      Coffee Break
    • Plenary III
      • 54
        Arantxa Ruiz Martinez — The Run-2 ATLAS Trigger System
        Speaker: Arantxa Ruiz Martinez (Carleton University (CA))
      • 55
        Andrei Arbuzov — Computer system SANC: its development and applications
      • 56
        Sergei Gleyzer — Evolution of Machine Learning methods and tools in HEP. CANCELED
        Speaker: Dr Sergei Gleyzer (University of Florida (US))
    • 12:45 PM
      Lunch Break
    • Excursion
    • Plenary IV
    • 10:00 AM
      Coffee Break
    • Plenary IV
      • 59
        Simon Badger — Automating QCD amplitude computations
        Speaker: Simon David Badger (University of Edinburgh (GB))
      • 60
        Karim Pichara — Data science for astronomy
      • 61
        Gregory Bell — Discovery, Unconstrained by Geography
        Speaker: Gregory Bell (Lawrence Berkeley National Laboratory)
    • 12:45 PM
      Lunch Break
    • Track 1: Computing Technology for Physics Research
      • 62
        Using NERSC High-Performance Computing (HPC) systems for high-energy nuclear physics applications
        High-Performance Computing Systems are powerful tools tailored to support large-scale applications that rely on low-latency inter-process communications to run efficiently. By design, these systems often impose constraints on application workflows, such as limited external network connectivity and whole node scheduling, that make more general-purpose computing tasks, such as those commonly found in high-energy nuclear physics applications, more difficult to carry out. In this work, we present a tool designed to simplify access to such complicated environments by handling the common tasks of job submission, software management, and local data management, in a framework that is easily adaptable to the specific requirements of various computing systems. The tool, initially constructed to process stand-alone ALICE simulations for detector and software development, was successfully deployed on the NERSC computing systems, Carver, Hopper and Edison, and is being configured to provide access to the next generation NERSC system, Cori. In this report, we describe the tool and discuss our experience running ALICE applications on NERSC HPC systems. The discussion will include our initial benchmarks of Cori compared to other systems and our attempts to leverage the new capabilities offered with Cori to support data-intensive applications, with a future goal of full integration of such systems into ALICE grid operations.
        Speaker: Markus Fasel (Lawrence Berkeley National Lab. (US))
      • 63
        Multi-resource planning: Simulations and study of a new scheduling approach for distributed data production in High Energy and Nuclear Physics
        Distributed data processing has found applications in many fields of science (High Energy and Nuclear Physics (HENP), astronomy and biology, to name only a few). We have focused our research on distributed data production, which is an essential part of computations in HENP. Building on our previous experience, we have recently proposed a new scheduling approach for distributed data production based on the network flow maximization model. It has polynomial complexity, which provides the required scalability with respect to the size of the computations. Our approach improves the overall data production throughput due to three factors. First, input files are transferred in advance of their processing, which decreases I/O latency. Second, the network traffic is balanced, including splitting the load between several alternative transfer paths. Third, files are transferred sequentially in a coordinated manner, which reduces the influence of possible network bottlenecks. In this contribution, we intend to present the results of our new simulations based on the GridSim framework, one of the commonly used tools in the field of distributed computations. In these simulations we study the behavior of commonly used scheduling approaches compared to our recently proposed approach, in a realistic environment created using data from the STAR and ALICE experiments. We will also discuss how data production can be optimized with respect to possible bottlenecks (network, storage, CPUs) and study the influence of background traffic on the simulated schedulers. The final goal of this research is to integrate the proposed scheduling approach into the real data production framework. To achieve this, we are constantly moving our simulations towards real use cases and studying the scalability of the model and the influence of the scheduling parameters on the quality of the solution.
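Network-flow models of this kind rest on maximum-flow algorithms that run in polynomial time. A textbook example is the Edmonds-Karp algorithm. It repeatedly augments the flow along shortest paths of the residual graph and runs in $O(VE^2)$ time. The Python sketch below applies it to a made-up toy network in which input data at a storage site (`src`) reaches processing capacity (`sink`) via two sites, one of which can forward data to the other. It illustrates the building block only, not the authors' scheduling model.

```python
from collections import defaultdict, deque


def max_flow(capacity, source, sink):
    """Edmonds-Karp: augment along shortest residual paths (BFS) until none remain."""
    residual = defaultdict(lambda: defaultdict(float))
    for u, edges in capacity.items():
        for v, c in edges.items():
            residual[u][v] += c
    flow = 0.0
    while True:
        # Breadth-first search for a shortest augmenting path.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, c in residual[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return flow
        # Bottleneck capacity along the path found.
        v, bottleneck = sink, float("inf")
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        # Push the flow and update forward/backward residual capacities.
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        flow += bottleneck


capacity = {
    "src": {"A": 10, "B": 5},   # transfer capacity from storage to each site
    "A": {"B": 15, "sink": 4},  # site A can also forward data to site B
    "B": {"sink": 10},          # processing capacity at site B
}
```

The maximum flow here is 14, limited by the processing capacities of the two sites (4 + 10). The extra capacity of site B is reached only by routing part of A's input through A to B, which illustrates the splitting of load between alternative transfer paths mentioned above.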
        Speaker: Dzmitry Makatun (Faculty of Nuclear Physics and Physical Engineering, Czech Technical University in Prague)
      • 64
        A scalable architecture for online anomaly detection of WLCG batch jobs
        For data centres it is increasingly important to monitor network usage and to learn from network usage patterns. In particular, configuration issues or misbehaving jobs that prevent smooth operation need to be detected as early as possible. At the GridKa Tier 1 centre we therefore operate a tool for monitoring traffic data and characteristics of WLCG jobs and pilots locally on different worker nodes. On the one hand, local information alone is not sufficient to detect anomalies, for several reasons: e.g. the underlying job distribution on a single worker node might change, or there might be a local misconfiguration. On the other hand, a centralised anomaly detection approach does not scale in terms of either network communication or computational costs. We therefore propose a scalable architecture based on the concepts of a super-peer network. The contribution discusses different issues regarding the optimisation of computational costs, network overhead and accuracy of anomaly detection. Based on simulations, we will show the influence of different parameters, e.g. network size and location of computation, but also characteristics of WLCG batch jobs. The simulations are based on real batch job network traffic data collected over several months.
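A minimal Python sketch of the super-peer idea follows, under our own assumptions. Worker nodes are partitioned into groups. Each group is handled by a super-peer that compares its members' traffic against the group using a robust score (median and median absolute deviation), so only per-group results need to travel further. The grouping, the scoring rule, the 3.5 threshold (a common rule of thumb for this score) and the node names are hypothetical; the abstract does not specify the detection method.

```python
import statistics


def assign_super_peers(nodes, group_size):
    """Partition worker nodes into groups; each group reports to one super-peer."""
    return [nodes[i:i + group_size] for i in range(0, len(nodes), group_size)]


def detect_anomalies(traffic, groups, threshold=3.5):
    """Flag nodes whose modified z-score within their group exceeds the threshold.

    The modified z-score is 0.6745 * (x - median) / MAD, with MAD the median
    absolute deviation; unlike a plain z-score it is not dominated by the outlier.
    """
    flagged = []
    for group in groups:  # in a deployment, each group is evaluated on its super-peer
        values = [traffic[node] for node in group]
        med = statistics.median(values)
        mad = statistics.median([abs(v - med) for v in values])
        if mad == 0:
            continue  # no spread in this group: the score is undefined
        for node in group:
            if abs(0.6745 * (traffic[node] - med) / mad) > threshold:
                flagged.append(node)
    return flagged


traffic_mb = {"wn01": 100, "wn02": 110, "wn03": 95,
              "wn04": 900, "wn05": 105, "wn06": 98}
groups = assign_super_peers(sorted(traffic_mb), group_size=3)
```

Each super-peer handles only its own group. Communication and computation therefore grow with the group size rather than with the full number of worker nodes.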
        Speaker: Manuel Giffels (KIT - Karlsruhe Institute of Technology (DE))
      • 3:15 PM
        Coffee break
      • 65
        Performance and Advanced Data Placement Techniques with Ceph’s Distributed Storage System
        The STAR online computing environment is a demanding, concentrated multi-purpose compute system whose objective is to obtain maximum throughput and process concurrency. Extending the STAR compute farm from a simple job-processing tool for data taking into a multi-purpose resource equipped with a large storage system would turn these dedicated resources into an extremely efficient and attractive multi-purpose facility. To achieve this goal, our compute farm uses the Ceph distributed storage system. Ceph has proven to be an agile solution thanks to its POSIX interface and the excellent I/O concurrency of its object storage. We have taken our cluster one step further by investigating and leveraging new technologies and key features of Ceph to squeeze out more performance. By acquiring a 10 Gb backbone network, we have eliminated the network as a limitation. With the further acquisition of large, fast drives (1 TB SSDs), we will also show how one can customize the placement of data and make good use of the I/O performance tuning options Ceph has to offer. Finally, we will discuss OSD pool mapping in the context of redundancy based on compute racks, rows, PDUs and other physical parameters. We will also present and discuss cost comparisons of our cluster with other traditional storage systems such as NAS and SAN, and the performance of using older hardware as one cooperative storage system. We will present our latest performance results as well as the stability, lessons learned and overall experience with the STAR Ceph cluster, and the steps taken to mitigate the problems we have come across.
Furthermore, we will present the tools we used to manage, maintain and monitor the Ceph cluster, such as the CFEngine configuration management tool and the Icinga infrastructure monitoring system. These give the STAR admins a bird’s-eye view of the cluster state and a centrally managed point to ensure configuration consistency. We hope our presentation will serve the community’s interest in the Ceph distributed storage solution.
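Placing replicas so that no two copies share a failure domain (here, a rack) can be illustrated in Python with rendezvous (highest-random-weight) hashing. Every OSD gets a deterministic pseudo-random score per object. Replicas go to the best-scoring OSDs, skipping racks already used. Ceph itself uses the CRUSH algorithm and configurable CRUSH rules for this; the sketch below is a simplified stand-in, and the OSD-to-rack map and object name are made up.

```python
import hashlib


def score(obj, osd):
    """Deterministic pseudo-random weight for an (object, OSD) pair."""
    digest = hashlib.sha256(f"{obj}:{osd}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def place_replicas(obj, osds, n_replicas):
    """Pick up to n_replicas OSDs for obj, at most one per rack (the failure domain).

    osds: mapping from OSD name to rack name.
    """
    chosen, used_racks = [], set()
    for osd in sorted(osds, key=lambda o: score(obj, o), reverse=True):
        if osds[osd] not in used_racks:
            chosen.append(osd)
            used_racks.add(osds[osd])
            if len(chosen) == n_replicas:
                break
    return chosen


osd_racks = {"osd.0": "rack1", "osd.1": "rack1", "osd.2": "rack2",
             "osd.3": "rack2", "osd.4": "rack3"}
placement = place_replicas("run-12345.daq", osd_racks, n_replicas=3)
```

Any client can recompute the placement from the object name and the OSD map alone, so no central lookup table is needed. Losing a whole rack still leaves two copies.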
        Speaker: Michael Poat (Brookhaven National Laboratory)
      • 66
        Experiments Toward a Modern Analysis Environment: Functional Programming Style, Scriptless, Continuous Integration, and Everything in Source Control
        A modern high energy physics analysis code is complex. As it has for decades, it must handle high-speed data I/O, corrections to physics objects applied at the last minute, and multi-pass scans to calculate some corrections. More recently, an analysis must regularly accommodate dataset sizes of several hundred GB, multivariate signal/background separation techniques, larger collaborative teams, and reproducibility and data preservation requirements. The result is often a series of scripts and separate programs stitched together by hand, or automated by small driver programs scattered around an analysis team’s working directories and disks. Worse, the code is often much harder to read and understand because most of it deals with these requirements rather than with the physics. This paper describes a framework that is built around the functional and declarative features of the C# language and its Language Integrated Query (LINQ) extensions to declare an analysis. The framework uses language tools to convert the analysis into C++ and runs ROOT or PROOF as a backend to determine the results. This gives the analyzer the full power of an object-oriented programming language to put together the analysis, and at the same time the speed of C++ for the analysis loop. The tool allows one to incorporate C++ algorithms written for ROOT by others. A by-product of the design is the ability to cache results between runs, dramatically reducing the cost of adding one more plot. The design also keeps a complete record associated with each plot, to aid data preservation and log-book annotation. The code is mature enough to have been used in ATLAS analyses. The package is open source and available on GitHub.
Recent improvements include the ability to run jobs on the Grid and access Grid datasets as a natural part of the analysis code, further tools to help with data preservation, and a first step towards incorporating tools such as TMVA, the multivariate analysis package in ROOT, into the code.
        Speaker: Gordon Watts (University of Washington (US))
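For the LINQ-based analysis framework above, the result cache can be illustrated with a minimal Python sketch; the framework itself is written in C#, so this is not its code. The sketch assumes that each plot is described by a hypothetical declarative `Query` value and that a cached result is keyed by a hash of a canonical text form of that query. `Query`, `ResultCache` and `toy_event_loop` are illustrative names, and the toy loop ignores the cuts.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Query:
    """Hypothetical declarative description of one plot (not the framework's C# API)."""
    dataset: str
    cuts: Tuple[str, ...]
    variable: str
    bins: Tuple[float, ...]

    def key(self) -> str:
        # Canonical JSON text -> SHA-256 digest, identical across program runs.
        text = json.dumps([self.dataset, list(self.cuts), self.variable, list(self.bins)])
        return hashlib.sha256(text.encode("utf-8")).hexdigest()


@dataclass
class ResultCache:
    """In-memory stand-in for a cache that a real tool would persist to disk."""
    store: Dict[str, List[int]] = field(default_factory=dict)
    misses: int = 0

    def get(self, query: Query, run: Callable[[Query], List[int]]) -> List[int]:
        k = query.key()
        if k not in self.store:  # only never-seen queries pay for an event loop
            self.misses += 1
            self.store[k] = run(query)
        return self.store[k]


def toy_event_loop(query: Query) -> List[int]:
    """Stand-in for the expensive generated C++/ROOT loop: histogram toy values.
    The cuts are ignored in this toy."""
    values = [0.5, 1.5, 1.7, 2.2, 3.9]  # invented toy data, not real events
    counts = [0] * (len(query.bins) - 1)
    for v in values:
        for i in range(len(counts)):
            if query.bins[i] <= v < query.bins[i + 1]:
                counts[i] += 1
    return counts
```

Re-requesting an identical query returns the stored histogram without running the loop, so adding one more plot only costs a loop over the new query. Hashing a canonical serialization, rather than using Python's built-in `hash` (which is randomized per process for strings), keeps keys stable across processes, as a cache persisted between runs would require.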
      • 67
        Multi-threaded Software Framework Development for the ATLAS Experiment
        ATLAS's current software framework, Gaudi/Athena, has been very successful for the experiment in LHC Runs 1 and 2. However, its single-threaded design has been recognised for some time to be increasingly problematic as CPUs have increased core counts and decreased available memory per core. Even the multi-process version of Athena, AthenaMP, will not scale to the range of architectures we expect to use beyond Run 2. In 2014, ATLAS examined the requirements on an updated multi-threaded framework and laid out plans for a new framework, including better support for high level trigger (HLT) use cases. In this paper we report on our progress in developing the new multi-threaded, task-parallel extension of Athena, AthenaMT. Implementing AthenaMT has required many significant code changes. Progress has been made in updating key concepts of the framework to allow the incorporation of different levels of thread safety in algorithmic code (from un-migrated thread-unsafe code, to thread-safe copyable code, to reentrant code). Substantial advances have also been made in implementing a data-flow-centric design, which has fundamental implications for the structure of the framework, as well as in developing the new 'event views' infrastructure. These event views support partial event processing and are an essential component for the HLT's processing of certain regions of interest; we give results from early tests. A major effort has also been invested in an early version of AthenaMT that can run simulation on many-core architectures, which has augmented the understanding gained from earlier ATLAS demonstrators. We also discuss progress in planning the migration of the large ATLAS algorithmic code base to AthenaMT for Run 3.
        Speaker: Graeme Stewart (University of Glasgow (GB))
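The data-flow-centric design described for AthenaMT can be pictured as algorithms that declare which data objects they read and write, with a scheduler starting any algorithm whose inputs already exist. The Python sketch below only computes the resulting "waves" of algorithms that could run concurrently. It is an illustration under that assumption, not the AthenaMT scheduler, and the algorithm and data names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical declarations: algorithm name -> (data keys it reads, data keys it writes).
Algorithms = Dict[str, Tuple[Set[str], Set[str]]]


def schedule_waves(algs: Algorithms, initial: Set[str]) -> List[List[str]]:
    """Group algorithms into waves. Every algorithm in a wave has all of its inputs
    available before the wave starts, so the members of one wave could run concurrently."""
    available = set(initial)
    pending = dict(algs)
    waves: List[List[str]] = []
    while pending:
        ready = sorted(name for name, (inputs, _) in pending.items() if inputs <= available)
        if not ready:
            raise RuntimeError(f"inputs can never be satisfied for: {sorted(pending)}")
        for name in ready:
            available |= pending.pop(name)[1]  # outputs become visible to later waves
        waves.append(ready)
    return waves


# Illustrative event: two independent clusterings, then tracking, then electron ID.
EXAMPLE_ALGS: Algorithms = {
    "PixelClustering": ({"RawPixel"}, {"PixelClusters"}),
    "CaloClustering": ({"RawCalo"}, {"CaloClusters"}),
    "Tracking": ({"PixelClusters"}, {"Tracks"}),
    "ElectronID": ({"Tracks", "CaloClusters"}, {"Electrons"}),
}
```

With both raw inputs available, the two clustering algorithms land in the first wave and could run in parallel, while tracking and electron identification wait for their inputs. A cycle or a missing producer leaves algorithms that never become ready, which the function reports as an error instead of looping forever.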
      • 68
        The ATLAS EventIndex: data flow and inclusion of other metadata
        The ATLAS EventIndex is the catalogue of event-related metadata for the information obtained from the ATLAS detector. The basic unit of this information is the event record, containing the event identification parameters, pointers to the files containing the event, and trigger decision information. The main use cases for the EventIndex are event picking, providing information for the Event Service, and data consistency checks for large production campaigns. The EventIndex employs the Hadoop platform for data storage and handling, as well as a messaging system for the collection of information. The information for the EventIndex is collected both at Tier-0, when the data are first produced, and from the GRID, when various types of derived data are produced. The EventIndex uses various types of auxiliary information from other ATLAS sources for data collection and processing: trigger tables from the condition metadata database (COMA), dataset information from the data catalog AMI and the Rucio data management system, and information on production jobs from the ATLAS production system. The ATLAS production system is also used for the collection of event information from the GRID jobs. EventIndex development started in 2013, and in mid-2015 the system was commissioned and started collecting event metadata as part of ATLAS Distributed Computing operations.
        Speaker: Fedor Prokoshin (Federico Santa Maria Technical University (CL))
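To make the notion of an event record and of event picking concrete, here is a minimal Python sketch under stated assumptions: an in-memory dictionary stands in for the Hadoop store, and the field names (`run`, `event`, `pointers`, `trigger_bits`) are hypothetical rather than the actual EventIndex schema. Records for the same event arriving from different processing steps (for example RAW at Tier-0 and a derived format from the GRID) are merged, so one lookup returns the location of the event in any format.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class EventRecord:
    """Hypothetical EventIndex-style record; field names are illustrative only."""
    run: int
    event: int
    # data format (e.g. "RAW", "AOD") -> (file GUID, entry number inside that file)
    pointers: Dict[str, Tuple[str, int]] = field(default_factory=dict)
    trigger_bits: int = 0  # bit i is set if trigger chain i accepted the event


class EventIndex:
    """In-memory stand-in for the Hadoop-backed catalogue."""

    def __init__(self) -> None:
        self._records: Dict[Tuple[int, int], EventRecord] = {}

    def add(self, rec: EventRecord) -> None:
        key = (rec.run, rec.event)
        if key in self._records:
            # Same event seen in another format: merge the file pointers.
            existing = self._records[key]
            existing.pointers.update(rec.pointers)
            existing.trigger_bits |= rec.trigger_bits
        else:
            self._records[key] = rec

    def pick(self, run: int, event: int, fmt: str) -> Optional[Tuple[str, int]]:
        """Event picking: where is (run, event) stored in format fmt, if anywhere?"""
        rec = self._records.get((run, event))
        return None if rec is None else rec.pointers.get(fmt)

    def passed(self, run: int, event: int, chain: int) -> bool:
        """Did trigger chain number `chain` accept this event?"""
        rec = self._records.get((run, event))
        return rec is not None and bool((rec.trigger_bits >> chain) & 1)
```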
    • Track 2: Data analysis - Algorithms and Tools
      • 69
        Deconvolving the detector from an observed signal in Fourier space.
        In this talk we discuss algorithms for the analysis of hadronic final states, with application to 1) single top $t$-channel production and 2) a heavy Higgs decaying as $H\to WW$ in the lepton plus jets mode. In either case, nature has arranged for the triple-differential decay rate in the kinematic angles of the decay, $\theta$, $\theta^*$, and $\phi^*$, to be a short finite series in orthogonal functions, $a_{klm}Y_k^m(\theta, \phi^*)Y_l^m(\theta^*, \phi^*)$ (summation implied). This observation can be exploited in two ways: first, a technique called orthogonal series density estimation may be employed to extract the coefficients of the decay and the physics parameters related to these coefficients; second, an angular analog of the convolution theorem may be employed to analytically deconvolve detector resolution effects from an observed signal. The technique typically leads to likelihood contours in a multidimensional parameter space and a simultaneous determination of physics parameters. This talk discusses analysis techniques in an experiment-independent way.
        Speaker: Joseph Boudreau (University of Pittsburgh (US))
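The orthogonal series density estimation mentioned above can be sketched in one dimension. Assuming a single decay angle with no $\phi$ dependence, so that the $Y_l^m$ expansion reduces to Legendre polynomials in $x=\cos\theta$, each coefficient is simply a sample mean of $P_l(x)$. The `deconvolve` helper assumes, as an illustration of the angular convolution theorem, that a rotationally symmetric resolution multiplies the degree-$l$ coefficient by a known factor; those factors are an input here, not something the code derives.

```python
import numpy as np
from numpy.polynomial import legendre


def osde_coefficients(x, lmax: int) -> np.ndarray:
    """Estimate a_l in f(x) = sum_l a_l P_l(x), with x = cos(theta) in [-1, 1].

    Orthogonality, int_{-1}^{1} P_l P_m dx = 2/(2l+1) delta_lm, gives
    a_l = (2l+1)/2 * E[P_l(X)], so each coefficient is a sample mean.
    """
    p = legendre.legvander(np.asarray(x, dtype=float), lmax)  # columns P_0..P_lmax
    ell = np.arange(lmax + 1)
    return (2 * ell + 1) / 2 * p.mean(axis=0)


def osde_errors(x, lmax: int) -> np.ndarray:
    """Statistical (one standard deviation) uncertainty of each estimated a_l."""
    p = legendre.legvander(np.asarray(x, dtype=float), lmax)
    ell = np.arange(lmax + 1)
    return (2 * ell + 1) / 2 * p.std(axis=0, ddof=1) / np.sqrt(len(p))


def deconvolve(measured, response) -> np.ndarray:
    """Assumes the resolution multiplies the degree-l coefficient by response[l]
    (as for a rotationally symmetric smearing kernel); dividing undoes it."""
    return np.asarray(measured, dtype=float) / np.asarray(response, dtype=float)


def density(a, x):
    """Evaluate the truncated series sum_l a_l P_l(x)."""
    return legendre.legval(x, a)
```

For a density $(1+s\,x)/2$ the expected coefficients are $a_0 = 1/2$ and $a_1 = s/2$, with all higher ones zero, which gives a quick check of the estimator against samples drawn from that density.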
      • 70
        Vertex finding by sparse model-based clustering
        Vertex finding in the presence of high pile-up is an important step in reconstructing events at the LHC. In this study we propose a novel approach based on the model-based clustering paradigm. Using the prior distribution of the number of vertices and the prior distribution of the cluster size, i.e. the number of tracks emerging from a vertex, the posterior distribution of the number of vertices can be computed, given the actual number of reconstructed tracks in an event. Utilizing this posterior distribution, a sparse model-based clustering algorithm [1] is employed to compute the optimal association of tracks to vertices. It starts with a deliberately too large number of clusters and implicitly estimates the correct number during the iterative application of Markov chain Monte Carlo sampling with a shrinkage prior. We present results from a simplified simulation of vertices and tracks, obtained from a Pythia [2] simulation of proton-proton collisions, under various assumptions about the average pile-up. In addition, the sensitivity of the resulting clustering to the assumptions about the prior distributions is studied. [1] G. Malsiner-Walli, S. Frühwirth-Schnatter, B. Grün, Statistics and Computing (2014). [2] T. Sjöstrand, S. Mrenna and P. Skands, JHEP 05 (2006) 026; Comput. Phys. Commun. 178 (2008) 852.
        Speaker: Rudolf Fruhwirth (Austrian Academy of Sciences (AT))
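The posterior for the number of vertices can be sketched with one concrete choice of priors, which is an assumption made here for illustration only: the number of vertices $K$ is Poisson with the mean pile-up, and the number of tracks per vertex is Poisson with a fixed mean, so the total track count given $K=k$ is Poisson with mean $k$ times that mean. The code only evaluates $P(K \mid N)$ by Bayes' theorem; it does not implement the sparse MCMC clustering cited in the abstract.

```python
import math
from typing import List


def log_poisson(k: int, mean: float) -> float:
    """log P(k) for a Poisson distribution; a zero mean puts all mass on k = 0."""
    if mean == 0.0:
        return 0.0 if k == 0 else -math.inf
    return k * math.log(mean) - mean - math.lgamma(k + 1)


def vertex_posterior(n_tracks: int, mu_pileup: float, mean_tracks: float, k_max: int) -> List[float]:
    """P(K = k | N = n_tracks) for k = 0..k_max, assuming
    K ~ Poisson(mu_pileup) and tracks per vertex ~ Poisson(mean_tracks) independently,
    so that N | K = k ~ Poisson(k * mean_tracks)."""
    logs = [log_poisson(k, mu_pileup) + log_poisson(n_tracks, k * mean_tracks)
            for k in range(k_max + 1)]
    m = max(logs)
    weights = [math.exp(v - m) for v in logs]  # subtract the maximum for numerical stability
    total = sum(weights)
    return [w / total for w in weights]
```

Under this model a vertex may produce no tracks at all, so $K$ counts all collisions rather than only visible ones; a different cluster-size prior would change only the likelihood term.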
      • 71
        Novel real-time alignment and calibration and track reconstruction for the upgrade at the LHCb detector.
        LHCb has introduced a novel real-time detector alignment and calibration strategy for LHC Run 2. Data collected at the start of the fill are processed in a few minutes and used to update the alignment, while the calibration constants are evaluated for each run. This procedure will improve the quality of the online alignment. Critically, this new real-time alignment and calibration procedure allows identical constants to be used in the online and offline reconstruction, thus improving the correlation between triggered and offline-selected events. This offers the opportunity to optimise the event selection in the trigger by applying stronger constraints. The required computing time constraints are met thanks to a new dedicated framework using the multi-core farm infrastructure for the trigger. Combined with the improved tracking sequence, this makes it possible to run in the software trigger the same reconstruction as offline, with the same performance. Specific challenges of this novel configuration are discussed, as well as the working procedures of the framework and its performance. A similar scheme is planned for the LHCb upgrade foreseen for 2020. At that time LHCb will run at an instantaneous luminosity of 2×10^33 cm^-2 s^-1 with a fully software-based trigger and a read-out of the detector at a rate of 40 MHz. An entirely new tracking system is being developed: a vertex detector based on silicon pixel sensors, a new high-granularity silicon micro-strip detector, and a scintillating fibre tracker. The new, tighter time constraint in the trigger, where only about 13 ms are available per event, combined with a luminosity higher by a factor of 5, represents a big challenge for the tracking. A new track finding strategy has been considered, and new algorithms, partly based on GPUs and using SIMD instructions, are under study. 
We will present the new strategy and the new fast track reconstruction, including the performance and the highlights of the improvements with respect to the current tracking system of LHCb.
        Speaker: Renato Quagliani (Laboratoire de l'Accelerateur Lineaire (FR))
      • 72
        A novel method for event reconstruction in Liquid Argon Time Projection Chamber
        The Liquid Argon Time Projection Chamber (LArTPC) has the potential to provide an exceptional level of detail in studies of neutrino interactions, a high-priority field of Intensity Frontier research. Liquid Argon serves as both the target for neutrino interactions and the sensitive medium of the detector, which measures the ionization produced by the reaction products. The LArTPC has characteristics suitable for precise reconstruction of individual tracks as well as for calorimetric measurements. In order to gain sensitivity to reactions with very small cross-sections, modern LArTPC devices are built at a considerable scale, currently with hundreds of tons of instrumented Liquid Argon volume. Future experiments such as the Deep Underground Neutrino Experiment (DUNE) will include tens of kilotons of the cryogenic medium. To utilize such a large sensitive volume while staying within practical limits on the power consumption and cost of the front-end electronics, the detector is instrumented with arrays of wire electrodes grouped in readout planes arranged at a stereo angle. This leads to certain challenges for object reconstruction due to the ambiguities inherent in such a scheme. We present a novel reconstruction method inspired by principles used in tomography, which brings the LArTPC technology closer to its full potential.
        Speaker: Dr Maxim Potekhin (Brookhaven National Laboratory)
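The ambiguity that the wire-plane readout introduces can be seen in a toy two-dimensional slice at fixed drift time. In the Python sketch below, each plane is assumed to measure the continuous coordinate of a charge deposit perpendicular to its wires (real planes have a finite wire pitch), and the plane angles are illustrative values. Pairing hits from two planes yields every wire crossing, including "ghost" points; a third plane removes the ghosts in this simple case. This is the naive matching step that motivates better methods, not the tomography-inspired algorithm of the talk.

```python
import itertools
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def wire_coordinate(p: Point, angle: float) -> float:
    """Coordinate measured by a wire plane: projection of the point onto the
    direction perpendicular to its wires (angle in radians)."""
    return p[0] * math.cos(angle) + p[1] * math.sin(angle)


def intersect(u0: float, a0: float, u1: float, a1: float) -> Point:
    """Solve x cos a0 + y sin a0 = u0 and x cos a1 + y sin a1 = u1 (Cramer's rule)."""
    det = math.cos(a0) * math.sin(a1) - math.sin(a0) * math.cos(a1)
    x = (u0 * math.sin(a1) - math.sin(a0) * u1) / det
    y = (math.cos(a0) * u1 - u0 * math.cos(a1)) / det
    return (x, y)


def candidates(hits: Sequence[Sequence[float]], angles: Sequence[float],
               tol: float) -> Tuple[List[Point], List[Point]]:
    """Return (all crossings of planes 0 and 1, crossings confirmed by plane 2)."""
    two_plane = [intersect(u0, angles[0], u1, angles[1])
                 for u0, u1 in itertools.product(hits[0], hits[1])]
    confirmed = [p for p in two_plane
                 if any(abs(wire_coordinate(p, angles[2]) - u2) < tol for u2 in hits[2])]
    return two_plane, confirmed
```

With two deposits, the first two planes alone give four crossings, two of them ghosts. In a busy event a third plane can also match ghosts by accident, which is why more global reconstruction approaches are attractive.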
    • Track 3: Computations in theoretical Physics: Techniques and Methods
      • 73
        Physics beyond the Standard Model at the Precision Frontier
        The best way to search for new physics is by using a diverse set of probes: not just experiments at the energy and cosmic frontiers, but also low-energy measurements relying on high precision and high luminosity. One example of ultra-precision experiments is MOLLER, planned at JLab, which will measure the parity-violating electron-electron scattering asymmetry and allow a determination of the weak mixing angle with a factor of five improvement in precision over its predecessor, E-158. At this precision, any inconsistency with the Standard Model should signal new physics. Another promising new physics probe, the Belle II experiment at SuperKEKB, will study low-energy electron-positron collisions at high luminosity. The talk will outline recent developments in the theoretical and computational approaches to the higher-order electroweak effects needed for an accurate interpretation of experimental data, and show how new physics particles enter at the one-loop level. For MOLLER and Belle II, we analyze the effects of a Z' boson and a dark photon on the total calculated cross section and asymmetry, and show how these hypothetical interaction carriers may influence future experimental results.
        Speaker: Aleksandrs Aleksejevs (Memorial University of Newfoundland)
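Why a precise parity-violating asymmetry pins down the weak mixing angle can be seen from a tree-level estimate, which ignores the higher-order electroweak corrections that are the subject of the talk. The asymmetry is proportional to the weak charge of the electron, which at tree level is $Q_W^e = -(1 - 4\sin^2\theta_W)$ (the overall sign depends on convention):

```latex
\begin{aligned}
A_{PV} &\propto Q_W^e, \qquad Q_W^e = -\left(1 - 4\sin^2\theta_W\right) \\
\left|\frac{\delta A_{PV}}{A_{PV}}\right| &= \frac{4\,\delta(\sin^2\theta_W)}{1 - 4\sin^2\theta_W}
\;\Longrightarrow\;
\delta(\sin^2\theta_W) = \frac{1 - 4\sin^2\theta_W}{4}\left|\frac{\delta A_{PV}}{A_{PV}}\right| \\
\sin^2\theta_W \approx 0.238 &: \quad \frac{1 - 0.952}{4} = 0.012, \qquad
\left|\frac{\delta A_{PV}}{A_{PV}}\right| = 2\% \;\Rightarrow\; \delta(\sin^2\theta_W) \approx 2.4\times10^{-4}
\end{aligned}
```

Because $1 - 4\sin^2\theta_W$ is small at low energy, a relative measurement of the asymmetry translates into a much smaller absolute uncertainty on $\sin^2\theta_W$. The 2% figure is an illustrative input, not the experiment's stated goal.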
      • 74
        SModelS: A Tool for Making Systematic Use of Simplified Models
        We present an automated software tool, SModelS, to systematically confront theories Beyond the Standard Model (BSM) with experimental data. The tool consists of a general procedure to decompose such BSM theories into their Simplified Model Spectra (SMS). In addition, SModelS features a database containing the majority of the published SMS results of CMS and ATLAS. These SMS results contain the 95% confidence level upper limits on signal production cross sections. Together, these two components allow us to quickly confront any BSM model with LHC results. Recently, support for signal efficiency maps has been added to our software framework, so efficiency maps published by the experimental collaborations can also be used. Using recasting tools like MadAnalysis5 or CheckMATE, such efficiency maps can also be created outside the experimental collaborations, allowing us to further enrich our database and improve the constraining power of our approach. We aim to extend our effort beyond collider searches for new physics, also exploiting information about BSM physics contained in precision measurements or dark matter searches. As showcase examples, we will discuss an application of our procedure to specific supersymmetric models, show how the limits constrain these models, and point out regions in parameter space still unchallenged by the current SMS results. While the current implementation can handle null results only, our ultimate goal is to build the next standard model in a bottom-up fashion from both negative and positive results of several experiments. The implementation is open source, written in Python, and available from
        Speaker: Wolfgang Waltenberger (Austrian Academy of Sciences (AT))
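The way upper-limit results are used to confront a model can be sketched as a ratio $r$ between the predicted $\sigma\times\mathrm{BR}$ of each simplified-model topology and the corresponding 95% CL upper limit, with $r \ge 1$ meaning the point is excluded. The Python sketch assumes the decomposition into topologies has already been done, keys each topology by a single mother mass (real SMS results depend on several masses), and uses an invented upper-limit table. The names `UL_DB`, `TopoA` and `TopoB` are hypothetical and the numbers are not experimental results.

```python
import bisect
from typing import Dict, List, Tuple

# Hypothetical upper-limit "database": topology -> [(mother mass in GeV, 95% CL UL in fb)].
# The numbers are invented for illustration and are NOT experimental results.
UL_DB: Dict[str, List[Tuple[float, float]]] = {
    "TopoA": [(200.0, 500.0), (400.0, 60.0), (600.0, 12.0), (800.0, 4.0)],
    "TopoB": [(200.0, 900.0), (500.0, 80.0), (800.0, 10.0)],
}

Prediction = Dict[Tuple[str, float], float]  # (topology, mother mass) -> sigma x BR in fb


def upper_limit(topology: str, mass: float) -> float:
    """Linear interpolation on the UL grid; outside the grid the result does not constrain."""
    grid = UL_DB.get(topology)
    if grid is None or not grid[0][0] <= mass <= grid[-1][0]:
        return float("inf")
    masses = [m for m, _ in grid]
    i = bisect.bisect_left(masses, mass)
    if masses[i] == mass:
        return grid[i][1]
    (m0, ul0), (m1, ul1) = grid[i - 1], grid[i]
    return ul0 + (ul1 - ul0) * (mass - m0) / (m1 - m0)


def r_values(predictions: Prediction) -> Prediction:
    """r = (sigma x BR predicted by the model) / (upper limit for that simplified model)."""
    return {key: xsec / upper_limit(*key) for key, xsec in predictions.items()}


def excluded(predictions: Prediction) -> bool:
    """The model point is excluded if any single topology has r >= 1."""
    return max(r_values(predictions).values(), default=0.0) >= 1.0
```

A topology outside the mass range of every result is simply unconstrained, and a small $r$ only means "not excluded".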
      • 75
        Making extreme computations possible with virtual machines
        State-of-the-art algorithms generate scattering amplitudes for high-energy physics at leading order for high-multiplicity processes as compiled code (in Fortran, C or C++). For complicated processes the size of these libraries can become tremendous (many GiB). We show that amplitudes can be translated to bytecode instructions, which even reduces the size by one order of magnitude. The bytecode is interpreted by a virtual machine with runtimes comparable to compiled code and better scaling with additional legs. We study the properties of this algorithm as an extension of the Optimizing Matrix Element Generator (O'Mega). The bytecode matrix elements are available as an alternative input for the event generator WHIZARD. The bytecode interpreter can be implemented very compactly, which will help with a future implementation on massively parallel GPUs.
        Speaker: Juergen Reuter (DESY Hamburg, Germany)
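The idea of replacing compiled amplitude code by interpreted bytecode can be illustrated with a tiny register-based virtual machine in Python. The four opcodes, the `(opcode, destination, a, b)` instruction layout and the example program are hypothetical and far simpler than what O'Mega and WHIZARD actually use; the sketch only shows that a straight-line sequence of arithmetic instructions, with intermediate results stored once and reused, can be evaluated by a small loop instead of being compiled.

```python
from typing import List, Sequence, Tuple

# Hypothetical opcodes; not the instruction set used by O'Mega or WHIZARD.
CONST, ADD, SUB, MUL = 0, 1, 2, 3

Instr = Tuple[int, int, int, int]  # (opcode, destination register, operand a, operand b)


def run(program: Sequence[Instr], consts: Sequence[complex], n_registers: int) -> complex:
    """Interpret a straight-line program of complex arithmetic.
    The value left in the last register is returned as the result."""
    reg: List[complex] = [0j] * n_registers
    for op, dst, a, b in program:
        if op == CONST:
            reg[dst] = consts[a]  # operand a indexes the constant table; b is unused
        elif op == ADD:
            reg[dst] = reg[a] + reg[b]
        elif op == SUB:
            reg[dst] = reg[a] - reg[b]
        elif op == MUL:
            reg[dst] = reg[a] * reg[b]
        else:
            raise ValueError(f"unknown opcode {op}")
    return reg[-1]


# Example: evaluates (w1 + w2)**2 - w1*w3, computing w1 + w2 only once (register 3).
EXAMPLE_CONSTS = [1 + 2j, 3 - 1j, 0.5j]
EXAMPLE_PROGRAM: List[Instr] = [
    (CONST, 0, 0, 0), (CONST, 1, 1, 0), (CONST, 2, 2, 0),
    (ADD, 3, 0, 1),  # r3 = w1 + w2
    (MUL, 4, 3, 3),  # r4 = r3 * r3, reusing r3
    (MUL, 5, 0, 2),  # r5 = w1 * w3
    (SUB, 6, 4, 5),  # r6 = r4 - r5, the result
]
```

In the example, the subexpression `w1 + w2` is computed once into register 3 and read twice, mirroring how recursive amplitude algorithms share common subexpressions. The program itself is just a list of small integers, which is one plausible source of the size advantage over compiled code.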
      • 76
        High Performance and Increased Precision Techniques for Feynman Loop Integrals
        For the investigation of physics beyond and within the Standard Model, the precise evaluation of higher-order corrections in perturbative quantum field theory is required. We have been developing a computational method for Feynman loop integrals with a fully numerical approach, based on numerical integration techniques and extrapolation. In this presentation, we describe the status and new developments of our approaches for the numerical computation of Feynman loop integrals up to four loops. Founded on underlying asymptotic error expansions, extrapolation and transformation methods allow for accurate automatic evaluation of Feynman loop integrals in the presence of integration difficulties such as boundary singularities. These techniques include linear and non-linear extrapolations, and double exponential and other transformations. Iterated one-dimensional integration with extrapolation has provided good accuracy for low-dimensional problems, such as an ultraviolet 2-loop vertex diagram that gives rise to a 3-dimensional integral. We are further focusing on improving the efficiency of these computations with respect to speed as well as precision. To accelerate the computations, we have used the transparent and portable approach for multivariate integration offered by the parallel/distributed package ParInt, layered over MPI message passing for execution on a cluster, which implements a variety of methods and also comes with a quadruple (C long double) precision version. Alternatively, excellent speedups and precision have been obtained using dedicated hardware acceleration of double exponential/trapezoidal rule sum approximations. Multivariate integration results will be included for 3- and 4-loop self-energy diagrams.
        Speaker: Prof. Kiyoshi Kato (Kogakuin Univ.)
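The double exponential transformation mentioned above can be sketched for a one-dimensional integral over $[0,1]$. Substituting $x = (1 + \tanh(\tfrac{\pi}{2}\sinh t))/2$ pushes the endpoints to $t \to \pm\infty$ with a weight that decays double exponentially, so a plain trapezoidal sum handles integrable endpoint singularities. This is a textbook tanh-sinh rule, not the authors' implementation: it includes none of their extrapolation, ParInt, or extended precision, and the step `h` and cutoff `t_max` are illustrative defaults.

```python
import math
from typing import Callable


def tanh_sinh(f: Callable[[float], float], h: float = 0.1, t_max: float = 4.0) -> float:
    """Approximate the integral of f over [0, 1] with the tanh-sinh (double exponential) rule.

    x(t) = (1 + tanh(pi/2 * sinh t)) / 2 maps the real t axis onto (0, 1);
    dx/dt = pi * cosh(t) * x * (1 - x) decays double exponentially as |t| grows,
    so the integral becomes a trapezoidal sum with step h, truncated at |t| <= t_max.
    """
    n = int(round(t_max / h))
    total = 0.0
    for k in range(-n, n + 1):
        t = k * h
        u = 0.5 * math.pi * math.sinh(t)
        z = -2.0 * u
        if z > 700.0:  # x would underflow to 0 and math.exp would overflow
            continue
        x = 1.0 / (1.0 + math.exp(z))  # equals (1 + tanh u) / 2, accurate near x = 0
        weight = math.pi * math.cosh(t) * x * (1.0 - x)
        if 0.0 < x < 1.0:  # skip nodes that have rounded onto an endpoint
            total += weight * f(x)
    return h * total
```

Writing $x$ as $1/(1+e^{-2u})$ keeps tiny values of $x$ accurate near the singular endpoint at 0; an integrand singular at $x=1$ is best reflected with $x \to 1-x$ first, because $x$ itself rounds to 1 there.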
      • 3:40 PM
        Coffee break
    • Conference Dinner: The sessions end at 17:25, so we have moved the dinner one hour earlier. Please be aware!
    • Plenary V
      • 77
        Johannes Albrecht — Challenges for the LHC Run 3: Computing and Algorithms
        Speaker: Johannes Albrecht (Technische Universitaet Dortmund (DE))
    • 10:00 AM
      Coffee Break
    • Poster: Authors should stand next to their posters.
    • 78
      ACAT 2017 - Seattle
      Speaker: Gordon Watts (University of Washington (US))
    • 79
      Wolfram Research
    • 12:45 PM
      Lunch Break
    • Tracks summaries
      • 80
        Summary Track 1: Graeme Stewart
        Speaker: Graeme Stewart (University of Glasgow (GB))
      • 81
        Summary Track 2: Lorenzo Moneta
        Speaker: Lorenzo Moneta (CERN)
      • 82
        Summary Track 3: York Schroeder, Gionata Luisoni, Stanislav Poslavsky
        Speakers: Gionata Luisoni (CERN), Stanislav Poslavsky (IHEP, Protvino), York Schröder (UBB Chillán)
    • 3:40 PM
      Coffee Break
    • Best posters talks
    • Closing of Workshop