# 21st International Conference on Computing in High Energy and Nuclear Physics (CHEP2015)

Asia/Tokyo
OIST

#### OIST

1919-1 Tancha, Onna-son, Kunigami-gun Okinawa, Japan 904-0495
Description
"Evolution of Software and Computing for Experiments" In 2015, the LHC will restart. Experimental groups at the LHC have reviewed their Run 1 experiences in detail, acquired the latest computing and software technologies, and constructed new computing models to prepare for Run 2. On the side of the intensity frontier, Super-KEKB will start commissioning in 2015, and fixed-target experiments at CERN, Fermilab and J-PARC are growing bigger in size. In nuclear physics field, FAIR is under construction and RHIC well engaged into its Phase-II research program facing increased datasets and new challenges with precision Physics. For the future, developments are progressing towards the construction of ILC. In all these projects, computing and software will be even more important than before. Beyond those examples, non-accelerator experiments are also seeking novel computing models as their apparatus and operation become larger and distributed. With much progress, thoughts and new directions made in recent years, we strongly feel we have reached the perfect time where meeting together and sharing our experiences will be highly beneficial.
• Monday, 13 April
• 09:00 10:30
Plenary: Session #1 Auditorium

#### Auditorium

Convener: Hiroshi Sakamoto (University of Tokyo (JP))
• 09:00
Speaker: Dr Ken Peach (OIST)
• 09:15
Future plan of KEK 45m
Speaker: Dr Masanori Yamauchi (KEK)
• 10:00
Computing at the Belle-II experiment 30m
Speaker: Dr Takanori Hara (KEK)
• 10:30 11:00
Coffee Break 30m
• 11:00 12:45
Plenary: Session #2 Auditorium

#### Auditorium

Convener: Oxana Smirnova (Lund University (SE))
• 11:00
Evolution of Computing and Software at LHC: from Run 2 to HL-LHC 45m
Speaker: Graeme Stewart (University of Glasgow (GB))
• 11:45
Intel Corporation 30m
• 12:15
Dell Inc. 30m
• 12:45 13:30
Lunch Break 45m
• 13:30 14:00
Poster session A: #1 Tunnel Gallery

#### Tunnel Gallery

The posters in this session are displayed on April 13-14

• 14:00 16:00
Track 1 Session: #1 (Upgrades and parallelism) Village Center (Village Center)

### Village Center

#### Village Center

Online computing

Convener: Markus Frank (CERN)
• 14:00
Pyrame, a rapid-prototyping framework for online systems 15m
High-energy physics experiments produce huge amounts of data that need to be processed and stored for further analysis and eventually treated in real time for triggering and monitoring purposes. In addition, more and more often these requirements are also being found on other fields such as on-line video processing, proteomics and astronomical facilities. The complexity of such experiments usually involves long development times on which simple test-bench prototypes evolve over time and require increasingly performant software solutions for both data acquisition and control systems. Flexibility and wide application range are, therefore, important conditions in order to keep a single software solution that can evolve over the duration of the development period. Existing solutions such as LabView provide proprietary solutions for control applications and small scale data acquisition, but fail on scaling to high-throughput experiments, add an important performance penalty and become difficult to maintain when the complexity of the software flow increases. Other lower-level solutions such as LabWindows/C, Matlab or TANGO allow to build more complex systems but still fail on providing an integrated and close to turn-key solution for the specific needs of prototypes evolving from bench-based tests to distributed environments with high data-throughput needs. The present work reports on the software Pyrame, an open-source framework designed with high-energy physics applications in mind and providing a light-weight, distributed, stable, performant and easy-to-deploy solution. Pyrame is a data-acquisition chain, a data-exchange network protocol and a wide set of drivers allowing the control of hardware components. Data-acquisition throughput is on the order of 1.7 Gb/s for UDP/IP and Pyrame’s protocol overhead is about 1.6 ms per command/response using the stock tools. Pyrame provides basic blocks (modules) for hardware control or data acquisition. It contains an important number of modules to manage commonly used hardware: high and low voltage power supplies (Agilent, CAEN, Hameg, Keithley...), pattern generators (Agilent), digital oscilloscopes (LeCroy), motion controllers (Newport) and bus adapters (GPIB, USB, RS-232, Ethernet, UDP, TCP…). Generic modules give a unified access to machines from different generations and native or emulated functions depending on the possibilities of the machines. Pyrame also provides service modules, including: a multimedia and multistream high-performance acquisition chain; a variable module allowing to share data between all the hardware modules; and a configuration management module allowing to save or load an image of the global configuration of the system at any time. These blocks can be assembled together through Pyrame’s software bus and protocol to quickly obtain complete systems for test benches. The framework is very flexible and provides a wide range of options, allowing the system to evolve as fast as the prototype. In order to ease the development of extensions, Pyrame uses open standards: TCP and XML. They are implemented on most platforms and in most languages making the development of new modules fast and easy. The use of TCP/IP eases the distribution of code on multi-computer setups, but also its use on embedded platforms. Multiple test-beams at DESY have validated the stability of the system. In particular, our tests have involved data acquisition sessions of hundreds of hours from up to 10 simultaneous Si-based particle detectors distributed over multiple computers. Very long calibration procedures involving hundreds of thousands of configuration operations have also been performed. On another optical bench, simultaneous control and data acquisition of oscilloscopes, motorized stages and power supplies confirmed the same level of stability for long acquisition sessions.
Speaker: Frederic Bruno Magniette (Ecole Polytechnique (FR))
• 14:15
The Front-End Electronics and the Data Acquisition System for a Kinetic Inductance Detector 15m
Speaker: Dr paolo branchini (INFN Roma Tre)
• 14:30
Operation of the upgraded ATLAS Level-1 Central Trigger System 15m
The ATLAS Level-1 Central Trigger (L1CT) system is a central part of ATLAS data-taking and is configured, controlled, and monitored by a software framework with emphasis on reliability and flexibility. The hardware has undergone a major upgrade for Run 2 of the LHC, in order to cope with the expected increase of instantaneous luminosity of a factor of 2 with respect to Run 1. It offers more flexibility in the trigger decisions due to the double amount of trigger inputs and usable trigger channels. It also provides an interface to the new topological trigger system. Operationally - particularly useful for commissioning, calibration and test runs - it allows concurrent, independent triggering of up to 3 different sub-detector combinations. In this contribution, we give an overview of the fully operational software framework of the L1CT system with particular emphasis on the configuration, controls and monitoring aspects. The software framework allows the L1CT system to be configured consistently with the ATLAS experiment and the LHC machine, upstream and downstream trigger processors, and the data acquisition. Trigger and dead-time rates are monitored coherently at all stages of processing and are logged by the online computing system for physics analysis, data quality assurance and operation debugging. In addition, the synchronisation of trigger inputs is watched based on bunch-by-bunch trigger information. Several software tools allow to efficiently display the relevant information in the control room in a way useful for shifters and experts. The design of the framework aims at reliability, flexiblity, and robustness of the system and takes into account the operational experience gained during Run 1. We present the overall performance during cosmic-ray data taking with the full ATLAS detector and the experience with first beams in 2015.
Speaker: Julian Glatzer (CERN)
• 14:45
Upgrade of the ATLAS Level-1 Trigger with event topology information 15m
The Large Hadron Collider (LHC) in 2015 will collide proton beams with increased luminosity from $10^{34}$ up to $3 \times 10^{34}$ cm$^{−2}$ s$^{−1}$. ATLAS is an LHC experiment designed to measure decay properties of highly energetic particles produced in these proton-collisions. The high luminosity places stringent physical and operational requirements on the ATLAS Trigger in order to reduce the 40 MHz collision rate to an event storage rate of 1 kHz, thereby retaining events with valuable physics content. The hardware-based first ATLAS trigger level (Level-1) has an output rate of 100 kHz and decision latency of less than 2.5 µs. It is composed of the Calorimeter Trigger (L1Calo), the Muon Trigger (L1Muon) and the Central Trigger Processor. In 2014, there will be a new trigger system has been added: the Topological Processor System (L1Topo system). The L1Topo system consists of a single AdvancedTCA shelf equipped with three L1Topo processor blades. It processes detailed information from L1Calo and L1Muon in individual state-of-the-art FPGA processors to derive desitions based on the topology of each collision event. Such topologies are the angles between jets and/or leptons or global event variables based on lists of pre-selected/-sorted objects. The system is designed to receive and process up to 6 Tb/s of real time data. The talk is about the relevant upgrades of the Level-1 trigger with focus on the topological processor design and commissioning.
Speaker: Eduard Ebron Simioni (Johannes-Gutenberg-Universitaet Mainz (DE))
• 15:00
gFEX, the ATLAS Calorimeter Level 1 Real Time Processor 15m
The global feature extractor (gFEX) is a component of the Level-1 Calorimeter trigger Phase-I upgrade for the ATLAS experiment. It is intended to identify patterns of energy associated with the hadronic decays of high momentum Higgs, W, & Z bosons, topquarks, and exotic particles in real time at the LHC crossing rate. The single processor board will be implemented as a fast reconfigurable processor based on four largeFPGAs. The board will receive coarse-granularity information from all the ATLAS calorimeters on 264 optical fibers with the data transferred at the 40 MHz LHC clock frequency. The gFEX will be controlled by a single system-on-chip processor, ZYNQ, that will be used to configure FPGAs, monitor board health, and interface to externalsignals. Although the board is being designed specifically for the ATLAS experiment, itis sufficiently generic that it could be used for fast data processing at other HEP or NP experiments. We will present the design of the gFEX board and discuss how it is beingimplemented.
Speaker: Helio Takai (Brookhaven National Laboratory (US))
• 15:15
Data Acquisition for the New Muon $g$-$2$ Experiment at Fermilab 15m
A new measurement of the anomalous magnetic moment of the muon, $a_{\mu} \equiv (g-2)/2$, will be performed at the Fermi National Accelerator Laboratory. The most recent measurement, performed at Brookhaven National Laboratory and completed in 2001, shows a 3.3-3.6 standard deviation discrepancy with the standard model value of $g$-$2$. The new measurement will accumulate 20 times those statistics, measuring $g$-$2$ to 0.14 ppm and improving the uncertainty by a factor of 4 over that of the previous measurement. The data acquisition system for this experiment must have the ability to create deadtime-free records from 700 $\mu$s muon spills at a raw data rate 18 GB per second. Data will be collected using 1296 channels of $\mu$TCA-based 800 MSPS, 12 bit waveform digitizers and processed in a layered array of networked commodity processors with 24 GPUs working in parallel to perform a fast recording of the muon decays during the spill. The system will be controlled using the MIDAS data acquisition software package. The described data acquisition system is currently being constructed, and will be fully operational before the start of the experiment in 2016.
Speaker: Wesley Gohn (U)
• 15:30
Online tracking with GPUs at PANDA 15m
The PANDA experiment is a next generation particle detector planned for operation at the FAIR facility, currently under construction in Darmstadt, Germany. PANDA will detect events generated by colliding an antiproton beam on a fixed proton target, allowing studies in hadron spectroscopy, hypernuclei production, open charm and nucleon structure. The nature of hadronic collisions means that signal and background events will look very similar, making a conventional approach, based on a hardware trigger signal generated by a subset of the detectors to start the data acquisition, unfeasible. Instead, data coming from the detector are acquired continuously, and all online selection is performed in real-time. A rejection factor of about 1000 is needed to reduce the data rate for offline storage, making the data acquisition system computationally very challenging. Adoption of Graphical Processing Units (GPUs) in many computing applications is increasing, due to their cost-effectiveness, performance, and accessible and versatile development using high-level programming paradigms such as CUDA or OpenCL. Applications of GPU within HEP include Monte Carlo production, analysis, low- and high-level trigger. Online track reconstruction of charged particles plays an essential part in the event reconstruction and selection process. Our activity within the PANDA collaboration is centered on the development and implementation of particle tracking algorithms on GPUs, and on studying the possibility of performing online tracking using a multi-GPU architecture. Three algorithms are currently under development, using information from the PANDA tracking system: a Hough Transform; a Riemann Track Finder; and a Triplet Finder algorithm, a novel approach finely tuned for the PANDA STT detector. The algorithms are implemented on the GPU in the CUDA C language, utilizing low-level optimizations and non-trivial data packaging in order to exploit to the maximum the capabilities of GPUs. This talk will present details of the implementation of these algorithms, together with first performance results, and solutions for data transfer to and from GPUs based on message queues for a deeper integration of the algorithms with the FairRoot and PandaRoot frameworks, both for online and offline applications.
Speaker: Ludovico Bianchi (Forschungszentrum Jülich)
• 14:00 16:00
Track 2 Session: #1 (Reconstruction) Auditorium (Auditorium)

### Auditorium

#### Auditorium

Offline software

Convener: Elizabeth Sexton-Kennedy (Fermi National Accelerator Lab. (US))
• 14:00
Optimisation of the ATLAS Track Reconstruction Software for Run-2 15m
Track reconstruction is one of the most complex elements of the reconstruction of events recorded by ATLAS from collisions delivered by the LHC. It is the most time consuming reconstruction component in high luminosity environments. After a hugely successful Run-1, the flat budget projections for computing resources for Run-2 of the LHC together with the demands of reconstructing higher pile-up collision data at rates more than double those in Run-1 (an increase from 400 Hz to 1 kHz in trigger output) have put stringent requirements on the track reconstruction software. The ATLAS experiment has performed a two year long software campaign which aimed to reduce the reconstruction rate by a factor of three to meet the resource limitations for Run-2: a major part of the changes to achieve this were improvements to the track reconstruction software and will be presented in this contribution . The CPU processing time of ATLAS track reconstruction was reduced by more than a factor of three during this campaign without any loss of output information of the track reconstruction. We present the methods used for analysing the tracking software and the code changes and new methods implemented to optimise both algorithmic performance and event data. Although most improvements were obtained without dedicated targetting concurrency strategies, major parts of the ATLAS tracking software were updated to allow for future improvements based on parallelism which will also be discussed.
Speaker: Andreas Salzburger (CERN)
• 14:15
Simulation and Reconstruction Upgrades for the CMS experiment 15m
Over the past several years, the CMS experiment has made significant changes to its detector simulation and reconstruction applications motivated by the planned program of detector upgrades over the next decade. These upgrades include both completely new tracker and calorimetry systems and changes to essentially all major detector components to meet the requirements of very high pileup operations. We have untaken an ambitious program to implement these changes into the Geant4-based simulation and reconstruction applications of CMS in order to perform physics analyses to both demonstrate the improvements from the detector upgrades and to influence the detector design itself. In this presentation, we will discuss our approach to generalizing much of the CMS codebase to efficiently and effectively work for multiple detector designs. We will describe our computational solutions for very high pileup reconstruction and simulation algorithms, and show results based on our work.
Speaker: David Lange (Lawrence Livermore Nat. Laboratory (US))
• 14:30
CMS reconstruction improvements for the tracking in large pile-up events 15m
The CMS tracking code is organized in several levels, known as 'iterative steps', each optimized to reconstruct a class of particle trajectories, as the ones of particles originating from the primary vertex or displaced tracks from particles resulting from secondary vertices. Each iterative step consists of seeding, pattern recognition and fitting by a kalman filter, and a final filtering and cleaning. Each subsequent step works on hits not yet associated to a reconstructed particle trajectory. The CMS tracking code is continuously evolving to make the reconstruction computing load compatible with the increasing instantaneous luminosity of LHC, resulting in a large number of primary vertices and tracks per bunch crossing. The major upgrade put in place during the present LHC Long Shutdown will allow the tracking code to comply with the conditions expected during Run2 and the much larger pile-up. In particular, new algorithms that are intrinsically more robust in high occupancy conditions have been developed, iterations have been re-designed (including a new one, dedicated to specific physics objects), code optimizations have been deployed and new software techniques have been used. The speed improvement has been achieved without significant reduction in term of physics performance. The methods and the results are presented and the prospects for future applications are discussed.
Speaker: Marco Rovere (CERN)
• 14:45
Optimization of the LHCb track reconstruction 15m
The LHCb track reconstruction uses sophisticated pattern recognition algorithms to reconstruct trajectories of charged particles. Their main feature is the use of a Hough-transform like approach to connect track segments from different subdetectors, allowing for having no tracking stations in the magnet of LHCb. While yielding a high efficiency, the track reconstruction is a major contributor to the overall timing budget of the software trigger of LHCb, and will continue to be so in the light of the higher track multiplicity expected from Run II of the LHC. In view of this fact, key parts of the pattern recognition have been revised and redesigned. We will present the main features which were studied. A staged approach strategy for the track reconstruction in the software trigger was investigated: it allows unifying complementary sets of tracks coming from the different stages of the high level trigger, resulting in a more flexible trigger strategy and a better overlap between online and offline reconstructed tracks. Furthermore the use of parallelism was investigated, using SIMD instructions for time-critical parts of the software or - in a later stage - using GPU-driven track reconstruction. In addition a new approach to monitoring was implemented, where quantities important for track reconstruction are monitored on a regular basis, using an automated framework for comparing different figures of merit.
Speaker: Barbara Storaci (Universitaet Zuerich (CH))
• 15:00
Event Reconstruction Techniques in NOvA 15m
The NOvA experiment is a long baseline neutrino oscillation experiment utilizing the NuMI beam generated at Fermilab. The experiment will measure the oscillations within a muon neutrino beam in a 300 ton Near Detector located underground at Fermilab and a functionally-identical 14 kiloton Far Detector placed 810 km away. The detectors are liquid scintillator tracking calorimeters with a fine-grained cellular structure that provides a wealth of information for separating the different particle track and shower topologies. Each detector has its own challenges with the Near Detector seeing multiple overlapping neutrino interactions in each event and the Far Detector having a large background of cosmic rays due to being located on the surface. A series of pattern recognition techniques have been developed to go from event records, to spatially and temporally separating individual interactions, to vertexing and tracking, and particle identification. This combination of methods to achieve the full event reconstruction will be presented.
Speaker: Dominick Rocco (urn:Google)
• 15:15
Kalman Filter Tracking on Parallel Architectures 15m
Power density constraints are limiting the performance improvements of modern CPUs. To address this we have seen the introduction of lower-power, multi-core processors, but the future will be even more exciting. In order to stay within the power density limits but still obtain Moore's Law performance/price gains, it will be necessary to parallelize algorithms to exploit larger numbers of lightweight cores and specialized functions like large vector units. Example technologies today include Intel's Xeon Phi and GPGPUs. Track finding and fitting is one of the most computationally challenging problems for event reconstruction in particle physics. At the High Luminosity LHC, for example, this will be by far the dominant problem. The need for greater parallelism has driven investigations of very different track finding techniques including Cellular Automata or returning to Hough Transform. The most common track finding techniques in use today are however those based on the Kalman Filter. Significant experience has been accumulated with these techniques on real tracking detector systems, both in the trigger and offline. They are known to provide high physics performance, are robust and are exactly those being used today for the design of the tracking system for HL-LHC. Our previous investigations showed that, using optimized data structures, track fitting with Kalman Filter can achieve large speedup both with Intel Xeon and Xeon Phi. We report here our further progress towards an end-to-end track reconstruction algorithm fully exploiting vectorization and parallelization techniques in a realistic simulation setup.
Speaker: Giuseppe Cerati (Univ. of California San Diego (US))
• 15:30
Cellular Automaton based Track Finding for the Central Drift Chamber of Belle II 15m
With the upgraded electron-positron-collider facility, SuperKEKB and Belle II, the Japanese high energy research center KEK strives to exceed its own world record luminosity by a factor of 40. To provide a solid base for the event reconstruction within the central drift chamber in the enhanced luminosity setup, a powerful track finding algorithm coping with the higher beam induced backgrounds is developed at DESY. Pursing a bottom-up approach, which is less susceptible to the increased number of background hits compared to global event reconstruction techniques such as the Hough transformation and its successors, we present a generalization of the cellular automaton. While maintaining the high execution speed by circumventing the combinatorial backtracking in the graph of local hit information and extrapolations naturally arising in bottom-up approaches, this so called weighted cellular automaton integrates the adaptiveness of the Hopfield network into the original algorithm.
Speaker: Oliver Frost (DESY)
• 15:45
4-Dimensional Event Building in the First-Level Event Selection of the CBM Experiment. 15m
The future heavy-ion experiment CBM (FAIR/GSI, Darmstadt, Germany) will focus on the measurement of very rare probes at interaction rates up to 10 MHz with data flow of up to 1 TB/s. The beam will provide free stream of beam particles without bunch structure. That requires full online event reconstruction and selection not only in space, but also in time, so-called 4D event building and selection. This is a task of the First-Level Event Selection (FLES). The FLES reconstruction and selection package consists of several modules: track finding, track fitting, short-lived particles finding, event building and event selection. Since all detector measurements contain also time information, the event building is done at all stages of the reconstruction process. The input data are distributed within the FLES farm in a form of so-called time-slices, which time length is proportional to a compute power of a processing node. A time-slice is reconstructed in parallel between cores within a CPU, thus minimizing communication between CPUs. After all tracks of the whole time-slice are found and fitted in 4D, they are collected into clusters of tracks originated from common primary vertices, which then are fitted, thus identifying 4D interaction points registered within the time-slice. Secondary tracks are associated with primary vertices according to their estimated production time. After that short-lived particles are found and the full e vent building process is finished. The last stage of the FLES package is a selection of events according to the requested trigger signatures. We describe in details all stages of the FLES package and present results of tests on many-core computer farms with up to 3000 cores, focusing mainly on parallel implementations of the track finding and the event building stages as the most complicated and time consuming parts of the package. The track finding efficiency remains stable and the processing time grows as a polynomial of second order with respect to the number of events in the time-slice. The first results of J/psi selection are presented and discussed.
Speaker: Dr Ivan Kisel (Johann-Wolfgang-Goethe Univ. (DE))
• 14:00 16:00
Track 3 Session: #1 (Databases) C209 (C209)

### C209

#### C209

Data store and access

Convener: Mr Barthelemy Von Haller (CERN)
• 14:00
Designing a future Conditions Database based on LHC experience 15m
The ATLAS and CMS Conditions Database infrastructures have served each of the respective experiments well through LHC Run 1, providing efficient access to a wide variety of conditions information needed in online data taking and offline processing and analysis. During the long shutdown between Run 1 and Run 2, we have taken various measures to improve our systems for Run 2. In some cases, a drastic change was not possible because of the relatively short time scale to prepare for Run 2. In this process, and in the process of comparing to the systems used by other experiments, we realized that for Run 3, we should consider more fundamental changes and possibilities. We seek changes which would streamline conditions data management, improve monitoring tools, better integrate the use of metadata, incorporate analytics to better understand conditions usage, as well as investigate fundamental changes in the storage technology, which might be more efficient while minimizing maintenance of the data as well as simplify the access to it. This contribution will describe architectures we are evaluating and testing which we think will address the problematic areas while providing improved services.
Speaker: Andrea Formica (CEA/IRFU,Centre d'etude de Saclay Gif-sur-Yvette (FR))
• 14:15
Evolution of Database Replication Technologies for WLCG 15m
During LHC run 1 ATLAS and LHCb databases have been using Oracle Streams replication technology for their use cases of data movement between online and offline Oracle databases. Moreover ATLAS has been using Streams to replicate conditions data from CERN to selected Tier 1s. GoldenGate is a new technology introduced by Oracle to replace and improve on Streams, by providing better performance, robustness, manageability and support. At the same time Oracle Active Data Guard is an another technology that fulfils several use cases in this area, by allowing the creation of read-only copies of production databases, currently in use by CMS, ALICE and more recently by ATLAS for controls data. In this paper we will provide a short review of when GoldenGate is preferable to Streams/Active Data Guard. Further we will report the details of the migration of CERN and Tier 1s database replication setups/configurations from Streams to GoldenGate. In particular we will describe the architecture of GoldenGate replication services, our performance tests for remote replication done in collaboration with ATLAS and Tier 1 database administrators and the return of experience from running GoldenGate in production. We will also summarize our tests for future work on how GoldenGate can be used to reduce the downtime for Oracle version upgrades and on using GoldenGate for heterogeneous database replication.
Speaker: Zbigniew Baranowski (CERN)
• 14:30
NoSQL technologies for the CMS Conditions Database 15m
With the restart of the LHC in 2015, the growth of the CMS Conditions dataset will continue, therefore the need of consistent and highly available access to the Conditions makes a great cause to revisit different aspects of the current data storage solutions. We present a study of alternative data storage backends for the Conditions Databases, by evaluating some of the most popular NoSQL databases to support a key-value representation of the CMS Conditions. An important detail about the Conditions that the payloads are stored as BLOBs, and they can reach sizes that may require special treatment (splitting) in these NoSQL databases. As big binary objects may be a bottleneck in several database systems, and also to give an accurate baseline, a testing framework extension was implemented to measure the characteristics of the handling of arbitrary binary data in these databases. Based on the evaluation, prototypes of a document store, using a column-oriented and plain key-value store, are deployed. An adaption layer to access the backends in the CMS Offline software was developed to provide transparent support for these NoSQL databases in the CMS context. Additional data modelling approaches and considerations in the software layer, deployment and automatization of the databases are also covered in the research. In this paper we present the results of the evaluation as well as a performance comparison of the prototypes studied.
Speaker: Roland Sipos (Eotvos Lorand University (HU))
• 14:45
Evaluation of NoSQL databases for DIRAC monitoring and beyond 15m
Nowadays, many database systems are available but they may not be optimized for storing time series data. The DIRAC job monitoring is a typical use case of such time series. So far it was done using a MySQL database, which is not well suited for such an application. Therefore alternatives have been investigated. Choosing an appropriate database for storing huge amounts of time series is not trivial as one must take into account different aspects such manageability, scalability, extensibility etc. We compared the performance of Elasticsearch, OpenTSDB that is based on HBase and InfluxDB time series NoSQL databases using the same set of machines and the same data. We also evaluated the effort required for maintaining them. Using the LHCb Workload Management System, based on DIRAC, as a use case we have setup a new monitoring system in parallel with the current MySQL system and we publish the same data into the databases under test. We have evaluated Grafana (for OpenTSDB) and Kibana (for ElasticSearch) metrics and graph editors for creating dashboards in order to have clear picture on the usability of each candidate. In this paper we present the result of this study and the performance of the selected technology. We also give an outlook of other potential applications of NoSQL databases with DIRAC project.
Speaker: Federico Stagni (CERN)
• 15:00
Studies of Big Data meta-data segmentation between relational and non-relational databases 15m
Speaker: Ms Marina Golosova (National Research Centre "Kurchatov Institute")
• 15:15
Evolution of ATLAS conditions data and its management for LHC run-2 15m
The ATLAS detector consists of several sub-detector systems. Both data taking and Monte Carlo (MC) simulation rely on an accurate description of the detector conditions from every sub system, such as calibration constants, different scenarios of pile-up and noise conditions, size and position of the beam spot, etc. In order to guarantee database availability for critical online applications during data-taking, two database systems, one for online access and another one for all other database access have been implemented. The long shutdown period has provided the opportunity to review and improve the run-1 system: revise workflows, include new and innovative monitoring and maintenance tools and implement a new database instance for run-2 conditions data. The detector conditions are organized by tag identification strings and managed independently from the different sub-detector experts. The individual tags are then collected and associated into a global conditions tag, assuring synchronization of various sub-detector improvements. Furthermore, a new concept was introduced to maintain conditions over all the different data run periods into a single tag, by using Interval of Validity (IOV) dependent detector conditions for the MC database as well. This allows on-flight preservation of past conditions for data and MC and assure their sustainability with software evolution. This contribution presents an overview of the commissioning of the new database instance, improved tools and workflows, and summarizes the actions taken during the run-2 commissioning phase beginning of 2015.
Speaker: Michael Boehler (Albert-Ludwigs-Universitaet Freiburg (DE))
• 15:30
The ATLAS EventIndex: architecture, design choices, deployment and first operation experience. 15m
The EventIndex is the complete catalogue of all ATLAS events, keeping the references to all files that contain a given event in any processing stage. It replaces the TAG database, which had been in use during LHC Run 1. For each event it contains its identifiers, the trigger pattern and the GUIDs of the files containing it. Major use cases are event picking, feeding the Event Service used on some production sites, and technical checks of the completion and consistency of processing campaigns. The system design is highly modular so that its components (data collection system, storage system based on Hadoop, query web service and interfaces to other ATLAS systems) could be developed separately and in parallel during LS1. The EventIndex is in operation for the start of LHC Run 2. This talk describes the high level system architecture, the technical design choices and the deployment process and issues. The performance of the data collection and storage systems, as well as the query services, will be reported.
Speaker: Dr Dario Barberis (Università e INFN Genova (IT))
• 15:45
Distributed Data Collection for the ATLAS EventIndex. 15m
The ATLAS EventIndex contains records of all events processed by ATLAS, in all processing stages. These records include the references to the files containing each event (the GUID of the file) and the internal “pointer” to each event in the file. This information is collected by all jobs that run at Tier-0 or on the Grid and process ATLAS events. Each job produces a snippet of information for each permanent output file. This information is packed and transfered to a central broker at CERN using an ActiveMQ messaging system, and then is unpacked, sorted and reformatted in order to be stored and catalogued into a central Hadoop server. This talk describes in detail the Producer/Consumer architecture to convey this information from the running jobs through the messaging system to the Hadoop server.
Speaker: Javier Sanchez (Instituto de Fisica Corpuscular (ES))
• 14:00 16:00
Track 4 Session: #1 (Middleware) B250 (B250)

### B250

#### B250

Middleware, software development and tools, experiment frameworks, tools for distributed computing

Convener: Oliver Keeble (CERN)
• 14:00
Pilots 2.0: DIRAC pilots for all the skies 15m
In the last few years, new types of computing infrastructures, such as IAAS (Infrastructure as a Service) and IAAC (Infrastructure as a Client), gained popularity. New resource may come as part of pledged resources, while others are in the form of opportunistic ones. Most of these new infrastructures are based on virtualization techniques, others don't. Meanwhile, some concepts, such as distributed queues, lost appeal, while still supporting a vast amount of resources. Virtual Organizations are therefore facing heterogeneity of the available resources and the use of an Interware software like DIRAC to hide the diversity of underlying resources has become essential. The DIRAC WMS is based on the concept of pilot jobs that was introduced back in 2004. A pilot is what creates the possibility to run jobs on a worker node. The advantages of the pilot job concept are now well established. The pilots are not only increasing the visible efficiency of the user jobs but also help to manage the heterogeneous computing resources presenting them to the central services in a uniform coherent way. Within DIRAC, we developed a new generation of pilot jobs, that we dubbed Pilots 2.0. Pilots 2.0 are not tied to a specific infrastructure; rather they are generic, fully configurable and extendible pilots. A Pilot 2.0 can be sent, as a script to be run, or it can be fetched from a remote location. A pilot 2.0 can run on every computing resource, e.g.: on CREAM Computing elements, on DIRAC Computing elements, on Virtual Machines as part of the contextualization script, or IAAC resources, provided that these machines are properly configured, hiding all the details of the WN's infrastructure. Pilots 2.0 can be generated server and client side. Pilots 2.0 are the "pilots to fly in all the skies", aiming at easy use of computing power, in whatever form it is presented. Another aim is the unification and simplification of the monitoring infrastructure for all kind of computing resources by using pilots as a network of distributed sensors coordinated by a central resource monitoring system. Pilots 2.0 have been developed using the command pattern: each command is realizing an atomic function, and can be easily activated and de-activated based on the WN type. VOs using DIRAC can tune pilots 2.0 as they need, and extend or replace each and every pilot command in an easy way. In this paper we describe how Pilots 2.0 work with distributed and heterogeneous resources providing the abstraction necessary to deal with different kind of computing resources.
Speaker: Federico Stagni (CERN)
• 14:15
The Future of PanDA in ATLAS Distributed Computing 15m
Speaker: Tadashi Maeno (Brookhaven National Laboratory (US))
• 14:30
Evolution of CMS workload management towards multicore job support 15m
The successful exploitation of the multicore processor architectures available at the computing sites is a key element of the LHC distributed computing system in the coming era of the LHC Run 2. High-pileup complex-collision events represent a challenge for the traditional sequential programming in terms of memory and processing time budget. The CMS data production and processing framework has introduced the parallel execution of the reconstruction and simulation algorithms to overcome these limitations. CMS plans to execute the data reconstruction and simulation as multicore processing yet supporting single-core processing for other tasks difficult to parallelize, such as user analysis. The CMS strategy for job management across the Grid thus aims at integrating single and multicore job scheduling. This is accomplished by scheduling multicore pilots with dynamic partitioning of the allocated resources, capable of running jobs with various core counts within a single pilot. An extensive test programme has been conducted to enable multicore scheduling with the various local batch systems available at CMS sites. Scale tests have been run to optimize the scheduling strategy and to ensure the most efficient use of the distributed resources. This contribution will present in detail the evolution of the CMS job management and resource provisioning systems in order to support this hybrid scheduling model, as well as its optimization and deployment, which will enable CMS to transition to a multicore production model by the restart of the LHC.
Speaker: Dr Antonio Perez-Calero Yzquierdo (Centro de Investigaciones Energ. Medioambientales y Tecn. - (ES)
• 14:45
A history-based estimation for LHCb job requirements 15m
The main goal of a Workload Management System (WMS) is to find and allocate resources for the jobs it is handling. The more and more accurate information the WMS receives about the jobs, the easier it will be to accomplish its task, which will directly translate into a better utilization of resources. Traditionally, the information associated with each job, like expected runtime or memory requirement, is in the best case defined at submission time by the Production Manager or fixed by default to arbitrary conservative values. In the case of LHCb's Workload Management System, no mechanisms are provided that automatize the estimation of job requirements. As a result, in order to be conservative, much more CPU time is normally requested than actually needed. Particularly, in the context of multicore jobs this represents a major problem, since single- and multi-core jobs shall share the same resources. Therefore, in order to allow an optimization of the available resources, an accurate estimation of the necessary resources is required. As the main motivation for going to multicore jobs is the reduction of the overall memory footprint, the memory requirement of the jobs should also be correctly estimated. A detailed workload analysis of past LHCb jobs will be presented. It includes a study of which job features have a correlation with runtime and memory consumption. Based on these features, a supervised learning algorithm has been developed relying on a history-based prediction. The aim is to learn over time how jobs' runtime and memory evolve due to changes in the experimental conditions and the software versions. It will be shown that this estimation can be notably improved if the experimental conditions are taken into account.
Speaker: Nathalie Rauschmayr (CERN)
• 15:00
Using the glideinWMS System as a Common Resource Provisioning Layer in CMS 15m
CMS will require access to more than 125k processor cores for the beginning of Run2 in 2015 to carry out its ambitious physics program with more and higher complexity events. During Run1 these resources were predominantly provided by a mix of grid sites and local batch resources. During the long shut down cloud infrastructures, diverse opportunistic resources and HPC supercomputing centers were made available to CMS, which further complicated the operations of the submission infrastructure. In this presentation we will discuss the CMS effort to adopt and deploy the glideinWMS system as a common resource provisioning layer to grid, cloud, local batch, and opportunistic resources and sites. We will address the challenges associated with integrating the various types of resources, the efficiency gains and simplifications associated with using a common resource provisioning layer, and discuss the solutions found. We will finish with an outlook of future plans for how CMS is moving forward on resource provisioning for more heterogenous architectures and services.
Speaker: James Letts (Univ. of California San Diego (US))
• 15:15
The ATLAS Data Management system - Rucio: commissioning, migration and operational experiences 15m
For more than 8 years, the Distributed Data Management (DDM) system of ATLAS called DQ2 has been able to demonstrate very large scale data management capabilities with more than 600M files, 160 petabytes spread worldwide across 130 sites, and accesses from 1,000 active users. However, the system does not scale for LHC run2 and a new DDM system called Rucio has been developed to be DQ2's successor. Rucio is based on different concepts and has new functionalities not provided by DQ2 which make the migration from the old to the new system a big challenge. The main issues are the large amount of data to move between the two systems, the number of users affected by the change, and the fact that the ATLAS Distributing Computing system, on the contrary to the sub-detectors, must stay continuously up and running during the LHC long shutdown to ensure the continuity of analysis and Monte-Carlo production. We will detail here the difficulties of this transition and will present the steps that were realized to ensure a smooth and transparent transition from DQ2 to Rucio. We will also discuss the new features and gains from the Rucio system.
Speaker: Vincent Garonne (CERN)
• 15:30
Resource control in ATLAS distributed data management: Rucio Accounting and Quotas 15m
The ATLAS Distributed Data Management system stores more than 160PB of physics data across more than 130 sites globally. Rucio, the next-generation data management system of ATLAS has been introduced to cope with the anticipated workload of the coming decade. The previous data management system DQ2 pursued a rather simplistic approach for resource management, but with the increased data volume and more dynamic handling of data workflows required by the experiment, a more elaborate approach to this issue is needed. This document describes how resources, like storage, accounts and replication requests, are accounted in Rucio. Especially the measurement of used logical storage space is fundamentally different in Rucio than it’s predecessor DQ2. We introduce a new concept of declaring quota policies (limits) for accounts in Rucio. This new quota concept is based on accounts and RSE (Rucio storage element) expressions, which allows the definition of account limits in a dynamic way. This concept enables the operators of the data management system to establish very specific limits in which users, physics groups and production systems use the distributed data management system while, at the same time, lowering the operational burden. This contribution describes the architecture behind those components, the interfaces to other internal and external components and will show the benefits made by this system.
Speaker: Martin Barisits (CERN)
• 15:45
AsyncStageOut: Distributed user data management for CMS Analysis 15m
AsyncStageOut (ASO) is a new component of the distributed data analysis system of CMS, CRAB, designed for managing users' data. It addresses a major weakness of the previous model, namely that data movement was part of the job execution resulting in inefficient use of job slots and an unacceptable failure rate at the end of the jobs. ASO foresees the management of up to 400k files per day of various sizes, spread worldwide across more than 60 sites. It must handle up to 1000 individual users per month, and work with minimal delay. This creates challenging requirements for system scalability, performance and monitoring. ASO uses FTS to schedule and execute the transfers between the storage elements of the source and destination sites. It has evolved from a limited prototype to a highly adaptable service, which manages and monitors the user file placement and bookkeeping. To ensure system scalability and data monitoring, it employs new technologies such as a NoSQL database and re-uses existing components of PhEDEx and the FTS Dashboard. We present the asynchronous stage-out strategy and the architecture of the solution we implemented to deal with those issues and challenges. The deployment model for the high availability and scalability of the service is discussed. The performance of the system during the commissioning and the first phase of production are also shown, along with results from simulations designed to explore the limits of scalability.
Speaker: Dr Tony Wildish (Princeton University (US))
• 14:00 16:00
Track 5 Session: #1 (Computing Models) B503 (B503)

### B503

#### B503

Computing activities and Computing models

Convener: Stefan Roiser (CERN)
• 14:00
Designing Computing System Architecture and Models for the HL-LHC era 15m
The global distributed computing system (WLCG) used by the Large Hadron Collider (LHC) is evolving. The treatment of wide-area-networking (WAN) as a scarce resource that needs to be strictly managed is far less necessary that originally foreseen. Static data placement and replication, intended to limit interdependencies among computing centers, is giving way to global data federations building on computing centers whose maturity has increased significantly over the past decade. Different modalities for provisioning resources, including commercial clouds, will coexist with computing centers in our labs and universities. Compute resources may increasingly be shared between HEP and other fields. By necessity today's computing system is evolving in an adiabatic fashion due to the need to support the next LHC run. In the medium and long term, however, a number of questions arise regarding the appropriate system architecture, for example: What ratio of storage to compute capabilities will be needed? How should storage be deployed geographically to maximally exploit owned, opportunistic and cloud distributed compute capabilities? What is the right mix of placement, WAN reads, and automated caching to optimize performance? How will the reliability and scalability of the system be impacted by these choices? Can different computing models or computation techniques (map reduce, etc.) be deployed more easily with different system architectures? In this presentation we report results from our work to simulate future distributed computing system architectures and computing models to answer these questions. When possible we also report our efforts to validate this simulation using today's computing system.
Speaker: Stephen Gowdy (Fermi National Accelerator Lab. (US))
• 14:15
Improvements in the CMS Computing System for Run2 15m
Beginning in 2015 CMS will collected and produce data and simulation adding to 10B new events a year. In order to realize the physics potential of the experiment these events need to be stored, processed, and delivered to analysis users on a global scale. CMS has 150k processor cores and 80PB of disk storage and there is constant pressure to reduce the resources needed and increase the efficiency of usage. In this presentation we will comprehensively overview the improvements made in the computing system for Run2 by CMS in the areas of data and simulation processing, data distribution, data management and data access. The system has been examined and we will discuss the improvements in the entire data and workflow systems: CMS processing processing and analysis workflow tools, the development and deployment of dynamic data placement infrastructure, and progress toward operating a global data federation. We will describe the concepts and approaches to utilize the variety of CMS CPU resources, ranging from established Grid sites to HPC centers, Cloud resources and CMS' own High Level Trigger farm. We will explain the strategy for improving how effectively the storage is used and the commissioning, validation and challenge activities will be presented.
Speaker: Ian Fisk (Fermi National Accelerator Lab. (US))
• 14:30
ATLAS Distributed Computing in LHC Run2 15m
The ATLAS Distributed Computing infrastructure has evolved after the first period of LHC data taking in order to cope with the challenges of the upcoming LHC Run2. An increased data rate and computing demands of the Monte-Carlo simulation, as well as new approaches to ATLAS analysis, dictated a more dynamic workload management system (ProdSys2) and data management system (Rucio), overcoming the boundaries imposed by the design of the old computing model. In particular, the commissioning of new central computing system components was the core part of the migration toward the flexible computing model. The flexible computing utilization exploring the opportunistic resources such as HPC, cloud, and volunteer computing is embedded in the new computing model, the data access mechanisms have been enhanced with the remote access, and the network topology and performance is deeply integrated into the core of the system. Moreover a new data management strategy, based on defined lifetime for each dataset, has been defined to better manage the lifecycle of the data. In this note, the overview of the operational experience of the new system and its evolution is presented.
Speaker: Dr Simone Campana (CERN)
• 14:45
Optimising Costs in WLCG Operations 15m
The Worldwide LHC Computing Grid project (WLCG) provides the computing and storage resources required by the LHC collaborations to store, process and analyse the ~50 Petabytes of data annually generated by the LHC. The WLCG operations are coordinated by a distributed team of managers and experts and performed by people at all participating sites and from all the experiments. Several improvements in the WLCG infrastructure have been implemented during the first long LHC shutdown to prepare for the increasing needs of the experiments during Run2 and beyond. However, constraints in funding will affect not only the computing resources but also the available effort for operations. This paper presents the results of a detailed investigation on the allocation of the effort in the different areas of WLCG operations, identifies the most important sources of inefficiency and proposes viable strategies for optimising the operational cost, taking into account the current trends in the evolution of the computing infrastructure and the computing models of the experiments.
Speaker: Dr Andrea Sciaba (CERN)
• 15:00
Distributed Computing for Pierre Auger Observatory 15m
Pierre Auger Observatory operates the largest system of detectors for ultra-high energy cosmic ray measurements. Comparison of theoretical models of interactions with recorded data requires thousands of computing cores for Monte Carlo simulations. Since 2007 distributed resources connected via EGI grid are succesfully used. The first and the second versions of production system based on bash scripts and MySQL database were able to submit jobs to all reliable sites supporting Virtual Organization auger. Many years VO auger belongs to top ten of EGI users based on the total used computing time. Migration of the production system to DIRAC interware started in 2014. Pilot jobs improve efficiency of computing jobs and eliminate problems with small and less reliable sites used for the bulk production. The new system has also possibility to use available resources in clouds. Dirac File Catalog replaced LFC for new files, which are organized in datasets defined via metadata. CVMFS is used for software distribution since 2014. In the presentation we give a comparison of the old and new production system and report the experience from migration to the new system.
Speaker: Jiri Chudoba (Acad. of Sciences of the Czech Rep. (CZ))
• 15:15
Recent Evolution of the Offline Computing Model of the NOvA Experiment 15m
The NOvA experiment at Fermilab is a long-baseline neutrino experiment designed to study nu-e appearance in a nu-mu beam. Over the last few years there has been intense work to streamline the computing infrastructure in preparation for data, which started to flow in from the far detector in Fall 2013. Major accomplishments for this effort include migration to the use of offsite resources through the use of the Open Science Grid and upgrading the file handling framework from simple disk storage to a tiered system using a comprehensive data management and delivery system to find and access files on either disk, dCache, or tape storage. NOvA has already produced more than 6.5 million files and more than 1 PB of raw data and Monte Carlo simulation generated files which are managed under this model. The current system has demonstrated sustained rates of up to 1 TB per hour of file transfer to permanent storage. NOvA pioneered the use of new tools and this paved the way for their use by other Intensity Frontier experiments at Fermilab. Most importantly, the new framework places the experiment’s computing infrastructure on a firm foundation, which is ready to produce the files required for NOvA's first physics results. In this talk we discuss the offline computing model and infrastructure that has been put in place for NOvA and how we have used it to produce the experiment’s first neutrino oscillation results.
Speaker: Alec Habig (Univ. of Minnesota Duluth)
• 15:30
COMPUTING STRATEGY OF THE AMS-02 EXPERIMENT 15m
The Alpha Magnetic Spectrometer (AMS) is a high energy physics experiment installed and operating on board of the International Space Station (ISS) from May 2011 and expected to last through Year 2024 and beyond. The computing strategy of the AMS experiment is discussed in the paper, including software design, data processing and modelling details, simulation of the detector performance and overall computing organization.
Speaker: Dr Baosong Shan (Beihang University (CN))
• 15:45
Data-analysis scheme and infrastructure at the X-ray free electron laser facility, SACLA 15m
An X-ray free electron laser (XFEL) facility, SACLA, is generating ultra-short, high peak brightness, and full-spatial-coherent X-ray pulses [1]. The unique characteristics of the X-ray pulses, which have never been obtained with conventional synchrotron orbital radiation, are now opening new opportunities in a wide range of scientific fields such as atom, molecular and optical physics, ultrafast science, material science, and life science. More than 100 experiments have been performed since the first X-ray delivery to experimental users in March 2012. In this paper, we present an overview of SACLA data acquisition (DAQ) and analysis system with a special emphasis on the analysis scheme and its infrastructure. In the case of serial femotosecond protein crystallography experiments [2], a typical experiment collects diffraction image patterns of order of $10^6$, which demands heavy load to the data analysis system. Each pattern is recorded by a $2000 \times 2000$ pixel detector that consists of eight Multiport Charge-Coupled Device (MPCCD) sensors [3]. The resolution of the single MPCCD sensor is $1024 \times 512$ pixel and data depth of each pixel is 16 bits. The DAQ system consists of detector front-ends [4], data-handling servers, hardware-based event-tag distribution system [5], event-synchronized database [6], two cache storages, tape archive system, and physically-segregated two network system [7]. The DAQ has data bandwidth of maximum 6 Gbps to support other experiment setups with various detector configuration of up to twelve MPCCD sensors. In addition to the currently operational beamline BL3, BL2 will operate concurrently through a fast-switching operation mode in 2015. To support this operation mode, the cache storage with capacities of 200 TB (250 TB) is assigned to the beamline BL2 (BL3) respectively [8]. These capacities correspond to the accumulated data size for one week operation. Experimental data are periodically moved into the tape archive system. The tape archive system has a capacity of 7 PB, and extendable up to 26 PB by installing additional tape cartridges. The analysis section has two functions: one is run-by-run analysis to monitor the experimental conditions, and the other is off-line analysis. To implement these functions, the analysis system consists of a PC cluster and a supercomputer. The PC cluster is based on x86_64 processors and has a computing power of 14 TFLOPS. A 160 TB storage is connected to the PC cluster via Infiniband QDR network. To pick up raw image data, the PC cluster is connected to the cache storages and the tape archive system via 10 Gigabit Ethernet. The run-by-run analysis is performed using the PC cluster. The results are saved on the storage in HDF5 format [9]. The PC cluster is also used for off-line analysis, using analysis code developed by the scientific community, such as CrystFEL [10] and SITENNO [11]. The supercomputer with 90 TFLOPS SPARC-based processors (Fujitsu FX10) was installed for the data analysis that requires higher computing power. Storage of the supercomputer is 100+500 TB Lustre-based file system (Fujitsu FEFS). Another 1 PB Lustre file system is also connected to both the supercomputer and the PC cluster to interaccess the experimental data from the two systems. Data analysis that requires much higher computing power is foreseen. For these cases, we are developing a joint analysis mode using both the supercomputer and the 10-PFLOPS K computer [12]. The results of the feasibility study on data transfer and quasi-realtime job submission to the K computer will also be discussed. [1] T. Ishikawa et al., "A compact X-ray free-electron laser emitting in the sub-angstrom region", Nature Photonics 6, 540-544 (2012). [2] M. Sugahara et al., Nature Methods submitted. [3] T. Kameshima et al., "Development of an X-ray pixel detector with multi-port charge-coupled device for X-ray free-electron laser experiments", Rev. Sci. Instrum. 85, 033110 (2014). [4] A. Kiyomichi, A. Amselem, et al., "Development of Image Data Acquisition System for 2D Detector at SACLA", Proceedings of ICALEPCS2011, WEPMN028 (2011). [5] T. Abe et al., "DEVELOPMENT OF NEW TAG SUPPLY SYSTEM FOR DAQ FOR SACLA USER EXPERIMENTS", Proceedings of IPAC2014, TUPRI108 (2013). [6] M. Yamaga et al., "Event-Synchronized Data Acquisition System of 5 Giga-bps Data Rate for User Experiment at the XFEL Facility, SACLA", Proceedings of ICALEPCS2011, TUCAUST06 (2011). [7] T. Sugimoto et al., "Large-bandwidth Data-acquisition Network for XFEL Facility, SACLA", Proceedings of ICALEPCS2011, WEBHAUST03 (2011). [8] K. Okada, et al., "UPGRADE OF SACLA DAQ SYSTEM ADAPTS TO MULTI-BEAMLINE OPERATION", Proceedings of PCaPAC2014, WCO205 (2014). [9] The HDF Group, http://www.hdfgroup.org/HDF5/. [10] T. A. White, et al.. "CrystFEL: a software suite for snapshot serial crystallography". J. Appl. Cryst. 45, pp.335–341 (2012). [11] Y. Sekiguchi et al., "Data processing software suite SITENNO for coherent X-ray diffraction imaging using X-ray free electron laser SACLA", Journal of Synchrotron Radiation 21/ 4, 600-612 (2014). [12] M. Yokokawa, et al., "The K computer: Japanese next-generation supercomputer development project", ISLPED '11 Proceedings of the 17th IEEE/ACM international symposium on Low-power electronics and design, pp.371-372 (2011).
Speaker: Dr Takashi SUGIMOTO (Japan Synchrotron Radiation Research Institute)
• 14:00 16:00
Track 6 Session: #1 (Facilities, Monitoring) C210 (C210)

### C210

#### C210

Facilities, Infrastructure, Network

Convener: Jose Flix Molina (Centro de Investigaciones Energ. Medioambientales y Tecn. - (ES)
• 14:00
Analysis of CERN Computing Infrastructure and Monitoring Data 15m
Optimising a computing infrastructure on the scale of LHC requires a quantitative understanding of a complex network of many different resources and services. For this purpose the CERN IT department and the LHC experiments are collecting a large multitude of logs and performance probes, which are already successfully used for short-term analysis (e.g. operational dashboards) within each group.  The IT analytics working group has been created with the goal to bring data sources from different services and on different abstraction levels together and to implement a suitable infrastructure for mid- to long-term statistical analysis. It further provides a forum for joint optimization across single service boundaries and the exchange of analysis methods and tools.  To simplify access to the collected data, we implemented an automated repository for cleaned and aggregated data sources based on the Hadoop ecosystem. This contribution describes some of the challenges encountered, such as dealing with heterogeneous data formats, selecting an efficient storage format for map reduce and external access, and will describe the repository user interface.  Using this infrastructure we were able to quantitatively analyse the relationship between CPU/wall fraction, latency/throughput constraints of network and disk and the effective job throughput.  In this contribution we will first describe the design of the shared analysis infrastructure and then present a summary of first analysis results from the combined data sources.
Speaker: Christian Nieke (Brunswick Technical University (DE))
• 14:15
Monitoring Evolution at CERN 15m
Over the past two years, the operation of the CERN Data Centres went through significant changes with the introduction of new mechanisms for hardware procurement, new services for cloud infrastructure and configuration management, among other improvements. These changes resulted in an increase of resources being operated in a more dynamic environment. Today, the CERN Data Centres provide over 11000 multi-core processor servers, 130 PB disk servers, 100 PB tape robots, and 150 high performance tape drives. To cope with these developments, an evolution of the data centre monitoring tools was also required. This modernisation was based on a number of guiding rules: sustain the increase of resources, adapt to the new dynamic nature of the data centres, make monitoring data easier to share, give more flexibility to Service Managers on how they publish and consume monitoring metrics and logs, establish a common repository of monitoring data, optimize the handling of monitoring notifications, and replace the previous toolset by new open source technologies with large adoption and community support. This talk will explain how these improvements were delivered, present the architecture and technologies of the new monitoring tools, and review the experience of its production deployment.
Speaker: Pedro Andrade (CERN)
• 14:30
Challenge and Future of Job Distribution at a Multi-VO Grid Site 15m
DESY operates a multi-VO Grid site for 20 HEP and non-HEP collaborations and is one of the world-wide largest Tier-2 sites for ATLAS, CMS, LHCb, and BELLE2. In one common Grid infrastructure computing resources are shared by all VOs according to MoUs and agreements, applying an opportunistic usage model allows to distribute free resources among the VOs. Currently, the Grid site DESY-HH provides roughly 100kHS06 in 10000 job slots, exploiting the queueing system PBS/TORQUE. As described in former CHEP conferences, resource utilization and job scheduling in a multi-VO environment is a major challenge. On one hand side all job slots should be occupied, on the other hand jobs with diverging resource usage patterns must be cleverly distributed to the compute nodes in order to guarantee stability and optimal resource usage. Batch systems such as PBS/TORQUE with the scheduler MAUI only scale up to a few thousand job slots. At DESY-HH an alternative scheduler was developed and brought into operation. In the preparation for LHC Run 2 as well as the start of BELLE2, in particular the request for the support of multi-core jobs requires appropriate job scheduling strategies which are not available out-of-the-box. Even more, the operation of work load managements system and pilot factories (DIRAC, PANDA) by the big collaborations question the way of how sites provide computing resources in the future. Is cloud computing the future? What about small VOs and individual users who use the standard Grid tools to submit jobs then? In the contribution to CHEP2015 we will try the review what has been learned by operating a large Grid site for many VOs. We will give a summary of the experiences with the job scheduling at DESY-HH of the last years and we will describe limits of the current system. A main focus will be put on the discussion of future scenarios, including alternatives to the approach of local resource managements systems (LRMS) which are still widely used.
Speaker: Birgit Lewendel (Deutsches Elektronen-Synchrotron (DE))
• 14:45
Experience of public procurement of Open Compute servers 15m
The Open Compute Project, OCP ( http://www.opencompute.org/ (link is external)), was launched by Facebook in 2011 with the objective of building efficient computing infrastructures at lowest possible cost. The technologies are released as open hardware, with the goal to develop servers and data centers following the model traditionally associated with open source software projects. In 2013 CERN acquired a few OCP servers in order to compare performance and power consumption with standard hardware. The conclusions were that there are sufficient savings to motivate an attempt to procure a large scale installation. One objective is to evaluate if the OCP market is sufficiently mature and broad to meet the constrains of a public procurement. This talk will summarize this procurement, which started in September 2014 and involved the Request for information (RFI) to qualify bidders and Request for Tender (RFT).
Speaker: Olof Barring (CERN)
• 15:00
Building a Tier-3 Based on ARMv8 64-bit Server-on-Chip for the WLCG 15m
Deploying the Worldwide LHC Computing Grid (WLCG) was greatly facilitated by the convergence, around the year 2000, on Linux and commodity x86 processors as a standard scientific computing platform. This homogeneity enabled a relatively simple "build once, run anywhere" model for applications. A number of factors are now driving interest in alternative platforms. Power limitations at the level of individual processors, and in aggregate in computer centers, place greater emphasis on power efficiency issues. The rise of mobile computing, based primarily on ARM processors with a different intellectual property model, has also created interest in ARM as an potential general purpose architecture for the server market. We report our experience building a demonstrator Grid cluster using systems with ARMv8 64-bit processors, capable of running production-style workflows. We present what we have learned regarding the use of both application and Open Science Grid (OSG) software on ARMv8 64-bit processors, as well as issues related to the need to manage heterogeneous mixes of x86_64 and ARMv8 64-bit processors in a Grid environment. We also report the hardware experience we have gained while building the cluster.
Speaker: Dr Peter Elmer (Princeton University (US))
• 15:15
Integrating network and transfer metrics to optimize transfer efficiency and experiment workflows 15m
The Worldwide LHC Computing Grid relies on the network as a critical part of its infrastructure and therefore needs to guarantee effective network usage and prompt detection and resolution of any network issues, including connection failures, congestion, traffic routing, etc. The WLCG Network and Transfer Metrics project aims to integrate and combine all network-related monitoring data collected by the WLCG infrastructure. This includes FTS monitoring information, monitoring data from the XRootD federation, as well as results of the perfSONAR tests. The main challenge consists of further integrating and analyzing this information so that it can be turned into actionable insight for optimization of data transfers and workload management systems of the LHC experiments.The presentation will include technical description of the WLCG network monitoring infrastructure as well as results of the analysis of the collected data. It will also highlight how results of this analysis can be used in order to improve efficiency of the WLCG computing activities.
Speaker: Shawn Mc Kee (University of Michigan (US))
• 15:30
100G Deployment@(DE-KIT) 15m
Speaker: Bruno Heinrich Hoeft (KIT - Karlsruhe Institute of Technology (DE))
• 15:45
A first look at 100 Gbps LAN technologies, with an emphasis on future DAQ applications 15m
The LHCb experiment is preparing a major upgrade of both the detector and the data acquisition system. A system capable of transporting up to 50 Tbps of data will be required. This can only be achieved in a manageable way using 100 Gbps links. Such links recently became available also in the servers, while they have been available between switches already for a while. We present first measurements with such links (InfiniBand EDR, Ethernet 100 GbE) both using standard benchmarks and using a prototype event-building application. We analyse the CPU load effects by using Remote DMA technologies, and we also show comparison with previous tests on 40 G equipment.
Speaker: Adam Jedrzej Otto (Ministere des affaires etrangeres et europeennes (FR))
• 16:00 16:30
Poster session A: #2 Tunnel Gallery

#### Tunnel Gallery

The posters in this session are displayed on April 13-14

• 16:30 18:30
Track 2 Session: #2 (Exp. Comp. Models) Auditorium (Auditorium)

### Auditorium

#### Auditorium

Offline software

Convener: Dr Ivan Kisel (Johann-Wolfgang-Goethe Univ. (DE))
• 16:30
Status and future evolution of the ATLAS offline software 15m
The talk will give a summary of the broad spectrum of software upgrade projects to prepare ATLAS for the challenges of the soon coming LHC Run-2. Those projects include the reduction of the CPU required for reconstruction by a factor 3 compared to 2012, which was required to meet the challenges of the expected increase in pileup and the higher data taking rate of up to 1 kHz. As well, the new Integrated Simulation Framework (ISF) has been put into production. By far the most ambitious project is the implementation of a completely new Analysis Model, based on a new ROOT readable reconstruction format xAOD, a reduction framework based on the train model to centrally produce skimmed data samples and an analysis framework. The Data Challenge 2014 has been a first large scale test of most of the foreseen software upgrades. At the end of the talk an overview will be given of future software projects and plans that will lead up to the coming Long Shutdown 2 as the next major ATLAS software upgrade phase.
Speaker: Rolf Seuster (TRIUMF (CA))
• 16:45
Implementation of the ATLAS Run 2 event data model 15m
During the 2013-2014 shutdown of the Large Hadron Collider, ATLAS switched to a new event data model for analysis, called the xAOD. A key feature of this model is the separation of the object data from the objects themselves (the auxiliary store'). Rather being stored as member variables of the analysis classes, all object data are stored separately, as vectors of simple values. Thus, the data are stored in a structure of arrays' format, while the user still can access it as an array of structures'. This organization allows for on-demand partial reading of objects, the selective removal of object properties, and the addition of arbitrary user-defined properties in a uniform manner. It also improves performance by increasing the locality of memory references in typical analysis code. The resulting data structures can be written to ROOT files with data properties represented as simple ROOT tree branches. This talk will focus on the design and implementation of the auxiliary store and its interaction with ROOT. Results on reconstruction and analysis performance will also be discussed.
Speaker: Scott Snyder (Brookhaven National Laboratory (US))
• 17:00
ATLAS Metadata Infrastructure Evolution for Run 2 and Beyond 15m
ATLAS developed and employed for Run 1 of the Large Hadron Collider a sophisticated infrastructure for metadata handling in event processing jobs.  This infrastructure profits from a rich feature set provided by the ATLAS execution control framework, including standardized interfaces and invocation mechanisms for tools and services, segregation of transient data stores with concomitant object lifetime management, and mechanisms for handling occurrences asynchronous to the control framework’s state machine transitions.   This metadata infrastructure is evolving and being extended for Run 2 to allow its use and reuse in downstream physics analyses, analyses that may or may not utilize the ATLAS control framework.  At the same time, multiprocessing versions of the control framework and the requirements of future multithreaded frameworks are leading to redesign of components that use an incident-handling approach to asynchrony.  The increased use of scatter-gather architectures, both local and distributed, requires further enhancement of metadata infrastructure in order to ensure semantic coherence and robust bookkeeping.   This paper describes the evolution of ATLAS metadata infrastructure for Run 2 and beyond, including the transition to dual-use tools—tools that can operate inside or outside the ATLAS control framework—and the implications thereof.   It further examines how the design of this infrastructure is changing to accommodate the requirements of future frameworks and emerging event processing architectures.
Speaker: Dr Peter Van Gemmeren (Argonne National Laboratory (US))
• 17:15
The Data Quality Monitoring Software for the CMS experiment at the LHC 15m
The Data Quality Monitoring (DQM) Software is a central tool in the CMS experiment. Its flexibility allows for integration in several key environments: Online, for real-time detector monitoring; Offline, for the final, fine-grained data analysis and certification; Release-Validation, to constantly validate the functionalities and the performance of the reconstruction software; in Monte Carlo productions. Since the end of data taking at a center of mass energy of 8 TeV, the environment in which the DQM lives has undergone fundamental changes. In turn, the DQM system has made significant upgrades in many areas to respond to not only the changes in infrastructure, but also the growing specialized needs of the collaboration with an emphasis on more sophisticated methods for evaluating data quality, as well as advancing the DQM system to provide quality assessments of various Monte Carlo simulations versus data distributions, monitoring changes in physical effects due to modifications of algorithms or framework, and enabling regression modeling for long-term effects for the CMS detector. The central tool to deliver Data Quality information is an interactive web site for browsing data quality histograms (DQMGUI), and its transition to becoming a distributed system will also be presented. In this contribution the usage of the DQM Software in the different environments and its integration in the CMS Reconstruction Software Framework (CMSSW) and in all production workflows are presented, with emphasis on recent developments and improvement in advance of the LHC restart at 13 TeV. The main technical challenges and the adopted solutions to them will be also discussed with emphasis on functionality and long-term robustness.
Speaker: Marco Rovere (CERN)
• 17:30
MiniAOD: A new analysis data format for CMS 15m
The CMS experiment has developed a new analysis object format (the "mini-AOD") targeted to be less than 10% of the size of the Run 1 AOD format. The motivation for the Mini-AOD format is to have a small and quickly derived data format from which the majority of CMS analysis users can perform their analysis work. This format is targeted at having sufficient information to serve about 80% of CMS analysis, while dramatically simplifying the disk and I/O resources needed for analysis. This improvement should bring substantial improvements in resource needs and turn-around time for analysis applications. Such large reductions were achieved using a number of techniques, including defining light-weight physics object candidate representations, increasing transverse momentum thresholds for storing physics-object candidates, and reduced numerical precision when it is not required at the analysis level. In this contribution we discuss the critical components of the mini-AOD format, our experience with its deployment and the planned physics analysis model for Run 2 based on the mini-AOD.
Speaker: Dr Carl Vuosalo (University of Wisconsin (US))
• 17:45
MAUS: The MICE Analysis and User Software 15m
The Muon Ionization Cooling Experiment (MICE) has developed the MICE Analysis User Software (MAUS) to simulate and analyse experimental data. It serves as the primary codebase for the experiment, providing for offline batch simulation and reconstruction as well as online data quality checks . The software provides both traditional particle physics functionalities such as track reconstruction and particle identification, and accelerator physics functions such as calculating transfer matrices and emittances. The code is structured in a Map-Reduce framework to allow parallelization whether on a personal computer or in the control room. MAUS allows users to develop in either Python or C++ and provides APIs for both. Various software engineering practices from industry are also used to ensure correct and maintainable physics code, which include unit, functional and integration tests, continuous integration and load testing, code reviews, and distributed version control systems. The software framework and the simulation and reconstruction capabilities are described.
Speaker: Janusz Martyniak (Imperial College London)
• 18:00
The NOvA Simulation Chain 15m
The NOvA experiment is a two-detector, long-baseline neutrino experiment operating in the recently upgraded NuMI muon neutrino beam. Simulating neutrino interactions and backgrounds requires many steps including: the simulation of the neutrino beam flux using FLUKA and the FLUGG interface; cosmic ray generation using CRY; neutrino interaction modeling using GENIE; and a simulation of the energy deposited in the detector using GEANT4. To shorten generation time, the modeling of detector-specific aspects, such as photon transport, detector and electronics noise, and readout electronics, employs custom, parameterized simulation applications. We will describe the NOvA simulation chain, and present details on the techniques used in modeling photon transport near the ends of cells, and in developing a novel data-driven noise simulation. Due to the high intensity of the NuMI beam, the Near Detector samples a high rate of muons originating in the surrounding rock. In addition, due to its location on the surface at Ash River, MN, the Far Detector collects a large rate (~40 kHz) of cosmic muons. We will discuss the methods used in NOvA for overlaying rock muons and cosmic ray muons with simulated neutrino interactions and show how realistically the final simulation reproduces the preliminary NOvA data.
Speaker: Adam Aurisano (University of Cincinnati)
• 18:15
The Heavy Photon Search Experiment Software Environment 15m
The Heavy Photon Search (HPS) is an experiment at the Thomas Jefferson National Accelerator Facility (JLab) designed to search for a hidden sector photon (A’) in fixed target electroproduction. It uses a silicon microstrip tracking and vertexing detector inside a dipole magnet to measure charged particle trajectories and a fast electromagnetic calorimeter just downstream of the magnet to provide a trigger and identify electrons. As the first stage of this project, the HPS Test Run apparatus was constructed and operated in 2012 to demonstrate the experiment’s technical feasibility and to confirm that the trigger rates and occupancies were as expected. The full detector is currently being installed and will be commissioned starting in November, 2014. Data taking is expected to commence in the spring of 2015. The HPS experiment uses both invariant mass and secondary vertex signatures to search for the A’. The overall design of the detector follows from the kinematics of A’ production which typically results in a final state particle within a few degrees of the incoming beam. The occupancies of sensors near the beam plane are high, so high-rate detectors, a fast trigger, and excellent time tagging are required to minimize their impact. The trigger comes from a highly-segmented lead-tungstate crystal calorimeter located just downstream of the dipole magnet. The detector was fully simulated using the flexible and performant Geant4-based program slic (abstract 445, this conference). Simulation of the readout and the event reconstruction itself were performed with the Java-based software package org.lcsim (abstract 445, this conference).. The simulation of the detector readout includes full charge deposition, drift and diffusion in the silicon wafers, followed by a detailed simulation of the readout chip and associated electronics. Full accounting of the occupancies was performed by overlaying the expected number of beam backgrounds. Track, cluster and vertex reconstruction for both simulated and real data will be described and preliminary comparisons of the expected and measured detector performance will be presented. We will begin with an overview of the physics goals of the experiment followed by a short description of the detector design. We will then describe the software tools used to design the detector layout and simulate the expected detector performance. Finally, the event reconstruction chain will be presented.
Speaker: Norman Anthony Graf (SLAC National Accelerator Laboratory (US))
• 16:30 18:30
Track 3 Session: #2 (Databases, Data access protocols) C209 (C209)

### C209

#### C209

Data store and access

Convener: Laurent Aphecetche (Laboratoire de Physique Subatomique et des Technologies Associe)
• 16:30
The Belle II Conditions Database 15m
The Belle II experiment, a next-generation B factory experiment at the KEK laboratory, Tsukuba, Japan, is expected to collect an experimental data sample fifty times larger than its predecessor, the Belle experiment. The data taking and processing rates are expected to be at least one order of magnitude larger as well. In order to cope with these large data processing rates and huge data samples, the Conditions Database, which stores the time-dependent calibrations, have to be designed carefully for successful and efficient data processing and analysis by the worldwide Belle II Collaboration. The database system needs to offer fast response, has to enable storing experimental information with fine spatial and time granularity, and ensure the scalability and redundancy for robust operation. The Conditions Database design and implementation details, together with future plans, will be presented in this talk.
Speaker: Marko Bracko (Jozef Stefan Institute (SI))
• 16:45
Federating LHCb datasets using the Dirac File Catalog 15m
In the distributed computing model of LHCb the File Catalog (FC) is a central component that keeps track of each file and replica stored on the Grid. It is federating the LHCb data files in a logical namespace used by all LHCb applications. As a replica catalog, it is used for brokering jobs to sites where their input data is meant to be present, but also by jobs for finding alternative replicas if necessary. The LCG File Catalog (LFC) used originally by LHCb and other experiments is now being retired and needs to be replaced. The DIRAC File Catalog (DFC) was developed within the framework of the DIRAC Project and presented during CHEP 2012. From the technical point of view, the code powering the DFC follows an Aspect oriented programming (AOP): each type of entity that is manipulated by the DFC (Users, Files, Replicas, etc) is treated as a separate 'concern' in the AOP terminology. Hence, the database schema can also be adapted to the needs of a Virtual Organization. LHCb opted for a highly tuned MySQL database, with optimized requests and stored procedures. This paper will present the improvements brought to the DFC presented at CHEP 2012, its performance with respect to the LFC, as well as the migration procedure used to migrate the LHCb data from the LFC to the DFC. Finally it will show how a combination of the DFC and the LHCb framework Gaudi allow LHCb to build a data federation at low cost.
Speaker: Christophe Haen (CERN)
• 17:00
Experience in running relational databases on clustered storage 15m
CERN IT-DB group is migrating its storage platform, mainly NetApp NAS’s running on 7-mode but also SAN arrays, to a set of NetApp C-mode clusters. The largest one is made of 14 controllers and it will hold a range of critical databases from administration to accelerators control or experiment control databases. This talk shows our setup: network, monitoring, use of features like transparent movement of file systems, flash pools (SSD + HDD storage pools), snapshots, etc. It will also show how these features are used on our infrastructure to support backup & recovery solutions with different database solutions: Oracle (11g and 12c multi tenancy), MySQL or PostgreSQL. Performance benchmarks and experience collected while running services on this platform will be also shared. It will be also covered the use of the cluster to provide iSCSI (block device) access for OpenStack windows virtual machines.
Speaker: Ruben Domingo Gaspar Aparicio (CERN)
• 17:15
Operational Experience Running Hadoop XRootD Fallback 15m
In April of 2014, the UCSD T2 Center deployed hdfs-xrootd-fallback, a UCSD-developed software system that interfaces Hadoop with XRootD to increase reliability of the Hadoop file system. The hdfs-xrootd-fallback system allows a site to depend less on local file replication and more on global replication provided by the XRootD federation to ensure data redundancy. Deploying the software has allowed us to reduce Hadoop replication on a significant subset of files in our cluster, freeing hundreds of terabytes in our local storage, and to recover HDFS blocks lost due to storage degradation. An overview of the architecture of the hdfs-xrootd-fallback system will be presented, as well as details of our experience operating the service over the past year.
Speaker: Jeffrey Michael Dost (Univ. of California San Diego (US))
• 17:30
Engineering the CernVM-FileSystem as a High Bandwidth Distributed Filesystem for Auxiliary Physics Data 15m
Fermilab has several physics experiments including NOvA, MicroBooNE, and the Dark Energy Survey that have computing grid-based applications that need to read from a shared set of data files. We call this type of data Auxiliary data to distinguish it from (a) Event data which tends to be different for every job, and (b) Conditions data which tends to be the same for each job in a batch of jobs. Conditions data also tends to be relatively small per job (100s of Megabytes) where both Event data and Auxiliary data are larger per job (Gigabytes), but unlike Event data, Auxiliary data comes from a limited working set (10s to 100s of Gigabytes) of shared files. Since there is some sharing of the Auxiliary data, it appeared at first that the CernVM-Filesystem (CVMFS) infrastructure built for distributing software to the grid would also be the best way to distribute Auxiliary data. However, because grid jobs tend to be started in large batches running the same code, the software distribution use case of CVMFS has a very high cache hit ratio, so the bandwidth requirements on the per-site http proxy caches (squids) is quite low. As a result those proxy caches have been engineered with relatively low bandwidth, and they are easily overwhelmed by Auxiliary data with its relatively low cache hit ratio. A new approach was needed. We are taking advantage of a CVMFS client feature called "alien cache" to cache data on site-local high-bandwidth data servers that were engineered for storing Event data. This site-shared cache replaces caching CVMFS files on both the worker node local disks and on the site-local squids. We have tested this alien cache with the dCache NFSv4.1 interface, Lustre, and Hadoop-FS-fuse, and found that they all perform well. In addition, we use high-bandwidth data servers at central sites to perform the CVMFS Stratum 1 function instead of the low-bandwidth web servers deployed for the CVMFS software distribution function. We have tested this using the dCache http interface. As a result, we have an end-to-end high-bandwidth widely distributed caching read-only filesystem, using existing client software already widely deployed to grid worker nodes and existing file servers already widely installed at grid sites. Files are published in a central place and are soon available on demand throughout the grid and cached locally on the site with a convenient POSIX interface. This paper discusses the details of the architecture and reports performance measurements.
Speaker: Jakob Blomer (CERN)
• 17:45
New data access with HTTP/WebDAV in the ATLAS experiment 15m
With the exponential growth of LHC (Large Hadron Collider) data in the years 2010-2012, distributed computing has become the established way to analyze collider data. The ATLAS experiment Grid infrastructure includes more than 130 sites worldwide, ranging from large national computing centres to smaller university clusters. So far the storage technologies and access protocols to the clusters that host this tremendous amount of data vary from site to site. HTTP/WebDAV offers the possibility to use a unified industry standard to access the storage. We present the deployment and testing of HTTP/WebDAV for local and remote data access in the ATLAS experiment for the new data management system Rucio and the PanDA workload management system. Deployment and large scale tests have been performed using the Grid testing system HammerCloud and the ROOT HTTP plugin Davix.
Speaker: Johannes Elmsheuser (Ludwig-Maximilians-Univ. Muenchen (DE))
• 18:00
ATLAS I/O Performance Optimization in As-Deployed Environments 15m
I/O is a fundamental determinant in the overall performance of physics analysis and other data-intensive scientific computing. It is, further, crucial to effective resource delivery by the facilities and infrastructure that support data-intensive science. To understand I/O performance, clean measurements in controlled environments are essential, but effective optimization requires as well an understanding of the complicated realities of as-deployed environments. These include a spectrum of local and wide-area data delivery and resilience models, heterogeneous storage systems, matches and mismatches between data organization and access patterns, multi-user considerations that may help or hinder individual job performance, and more.   The ATLAS experiment has organized an interdisciplinary working group of I/O, persistence, analysis framework, distributed infrastructure, site deployment, and external experts to understand and improve I/O performance in preparation for Run 2 of the Large Hadron Collider. The adoption of a new analysis data model for Run 2 has afforded the collaboration a unique opportunity to incorporate instrumentation and monitoring from the outset.   This paper describes a program of instrumentation, monitoring, measurement, and data collection both in cleanroom and grid environments, and discusses how such information is propagated and employed.   The paper further explores how these findings inform decision-making on many fronts, including persistent data organization, caching, best practices, framework interactions with underlying service layers, and settings at many levels, from sites to application code. Related developments that increase robustness and resilience in the presence of faults by improving communication between frameworks and underlying infrastructure layers are also discussed.
Speaker: Thomas Maier (Ludwig-Maximilians-Univ. Muenchen (DE))
• 18:15
Protocol benchmarking for HEP data access using HTTP and Xrootd 15m
The DPM project offers an excellent opportunity for comparative testing of the HTTP and xroot protocols for data analysis. - The DPM storage itself is multi-protocol, allowing comparisons to be performed on the same hardware - The DPM has been instrumented to produce an i/o monitoring stream, familiar from the xrootd project, regardless of the protocol being used for access - The continuous builds of DPM have been instrumented in 2013 with automated testing procedures that regularly produce, and collect metadata stress test information - The DPM Collaboration involves a number of active grid sites who have made testing resources available. These sites are continuously exercised by a ROOT analysis benchmark several times per day, and comparisons can be conducted in a variety of environments, also involving access through Wide Area Network. We present the results of our performance analyses of realistic use cases, and discuss their implications for the use of HTTP as a data and metadata access protocol for HEP.
Speaker: Oliver Keeble (CERN)
• 16:30 18:30
Track 4 Session: #2 (Framework) B250 (B250)

### B250

#### B250

Middleware, software development and tools, experiment frameworks, tools for distributed computing

Convener: Vincent Garonne (University of Oslo (NO))
• 16:30
IceProd2: A Next Generation Data Analysis Framework for the IceCube Neutrino Observatory 15m
We describe the overall structure and new features of the second generation of IceProd, a data processing and management framework. IceProd was developed by the IceCube Neutrino Observatory for processing of Monte Carlo simulations and detector data, and has been a key component of the IceCube offline computing infrastructure since it was first deployed in 2006. It runs fully in user space as a separate layer on top of grid and batch systems. This is accomplished by a set of python daemons which process job workflow and maintain configuration and status information on every job before, during, and after processing. IceProd can also manage complex workflow DAGs across distributed computing grids in order to optimize usage of special resources such as GPUs. As the second major version of IceProd, substantial improvements have been made to increase security, reliability, scalability, and ease of use. One new goal is to extend usage beyond centralized production to individual users and to facilitate large-scale individual analysis. Currently only a handful of IceCube users have access to more than one computing cluster; this version of IceProd should enable over 200 active IceCube data analyzers to transparently access distributed CPU and GPU resources. The scope of IceProd 2 also extends beyond IceCube-specific applications and can be used as a general grid computing tool.
Speaker: David Schultz (University of Wisconsin-Madison)
• 16:45
Belle II production system 15m
In Belle II experiment a large amount of physics data will be continuously taken and the production rate is equivalent to LHC experiments. Considerable resources of computing, storage, and network, are necessary to handle not only the taken data but also substantial simulated data. Therefore Belle II exploits distributed computing system based on DIRAC interware. DIRAC is a general software framework to provide unified interface among heterogeneous computing resources. As well as proven DIRAC software stack, Belle II is developing its own extension called BelleDIRAC. BelleDIRAC gives a transparent user experience of Belle II analysis framework (basf2) on various environments and access to file information managed by LFC and AMGA metadata catalog. With unifying DIRAC and BelleDIRAC functionalities, Belle II plans to operate automated mass data processing framework named a production system. Belle II production system covers sizable raw data transfer from experimental site to raw data centers, followed by massive data processing, and smart output deployment to each remote site. The production system is also utilized for simulated data production and data analysis. Although development of the production system is still on-going, recently Belle II has prepared prototype version and evaluated it with large scale of simulated data production test. In this presentation we will report the evaluation of the prototype system and future development plan.
Speaker: Hideki Miyake (KEK)
• 17:00
Jobs masonry with elastic Grid Jobs 15m
The DIRAC workload management system used by LHCb Distributed Computing is based on Computing Resource reservation and late binding (also known as pilot job in the case of batch resources) that allows the serial execution of several jobs obtained from a central task queue. CPU resources can usually be reserved for limited duration only (e.g. batch queue time limit) and in order to optimize their usage, it is important to be able to use them for the whole available time. However traditionally the tasks to be performed by jobs are defined at submission time and therefore it may happen that no job fits in the available time. In LHCb, this so-called job masonry is optimized by the usage of elastic simulation jobs: unlike data processing jobs that must process all events in the input dataset, simulation jobs offer an interesting degree of freedom as one can define the number of events to be simulated even after submission. This requires however knowing three information: the time available in the reserved resource, its CPU power and the average CPU work required for simulating one event. The decision on the number of events can then be made at the very last moment, just before starting the simulation application. This is what we call elastic jobs. LHCb simulation jobs are now all elastic, with an upper limit on the number of events per job. When several jobs are needed to complete a simulation request, enough jobs are submitted for simulating the total number of events required, assuming this upper limit for each job. New jobs are then submitted depending on the actual number of events simulated by this first batch of jobs. We will show in this contribution how elastic jobs allow a better backfilling of computing resources as well as using resources with limited work capacity, such as short batch queues or volunteer computing resources. They also allow easily to shutdown virtual machines on cloud resources when sites require them to shutdown within a grace period.
Speaker: Federico Stagni (CERN)
• 17:15
The ATLAS Event Service: A new approach to event processing 15m
The ATLAS Event Service (ES) implements a new fine grained approach to HEP event processing, designed to be agile and efficient in exploiting transient, short-lived resources such as HPC hole-filling, spot market commercial clouds, and volunteer computing. Input and output control and data flows, bookkeeping, monitoring, and data storage are all managed at the event level in an implementation capable of supporting ATLAS-scale distributed processing throughputs (about 4M CPU-hours/day). Input data flows utilize remote data repositories with no data locality or pre­staging requirements, minimizing the use of costly storage in favor of strongly leveraging powerful networks. Object stores provide a highly scalable means of remotely storing the quasi-continuous, fine grained outputs that give ES based applications a very light data footprint on a processing resource, and ensure negligible losses should the resource suddenly vanish. We will describe the motivations for the ES system, its unique features and capabilities, its architecture and the highly scalable tools and technologies employed in its implementation, and its applications in ATLAS processing on HPCs, commercial cloud resources, volunteer computing, and grid resources.
Speaker: Dr Torre Wenaus (Brookhaven National Laboratory (US))
• 17:30
CMS data distributed analysis with CRAB3 15m
The CMS Remote Analysis Builder (CRAB) provides the service for managing analysis tasks isolating users from the technical details of the distributed Grid infrastructure. Throughout the LHC Run 1, CRAB has been successfully employed by an average 350 distinct users every week executing about 200,000 jobs per day. In order to face the new challenges posed by the LHC Run 2, CRAB has been significantly re-factored. The pieces of the new system are a lightweight client, a central server exposing a REST interface accepting user requests, a number of servers dealing with user analysis tasks and submitting jobs to the CMS resource provisioning system, and a central service to asynchronously move user data from the execution node to the desired storage location. The new system improves the robustness, scalability and sustainability of the service. This contribution will give an overview of the new system, reporting the status, experience and lessons learnt from the commissioning phase and the production rollout for the initial data taking. It will address all aspects of the project from development to operations and support.
Speaker: Marco Mascheroni (Universita & INFN, Milano-Bicocca (IT))
• 17:45
Agile Research - Strengthening Reproducibility in Collaborative Data Analysis Projects 15m
Reproducibility of results is a fundamental quality of scientific research. However, as data analyses become more and more complex and research is increasingly carried out by larger and larger teams, it becomes a challenge to keep up this standard. The decomposition of complex problems into tasks that can be effectively distributed over a team in a reproducible manner becomes nontrivial. Overcoming these obstacles requires a shift in both management methodology as well as supporting technology. The LHCb collaboration is experimenting with different methods and technologies to attack such challenges. In this talk we present a language and thinking framework for laying out data analysis projects. We show how this framework can be supported by specific tools and services that allow teams of researchers to achieve high quality results in a distributed environment. Those methodologies are based on so called agile development approaches that have been adopted very successfully in industry. We show how the approach has been adapted to HEP data analysis projects and report on experiences gathered on a pilot project.
Speaker: Sebastian Neubert (CERN)
• 18:00
Multi-VO Support in IHEP’s Distributed Computing Environment 15m
For Beijing Spectrometer III (BESIII) experiment located at the Institute of High Energy Physics (IHEP), China, the distributed computing environment (DCE) has been setup and been in production status since 2012. The basic framework or middleware is DIRAC (Distributed Infrastructure with Remote Agent Control) with BES-DIRAC extensions. About 2000 CPU cores and 400 TB storage contributed by BESIII collaboration members are integrated to serve MC simulation and reconstruction jobs, as well as data transfer between sites. Recently, this distributed computing environment has been extended to support other experiments operated by IHEP, such as Jiangmen Underground Neutrino Observatory (JUNO), Circular Electron Positron Collider (CEPC), Large High Altitude Air Shower Observatory (LHAASO), Hard X-ray Modulation Telescope (HXMT), etc. Each experiment’s community naturally forms a virtual organization (VO). They have various heterogeneous computing and storage resources located at geographically distributed universities or institutions worldwide, with larger or smaller size. They wish to integrate those resources and supply to their VO’s users. The already established BES-DIRAC DCE is chosen as platform to serve their requirements, with considerable multi-VO support technologies implemented. The advantage of building DIRAC as a service is obvious in our case. First of all, DIRAC servers need dedicated hardware and expert manpower to maintain, some small VOs are not willing to afford that. Second, many universities and institutions in China are collaboration members of several experiments listed above, they would like to contribute resources to more than one experiment, a single DIRAC setup will be the easiest way to manage these resources. Therefore, Instead of setting up dedicated DCE for each experiment, operating a single setup for all the experiments is an effective way to make maximal use of shared resources and manpower while remaining flexible and open for new experiments to join. The key point for this single setup is to design and develop a multi-VO support scheme based on BES-DIRAC framework. The VO Management System (VOMS) plays a central role in multi-VO members and resources information management. Other services will retrieve user proxy information and resource per-VO configurations from VOMS server. In data management subsystem, DIRAC file catalogs (DFC) with dynamic dataset functionality is extended to handle files and datasets belong to users from different VOs. StoRM SE is employed to manage the storage resources contributed by universities belong to more than two VOs. In workload management subsystem, the pilot, job matcher and scheduler should take into consideration the VO properties of jobs, priorities of VOs in resources configurations, etc. In frontend subsystem, a task based request submission and management system is developed. Different VO’s users can specify their own workflows. It allows users from different VOs to submit jobs, manage data transferring, get monitoring and accounting information of their requests. The experience of designing multi-VO support in IHEP’s distributed computing environment may be interested and useful for other high energy physics center operating several experiments.
Speaker: Dr Tian Yan (Institution of High Energy Physics, Chinese Academy of Science)
• 18:15
The GridPP DIRAC project - DIRAC for non-LHC communities 15m
The GridPP consortium in the UK is currently testing a multi-VO DIRAC service aimed at non-LHC VOs. These VOs are typically small (fewer than two hundred members) and generally do not have a dedicated computing support post. The majority of these represent particle physics experiments (e.g. T2K, NA62 and COMET), although the scope of the DIRAC service is not limited to this field. A few VOs have designed bespoke tools around the EMI-WMS & LFC, while others have so far eschewed distributed resources as they perceive the overhead for accessing them to be too high. The aim of the GridPP DIRAC project is to provide an easily adaptable toolkit for such VOs in order to lower the threshold for access to distributed resources such as grid and cloud computing. As well as hosting a centrally run DIRAC service, we will also publish our changes and additions to the base DIRAC codebase under an open-source license. We started by taking a survey of the existing VO specific solutions using the feedback to determine the user requirements that were driving these implementations. These details were then used to map the base requirements to available DIRAC features, implementing any additional common functionality that was needed. Once the groundwork was complete, this knowledge was shared with the existing VOs and we worked with them to adapt their grid model to use the new DIRAC service. The experience gained from this process was then used to recommend sensible approaches to the new VOs and assist them in getting started with distributed computing. We investigated different support models and found that a mailing list was the most accessible option for the target audience, while GGUS is used for tracking service issues. We report on the current status of this project and show increasing adoption of DIRAC within the non-LHC communities.
Speaker: Janusz Martyniak (Imperial College London)
• 16:30 18:30
Track 5 Session: #2 (Computing Activities) B503 (B503)

### B503

#### B503

Computing activities and Computing models

Convener: Dr Simone Campana (CERN)
• 16:30
Dynamic Data Management for the Distributed CMS Computing System 15m
The Dynamic Data Management (DDM) framework is designed to manage the majority of the CMS data in an automated fashion. At the moment 51 CMS Tier-2 data centers have the ability to host about 20 PB of data. Tier-1 centers will also be included adding substantially more space. The goal of DDM is to facilitate the management of the data distribution and optimize the accessibility of data for the physics analyses. The basic idea is to replicate highly demanded data and remove data replicas that are not often used. There are four distinct parts that work together on achieving this goal: newly created datasets are inserted into the managed data pool, highly popular datasets are replicated across multiple data centers, data centers are cleaned of the least popular data, and temporarily unavailable sites are accounted by creating replicas of missing datasets. We are also developing various metrics that show whether the system behaves as expected. The CMS Popularity Service is the authoritative source of the popularity information -- CPU time, number of accesses, number of users/day, etc. of datasets, blocks and files. The information is collected from various resources namely directly from the CMS Software, from the CMS Remote Analysis Builder (CRAB) and from files accessed via xrootd. It is subsequently summarized and exposed using a REST API to other CMS services for further use. This paper describes the architecture of these components and details the experience of commissioning and running the system over the last year.
Speaker: Christoph Paus (Massachusetts Inst. of Technology (US))
• 16:45
A study on dynamic data placement for the ATLAS Distributed Data Management system 15m
This contribution presents a study on the applicability and usefulness of dynamic data placement methods for data-intensive systems, such as ATLAS distributed data management (DDM). In this system the jobs are sent to the data, therefore having a good distribution of data is significant. Ways of forecasting workload patterns are examined which then are used to redistribute data to achieve a better overall utilisation of computing resources and to reduce waiting time for jobs before they can run on the grid. This method is based on a tracer infrastructure that is able to monitor and store historical data accesses and which is used to create popularity reports. These reports provide detailed summaries about data accesses in the past, including information about the accessed files, the involved users and the sites. From this past data it is possible to then make near-term forecasts for data popularity in the future. This study evaluates simple prediction methods as well as more complex methods like neural networks. Based on the outcome of the predictions a redistribution algorithm deletes unused replicas and adds new replicas for potentially popular datasets. Finally, a grid simulator is used to examine the effects of the redistribution. The simulator replays workload on different data distributions while measuring the job waiting time and site usage. The study examines how the average waiting time is affected by the amount of data that is moved, how it differs for the various forecasting methods and how that compares to the optimal data distribution.
Speaker: Thomas Beermann (Bergische Universitaet Wuppertal (DE))
• 17:00
Exploiting CMS data popularity to model the evolution of data management for Run-2 and beyond 15m
During the LHC Run-1 data taking, all experiments collected large data volumes from proton-proton and heavy-ion collisions. The collisions data, together with massive volumes of simulated data, were replicated in multiple copies, transferred among various Tier levels, transformed/slimmed in format/content. These data were then accessed (both locally and remotely) by large groups of distributed analysis communities exploiting the WorldWide LHC Computing Grid infrastructure and services. While efficient data placement strategies - together with optimal data redistribution and deletions on demand - have become the core of static versus dynamic data management projects, little effort has so far been invested in understanding the detailed data-access patterns which surfaced in Run-1. These patterns, if understood, can be used as input to simulation of computing models at the LHC, to optimise existing systems by tuning their behaviour, and to explore next-generation CPU/storage/network co-scheduling solutions. This is of great importance, given that the scale of the computing problem will increase far faster than the resources available to the experiments, for Run-2 and beyond. Studying data-access patterns involves the validation of the quality of the monitoring data collected on the "popularity" of each dataset, the analysis of the frequency and pattern of accesses to different datasets by analysis end-users, the exploration of different views of the popularity data (by physics activity, by region, by data type), the study of the evolution of Run-1 data exploitation over time, the evaluation of the impact of different data placement and distribution choices on the available network and storage resources and their impact on the computing operations. This work presents some insights from studies on the popularity data from the CMS experiment. We present the properties of a range of physics analysis activities as seen by the data popularity, and make recommendations for how to tune the initial distribution of data in anticipation of how it will be used in Run-2 and beyond.
Speaker: Prof. Daniele Bonacorsi (University of Bologna)
• 17:15
A Comparative Analysis of Event Processing Frameworks used in HEP 15m
Today there are many different experimental event processing frameworks in use by running or about to be running experiments. This talk will compare and contrast the different components of these frameworks and highlight the different solutions chosen by different groups.  In the past there have been attempts at shared framework projects for example the collaborations on the BaBar framework (between BaBar, CDF, and CLEO), on the Gaudi framework (between LHCb and ATLAS), on AliROOT/FairROOT (between Alice and GSI/Fair), and in some ways on art (Fermilab based experiments) and CMS’ framework.  However, for reasons that will be discussed, these collaborations did not result in common frameworks shared among the intended experiments. Though importantly, two of the resulting projects have succeeded in providing frameworks that are shared among many customer experiments: Fermilab's art framework and GSI/Fair's FairROOT. Interestingly, several projects are considering remerging their frameworks after many years apart. I'll report on an investigation and analysis of these realities.  With the advent of the need for multi-threaded frameworks and the scarce available manpower, it is important to collaborate in the future; however it is also important to understand why previous attempts at multi-experiment frameworks either worked or didn’t work.
Speaker: Elizabeth Sexton-Kennedy (Fermi National Accelerator Lab. (US))
• 17:30
The OSG Open Facility: A Sharing Ecosystem Using Harvested Opportunistic Resources 15m
The Open Science Grid (OSG) ties together individual experiments' computing power, connecting their resources to create a large, robust computing grid; this computing infrastructure started primarily as a collection of sites associated with large HEP experiments such as ATLAS, CDF, CMS, and DZero. OSG has been funded by the Department of Energy Office of Science and National Science Foundation since 2006 to meet the US LHC community's computational needs. In the years since, the OSG has broadened its focus to also address the needs of other US researchers and increased delivery of Distributed High Throughput Computing (DHTC) to users from a wide variety of disciplines via the OSG Open Facility. Presently, the Open Facility delivers about 100 million computing wall hours per year to researchers who are not already associated with the owners of the computing sites; this is primarily accomplished by harvesting and organizing the temporarily unused capacity (i.e. opportunistic cycles) from the sites in the OSG. We present an overview of the infrastructure developed to accomplish this from flocking architecture to harvesting opportunistic resources to providing user support to a diverse set of researchers. We also present our experiences since becoming a high-throughput computing service provider as part of the NSF’s XD program. Using these methods, OSG resource providers and scientists share computing hours with researchers in many other fields to enable their science, striving to make sure that these computing resources are used with maximal efficiency. We believe that expanded access to DHTC is an essential tool for scientific innovation and work continues in expanding this service.
Speaker: Dr Bodhitha Jayatilaka (Fermilab)
• 17:45
Distributed analysis in ATLAS 15m
The ATLAS experiment accumulated more than 140 PB of data during the first run of the Large Hadron Collider (LHC) at CERN. The analysis of such an amount of data for the distributed physics community is a challenging task. The Distributed Analysis (DA) system of the ATLAS experiment is an established and stable component of the ATLAS distributed computing operations. About half a million user jobs are daily running on DA resources, submitted by more than 1500 ATLAS physicists. The reliability of the DA system during the first run of the LHC and the following shutdown period has been high thanks to the continuous automatic validation of the distributed analysis sites and the user support provided by a dedicated team of expert shifters. During the LHC shutdown, the ATLAS computing model has undergone several changes to improve the analysis workflows, including the re-design of the production system, a new analysis data format and event model, and the development of common reduction and analysis frameworks. We report on the impact such changes have on the DA infrastructure, describe the new DA components, and include first measurements of DA performances in the first months of the second LHC run, starting early 2015.
Speaker: Federica Legger (Ludwig-Maximilians-Univ. Muenchen (DE))
• 18:00
The CMS Tier-0 goes Cloud and Grid for LHC Run 2 15m
In 2015, CMS will embark on a new era of collecting LHC collisions at unprecedented rates and complexity. This will put a tremendous stress on our computing systems. Prompt Processing of the raw data by the Tier-0 infrastructure will no longer be constrained to CERN alone due to the significantly increased resource requirements. In LHC Run 2, we will need to operate it as a distributed system utilizing both the CERN Cloud-based Agile Infrastructure and a significant fraction of the CMS Tier-1 Grid resources. In another big change for LHC Run 2, we will process all data using the multi-threaded framework to deal with the increased event complexity and to ensure efficient use of the resources. This contribution will cover the evolution of the Tier-0 infrastructure and present scale testing results and experiences from the first data taking in 2015.
Speaker: Dirk Hufnagel (Fermi National Accelerator Lab. (US))
• 16:30 18:30
Track 7 Session: #1 (Site operations) C210 (C210)

### C210

#### C210

Clouds and virtualization

Convener: Jeff Templon (NIKHEF (NL))
• 16:30
Scaling the CERN OpenStack cloud 15m
CERN has been running a production OpenStack cloud since July 2013 to support physics computing and infrastructure services for the site. This is expected to reach over 100,000 cores by the end of 2015. This talk will cover the different use cases for this service and experiences with this deployment in areas such as user management, deployment, metering and configuration of thousands of servers across two data centres. Ongoing research, such as bare metal management, containers, software defined networking and orchestration will also be covered.
Speaker: Stefano Zilli (CERN)
• 16:45
Benchmarking and accounting for the (private) cloud 15m
As part of CERN's Agile Infrastructure project, large parts of the CERN batch farm have been moved to virtual machines running on CERNs private IaaS (link is external) cloud. During this process a large fraction of the resources, which had previously been used as physical batch worker nodes, were converted into hypervisors. Due to the large spread of the per-core performance (rated in HS06) in the farm, caused by its heterogenious nature, it is necessary to have a good knowledge of the performance of the virtual machines. This information is used both for scheduling and accounting. While in the previous setup worker nodes were classified and benchmarked based on the purchase order number, for virtual batch worker nodes this is no longer possible; the information is now either hidden or hard to retrieve. Therefore we developed a new scheme to classify worker nodes in terms of their performance. The new scheme is flexible enough to be usable both for virtual and physical machines in the batch farm. It should be possible to apply it as well to public clouds and more dynamic future batch farms with worker nodes coming and going at a high rate. The used scheme, experiences and lessons learned will be presented. Possible extensions as well as application to a more general case, for example in the context of accounting within WLCG, will be covered.
Speaker: Dr Ulrich Schwickerath (CERN)
• 17:00
Managing virtual machines with Vac and Vcycle 15m
We compare the Vac and Vcycle virtual machine lifecycle managers and our experiences in providing production job execution services for ATLAS, LHCb, and the GridPP VO at sites in the UK and at CERN. In both the Vac and Vcycle systems, the virtual machines are created outside of the experiment's job submission and pilot framework. In the case of Vac, a daemon runs on each physical host which manages a pool of virtual machines on that host, and a peer-to-peer UDP protocol is used to achieve the desired target shares between experiments across the site. In the case of Vcycle, a daemon manages a pool of virtual machines on an Infrastructure As A Service cloud system such as OpenStack, and has within itself enough information to create the types of virtual machines to achieve the desired target shares. Both systems allow unused shares for one experiment to temporarily taken up by other experiements with work to be done. The virtual machine lifecycle is managed with a minimum of information, gathered from the virtual machine creation mechanism (such as libirt or OpenStack) and using the proposed Machine/Job Features API from WLCG. We demonstrate that the same virtualmachine designs can be used to run production jobs on Vac and Vcycle/OpenStack sites for ATLAS, LHCb, and GridPP, and that these technologies allow sites to be operated in a reliable and robust way.
Speaker: Andrew McNab (University of Manchester (GB))
• 17:15
Evaluation of containers as a virtualisation alternative for HEP workloads 15m
Speaker: Andrew John Washbrook (University of Edinburgh (GB))
• 17:30
docker & HEP: containerization of applications for development, distribution and preservation. 15m
docker & HEP: containerization of applications for development, distribution and preservation. ================================================= HEP software stacks are not shallow. Indeed, HEP experiments' software are usually many applications in one (reconstruction, simulation, analysis, ...) and thus require many libraries - developed in-house or by third parties - to be properly compiled and installed. Moreover, because of resource constraints, experiments' software is usually installed, tested, validated and deployed on a very narrow set of platforms, architectures, toolchains and operating systems. As a consequence, bootstrapping a software environment on a developer machine or deploying the software on production or user machines is usually perceived as tedious and iterative work, especially when one wants the native performances of bare metal. Docker containers provide an interesting avenue for packaging applications and development environment, relying on the Linux kernel capabilities for process isolation, adding "git"-like capabilities to the filesystem layer and providing (close to) native CPU, memory and I/O performances. This paper will introduce in more details the modus operandi of Docker containers and then focus on the hepsw/docks containers which provide containerized software stacks for -among others- LHCb. The paper will then discuss various strategies experimented with to package software (e.g. CVMFS, RPMs, from-source) and applied to optimize provisioning speed and disk usage, leveraging the caching system of Docker. Finally, the paper will report on benchmarks comparing workloads on native, VM and Docker` containers setups.
Speaker: Ben Couturier (CERN)
• 17:45
Integrated Monitoring-as-a-service for Scientific Computing Cloud applications using the Elasticsearch ecosystem 15m
The INFN computing centre in Torino hosts a private Cloud, which is managed with the OpenNebula cloud controller. The infrastructure offers IaaS services to different scientific computing applications. The main stakeholders of the facility are a grid Tier-2 site for the ALICE collaboration at LHC, an interactive analysis facility for the same experiment and a separate grid Tier-2 site for the BES-III collaboration, plus an increasing number of other smaller tenants. The dynamic allocation of resources to tenants is partially automated. This feature requires detailed monitoring and accounting of the resource usage. We set up a monitoring framework to inspect the site activities both in terms of IaaS and applications running on the hosted virtual instances. For this purpose we used the Elasticsearch, Logstash and Kibana stack. The infrastructure relies on an SQL database back-end for data preservation and to ensure flexibility to switch to a different monitoring solution if needed. The heterogeneous accounting information is transferred from the database to the Elasticsearch engine via a custom Logstash plugin. Each use-case is indexed separately in Elasticsearch and we setup a set of Kibana dashboards with pre-defined queries in order to monitor the relevant information in each case. For the IaaS metering, we developed sensors for the OpenNebula API. The IaaS level information gathered through the API is sent to the MySQL database through an ad-hoc developed RESTful web service. Moreover, we have developed a billing system for our private Cloud, which relies on the RabbitMQ message queue for asynchronous communication to the database and on the ELK stack for its graphical interface. Concerning the application level, we used the Root plugin TProofMonSenderSQL to collect accounting data from the interactive analysis facility. The BES-III virtual instances used to be monitored with Zabbix, as a proof of concept we also retrieve the information contained in the Zabbix database. Finally, we have defined a model for monitoring-as-a-service, based on the tools described above, which the Cloud tenants can easily configure to suit their needs. In this way we have achieved a uniform monitoring interface for both the IaaS and the scientific applications, mostly leveraging off-the-shelf tools.
Speaker: Sara Vallero (Universita e INFN (IT))
• 18:00
Implementation of the vacuum model using HTCondor 15m
The recently introduced vacuum model offers an alternative to the traditional methods that virtual organisations (VOs) use to run computing tasks at sites, where they either submit jobs using grid middleware or create virtual machines (VMs) using cloud APIs. In the vacuum model VMs are created and contextualized by the site itself, and start the appropriate pilot job framework which fetches real jobs. This allows sites to avoid the effort required for running grid middleware or a cloud. Here we present an implementation of the vacuum model based entirely on HTCondor, making use of HTCondor's ability to manage VMs. Extensive use is made of job hooks, including for preparing fast local storage for use in the VMs, carrying out contextualization, and updating job ClassAds about the status of the VMs and their payloads. VMs for each supported VO are created at regular intervals. If there is no work or there are fatal errors, no additional VMs are created. On the other hand, if there is real work running, further VMs can be created. Since the HTCondor negotiator decides whether to run the VMs or not, fairshares are naturally respected. Normal grid or locally-submitted jobs can run on the same resources and share the same physical worker nodes that are also used as hypervisors for running VMs.
Speaker: Andrew David Lahiff (STFC - Rutherford Appleton Lab. (GB))
• 16:30 18:30
Track 8 Session: #1 (GPU and other accelerators) Village Center (Village Center)

### Village Center

#### Village Center

Performance increase and optimization exploiting hardware features

Convener: Niko Neufeld (CERN)
• 16:30
Fast event generation on graphics processing unit (GPU) and its integration into the MadGraph system. 15m
Fast event generation system of physics processes is developed using graphics processing unit (GPU). The system is based on the Monte Carlo integration and event generation programs, BASES/SPRING, which were originally developed in FORTRAN. They were rewritten on the CUDA platform provided by NVIDIA in order for the implementation of these programs to GPUs. Since the Monte Carlo integration algorithm is composed of a lot of independent function calls at multi-dimensional space, highly parallel architecture of GPU is very suitable for the improvement of their performance. The performance of event generations based on the integrated results can be also easily improved by the event parallelization on GPU. Parallelized programs show very good performance in process time compared to the existing event generation programs. For the computation of cross sections of physics processes on GPU the helicity amplitude calculation package in FORTRAN is implanted in the CUDA framework as "HEGET" library, and new phase space generation library and random number generator are developed in order for the better generation efficiency on GPU. The event generation system on GPU is tested using general Standard Model processes and computed cross sections are consistent with those obtained with the MadGraph system which is widely used in the field of elementary particle physics. The total process time of the new system on GPU is compared with the equivalent programs on CPU and its improvement factors for various physics processes range from 10 to 100 depending on the complexity of their final states. In order to achieve better interface for the generation of various physics processes and also to realize wider application of the fast Monte Carlo integration program the system has been integrated into the MadGraph5. Also the HEGET library for the helicity amplitude computations is integrated into the ALOHA system of the MadGraph5.
Speaker: Dr Junichi Kanzaki (KEK)
• 16:45
Triggering events with GPUs at ATLAS 15m
The growing size and complexity of events produced at the high luminosities expected in 2015 at the Large Hadron Collider demands much more computing power for the online event selection and for the offline data reconstruction than in the previous data taking period. In recent years, the explosive performance growth of low-cost, massively parallel processors like Graphical Processing Units (GPUs) - both in computing power and in low energy consumption - make GPUs extremely attractive for solving the challenging computing tasks of high energy physics experiment like ATLAS. After the optimisation of the computing intensive algorithms and their adaptation to GPUs, thus exploiting the paradigm of massively parallel computing, a small scale prototype of the full reconstruction chain of the ATLAS High Level Trigger under inclusion of the GPU-based algorithms has been implemented. We discuss the integration procedure of this prototype, the achieved performance and the prospects for the future.
Speaker: Dr Sami Kama (Southern Methodist University (US))
• 17:00
GPU Accelerated Event-by-event Reweighting for a T2K Neutrino Oscillation Analysis 15m
The Tokai-to-Kamioka (T2K) experiment is a second generation long baseline neutrino experiment, which uses a near detector to constrain systematic uncertainties for oscillation measurements with its far detector. Event-by-event reweighting of Monte Carlo (MC) events is applied to model systematic effects and construct PDFs describing predicted event distributions. However when analysing simultaneously several data samples from both near and far detectors, the computational overhead can become a limiting factor in an oscillation analysis. Because reweighting each MC event is an independent process, it can be parallelized using graphics processing units (GPUs). For a recent T2K analysis, several bottlenecking calculations were offloaded onto NVIDIA GPUs using CUDA: the calculation of oscillation probabilities with matter effects and the evaluation of non-linear parameter responses with cubic splines. Individually, these methods achieved 40-180x speed-ups in standalone benchmarks. When implemented into the analysis software suite, an improvement of ~20x was seen to the overall analysis. This talk will discuss the motivation and implementation of GPU reweighting into the T2K oscillation analysis, and prospects for further improvements using GPUs.
Speaker: Richard Calland
• 17:15
Fast TPC online tracking on GPUs and asynchronous data-processing in the ALICE HLT to enable online calibration 15m
ALICE (A Large Heavy Ion Experiment) is one of the four major experiments at the Large Hadron Collider (LHC) at CERN, which is today the most powerful particle accelerator worldwide. The High Level Trigger (HLT) is an online compute farm of about 200 nodes, which reconstructs events measured by the ALICE detector in real-time. The HLT uses a custom online data-transport framework to distribute the data and the workload among the compute nodes. ALICE employs several subdetectors that are sensitive to calibration, e.g. the TPC (Time Projection Chamber). For a precise reconstruction, the HLT has to perform the calibration online. Online-calibration can make certain offline calibration steps obsolete and can thus speed up offline analysis. In ALICE Run 3 starting in 2020, online calibration becomes a necessity. The main detector used for track reconstruction is the TPC. Reconstructing the trajectories in the TPC is the most compute-intense step during event reconstruction. Therefore, a fast tracking implementation is of great importance. Reconstructed TPC tracks build the basis for the calibration making a fast online-tracking mandatory. We present several components developed for the ALICE High Level Trigger to perform fast event reconstruction and to provide features required for online calibration. As first topic, we present our TPC tracker, which employs GPUs to speed up the processing, and which bases on a Cellular Automaton and on the Kalman filter. Our TPC tracking algorithm has been successfully used in 2011 and 2012 in the lead-lead and the proton-lead runs. We have improved it to leverage features of newer GPUs and we have ported it to support OpenCL, CUDA, and CPUs with a single common source code. This makes us vendor independent. As second topic, we present framework extensions, which are required for online calibration. The extensions, however, are generic and can be used for other purposes as well. We have extended the framework to allow asynchronous compute chains, which are required for long-running tasks e.g. for online calibration. And we describe our method to feed in custom data sources in the data flow. This can be external parameters like environmental temperature required for calibration and this can also be used to feed back calibration results in the processing chain. Overall, the work presented in this contribution makes the ALICE HLT ready for online reconstruction and calibration for the LHC Run 2 starting in 2015.
Speaker: David Michael Rohr (Johann-Wolfgang-Goethe Univ. (DE))
• 17:30
Detector Simulation On Modern Coprocessors 15m
The recent prevalence of hardware architectures of many-core or accelerated processors opens opportunities for concurrent programming models taking advantages of both SIMD and SIMT architectures. The Geant Vector Prototype has been designed both to exploit the vector capability of main stream CPUs and to take advantage of Coprocessors including NVidia’s GPU and Intel Xeon Phi. The characteristics of each of those architectures are very different in term of the vectorization depth, parallelization needed to achieve optimal performance or memory access latency and speed. Between each platforms the number of individual tasks to be processed ‘at once’ for efficient use of the hardware varies sometimes by an order of magnitude. The granularity of the code executed may also need to be dynamically adjusted. An additional challenge is to avoid the code duplication often inherent to supporting heterogeneous platforms. We will present the challenges, solutions and resulting performance of running an end to end detector simulation concurrently on a main stream CPU and a coprocessor and detail the broker implementation bridging the disparity between the two architectures. The impacts of task decomposition, vectorization, efficient sampling techniques and data look-up using track level parallelism will be also evaluated on vector and massively parallel architectures.
Speaker: Philippe Canal (Fermi National Accelerator Lab. (US))
• 17:45
Hardware and Software Design of FPGA-based PCIe Gen3 interface for APENet+ network interconnect system 15m
The computing nodes of modern hybrid HPC systems are built using the CPU+GPU paradigm. When this class of systems is scaled to large size, the efficiency of the network connecting GPUs mesh and supporting the internode traffic is a critical factor. The adoption of a low latency, high performance dedicated network architecture, exploiting peculiar characteristics of CPU and GPU hardware, allows to guarantee scalability and a good level of sustained performances. In the attempt to develop a custom interconnection architecture optimized for scientific computing we designed APEnet+, a point-to-point, low-latency and high-performance 3D torus network controller which supports 6 fully bidirectional off-board links. The first release of APEnet+ (named V4), was a board based on a high end 40nm Altera FPGA that integrates multiple (6) channels at 34Gbps of raw bandwidth per direction and a PCIe Gen2 x8 host interface. APEnet+ board was the first-of-its-kind to implement a Remote Direct Memory Access (RDMA) protocol to directly read/write data from/to Fermi and Kepler NVIDIA GPUs using the Nvidia “peer-to-peer” and “GPUDirect RDMA” protocols, obtaining real zero-copy, low-latency GPU-to-GPU transfers over the network and reducing the performance bottleneck due to the costly copies of data from user to kernel space( and vice-versa). The last generation of APEnet+ systems (V5), currently under development, is based on state-of-the-art high end FPGA, 28nm Altera Stratix V, offering a number of multi-standard fast transceivers (up to 14.4 Gbps), huge amount of configurable internal resources and hardware IP cores to support main interconnection standard protocols. APEnet+ V5 implements a PCIe Gen3 x8 interface, the current standard protocol for high end system peripherals, in order to gain performance on the critical CPU/GPU connection and mitigate the effect of the bottleneck represented by GPUs memory access. Furthermore the FPGA technology advancement, allowed us to integrate in V5, new off-board torus channels characterized by a target speed of 56 Gbps. Both Linux Device Driver and the low-level libraries, have been redesigned to support the PCIe Gen3 protocol, introducing optimizations and solutions based on hardware/software co-design. In this paper we present the architecture of APEnet+ V5 and discuss the status of APEnet+ V5 PCIe Gen3 hardware and system software design. Measures of performance in terms of latency and bandwidth, both for the local APEnet+ to CPU-GPU connection (with Kepler class GPU) and host-to-host via torus links, will also be provided.
Speaker: Michele Martinelli (INFN Rome)
• 18:00
Online-Analysis of Hits in the Belle-II Pixeldetector for Separation of Slow Pions from Background 15m
The impending Upgrade of the Belle experiment is expected to increase the generated data set by a factor of 50. This means that for the planned pixeldetector, which is the closest to the interaction point, the data rates are going to increase to over 20 GB/s. Combined with data generated by the other detectors, this rate is too big to be efficiently send out to offline processing. This is makes the employment of online data reduction schemes necessary, in which background is detected and rejected in order to reduce the data rates. In this paper an approach for efficient online data reduction for the planned pixeldetector of Belle-II is presented. The approach centers around the usage of an algorithm, the NeuroBayes, that is based on multivariate analysis, allowing the identification of signal and background by analysing clusters of hits in the pixeldetector on FPGAs. The algorithm is leveraging the fact that hits of signal particles can have very different characteristics, compared to background, when passing through the pixeldetector. The applicability and advantages in performance are shown through the D* decay. In Belle-II these decays produce pions with such a small transversal momentum, that they barley escape the pixel detector itself. In a common approach like extrapolation of tracks from outer detectors to RoIs, these pions are simply lost, since they do not reach all necessary layers of a detector. Meanwhile usage of the cluster analysis succeeds in separating those pions from background, allowing to retain the data. For that characteristics of corresponding hits, like the total amount of charge deposited in the pixels, are used for separation. The capability for effective data reduction is underlined by a background reduction of at least 90% and signal efficiency of 95 %, for slow pions. An implementation of the algorithm for usage on Virtex-6 FPGAs, that are used at the pixeldetector, was performed. It is shown that the resulting implementation succeeds in replicating the efficiency of the algorithm, implemented in software, while throughputs that suffice hard realtime constraints, set by the read out system of Belle-II, are achieved and efficient use of the resources present on the FPGA is made.
Speaker: Mr Steffen Baehr (Karlsruhe Institute of Technology)
• 18:15
Evaluation of 'OpenCL for FPGA' for Data Acquisition and Acceleration in High Energy Physics applications 15m
The proposed upgrade for the Large Hadron Collider LHCb experiment at CERN envisages a system of 500 Data sources each generating data at 100 Gbps, the acquisition and processing of which is a challenge even for state of the art FPGAs. This challenge splits into two, the Data Acquisition (DAQ) part and the Algorithm acceleration part, the later not necessarily immediately following the former. Looking first at the DAQ part, a Header Generator module was needed to packetize the streaming data coming in from the front-end electronics of the detectors, for easy access and processing by the servers. This necessitates FPGA architectures that not only handle the data generated by the experiment in real-time but also dynamically adapt to potential inadequacies of other components, such as the network and PCs, while ensuring system stability and overall data integrity. Since the data source has no flow control, this module needs to modify the stream data by dropping datasets in a controlled fashion in the event of receiving a back pressure signal from the downstream modules. Also needed was a front-end source emulator capable of generating the various data patterns, that can act as a test bed to validate the functionality and performance of the Header Generator. Such a system was earlier designed and realized in VHDL. The results from this were presented as a paper, ‘Dynamically Adaptive Header Generator and Front-End Source Emulator for a 100 Gbps FPGA based DAQ’ presented at the IEEE Real-Time Conference earlier in 2014 (RT2014). While this process has been traditionally carried out using hardware description languages (HDLs), the possibility exists of using OpenCL to design a DAQ system. This has the potential to simplify development for physicists using the tools, who are more familiar with traditional software as opposed to HDLs, so they can understand the system and make modifications in the future. This is challenging due to fact that the OpenCL language is designed for Parallel Processing and not really targeted at real-time DAQ and there are major challenges in representing the cycle-accurate data acquisition and processing system in OpenCL. However, OpenCL for FPGAs may be applicable from a high level synthesis perspective. Achieving this will enable the movement of the entire FPGA design flow for High Energy Physics applications to OpenCL, rather than just the algorithm acceleration portion that involves parallel processing. For the algorithm acceleration part, the Hough transformation will be implemented in OpenCL. This is a method to reconstruct lines from points in 2D/3D space and can be used to identify particle tracks from hits in the VELO detector elements. Variations of this algorithm are also used for feature identification on the data from other detectors too. This work explores the feasibility of implementing Data Acquisition and Processing system on OpenCL and evaluates the performance of this OpenCL implementation with the HDL based implementation. Development is using the Altera OpenCL compiler for FPGA. This work was is funded under ICE-DIP, a European Industrial Doctorate project in the European Community’s 7th Framework programme Marie Curie Actions under grant PITN-GA-2012-316596.
Speaker: Srikanth Sridharan (CERN)
• Tuesday, 14 April
• 09:00 10:30
Plenary: Session #3 Auditorium

#### Auditorium

Convener: Simon C. LIN (Academia Sinica)
• 09:00
HEP Computing: A Tradition of Scientific Leadership 15m
Speaker: Dr Jonathan Dorfan (OIST)
• 09:15
Computing in Intensity Frontier Accelerator Experiments 45m
Speaker: Robert Group (University of Virginia)
• 10:00
KAGRA and the Global Network of Gravitational Wave Detectors -Construction Status and Prospects for GW Astronomy with Data Sharing Era- 30m
• 10:30 11:00
Coffee Break 30m
• 11:00 12:45
Plenary: Session #4 Auditorium

#### Auditorium

Convener: Dr David Malon (Argonne National Laboratory (US))
• 11:00
Challenges of Developing and Maintaining HEP "Community" Software 45m
Speaker: Amber Boehnlein
• 11:45
IBM Corporation 30m
• 12:15
Lenovo Corporation 30m
• 12:45 13:30
Lunch Break 45m
• 13:30 14:00
Poster session A: #3 Tunnel Gallery

#### Tunnel Gallery

The posters in this session are displayed on April 13-14

• 14:00 16:00
Track 1 Session: #2 (Data acquisition and electronics) Village Center (Village Center)

### Village Center

#### Village Center

Online computing

Conveners: Mr Matthias Richter (DEPARTMENT OF PHYSICS AND TECHNOLOGY, UNIVERSITY OF BERGEN), Matthias Richter (University of Oslo (NO))
• 14:00
THE DAQ NEEDLE IN THE BIG-DATA HAYSTACK 15m
Speaker: Emilio Meschi (CERN)
• 14:15
ATLAS TDAQ System Administration: evolution and re-design 15m
Speaker: Christopher Jon Lee (University of Johannesburg (ZA))
• 14:30
The ATLAS Data Flow system for the second LHC run 15m
After its first shutdown, LHC will provide pp collisions with increased luminosity and energy. In the ATLAS experiment the Trigger and Data Acquisition (TDAQ) system has been upgraded to deal with the increased event rates. The Data Flow (DF) element of the TDAQ is a distributed hardware and software system responsible for buffering and transporting event data from the Readout system to the High Level Trigger (HLT) and to the event storage. The DF has been reshaped in order to profit from the technological progress and to maximize the flexibility and efficiency of the data selection process. The updated DF is radically different from the previous implementation both in terms of architecture and expected performance. The pre-existing two level software filtering, known as L2 and the Event Filter, and the Event Building are now merged into a single process, performing incremental data collection and analysis. This design has many advantages, among which are: the radical simplification of the architecture, the flexible and automatically balanced distribution of the computing resources, the sharing of code and services on nodes. In addition, logical farm slicing, with each slice managed by a dedicated supervisor, has been dropped in favour of global management by a single farm master operating at 100 kHz. The Data Collection network, that connects the HLT processing nodes to the Readout and the storage systems has evolved to provide network connectivity as required by the new Data Flow architecture. The old Data Collection and Back-End networks have been merged into a single Ethernet network and the Readout PCs have been directly connected to the network cores. The aggregate throughput and port density have been increased by an order of magnitude and the introduction of Multi Chassis Trunking significantly enhanced fault tolerance and redundancy. We will discuss the design choices, the strategies employed to minimize the data-collection latency, the results of scaling tests done during the commissioning phase and the operational performance after the first months of data taking.
Speaker: Reiner Hauser (Michigan State University (US))
• 14:45
FELIX: a High-throughput network approach for interfacing to front end electronics for ATLAS upgrades 15m
Speaker: Jorn Schumacher (University of Paderborn (DE))
• 15:00
A New Event Builder for CMS Run II 15m
The data acquisition system (DAQ) of the CMS experiment at the CERN Large Hadron Collider (LHC) assembles events at a rate of 100 kHz, transporting event data at an aggregate throughput of 100 GB/s to the high-level trigger (HLT) farm. The DAQ system has been redesigned during the LHC shutdown in 2013/14. The new DAQ architecture is based on state-of-the-art network technologies for the event building. For the data concentration, 10/40 Gb/s Ethernet technologies are used together with a reduced TCP/IP protocol implemented in FPGA for a reliable transport between custom electronics and commercial computing hardware. A 56 Gb/s Infiniband FDR CLOS network has been chosen for the event builder with a throughput of 4 Tb/s. This paper will discuss the software design, protocols and optimizations for exploiting the hardware capabilities. We will present performance measurements from small-scale prototypes and from the full-scale production system.
Speaker: Remi Mommsen (Fermi National Accelerator Lab. (US))
• 15:15
File-based data flow in the CMS Filter Farm 15m
During the LHC Long Shutdown 1, the CMS DAQ system underwent a partial redesign to replace obsolete network equipment, use more homogeneous switching technologies, and prepare the ground for future upgrades of the detector front-ends. The software and hardware infrastructure to provide input, execute the High Level Trigger (HLT) algorithms and deal with output data transport and storage has also been redesigned to be completely file-based. This approach provides a complete decoupling between the HLT algorithms and the input and output data flow. All the metadata needed for bookkeeping of the data flow and the HLT process lifetimes are also generated in the form of small “documents” using the JSON encoding, by either services in the flow of the HLT execution (for rates etc.) or watchdog processes. These “files" can remain memory-resident or be written to disk if they are to be used in another part of the system (e.g. for aggregation of output data). We discuss how this redesign improves to the robustness and flexibility of the CMS DAQ and the performance of the system currently being commissioned for the LHC Run 2.
Speaker: Emilio Meschi (CERN)
• 15:30
Online data handling and storage at the CMS experiment 15m
Speaker: Georgiana Lavinia Darlea (Massachusetts Inst. of Technology (US))
• 15:45
A scalable monitoring for the CMS Filter Farm based on elasticsearch 15m
A flexible monitoring system has been designed for the CMS File-based Filter Farm making use of modern data mining and analytics components. All the metadata and monitoring information concerning data flow and execution of the HLT are generated locally in the form of small “documents” using the JSON encoding. These documents are indexed into a hierarchy of elasticsearch (es) clusters along with process and system log information. Elasticsearch is a search server based on Apache Lucene. It provides a distributed, multitenant-capable search and aggregation engine. Since es is schema-free, any new information can be added seamlessly and the unstructured information can be queried in non-predetermined ways. The leaf es clusters consist of the very same nodes that form the Filter Farm thus providing “natural” horizontal scaling. A separate “central" es cluster is used to collect and index aggregated information. The fine-grained information, all the way to individual processes, remains available in the leaf clusters. The central es cluster provides quasi-real-time high-level monitoring information to any kind of client. Historical data can be retrieved to analyse past problems or correlate them with external information. We discuss the design and performance of this system in the context of the CMS DAQ commissioning for LHC Run 2.
Speaker: Srecko Morovic (CERN)
• 14:00 16:00
Track 2 Session: #3 (Frameworks) Auditorium (Auditorium)

### Auditorium

#### Auditorium

Offline software

Convener: Andrew Norman (Fermilab)
• 14:00
Running ATLAS workloads within massively parallel distributed applications using Athena Multi-Process framework (AthenaMP) 15m
AthenaMP is a multi-process version of the ATLAS reconstruction and data analysis framework Athena. By leveraging Linux fork and copy-on-write, it allows the sharing of memory pages between event processors running on the same compute node with little to no change in the application code. Originally targeted to optimize the memory footprint of reconstruction jobs, AthenaMP has demonstrated that it can reduce the memory usage of certain confugurations of ATLAS production jobs by a factor of 2. AthenaMP has also evolved to become the parallel event-processing core of the recently developed ATLAS infrastructure for fine-grained event processing (Event Service) which allows to run AthenaMP inside massively parallel distributed applications on hundreds of compute nodes simultaneously. We present the architecture of AthenaMP, various strategies implemented by AthenaMP for scheduling workload to worker processes (for example: Shared Event Queue and Shared Distributor of Event Tokens) and the usage of AthenaMP in the diversity of ATLAS event processing workloads on various computing resources: Grid, opportunistic resources and HPC.
Speaker: Vakho Tsulaia (Lawrence Berkeley National Lab. (US))
• 14:15
A New Petabyte-scale Data Derivation Framework for ATLAS 15m
During the Long shutdown of the LHC, the ATLAS collaboration overhauled its analysis model based on experience gained during Run 1.  A significant component of the model is a "Derivation Framework" that takes the Petabyte-scale AOD output from ATLAS reconstruction and produces samples, typically Terabytes in size, targeted at specific analyses.  The framework incorporates all of the functionality of the core reconstruction software, while producing outputs that are simply configured.  Event selections are specified via strings, including support for logical operations.  The output content can be highly optimised to minimise disk requirements, while maintaining the same C++ interface.  The framework includes an interface to the late-stage physics analysis tools, ensuring that the final outputs are consistent with tool requirements.  Finally, the framework allows several outputs to be produced for the same input, providing the possibility to optimise configurations to computing resources.
Speaker: James Catmore (University of Oslo (NO))
• 14:30
Requirements for a Next Generation Framework: ATLAS Experience 15m
The challenge faced by HEP experiments from the current and expected architectural evolution of CPUs and co-processors is how to successfully exploit concurrency and keep memory consumption within reasonable limits. This is a major change from frameworks which were designed for serial event processing on single core processors in the 2000s. ATLAS has recently considered this problem in some detail through its Future Frameworks Requirements group. Here we report on the major considerations of the group, which was charged with considering the best strategies to exploit current and anticipated CPU technologies. The group has re-examined the basic architecture of event processing and considered how the building blocks of a framework (algorithms, services, tools and incidents) should evolve. The group has also had to take special care to ensure that the use cases of the ATLAS high level trigger are encompassed, which differ in important ways from offline event processing (for example, 99% of events are rejected, which must be done with minimum resource investment). Finally, the group has considered how best to use the wide variety of concurrency techniques, such as multi-processing, multi-threading, offloaded accelerator technology, parallel I/O, and how to exploit resources such as high performance computing sites.
Speaker: Dr Sami Kama (Southern Methodist University (US))
• 14:45
Development of a Next Generation Concurrent Framework for the ATLAS Experiment 15m
The ATLAS experiment has successfully used its Gaudi/Athena software framework for data taking and analysis during the first LHC run, with billions of events successfully processed. However, the design of Gaudi/Athena dates from early 2000 and the software and the physics code has been written using a single threaded, serial design. This programming model has increasing difficulty in exploiting the potential of current CPUs, which offer their best performance only through taking full advantage of multiple cores and wide vector registers. Future CPU evolution will intensify this trend, with core counts increasing and memory per core falling. With current memory consumption for 64 bit ATLAS reconstruction in a high luminosity environment approaching 4GB, it will become impossible to fully occupy all cores in a machine without exhausting available memory. However, since maximising performance per watt will be a key metric, a mechanism must be found to use all cores as efficiently as possible. In this paper we report on our progress with a practical demonstration of the use of multi-threading in the ATLAS reconstruction software, using the GaudiHive framework. We have expanded support to Calorimeter, Inner Detector, and Tracking code, discussing what changes were necessary in order to allow the serially designed ATLAS code to run, both to the framework and to the tools and algorithms used. We report on both the performance gains, and what general lessons were learned about the code patterns that had been employed in the software and which patterns were identified as particularly problematic for multi-threading. We also present our findings on implementing a hybrid multi-threaded / multi-process framework, to take advantage of the strengths of each type of concurrency, while avoiding some of their corresponding limitations.
Speaker: Charles Leggett (Lawrence Berkeley National Lab. (US))
• 15:00
Using the CMS Threaded Framework In A Production Environment 15m
During 2014, the CMS Offline and Computing Organization completed the necessary changes to use the CMS threaded framework in the full production environment. Running reconstruction workflows using the multi-threaded framework is a crucial element of CMS' 2015 and beyond production plan. We will briefly discuss the design of the CMS Threaded Framework, in particular how the design affects scaling performance. We will then cover the effort involved in getting both the CMSSW application software and the workflow management system ready for using multiple threads for production. Finally, we will present metrics on the performance of the application and workflow system as well as the difficulties which were uncovered. We will end with CMS' plans for using the threaded framework to do production for LHC Run 2.
Speaker: Dr Christopher Jones (Fermi National Accelerator Lab. (US))
• 15:15
New developments in the FairRoot framework 15m
The FairRoot framework is the standard framework for simulation, reconstruction and data analysis developed at GSI for the future experiments at the FAIR facility. The framework delivers base functionality for simulation, i.e.: Infrastructure to easily implement a set of detectors, fields, and event generators. Moreover, the framework decouples the user code (e.g.: Geometry description, detector response, etc.) completely from the used MC engine. The framework also handles the Input/Output (IO). The output of single detectors (tasks) can be switched on (made persistence) or off (transient) in a simple and flexible way. For reconstruction and/or data analysis the user code is organized in modular tasks that implement the different states of a state machine. The order in which these tasks are executed is defined via a so-called steering macro. This scheme allows a very flexible handling of the reconstruction and data analysis configurations, it also allow for mixing of simulation and data reconstruction. Reconstruction tasks can run separately after simulation or directly on the fly within the simulation. The modular design of the framework has allowed a smooth transition to a message queue based system, which makes it possible to parallelize the execution of the tasks without re-designing or re-writing the existing user code. The new design also allows implementing the processes in different programming languages or on different hardware platforms. For the communication between the different processes modern technologies like protocol buffers and Boost serialization are also used. The framework with a focus on the basic building blocks and the transition to the message queue based system will be presented.
Speaker: Dr Florian Uhlig (GSI Darmstadt)
• 15:30
ALFA: The new ALICE-FAIR software framework 15m
The commonalities between the ALICE and FAIR experiments and their computing requirements lead to the development of large parts of a common software framework in an experiment independent way. The FairRoot project has already shown the feasibility of such an approach for the FAIR experiments and extending it beyond FAIR to experiments at other facilities. The ALFA framework is a joint development between ALICE Online-Offline (O2) and FairRoot teams. ALFA is designed as a flexible, elastic system, which balances reliability and ease of development with performance using multi-processing and multi-threading. A message-based approach has been adopted; such an approach will support the use of the software on different hardware platforms, including heterogeneous systems. Each process in ALFA assumes limited communication and reliance on other processes. Such a design will add horizontal scaling (multiple processes) to “vertical scaling” provided by multiple threads to meet computing and throughput demands. ALFA does not dictate any application protocols. Potentially, any content-based processor or any source can change the application protocol. The framework supports different serialization standards for data exchange between different hardware and software languages. The concept and design of this new framework as well as the already implemented set of utilities and interfaces will be presented.
Speaker: Dr Mohammad Al-Turany (CERN)
• 15:45
Physics Analysis Software Framework for Belle II 15m
We present software framework being developed for physics analyses using the data collected by the Belle II experiment. The analysis workflow is organized in a modular way integrated within the Belle II software framework (BASF2). A set of physics analysis modules that perform simple and well defined tasks and are common to almost all physics analyses are provided. The physics modules do not communicate with each other directly but only through the data access protocols that are part of the BASF2. The physics modules are written in C++, Python or combination of both. Typically, a user performing a physics analysis only needs to provide a job configuration file with analysis’ specific sequence of physics modules that can be then executed on the Grid.
Speaker: marko staric (J. Stefan Institute, Ljubljana, Slovenia)
• 14:00 16:00
Track 4 Session: #3 (Middleware) B250 (B250)

### B250

#### B250

Middleware, software development and tools, experiment frameworks, tools for distributed computing

Convener: Marco Clemencic (CERN)
• 14:00
Commissioning HTCondor-CE for the Open Science Grid 15m
The HTCondor-CE is the next-generation gateway software for the Open Science Grid (OSG). This is responsible for providing a network service which authorizes remote users and provides a resource provisioning service (other well-known gatekeepers include Globus GRAM, CREAM, Arc-CE, and Openstack’s Nova). Based on the venerable HTCondor software, this new CE is simply a highly-specialized configuration of HTCondor. It was developed and adopted to provide the OSG with a more flexible, scalable, and easier-to-manage gateway software. This software does not exist in a vacuum: to deploy this gateway across the OSG, we had to integrate it with the CE configuration, deploy a corresponding information service, coordinate with sites, and overhaul our documentation.
Speaker: Edgar Fajardo Hernandez (Univ. of California San Diego (US))
• 14:15
Dynamic Resource Allocation with the ARC Control Tower 15m
Distributed computing resources available for high-energy physics research are becoming less dedicated to one type of workflow and researchers’ workloads are increasingly exploiting modern computing technologies such as parallelism. The current pilot job management model used by many experiments relies on static dedicated resources and cannot easily adapt to these changes. The model used for ATLAS in Nordic countries and some other places enables a flexible job management system based on dynamic resources allocation. Rather than a fixed set of resources managed centrally, the model allows resources to be requested on the fly. The ARC Computing Element (ARC-CE) and ARC Control Tower (aCT) are the key components of the model. The aCT requests jobs from the ATLAS job mangement system (Panda) and submits a fully-formed job description to ARC-CEs. ARC-CE can then dynamically request the required resources from the underlying batch system. In this paper we describe the architecture of the model and the experience of running many millions of ATLAS jobs on it.
Speaker: Andrej Filipcic (Jozef Stefan Institute (SI))
• 14:30
ARC Control Tower: A flexible generic distributed job management framework 15m
While current grid middlewares are quite advanced in terms of connecting jobs to resources, their client tools are generally quite minimal and features for managing large sets of jobs are left to the user to implement. The ARC Control Tower (aCT) is a very flexible job management framework that can be run on anything from a single user’s laptop to a multi-server distributed setup. aCT was originally designed to enable ATLAS jobs to be submitted to the ARC CE. However, with the recent redesign of aCT where the ATLAS specific elements are clearly separated from the ARC job management parts, the control tower can now easily be reused as a flexible generic distributed job manager for other communities. This paper will give a detailed explanation how aCT works as a job management framework and go through the steps needed to create a simple job manager using aCT and show that it can easily manage thousands of jobs.
Speaker: Jon Kerr Nilsen (University of Oslo (NO))
• 14:45
Intrusion Detection in Grid computing by Intelligent Analysis of Jobs Behavior – The LHC ALICE Case 15m
Grid infrastructures allow users flexible on-demand usage of computing resources using an Internet connection. A remarkable example of a Grid in High Energy Physics (HEP) research is used by the ALICE experiment at European Organization for Nuclear Research CERN. Physicists can submit jobs used to process the huge amount of particle collision data produced by the Large Hadron Collider (LHC) at CERN. Grids allow the submission of user developed jobs (code and data). They also have interfaces to Internet, storage systems, experiment infrastructure and other networks. This creates an important security challenge. Even when Grid system administrators perform a careful security assessment of sites, worker nodes, storage elements and central services, an attacker could still take advantage of unknown vulnerabilities (zero day). This attacker could enter and escalate her access privileges to misuse the computational resources for unauthorized or even criminal purposes. She could even manipulate the data of the experiments. Accordingly, Grid systems require an automatic tool to monitor and analyze the behavior of user jobs. This tool should analyze data generated by jobs such as log entries, traces, system calls, to determine if they run in a desired behavior or are performing any kind of attack on the system. The tool should react to the attack by sending alerts, logging information about relevant events and performing automatic defensive actions (for example stopping a suspicious process). This piece of software could be classified as Grid Intrusion Detection Systems (Grid-IDS). Traditional IDS allow detection of attacks by fixed if-then rules based on signatures. It compares the input data with known predefined conditions from previous events. This strategy fails when a new type of intrusion is used, even with a slightly difference. Artificial intelligence algorithms have been suggested as a method to improve Intrusion Detection Systems. By the usage of a Machine Learning approach it is possible to train the IDS on generalizing among attacks even when they are completely new. Intelligent IDS can also analyze the huge amount of data generated in Grid logs and process traces to determine a misbehaving scenario (data mining). This Grid IDS has to be adapted to highly distributed scenarios, when collaboration among geographically separate sites is necessary and reliability on central services is not always an option. Currently there is no framework that allows us to fulfill all the above requirements. We will design and build such framework. This framework should allow the monitoring and analysis of grid job behavior to detect attack attempts, even if new techniques or zero day vulnerabilities are utilized. This framework should also perform required countermeasures for its protection. In a first step, we plan to analyze the behavior of the usual job execution in the ALICE experiment Grid. We will determine the most important metrics to characterize a “bad” behavior (an attack). Later we will collect data from the Grid logs using these metrics and will use this data to train a machine learning algorithm. The algorithm will allow us classification of jobs as in desired or undesired state depending on the data produced in their execution. We plan to implement the proposed framework as a software prototype that will be tested as a component of the ALICE Grid middelware. **Keywords –** grid computing, distributed computing, distributed System security, artificial intelligence, data mining, Intrusion Detection Systems.
Speaker: Andres Gomez Ramirez (Johann-Wolfgang-Goethe Univ. (DE))
• 15:00
Virtual Circuits in PhEDEx, an update from the ANSE project 15m
The ANSE project has been working with the CMS and ATLAS experiments to bring network awareness into their middleware stacks. For CMS, this means enabling control of virtual network circuits in PhEDEx, the CMS data-transfer management system. PhEDEx orchestrates the transfer of data around the CMS experiment to the tune of 1 PB per week spread over about 70 sites. The goal of ANSE is to improve the overall working efficiency of the experiments, by enabling more deterministic time to completion for a designated set of data transfers, through the use of end-to-end dynamic virtual circuits with guaranteed bandwidth. ANSE has enhanced PhEDEx, allowing it to create, use and destroy circuits according to it's own needs. PhEDEx can now decide if a circuit is worth creating based on its current workload and past transfer history, which allows circuits to be created only when they will be useful. This paper reports on the progress made by ANSE in PhEDEx. We show how PhEDEx is now capable of using virtual circuits as a production-quality service, and describe how the mechanism it uses can be refactored for use in other software domains. We present first results of transfers between CMS sites using this mechanism, and report on the stability and performance of PhEDEx when using virtual circuits. The ability to use dynamic virtual circuits for prioritised large-scale data transfers over shared global network infrastructures represents an important new capability and opens many possibilities. The experience we have gained with ANSE is being incorporated in an evolving picture of future LHC Computing Models, in which the network is considered as an explicit component. Finally, we describe the remaining work to be done by ANSE in PhEDEx, and discuss future directions for continued development.
Speaker: Dr Tony Wildish (Princeton)
• 15:15
Integrating Network Awareness in ATLAS Distributed Computing Using the ANSE Project 15m
A crucial contributor to the success of the massively scaled global computing system that delivers the analysis needs of the LHC experiments is the networking infrastructure upon which the system is built. The experiments have been able to exploit excellent high-bandwidth networking in adapting their computing models for the most efficient utilization of resources. New advanced networking technologies now becoming available such as software defined networking hold the potential of further leveraging the network to optimize workflows and dataflows, through proactive control of the network fabric on the part of high level applications such as experiment workload management and data management systems. End to end monitoring of networks using perfSONAR combined with data flow performance metrics further allows applications to adapt based on real time conditions. We will describe efforts underway in ATLAS on integrating network awareness at the application level, particularly in workload management, building upon the ANSE (Advance Network Services for Experiments) project components. We will show how knowledge of network conditions, both historical and current, are used to optimize PanDA and other systems for ATLAS and describe how software control of end-to-end network paths can augment ATLAS's ability to effectively utilize its distributed resources.
Speaker: Dr Alexei Klimentov (Brookhaven National Laboratory (US))
• 15:30
Multicore job scheduling in the Worldwide LHC Computing Grid 15m
After the successful first run of the LHC, data taking will restart in early 2015 with unprecedented experimental conditions leading to increased data volumes and event complexity. In order to process the data generated in such scenario and exploit the multicore architectures of current CPUs, the LHC experiments have developed parallelized software for data reconstruction and simulation. A good fraction of their computing effort is still expected to be executed as single-core tasks. Therefore, jobs with diverse resources requirements will be distributed across the Worldwide LHC Computing Grid (WLCG), making workload scheduling a complex problem in itself. In response to this challenge, the WLCG Multicore Deployment Task Force has been created with the purpose of coordinating the joint effort from experiments and WLCG sites. The main objective is to ensure the convergence of approaches from the different LHC Virtual Organizations (VOs) to make the best use of the shared resources in order to satisfy their new computing needs and minimize any inefficiency deriving from the scheduling mechanisms. This should also be achieved without imposing unnecessary complexities in the way sites manage their resources. Job scheduling in the WLCG involves the use of grid-wide workload submission tools by the VOs linked via Computing Element (CE) middleware to the batch system technologies managing local resources at every WLCG site. Each of these elements and their interaction has been analyzed by the Task Force. The various job submission strategies proposed by the LHC VOs have been evaluated, providing feedback for the evolution of their grid-wide submission models and tools. The diverse capabilities of different CE technologies in passing the resource request from the VOs to the sites have been examined. The technical features of the most common batch systems in WLCG sites have been discussed for a better understanding of their multicore job handling capabilities. Participants in the Task Force have also been encouraged to share their system configurations with the purpose of avoiding duplicated efforts among sites operating the same technologies. This contribution will present the activities and progress of the Task Force related to the aforementioned topics, including experiences from key sites on how to best use different batch system technologies, the evolution of workload submission tools by the experiments and the knowledge gained from scale tests of the different proposed job submission strategies.
Speaker: Alessandra Forti (University of Manchester (GB))
• 15:45
Multicore-Aware Data Transfer Middleware (MDTM) 15m
Multicore and manycore have become the norm for scientific computing environments. Multicore/manycore platform architectures provide advanced capabilities and features that can be exploited to enhance data movement performance for large-scale distributed computing environments, such as LHC. However, existing data movement tools do not take full advantage of these capabilities and features. The result is inefficiencies in data movement operations, particularly within the wide area scope. These inefficiencies will become more pronounced as networks upgrade to 100GE infrastructure, and host systems correspondingly migrate to 10GE, Nx10GE, and 40GE technologies. To address these inefficiencies and limitations, DOE’s Advanced Scientific Computing Research (ASCR) office has funded Fermilab and Brookhaven National Laboratory to collaboratively work on the Multicore-Aware Data Transfer Middleware (MDTM) project. MDTM aims to accelerate data movement toolkits on multicore systems. Essentially, the MDTM project consists of two major components: - MDTM middleware services to harness multicore parallelism and make intelligent decisions that align CPU, memory, and I/O device operations in a manner which optimizes performance for higher layer applications. - MDTM-enabled data transfer applications (client or server) that can utilize MDTM middleware to reserve and manage multiple CPU cores, memory, network devices, disk storage as an integrated resource entity, thus achieving higher throughput and enhanced quality of service when compared with existing approaches. A prototype version of MDTM is currently undergoing testing and evaluation. This talk will describe MDTM’s architectural and design principles, how it works in implementation, and initial test results in comparison to standard GridFTP and BBCP operations. In addition, future directions for the project will be discussed, including the notion of enabling an external resource scheduling capability for MDTM, thus making it a reservable component for application-driven end-to-end path resource reservations.
Speaker: Dr Wenji Wu (Fermi National Accelerator Laboratory)
• 14:00 16:00
Track 5 Session: #3 (Data Preservation, Computing Activities) B503 (B503)

### B503

#### B503

Computing activities and Computing models

Convener: Gordon Watts (University of Washington (US))
• 14:00
Open Data and Data Analysis Preservation Services for LHC Experiments 15m
In this paper we present newly launched services for open data and for long-term preservation and reuse of high-energy-physics data analyses. We follow the "data continuum" practices through several progressive data analysis phases up to the final publication. The aim is to capture all digital assets and associated knowledge inherent in the data analysis process for subsequent generations, and to make a subset available rapidly to the public. A data analysis preservation pilot study was launched in order to assess the usual workflow practices in LHC collaborations. Leveraging on synergies between ALICE, ATLAS, CMS and LHCb experiments, the analysed data was followed through various "analysis train" steps, from the initial capture and pre-selection of primary data, through several intermediate selection steps yielding more greatly reduced datasets, up to the final selection of N-tuples used for producing high-level plots appearing in scientific journal publications. Most of the analysis chain is kept strictly restricted within a given collaboration; only the final plots, presentations and papers are usually made public. It is therefore essential to handle access rights and embargo periods as part of the data life cycle. The study revealed many similarities between collaborations, even though the variety of different practices existing in different groups within the collaborations make it hard to reproduce an analysis at a later time in a uniform way. One recurring problem underlined by the study was to ensure an efficient "knowledge capture" related to user code when the principal author of an analysis (e.g. a PhD student) leaves the collaboration later. The pilot solution has been prototyped using the Invenio digital library platform which was extended with several data-handling capabilities. The aim was to preserve information about datasets, the underlying OS platform and the user software used to study it. The configuration parameters, the high-level physics information such as physics object selection, and any necessary documentation and discussions are optionally being recorded alongside the process as well. The metadata representation of captured assets uses the MARC bibliographic standard which had to be customised and extended in relation to specific analysis-related fields. The captured digital assets are being minted with Digital Object Identifiers, ensuring later referencability and citability of preserved data and software. Connectors were built in the platform to large-scale data storage systems (CASTOR, EOS, Ceph). In addition, to facilitate information exchange among concerned services, further connectors were built to the internal information management systems of LHC experiments (e.g. CMS CADI), to the discussion platforms (e.g. TWiki, SharePoint), and to the final publication servers (e.g. CDS, INSPIRE) used in the process. Finally, the platform draws inspiration from the Open Archival Information System (OAIS) recommended practices in order to ensure long-term preservation of captured assets. The ultimate goal of the analysis preservation platform is to capture enough information about the process in order to facilitate reproduction of an analysis even many years after its initial publication, permitting to extend the impact of preserved analyses through future revalidation and recasting services. A related "open data" service was launched for the benefit of the general public. The LHC experimental collaborations are committed to make their data open after a certain embargo period. Moreover, the collaborations also release simplified datasets for the general public within the framework of the international particle physics masterclass program. The primary and reduced datasets that the collaborations release for public use are being collected within the CERN Open Data portal service, allowing any physicist or general data scientist to access, explore, and further study the data on their own. The CERN Open Data portal offers several high-level tools which help to visualise and work with the data, such as an interactive event display permitting to visualise CMS detector events on portal web pages, or a basic histogram plotting interface permitting to create live plots out of CMS reduced datasets. The platform guides high-school teachers and students to online masterclasses to further explore the data and improve their knowledge of particle physics. A considerable part of the CERN Open Data portal was therefore devoted to attractive presentation and ease-of-use of captured data and associated information. The CERN Open Data platform not only offers datasets and live tools to explore them, but it also preserves the software tools used to analyse the data. It notably offers the download of Virtual Machine images permitting users to start their own working environment in order to further explore the data; for this the platform uses CernVM based images prepared by the collaborations. Moreover, the CERN Open Data platform preserves examples of user analysis code, illustrating how the general public could write their own code to perform further analyses.
Speaker: Tim Smith (CERN)
• 14:15
The VISPA Internet Platform for Scientific Research, Outreach and Education 15m
VISPA provides a graphical front-end to computing infrastructures giving its users all functionality needed for working conditions comparable to a personal computer. It is a framework that can be extended with custom applications to support individual needs, e.g. graphical interfaces for experiment-specific software. By design, VISPA serves as a multi-purpose platform for many disciplines and experiments as demonstrated in the following different use-cases. A GUI to the analysis framework OFFLINE of the Pierre Auger collaboration, submission and monitoring of computing jobs, university teaching of hundreds of students, and outreach activity, especially in CERN's open data initiative. Serving heterogeneous user groups and applications gave us lots of experience. This helps us in maturing the system, i.e. improving the robustness and responsiveness, and the interplay of the components. Among the lessons learned are the choice of a file system, the implementation of websockets, efficient load balancing, and the fine-tuning of existing technologies like the RPC over SSH. We present in detail the improved server setup and report on the performance, the user acceptance and the realized applications of the system.
Speaker: Martin Urban (Rheinisch-Westfaelische Tech. Hoch. (DE))
• 14:30
Data Preservation at the Fermilab Tevatron 15m
The Fermilab Tevatron collider's data-taking run ended in September 2011, yielding a dataset with rich scientific potential. The CDF and D0 experiments each have nearly 9 PB of collider and simulated data stored on tape. A large computing infrastructure consisting of tape storage, disk cache, and distributed grid computing for physics analysis with the Tevatron data is present at Fermilab. The Fermilab Run II data preservation project intends to keep this analysis capability sustained through the year 2020 or beyond. To achieve this, we are implementing a system that utilizes virtualization, automated validation, and migration to new standards in both software and data storage technology as well as leveraging resources available from currently-running experiments at Fermilab. These efforts will provide useful lessons in ensuring long-term data access for numerous experiments throughout high-energy physics, and provide a roadmap for high-quality scientific output for years to come. We will present the status, benefits, and challenges of data preservation efforts within the CDF and D0 collaborations at Fermilab.
Speaker: Dr Bodhitha Jayatilaka (Fermilab)
• 14:45
Data Preservation in ATLAS 15m
Complementary to parallel open access and analysis preservation initiatives, ATLAS is taking steps to ensure that the data taken by the experiment during run-1 remain accessible and available for future analysis by the collaboration. An evaluation of what is required to achieve this is underway, examining the ATLAS data production chain to establish the effort required and potential problems. Several alternatives are explored, but the favoured solution is to bring the run 1 data and software in line with the equivalent to that which will be used for run 2. This will result in a coherent ATLAS dataset for the data already taken and that to come in the future.
Speaker: Roger Jones (Lancaster University (GB))
• 15:00
Analysis Traceability and Provenance for HEP 15m
In complex data analyses it is increasingly important to capture information about the usage of data sets in addition to their preservation over time in order to ensure reproducibility of results, to verify the work of others and to ensure appropriate conditions data have been used for specific analyses. This so-called provenance data in the computer science world is defined as the history or derivation of a data (or process) artifact. Many scientific workflow based studies are beginning to realise the benefit of capturing the provenance of their data and the activities used to process, transform and carry out studies on that data. This is especially true in scientific disciplines where the collection of data through experiment is costly and/or difficult to reproduce and where that data needs to be preserved over time. With the increase in importance of provenance data many different models have emerged to capture provenance such as PROV or the OPM models. However, these are more for interoperability of provenance information and do not focus on the capture of provenance related data. There is a clear and emerging requirement for systems to handle the provenance of data over extended timescales with a emphasis on preserving the analysis procedures themselves and the environment in which the analyses were conducted alongside the processed data sets. A provenance data system that has been built in house at CERN since early 2000 is called CRISTAL. CRISTAL was used to capture the provenance resulting from the design and construction of the CMS ECAL detector over the period 2002-2010. The CRISTAL Kernel (V3.0) has now been launched as open source under the LGPL (V3) licencing scheme and it is available for use by the wider communities including teams involved in the offline analysis of physics data, whether at CMS or other experiments. In addition, in the EC funded neuGRID and N4U projects the original developers have been using CRISTAL to capture the provenance of analyses for neuroscientists running complex pipelines of algorithms in the study of biomarkers for the onset of Alzheimer’s disease. In this paper this application is presented with a focus on how its approach can be customised for use in the high energy physics data analysis community at large. The main focus of this is a set of analysis tools (persistency, browsing/querying, visualising and analysis tracking services) which together with a generic analysis model backend can be used to capture the information required to support complex analyses. The Analysis Tools comprise the following interfaces : - **The Analysis Web Serivce –** Which is for advanced users, these users are able to programmatically create analyses on the fly. - **The Analysis Command-line Interface –** This is for users that are intermediate/advanced. It allows users to create analysis using a shell like interface. - **The Analysis Web Portlet Interface –** This is for novice users, it is a visual interface which is portlet based and allows users to browse datasets and pipelines. It also allows users to create and deploy their analyses in a visual manner. - **The Analysis Core –** These are a core set of objects used by the above interfaces. These objects connect directly to a customised analysis aware CRISTAL instance which is provenance aware. During the provenance capture phase the Analysis tools are able to capture : - **who** ran an analysis, this is a user name, - for **what** purpose, what their analysis is supposed to achieve, - **when** they ran it this is a timestamp which denotes when it started and when it finished, - **where** it was run this is GRID and Cloud related information, - **which** datasets and algorithms were used to create and run their analyses, - **how** it was executed, this more detailed infrastructure information - and lastly **why** the analysis was run, this is a justification from the user. Also in this paper, we present the case for using the Analysis Services developed using CRISTAL as an avenue for long timescale data preservation. The tools are able to store the provenance metadata surrounding the analyses in a human readable form (XML). This is a light-weight and queryable manner of storing provenance as well as analysis results. In CRISTAL everything is recorded and nothing is thrown away. So another user would be able to replicate the experiment at a later date and time. Besides reproducibility of experiments, users can also share their experiments with other users using the provenance related information. The analysis tools run currently in a GRID and a Cloud infrastructure. As well as collecting analysis provenance information, they are able to provenance of the infrastructure. There is strong novelty in this work which facilitates allows more precise reproducibility of experiments. This information is known as *infrastructure provenance*. It is currently being collected in the course of the N4U project. This infrastructure provenance can also be applied to HEP to aid in the reproducibility of results. For example, if performance is a factor it can be sent to the same compute node. Concerning the future of the analysis tools there is currently an emerging standard known as PROV which is used for *provenance interoperability*. In the near future, the analysis provenance information that we have collected with be exported to PROV. This work has already begun, we are looking for mapping patterns to aid in our cause. The reasoning for this is so that people can study and use provenance information in a standard and commonly understood format. This will also allow users to publish the provenance generated from their analyses onto the ever growing linked data cloud as well.
Speaker: Jetendr Shamdasani (University of the West of England (GB))
• 15:15
Large Scale Management of Physicist’s Personal Analysis Data 15m
The ability of modern HEP experiments to acquire and process unprecedented amounts of data and simulation have led to an explosion in the volume of information that individual scientists deal with on a daily basis. This explosion has resulted in a need for individuals to generate and keep large “personal analysis” data sets which represent the skimmed portions of official data collections pertaining to their specific analysis. These personal analysis and simulation sets represent a significant reduction in size compared to the original data, but they can still be many terabytes or tens of terabytes in size and consist of tens of thousands of files. When this personal data is aggregated across the many physicists in a single analysis group or experiment it can represent data volumes on par with or exceeding the official “production” samples which require special data handling techniques and storage systems to deal with effectively. In this paper we explore the toolsets, analysis models and changes to the Fermilab computing infrastructure which have been developed and deployed by the NOvA experiment to allow experimenters to effectively manage their personal analysis data and other data that falls outside of the typically centrally managed production chains. In particular we describe the models and tools that are being used to allow NOvA to leverage Fermilab storage resources that are sufficient to meet their analysis needs, without imposing management burdens of specific quotas on users or groups of users, without relying on traditional central disk facilities and without having to constantly police individuals users usage. We discuss the storage mechanisms and the caching algorithms that are being used as well as the toolkits that have been developed to allow the users to easily operate with terascale+ datasets.
Speaker: Dr Andrew Norman (Fermilab)
• 15:30
Overview of Different Exciting Technologies Being Researched in CERN openlab V 15m
CERN openlab is a unique public-private partnership between CERN and leading ICT companies. Its mission is to accelerate the development of cutting-edge solutions to be used by the worldwide HEP community. Since January 2015 openlab phase V has started. To bring the openlab conducted research closer to the experiments, phase V has been changed to a project based structure which allows research being done not only in CERN IT but also directly in the different experiments or even in different research institutes. In this talk I will give an overview of the different exciting technologies being researched by CERN openlab V.
Speaker: Fons Rademakers (CERN)
• 15:45
Computer security: surviving and operating services despite highly skilled and well-funded organised crime groups 15m
This presentation gives an overview of the current computer security landscape. It describes the main vectors of compromises in the academic community including lessons learnt, reveals inner mechanisms of the underground economy to expose how our computing resources are exploited by organised crime groups, and gives recommendations how to better protect our computing infrastructures. By showing how these attacks are both sophisticated and profitable, the presentation concludes that an important mean to adopt an appropriate response is to build a tight international collaboration and trusted information sharing mechanisms within the community.
Speaker: Mr Romain Wartel (CERN)
• 14:00 16:00
Track 6 Session: #2 (Facilities, Monitoring) C209 (C209)

### C209

#### C209

Facilities, Infrastructure, Network

Convener: Dr Helge Meinhard (CERN)
• 14:00
GridPP – preparing for Run-2 and the wider context 15m
This first section of this paper elaborates upon the operational status and directions within the UK Computing for Particle Physics (GridPP) project as we approach LHC Run-2. It details the pressures that have been gradually reshaping the deployed hardware and middleware environments at GridPP sites – from the increasing adoption of larger multicore nodes to the move towards alternative batch systems and cloud alternatives - as well as changes being driven by funding considerations. The second section focuses on work being done with non-LHC communities, communities that GridPP has supported for many years with varying degrees of engagement, and lays out findings of a survey into where such communities desire more support and their changing requirements. The section also explores some of the early outcomes of adopting a generic DIRAC based job submission and management framework within GridPP. The third and final section of the paper highlights changes being made in GridPP operations in order to meet new challenges that are arising as recent technical advancements (in areas such as cloud/VM provision) reach production readiness.
Speaker: Jeremy Coles (University of Cambridge (GB))
• 14:15
Getting prepared for the LHC Run2: the PIC Tier-1 case 15m
The LHC experiments will collect unprecedented data volumes in the next Physics run, with high pile-up collisions resulting in events which require a more complex processing. The collaborations have been asked to update their Computing Models to optimize the use of the available resources in order to cope with the Run2 conditions, in the midst of widespread funding restrictions. The changes in computing for Run2 represent significant efforts for the collaboration, as well as significant repercussions on how the WLCG sites are built and operated. This contribution focuses on how these changes have been implemented and integrated in the Spanish WLCG Tier-1 center at Port d'Informació Científica (PIC), which serves ATLAS, CMS and LHCb experiments. The approach to adapt a multi-VO site to the new requirements while maintaining top reliability levels for all the experiments, is described. The main activities covered in this contribution include setting up dCache disk-only pools together with access via HTTP/XrootD protocols to expose the most demanded data; enabling end user analysis at the center; efficient integration of multi-core job handling in the site batch system and scheduler by means of the dynamic allocation of computing nodes; implementation of dynamic high memory queues; simplification, automation and virtualization of services deployment; and setting up a dCache test environment to assess the storage management readiness against experiment workflows. In addition, innovative free-cooling techniques along with a modulation of computing power versus electricity costs have been implemented, achieving a significant reduction of the electricity costs of our infrastructure. The work has been done to reduce the operational and maintenance costs of the Spanish Tier-1 center, in agreement with the expectations from WLCG. With the state of the optimizations currently being implemented and the work foreseen during the coming years to further improve the effectiveness of the use of the provided resources, it is expected that the resources deployed in WLCG will approximately double by the end of Run2. All of the implementations done in PIC are flexible enough to rapidly evolve following changing technologies.
Speaker: Jose Flix Molina (Centro de Investigaciones Energ. Medioambientales y Tecn. - (ES)
• 14:30
Scheduling multicore workload on shared multipurpose clusters 15m
With the advent of workloads containing explicit requests for multiple cores in a single grid job, grid sites faced a new set of challenges in workload scheduling. The most common batch schedulers deployed at HEP computing sites do a poor job at multicore scheduling when using only the native capabilities of those schedulers. This talk describes how efficient multicore scheduling was achieved at the three sites represented in the author list, by implementing dynamically-sized multicore partitions via a minimalistic addition to the Torque/Maui batch system already in use at those sites. The first part of the talk covers the theory related to this particular problem, which is also applicable to e.g. the scheduling of large-memory jobs or data-aware jobs similarly comprising part of a highly heterogenous workload. The system design is also presented, linking it to previous work at Nikhef on grid-cluster scheduling. The second part of the talk presents an evaluation of several months of production operation at the three sites.
Speaker: Jeff Templon (NIKHEF (NL))
• 14:45
Active job monitoring in pilots 15m
Recent developments in high energy physics (HEP) including multi-core jobs and multi-core pilots require data centres to gain a deep understanding of the system to correctly design and upgrade computing clusters. Especially networking is a critical component as the increased usage of data federations relies on WAN connectivity and availability as a fallback to access data. The specific demands of different experiments and communities, but also the need for identification of misbehaving batch jobs requires an active monitoring. Existing monitoring tools are not capable of measuring fine-grained information at batch job level. This complicates network-specific scheduling and optimisations. In addition, pilots add another layer of abstraction. They behave like batch systems themselves by managing and executing payloads of jobs internally. As the original batch system has no access to internal information about the scheduling process inside the pilots, there is an unpredictable number of jobs being executed. Therefore, the comparability of jobs and pilots cannot be ensured to predict runtime behaviour or network performance. Hence, the identification of the actual payload is of interest. At the GridKa Tier 1 centre a specific monitoring tool is in use, that allows the monitoring of network traffic information at batch job level. A first analysis using machine learning algorithms showed the relevance of the measured data, but indicated a possible improvement by subdividing pilots into separate jobs. This contribution will present the current monitoring approach and will discuss recent efforts and importance to identify pilots and their substructures inside the batch system. It will also show how to determine monitoring data of specific jobs from identified pilots. Finally, the approach is evaluated and adapted to the former analysis and the results are presented.
Speaker: Manuel Giffels (KIT - Karlsruhe Institute of Technology (DE))
• 15:00
Migrating to 100GE WAN Infrastructure at Fermilab 15m
Speaker: Mr Phil Demar (Fermilab)
• 15:15
Monitoring WLCG with lambda-architecture: a new scalable data store and analytics platform for monitoring at petabyte scale. 15m
Monitoring the WLCG infrastructure requires to gather and to analyze high volume of heterogeneous data (e.g. data transfers, job monitoring, site tests) coming from different services and experiment-specific frameworks to provide a uniform and flexible interface for scientists and sites. The current architecture, where relational database systems are used to store, to process and to serve monitoring data, has limitations in coping with the foreseen extension of the volume (e.g. higher LHC luminosity) and the variety (e.g. new data-transfer protocols and new resource-types, as cloud-computing) of WLCG monitoring events. This paper presents a new scalable data store and analytics platform designed by the Support for Distributed Computing (SDC) group, at the CERN IT department, which leverages on a stack of technology each one targeting specific aspects on big-scale distributed data-processing (commonly referred as lambda-architecture approach). Results on data processing on Hadoop for WLCG data transfers are presented, showing how the new architecture can easily analyze hundreds of millions of transfer logs in few minutes. Moreover, a comparison on data partitioning, compression and file format (e.g. CSV, AVRO) is presented, with particular attention on how the file structure impacts the overall MapReduce performance. In conclusion, the evolution of the current implementation, which focuses on data store and batch processing, towards a complete lambda-architecture is discussed, with consideration on candidate technology for the serving layer (e.g. ElasticSearch) and a description of a proof of concept implementation, based on Esper, for the real-time part which compensates for batch-processing latency and automates problem detection and failures.
Speaker: Luca Magnoni (CERN)
• 15:30
A Model for Forecasting Data Centre Infrastructure Costs 15m
The computing needs in the HEP community are increasing steadily, but the current funding situation in many countries is tight. As a consequence experiments, data centres, and funding agencies have to rationalize resource usage and expenditures. CC-IN2P3 (Lyon, France) provides computing resources to many experiments including LHC, and is a major partner for astroparticle projects like LSST, CTA or Euclid. The financial cost to accommodate all these experiments is substantial and has to be planned well in advance for funding and strategic reasons. In that perspective, leveraging infrastructure expenses, electric power cost and hardware performance observed in our site over the last years, we have built a model that integrates these data and provides estimates of the investments that would be required to cater to the experiments for the mid-term future. We present how our model is built and the expenditure forecast it produces, taking into account the experiment roadmaps. We also examine the resource growth predicted by our model over the next years assuming a flat-budget scenario.
Speaker: Renaud Vernet (CC-IN2P3 - Centre de Calcul (FR))
• 15:45
High-Speed Mobile Communications in Hostile Environments 15m
With the inexorable increase in the use of mobile devices, for both general communications and mission-critical applications, wireless connectivity is required anytime and anywhere. This requirement is addressed in office buildings through the use of Wi-Fi technology but Wi-Fi is ill adapted for use in large experiment halls and complex underground environments such as the LHC tunnel and experimental caverns. CERN is instead investigating the use of 4G/LTE technology to address issues such as radiation tolerance and distance constraints. This presentation will describe these studies, presenting results on the level of data throughput that can be achieved and discussing issues such as the provision of a consistent user experience as devices migrate between Wi-Fi and 4G/LTE.
Speaker: Stefano Agosta (CERN)
• 14:00 16:00
Track 7 Session: #2 (volunteer computing, storage) C210 (C210)

### C210

#### C210

Clouds and virtualization

Convener: Federico Stagni (CERN)
• 14:00
Towards a production volunteer computing infrastructure for HEP 15m
Using virtualisation with CernVM has emerged as a de-facto standard among HEP experiments; it allows for running of HEP analysis and simulation programs in cloud environments. Following the integration of virtualisation with BOINC and CernVM(link is external), first pioneered for simulation of event generation in the Theory group at CERN, the LHC experiments ATLAS, CMS and LHCb have all adopted volunteer computing as part of their strategy to benefit from opportunistic computing resources. This presentation will describe the current technology for volunteer computing and the evolution of the BOINC service at CERN from project support for LHC@home(link sends e-mail) towards a general service. It will also provide some recommendations for teams and experiments wishing to benefit from volunteer computing resources.
Speaker: Dr Miguel Marquina (CERN)
• 14:15
ATLAS@Home: Harnessing Volunteer Computing for HEP 15m
A recent common theme among HEP computing is exploitation of opportunistic resources in order to provide the maximum statistics possible for Monte-Carlo simulation. Volunteer computing has been used over the last few years in many other scientific fields and by CERN itself to run simulations of the LHC beams. The ATLAS@Home project was started to allow volunteers to run simulations of collisions in the ATLAS detector. So far many thousands of members of the public have signed up to contribute their spare CPU cycles for ATLAS, and there is potential for volunteer computing to provide a significant fraction of ATLAS computing resources. Here we describe the design of the project, the lessons learned so far and the future plans.
Speaker: David Cameron (University of Oslo (NO))
• 14:30
CMS@Home: Enabling Volunteer Computing Usage for CMS 15m
Volunteer computing remains an untapped opportunistic resource for the LHC experiments. The use of virtualization in this domain was pioneered by the Test4theory project and enabled the running of high-energy particle physics simulations on home computers. This paper describes the model for CMS to run workloads using a similar volunteer computing platform. It is shown how the original approach is explored to map onto the existing CMS workflow and identifies missing functionality along with the components and changes that are required. The final implementation of the prototype is detailed along with the identification of areas that would benefit from further development.
Speaker: Laurence Field (CERN)
• 14:45
Dynamic provisioning of local and remote compute resources with OpenStack 15m
Modern high-energy physics experiments rely on the extensive usage of computing resources, both for the reconstruction of measured events as well as for Monte Carlo simulation. The Institut für Experimentelle Kernphysik (EKP) at KIT is participating in both the CMS and Belle experiments with computing and storage resources. In the upcoming years, these requirements are expected to increase due to growing amount of recorded data and the rise in complexity of the simulated events. It is therefore essential to increase the available computing capabilities by tapping into all resource pools. At the EKP institute, powerful desktop machines are available to users. Due to the multi-core nature of modern CPUs, vast amounts of CPU time are not utilized by common desktop usage patterns. Other important providers of compute capabilities are classical HPC data centers at Universities or national research centers. Due to the shared nature of these installations, the standardized software stack required by HEP applications cannot be installed. A viable way to overcome this constraint and offer a standardized software environment in a transparent manner is the usage of virtualization technologies. The OpenStack project has become a widely adopted solution to virtualize hardware and offer additional services like storage and virtual machine management. This contribution will report on the incorporation of the institute's desktop machines into a private OpenStack cloud. The additional compute resources provisioned via the virtual machines have been used for Monte Carlo simulation and data analysis. Furthermore, a concept to integrate shared, remote HPC centers into regular HEP job workflows will be presented. In this approach, local and remote resources are be merged to form a unfiorm, virtual compute cluster with a single point-of-entry for the user. Evaluations of the performance and stability of this setup and operational experiences will be discussed.
Speaker: Mr Thomas Hauth (KIT - Karlsruhe Institute of Technology (DE))
• 15:00
Using S3 cloud storage with ROOT and CvmFS 15m
Amazon S3 is a widely adopted protocol for scalable cloud storage that could also fulfill storage requirements of the high-energy physics community. CERN has been evaluating this option using some key HEP applications such as ROOT and the CernVM filesystem (CvmFS) with S3 back-ends. In this contribution we present our evaluation based on two versions of the Huawei UDS storage system used from a large number of clients executing HEP software applications. The performance of concurrently storing individual objects is presented alongside with more complex data access patterns as produced by the ROOT data analysis framework. We further report on the S3 integration with recent CvmFS versions and summarize the performance and pre-production experience with CvmFS/S3 for publishing daily releases of the full LHCb experiment software stack.
Speaker: Maria Arsuaga Rios (CERN)
• 15:15
New adventures in storage: cloud storage and CDMI 15m
Speaker: Paul Millar (Deutsches Elektronen-Synchrotron (DE))
• 15:30
Enabling opportunistic resources for CMS Computing Operations 15m
With the increased pressure on computing brought by the higher energy and luminosity from the LHC in Run 2, CMS Computing Operations expects to require the ability to utilize “opportunistic” resources — resources not owned by, or a priori configured for CMS — to meet peak demands. In addition to our dedicated resources we look to add computing resources from non CMS grids, cloud resources, and national supercomputing centers. CMS uses the HTCondor/glideinWMS job submission infrastructure for all its batch processing, so such resources will need to be transparently integrated into its glideinWMS pool. Bosco and parrot wrappers are used to enable access and bring the CMS environment into these non CMS resources. Here we describe our strategy to supplement our native capabilities with opportunistic resources and our experience so far using them.
Speaker: Dirk Hufnagel (Fermi National Accelerator Lab. (US))
• 16:00 16:30
Coffee break 30m
• 16:30 18:30
Track 1 Session: #3 (Data acquisition and electronics) Village Center (Village Center)

### Village Center

#### Village Center

Online computing

Convener: Emilio Meschi (CERN)
• 16:30
A design study for the upgraded ALICE O2 computing facility 15m
An upgrade of the ALICE detector is currently prepared for the Run 3 period of the Large Hadron Collider (LHC) at CERN starting in 2020. The physics topics under study by ALICE during this period will require the inspection of all collisions at a rate of 50 kHz for minimum bias Pb-Pb and 200 kHz for pp and p-Pb collisions in order to extract physics signals embedded into a large background. The upgraded ALICE detector will produce more than 1 TByte/s of data. Both collision and data rate impose new challenges onto the detector readout and compute system. Some detectors will not use a triggered readout, which will require a continuous processing of the detector data. Although various online systems are existing for event based reconstruction, the application of a production system for time-based data processing and reconstruction is a novel case in HEP. The project will benefit from the experience gained with the current ALICE High Level Trigger online system, which already implements a modular concept combining data transport, algorithms and heterogeneous hardware. Processing of individual events will however have to be replaced by the continuous processing of the data stream segmented according to a time-frame structure. One challenge is the distribution of data within the compute nodes. Time-correlated data sets are received by the First Level Processors (FLP) and must be coherently transported to and aggregated on the Event Processing Nodes (EPN). Several approaches for the distribution of data are being studied. Aggregated time-frame data is processed on the EPN with the primary goal to reconstruct particle properties. On-the-fly and short-latency detector calibration is necessary for the reconstruction. The impact of the calibration strategy to the reconstruction performance is under study. Based on the partially reconstructed data, events corresponding to particular collisions can be assembled from the time-based data. The original raw data are then replaced by these preprocessed data. This transformation together with the application of lossless data compression algorithms will provide a data volume reduction of a factor of 20 before data is passed onto the storage system. Building on messaging solutions, the design and development of a flexible framework for transparent data flow, online reconstruction, and data compression has started. The system uses parallel processing on the level of processes and threads within processes in order to achieve an optimal utilization of CPU cores and memory. Furthermore, the framework provides the necessary abstraction to run common code on heterogeneous platforms including various hardware accelerator cards. We present in this contribution the first results of a prototype with estimates for scalability and feasibility for a full scale system.
Speaker: Matthias Richter (University of Oslo (NO))
• 16:45
Efficient time frame building for online data reconstruction in ALICE experiment 15m
After Long Shutdown 2, the upgraded ALICE detector at the LHC will produce more than a terabyte of data per second. The data, constituted from a continuous un-triggered stream data, have to be distributed from about 250 First Level Processor nodes (FLPs) to O(1000) Event Processing Nodes (EPNs). Each FLP receives a small subset of the detector data that is chopped in sub-timeframes. One EPN needs all the fragments from the 250 FLPs to build a full timeframe. An algorithm is being implemented on the FLPs with the aim of optimizing the usage of the network connecting the FLPs and EPNs. The algorithm minimizes contention when several FLPs are sending to the same EPN. An adequate traffic shaping is implemented by delaying the sending time of each FLP by a unique offset. The payloads are stored in a buffer large enough to accommodate the delay provoked by the maximum number of FLPs. As the buffers are queued for sending, the FLPs can operate with the highest efficiency. Using the time information embedded in the data any further FLP synchronization can be avoided. Moreover, “zero-copy” and multipart messages of ZeroMQ are used to create full timeframes on the EPNs without the overhead of copying the payloads. The concept and the performance measurement of the implementation on a computing cluster are presented.
Speaker: Alexey Rybalchenko (GSI - Helmholtzzentrum fur Schwerionenforschung GmbH (DE))
• 17:00
Pilot run of the new DAQ of the COMPASS experiment 15m
Speaker: Josef Novy (Czech Technical University (CZ))
• 17:15
The data acquisition system of the XMASS experiment 15m
XMASS is a multi-purpose low-background experiment with a large volume of liquid xenon scintillator at Kamioka in Japan. The first phase of the experiment aiming at direct detection of dark matter was commissioned in 2010 and is currently taking data. The detector uses ~830 kg of liquid xenon viewed by 642 photomultiplier tubes (PMTs). Signals from 642 PMTs are amplified and read out by 1 GS/s digitizers (CAEN V1751) as well as ADC/TDC modules. To reduce data size, we implemented an on-board data suppression algorithm in digitizers. The trigger is generated based on the number of hit PMTs within a 200 ns coincidence window. Recently, it is pointed out that the XMASS detector has also a great potential to detect supernova neutrino burst, and therefore several DAQ upgrades and system extensions are proceeding for this purpose. We will present the overall design and performance of the XMASS data acquisition system and the status of DAQ upgrade for supernova neutrino burst detection.
Speaker: Katsuki Hiraide (the University of Tokyo)
• 17:30
Development of New Data Acquisition System at Super-Kamiokande for Nearby Supernova Bursts 15m
Super-Kamiokande (SK), a 50-kiloton water Cherenkov detector, is one of the most sensitive neutrino detectors. SK is continuously collecting data as the neutrino observatory and can be used also for supernova observations by detecting supernova burst neutrinos. It is reported that Betelgeuse (640ly) is shrinking 15% in 15 years (C.H.townes et al. 2009) and this may be an indication of the supernova burst. Based on the Livermore model, the simulation study predicts 30MHz neutrino events observed in the SK detector for the neutrino burst from a supernova within a few hundreds of light years. The current SK data acquisition (DAQ) system can record only about first 20% of the events. To overcome this inefficiency, we developed a new DAQ system that records the number of hit PMTs so that we can store high-rate events and obtain a time profile of the number of neutrinos emitted at the supernova. This system uses the outputs from the number of hits from existing electronics modules as inputs and it is synchronized to the existing DAQ system. Therefore the data is easily checked the correlation to that from the existing electronics. The data is transferred to the computers with SiTCP, an implementation of TCP/IP stack in FPGA without CPU. Part of the data are stored in the 4GB DDR2 memory before it transferred and this makes it possible to record detailed time structure of the superonva signal. The design and the production of the new modules were completed and we tested basic functions and the interference with the existing system. The firmware for the module is prepared and now being installed in SK. We will report the development and the status of the operation.
Speaker: Asato Orii (urn:Facebook)
• 17:45
The Electronics, Online Trigger System and Data Acquisition System of the J-PARC E16 Experiment 15m

## 1. Introduction

The J-PARC E16 experiment aims to investigate the chiral symmetry restoration in cold nuclear matter and the origin of the hadron mass through the systematic study of the mass modification of vector mesons.
In the experiment,
$e^{+}e^{-}$ decay of slowly-moving $\phi$ mesons in the normal nuclear matter density are intensively studied using several nuclear targets (H, C, Cu and Pb).
The dependence of the modification on the nuclear size and the meson momentum will be measured for the first time.

## 2. Experiment

The experiment will be performed in 2016 at the high-momentum beam line of the J-PARC hadron experimental facility,
where a 30-GeV proton beam with a high intensity of $1\times10^{10}$ per pulse (2-second spill per 6-second cycle) is delivered to experimental targets.
Since the material budget around the targets is sensitive to the $e^{+}e^{-}$ measurement,
thin detector systems are under construction.
The targets are surrounded by GEM Trackers (GTR) with three tracking planes to achieve the good resolution of 100 $\mu$m in the high rate environment of 5 kHz/mm$^{2}$.
The electrons (positrons) are identified by two types of counters.
One is the Hadron Blind Detector (HBD), which is a threshold type gas Cherenkov detector using GEM,
and the other is the Lead-glass EM calorimeter (LG).

## 3. Trigger electronics

The first level trigger is decided by the three fold coincidence of $\sim$620-ch from the GTR, $\sim$940-ch from the HBD and $\sim$1000-ch from the LG.
Cathode foils which face to the read out strips of the most outside GTR and pads of the HBD are divided into trigger segments.
A pulse fired on the GEM cathode foils are fed into an amplifier-shaper-discriminator (ASD) ASIC, which has been developed by our group in cooperation with
Open-It[1].
The LG signals are discriminated by a commercial fast comparator.
In order to gather the trigger primitives, which are sent from the GTR, HBD and LG in parallel LVDS signals,
a trigger merger board (TRG-MRG) has been developed.
The TRG-MRG produces time stamps of the trigger primitives with a resolution of less than 4 nsec by using a Xilinx Kintex-7 FPGA.
The time stamps are serialized by the FPGA and transmitted to a global trigger decision module via optical fibers at each link rate of 5 Gbps or more.
The global trigger module utilizes a Belle-II Universal Trigger Board 3.
The first level trigger as well as a global clock of $\sim$125 MHz is distributed by Belle-II FTSW boards via Category-7 LAN cables to the front-end-modules described bellow.

## 4. Readout electronics

The numbers of readout channels amount to $\sim$56k, $\sim$36k and $\sim$1k for the GTR, HBD and LG, respectively.
In the current design, waveforms from all of the readout channels will be recorded by using analog memory ASICs to obtain timing and charge deposit information and to distinguish pulse pile-up in the high rate environment for the offline analysis.
The waveform from the GTR and HBD are stored with a 25 nsec cycle in APV25s1[2] chips and then transferred to the Scalable Readout System, which has been developed by the CERN RD51 Collaboration[3] (an R&D collaboration for MGPDs).
The LGs are read out by custom made boards, which employ DRS4[4] chips to record the pulses at 1 GHz.
Those modules digitize the waveforms and perform the zero suppression at online.
The data are collected by the DAQ-Middleware[5] using gigabit Ethernet and 10G Ethernet links.
The expected data rate is 660 MB/spill with the event rate of 2k/spill after zero suppression.

## 5. Summary

This is an overview talk on the electronics and trigger system
for the J-PARC E16 experiment.
Other contributions for the detail of the DAQ software, trigger ASIC,
and so on are also prepared and submitted by coauthors.

## References

[1] http://openit.kek.jp/ (in Japanese)

[5] http://daqmw.kek.jp/ (in Japanese)

Speaker: Tomonori Takahashi (Research Center for Nuclear Physics, Osaka University)
• 18:00
The Application of DAQ-Middleware to the J-PARC E16 Experiment 15m
**1. Introduction** We developed a DAQ system of the J-PARC E16 Experiment by using the DAQ-Middleware. We evaluated the DAQ system and confirmed that the DAQ system can be applied to the experiment. The DAQ system receives an average 660MB/spill of data (2-seconds spill per 6 seconds cycle). In order to receive such a large quantity of data, we need a network-distributed system. DAQ-Middleware is a software framework of a network-distributed DAQ system. Therefore, the framework is a useful tool for the DAQ system development. In our talk, we are going to talk about useful features of DAQ-Middleware, an architecture and DAQ performance of the J-PARC E16 Experiment DAQ system. **2. The J-PARC E16 Experiment** The aim of the J-PARC E16 Experiment is to measure mass spectra of vector mesons in nucleus using electron pair decays with a huge statistics. For such purpose, a high intensity proton beam is used and the interaction rate of the experiment becomes $1\times10^{10}$. To cope with such high intensity and the rate, we plan to use the Gas Electron Multiplier (GEM) Tracker and strip read outs. Figure1 shows the estimation of data transfer to DAQ PCs. ![Estimation of data transfer to DAQ PCs][1] **3. DAQ-Middleware** DAQ-Middleware is a software framework of a network-distributed DAQ system. The framework consists of some software components called DAQ-Components. The framework provides the basic functionalities, such as communication between DAQ-Components, transferring data, starting and stopping the DAQ system. DAQ-Components can be set on separate computers. **4. DAQ system for the E16 Experiment** The DAQ system performs following functions. - store all data on storage devices - build and monitor event data from a part of all events Figure2 shows architecture of an entire DAQ system. ![Architecture of an entire DAQ system][2] The DAQ system consists of two stages. 1st stage PCs read and store all data. The 2nd stage PC builds and monitors event data from a part of all events. Figure3 shows DAQ-Component architecture on one 1st stage PC. ![1st stage for the DAQ system][3] Blue boxes of the figure represent DAQ-Component. Gatherer reads data from one read-out module. Merger receives data from multiple Gatherers and sends the data to Dispatcher. Dispatcher sends data to Logger and Filter. Logger stores data on storage devices. Filter sends data, which meet specific conditions, to a next DAQ-Component on the 2nd stage PC. 1st stage consists of multiple PCs which have these DAQ-Components. Figure4 shows DAQ-Component architecture on one 2nd stage PC. ![2nd Stage for the DAQ system][4] Blue boxes of the figure represent DAQ-Component. Merger receives data from multiple Filters on the 1st stage PC and sends the data to Eventbuilder. This Merger is the same as one of the 1st stage PC. Eventbuilder builds event data from received data. Monitor analysis and monitor the event data. **5. Evaluation for the DAQ System** Because we did not have enough PCs, we evaluated 1st stage and 2nd stage separately. ![Specification of evaluation PC][5] **5.1 1st stage evaluation** ![The environment of 1st stage evaluation][6] We measured maximum throughput of one 1st stage PC. Figure6 shows the environment of the evaluation. Because we did not have read-out modules, we used the emulators instead of those. Data format of the emulators was the same as that of the read-out modules. We installed DAQ-Components of the 1st stage to an evaluation PC. Figure5 shows the specification of the evaluation PC. We prepared the 2nd stage PC which is installed Skeltonsink. Skeltonsink received data and is used only for 1st stage evaluation. Each emulator sent test data to the evaluation PC at a maximum rate. As shown in Table 1, one event data size is 45KB per one read-out module and event rate is 2000Hz. ![Evaluation result of 1st Stage PC][7] Figure7 shows the evaluation result. The points mean measured value, line means ideal value. When the number of emulators was up to 7, measured value matched ideal value. The result shows the evaluation PC can process up to 7 emulators, when read-out modules send data at a maximum rate. One evaluation PC can process around 600MB/s of data at a maximum. During this evaluation, we observed data loss size. There was no data loss when the number of emulators was up to 7. **5.2 2nd stage evaluation** ![The environment of 2nd stage evaluation][8] We evaluated 2nd stage. Figure8 shows the environment of the evaluation. We installed DAQ-Components of the 2nd stage on the evaluation PC. Figure5 shows specification of the evaluation PC. We assumed that 50 read-out modules transferred 45kB/event of data and event rate was 10 Hz after passing Filter. In this case, the 2nd PC received 22MB/s of data. We confirmed that the evaluation PC was able to process 22MB/s of data losing no data. This result shows the evaluation PC has an enough capability. **5. Conclusion** The DAQ system of the J-PARC E16 Experiment consists of two stages. One evaluation PC on 1st stage can process 600MB/s of data at a maximum. Average total data transfer Rate to DAQ PCs is 330MB/s. Therefore, 1st stage can be developed by one PC or a few PCs. 2nd stage can be developed by one PC. Therefore, we were able to confirm the DAQ system can be applied to the experiment by a few PCs. [1]: http://research.kek.jp/people/ehamada/test3.png [2]: http://research.kek.jp/people/ehamada/figure1.png [3]: http://research.kek.jp/people/ehamada/figure2.png [4]: http://research.kek.jp/people/ehamada/figure3.png [5]: http://research.kek.jp/people/ehamada/table2.png [6]: http://research.kek.jp/people/ehamada/figure5.png [7]: http://research.kek.jp/people/ehamada/figure4.png [8]: http://research.kek.jp/people/ehamada/figure6.png
Speaker: Mr Eitaro Hamada (High Energy Accelerator Research Organization (KEK))
• 18:15
Developments and applications of DAQ framework DABC v2 15m
Speaker: Dr Sergey Linev (GSI DARMSTADT)
• 16:30 18:30
Track 2 Session: #4 (Simulation) Auditorium (Auditorium)

### Auditorium

#### Auditorium

Offline software

Convener: Andrea Dell'Acqua (CERN)
• 16:30
Geant4 Version 10 Series 15m
Speaker: Dr Makoto Asai (SLAC National Accelerator Laboratory (US))
• 16:45
Geant4 VMC 3.0 15m
Virtual Monte Carlo (VMC) provides an abstract interface into Monte Carlo transport codes. A user VMC based application, independent from the specific Monte Carlo codes, can be then run with any of the supported simulation programs.  Developed by the ALICE Offline Project and further included in ROOT, the interface and implementations have reached stability during the last decade and have become a foundation for other detector simulation frameworks, the FAIR facility experiments framework being among the first and largest. Geant4 VMC, which provides the implementation of the VMC interface for Geant4, is in continuous maintenance and development, driven by the evolution of Geant4 on one side and requirements from users on the other side.  Besides the implementation of the VMC interface, Geant4 VMC also provides a set of examples that demonstrate the use of VMC to new users and also serve for testing purposes. Since major release 2.0, it includes the G4Root navigator package, which implements an interface that allows one to run a Geant4 simulation using a ROOT geometry. The release of Geant4 version 10.00 with the integration of multi-threading processing has triggered the development of the next major version of Geant4 VMC (version 3.0), whose release is planned for this year. A beta version, available for user testing since March, has helped its consolidation and improvement. We will review the new capabilities introduced in this major version, in particular the integration of multi-threading into the VMC design, its impact on the Geant4 VMC and G4Root packages, and the introduction of a new package, MTRoot, providing utility functions for ROOT parallel output in independent files with necessary additions for thread-safety. Migration of user applications to multi-threading that preserves the ease of use of VMC will be also discussed. We will also report on the introduction of a new CMake based build system, the migration to ROOT major release 6 and the improvement of the testing suites.
Speaker: Ivana Hrivnacova (IPNO, Université Paris-Sud, CNRS/IN2P3)
• 17:15
The GeantV project: preparing the future of simulation 15m
Detector simulation is consuming at least half of the HEP computing cycles, and even so, experiments have to take hard decisions on what to simulate, as their needs greatly surpass the availability of computing resources. New experiments still in the design phase such as FCC, CLIC and ILC as well as upgraded versions of the existing LHC detectors will push further the simulation requirements. Since computing resources will not increase at best, it is therefore necessary to sustain the progress of High Energy Physics and to explore innovative ways of speeding up simulation. The GeantV project aims at developing a high performance detector simulation system integrating fast and full simulation that can be ported on different computing architectures, including accelerators. After more than two years of R&D the project has produced a prototype capable of transporting particles in complex geometries exploiting micro-parallelism, SIMD and multithreading. Portability is obtained via C++ template techniques that allow the development of machine-independent computational kernels. A set of tables derived from Geant4 for cross sections and final states provides a realistic shower development and, having been ported into a Geant4 physics list, is also a basis for a performance comparison. The talk will describe the development of the project and the main R&D results motivating the technical choices of the project. It will review the current results and the major challenges facing the project. We will conclude with an outline of the future roadmaps and major milestones for the project.
Speaker: Mr Federico Carminati (CERN)
• 17:30
CMS Full Simulation for Run-II 15m
This presentation will discuss new features of the CMS simulation for Run 2, where we have made considerable improvements during LHC shutdown to deal with the increased event complexity and rate for Run 2. For physics improvements migration from Geant4 9.4p03 to Geant4 10.0p02 has been performed. CPU performance was improved by introduction of the Russian roulette method inside CMS calorimeters, optimization of CMS simulation sub-libraries, and usage of statics build of the simulation executable. As a result of these efforts, CMS simulation was speeded up by about factor two. In this work we provide description of these updates and discuss different software components of CMS simulation. Geant4 version 10.0 is multi-threaded capable. This allows development of multi-threaded version of CMS simulation in parallel with mainstream sequential production version. For CMS multi-threaded Geant4 additional modules and manager classes were added. Geometry, magnetic field, physics, user actions, and sensitive detectors classes are the same for both sequential and multi-threaded versions of CMS simulation. In this work we report on details of implementation of CMS multi-threaded simulation including CPU and memory performance.
Speaker: David Lange (Lawrence Livermore Nat. Laboratory (US))
• 17:45
CMS Detector Description for Run II and Beyond 15m
CMS Detector Description (DD) is an integral part of the CMSSW software multithreaded framework. CMS software has evolved to be more flexible and to take advantage of new techniques, but many of the original concepts remain and are in active use. In this presentation we will discuss the limitations of the Run I DD model and changes implemented for the restart of the LHC program in 2015. Responses to Run II challenges and transition to multithreaded environment are discussed. The DD is a common source of information for Simulation, Reconstruction, Analysis, and Visualisation, while allowing for different representations as well as specific information for each application. The DD model usage for CMS Magnetic field map description allows seamless access to variable field during the same run. Examples of the integration of DD in the GEANT4 simulation and in the reconstruction applications are provided.
Speaker: Gaelle Boudoul (Universite Claude Bernard-Lyon I (FR))
• 18:00
Continuous Readout Simulation with FairRoot on the Example of the PANDA Experiment 15m
Future particle physics experiments are searching more and more for rare decays which have similar signatures in the detector as the huge background. For those events usually simple selection criteria do not exist, which makes it impossible to implement a hardware-trigger based on a small subset of detector data. Therefore all the detector data is read out continuously and processed on-the-fly to achieve a data reduction suitable for permanent storage and detailed analysis. To cope with these requirements of a triggerless readout also the simulation software has to be adopted to add a continuous data production with pile-up effects and event overlapping in addition to the event wise simulation. This simulated data is of utmost importance to get a realistic detector simulation, to develop event-building algorithms and to determine the hardware requirements for the DAQ system of the experiments. The possibility to simulate a continuous data stream was integrated into the FairRoot simulation framework. This running mode is called time-based simulation and a lot of effort was taken that one can switch seamlessly between event-based and a time-based simulation mode. One experiment, which is using this new feature, is the PANDA experiment. It utilizes a quasi-continuous antiproton beam with a mean time between interactions of 50 ns. Because of the unbunched structure of the beam the interaction time follows a Poisson statistics with a high probability of events with short time distances. Depending on the time resolution of the sub-detectors this leads to an overlap of up to 20 events inside a sub-detector. This makes it an ideal test candidate for the time-based simulation. In this talk the way the time-based simulation was implemented into FairRoot will be presented and examples of time-based simulations done for the PANDA experiment will be shown.
Speaker: Dr Tobias Stockmanns (FZ Jülich GmbH)
• 18:15
Energy Reconstruction using Artificial Neural Networks and different analytic methods in a Highly Granularity Semi-Digital Hadronic Calorimeter. 15m
The Semi-Digital Hadronic CALorimeter(SDHCAL) using Glass Resistive Plate Chambers (GRPCs) is one of the two hadronic calorimeter options proposed by the ILD (International Large Detector) project for the future (ILC) International Linear Collider experiments. It is a sampling calorimeter with 48 layers. Each layer has a size of 1 m² and finely segmented into cells of 1 cm² ensuring a high granularity which is required for the application of the Particle Flow Algorithm (PFA) in order to improve the jet energy resolution which is the corner stone of ILC experiments. The electronic of SDHCAL provide 2-bit readout. It is equiped with power pulisng mode reducing the power consumption and thus heating related problems. The performance of the SDHCAL technological prototype was tested successfully in beam tests at CERN during 2012. The next beam test will take place at CERN in December 2014 with new improvements in hardware developments. Results of this test beam will be shown. One of the main points to be discussed concerns the energy reconstruction in SDHCAL. Based on Monte Carlo Simulation of the SDHCAL prototype with Geant4, we will show different analytic energy reconstruction methods. We will present the single particle energy resolution and the linearity of the detecor response to hadrons obtained with these methods. In particular, we will highlight a new approch based on the Artificial Neural Networks used in the energy reconstruction and giving promising results compared to the classical analytic methods. Results will be presented for both simulation and real data in the aim to compare them. In the same context, we will discuss the application of The Artifical Neural Network to purify the beam test data from contamination. Results of particle separation obtained with the Artificial Neural Network will be shown and compared to classical event selection methods.
Speaker: Sameh Mannai (Universite Catholique de Louvain (UCL) (BE))
• 16:30 18:30
Track 3 Session: #3 (Hardware and data archival) C209 (C209)

### C209

#### C209

Data store and access

Convener: Latchezar Betev (CERN)
• 16:30
Mean PB to Failure -- Initial results from a long-term study of disk storage patterns at the RACF 15m
The RACF (RHIC-ATLAS Computing Facility) has operated a large, multi-purpose dedicated computing facility since the mid-1990's, serving a worldwide, geographically diverse scientific community that is a major contributor to various HEPN projects. A central component of the RACF is the Linux-based worker node cluster that is used for both computing and data storage purposes. It currently has nearly 50,000 computing cores and over 23 PB of storage capacity distributed over 12,000+ (non-SSD) disk drives. The majority of the 12,000+ disk drives provides a cost-effective solution for dCache/xRootd-managed storage, and a key concern is the reliability of this solution over the lifetime of the hardware, particularly as the number of disk drives and the storage capacity of individual drives grow. We report initial results of a long-term study to measure lifetime PB read/written to disk drives in the worker node cluster. We discuss the historical disk drive mortality rate, disk drive manufacturers' published MPBTF (Mean PB to Failure) data and how they are correlated to our results. The results helps the RACF understand the productivity and reliability of its storage solutions and has implications for other highly-available storage systems (NFS, GPFS, CVMFS, etc) with large I/O requirements.
Speaker: Christopher Hollowell (Brookhaven National Laboratory)
• 16:45
Experiences and challenges running CERN's high-capacity tape archive 15m
CERN’s tape-based archive system has collected over 70 Petabytes of data during the first run of the LHC. The Long Shutdown is being used for migrating the complete 100 Petabytes data archive to higher-density tape media. During LHC Run 2, the archive will have to cope with yearly growth rates of up to 40-50 Petabytes. In this contribution, we will describe the scalable architecture for coping with the storage and long-term archival of such massive data amounts, as well as the procedures and tools developed for the proactive and efficient operation of the tape infrastructure. This will include also the features developed for automated problem detection, identification and notification. We will also review the challenges resulting and mechanisms devised for measuring and enhancing availability and reliability, as well as ensuring the long-term integrity and bit-level preservation of the complete data repository. Finally, we will present an outlook in terms of the future performance and capacity requirements growth and how they match the expected tape technology evolution.
Speaker: Eric Cano (CERN)
• 17:00
Archiving Scientific Data outside of the traditional High Energy Physics Domain, using the National Archive Facility at Fermilab 15m
Many experiments in the HEP and Astrophysics communities generate large extremely valuable datasets, which need to be efficiently cataloged and recorded to archival storage. These datasets, both new and legacy, are often structured in a manner that is not conducive to storage and cataloging with modern data handling systems and large file archive facilities. In this paper we discuss in detail how we have created a robust toolset and simple portal into the Fermilab Archive Facility, which allows for scientific data to be quickly imported, organized and retrieved from the 0.650 Exabyte facility. In particular we discuss how the data from the Sudbury Neutrino Observatory (SNO) for the COUPP dark matter detector was aggregated, cataloged, archived and re-organized to permit it to be retrieved and analyzed using modern distributed computing resources both at Fermilab and on the Open Science Grid. We pay particular attention to the methods that were employed to “uniquify” the namespaces for the data, derive metadata for the over 460,000 image series taken by the COUP experiment and what was required to map that information into coherent datasets that could be stored and retrieved using the large scale archives systems. We describe the data transfer and cataloging engines that are used for data importation and how these engines have been setup to import data from the data acquisition systems of ongoing experiments at non-Fermilab remote sites including the Laboratori Nazionali del Gran Sasso and the Ash River Laboratory in Orr, Minnesota. We also describe how large University computing sites around the world are using the system to store and retrieve large volumes of simulation and experiment data for physics analysis.
Speaker: Dr Andrew Norman (Fermilab)
• 17:15
Data preservation for the HERA Experiments @ DESY using dCache technology 15m
Speaker: Karsten Schwank (DESY)
• 17:30
Deep Storage for Big Scientific Data 15m
Brookhaven National Lab (BNL)’s RHIC and Atlas Computing Facility (RACF), is supporting science experiments such as RHIC as its Tier-0 center and the U.S. ATLAS/LHC as a Tier-1 center. Scientific data is still growing exponentially after each upgrade. The RACF currently manages over 50 petabytes of data on robotic tape libraries, and we expect a 50% increase in data next year. Not only do we have to address the issue of efficiently archiving high bandwidth data to our tapes, but we also have to face the problem of randomly restoring files from tapes. In addition, we have to manage tape resource usage and technology migration, which is moving data from low-capacity media to newer, high-capacity tape media, in order to free space within a tape library. BNL’s mass storage system is managed by a software called IBM HPSS. To restore files from HPSS, we have developed a file retrieval scheduling software, called TSX. TSX provides dynamic HPSS resource management, schedules jobs efficiently, and enhances visibility of real-time staging activities and advanced error handling to maximize the tape staging performance.
Speaker: David Yu (BNL)
• 17:45
Disk storage at CERN 15m
CERN IT DSS operates the main storage resources for data taking and physics analysis mainly via three system: AFS, CASTOR and EOS. The total usable space available for users is about 100 PB (with relative ratios 1:20:120). EOS deploys disk resources across the two CERN computer centres (Meyrin and Wigner) with a current ratio 60% to 40%. IT DSS is also providing sizable on-demand resources for general IT services most notably OpenStack and NFS clients. This is provided by our Ceph infrastructure and a few of proprietary servers (NetApp) for a total capacity of ~1 PB. We will describe our operational experience and recent changes to these systems with special emphasis to the following items: - Present usages for LHC data taking (new roles of CASTOR and EOS) - Convergence to commodity hardware (nodes with 200-TB each with optional SSD) shared across all services - Detailed study of the failure modes in the different services and approaches (RAID, RAIN, ZFS vs XFS, etc...) - Disaster recovery strategies (across the two CERN computer centres) - Experience in coupling commodity and home-grown solution (e.g. Ceph disk pools for AFS, CASTOR and NFS) - Future evolution of these systems in the WLCG realm and beyond
Speaker: Luca Mascetti (CERN)
• 18:00
Archiving tools for EOS 15m
Archiving data to tape is a critical operation for any storage system, especially for the EOS system at CERN which holds production data from all major LHC experiments. Each collaboration has an allocated quota it can use at any given time therefore, a mechanism for archiving "stale" data is needed so that storage space is reclaimed for online analysis operations. The archiving tool that we propose for EOS aims to provide a robust interface for moving data between EOS and the tape storage system while enforcing data integrity and verification. The archiving infrastructure is written in Python and is fully based on the XRootD framework. All data transfers are done using the third-party copy mechanism which ensures point-to-point communication between the source and destination, thus providing maximum aggregate throughput. Using ZMQ message-passing paradigm and a process-based approach enabled us to archive optimal utilisation of the resources and a stateless architecture which can easily be tuned during operation. In conclusion, we make a comparative analysis between archiving a data set in a "managed" way using the archiving tool and the old plain copy method, highlighting the speed-up gain, the data integrity checks performed and the behaviour of the system in different failure scenarios. We expect this tool to considerably improve the data movement work-flow at CERN in both directions between disk and tape.
Speaker: Mr Andreas Joachim Peters (CERN)
• 18:15
Disk storage management for LHCb based on Data Popularity estimator 15m
The amount of data produced by the LHCb experiment every year consists of several petabytes. This data is kept on disk and tape storage systems. Disks are much faster than tapes, but are way more expensive and hence disk space is limited. It is impossible to fit the whole data taken during the experiment's lifetime on disk, but fortunately fast access to datasets are no longer needed after the analysis requiring them are over. So it is highly important to identify which datasets should be kept on disk and which ones should be kept as archives on tape. The metrics to be used for deprecating datasets’ caching is based on the “popularity” of this dataset, i.e. whether it is likely to be used in the future or not. We discuss here the approach and the studies carried out for optimizing such a Data Popularity estimator. Input information to the estimator are the dataset usage history and metadata (size, type, configuration etc). The system is designed to select the datasets which may be used in the future and thus should remain on disk. Studies have therefore been performed on how to optimize the usage of dataset information from the past for predicting its future popularity. In particular, we have carried out a detailed comparison of various time series analysis, machine learning classifier, clustering and regression algorithms. We demonstrate that our approach is capable of improving significantly the disk usage efficiency.
Speaker: Mikhail Hushchyn (Moscow Institute of Physics and Technology, Moscow)
• 16:30 18:30
Track 4 Session: #4 (Application) B250 (B250)

### B250

#### B250

Middleware, software development and tools, experiment frameworks, tools for distributed computing

Convener: Oliver Gutsche (Fermi National Accelerator Lab. (US))
• 16:30
The Careful Puppet Master: Reducing risk and fortifying acceptance testing with Jenkins CI 15m
Using centralized configuration management, including automation tools such as Puppet, can greatly increase provisioning speed and efficiency when configuring new systems or making changes to existing systems, reduce duplication of work, and improve automated processes. However, centralized management also brings with it a level of inherent risk: a single change in just one file can quickly be pushed out to thousands of computers and, if that change is not properly and thoroughly tested and contains an error, could result in catastrophic damage to many services, potentially bringing an entire computer facility offline. Change management procedures can -- and should -- be formalized in order to prevent such accidents. However, like the configuration management process itself, if such procedures are not automated, they can be difficult to enforce strictly. Therefore, to reduce the risk of merging potentially harmful changes into our production Puppet environment, we have created an automated testing system, which includes the Jenkins CI tool, to manage our Puppet testing process. This system includes the proposed changes and runs Puppet on a pool of dozens of RHEV VMs that replicate most of our important production services. This paper describes our automated test system and how it hooks into our production approval process for automatic acceptance testing. All pending changes that have been pushed to production must pass this validation process before they can be approved and merged into production.
Speaker: Mr Jason Alexander Smith (Brookhaven National Laboratory)
• 16:45
The ATLAS Software Installation System v2: a highly available system to install and validate Grid and Cloud sites via Panda 15m
The ATLAS Installation System v2 is the evolution of the original system, used since 2003. The original tool has been completely re-designed in terms of database backend and components, adding support for submission to multiple backends, including the original WMS and the new Panda modules. The database engine has been changed from plain MySQL to Galera/Percona and the table structure has been optimized to allow a full High-Availability (HA) solution over WAN. The servlets, running on each frontend, have been also decoupled from local settings, to allow an easy scalability of the system, including the possibility of an HA system with multiple sites. The clients can also be run in multiple copies and in different geographical locations, and take care of sending the installation and validation jobs to the target Grid or Cloud sites. Moreover, the Installation DB is used as source of parameters by the automatic agents running in CVMFS, in order to install the software and distribute it to the sites. The system is in production for ATLAS since 2013, having as main sites in HA the INFN Roma T2 and CERN AI. The LJSFi2 engine is directly interfacing with Panda for the Job Management, AGIS for the site parameter configurations, and CVMFS for both core components and the installation of the software itself. LJSFi2 is also able to use other plugins, and is essentially VO-agnostic, so can be directly used and extended to cope with the requirements of any Grid or Cloud enabled VO. In this work we'll present the architecture, performance, status and possible evolutions to the system for the LHC Run2 and beyond.
Speaker: Alessandro De Salvo (Universita e INFN, Roma I (IT))
• 17:00
A Validation System for the Complex Event Processing Directives of the ATLAS Shifter Assistant Tool 15m
Complex Event Processing (CEP) is a methodology that combines data from different sources in order to identify events or patterns that need particular attention. It has gained a lot of momentum in the computing world in the past few years and is used in ATLAS to continuously monitor the behaviour of the data acquisition system, to trigger corrective actions and to guide the experiment’s operators. This technology is very powerful, if experts regularly insert and update their knowledge about the system’s behaviour into the CEP engine. Nevertheless, writing or modifying CEP directives is not trivial since the used programming paradigm is quite different with respect to what developers are normally familiar with. In order to help experts verify that the directives work as expected, we have thus developed a complete testing and validation environment. This system consists of three main parts: the first is the persistent storage of all relevant data streams that are produced during data taking, the second is a playback tool that allows to re-inject data of specific data taking sessions from the past into the CEP engine and the third is a reporting tool that shows the output that the directives loaded into the engine would have produced in the live system. In this paper we describe the design, implementation and performance of this validation system, highlight its strengths and short-comings and indicate how such a system could be re-used in similar projects.
Speaker: Dr Giuseppe Avolio (CERN)
• 17:15
Visualization of dCache accounting information with state-of-the-art Data Analysis Tools. 15m
Over the previous years, storage providers in scientific infrastructures were facing a significant change in the usage profile of their resources. While in the past, a small number of experiment frameworks were accessing those resources in a coherent manner, now, a large amount of small groups or even individuals request access in a completely chaotic way. Moreover, scientific laboratories have been recently forced to provide detailed accounting information for their communities and individuals. Another consequence of the chaotic access profiles is the difficulty, for often rather small operating teams, to detect malfunctions in extremely complex storage systems, composed of a large variety of different hardware components. Although information about usage and possible malfunction is available in the corresponding log and billing files, the sheer amount of collected meta data makes it extremely difficult to be handled or interpreted. Simply the dCache production instances at DESY are producing Gigabytes of meta data per day. To cope with those pressing issues, DESY has been evaluating and put into production a Big Data processing tool, enabling our operation team to analyze log and billing information by providing a configurable and easy to interpret visualization of that data. This presentation will demonstrate how DESY built a real-time monitoring system, visualizing dCache billing files and providing an intuitive and simple to operate Web interface, using ElasticSearch, Logstash and Kibana.
Speaker: Mr Tigran Mkrtchyan (DESY)
• 17:30
Event-Driven Messaging for Offline Data Quality Monitoring at ATLAS 15m
During LHC Run 1, the information flow through the offline data quality monitoring in ATLAS relied heavily on chains of processes polling each other's outputs for handshaking purposes.  This resulted in a fragile architecture with many possible points of failure and an inability to monitor the overall state of the distributed system.  We report on the status of a project undertaken during the long shutdown to replace the ad hoc synchronization methods with a uniform message queue system.  This enables the use of standard protocols to connect processes on multiple hosts; reliable transmission of messages between possibly unreliable programs; easy monitoring of the information flow; and the removal of inefficient polling-based communication.
Speaker: Peter Onyisi (University of Texas (US))
• 17:45
An object-oriented approach to generating highly configurable Web interfaces for the ATLAS experiment 15m
In order to manage a heterogeneous and worldwide collaboration, the ATLAS experiment developed web systems that range from supporting the process of publishing scientific papers to monitoring equipment radiation levels. These systems are vastly supported by Glance, a technology that was set forward in 2004 to create an abstraction layer on top of different databases; it automatically recognizes their modelling and generates web search interfaces. Fence (Front ENd ENgine for glaNCE) assembles classes to build applications by making extensive use of configuration files. It produces templates of the core JSON files on top of which it is possible to create Glance-compliant search interfaces. Once the database, its schemas and tables are defined using Glance, its records can be incorporated into the templates by escaping the returned values with a reference to the column identifier wrapped around double enclosing brackets. The developer may also expand on available configuration files to create HTML forms and securely interact with the database. A token is issued within each deployed form as a random string of characters which must then be matched whenever it is posted. Additionally, once the user is authenticated through CERN's Shibboleth single sign-on, Fence assigns them roles and permissions as stored in the database. Clearance attributes can then be bound to individual inputs within their own JSON description so that whenever they are submitted, the resulting system verifies whether the user has the necessary permissions to edit them. Input validation is primarily carried out on the server side with PHP but, following progressive enhancement guidelines, verification routines may be additionally entrusted to the client side by enabling specific HTML5 data attributes which are then handed over to the jQuery validation plugin. User monitoring is accomplished by logging URL requests along with any POST data. The documentation is automatically published from the source code using the Doxygen tool and made accessible in a web interface. Fence, therefore, speeds up the implementation of Web software products while minimizing maintenance overhead and facilitating the comprehension of embedded rules and requirements.
Speakers: Bruno Lange Ramos (Univ. Federal do Rio de Janeiro (BR)), Bruno Lange Ramos (Univ. Federal do Rio de Janeiro (BR))
• 18:00
Accelerating Debugging In A Highly Distributed Environment 15m
As more experiments move to a federated model of data access the environment becomes highly distributed and decentralized. In many cases this may pose obstacles in quickly resolving site issues; especially given vast time-zone differences. Spurred by ATLAS needs, Release 4 of XRootD incorporates a special mode of access to provide remote debugging capabilities. Essentially, XRootD allows a site to grant secure access to specific individuals to view certain XRootD information (e.g. log files, configuration files, etc). In a virtual view all of the information is laid out in a site independent way regardless of how the site configured its system. This allows experts at other locations to assist in resolving issues and alleviates time-zone vagaries. The view is available through XRootd or, optionally, HTTP. This talk provides the motivation for developing the remote debugging facility, why it is essential in highly distributed environments, and what can actually be done with it.
Speaker: Andrew Hanushevsky (STANFORD LINEAR ACCELERATOR CENTER)
• 16:30 18:30
Track 7 Session: #3 (user for experiments) C210 (C210)

### C210

#### C210

Clouds and virtualization

Convener: Claudio Grandi (INFN - Bologna)
• 16:30
HEP cloud production using the CloudScheduler/HTCondor Architecture 15m
The use of distributed IaaS clouds with the CloudScheduler/HTCondor architecture has been in production for HEP and astronomy applications for a number of years. The design has proven to be robust and reliable for batch production using HEP clouds, academic non-HEP (opportunistic) clouds and commercial clouds. Further, the system is seamlessly integrated into the existing WLCG infrastructures for the both ATLAS and BelleII experiments. Peak workloads continue to increase and we have utilized over 3000 cores for HEP applications. We show that the CloudScheduler/HTCondor architecture has adapted well to the evolving cloud ecosystem. Its design preceded the introduction of OpenStack clouds, however, the integration of these clouds has been straightforward. Key developments over the past year include the use of CVMFS for application software distribution. CVMFS together with the global array of Squid web caches and the use of the Squid-discovery service, Shoal, have significantly simplified the management and distribution of virtual machine images. The recent deployment of micro-CernVM images have reduced the size of the images to a few megabytes and make the images independent of the application software. The introduction of Glint, a distributed VM image management service, has made the distribution of images over multiple OpenStack clouds an automated operation. These new services have greatly simplified management of the application software and operating system, and enable us to respond quickly to urgent security issues such as the ShellShock (bash shell) vulnerability. HEP experiments are beginning to use multi-core applications and this has resulted in a demand for simultaneously running single-core, multi-core or high-memory jobs. We exploit HTCondor’s feature that dynamically partitions the slots within a VM instance. As a result, a new VM instance can run either single or multi-core jobs depending on the demand. The distributed cloud system has been used primarily for low I/O production jobs, however, the goal is run higher I/O production and analysis application jobs. This will require the use a data federation and we are integrating the CERN UGR data federator for managing data distributed over multiple storage elements. The use of distributed clouds with the CloudScheduler/HTCondor architecture continues to improve and expand in use and functionality. We review the overall progress, and highlight the new features and future challenges.
Speaker: Ian Gable (University of Victoria (CA))
• 16:45
The diverse use of clouds by CMS 15m
The resources CMS is using are increasingly being offered as clouds. In Run 2 of the LHC the majority of CMS CERN resources, both in Meyrin and at the Wigner Computing Centre, will be presented as cloud resources on which CMS will have to build its own infrastructure. This infrastructure will need to run all of the CMS workflows including: Tier 0, production and user analysis. In addition, the CMS High Level Trigger will provide a compute resource comparable in scale to the total offered by the CMS Tier 1 sites, when it is not running as part of the trigger system. During these periods a cloud infrastructure will be overlaid on this resource, making it accessible for general CMS use. Finally, CMS is starting to utilise cloud resources being offered by individual institutes and is gaining experience to facilitate the use of opportunistically available cloud resources.
Speaker: Dr David Colling (Imperial College Sci., Tech. & Med. (GB))
• 17:00
Evolution of Cloud Computing in ATLAS 15m
The ATLAS experiment has successfully incorporated cloud computing technology and cloud resources into its primarily grid-based model of distributed computing. Cloud R&D activities continue to mature and transition into stable production systems, while ongoing evolutionary changes are still needed to adapt and refine the approaches used, in response to changes in prevailing cloud technology. In addition, completely new developments are needed to handle emerging requirements. This work will describe the overall evolution of cloud computing in ATLAS. The current status of the VM management systems used for harnessing IAAS resources will be discussed. Monitoring and accounting systems tailored for clouds are needed to complete the integration of cloud resources within ATLAS' distributed computing framework. We are developing and deploying new solutions to address the challenge of operation in a geographically distributed multi-cloud scenario, including a system for managing VM images across multiple clouds, a system for dynamic location-based discovery of caching proxy servers, and the usage of a data federation to unify the worldwide grid of storage elements into a single namespace and access point. The usage of the experiment's HLT farm for Monte Carlo production, in a specialized cloud environment, will be presented. Finally, we evaluate commercial clouds, and have conducted a study assessing cost vs. benchmark performance.
Speaker: Ryan Taylor (University of Victoria (CA))
• 17:15
Utilizing cloud computing resources for BelleII 15m
The BelleII experiment is developing a global computing system for the simulation of MC data prior its collecting real collision data in the next few years. The system utilizes the grid middleware used in the WLCG and uses the DIRAC workload manager. We describe how IaaS cloud resources are being integrated into the BelleII production computing system in Australia and Canada. The IaaS resources include HEP as well as opportunistic and commercial clouds. In Canada, the cloud resources are managed by a DIRAC installation at the University of Victoria, which acts as a slave to the DIRAC instance at KEK. A workload management service running on the DIRAC server at the University of Victoria submits pilot jobs to an HTCondor queue dedicated to the distributed cloud system. The CloudScheduler VM provisioning service boots the VMs based on the HTCondor queue. The distributed cloud uses resources in Europe and North America including Amazon EC2. Australia provides it's contribution to the Belle II Distributed Computing solution via CPU resources supplied by the NeCTAR Open Stack cloud. The Australian solution employs a Dynamic Torque Batch system with cloud-based worker nodes as a backend to an EGI CREAM-CE. The DIRAC interware sees this a conventional grid cluster with no further configuration or tuning required. The worker nodes employ an SL6 operating system configured via puppet and with the Belle II application software provided via CVMFS. At the time of writing, the Australian system efficiently supplies 350 worker nodes to the Belle II Distributed Computing solution. All the clouds have been successfully used in BelleII MC production campaigns, producing a substantial fraction of the simulated data samples.
Speaker: Dr Randy Sobie (University of Victoria (CA))
• 17:30
LHCb experience with running jobs in virtual machines 15m
The LHCb experiment has been running production jobs in virtual machines since 2013 as part of its DIRAC-based infrastructure. We describe the architecture of these virtual machines and the steps taken to replicate the WLCG worker node environment expected by user and production jobs. This relies on the CernVM 3 system for providing root images for virtual machines. We use the cvmfs distributed filesystem to supply the root partition files, the LHCb software stack, and the bootstrapping scripts necessary to configure the virtual machines for us. Using this approach, we have been able to minimise the amount of contextualisation which must be provided by the virtual machine managers. We explain the process by which the virtual machine is able to receive payload jobs submitted to DIRAC by users and production managers, and how this differs from payloads executed within conventional DIRAC pilot jobs on batch queue based sites. We compare our operational experiences of running production on VM based sites managed using OpenStack, Vac, BOINC, and Condor. Finally we describe our requirements for monitoring which are specific to the additional responsibilities for experiments when operating virtual machines which were previously undertaken by the system managers of worker nodes, and how this is facilitated by the new DIRAC Pilot 2.0 architecture.
Speaker: Andrew McNab (University of Manchester (GB))
• 17:45
BESIII physical offline data analysis on virtualization platform 15m
Mass data processing and analysis contribute much to the development and discoveries of a new generation of High Energy Physics. The BESIII experiment of IHEP(Institute of High Energy Physics, Beijing, China) studies particles in the tau-charm energy region ranges from 2 GeV to 4.6 GeV, and requires massive storage and computing resources, which is a typical kind of data intensive application. With the rapid growth of experimental data, the data processing system encounters many problems, such as low resource utilization, complex migration and so on, which makes it urgent to transplant the data analysis system to a virtualization platform. However, offline software design, resource allocation and job scheduling of BESIII experiment are all based on physical machine. To solve those problems, we bring the virtualization technology of Openstack and KVM to BESIII computing system. In this contribution we present an ongoing work which aims to make BESIII physical analysis work on virtualized resources to achieve higher resource utilization, dynamic resource management and higher job operating efficiency. Particularly, we discuss the architecture of BESIII offline software and the way to optimize the offline software to reduce the performance loss in virtualized environment by creating event index(event metadata) and do event pre-selection based on index, which significantly reduces the IO throughput and event numbers that need to do analysis, and then greatly improves the job processing efficiency. We also report the optimization of KVM from various factors in hardware and kernel including EPT (Extended Page Tables) and CPU affinity. Experimental results show the CPU performance penalty of KVM can be decreased to about 3%. This work is validated through real use cases of production BESIII jobs by working on physical slots and virtualized slots. In addition, the performance comparison between KVM and physical machines in aspect of CPU, disk IO and network IO is also presented. Finally, we describe our development work of adaptive cloud scheduler, which allocates and reclaims VMs dynamically according to the status of TORQUE queue and the size of resource pool to improve resource utilization and job processing efficiency.
Speaker: Ms Bowen Kan (Institute of High Physics Chinese Academy of Sciences)
• 18:00
SkyGrid - where cloud meets grid computing 15m
Computational grid (or simply 'grid') infrastructures are powerful but restricted by several aspects: grids are incapable of running user jobs compiled with a non-authentic set of libraries and it is difficult to restructure grids to adapt to peak loads. At the same time if grids are not loaded with user-tasks, owners still have to pay for electricity and hardware maintenance. So a grid is not cheap and small/medium scientific experiments have difficulties working with such a computational model. To address these inflexibility issues we present SkyGrid - a system that integrates cloud technologies into grid systems. By cloud technologies we mean mainly virtualization that allows both virtualization of user jobs and computational hardware. Virtualization of user jobs is performed by means of Docker - a lightweight Virtual Machine system. Virtualization of hardware is provided by YARN or similar platforms. SkyGrid unties users from the computational cluster architecture and configuration. Also the virtualization approach we propose enables some extreme cases like volunteer computing inside browsers by means of PNaCl technology and running jobs on super-computers. In this paper we present the requirements and architecture of SkyGrid, interfaces available to end-users and interfaces to other systems. Also we provide a description of a real case study of system usage for the SHiP experiment integrating private cloud resources from several universities as well as volunteer computing resources into a single computational infrastructure.
Speaker: Alexander Baranov (ITEP Institute for Theoretical and Experimental Physics (RU))
• 16:30 18:30
Track 8 Session: #2 (Vectorization, NUMA and distribution) B503 (B503)

### B503

#### B503

Performance increase and optimization exploiting hardware features

Convener: Danilo Piparo (CERN)
• 16:30
Performance benchmark of LHCb code on state-of-the-art x86 architectures 15m
For Run 2 of the LHC, LHCb is exchanging a significant part of its event filter farm with new compute nodes. For the evaluation of the best performing solution, we have developed a method to convert our high level trigger application into a stand-alone, bootable benchmark image. With additional instrumentation we turned it into a self-optimising benchmark which explores techniques such as late forking, NUMA balancing and optimal number of threads, i.e. it automatically optimises box-level performance. We have run this procedure on a wide range of Haswell-E CPUs and numerous other architectures from both Intel and AMD, including also the latest Intel micro-blade servers. We present results in terms of performance, power consumption, overheads and relative cost.
Speaker: Rainer Schwemmer (CERN)
• 16:45
SIMD studies in the LHCb reconstruction software 15m
During the data taking process in the LHC at CERN, millions of collisions are recorded every second by the LHCb Detector. The LHCb "Online" computing farm, counting around 15000 cores, is dedicated to the recontruction of the events in real-time, in order to filter those with interesting Physics. The ones kept are later analysed "Offline" in a more precise fashion on the Grid. This imposes very stringent requirements on the Reconstruction Software, which has to be as efficient as possible. Modern CPUs support so-called "vector-extensions", which extend their Instruction Sets, allowing for concurrent execution across functional units. Several libraries expose the Single Instruction Multiple Data programming paradigm to issue these instructions. The use of vectorisation in our codebase can provide performance boosts, leading ultimately to Physics reconstruction enhancements. In this paper, we present vectorisation studies of significant reconstruction algorithms. A variety of vectorisation libraries are analysed and compared in terms of design, maintainability and performance. We also present the steps taken to systematically measure the performance of the released software, to ensure the consistency of the run-time of the vectorized software.
Speaker: Daniel Hugo Campora Perez (CERN)
• 17:00
A new Self-Adaptive disPatching System for local cluster 15m
Scheduler is one of the most important components of high performance cluster. This paper introduces a self-adaptive dispatching system (SAPS) based on torque/maui which increases the resources utilization of cluster effectively and guarantees the high reliability of the computing platform. It provides great convenience for users to run various tasks on the computing platform. First of all, the SAPS implements the GPU scheduling with multi-core. This provides the basis for effective integration and utilization of computing resources, improves the ability of the cluster computing greatly. Secondly, SAPS analysis the relationship between the number of jobs queueing and the idle resources left, tune the priority of users’ job dynamically. In this way, more resources are provided for jobs running and less resources idle. Thirdly, integrated the on-line error detection with work nodes, the SAPS can excluded error nodes and include the recovered nodes automatically. In addition, SAPS provides a monitoring management with fine granularity, a comprehensive scheduling accounting module and a scheduling real-time alarm function, and all of those ensure the cluster runs more high-efficiently, and reliably. Currently, the SAPS has been running stable on IHEP local cluster (more than 10,000 cores and 30,000 jobs every day) and resource utilization has been improved more than 26%, and the SAPS has reduced costs for both administrator and users greatly.
Speaker: Ms Bowen Kan (Institute of High Physics Chinese Academy of Sciences)
• 17:15
Future Computing Platforms for Science in a Power Constrained Era 15m
Power consumption will be a key constraint on the future growth of Distributed High Throughput Computing (DHTC) as used by High Energy Physics (HEP). This makes performance-per-watt a crucial metric for selecting cost-efficient computing solutions. For this paper, we have done a wide survey of current and emerging architectures becoming available on the market including x86-64 variants, ARMv7 32-bit, ARMv8 64-bit, Many-Core and GPU solutions, as well as newer System-on-Chip (SoC) solutions. We compare performance and energy efficiency using an evolving set of standardized HEP-related benchmarks and power measurement techniques we have been developing. We evaluate the potential for use of such computing solutions in the context of DHTC systems, such as the Worldwide LHC Computing Grid (WLCG).
Speaker: Mr Giulio Eulisse (Fermi National Accelerator Lab. (US))
• 17:30
Large-Scale Merging of Histograms using Distributed In-Memory Computing 15m
Most high-energy physics analysis jobs are embarrassingly parallel except for the final merging of the output objects, which are typically histograms. Currently, the merging of output histograms scales badly. The running time for distributed merging depends not only on the overall number of bins but also on the number partial histogram output files. That means, while the time to analyze data decreases linearly with the number of worker nodes, the time to merge the histograms in fact increases with the number of worker nodes. On the grid, merging jobs that take a few hours are not unusual. In order to improve the situation, we present a distributed and decentral merging algorithm whose running time is independent of the number of worker nodes. We exploit full bisection bandwidth of local networks and we keep all intermediate results in memory. We present benchmarks from an implementation using the parallel ROOT facility (PROOF) and RAMCloud, a distributed key-value store that keeps all data in DRAM. Our results show that a real-world collection of ten thousand histograms with overall ten million non-zero bins can be merged in less than one minute.
Speaker: Jakob Blomer (CERN)
• 17:45
High performance data analysis via coordinated caches 15m
With the second run period of the LHC, high energy physics collaborations will have to face increasing computing infrastructural needs. Opportunistic resources are expected to absorb many computationally expensive tasks, such as Monte Carlo event simulation. This leaves dedicated HEP infrastructure with an increased load of analysis tasks that in turn will need to process an increased volume of data. In addition to storage capacities, a key factor for future computing infrastructure is therefore input bandwidth available per core. Modern data analysis infrastructure relies on one of two paradigms: data is kept on dedicated storage and accessed via network or distributed over all compute nodes and accessed locally. Dedicated storage allows data volume to grow independently of processing capacities, whereas local access allows processing capacities to scale linearly. However, with the growing data volume and processing requirements, HEP will require both of these features. For enabling adequate user analyses in the future, the KIT CMS group is merging both paradigms: High-throughput data is spread over a local disk layer on compute nodes, while any data is available from an arbitrarily sized background storage. This concept is implemented as a pool of distributed caches, which are loosely coordinated by a central service. A Tier 3 prototype cluster is currently being set up for performant user analyses of both local and remote data. The contribution will discuss the current topology of computing resources available for HEP user analyses. Based on this, an overview on the KIT CMS analysis cluster design and implementation is presented. Finally, operational experience in terms of performance and reliability is presented.
Speaker: Max Fischer (KIT - Karlsruhe Institute of Technology (DE))
• 18:00
Evaluating the power efficiency and performance of multi-core platforms using HEP workloads 15m
As Moore's Law drives the silicon industry towards higher transistor counts, processor designs are becoming more and more complex. The area of development includes core count, execution ports, vector units, uncore architecture and finally instruction sets. This increasing complexity leads us to a place where access to the shared memory is the major limiting factor, making feeding the cores with data a real challenge. On the other hand, the significant focus on power efficiency paves the way for power-aware computing and less complex architectures to data centers. In this paper we try to examine these trends and present results of our experiments with "Haswell-EP" processor family and highly scalable HEP workloads.
Speaker: Mr Pawel Szostek (CERN)
• 18:15
The Effect of NUMA Tunings on CPU Performance 15m
Non-uniform memory access (NUMA) is a memory architecture for symmetric multiprocessing (SMP) systems where each processor is directly connected to separate memory. Indirect access to other CPU's (remote) RAM is still possible, but such requests are slower as they must also pass through that memory's controlling CPU. In concert with a NUMA-aware operating system, the NUMA hardware architecture can help eliminate the memory performance reductions generally seen in SMP systems when multiple processors simultaneously attempt to access memory. The x86 CPU architecture has supported NUMA for a number of years. Modern operating systems such as Linux support NUMA-aware scheduling, where the OS attempts to schedule a process to the CPU directly attached to the majority of its RAM. In Linux, it is possible to further manually tune the NUMA subsystem using the "numactl" utility. With the release of Red Hat Enterprise Linux (RHEL) 6.3, the "numad" daemon became available in this distribution. This daemon monitors a system's NUMA topology and utilization, and automatically makes adjustments to optimize locality. As the number of cores in x86 servers continues to grow, efficient NUMA mappings of processes to CPUs/memory will become increasingly important. This presentation gives a brief overview of NUMA, and discusses the effects of manual tunings and numad on the performance of the HEPSPEC06 benchmark.
Speaker: Christopher Hollowell (Brookhaven National Laboratory)
• Wednesday, 15 April
• 09:00 10:30
Plenary: Session #5 Auditorium

#### Auditorium

Convener: Jeff Templon (NIKHEF (NL))
• 09:00
Diversity in Computing Technologies: Grid, Cloud, HPC ... and Strategies for Dynamic Resource Allocation 45m
Speaker: Oliver Gutsche (Fermi National Accelerator Lab. (US))
• 09:45
Distributed Data Management and Distributed File Systems 45m
Speaker: Dr Maria Girone (CERN)
• 10:30 11:00
Coffee Break 30m
• 11:00 13:00
Plenary: Session #6 Auditorium

#### Auditorium

Convener: Prof. Gang CHEN (INSTITUTE OF HIGH ENERGY PHYSICS)
• 11:00
Computing at FAIR 30m
Speaker: Thorsten Sven Kollegger (GSI - Helmholtzzentrum fur Schwerionenforschung GmbH (DE))
• 11:30
Expanding OpenStack community in academic fields 30m
Speaker: Mr Tom Fifield (OpenStack Foundation)
• 12:00
Computer Security for HEP 30m
Speaker: Sebastian Lopienski (CERN)
• 12:30
EMC Corporation 30m
• 13:00 13:45
Lunch Break 45m
• 13:45 14:15
Poster session B: #1 Tunnel Gallery

#### Tunnel Gallery

The posters in this session are displayed on April 15-16

• Thursday, 16 April
• 09:00 10:30
Track 1 Session: #4 (Online reconstruction and control systems) Village Center (Village Center)

### Village Center

#### Village Center

Online computing

Convener: Andrew Norman (Fermilab)
• 09:00
The CMS High Level Trigger 15m
The CMS experiment has been designed with a 2-level trigger system: the Level 1 Trigger, implemented on custom-designed electronics, and the High Level Trigger (HLT), a streamlined version of the CMS offline reconstruction software running on a computer farm. A software trigger system requires a tradeoff between the complexity of the algorithms running on the available computing power, the sustainable output rate, and the selection efficiency. Here we will present the performance of the main triggers used during the 2012 data taking, ranging from simpler single-object selections to more complex algorithms combining different objects, and applying analysis-level reconstruction and selection. We will discuss the optimisation of the triggers and the specific techniques developed to cope with the increasing LHC pile-up, reducing its impact on the physics performance.
Speaker: Dr Andrea Bocci (CERN)
• 09:15
Performance of the CMS High Level Trigger 15m
The CMS experiment has been designed with a 2-level trigger system. The first level is implemented using custom-designed electronics. The second level is the so-called High Level Trigger (HLT), a streamlined version of the CMS offline reconstruction software running on a computer farm. For Run II of the Large Hadron Collider, the increases in center-of-mass energy and luminosity will raise the event rate to a level challenging for the HLT algorithms. The increase in the number of interactions per bunch crossing: on average 25 in 2012, and expected to be around 40 in Run II will be an additional complication. We will present the performance of the main triggers used during the 2012 run and will also cover new approaches that have been developed since then to cope with the challenges of the new run. This includes improvements in HLT electron and photon reconstruction as well as better performing muon triggers. We will also present the performance of the improved tracking and vertexing algorithms, discussing their impact on the b-tagging performance as well as on the jet and missing energy reconstruction.
Speaker: Andrea Perrotta (Universita e INFN, Bologna (IT))
• 09:30
The LHCb Data Aquisition and High Level Trigger Processing Architecture 15m
The LHCb experiment at the LHC accelerator at CERN collects collisions of particle bunches at 40 MHz. After a first level of hardware trigger with output of 1 MHz, the physically interesting collisions are selected by running dedicated trigger algorithms in the High Level Trigger (HLT) computing farm. This farm consists of up to roughly 25000 CPU cores in roughly 1600 physical nodes each equipped with 2 TB of local storage space.
This work describes the LHCb online system with an emphasis on the developments implemented during during the current long shutdown (LS1). We will elaborate the architecture to treble the available CPU power of the HLT farm and the technicalities to determine and verify precise calibration and alignment constants which are fed to the HLT event selection procedure. Precise calibration and alignment constants are determined and verified in a separate data acquisition activity as soon as data from particle collisions are delivered by the LHC collider. We will describe how the constants are fed into a two stage HLT event selection facility using extensively the local disk buffering capabilities on the worker nodes. With the installed disk buffers, the installed CPU can be used during periods of up to ten days without beams. These periods in the past accounted to more than 70 % of the total time.
Speaker: Markus Frank (CERN)
• 09:45
LHCb topological trigger reoptimization 15m
The main b-physics trigger algorithm used by the LHCb experiment is the so-called topological trigger. The topological trigger selects vertices which are a) detached from the primary proton-proton collision and b) compatible with coming from the decay of a b-hadron. In the LHC Run 1, this trigger utilized a custom boosted decision tree algorithm, selected an almost 100% pure sample of b-hadrons with a typical efficiency of 60-70%, and its output was used in about 60% of LHCb papers. This talk presents studies carried out to optimize the topological trigger for LHC Run 2. In particular, we have carried out a detailed comparison of various machine learning classifier algorithms, e.g., AdaBoost, MatrixNet and neural networks. The topological trigger algorithm is designed to select all "interesting" decays of b-hadrons, but cannot be trained on every such decay. Studies have therefore been performed to determine how to optimize the performance of the classification algorithm on decays not used in the training. These include cascading, ensembling and blending techniques. Furthermore, novel boosting techniques have been implemented that will help reduce systematic uncertainties in Run 2 measurements. We demonstrate that the reoptimized topological trigger is expected to significantly improve on the Run 1 performance for a wide range of b-hadron decays.
Speaker: Tatiana Likhomanenko (National Research Centre Kurchatov Institute (RU))
• 10:00
The LHCb turbo stream 15m
The LHCb experiment will record an unprecedented dataset of beauty and charm hadron decays during Run II of the LHC, set to take place between 2015 and 2018. A key computing challenge is to store and process this data, which limits the maximum output rate of the LHCb trigger. So far, LHCb has written out a few kHz of events containing the full raw sub-detector data, which are passed through a full offline event reconstruction before being considered for physics analysis. Charm physics in particular is limited by trigger output rate constraints. A new streaming strategy includes the possibility to perform the physics analysis with candidates reconstructed in the trigger, thus bypassing the offline reconstruction. In the "turbo stream" the trigger will write out a compact summary of "physics" objects containing all information necessary for analyses, and this will allow an increased output rate and thus higher average efficiencies and smaller selection biases. This idea will be commissioned and developed during 2015 with a selection of physics analyses. It is anticipated that the turbo stream will be adopted by an increasing number of analyses during the remainder of LHC Run-II (2015-2018) and ultimately in Run-III (starting in 2020) with the upgraded LHCb detector.
Speaker: Sean Benson (CERN)
• 10:15
Online/Offline reconstruction of trigger-less readout in the R3B experiment at FAIR 15m
The R3B (Reactions with Rare Radioactive Beams) experiment is one of the planned experiments at the future FAIR facility at GSI Darmstadt. R3B will cover experimental reaction studies with exotic nuclei far off stability, thus enabling a broad physics programs with rare-isotope beams with emphasis on nuclear structure and dynamics. Several different detection subsystems as well as sophisticated DAQ system and data-analysis software are being developed for this purpose. The data analysis software for R3B is based on FairRoot framework and called R3BRoot. R3BRoot is being used for simulation and detector design studies for the last few years. Recently, it was successfully used directly with the data acquisition and for the analysis of the R3B test beam-time in April 2014. For the future beam times the framework has to deal with the free streaming readout of the detectors. The implementation within R3BRoot to fulfill this trigger-less run mode will be presented as well as the set of tools developed for the online reconstruction and quality assurance of the data during the run.
Speaker: Dmytro Kresan (GSI - Helmholtzzentrum fur Schwerionenforschung GmbH (DE))
• 09:00 10:30
Track 2 Session: #5 (Analysis) Auditorium (Auditorium)

### Auditorium

#### Auditorium

Offline software

Convener: Andrew Norman (Fermilab)
• 09:00
The ATLAS Higgs Machine Learning Challenge 15m
High Energy Physics has been using Machine Learning techniques (commonly known as Multivariate Analysis) since the 1990s with Artificial Neural Net and more recently with Boosted Decision Trees, Random Forest etc. Meanwhile, Machine Learning has become a full blown field of computer science. With the emergence of Big Data, data scientists are developing new Machine Learning algorithms to extract meaning from large heterogeneous data. HEP has exciting and difficult problems like the extraction of the Higgs boson signal, and at the same time data scientists have advanced algorithms: the goal of the HiggsML project was to bring the two together by a “challenge”: participants from all over the world and any scientific background could compete online to obtain the best Higgs to tau tau signal significance on a set of ATLAS fully simulated Monte Carlo signal and background. Instead of HEP physicists browsing through machine learning papers and trying to infer which new algorithms might be useful for HEP, then coding and tuning them, the challenge has brought realistic HEP data to the data scientists on the Kaggle platform, which is well known in the Machine Learning community. The challenge has been organized by the ATLAS collaboration associated to data scientists, in partnership with the Paris Saclay Center for Data Science, CERN and Google. The challenge ran from May to September 2014, drawing considerable attention. 1785 teams participated, making it the most popular challenge ever on the Kaggle platform. New Machine Learning techniques have been used by the participants with significantly better results than usual HEP tools. This presentation has two parts: the first one describes how a HEP problem was simplified (not too much!) and wrapped up into an online challenge, the second what was learned from the challenge, in terms of new Machine Learning algorithms and techniques which could have an impact on future HEP analysis.
Speaker: Glen Cowan (Royal Holloway, University of London)
• 09:15
How the Monte Carlo production of a wide variety of different samples if centrally handled in the LHCb experiment. 15m
In the LHCb experiment all massive processing of data is handled centrally. In the case of simulated data a wide variety of different types of Monte Carlo (MC) events has to be produced, as each physics’ analysis needs different sets of signal and background events. In order to cope with this large set of different types of MC events, of the order of several hundreds, a numerical event type identification code has been devised and is used throughout. A dedicated package contains all event type configurations files, automatically produced from this code, and is released independently from the simulation application. The deployment of the package on the distributed production system is handled centrally via the LHCb distribution tools and newly deployed event types are registered in the Bookkeeping catalogue. MC production requests are submitted via the LHCb production request system where a dedicated customization for MC data is in place. A specific request is made using predefined models centrally prepared to reproduce various data taking periods and selecting the event type from the Bookkeeping catalogue. After formal approval the requests are automatically forwarded to the LHCb Production team that carries them out. As the data are produced in remote sites they are automatically registered to the Bookkeeping catalogue where they can be found in folders specifying the event type, simulation conditions and processing chain. The various elements in the procedure, from writing a file for an event type to retrieving the sample produced, and the conventions established to allow their interplay will be described. The choices made have allowed to automate the MC production and for experts to concentrate on their specific tasks: while the configurations for each event type are prepared and validated by the physicists and simulation software experts, the MC samples are produced transparently on a world-wide distributed system the by LHCb production team.
Speaker: Gloria Corti (CERN)
• 09:30
The Library Event Matching classifier for $\nu_e$ events in NOvA 15m
In this paper we present the Library Event Matching (LEM) classification technique for particle identification. The LEM technique was developed for the NOvA electron neutrino appearance analysis as an alternative but complimentary approach to standard multivariate methods. Traditional multivariate PIDs are based on high-level reconstructed quantities which can obscure or discard important low-level detail in high granularity detectors. LEM, by contrast, uses the full hit by hit information of the event, comparing the hit charges and positions of each physics event to a large "template" library of simulated signal and background events. This is a powerful classification technique for the finely segmented NOvA detectors, but poses computational challenges due to the large Monte Carlo template libraries required for to reach the optimal physics sensitivity. We will present both the LEM classification technique as well as its technical implementation for the NOvA experiment exploiting memory mapping techniques on high memory Linux platforms.
Speaker: Dominick Rocco (urn:Google)
• 09:45
Interpolation between multi-dimensional histograms using a new non-linear moment morphing method 15m
In particle physics experiments data analyses generally use Monte Carlo (MC) simulation templates to interpret the observed data. These simulated samples may depend on one or multiple model parameters, such as a shifting mass parameter, and a set of such samples may be required to scan over the various parameter values. Since detailed detector MC simulation can be time-consuming, there is often a need to interpolate between the limited number of available MC simulation templates. Only several interpolation techniques exist for this. For example, the statistical tests widely used in particle physics, e.g. for the discovery of Higgs boson, rely critically on continuous and smooth parametric models that describe the physics processes in the data. We present a new template morphing technique, moment morphing, for the interpolation between multi-dimensional distribution templates based on one or multiple model parameters. Moment morphing is fast, numerically stable, and is not restricted in the number of input templates, the number of model parameters or the number of input observables. For the first time, statistical tests may include the impact of a non-factorizable response between different model parameters, where varying one model parameter at a time is insufficient to capture the full response function.
Speaker: Stefan Gadatsch (NIKHEF (NL))
• 10:00
CosmoSIS: a system for MC parameter estimation 15m
CosmoSIS [http://arxiv.org/abs/1409.3409] is a modular system for cosmological parameter estimation, based on Markov Chain Monte Carlo (MCMC) and related techniques. It provides a series of samplers, which drive the exploration of the parameter space, and a series of modules, which calculate the likelihood of the observed data for a given physical model, determined by the location of a sample in the parameter space. While CosmoSIS ships with a set of modules that calculate quantities of interest to cosmologists, there is nothing about the framework itself, nor in the MCMC technique, that is specific to cosmology. Thus CosmoSIS could be used for parameter estimation problems in other fields, including HEP. This presentation will describe the features of CosmoSIS and show an example of its use outside of cosmology. It will also discuss how collaborative development strategies differ between two different communities: that of HEP physicists, accustomed to working in large collaborations, and that of cosmologists, who have traditionally not worked in large groups. For example, because there is no collaboration to enforce a language choice, the framework supports programming in multiple languages. Additionally, since scientists in the cosmology community are used to working independently, a system was needed for helping ensure that proper attribution is given to authors of contributed algorithms.
Speaker: Alessandro Manzotti (The University of Chicago)
• 10:15
The Bayesian analysis toolkit: version 1.0 and beyond 15m

The Bayesian analysis toolkit (BAT)
is a C++ package centered around Markov-chain Monte Carlo sampling. It
is used in analyses of various particle-physics experiments such as
ATLAS and Gerda. The software has matured over the last few years to a
version 1.0. We will summarize the lessons learned and report on the
current developments of a complete redesign targeting multicore and
multiprocessor architectures and supporting many more sampling
algorithms both built-in and user-supplied.

Speaker: Dr Frederik Beaujean (LMU Munich)
• 09:00 10:30
Track 3 Session: #4 (Future use cases) C209 (C209)

### C209

#### C209

Data store and access

Convener: Shaun de Witt (STFC)
• 09:00
Architectures and methodologies for future deployment of multi-site Zettabyte-Exascale data handling platforms 15m
Several scientific fields, including Astrophysics, Astroparticle Physics, Cosmology, Nuclear and Particle Physics, and Research with Photons, are estimating that by the 2020 decade they will require data handling systems with data volumes approaching the Zettabyte distributed amongst as many as 1018 individually addressable data objects (Zettabyte-Exascale systems). It may be convenient or necessary to deploy such systems using multiple physical sites. This paper describes the findings of a working group composed of experts from several large European scientific data centres on architectures and methodologies that should be studied by building proof-of-concept systems, in order to prepare the way for building reliable and economic Zettabyte-Exascale systems. Key ideas emerging from the study are: the introduction of a global Storage Virtualization Layer which is logically separated from the individual storage sites; the need for maximal simplification and automation in the deployment of the physical sites; the need to present the user with an integrated view of their custom metadata and technical metadata (such as the last time an object was accessed, etc.); the need to apply modern efficient techniques to handle the large metadata volumes (e.g. Petabytes) that will be involved; and the challenges generated by the very large rate of technical metadata updates. It also addresses the challenges associated with the need to preserve scientific data for many decades. The paper is presented in the spirit of sharing the findings with both the user communities and data centre experts, in order to receive feedback and generate interest in starting prototyping work on the Zettabyte-Exascale challenges.
Speaker: Manuel Delfino Reznicek (Universitat Autònoma de Barcelona (ES))
• 09:15
Architecture of a new data taking and analysis infrastructure and services for the next generation detectors of Petra3 at DESY 15m
Data taking and analysis infrastructures in HEP have evolved during many years to a well known problem domain. In contrast to HEP, third generations synchrotron light sources, existing and upcoming free electron laser are confronted an explosion in data rates which is primarily driven by recent developments in 2D pixel array detectors. The next generation will produce data in the region upwards of 50 Gbytes per second. At synchrotrons, data was traditionally taken away by users following data taking using portable media. This will clearly not scale at all. We present first experiences of our new architecture and services underlying by results taken from the resumption of data taking in March 2015. Technology choices were undertaking over a period of twelve month. The work involved a close collaboration between central IT, beamline controls, and beamline support staff. In addition a cooperation was established between DESY IT and IBM to include industrial research and development experience and skills. In technological terms the next generation detectors exceeds current generations in order of magnitudes by data volume, -rate as well as complexity at an exponential growth. We are challenging unpredictable data access demands, computing platform and OS version integration (i.e. Windows), but still requiring acceptable bandwidth for DAQ rate at the final storage system. Our approach integrates HPC technologies for storage systems and protocols. In particular, our solution uses a single filesystem instance with a multiple protocol access, while operating within a single namespace - ubiquitous NFS & SMB access to same repository. We are targeting a system supporting distributed parity, decent erasure codes - to allow very fast rebuilds and a high availability level for the capacity disk based resources, multiple cluster, asynchronous file replication, high speed networking - Infiniband FDR & 10/40GE Ethernets, storage class memory (SSD & FlashDIMM) to run the high IOPS burst buffering layer within the architecture.
Speaker: Martin Gasthuber (Deutsches Elektronen-Synchrotron (DE))
• 09:30
dCache, evolution by tackling new challenges. 15m
With the great success of the dCache Storage Technology in the framework of the World Wide LHC Computing Grid, an increasing number of non HEP communities were attracted to use dCache for their data management infrastructure. As a natural consequence, the dCache team was presented with new use-cases that stimulated the development of interesting dCache features. Perhaps the most important group of new features is the enhanced media awareness. One aspect is the optimized migration of data between random access devices and tertiary storage, e.g. tape systems. Transparently for the user, dCache combines small files into containers before being copied to tape. Another aspect of this media awareness work is dCache's activity in making more efficient use of SSDs to boost high speed writing and chaotic reading: depending on access profile or protocol, data is placed on the most appropriate media types. A second hot topic, often requested by scientific communities, is Cloud Storage. By marrying the OwnCloud software with dCache, a unique hybrid system becomes available that provides both the simplicity of a sync-n-share service and dCache's many unique data management features. Beyond simple sync-n-share, users also demand control over the quality of service dCache offers. To support this, dCache is implementing the CDMI standard. With CDMI, dCache can present new functionality in a standard fashion; e.g. storing and querying metadata, triggering media migration or treating dCache as an object store. As the X509 certificate infrastructure proved unpopular outside of HEP sciences, dCache will support alternative methods of authenticating, like SAML and OpenID Connect. Such support allows sites running dCache to join identity federations as part of international collaborations. The dCache team will describe its ongoing activities and present its future development road map.
Speaker: Dr Patrick Fuhrmann (DESY)
• 09:45
dCache, Sync-and-Share for Big Data 15m
The availability of cheap, easy-to-use sync-and-share cloud services has split the scientific storage world into the traditional big data management systems and the very attractive sync-and-share services. With the former, the location of data is well understood while the latter is mostly operated in the Cloud, resulting in a rather complex legal situation. Beside legal issues, those two worlds have little overlap in user authentication and access protocols. While traditional storage technologies, popular in HEP, are based on X509, cloud services and sync-n-share software technologies are generally based on user/password authentication or mechanisms like SAML or Open ID Connect. Similarly, data access models offered by both are somewhat different, with sync-n-share services often using proprietary protocols. As both approaches are very attractive, dCache.org developed a hybrid system, providing the best of both worlds. To avoid reinvent the wheel, dCache.org decided to embed another Open Source project: OwnCloud. This offers the required modern access capabilities but does not support the managed data functionality needed for large capacity data storage. With this hybrid system, scientist can share files and synchronize their data with laptops or mobile devices as easy as with any other cloud storage service. On top of this, the same data can be accessed via established mechanisms, like GridFTP to serve the Globus Transfer Service or the WLCG FTS3 tool, or the data can be made available to worker nodes or HPC applications via a mounted filesystem. As dCache provides a flexible authentication module, the same user can access its storage via different authentication mechanisms; e.g., X.509 and SAML. Additionally, users can specify the desired quality of service or trigger media transitions as necessary, so tuning data access latency to the planned access profile. Such features are a natural consequence of using dCache. We will describe the design of the hybrid dCache/OwnCloud system, report on several months of operations experience running it at DESY, and elucidate on the future road-map.
Speaker: Dr Paul Millar (Deutsches Elektronen-Synchrotron (DE))
• 10:00
EOS as the present and future solution for data storage at CERN 15m
EOS is an open source distributed disk storage system in production since 2011 at CERN. Development focus has been on low-latency analysis use cases for LHC and non-LHC experiments and life-cycle management using JBOD hardware for multi PB storage installations. The EOS design implies a split of hot and cold storage and introduced a change of the traditional HSM functionality based workflows at CERN. The 2015 deployment brings storage at CERN to a new scale and foresees to breach 100 PB of disk storage in a distributed environment using tens of thousands of (heterogeneous) hard drives. EOS has brought to CERN major improvements compared to past storage solutions by allowing quick changes in the quality of services of storage pools. This allows the data centre to quickly meet the changing performance and reliability requirements of the LHC experiments with minimal data movements and dynamic reconfigurations. For example, the software stack has met the specific needs of the dual computing centre set-up required by CERN and allowed the fast design of new workflows accommodating the separation of long-term tape archive and disk storage required for the LHC Run II. The talk will give a high-level state of the art overview of EOS with respect to Run II, introduce new tools and use cases and set the new roadmap for the next storage solutions to come.
Speaker: Mr Andreas Joachim Peters (CERN)
• 10:15
Pooling the resources of the CMS Tier-1 sites 15m
The CMS experiment at the LHC relies on 7 Tier-1 centres of the WLCG to perform the majority of its bulk processing activity, and to archive its data. During the first run of the LHC, these two functions were tightly coupled as each Tier-1 was constrained to process only the data archived on its hierarchical storage. This lack of flexibility in the assignment of processing workflows occasionally resulted in uneven resource utilisation and in an increased latency in the delivery of the results to the physics community. The long shutdown of the LHC in 2013-2014 was an opportunity to revisit this mode of operations, disentangling the processing and archive functionalities of the Tier-1 centres. The storage services at the Tier-1s were redeployed breaking the traditional hierarchical model: each site now provides a large disk storage to host input and output data for processing, and an independent tape storage used exclusively for archiving. Movement of data between the tape and disk endpoints is not automated, but triggered externally through the WLCG transfer management systems. With this new setup, CMS operations actively controls at any time which data is available on disk for processing and which data should be sent to archive. Thanks to the high-bandwidth connectivity guaranteed by the LHCOPN, input data can be freely transferred between disk endpoints as needed to take advantage of free CPU, turning the Tier-1s into a large pool of shared resources. The output data can be validated before archiving them permanently, and temporary data formats can be produced without wasting valuable tape resources. Finally, the data hosted on disk at Tier-1s can now be made available also for user analysis since there is no risk any longer of triggering chaotic staging from tape. In this contribution, we describe the technical solutions adopted for the new disk and tape endpoints at the sites, and we report on the commissioning and scale testing of the service. We detail the procedures implemented by CMS computing operations to actively manage data on disk at Tier-1 sites, and we give examples of the benefits brought to CMS workflows by the additional flexibility of the new system.
Speaker: Christoph Wissing (Deutsches Elektronen-Synchrotron (DE))
• 09:00 10:30
Track 4 Session: #5 (Software) B250 (B250)

### B250

#### B250

Middleware, software development and tools, experiment frameworks, tools for distributed computing

Convener: Andreas Heiss (KIT - Karlsruhe Institute of Technology (DE))
• 09:00
Testable physics by design 15m
Testable physics by design The validation of physics calculations requires the capability to thoroughly test them. The difficulty of exposing parts of the software to adequate testing can be the source of incorrect physics functionality, which in turn may generate hard to identify systematic effects in physics observables produced by the experiments. Starting from real-life examples encountered in the course of an extensive validation effort of Geant4 physics, we show how software design choices may affect the ability to test basic aspects of physics functionality and to monitor their evolution in the course of software lifecycle. We document cases where inaccurate physics behaviour was hidden by software design that prevented its testability, and illustrate how improved physics functionality could be achieved by improving the transparency of software design. Exploiting the experience with Geant4 as a playground for investigation, we discuss methods to enhance the testability of existing software systems by means of refactoring techniques, the identification of inflection points and their coverage. We discuss how these techniques differ from, but can be combined with, the traditional practice of monitoring some physics observables with regression testing. We also present guidelines to introduce testability into the software design since the early phases of the software development, and to preserve it in the course of the product lifecycle. This issue is especially relevant in the context of ongoing R&D on future simulation systems.
Speaker: Dr Maria Grazia Pia (Universita e INFN (IT))
• 09:15
First statistical analysis of Geant4 quality software metrics 15m
Geant4 is a widespread simulation system of "particles through matter" used in several experimental areas from high energy physics and nuclear experiments to medical studies. Some of its applications may involve critical use cases; therefore they would benefit from an objective assessment of the software quality of Geant4. The issue of maintainability is especially relevant for such a widely used, mature software system, which at the present time is the result of 20 years of development. We performed a quantitative analysis of Geant4 software quality with emphasis on maintainability. To evaluate the maintainability of Geant4 software, we used existing standards, such as ISO/IEC 9126, that identifies the software characteristics. Furthermore, we exploited a set of product metrics - aggregated in the program size, code distribution, control flow complexity and object-orientation metrics categories - that allows to understand the code state. By using various software metrics tools, we were able to collect a large amount of measurements of software characteristics. In this paper, we provide a first statistical evaluation of software metrics data related to a set of Geant4 physics packages. The analysis determined what metrics are most effective at identifying risks for the considered Geant4 packages and their correlations. We also evaluated the applicability of existing quality standards, which may derive from different application environments, to the Geant4 context. The findings of this pilot study set the ground for further extensions to the whole of Geant4 and to other HEP software systems.
Speaker: Elisabetta Ronchieri (INFN)
• 09:30
ROOT6: a quest for performance 15m
The sixth release cycle of ROOT is characterised by a radical modernisation in the core software technologies the tookit relies on: language standard, interpreter, hardware exploitation mechanisms. If on the one hand, the change offered the opportunity of consolidating the existing codebase, in presence of such innovations, maintaing the balance between full backward compatibility and software performance was not easy. In this contribution we review the challenges and the solutions identified and implemented in the area of CPU and memory consumption as well as I/O capabilities in terms of patterns. Moreover, we present some of the new ROOT components which are offered to the users to improve the performance of third party applications.
Speaker: Danilo Piparo (CERN)
• 09:45
ROOT 6 and beyond: TObject, C++14 and many cores. 15m
Following the release of version 6, ROOT has entered a new area of development. It will leverage the industrial strength compiler library shipping in ROOT 6 and its support of the C++11/14 standard, to significantly simplify and harden ROOT's interfaces and to clarify and substantially improve ROOT's support for multi-threaded environments. This talk will also recap the most important new features and enhancements in ROOT in general, focusing on those allowed by the improved interpreter and better compiler support, including I/O for smart pointers, easier type safe access to the content of TTrees and enhanced multi processor support.
Speaker: Philippe Canal (Fermi National Accelerator Lab. (US))
• 10:15
IgProf profiler support for power efficient computing 15m
In recent years the size and scale of scientific computing has grown significantly. Computing facilities have grown to the point where energy availability and costs have become important limiting factors for data-center size and density. At the same time, power density limitations in processors themselves are driving interest in more heterogeneous processor architectures. Optimizing application performance is no longer required merely to obtain results faster, but also to stay within the economic limits and constraints imposed by power hungry datacenters. IgProf is an open-source, general purpose memory and performance profiler suite. We present the improvements we have made to permit optimizing the power efficiency of an application. This functionality builds on direct measurements of power related quantities using a newly developed IgProf module which exploits novel on-chip power monitoring capabilities (such as RAPL on new Intel processors). We also explore indirect methods which extrapolate from other measured application characteristics and processor behaviors, using CPU performance counters and processor power states. We demonstrate the use of our tools both on small micro-benchmarks, developed to better understand problems and tuning measurements, and with complex, large-scale C++ applications, derived from millions of lines of code.
Speaker: Mr Giulio Eulisse (Fermi National Accelerator Lab. (US))
• 09:00 10:30
Track 6 Session: #3 (Collaborative Tool) B503 (B503)

### B503

#### B503

Facilities, Infrastructure, Network

Convener: Peter Hristov (CERN)
• 09:00
Indico - the road to 2.0 15m
Indico has come a long way since it was first used to organize CHEP 2004. More than ten years of development have brought new features and projects, widening the application's feature set and enabling event organizers to work even more efficiently. While this has boosted the tool's usage and facilitated its adoption by a remarkable 300,000 events (at CERN only), it has also generated a whole new range of challenges, which have been the target of the team's attention for the last 2 years. One of them was that of scalability and the maintainability of the current database solution (ZODB). After careful consideration, the decision was taken to move away from ZODB to PostgreSQL, a relational and widely-adopted solution that will permit the development of a more ambitious feature set as well as improved performance and scalability. A change of this type is by no means trivial in nature and requires the refactoring of most backend code as well as the full rewrite of significant portions of it. We are taking this opportunity to modernize Indico, by employing standard web modules, technologies and concepts that not only make development and maintenance easier but also constitute an upgrade to Indico's stack. The first results are already visible since August 2014, with the full migration of the Room Booking module to the new paradigm. In this paper we explain what has been done so far in the context of this ambitious migration, what have been the main findings and challenges, as well as the main technologies and concepts that will constitute the foundation of the resultant Indico 2.0.
Speaker: Pedro Ferreira (CERN)
• 09:15
Vidyo@CERN: A Service Update 15m
We will present an overview of the current real-time video service offering for the LHC, in particular the operation of the CERN Vidyo service will be described in terms of consolidated performance and scale: The service is an increasingly critical part of the daily activity of the LHC collaborations, topping recently more than 50 million minutes of communication in one year, with peaks of up to 852 simultaneous connections. We will elaborate on the improvement of some front-end key features such as the integration with CERN Indico, or the enhancements of the Unified Client and also on new ones, released or in the pipeline, such as a new WebRTC client and CERN SSO/Federated SSO integration. An overview of future infrastructure improvements, such as virtualization techniques of Vidyo routers and geo-location mechanisms for load-balancing and optimum user distribution across the service infrastructure will also be discussed. The work done by CERN to improve the monitoring of its Vidyo network will also be presented and demoed. As a last point, we will touch the roadmap and strategy established by CERN and Vidyo with a clear objective of optimizing the service both on the end client and backend infrastructure to make it truly universal, to serve Global Science; that includes the introduction of the multi-tenant concept to serve different communities, as the follow up of CERN’s decision to offer the Vidyo service currently operated for the LHC, to other Sciences, Institutions and Virtual Organizations beyond HEP that might express interest for it.
Speaker: Mr Joao Correia Fernandes (CERN)
• 09:30
Status report of the migration of the CERN Document Server to the invenio-next package 15m
With the expected release of invenio next stable version, CDS is preparing a 'lab' service where users will have the opportunity to experience the powerful features of the new software. After a short introduction of Invenio next, the talk will explain the mechanisms that have been implemented to allow to run parallel services with the same content exposed from two different designs and the related constraints of this architecture. It will then focus on the main advantages of the new service for end-users. The modern personalized collection home pages that have been designed after interviewing random users, will be detailed. Both the process to converge to a User Interface answering the major requirements and the result of the development (how a user can easily decide on the more interesting content to be displayed first) will be presented to the audience. Many other improvements to the current CDS will also be introduced, like the advanced search clusters, the new commenting scheme heavily used by the experiments, the new auto-suggest feature looking in multiple author databases and the move of the CDS service monitoring tool to a dedicated ElasticSearch/Kibana instance. Finally, the on-going project of creating CERN Authors profile will be over-viewed, from the mandatory disambiguation process that uniquely identifies people to the design of the most appropriate profile pages.
Speaker: Esteban Gabancho (CERN)
• 09:45
Enterprise Social Networking Systems for HEP community 15m
The emergence of social media platforms in the consumer space unlocked new ways of interaction between individuals on the web. People develop now their social networks and relations based on common interests and activities with the choice to opt-in or opt-out on content of their interest. This kind of platforms have also an important place to fill inside large organizations and enterprises where communication and collaborators interaction are keys for development. They are enablers to unlock hidden opportunities. Enterprise Social Networking Systems (ESN) add value to an organization by encouraging information sharing, capturing knowledge, enabling action and empowering people. For such they propose microblogging, profiles, social networking, suggestion systems and discussion forums. Microblogging introduces a lightweight and informal communication channel that primes information sharing, promotes discussion and gives an equal opportunity to everyone to reach a broad audience. Profiles and social networking are the perfect tool to seek for skills and can act as a catalyzer for new projects and opportunities. Suggestion systems and discussion forums are the perfect tools to identify evolving trends and allow new communities to born. CERN is currently rolling out an ESN which aims to unify and provide a single point of access to the multitude of information sources in the organization. It also implements social features that can be added on top of existing communication channels. This talk will review the experiences with this deployment, the risks and benefits and outlook for the future as a social community tool for HEP.
Speaker: Bruno Silva De Sousa (CERN)
• 10:00
ATLAS Public Web Pages: Online Management of HEP External Communication Content 15m
The ATLAS Education and Outreach Group is in the process of migrating its public online content to a professionally designed set of web pages built on a Drupal-based content management system. Development of the front-end design passed through several key stages, including audience surveys, stakeholder interviews, usage analytics, and a series of fast design iterations, called sprints. Implementation of the web site involved application of the html design using Drupal templates, refined development iterations, and the overall population of the site with content. We present the design and development processes and share the lessons learnt along the way, including the results of the data-driven discovery studies. We also demonstrate the advantages of selecting a back-end supported by content management, with a focus on workflow. Finally, we discuss usage of the new public web pages to implement Outreach strategy through implementation of clearly presented themes, consistent audience targeting and messaging, and the enforcement of a well-defined visual identity.
Speaker: Steven Goldfarb (University of Michigan (US))
• 10:15
Offering Global Collaboration Services beyond CERN and HEP 15m
The CERN IT department has built over the years a performant and integrated ecosystem of collaboration tools, from videoconference and webcast services to event management software. These services have been designed and evolved in very close collaboration with the various communities surrounding the laboratory and have been massively adopted by CERN users. To cope with this very heavy usage, global infrastructures have been deployed which take full advantage of CERN's international and global nature. If these services and tools are instrumental in enabling the worldwide collaboration which generates major HEP breakthroughs, they would certainly also benefit other sectors of science in which globalisation has already taken place. Some of these services are driven by commercial software (Vidyo or Wowza for example), some others have been developed internally and have already been made available to the world as Open Source Software in line with CERN's spirit and mission. Indico for example is now installed in 100+ institutes worldwide. But providing the software is often not enough and institutes, collaborations and project teams do not always possess the expertise, or human or material resources that are needed to set up and maintain such services. Regional and national institutions have to answer needs which are growingly global and often contradict their operational capabilities or organisational mandate and so are looking at existing worldwide service offers such as CERN's. We believe that the accumulated experience obtained through the operation of a large scale worldwide collaboration service combined with CERN's global network and its recently-deployed Agile Infrastructure would allow the Organisation to set up and operate collaborative services, such as Indico and Vidyo, at a much larger scale and on behalf of worldwide research and education institutions and thus answer these pressing demands while optimizing resources at a global level. Such services would be built over a robust and massively scalable Indico server to which the concept of communities would be added, and which would then serve as a hub for accessing other collaboration services such as Vidyo, on the same simple and successful model currently in place for CERN users. This talk will describe this vision, its benefits and the steps which have already been taken to make it come to life.
Speaker: Mr Joao Correia Fernandes (CERN)
• 09:00 10:30
Track 7 Session: #4 C210 (C210)

### C210

#### C210

Clouds and virtualization

Convener: Andrew McNab (University of Manchester (GB))
• 09:00
Status and Roadmap of CernVM 15m
Cloud resources nowadays contribute an essential share of resources for computing in high-energy physics. Such resources can be either provided by private or public IaaS clouds (e.g. OpenStack, Amazon EC2, Google Compute Engine) or by volunteers’ computers (e.g. LHC@Home 2.0). In any case, experiments need to prepare a virtual machine image that provides the execution environment for the physics application at hand. The CernVM virtual machine since version 3 is a minimal and versatile virtual machine image capable of booting different operating systems. The virtual machine image is less than 20 megabyte in size. The actual operating system is delivered on demand by the CernVM File System. CernVM 3 has matured from a prototype to a production environment. It is used, for instance, to run LHC applications in the cloud, to tune event generators using a network of volunteer computers, and as a container for the historic Scientific Linux 5 and Scientific Linux 4 based software environments in the course of long-term data preservation efforts of the ALICE, CMS, and ALEPH experiments. We present experience and lessons learned from the use of CernVM at scale. We also provide an outlook on the upcoming developments. These developments include adding support for Scientific Linux 7, the use of container virtualization, such as provided by Docker, and the streamlining of virtual machine contextualization towards the cloud-init industry standard.
Speaker: Gerardo Ganis (CERN)