Conveners
Track 4 Session: #1 (Middleware)
- Oliver Keeble (CERN)
Track 4 Session: #2 (Framework)
- Vincent Garonne (University of Oslo (NO))
Track 4 Session: #3 (Middleware)
- Marco Clemencic (CERN)
Track 4 Session: #4 (Application)
- Oliver Gutsche (Fermi National Accelerator Lab. (US))
Track 4 Session: #5 (Software)
- Andreas Heiss (KIT - Karlsruhe Institute of Technology (DE))
Track 4 Session: #6 (Application)
- Tony Wildish (Princeton University (US))
Description
Middleware, software development and tools, experiment frameworks, tools for distributed computing
Federico Stagni
(CERN)
4/13/15, 2:00 PM
Track4: Middleware, software development and tools, experiment frameworks, tools for distributed computing
oral presentation
In the last few years, new types of computing infrastructures, such as IAAS (Infrastructure as a Service) and IAAC (Infrastructure as a Client), have gained popularity. Some new resources come as part of the pledged resources, while others are opportunistic. Most of these new infrastructures are based on virtualization techniques, while others are not. Meanwhile, some concepts, such as...
Tadashi Maeno
(Brookhaven National Laboratory (US))
4/13/15, 2:15 PM
Track4: Middleware, software development and tools, experiment frameworks, tools for distributed computing
oral presentation
Experiments at the Large Hadron Collider (LHC) face unprecedented computing challenges. Heterogeneous resources are distributed worldwide at hundreds of sites, thousands of physicists analyze the data remotely, the volume of processed data is beyond the exabyte scale, while data processing requires more than a few billion hours of computing usage per year. The PanDA (Production and Distributed...
Dr
Antonio Perez-Calero Yzquierdo
(Centro de Investigaciones Energ. Medioambientales y Tecn. (ES))
4/13/15, 2:30 PM
Track4: Middleware, software development and tools, experiment frameworks, tools for distributed computing
oral presentation
The successful exploitation of the multicore processor architectures available at the computing sites is a key element of the LHC distributed computing system in the coming era of the LHC Run 2. High-pileup complex-collision events represent a challenge for the traditional sequential programming in terms of memory and processing time budget. The CMS data production and processing framework has...
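As a rough illustration of the event-level parallelism such a framework exploits, a minimal sketch in Python follows; reconstruct_event and the in-memory event source are hypothetical stand-ins, not the CMS framework's actual API.

```python
# Minimal sketch of event-parallel processing on a multicore node.
# Illustrative only: reconstruct_event and the event source are
# hypothetical stand-ins, not the CMS framework's real interfaces.
from multiprocessing import Pool

def reconstruct_event(event):
    # Placeholder for a CPU-heavy reconstruction step.
    return sum(hit * hit for hit in event)

def main():
    events = [[float(i % 7) for i in range(1000)] for _ in range(10000)]
    with Pool(processes=8) as pool:  # one worker per core
        results = pool.map(reconstruct_event, events, chunksize=100)
    print(f"processed {len(results)} events")

if __name__ == "__main__":
    main()
```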
Nathalie Rauschmayr
(CERN)
4/13/15, 2:45 PM
Track4: Middleware, software development and tools, experiment frameworks, tools for distributed computing
oral presentation
The main goal of a Workload Management System (WMS) is to find and allocate resources for the jobs it is handling. The more accurate the information the WMS receives about the jobs, the easier it becomes to accomplish this task, which translates directly into better utilization of resources. Traditionally, the information associated with each job, like expected runtime or memory...
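To make that point concrete, here is a minimal sketch, under invented numbers and names, of how per-job runtime and memory estimates let a scheduler pack jobs onto slots (simple first-fit); this is not the WMS described in the talk.

```python
# Sketch: why accurate per-job estimates help a WMS pack jobs.
# Jobs carry (expected runtime in hours, expected memory in GB);
# slots have fixed budgets. All numbers and names are hypothetical.
def first_fit(jobs, slots):
    """Assign each job to the first slot with enough remaining budget."""
    placement = {}
    for name, runtime, memory in jobs:
        for slot in slots:
            if slot["time_left"] >= runtime and slot["mem_free"] >= memory:
                slot["time_left"] -= runtime
                slot["mem_free"] -= memory
                placement[name] = slot["id"]
                break
        else:
            placement[name] = None  # no slot fits: keep queued
    return placement

slots = [{"id": i, "time_left": 24.0, "mem_free": 16.0} for i in range(2)]
jobs = [("reco_1", 10.0, 4.0), ("sim_2", 20.0, 2.0), ("ana_3", 6.0, 12.0)]
print(first_fit(jobs, slots))
```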
James Letts
(Univ. of California San Diego (US))
4/13/15, 3:00 PM
Track4: Middleware, software development and tools, experiment frameworks, tools for distributed computing
oral presentation
CMS will require access to more than 125k processor cores at the beginning of Run 2 in 2015 to carry out its ambitious physics program with more and higher-complexity events. During Run 1 these resources were predominantly provided by a mix of grid sites and local batch resources. During the long shutdown, cloud infrastructures, diverse opportunistic resources and HPC supercomputing centers...
Vincent Garonne
(CERN)
4/13/15, 3:15 PM
Track4: Middleware, software development and tools, experiment frameworks, tools for distributed computing
oral presentation
For more than 8 years, the Distributed Data Management (DDM) system of ATLAS, called DQ2, has demonstrated very large scale data management capabilities, with more than 600M files and 160 petabytes spread worldwide across 130 sites, accessed by 1,000 active users. However, the system does not scale to LHC Run 2, and a new DDM system called Rucio has been developed to be DQ2's...
Martin Barisits
(CERN)
4/13/15, 3:30 PM
Track4: Middleware, software development and tools, experiment frameworks, tools for distributed computing
oral presentation
The ATLAS Distributed Data Management system stores more than 160PB of physics data across more than 130 sites globally. Rucio, the next-generation data management system of ATLAS has been introduced to cope with the anticipated workload of the coming decade. The previous data management system DQ2 pursued a rather simplistic approach for resource management, but with the increased data volume...
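For illustration only, a conceptual sketch of rule-based replica management of the kind described, with hypothetical site names and quotas; this is not Rucio's API.

```python
# Conceptual sketch of rule-based replica management (not Rucio's API):
# a rule declares how many copies a dataset needs; the engine picks
# sites that still have quota. Names and numbers are hypothetical.
def satisfy_rule(dataset_size_tb, copies, sites):
    chosen = []
    for site in sorted(sites, key=lambda s: s["free_tb"], reverse=True):
        if len(chosen) == copies:
            break
        if site["free_tb"] >= dataset_size_tb:
            site["free_tb"] -= dataset_size_tb
            chosen.append(site["name"])
    if len(chosen) < copies:
        raise RuntimeError("rule cannot be satisfied with current quotas")
    return chosen

sites = [{"name": "SITE_A", "free_tb": 120.0},
         {"name": "SITE_B", "free_tb": 40.0},
         {"name": "SITE_C", "free_tb": 5.0}]
print(satisfy_rule(dataset_size_tb=10.0, copies=2, sites=sites))
```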
Dr
Tony Wildish
(Princeton University (US))
4/13/15, 3:45 PM
Track4: Middleware, software development and tools, experiment frameworks, tools for distributed computing
oral presentation
AsyncStageOut (ASO) is a new component of CRAB, the distributed data analysis system of CMS, designed for managing users' data. It addresses a major weakness of the previous model, namely that data movement was part of the job execution, resulting in inefficient use of job slots and an unacceptable failure rate at the end of the jobs.
ASO foresees the management of up to 400k files per day...
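A minimal sketch of the decoupling ASO introduces, using a plain producer/consumer queue; the file names and the sleep standing in for an actual FTS transfer are hypothetical.

```python
# Sketch of the decoupling described above: jobs finish and merely
# *enqueue* their output files; a separate transfer agent drains the
# queue, so job slots are freed immediately. Purely illustrative.
import queue
import threading
import time

transfer_queue = queue.Queue()

def job(job_id):
    # The job writes output locally and registers it for transfer,
    # instead of blocking on the WAN copy itself.
    transfer_queue.put(f"/store/user/out_{job_id}.root")

def transfer_agent():
    while True:
        lfn = transfer_queue.get()
        if lfn is None:          # sentinel: shut down
            break
        time.sleep(0.01)         # stand-in for the actual transfer
        print(f"staged out {lfn}")
        transfer_queue.task_done()

agent = threading.Thread(target=transfer_agent)
agent.start()
for i in range(5):
    job(i)                       # the job slot frees as soon as this returns
transfer_queue.join()
transfer_queue.put(None)
agent.join()
```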
David Schultz
(University of Wisconsin-Madison)
4/13/15, 4:30 PM
Track4: Middleware, software development and tools, experiment frameworks, tools for distributed computing
oral presentation
We describe the overall structure and new features of the second generation of IceProd, a data processing and management framework. IceProd was developed by the IceCube Neutrino Observatory for processing of Monte Carlo simulations and detector data, and has been a key component of the IceCube offline computing infrastructure since it was first deployed in 2006. It runs fully in user space as...
Hideki Miyake
(KEK)
4/13/15, 4:45 PM
Track4: Middleware, software development and tools, experiment frameworks, tools for distributed computing
oral presentation
In the Belle II experiment a large amount of physics data will be taken continuously, at a production rate equivalent to that of the LHC experiments.
Considerable computing, storage, and network resources are necessary to handle not only the recorded data but also substantial volumes of simulated data.
Therefore Belle II exploits a distributed computing system based on the DIRAC interware.
DIRAC is a general...
Federico Stagni
(CERN)
4/13/15, 5:00 PM
Track4: Middleware, software development and tools, experiment frameworks, tools for distributed computing
oral presentation
The DIRAC workload management system used by LHCb Distributed Computing is based on Computing Resource reservation and late binding (also known as pilot jobs in the case of batch resources), which allows the serial execution of several jobs obtained from a central task queue. CPU resources can usually be reserved only for a limited duration (e.g. the batch queue time limit), and in order to optimize...
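A minimal sketch of pilot-style late binding under invented names (not DIRAC's actual interfaces): the pilot holds a reserved slot and serially pulls whatever fits in its remaining walltime from a central task queue.

```python
# Sketch of late binding: the job is matched to the slot only at the
# moment the pilot asks, and the pilot keeps pulling work until its
# remaining walltime is too short. Hypothetical names throughout.
def fetch_matching_job(task_queue, time_left):
    for job in list(task_queue):
        if job["runtime"] <= time_left:
            task_queue.remove(job)
            return job
    return None

def pilot(task_queue, walltime=8.0):
    used = 0.0
    while True:
        job = fetch_matching_job(task_queue, walltime - used)
        if job is None:
            break                      # nothing fits: release the slot
        used += job["runtime"]         # serial execution of matched jobs
        print(f"ran {job['name']} ({job['runtime']}h), {walltime - used}h left")

pilot([{"name": "mc_gen", "runtime": 3.0},
       {"name": "reco", "runtime": 4.0},
       {"name": "merge", "runtime": 2.0}])
```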
Dr
Torre Wenaus
(Brookhaven National Laboratory (US))
4/13/15, 5:15 PM
Track4: Middleware, software development and tools, experiment frameworks, tools for distributed computing
oral presentation
The ATLAS Event Service (ES) implements a new fine grained approach to HEP event processing, designed to be agile and efficient in exploiting transient, short-lived resources such as HPC hole-filling, spot market commercial clouds, and volunteer computing. Input and output control and data flows, bookkeeping, monitoring, and data storage are all managed at the event level in an implementation...
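A minimal sketch of the event-level idea, with hypothetical names (not the Event Service implementation): work is dispatched in small event ranges and each range's output is recorded as soon as it completes, so preemption loses at most one range.

```python
# Sketch of event-level bookkeeping for preemptible resources: a
# killed worker loses at most the current small range, and only the
# unfinished ranges need re-dispatch. Illustrative only.
def process(ranges, budget):
    """Process event ranges until preempted (budget exhausted)."""
    done = []
    for first, last in ranges:
        if budget <= 0:
            break                      # node reclaimed mid-task
        done.append((first, last, f"events_{first}_{last}.out"))
        budget -= 1
    return done

ranges = [(i, min(i + 100, 1000)) for i in range(0, 1000, 100)]
finished = process(ranges, budget=4)   # preempted after 4 ranges
todo = ranges[len(finished):]          # only these need re-dispatch
print(f"{len(finished)} ranges safe, {len(todo)} to re-dispatch")
```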
Marco Mascheroni
(Universita & INFN, Milano-Bicocca (IT))
4/13/15, 5:30 PM
Track4: Middleware, software development and tools, experiment frameworks, tools for distributed computing
oral presentation
The CMS Remote Analysis Builder (CRAB) provides the service for managing analysis tasks, isolating users from the technical details of the distributed Grid infrastructure. Throughout LHC Run 1, CRAB was successfully employed by an average of 350 distinct users every week, executing about 200,000 jobs per day.
In order to face the new challenges posed by the LHC Run 2, CRAB has been...
Sebastian Neubert
(CERN)
4/13/15, 5:45 PM
Track4: Middleware, software development and tools, experiment frameworks, tools for distributed computing
oral presentation
Reproducibility of results is a fundamental quality of scientific research. However, as data analyses become more and more complex and research is increasingly carried out by larger and larger teams, it becomes a challenge to keep up this standard. The decomposition of complex problems into tasks that can be effectively distributed over a team in a reproducible manner becomes...
Dr
Tian Yan
(Institute of High Energy Physics, Chinese Academy of Sciences)
4/13/15, 6:00 PM
Track4: Middleware, software development and tools, experiment frameworks, tools for distributed computing
oral presentation
For the Beijing Spectrometer III (BESIII) experiment located at the Institute of High Energy Physics (IHEP), China, the distributed computing environment (DCE) has been set up and in production since 2012. The basic framework, or middleware, is DIRAC (Distributed Infrastructure with Remote Agent Control) with BES-DIRAC extensions. About 2,000 CPU cores and 400 TB of storage contributed by...
Janusz Martyniak
(Imperial College London)
4/13/15, 6:15 PM
Track4: Middleware, software development and tools, experiment frameworks, tools for distributed computing
oral presentation
The GridPP consortium in the UK is currently testing a multi-VO DIRAC service aimed at non-LHC VOs. These VOs are typically small (fewer than two hundred members) and generally do not have a dedicated computing support post. The majority of these represent particle physics experiments (e.g. T2K, NA62 and COMET), although the scope of the DIRAC service is not limited to this field. A few VOs...
Edgar Fajardo Hernandez
(Univ. of California San Diego (US))
4/14/15, 2:00 PM
Track4: Middleware, software development and tools, experiment frameworks, tools for distributed computing
oral presentation
The HTCondor-CE is the next-generation gateway software for the Open Science Grid (OSG). It is responsible for providing a network service which authorizes remote users and provides a resource provisioning service (other well-known gatekeepers include Globus GRAM, CREAM, ARC-CE, and OpenStack's Nova). Based on the venerable HTCondor software, this new CE is simply a highly-specialized...
Andrej Filipcic
(Jozef Stefan Institute (SI))
4/14/15, 2:15 PM
Track4: Middleware, software development and tools, experiment frameworks, tools for distributed computing
oral presentation
Distributed computing resources available for high-energy physics research are becoming less dedicated to one type of workflow and researchers' workloads are increasingly exploiting modern computing technologies such as parallelism. The current pilot job management model used by many experiments relies on static dedicated resources and cannot easily adapt to these changes. The model used for...
Jon Kerr Nilsen
(University of Oslo (NO))
4/14/15, 2:30 PM
Track4: Middleware, software development and tools, experiment frameworks, tools for distributed computing
oral presentation
While current grid middlewares are quite advanced in terms of connecting jobs to resources, their client tools are generally quite minimal and features for managing large sets of jobs are left to the user to implement. The ARC Control Tower (aCT) is a very flexible job management framework that can be run on anything from a single user's laptop to a multi-server distributed setup. aCT was...
Andres Gomez Ramirez
(Johann-Wolfgang-Goethe Univ. (DE))
4/14/15, 2:45 PM
Track4: Middleware, software development and tools, experiment frameworks, tools for distributed computing
oral presentation
Grid infrastructures allow users flexible, on-demand usage of computing resources over an Internet connection. A remarkable example of a Grid in High Energy Physics (HEP) research is the one used by the ALICE experiment at the European Organization for Nuclear Research (CERN). Physicists can submit jobs used to process the huge amount of particle collision data produced by the Large Hadron Collider (LHC) at...
Dr
Tony Wildish
(Princeton)
4/14/15, 3:00 PM
Track4: Middleware, software development and tools, experiment frameworks, tools for distributed computing
oral presentation
The ANSE project has been working with the CMS and ATLAS experiments to bring network awareness into their middleware stacks. For CMS, this means enabling control of virtual network circuits in PhEDEx, the CMS data-transfer management system. PhEDEx orchestrates the transfer of data around the CMS experiment to the tune of 1 PB per week spread over about 70 sites.
The goal of ANSE is to...
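As a toy model of the kind of decision such network awareness enables, assuming invented rates and thresholds: request a dedicated circuit only when the queued volume justifies its setup cost.

```python
# Sketch of a circuit-request decision of the kind ANSE enables in a
# transfer manager: compare the time to drain a backlog over the
# shared path versus a dedicated circuit with setup overhead.
# Rates, thresholds and names are hypothetical.
def should_request_circuit(queued_tb, shared_rate_tb_h, circuit_rate_tb_h,
                           setup_h=0.5, min_gain_h=2.0):
    t_shared = queued_tb / shared_rate_tb_h
    t_circuit = setup_h + queued_tb / circuit_rate_tb_h
    return (t_shared - t_circuit) >= min_gain_h

# A 50 TB backlog: a 10x faster circuit wins despite its setup time.
print(should_request_circuit(queued_tb=50, shared_rate_tb_h=1.0,
                             circuit_rate_tb_h=10.0))   # True
```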
Dr
Alexei Klimentov
(Brookhaven National Laboratory (US))
4/14/15, 3:15 PM
Track4: Middleware, software development and tools, experiment frameworks, tools for distributed computing
oral presentation
A crucial contributor to the success of the massively scaled global computing system that delivers the analysis needs of the LHC experiments is the networking infrastructure upon which the system is built. The experiments have been able to exploit excellent high-bandwidth networking in adapting their computing models for the most efficient utilization of resources.
New advanced networking...
Alessandra Forti
(University of Manchester (GB))
4/14/15, 3:30 PM
Track4: Middleware, software development and tools, experiment frameworks, tools for distributed computing
oral presentation
After the successful first run of the LHC, data taking will restart in early 2015 with unprecedented experimental conditions, leading to increased data volumes and event complexity. In order to process the data generated in such a scenario and exploit the multicore architectures of current CPUs, the LHC experiments have developed parallelized software for data reconstruction and simulation. A...
Dr
Wenji Wu
(Fermi National Accelerator Laboratory)
4/14/15, 3:45 PM
Track4: Middleware, software development and tools, experiment frameworks, tools for distributed computing
oral presentation
Multicore and manycore platforms have become the norm for scientific computing environments. Multicore/manycore platform architectures provide advanced capabilities and features that can be exploited to enhance data movement performance for large-scale distributed computing environments, such as the LHC. However, existing data movement tools do not take full advantage of these capabilities and features....
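As a minimal illustration of exploiting multiple cores for data movement (not the tools discussed in the talk), a chunked parallel file copy:

```python
# Sketch of multicore-aware data movement: split a file into chunks
# and let several workers copy chunks concurrently, the general idea
# behind exploiting manycore hosts for transfers. Illustrative only.
import os
from concurrent.futures import ThreadPoolExecutor

CHUNK = 4 * 1024 * 1024  # 4 MiB

def copy_chunk(src, dst, offset, length):
    with open(src, "rb") as fin, open(dst, "r+b") as fout:
        fin.seek(offset)
        fout.seek(offset)
        fout.write(fin.read(length))

def parallel_copy(src, dst, workers=4):
    size = os.path.getsize(src)
    with open(dst, "wb") as f:      # pre-allocate the destination
        f.truncate(size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for off in range(0, size, CHUNK):
            pool.submit(copy_chunk, src, dst, off, min(CHUNK, size - off))
```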
Mr
Jason Alexander Smith
(Brookhaven National Laboratory)
4/14/15, 4:30 PM
Track4: Middleware, software development and tools, experiment frameworks, tools for distributed computing
oral presentation
Using centralized configuration management, including automation tools such as Puppet, can greatly increase provisioning speed and efficiency when configuring new systems or making changes to existing systems, reduce duplication of work, and improve automated processes. However, centralized management also brings with it a level of inherent risk: a single change in just one file can...
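One common mitigation for that risk, sketched below with a real `puppet parser validate` syntax check but otherwise hypothetical deploy and health-check stubs: validate every changed manifest, then exercise it on a canary host before fleet-wide rollout.

```python
# Sketch of a guarded rollout. `puppet parser validate` is a real
# Puppet command; the deploy and health-check functions are
# hypothetical stubs for illustration only.
import subprocess
import sys

def validate(files):
    for f in files:
        r = subprocess.run(["puppet", "parser", "validate", f])
        if r.returncode != 0:
            sys.exit(f"validation failed for {f}; aborting rollout")

def deploy(files, hosts):
    print(f"deploying {files} to {hosts}")      # stub

def health_check(host):
    return True                                  # stub

def rollout(files, fleet, canary):
    validate(files)
    deploy(files, hosts=[canary])
    if not health_check(canary):
        sys.exit("canary unhealthy; change not promoted")
    deploy(files, hosts=fleet)                   # only now touch the fleet

rollout(["site.pp"], fleet=["node01", "node02"], canary="canary01")
```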
Alessandro De Salvo
(Universita e INFN, Roma I (IT))
4/14/15, 4:45 PM
Track4: Middleware, software development and tools, experiment frameworks, tools for distributed computing
oral presentation
The ATLAS Installation System v2 is the evolution of the original system, used since 2003. The original tool has been completely re-designed in terms of database backend and components, adding support for submission to multiple backends, including the original WMS and the new Panda modules. The database engine has been changed from plain MySQL to Galera/Percona and the table structure has been...
Dr
Giuseppe Avolio
(CERN)
4/14/15, 5:00 PM
Track4: Middleware, software development and tools, experiment frameworks, tools for distributed computing
oral presentation
Complex Event Processing (CEP) is a methodology that combines data from different sources in order to identify events or patterns that need particular attention. It has gained a lot of momentum in the computing world in the past few years and is used in ATLAS to continuously monitor the behaviour of the data acquisition system, to trigger corrective actions and to guide the experiment's...
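A minimal sketch of the CEP idea, with an invented pattern and thresholds: correlate events arriving from a source inside a time window and act when a pattern appears.

```python
# Toy complex-event-processing detector: keep a sliding time window
# of monitoring events and fire a corrective action when a pattern
# (here: repeated errors from one source) shows up. The pattern,
# source names and thresholds are invented for illustration.
from collections import deque

WINDOW_S = 60.0

def make_detector():
    events = deque()   # (timestamp, source, kind)
    def feed(ts, source, kind):
        events.append((ts, source, kind))
        while events and ts - events[0][0] > WINDOW_S:
            events.popleft()   # expire events outside the window
        errors = [e for e in events if e[1] == source and e[2] == "ERROR"]
        if len(errors) >= 3:   # pattern: >=3 errors within the window
            print(f"corrective action for {source} at t={ts}")
    return feed

feed = make_detector()
for t in (0, 10, 20):
    feed(t, "ros-42", "ERROR")   # the third error triggers the action
```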
Mr
Tigran Mkrtchyan
(DESY)
4/14/15, 5:15 PM
Track4: Middleware, software development and tools, experiment frameworks, tools for distributed computing
oral presentation
Over the past years, storage providers in scientific infrastructures have been facing a significant change in the usage profile of their resources. While in the past a small number of experiment frameworks accessed those resources in a coherent manner, now a large number of small groups or even individuals request access in a completely chaotic way. Moreover, scientific laboratories...
Peter Onyisi
(University of Texas (US))
4/14/15, 5:30 PM
Track4: Middleware, software development and tools, experiment frameworks, tools for distributed computing
oral presentation
During LHC Run 1, the information flow through the offline data quality monitoring in ATLAS relied heavily on chains of processes polling each other's outputs for handshaking purposes. This resulted in a fragile architecture with many possible points of failure and an inability to monitor the overall state of the distributed system. We report on the status of a project undertaken during the...
Bruno Lange Ramos
(Univ. Federal do Rio de Janeiro (BR)),
4/14/15, 5:45 PM
Track4: Middleware, software development and tools, experiment frameworks, tools for distributed computing
oral presentation
In order to manage a heterogeneous and worldwide collaboration, the ATLAS experiment developed web systems that range from supporting the process of publishing scientific papers to monitoring equipment radiation levels. These systems are vastly supported by Glance, a technology that was set forward in 2004 to create an abstraction layer on top of different databases; it automatically...
Andrew Hanushevsky
(STANFORD LINEAR ACCELERATOR CENTER)
4/14/15, 6:00 PM
Track4: Middleware, software development and tools, experiment frameworks, tools for distributed computing
oral presentation
As more experiments move to a federated model of data access the environment becomes highly distributed and decentralized. In many cases this may pose obstacles in quickly resolving site issues; especially given vast time-zone differences. Spurred by ATLAS needs, Release 4 of XRootD incorporates a special mode of access to provide remote debugging capabilities. Essentially, XRootD allows a...
Dr
Maria Grazia Pia
(Universita e INFN (IT))
4/16/15, 9:00 AM
Track4: Middleware, software development and tools, experiment frameworks, tools for distributed computing
oral presentation
Testable physics by design
The validation of physics calculations requires the capability to thoroughly test them. The difficulty of exposing parts of the software to adequate testing can be the source of incorrect physics functionality, which in turn may generate hard to identify systematic effects in physics observables produced by the experiments.
Starting from real-life examples...
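As a minimal illustration of designing physics code for testability (the formula and tolerances here are generic examples, not taken from the talk): isolate the calculation in a small function so it can be checked against exact reference values.

```python
# Sketch of "testable physics by design": a physics formula kept in
# a small, pure function is trivially exposed to unit testing. The
# example uses the relativistic gamma factor; tolerances are invented.
import math

def lorentz_gamma(beta):
    if not 0 <= beta < 1:
        raise ValueError("beta must be in [0, 1)")
    return 1.0 / math.sqrt(1.0 - beta * beta)

def test_lorentz_gamma():
    # reference: beta = 0.6 gives gamma = 1.25 exactly
    assert math.isclose(lorentz_gamma(0.6), 1.25, rel_tol=1e-12)
    assert math.isclose(lorentz_gamma(0.0), 1.0, rel_tol=1e-12)

test_lorentz_gamma()
```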
Elisabetta Ronchieri
(INFN)
4/16/15, 9:15 AM
Track4: Middleware, software development and tools, experiment frameworks, tools for distributed computing
oral presentation
Geant4 is a widespread simulation system of "particles through matter" used in several experimental areas from high energy physics and nuclear experiments to medical studies. Some of its applications may involve critical use cases; therefore they would benefit from an objective assessment of the software quality of Geant4. The issue of maintainability is especially relevant for such a widely...
Danilo Piparo
(CERN)
4/16/15, 9:30 AM
Track4: Middleware, software development and tools, experiment frameworks, tools for distributed computing
oral presentation
The sixth release cycle of ROOT is characterised by a radical modernisation of the core software technologies the toolkit relies on: language standard, interpreter, and hardware exploitation mechanisms. If, on the one hand, the change offered the opportunity of consolidating the existing codebase, in the presence of such innovations, maintaining the balance between full backward compatibility and...
Philippe Canal
(Fermi National Accelerator Lab. (US))
4/16/15, 9:45 AM
Track4: Middleware, software development and tools, experiment frameworks, tools for distributed computing
oral presentation
Following the release of version 6, ROOT has entered a new era of development. It will leverage the industrial-strength compiler library shipping in ROOT 6 and its support of the C++11/14 standard to significantly simplify and harden ROOT's interfaces and to clarify and substantially improve ROOT's support for multi-threaded environments.
This talk will also recap the most important new...
Mr
Giulio Eulisse
(Fermi National Accelerator Lab. (US))
4/16/15, 10:15 AM
Track4: Middleware, software development and tools, experiment frameworks, tools for distributed computing
oral presentation
In recent years the size and scale of scientific computing has grown significantly. Computing facilities have grown to the point where energy availability and costs have become important limiting factors for data-center size and density. At the same time, power density limitations in processors themselves are driving interest in more heterogeneous processor architectures. Optimizing...
Oliver Keeble
(CERN)
4/16/15, 11:00 AM
Track4: Middleware, software development and tools, experiment frameworks, tools for distributed computing
oral presentation
The overall success of LHC data processing depends heavily on stable, reliable and fast data distribution. The Worldwide LHC Computing Grid (WLCG) relies on the File Transfer Service (FTS) as the data movement middleware for moving sets of files from one site to another.
This paper describes the components of FTS3 monitoring infrastructure and how they are built to satisfy the common and...
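For illustration, a sketch of the kind of monitoring consumer such an infrastructure serves; the endpoint URL and JSON layout here are hypothetical, not FTS3's actual REST schema.

```python
# Sketch of polling a transfer service for job states and aggregating
# them per state, the basic task of a transfer-monitoring consumer.
# The URL and JSON layout are hypothetical, not FTS3's real schema.
from collections import Counter
import requests

def transfer_states(base_url="https://fts3.example.org:8446"):
    resp = requests.get(f"{base_url}/jobs", params={"limit": 100},
                        timeout=10)
    resp.raise_for_status()
    # assume the (hypothetical) endpoint returns a list of job records
    return Counter(job["job_state"] for job in resp.json())

if __name__ == "__main__":
    print(transfer_states())
```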
Luca Mascetti
(CERN)
4/16/15, 11:15 AM
Track4: Middleware, software development and tools, experiment frameworks, tools for distributed computing
oral presentation
CERNBox is a cloud synchronisation service for end-users: it allows them to sync and share files on all major mobile and desktop platforms (Linux, Windows, MacOSX, Android, iOS), aiming to provide offline availability for any data stored in the CERN EOS infrastructure.
The successful beta phase of the service confirmed the high demand in the community for such easily accessible cloud storage...
Parag Mhashilkar
(Fermi National Accelerator Laboratory)
4/16/15, 11:30 AM
Track4: Middleware, software development and tools, experiment frameworks, tools for distributed computing
oral presentation
The FabrIc for Frontier Experiments (FIFE) program is an ambitious, major-impact initiative within the Fermilab Scientific Computing Division designed to lead the computing model development for Fermilab experiments and external projects. FIFE is a collaborative effort between physicists and computing professionals to provide computing solutions for experiments of varying scale, needs, and...
Tom Uram
(Argonne National Laboratory)
4/16/15, 11:45 AM
Track4: Middleware, software development and tools, experiment frameworks, tools for distributed computing
oral presentation
HEP's demand for computing resources has grown beyond the capacity of the Grid, and these demands will accelerate with the higher energy and luminosity planned for Run II. Mira, the ten petaflops supercomputer at the Argonne Leadership Computing Facility, is a potentially significant compute resource for HEP research. Through an award of fifty million hours on Mira, we have delivered millions...
Dr
Robert Andrew Currie
(Imperial College Sci., Tech. & Med. (GB))
4/16/15, 12:00 PM
Track4: Middleware, software development and tools, experiment frameworks, tools for distributed computing
oral presentation
The DIRAC interware was originally developed within the LHCb VO as a common interface to access distributed resources, i.e. grids, clouds and local batch systems. It has been used successfully in this context by the LHCb VO for a number of years. In April 2013 the GridPP consortium in the UK decided to offer a DIRAC service to a number of small VOs. The majority of these had been...
Dr
Andrew Norman
(Fermilab)
4/16/15, 12:15 PM
Track4: Middleware, software development and tools, experiment frameworks, tools for distributed computing
oral presentation
As high energy physics experiments have grown, their operational needs and the requirements they place on computing systems have changed. These changes often require new technical solutions to meet the increased demands and functionality of the science. How do you effect sweeping change to core infrastructure without causing major interruptions to the scientific programs?
This paper explores the...