Computing in High Energy and Nuclear Physics (CHEP) 2012

US/Eastern
New York City, NY, USA

Dirk Duellmann (CERN), Michael Ernst (Unknown)
    • 8:30 AM 10:00 AM
      Plenary Skirball Center

      Convener: Michael Ernst (Unknown)
      • 8:30 AM
        Welcome to CHEP 2012 5m
        Speaker: Michael Ernst (Unknown)
        Video in CDS
      • 8:35 AM
        Welcome to NYU 10m
        Speaker: Dr Paul Horn (NYU Distinguished Scientist in Residence & Senior Vice Provost for Research)
      • 8:45 AM
        Keynote Address: High Energy Physics and Computing – Perspectives from DOE 45m
        Speaker: Mr Glen Crawford (DOE Office of Science)
        Slides
        Video in CDS
      • 9:30 AM
        LHC experience so far, prospects for the future 30m
        Speaker: Prof. Joe Incandela (UCSB)
        Slides
        Video in CDS
    • 10:00 AM 10:30 AM
      Coffee Break 30m Skirball Center

    • 10:30 AM 12:00 PM
      Plenary Skirball Center

      Convener: Dr David Malon (Argonne National Laboratory (US))
      • 10:30 AM
        HEP Computing 30m
        Speaker: Dr Rene Brun (CERN)
        Slides
        Video in CDS
      • 11:00 AM
        Upgrade of the LHC Experiment Online Systems 30m
        Speaker: Wesley Smith (University of Wisconsin (US))
        Slides
        Video in CDS
      • 11:30 AM
        Perspective Across The Technology Landscape: Existing Standards Driving Emerging Innovations 30m
        Speaker: Mr Forrest Norrod (Vice President & General Manager, Dell Worldwide Server Solutions)
        Slides
        Video in CDS
    • 12:00 PM 1:30 PM
      Lunch break 1h 30m
    • 1:30 PM 3:35 PM
      Online Computing Room 804/805 (Kimmel Center)

      Convener: Niko Neufeld (CERN)
      • 1:30 PM
        ALICE moves into warp drive. 25m
        A Large Ion Collider Experiment (ALICE) is the heavy-ion detector designed to study the physics of strongly interacting matter and the quark-gluon plasma at the CERN Large Hadron Collider (LHC). Since its successful start-up in 2010, the LHC has been performing outstandingly, providing the experiments with long periods of stable collisions and an integrated luminosity that greatly exceeds the planned targets. To fully exploit these privileged conditions, we aim at maximizing the experiment's data-taking productivity during stable collisions. This paper presents the evolution of the online systems undertaken to spot reasons of inefficiency and to address new requirements. It describes the features added to the ALICE Electronic Logbook (eLogbook) to allow the Run Coordination team to identify, prioritize, fix and follow the causes of inefficiency in the experiment. Thorough monitoring of the data-taking efficiency provides reports for the collaboration to portray its evolution and evaluate the measures (fixes and new features) taken to increase it. In particular, the eLogbook helps decision making by providing quantitative input, which can be used to better balance the risks of changes in the production environment against potential gains in the quantity and quality of physics data. The paper also presents the evolution of the Experiment Control System (ECS) to allow on-the-fly error recovery actions of the detector apparatus while limiting as much as possible the loss of integrated luminosity, and it concludes with a review of the ALICE efficiency since the start-up of the LHC and the future plans to improve its monitoring.
        Speaker: Mr Vasco Chibante Barroso (CERN)
        Slides
        Video in CDS
      • 1:55 PM
        ALICE HLT TPC Tracking of Heavy-Ion Events on GPUs 25m
        The ALICE High Level Trigger (HLT) is capable of performing an online reconstruction of heavy-ion collisions. The reconstruction of particle trajectories in the Time Projection Chamber (TPC) is the most compute-intensive step. The TPC online tracker implementation combines the principle of the cellular automaton and the Kalman filter. It has been accelerated by the use of graphics cards (GPUs). Pipelined processing allows the tracking on the GPU, the data transfer, and the preprocessing on the CPU to run in parallel. In order to exploit data locality, the tracking is split into multiple phases. At first, track segments are searched in local sectors of the detector, independently and in parallel. These segments are then merged at a global level. A shortcoming of this approach is that if a track contains only a very short segment in one particular sector, the local search may not find this short part. The fast GPU processing made it possible to add an additional step: all found tracks are extrapolated to neighboring sectors and the unassigned clusters which constitute the missing track segment are collected. For running the QA on computers without a GPU, it is important that the output of the CPU and the GPU tracker be as consistent as possible. One major challenge was to implement the tracker such that the output is not affected by concurrency, while maintaining peak performance and efficiency. For instance, a naive implementation depended on the order of the tracks, which is nondeterministic when they are created in parallel. Still, due to non-associative floating-point arithmetic, a direct binary comparison of the CPU and the GPU tracker output is impossible. Thus, the approach chosen for evaluating the GPU tracker efficiency is to compare the cluster-to-track assignment of the CPU and the GPU tracker cluster by cluster (an illustrative sketch of such a comparison follows this entry). With this comparison scheme, the outputs of the CPU and the GPU tracker differ by 0.00024%. The GPU tracker outperforms its CPU analog by a factor of three. Recently, the ALICE HLT cluster was upgraded with new GPUs and will be able to process central heavy-ion events at a rate of approximately 200 Hz. The tracking algorithm together with the necessary modifications, a performance comparison of the CPU and the GPU version, and QA plots will be presented.
        Speaker: David Michael Rohr (Johann-Wolfgang-Goethe Univ. (DE))
        Slides
        Video in CDS
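        The cluster-by-cluster comparison described above can be pictured with a small sketch. The snippet below is not ALICE HLT code: the map layout, the "-1 means unassigned" convention and the function names are assumptions made purely for illustration, and the matching of track labels between the two trackers is left out of scope.

```cpp
// Minimal sketch of a cluster-by-cluster comparison of two trackers' output.
#include <cstdio>
#include <unordered_map>

// clusterId -> track label assigned by a tracker (-1 means "unassigned")
using Assignment = std::unordered_map<long, int>;

// Fraction of clusters whose assignment differs between the two trackers.
double assignmentDifference(const Assignment& cpu, const Assignment& gpu) {
  long total = 0, differing = 0;
  for (const auto& [clusterId, cpuLabel] : cpu) {
    ++total;
    auto it = gpu.find(clusterId);
    int gpuLabel = (it == gpu.end()) ? -1 : it->second;
    if (gpuLabel != cpuLabel) ++differing;
  }
  return total ? static_cast<double>(differing) / total : 0.0;
}

int main() {
  Assignment cpu{{1, 10}, {2, 10}, {3, 11}, {4, -1}};
  Assignment gpu{{1, 10}, {2, 10}, {3, 12}, {4, -1}};
  std::printf("differing fraction: %g\n", assignmentDifference(cpu, gpu));
}
```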
      • 2:20 PM
        Performance of the ATLAS trigger system 25m
        The ATLAS trigger has been used very successfully to collect collision data during the 2009-2011 LHC running at centre-of-mass energies between 900 GeV and 7 TeV. The three-level trigger system reduces the event rate from the design bunch-crossing rate of 40 MHz to an average recording rate of about 300 Hz. The first level uses custom electronics to reject most background collisions, in less than 2.5 μs, using information from the calorimeter and muon detectors. The upper two trigger levels are software-based triggers. The trigger system selects events by identifying signatures of muon, electron, photon, tau lepton, jet, and B meson candidates, as well as using global event signatures, such as missing transverse energy. We give an overview of the performance of these trigger selections based on extensive online running during the 2011 LHC run and discuss issues encountered during 2011 operations. Distributions of key selection variables calculated at the different trigger levels are shown and compared with offline reconstruction. Trigger efficiencies with respect to offline reconstructed signals are shown and compared to simulation, illustrating a very good level of understanding of the detector and trigger performance. We describe how the trigger has evolved with increasing LHC luminosity, coping with pile-up conditions close to the LHC design luminosity.
        Speaker: Diego Casadei (New York University (US))
        Slides
        Video in CDS
      • 2:45 PM
        Applications of advanced data analysis and expert system technologies in ATLAS Trigger-DAQ Controls framework 25m
        The Trigger and DAQ (TDAQ) system of the ATLAS experiment is a very complex distributed computing system, composed of O(10000) applications running on more than 2000 computers. The TDAQ Controls system has to guarantee the smooth and synchronous operation of all TDAQ components and has to provide the means to minimize the downtime of the system caused by runtime failures, which are inevitable for a system of such scale and complexity. During data-taking runs, streams of information messages sent or published by TDAQ applications are the main sources of knowledge about the correctness of running operations. The huge flow of operational monitoring data produced (at an average rate of O(1-10 kHz)) is constantly monitored by experts to detect problems or misbehavior. Given the scale of the system and the rates of data to be analyzed, the automation of the Controls system functionality in the areas of operational monitoring, system verification, error detection and recovery is a strong requirement. It reduces the operations manpower needs and assures a constant, high quality of problem detection and subsequent recovery. To accomplish its objective, the Controls system includes high-level components based on advanced software technologies, namely rule-based expert system (ES) and complex event processing (CEP) engines. The chosen techniques allow the TDAQ experts' knowledge to be formalized, stored and reused in the Controls framework and thus assist the TDAQ shift crew in accomplishing its task. The DVS (Diagnostics and Verification System) and Online Recovery components are responsible for the automation of system testing and verification, diagnostics of failures and recovery procedures. These components are built on top of a common forward-chaining ES framework (based on the CLIPS expert system shell) that allows the behavior of the system to be programmed in terms of "if-then" rules and the knowledge base to be easily extended or modified (a toy illustration of this rule style follows this entry). The core of the AAL (Automated monitoring and AnaLysis) component is a CEP engine implemented using ESPER in Java. The engine is loaded with a set of directives; it performs correlation and analysis of operational messages and events and produces operator-friendly alerts, assisting TDAQ operators to react promptly in case of problems or to perform important routine tasks. The component is known to shifters as the "Shifter Assistant" (SA), and its introduction allowed the number of shifters in the ATLAS control room to be reduced. The design foresees a machine-learning module to detect anomalies and problems that cannot be defined in advance. The described components are constantly used for ATLAS Trigger-DAQ system operations, and the knowledge base is growing as more expertise is acquired. By the end of 2011 the size of the knowledge base used for TDAQ operations was about 300 rules. The paper presents the design and current implementation of the components as well as the experience of their use in the real operational environment of the ATLAS experiment.
        Speaker: Dr Giuseppe Avolio (University of California Irvine (US))
        Slides
        Video in CDS
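        To give a flavour of the "if-then" rule style that a forward-chaining expert system provides, here is a toy rule loop in C++. It is purely illustrative: the fact strings, rule name and recovery action are invented and bear no relation to the actual CLIPS-based TDAQ knowledge base.

```cpp
// Toy forward-chaining loop: rules fire on facts and may assert new facts.
#include <functional>
#include <iostream>
#include <set>
#include <string>
#include <vector>

using Facts = std::set<std::string>;

struct Rule {
  std::string name;
  std::function<bool(const Facts&)> condition;  // the "if" part
  std::function<void(Facts&)> action;           // the "then" part (may assert new facts)
};

int main() {
  Facts facts{"app_crashed(appA)", "run_state(RUNNING)"};   // invented facts
  std::vector<Rule> rules{
      {"restart-crashed-app",
       [](const Facts& f) {
         return f.count("app_crashed(appA)") && !f.count("recovery_issued(appA)");
       },
       [](Facts& f) {
         std::cout << "alert: issue restart of appA\n";
         f.insert("recovery_issued(appA)");                 // asserting this fact disables the rule
       }}};

  // Forward chaining: keep firing rules until the fact base is stable.
  bool fired = true;
  while (fired) {
    fired = false;
    for (const auto& r : rules)
      if (r.condition(facts)) { r.action(facts); fired = true; }
  }
}
```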
      • 3:10 PM
        The operational performance of the ATLAS trigger and data acquisition system and its possible evolution 25m
        The ATLAS experiment at the Large Hadron Collider at CERN relies on a complex and highly distributed Trigger and Data Acquisition (TDAQ) system to gather and select particle collision data at unprecedented energies and rates. The TDAQ is composed of three levels which reduce the event rate from the design bunch-crossing rate of 40 MHz to an average event recording rate of about 200 Hz. The first part of this presentation gives an overview of the operational performance of the DAQ system during 2011 and the first months of data taking in 2012. It describes how the flexibility inherent in the design of the system has been exploited to meet the changing needs of ATLAS data taking and, in some cases, to push performance beyond the original design specification. The experience accumulated in ATLAS DAQ/HLT system operation during these years has also stimulated interest in exploring possible evolutions, despite the success of the current design. One attractive direction is to merge three systems - the second trigger level (L2), the Event Builder (EB), and the Event Filter (EF) - into a single homogeneous one in which each HLT node executes all the steps required by the trigger and data acquisition process. Each L1 event is assigned to an available HLT node which executes the L2 algorithms using a subset of the event data and, upon positive selection, builds the event, which is further processed by the EF algorithms (a schematic sketch of this flow follows this entry). Appealing aspects of this design are: a simplification of the software architecture and of its configuration, a better exploitation of the computing resources, the caching of fragments already collected for L2 processing, the automated load balancing between L2 and EF selection steps, and the sharing of code and services on HLT nodes. Furthermore, the full treatment of the HLT selection on a single node allows more flexible approaches, for example "incremental event building" in which trigger algorithms progressively enlarge the size of the analysed region of interest before requiring the building of the complete event. To spot possible limitations of the new approach and to demonstrate the benefits outlined above, a prototype has been implemented. The preliminary measurements are positive and further tests are scheduled for the next months. Their results are the subject of this paper.
        Speaker: Andrea Negri (Universita e INFN (IT))
        Slides
        Video in CDS
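        The merged L2/EB/EF flow described in the abstract can be summarised in a few lines of control flow. All types and functions below are invented placeholders, not the ATLAS DAQ/HLT interfaces; the stubs exist only so the sketch compiles.

```cpp
// One node handles L2, event building and EF for a given L1-accepted event.
#include <cstdio>
#include <optional>
#include <vector>

struct Fragment {};                                  // data from one read-out buffer
struct Event { std::vector<Fragment> fragments; };

// Stub implementations standing in for the real data-flow and selection code.
std::vector<Fragment> fetchRegionOfInterest(unsigned) { return {Fragment{}}; }
Event buildFullEvent(unsigned) { return Event{{Fragment{}, Fragment{}}}; }
bool runL2(const std::vector<Fragment>& roi) { return !roi.empty(); }
bool runEF(const Event& ev) { return ev.fragments.size() > 1; }

std::optional<Event> processOnSingleNode(unsigned l1Id) {
  auto roi = fetchRegionOfInterest(l1Id);   // only a subset of the event data
  if (!runL2(roi)) return std::nullopt;     // early reject: the full event is never built
  Event ev = buildFullEvent(l1Id);          // fragments fetched for L2 could be cached and reused
  if (!runEF(ev)) return std::nullopt;      // full-event selection
  return ev;
}

int main() {
  std::printf("event %s\n", processOnSingleNode(42) ? "accepted" : "rejected");
}
```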
    • 1:30 PM 3:35 PM
      Distributed Processing and Analysis on Grids and Clouds Eisner & Lubin Auditorium (Kimmel Center)

      Convener: Oliver Gutsche (Fermi National Accelerator Lab. (US))
      • 1:30 PM
        AliEn: ALICE Environment on the GRID 25m
        AliEn is the GRID middleware used by the ALICE collaboration. It provides all the components that are needed to manage the distributed resources. AliEn is used for all the computing workflows of the experiment: Monte Carlo production, data replication and reconstruction, and organized or chaotic user analysis. Moreover, AliEn is also being used by other experiments such as PANDA and CBM. The main components of AliEn are a centralized file and metadata catalogue, a job execution model and a file replication model. These three components have been evolving over the last 10 years to make sure that they satisfy the computing requirements of the experiment, which keep increasing every year.
        Speaker: Pablo Saiz (CERN)
        Slides
        Video in CDS
      • 1:55 PM
        The ATLAS Distributed Data Management project: Past and Future 25m
        The ATLAS collaboration has recorded almost 5 PB of RAW data since the LHC started running at the end of 2009. Together with experimental data generated from RAW and complementary simulation data, and accounting for data replicas on the grid, a total of 74 PB is currently stored in the Worldwide LHC Computing Grid by ATLAS. All of this data is managed by the ATLAS Distributed Data Management system, called Don Quixote 2 (DQ2). The DQ2 system has rapidly evolved over time to assist the ATLAS collaboration in properly managing the data, as well as to provide an effective interface allowing physicists easy access to it. Numerous new requirements and the operational experience of ATLAS' use cases have motivated a next-generation data management system, called Rucio, which will re-engineer the current system to cover new high-level use cases and workflows such as the management of data for physics groups. In this talk, we will describe the current state of DQ2 and present an overview of the upcoming Rucio system, covering its architecture, new innovative features, and preliminary benchmarks.
        Speaker: Vincent Garonne (CERN)
        Slides
        Video in CDS
      • 2:20 PM
        The CMS workload management system 25m
        CMS has started the process of rolling out a new workload management system. This system is currently used for reprocessing and Monte Carlo production, with tests under way to use it for user analysis. It was decided to combine, as much as possible, the production/processing, analysis and T0 codebases so as to reduce duplicated functionality and make the best use of limited developer and testing resources. This system now includes central request submission and management (Request Manager), a task queue for parcelling up and distributing work (WorkQueue), and agents which process requests by interfacing with disparate batch and storage resources (WMAgent). (A toy sketch of the work-parcelling idea follows this entry.)
        Speaker: Dr Stuart Wakefield (Imperial College London)
        Slides
        Video in CDS
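        As a rough illustration of the "parcelling up and distributing work" role of a work queue, the sketch below splits a request into fixed-size chunks that agents could pull independently. The names, the chunking granularity and the queue type are assumptions; this is not CMS WMCore code.

```cpp
// Split a request for N events into chunks an agent can process independently.
#include <algorithm>
#include <iostream>
#include <queue>
#include <string>

struct WorkChunk {
  std::string request;   // which request this chunk belongs to
  long firstEvent;
  long nEvents;
};

std::queue<WorkChunk> splitRequest(const std::string& request,
                                   long totalEvents, long chunkSize) {
  std::queue<WorkChunk> q;
  for (long first = 0; first < totalEvents; first += chunkSize)
    q.push({request, first, std::min(chunkSize, totalEvents - first)});
  return q;
}

int main() {
  auto queue = splitRequest("MonteCarlo_Example_v1", 1'000'000, 150'000);
  while (!queue.empty()) {             // an agent would pull and run each chunk
    const auto& c = queue.front();
    std::cout << c.request << ": events " << c.firstEvent << " to "
              << c.firstEvent + c.nEvents - 1 << '\n';
    queue.pop();
  }
}
```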
      • 2:45 PM
        The LHCb Data Management System 25m
        The LHCb Data Management System is based on the DIRAC Grid Community Solution. LHCbDirac provides extensions to the basic DMS, such as a Bookkeeping System. Datasets are defined as sets of files corresponding to a given query in the Bookkeeping system. Datasets can be manipulated by CLI tools as well as by automatic transformations (removal, replication, processing). Dataset replication is handled dynamically, based on disk space usage at the sites and dataset popularity. For custodial storage, an on-demand recall of files from tape is performed, driven by the requests of the jobs, including disk cache handling. We shall describe all the tools that are available for Data Management, from the handling of large datasets to basic tools for users, as well as tools for monitoring the dynamic behaviour of the LHCb storage capacity.
        Speaker: Philippe Charpentier (CERN)
        Slides
        Video in CDS
      • 3:10 PM
        The "Common Solutions" Strategy of the Experiment Support group at CERN for the LHC Experiments 25m
        After two years of LHC data taking, processing and analysis, and with numerous changes in computing technology, a number of aspects of the experiments’ computing, as well as of WLCG deployment and operations, need to evolve. As part of the activities of the Experiment Support group in CERN’s IT department, and reinforced by effort from the EGI-InSPIRE project, we present work aimed at common solutions across all LHC experiments. Such solutions allow us not only to optimize development manpower but also offer lower long-term maintenance and support costs. The main areas cover Distributed Data Management, Data Analysis, Monitoring and the LCG Persistency Framework. Specific tools have been developed, including the HammerCloud framework, automated services for data placement, data cleaning and data integrity (such as the data popularity service for CMS, the common Victor cleaning agent for ATLAS and CMS and tools for catalogue/storage consistency), the Dashboard Monitoring framework (job monitoring, data management monitoring, File Transfer monitoring) and the Site Status Board. This talk focuses primarily on the strategic aspects of providing such common solutions and how this relates to the overall goals of long-term sustainability and to the relationship with the various WLCG Technical Evolution Groups.
        Speaker: Dr Maria Girone (CERN)
        Slides
        Video in CDS
    • 1:30 PM 3:35 PM
      Computer Facilities, Production Grids and Networking Room 914 (Kimmel Center)

      Convener: Dr Daniele Bonacorsi (University of Bologna)
      • 1:30 PM
        Intercontinental Multi-Domain Monitoring for the LHC Optical Private Network 25m
        The Large Hadron Collider (LHC) is currently running at CERN in Geneva, Switzerland. Physicists are using the LHC to recreate the conditions just after the Big Bang, by colliding two beams of particles and heavy ions head-on at very high energy. The project is expected to generate 27 TB of raw data per day, plus 10 TB of "event summary data". This data is sent out from CERN to eleven Tier 1 academic institutions in Europe, Asia, and North America using a multi-gigabit Optical Private Network (OPN), the LHCOPN. Network monitoring on such a complex network architecture, to ensure robust and reliable operation, is of crucial importance. The chosen approach for monitoring the OPN is based on the perfSONAR MDM framework (http://perfsonar.geant.net), which is designed for multi-domain monitoring environments. perfSONAR (www.perfsonar.net) is an infrastructure for performance monitoring data exchange between networks, making it easier to solve performance problems occurring between network measurement points interconnected through several network domains. It contains a set of services delivering performance measurements in a multi-domain environment. These services act as an intermediate layer between the performance measurement tools and the visualization applications. This layer is aimed at exchanging performance measurements between networks using well-defined protocols. perfSONAR is web-service based, modular, and uses the NM-WG OGF standards. perfSONAR MDM is the perfSONAR version built by GÉANT (www.geant.net), the consortium operating the European backbone for research and education. Given the quite particular structure of the LHCOPN, a specially customised version of perfSONAR MDM was prepared by an international consortium for the specific monitoring of IP and circuits of the LHC Optical Private Network. The presentation will introduce the main points of the LHCOPN structure, provide an introduction to the perfSONAR framework (software, architecture, service structure) and finally describe the way the whole monitoring infrastructure is monitored and how the support is organised.
        Speaker: Dr Domenico Vicinanza (DANTE)
        Slides
        Video in CDS
      • 1:55 PM
        From IPv4 to eternity - the HEPiX IPv6 working group 25m
        The much-heralded exhaustion of the IPv4 networking address space has finally started. While many of the research and education networks have been ready and poised for years to carry IPv6 traffic, there is a well-known lack of academic institutes using the new protocols. One reason for this is an obvious absence of pressure, due to the extensive use of NAT and the fact that most institutes currently still have sufficient IPv4 addresses. More importantly, moving distributed applications to IPv6 involves much more than the routing, naming and addressing solutions provided by campus and national networks. Application communities need to perform a full analysis of their applications, middleware and tools to confirm how much development work is required to use IPv6 and to plan a smooth transition (a minimal example of protocol-agnostic client code follows this entry). A new working group of HEPiX (http://www.hepix.org) was formed in Spring 2011 to address exactly these issues for the High Energy Physics community. The HEPiX IPv6 Working Group has been investigating the many issues which feed into the decision on the timetable for a transition to the use of IPv6 in HEP computing, in particular for the Worldwide LHC Computing Grid (http://lcg.web.cern.ch/lcg/). The activities include the analysis and testing of the readiness for IPv6, and the performance, of many different components, including the applications, middleware, management and monitoring tools essential for HEP computing. A distributed IPv6 testbed has been deployed and used for this purpose and we have been working closely with the HEP experiment collaborations. The working group is also considering other operational issues, such as the implications for security arising from a move to IPv6. This paper describes the work done by the HEPiX IPv6 working group since its inception and presents our current conclusions and recommendations.
        Speaker: Edoardo Martelli (CERN)
        Slides
        Video in CDS
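        One concrete example of the application-level readiness such an analysis looks for: code that resolves and connects through getaddrinfo with AF_UNSPEC works transparently over either IPv6 or IPv4, whereas code hard-wired to AF_INET and sockaddr_in does not. The snippet below uses plain POSIX sockets and is not taken from any HEP middleware; the host name is only an example.

```cpp
// Protocol-agnostic TCP connect: tries every address returned by the resolver.
#include <cstdio>
#include <netdb.h>
#include <sys/socket.h>
#include <unistd.h>

int connectTo(const char* host, const char* port) {
  addrinfo hints{}, *res = nullptr;
  hints.ai_family = AF_UNSPEC;      // accept IPv6 or IPv4 results
  hints.ai_socktype = SOCK_STREAM;
  if (getaddrinfo(host, port, &hints, &res) != 0) return -1;

  int fd = -1;
  for (addrinfo* ai = res; ai != nullptr; ai = ai->ai_next) {
    fd = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
    if (fd < 0) continue;
    if (connect(fd, ai->ai_addr, ai->ai_addrlen) == 0) break;  // first family that works
    close(fd);
    fd = -1;
  }
  freeaddrinfo(res);
  return fd;   // -1 if no address family could connect
}

int main() {
  int fd = connectTo("www.hepix.org", "80");
  std::printf(fd >= 0 ? "connected\n" : "connection failed\n");
  if (fd >= 0) close(fd);
}
```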
      • 2:45 PM
        Overview of storage operations at CERN 25m
        Large-volume physics data storage at CERN is based on two services, CASTOR and EOS: * CASTOR - in production for many years - now handles the Tier0 activities (including WAN data distribution), as well as all tape-backed data; * EOS - in production since 2011 - supports the fast-growing need for high-performance, low-latency (i.e. disk-only) data access for user analysis. In 2011, a large part of the original CASTOR storage was migrated into EOS, which grew from the original testbed installation (1 PB usable capacity for ATLAS) to over 6 PB for ATLAS and CMS. EOS has been validated for several months under production conditions and has already replaced several CASTOR service classes. CASTOR also evolved during this time with major improvements in critical areas - notably the internal scheduling of requests, the simplification of the database structure and a complete overhaul of the tape subsystem. The talk will compare the two systems from an operations perspective (setup, day-by-day user support, upgrades, resilience to common failures) while taking into account their different scope. In the case of CASTOR we will analyse the impact of the 2011 improvements on delivering Tier0 services, while for EOS we will focus on the steps taken to achieve a production-quality service. For both systems, the upcoming changes will be discussed in relation to the evolution of the LHC programme and computing models (data volumes, access patterns, relations among computing sites).
        Speakers: Jan Iven (CERN), Massimo Lamanna (CERN)
        Slides
        Video in CDS
      • 3:10 PM
        CMS Data Transfer operations after the first years of LHC collisions 25m
        The CMS experiment possesses a distributed computing infrastructure, and its performance heavily depends on the fast and smooth distribution of data between different CMS sites. Data must be transferred from the Tier-0 (CERN) to the Tier-1 sites for storage and archiving, and timeliness and good quality are vital to avoid overflowing the CERN storage buffers. At the same time, processed data have to be distributed from Tier-1 sites to all Tier-2 sites for physics analysis, while Monte Carlo simulations are synchronized back to Tier-1 sites for further archival. At the core of the transfer machinery is the PhEDEx (Physics Experiment Data Export) data transfer system. It is very important to ensure reliable operation of the system, and the operational tasks comprise monitoring and debugging all transfer issues. Based on transfer quality information, the Site Readiness tool is used to create plans for future resource utilization. We review the operational procedures created to enforce reliable data delivery to CMS distributed sites all over the world. Additionally, we need to keep data consistent at all sites, both on disk and on tape. In this presentation, we describe the principles and actions taken to keep data consistent on site storage systems and in the central CMS Data Replication Database (TMDB/DBS), while ensuring fast and reliable delivery of data samples of hundreds of terabytes to the entire CMS physics community. (A toy sketch of a site-consistency check follows this entry.)
        Speaker: Rapolas Kaselis (Vilnius University (LT))
        Slides
        Video in CDS
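        The site-consistency idea mentioned at the end of the abstract amounts to a set comparison between what the central catalogue expects at a site and what the site's storage dump actually contains. The sketch below is illustrative only; the file names are invented and this is not the PhEDEx/TMDB tooling.

```cpp
// Compare a central catalogue's file list with a site storage dump.
#include <iostream>
#include <set>
#include <string>

struct ConsistencyReport {
  std::set<std::string> missingAtSite;  // in catalogue, not on storage
  std::set<std::string> orphansAtSite;  // on storage, not in catalogue
};

ConsistencyReport compare(const std::set<std::string>& catalogue,
                          const std::set<std::string>& storageDump) {
  ConsistencyReport r;
  for (const auto& f : catalogue)
    if (!storageDump.count(f)) r.missingAtSite.insert(f);
  for (const auto& f : storageDump)
    if (!catalogue.count(f)) r.orphansAtSite.insert(f);
  return r;
}

int main() {
  std::set<std::string> catalogue{"/store/data/run1/a.root", "/store/data/run1/b.root"};
  std::set<std::string> dump{"/store/data/run1/b.root", "/store/data/run1/c.root"};
  auto r = compare(catalogue, dump);
  std::cout << r.missingAtSite.size() << " missing, "
            << r.orphansAtSite.size() << " orphan file(s)\n";
}
```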
    • 1:30 PM 3:35 PM
      Software Engineering, Data Stores and Databases Room 802 (Kimmel Center)

      Convener: David Lange (Lawrence Livermore Nat. Laboratory (US))
      • 1:30 PM
        Cling - The LLVM-based C++ Interpreter 25m
        Cling (http://cern.ch/cling) is a C++ interpreter, built on top of clang (http://clang.llvm.org) and LLVM (http://llvm.org). Like its predecessor CINT, cling offers an interactive, terminal-like prompt. It enables exploratory programming with rapid edit/run cycles (a brief example of such a session follows this entry). The ROOT team has more than 15 years of experience with C++ interpreters, and this has been fully exploited in the design of cling. However, matching the concepts of an interpreter to a compiler library is a non-trivial task; we will explain how this is done for cling, and how we managed to implement cling as a small (10,000 lines of code) extension to the clang and LLVM libraries. The resulting features clearly show the advantages of basing an interpreter on a compiler. Cling benefits from clang's concise and easy-to-understand diagnostics. Building an interpreter on top of a compiler library makes the transition between interpreted and compiled code much easier and smoother. We will present the design, e.g. how cling treats the C++ extensions that used to be available in CINT. We will also present the new features, e.g. how C++11 will come to cling, and how dictionaries will be simplified thanks to cling. We describe the state of cling's integration in the ROOT framework.
        Speaker: Vasil Georgiev Vasilev (CERN)
        Slides
        Video in CDS
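        To illustrate the rapid edit/run cycle an interactive C++ prompt enables, here is the kind of session one might type at such a prompt. This is an indicative sketch, not a transcript of a particular cling version; the printed values noted in the comments are approximate.

```cpp
// Typed line by line at the interpreter prompt (illustrative session only).
#include <cmath>
#include <vector>

std::vector<double> pt{1.2, 3.4, 0.8};         // build some data on the fly
double sum = 0; for (double x : pt) sum += x;  // statements are executed immediately
sum / pt.size()    // an expression without ';' -> the prompt prints its value (about 1.8)
std::sqrt(sum)     // prints about 2.324
```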
      • 1:55 PM
        GOoDA: The Generic Optimization Data Analyzer 25m
        Modern superscalar, out-of-order microprocessors dominate large-scale server computing. Monitoring their activity during program execution has become complicated due to the complexity of the microarchitectures and their IO interactions. Recent processors have thousands of performance monitoring events. These are required to actually provide coverage for all of the complex interactions and performance issues that can occur. Knowing which data to collect and how to interpret the results has become an unreasonable burden for code developers whose tasks are already hard enough. It becomes the task of the analysis tool developer to bridge this gap. To address this issue, a generic decomposition of how a microprocessor uses the consumed cycles allows code developers to quickly understand which of the myriad of microarchitectural complexities they are battling, without requiring a detailed knowledge of the microarchitecture. When this approach is intrinsically integrated into a performance data analysis tool, it enables software developers to take advantage of a microarchitectural methodology that has so far only been available to experts. The Generic Optimization Data Analyzer (GOoDA) project integrates this expertise into a profiling tool in order to lower the required expertise of the user and, being designed from the ground up with large-scale object-oriented applications in mind, it will be particularly useful for large HENP codebases.
        Speaker: Roberto Agostino Vitillo (LBNL)
        Slides
        Video in CDS
      • 2:20 PM
        Massively parallel Markov chain Monte Carlo with BAT 25m
        The Bayesian Analysis Toolkit (BAT) is a C++ library designed to analyze data through the application of Bayes' theorem. For parameter inference, it is necessary to draw samples from the posterior distribution within the given statistical model. At its core, BAT uses an adaptive Markov chain Monte Carlo (MCMC) algorithm. As an example of a challenging task, we consider the analysis of rare B-decays in a global fit involving about 20 observables measured at the B-factories and by the CDF and LHCb collaborations. A single evaluation of the likelihood requires approximately 1 s. In addition to the 3 to 12 parameters of interest, there are on the order of 25 nuisance parameters describing uncertainties from standard model parameters as well as from unknown higher-order theory corrections and non-perturbative QCD effects. The resulting posterior distribution is multi-modal and shows significant correlation between parameters as well as pronounced degeneracies, hence standard MCMC methods fail to produce accurate results. Parallelization is the only solution to obtain a sufficient number of samples in reasonable time (a toy example of independent parallel chains follows this entry). We present an enhancement of existing MCMC algorithms, including the ability for massive parallelization on a computing cluster and, more importantly, a general scheme to induce rapid convergence even in the face of complicated posterior distributions.
        Speaker: Frederik Beaujean (Max Planck Institute for Physics)
        Slides
        Video in CDS
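        Below is a stripped-down illustration of running several independent Markov chains in parallel, in the spirit (but none of the detail) of the massively parallel MCMC described above. The target density is a toy one-dimensional Gaussian; BAT itself deals with multi-modal, correlated posteriors, adaptive proposals and cross-chain convergence checks.

```cpp
// Several independent Metropolis chains, one per thread.
#include <cmath>
#include <functional>
#include <iostream>
#include <random>
#include <thread>
#include <vector>

double logPosterior(double x) { return -0.5 * x * x; }  // toy target: standard Gaussian

void runChain(unsigned seed, int nSteps, double& acceptance) {
  std::mt19937 rng(seed);
  std::normal_distribution<double> proposal(0.0, 1.0);
  std::uniform_real_distribution<double> uniform(0.0, 1.0);
  double x = 0.0, logp = logPosterior(x);
  int accepted = 0;
  for (int i = 0; i < nSteps; ++i) {
    double xNew = x + proposal(rng);
    double logpNew = logPosterior(xNew);
    if (std::log(uniform(rng)) < logpNew - logp) {  // Metropolis accept/reject
      x = xNew; logp = logpNew; ++accepted;
    }
  }
  acceptance = static_cast<double>(accepted) / nSteps;
}

int main() {
  const int nChains = 4, nSteps = 100000;
  std::vector<double> acceptance(nChains);
  std::vector<std::thread> chains;
  for (int c = 0; c < nChains; ++c)          // each chain is fully independent
    chains.emplace_back(runChain, 1234 + c, nSteps, std::ref(acceptance[c]));
  for (auto& t : chains) t.join();
  for (int c = 0; c < nChains; ++c)
    std::cout << "chain " << c << " acceptance " << acceptance[c] << '\n';
}
```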
      • 2:45 PM
        Experiences with Software Quality Metrics in the EMI Middleware 25m
        The EMI Quality Model has been created to define, and later review, the EMI (European Middleware Initiative) software product and process quality. A quality model is based on a set of software quality metrics and helps to set clear and measurable quality goals for software products and processes. The EMI Quality Model follows the ISO/IEC 9126 Software Engineering – Product Quality to identify a set of characteristics that need to be present in the EMI software. For each software characteristic, such as portability, maintainability, compliance, etc, a set of associated metrics and KPIs (Key Performance Indicators) are identified. This article presents how the EMI Quality Model and the EMI Metrics have been defined in the context of the software quality assurance activities carried out in EMI. It also describes the measurement plan and presents some of the metrics reports that have been produced for the EMI releases and updates. It also covers which tools and techniques can be used by any software project to extract “code metrics” on the status of the software products and “process metrics” related to the quality of the development and support process such as reaction time to critical bugs, requirements tracking and delays in product releases.
        Speaker: Maria Alandes Pradillo (CERN)
        Slides
        Video in CDS
      • 3:10 PM
        Improving Software Quality of the ALICE Data-Acquisition System through Program Analysis 25m
        The Data-Acquisition System of ALICE, the experiment dedicated to the study of strongly interacting matter and the quark-gluon plasma at the CERN LHC (Large Hadron Collider), handles the data flow from the sub-detector electronics to the archiving on tape. The software framework of the ALICE data-acquisition system is called DATE (ALICE Data Acquisition and Test Environment) and consists of a set of software packages grouped into main logic packages and utility packages. In order to assess the software quality of DATE, and to review possible improvements, we implemented PAF (Program Analysis Framework) to analyze the software architecture and software modularity. The basic idea of PAF is to first record the call-relationship information among the important elements (i.e., functions, global variables, complex structures) and then to apply different analysis algorithms to this recorded information in order to find the crosscutting concerns which could destroy the modularity of the software (a toy illustration of this idea follows this entry). PAF is based on the API of the Eclipse C/C++ Development Tooling (CDT), because the source code of the DATE framework is written in C. The CDT project, based on the Eclipse platform, provides a fully functional C and C++ Integrated Development Environment. The PAF for DATE could also be used for the analysis of other projects written in C. Finally, we evaluate our framework by analyzing the DATE software system. The analysis results prove the effectiveness and efficiency of our framework. PAF has pinpointed a number of possible optimizations which could be applied to DATE and help maximize the software quality.
        Speaker: Mrs Jianlin Zhu (Huazhong Normal University (CN))
        Paper
        Slides
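        A toy version of the call-relationship analysis sketched in the abstract: record which symbols each module references, then flag symbols referenced from many modules as candidate crosscutting concerns. The module and symbol names below are invented; PAF extracts the real relationships from the DATE sources through the Eclipse CDT API.

```cpp
// Flag symbols referenced by several modules as potential crosscutting concerns.
#include <iostream>
#include <map>
#include <set>
#include <string>

int main() {
  // module -> functions/globals it references (invented names)
  std::map<std::string, std::set<std::string>> references{
      {"readout",    {"log_message", "read_fifo", "get_config"}},
      {"monitoring", {"log_message", "publish_histogram", "get_config"}},
      {"recorder",   {"log_message", "write_stream"}}};

  std::map<std::string, int> fanIn;            // symbol -> number of referencing modules
  for (const auto& entry : references)
    for (const auto& symbol : entry.second) ++fanIn[symbol];

  const int threshold = 2;                     // "referenced by more than one module"
  for (const auto& [symbol, count] : fanIn)
    if (count >= threshold)
      std::cout << symbol << " is referenced by " << count
                << " modules (possible crosscutting concern)\n";
}
```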
    • 1:30 PM 3:35 PM
      Collaborative tools Room 808 (Kimmel Center)

      Convener: Tony Johnson (Nuclear Physics Laboratory)
      • 1:30 PM
        Talking Physics: Can Social Media Teach HEP to Converse Again? 25m
        Og, commonly recognized as one of the earliest contributors to experimental particle physics, began his career by smashing two rocks together, then turning to his friend Zog and stating those famous words “oogh oogh”. It was not the rock-smashing that marked HEP’s origins, but rather the sharing of information, which then allowed Zog to confirm the important discovery, that rocks are indeed made of smaller rocks. Over the years, Socrates and other great teachers developed the methodology of this practice. Yet, as small groups of friends morphed into large classrooms of students, readers of journals, and audiences of television viewers, science conversation evolved into lecturing and broadcasting. While information is still conveyed in this manner, the invaluable, iterative nature of question/response is often lost or limited in duration. The birth of Web 2.0 and the development of Social Media tools, such as Facebook, Twitter and Google +, are allowing iterative conversation to reappear in nearly every aspect of communication. From comments on public articles and publications to “wall” conversations and tweets, physicists are finding themselves interacting with the public before, during and after publication. I discuss both the danger and the powerful potential of this phenomenon, and present methods currently used in HEP to make the best of it.
        Speaker: Steven Goldfarb (University of Michigan (US))
        Slides
        Video in CDS
      • 1:55 PM
        Code and papers: computing publication patterns in the LHC era 25m
        Publications in scholarly journals establish the body of knowledge deriving from scientific research; they also play a fundamental role in the career path of scientists and in the evaluation criteria of funding agencies. This presentation reviews the evolution of computing-oriented publications in HEP following the start of operation of LHC. Quantitative analyses are illustrated, which document the production of scholarly papers on computing-related topics by HEP experiments and core tools projects (including distributed computing R&D), and the citations they receive. Several scientometric indicators are analyzed to characterize the role of computing in HEP literature. Distinctive features of scholarly publication production in the software-oriented and hardware-oriented experimental HEP communities are highlighted. Current patterns and trends are compared to the situation in previous generations' HEP experiments at LEP, Tevatron and B-factories. The results of this scientometric analysis document objectively the contribution of computing to HEP scientific production and technology transfer to other fields. They also provide elements for discussion about how to more effectively promote the role played by computing-oriented research in high energy physics.
        Speaker: Dr Maria Grazia Pia (Universita e INFN (IT))
        Slides
        Video in CDS
      • 2:20 PM
        Indico: CERN Collaboration Hub 25m
        Since 2009, the development of Indico has focused on usability, performance and new features, especially those related to meeting collaboration. Usability studies have resulted in the biggest change Indico has experienced up to now, a new web layout that improves the user experience. Performance improvements have also been a key goal since 2010; the main features of Indico have been optimized remarkably. Along with usability and performance, new features have been added to Indico, such as webchat integration, video services bookings, and webcast and recording requests, designed to reinforce Indico's position as the main hub for all CERN collaboration services, and many others whose aim is to complete the conference lifecycle management. Indico development is also moving towards a broader collaboration where other institutes, hosting their own Indico instance, can contribute to the project in order to make it a better and more complete tool.
        Speaker: Pedro Ferreira (CERN)
        Slides
        Video in CDS
      • 2:45 PM
        The Workflow of LHC Papers 25m
        In this talk, we will explain how CERN digital library services have evolved to deal with the publication of the first results of the LHC. We will describe the work-flow of the documents on CERN Document Server and the diverse constraints relative to this work-flow. We will also give an overview on how the underlying software, Invenio, has been enriched to cope with special needs. In a second part, the impact in terms of user access to the publication of the experiments and to the multimedia material will be detailed. Finally, the talk will focus on how the institutional repository (CDS) is being linked to the HEP disciplinary archive (INSPIRE) in order to provide users with a central access point to reach LHC results.
        Speaker: Ludmila Marian (CERN)
        Slides
        Video in CDS
      • 3:10 PM
        A New Information Architecture, Web Site and Services for the CMS Experiment 25m
        The age and size of the CMS collaboration at the LHC means it now has many hundreds of inhomogeneous web sites and services and more than 100,000 documents. We describe a major initiative to create a single coherent CMS internal and public web site. This uses the Drupal web Content Management System (now supported by CERN/IT) on top of a standard LAMP stack (Linux, Apache, MySQL, and php/perl). The new navigation, content and search services are coherently integrated with numerous existing CERN services (CDS, EDMS, Indico, phonebook, Twiki) as well as many CMS internal Web services. We describe the information architecture; the system design, implementation and monitoring; the document and content database; security aspects; and our deployment strategy which ensured continual smooth operation of all systems at all times.
        Speaker: Lucas Taylor (Fermi National Accelerator Lab. (US))
        Slides
        Video in CDS
    • 1:30 PM 6:15 PM
      Poster Session: set-up for session 1 Rosenthal Pavilion (10th floor) (Kimmel Center)

    • 3:35 PM 4:35 PM
      Coffee Break 1h Kimmel Center

    • 4:35 PM 6:15 PM
      Online Computing Room 804/805 (Kimmel Center)

      Convener: Sylvain Chapeland (CERN)
      • 4:35 PM
        Operational experience with the CMS Data Acquisition System 25m
        The data-acquisition (DAQ) system of the CMS experiment at the LHC performs the read-out and assembly of events accepted by the first level hardware trigger. Assembled events are made available to the high-level trigger (HLT), which selects interesting events for offline storage and analysis. The system is designed to handle a maximum input rate of 100 kHz and an aggregated throughput of 100 GB/s originating from approximately 500 sources and 10^8 electronic channels. An overview of the architecture and design of the hardware and software of the DAQ system is given. We report on the performance and operational experience of the DAQ and its Run Control System in the first two years of collider run of the LHC, both in proton-proton and Pb-Pb collisions. We present an analysis of the current performance, its limitations, and the most common failure modes and discuss the ongoing evolution of the HLT capability needed to match the luminosity ramp-up of the LHC.
        Speaker: Hannes Sakulin (CERN)
        Slides
        Video in CDS
      • 5:00 PM
        Upgrade of the CMS Event Builder 25m
        The Data Acquisition (DAQ) system of the Compact Muon Solenoid (CMS) experiment at CERN assembles events at a rate of 100 kHz, transporting event data at an aggregate throughput of 100 GB/s. By the time the LHC restarts after the 2013/14 shut-down, the current compute nodes and networking infrastructure will have reached the end of their lifetime. We are presenting design studies for an upgrade of the CMS event builder based on advanced networking technologies such as 10 Gb/s Ethernet. We report on tests and performance measurements with small-scale test setups.
        Speaker: Andrea Petrucci (CERN)
        Slides
        Video in CDS
      • 5:25 PM
        The Compact Muon Solenoid Detector Control System 25m
        The Compact Muon Solenoid (CMS) is a CERN multi-purpose experiment that exploits the physics of the Large Hadron Collider (LHC). The Detector Control System (DCS) ensures a safe, correct and efficient experiment operation, contributing to the recording of high-quality physics data. The DCS is programmed to react automatically to LHC changes. The bias voltages of the CMS sub-detectors are set depending on the machine mode and particle beam conditions. A protection mechanism ensures that the sub-detectors are locked in a safe mode whenever a potentially dangerous situation exists. The system is supervised from the experiment control room by a single operator. A small set of screens summarizes the status of the detector from the approximately 6M monitored parameters. Using the experience of nearly two years of operation with beam, the DCS automation software has been enhanced to increase the system efficiency. The automation now allows for configuration commands that can be used to automatically pre-configure hardware for given beam modes, decreasing the time the detector needs to get ready when reaching physics modes. The protection mechanism was also improved so that sub-detectors can define their own protection response algorithms, allowing, for example, a small proportion of channels outside the configured safe limits to be tolerated. From the infrastructure point of view, the DCS will be subject to major modifications in 2012. The current rack-mounted control PCs will be replaced by a redundant pair of DELL Blade systems. These blades are a high-density modular solution that incorporates servers and networking into a single chassis providing shared power, cooling and management. This infrastructure modification will challenge the DCS software and hardware factorization capabilities, since the SCADA systems currently running on individual nodes will be combined in single blades. The ongoing studies enabling this migration, together with the latest modifications, are discussed in the paper.
        Speaker: Robert Gomez-Reino Garrido (CERN)
        Slides
        Video in CDS
      • 5:50 PM
        The Software Architecture of the LHCb High Level Trigger 25m
        The LHCb experiment is a spectrometer dedicated to the study of heavy flavor at the LHC. The rate of proton-proton collisions at the LHC is 15 MHz, but disk space limitations mean that only 3 kHz can be written to tape for offline processing. For this reason the trigger of the LHCb data acquisition system plays a key role in selecting signal events and rejecting background. In contrast to previous experiments at hadron colliders such as CDF or D0, the bulk of the LHCb trigger is implemented in software and deployed on a farm of 20k parallel processing nodes. This system, called the High Level Trigger (HLT), is responsible for reducing the rate from the maximum at which the detector can be read out, 1.1 MHz, to the 3 kHz which can be processed offline, and has 20 ms in which to process and accept/reject each event. In order to minimize systematic uncertainties, the HLT was designed from the outset to reuse the offline reconstruction and selection code, and is based around multiple independent and redundant selection algorithms, which make it possible to trigger efficiently even in the case that one of the detector subsystems fails (a schematic illustration of this redundancy follows this entry). Because of the specific LHC running conditions, the HLT had to cope with event multiplicities three times higher than it was designed for in 2010 and 2011. This contribution describes the software architecture of the HLT, focusing on the code optimization and commissioning effort which took place during 2010 and 2011 in order to enable LHCb to trigger efficiently in these unexpected running conditions, and the flexibility and robustness of the LHCb software framework which allowed this reoptimization to be performed in a timely manner. We demonstrate that the software architecture of the HLT, in particular the concepts of algorithm redundancy and independence, was crucial to enable LHCb to deliver its nominal trigger signal efficiency and background rejection rate in these unexpected conditions, and outline lessons for future trigger design in particle physics experiments.
        Speaker: Mariusz Witek (Polish Academy of Sciences (PL))
        Slides
        Video in CDS
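        The "multiple independent and redundant selection algorithms" design can be pictured as an OR over trigger lines that can be individually disabled. Everything in the sketch below (line names, thresholds, the Event fields) is a placeholder, not LHCb HLT code.

```cpp
// Accept an event if any enabled, independent trigger line fires.
#include <functional>
#include <iostream>
#include <string>
#include <vector>

struct Event { double muonPt = 0; double displacedVertexChi2 = 1e9; };

struct TriggerLine {
  std::string name;
  bool enabled;                                 // can be disabled if its subdetector fails
  std::function<bool(const Event&)> decision;
};

bool hltAccept(const Event& ev, const std::vector<TriggerLine>& lines) {
  for (const auto& line : lines)
    if (line.enabled && line.decision(ev)) return true;   // OR of independent lines
  return false;
}

int main() {
  std::vector<TriggerLine> lines{
      {"SingleMuon",  true, [](const Event& e) { return e.muonPt > 10.0; }},
      {"Topological", true, [](const Event& e) { return e.displacedVertexChi2 < 5.0; }}};
  Event ev;
  ev.muonPt = 12.3;   // fires the muon line even if the vertexing-based line cannot
  std::cout << (hltAccept(ev, lines) ? "accept" : "reject") << '\n';
}
```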
    • 4:35 PM 6:15 PM
      Event Processing Room 905/907 (Kimmel Center)

      Convener: Dr Adam Lyon (Fermilab)
      • 4:35 PM
        Study of a Fine Grained Threaded Framework Design 25m
        Traditionally, HEP experiments exploit the multiple cores in a CPU by having each core process one event. However, future PC designs are expected to use CPUs which double the number of processing cores at the same rate as the cost of memory falls by a factor of two. This effectively means the amount of memory per processing core will remain constant. This is a major challenge for LHC processing frameworks, since the LHC is expected to deliver more complex events (e.g. greater pile-up) in the coming years while the LHC experiments' frameworks are already memory constrained. Therefore, in the not so distant future, we may need to be able to efficiently use multiple cores to process one event. In this presentation we will discuss a design for an HEP processing framework which can allow very fine-grained parallelization within one event as well as support processing multiple events simultaneously while minimizing the memory footprint of the job. The design is built around the libdispatch framework created by Apple Inc. (a port for Linux is available), whose central concept is the use of task queues (a minimal libdispatch example follows this entry). This design also accommodates the reality that not all code will be thread safe and therefore allows one to easily mark modules or sub-parts of modules as being thread unsafe. In addition, the design efficiently handles the requirement that events in one run must all be processed before starting to process events from a different run. After explaining the design we will provide measurements from simulating different processing scenarios where the CPU times used for the simulation are drawn from CPU times measured in actual CMS event processing. Our results have shown this design to be very promising and it will be further pursued by CMS.
        Speaker: Dr Christopher Jones (Fermi National Accelerator Lab. (US))
        Paper
        Slides
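        Since the design is built around libdispatch task queues, a minimal example of that library's plain C API may help: independent pieces of work for one event are submitted to a concurrent queue and the caller waits on the group. This shows only the underlying library mechanism, not the CMS framework design itself; the module names are invented.

```cpp
// Minimal libdispatch task-queue example. Build (Linux): clang++ tasks.cc -ldispatch
#include <dispatch/dispatch.h>
#include <cstdio>

struct ModuleTask { const char* name; };

void runModule(void* context) {                     // matches dispatch_function_t
  auto* task = static_cast<ModuleTask*>(context);
  std::printf("running module %s\n", task->name);   // stand-in for real reconstruction work
}

int main() {
  dispatch_queue_t queue =
      dispatch_queue_create("event.modules", DISPATCH_QUEUE_CONCURRENT);
  dispatch_group_t group = dispatch_group_create();

  ModuleTask tasks[] = {{"tracking"}, {"calorimetry"}, {"muons"}};
  for (auto& t : tasks)                              // tasks may run concurrently
    dispatch_group_async_f(group, queue, &t, runModule);

  dispatch_group_wait(group, DISPATCH_TIME_FOREVER); // the event is done when all modules finish
  dispatch_release(group);
  dispatch_release(queue);
  return 0;
}
```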
      • 5:00 PM
        The art framework 25m
        Future "Intensity Frontier" experiments at Fermilab are likely to be conducted by smaller collaborations, with fewer scientists, than is the case for recent "Energy Frontier" experiments. *art* is an event-processing framework designed with the needs of such experiments in mind. The authors have been involved with the design and implementation of frameworks for several experiments, including D0, BTeV, and CMS. Although many of these experiments' requirements were the same, they shared little effort, and even less code. This resulted in significant duplication of development effort. The *art* framework project is intended to avoid such duplication of effort for the experiments planned, and under consideration, at Fermilab. The *art* framework began as an evolution of the framework of the CMS experiment, and has since been heavily adapted for the needs of the intensity frontier experiments. Trade-offs have been made to simplify the code in order for it to be maintainable and usable by much smaller groups. The current users of *art* include mu2e, NOvA, g-2, and LArSoft (ArgoNeuT, MicroBooNE, LBNE-LAr). The *art* framework relies upon a number of external products (e.g., the Boost C++ library and Root); these products are built by the *art* team and deployed through a simplified UPS package deployment system. The *art* framework is itself deployed via the same mechanism, and is treated by the experiments using it as just another external product upon which their code relies. Because of the increasing importance of multi-core and many-core architectures, current development plans center around the migration of *art* to support parallel processing of independent events as well as to permit parallel processing within events.
        Speaker: Dr Marc Paterno (Fermilab)
        Slides
      • 5:25 PM
        The ATLAS ROOT-based data formats: recent improvements and performance measurements 25m
        We detail recent changes to ROOT-based I/O within the ATLAS experiment. The ATLAS persistent event data model continues to make considerable use of a ROOT I/O backend through POOL persistency. ROOT is also used directly in later stages of analysis that make use of a flat-ntuple-based "D3PD" data type. For POOL/ROOT persistent data, several improvements have been made, including the implementation of automatic basket optimisation, memberwise streaming, and changes to split and compression levels (an illustrative ROOT macro showing such settings follows this entry). Optimisations are also planned for the D3PD format. We present a full evaluation of the resulting performance improvements, including the case of selective retrieval of events. We also evaluate ongoing changes internal to ROOT, in the ATLAS context, for both POOL and D3PD data. We report results not only from test systems, but also from new automated tests on real ATLAS production resources which employ a wide range of storage technologies.
        Speaker: Wahid Bhimji (University of Edinburgh (GB))
        Slides
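        For readers unfamiliar with the I/O knobs mentioned above, here is a small ROOT macro that sets a file compression level, a branch buffer size and split level, and the auto-flush value that drives automatic basket optimisation. The values are arbitrary examples, not ATLAS production settings.

```cpp
// Small ROOT macro writing a TTree with explicit I/O settings.
#include <vector>
#include "TFile.h"
#include "TTree.h"

void writeExampleTree() {
  TFile f("example.root", "RECREATE");
  f.SetCompressionLevel(6);                 // compression level used for this file

  TTree tree("events", "example tree");
  tree.SetAutoFlush(-30000000);             // optimise/flush baskets every ~30 MB (negative = bytes)

  std::vector<float> pt;
  tree.Branch("pt", &pt, /*bufsize=*/32000, /*splitlevel=*/99);

  for (int i = 0; i < 100000; ++i) {
    pt.assign(3, 0.1f * i);                 // toy event content
    tree.Fill();
  }
  tree.Write();                             // baskets are reorganised at the flush points
}
```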
      • 5:50 PM
        I/O Strategies for Multicore Processing in ATLAS 25m
        A critical component of any multicore/manycore application architecture is the handling of input and output. Even in the simplest of models, design decisions interact both in obvious and in subtle ways with persistence strategies. When multiple workers handle I/O independently using distinct instances of a serial I/O framework, for example, it may happen that because of the way data from consecutive events are compressed together, there may be serious inefficiencies, with workers redundantly reading the same buffers, or multiple instances thereof. With shared reader strategies, caching and buffer management by the persistence infrastructure and by the control framework may have decisive performance implications for a variety of design choices. Providing the next event may seem straightforward when all event data are contiguously stored in a block, but there may be performance penalties to such strategies when only a subset of a given event's data are needed; conversely, when event data are partitioned by type in persistent storage, providing the next event becomes more complicated, requiring marshaling of data from many I/O buffers. Output strategies pose similarly subtle problems, with complications that may lead to significant serialization and the possibility of serial bottlenecks, either during writing or in post-processing, e.g., during data stream merging. In this paper we describe the I/O components of AthenaMP, the multicore implementation of the ATLAS control framework, and the considerations that have led to the current design, with attention to how these I/O components interact with ATLAS persistent data organization and infrastructure.
        Speaker: Peter Van Gemmeren (Argonne National Laboratory (US))
        Paper
        Slides
    • 4:35 PM 6:15 PM
      Distributed Processing and Analysis on Grids and Clouds Eisner & Lubin Auditorium (Kimmel Center)

      Convener: Johannes Elmsheuser (Ludwig-Maximilians-Univ. Muenchen (DE))
      • 4:35 PM
        Multi-core job submission and grid resource scheduling for ATLAS AthenaMP 25m
        AthenaMP is the multi-core implementation of the ATLAS software framework and allows the efficient sharing of memory pages between multiple threads of execution. This has now been validated for production and delivers a significant reduction in overall memory footprint with negligible CPU overhead. Before AthenaMP can be routinely run on the LHC Computing Grid, it must be determined how the computing resources available to ATLAS can best exploit the notable improvements delivered by switching to this multi-process model. In particular, there is a need to identify and assess the potential impact of scheduling issues where single core and multi-core job queues have access to the same underlying resources. A study into the effectiveness and scalability of AthenaMP in a production environment will be presented. Submitting AthenaMP tasks to the Tier-0 and candidate Tier-2 sites will allow detailed measurement of worker node performance and also highlight the relative performance of local resource management systems (LRMS) in handling large volumes of multi-core jobs. Best practices for configuring the main LRMS implementations currently used by Tier-2 sites will be identified in the context of multi-core job optimisation. There will also be a discussion on how existing Grid middleware and the ATLAS job submission pilot model could use scheduling information to increase the overall efficiency of multi-core job throughput.
        Speaker: Andrew John Washbrook (University of Edinburgh (GB))
        Slides
        Video in CDS
      • 5:00 PM
        Multi-core processing and scheduling performance in CMS 25m
        Commodity hardware is going many-core. We might soon not be able to satisfy the job memory needs per core in the current single-core processing model in High Energy Physics. In addition, an ever increasing number of independent and incoherent jobs running on the same physical hardware without sharing resources might significantly affect processing performance. It will be essential to effectively utilize the multi-core architecture. CMS has incorporated support for multi-core processing in the event processing framework and the workload management system. Multi-core processing jobs share common data in memory, such as the code libraries, detector geometry and conditions data, resulting in a much lower memory usage than standard single-core independent jobs. Exploiting this new processing model requires a new model of computing resource allocation, departing from the standard single-core allocation for a job. The experiment job management system needs to have control over a larger quantum of resources, since multi-core aware jobs require the scheduling of multiple cores simultaneously. CMS is exploring the approach of using whole nodes as the unit in the workload management system, where all cores of a node are allocated to a multi-core job. Whole-node scheduling allows for optimization of the data/workflow management (e.g. I/O caching, local merging), but efficient utilization of all scheduled cores is challenging. Dedicated whole-node queues have been set up at all Tier-1 centers for exploring multi-core processing workflows in CMS. We will present the evaluation of the performance of scheduling and executing multi-core workflows in whole-node queues compared to the standard single-core processing workflows. (A sketch of why forked workers share memory follows this entry.)
        Speaker: Dr Jose Hernandez Calama (Centro de Investigaciones Energ. Medioambientales y Tecn. (ES))
        Slides
        Video in CDS
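        The sketch below illustrates, with invented data and a plain POSIX fork, why forked multi-core jobs share memory for read-only payloads such as geometry and conditions; it is not CMSSW code.

          import os

          # Large read-only payloads loaded once in the parent (stand-ins for geometry,
          # calibration constants and code libraries).
          conditions = {"geometry": list(range(1_000_000)), "calibration": list(range(1_000_000))}

          def process_events(worker_id, first, last):
              # The children read the parent's data; it is not loaded again per worker.
              checksum = sum(conditions["calibration"][first:last])
              print(f"worker {worker_id}: events {first}-{last}, checksum {checksum}")

          n_workers, n_events = 4, 4000
          chunk = n_events // n_workers
          for w in range(n_workers):
              if os.fork() == 0:                       # child process
                  process_events(w, w * chunk, (w + 1) * chunk)
                  os._exit(0)

          for _ in range(n_workers):
              os.wait()                                # parent collects the children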
      • 5:25 PM
        PoD: dynamically create and use remote PROOF clusters. A thin client concept. 25m
        PROOF on Demand (PoD) is a tool-set which dynamically sets up a PROOF cluster at a user’s request on any resource management system (RMS). It provides a plug-in based system in order to use different job submission front-ends. PoD is currently shipped with gLite, LSF, PBS (PBSPro/OpenPBS/Torque), Grid Engine (OGE/SGE), Condor, LoadLeveler, and SSH plug-ins. It makes it possible to get a private PROOF cluster on any RMS within just a few seconds. If there is no RMS, the SSH plug-in can be used, which dynamically turns a set of machines into PROOF workers. In this presentation new developments and use cases will be covered. Recently a new major step in PoD development has been made: it can now work not only with local PoD servers, but also with remote ones. PoD’s newly developed “pod-remote” command makes it possible for users to follow a thin-client concept. In order to create dynamic PROOF clusters, users are now able to select a remote computer, even behind a firewall, to control a PoD server on it and to submit PoD jobs. In this case the user interface machine is just a lightweight control center and could run on different OS types or mobile devices. All communications are secured and provided via SSH channels. Additionally, PoD automatically creates and maintains SSH tunnels for PROOF connections between the user interface and the PROOF master. PoD creates and manages remote and local PROOF clusters for you: just two PoD commands provide a fully functional PROOF cluster and real computing on demand. The talk will also include several live demos from real life use cases. (An illustrative sketch of the tunnelled connection idea follows this entry.)
        Speaker: Anar Manafov (GSI - Helmholtzzentrum fur Schwerionenforschung GmbH (DE))
        Slides
        Video in CDS
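        The following sketch illustrates the thin-client idea described above with a plain SSH port forward and a PROOF connection through it; the host name, port and overall flow are assumptions for illustration and do not correspond to the actual PoD commands.

          import subprocess
          import ROOT

          local_port, remote_host, proof_port = 21001, "gateway.example.org", 21001

          # ssh -N -L <local>:localhost:<remote> opens a forwarding-only SSH channel.
          tunnel = subprocess.Popen(
              ["ssh", "-N", "-L", f"{local_port}:localhost:{proof_port}", remote_host])

          try:
              proof = ROOT.TProof.Open(f"localhost:{local_port}")   # connect through the tunnel
              if proof:
                  print("PROOF workers:", proof.GetParallel())
          finally:
              tunnel.terminate()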
      • 5:50 PM
        Offline Processing in the Online Computer Farm 25m
        LHCb is one of the 4 experiments at the LHC accelerator at CERN. LHCb has approximately 1600 eight-core PCs for processing the High Level Trigger (HLT) during physics data acquisition. During periods when data acquisition is not required, or when the resources needed for data acquisition are reduced, such as accelerator Machine Development (MD) periods or technical shutdowns, most of these PCs are idle or only lightly used. In these periods it is possible to profit from the unused processing capacity to reprocess earlier datasets with the newest applications (code and calibration constants), thus reducing the CPU capacity needed on the Grid. The offline computing environment is based on LHCb-DIRAC (Distributed Infrastructure with Remote Agent Control) to process physics data on the Grid. In DIRAC, agents are started on Worker Nodes, pull available jobs from the DIRAC central WMS (Workload Management System) and process them on the available resources. A Control System was developed which is able to launch, control and monitor the agents for the offline data processing on the HLT Farm. It can do so without overwhelming the offline resources (e.g. DBs), and in case of changes to the accelerator planning it can quickly return the used resources for online purposes. This control system is based on the existing Online System Control infrastructure, the PVSS SCADA and the FSM toolkit. A web server was also developed to provide a highly available and easily readable view of the status of the offline data processing on the online HLT farm.
        Speaker: Luis Granado Cardoso (CERN)
        Slides
        Video in CDS
    • 4:35 PM 6:15 PM
      Computer Facilities, Production Grids and Networking Room 914 (Kimmel Center)

      Room 914

      Kimmel Center

      Convener: Andreas Heiss (Forschungszentrum Karlsruhe GmbH)
      • 4:35 PM
        Monitoring the US ATLAS Network Infrastructure with perfSONAR-PS 25m
        Global scientific collaborations, such as ATLAS, continue to push the network requirements envelope. Data movement in this collaboration is projected to include the regular exchange of petabytes of datasets between the collection and analysis facilities in the coming years. These requirements place a high emphasis on networks functioning at peak efficiency and availability; the lack thereof could mean critical delays in the overall scientific progress of distributed data-intensive experiments like ATLAS. Network operations staff routinely must deal with problems deep in the infrastructure; this may be as benign as replacing a failing piece of equipment, or as complex as dealing with a multi-domain path that is experiencing data loss. In either case, it is crucial that effective monitoring and performance analysis tools are available to ease the burden of management. We will report on our experiences deploying and using the perfSONAR-PS Performance Toolkit at ATLAS sites in the United States. This software creates a dedicated monitoring server, capable of collecting and performing a wide range of passive and active network measurements. Each independent instance is managed locally, but able to federate on a global scale; enabling a full view of the network infrastructure that spans domain boundaries. This information, available through web service interfaces, can easily be retrieved to create customized applications. US ATLAS has developed a centralized "dashboard" offering network administrators, users, and decision makers the ability to see the performance of the network at a glance. The dashboard framework includes the ability to notify users (alarm) when problems are found, thus allowing rapid response to potential problems and making perfSONAR-PS crucial to the operation of our distributed computing infrastructure.
        Speaker: Shawn Mc Kee (University of Michigan (US))
        Slides
      • 5:00 PM
        A new data-centre for the LHCb experiment 25m
        The upgraded LHCb experiment, which is supposed to go into operation in 2018/19, will require a massive increase in its compute facilities. A new 2 MW data-centre is planned at the LHCb site. Apart from the obvious requirement of minimizing the cost, the data-centre has to tie in well with the needs of online processing, while at the same time staying open for future and offline use. We present our design and the evaluation process of various cooling and powering solutions as well as our ideas for fabric monitoring and control.
        Speaker: Niko Neufeld (CERN)
        Paper
        Slides
      • 5:25 PM
        High Performance Experiment Data Archiving with gStore 25m
        GSI in Darmstadt (Germany) is a center for heavy ion research. It hosts an ALICE Tier2 center and is the home of the future FAIR facility. The planned data rates of the largest FAIR experiments, CBM and PANDA, will be similar to those of the current LHC experiments at CERN. gStore is a hierarchical storage system with a unique name space that has been successfully in operation for more than fifteen years. Its core consists of several tape libraries and currently ~20 data mover nodes connected within a SAN network. The gStore clients transfer data via fast socket connections from/to the disk cache of the data movers (~240 TByte currently). Each data mover also has a high speed connection to the GSI Lustre file system (~3 PByte data capacity currently). The overall bandwidth between gStore (disk cache or tape) and Lustre amounts to 6 GByte/s and will be doubled in 2012. In the near future the Lustre HSM functionality will be implemented with gStore. Each tape drive is accessible from any data mover, fully transparent to the users. The tapes and libraries are managed by commercial software (IBM Tivoli Storage Manager, TSM), whereas the disk cache management and the TSM and user interfaces are provided by GSI software. This provides the flexibility needed to tailor gStore according to the continuously evolving requirements of the GSI and FAIR user communities. For ALICE users all gStore data are worldwide accessible via the ALICE grid software. Data streams from running experiments at GSI (up to 500 MByte/s) are written via sockets from the event builders to the gStore write cache for migration to tape. In parallel the data are also copied to Lustre for online evaluation and monitoring. As all features related to tapes and libraries are handled by TSM, gStore is practically completely hardware independent. Additionally, according to its design principles gStore is fully scalable in data capacity and I/O bandwidth. Therefore we are optimistic that we can also fulfill the dramatically increased mass storage requirements of the FAIR experiments in 2018, which will be some orders of magnitude higher than those of today.
        Speaker: Dr Horst Göringer (GSI)
        Slides
      • 5:50 PM
        Experience with HEP analysis on mounted filesystems 25m
        We present results on different approaches to mounted filesystems in use or under investigation at DESY. dCache, long established as a storage system for physics data, has implemented the NFS v4.1/pNFS protocol. New performance results will be shown with the most current version of the dCache server. In addition to the native usage of the mounted filesystem in a LAN environment, results are given for the performance of dCache NFS v4.1/pNFS in the WAN case. Several commercial vendors are currently in the alpha or beta phase of adding the NFS v4.1/pNFS protocol to their storage appliances. We will test some of these vendor solutions for their readiness for HEP analysis. DESY has recently purchased an IBM SONAS system. We will present the results of a thorough performance evaluation using the native protocols NFS (v3 or v4) and GPFS. As the emphasis is on usability for end user analysis, we will use the latest ROOT versions and current end user analysis code for benchmark scenarios.
        Speakers: Dmitry Ozerov (Deutsches Elektronen-Synchrotron (DE)), Martin Gasthuber (Deutsches Elektronen-Synchrotron (DE)), Patrick Fuhrmann (DESY), Yves Kemp (Deutsches Elektronen-Synchrotron (DE))
        Slides
    • 4:35 PM 6:15 PM
      Software Engineering, Data Stores and Databases Room 802 (Kimmel Center)

      Room 802

      Kimmel Center

      Convener: Simone Campana (CERN)
      • 4:35 PM
        CMS experience with online and offline Databases 25m
        The CMS experiment is made up of many detectors which in total sum to more than 75 million channels. The online database stores the configuration data used to configure the various parts of the detector and bring it into all possible running states. The database also stores the conditions data: detector monitoring parameters of all channels (temperatures, voltages), detector quality information, beam conditions, etc. These quantities are used by the experts to monitor the detector performance in detail; as they occupy a very large space in the online database, they cannot be used as-is for offline data reconstruction. For this, a "condensed" set of the full information, the "conditions data", is created and copied to a separate database used in the offline reconstruction. The offline conditions database contains the alignment and calibration data for the various detectors. Conditions data sets are accessed by a tag and an interval of validity through the offline reconstruction program CMSSW, written in C++. Performant access to the conditions data as C++ objects is a key requirement for reconstruction and data analysis. About 200 types of calibration and alignment exist for the various CMS sub-detectors. Only those data which are crucial for reconstruction are inserted into the offline conditions DB. This guarantees fast access to conditions during reconstruction and a small size of the conditions DB. Calibration and alignment data are fundamental to maintaining the design performance of the experiment. Very fast workflows have been put in place to compute and validate the alignment and calibration sets and insert them into the conditions database before the reconstruction process starts. Some of these sets are produced by analyzing and summarizing the parameters stored in the online database. Others are computed using event data through a special express workflow. A dedicated monitoring system has been put in place to monitor these time-critical processes. The talk describes the experience with the CMS online and offline databases during the 2010 and 2011 data taking periods, showing some of the issues found and lessons learned. (A toy sketch of tag-plus-validity-interval access follows this entry.)
        Speaker: Dr Andreas Pfeiffer (CERN)
        Slides
        Video in CDS
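        The toy code below illustrates the tag-plus-interval-of-validity access pattern described above; the tag name, payload contents and lookup logic are invented for illustration and are not the CMS conditions schema or the CMSSW API.

          import bisect

          # Each tag maps to a list of (first_run, payload) entries, ordered by first_run.
          conditions_db = {
              "EcalPedestals_v1": [(1, {"mean": 200.1}), (1500, {"mean": 200.4}), (3000, {"mean": 199.9})],
          }

          def get_payload(tag, run):
              """Return the payload whose interval of validity contains the requested run."""
              iovs = conditions_db[tag]
              first_runs = [first for first, _ in iovs]
              index = bisect.bisect_right(first_runs, run) - 1   # last IOV starting at or before `run`
              if index < 0:
                  raise KeyError(f"run {run} is before the first IOV of tag {tag}")
              return iovs[index][1]

          print(get_payload("EcalPedestals_v1", 2100))           # -> {'mean': 200.4}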
      • 5:00 PM
        A Programmatic View of Metadata, Metadata Services, and Metadata Flow in ATLAS 25m
        The volume and diversity of metadata in an experiment of the size and scope of ATLAS is considerable. Even the definition of metadata may seem context-dependent: data that are primary for one purpose may be metadata for another. Trigger information and data from the Large Hadron Collider itself provide cases in point, but examples abound. Metadata about logical or physics constructs, such as data-taking periods and runs and luminosity blocks and events and algorithms, often need to be mapped to deployment and production constructs, such as datasets and jobs and files and software versions, and vice versa. Metadata at one level of granularity may have implications at another. ATLAS metadata services must integrate and federate information from inhomogeneous sources and repositories, map metadata about logical or physics constructs to deployment and production constructs, provide a means to associate metadata at one level of granularity with processing or decision-making at another, offer a coherent and integrated view to physicists, and support both human use and programmatic access. In this paper we consider ATLAS metadata, metadata services, and metadata flow principally from the illustrative perspective of how disparate metadata are made available to executing jobs and, conversely, how metadata generated by such jobs are returned. We describe how metadata are read, how metadata are cached, and how metadata generated by jobs and the tasks of which they are a part are communicated, associated with data products, and preserved. We also discuss the principles that guide decision-making about metadata storage, replication, and access.
        Speaker: Dr David Malon (Argonne National Laboratory (US))
        Slides
        Video in CDS
      • 5:25 PM
        Comparison of the Frontier Distributed Database Caching System with NoSQL Databases 25m
        Non-relational "NoSQL" databases such as Cassandra and CouchDB are best known for their ability to scale to large numbers of clients spread over a wide area. The Frontier distributed database caching system, used in production by the Large Hadron Collider CMS and ATLAS detector projects, is based on traditional SQL databases but also has the same high scalability and wide-area distributability for an important subset of applications. This paper compares the architectures, behavior, performance, and maintainability of the two different approaches and identifies the criteria for choosing which approach to prefer over the other.
        Speaker: Dave Dykstra (Fermi National Accelerator Lab. (US))
        Paper
        Slides
        Video in CDS
      • 5:50 PM
        ATLAS DDM/DQ2 & NoSQL databases: Use cases and experiences 25m
        The Distributed Data Management System DQ2 is responsible for the global management of petabytes of ATLAS physics data. DQ2 has a critical dependency on Relational Database Management Systems (RDBMS), like Oracle, as RDBMS are well suited to enforcing data integrity in online transaction processing applications. Despite these advantages, concerns have been raised recently about the scalability of data warehouse-like workloads against the relational schema, in particular for the analysis of archived data or the aggregation of data for summary purposes. Therefore, we have considered new approaches to handling very large amounts of data. More specifically, we investigated a new class of database technologies commonly referred to as NoSQL databases. This includes distributed file systems like HDFS that support parallel execution of computational tasks on distributed data, as well as schema-less approaches via key-value/document stores, like HBase, Cassandra or MongoDB. These databases provide solutions to particular types of problems: for example, NoSQL databases have demonstrated horizontal scalability, high throughput, and automatic fail-over mechanisms, and provide easy replication support over LAN and WAN. In this talk, we will describe our use cases in ATLAS, and share our experiences with NoSQL databases in a comparative study with Oracle. (An illustrative document-store sketch follows this entry.)
        Speaker: Mario Lassnig (CERN)
        Slides
        Video in CDS
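        As an illustration of the schema-less, document-store approach discussed above, the sketch below stores and aggregates accounting-style records with pymongo; the collection and field names are invented and are not the DQ2 schema.

          from pymongo import MongoClient

          client = MongoClient("mongodb://localhost:27017/")
          traces = client["ddm"]["dataset_traces"]

          # Documents need no predefined relational schema.
          traces.insert_one({"dataset": "data11_7TeV.00186169", "site": "CERN-PROD",
                             "bytes": 12_345_678, "operation": "get"})

          # Summary-style aggregation (the data-warehouse-like workload mentioned above).
          pipeline = [
              {"$match": {"operation": "get"}},
              {"$group": {"_id": "$site", "total_bytes": {"$sum": "$bytes"}}},
          ]
          for row in traces.aggregate(pipeline):
              print(row["_id"], row["total_bytes"])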
    • 7:00 PM 9:00 PM
      Social Event: Conference reception

      The conference banquet will be hosted by Spirit Cruises and will feature a three-hour boat cruise from Chelsea Piers (Pier 61, along West Side Highway and 23rd Street), around lower Manhattan and up the East River. Boarding time is 6:30 PM with departure at 7 PM. Shuttle buses will depart from Kimmel Center at 6:15 PM. Buses will provide a return trip to various points in Manhattan within walking distance of conference hotels. Photo ID is required for boarding. Boarding cards will be distributed to participants at boarding time at Chelsea Piers.

    • 8:30 AM 10:00 AM
      Plenary Skirball Center

      Skirball Center

      Convener: Dr Lothar Bauerdick (FERMILAB)
      • 8:30 AM
        A review of analysis in different experiments 30m
        Speaker: Markus Klute (Massachusetts Institute of Technology)
        Slides
        Video in CDS
      • 9:00 AM
        New computing models and LHCONE 30m
        Speaker: Ian Fisk (Fermi National Accelerator Lab. (US))
        Slides
        Video in CDS
      • 9:30 AM
        Current operations and future role of the Grid 30m
        Speaker: Dr Oxana Smirnova (LUND UNIVERSITY)
        Slides
        Video in CDS
    • 10:00 AM 10:30 AM
      Coffee Break 30m Skirball Center

      Skirball Center

    • 10:30 AM 12:00 PM
      Plenary Skirball Center

      Skirball Center

      Convener: Dr John Gordon (STFC - Science & Technology Facilities Council (GB))
      • 10:30 AM
        Middleware Evolution 30m
        From Grid to Cloud: A Perspective
        Speaker: Sebastien Goasguen (Clemson University)
        Slides
        Video in CDS
      • 11:30 AM
        Computing Technology Future 30m
        Speaker: Lennart Johnsson (Unknown)
        Slides
        Video in CDS
    • 12:00 PM 1:30 PM
      Lunch Break 1h 30m
    • 1:30 PM 3:35 PM
      Event Processing Room 905/907 (Kimmel Center)

      Room 905/907

      Kimmel Center

      Convener: Axel Naumann (CERN)
      • 1:30 PM
        Recent Developments and Validation of Geant4 Hadronic Physics 25m
        In the past year several improvements in Geant4 hadronic physics code have been made, both for HEP and nuclear physics applications. We discuss the implications of these changes for physics simulation performance and user code. In this context several of the most-used codes will be covered briefly. These include the Fritiof (FTF) parton string model which has been extended to include antinucleon and antinucleus interactions with nuclei, the Bertini-style cascade with its improved CPU performance and extension to include photon interactions, and the precompound and deexcitation models. We have recently released new models and databases for low energy neutrons, and the radioactive decay process has been improved with the addition of forbidden beta decays and better gamma spectra following internal conversion. As new and improved models become available, the number of tests and comparisons to data has increased. One of these is a validation of the parton string models against data from the MIPP experiment, which covers the largely untested range of 50 to 100 GeV. At the other extreme, a new stopped hadron validation will cover pions, kaons and antiprotons. These, and the ongoing simplified calorimeter studies, will be discussed briefly. We also discuss the increasing number of regularly performed validations, the demands they place on both software and users, and the automated validation system being developed to address them.
        Speaker: Julia Yarba (Fermi National Accelerator Lab. (US))
        Slides
        Video in CDS
      • 1:55 PM
        Refactoring, reengineering and evolution: paths to Geant4 uncertainty quantification and performance improvement 25m
        Quantitative results on Geant4 physics validation and computational performance are reported: they cover a wide spectrum of electromagnetic and hadronic processes, and are the product of a systematic, multi-disciplinary effort of collaborating physicists, nuclear engineers and statisticians. They involve comparisons with established experimental references in the literature and ad hoc measurements by collaborating experimental groups. The results highlight concurrent effects of Geant4 software design and implementation on physics accuracy, computational speed and memory consumption. Prototype alternatives, which improve these three aspects, are presented: they span a variety of strategies - from refactoring and reengineering existing Geant4 code to new and significantly different approaches in physics modeling, software design and software development methods. Solutions that simultaneously contribute to both physics and computational performance improvements are highlighted. In parallel, knowledge gaps embedded in Geant4 physics models are identified and discussed: they are due to lack of experimental data or conflicting measurements preventing the validation of the models themselves, and represent a potential source of systematic effects in detector observables.
        Speaker: Dr Maria Grazia Pia (Universita e INFN (IT))
        Slides
      • 2:20 PM
        "Swimming" : a data driven acceptance correction algorithm 25m
        The LHCb experiment is a spectrometer dedicated to the study of heavy flavor at the LHC. The rate of proton-proton collisions at the LHC is 15 MHz, but disk space limitations mean that only 3 kHz can be written to tape for offline processing. For this reason the LHCb data acquisition system -- the trigger -- plays a key role in selecting signal events and rejecting background. Because the trigger efficiency, measured with respect to signal events selected by offline analysis algorithms, is not 100%, the trigger introduces biases in variables of interest. In particular, heavy flavor particles have a longer lifetime than background events, and the trigger exploits this information in its selections, introducing a bias in the lifetime distribution of offline selected heavy flavor particles. This bias must then be corrected for in order to perform measurements of heavy flavor lifetimes at LHCb, measurements which are particularly sensitive to physics beyond the Standard Model. This correction is accomplished by an algorithm called "swimming", which replays the passage of every offline selected event through the LHCb trigger, varying the lifetime of the signal at each step, and thus computes an event-by-event lifetime acceptance function for the trigger. This contribution describes the commissioning and deployment of the swimming algorithm during 2010 and 2011, and the world-best lifetime and CP violation measurements accomplished using this method. In particular we focus on the key design decision in the architecture of the LHCb trigger which allows this method to work: the bulk of the triggering is implemented in software, reusing offline reconstruction and selection code to minimize systematics, and allowing the trigger selections to be re-executed offline exactly as they ran during data taking. We demonstrate the reproducibility of the LHCb trigger algorithms, show how the reuse of offline code and selections minimizes the biases introduced in the trigger, and show that the swimming method leads to an acceptance correction which contributes a negligible uncertainty to the measurements in question. (A schematic toy sketch of the swimming idea follows this entry.)
        Speaker: Marco Cattaneo (CERN)
        Slides
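        The schematic sketch below conveys the per-event swimming idea with a toy trigger requirement on the decay length; the step size, trigger condition and numbers are invented and do not correspond to the LHCb implementation.

          def toy_trigger_accepts(decay_length_mm):
              return decay_length_mm > 1.0                 # toy requirement: displaced decay vertex

          def swim_event(measured_decay_length_mm, step_mm=0.05, max_shift_mm=10.0):
              """Return (shift, accepted) pairs as the decay length is artificially varied."""
              acceptance, shift = [], -max_shift_mm
              while shift <= max_shift_mm:
                  acceptance.append((shift, toy_trigger_accepts(measured_decay_length_mm + shift)))
                  shift += step_mm
              return acceptance

          curve = swim_event(measured_decay_length_mm=2.3)
          edges = [s for (s, a), (_, a_next) in zip(curve, curve[1:]) if a != a_next]
          print("acceptance changes at shifts (mm):", edges)   # turning points of the acceptance function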
      • 2:45 PM
        Using Functional Languages and Declarative Programming to Analyze Large Datasets: LINQToROOT 25m
        Modern HEP analysis requires multiple passes over large datasets. For example, one has to first reweight the jet energy spectrum in Monte Carlo to match data before making plots of any other jet-related variable. This requires a pass over the Monte Carlo and the data to derive the reweighting, and then another pass over the Monte Carlo to plot the variables of real interest. With most modern ROOT-based tools this requires separate analysis loops for each pass, and script files to glue the two analysis loops together. A prototype framework has been developed that uses the functional and declarative features of C# and LINQ to specify the analysis. The framework uses language tools to convert the analysis into C++ and runs ROOT or PROOF as a backend to get the results. This gives the analyzer the full power of an object-oriented programming language to put together the analysis, and at the same time the speed of C++ for the analysis loop. The tool allows one to incorporate C++ algorithms written for ROOT by others. The code is mature enough to have been used in ATLAS analyses. The package is open source and available on the open source site CodePlex. (A plain-Python sketch of the underlying two-pass pattern follows this entry.)
        Speaker: Gordon Watts (University of Washington (US))
        Slides
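        The sketch below shows the underlying two-pass reweighting pattern in plain Python rather than C#/LINQ; the jet spectra and binning are toy numbers chosen purely for illustration.

          from collections import Counter

          data_jet_pt = [22, 35, 35, 47, 61, 61, 61, 80]       # stand-ins for ntuple branches
          mc_jet_pt = [22, 22, 35, 47, 47, 61, 80, 80]
          mc_jet_eta = [0.1, 1.2, -0.4, 2.1, 0.3, -1.8, 0.9, -0.2]

          def pt_bin(pt):
              return int(pt // 20) * 20                         # 20 GeV bins

          # Pass 1: derive per-bin weights so the simulated pT spectrum matches data.
          data_counts = Counter(map(pt_bin, data_jet_pt))
          mc_counts = Counter(map(pt_bin, mc_jet_pt))
          weights = {b: data_counts[b] / mc_counts[b] for b in mc_counts}

          # Pass 2: re-read the simulation and fill a weighted histogram of another variable.
          eta_hist = Counter()
          for pt, eta in zip(mc_jet_pt, mc_jet_eta):
              eta_hist[round(eta)] += weights[pt_bin(pt)]

          print(dict(eta_hist))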
    • 1:30 PM 3:35 PM
      Distributed Processing and Analysis on Grids and Clouds Eisner & Lubin Auditorium (Kimmel Center)

      Eisner & Lubin Auditorium

      Kimmel Center

      Convener: Mr Philippe Canal (Fermi National Accelerator Lab. (US))
      • 1:30 PM
        CernVM Co-Pilot: an Extensible Framework for Building Scalable Cloud Computing Infrastructures 25m
        CernVM Co-Pilot is a framework for instantiating an ad-hoc computing infrastructure on top of distributed computing resources. Such resources include commercial computing clouds (e.g. Amazon EC2), scientific computing clouds (e.g. CERN lxcloud), as well as the machines of users participating in volunteer computing projects (e.g. BOINC). The framework consists of components that communicate using the Extensible Messaging and Presence Protocol (XMPP), allowing new components to be developed in virtually any programming language and interfaced to existing Grid and batch computing infrastructures exploited by the High Energy Physics community. Co-Pilot has been used to execute jobs for both the ALICE and ATLAS experiments at CERN. CernVM Co-Pilot is also one of the enabling technologies behind the LHC@home 2.0 volunteer computing project, which is the first such project that exploits virtual machine technology. The use of virtual machines eliminates the necessity of modifying existing applications and adapting them to the volunteer computing environment. After the start of public testing in August 2011, LHC@home 2.0 quickly gained popularity, and as of October 2011 it had about 9000 registered volunteers. Resources provided by volunteers are used for running Monte-Carlo generator applications that simulate interactions between the colliding proton beams at the LHC. In this contribution we present the latest developments and the current status of the system, discuss how the framework can be extended to suit the needs of a particular scientific community, describe the operational experience using the LHC@home 2.0 volunteer computing infrastructure, and introduce future development plans.
        Speaker: Artem Harutyunyan (CERN)
        Slides
        Video in CDS
      • 1:55 PM
        The Integration of CloudStack and OpenNebula with DIRAC 25m
        The increasing availability of cloud resources is leading the scientific community to consider a choice between Grid and Cloud. The DIRAC framework for distributed computing is an easy way to obtain resources from both systems. In this paper we explain the integration of DIRAC with two open-source cloud managers, OpenNebula and CloudStack. These are computing tools to manage the complexity and heterogeneity of distributed data center infrastructures and allow virtual clusters to be created on demand, including public, private and hybrid clouds. This approach requires developing an extension to the previous DIRAC Virtual Manager Server, developed for Amazon EC2, allowing the connection with these cloud managers. In the OpenNebula case, the development has been based on the CERN Virtual Machine image with appropriate contextualization, while in the case of CloudStack the infrastructure has been kept more general, allowing other Virtual Machine sources and operating systems. In both cases, the CernVM File System has been used to facilitate software distribution to the computing nodes. With the resulting infrastructure, users can use cloud resources transparently through a friendly interface like the DIRAC Web Portal. The main purpose of this integration is a system that can manage cloud and grid resources at the same time. Users from different communities do not need to care about the installation of the standard software that is available at the nodes, nor about the operating system of the host machine, which is transparent to the user. In this paper we analyse the overhead of the virtual layer, with tests comparing the proposed approach with the existing Grid solution.
        Speakers: Victor Manuel Fernandez Albor (Universidade de Santiago de Compostela (ES)), Victor Mendez Munoz (Port d'Informació Científica (PIC))
        Slides
        Video in CDS
      • 2:20 PM
        Exploiting Virtualization and Cloud Computing in ATLAS 25m
        The ATLAS Computing Model was designed around the concepts of grid computing; since the start of data-taking, this model has proven very successful in the federated operation of more than one hundred Worldwide LHC Computing Grid (WLCG) sites for offline data distribution, storage, processing and analysis. However, new paradigms in computing, namely virtualization and cloud computing, present improved strategies for managing and provisioning IT resources that could allow ATLAS to more flexibly adapt and scale its storage and processing workloads on varied underlying resources. In particular, ATLAS is developing a "grid-of-clouds" infrastructure in order to utilize WLCG sites that make resources available via a cloud API. This work will present the current status of the Virtualization and Cloud Computing R&D project in ATLAS Distributed Computing. First, strategies for deploying PanDA queues on cloud sites will be discussed, including the introduction of a "cloud factory" for managing cloud VM instances. Next, performance results when running on virtualized/cloud resources at CERN LxCloud, StratusLab, and elsewhere will be presented. Finally, we will present the ATLAS strategies for exploiting cloud-based storage, including remote XROOTD access to input data, management of EC2-based files, and the deployment of cloud-resident LCG storage elements.
        Speaker: Fernando Harald Barreiro Megino (CERN IT ES)
        Slides
        Video in CDS
      • 2:45 PM
        Dynamic Extension of a Virtualized Cluster by using Cloud Resources 25m
        The specific requirements concerning the software environment within the HEP community constrain the choice of resource providers for the outsourcing of computing infrastructure. The use of virtualization in HPC clusters and in the context of cloud resources is therefore a subject of recent developments in scientific computing. The dynamic virtualization of worker nodes in common batch systems provided by ViBatch serves each user with a dynamically virtualized subset of worker nodes on a local cluster. Now it can be transparently extended by the use of common open source cloud interfaces like OpenNebula or Eucalyptus, launching a subset of the virtual worker nodes within the cloud. It is demonstrated how a dynamically virtualized computing cluster is combined with cloud resources by attaching remotely started virtual worker nodes to the local batch system.
        Speaker: Oliver Oberst (KIT - Karlsruhe Institute of Technology (DE))
        Slides
        Video in CDS
      • 3:10 PM
        Connecting multiple clouds and mixing real and virtual resources via the open source WNoDeS framework 25m
        In this paper we present the latest developments introduced in the WNoDeS framework (http://web.infn.it/wnodes); we will in particular describe inter-cloud connectivity, support for multiple batch systems, and the coexistence of virtual and real environments on the same hardware. Specific effort has been dedicated to the work needed to deploy a multi-site WNoDeS installation. The goal is to give end users the possibility to submit requests for resources using cloud interfaces on several sites in a transparent way. To this end, we will show how we have exploited already existing and deployed middleware within the framework of the IGI (Italian Grid Initiative) and EGI (European Grid Infrastructure) services. In this context, we will also describe the developments that have taken place in order to dynamically exploit public cloud services like Amazon EC2. The latter gives WNoDeS the capability to serve, for example, part of the user requests through external computing resources when needed, so that peak requests can be honored. We will then describe the work done to add support for open-source batch systems like Torque/Maui to WNoDeS. This makes WNoDeS a fully open-source product and gives smaller sites (where often there is no possibility to run commercial batch systems) the ability to install it and exploit its features. We will also describe recent WNoDeS I/O optimizations, showing results of performance tests executed using Torque as the batch system and Lustre as a distributed file system. Finally, starting from the consideration that not all tasks are equally suited to run in virtual environments, we will describe a novel feature added to WNoDeS that allows the same hardware to run both virtual machines and real jobs (i.e., jobs running on the bare metal and not in a virtualized environment). This increases flexibility and may optimize the usage of available resources. In particular, we will describe tests performed in order to show how this feature can help in fulfilling requests for "whole-node jobs" (which are becoming increasingly popular in the HEP community), for efficient analysis support, and for GPU-based resources (which are typically not easily amenable to virtualization).
        Speakers: Mr Alessandro Italiano (INFN-CNAF), Dr Giacinto Donvito (INFN-Bari)
        Slides
        Video in CDS
    • 1:30 PM 3:35 PM
      Computer Facilities, Production Grids and Networking Room 914 (Kimmel Center)

      Room 914

      Kimmel Center

      Convener: Dr Maria Girone (CERN)
      • 1:30 PM
        Analysing I/O bottlenecks in LHC data analysis on grid storage resources 25m
        We describe recent I/O testing frameworks that we have developed and applied within the UK GridPP Collaboration, the ATLAS experiment and the DPM team, for a variety of distinct purposes. These include benchmarking vendor supplied storage products, discovering scaling limits of SRM solutions, tuning of storage systems for experiment data analysis, evaluating file access protocols, and exploring IO read patterns of experiment software and their underlying event data models. With multiple grid sites now dealing with petabytes of data, such studies are becoming increasingly essential. We describe how the tests build, and improve, on previous work and contrast how the use-cases differ. We also detail the results obtained and the implications for storage hardware, middleware and experiment software.
        Speaker: Wahid Bhimji (University of Edinburgh (GB))
        Slides
        Video in CDS
      • 1:55 PM
        A strategy for load balancing in distributed storage systems 25m
        Distributed storage systems are critical to the operation of the WLCG. These systems are not limited to fulfilling long-term storage requirements; they also serve data for computational analysis and other computational jobs. Distributed storage systems provide the ability to aggregate the storage and IO capacity of disks and tapes, but at the end of the day the IO rate is still bound by the capabilities of the hardware, in particular the hard drives. Throughput of hard drives has increased dramatically over the decades, however for computational analysis IOPS is typically the limiting factor. To maximize return on investment, balancing IO load over the available hardware is crucial. The task is made complicated by the common use of heterogeneous hardware and software environments that results from combining new and old hardware into a single storage system. This paper describes recent advances made in load balancing in the dCache distributed storage system. We describe a set of common requirements for load balancing policies. These requirements include considerations about temporal clustering, resistance to disk pool partitioning, evolution of the hardware portfolio as data centers get extended and upgraded, the non-linearity of disk performance, garbage collection, age of replicas, stability of control decisions, and stability of tuning parameters. We argue that the existing load balancing policy in dCache fails to satisfy most of these requirements. An alternative policy is proposed, the weighted available space selection policy. The policy incorporates ideas we have been working on for years while observing dCache in production at NDGF and at other sites. At its core the policy uses weighted random selection, but it incorporates many different signals into the weight (a minimal sketch of this idea follows this entry). We argue that although the algorithm is technically more complicated, it is in our experience easier to predict the effect of parameter changes and thus the parameters are easier to tune than in the previous policy. In many cases it may not even require manual tuning, although we need more empirical data to conclude that. The new policy has been integrated into dCache 2.0 using a new load balancing plugin system developed by NDGF. It has been used in production at NDGF since the end of August 2011. We will report on our experiences. A qualitative and quantitative analysis of the policy will be presented, augmented by simulations and empirical data. Although our algorithm has been developed and is used in the context of dCache, the ideas are universal and could be applied to many storage systems.
        Speaker: Erik Mattias Wadenstein (Unknown)
        Slides
        Video in CDS
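        The sketch below shows weighted random selection over disk pools, combining free space and load into a single weight; the signals and their combination are simplified stand-ins, not the actual dCache policy.

          import random

          pools = [
              {"name": "pool-a", "free_bytes": 40e12, "active_movers": 12},
              {"name": "pool-b", "free_bytes": 10e12, "active_movers": 3},
              {"name": "pool-c", "free_bytes": 25e12, "active_movers": 30},
          ]

          def weight(pool):
              # More free space raises the weight; a higher load (active movers) lowers it.
              return pool["free_bytes"] / (1.0 + pool["active_movers"])

          def select_pool(pools):
              return random.choices(pools, weights=[weight(p) for p in pools], k=1)[0]

          tally = {p["name"]: 0 for p in pools}
          for _ in range(10000):
              tally[select_pool(pools)["name"]] += 1
          print(tally)    # writes spread across pools in proportion to their weights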
      • 2:20 PM
        Using Xrootd to Federate Regional Storage 25m
        While the LHC data movement systems have demonstrated the ability to move data at the necessary throughput, we have identified two weaknesses: the latency for physicists to access data and the complexity of the tools involved. To address these, both ATLAS and CMS have begun to federate regional storage systems using Xrootd. Xrootd, referring to a protocol and implementation, allows us to provide data access to all disk-resident data from a single virtual endpoint. This "redirector" endpoint (which may actually be multiple physical hosts) discovers the actual location of the data and redirects the client to the appropriate site. The approach is particularly advantageous since typically the redirection requires much less than 500 milliseconds (bounded by the network round trip time) and the Xrootd client is conveniently built into LHC physicists' analysis tools. Currently, there are three regional storage federations - a US ATLAS region, a European CMS region, and a US CMS region. The US ATLAS and US CMS regions include their respective Tier 1 and Tier 2 facilities, meaning a large percentage of experimental data is available via the federation. There are plans for federating storage globally, and so studies of the peering between the regional federations are of particular interest. From the base idea of federating storage behind an endpoint, the implementations and use cases diverge. For example, the CMS software framework is capable of efficiently processing data over high-latency connections, so using the remote site directly is comparable to accessing local data. ATLAS's processing model is currently less resilient to latency, and they are particularly focused on the physics n-tuple analysis use case; accordingly, the US ATLAS region relies more heavily on caching in the Xrootd server to provide data locality. Both VOs use GSI security. ATLAS has developed a mapping of VOMS roles to specific filesystem authorizations, while CMS has developed callouts to the site's mapping service. Each federation presents a global namespace to users. For ATLAS, the global-to-local mapping is based on a heuristic-based lookup from the site's local file catalog, while CMS does the mapping based on translations given in a configuration file. We will also cover the latest usage statistics and interesting use cases that have developed over the previous 18 months. (A client-side sketch of federated access and name translation follows this entry.)
        Speaker: Brian Paul Bockelman (University of Nebraska (US))
        Slides
        Video in CDS
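        From the client side, federated access looks like the sketch below: a single open against a redirector using a global name, plus a toy global-to-local name translation; the redirector host and mapping rule are hypothetical and not a real site configuration.

          import ROOT

          # One virtual endpoint: the redirector locates a site holding the file and sends
          # the client there; ROOT's xrootd client follows the redirection transparently.
          f = ROOT.TFile.Open("root://redirector.example.org//store/data/Run2011A/example.root")
          if f:
              print("opened", f.GetName())

          # Toy global-to-local name translation (illustrative rule only).
          rules = [("/store/", "/pnfs/example.org/data/cms/store/")]

          def to_local(global_name):
              for prefix, local_prefix in rules:
                  if global_name.startswith(prefix):
                      return global_name.replace(prefix, local_prefix, 1)
              return None

          print(to_local("/store/data/Run2011A/example.root"))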
      • 2:45 PM
        Dimensioning storage and computing clusters for efficient High Throughput Computing 25m
        Scientific experiments are producing huge amounts of data, and they continue to increase the size of their datasets and the total volume of data. These data are then processed by researchers belonging to large scientific collaborations, with the Large Hadron Collider being a good example. The focal point of scientific data centres has shifted from coping efficiently with PetaByte-scale storage to delivering quality data processing throughput. The dimensioning of the internal components in High Throughput Computing (HTC) data centers is of crucial importance to cope with all the activities demanded by the experiments, both online (data acceptance) and offline (data processing, simulation and user analysis). This requires a precise setup involving disk and tape storage services, a computing cluster and the internal networking to prevent bottlenecks, overloads and undesired slowness that lead to loss of CPU cycles and batch job failures. In this paper we point out relevant features for running a successful storage setup in an intensive HTC environment. (A back-of-the-envelope dimensioning sketch follows this entry.)
        Speaker: Dr Xavier Espinal Curull (Universitat Autònoma de Barcelona (ES))
        Slides
        Video in CDS
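        A back-of-the-envelope calculation of the kind implied by such a dimensioning exercise is sketched below; all numbers are invented for illustration only.

          job_slots = 4000                  # concurrent analysis jobs (invented)
          read_rate_per_job = 2.5           # MB/s sustained per job (invented)
          demand = job_slots * read_rate_per_job

          disk_servers = 50
          per_server = 400                  # MB/s each, disk-limited (invented)
          disk_supply = disk_servers * per_server

          network_supply = 2 * 10_000 / 8   # two 10 Gb/s uplinks, expressed in MB/s

          print(f"demand {demand:.0f} MB/s, disks {disk_supply:.0f} MB/s, network {network_supply:.0f} MB/s")
          # Whichever supply falls below the demand is the bottleneck to fix first;
          # with these toy numbers the network (2500 MB/s) limits the 10000 MB/s the farm could request.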
      • 3:10 PM
        Centralized Fabric Management Using Puppet, Git, and GLPI 25m
        Managing the infrastructure of a large and complex data center can be extremely difficult without taking advantage of automated services. Puppet is a seasoned, open-source tool designed for enterprise-class centralized configuration management. At the RHIC/ATLAS Computing Facility at Brookhaven National Laboratory, we have adopted Puppet as part of a suite of tools, including Git, GLPI, and some custom scripts, that comprise our centralized configuration management system. In this paper, we discuss the use of these tools for centralized configuration management of our servers and services; change management, which requires authorized approval of production changes; a complete, version-controlled history of all changes made; separation of production, testing, and development systems using Puppet environments; semi-automated server inventory using GLPI; and configuration change monitoring and reporting via the Puppet dashboard. We will also discuss scalability and performance results from using these tools on a 2,000+ node cluster and a pool of over 400 infrastructure servers with an administrative staff of only about 20 full-time employees. In addition to managing our data center servers, we've also used this Puppet infrastructure successfully to satisfy recent security mandates from our funding agency (the U.S. Department of Energy) to centrally manage all staff Linux and UNIX desktops; in doing so, we've extended several core Puppet modules to not only support both RHEL 5 and 6, but also to include support for other operating systems, including Fedora, Gentoo, OS X, and Ubuntu.
        Speaker: Jason Alexander Smith (Brookhaven National Laboratory (US))
        Slides
        Video in CDS
    • 1:30 PM 3:35 PM
      Software Engineering, Data Stores and Databases Room 802 (Kimmel Center)

      Room 802

      Kimmel Center

      Convener: Benedikt Hegner (CERN)
      • 1:30 PM
        A CMake-based build and configuration framework 25m
        The LHCb experiment has been using the CMT build and configuration tool for its software since the first versions, mainly because of its multi-platform build support and its powerful configuration management functionality. Still, CMT has some limitations in terms of build performance and the complexity added to the tool over time to cope with new use cases. Therefore, we have been looking for a viable alternative and have investigated the possibility of adopting the CMake tool, which does a very good job of building and is getting very popular in the HEP community. The result of this study is a CMake-based framework which provides most of the special configuration features available natively only in CMT, with the advantages of better performance, flexibility and portability.
        Speaker: Marco Clemencic (CERN)
        Slides
        Video in CDS
      • 1:55 PM
        Tiered Storage For LHC 25m
        For more than a year, the ATLAS Western Tier 2 (WT2) at the SLAC National Accelerator Laboratory has been successfully operating a two-tiered storage system based on Xrootd's flexible cross-cluster data placement framework, the File Residency Manager. The architecture allows WT2 to provide both high-performance storage at the higher tier for ATLAS analysis jobs and large, low-cost disk capacity at the lower tier. Data automatically move between the two storage tiers based on the needs of analysis jobs, completely transparently to the jobs.
        Speakers: Andrew Hanushevsky (STANFORD LINEAR ACCELERATOR CENTER), Wei Yang (SLAC National Accelerator Laboratory (US))
        Slides
        Video in CDS
      • 2:20 PM
        Status and Future Perspectives of CernVM-FS 25m
        The CernVM File System (CernVM-FS) is a read-only file system used to access HEP experiment software and conditions data. Files and directories are hosted on standard web servers and mounted in a universal namespace. File data and meta-data are downloaded on demand and locally cached. CernVM-FS was originally developed to decouple the experiment software from virtual machine hard disk images and to be used as a replacement for the shared software area at Grid sites. Here it allows for the provision of an essentially zero-maintenance software service. CernVM-FS solves the scalability issues of network file systems such as AFS, NFS, or Lustre, which are traditionally used for shared software areas. Currently, CernVM-FS distributes around 30 million files and directories. It is installed on a large portion of the Worldwide LHC Computing Grid (WLCG) worker nodes supporting the ATLAS and LHCb experiments. In order to scale to the order of 10^5 worker nodes, CernVM-FS uses replicated repository servers and a hierarchy of web caches. Repository replica servers are operated at the CERN, BNL, RAL, and ASGC Tier 1 sites. We will report on the lessons learned from the HEP community feedback and the experience from large-scale deployment. For the server side, we present a new, streamlined and improved toolset to maintain repositories. The new toolset is expected to reduce the delay for distributing new software releases to less than an hour. It provides parallel preprocessing of files and it introduces "push replication" of updates by means of a replication manager. The simplified repository maintenance also lowers the bar for small collaborations to distribute their software on the Grid. Finally, we present the roadmap for the further development of CernVM-FS. The roadmap includes Mac OS X support, variable algorithms for file compression and content hashing, as well as a distributed shared memory cache for diskless server farms. (A toy sketch of content-addressed, cache-on-demand access follows this entry.)
        Speaker: Jakob Blomer (Ludwig-Maximilians-Univ. Muenchen (DE))
        Slides
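        The toy sketch below illustrates the general idea of content-addressed objects served over plain HTTP and cached locally on first use; the URL, cache layout and hash choice are assumptions for illustration and not the actual CernVM-FS implementation.

          import hashlib
          import os
          import urllib.request

          CACHE_DIR = "/tmp/cvmfs-toy-cache"
          REPO_URL = "http://repo.example.org/data"        # hypothetical web server

          def content_hash(data: bytes) -> str:
              return hashlib.sha1(data).hexdigest()        # objects are named by the hash of their content

          def fetch(content_hash_hex):
              """Return the local path of an object, downloading it only on first access."""
              local = os.path.join(CACHE_DIR, content_hash_hex[:2], content_hash_hex)
              if not os.path.exists(local):                # cache miss: fetch over plain HTTP
                  os.makedirs(os.path.dirname(local), exist_ok=True)
                  urllib.request.urlretrieve(f"{REPO_URL}/{content_hash_hex[:2]}/{content_hash_hex}", local)
              return local                                 # cache hit: served locally, no network traffic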
      • 2:45 PM
        Advanced Modular Software Performance Monitoring 25m
        The LHCb software is based on the Gaudi framework, on top of which several large and complex software applications are built. The LHCb experiment is now in the active phase of collecting and analyzing data, and significant performance problems arise in the Gaudi-based software, from the High Level Trigger (HLT) programs to the data analysis frameworks (DaVinci). It is not easy to find hot spots in the code; only special tools can help to understand where CPU or memory usage is unreasonable. Many performance analysis tools exist, but the main problem is that they show reports in terms of class and function names, and such information is usually not very useful: the majority of algorithm developers use the Gaudi framework abstractions and usually do not know about the functions that lie at the lower level. We will show a new approach which adds to performance reports a higher abstraction level based on knowledge of the framework architecture and run-time object properties. A set of profiling tools (based on sampling and an unwind library, software for introspection of the program call chain) and visualizing interfaces has been developed and deployed.
        Speaker: Alexander Mazurov (Universita di Ferrara (IT))
        Slides
        Video in CDS
      • 3:10 PM
        Artificial Intelligence in the service of system administrators 25m
        The LHCb online system relies on a large and heterogeneous IT infrastructure made up of thousands of servers on which many different applications are running. They perform a great variety of tasks: critical ones such as data taking, and secondary ones like web servers. The administration of such a system, and making sure it is working properly, represents a very important workload for the small expert-operator team. Research has been performed to try to automate (some) system administration tasks, starting in 2001 when IBM defined the so-called “self objectives” supposed to lead to “autonomic computing”. In this context, we present a framework that makes use of artificial intelligence and machine learning to monitor and diagnose, at a low level and in a non-intrusive way, Linux-based systems and their interaction with software. Moreover, the multi-agent approach we use, coupled with an "object-oriented paradigm" architecture, should greatly increase our learning speed and highlight relations between problems.
        Speaker: Christophe Haen (Univ. Blaise Pascal Clermont-Fe. II (FR))
        Slides
        Video in CDS
    • 1:30 PM 3:35 PM
      Collaborative tools Room 808 (Kimmel Center)

      Room 808

      Kimmel Center

      Convener: Steven Goldfarb (University of Michigan (US))
      • 1:30 PM
        From EVO to SeeVogh 25m
        Collaboration Tools, Videoconference, support for large scale scientific collaborations, HD video
        Speaker: Mr Philippe Galvez (CALTECH)
        Slides
        Video in CDS
      • 1:55 PM
        Next Generation High Quality Videoconferencing Service for the LHC 25m
        In recent times, we have witnessed an explosion of video initiatives in the industry worldwide. Several advancements in video technology are currently improving the way we interact and collaborate. These advancements are driving trends and shaping the overall experience: any device on any network can be used to collaborate, in most cases with an overall high quality. To keep pace with this technological progress, the CERN IT Department has taken the leading role in establishing strategies and directions to improve the user experience in remote dispersed meetings and remote collaboration at large in the worldwide LHC communities. Due to the high degree of dispersion of the LHC user communities, these are critically dependent on videoconferencing technology, with a need for robustness and high quality for the best possible user experience. We will present an analysis of the factors that influenced the technical and strategic choices to improve the reliability, efficiency and overall quality of the LHC remote sessions. In particular, we are going to describe how the new videoconferencing service offered by CERN IT, based on Vidyo technology, suits these requirements. During a videoconference, Vidyo’s core technology continuously monitors the performance of the underlying network and the capabilities of each endpoint device, in order to adapt video streams in real time and optimize video communication. This results in offering telepresence-quality videoconferencing over the commercial Internet and, at the same time, in providing a robust platform to make video communications universally available on any device, ranging from traditional videoconferencing room systems to multiplatform PCs, the latest smartphones and tablet PCs, over any network. The infrastructure deployed to offer this new service and its integration into the specific CERN environment will be presented, as well as recent use cases.
        Speaker: Marek Domaracky (CERN)
        Slides
        Video in CDS
      • 2:20 PM
        Preparing experiments' software for long term analysis and data preservation (DESY-IT) 25m
        Preserving data from past experiments and preserving the ability to perform analysis with old data is of growing importance in many domains of science, including High Energy Physics (HEP). A study group on this issue, DPHEP, has been established in this field to provide guidelines and a structure for international collaboration on data preservation projects in HEP. This contribution addresses the preparation of experiments' software for long-term analysis and data preservation. First, we discuss the use of modern techniques such as virtualization and the Cloud for this purpose. Second, we detail the constraints on an IT centre supporting future legacy experiments. Third, we present a framework that allows experimentalists to validate their software against a previously defined set of tests in an automated way. We show first usage of the system, present results gained from the experience with early adopters, and discuss future adaptations of the system. (A schematic sketch of such an automated validation run follows this entry.)
        Speaker: Yves Kemp (Deutsches Elektronen-Synchrotron (DE))
        Slides
        Video in CDS
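        As an illustration of the automated validation idea above, here is a hypothetical sketch (not the DESY framework): each preserved release is exercised against a predefined list of test commands and the pass/fail outcome is reported. The test names and paths are placeholders.

        import subprocess

        # Placeholder test definitions; a real setup would load these from the
        # experiment's validation configuration.
        TESTS = {
            "compile":      ["make", "-C", "/opt/legacy-experiment/src"],
            "run_example":  ["/opt/legacy-experiment/bin/analysis", "--input", "reference.dat"],
            "compare_hist": ["diff", "output.hist", "reference.hist"],
        }

        def run_test(cmd):
            try:
                return subprocess.run(cmd, capture_output=True).returncode == 0
            except OSError:
                return False                      # e.g. the preserved binary no longer exists

        def validate(tests):
            return {name: run_test(cmd) for name, cmd in tests.items()}

        if __name__ == "__main__":
            for name, ok in validate(TESTS).items():
                print(f"{name:15s} {'OK' if ok else 'FAILED'}")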
      • 2:45 PM
        Electronic Collaboration Logbook 25m
        In HEP, scientific research is performed by large collaborations of organizations and individuals. The logbook of a scientific collaboration is an important part of the collaboration record, and often contains experimental data. At FNAL, we developed an Electronic Collaboration Logbook (ECL) application which is used by about 20 different collaborations, experiments and groups at FNAL. ECL is the latest iteration of the project formerly known as the Control Room Logbook (CRL). We have also been working on mobile (iOS and Android) clients for the ECL. We will present the history, current status and future plans of the project, as well as the design, implementation and support choices made by the project.
        Speaker: Mr Igor Mandrichenko (Fermilab)
        Slides
        Video in CDS
      • 3:10 PM
        Project Management Web Tools at the MICE experiment 25m
        Project management tools like Trac are commonly used within the open-source community to coordinate projects. The Muon Ionization Cooling Experiment (MICE) uses the project management web application Redmine to host mice.rl.ac.uk. Many groups within the experiment have a Redmine project: analysis, computing and software (including offline, online, controls and monitoring, and database subgroups), executive board, and operations. All of these groups use the website to communicate, track effort, develop schedules, and maintain documentation. The issue tracker is a rich tool that is used to identify tasks and monitor progress within groups on timescales ranging from immediate and unexpected problems to milestones that cover the life of the experiment. It allows the prioritization of tasks according to time-sensitivity, while providing a searchable record of work that has been done. This record of work can be used to measure both individual and overall group activity, identify areas lacking sufficient personnel or effort, and as a measure of progress against the schedule. Given that MICE, like many particle physics experiments, is an international community, such a system is required to allow easy communication within a global collaboration. Unlike systems that are purely wiki-based, the structure of a project management tool like Redmine allows information to be maintained in a more structured and logical fashion.
        Speaker: Linda Coney (University of California, Riverside)
        Slides
        Video in CDS
    • 1:30 PM 6:15 PM
      Poster Session: session 1 Rosenthal Pavilion (10th floor) (Kimmel Center)

      Rosenthal Pavilion (10th floor)

      Kimmel Center

      • 1:30 PM
        A business model approach for a sustainable Grid infrastructure in Germany 4h 45m
        After a long period of project-based funding, during which the improvement of the services provided to the user communities was the main focus, distributed computing infrastructures (DCIs), having reached and established production quality, now need to tackle the issue of long-term sustainability. With the transition from EGEE to EGI in 2010, the major part of the responsibility (especially financial) now lies with the national grid initiatives (NGIs). It is their duty not only to ensure the unobstructed continuation of scientific work on the grid, but also to cater for the needs of the user communities to be able to utilise a broader range of middleware and tools. Sustainability in grid computing must therefore take into account the integration of this variety of technical developments. Newer developments like cloud computing need to be taken into account and integrated into the usage scenarios of the grid infrastructure, leading to a distributed computing infrastructure encompassing the positive aspects of both. On the whole, a strategy for sustainability must focus on the three main aspects of technical integration, core services and business development, and must make concrete statements about how the respective efforts can be financed. Although not common in science, it seems necessary to use a business model approach to create a business plan enabling the long-term sustainability of the NGIs and of international DCIs like EGI.
        Speaker: Dr Torsten Antoni (KIT - Karlsruhe Institute of Technology (DE))
      • 1:30 PM
        A distributed agent based framework for high-performance data transfers 4h 45m
        Current network technologies like dynamic network circuits and emerging protocols like OpenFlow enable the network to act as an active component in the context of data transfers. We present a framework which provides a simple interface for scientists to move data between sites over the Wide Area Network with bandwidth guarantees. Although the system hides the complexity from the end users, it was designed to include all the security, redundancy and fail-over aspects of large distributed systems. The agents collaborate over secure channels to advertise their presence and, when a user requests a data transfer, to request and allocate network resources such as dynamic network circuits, end-host routing tables and IP addresses, releasing the resources when the transfer finishes. The data transfer tool used by the framework is Fast Data Transfer (http://fdt.cern.ch/). It also provides dynamic bandwidth adjustment capabilities at the application level, so bandwidth scheduling can be used where network circuits are not available. The framework is currently being deployed and tested between a set of US HEP sites as part of the NSF-funded DYNES project: Dynamic Network System (http://internet2.edu/dynes). The DYNES “cyber-instrument” interconnects ~40 institutes participating in the US-ATLAS and US-CMS collaborations.
        Speaker: Ramiro Voicu (California Institute of Technology (US))
      • 1:30 PM
        A General Purpose Grid Portal for simplified access to Distributed Computing Infrastructures 4h 45m
        One of the main barriers to widespread Grid adoption in scientific communities stems from the intrinsic complexity of handling X.509 certificates, which represent the foundation of the Grid security stack. To hide this complexity, several Grid portals have been proposed in recent years which, however, do not completely solve the problem, either requiring that users manage their own certificates or proposing solutions that weaken the Grid middleware authorization and accounting mechanisms by obfuscating the user identity. General purpose Grid portals aim at providing a powerful and easy-to-use gateway to distributed computing resources. They act as incubators where users can securely run their applications without facing the complexity of the authentication infrastructure (e.g., handling X.509 certificates and VO membership requests, or accessing resources through dedicated shell-based UIs). In this paper, we discuss a general purpose Grid portal framework, based on Liferay, which provides several important services such as job submission, workflow definition, data management and accounting services. It is also interfaced with external Infrastructure-as-a-Service (IaaS) frameworks for the dynamic provisioning of computing resources. In our model, authentication is delegated to a Shibboleth 2.0 federation, while the generation and management of Grid credentials is handled securely by integrating an on-line CA with the MyProxy server. Consequently, the portal gives users full access to Grid functionality without exposing the complexity of X.509 certificate and proxy management. Unlike other existing solutions, our portal does not leverage robot certificates for the user credentials. This approach offers twofold benefits. On the one hand, user identity is not obfuscated across the middleware stack, thus preserving the functionality and effectiveness of existing distributed accounting and authorization mechanisms. On the other hand, users are not constrained to a predefined set of applications but can freely take advantage of Grid facilities for any computational or data-intensive activity. The portal also provides simplified access to common Grid data-management operations. Our solution manages the staging of input and output data for Grid jobs to an external WebDAV storage service. The staged data is then transferred to or from a Grid SE and registered in data catalogs on behalf of the user. This approach has two main benefits. Firstly, by delegating the file transfer handling to an external service, the portal is relieved of the potential load caused by many concurrent large file transfer operations that would severely impact its scalability. Secondly, the use of standard protocols like WebDAV enables any client machine to upload and download files to the Grid without requiring the installation of custom software on the client side.
        Speaker: Marco Bencivenni (INFN)
      • 1:30 PM
        A gLite FTS based solution for managing user output in CMS 4h 45m
        The CMS distributed data analysis workflow assumes that jobs run in a different location from where their results are finally stored. Typically the user output must be transferred across the network from one site to another, possibly on a different continent or over links not necessarily validated for high-bandwidth, high-reliability transfer. This step is named stage-out and was originally implemented in CMS as a synchronous step of the job execution. However, our experience showed the weakness of this approach, both in terms of low overall job execution efficiency and in terms of failure rates, wasting precious CPU resources. The nature of analysis data makes it inappropriate to use PhEDEx, CMS' core data placement system. As part of the new generation of CMS Workload Management tools, the Asynchronous Stage-Out system (AsyncStageOut) has been developed to enable third-party copy of the user output. The AsyncStageOut component manages gLite FTS transfers of data from a temporary store at the site where the job ran to the final location of the data, on behalf of the data owner. The tool uses Python daemons, built using the WMCore framework and talking to CouchDB, to manage the queue of work and the FTS transfers. CouchDB also provides the platform for a dedicated operations monitoring system. In this paper, we present the motivations for the asynchronous stage-out system. We give an insight into the design and the implementation of key features, describing how it is coupled with the CMS workload management system. Finally, we show the results and the commissioning experience. (A schematic sketch of the polling-and-submission pattern follows this entry.)
        Speakers: Daniele Spiga (CERN), Hassen Riahi (Universita e INFN (IT)), Mattia Cinquilli (Univ. of California San Diego (US))
        Slides
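        A schematic sketch of the asynchronous stage-out pattern described above, assuming a CouchDB view of pending transfers and the gLite FTS command-line client; the database name, view path, FTS endpoint and document fields are hypothetical placeholders, not the actual AsyncStageOut code.

        import subprocess
        import requests

        COUCH_VIEW = "http://localhost:5984/asyncstageout/_design/transfers/_view/pending"
        FTS_SERVICE = "https://fts.example.org:8443/glite-data-transfer-fts/services/FileTransfer"

        def pending_transfers():
            """Yield (source, destination) SURL pairs queued in CouchDB."""
            for row in requests.get(COUCH_VIEW).json().get("rows", []):
                doc = row["value"]
                yield doc["source_surl"], doc["dest_surl"]

        def submit(source, dest):
            # One FTS job per file for simplicity; the real system batches, monitors and retries.
            cmd = ["glite-transfer-submit", "-s", FTS_SERVICE, source, dest]
            return subprocess.run(cmd, capture_output=True, text=True).stdout.strip()

        for src, dst in pending_transfers():
            print("submitted FTS job", submit(src, dst))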
      • 1:30 PM
        A Grid storage accounting system based on DGAS and HLRmon 4h 45m
        The accounting activity in a production computing Grid is of paramount importance in order to understand the utilization of the available resources. While several CPU accounting systems are deployed within the European Grid Infrastructure (EGI), storage accounting systems that are stable enough to be adopted in a production environment are not yet available. Growing interest is being devoted to storage accounting, and work is being carried out in the Open Grid Forum (OGF) to write a standard Usage Record (UR) definition suitable for this kind of resource. In this paper we present a storage accounting system which is composed of three parts: a sensor layer, a data repository and transport layer (Distributed Grid Accounting System - DGAS) and a web portal that generates graphical and tabular reports (HLRmon). The sensor layer is responsible for the creation of URs according to the schema that will be presented in the paper and that is being discussed in OGF. DGAS is one of the CPU accounting systems used in EGI, by the Italian Grid Infrastructure (IGI), other National Grid Initiatives (NGIs) and other projects that rely on the Grid. DGAS is evolving towards an architecture that allows the collection of URs for different types of resources. These features allow DGAS to be used as the data repository and transport layer of the accounting system described here. HLRmon is the web interface for DGAS. It has been further developed to retrieve storage accounting data from the repository and create reports in an easily accessible fashion, in order to be useful to the Grid stakeholders. (An illustrative example of the information such a usage record might carry follows this entry.)
        Speaker: Andrea Cristofori (INFN-CNAF, IGI)
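        The example below illustrates the kind of information a storage usage record produced by the sensor layer might carry; the field names are purely illustrative and are not the OGF schema under discussion at the time.

        import json
        import time
        import uuid

        def make_storage_record(site, storage_system, group, used_bytes, valid_hours=24):
            now = int(time.time())
            return {
                "record_id": str(uuid.uuid4()),
                "site": site,
                "storage_system": storage_system,
                "group": group,                          # e.g. the VO being accounted
                "resource_capacity_used": used_bytes,    # bytes on disk at measurement time
                "start_time": now,
                "end_time": now + valid_hours * 3600,    # validity window of the measurement
            }

        print(json.dumps(make_storage_record("INFN-T1", "storm", "atlas", 123456789012), indent=2))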
      • 1:30 PM
        A new communication framework for the ALICE Grid 4h 45m
        Since the ALICE experiment began data taking in late 2009, the amount of end-user jobs on the AliEn Grid has increased significantly. Presently one third of the 30K CPU cores available to ALICE are occupied by jobs submitted by about 400 distinct users. The overall stability of the AliEn middleware has been excellent throughout the two years of running, but the massive amount of end-user analysis, with its specific requirements and load, has revealed a few components which can be improved. One of them is the interface between users and the central AliEn services (catalogue, job submission system), which we are currently re-implementing in Java. The interface provides a persistent connection with enhanced data and job submission authenticity. In this paper we describe the architecture of the new interface, the ROOT binding which enables the use of a single interface in addition to the standard UNIX-like access shell, and the new security-related features.
        Speaker: Costin Grigoras (CERN)
      • 1:30 PM
        A new era for central processing and production in CMS 4h 45m
        The goal for CMS computing is to maximise the throughput of simulated event generation while also processing the real data events as quickly and reliably as possible. To maintain this as the quantity of events increases, CMS computing has, since the beginning of 2011, migrated at the Tier-1 level from its old production framework, ProdAgent, to a new one, WMAgent. The WMAgent framework offers improved processing efficiency and increased resource usage, as well as a reduction in manpower. In addition to the challenges encountered during the design of the WMAgent framework, several operational issues have arisen during its commissioning. The largest operational challenges were in the usage and monitoring of resources, mainly a result of a change in the way work is allocated. Instead of work being assigned to operators, all work is centrally injected and managed in the Request Manager system, and the task of the operators has changed from running individual workflows to monitoring the global workload. In this report we present how we tackled some of the operational challenges, and how we benefitted from the lessons learned in the commissioning of the WMAgent framework at the Tier-2 level in late 2011. As case studies, we will show how the WMAgent system performed during some of the large data reprocessing and Monte Carlo simulation campaigns.
        Speaker: Rapolas Kaselis (Vilnius University (LT))
      • 1:30 PM
        A scalable low-cost Petabyte scale storage for HEP using Lustre 4h 45m
        We describe a low-cost Petabyte-scale Lustre filesystem deployed for High Energy Physics. The use of commodity storage arrays and bonded Ethernet interconnects makes the array cost effective, whilst providing high bandwidth to the storage. The filesystem is a POSIX filesystem, presented to the Grid using the StoRM SRM. The system is highly modular. The building blocks of the array, the Lustre Object Storage Servers (OSS), each have 12 x 2 TB SATA disks configured as a RAID6 array, delivering 18 TB of storage. The network bandwidth from the storage servers is designed to match that from the compute servers within each module of 6 storage servers and 12 compute servers. The modules are connected together by a 10 Gbit core network to provide balanced overall performance. We present benchmarks demonstrating the performance and scalability of the filesystem.
        Speakers: Dr Alex Martin (QUEEN MARY, UNIVERSITY OF LONDON), Christopher John Walker (University of London (GB))
        Poster
      • 1:30 PM
        A Study of ATLAS Grid Performance for Distributed Analysis 4h 45m
        In the past two years the ATLAS Collaboration at the LHC has collected a large volume of data and published a number of groundbreaking papers. The Grid-based ATLAS distributed computing infrastructure played a crucial role in enabling timely analysis of the data. We will present a study of the performance and usage of the ATLAS Grid as a platform for physics analysis and discuss the changes that analysis usage patterns underwent in 2011. This includes studies of the timing properties of user jobs (wait time, run time, etc.) and an analysis of the evolution of data format popularity that significantly affected ATLAS data distribution policies. These studies are based on mining of data archived by the PanDA workload management system.
        Speaker: Sergey Panitkin (Brookhaven National Laboratory (US))
      • 1:30 PM
        A tool for Image Management in Cloud Computing 4h 45m
        Two of the most widely discussed new technologies in the information industry are virtualization and cloud computing. Virtualization makes heterogeneous resources transparent to users and plays a major role in large-scale data centre management solutions. Cloud computing, built on virtualization, is emerging as a revolution in computing, demonstrating great advantages in resource sharing, utilization, flexibility and scalability. These new technologies also bring new problems when IT infrastructure is deployed with virtual machines, among them the management of the virtual machine images that are indispensable in a cloud environment. In order to deploy a large-scale cloud infrastructure within an acceptable time, the key questions are how to distribute images to hypervisors quickly and how to guarantee their validity and integrity. To address this, this paper proposes an image management system acting as an image repository as well as an image distributor, providing users with a friendly portal and an effective set of standard commands. The system interfaces implement operations such as register, upload, download, unregister and delete, together with access control rules for different users to guarantee security in the cloud. To optimize performance, different storage systems such as NFS, Lustre, AFS and GlusterFS are compared in detail, and image distribution protocols such as peer-to-peer (P2P), HTTP and SCP are analysed, demonstrating that the choice of storage system and distribution protocol is essential to the performance of the system. The workflow of deploying a cloud with virtual machine provisioning tools such as OpenNebula is introduced, and the comparison between the various storage systems and transfer protocols is discussed. The high performance and scalability of the system's image distribution in production are demonstrated, with a virtual machine deployed within about a minute on average. Some useful tips for image management are also proposed.
        Speaker: Ms qiulan huang (Institute of High Energy Physics, Beijing)
      • 1:30 PM
        AGIS: The ATLAS Grid Information System 4h 45m
        The ATLAS Grid Information System (AGIS) centrally stores and exposes static, dynamic and configuration parameters required to configure and to operate ATLAS distributed computing systems and services. AGIS is designed to integrate information about resources, services and topology of the ATLAS grid infrastructure from various independent sources including BDII, GOCDB, the ATLAS data management system and the ATLAS PanDA workload management system. Being an intermediate middleware system between a client and external information sources, AGIS automatically collects and keeps data up to date, caching information required by and specific for ATLAS, removing the source as a direct dependency for clients but without duplicating the source information system itself. All interactions with various information providers are hidden. Synchronization of AGIS content with external sources is performed by agents which periodically communicate with sources via standard interfaces and update database content. For some types of information AGIS is itself the primary repository. AGIS stores data objects in a way convenient for ATLAS, introduces additional object relations required by ATLAS applications, exposes the data via API and web front end services. Through the API clients are able to update information stored in AGIS. A python API and command line tools further help end users and developers use the system conveniently. Web interfaces such as a site downtime calendar and ATLAS topology viewers are widely used by shifters and data distribution experts.
        Speaker: Alexey Anisenkov (Budker Institute of Nuclear Physics (RU))
        Poster
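        A hypothetical sketch of how a client might consume AGIS-style topology information over a REST/JSON interface; the URL and the field names are placeholders and do not reflect the real AGIS API.

        import requests

        AGIS_URL = "https://agis.example.org/request/site/query/list/?json"   # placeholder endpoint

        def sites_for_cloud(cloud):
            """Return the names of sites that belong to a given ATLAS cloud."""
            sites = requests.get(AGIS_URL, timeout=30).json()
            return [s["name"] for s in sites if s.get("cloud") == cloud]

        if __name__ == "__main__":
            for name in sites_for_cloud("DE"):
                print(name)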
      • 1:30 PM
        Alert Messaging in the CMS Distributed Workload System 4h 45m
        WMAgent is the core component of the CMS workload management system. One of the features of this job managing platform is a configurable messaging system aimed at generating, distributing and processing alerts: short messages describing a given alert-worthy informational or pathological condition. Apart from the framework's sub-components running within the WMAgent instances, there is a stand-alone application collecting alerts from all WMAgent instances running across the CMS distributed computing environment. The alert framework has a versatile design that allows for receiving alert messages also from other CMS production applications, such as PhEDEx data transfer manager. We present implementation details of the system, including its python implementation using ZeroMQ, CouchDB message storage and future visions as well as operational experiences. Inter-operation with monitoring platforms such as Dashboard or Lemon is described.
        Speaker: Zdenek Maxa (California Institute of Technology (US))
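        A minimal sketch of the alert pattern described above, assuming ZeroMQ PUSH/PULL sockets and a CouchDB database named "alerts"; this is an illustration of the messaging idea, not the WMAgent implementation, and the port and database name are arbitrary choices.

        import json
        import requests
        import zmq

        COUCH_DB = "http://localhost:5984/alerts"

        def send_alert(severity, component, message, endpoint="tcp://localhost:5557"):
            sock = zmq.Context.instance().socket(zmq.PUSH)
            sock.connect(endpoint)
            sock.send_json({"severity": severity, "component": component, "message": message})

        def collect(endpoint="tcp://*:5557"):
            sock = zmq.Context.instance().socket(zmq.PULL)
            sock.bind(endpoint)
            while True:
                alert = sock.recv_json()              # blocks until an alert arrives
                requests.post(COUCH_DB, json=alert)   # CouchDB assigns the document id
                print("stored alert:", json.dumps(alert))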
      • 1:30 PM
        ALICE Grid Computing at the GridKa Tier-1 Center 4h 45m
        The GridKa centre at the Karlsruhe Institute of Technology is the largest ALICE Tier-1 centre. It hosts 40,000 HEP-SPEC06, approximately 2.75 PB of disk space and 5.25 PB of tape space for A Large Ion Collider Experiment (ALICE) at the CERN LHC. These resources are accessed via the AliEn middleware. The storage is divided into two instances, both using the storage middleware xrootd. We will focus on the set-up of these resources and on the topic of monitoring. The latter serves a vast number of purposes, ranging from efficiency statistics for process and procedure optimization to alerts for on-call duty engineers.
        Speaker: Dr Christopher Jung (KIT - Karlsruhe Institute of Technology (DE))
      • 1:30 PM
        AliEn Extreme JobBrokering 4h 45m
        The AliEn workload management system is based on a central job queue which holds all tasks that have to be executed. The job brokering model itself is based on pilot jobs: the system submits generic pilots to the computing centres' batch gateways, and the assignment of a real job is done only when the pilot wakes up on the worker node. The model facilitates a flexible, fair-share user job distribution. This job model has proven stable and reliable over the past years and has survived the first two years of LHC operation with very few changes. Nonetheless, there are several areas where the model can be pushed to the next level, most notably 'just in time' job and data assignment, where the decisions will be based on data closeness (relaxed locality) and on the data which has already been processed. These methods will significantly enhance the efficiency of end-user analysis tasks.
        Speaker: Pablo Saiz (CERN)
        Slides
      • 1:30 PM
        An optimization of the ALICE XRootD storage cluster at the Tier-2 site in Czech Republic 4h 45m
        ALICE, as well as the other experiments at the CERN LHC, has been building a distributed data management infrastructure since 2002. Experience gained during years of operations with different types of storage managers deployed over this infrastructure has shown that the most adequate storage solution for ALICE is the native XRootD manager developed within a CERN - SLAC collaboration. The XRootD storage clusters exhibit higher stability and availability in comparison with other storage solutions and demonstrate a number of other advantages, such as support for high-speed WAN data access and no need to maintain complex databases. Two of the operational characteristics of XRootD data servers are a relatively high number of open sockets and a high Unix load. In this contribution we describe our experience with the tuning and optimization of the machines hosting the XRootD servers which are part of the ALICE storage cluster at the Tier-2 WLCG site in Prague, Czech Republic. The optimization procedure, in addition to boosting the read/write performance of the servers, also resulted in a reduction of the Unix load.
        Speakers: Dr Dagmar Adamova (Nuclear Physics Institute of the AS CR Prague/Rez), Mr Jiri Horky (Institute of Physics of the AS CR Prague)
        Poster
      • 1:30 PM
        APEnet+: a 3-D Torus network optimized for GPU-based HPC Systems 4h 45m
        Hybrid GPU-accelerated clusters are now firmly established in the supercomputing landscape. In this context we proposed a new INFN initiative, the QUonG project, aiming to deploy a high-performance computing system dedicated to scientific computations, leveraging commodity multi-core processors coupled with last-generation GPUs. The multi-node interconnection system is based on a point-to-point, high-performance, low-latency 3-D torus network built in the framework of the APEnet+ project: it consists of an FPGA-based PCI Express board exposing six fully bidirectional links running at 34 Gbps each and implementing an RDMA protocol. In order to enable a significant reduction in access latency for inter-node data transfers, a direct network-to-GPU interface was built. The specialized hardware blocks, integrated in the APEnet+ board, provide support for GPU-initiated communications using so-called PCI Express peer-to-peer (P2P) transactions. To this end we are collaborating closely with the GPU vendor NVIDIA. The final shape of a complete QUonG deployment is an assembly of standard 42U racks, each capable of ~80 TFlops/rack of peak performance, at a cost of 5 kEuro/TFlops and with an estimated power consumption of 25 kW/rack. A first reduced QUonG system prototype is expected to be delivered by the end of 2011. In this talk we will report on the status of the final rack deployment and on the 2012 R&D activities, which will focus on enhancing the performance of the APEnet+ hardware through the adoption of new-generation 28 nm FPGAs, allowing the implementation of a PCIe Gen3 host interface and the addition of new fault-tolerance-oriented capabilities.
        Speaker: Laura Tosoratto (INFN)
        Poster
      • 1:30 PM
        Applicability of modern, scale-out file services in dedicated LHC data analysis environments. 4h 45m
        DESY has started to deploy modern, state-of-the-art, industry-based, scale-out file services, together with certain extensions, as a key component in dedicated LHC analysis environments like the National Analysis Facility (NAF) at DESY. In a technical cooperation with IBM, we will add identified critical features to IBM's standard SONAS product line to make the system best suited to the already high and increasing demands of the NAF at DESY. We will first give a short introduction to the core system and its basic mode of operation, followed by a detailed description of the additional components and services identified within the DESY/IBM cooperation, largely worked out by talking to the physicists doing analysis on the NAF today. Known areas include, for example: the interface to tertiary storage (archive), system federation through industry-standard protocols, X.509 integration, and far more aggressive caching of physics data (immutable data). Finally, we will show in detail the first results of the newly implemented features, including lessons learned regarding their basic suitability for our community.
        Speaker: Mr Martin Gasthuber (Deutsches Elektronen-Synchrotron (DE))
        Poster
      • 1:30 PM
        Application of rule based data mining techniques to real time ATLAS Grid job monitoring data 4h 45m
        The Job Execution Monitor (JEM), a job-centric grid job monitoring software, is actively developed at the University of Wuppertal. It leverages Grid-based physics analysis and Monte Carlo event production for the ATLAS experiment by monitoring job progress and grid worker node health. Using message passing techniques, the gathered data can be supervised in real time by users, site admins and shift personnel. Imminent error conditions can be detected early and countermeasures taken by the Job's owner. Grid site admins can access aggregated data of all monitored jobs to infer the site status and to detect job and Grid worker node misbehavior. Shifters can use the same aggregated data to quickly react to site error conditions and broken production jobs. JEM is integrated into ATLAS' Pilot-based "PanDA" job brokerage system. In this work, the application of novel data-centric rule based methods and data-mining techniques to the real time monitoring data is discussed. The usage of such automatic inference techniques on monitoring data to provide job- and site-health summary information to users and admins is presented. Finally, the provision of a secure real-time control- and steering channel to the job as extension of the presented monitoring software is considered and a possible architecture is shown
        Speaker: Sergey Kalinin (Bergische Universitaet Wuppertal (DE))
        Poster
      • 1:30 PM
        Application of the DIRAC framework in CTA: first evaluation 4h 45m
        The Cherenkov Telescope Array (CTA) – an array of many tens of Imaging Atmospheric Cherenkov Telescopes deployed on an unprecedented scale – is the next generation instrument in the field of very high energy gamma-ray astronomy. CTA will operate as an open observatory providing data products to the scientific community. An average data stream of some GB/s for about 1000 hours of observation per year, thus producing several PB/year, is expected. Large CPU time is required for data processing as well as for massive Monte Carlo simulations (MC) needed for detector calibration purposes and performance studies as a function of detectors and lay-out configurations. Given these large storage and computing requirements, the Grid approach is well suited and massive MC simulations are already running on the EGI Grid. In order to optimize resource usage and to handle in a coherent way all production and future analysis activities, a high level framework with advanced functionalities is needed. For this purpose the DIRAC framework for distributed computing access implementing CTA workloads is evaluated. The benchmark test results of DIRAC as well as the extensions developed to cope with CTA specific needs are presented.
        Speaker: Luisa Arrabito (IN2P3/LUPM on behalf of the CTA Consortium)
      • 1:30 PM
        ATLAS Data Caching based on the Probability of Data Popularity 4h 45m
        Efficient distribution of physics data over ATLAS grid sites is one of the most important tasks for user data processing. ATLAS' initial static data distribution model over-replicated some unpopular data and under-replicated popular data, creating heavy disk space loads while under-utilizing some processing resources due to low data availability. Thus, a new data distribution mechanism was implemented, PD2P (PanDA Dynamic Data Placement) within the production and distributed analysis system PanDA that dynamically reacts to user data needs [1], basing dataset distribution principally on user demand. Data deletion is also demand driven, reducing replica counts for unpopular data [2]. This dynamic model has led to substantial improvements in efficient utilization of storage and processing resources. Based on this experience, in this work we seek to further improve data placement policy by investigating in detail how data popularity is calculated. For this it is necessary to precisely define what data popularity means, what types of data popularity exist, how it can be measured, and most importantly, how the history of the data can help to predict the popularity of derived data. We introduce locality of the popularity: a dataset may be only of local interest to a subset of clouds/sites or may have a wide (global) interest. We also extend the idea of the “data temperature scale” model [3] and a popularity measure. Using the ATLAS data replication history, we devise data distribution algorithms based on popularity measures and past history. Based on this work we will describe how to explicitly identify why and how datasets become popular and how such information can be used to predict future popularity. [1] Kaushik De, Tadashi Maeno, Torre Wenaus, Alexei Klimentov, Rodney Walker, Graeme Stewart, “PD2P – PanDA Dynamic Data Placement”, ATLAS Notes, CERN [2] Angelos Molfetas, Fernando Barreiro Megino, Andrii Tykhonov, Vincent Garonne, Simone Campana, Mario Lassnig, Martin Barisits, Gancho Dimitrov, Florbela Tique Aires Viegas, “Popularity framework to process dataset tracers and its application on dynamic replica reduction in the ATLAS experiment”, CHEP, Taipei, Taiwan, October 18-22, 2010 [3] Alexei Klimentov, “ATLAS data over Grid (data replication, placement and deletion policy)”, ATLAS Notes, CERN, March 17, 2009
        Speaker: Mikhail Titov (University of Texas at Arlington (US))
        Poster
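        The toy example below shows one possible popularity measure of the kind discussed above, in which recent accesses to a dataset count more than old ones (exponential decay); the decay constant is an illustrative choice and this is not the ATLAS algorithm.

        import math
        import time

        def popularity(access_times, now=None, half_life_days=14.0):
            """access_times: unix timestamps of user accesses to one dataset."""
            now = now or time.time()
            decay = math.log(2) / (half_life_days * 86400)
            return sum(math.exp(-decay * (now - t)) for t in access_times)

        now = time.time()
        recent = [now - i * 3600 for i in range(50)]                 # 50 accesses in the last two days
        stale = [now - 90 * 86400 - i * 3600 for i in range(50)]     # 50 accesses about three months ago
        print("hot dataset :", round(popularity(recent, now), 2))
        print("cold dataset:", round(popularity(stale, now), 2))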
      • 1:30 PM
        ATLAS Distributed Computing Monitoring tools after full 2 years of LHC data taking 4h 45m
        This talk details the variety of monitoring tools used within ATLAS Distributed Computing during the first two years of LHC data taking. We discuss the tools used to monitor data processing from the very first steps performed at the Tier-0 facility at CERN after data is read out of the ATLAS detector, through to the data transfers to the ATLAS computing centres distributed world-wide. We present an overview of the monitoring tools used daily to track ATLAS Distributed Computing activities, ranging from network performance and data transfer throughput, through data processing and the readiness of the computing services at the ATLAS computing centres, to the reliability and usability of those centres. The tools described provide monitoring for issues of different levels of criticality: from spotting problems with instant online monitoring to long-term accounting information.
        Speaker: Jaroslava Schovancova (Acad. of Sciences of the Czech Rep. (CZ))
        Poster
      • 1:30 PM
        ATLAS Distributed Computing Operations: Experience and improvements after 2 full years of data-taking 4h 45m
        This paper will summarize operational experience and improvements in the ATLAS computing infrastructure during 2010 and 2011. ATLAS has had two periods of data taking, with many more events recorded in 2011 than in 2010, and has run three major reprocessing campaigns. The activity in 2011 was similar to that in 2010, but scalability issues had to be addressed due to the increase in luminosity and trigger rate. Based on the improved monitoring of ATLAS Grid computing, the evolution of computing activities (data/group production, their distribution and grid analysis) over time will be presented. The major bottlenecks and the implemented solutions will be described. The main changes in the implementation of the computing model that will be shown are: the optimisation of data distribution over the Grid, according to effective transfer rate and site readiness for analysis; the relaxation of the cloud model for data distribution and data processing; the migration of software installation to CVMFS; and the change of database access to a Frontier/squid infrastructure.
        Speakers: Graeme Andrew Stewart (CERN), Dr Stephane Jezequel (LAPP)
        Slides
      • 1:30 PM
        ATLAS Distributed Computing Shift Operation in the first 2 full years of LHC data taking 4h 45m
        ATLAS Distributed Computing organized three teams to support data processing at the Tier-0 facility at CERN, data reprocessing, data management operations, Monte Carlo simulation production, and physics analysis at the ATLAS computing centres located world-wide. In this talk we describe how these teams ensure that the ATLAS experiment data is delivered to the ATLAS physicists in a timely manner during LHC data taking. We describe our experience with ways to improve degraded service performance, and we detail the Distributed Analysis support provided over this exciting period of computing model evolution.
        Speaker: Jaroslava Schovancova (Acad. of Sciences of the Czech Rep. (CZ))
        Poster
      • 1:30 PM
        ATLAS DQ2 Deletion Service 4h 45m
        The ATLAS Distributed Data Management project DQ2 is responsible for the replication, access and bookkeeping of ATLAS data across more than 100 distributed grid sites. It also enforces data management policies decided on by the collaboration and defined in the ATLAS computing model. The DQ2 deletion service is one of the most important DDM services. This distributed service interacts with third-party grid middleware and the DQ2 catalogs to serve data deletion requests on the grid. Furthermore, it also takes care of retry strategies, check-pointing transactions, load management and fault tolerance. In this paper special attention is paid to the technical details which are used to achieve the high performance of the service (peaking at more than 4 million files deleted per day), accomplished without overloading either site storage, the catalogs or other DQ2 components. Special attention is also paid to the deletion monitoring service that gives operators a detailed view of the working system. (A schematic sketch of the retry and check-pointing pattern follows this entry.)
        Speaker: Danila Oleynik (Joint Inst. for Nuclear Research (RU))
        Poster
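        The sketch below illustrates the retry and check-pointing pattern mentioned above in its simplest form; it is not the DQ2 deletion service code, and the storage call is a placeholder.

        import json
        import time

        CHECKPOINT = "deletion_checkpoint.json"

        def delete_replica(surl):
            """Placeholder for the real call into the grid storage middleware."""
            print("deleting", surl)
            return True

        def load_done():
            try:
                with open(CHECKPOINT) as f:
                    return set(json.load(f))
            except FileNotFoundError:
                return set()

        def process(request_files, max_retries=3):
            done = load_done()
            for surl in request_files:
                if surl in done:
                    continue                              # already handled in a previous run
                for attempt in range(max_retries):
                    if delete_replica(surl):
                        done.add(surl)
                        break
                    time.sleep(2 ** attempt)              # exponential back-off between retries
                with open(CHECKPOINT, "w") as f:
                    json.dump(sorted(done), f)            # checkpoint progress after every file

        process(["srm://site.example.org/atlas/file%d" % i for i in range(3)])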
      • 1:30 PM
        ATLAS Grid Data Processing: system evolution and scalability 4h 45m
        The production system for Grid Data Processing (GDP) handles petascale ATLAS data reprocessing and Monte Carlo activities. The production system empowered further data processing steps on the Grid performed by dozens of ATLAS physics groups with coordinated access to computing resources worldwide, including additional resources sponsored by regional facilities. The system provides knowledge management of configuration parameters for massive data processing tasks, reproducibility of results, scalable database access, orchestrated workflow and performance monitoring, dynamic workload sharing, automated fault tolerance and petascale data integrity control. The system evolves to accommodate a growing number of users and new requirements from our contacts in ATLAS main areas: Trigger, Physics, Data Preparation and Software & Computing. To assure scalability, the next generation production system architecture development is in progress. We report on scaling up the GDP production system for a growing number of users providing data for physics analysis and other ATLAS main activities.
        Speaker: Pavel Nevski (Brookhaven National Laboratory (US))
        Slides
      • 1:30 PM
        ATLAS job monitoring in the Dashboard Framework 4h 45m
        Monitoring of the large-scale data processing of the ATLAS experiment includes monitoring of production and user analysis jobs. Experiment Dashboard provides a common job monitoring solution, which is shared by ATLAS and CMS experiments. This includes an accounting portal as well as real-time monitoring. Dashboard job monitoring for ATLAS combines information from the Panda job processing DB, Production system DB and monitoring information from jobs submitted through Ganga to WMS or local batch systems. Usage of Dashboard-based job monitoring applications will decrease load on the PanDA DB and overcome scale limitations in PanDA monitoring caused by the short job rotation cycle in the PanDA DB. Aggregation of the task/job metrics from different sources will provide complete view of job processing in scope of ATLAS. The presentation will describe the architecture, functionality and the future plans of the new monitoring applications, including the accounting portal and task monitoring for production and analysis users.
        Speaker: Laura Sargsyan (A.I. Alikhanyan National Scientific Laboratory (AM))
        Poster
      • 1:30 PM
        ATLAS off-Grid sites (Tier 3) monitoring. From local fabric monitoring to global overview of the VO computing activities 4h 45m
        The ATLAS Distributed Computing activities have so far concentrated on the "central" part of the experiment's computing system, namely the first three tiers (the CERN Tier-0, 10 Tier-1 centres and over 60 Tier-2 sites). Many ATLAS institutes and national communities have deployed, or intend to deploy, Tier-3 facilities. Tier-3 centres consist of non-pledged resources, which are usually dedicated to data analysis tasks by geographically close or local scientific groups, and which usually comprise a range of architectures without Grid middleware. Therefore a substantial part of the ATLAS monitoring tools which make use of Grid middleware cannot be used for a large fraction of Tier-3 sites. The presentation will describe the T3mon project, which aims to develop a software suite for monitoring Tier-3 sites, both from the perspective of the local site administrator and from that of the ATLAS VO, thereby enabling a global view of the contribution from Tier-3 sites to the ATLAS computing activities. Special attention will be paid to generic monitoring solutions for PROOF and xrootd, covering the monitoring components which collect, store and visualise monitoring data. One of the popular solutions for local data analysis is a PROOF-based computing facility with a simple storage system based on the xrootd protocol. Monitoring of user activities at the PROOF-based computing facility, as well as of data access and data movement with xrootd, is useful both at the local and at the global VO level. The proposed PROOF and xrootd monitoring systems can be deployed as part of the T3mon monitoring suite or separately as standalone components, and can easily be integrated into the global VO tools for monitoring data movement, data access or job processing.
        Speaker: Danila Oleynik (Joint Inst. for Nuclear Research (RU))
        Paper
        Poster
      • 1:30 PM
        ATLAS R&D Towards Next-Generation Distributed Computing 4h 45m
        The ATLAS Distributed Computing (ADC) project delivers production quality tools and services for ATLAS offline activities such as data placement and data processing on the Grid. The system has been capable of sustaining with large contingency the needed computing activities in the first years of LHC data taking, and has demonstrated flexibility in reacting promptly to new challenges. Development activities in this period have focused on consolidating existing services and increasing automation to be able to sustain existing loads. At the same time, an R&D program has evaluated new solutions and promising technologies capable of extending the operational scale, manageability and feature set of ATLAS distributed computing, several of which have selectively been brought to maturity as production-level tools and services. We will give an overview of R&D work in evaluating new tools and approaches and their integration into production services. A non exhaustive list of items includes cloud computing and virtualization, non-relational databases, utilizing multicore processors, the CERNVM File System, end to end network monitoring, event and file level caching, and federated distributed storage systems. The R&D initiative, while focused on ATLAS needs, has aimed for a broad scope involving many other parties including other LHC experiments, ATLAS Grid sites, the CERN IT department, and WLCG and OSG programs.
        Speaker: Collaboration Atlas (Atlas)
      • 1:30 PM
        Automating ATLAS Computing Operations using the Site Status Board 4h 45m
        The automation of operations is essential to reduce manpower costs and improve the reliability of the system. The Site Status Board (SSB) is a framework which allows Virtual Organizations to monitor their computing activities at distributed sites and to evaluate site performance. The ATLAS experiment intensively uses SSB for the distributed computing shifts, for estimating data processing and data transfer efficiencies at a particular site, and for implementing automatic exclusion of sites from computing activities, in case of potential problems. ATLAS SSB provides a real-time aggregated monitoring view and keeps the history of the monitoring metrics. Based on this history, usability of a site from the perspective of ATLAS is calculated. The presentation will describe how SSB is integrated in the ATLAS operations and computing infrastructure and will cover implementation details of the ATLAS SSB sensors and alarm system, based on the information in SSB. It will demonstrate the positive impact of the use of SSB on the overall performance of ATLAS computing activities and will overview future plans.
        Speaker: Mr Erekle Magradze (Georg-August-Universitaet Goettingen (DE))
        Slides
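        An illustrative sketch (not the ATLAS SSB code) of the kind of logic described above: a site "usability" number is derived from the history of a boolean metric and sites falling below a threshold are flagged for automatic exclusion.

        def usability(history):
            """Fraction of the sampled period during which the site metric was OK."""
            if not history:
                return 0.0
            return sum(1 for _, ok in history if ok) / len(history)

        def sites_to_exclude(all_histories, threshold=0.8):
            return [site for site, hist in all_histories.items() if usability(hist) < threshold]

        histories = {
            "SITE_A": [(t, True) for t in range(24)],          # healthy all day
            "SITE_B": [(t, t % 4 != 0) for t in range(24)],    # failing a quarter of the samples
        }
        print("exclude:", sites_to_exclude(histories))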
      • 1:30 PM
        Automating Linux Deployment with Cobbler 4h 45m
        Cobbler is a network-based Linux installation server which, via a choice of web or CLI tools, glues together PXE/DHCP/TFTP and automates many associated deployment tasks. It empowers a facility's system administrators to write scriptable and modular code which can pilot the OS installation routine to proceed unattended and automatically, even across heterogeneous hardware. These tools free system administrators from having to move between various commands and applications and tweak machine-specific configuration files when deploying the OS. Network deployments can be configured for new installations and re-installations via PXE, for media-based over-the-network installations, and for virtualized installations supporting Xen, QEMU, KVM, and some variants of VMware. Cobbler supports most large Linux distributions, including Red Hat Enterprise Linux, Scientific Linux, CentOS, SUSE Linux Enterprise, Fedora, Debian, and Ubuntu. At the RACF at Brookhaven National Laboratory, we had been deploying network PXE installs for many years and needed a centralized and scalable solution for Linux deployments. This paper will discuss the ways in which we now use Cobbler for nearly all Linux OS deployments, both physical and virtualized. We will discuss our existing Cobbler setup and the details of how we use Cobbler to deploy variants of the RHEL OS to our 250+ infrastructure servers. (A hedged sketch of scripting a Cobbler server follows this entry.)
        Speaker: Mr James Pryor (Brookhaven National Laboratory)
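        A hedged sketch of driving a Cobbler server programmatically over its XML-RPC interface, as a complement to the web and CLI tools mentioned above; the method names reflect our understanding of the Cobbler remote API and should be checked against the Cobbler documentation, and the host, credentials and profile name are placeholders.

        import xmlrpc.client

        server = xmlrpc.client.ServerProxy("http://cobbler.example.org/cobbler_api")
        token = server.login("cobbler", "secret")                 # placeholder credentials

        handle = server.new_system(token)                         # start a new system record
        server.modify_system(handle, "name", "worker042", token)
        server.modify_system(handle, "profile", "sl6-x86_64-worker", token)   # an existing profile
        server.modify_system(handle, "hostname", "worker042.example.org", token)
        server.save_system(handle, token)

        server.sync(token)                                        # regenerate PXE/DHCP/TFTP configuration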
      • 1:30 PM
        AutoPyFactory: A Scalable Flexible Pilot Factory Implementation 4h 45m
        The ATLAS experiment at the CERN LHC is one of the largest users of grid computing infrastructure, which is a central part of the experiment's computing operations. Considerable efforts have been made to use grid technology in the most efficient and effective way, including the use of a pilot job based workload management framework. In this model the experiment submits 'pilot' jobs to sites without payload. When these jobs begin to run they contact a central service to pick-up a real payload to execute. The first generation of pilot factories were usually specific to a single VO, and were bound to the particular architecture of that VO's distributed processing. A second generation provides factories which are more flexible, not tied to any particular VO, and provide new or improved features such as monitoring, logging, profiling, etc. In this paper we describe this key part of the ATLAS pilot architecture, a second generation pilot factory, AutoPyFactory. AutoPyFactory has a modular design and is highly configurable. It is able to send different types of pilots to sites and exploit different submission mechanisms and queue characteristics. It is tightly integrated with the PanDA job submission framework, coupling pilot flow to the amount of work the site has to run. It gathers information from many sources in order to correctly configure itself for a site, and its decision logic can easily be updated. Integrated into AutoPyFactory is a flexible system for delivering both generic and specific job wrappers which can perform many useful actions before starting to run end-user scientific applications, e.g. validation of the middleware, node profiling and diagnostics, and monitoring. AutoPyFactory now also has a robust monitoring system and we show how this has helped establish a reliable pilot factory service for ATLAS.
        Speaker: Dr Jose Caballero Bejar (Brookhaven National Laboratory (US))
        Poster
      • 1:30 PM
        BESIII and SuperB: Distributed job management with Ganga 4h 45m
        A job submission and management tool is one of the necessary components in any distributed computing system. Such a tool should provide a user-friendly interface for physics production group and ordinary analysis users to access heterogeneous computing resources, without requiring knowledge of the underlying grid middleware. Ganga, with its common framework and customizable plug-in structure, is such a tool. This paper will describe how experiment-specific job-management tools for BESIII and SuperB were developed as Ganga plugins, and discuss our experiences of using Ganga. The BESIII experiment studies electron-positron collisions in the tau-charm threshold region at BEPCII, located in Beijing. The SuperB experiment will take data at the new generation High Luminosity Flavor Factory, under construction in Rome. With its extremely high targeted luminosity (100 times more than previously achieved) it will provide a uniquely important source of data about the details of the New Physics uncovered at hadron colliders. To meet the challenge of rapidly increasing data volumes in the next few years, BESIII and SuperB are both now developing their own distributed computing environments. For both BESIII and SuperB, the experiment-specific Ganga plugins are described and their integration with the wider distributed system shown. For BESIII, this includes integration with the software system (BOSS) and the Dirac based distributed environment. Interfacing with the BESIII metadata and file catalog for dataset discovery is one of the key parts and is also described. The SuperB experience includes the development of a plugin capable of managing users’ analysis and Monte Carlo production jobs and integration of the Ganga job management features with two SuperB-specific information systems: the simulation production bookkeeping database and the data placement database. The experiences of these two different experiments in developing Ganga plugins to meet their own unique requirements are compared and contrasted, highlighting lessons learned.
        Speaker: Dr Xiaomei Zhang (IHEP, China)
        Poster
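        For readers unfamiliar with Ganga, the snippet below shows the generic job model that such plugins build on; it is meant to be typed inside a Ganga session, where the GPI objects are predefined. The experiment-specific application and backend classes described in the abstract would replace the generic Executable() and Local() used here.

        j = Job(name="demo")
        j.application = Executable(exe="/bin/echo", args=["hello from Ganga"])
        j.backend = Local()        # a BESIII or SuperB plugin would target a Dirac backend instead
        j.submit()

        # later, from the same session:
        print(j.status)            # e.g. 'submitted', 'running', 'completed'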
      • 1:30 PM
        Big data log mining: the key to efficiency 4h 45m
        In addition to the physics data generated each day from the CMS detector, the experiment also generates vast quantities of supplementary log data. From reprocessing logs to transfer logs this data could shed light on operational issues and assist with reducing inefficiencies and eliminating errors if properly stored, aggregated and analyzed. The term "big data" has recently taken the spotlight with organizations worldwide using tools such as CouchDB, Hadoop and Hive. In this paper we present a way of evaluating the capture and storage of log data from various experiment components to provide analytics and visualization in near real time.
        Speaker: Paul Rossman (Fermi National Accelerator Laboratory (FNAL))
      • 1:30 PM
        BOINC service for volunteer cloud computing 4h 45m
        For the past couple of years, a team at CERN and partners from the Citizen Cyberscience Centre (CCC) have been working on a project that enables general physics simulation programs to run in a virtual machine on volunteer PCs around the world. The project uses the Berkeley Open Infrastructure for Network Computing (BOINC) framework. Based on CernVM and the Co-Pilot job management framework, this project was made available for public beta testing in August 2011 with Monte Carlo simulations of LHC physics, under the name "LHC@home 2.0" and the BOINC project "Test4Theory". At the same time, CERN's efforts on volunteer computing for LHC machine studies have been intensified; this project has previously been known as LHC@home and has been running the "Sixtrack" beam dynamics application for the LHC accelerator, using a classic BOINC framework without virtual machines. CERN IT has set up a BOINC server cluster and has provided and supported the BOINC infrastructure for both projects. CERN intends to evolve the setup into a generic BOINC application service that will allow scientists and engineers at CERN to profit from volunteer computing. The authors describe the experience with the two different approaches to volunteer computing as well as the status and outlook of a general BOINC service. Please see also the presentation of CernVM Co-Pilot by Artem Harutyunyan: https://indico.cern.ch/contributionDisplay.py?contribId=94&confId=149557
        Speaker: Alvaro Gonzalez Alvarez (CERN)
        Paper
        Poster
      • 1:30 PM
        Bolting the Door 4h 45m
        This presentation will cover the work conducted within the ScotGrid Glasgow Tier-2 site. It will focus on the multi-tiered network security architecture developed on the site to augment Grid site server security and will discuss the variety of techniques used including the utilisation of Intrusion Detection systems, logging and optimising network connectivity within the infrastructure. Also the analysis of the limitations of this approach and the potential for future research in this area will be investigated and discussed
        Speaker: Dr David Crooks (University of Glasgow/GridPP)
        Poster
      • 1:30 PM
        Building a local analysis center on OpenStack 4h 45m
        The experimental high energy physics group at the University of Melbourne is a member of the ATLAS, Belle and Belle II collaborations. We maintain a local data centre which enables users to test pre-production code and to do final-stage data analysis. Recently the Australian National eResearch Collaboration Tools and Resources (NeCTAR) organisation implemented a Research Cloud based on OpenStack middleware. This presentation details the development of an OpenStack-based local data analysis centre comprising hundreds of virtual machines with commensurate data storage. (A hedged sketch of provisioning such VMs follows this entry.)
        Speaker: Martin Sevior (University of Melbourne (AU))
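        One possible way to provision such analysis VMs with the python-novaclient library is sketched below; the constructor arguments and the image and flavor names are assumptions that vary between OpenStack releases and sites.

        from novaclient import client

        nova = client.Client("2", "analysis-user", "secret", "hep-project",
                             "https://keystone.example.org:5000/v2.0")

        image = nova.images.find(name="SL6-analysis")      # a pre-built analysis image
        flavor = nova.flavors.find(name="m1.large")

        for i in range(10):                                # a small slice of the farm
            nova.servers.create(name="analysis-node-%02d" % i, image=image, flavor=flavor)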
      • 1:30 PM
        Building a Prototype of LHC Analysis Oriented Computing Centers 4h 45m
        A consortium of four LHC computing centres (Bari, Milano, Pisa and Trieste) was formed in 2010 to prototype analysis-oriented facilities for CMS data analysis, using a grant from the Italian Ministry of Research. The consortium aims at the realization of an ad-hoc infrastructure to ease analysis activities on the huge data set collected by the CMS experiment at the LHC collider. While "Tier-2" computing centres, specialized in organized processing tasks like Monte Carlo simulation, are nowadays a well-established concept with years of running experience, sites specialized in chaotic end-user analysis activities do not yet have a de-facto standard implementation. In our effort, we focus on all the aspects which can make analysis tasks easier for a physics user who is not a computing expert. On the storage side, we are experimenting with storage techniques allowing remote data access and with storage optimization for typical analysis access patterns. On the networking side, we are studying the differences between flat and tiered LAN architectures, also using virtual partitioning of the same physical network for the different use patterns. Finally, on the user side, we are developing tools and instruments to allow exhaustive monitoring of their processes at the site, and an efficient support system in case of problems. We will report on the results of the tests executed on the different subsystems and give a description of the layout of the infrastructure in place at the sites participating in the consortium.
        Speaker: Giacinto Donvito (Universita e INFN (IT))
        Poster
      • 1:30 PM
        Campus Grids Bring Additional Computational Resources to HEP Researchers 4h 45m
        It is common at research institutions to maintain multiple clusters that represent different owners or generations of hardware, or that fulfill different needs and policies. Many of these clusters are consistently underutilized while researchers on campus could greatly benefit from these unused capabilities. By leveraging principles from the Open Science Grid it is now possible to utilize these resources by forming a lightweight campus grid. The campus grid framework enables jobs that are submitted to one cluster to overflow, when necessary, to other clusters within the campus using whatever authentication mechanisms are available on campus. This framework is currently being used on several campuses to run HEP and other science jobs. Further, the framework has in some cases been expanded beyond the campus boundary by bridging campus grids into a regional grid, and can even be used to integrate resources from a national cyberinfrastructure such as the Open Science Grid. This poster will highlight 18 months of operational experience creating campus grids in the US, and the different campus configurations that have successfully utilized the campus grid infrastructure.
        Speaker: Derek John Weitzel (University of Nebraska (US))
      • 1:30 PM
        Centralized configuration system for a large scale farm of network booted computers 4h 45m
        In the ATLAS Online computing farm, the majority of the systems are network booted - they run an operating system image provided via network by a Local File Server. This method guarantees the uniformity of the farm and allows very fast recovery in case of issues with the local scratch disks. The farm is not homogeneous, and in order to manage the diversity of roles, functionality and hardware of the different nodes, we developed a dedicated central configuration system, ConfDB v2. We describe the design, functionality and performance of this system and its web-based interface, including its integration with CERN and ATLAS databases and with the monitoring infrastructure.
        Speaker: Georgiana Lavinia Darlea (Polytechnic University of Bucharest (RO))
        Poster
      • 1:30 PM
        Certified Grid Job Submission in the ALICE Grid Services 4h 45m
        Grid computing infrastructures need to provide traceability and accounting of their users’ activity and protection against misuse and privilege escalation, where the delegation of privileges in the course of a job submission is a key concern. This work describes an improved handling of multi-user Grid jobs in the ALICE Grid Services. A security analysis of the ALICE Grid job model is presented with derived security objectives, followed by a discussion of existing approaches of unrestricted delegation based on X.509 proxy certificates and the Grid middleware gLExec. Unrestricted delegation has severe security consequences and limitations, most importantly allowing for identity theft and forgery of jobs and data. These limitations are discussed and formulated, both in general and with respect to an adoption in line with multi-user Grid jobs. A new general model of mediated definite delegation is developed and formulated, allowing a broker to assign context-sensitive user privileges to agents while providing strong accountability and long-term traceability. A prototype implementation allowing for certified Grid jobs is presented including a potential interaction with gLExec. The achieved improvements regarding system security, malicious job exploitation, identity protection, and accountability are emphasized, followed by a discussion of non-repudiation in the face of malicious Grid jobs.
        Speaker: Mr Steffen Schreiner (CERN, CASED/TU Darmstadt)
        Poster
      • 1:30 PM
        Cloud based multi-platform data analysis application 4h 45m
        With the start-up of the LHC in 2009, more and more data analysis facilities have been built or enlarged at universities and laboratories. In the meantime, new technologies, like Cloud computing and Web3D, and new types of hardware, like smartphones and tablets, have become available and popular in the market. Is there a way to integrate them into the existing data analysis models and allow physicists to do their daily work more conveniently and efficiently? In this paper we will discuss the development of a platform-independent thin client application for data analysis on Cloud based infrastructures. The goal of this new development is to allow physicists to run their data analysis on different hardware, such as laptops, smartphones and tablets, and to access their data everywhere. The application can run within web browsers and on smartphones without compatibility problems. Based on one of the most popular graphic engines, people can view 2D histograms, animated 3D event displays and even do event analysis. The heavy processing jobs will be sent to the Cloud via a master server, in such a way that people can run multiple complex jobs simultaneously. After having introduced the new system structure and the way the new application will fit in the overall picture, we will describe the current progress of the development and the test facility, and discuss further technical difficulties that we expect to be confronted with, such as security (user authentication and authorization), data discovery and load balancing.
        Speaker: Neng Xu (University of Wisconsin (US))
      • 1:30 PM
        CMS Analysis Deconstructed 4h 45m
        The CMS Analysis Tools model has now been used robustly in a plethora of physics papers. This model is examined to investigate successes and failures as seen by the analysts of recent papers.
        Speaker: Prof. Sudhir Malik (University of Nebraska-Lincoln)
        Poster
      • 1:30 PM
        CMS resource utilization and limitations on the grid after the first two years of LHC collisions 4h 45m
        After years of development, the CMS distributed computing system is now in full operation. The LHC continues to set records for instantaneous luminosity, and CMS records data at 300 Hz. Because of the intensity of the beams, there are multiple proton-proton interactions per beam crossing, leading to larger and larger event sizes and processing times. The CMS computing system has responded admirably to these challenges, but some reoptimization of the computing model has been required to maximize the physics output of the collaboration in the face of increasingly constrained computing resources. We present the current status of the system, describe the recent performance, and discuss the challenges ahead and how we intend to meet them.
        Speaker: Kenneth Bloom (University of Nebraska (US))
        Slides
      • 1:30 PM
        Collaborative development. Case study of the development of flexible monitoring applications 4h 45m
        Collaborative development proved to be key to the success of the Dashboard Site Status Board (SSB), which is heavily used by ATLAS and CMS for the computing shifts and site commissioning activities. The SSB is an application that enables Virtual Organisation (VO) administrators to monitor the status of distributed sites. The selection, significance and combination of monitoring metrics falls clearly in the domain of the administrators, depending not only on the VO but also on the role of the administrator. Therefore, the key requirement for SSB is that it be highly customisable, providing an intuitive yet powerful interface to define and visualise various monitoring metrics. We present SSB as an example of a development process typified by very close collaboration between developers and the user community. The collaboration extends beyond the customisation of metrics and views to the development of new functionality and visualisations. SSB developers and VO administrators cooperate closely to ensure that requirements are met and, wherever possible, new functionality is pushed upstream to benefit all users and VOs. The contribution covers the evolution of SSB over recent years to satisfy diverse use cases through this collaborative development process.
        Speaker: Pablo Saiz (CERN)
        Slides
      • 1:30 PM
        Combining virtualization tools for a dynamic, distribution agnostic grid environment for ALICE grid jobs in Scandinavia 4h 45m
        The Nordic Tier-1 for the LHC is distributed over several, sometimes smaller, computing centers. In order to minimize administration effort, we are interested in running different grid jobs over one common grid middleware. ARC is selected as the internal middleware in the Nordic Tier-1. At the moment ARC has no mechanism for automatic software packaging and deployment. The AliEn grid middleware, used by ALICE, provides this functionality. We are investigating the possibilities to use modern virtualization technologies to make these capabilities available for ALICE grid jobs on ARC. The CernVM project is developing a virtual machine that can provide a common analysis environment for all LHC experiments. One of our interests is to investigate the use of CernVM as a base setup for a dynamic grid environment capable of running grid jobs. For this, performance comparisons between different virtualization technologies have been conducted. CernVM needs an existing virtualization infrastructure, which does not always exist or is not always wanted at computing sites. To increase the possible application of dynamic grid environments to those sites, we describe several possibilities that are less invasive and have less specific Linux distribution requirements, at the cost of lower performance. Different tools like user-mode Linux (UML), micro Linux distributions, a new software packaging project by Stanford University (CDE) and CernVM are under investigation for their invasiveness, distribution requirements and performance. Comparisons of the different methods with solutions that are closer to the hardware will be presented.
        Speaker: Boris Wagner (University of Bergen (NO))
      • 1:30 PM
        Comparative Investigation of Shared Filesystems for the LHCb Online Cluster 4h 45m
        This paper describes the investigative study undertaken to evaluate shared filesystem performance and suitability in the LHCb Online environment. Particular focus is given to the measurements and field tests designed and performed on an in-house AFS setup, and related comparisons with NFSv3 and pNFS are presented. The motivation for the investigation and the test setup arises from the need to serve common user-space like home directories, experiment software and control areas, and clustered log areas. Since the operational requirements on such user-space are stringent in terms of read-write operations (in frequency and access speed) and unobtrusive data relocation, test results are presented with emphasis on file-level performance, stability and “high-availability” of the shared filesystems. Use-cases specific to the experiment operation in LHCb, including the specific handling of shared filesystems served to a cluster of 1500 diskless nodes, are described. Issues of authentication token expiry are explicitly addressed, keeping in mind long-running analysis jobs on the Online cluster. In addition, quantitative test results are also presented with alternatives including pNFS, which is now being seen as an increasingly viable option for shared filesystems in many medium to large networks. Comparative measurements of filesystem performance benchmarks are presented, which can serve as a reference for decisions on a potential migration of the current storage solution deployed in the LHCb online cluster.
        Speakers: Niko Neufeld (CERN), Vijay Kartik Subbiah (CERN)
      • 1:30 PM
        Comparison of the CPU efficiency of High Energy and Astrophysics applications on different multi-core processor types. 4h 45m
        GridKa, operated by the Steinbuch Centre for Computing at KIT, is the German regional centre for high energy and astroparticle physics computing, currently supporting 10 experiments and serving as a Tier-1 centre for the four LHC experiments. Since the beginning of the project in 2002, the total compute power has been upgraded at least once per year to follow the increasing demands of the experiments. The hardware is typically operated for about four years until it is replaced by more modern machines. The GridKa compute farm thus consists of a mixture of several generations of compute nodes differing in several parameters, such as CPU type, main memory and network connection bandwidth. We compare the CPU efficiency (CPU time to wall time ratio) of high energy physics and astrophysics compute jobs on these different types of compute nodes and estimate the impact of the ongoing trend towards many-core CPUs. (A short illustrative calculation of the CPU-efficiency metric follows this entry.)
        Speaker: Andreas Heiss (KIT - Karlsruhe Institute of Technology (DE))
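        The CPU efficiency used here is simply the ratio of consumed CPU time to elapsed wall-clock time, aggregated per node type. A small illustrative calculation; the job records are invented purely for the example:
        ```python
        # CPU efficiency = CPU time / wall time, aggregated per worker-node type.
        # The job records below are invented purely to illustrate the calculation.
        jobs = [
            {"node_type": "nodes-2009", "cpu_s": 41000.0, "wall_s": 46000.0},
            {"node_type": "nodes-2009", "cpu_s": 39500.0, "wall_s": 45000.0},
            {"node_type": "nodes-2011", "cpu_s": 52000.0, "wall_s": 61000.0},
        ]

        totals = {}
        for job in jobs:
            cpu, wall = totals.get(job["node_type"], (0.0, 0.0))
            totals[job["node_type"]] = (cpu + job["cpu_s"], wall + job["wall_s"])

        for node_type, (cpu, wall) in sorted(totals.items()):
            print("%s: CPU efficiency %.1f%%" % (node_type, 100.0 * cpu / wall))
        ```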
      • 1:30 PM
        Computing at Tier-3 sites in CMS 4h 45m
        There are approximately 60 Tier-3 computing sites located on campuses of collaborating institutions in CMS. We describe the function and architecture of these sites, and illustrate the range of hardware and software options. A primary purpose is to provide a platform for local users to analyze LHC data, but they are also used opportunistically for data production. While Tier-3 sites vary widely in size (number of nodes, users, support personnel), there are some common features. A site typically has a few nodes reserved for interactive use and to provide services such as an interface to the GRID. The remainder of the nodes are usually available for running CPU intensive batch jobs; a future plan will allow jobs to flock to other clusters on campus. In addition, data storage systems may be provided and we discuss several models in use, including the new paradigm of a diskless site with wide area access to data via a global XROOTD redirector. Compared to Tier-1 and Tier-2 sites, the Tier-3 sites are highly flexible and are designed for easy operation. Their ultimate configuration balances cost, performance, and reliability.
        Speaker: Robert Snihur (University of Nebraska (US))
      • 1:30 PM
        Computing On Demand: Dynamic Analysis Model 4h 45m
        Constant changes in computational infrastructure, like the current interest in Clouds, impose conditions on the design of applications. We must make sure that our analysis infrastructure, including source code and supporting tools, is ready for the on-demand computing (ODC) era. This presentation is about a new analysis concept, which is driven by users' needs, completely disentangled from the computational resources, and scalable. What does it take for an analysis code to be run on any resource management system? How can one achieve the goals of on-demand analysis using PROOF on Demand (PoD)? These questions are addressed, along with topics such as the preferred location of data files and the tools and software development techniques for on-demand data analysis. Analysis implementation requirements and a comparison of traditional and “on demand” facilities will also be discussed during this talk.
        Speaker: Anar Manafov (GSI - Helmholtzzentrum fur Schwerionenforschung GmbH (DE))
      • 1:30 PM
        Configuration management and monitoring of the middleware at GridKa 4h 45m
        GridKa is a computing centre located in Karlsruhe. It serves as a Tier-1 centre for the four LHC experiments and also provides its computing and storage resources to other non-LHC HEP and astroparticle physics experiments as well as to several communities of the German Grid Initiative D-Grid. The middleware layer at GridKa comprises three main flavours: Globus, gLite and UNICORE. This layer provides access to the several clusters, according to the requirements of the corresponding communities. The heterogeneous structure of middleware resources and services requires effective administration for stable and sustainable operation of the whole computing centre. In the presentation an overview of the middleware system at GridKa is given, with focus on configuration management and monitoring. These are crucial components of the administration task for a system with a high-availability setup. The various configuration tools used at GridKa, their benefits and limitations, as well as the automation procedures developed for configuration management, will be discussed. An overview of the monitoring system, which evaluates the information delivered by central and local grid information services and provides status and detailed diagnostics for the middleware services, is also presented.
        Speakers: Dimitri Nilsen (Karlsruhe Institute of Technology (KIT)), Dr Pavel Weber (Karlsruhe Institute of Technology (KIT))
      • 1:30 PM
        Consistency between Grid Storage Elements and File Catalogs for the LHCb experiment's data 4h 45m
        In the distributed computing model of the WLCG, Grid Storage Elements (SEs) are by construction completely decoupled from the File Catalogs (FCs) where the experiment's files are registered. Experience with managing large volumes of data in such an environment shows that inconsistencies often arise, causing either a waste of disk space, when data have been deleted from the FC but are still physically on the SE, or serious operational problems in the opposite case, when data registered in the FC are not found on the SE. Therefore, the LHCbDirac data management system has been equipped with a new dedicated system to ensure the consistency of the data stored on the SEs with the information reported in the FCs, implementing systematic checks. The objective of the checks is to spot any inconsistency above a certain threshold that cannot be explained by the expected latency between data upload and registration, and in such cases to try to identify the problematic data. The system relies on information provided by the sites, which are asked to make available to the experiment a full dump of their SEs on a weekly or monthly basis. In this talk we shall present the definition of a common format and procedure to produce the storage dumps, which has been coordinated with the other LHC experiments in order to provide a solution as generic as possible, suitable for all LHC experiments, and to reduce the effort for the sites that are asked to provide such data. We will also present the LHCb-specific implementation for checking the consistency between SEs and FCs and discuss the results. (A minimal sketch of such a consistency check follows this entry.)
        Speaker: Elisa Lanciotti (CERN)
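        The core of such a check is a set comparison between the site's storage dump and the catalogue contents, ignoring entries younger than a grace period. A minimal sketch; the input format and the grace period are assumptions for illustration, not LHCbDirac specifics:
        ```python
        # Minimal consistency check between a Storage Element dump and a File
        # Catalog listing. Input format (one entry per line: path and timestamp)
        # and the grace period are illustrative assumptions.
        import time

        GRACE_PERIOD = 24 * 3600  # ignore entries younger than one day (upload/registration latency)

        def load(path):
            entries = {}
            with open(path) as f:
                for line in f:
                    name, timestamp = line.split()
                    entries[name] = float(timestamp)
            return entries

        se_files = load("se_dump.txt")   # what is physically on the SE
        fc_files = load("fc_dump.txt")   # what the catalogue believes is there
        now = time.time()

        # Dark data: on storage but not registered -> wasted disk space.
        dark = [f for f in se_files if f not in fc_files and now - se_files[f] > GRACE_PERIOD]
        # Lost files: registered but not on storage -> operational problem.
        lost = [f for f in fc_files if f not in se_files and now - fc_files[f] > GRACE_PERIOD]

        print("dark data: %d files, lost files: %d" % (len(dark), len(lost)))
        ```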
      • 1:30 PM
        Controlled overflowing of data-intensive jobs from oversubscribed sites 4h 45m
        The CMS analysis computing model has always relied on jobs running near the data, with data allocation between CMS compute centers organized at management level, based on the expected needs of the CMS community. While this model provided high CPU utilization during job run times, there were times when a large fraction of CPUs at certain sites were sitting idle due to lack of demand, all while Terabytes of data were never accessed. To improve the utilization of both CPU and disks, CMS is moving toward controlled overflowing of jobs from sites that have data but are oversubscribed to others with spare CPU and network capacity, with those jobs accessing the data through real-time xrootd streaming over the WAN. The major limiting factor for remote data access is the ability of the source storage system to serve such data, so the number of jobs accessing it must be carefully controlled. The CMS approach is to implement the overflowing by means of glideinWMS, a Condor based pilot system, providing the WMS with the known storage limits and letting it schedule jobs within those limits. This talk presents the detailed architecture of the overflow-enabled glideinWMS system, together with operational experience of the past 6 months. (A toy sketch of the per-storage overflow cap follows this entry.)
        Speaker: Mr Igor Sfiligoi (University of California San Diego)
        Poster
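        The scheduling constraint described above amounts to capping the number of concurrently running remote-access jobs per source storage system. A toy sketch of that logic; the limits and job list are invented, and the real system expresses such limits in the glideinWMS/Condor configuration rather than in user code:
        ```python
        # Toy illustration of capping remote-read ("overflow") jobs per source
        # storage system. Limits and the job list are invented for illustration.
        storage_limits = {"T2_US_SiteA": 200, "T2_IT_SiteB": 100}   # max concurrent remote readers
        running = {"T2_US_SiteA": 180, "T2_IT_SiteB": 100}          # currently running overflow jobs

        queued_jobs = [
            {"id": 1, "data_at": "T2_US_SiteA"},
            {"id": 2, "data_at": "T2_IT_SiteB"},
            {"id": 3, "data_at": "T2_US_SiteA"},
        ]

        for job in queued_jobs:
            src = job["data_at"]
            if running.get(src, 0) < storage_limits.get(src, 0):
                running[src] = running.get(src, 0) + 1
                print("job %d overflows, reading remotely from %s" % (job["id"], src))
            else:
                print("job %d stays queued at %s (storage at its limit)" % (job["id"], src))
        ```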
      • 1:30 PM
        CRAB3: Establishing a new generation of services for distributed analysis at CMS 4h 45m
        In CMS Computing the highest priorities for analysis tools are the improvement of the end users' ability to produce and publish reliable samples and analysis results as well as a transition to a sustainable development and operations model. To achieve these goals CMS decided to incorporate analysis processing into the same framework as the data and simulation processing. This strategy foresees that all workload tools (Tier0, Tier1, production, analysis) share a common core which allows long term maintainability as well as the standardization of the operator interfaces. The re-engineered analysis workload manager, called CRAB3, makes use of newer technologies, such as RESTful web services and NoSQL databases, aiming to increase the scalability and reliability of the system. As opposed to CRAB2, in CRAB3 all work is centrally injected and managed in a global queue. A pool of agents, which can be geographically distributed, consumes work from the central services, servicing the user tasks. The new architecture of CRAB substantially changes the deployment model and operations activities. In this paper we present the implementation of CRAB3, emphasizing how the new architecture improves the workflow automation and simplifies maintainability. We will highlight, in particular, the impact of the new design on daily operations.
        Speaker: Daniele Spiga (CERN)
      • 1:30 PM
        CREAM Computing Element: a status update 4h 45m
        The European Middleware Initiative (EMI) project aims to deliver a consolidated set of middleware products based on the four major middleware providers in Europe - ARC, dCache, gLite and UNICORE. The CREAM (Computing Resource Execution And Management) Service, a service for job management operations at the Computing Element (CE) level, is one of the software products that are part of the EMI middleware distribution. In this paper we discuss some new functionality in the CREAM CE introduced with the first EMI major release (EMI-1, codename Kebnekaise). The integration with the Argus authorization service is one of these developments: the use of a unique authorization system, besides simplifying the overall management, also avoids inconsistent authorization decisions. An improved support for complex deployment scenarios (e.g. for sites having multiple CE head nodes and/or heterogeneous resources) is another new achievement. The improved support for resource allocation in multi-core environments and the initial support of version 2.0 of the Glue specification for resource publication are other new features introduced with the first EMI release.
        Speaker: Mr Massimo Sgaravatto (Universita e INFN (IT))
      • 1:30 PM
        Creating Dynamic Virtual Networks for network isolation to support Cloud computing and virtualization in large computing centers 4h 45m
        The extensive use of virtualization technologies in cloud environments has created the need for a new network access layer residing on hosts and connecting the various Virtual Machines (VMs). In fact, massive deployment of virtualized environments imposes requirements on networking for which traditional models are not well suited. For example, hundreds of users issuing cloud requests for which full access (i.e., including root privileges) to VMs is requested typically requires the definition of network separation at layer 2 through the use of virtual LANs (VLANs). However, in large computing centers, due for example to the number of installed network switches, to their characteristics, or to their heterogeneity, the dynamic (or even static) definition of many VLANs is often impractical or simply not possible. In this paper, we present a solution to the problem of creating dynamic virtual networks based on the use of Generic Routing Encapsulation (GRE). GRE is used to encapsulate VM traffic so that the configuration of the physical network switches doesn't have to change. In particular, we describe how this solution can be used to tackle problems such as dynamic network isolation and mobility of VMs across hosts or even sites. We will then show how this solution has been integrated in the WNoDeS framework (http://web.infn.it/wnodes) and tested in the WNoDeS installation at the INFN Tier-1, presenting performance metrics and an analysis of the scalability of the system. (A minimal GRE tunnelling sketch follows this entry.)
        Speaker: Marco Caberletti (Istituto Nazionale Fisica Nucleare (IT))
        Poster
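        On Linux, layer-2 VM traffic can be carried between hypervisors over GRE by creating a gretap interface and attaching it to the bridge that the VMs are plugged into. A hedged sketch using standard iproute2/brctl commands with placeholder names and addresses; the WNoDeS implementation automates this and may differ in detail:
        ```python
        # Sketch: build a point-to-point layer-2 GRE (gretap) link between two
        # hypervisors and attach it to the bridge carrying the VMs' traffic.
        # Addresses and names are placeholders, not the INFN Tier-1 setup.
        import subprocess

        def run(cmd):
            print("+ " + cmd)
            subprocess.check_call(cmd.split())

        LOCAL_IP = "192.168.1.10"    # this hypervisor
        REMOTE_IP = "192.168.1.20"   # peer hypervisor
        BRIDGE = "br-vmnet"          # bridge the local VMs are plugged into

        run("ip link add gretap0 type gretap local %s remote %s" % (LOCAL_IP, REMOTE_IP))
        run("ip link set gretap0 up")
        run("brctl addif %s gretap0" % BRIDGE)   # VM frames now tunnel to the peer
        ```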
      • 1:30 PM
        Data analysis system for Super Charm-Tau Factory at BINP 4h 45m
        The Super Charm-Tau Factory (CTF) is a future electron-positron collider with a center-of-mass energy range from 2 to 5 GeV and an unprecedented, for this energy range, peak luminosity of about 10^35 cm-2 s-1. The CTF project is being developed at the Budker Institute of Nuclear Physics (Novosibirsk, Russia). The main goal of experiments at the Super Charm-Tau Factory is the study of processes with charm quarks or tau leptons in the final state, using data samples that are 3-4 orders of magnitude larger than those collected so far in any other experiment. The peak input data flow of up to 10 GBytes/s and the very large collected data volume, estimated to be 200 PBytes, require the design of a large-scale data storage and data analysis system. We review the requirements for the computer infrastructure of the Super Charm-Tau Factory and discuss the main design solutions. (A back-of-envelope rate/volume calculation follows this entry.)
        Speaker: Dr Ivan Logashenko (Budker Institute Of Nuclear Physics)
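        A back-of-envelope check relating the quoted peak input rate to the expected total data volume; the 50% average duty cycle is an assumption made only for this illustration, not a figure from the abstract:
        ```python
        # Back-of-envelope: how long does it take to accumulate 200 PB at the
        # quoted peak input rate of 10 GB/s? The 50% duty cycle is an assumption.
        peak_rate_gb_s = 10.0
        total_pb = 200.0
        duty_cycle = 0.5

        effective_rate = peak_rate_gb_s * duty_cycle   # GB/s averaged over running
        seconds = total_pb * 1e6 / effective_rate      # 1 PB = 1e6 GB
        print("~%.0f days of data taking" % (seconds / 86400.0))
        ```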
      • 1:30 PM
        Data storage accounting and verification in LHC experiments 4h 45m
        All major experiments at the Large Hadron Collider (LHC) need to measure the real storage usage at the Grid sites. This information is equally important for resource management, planning, and operations. To verify the consistency of the central catalogs, experiments are asking sites to provide a full list of the files they have on storage, including size, checksum, and other file attributes. Such storage dumps, provided at regular intervals, give a realistic view of the storage resource usage by the experiments. Regular monitoring of the space usage and data verification serve as additional internal checks of the system integrity and performance. Both the importance and the complexity of these tasks increase with the constant growth of the total data volumes during the active data taking period at the LHC. Developed common solutions help to reduce the maintenance costs both at the large Tier-1 facilities supporting multiple virtual organizations, and at the small sites that often lack manpower. We discuss requirements and solutions to the common tasks of data storage accounting and verification, and present experiment-specific strategies and implementations used within the LHC experiments according to their computing models.
        Speaker: Natalia Ratnikova (KIT - Karlsruhe Institute of Technology (DE))
        Poster
      • 1:30 PM
        Data transfer test with 100 Gb network 4h 45m
        As part of the Advanced Networking Initiative (ANI) of ESnet, we exercise a prototype 100Gb network infrastructure for data transfer and processing for OSG HEP applications. We present results of these tests.
        Speaker: Mr haifeng pi (CMS)
      • 1:30 PM
        Deployment and Operational Experiences with CernVM-FS at the GridKa Tier-1 Center 4h 45m
        In 2012 the GridKa Tier-1 computing center hosts 130 kHEPSPEC06 of computing resources, 11 PB of disk and 17.7 PB of tape space. These resources are shared between the four LHC VOs and a number of national and international VOs from high energy physics and other sciences. CernVM-FS has been deployed at GridKa to supplement the existing NFS-based system used to access VO software on the worker nodes. It provides a solution tailored to the requirements of the LHC VOs. We will focus on the first operational experiences and the monitoring of CernVM-FS on the worker nodes and the squid caches.
        Speaker: Mr Andreas Petzold (KIT)
      • 1:30 PM
        Design and implementation of a reliable and cost-effective cloud computing infrastructure: the INFN Naples experience 4h 45m
        Over the last few years we have seen an increasing number of services and applications needed to manage and maintain cloud computing facilities. This is particularly true for computing in high energy physics, which often requires complex configurations and distributed infrastructures. In this scenario a cost-effective rationalization and consolidation strategy is the key to success in terms of scalability and reliability. In this work, we describe an IaaS (Infrastructure as a Service) cloud computing system, with high availability and redundancy features, which is currently in production at INFN-Naples and the ATLAS Tier-2 data centre. The main goal we intended to achieve was a simplified method to manage our computing resources and deliver reliable user services, reusing existing hardware without incurring heavy costs. A combined usage of virtualization and clustering technologies allowed us to consolidate our services on a small number of physical machines, reducing electric power costs. As a result of our efforts we developed a complete solution for data and computing centers that can be easily replicated using commodity hardware. Our architecture mainly consists of 2 subsystems: a clustered storage solution, built on top of disk servers running the Gluster file system, and a virtual machine execution environment. The hypervisor hosts use Scientific Linux and KVM as virtualization technology and run both Windows and Linux guests. Virtual machines have their root file systems on qcow2 disk-image files, stored on a Gluster network file system. Gluster is able to perform parallel writes on multiple disk servers (two, in our system), providing in this way live replication of data. A failure of a disk server doesn't cause glitches or stop any of the running virtual guests, as each hypervisor host still has full access to the disk-image files. When the failing disk server returns to normal activity, Gluster's integrated self-healing mechanism performs a background, transparent reconstruction of missing replicas. High availability is also achieved via a network configuration using redundant switches and multiple paths between hypervisor hosts and disk servers. Linux channel bonding provides adaptive load balancing of network traffic over multiple links, and dedicated VLANs guarantee isolation of the storage subsystem from the general-purpose network. We also developed a set of management scripts to easily perform basic system administration tasks such as automatic deployment of new virtual machines, adaptive scheduling of virtual machines on hypervisor hosts, live migration and automated restart in case of hypervisor failures. (A sketch of the replicated storage setup follows this entry.)
        Speaker: Dr Vincenzo Capone (Universita e INFN (IT))
        Poster
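        The storage layer described above can be reproduced with a two-way replicated Gluster volume. A sketch of the standard gluster CLI invocations with placeholder host and brick names, not the actual INFN-Naples configuration:
        ```python
        # Sketch: create a two-way replicated Gluster volume for VM disk images,
        # mirroring writes across two disk servers. Host names and brick paths
        # are placeholders.
        import subprocess

        def run(cmd):
            print("+ " + cmd)
            subprocess.check_call(cmd.split())

        # Each write is replicated on both disk servers, so a single server
        # failure does not interrupt the running virtual machines.
        run("gluster volume create vmstore replica 2 "
            "diskserver1:/export/brick1 diskserver2:/export/brick1")
        run("gluster volume start vmstore")

        # Hypervisors then mount the volume and store qcow2 images on it, e.g.:
        #   mount -t glusterfs diskserver1:/vmstore /var/lib/libvirt/images
        ```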
      • 1:30 PM
        Development of noSQL data storage for the ATLAS PanDA Monitoring System 4h 45m
        For several years the PanDA Workload Management System has been the basis for distributed production and analysis for the ATLAS experiment at the LHC. Since the start of data taking, PanDA usage has ramped up steadily, typically exceeding 500k completed jobs/day by June 2011. The associated monitoring data volume has been rising as well, to levels that present a new set of challenges in the areas of database scalability and monitoring system performance and efficiency. These challenges have been met with an R&D and development effort aimed at implementing a scalable and efficient monitoring data storage based on a noSQL solution (Cassandra). We present the data design and indexing strategies for efficient queries, as well as our experience of operating a Cassandra cluster and interfacing it with a Web service. (A minimal client-side sketch follows this entry.)
        Speaker: Maxim Potekhin (Brookhaven National Laboratory (US))
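        A minimal sketch of writing and reading monitoring records with a Python Cassandra client of that era (pycassa); the keyspace, column family and column names are hypothetical and do not reflect the actual PanDA monitoring schema:
        ```python
        # Minimal sketch of storing and retrieving job monitoring records in
        # Cassandra using the pycassa client. Keyspace, column family and
        # column names are hypothetical.
        import pycassa

        pool = pycassa.ConnectionPool("panda_mon", server_list=["cassandra01:9160"])
        jobs = pycassa.ColumnFamily(pool, "jobs")

        # Write one row per job; the row key could encode the job id.
        jobs.insert("job:1234567", {
            "site": "BNL_ATLAS",
            "status": "finished",
            "cpu_seconds": "5231",
        })

        # Read it back, restricting to a subset of columns.
        row = jobs.get("job:1234567", columns=["site", "status"])
        print(row)
        ```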
      • 1:30 PM
        DIRAC evaluation for the SuperB experiment 4h 45m
        The SuperB asymmetric-energy e+e- collider and detector to be built at the newly founded Nicola Cabibbo Lab will provide a uniquely sensitive probe of New Physics in the flavor sector of the Standard Model. Studying minute effects in the heavy quark and heavy lepton sectors requires a data sample of 75 ab-1 and a luminosity target of 10^36 cm-2 s-1. In this work we will present our evaluation of the DIRAC Distributed Infrastructure for use in the SuperB experiment, based on two use cases: End User Analysis and Monte Carlo Production. We will present: 1) The test bed layout with DIRAC site and service configurations and the efforts to enable and manage OSG-EGI interoperability. 2) Our specific use cases ported to the DIRAC test bed with the computational and data management requirements and the DIRAC subsystem configuration. 3) The test results obtained from running both SuperB Monte Carlo and end user analysis with details about the performance achieved, the efficiency and the failures that occurred during the tests. 4) An evaluation and comparison of the two catalogue systems provided by the DIRAC framework, LFC (LHC File Catalogue) and the DIRAC File Catalog, in terms of features, performance and reliability. 5) An evaluation of capabilities and performance tests of the DIRAC Cloud capabilities as potentially applicable to SuperB computing. 6) A comparison of DIRAC with other submission systems available in the HEP community with pros and cons of each system.
        Speaker: Dr Giacinto Donvito (INFN-Bari)
        Poster
      • 1:30 PM
        DIRAC File Replica and Metadata Catalog 4h 45m
        File replica and metadata catalogs are essential parts of any distributed data management system, and largely determine its functionality and performance. A new File Catalog (DFC) was developed in the framework of the DIRAC Project that combines both replica and metadata catalog functionality. The DFC design is based on the practical experience with the data management system of the LHCb Collaboration. It is optimized for the most common patterns of catalog usage in order to achieve maximum performance from the user perspective. The DFC supports bulk operations for replica queries and allows quick analysis of the storage usage globally and for each Storage Element separately. It supports flexible ACL rules with plug-ins for various policies that can be adopted by a particular community. The DFC allows various types of metadata associated with files and directories to be stored and supports efficient queries for data based on complex metadata combinations. Definition of file ancestor-descendant chains is also possible. It is implemented in the DIRAC distributed computing framework following the standard grid security architecture. In this contribution we describe the design of the DFC and its implementation details. The performance measurements are compared with other grid file catalog implementations. The experience of DFC usage in the ILC Collaboration is discussed.
        Speaker: Dr Andrei Tsaregorodtsev (Universite d'Aix - Marseille II (FR))
        Poster
      • 1:30 PM
        DIRAC RESTful API 4h 45m
        The DIRAC framework for distributed computing has been designed as a flexible and modular solution that can be adapted to the requirements of any community. Users interact with DIRAC via the command line, using the web portal, or accessing resources via the DIRAC python API. The current DIRAC API requires users to use a python version valid for DIRAC. Some communities have developed their own software solutions for handling their specific workloads, and would like to use DIRAC as their back-end to access distributed computing resources easily. Many of these solutions are not coded in python or depend on a specific python version. To close this gap, DIRAC provides a new language-agnostic API that any software solution can use. This new API has been designed following the RESTful principles. Any language with libraries to issue standard HTTP queries may use it. GSI proxies can still be used to authenticate against the API services. However, GSI proxies are not a widely adopted standard. The new DIRAC API also allows clients to use OAuth for delegating the user credentials to a third-party solution. These delegated credentials allow the third-party software to query DIRAC on behalf of the users. This new API will further expand the possibilities communities have to integrate DIRAC into their distributed computing models. (A hedged HTTP-client sketch follows this entry.)
        Speaker: Adrian Casajus Ramo (University of Barcelona (ES))
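        Because the API is plain HTTP(S), any language with an HTTP client can use it. A hedged sketch in Python, where the host, resource paths and parameters are invented for illustration (the real DIRAC REST resource names may differ) and the OAuth token is assumed to have been obtained beforehand:
        ```python
        # Sketch of talking to a RESTful DIRAC-like service over HTTPS with an
        # OAuth bearer token. Host, resource paths and parameters are invented;
        # consult the DIRAC documentation for the real resource names.
        import requests

        API = "https://dirac.example.org:9910"
        TOKEN = "OAUTH_ACCESS_TOKEN_OBTAINED_EARLIER"   # delegated user credential
        headers = {"Authorization": "Bearer %s" % TOKEN}
        CA_PATH = "/etc/grid-security/certificates"     # site-dependent trust store

        # List the user's running jobs (hypothetical resource).
        resp = requests.get(API + "/jobs", headers=headers,
                            params={"status": "Running"}, verify=CA_PATH)
        resp.raise_for_status()
        for job in resp.json():
            print(job)

        # Submit a new job described by a JDL string (hypothetical resource).
        jdl = 'Executable = "/bin/echo"; Arguments = "hello";'
        resp = requests.post(API + "/jobs", headers=headers,
                             data={"manifest": jdl}, verify=CA_PATH)
        print(resp.status_code)
        ```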
      • 1:30 PM
        Disk to Disk network transfers at 100 Gb/s using a handful of servers 4h 45m
        For the Super Computing 2011 conference in Seattle, Washington, a 100 Gb/s connection was established between the California Institute of Technology conference booth and the University of Victoria. A small team performed disk to disk data transfers between the two sites nearing 100 Gb/s, using only a small set of properly configured transfer servers equipped with SSD drives. The circuit was established over the BCnet, CANARIE and SCinet (the SuperComputing conference network) using network equipment dedicated to the demonstration. The end-sites' setups involved a mix of 10GE and 40GE technologies. Three servers were equipped with PCIe v3, with a theoretical throughput per network interface of 40Gb/s. We examine the design of the circuit and the work necessary to establish it. The technical hardware design of each end system is described. We discuss the transfer tools, disk configurations, and monitoring tools used in the test with particular emphasis on disk to disk throughput. We review the final test results in addition to discussing the practical problems encountered and overcome during the demonstration. Finally, we evaluate the performance obtained, both with regard to the 100Gb/s WAN circuit as well as end-system and LAN setups, and discuss potential application as a high-rate data access system, and/or caching front-end to a large conventional storage system.
        Speakers: Artur Jerzy Barczyk (California Institute of Technology (US)), Ian Gable (University of Victoria (CA))
      • 1:30 PM
        Distributed Data Analysis in the ATLAS Experiment: Challenges and Solutions 4h 45m
        The ATLAS experiment at the LHC at CERN is recording and simulating several tens of PetaBytes of data per year. To analyse these data the ATLAS experiment has developed and operates a mature and stable distributed analysis (DA) service on the Worldwide LHC Computing Grid. The service is actively used: more than 1400 users have submitted jobs in the year 2011 and a total of more than 1 million jobs run every week. Users are provided with a suite of tools to submit Athena, ROOT or generic jobs to the grid, and the PanDA workload management system is responsible for their execution. The reliability of the DA service is high and steadily improving; grid sites are continually validated against a set of standard tests, and a dedicated team of expert shifters provides user support and communicates user problems to the sites. This talk will review the state of the DA tools and services, summarize the past year of distributed analysis activity, and present the directions for future improvements to the system.
        Speaker: Johannes Elmsheuser (Ludwig-Maximilians-Univ. Muenchen (DE))
        Slides
      • 1:30 PM
        Distributed monitoring infrastructure for Worldwide LHC Computing Grid 4h 45m
        The journey of a monitoring probe from its development phase to the moment its execution result is presented in an availability report is a complex process. It goes through multiple phases such as development, testing, integration, release, deployment, execution, data aggregation, computation, and reporting. Further, it involves people with different roles (developers, site managers, VO managers, service managers, management), from different middleware providers (ARC, dCache, gLite, UNICORE and VDT), consortiums (WLCG, EMI, EGI, OSG), and operational teams (GOC, OMB, OTAG, CSIRT). The seamless harmonization of these distributed actors underpins the daily monitoring of the WLCG infrastructure. In this paper we describe the monitoring of the WLCG infrastructure from the operational perspective. We explain the complexity of the journey of a monitoring probe from its execution on a grid node to its visualization on the MyWLCG portal, where it is exposed to other clients. This monitoring workflow profits from the interoperability established between the SAM and RSV frameworks. We show how these two distributed structures are capable of uniting technologies and hiding the complexity around them, making them easy to use by the community. Finally, the different supported deployment strategies, tailored not only for monitoring the entire infrastructure but also for monitoring sites and virtual organizations, are presented and the associated operational benefits highlighted.
        Speaker: Wojciech Lapka (CERN)
      • 1:30 PM
        DPM: Future-proof storage 4h 45m
        The Disk Pool Manager (DPM) is a lightweight solution for grid-enabled disk storage management. Operated at more than 240 sites, it has the widest distribution of all grid storage solutions in the WLCG infrastructure. It provides an easy way to manage and configure disk pools, and exposes multiple interfaces for data access (rfio, xroot, nfs, gridftp and http/dav) and control (srm). During the last year we have been working on providing stable, high-performance data access to our storage system using standard protocols, while extending the storage management functionality and adapting both configuration and deployment procedures to reuse commonly used building blocks. In this contribution we cover in detail the extensive evaluation we have performed of our new HTTP/WebDAV and NFS 4.1 frontends, in terms of functionality and performance. We summarize the issues we faced and the solutions we developed to turn them into valid alternatives to the existing grid protocols - namely the additional work required to provide multi-stream transfers for high performance wide area access, support for third party copies, credential delegation, and the required changes in the experiment and fabric management frameworks and tools. We describe new functionality that has been added to ease system administration, such as different filesystem weights and a faster disk drain, and new configuration and monitoring solutions based on the industry standards Puppet and Nagios. Finally, we explain some of the internal changes we had to make in the DPM architecture to better handle the additional load from the analysis use cases.
        Speaker: Ricardo Brito Da Rocha (CERN)
        Poster
      • 1:30 PM
        Dynamic federations: storage aggregation using open tools and protocols 4h 45m
        A number of storage elements now offer standard protocol interfaces like NFS 4.1/pNFS and WebDAV, for access to their data repositories, in line with the standardization effort of the European Middleware Initiative (EMI). Here we report on work which seeks to exploit the federation potential of these protocols and build a system which offers a unique view of the storage ensemble and the possibility of integration of other compatible resources such as those from cloud providers. The challenge, here undertaken by the providers of dCache and DPM, but pragmatically open to other Grid and Cloud storage solutions, is to build such a system while being able to accommodate name translations from existing catalogues (e.g. LFCs), experiment-based metadata catalogues, or stateless algorithmic name translations, also known as “trivial file catalogues”. Such so-called storage federations of standard protocols-based storage elements will give a unique view of their content, thus promoting simplicity in accessing the data they contain and offering new possibilities for resilience and data placement strategies. The goal is to consider HTTP and NFS4.1-based storage elements and make them able to cooperate through an architecture that properly feeds the redirection mechanisms that they are based upon, thus giving the functionalities of a “loosely coupled” storage federation. One of the key requirements is to use standard clients (provided by OS'es or open source distributions, e.g. Web browsers) to access an already aggregated system; this approach is quite different from aggregating the repositories at the client side through some wrapper API, like for instance GFAL, or by developing new custom clients. Other technical challenges that will determine the success of this initiative include performance, latency and scalability, and the ability to create worldwide storage federations that are able to redirect clients to repositories that they can efficiently access, for instance trying to choose the endpoints that are closer or applying other criteria. We believe that the features of a loosely coupled federation of open-protocols-based storage elements will open many possibilities of evolving the current computing models without disrupting them, and, at the same time, will be able to operate with the existing infrastructures, follow their evolution path and add storage centers that can be acquired as a third-party service.
        Speaker: Fabrizio Furano (CERN)
        Poster
      • 1:30 PM
        Dynamic parallel ROOT facility clusters on the Alice Environment 4h 45m
        The ALICE collaboration has developed a production environment (AliEn) that implements several components of the Grid paradigm needed to simulate, reconstruct and analyze data in a distributed way. In addition to the Grid-like analysis, ALICE, like many experiments, provides a local interactive analysis using the Parallel ROOT Facility (PROOF). PROOF is part of the ROOT analysis framework used by ALICE. It enables physicists to analyze and understand much larger datasets on a shorter time scale, allowing analysis of data in parallel on remote computer clusters. The default installation of PROOF is a static shared cluster provided by administrators. However, using a new framework, PoD (PROOF on Demand), PROOF can be used in a more user-friendly and convenient way, giving the possibility to dynamically set up a cluster upon user request. Integrating PoD in the AliEn environment, different sets of machines can become workers, allowing the system to react to an increasing number of requests for PROOF sessions by starting a higher number of proofd processes. This paper will describe the integration of the PoD framework in AliEn in order to provide private dynamic PROOF clusters. This functionality is transparent to the user, who will only need to perform a job submission to the AliEn environment.
        Speaker: Cinzia Luzzi (CERN - University of Ferrara)
        Poster
      • 1:30 PM
        E-Center: collaborative platform for the Wide Area network users 4h 45m
        The LHC computing model relies on intensive network data transfers. The E-Center is a social collaborative web-based platform for Wide Area network users. It is designed to give users all the required tools to isolate, identify and resolve any network performance related problem.
        Speaker: Mr Maxim Grigoriev (Fermilab)
        Poster
      • 1:30 PM
        EGI Security Monitoring integration into the Operations Portal 4h 45m
        The Operations Portal is a central service used to support operations in the European Grid Infrastructure: a collaboration of National Grid Initiatives (NGIs) and several European International Research Organizations (EIROs). The EGI Operations Portal provides a single access point to operational information gathered from various sources such as the site topology database, monitoring systems, the user support helpdesk, the grid information system, the VO database and VOMS servers. Significant development effort has been put in place to implement a synoptic view. This single operations platform has proved invaluable for those involved in EGI operations, such as site administrators, NGI representatives, VO managers and NGI operators. In parallel with this work, over the years, the EGI CSIRT (Computer Security Incident Response Team) has been developing security monitoring tools to monitor the infrastructure and to alert resource providers to any identified security problem. Due to the large and increasing number of resources joining the EGI e-Infrastructure it becomes more and more challenging for the EGI CSIRT to follow up all identified security issues. In order to scale up the operational capability, a security dashboard has been developed. The security dashboard integrates into the EGI Operations Portal as a module which allows resource providers' security officers and NGI operations staff to access the monitoring results, and therefore to handle the issues directly. The dashboard aggregates the data produced by the different security monitoring components and provides interfaces for its visualization. Access to the collected data is subject to strict access control so that sensitive information is accessed in a controlled manner. The integration will also allow the operational security issue handling workflow to be easily incorporated into the existing issue handling procedure, thus significantly reducing the overall operational cost. The paper will first briefly introduce the current security monitoring framework and its key components, Nagios and Pakiti, followed by the detailed design and implementation of the security dashboard. We will also present some early experience gained from regular use of the security dashboard and results that have recently improved the security of the whole environment.
        Speakers: Cyril L'Orphelin (CNRS/IN2P3), Daniel Kouril (Unknown), Dr Mingchao Ma (STFC - Rutherford Appleton Laboratory)
      • 1:30 PM
        EMI-european Middleware Initiative 4h 45m
        The EMI project intends to receive or rent an exhibition spot near the main and most visible areas of the event (such as the coffee-break areas), to exhibit the project's goals and latest achievements, such as the EMI-1 release. The means used will be posters, video and the distribution of flyers, sheets or brochures. It would be useful to have a 2x3 m booth with panels on which to put up posters, and some basic furniture such as a table, 2 chairs, a lamp, a wired/Wi-Fi connection and an electrical outlet.
        Speakers: Emidlo Giorgio (Istituto Nazionale Fisica Nucleare (IT)), giuseppina salente (INFN)
      • 1:30 PM
        EMI_datalib - joining the best of ARC and gLite data libraries 4h 45m
        To manage data in the grid, with its jungle of protocols and enormous amount of data in different storage solutions, it is important to have a strong, versatile and reliable data management library. While there are several data management tools and libraries available, they all have different strengths and weaknesses, and it can be hard to decide which tool to use for which purpose. EMI is a collaboration between the European middleware providers aiming to take the best out of each middleware to create one consolidated, all-purpose grid middleware. When EMI started there were two main tools for managing data - gLite had lcg_util and the GFAL library, ARC had the ARC data tools and libarcdata2. While different in design and purpose, they both have the same goal: to manage data in the grid. The design of the new EMI_datalib was ready by the end of 2011, and a first prototype is now implemented and going through a thorough testing phase. This presentation will give the latest results of the consolidated library together with an overview of the design, test plan and roadmap of EMI_datalib.
        Speaker: Jon Kerr Nilsen (University of Oslo (NO))
        Poster
      • 1:30 PM
        Enabling data analysis à la PROOF on the Italian ATLAS-Tier2's using PoD 4h 45m
        In the ATLAS computing model, Tier2 resources are intended for MC production and end-user analysis activities. These resources are usually exploited via the standard GRID resource management tools, which are de facto a high-level interface to the underlying batch systems managing the contributing clusters. While this is working as expected, there are use cases where a more dynamic usage of the resources may be more appropriate. For example, the design and optimization of an analysis on a large data sample available on the local storage of the Tier2 requires many iterations and a fast turnaround. In these cases a 'pull' model for work distribution, like the one implemented by PROOF, may be more effective. This contribution describes our experience using PROOF for data analysis on the Italian ATLAS Tier2s: Frascati, Napoli and Roma1. To enable PROOF on the cluster we used PoD, PROOF on Demand. PoD is a set of tools designed to interact with any resource management system (RMS) to start the PROOF daemons. In this way any user can quickly set up their own PROOF cluster on the resources, with the RMS taking care of scheduling, priorities and accounting. Usage of PoD has steadily increased in the last years, and the product has now reached production-level quality. PoD features an abstract interface to RMSs and provides several plugins for the most common RMSs. In our tests we used both the gLite and PBS plug-ins, the latter being the native RMS handling the resources under test. Data were accessed via xrootd, with file discovery provided by the standard ATLAS tools. The SRM is DPM (Disk Pool Manager), whose standard data access protocol is rfio; we therefore also enabled the Xrootd protocol on DPM. We will describe the configuration and setup details and the results of some benchmark tests we ran on the facility. (A sketch of a typical PoD session follows this entry.)
        Speakers: Elisabetta Vilucchi (Istituto Nazionale Fisica Nucleare (IT)), Roberto Di Nardo (Istituto Nazionale Fisica Nucleare (IT))
        Poster
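        From the user's point of view, setting up a private PROOF cluster with PoD boils down to a few commands. A hedged sketch of a typical session; the worker count is illustrative and the exact option names may vary between PoD versions:
        ```python
        # Sketch of a typical PoD session driven from Python: start the PoD
        # server, submit PROOF workers through the PBS plug-in, then query the
        # connection string to hand to ROOT/PROOF. Option names may vary
        # between PoD versions.
        import subprocess

        def run(cmd):
            print("+ " + cmd)
            subprocess.check_call(cmd, shell=True)

        run("pod-server start")          # personal PoD/PROOF master for this user
        run("pod-submit -r pbs -n 20")   # ask the PBS batch system for 20 workers
        run("pod-info -c")               # print the PROOF connection string,
                                         # e.g. user@host:port, for TProof::Open()
        ```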
      • 1:30 PM
        Engaging with IPv6: addresses for all 4h 45m
        Due to the exhaustion of the IPv4 address space, the utilisation of IPv6 within Grid technologies and other IT infrastructure is becoming a more pressing solution for IP addressing. The employment and deployment of this addressing scheme has been discussed widely, both at the academic and commercial level, for several years. The uptake is not as advanced as was predicted and the potential of this technology has not been fully realised. Presently, an investigation into this technology is underway, as it may offer solutions to the future of IP addressing for collaborative environments. As part of the HEPiX IPv6 Working Group, we investigate the test deployments of IPv6 at the University of Glasgow Tier-2 within ScotGrid and report both on the enablement of Grid services within this framework and on possible configuration solutions for Tier-2 network environments housed within university networks. Drawing upon various test scenarios enabled within the University of Glasgow, areas such as DNS, monitoring and security mechanisms will also be touched upon. (A minimal dual-stack connectivity check follows this entry.)
        Speaker: Mr Mark Mitchell (University of Glasgow)
        Poster
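        A simple first check when enabling a service dual-stack is whether its name resolves to an AAAA record and accepts IPv6 connections. A minimal sketch; the host name and port are placeholders, not an actual Glasgow service:
        ```python
        # Minimal dual-stack check: does the host publish an AAAA record, and
        # does it accept IPv6 connections on a given port? Host/port are placeholders.
        import socket

        HOST, PORT = "svr.example.gla.ac.uk", 443   # hypothetical Tier-2 service

        try:
            infos = socket.getaddrinfo(HOST, PORT, socket.AF_INET6, socket.SOCK_STREAM)
        except socket.gaierror:
            print("no AAAA record for %s" % HOST)
        else:
            for family, socktype, proto, _, sockaddr in infos:
                s = socket.socket(family, socktype, proto)
                s.settimeout(5)
                try:
                    s.connect(sockaddr)
                    print("IPv6 connect OK: %s port %d" % (sockaddr[0], PORT))
                except socket.error as exc:
                    print("IPv6 connect failed: %s (%s)" % (sockaddr[0], exc))
                finally:
                    s.close()
        ```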
      • 1:30 PM
        Eurogrid: a new glideinWMS based portal for CDF data analysis. 4h 45m
        The CDF experiment at Fermilab ended its Run-II phase in September 2011 after 11 years of operations and 10 fb-1 of collected data. The CDF computing model is based on a Central Analysis Farm (CAF) consisting of local computing and storage resources, supported by OSG and LCG resources accessed through dedicated portals. Recently a new portal, Eurogrid, has been developed to effectively exploit computing and disk resources in Europe: a dedicated farm and storage area at the Tier-1 CNAF computing center in Italy, and additional LCG computing resources at different Tier-2 sites in Italy, Spain, Germany and France, are accessed through a common interface. The goal of this project was to develop a portal 1) easy to integrate in the existing CDF computing model, 2) completely transparent to the user and 3) requiring a minimum amount of maintenance support by the CDF collaboration. In this talk we will review the implementation of this new portal, and its performance in the first months of usage. Eurogrid is based on the glideinWMS [1] software, a glidein-based WMS that works on top of Condor [2]. As the CDF CAF is based on Condor, the choice of the glideinWMS software was natural and the implementation seamless. Thanks to the pilot jobs, user needs and site resources are matched in a very efficient way, completely transparent to the users. Official since June 2011, Eurogrid effectively complements and supports CDF computing resources and is the best solution for the future in terms of required manpower for administration, support and development.
        Speaker: Ms Silvia Amerio (University of Padova & INFN)
      • 1:30 PM
        Evaluation of a new data staging framework for the ARC middleware 4h 45m
        Staging data to and from remote storage services on the Grid for users' jobs is a vital component of the ARC computing element. A new data staging framework for the computing element has recently been developed to address issues with the present framework, which has essentially remained unchanged since its original implementation 10 years ago. This new framework consists of an intelligent data transfer scheduler which handles priorities and fair-share, a rapid caching system, and the ability to delegate data transfer over multiple nodes to increase network throughput. This paper uses data from real user jobs running on production ARC sites to present an evaluation of the new framework. It is shown to make more efficient use of the available resources, reduce the overall time to run jobs, and avoid the problems seen with the previous simplistic scheduling system. In addition, its simple design coupled with intelligent logic provides greatly increased flexibility for site administrators, end users and future development.
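        The combination of per-transfer priorities and fair-share described above can be pictured with the simplified sketch below; it illustrates the concept only, not the actual ARC implementation, and the penalty scheme is an assumption.
```python
# Toy priority + fair-share transfer scheduler: each queued transfer carries a user and
# a priority, and users with many active transfers are penalised so that no single user
# monopolises the transfer slots. Illustrative only, not the ARC data staging code.
import heapq
from collections import defaultdict

class TransferScheduler:
    def __init__(self):
        self._heap = []                           # (effective_priority, seq, user, transfer)
        self._seq = 0
        self._active = defaultdict(int)           # running transfers per user

    def submit(self, transfer, user, priority):
        # Higher requested priority is served first; the fair-share penalty grows
        # with the number of transfers the user already has active.
        effective = -priority + self._active[user]
        heapq.heappush(self._heap, (effective, self._seq, user, transfer))
        self._seq += 1

    def next_transfer(self):
        if not self._heap:
            return None
        _, _, user, transfer = heapq.heappop(self._heap)
        self._active[user] += 1
        return transfer

    def finished(self, user):
        self._active[user] -= 1

if __name__ == "__main__":
    sched = TransferScheduler()
    sched.submit("srm://a/file1", user="alice", priority=5)
    sched.submit("srm://b/file2", user="bob", priority=1)
    print(sched.next_transfer())                  # alice's high-priority transfer first
```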
        Speaker: David Cameron (University of Oslo (NO))
        Poster
      • 1:30 PM
        Evaluation of software based redundancy algorithms for the EOS storage system at CERN 4h 45m
        EOS is a new disk-based storage system used in production at CERN since autumn 2011. It is implemented using the plug-in architecture of the XRootD software framework and allows remote file access via the XRootD protocol or POSIX-like file access via FUSE mounting. EOS was designed to fulfill specific requirements of disk storage scalability and IO scheduling performance for LHC analysis use cases. This is achieved by following a strategy of decoupling disk and tape storage as individual storage systems. A key point of the EOS design is to provide high availability and redundancy of files via a software implementation which uses disk-only storage systems without hardware RAID arrays. All this is aimed at reducing the overall cost of the system and also simplifying the operational procedures. This paper presents advantages and disadvantages of redundancy by hardware (most classical storage installations) in comparison to redundancy by software. The latter is implemented in the EOS system and achieves its goal by spreading data and parity stripes across nodes via remote file access. The gain in redundancy and reliability comes with a trade-off in the following areas: increased complexity of the network connectivity, CPU-intensive parity computations during file creation and recovery, and performance loss through remote disk coupling. An evaluation and performance figures of several redundancy algorithms are presented for simple file mirroring, dual-parity RAID, Reed-Solomon and LDPC codecs. Moreover, the characteristics and applicability of these algorithms are discussed in the context of reliable data storage systems. Finally, a summary of the current state of implementation is given, sharing some experiences on the migration and operation of a new multi-PB disk storage system at CERN.
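        To convey the principle behind software redundancy, the sketch below computes a single XOR parity stripe over data stripes and reconstructs a lost stripe from the survivors; the dual-parity, Reed-Solomon and LDPC codecs evaluated for EOS are more sophisticated, so this is purely illustrative.
```python
# Single-parity striping (RAID-4/5 style XOR), shown only to convey the idea of
# spreading data and parity across nodes; not the EOS codecs themselves.
def make_stripes(block, n_data):
    """Split a block into n_data equal stripes (zero-padded) plus one XOR parity stripe."""
    stripe_len = -(-len(block) // n_data)         # ceiling division
    padded = block.ljust(stripe_len * n_data, b"\x00")
    stripes = [padded[i * stripe_len:(i + 1) * stripe_len] for i in range(n_data)]
    parity = bytearray(stripe_len)
    for stripe in stripes:
        for i, byte in enumerate(stripe):
            parity[i] ^= byte
    return stripes, bytes(parity)

def recover_stripe(stripes, parity, lost):
    """Rebuild the stripe at index `lost` from the surviving stripes and the parity.

    In a real system the lost stripe would be unavailable; here it is simply skipped.
    """
    rebuilt = bytearray(parity)
    for idx, stripe in enumerate(stripes):
        if idx == lost:
            continue
        for i, byte in enumerate(stripe):
            rebuilt[i] ^= byte
    return bytes(rebuilt)

if __name__ == "__main__":
    data, parity = make_stripes(b"example payload of an EOS-like block", n_data=4)
    assert recover_stripe(data, parity, lost=2) == data[2]
```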
        Speaker: Dr Andreas Peters (CERN)
        Poster
      • 1:30 PM
        Evolution of ATLAS PanDA System 4h 45m
        The PanDA Production and Distributed Analysis System plays a key role in the ATLAS distributed computing infrastructure. PanDA is the ATLAS workload management system for processing all Monte-Carlo simulation and data reprocessing jobs in addition to user and group analysis jobs. The system processes more than 5 million jobs in total per week, and more than 1400 users have submitted analysis jobs in 2011 through PanDA. PanDA has performed well with high reliability and robustness during the two years of LHC data-taking, while being actively evolved to meet the rapidly changing requirements for analysis use cases. We will present an overview of system evolution including PanDA's roles in data flow, automatic rebrokerage and reattempt for analysis jobs, adaptation for the CERNVM File System, support for the 'multi-cloud' model through which Tier 2s act as members of multiple clouds, pledged resource management, monitoring improvements, and so on. We will also describe results from the analysis of two years of PanDA usage statistics, current issues, and plans for the future.
        Speaker: Tadashi Maeno (Brookhaven National Laboratory (US))
        Slides
      • 1:30 PM
        Evolution of the Distributed Computing Model of the CMS experiment at the LHC 4h 45m
        The Computing Model of the CMS experiment was prepared in 2005 and described in detail in the CMS Computing Technical Design Report. With the experience of the first years of LHC data taking and with the evolution of the available technologies, the CMS Collaboration identified areas where improvements were desirable. In this work we describe the most important modifications that have been, or are being, implemented in the Distributed Computing Model of CMS. The Worldwide LHC Computing Grid (WLCG) project acknowledged that the whole distributed computing infrastructure is impacted by these kinds of changes, which are happening in most LHC experiments, and decided to create several Technical Evolution Groups (TEGs) aiming at assessing the situation and developing a strategy for the future. In this work we also describe the CMS view on the TEG activities.
        Speaker: Claudio Grandi (INFN - Bologna)
        Poster
      • 1:30 PM
        Evolution of the Virtualized HPC Infrastructure of Novosibirsk Scientific Center 4h 45m
        Novosibirsk Scientific Center (NSC), also known worldwide as Akademgorodok, is one of the largest Russian scientific centers, hosting Novosibirsk State University (NSU) and more than 35 research organizations of the Siberian Branch of the Russian Academy of Sciences, including the Budker Institute of Nuclear Physics (BINP), the Institute of Computational Technologies, and the Institute of Computational Mathematics and Mathematical Geophysics (ICM&MG). Since each institute has specific requirements on the architecture of the computing farms involved in its research field, we currently operate several computing facilities hosted by NSC institutes, each optimized for a particular set of tasks, of which the largest are the NSU Supercomputer Center, the Siberian Supercomputer Center (ICM&MG), and the Grid Computing Facility of BINP. A dedicated optical network with an initial bandwidth of 10 Gbps connecting these three facilities was built in order to make it possible to share the computing resources among the research communities, thus increasing the efficiency of operating the existing computing facilities and offering a common platform for building the computing infrastructure for future scientific projects. Unification of the computing infrastructure is achieved by extensive use of virtualization technology based on the XEN and KVM platforms. Our contribution gives a thorough review of the recent developments, present status and future plans for the NSC virtualized computing infrastructure, focusing on its consolidation for prospective deployment on other remote supercomputer sites and its applications for handling everyday data processing tasks of HEP experiments being carried out at BINP, the KEDR experiment in particular. We also present the results obtained while evaluating the performance and scalability of the virtualized infrastructure following multiple hardware upgrades of the computing facilities involved over the last two years.
        Speaker: Alexey Anisenkov (Budker Institute of Nuclear Physics (RU))
        Poster
      • 1:30 PM
        Evolving ATLAS computing for today's networks 4h 45m
        The ATLAS computing infrastructure was designed many years ago based on the assumption of rather limited network connectivity between computing centers. ATLAS sites have been organized in a hierarchical model, where only a static subset of all possible network links can be exploited and a static subset of well-connected sites (CERN and the T1s) can cover important functional roles such as hosting master copies of the data. The pragmatic adoption of such a simplified approach, compared with a more relaxed scenario interconnecting all sites, was very beneficial during the commissioning of the ATLAS distributed computing system and essential in reducing the operational cost during the first two years of LHC data taking. In the meantime, networks evolved far beyond this initial scenario: while a few countries are still poorly connected with the rest of the WLCG infrastructure, most of the ATLAS computing centers are now efficiently interlinked. Our operational experience in running the computing infrastructure in the last years demonstrated many limitations of the current model: statically defined network paths are sometimes overloaded, while most of the network links are underutilized, together with computing and storage resources at many sites, under the incorrect assumption of limited connectivity with the rest of the infrastructure. In this contribution we describe the various steps which ATLAS Distributed Computing went through in order to benefit from the network evolution and move from the current static model to a more relaxed scenario. This includes the development of monitoring and testing tools and the commissioning effort. We will finally describe the gains of the new model in terms of resource utilization at grid sites after many months of experience.
        Speaker: Simone Campana (CERN)
      • 1:30 PM
        Executor framework for DIRAC 4h 45m
        The DIRAC framework for distributed computing has been designed as a group of collaborating components, agents and servers, with a persistent database back-end. Components communicate with each other using DISET, an in-house protocol that provides Remote Procedure Call (RPC) and file transfer capabilities. This approach has given DIRAC a modular and stable design by enforcing stable interfaces across releases, but it made it complicated to scale further with commodity hardware. To scale DIRAC further, components needed to send more queries between them. Using RPC to do so requires a lot of processing power just to handle the secure handshake needed to establish each connection. DISET now provides a way to keep stable connections and to send and receive queries between components: only one handshake is required to send and receive any number of queries. Using this new communication mechanism, DIRAC now provides a new type of component called an executor. Executors process any task (such as resolving the input data of a job) sent to them by a task dispatcher. This task dispatcher takes care of persisting the state of the tasks to the storage backend and of distributing them amongst all the executors based on the requirements of each task. In case of high load, several executors can be started to process the extra load and stopped once the tasks have been processed. This new approach to handling tasks in DIRAC makes executors easy to replace and replicate, thus enabling DIRAC to scale further beyond the current approach based on polling agents.
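        The dispatcher/executor pattern described above can be pictured with the following simplified sketch, in which a dispatcher keeps the task state and hands tasks to whichever executor is free; it illustrates the concept only, using threads and an in-memory queue instead of DISET connections and a database back-end.
```python
# Conceptual dispatcher/executor sketch: the dispatcher owns the task state and
# distributes tasks to executors. Illustrative only; not the DIRAC implementation.
import queue
import threading

class Dispatcher:
    def __init__(self):
        self.tasks = queue.Queue()
        self.states = {}                          # task_id -> state, stands in for the DB back-end

    def add_task(self, task_id, payload):
        self.states[task_id] = "queued"
        self.tasks.put((task_id, payload))

    def task_done(self, task_id, result):
        self.states[task_id] = "done: %s" % result

def executor(dispatcher, handler):
    """Drain tasks from the dispatcher; more executors can be started under high load."""
    while True:
        try:
            task_id, payload = dispatcher.tasks.get(timeout=1)
        except queue.Empty:
            return
        dispatcher.task_done(task_id, handler(payload))

if __name__ == "__main__":
    d = Dispatcher()
    for i in range(10):
        d.add_task(i, {"job": i})
    workers = [threading.Thread(target=executor, args=(d, lambda p: p["job"] * 2))
               for _ in range(3)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    print(d.states)
```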
        Speaker: Adrian Casajus Ramo (University of Barcelona (ES))
      • 1:30 PM
        Experience of BESIII data production with local cluster and distributed computing model 4h 45m
        The BES III detector is a new spectrometer which operates at the upgraded high-luminosity collider, the Beijing Electron-Positron Collider (BEPCII). The BES III experiment studies physics in the tau-charm energy region from 2 GeV to 4.6 GeV. Since spring 2009, BEPCII has produced large-scale data samples. All the data samples were processed successfully and many important physics results have been achieved based on these samples. Doing data production correctly and efficiently with limited CPU and storage resources is a big challenge. This paper will describe the implementation of the experiment-specific data production for BESIII in detail, including data calibration with an event-level parallel computing model, data reconstruction, inclusive Monte Carlo generation, random trigger background mixing and multi-stream data skimming. Now, with the data sample increasing rapidly, there is a growing demand to move from solely using a local cluster to a more distributed computing model. A distributed computing environment is being set up and is expected to go into production use in 2012. The experience of BESIII data production, both with a local cluster and with a distributed computing model, is presented here.
        Speaker: Dr ziyan Deng (Institute of High Energy Physics, Beijing, China)
        Poster
      • 1:30 PM
        Experience of using the Chirp distributed file system in ATLAS 4h 45m
        Chirp is a distributed file system specifically designed for the wide area network, and developed by the University of Notre Dame CCL group. We describe the design features making it particularly suited to the Grid environment, and to ATLAS use cases. The deployment and usage within ATLAS distributed computing are discussed, together with scaling tests and evaluation for the various use cases.
        Speaker: Rodney Walker (Ludwig-Maximilians-Univ. Muenchen (DE))
      • 1:30 PM
        Experience with the custom-developed ATLAS trigger monitoring and reprocessing infrastructure 4h 45m
        After about two years of data taking with the ATLAS detector, a wealth of experience with the custom-developed trigger monitoring and reprocessing infrastructure has been collected. The trigger monitoring can be roughly divided into online and offline monitoring. The online monitoring calculates and displays all rates at every level of the trigger and evaluates up to 3000 data quality histograms. The data quality information relevant for physics analysis is checked and recorded automatically. The offline trigger monitoring provides information for the different physics-motivated trigger streams after a run has finished. Experts check this information, guided by the assessment of algorithms that compare the current histograms with a reference. The experts record their assessment in a so-called data quality defects database, which is used to build a good run list of data good enough for physics analysis. In the first half of 2011 about three percent of all data had an intolerable defect resulting from the ATLAS trigger system. To keep the percentage of data with defects low, any changes of trigger algorithms or menus must be tested reliably. A recent run with sufficient statistics (of the order of one million events) is reprocessed to check that the changes do not introduce any unexpected side-effects. The current framework for the reprocessing is a GRID production system custom built for ATLAS requirements called PanDA [1]. The reprocessed datasets are checked in the same offline trigger monitoring framework that is used for the offline trigger data quality. It has turned out that the current system works very reliably and all potential problems could be addressed. [1] PanDA: T. Maeno [ATLAS Collaboration], PanDA: Distributed production and distributed analysis system for ATLAS, J.Phys.Conf.Ser.119(2008)
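        The automated comparison of current histograms with a reference, which guides the experts' assessment, can be illustrated with a short PyROOT snippet; file names, histogram names and the threshold are placeholders and do not reflect the actual ATLAS configuration.
```python
# Illustrative reference comparison of monitoring histograms with PyROOT.
# File names, histogram names and the threshold are placeholders.
import ROOT

def assess_histogram(current, reference, threshold=0.05):
    """Flag a monitoring histogram when it is incompatible with the reference shape."""
    prob = current.KolmogorovTest(reference)
    return ("OK", prob) if prob > threshold else ("CHECK", prob)

if __name__ == "__main__":
    run_file = ROOT.TFile.Open("current_run_monitoring.root")        # placeholder files
    ref_file = ROOT.TFile.Open("reference_run_monitoring.root")
    for name in ["trig_rate_vs_lumi", "trig_et_distribution"]:        # placeholder histograms
        status, prob = assess_histogram(run_file.Get(name), ref_file.Get(name))
        print("%s: %s (KS probability %.3f)" % (name, status, prob))
```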
        Speaker: Diego Casadei (New York University (US))
        Slides
      • 1:30 PM
        Experiment Dashboard - a generic, scalable solution for monitoring of the LHC computing activities, distributed sites and services 4h 45m
        The Experiment Dashboard system provides common solutions for monitoring job processing, data transfers and site/service usability. Over the last seven years, it has proved to play a crucial role in the monitoring of the LHC computing activities, distributed sites and services. It has been one of the key elements during the commissioning of the distributed computing systems of the LHC experiments. The first years of data taking represented a serious test for the Experiment Dashboard in terms of functionality, scalability and performance. Given that the usage of the Experiment Dashboard applications has been steadily increasing over time, it can be asserted that all the objectives were fully accomplished.
        Speaker: Pablo Saiz (CERN)
        Slides
      • 1:30 PM
        FermiCloud - A Production Science Cloud for Fermilab 4h 45m
        FermiCloud is an Infrastructure-as-a-Service facility deployed at Fermilab, based on OpenNebula, that has been in production for more than a year. FermiCloud supports a variety of production services on virtual machines, as well as hosting virtual machines that are used as development and integration platforms. This infrastructure has also been used as a testbed for commodity storage evaluations. As part of the development work, X.509 authentication plugins for OpenNebula were developed and deployed on FermiCloud. These X.509 plugins were contributed back to the OpenNebula project and were made generally available with the release of OpenNebula 3.0 in October 2011. The FermiCloud physical infrastructure has recently been deployed across multiple physical buildings with the eventual goal of being resilient to a single building or network failure. Our current focus is the deployment of a distributed SAN with a shared and mirrored filesystem. We will discuss the techniques being used and the progress to date as well as future plans for the project.
        Speaker: Steven Timm (Fermilab)
        Slides
      • 1:30 PM
        FermiGrid: High Availability Authentication, Authorization, and Job Submission. 4h 45m
        FermiGrid is the facility that provides the Fermilab Campus Grid with unified job submission, authentication, authorization and other ancillary services for the Fermilab scientific computing stakeholders. We have completed a program of work to make these services resilient to high authorization request rates, as well as failures of building or network infrastructure. We will present the techniques used, the response of the system against real world events and the performance metrics that have been achieved.
        Speaker: Steven Timm (Fermilab)
      • 1:30 PM
        Fermilab Multicore and GPU-Accelerated Clusters for Lattice QCD 4h 45m
        As part of the DOE LQCD-ext project, Fermilab designs, deploys, and operates dedicated high performance clusters for parallel lattice QCD (LQCD) computations. Multicore processors benefit LQCD simulations and have contributed to the steady decrease in price/performance for these calculations over the last decade. We currently operate two large conventional clusters, the older with over 6,800 AMD Barcelona cores distributed across 8-core systems interconnected with DDR Infiniband, and the newer with over 13,400 AMD Magny-Cours cores distributed across 32-core systems interconnected with QDR Infiniband. We will describe the design and operations of these clusters, as well as their performance and the benchmarking data that were used to select the hardware and the techniques used to handle their NUMA architecture. We will also discuss the design, operations, and performance of a GPU-accelerated cluster that Fermilab will deploy in late November 2011. This cluster will have 152 nVidia Fermi GPUs distributed across 76 servers coupled with QDR Infiniband. In the last several years GPUs have been used to increase the throughput of some LQCD simulations by over tenfold compared with conventional hardware of the same cost. These LQCD codes have evolved from using single GPUs to using multiple GPUs within a server, and now to multiple GPUs distributed across a cluster. The primary goal of this cluster's design is the optimization of large GPU-count LQCD simulations.
        Speaker: Dr Don Holmgren (Fermilab)
        Poster
      • 1:30 PM
        File and Metadata Management for BESIII Distributed Computing 4h 45m
        The BES III experiment at the Institute of High Energy Physics (IHEP), Beijing, uses the high-luminosity BEPC II e+e- collider to study physics in the τ-charm energy region around 3.7 GeV; BEPC II has produced the world’s largest samples of J/ψ and ψ’ events to date. An order of magnitude increase in the data sample size over the 2011-2012 data-taking period demanded a move from a very centralized to a distributed computing environment, as well as the development of an efficient file and metadata management system. While BES III is on a smaller scale than some other HEP experiments, this smaller scale poses its own challenges for the distributed computing and data management system: limited resources and manpower, and low-quality network connections to IHEP. Drawing on the rich experience of the HEP community, an AMGA-based system has been developed which meets these constraints. The design and development of the BES III distributed data management system, including its integration with other BES III distributed computing components, such as job management, are presented here.
        Speaker: Caitriana Nicholson (Graduate University of the Chinese Academy of Sciences)
        Poster
      • 1:30 PM
        FlyingGrid : from volunteer computing to volunteer cloud 4h 45m
        Desktop grid (DG) computing is a well-known technology that aggregates volunteer computing resources donated by individuals to dynamically construct a virtual cluster. Considerable effort has been made in recent years to extend and interconnect desktop grids with other distributed computing resources, especially so-called "service grid" middleware such as gLite, ARC and Unicore. In the former "EDGeS" European project (http://edges-grid.eu/), work was done on standardizing and securing desktop grids in order to propose, since 2010, a new platform exposing a unified view of resources aggregated from DGs run by BOINC (http://boinc.berkeley.edu/) or XtremWeb-HEP (http://www.xtremweb-hep.org/), and resources aggregated from EGEE (http://www.eu-egee.org/). Today, the current "EDGI" European project (http://edgi-project.eu/) extends the EDGeS platform by integrating the ARC and Unicore middleware. This project also includes cloud-related research topics. In this paper we present our first results on integrating cloud technology into desktop grids. This work has two goals. The first goal is to allow desktop grid users to deploy and use their own virtual machines over a set of volunteer resources aggregated by the DG. The second goal is to continue to offer a standardized view to users who wish to submit jobs as well as virtual machines.
        Speaker: Dr Oleg Lodygensky (LAL - IN2P3 - CNRS)
        Poster
      • 1:30 PM
        From toolkit to framework - the past and future evolution of PhEDEx 4h 45m
        PhEDEx is the data-movement solution for CMS at the LHC. Created in 2004, it is now one of the longest-lived components of the CMS dataflow/workflow world. As such, it has undergone significant evolution over time, and continues to evolve today, despite being a fully mature system. Originally a toolkit of agents and utilities dedicated to specific tasks, it is becoming a more open framework that can be used in several ways, both within and beyond its original problem domain. In this talk we describe how a combination of refactoring and adoption of new technologies that have become available over the years have made PhEDEx more flexible, maintainable, and scalable. Finally, we describe how we will guide the evolution of PhEDEx into the future.
        Speaker: Dr Tony Wildish (Princeton University)
        Paper
        Poster
      • 1:30 PM
        GFAL 2.0 Evolutions & GFAL-File system introduction 4h 45m
        The Grid File Access Library (GFAL) is a library designed for universal and simple access to grid storage systems. Completely re-designed and re-written, version 2.0 of GFAL provides a complete abstraction of the complexity and heterogeneity of the grid storage systems (DPM, LFC, dCache, StoRM, ARC, ...) and of the data management protocols (RFIO, gsidcap, LFN, dcap, SRM, HTTP/WebDAV, GridFTP) behind a simpler, faster, more reliable and more consistent POSIX API. GFAL 2.0 is not only an improvement of GFAL 1.0's reliability: several new functionalities have been developed, such as extended attribute management, runtime configuration getters/setters, a new scalable plugin system, new operations, new protocol (HTTP/WebDAV) support and the GFAL FUSE module. GFAL 2.0 is delivered with gfalFS (the GFAL 2.0 FUSE module), a new tool that provides a virtual file system common to all the grid storage systems (dCache, DPM, WebDAV servers), allowing a user to mount these resources. In this paper I analyse in detail the new functionality and the new possibilities brought by GFAL 2.0 and gfalFS, such as the new plugin system for the support of new protocols, the new error reporting system, the old issues corrected, and the new development kit provided. A comparison of the performance benefit/loss of GFAL 2.0/gfalFS versus the other existing tools on the different storage systems is presented. More details are given as well on future GFAL 2.x improvements and possibilities.
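        A minimal sketch of the POSIX-like style of access that GFAL 2.0 aims to provide is given below, assuming the gfal2 Python bindings are available; the call names and the SRM URL are assumptions to be checked against the documentation of the GFAL 2.0 release in use.
```python
# Sketch of protocol-agnostic, POSIX-like access to grid storage, assuming the gfal2
# Python bindings; call names and the SRM URL are assumptions, not verified API.
import gfal2

def list_and_stat(url):
    """List a storage directory and print file sizes, independently of the protocol."""
    ctx = gfal2.creat_context()
    for entry in ctx.listdir(url):
        info = ctx.stat(url.rstrip("/") + "/" + entry)
        print(entry, info.st_size)

if __name__ == "__main__":
    # Hypothetical SRM endpoint; the same code is meant to work for gsiftp://,
    # davs:// or root:// URLs thanks to the plugin system.
    list_and_stat("srm://se.example.org/dpm/example.org/home/myvo/data")
```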
        Speaker: Adrien Devresse (University of Nancy I (FR))
      • 1:30 PM
        glideinWMS experience with glexec 4h 45m
        Multi-user pilot infrastructures provide significant advantages for the communities using them, but also create new security challenges. With Grid authorization and mapping happening with the pilot credential only, the final user identity is not properly addressed in the classic Grid paradigm. In order to solve this problem, OSG and EGI have deployed glexec, a privileged executable on the worker nodes that allows for final user authorization and mapping from inside the pilot itself. The glideinWMS instances deployed on OSG have now been using glexec on OSG sites for several years, and have started using it on EGI resources in the past year. The user experience of using glexec has been mostly positive, although there are still some edge cases where things could be improved. This talk provides both the usage statistics and a description of the remaining problems and the expected solutions.
        Speaker: Mr Igor Sfiligoi (INFN LABORATORI NAZIONALI DI FRASCATI)
        Poster
      • 1:30 PM
        Grid administration: towards an autonomic approach 4h 45m
        Within the DIRAC framework in the LHCb collaboration, we deployed an autonomous policy system acting as a central status information point for grid elements. Experts working as grid administrators have a broad and very deep knowledge about the underlying system, which makes them a precious resource. We have attempted to formalize this knowledge in an autonomous system able to aggregate information, draw conclusions, validate them, and take actions accordingly. The DIRAC Resource Status System is a monitoring and generic policy system that enforces managerial and operational actions automatically. As an example, the status of a grid entity can be evaluated using a number of policies, each making assessments relative to specific monitoring information. The individual results of these policies can be combined to evaluate and propose a global status for the resource. This evaluation goes through a validation step driven by a state machine and an external validation system. Once validated, actions can be triggered accordingly. External monitoring and testing systems such as Nagios or HammerCloud are used by policies for site commissioning and certification. This shows the flexibility of our system and what an autonomous policy system can achieve.
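        The combination of individual policy results into a proposed global status can be pictured with the simple sketch below; the state names and the pessimistic combination rule are assumptions for illustration and do not reproduce the actual Resource Status System code.
```python
# Illustrative combination of policy results into a proposed global resource status,
# in the spirit of an autonomous policy system. Placeholder state and policy names.
SEVERITY = {"Active": 0, "Degraded": 1, "Probing": 2, "Banned": 3}

def combine_policies(policy_results):
    """Propose the most pessimistic status returned by the individual policies."""
    return max(policy_results.values(), key=lambda status: SEVERITY[status])

if __name__ == "__main__":
    results = {
        "NagiosAvailabilityPolicy": "Active",
        "HammerCloudEfficiencyPolicy": "Degraded",
        "DowntimeCalendarPolicy": "Active",
    }
    proposed = combine_policies(results)
    # The proposed status would then go through the state machine and the external
    # validation step before any action is triggered.
    print("Proposed status:", proposed)
```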
        Speaker: Federico Stagni (CERN)
      • 1:30 PM
        Grid Computing at GSI (ALICE and FAIR) - present and future 4h 45m
        The future FAIR experiments CBM and PANDA have computing requirements that fall in a category that currently could not be satisfied by a single computing centre. A larger, distributed computing infrastructure is needed to cope with the amount of data to be simulated and analysed. Since 2002, GSI has operated a Tier2 centre for ALICE@CERN. The central component of the GSI computing facility, and hence the core of the ALICE Tier2 centre, is an LSF/SGE batch farm of 4200+ CPU cores shared by the participating experiments and accessible both locally and via the Grid. In terms of data storage, a 2.5 PB Lustre file system, directly accessible from all worker nodes, is maintained, as well as a 400 TB xrootd-based Grid storage element. Based on this existing expertise, and utilising ALICE's middleware 'AliEn', the Grid infrastructure for PANDA and CBM is being built. Besides a Tier0 centre at GSI, the computing Grids of the two FAIR collaborations now encompass more than 17 sites in 11 countries and are constantly expanding. The operation of the distributed FAIR computing infrastructure benefits significantly from the experience gained with the ALICE Tier2 centre. A close collaboration between ALICE Offline and FAIR provides mutual advantages. The employment of a common Grid middleware as well as compatible simulation and analysis software frameworks ensures significant synergy effects. However, there are certain distinctions in usage and deployment between ALICE, CBM and PANDA. Starting from the common attributes, this talk goes on to explore the particularities of the three Grids and the dynamics of knowledge transfer between them.
        Speaker: Dr Kilian Schwarz (GSI - Helmholtzzentrum fur Schwerionenforschung GmbH (DE))
      • 1:30 PM
        Grid Information Systems Revisited 4h 45m
        The primary goal of a Grid information system is to display the current composition and state of a Grid infrastructure. Its purpose is to provide the information required for workload and data management. As these models evolve, the information system requirements need to be revisited and revised. This paper first documents the results from a recent survey of the LHC VOs on information system requirements. An evaluation of how well these requirements are met by the current system is conducted and directions for future improvements are suggested. It is shown that, due to the changing computing models, predominantly the adoption of the pilot job paradigm, the main focus for the information system has shifted from scheduling towards service discovery and service monitoring. Six use cases are identified and directions for improved support for these are presented. The paper concludes by suggesting changes to the existing system that will provide improved support while maintaining continuity of the service.
        Speaker: Mr Laurence Field (CERN)
      • 1:30 PM
        H1 Monte Carlo Production on the Grid (H1 Collaboration) 4h 45m
        The H1 Collaboration at HERA is now in the era of high precision analyses based on the final and complete data sample. A natural consequence of this is the huge increase in requirement for simulated Monte Carlo (MC) events. As a response to this increase, a framework for large scale MC production using the LCG Grid Infrastructure was developed. After 3 years, the H1 MC Computing Framework has become a high performance, reliable and robust platform operating on the top of gLite infrastructure. The original framework has been expanded into a tool which can handle 600 million simulated MC events per month and 20,000 simultaneously supported jobs on the LHC Grid, decreasing operator effort to the minimum. An annual MC event production rate of over 2.5 billion events has been achieved, and the project is integral to the data analysis performed by H1. Tools have also been developed to allow modifications of H1 detector details, for different levels of MC production steps and for full monitoring of the jobs on the Grid sites. The H1 MC Framework will be described, based on the experience gained during the successful MC simulation for the H1 Experiment, focussing on the solutions which can be implemented for other types of experiments - not only those devoted to HEP. Failure states, deficiencies, bottlenecks and scaling boundaries observed during this full scale physics analysis endeavour are also addressed.
        Speaker: Bogdan Lobodzinski (DESY)
        Poster
      • 1:30 PM
        hBrowse - Generic framework for hierarchical data visualization 4h 45m
        The hBrowse framework is a generic monitoring tool designed to meet the needs of various communities connected to grid computing. It is highly configurable and easy to adjust and extend according to a specific community's needs. It is an HTML/JavaScript client-side application utilizing the latest web technologies to provide a presentation layer for any hierarchical data structure. Each part of this software (dynamic tables, user selection etc.) is in fact a separate plugin which can be used separately from the main application. It was especially designed to meet the requirements of ATLAS and CMS users, as well as to be used as a bulk Ganga monitoring tool.
        Speaker: Lukasz Kokoszkiewicz (CERN)
        Slides
      • 1:30 PM
        Health and performance monitoring of the large and diverse online computing cluster of CMS 4h 45m
        The CMS experiment online cluster consists of 2300 computers and 170 switches or routers operating on a 24-hour basis. This huge infrastructure must be monitored in such a way that the administrators are proactively warned of any failures or degradation in the system, in order to avoid or minimize downtime of the system which can lead to loss of data taking. The number of metrics monitored per host varies from 20 to 40 and ranges from basic host checks (disk, network, load) to application-specific checks (service running), in addition to hardware monitoring (through IPMI). The sheer number of hosts and checks per host in the system stretches the limits of many monitoring tools and requires careful use of various configuration optimizations in order to work reliably. The initial monitoring system used in the CMS online cluster was based on Nagios, but suffered from various drawbacks and did not work reliably in the recently expanded cluster. The CMS cluster administrators investigated the different open source tools available and chose to use a fork of Nagios called Icinga, with several plugin modules to enhance its scalability. The Gearman module provides a queuing system for all checks and their results, allowing easy load balancing across worker nodes. Supported modules allow the grouping of checks in a single request, thereby significantly reducing the network overhead for doing a set of checks on a group of nodes. The PNP4Nagios module provides the graphing capability to Icinga, using files as round-robin databases (RRD). Additional software (rrdcached) optimizes access to the RRD files and is vital in order to achieve the required number of operations. Furthermore, to make best use of the monitoring information to notify the appropriate communities of any issues with their systems, much work was put into the grouping of the checks according to, for example, the function of the machine, the services running, the sub-detectors they belong to, and the criticality of the computer. An automated system to generate the configuration of the monitoring system has been produced to facilitate its evolution and maintenance. The use of these performance-enhancing modules and the work on grouping the checks have yielded impressive performance improvements over the previous Nagios infrastructure, allowing for the monitoring of X metrics per second (compared to Y on the previous system). Furthermore, the design allows the easy growth of the infrastructure without the need to rethink the monitoring system as a whole.
        Speaker: Olivier Raginel (Massachusetts Inst. of Technology (US))
        Poster
      • 1:30 PM
        Hunting for hardware changes in data centers. 4h 45m
        With many servers and server parts, the environment of warehouse-sized data centers is increasingly complex. Server life-cycle management and hardware failures are responsible for frequent changes that need to be managed. To manage these changes better, a project codenamed "hardware hound", focusing on hardware failure trending and hardware inventory, has been started at CERN. By creating and using a hardware-oriented data set - the inventory - with detailed information on servers and their parts, firmware levels, and other server-related data, e.g. rack location, benchmarked processing performance and power consumption, warranty coverage, purchase order, deployment state (production, maintenance), etc., as well as tracking changes to this inventory, the project aims at, for example, being able to discover trends in hardware failure rates, e.g. a lower mean time to failure of a given component in a given batch of servers. This contribution will describe the architecture of the project, the inventory data, and real-life use cases.
        Speaker: Miguel Coelho Dos Santos (CERN)
        Slides
      • 1:30 PM
        Identifying gaps in Grid middleware on fast networks with the Advanced Network Initiative 4h 45m
        By the end of 2011, a number of US Department of Energy (DOE) National Laboratories will have access to a 100 Gb/s wide-area network backbone. The ESnet Advanced Networking Initiative (ANI) project is intended to develop a prototype network based on emerging 100 Gb/s Ethernet technology. The ANI network will support DOE’s science research programs. A 100 Gb/s network testbed is a key component of the ANI project. The testbed offers the opportunity for early evaluation of 100 Gb/s network infrastructure for supporting the high-impact data movement typical of science collaborations and experiments. In order to make effective use of this advanced infrastructure, the applications and middleware currently used by the distributed computing systems of large-scale science need to be adapted and tested within the new environment, with gaps in functionality identified and corrected. As a user of the ANI testbed, Fermilab aims to study the issues related to end-to-end integration and use of 100 Gb/s networks for the event simulation and analysis applications of physics experiments. In this paper we discuss our findings from evaluating existing HEP middleware and application components, including GridFTP and Globus Online, in this high-speed environment. These include possible recommendations to system administrators and to application and middleware developers on changes that would enable production use of 100 Gb/s networks, including data storage, caching and wide-area access.
        Speaker: Dr Gabriele Garzoglio (FERMI NATIONAL ACCELERATOR LABORATORY)
        Poster
      • 1:30 PM
        IFIC-Valencia Analysis Facility 4h 45m
        The ATLAS Tier3 at IFIC-Valencia is attached to a Tier2 that has 50% of the Spanish Federated Tier2 resources. In its design, the Tier3 includes a GRID-aware part that shares some of the features of Valencia's Tier2 such as using Lustre as a file system. ATLAS users, 70% of IFIC's users, also have the possibility of analysing data with a PROOF farm and storing them locally. In this contribution we discuss the design of the analysis facility as well as the monitoring tools we use to control and improve its performance. We also comment on how the recent changes in the ATLAS computing GRID model affect IFIC. Finally, how this complex system can coexist with the other science applications running at IFIC (non-ATLAS users) is presented.
        Speaker: Mr Miguel Villaplana Perez (Universidad de Valencia (ES))
        Slides
      • 1:30 PM
        Improving ATLAS grid site reliability with functional tests using HammerCloud 4h 45m
        With the exponential growth of LHC (Large Hadron Collider) data in 2011, and more to come in 2012, distributed computing has become the established way to analyse collider data. The ATLAS grid infrastructure includes more than 80 sites worldwide, ranging from large national computing centers to smaller university clusters. These facilities are used for data reconstruction and simulation, which are centrally managed by the ATLAS production system, and for distributed user analysis. To ensure the smooth operation of such a complex system, regular tests of all sites are necessary to validate the site capability of successfully executing user and production jobs. We report on the development, optimization and results of an automated functional testing suite using the HammerCloud framework. Functional tests are short, lightweight applications covering typical user analysis and production schemes, which are periodically submitted to all ATLAS grid sites. Results from those tests are collected and used to evaluate site performance. Sites that fail or are unable to run the tests are automatically excluded from the PanDA brokerage system, thus preventing user or production jobs from being sent to problematic sites. We show that stricter exclusion policies help to increase grid reliability, and that the percentage of user and production jobs aborted due to network or storage failures can be appreciably reduced using such a system.
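        The exclusion logic driven by functional test results can be pictured as in the sketch below; the thresholds and the test-result format are invented for illustration and do not correspond to the actual HammerCloud or PanDA configuration.
```python
# Toy auto-exclusion logic driven by functional test outcomes; thresholds and the
# data format are invented for illustration.
def sites_to_exclude(test_results, min_success_rate=0.8, min_tests=5):
    """Return the sites whose recent functional tests fall below the success threshold."""
    excluded = set()
    for site, outcomes in test_results.items():
        if len(outcomes) < min_tests:
            continue                              # not enough statistics, keep the site online
        if sum(outcomes) / len(outcomes) < min_success_rate:
            excluded.add(site)
    return excluded

if __name__ == "__main__":
    recent = {
        "SITE_A": [1, 1, 1, 1, 1, 0],             # 1 = test succeeded, 0 = test failed
        "SITE_B": [0, 0, 1, 0, 1, 0],
    }
    print(sites_to_exclude(recent))               # only SITE_B would be excluded
```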
        Speaker: Federica Legger (Ludwig-Maximilians-Univ. Muenchen)
      • 1:30 PM
        Increasing performance in KVM virtualization within a Tier-1 environment 4h 45m
        This work presents the optimizations we have been investigating and implementing at the KVM virtualization layer in the INFN Tier-1 at CNAF, based on more than a year of experience in running thousands of virtual machines in a production environment used by several international collaborations. These optimizations increase the adaptability of virtualization solutions to demanding applications like those run in our institute (High-Energy Physics). We will show performance differences among different filesystems (e.g. ext3 vs ext4 vs xfs) and caching options when used as KVM host local storage. We will provide guidelines for solid-state disk (SSD) adoption, for the deployment of SR-IOV enabled hardware, for providing PCI-passthrough network cards to virtual machines, and on the best solution for distributing and instantiating read-only virtual machine images. This work has been driven by the project called Worker Nodes on Demand Service (WNoDeS), a framework designed to offer local, grid- or cloud-based access to computing and storage resources, preserving maximum compatibility with existing computing center policies and workflows.
        Speaker: Mr Andrea Chierici (INFN-CNAF)
        Poster
      • 1:30 PM
        INFN Tier1 test bed facility. 4h 45m
        The INFN Tier1 at CNAF is the first-level Italian High Energy Physics computing centre, providing resources to the scientific community through the grid infrastructure. The Tier1 is composed of a very complex infrastructure divided into different parts: the hardware layer, the storage services, the computing resources (i.e. worker nodes adopted for analysis and other activities) and finally the interconnection layer used for data transfers between different Tiers over the grid. Any update of the different parts of this infrastructure, in particular a software update or a change in the services' software code, as well as the addition of new hardware, should be carefully tested and debugged before switching to production. For this reason a test bed facility has been gradually built in order to reproduce the behaviour of the different layers of the Tier1 on a smaller but meaningful scale. Using this test bed system it is possible to perform extensive testing of both the software and hardware layers and certify them before their use at the Tier1.
        Speaker: Mr Pier Paolo Ricci (INFN CNAF)
        Slides
      • 1:30 PM
        Integrated cluster management at the Manchester Tier-2 4h 45m
        We describe our experience of operating a large Tier-2 site since 2005 and how we have developed an integrated management system using third-party, open source components. This system tracks individual assets and records their attributes such as MAC and IP addresses; derives DNS and DHCP configurations from this database; creates each host's installation and re-configuration scripts; monitors the services on each host according to the records of what should be running; and cross references tickets with asset records and per-asset monitoring pages. In addition, scripts which detect problems and automatically remove hosts record these new states in the database which are available to operators immediately through the same interface as tickets and monitoring.
        Speaker: Andrew Mcnab (University of Manchester)
      • 1:30 PM
        Integrating PROOF Analysis in Cloud and Batch Clusters 4h 45m
        High Energy Physics (HEP) analyses are becoming more complex and demanding due to the large amount of data collected by the current experiments. The Parallel ROOT Facility (PROOF) provides researchers with an interactive tool to speed up the analysis of huge volumes of data by exploiting parallel processing on both multicore machines and computing clusters. The typical PROOF deployment scenario is a permanent set of cores configured to run the PROOF daemons. However, this approach is incapable of adapting to the dynamic nature of interactive usage. Several initiatives seek to improve the use of computing resources by integrating PROOF with a batch system, such as PoD or PROOF Cluster. These solutions are currently in production at Universidad de Oviedo and IFCA and are positively evaluated by users. Although they are able to adapt to the computing needs of users, they must comply with the specific configuration, OS and software installed at the batch nodes. Furthermore, they share the machines with other workloads, which may cause disruptions in the interactive service for users. These limitations make PROOF a typical use case for cloud computing. In this work we take advantage of the Cloud Infrastructure at IFCA in order to provide a dynamic PROOF environment where users can control the software configuration of the machines. The PROOF Analysis Framework (PAF) facilitates the development of new analyses and offers transparent access to PROOF resources. Several performance measurements are presented for the different scenarios (PoD, SGE and Cloud), showing a speed improvement closely correlated with the number of cores used.
        Speaker: Dr Ana Y. Rodríguez-Marrero (Instituto de Física de Cantabria (UC-CSIC))
        Poster
      • 1:30 PM
        Integration of Globus Online with the ATLAS PanDA Workload Management System 4h 45m
        The PanDA Workload Management System is the basis for distributed production and analysis for the ATLAS experiment at the LHC. In this role, it relies on sophisticated dynamic data movement facilities developed in ATLAS. In certain scenarios, such as small research teams in ATLAS Tier-3 sites and non-ATLAS Virtual Organizations supported by the Open Science Grid consortium (OSG), the overhead of installation and operation of this component makes its use not cost effective. Globus Online is an emerging new tool from the Globus Alliance, which already proved popular within the OSG community. It provides the users with fast and robust file transfer capabilities that can also be managed from a Web interface, and in addition to grid sites, can have individual workstations and laptops serving as data transmission endpoints. We will describe the integration of the Globus Online functionality into the PanDA suite of software, in order to give more flexibility in choosing the method of data transfer to ATLAS Tier-3 and OSG users.
        Speaker: Maxim Potekhin (Brookhaven National Laboratory (US))
      • 1:30 PM
        Integration of WS-PGRADE/gUSE portal and DIRAC 4h 45m
        The gUSE (Grid User Support Environment) framework allows users to create, store and distribute application workflows. This workflow architecture includes a wide variety of payload execution operations, such as loops, conditional execution of jobs and combination of output. These complex multi-job workflows can easily be created and modified by application developers through the WS-PGRADE portal. The portal also allows end users to download, use and execute existing workflows. The DIRAC framework for distributed computing, a complete Grid solution for a community of users needing access to distributed computing resources, has been integrated into the WS-PGRADE/gUSE system. This integration allows the execution of gUSE workflows in a distributed computing environment, thus greatly expanding the capability of the portal to several Grids and Cloud Computing facilities. The main features and possibilities of the WS-PGRADE/gUSE-DIRAC system, as well as the benefits for users, will be outlined and discussed.
        Speaker: Albert Puig Navarro (University of Barcelona (ES))
      • 1:30 PM
        Investigation of Storage Systems for use in Grid Applications 4h 45m
        In recent years, several new storage technologies, such as Lustre, Hadoop, OrangeFS, and BlueArc, have emerged. While several groups have run benchmarks to characterize them under a variety of configurations, more work is needed to evaluate these technologies for the use cases of scientific computing on Grid clusters and Cloud facilities. This paper discusses our evaluation of the technologies as deployed on a test bed at FermiCloud, one of the Fermilab infrastructure-as-a-service Cloud facilities. The test bed consists of 4 server-class nodes with 40 TB of disk space and up to 50 virtual machine clients, some running on the storage server nodes themselves. With this configuration, the evaluation compares the performance of some of these technologies when deployed on virtual machines and on "bare metal" nodes. In addition to running standard benchmarks such as IOZone to check the sanity of our installation, we have run I/O intensive tests using physics-analysis applications. This paper presents how the storage solutions perform in a variety of realistic use cases of scientific computing. One interesting difference among the storage systems tested is found in a decrease in total read throughput with increasing number of client processes, which occurs in some implementations but not others.
        Speaker: Gabriele Garzoglio (Fermi National Accelerator Laboratory)
        Poster
      • 1:30 PM
        IPv6 testing and deployment at Prague Tier 2 4h 45m
        The Computing Centre of the Institute of Physics in Prague provides computing and storage resources for various HEP experiments (D0, ATLAS, ALICE, Auger); it currently operates more than 300 worker nodes with more than 2500 cores and provides more than 2 PB of disk space. Our site is limited to a single class C block of IPv4 addresses, and hence we had to move most of our worker nodes behind NAT. However, this solution demands a more complicated routing setup. We see IPv6 deployment as a solution that involves less routing and more switching, and therefore promises higher network throughput. The administrators of the Computing Centre strive to configure and install all provided services automatically. For installation tasks we use PXE and kickstart, for network configuration we use DHCP and for software configuration we use CFEngine. Many hardware boxes are configured via specific web pages or the telnet/ssh protocol provided by the box itself. All our services are monitored with several tools, e.g. Nagios, Munin and Ganglia. We rely heavily on the SNMP protocol for hardware health monitoring. All these installation, configuration and monitoring tools must be tested before we can switch completely to the IPv6 network stack. In this contribution we present the tests we have made, the limitations we have faced and the configuration decisions that we have made during IPv6 testing. We also present the testbed built on virtual machines that was used for all the testing and evaluation.
        Speaker: Tomas Kouba (Acad. of Sciences of the Czech Rep. (CZ))
        Poster
      • 1:30 PM
        JavaFIRE: A Replica and File System for Grids 4h 45m
        This work focuses on the creation and validation tests of a replica and transfer system for computational Grids, inspired by the needs of High Energy Physics (HEP). Due to the high volume of data created by the HEP experiments, an efficient file and dataset replica system may play an important role in the computing model. Data replica systems allow the creation of copies distributed between the different storage elements on the Grid. In the HEP context, data files are basically immutable. This eases the task of the replica system, because given sufficient local storage resources any given dataset only needs to be replicated to a particular site once. Concurrent with the advent of computational Grids, another important theme in the distributed systems area that has also seen significant interest is that of peer-to-peer (P2P) networks. P2P networks are an important and evolving mechanism that facilitates the use of distributed computing and storage resources by end users. One common technique to achieve faster file downloads from possibly overloaded storage elements over congested networks is to split the files into smaller pieces. This way, each piece can be transferred from a different replica, in parallel or not, exploiting the moments when the network conditions are better suited to the transfer. The main tasks achieved by the system are: the creation of replicas, the development of a system for replica transfer (RFT) and for replica location (RLS) with a different architecture than the one provided by Globus, and the development of a system for file transfer in pieces on computational grids with interfaces to several storage elements. The RLS uses a P2P overlay based on the Kademlia algorithm.
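        The idea of splitting a file into pieces and fetching different pieces from different replicas can be illustrated with the short sketch below, which uses HTTP range requests and a thread pool; the real system targets grid storage elements through its own RFT/RLS components, so the URLs and piece size here are purely illustrative.
```python
# Conceptual piecewise download from multiple replicas using HTTP range requests.
# Replica URLs, total size and piece size are placeholders for illustration only.
import urllib.request
from concurrent.futures import ThreadPoolExecutor

def fetch_piece(replica_url, start, end):
    """Download one byte range of the file from a given replica."""
    request = urllib.request.Request(replica_url,
                                     headers={"Range": "bytes=%d-%d" % (start, end)})
    with urllib.request.urlopen(request) as response:
        return start, response.read()

def piecewise_download(replicas, total_size, piece_size=4 * 1024 * 1024):
    """Fetch pieces round-robin from the available replicas and reassemble the file."""
    ranges = [(offset, min(offset + piece_size - 1, total_size - 1))
              for offset in range(0, total_size, piece_size)]
    with ThreadPoolExecutor(max_workers=len(replicas)) as pool:
        futures = [pool.submit(fetch_piece, replicas[i % len(replicas)], start, end)
                   for i, (start, end) in enumerate(ranges)]
        pieces = sorted(future.result() for future in futures)
    return b"".join(data for _, data in pieces)

if __name__ == "__main__":
    replicas = ["http://replica1.example.org/data/file.root",
                "http://replica2.example.org/data/file.root"]
    content = piecewise_download(replicas, total_size=32 * 1024 * 1024)
```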
        Speaker: Stephen Gowdy (CERN)
        Poster
      • 1:30 PM
        Key developments of the Ganga task-management framework. 4h 45m
        Ganga is an easy-to-use frontend for the definition and management of analysis jobs, providing a uniform interface across multiple distributed computing systems. It is the main end-user distributed analysis tool for the ATLAS and LHCb experiments and provides the foundation layer for the HammerCloud system, used by the LHC experiments for validation and stress testing of their numerous distributed computing facilities. This poster will illustrate recent developments aimed at improving both the efficiency with which computing resources are utilised and the end-user experience. Notable highlights include a new web-based monitoring interface (WebGUI) that allows users to conveniently view the status of their submitted Ganga jobs and browse the local job repository. Improvements to the core Ganga package will also be outlined. Specifically, we will highlight the development of procedures for automatic handling and resubmission of failed jobs, alongside a mechanism that stores an analysis application such that it can be repeated (optionally using different input data) at any point in the future. We will demonstrate how tools that were initially developed for a specific user community have been migrated into the Ganga core, and so can be exploited by a wider user base. Similarly, examples will be given where Ganga components have been adapted for use by communities in their custom analysis packages.
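        A flavour of the Ganga job definition and of automatic resubmission of failed jobs is given in the sketch below; it assumes an interactive Ganga session where the Ganga Public Interface (GPI) objects are already loaded, and the application, backend and retry policy are placeholders.
```python
# Assumes an interactive `ganga` session, where Job, Executable, Local and the
# `jobs` repository view are provided by the GPI; names below are placeholders.
j = Job(name="demo")
j.application = Executable(exe="echo", args=["hello", "grid"])
j.backend = Local()                               # could equally be a Grid backend
j.submit()

# Naive automatic resubmission of failed jobs from the local repository.
for job in jobs:
    if job.status == "failed":
        job.resubmit()
```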
        Speaker: Michael John Kenyon (CERN)
      • 1:30 PM
        Ksplice: Update without rebooting 4h 45m
        Today, every OS in the world requires regular reboots in order to be up to date and secure. Since reboots cause downtime and disruption, sysadmins are forced to choose between security and convenience. Until Ksplice. Ksplice is new technology that can patch a kernel while the system is running, with no disruption whatsoever. We use this technology to provide Ksplice Uptrack, a service that delivers important security and bugfix updates to your systems. (It's free for Ubuntu Desktop and Fedora, and is also a free feature of Oracle Linux Premier support.) In this talk, we'll very briefly provide an overview of how Ksplice can dramatically reduce the pain associated with large-scale installation maintenance, which is why it is used at 700+ customer sites, on 100,000+ customer systems, including at Brookhaven National Lab. More importantly, we'll provide a detailed look into how the Ksplice technology works and how the Ksplice Uptrack service works, at a technical level primarily targeted at system administrators and developers, but largely accessible to the average Linux user as well.
        Speaker: Waseem Daher (Oracle)
      • 1:30 PM
        LHCbDIRAC: distributed computing in LHCb 4h 45m
        We present LHCbDIRAC, an extension of the DIRAC community Grid solution that handles the LHCb specificities. The DIRAC software was developed for many years within LHCb only; nowadays it is a generic software package used by many scientific communities worldwide. Each community wanting to take advantage of DIRAC has to develop an extension containing all the code necessary for handling its specific cases. LHCbDIRAC is an actively developed extension, implementing the LHCb computing model and workflows. LHCbDIRAC extends DIRAC to handle all the distributed computing activities of LHCb. Such activities include real data processing (reconstruction, stripping and streaming), Monte-Carlo simulation and data replication. Other activities are group and user analysis, data management, resource management and monitoring, data provenance, and accounting for user and production jobs. LHCbDIRAC also provides extensions of the DIRAC interfaces, including a secure web client, Python APIs and CLIs. While DIRAC and LHCbDIRAC follow independent release cycles, every LHCbDIRAC release is built on top of an existing DIRAC release. Before putting a new release candidate in production, a number of certification tests are run in a separate setup. This contribution highlights the versatility of the system, also presenting the experience with real data processing, data and resource management, and the monitoring of activities and resources.
        Speaker: Federico Stagni (CERN)
        Poster
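        As a rough illustration of the Python API style mentioned above (an editor's sketch, not code from LHCbDIRAC; method names can differ between DIRAC releases), a simple job might be defined and submitted as follows:
          # Initialise the DIRAC client environment, then build and submit a job.
          from DIRAC.Core.Base import Script
          Script.parseCommandLine()

          from DIRAC.Interfaces.API.Dirac import Dirac
          from DIRAC.Interfaces.API.Job import Job

          j = Job()
          j.setName('demo-job')
          j.setExecutable('/bin/echo', arguments='hello from DIRAC')
          j.setCPUTime(3600)

          result = Dirac().submit(j)   # returns an S_OK/S_ERROR style dictionary
          print(result)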
      • 1:30 PM
        Long-term preservation of analysis software environment 4h 45m
        Long-term preservation of scientific data represents a challenge to all experiments. Even after an experiment has reached its end of life, it may be necessary to reprocess the data. There are two aspects of long-term data preservation: "data" and "software". While data can be preserved by migration, the situation is more complicated for the software. Preserving source code and binaries is not enough; the full software and hardware environment is needed. Virtual machines (VMs) may offer a solution by "freezing" a virtual hardware platform "in software", where the legacy software can run in the original environment. A complete infrastructure package has been developed for easy deployment and management of such VMs. It is based on a dedicated distribution of Linux, CERNVM. Updated versions will be made available for new software, while older versions will still be available for legacy analysis software. Further, an HTTP-based file system, CVMFS, is used for the distribution of the software. Since multiple versions of both software and VMs are available, it is possible to process data with any software version and a matching VM version. OpenNebula is used to deploy the VMs. Traditionally, there are many tools for managing clouds from a VM point of view. However, for experiments, it can be more useful to have a tool which is mainly centred around the data, but also allows for management of VMs. Therefore, a point-and-click web user interface is being developed that can (a) keep track of the processing status of all data; (b) select data to be processed and the type of processing, also selecting the version of software and matching VM; and (c) configure the processing nodes, e.g. memory and number of nodes. It is preferable that the interface has an experiment-dependent module which will allow for easy adaptation to various experiments. The complete package is designed to be easy to replicate on any processing site, and to scale well. Besides data preservation, this paradigm also allows for distributed cloud-computing on private and public clouds through the EC2 interface (an illustrative deployment sketch is given below), for both legacy and contemporary experiments, e.g. NA61 and the LHC experiments.
        Speakers: Artem Harutyunyan (CERN), Dag Larsen (University of Bergen (NO))
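        A hedged sketch of the EC2-style deployment mentioned above, using the boto library against a hypothetical private-cloud endpoint; the endpoint, image identifier, credentials and contextualisation file are all placeholders and not part of the described system:
          import boto
          from boto.ec2.regioninfo import RegionInfo

          # EC2-compatible front-end of a private cloud (placeholder values).
          region = RegionInfo(name='private', endpoint='cloud.example.org')
          conn = boto.connect_ec2(aws_access_key_id='KEY',
                                  aws_secret_access_key='SECRET',
                                  region=region, port=8773,
                                  path='/services/Cloud', is_secure=False)

          # Contextualisation data telling the VM which software version to use.
          user_data = open('cernvm-context.txt').read()
          reservation = conn.run_instances('ami-00000001', min_count=1, max_count=1,
                                           instance_type='m1.small',
                                           user_data=user_data)
          print(reservation.instances)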
      • 1:30 PM
        Lxcloud: A Prototype for an Internal Cloud in HEP. Experiences and Lessons Learned 4h 45m
        In 2008 CERN launched a project aiming at virtualising the batch farm. It strictly distinguishes between infrastructure and guests, and is thus able to serve, along with its initial batch farm target, as an IaaS infrastructure which can be exposed to users. The system was put into production at small scale at Christmas 2010, and had grown to almost 500 virtual machine slots by spring 2011. It was opened to test-case users deploying CERNVM images on it, which opened new possibilities for delivering IT resources to users in a cloud-like way. This presentation gives an overview of the project, its evolution and growth, as well as the different real-life use cases. Operational experiences and issues will be reported as well.
        Speaker: Dr Ulrich Schwickerath (CERN)
      • 1:30 PM
        Major changes to the LHCb Grid computing model in year 2 of LHC data 4h 45m
        The increase of luminosity in the LHC during its second year of operation (2011) was achieved by delivering more protons per bunch and increasing the number of bunches. This change of running conditions required some changes in the LHCb Computing Model. The consequences of the higher pileup are a larger event size and longer processing time, but also the possibility for LHCb to propose, and get approved, a new physics program implying an increase in the trigger rate by 50%. These changes led to shortages in the offline distributed data processing resources, such as an increase by a factor of 2 in the CPU capacity needed for reconstruction, a 70% increase in storage needs at T1 sites, and subsequently problems with data throughput for file access from the storage elements. To accommodate these changes the online running conditions and the Computing Model for offline data processing had to be adapted accordingly. This talk will describe in detail the changes implemented for the offline data processing on the Grid, relaxing the Monarc model in a first step and going beyond it subsequently. It will further describe other operational issues discovered and solved during 2011, present the performance of the system and conclude with lessons learned to further improve the data processing reliability and quality for the 2012 run. If available, first results on the computing performance from the 2012 run will be presented.
        Speaker: Dr Stefan Roiser (CERN)
        Poster
      • 1:30 PM
        Making Connections - Networking the distributed computing system with LHCONE for CMS 4h 45m
        The LHCONE project aims to provide effective entry points into a network infrastructure that is intended to be private to the LHC Tiers. This infrastructure is not intended to replace the LHCOPN, which connects the highest tiers, but rather to complement it, addressing the connection needs of the LHC Tier-2 and Tier-3 sites which have become more important in the new, less hierarchical computing models. LHCONE is intended to grow as a robust and scalable solution for a global system serving such needs, thus reducing the load on the general-purpose network (GPN) infrastructures in different nations. The CMS experiment pioneered the commissioning of data transfer links between Tier-2 sites in 2010. During 2011, and in the context of preparing for LHCONE to go into production, the CMS Computing project launched an activity to measure in detail the performance, quality and latency of large-scale data transfers among CMS Tier-2 sites. The outcome of this activity will be presented and its impact on the design and commissioning of new transfer infrastructures will be discussed.
        Speaker: Dr Daniele Bonacorsi (Universita e INFN (IT))
      • 1:30 PM
        Managing a site with Puppet 4h 45m
        Installation and post-installation mechanisms are critical points for computing centres to streamline production services. Managing hundreds of nodes is a challenge for any computing centre, and there are many tools able to cope with this problem. The desired features include the ability to do incremental configuration (no need to bootstrap the service to make it manageable by the tool), simplicity of the configuration description language and of the system itself, ease of extension of the properties/capabilities of the system, a rich community for assistance and development, and open-source software. A possible choice to steer post-installations and dynamic post-configurations is Puppet. Puppet provides a central point where profiles can be defined; these can easily be propagated around the cluster, hence fulfilling the need for post-install configuration after the raw Operating System installation. Puppet also ensures the enforcement of the profile and of the defined services once a node has been completely installed. We found in Puppet a good trade-off between simplicity and flexibility, and it was the best fit for our requirements. Puppet's approach to system management is simple, non-intrusive and incremental; Puppet does not try to control every aspect of the configuration, but only the ones you are interested in. It allows a whole site to be managed from a central service, greatly easing potential reconfigurations and speeding up disaster recovery procedures.
        Speaker: Dr Xavier Espinal Curull (Universitat Autònoma de Barcelona (ES))
      • 1:30 PM
        Managing Virtual Machine Lifecycle in CernVM Project 4h 45m
        The creation and maintenance of a Virtual Machine (VM) is a complex process. To build the VM image, thousands of software packages have to be collected, disk images suitable for different hypervisors have to be built, integrity tests must be performed, and eventually the resulting images have to become available for download. In the meantime, software updates for the older versions must be published, obsolete images must be revoked, and the clouds that use them must be updated. Initially, in the CernVM project we used several commercial solutions to drive this process. In addition to the cost, the drawback of such an approach was the lack of a common and coherent framework that would allow for full control of every step in the process and easy adaptation to new technologies (hypervisors, clouds, APIs). In an attempt to provide a complete lifecycle management solution for virtual machines, we collected a set of open-source tools, adapted them to our needs and combined them with our existing development tools in order to create an extensible framework that could serve as an end-to-end solution for VM lifecycle management. This new framework is based on the Archipel Open Source Project and shares some of its main principles, namely: every component of the system is a stand-alone agent; the front-end is a stand-alone application; and all of them communicate over the same messaging network based on the Extensible Messaging and Presence Protocol (XMPP). The components of the framework can thus interact with each other in order to perform automated tasks, and all of them can be managed from a single User Interface (a minimal agent sketch is given below). The agents that manage the Hypervisor infrastructure, as well as the agents that deploy and monitor the Virtual Machines and the web-based user interface, are provided by the Archipel Project. In CernVM, we developed iBuilder, a tool to instrument VM images for almost all popular hypervisors. We integrated Tapper, an open-source tool that tests the resulting images, and we developed all the appropriate agents to control the software repositories and the previously mentioned tools. All these agents now allow us to continuously build and test development images. To complete the system, we plan to develop agents that will be capable of deploying contextualized CernVM images in various scenarios such as clouds that support the EC2 API interface, or private/academic clouds using the native tools. Finally, a new lightweight front-end is under development, aiming to provide access to and complete control of the framework from portable devices (smartphones and tablets). In this contribution we will present the details of this system, its current status and future plans.
        Speaker: Ioannis Charalampidis (Aristotle Univ. of Thessaloniki (GR))
        Poster
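        The sketch below illustrates the agent idea only (an editor's example built on the SleekXMPP library); the JID, the password and the one-word "build" vocabulary are invented for illustration and do not reflect Archipel's actual protocol:
          import sleekxmpp

          class BuildAgent(sleekxmpp.ClientXMPP):
              """A stand-alone agent that joins the XMPP network and reacts to tasks."""
              def __init__(self, jid, password):
                  super(BuildAgent, self).__init__(jid, password)
                  self.add_event_handler('session_start', self.on_start)
                  self.add_event_handler('message', self.on_message)

              def on_start(self, event):
                  self.send_presence()   # announce the agent on the messaging network
                  self.get_roster()

              def on_message(self, msg):
                  if msg['type'] in ('chat', 'normal') and msg['body'] == 'build':
                      # ... trigger an image build here (e.g. call iBuilder) ...
                      msg.reply('build started').send()

          agent = BuildAgent('builder@xmpp.example.org', 'secret')
          if agent.connect():
              agent.process(block=True)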
      • 1:30 PM
        Many-core experience with HEP software at CERN openlab 4h 45m
        The continued progression of Moore’s law has led to many-core platforms becoming easily accessible commodity equipment. New opportunities that arose from this change have also brought new challenges: harnessing the raw potential of computation of such a platform is not always a straightforward task. This paper describes practical experience coming out of the work with many-core systems at CERN openlab and the observed differences with respect to their predecessors. We provide the latest results for a set of parallelized HEP benchmarks running on several classes of many-core platforms.
        Speaker: Andrzej Nowak (CERN openlab)
        Poster
        Slides
      • 1:30 PM
        MARDI-Gross - Data Management Design for Large Experiments 4h 45m
        MARDI-Gross builds on previous work with the LIGO collaboration, using the ATLAS experiment as a use case to develop a toolkit on data management for people making proposals for large High Energy Physics experiments, as well as for experiments such as LIGO and LOFAR, and also for those assessing such proposals. The toolkit will also be of interest to those involved in active data management for new and current experiments.
        Speaker: Prof. Roger Jones (Lancaster University (GB))
      • 1:30 PM
        Model of shared ATLAS Tier2 and Tier3 facilities in EGI/gLite Grid flavour 4h 45m
        The ATLAS computing and data models have moved/are moving away from the strict MONARC model (hierarchy) to a mesh model. Evolution of the computing models also requires evolution of the network infrastructure to enable any Tier2 and Tier3 to easily connect to any Tier1 or Tier2. This requires some changes to the data model: a) Any site can replicate data from any other site. b) Dynamic data caching. Analysis sites receive datasets from any other site “on demand” based on usage pattern, and possibly using a dynamic placement of datasets by centrally managed replication of whole datasets. Unused data are removed. c) Remote data access. Local jobs could access data stored at remote sites using local caching on a file or sub-file level. In this contribution, the model of shared ATLAS Tier2 and Tier3 facilities in the EGI/gLite flavour is explained. The Tier3s in the US and the Tier3s in Europe are rather different, because in Europe we have facilities which are Tier2s with a Tier3 component (a Tier3 with a co-located Tier2). Data taking in ATLAS has been going on for more than one year. We will present the Tier2 and Tier3 facility setup, how we get the data, how we enable grid and local data access at the same time, how Tier2 and Tier3 activities affect the cluster differently, and how hundreds of millions of events are processed. Finally, an example of how a real physics analysis works at these sites will be shown. This is a good occasion to see whether we have developed all the Grid tools necessary for the ATLAS Distributed Computing community and, in case we have not, to fix them, in order to be ready for the foreseen increase in ATLAS activity in the next years.
        Speaker: Dr Santiago Gonzalez De La Hoz (IFIC-Valencia)
        Poster
      • 1:30 PM
        Monitoring ARC services with GangliARC 4h 45m
        Monitoring of Grid services is essential to provide a smooth experience for users and to provide fast and easy-to-understand diagnostics for administrators running the services. GangliARC makes use of the widely used Ganglia monitoring tool to present web-based graphical metrics of the ARC computing element. These include statistics of running and finished jobs and data transfer metrics, as well as the availability of the computing element and hardware information such as the free disk space left in the ARC cache. Ganglia presents metrics as graphs of the value of the metric over time, shows an easily digestible summary of how the system is performing, and enables quick and easy diagnosis of common problems. This paper describes how GangliARC works and shows numerous examples of how the generated data can quickly be used by an administrator to investigate problems (a simple illustration of metric publication is sketched below). It also presents possibilities of combining GangliARC with other commonly used monitoring tools such as Nagios to easily integrate ARC monitoring into the regular monitoring infrastructure of any site or computing centre.
        Speaker: David Cameron (University of Oslo (NO))
        Poster
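        For illustration only (not the actual GangliARC code), a metric such as the free space in the ARC cache could be pushed into Ganglia with the standard gmetric command-line tool; the cache path below is an assumed example:
          import os
          import subprocess

          def publish_free_cache_space(cache_dir='/var/spool/arc/cache'):
              stat = os.statvfs(cache_dir)
              free_gb = stat.f_bavail * stat.f_frsize / 1e9
              # Hand the value to the local Ganglia daemon via the gmetric utility.
              subprocess.check_call([
                  'gmetric',
                  '--name', 'arc_cache_free',
                  '--value', '%.1f' % free_gb,
                  '--type', 'float',
                  '--units', 'GB',
              ])

          publish_free_cache_space()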
      • 1:30 PM
        Monitoring of computing resource utilization of the ATLAS experiment 4h 45m
        Due to the good performance of the LHC accelerator, the ATLAS experiment has seen higher than anticipated levels for both the event rate and the average number of interactions per bunch crossing. In order to respond to these changing requirements, the current and future usage of CPU, memory and disk resources has to be monitored, understood and acted upon. This requires data collection at a fairly fine level of granularity: the performance of each object written and of each algorithm run, as well as a dozen per-job variables, is gathered for the different processing steps of Monte Carlo generation and simulation and the reconstruction of both data and Monte Carlo. We present a system to collect and visualize the data from both the online Tier-0 system and distributed grid production jobs. Around 40 GB of performance data are expected from up to 200k jobs per day, thus making performance optimization of the underlying Oracle database of utmost importance. (A sketch of the kind of per-job data collected is given below.)
        Speaker: Ilija Vukotic (Universite de Paris-Sud 11 (FR))
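        As a rough, hypothetical illustration of the kind of per-job data such a system gathers (the real ATLAS collection happens inside the production jobs themselves, not with this code):
          import json
          import os
          import resource

          def job_performance_snapshot(step_name):
              # CPU and memory accounting for the processes spawned by this job.
              usage = resource.getrusage(resource.RUSAGE_CHILDREN)
              return {
                  'step': step_name,            # e.g. generation, simulation, reco
                  'cpu_user_s': usage.ru_utime,
                  'cpu_sys_s': usage.ru_stime,
                  'max_rss_kb': usage.ru_maxrss,
                  'pid': os.getpid(),
              }

          print(json.dumps(job_performance_snapshot('reconstruction'), indent=2))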
      • 1:30 PM
        Monitoring techniques and alarm procedures for CMS services and sites in WLCG 4h 45m
        The CMS offline computing system is composed of more than 50 sites and a number of central services to distribute, process and analyze data worldwide. A high level of stability and reliability is required from the underlying infrastructure and services, partially covered by local or automated monitoring and alarming systems such as Lemon and SLS; the former collects metrics from sensors installed on computing nodes and triggers alarms when values are out of range, the latter measures the quality of service and warns managers when a service is affected. CMS has established computing shift procedures with personnel operating worldwide from remote Computing Centers, under the supervision of the Computing Run Coordinator on duty at CERN. This dedicated 24/7 computing shift personnel contributes to detecting and reacting in a timely manner to any unexpected error, and hence ensures that CMS workflows are carried out efficiently and in a sustained manner. Synergy among all the involved actors is exploited to ensure the 24/7 monitoring, alarming and troubleshooting of the CMS computing sites and services. We review the deployment of the monitoring and alarming procedures, and report on the experience gained throughout the first 2 years of LHC operation. We describe the efficiency of the communication tools employed, the coherent monitoring framework, the pro-active alarming systems and the proficient troubleshooting procedures that helped the CMS Computing facilities and infrastructure to operate at high reliability levels.
        Speaker: Jorge Amando Molina-Perez (Univ. of California San Diego (US))
        Slides
      • 1:30 PM
        MPI support in the DIRAC Pilot Job Workload Management System 4h 45m
        Parallel job execution in the grid environment using MPI technology presents a number of challenges for the sites providing this support. Multiple flavors of MPI libraries, shared working directories required by certain applications, and special settings for the batch systems make MPI support difficult for site managers. On the other hand, workload management systems with pilot jobs have become ubiquitous, although support for MPI applications in the pilot frameworks was not available. This support was recently added in the DIRAC Project in the context of the GISELA Latin American Grid. Special services for dynamic allocation of virtual computer pools on the grid sites were developed in order to deploy MPI rings corresponding to the requirements of the jobs in the central task queue of the DIRAC Workload Management System. The required MPI software is installed automatically by the pilot agents using user-space file system techniques. The same technique is used to emulate shared working directories for the parallel MPI processes. This makes it possible to execute MPI jobs even on sites not supporting them officially (the basic idea is sketched below). Reusing the MPI rings constructed in this way for the execution of a series of parallel jobs dramatically improves their efficiency and turnaround. In this contribution we will describe the design and implementation of the DIRAC MPI Service as well as its support for various types of MPI libraries. Advantages of coupling the MPI support with the pilot frameworks will be outlined and examples of usage with real applications will be presented.
        Speaker: Ms Vanessa Hamar (CPPM-IN2P3-CNRS)
        Poster
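        The basic idea referred to above can be pictured with the following editor's sketch: once the pilots have installed an MPI flavour in user space and agreed on a host list, the parallel payload is started with mpirun (all paths and host files here are placeholders, not the DIRAC implementation):
          import subprocess

          mpi_root = '/home/pilot/userspace/openmpi'   # MPI installed by the pilots
          hostfile = '/home/pilot/ring/hosts.txt'      # nodes allocated to the ring
          workdir  = '/home/pilot/ring/shared'         # emulated shared directory

          subprocess.check_call([
              mpi_root + '/bin/mpirun',
              '-np', '16',
              '--hostfile', hostfile,
              '--wdir', workdir,
              './mpi_application',
          ])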
      • 1:30 PM
        Mucura: your personal file repository in the cloud 4h 45m
        By aggregating the storage capacity of hundreds of sites around the world, distributed data-processing platforms such as the LHC computing grid offer solutions for transporting, storing and processing massive amounts of experimental data, addressing the requirements of virtual organizations as a whole. However, from our perspective, individual workflows require a higher level of flexibility, ease of use and extensibility, which is not yet fully satisfied by the deployed storage systems. In this contribution we report on our experience building Mucura, a prototype of a software system for building cloud-based file repositories of extensible capacity. Intended for individual scientists, the system allows you to store, retrieve, organize and share your remote files from your personal computer, using both command-line and graphical user interfaces. Designed with usability, scalability and operability in mind, it exposes web-based standard APIs for storing and retrieving files and is compatible with the authentication mechanisms used by the existing grid computing platforms. At the core of the system there are components for managing file metadata and for secure storage of the files’ contents, both implemented on top of highly available, distributed, persistent, scalable key-value stores (a toy illustration of the metadata side is sketched below). A front-end component is responsible for user authentication and authorization and for handling requests from clients performing operations on the stored files. We will present the selected open-source implementations for each component of the system and the integration work we have performed. In particular, we will present the rationale and findings of our exploration of key-value data stores as the central component of the system, as opposed to the usage of traditional networked file systems. We will also describe the pros and cons of our choices from the perspectives of both the end-user and the operator of the service. Finally, we will report on the feedback received from the early users and from the operators of the service. This work is inspired not only by the increasing number of commercial services available nowadays to individuals for their personal storage needs (backup, file sharing, synchronization, …) such as Amazon S3, Dropbox, SugarSync, bitcasa, etc., but also by several efforts in the same area in the academic and research worlds (NASA, SDSC, etc.). We are persuaded that the level of flexibility offered to individuals by this kind of system adds value to the day-to-day work of scientists.
        Speaker: Mr Fabio Hernandez (IN2P3/CNRS Computing Centre & IHEP Computing Centre)
        Poster
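        A toy sketch of the metadata side of such a repository (an editor's illustration): file metadata is kept as the value in a key-value store, keyed by a hash of the logical path; a real deployment would use a distributed store rather than the local shelve module used here:
          import hashlib
          import shelve
          import time

          store = shelve.open('mucura-metadata.db')

          def register_file(logical_path, size, owner):
              key = hashlib.sha1(logical_path.encode('utf-8')).hexdigest()
              store[key] = {
                  'path': logical_path,
                  'size': size,
                  'owner': owner,
                  'uploaded': time.time(),
                  'chunks': [],        # references into the content store
              }
              return key

          key = register_file('/home/alice/ntuples/run1.root', 1048576, 'alice')
          print(store[key])
          store.close()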
      • 1:30 PM
        New data visualization of the LHC Era Monitoring (Lemon) system 4h 45m
        In the last few years, new requirements have been received for the visualization of monitoring data: advanced graphics, flexibility in configuration and decoupling of the presentation layer from the monitoring repository. Lemonweb is the data visualization component of the LHC Era Monitoring (Lemon) system. Lemonweb consists of two sub-components: a data collector and a web visualization interface. The data collector is a daemon, implemented in Python, responsible for gathering data from the central monitoring repository and storing it into time-series data structures. Data are stored on disk in Round Robin Database (RRD) files: one file per monitored entity, with all the available monitoring data (a sketch of this storage step is given below). Entities may be grouped into a hierarchical structure called “clusters”, which supports mathematical operations over entities and clusters (e.g. cluster A + cluster B / cluster C – entity XY). Using the configuration information, a cluster definition is evaluated in the collector engine and, at runtime, a sequence of data selects is built to optimize access to the central monitoring repository. An overview of the design and architecture as well as highlights of some implemented features will be presented. The CERN Computer Centre instance, visualizing ~17k entries, will be described, with an example of the advanced cluster configuration and integration with the CLUMAN (a job management and visualization system) visualization module.
        Speaker: Ivan Fedorko (CERN)
        Poster
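        A sketch of the storage step described above, using the rrdtool Python bindings: one RRD file per entity, updated with samples fetched from the monitoring repository (file location, step and archive sizes are illustrative assumptions, not the Lemonweb configuration):
          import os
          import rrdtool

          def store_sample(entity, value, rrd_dir='/var/lib/lemonweb'):
              rrd_file = os.path.join(rrd_dir, '%s.rrd' % entity)
              if not os.path.exists(rrd_file):
                  rrdtool.create(rrd_file,
                                 '--step', '60',
                                 'DS:metric:GAUGE:120:U:U',
                                 'RRA:AVERAGE:0.5:1:1440',   # 1 day at 1-minute resolution
                                 'RRA:AVERAGE:0.5:60:720')   # 1 month at 1-hour resolution
              rrdtool.update(rrd_file, 'N:%f' % value)

          store_sample('lxbatch_load', 0.73)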
      • 1:30 PM
        New developments in the CREAM Computing Element 4h 45m
        The EU-funded project EMI, now in its second year, aims at providing unified, standardized, easy-to-install software for distributed computing infrastructures. CREAM is one of the middleware products in the EMI distribution: it implements a Grid job management service which allows the submission, management and monitoring of computational jobs to local resource management systems. In this paper we discuss some new features being implemented in the CREAM Computing Element. The implementation of the EMI Execution Service (EMI-ES) specification (an agreement in the EMI consortium on the interfaces and protocols to be used in order to enable the computational job submission and management required across technologies) is one of the new functionalities being implemented. New developments are also focusing on the High Availability (HA) area, to improve performance, scalability, availability and fault tolerance.
        Speaker: Mr Massimo Sgaravatto (Universita e INFN (IT))
      • 1:30 PM
        New solutions for large scale functional tests in the WLCG infrastructure with SAM/Nagios: the experiments experience 4h 45m
        For several years the LHC experiments have relied on the WLCG Service Availability Monitoring framework (SAM) to run functional tests on their distributed computing systems. The SAM tests have become an essential tool to measure the reliability of the Grid infrastructure and to ensure reliable computing operations, both for the sites and for the experiments. Recently the old SAM framework was replaced with a completely new system based on Nagios and ActiveMQ, to better support the transition to EGI and to its more distributed infrastructure support model, and to implement several scalability and functionality enhancements. This required all LHC experiments and the WLCG support teams to migrate their tests, to acquire expertise on the new system, to validate the new availability and reliability computations and to adopt new visualisation tools. In this contribution we describe in detail the current state of the art of functional testing in WLCG: how the experiments use the new SAM/Nagios framework (the shape of a Nagios-style probe is sketched below), the advanced functionality made available by the new framework and the future developments that are foreseen, with a strong focus on the improvements in terms of stability and flexibility brought by the new system.
        Speakers: Alessandro Di Girolamo (CERN), Dr Andrea Sciaba (CERN)
        Poster
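        For orientation, the skeleton below shows the general shape of a Nagios-style probe of the kind SAM executes: the check result is communicated through the exit code (0=OK, 1=WARNING, 2=CRITICAL, 3=UNKNOWN); the endpoint is of course a placeholder, and this is not an actual experiment test:
          import socket
          import sys

          OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

          def check_tcp(host, port, timeout=10.0):
              try:
                  socket.create_connection((host, port), timeout).close()
              except socket.timeout:
                  print('WARNING: %s:%d timed out' % (host, port))
                  return WARNING
              except socket.error as exc:
                  print('CRITICAL: %s:%d unreachable (%s)' % (host, port, exc))
                  return CRITICAL
              print('OK: %s:%d reachable' % (host, port))
              return OK

          sys.exit(check_tcp('ce.example.org', 8443))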
      • 1:30 PM
        No file left behind - monitoring transfer latencies in PhEDEx 4h 45m
        The CMS experiment has to move petabytes of data among dozens of computing centres with low latency in order to make efficient use of its resources. Transfer operations are well established to achieve the desired level of throughput, but operators lack a system to identify early on the transfers that will need manual intervention to reach completion. File transfer latencies are sensitive to the underlying problems in the transfer infrastructure, and their measurement can be used as a prompt trigger for preventive actions. For this reason, PhEDEx, the CMS transfer management system, has recently implemented a monitoring system to measure the transfer latencies at the level of individual files. For the first time now, the system can predict the completion time for the transfer of a data set. The operators can detect abnormal patterns in transfer latencies early, and correct the issues while the transfer is still in progress. Statistics are aggregated for blocks of files, recording a historical log to monitor the long-term evolution of transfer latencies, which are used as cumulative metrics to evaluate the performance of the transfer infrastructure and to plan the global data placement strategy (a sketch of this kind of aggregation is given below). In this contribution, we present the typical patterns of transfer latencies that have been identified in the operational experience acquired with the latency monitor. We show how we are able to detect the sources of latency arising from the underlying infrastructure (such as stuck files) which need operator intervention, and we identify the areas in PhEDEx where a development effort can reduce the latency. The improvement in transfer completion times achieved since the implementation of the latency monitoring in 2011 is demonstrated.
        Speaker: Natalia Ratnikova (KIT - Karlsruhe Institute of Technology (DE))
        Poster
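        The aggregation described above can be pictured with this simplified sketch (an editor's illustration; the input layout is invented): per-file latencies are rolled up into block-level statistics in which abnormally slow, "stuck" files stand out:
          from collections import defaultdict

          def block_latency_summary(file_records):
              """file_records: iterable of (block_name, latency_in_seconds) pairs."""
              per_block = defaultdict(list)
              for block, latency in file_records:
                  per_block[block].append(latency)
              summary = {}
              for block, latencies in per_block.items():
                  latencies.sort()
                  summary[block] = {
                      'files': len(latencies),
                      'median_s': latencies[len(latencies) // 2],
                      'worst_s': latencies[-1],   # 'stuck' candidates show up here
                  }
              return summary

          records = [('/Block#1', 120.0), ('/Block#1', 90000.0), ('/Block#2', 300.0)]
          print(block_latency_summary(records))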
      • 1:30 PM
        NUMA memory hierarchies experience with multithreaded HEP software at CERN openlab 4h 45m
        Newer generations of processors come with no increase in clock frequency, and the same is true for memory chips. In order to achieve more performance, the core count is increasing, and to feed all the cores on a chip with instructions and data, the number of memory channels must follow the same trend. The Non-Uniform Memory Access (NUMA) architecture has allowed CPU manufacturers to considerably reduce the impact of memory-subsystem bottlenecks, but, in turn, this solution introduces a cost at the application level. This paper describes our practical experience with the typical CPU servers currently available to the HEP community, based on work with NUMA systems at CERN openlab. We provide the latest measurements of the different NUMA implementations from AMD and Intel, as well as the consequences of NUMA for some parallelized HEP codes.
        Speaker: Julien Leduc
      • 1:30 PM
        Optimising the read-write performance of mass storage systems through the introduction of a fast write cache 4h 45m
        Reading and writing data on a disk-based high-capacity storage system has long been a troublesome task. While disks handle sequential reads and writes well, when they are interleaved, performance drops off rapidly due to the time required to move the disk's read-write head(s) to a different position. An obvious solution to this problem is to replace the disks with an alternative storage technology such as solid-state devices, which have no such mechanical limitations; however, in most applications this is prohibitively expensive. This problem is commonly seen at computer facilities where new data need to be stored while old data are being processed from the same storage, such as WLCG grid sites. In the WLCG case this problem is only going to become more prominent as the LHC luminosity increases, creating larger data sets. In this paper we explore the possibility of introducing a fast write cache in front of the storage system to buffer inbound data. This cache allows writes to be coalesced into larger, more efficient blocks before being committed to the primary storage, while also allowing this action to be postponed until the primary storage is sufficiently quiescent (a toy model of this idea is sketched below). We demonstrate that this is a viable solution to the problem using a real WLCG site as an example of deployment. Finally we also discuss the steps required to tune the surrounding infrastructure, such as the computer network and the storage metadata server, in order to sustain high write rates to the cache and allow the data to be flushed to the bulk storage successfully.
        Speakers: Simon William Fayer (Imperial College Sci., Tech. & Med. (GB)), Stuart Wakefield (Imperial College Sci., Tech. & Med. (GB))
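        A toy model of the write-coalescing idea discussed above (an editor's sketch; sizes and the flush policy are purely illustrative): small incoming writes are buffered and flushed to the bulk store in large sequential chunks:
          class WriteCache(object):
              def __init__(self, bulk_store, flush_threshold=256 * 1024 * 1024):
                  self.bulk_store = bulk_store        # callable taking a byte string
                  self.flush_threshold = flush_threshold
                  self.buffer = []
                  self.buffered_bytes = 0

              def write(self, data):
                  self.buffer.append(data)
                  self.buffered_bytes += len(data)
                  if self.buffered_bytes >= self.flush_threshold:
                      self.flush()

              def flush(self):
                  if self.buffer:
                      # One large sequential write instead of many small ones.
                      self.bulk_store(b''.join(self.buffer))
                      self.buffer = []
                      self.buffered_bytes = 0

          cache = WriteCache(bulk_store=lambda chunk: open('/tmp/bulk.dat', 'ab').write(chunk))
          cache.write(b'x' * 4096)
          cache.flush()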
      • 1:30 PM
        Optimization of HEP Analysis activities using a Tier2 Infrastructure 4h 45m
        While the model for a Tier2 is well understood and implemented within the HEP community, a refined design for analysis-specific sites has not been agreed upon as clearly. We aim to describe the solutions adopted at INFN Pisa, the biggest Tier2 in the Italian HEP community. A standard Tier2 infrastructure is optimized for Grid CPU and storage access, while a more interactive-oriented use of the resources is beneficial to the final data analysis step. In this step, POSIX file storage access is easier for the average physicist, and has to be provided in a real or emulated way. Modern analysis techniques use advanced statistical tools (like RooFit and RooStats), which can make use of multi-core systems. The infrastructure has to provide or create on demand computing nodes with many cores available, on top of the existing and less elastic Tier2 flat CPU infrastructure. Finally, users do not want to have to deal with data placement policies at the various sites, and hence a transparent WAN file access, again with a POSIX layer, must be provided, making use of the just-installed 10 Gbit/s regional lines. Even if standalone systems with such features are possible and exist, the implementation of an analysis site as a virtual layer over an existing Tier2 requires novel solutions; the ones used in Pisa are described here.
        Speaker: Dr Giuseppe Bagliesi (INFN Sezione di Pisa)
        Poster
      • 1:30 PM
        Optimizing Resource Utilization in Grid Batch Systems 4h 45m
        DESY is one of the largest WLCG Tier-2 centres for ATLAS, CMS and LHCb worldwide and the home of a number of global VOs. At the DESY-HH Grid site more than 20 VOs are supported by one common Grid infrastructure to allow for the opportunistic usage of federated resources. The VOs share roughly 4800 job slots on 800 physical CPUs in 400 hosts operated by a TORQUE/MAUI batch system. On Tier-2 sites, the utilization of computing, storage, and network resources by the Grid jobs differs widely. For instance, Monte Carlo production jobs are almost purely CPU bound, whereas physics analysis jobs demand high data rates. In order to optimize the utilization of resources, jobs must be distributed intelligently over the slots, CPUs, and hosts. Although the jobs' resource requirements cannot be deduced directly, jobs are mapped to POSIX user/group IDs based on their VOMS proxies. The user/group ID allows jobs to be distinguished, assuming VOs make use of the VOMS group and role mechanism. This was implemented in the job scheduler (MAUI) configuration. In the contribution to CHEP 2012 we will sketch our set-up, describe our configuration, and present experiences based on monitoring information.
        Speaker: Andreas Gellrich (DESY)
        Paper
        Poster
      • 1:30 PM
        PEAC - A set of tools to quickly enable PROOF on a cluster 4h 45m
        With the advent of the analysis phase of LHC data processing, interest in PROOF technology has considerably increased. While setting up a simple PROOF cluster for basic usage is reasonably straightforward, exploiting the several new functionalities added in recent times may be complicated. PEAC, standing for PROOF Enabled Analysis Cluster, is a set of tools aiming to facilitate the setup and management of a PROOF cluster. PEAC is based on the experience gained by setting up PROOF for the ALICE analysis facilities. PEAC allows one to easily build and configure ROOT and the additional software needed on the cluster, and features its own distribution system based on xrootd and Proof on Demand (PoD). The latter is also used for resource management (starting and stopping of daemons). Finally, PEAC sets up and configures dataset management (using the afdsmgrd daemon), as well as cluster monitoring (machine status and PROOF query summaries) using MonAlisa. In this respect, a MonAlisa page has been dedicated to PEAC users, so that a cluster managed by PEAC can be automatically monitored. In this talk we present and describe the main components of PEAC and show details of the existing facilities using it.
        Speaker: Gerardo GANIS (CERN)
      • 1:30 PM
        Performance of Standards-based transfers in WLCG SEs 4h 45m
        While, historically, Grid Storage Elements have relied on semi-proprietary protocols for data transfer (gridftp for site-to-site, and rfio/dcap/other for local transfers), the rest of the world has not stood still in providing its own solutions to data access. dCache, DPM and StoRM all now support access via the widely implemented HTTP/WebDAV standard, and dCache and DPM both support NFS4.1/pNFS, which is partly implemented in newer releases of the Linux kernel. We present results comparing the performance of these new protocols against the older, more parochial protocols, on both DPM and StoRM systems (an illustration of what HTTP/WebDAV access looks like from a client is sketched below). The next-iteration ATLAS data movement tool, Rucio, is used for some of these tests, which include examination of interoperability between the DPM and StoRM sites, as well as internal transfer performance within each site.
        Speaker: Sam Skipsey (University of Glasgow / GridPP)
        Slides
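        From the client side, HTTP/WebDAV access looks roughly like the following hedged sketch (using the requests library; the storage URL, proxy path and CA directory are placeholders, and this is not the test harness used for the measurements):
          import requests

          proxy = '/tmp/x509up_u1000'   # grid proxy file containing certificate and key
          cadir = '/etc/grid-security/certificates'
          url = 'https://se.example.org/dpm/example.org/home/atlas/user/test.root'

          # Upload via WebDAV PUT, then read the file back with a plain GET.
          with open('test.root', 'rb') as payload:
              r = requests.put(url, data=payload, cert=proxy, verify=cadir)
              r.raise_for_status()

          r = requests.get(url, cert=proxy, verify=cadir)
          r.raise_for_status()
          open('test_copy.root', 'wb').write(r.content)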
      • 1:30 PM
        Performance studies and improvements of CMS Distributed Data Transfers 4h 45m
        CMS computing needs reliable, stable and fast connections among multi-tiered computing infrastructures. The CMS experiment relies on File Transfer Services (FTS) for data distribution, a low-level data movement service responsible for moving sets of files from one site to another, while allowing participating sites to control the network resource usage. FTS servers are provided by Tier-0 and Tier-1 centres and used by all the computing sites in CMS, subject to established CMS and site setup policies, covering all the virtual organizations making use of the Grid resources at the site, and properly dimensioned to satisfy all their requirements. Managing the service efficiently requires good knowledge of the CMS needs for all kinds of transfer routes, and of the sharing and interference with other Virtual Organizations using the same FTS transfer managers. This contribution deals with a complete revision of all FTS servers used by CMS, customizing the topologies and improving their setup in order to keep CMS transferring data at the desired levels in a reliable and robust way. It also covers complete performance studies for all kinds of transfer routes, including measurements of the overheads introduced by SRM servers and storage systems, FTS server misconfigurations, identification of congested channels, historical transfer throughputs per stream for site-to-site data transfer comparisons, and file-latency studies, among others. This information is retrieved directly from the FTS servers through the FTS Monitor webpages and conveniently archived for further analysis. The project provides a monitoring interface for all these values. Measurements, problems and improvements in CMS sites connected to LHCOPN are shown, where differences of up to a factor of 100 are visible, together with constant performance measurements of data flowing from Tier-0 to Tier-1s, a comparison with other existing monitoring tools (PerfSonar, LHCOPN dashboard), and the usage of the graphical interface to understand, among other things, the effects for sites when connecting to the LHCONE network. Given the multi-VO added value of this tool, this work is serving as a reference for building up the WLCG FTS monitoring tool, which will be based on the FTS messaging system.
        Speaker: José Flix
        Poster
      • 1:30 PM
        Performance Tests of CMSSW on the CernVM 4h 45m
        The CERN Virtual Machine (CernVM) Software Appliance is a project developed at CERN with the goal of allowing the execution of the experiment's software on different operating systems in an easy way for the users. To achieve this it makes use of Virtual Machine images consisting of a JEOS (Just Enough Operating System) Linux image, bundled with CVMFS, a distributed file system for software. This image can then be run with a proper virtualizer on most of the platforms available. It also aggressively caches data on the local user's machine so that it can operate disconnected from the network. CMS wanted to compare the performance of the CMS software running in this virtualized environment with the same software running on a native Linux box. To answer this need, a series of tests was performed in a controlled environment during 2010-2011. This work presents the results of those tests.
        Speaker: Stephen Gowdy (CERN)
        Poster
      • 1:30 PM
        PLUME – FEATHER 4h 45m
        On behalf of the PLUME Technical Committee <http://projet-plume.org>. PLUME - FEATHER is a non-profit project created to Promote economicaL, Useful and Maintained softwarE For the Higher Education And THE Research communities. The site references software, mainly Free/Libre Open Source Software (FLOSS), from French universities and national research organisations (CNRS, INRA...), laboratories or departments. Plume means feather in French. The main goals of PLUME - FEATHER are to: • promote the community's own developments, • contribute to the development and sharing of FLOSS (Free/Libre Open Source Software) information, experiences and expertise in the community, • bring together FLOSS experts and knowledgeable people to create a community, • foster and facilitate FLOSS use, deployment and contribution in the higher education and research communities. PLUME - FEATHER was initiated by the CNRS unit UREC, which was integrated into the CNRS computing division DSI in 2011. The different resources are provided by the main partners involved in the project. The French PLUME server contains more than 1000 software reference cards, edited and peer-reviewed by members of the research and education community. It has been online since November 2007, and the first English pages were published in April 2009. Currently there are 84 software products referenced in the PLUME-FEATHER area. Therefore the time has come to announce the availability and potential of the PLUME project at the international level, to find not only users, but also contributors: editors and reviewers of frequently used software in our domain.
        Speaker: Dr Dirk Hoffmann (CPPM, Aix-Marseille Université, CNRS/IN2P3, Marseille, France)
        Poster
      • 1:30 PM
        Popularity framework for monitoring user workload 4h 45m
        This paper describes a user monitoring framework for very large data management systems that sustain high numbers of data movement transactions. The proposed framework prescribes a method for generating meaningful information from collected tracing data, allowing the data management system to be queried on demand for specific user usage patterns with respect to source and destination locations, time intervals, and other searchable parameters (a sketch of this kind of aggregation is given below). The feasibility of such a system at the petabyte scale is demonstrated by describing the implementation and operational experience of an enterprise information system employing the proposed framework, which uses data movement traces collected by the ATLAS data management system for operations occurring on the Worldwide LHC Computing Grid (WLCG). Our observations suggest that the proposed user monitoring framework is capable of scaling to meet the needs of very large data management systems.
        Speaker: Vincent Garonne (CERN)
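        The sort of on-demand query the framework answers can be pictured with this simplified sketch (an editor's illustration; the trace layout is a stand-in for the real ATLAS DDM traces):
          from collections import Counter
          from datetime import datetime, timedelta

          def popularity(traces, since):
              """traces: iterable of dicts with 'src', 'dst', 'dataset' and 'time' keys."""
              counts = Counter()
              for t in traces:
                  if t['time'] >= since:
                      counts[(t['src'], t['dst'], t['dataset'])] += 1
              return counts.most_common(10)

          traces = [{'src': 'CERN-PROD', 'dst': 'BNL-OSG2',
                     'dataset': 'data11_7TeV.Example', 'time': datetime.utcnow()}]
          print(popularity(traces, since=datetime.utcnow() - timedelta(days=7)))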
      • 1:30 PM
        Preparing for long-term data preservation and access in CMS 4h 45m
        The data collected by the LHC experiments are unique and present both an opportunity and a challenge for long-term preservation and re-use. The CMS experiment is defining a policy for the preservation of, and access to, its data, and is starting to implement it. This note describes the driving principles of the policy and summarises the actions and activities which are planned for its implementation.
        Speaker: Kati Lassila-Perini (Helsinki Institute of Physics (FI))
      • 1:30 PM
        Present and future of Identity Management in Open Science Grid 4h 45m
        Identity management infrastructure has been a key work area for the Open Science Grid (OSG) security team for the past year. The progress of web-based authentication protocols such as OpenID and SAML, and of scientific federations such as InCommon, prompted OSG to evaluate its current identity management infrastructure and propose ways to incorporate new protocols and methods. For the past couple of years we have been working on documenting and then improving the user experience, and our identity roadmap has evolved. As a next step, we are working closely with the ESNET DOE Grids CA group on the future of the main US X.509 CA. We are now starting a pilot project using a commercial CA, DigiCert CA, which is currently undergoing IGTF accreditation for user and host certificates. We then plan to investigate multiple back-end services behind a new OSG front-end service, to enable integration and support of the new technologies and mechanisms needed by our users. We are participating in the cross-agency MAGIC forum to look, at a high level, at some of these future directions. In this talk, we will present our ideas and activities and speculate on the future.
        Speaker: Mine Altunay (Fermi National Accelerator Laboratory)
      • 1:30 PM
        Proof of concept - CMS Computing Model into volunteer computing 4h 45m
        This work is motivated by the ongoing effort to integrate the CMS Computing Model with a volunteer computing project under development at CERN, LHC@home, thus allowing CMS analysis jobs and Monte Carlo production activities to be executed on this paradigm, which has a growing user base. The LHC@home project allows the use of CernVM (a virtual machine technology developed at CERN that enables complex simulation code to run easily on diverse platforms) in an autonomous way on the volunteered machines. To do this it uses, on the client side, BOINC (an open-source software platform for computing using volunteered resources) and, on the server side, Co-Pilot, a framework developed by the CernVM team that allows a distributed computing infrastructure to be instantiated on top of virtualized computing resources. We developed a plugin that can be adapted in the CRAB submission system to use the Co-Pilot system with the CernVMs running under BOINC, and carried out a proof of concept of the performance of volunteer computing within the CMS Computing Model. A possible spin-off, which would make things easier for CMS (and many other experiments), is a submission-system interface to the Co-Pilot system based on Globus that would mimic a regular Grid Computing Element but would instead schedule jobs to the BOINC/CernVM Cloud.
        Speaker: Marko Petek (Universidade do Estado do Rio de Janeiro (BR))
      • 1:30 PM
        Prototype of a cloud-based Computing Service for ATLAS at PIC Tier1 4h 45m
        We present the prototype deployment of a private cloud at PIC and the tests performed in the context of providing a computing service for ATLAS. The prototype is based on the OpenNebula open-source cloud computing solution. The possibility of using CernVM virtual machines as the standard for ATLAS cloud computing is evaluated by deploying a PanDA pilot agent as part of the VM contextualization. Different mechanisms to do this are compared (EC2, OCCI, cvm-tools) on the basis of their suitability for implementing a VM-pilot factory service. Different possibilities to access the Tier1 Storage Service (based on dCache) from the VMs are also tested: dcap, NFS4.1, http. In conclusion, the viability of private clouds as candidate implementations for Tier1 computing services in the future is discussed.
        Speaker: Alexey SEDOV (Universitat Autònoma de Barcelona (ES))
      • 1:30 PM
        Providing WLCG Global Transfer monitoring 4h 45m
        The WLCG Transfer Dashboard is a monitoring system which aims to provide a global view of the WLCG data transfers and to reduce redundancy of monitoring tasks performed by the LHC experiments. The system is designed to work transparently across LHC experiments and across various technologies used for data transfer. Currently every LHC experiment monitors data transfers via experiment-specific systems but the overall cross-experiment picture is missing. Even for data transfers handled by FTS, which is used by 3 LHC experiments, monitoring tasks such as aggregation of FTS transfer statistics or estimation of transfer latencies are performed by every experiment separately. These tasks could be performed once, centrally, and then served to all experiments via a well-defined set of APIs. In the design and development of the new system, experience accumulated by the LHC experiments in the data management monitoring area is taken into account and a considerable part of the code of the ATLAS DDM Dashboard is being re-used. The presentation will describe the architecture of the Global Transfer monitoring system, the implementation of its components and the first prototype.
        Speaker: Julia Andreeva (CERN)
        Slides
      • 1:30 PM
        Rebootless Linux Kernel Patching with Ksplice Uptrack at BNL 4h 45m
        Ksplice/Oracle Uptrack is a software tool and update subscription service which allows system administrators to apply security and bug-fix patches to the Linux kernel running on servers/workstations without rebooting them. The RHIC/ATLAS Computing Facility at Brookhaven National Laboratory (BNL) has deployed Uptrack on nearly 2000 hosts running Scientific Linux and Red Hat Enterprise Linux. The use of this software has minimized downtime and strengthened our security posture. In this presentation, we provide an overview of Ksplice's rebootless kernel patch creation/insertion mechanism, and our experiences with Uptrack.
        Speaker: Christopher Hollowell (Brookhaven National Laboratory)
        Poster
      • 1:30 PM
        Recent Improvements in the ATLAS PanDA Pilot 4h 45m
        The Production and Distributed Analysis system (PanDA) in the ATLAS experiment uses pilots to execute submitted jobs on the worker nodes. The pilots are designed to deal with different runtime conditions and failure scenarios, and support many storage systems. This talk will give a brief overview of the PanDA pilot system and will present major features and recent improvements, including CERNVM File System integration, file transfers with Globus Online, the job retry mechanism (illustrated in the sketch below), advanced job monitoring including JEM technology, and validation of new pilot code using the HammerCloud stress-testing system. PanDA is used for all ATLAS distributed production and is the primary system for distributed analysis. It is currently used at over 100 sites worldwide. We analyze the performance of the pilot system in processing LHC data on the OSG, LCG and NorduGrid infrastructures used by ATLAS, and describe plans for its further evolution.
        Speaker: Paul Nilsson (University of Texas at Arlington (US))
        Poster
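        A generic sketch of a retry policy of the kind a pilot applies to transient stage-in/stage-out failures (exponential back-off, bounded attempts); it is an editor's illustration, not the actual PanDA pilot code:
          import time

          def with_retries(operation, max_attempts=3, initial_delay=30):
              delay = initial_delay
              for attempt in range(1, max_attempts + 1):
                  try:
                      return operation()
                  except EnvironmentError as exc:   # e.g. a transient transfer error
                      if attempt == max_attempts:
                          raise
                      print('attempt %d failed (%s), retrying in %ds' % (attempt, exc, delay))
                      time.sleep(delay)
                      delay *= 2                    # exponential back-off

          def stage_in():
              return 'done'                         # placeholder for a real transfer call

          print(with_retries(stage_in))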
      • 1:30 PM
        Refurbishing the CERN fabric management system 4h 45m
        The CERN Computer Centre is reviewing strategies for optimizing the use of the existing infrastructure in the future. There have been significant developments in the area of computer centre and configuration management tools over the last few years. CERN is examining how these modern, widely-used tools can improve the way in which we manage the centre, with a view to reducing the overall operational effort, increasing agility and automating as much as possible. This presentation will focus on the current status of deployment of the new configuration toolset, the reasons why specific tools were chosen, and give an outlook on how we plan to manage the change to the new system.
        Speaker: Gavin Mccance (CERN)
        Poster
        Slides
      • 1:30 PM
        Scalability and performance improvements in Fermilab Mass Storage System. 4h 45m
        By 2009 the Fermilab Mass Storage System had encountered several challenges: 1. the required amount of data stored and accessed in both tiers of the system (dCache and Enstore) had significantly increased; 2. the number of clients accessing the Mass Storage System had also increased from tens to hundreds of nodes, and from hundreds to thousands of parallel requests. To address these challenges, Enstore and the SRM part of dCache were modified to scale for performance, access rates, and capacity. This work increased the number of simultaneously processed requests in a single Enstore Library instance from about 1000 to 30000. The rates of incoming requests to Enstore increased from tens to hundreds per second. Fermilab is invested in LTO4 tape technology and we have investigated both LTO5 and Oracle T10000C to cope with the increasing needs in capacity. We have decided to adopt T10000C, mainly due to its large capacity, which allows us to scale up the existing robotic storage space by a factor of 6. This paper describes the modifications and investigations that allowed us to meet these scalability and performance challenges, and provides some perspectives on the Fermilab Mass Storage System.
        Speaker: Alexander Moibenko (Fermilab)
      • 1:30 PM
        Scaling the AFS service at CERN 4h 45m
        Serving more than 3 billion accesses per day, the CERN AFS cell is one of the most active installations in the world. Limited by overall cost, the ever-increasing demand for more space and higher I/O rates drives an architectural change from small high-end disks organised in fibre-channel fabrics towards external SAS-based storage units with large commodity drives. The presentation will summarise the challenges of scaling the AFS service at CERN, discuss the approach taken, and highlight some of the applied techniques, such as SSD block-level caching or transparent AFS server failover.
        Speaker: Arne Wiebalck (CERN)
        Poster
      • 1:30 PM
        Scientific Cluster Deployment & Recovery: Using puppet to simplify cluster management 4h 45m
        Deployment, maintenance and recovery of a scientific cluster, which has complex, specialized services, can be a time-consuming task requiring the assistance of Linux system administrators and network engineers as well as domain experts. Universities and small institutions that have a part-time FTE with limited knowledge of the administration of such clusters can be strained by such maintenance tasks. This work is the result of an effort to maintain a data analysis cluster with minimal effort by a local system administrator. The realized benefit is that the scientist, who is also the local system administrator, is able to focus on the data analysis instead of the intricacies of managing a cluster. Our work provides a cluster deployment and recovery process based on the Puppet configuration engine, allowing a part-time FTE to easily deploy and recover entire clusters with minimal effort. Puppet is a configuration management system (CMS) used widely in computing centres for the automatic management of resources. Domain experts use Puppet's declarative language to define reusable modules for service configuration and deployment. Our deployment process has three actors: domain experts, a cluster designer and a cluster manager. The domain experts first write the Puppet modules for the cluster services. A cluster designer would then define a cluster. This includes the creation of cluster roles, mapping the services to those roles and determining the relationships between the services. Finally, a cluster manager would acquire the resources (machines, networking), enter the cluster input parameters (hostnames, IP addresses) and automatically generate the deployment scripts used by Puppet to configure each machine to act in its designated role. In the event of a machine failure, the originally generated deployment scripts along with Puppet can be used to easily reconfigure a new machine. The cluster definition produced in our cluster deployment process is an integral part of automating cluster deployment in a cloud environment. Our future cloud efforts will further build on this work.
        Speaker: Valerie Hendrix (Lawrence Berkeley National Lab. (US))
        Slides
      • 1:30 PM
        Secure Wide Area Network Access to CMS Analysis Data Using the Lustre Filesystem 4h 45m
        This paper reports the design and implementation of a secure, wide area network, distributed filesystem by the ExTENCI project, based on the Lustre filesystem. The system is used for remote access to analysis data from the CMS experiment at the Large Hadron Collider, and from the Lattice Quantum ChromoDynamics (LQCD) project. Security is provided by Kerberos authentication and authorization with additional fine grained control based on Lustre ACLs (Access Control List) and quotas. We investigate the impact of using various Kerberos security flavors on the I/O rates of CMS applications on client nodes reading and writing data to the Lustre filesystem, and on LQCD benchmarks. The clients can be real or virtual nodes. We are investigating additional options for user authentication based on user certificates. We compare the Lustre performance to those obtained with other distributed storage technologies.
        Speaker: Dr Dimitri Bourilkov (University of Florida (US))
        Slides
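        To make the notion of per-client I/O rates concrete, here is a minimal Python throughput probe that writes and re-reads a test file on a given mount point and reports MB/s. It is a generic sketch, not the ExTENCI benchmark suite; the mount path is an assumed placeholder, and real measurements would need larger files, multiple clients and care with page-cache effects.

```python
import os
import time

def throughput_mb_s(mount_point: str, size_mb: int = 256) -> tuple[float, float]:
    """Write and re-read a test file under `mount_point`, returning (write, read) MB/s."""
    path = os.path.join(mount_point, "io_probe.tmp")
    chunk = b"\0" * (1024 * 1024)      # 1 MB buffer

    start = time.time()
    with open(path, "wb") as f:
        for _ in range(size_mb):
            f.write(chunk)
        f.flush()
        os.fsync(f.fileno())           # force data out to the filesystem
    write_rate = size_mb / (time.time() - start)

    start = time.time()
    with open(path, "rb") as f:
        while f.read(1024 * 1024):
            pass
    read_rate = size_mb / (time.time() - start)

    os.remove(path)
    return write_rate, read_rate

if __name__ == "__main__":
    # "/lustre/scratch" is a placeholder for an actual Lustre client mount.
    w, r = throughput_mb_s("/lustre/scratch")
    print(f"write: {w:.1f} MB/s, read: {r:.1f} MB/s")
```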
      • 1:30 PM
        Service Availability Monitoring framework based on commodity software 4h 45m
        The Worldwide LHC Computing Grid (WLCG) infrastructure continuously operates thousands of grid services scattered across hundreds of sites. Participating sites are organized in regions and support several virtual organizations, creating a very complex and heterogeneous environment. The Service Availability Monitoring (SAM) framework is responsible for monitoring this infrastructure. SAM is a complete monitoring framework for grid services and grid operational tools. Its current implementation, tailored for decentralized operation, replaces the old SAM system, which is now being decommissioned from production. SAM provides functionality for the submission of monitoring probes, the gathering of probe results, the processing of monitoring data, and the retrieval of monitoring data in terms of service status, availability and reliability. In this paper we present the SAM framework. We motivate the need for moving from the old SAM to a new monitoring infrastructure deployed and managed in a distributed environment, and we explain how SAM exploits and builds on top of commodity software, such as Nagios and Apache ActiveMQ, to provide a reliable and scalable system. We also present the SAM architecture, highlighting the adopted technologies and how the different SAM components deliver a complete monitoring framework. (A minimal example of publishing a probe result to a message broker is sketched after this entry.)
        Speaker: Mr Pedro Manuel Rodrigues De Sousa Andrade (CERN)
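        As an illustration of the commodity-messaging pattern mentioned above, the sketch below publishes a single probe result to an ActiveMQ broker over STOMP using the stomp.py client. The broker host, credentials, destination name and message schema are all assumptions made for the example; they are not the actual WLCG/SAM message formats or endpoints.

```python
import json
import time

import stomp  # stomp.py client for STOMP brokers such as Apache ActiveMQ

# Hypothetical broker endpoint and destination; not the real SAM configuration.
BROKER = [("msg-broker.example.org", 61613)]
DESTINATION = "/topic/example.sam.probe.results"

def publish_probe_result(service: str, metric: str, status: str) -> None:
    """Send one monitoring-probe result as a JSON message to the broker."""
    payload = {
        "service": service,
        "metric": metric,
        "status": status,              # e.g. OK / WARNING / CRITICAL
        "timestamp": int(time.time()),
    }
    conn = stomp.Connection(BROKER)
    conn.connect("probe-user", "probe-pass", wait=True)
    conn.send(destination=DESTINATION, body=json.dumps(payload))
    conn.disconnect()

if __name__ == "__main__":
    publish_probe_result("srm.example-site.org", "org.example.SRM-Put", "OK")
```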
      • 1:30 PM
        Service monitoring in the LHC experiments 4h 45m
        The LHC experiments' computing infrastructure is hosted in a distributed way across the computing centers of the Worldwide LHC Computing Grid and needs to run with high reliability. It is therefore crucial to offer a unified view to shifters, who generally are not experts in the individual services, and to give them the ability to follow the status of resources and the health of critical systems in order to alert the experts whenever a system becomes unavailable. Several experiments have chosen to build their service monitoring on top of the flexible Service Level Status (SLS) framework developed by CERN IT. Based on examples from ATLAS, CMS and LHCb, this contribution will describe the complete development process of a service monitoring instance and explain the deployment models that can be adopted. We will also describe the software package used in ATLAS Distributed Computing to send health reports through the MSG messaging system and publish them to SLS via a lightweight web server. (A simplified example of such a status report is sketched after this entry.)
        Speakers: Alessandro Di Girolamo (CERN), Fernando Harald Barreiro Megino (CERN IT ES)
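        To illustrate what publishing a health report to an availability dashboard can look like, the Python sketch below writes a small XML status document for a service. The element names only follow the general shape of SLS-style availability updates and should be treated as an approximation rather than the exact SLS schema; the service id and output path are invented for the example.

```python
import datetime
import xml.etree.ElementTree as ET

def write_status_report(service_id: str, availability: int, out_path: str) -> None:
    """Write a simplified XML availability report for one service.

    The structure only approximates an SLS-style update document; the real
    schema and required fields should be taken from the SLS documentation.
    """
    root = ET.Element("serviceupdate")
    ET.SubElement(root, "id").text = service_id
    ET.SubElement(root, "availability").text = str(availability)   # 0-100 (%)
    ET.SubElement(root, "timestamp").text = (
        datetime.datetime.utcnow().replace(microsecond=0).isoformat()
    )
    ET.ElementTree(root).write(out_path, encoding="utf-8", xml_declaration=True)

if __name__ == "__main__":
    # A lightweight web server would then expose this file for the dashboard to poll.
    write_status_report("example-panda-queue", 100, "status_example.xml")
```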
      • 1:30 PM
        Software installation and condition data distribution via CernVM FileSystem in ATLAS 4h 45m
        The ATLAS Collaboration manages one of the largest software collections among the High Energy Physics experiments. Traditionally this software has been distributed via rpm or pacman packages and installed at every site and on users' machines, using more space than needed since the releases could not always share common binaries. As the software grew in size and in number of releases, this approach showed its limits in terms of manageability, used disk space and performance. The adopted solution is based on the CernVM FileSystem (CVMFS), a FUSE-based, HTTP-transported, read-only filesystem which guarantees file de-duplication, scalability and performance. Here we describe the ATLAS experience in setting up the CVMFS facility and putting it into production for different types of use cases, ranging from single users' machines up to large data centers, for both software and conditions data. The performance of CernVM-FS for both software and conditions data access will be shown and compared with other filesystems currently in use by the Collaboration. (A minimal client-side check of a CVMFS repository is sketched after this entry.)
        Speaker: Alessandro De Salvo (Universita e INFN, Roma I (IT))
        Slides
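        The sketch below shows a minimal client-side check that a CVMFS repository is mounted, listing a few of the top-level entries it exposes. The path /cvmfs/atlas.cern.ch is used as an example of the kind of mount point discussed above; what is actually published under it is not assumed here, and the script simply reports the repository as absent on a machine without a configured CVMFS client.

```python
import os
import sys

def list_cvmfs_repo(repo: str = "/cvmfs/atlas.cern.ch", limit: int = 10) -> int:
    """Print the first few top-level entries of a CVMFS mount point.

    Accessing the path triggers the autofs/FUSE mount on a configured client;
    otherwise the directory is simply not there.
    """
    if not os.path.isdir(repo):
        print(f"{repo} is not mounted (no CVMFS client configured?)")
        return 1
    entries = sorted(os.listdir(repo))[:limit]
    print(f"{repo} exposes {len(entries)} (of possibly more) top-level entries:")
    for name in entries:
        print("  ", name)
    return 0

if __name__ == "__main__":
    sys.exit(list_cvmfs_repo())
```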
      • 1:30 PM
        SSD Scalability Performance for HEP data analysis using PROOF 4h 45m
        Nowadays storage systems are evolving not only in size but also in the technologies they use. SSDs are currently being introduced in storage facilities for HEP experiments, and their performance is tested in comparison with standard magnetic disks. The tests are performed by running a real CMS data analysis for a typical use case and exploiting the features provided by PROOF-Lite, which allows a huge number of events to be distributed among different CPU cores in order to reduce the overall time needed to complete the analysis task. These tests are carried out by comparing performance across a few computational devices typically hosted at a current Tier-2/Tier-3 facility. The performance results focus on scalability in terms of speed-up factor and processed event rate, and can be taken as guidelines both for the typical HEP analyst, when configuring an analysis task with increasing data sizes, and for the Tier-2/Tier-3 manager, when implementing an interactive data analysis facility for HEP experiments and facing choices with both technological and economic implications. (The scalability metrics used here are illustrated in a short sketch after this entry.)
        Speaker: Dr Giacinto Donvito (INFN-Bari)
        Poster
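        For reference, the speed-up factor and event rate quoted in such scalability studies are simple derived quantities; the short Python sketch below computes them from wall-clock times measured with different numbers of workers. The timing numbers in the example are invented for illustration, not measurements from this study.

```python
def scalability_metrics(n_events: int, times: dict[int, float]) -> None:
    """Print speed-up, parallel efficiency and event rate per worker count.

    `times` maps number of workers -> wall-clock time in seconds for the
    same analysis task over `n_events` events.
    """
    t1 = times[1]  # single-worker reference time
    print(f"{'workers':>8} {'speed-up':>9} {'efficiency':>11} {'events/s':>9}")
    for n, t in sorted(times.items()):
        speedup = t1 / t
        efficiency = speedup / n
        rate = n_events / t
        print(f"{n:>8} {speedup:>9.2f} {efficiency:>11.2f} {rate:>9.0f}")

if __name__ == "__main__":
    # Invented example numbers: 2 million events processed with 1-8 workers.
    scalability_metrics(2_000_000, {1: 1200.0, 2: 630.0, 4: 340.0, 8: 210.0})
```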
      • 1:30 PM
        Status and evolution of CASTOR (Cern Advanced STORage) 4h 45m
        This is an update on CASTOR (CERN Advanced STORage) describing its recent evolution and the related production experience during the latest high-intensity LHC runs. In order to handle the increasing data rates (10 GB/s on average for 2011), several major improvements have been introduced. We describe in particular the new scheduling system that has replaced the original CASTOR one: it removed the limitations ATLAS and CMS were hitting in terms of file-open rates (from 20 Hz to 200+ Hz) while simplifying the code and operations at the same time. We detail how the usage of the internal database has been optimized to improve efficiency by a factor of 3 and to cut file-open latency by orders of magnitude (from O(1 s) to O(1 ms)). Finally, we will report on the evolution of the CASTOR monitoring and give the roadmap for the future.
        Speaker: Sebastien Ponce (CERN)
      • 1:30 PM
        Status of the DIRAC Project 4h 45m
        The DIRAC Project was initiated to provide a data processing system for the LHCb experiment at CERN. It provides all the necessary functionality and performance to satisfy the current and projected future requirements of the LHCb Computing Model. A considerable restructuring of the DIRAC software was undertaken in order to turn it into a general-purpose framework for building distributed computing systems that can be used by various user communities in High Energy Physics and other application domains. The ILC Collaboration has started to use DIRAC for its data production system. The Belle Collaboration at KEK, Japan, has adopted a Computing Model based on the DIRAC system for its second phase starting in 2015. The CTA Collaboration uses DIRAC for data analysis tasks, and a large number of other experiments are starting to use DIRAC or are evaluating it for their data processing tasks. DIRAC services are included as part of the production infrastructure of the GISELA Latin American grid, and similar services are provided for the users of the French segment of the EGI Grid. The new communities using DIRAC have started to provide important contributions to its functionality. Recent additions include support for Amazon EC2 computing resources, a versatile File Replica Catalog with file metadata capabilities, and support for running MPI jobs in the pilot-based Workload Management System. Integration with existing application web portals, such as WS-PGRADE, is demonstrated. In this paper we describe the current status of the DIRAC Project, recent developments of its framework and functionality, as well as the status of the rapidly evolving community of DIRAC users. (A schematic job-submission example via the DIRAC Python API is sketched after this entry.)
        Speaker: Dr Andrei Tsaregorodtsev (Universite d'Aix - Marseille II (FR))
        Poster
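        To give a flavour of the user-facing side of such a framework, the sketch below submits a trivial job through the DIRAC Python client API. It assumes a configured DIRAC client environment (valid proxy, configuration service) and uses the commonly documented Job/Dirac interface; exact method names and required initialisation vary between DIRAC releases, so treat this as indicative rather than authoritative.

```python
# Requires a configured DIRAC client environment (proxy, configuration service).
# Method names follow the commonly documented DIRAC user API and may differ
# in detail between DIRAC releases.
from DIRAC.Core.Base import Script
Script.parseCommandLine()          # initialise the DIRAC client configuration

from DIRAC.Interfaces.API.Dirac import Dirac
from DIRAC.Interfaces.API.Job import Job

job = Job()
job.setName("hello-dirac-example")
job.setExecutable("/bin/echo", arguments="Hello from a DIRAC pilot")
job.setCPUTime(300)                # requested CPU time in seconds

dirac = Dirac()
result = dirac.submit(job)         # returns an S_OK/S_ERROR style dictionary
print("Submission result:", result)
```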
      • 1:30 PM
        Storage Element performance optimization for CMS analysis jobs 4h 45m
        Tier-2 computing sites in the Worldwide LHC Computing Grid (WLCG) host CPU resources (Compute Elements, CE) and storage resources (Storage Elements, SE). The vast amount of data from the Large Hadron Collider (LHC) experiments that needs to be processed requires good and efficient use of the available resources. Achieving good CPU efficiency for end users' analysis jobs requires that the performance of the storage system scales with the I/O requests from hundreds or even thousands of simultaneous jobs. In this presentation we report on the work to improve the SE performance at the Helsinki Institute of Physics (HIP) Tier-2 used for the Compact Muon Solenoid (CMS) experiment at