CHEP 2009

Prague Congress Centre, 5. května 65, 140 00 Prague 4, Czech Republic
Jan Gruntorad (CESNET), Milos Lokajicek (Institute of Physics)
Description
International Conference on Computing in High Energy and Nuclear Physics
    • Poster session: whole day
      • 1
        A Code Inspection Process for Security Reviews
        In recent years, it has become more and more evident that software threat communities are taking an increasing interest in Grid infrastructures. To mitigate the security risk associated with the increased numbers of attacks, the Grid software development community needs to scale up effort to reduce software vulnerabilities. This can be achieved by introducing security review processes as a standard project management practice. The Grid Facilities Department of the Fermilab Computing Division has developed a code inspection process, tailored to reviewing security properties of software. The goal of the process is to identify technical risks associated with an application and their impact. This is achieved by focusing on the business needs of the application (what it does and protects), on understanding threats and exploit communities (what an exploiter gains), and on uncovering potential vulnerabilities (what defects can be exploited). The desired outcome of the process is an improvement of the quality of the software artifact and an enhanced understanding of possible mitigation strategies for residual risks. This paper describes the inspection process and lessons learned on applying it to Grid middleware.
        Speaker: Dr Gabriele Garzoglio (FERMI NATIONAL ACCELERATOR LABORATORY)
        Poster
      • 2
        A Grid Job Monitoring System
        This paper presents a web based Job Monitoring framework for individual Grid sites that allows users to follow in detail their jobs in quasi-real time. The framework consists of several independent components, (a) a set of sensors that run on the site CE and worker nodes and update a database, (b) a simple yet extensible web services framework and (c) an Ajax powered web interface having a look-and-feel and control similar to a desktop application. The monitoring framework supports LSF, Condor and PBS-like batch systems. This is the first such monitoring system where an X509 authenticated web interface can be seamlessly accessed by both end-users and site administrators. While a site administrator has access to all the possible information, a user can only view the jobs for the Virtual Organizations (VO) he/she is a part of. The monitoring framework design supports several possible deployment scenarios. For a site running a supported batch system, the system may be deployed as a whole, or existing site sensors can be adapted and reused with our web services components. A site may even prefer to build the web server independently and choose to use only the Ajax powered web interface. Finally, the system is being used to monitor a glideinWMS instance. This broadens its scope significantly, allowing it to monitor jobs over multiple sites.
        Speaker: Dr Sanjay Padhi (UCSD)
        Paper
        Poster
      • 3
        A minimal xpath parser for accessing XML tags from C++
        A minimal XPath 1.0 parser has been implemented within the JANA framework that allows easy access to attributes or tags in an XML document. The motivating use case was access to geometry information from XML files in the HDDS specification (derived from ATLAS's AGDD). The system allows components in the reconstruction package to pick out individual numbers from a collection of XML files with a single line of C++ code. The XPath parsing aspect of JANA will be presented along with examples of both its use and specific tasks where its use would be beneficial.
        Speaker: Dr David Lawrence (Jefferson Lab)
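        The single-line access described in this abstract can be illustrated with a toy example. The sketch below is not the JANA/HDDS API: GetXPathValue and the flattened document are hypothetical stand-ins showing how a slash-separated path can resolve to a single number.
        // Illustrative only: a toy resolver for slash-separated paths such as
        // "geometry/target/length"; not the actual JANA/HDDS interface.
        #include <iostream>
        #include <map>
        #include <string>

        // A trivial stand-in for a parsed XML document: path -> numeric value.
        using FlatDoc = std::map<std::string, double>;

        // Hypothetical one-line accessor in the spirit described by the abstract.
        bool GetXPathValue(const FlatDoc& doc, const std::string& path, double& value)
        {
            auto it = doc.find(path);
            if (it == doc.end()) return false;   // path not present in the document
            value = it->second;
            return true;
        }

        int main()
        {
            FlatDoc doc{{"geometry/target/length", 30.0}, {"geometry/target/radius", 1.5}};
            double length = 0.0;
            if (GetXPathValue(doc, "geometry/target/length", length))   // the "single line" access
                std::cout << "target length = " << length << " cm\n";
            return 0;
        }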
      • 4
        A PanDA Backend for the Ganga Analysis Interface
        Ganga provides a uniform interface for running ATLAS user analyses on a number of local, batch, and grid backends. PanDA is a pilot-based production and distributed analysis system developed and used extensively by ATLAS. This work presents the implementation and usage experiences of a PanDA backend for Ganga. Built upon reusable application libraries from GangaAtlas and PanDA, the Ganga PanDA backend allows users to run their analyses on the worldwide PanDA resources, while providing the ability for users to develop simple or complex analysis workflows in Ganga. Further, the backend allows users to submit and manage "personal" PanDA pilots: these pilots run under the user's grid certificate and provide a secure alternative to shared pilot certificates while enabling the usage of local resource allocations.
        Speaker: Daniel Colin Van Der Ster (Conseil Europeen Recherche Nucl. (CERN))
        Poster
      • 5
        A Web portal for the Engineering and Equipment Data Management System at CERN
        CERN, the European Laboratory for Particle Physics, located in Geneva, Switzerland, has recently started the Large Hadron Collider (LHC), a 27 km particle accelerator. The CERN Engineering and Equipment Data Management Service (EDMS) provides support for managing engineering and equipment information throughout the entire lifecycle of a project. Based on several data management systems, both developed in-house and commercial, this service supports management and follow-up of different kinds of information throughout the lifecycle of the LHC project: design, manufacturing, installation, commissioning data, maintenance and more. The data collection phase, carried out by specialists, is now being replaced by a phase during which data will be consulted on an extensive basis by non-expert users. In order to address this change, a Web portal for the EDMS has been developed. It brings together in one space all the aspects covered by the EDMS: project and document management, asset tracking and safety follow-up. This paper presents the EDMS Web portal, its dynamic content management and its “one click” information search engine.
        Speaker: Mr Andrey TSYGANOV (Moscow Physical Engineering Inst. (MePhI))
        Poster
      • 6
        Advanced Data Extraction Infrastructure: Web Based System for Management of Time Series Data
        During the operation of high energy physics experiments a large amount of slow-control data is recorded. It is necessary to examine all collected data, checking the integrity and validity of the measurements. With the growing maturity of AJAX technologies it becomes possible to construct sophisticated interfaces using web technologies only. Our solution for handling time series, generally slow-control data, has a modular architecture: a backend system for data analysis and preparation, a web service interface for data access and a fast AJAX web display. In order to provide fast interactive access, the time series are aggregated over time slices of a few predefined lengths. The aggregated values are stored in a temporary caching database and are then used to create summary data plots. These plots may include an indication of data quality and are generated within a few hundred milliseconds even if very high data rates are involved. The extensible export subsystem provides data in multiple formats including CSV, Excel, ROOT, and TDMS. The search engine can be used to find periods of time where the readings of selected sensors fall into specified ranges. The caching database allows most such lookups to be performed within a second. Based on this functionality, a web interface facilitating fast (Google-maps style) navigation through the data has been implemented. The solution is currently used by several slow-control systems at the Test Facility for Fusion Magnets (TOSKA) and the Karlsruhe Tritium Neutrino experiment (KATRIN).
        Speaker: Dr Suren Chilingaryan (The Institute of Data Processing and Electronics, Forschungszentrum Karlsruhe)
        Poster
        Project page
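        The slice-wise pre-aggregation described in this abstract can be sketched as follows. This is an illustration of the idea only, assuming samples arrive ordered in time; it is not the ADEI code, and the slice width and sample values are made up.
        #include <algorithm>
        #include <cmath>
        #include <cstdio>
        #include <vector>

        struct Sample { double t, value; };                    // one slow-control reading
        struct Slice  { double t0, min, max, mean; long n; };  // aggregate over [t0, t0 + width)

        // Collapse a time-ordered series into fixed-width slices carrying min/max/mean.
        std::vector<Slice> Aggregate(const std::vector<Sample>& data, double width)
        {
            std::vector<Slice> out;
            for (const auto& s : data) {
                double t0 = width * std::floor(s.t / width);       // slice start for this sample
                if (out.empty() || out.back().t0 != t0)
                    out.push_back({t0, s.value, s.value, 0.0, 0});
                Slice& sl = out.back();
                sl.min  = std::min(sl.min, s.value);
                sl.max  = std::max(sl.max, s.value);
                sl.mean = (sl.mean * sl.n + s.value) / (sl.n + 1); // running mean
                ++sl.n;
            }
            return out;       // rows like these would be cached and reused for plotting
        }

        int main()
        {
            std::vector<Sample> data{{0.5, 1.0}, {1.5, 2.0}, {2.5, 4.0}, {12.0, 3.0}};
            for (const auto& sl : Aggregate(data, 10.0))           // 10-second slices
                std::printf("t0=%5.1f min=%.1f max=%.1f mean=%.2f n=%ld\n",
                            sl.t0, sl.min, sl.max, sl.mean, sl.n);
            return 0;
        }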
      • 7
        Alternative Factory Model for Event Processing with Data on Demand
        Factory models are often used in object oriented programming to allow more complicated and controlled instantiation than is easily done with a standard C++ constructor. The alternative factory model implemented in the JANA event processing framework addresses issues of data integrity important to the type of reconstruction software developed for experimental HENP. The data on demand feature of the framework makes it well suited for Level-3 trigger applications. The alternative factory model employed by JANA will be presented with emphasis on how it implements a data on demand mechanism while ensuring the integrity of the data objects passed between reconstruction modules.
        Speaker: Dr David Lawrence (Jefferson Lab)
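        A generic sketch of such a data-on-demand factory is given below. It is not the JANA implementation: the Factory and TrackFactory classes are illustrative, showing lazy production on first request and cached, container-protected access for consumers.
        #include <iostream>
        #include <memory>
        #include <vector>

        template <typename T>
        class Factory {
        public:
            virtual ~Factory() = default;

            // Consumers go through Get(): the container is read-only and filled at most once per event.
            const std::vector<std::unique_ptr<T>>& Get()
            {
                if (!evaluated_) {            // produce lazily, on first request only
                    Produce(objects_);
                    evaluated_ = true;
                }
                return objects_;
            }

            void Reset() { objects_.clear(); evaluated_ = false; }   // called between events

        protected:
            virtual void Produce(std::vector<std::unique_ptr<T>>& out) = 0;

        private:
            std::vector<std::unique_ptr<T>> objects_;
            bool evaluated_ = false;
        };

        struct Track { double pt; };

        // A concrete factory; in a real framework Produce() would run reconstruction code
        // and could itself request its input objects from other factories, on demand.
        class TrackFactory : public Factory<Track> {
        protected:
            void Produce(std::vector<std::unique_ptr<Track>>& out) override
            {
                out.push_back(std::make_unique<Track>(Track{1.2}));
            }
        };

        int main()
        {
            TrackFactory tracks;
            std::cout << "n tracks = " << tracks.Get().size() << "\n";  // first call triggers Produce()
            std::cout << "pt       = " << tracks.Get()[0]->pt << "\n";  // second call reuses the cache
            return 0;
        }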
      • 8
        Association Rule Mining on Grid Monitoring Data to Detect Error Sources
        Grid computing is associated with a complex, large scale, heterogeneous and distributed environment. The combination of different Grid infrastructures, middleware implementations, and job submission tools into one reliable production system is a challenging task. Given the impracticality of providing an absolutely fail-safe system, strong error reporting and handling is a crucial part of operating these infrastructures. There are various monitoring systems in place, which are also able to deliver error codes of failed Grid jobs. Nevertheless, the error codes do not always denote the actual source of the error. Instead, a more sophisticated methodology is required to locate problematic Grid elements. In our contribution we propose to mine Grid monitoring data using association rules. With this approach we are able to produce additional knowledge about the Grid elements' behavior by taking correlations and dependencies between the characteristics of failed Grid jobs into account. This technique finds error patterns - expressed as rules - automatically and quickly, which helps trace errors back to their origin. This yields a significant decrease in the time needed for fault recovery and fault removal, improving the Grid's reliability. This work presents the results of investigations on association rule mining algorithms and evaluation methods to find the best rules with respect to monitoring data in a Grid infrastructure.
        Speaker: Ms Gerhild Maier (Johannes Kepler Universität Linz)
        Poster
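        As an illustration of the bookkeeping behind rule evaluation, the sketch below computes the support and confidence of one candidate rule over a handful of made-up job records; it is not the mining code investigated in the contribution.
        #include <cstdio>
        #include <map>
        #include <string>
        #include <vector>

        using Record = std::map<std::string, std::string>;   // one failed-job record: attribute -> value

        // True if the record contains every attribute/value pair of the pattern.
        bool Matches(const Record& r, const Record& pattern)
        {
            for (const auto& [key, val] : pattern) {
                auto it = r.find(key);
                if (it == r.end() || it->second != val) return false;
            }
            return true;
        }

        int main()
        {
            std::vector<Record> jobs = {
                {{"site", "A"}, {"ce", "ce01"}, {"error", "StageOutFailure"}},
                {{"site", "A"}, {"ce", "ce01"}, {"error", "StageOutFailure"}},
                {{"site", "A"}, {"ce", "ce02"}, {"error", "Success"}},
                {{"site", "B"}, {"ce", "ce07"}, {"error", "StageOutFailure"}},
            };

            // Candidate rule: {site=A, ce=ce01} => {error=StageOutFailure}
            Record antecedent = {{"site", "A"}, {"ce", "ce01"}};
            Record both       = antecedent; both["error"] = "StageOutFailure";

            long nAnte = 0, nBoth = 0;
            for (const auto& j : jobs) {
                if (Matches(j, antecedent)) ++nAnte;
                if (Matches(j, both))       ++nBoth;
            }
            double support    = double(nBoth) / jobs.size();       // frequency of the full pattern
            double confidence = nAnte ? double(nBoth) / nAnte : 0; // P(consequent | antecedent)
            std::printf("support = %.2f, confidence = %.2f\n", support, confidence);
            return 0;
        }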
      • 9
        ATLAS Event Metadata Records as a Testbed for Scalable Data Mining
        At a data rate of 200 hertz, event metadata records ("TAGs," in ATLAS parlance) provide fertile grounds for development and evaluation of tools for scalable data mining. It is easy, of course, to apply HEP-specific selection or classification rules to event records and to label such an exercise "data mining," but our interest is different. Advanced statistical methods and tools such as classification, association rule mining, and cluster analysis are common outside the high energy physics community. These tools can prove useful, not necessarily for discovery physics, but for learning about our data, our detector, and our software. A fixed and relatively simple schema makes TAG export to other storage technologies such as HDF5 straightforward. This simplifies the task of exploiting very-large-scale parallel platforms such as Argonne National Laboratory's BlueGene/P, currently the largest supercomputer in the world for open science, in the development of scalable tools for data mining. Using a domain-neutral scientific data format may also enable us to take advantage of existing data mining components from other communities. There is, further, a substantial literature on the topic of one-pass algorithms and stream mining techniques, and such tools may be inserted naturally at various points in the event data processing and distribution chain. This paper describes early experience with event metadata records from ATLAS simulation and commissioning as a testbed for scalable data mining tool development and evaluation.
        Speakers: Dr David Malon (Argonne National Laboratory), Dr Peter Van Gemmeren (Argonne National Laboratory)
        Poster
      • 10
        ATLAS Grid Compute Cluster with virtualised service nodes
        The ATLAS computing Grid consists of several hundred compute clusters distributed around the world as part of the Worldwide LHC Computing Grid (WLCG). The Grid middleware and the ATLAS software which have to be installed on each site often require a certain Linux distribution and sometimes even a specific version thereof. On the other hand, mostly for maintenance reasons, computer centres install the same operating system and version on all computers. This might lead to problems with the Grid middleware if the local version is different from the one for which it has been developed. At RZG we partly solved this conflict by using virtualisation technology for the service nodes. We will present the setup used at RZG and show how it helped to solve the problems described above. In addition we will illustrate the additional advantages gained by this setup.
        Speaker: José Mejia (Rechenzentrum Garching)
        Poster
      • 11
        ATLAS operation in the GridKa Tier1/Tier2 cloud
        The organisation and operations model of the ATLAS T1-T2 federation/cloud associated with the GridKa T1 in Karlsruhe is described. Attention is paid to cloud-level services and the experience gained during the last years of operation. The ATLAS GridKa cloud is large and diverse, spanning 5 countries and 2 ROCs, and currently comprises 13 core sites. A well defined and tested operations model in such a cloud is of the utmost importance. We have defined the core cloud services required by the ATLAS experiment and ensured that they are performed in a managed and sustainable manner. Services such as Distributed Data Management involving data replication, deletion and consistency checks, Monte Carlo production, software installation and data reprocessing are described in greater detail. In addition to providing these central services we have undertaken several cloud-level stress tests and developed monitoring tools to aid with cloud diagnostics. Furthermore we have established good channels of communication between ATLAS, the T1 and the T2s, and have pro-active contributions from the T2 manpower. A brief introduction to the GridKa cloud is provided, followed by a more detailed discussion of the operations model and ATLAS services within the cloud. Finally a summary of the experience gained while running these services is presented.
        Speaker: Dr John Kennedy (LMU Munich)
        Poster
      • 12
        Authentication and authorisation in CMS' monitoring and computing web services
        The CMS experiment at the Large Hadron Collider has deployed numerous web-based services in order to serve the collaboration effectively. We present the two-phase authentication and authorisation system in use in the data quality and computing monitoring services, and in the data- and workload management services. We describe our techniques intended to provide a high level of security with minimum harassment, and how we have applied a role-based authorisation model to a variety of services depending on the task and the strength of the authentication. We discuss the experience of implementing authentication at front-end servers separate from application servers, and challenges authenticating both humans and programs effectively. We describe our maintenance procedures and report capacity and performance results.
        Speaker: Lassi Tuura (Northeastern University)
      • 13
        Automated Testing Infrastructure for LHCb Software Framework Gaudi
        An extensive test suite is the first step towards the delivery of robust software, but it is not always easy to implement, especially in projects with many developers. An easy-to-use and flexible infrastructure for writing and executing the tests reduces the work each developer has to do to instrument his packages with tests. At the same time, the infrastructure gives the same look and feel to the tests and allows automated execution of the test suite. For Gaudi, we decided to develop the testing infrastructure on top of the free tool QMTest, already used in the LCG Application Area for the routine tests run in the nightly build system. The high flexibility of QMTest allowed us to integrate it in the Gaudi package structure. A specialized test class and some utility functions have been developed to simplify the definition of a test for a Gaudi-based application. Thanks to the testing infrastructure described here, we managed to quickly extend the standard Gaudi test suite and add tests to the main LHCb applications, so that they are executed in the LHCb nightly build system to validate the code.
        Speaker: Marco Clemencic (European Organization for Nuclear Research (CERN))
        Poster
      • 14
        Batch efficiency at CERN
        A frequent source of concern for resource providers is the efficient use of computing resources in their centres. This has a direct impact on requests for new resources. There are two different but strongly correlated aspects to be considered: while users are mostly interested in a good turn-around time for their jobs, resource providers are mostly interested in a high and efficient usage of their available resources. Both the box usage and the efficiency of individual user jobs need to be closely monitored so that the sources of inefficiencies can be identified. Examples of such sources are poorly written user code, inefficient access to mass storage systems, and dedication of resources to specific user groups. At CERN, the Lemon monitoring system is used for both purposes. As a first step towards improvements, CERN has launched a project to develop a scheduler add-on that allows careful overloading of worker nodes that run idle jobs. Results on the impact of these developments on the box efficiency will be presented.
        Speaker: Mr Ricardo Manuel Salgueiro Domingues da Silva (CERN)
        Poster
      • 15
        Benchmarking the ATLAS software through the Kit Validation engine
        Measuring the experiment software performance is a very important metric for choosing the most effective resources and for discovering the bottlenecks of the code implementation. In this work we present the benchmark techniques used to measure the ATLAS software performance through the ATLAS offline testing engine Kit Validation and the online portal Global Kit Validation. The performance measurements, the data collection, and the online analysis and display of the results will be presented. The results of the measurements on different platforms and architectures will be shown, giving a full report on the CPU power and memory consumption of the Monte Carlo generation, simulation, digitization and reconstruction of the most CPU-intensive channels. The impact of multi-core computing on the ATLAS software performance will also be presented, comparing the behavior of different architectures when increasing the number of concurrent processes. The benchmark techniques described in this paper have been used in the HEPiX group since the beginning of 2008 to help define the performance metrics for High Energy Physics applications, based on the real experiment software.
        Speaker: Alessandro De Salvo (Istituto Nazionale di Fisica Nucleare Sezione di Roma 1)
        Poster
      • 16
        Build and test system for FairRoot
        One of the challenges of software development for large experiments is to manage the contributions from globally distributed teams. In order to keep the teams synchronized, strong quality control is important. For a software project this means that it has to be tested on all supported platforms: whether the project can be built from source, whether it runs, and, in the end, whether the program delivers the correct results. These tests should be done frequently, which immediately makes it necessary to run them automatically. If the number of different platforms increases it becomes impractical to have installations of all supported platforms at one site. To overcome this problem, the best way is to use a client-server architecture, which means running the quality control where a specific platform is installed and used (client) and sending only the results to a central server responsible for processing the data. The scheme used within FairRoot to fulfill these requirements will be presented. The configure, build and test framework is based on CMake, an open source tool that generates standard build files for the different operating systems/compilers from simple configuration files. To process and display the gathered data the open source tool CDash is used. From the generated web pages, information about the status of the project at a given time can be obtained.
        Speaker: Dr Florian Uhlig (GSI Darmstadt)
        Poster
      • 17
        CASTOR end-to-end monitoring system
        We present the new monitoring system for CASTOR (CERN Advanced STORage), which allows an integrated view of all the different storage components. With the massive data-taking phase approaching, CASTOR is one of the key elements of the software needed by the LHC experiments. It has to provide reliable storage machinery for saving the event data, as well as enable efficient reconstruction and analysis, making the monitoring of the running CASTOR instances essential. The new CASTOR monitoring system is built around a dedicated database schema which allows the appropriate queries to be performed in an efficient way. The monitoring database is currently populated using SQL procedures running on the CASTOR Distributed Logging Facility (DLF), which is a database where the log messages created by the different CASTOR entities are stored. In future releases, it is envisaged to move to a SYSLOG-based transport and to have the monitoring database populated directly by Python scripts parsing and pre-processing the log messages. A web interface has been developed for the presentation of the monitoring information. The different histograms and plots are created using PHP scripts which query the monitoring database. The modular approach of the new monitoring system makes it easy to change the method of populating the monitoring database, or to change the web interface, without modifying the database itself. After a short introduction to the CASTOR architecture, we will discuss the CASTOR monitoring database in detail and present the new web interface.
        Speaker: Witold Pokorski (CERN)
        Poster
      • 18
        CMS conditions database web application service
        The web application service, part of the conditions database system, serves applications and users outside the event-processing framework. The application server is built upon the conditions Python API in the CMS offline software framework. It responds to HTTP requests on the various conditions database instances. The main client of the application server is the conditions database web GUI, which currently exposes three main services. The tag browser allows users to see the availability of the conditions data in terms of their version (tag) and interval of validity (IOV). The global tag component is used by physicists to inspect the organization of the tags in a given data-taking period or data production, while production managers use the web service to produce such tag hierarchies. The history chart plotting service creates dynamic summary and distribution charts of the payload data in the database. A fast graphical overview of this information greatly helps physicists in monitoring and validating the calibration data stored in the conditions database.
        Speaker: Dr Antonio Pierro (INFN-BARI)
        Poster
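        The interval-of-validity lookup underlying such a tag browser can be illustrated with a small sketch. The Tag class and the ecalPedestals tag below are hypothetical, not the CMS code; the point is only that each payload is valid from its "since" value until the next one, so a lookup reduces to an ordered-map search.
        #include <iostream>
        #include <iterator>
        #include <map>
        #include <string>

        using Iov = unsigned long long;                       // run or luminosity-block number

        class Tag {
        public:
            void Append(Iov since, const std::string& payload) { iovs_[since] = payload; }

            // Return the payload whose validity interval contains "target", or "" if none.
            std::string Lookup(Iov target) const
            {
                auto it = iovs_.upper_bound(target);          // first IOV starting after target
                if (it == iovs_.begin()) return "";           // target precedes the first interval
                return std::prev(it)->second;                 // interval that covers target
            }

        private:
            std::map<Iov, std::string> iovs_;                 // since -> payload reference
        };

        int main()
        {
            Tag ecalPedestals;                                // hypothetical tag name
            ecalPedestals.Append(1,     "payload_v1");
            ecalPedestals.Append(68000, "payload_v2");
            ecalPedestals.Append(70500, "payload_v3");
            std::cout << "run 69000 uses " << ecalPedestals.Lookup(69000) << "\n";  // -> payload_v2
            return 0;
        }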
      • 19
        CMS Dashboard Task Monitoring: A user-centric monitoring view.
        Dashboard is a monitoring system developed for the LHC experiments in order to provide a view of the Grid infrastructure from the perspective of the Virtual Organisation. The CMS Dashboard provides a reliable monitoring system that enables a transparent view of the experiment activities across different middleware implementations and combines the Grid monitoring data with information that is specific to the experiment. Scientists must be able to monitor the execution status and the application- and grid-level messages of their tasks, which may run at any site within the Virtual Organisation. The existing monitoring systems provide this type of information but they are not focused on the user's perspective. Information oriented towards individual users is at present not easily available, or even non-existent. The CMS Dashboard Task Monitoring project addresses this gap by collecting and exposing a user-centric set of information to the user regarding submitted tasks. It provides a clear and precise view of the status of the task, including job distribution by site and over time, reasons for failure, and advanced graphical plots, giving a more usable and attractive interface to the analysis and production user. The development was user driven, with physicists invited to test the prototype in order to gather further requirements and identify weaknesses in the application. The solutions implemented and an insight into future development plans are presented here.
        Speaker: Edward Karavakis (Brunel University-CERN)
        Poster
      • 20
        CMS data quality monitoring web service
        A central component of the data quality monitoring system of the CMS experiment at the Large Hadron Collider is a web site for browsing data quality histograms. The production servers in data taking provide access to several hundred thousand histograms per run, both live online and as up to several terabytes of archived histograms for the online data taking, Tier-0 prompt reconstruction, prompt calibration and analysis activities, re-reconstruction at Tier-1s, and release validation. At the present usage level the servers handle in total around a million authenticated HTTP requests per day. We describe the main features and components of the system, our implementation for web-based interactive rendering, and the server design. We give an overview of the deployment and maintenance procedures. We discuss the main technical challenges and our solutions to them, with emphasis on functionality, long-term robustness and performance.
        Speaker: Lassi Tuura (Northeastern University)
        Paper
        Poster
      • 21
        CMS Partial Releases: model, tools, and applications. Online and Framework-light releases.
        The CMS Software project CMSSW embraces more than a thousand packages organized in over a hundred subsystems covering the areas of analysis, event display, reconstruction, simulation, detector description, data formats, framework, utilities and tools. The release integration process is highly automated, using tools developed or adopted by CMS. Packaging in rpm format is a built-in step in the software build process. For several well-defined applications it is highly desirable to have only a subset of the CMSSW full package bundle. For example, High Level Trigger algorithms that run on the Online farm, and need to be rebuilt in a special way, require no simulation, event display, or data analysis functionality. Physics analysis applications in the ROOT environment require only a small number of core libraries and the description of CMS specific data formats. We present a model of CMS Partial Releases, used for preparation of the customized CMS software builds, including description of tools, the implementation, and how we deal with technical challenges, such as resolving dependencies and meeting special requirements for concrete applications in a highly automated fashion.
        Speaker: Natalia Ratnikova (Fermilab-ITEP(Moscow)-Karlsruhe University(Germany))
        Poster
      • 22
        CMS Software Build, Release and Distribution --- Large system optimization
        The CMS offline software consists of over two million lines of code actively developed by hundreds of developers from all around the world. Optimal builds and distribution of such a large scale system for production and analysis activities for hundreds of sites and multiple platforms are major challenges. Recent developments have not only optimized the whole process but also helped us identify the remaining build and integration issues. We describe how parallel builds of the software and a minimal distribution size dramatically reduced the time gap between software build and installation on remote sites, and how we have improved the performance of the build environment used by developers. In addition, we discuss our work to produce a few big binary products rather than thousands of small ones.
        Speaker: Mr Shahzad Muzaffar (NORTHEASTERN UNIVERSITY)
        Poster
      • 23
        CMS Tier-2 Resource Management
        The Tier-2 centers in CMS are the only location, besides the specialized analysis facility at CERN, where users are able to obtain guaranteed access to CMS data samples. The Tier-1 centers are used primarily for organized processing and storage. The Tier-1s are specified with data export and network capacity to allow the Tier-2 centers to refresh the data in disk storage regularly for analysis. A nominal Tier-2 center will deploy 200 TB of storage for CMS. The CMS expectation for the global Tier-2 capacity is more than 5 PB of usable disk storage. In order to manage such a large and highly distributed resource CMS has tried to introduce policy and structure to the Tier-2 storage and processing. In this presentation we will discuss the CMS policy for dividing resources between the local community, the individual users, CMS centrally, and focused CMS analysis groups. We will focus on the technical challenges associated with management and accounting as well as the collaborative challenges of assigning resources to the whole community. We will explore the different challenges associated with partitioning dynamic resources like processing and more static resources like storage. We will show the level of dynamic data placement and resource utilization achieved and the level of distribution CMS expects to achieve in the future.
        Speaker: Dr Thomas Kress (RWTH Aachen, III. Physikal. Institut B)
        Poster
      • 24
        CMS Usage of the Open Science Grid and the US Tier-2 centers
        The CMS experiment has been using the Open Science Grid, through its US Tier-2 computing centers, from its very beginning for production of Monte Carlo simulations. In this talk we will describe the evolution of the usage patterns indicating the best practices that have been identified. In addition to describing the production metrics and how they have been met, we will also present the problems encountered and mitigating solutions. Data handling and the user analysis patterns on the Tier-2 and OSG computing will be described.
        Speaker: Dr Ajit Kumar Mohapatra (University of Wisconsin, Madison, USA)
      • 25
        Commissioning Distributed Analysis at the CMS Tier-2 Centers
        CMS has identified the distributed Tier-2 sites as the primary location for physics analysis. There is a specialized analysis cluster at CERN, but it represents approximately 15% of the total computing available to analysis users. The more than 40 Tier-2s on 4 continents will provide analysis computing and user storage resources for the vast majority of physicists in CMS. The CMS estimate is that each Tier-2 will be able to support on average 40 people and the global number of analysis jobs per day is between 100k and 200k depending on the data volume and individual activity. Commissioning a distributed analysis system of this scale in terms of distribution and number of expected users is a unique challenge. In this presentation we will discuss the CMS Tier-2 analysis commissioning activities and user experience. The 4 steps deployed during the Common Computing Readiness Challenge that drove the level of activity and participation to an unprecedented scale in CMS will be presented. We will summarize the dedicated commissioning tests employed to prepare the next generation of CMS analysis server. Additionally, we will present the experience from users and the level of adoption of the tools in the collaboration.
        Speaker: Dr Alessandra Fanfani (on behalf of CMS - INFN-BOLOGNA (ITALY))
        Poster
      • 26
        COOL Performance Optimization and Scalability Tests
        The COOL project provides software components and tools for the handling of the LHC experiment conditions data. The project is a collaboration between the CERN IT Department and Atlas and LHCb, the two experiments that have chosen it as the base of their conditions database infrastructure. COOL supports persistency for several relational technologies (Oracle, MySQL and SQLite), based on the CORAL Relational Abstraction Layer.  For both experiments, Oracle is the backend used for the deployment of COOL database services at Tier0 (both online and offline) and Tier1 sites.  While the development of new software features is still ongoing, performance optimizations and tests have been the main focus of the project in 2008. This presentation will focus on the results of the proactive scalability tests performed by the COOL team for data insertion and retrieval using samples of simulated conditions data. It will also briefly review the results of stress tests performed by the experiments using the production setups for service deployment.
        Speaker: Andrea Valassi (CERN)
        Poster
      • 27
        Cyberinfrastructure for High Energy Physics in Korea
        KISTI (Korea Institute of Science and Technology Information) in Korea is the national headquarters for supercomputing, networking, Grid and e-Science. We have been working on cyberinfrastructure for high energy physics experiments, especially the CDF and ALICE experiments. We introduce the cyberinfrastructure, which includes resources, Grid and e-Science for these experiments. The goal of e-Science is to study high energy physics anytime and anywhere, even when we are not on-site at the accelerator laboratories. The components are data production, data processing and data analysis. Data production means taking both on-line and off-line shifts remotely. Data processing means running jobs anytime, anywhere using Grid farms. Data analysis means working together to publish papers using a collaborative environment such as the EVO (Enabling Virtual Organization) system. We also present the activities of FKPPL (France-Korea Particle Physics Laboratory), the joint laboratory between France and Korea for Grid, ILC, ALICE and CDF experiments. Recently we have constructed the FKPPL VO (Virtual Organization). We will present the applications of this VO.
        Speaker: Prof. Kihyeon Cho (KISTI)
        Poster
      • 28
        Data Management tools and operational procedures in ATLAS : Example of the German cloud
        A set of tools has been developed to carry out the Data Management operations (deletion, movement of data within a site, and consistency checks) within the German cloud for ATLAS. These tools, which use local protocols that allow fast and efficient processing, are described here and presented in the context of the operational procedures of the cloud. A particular emphasis is put on the consistency checks between the local file catalogues (LFC) and the files stored on the Storage Elements. These consistency checks are crucial to ensure that all the data stored at the sites are actually available to users and to get rid of unregistered files, also known as dark data.
        Speaker: Cédric Serfon (LMU München)
        Poster
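        The catalogue-versus-storage comparison behind such consistency checks can be sketched as a set difference over two file lists. The paths below are made up, and the real tools operate on LFC and storage-element namespace dumps rather than small in-memory sets.
        #include <algorithm>
        #include <iostream>
        #include <iterator>
        #include <set>
        #include <string>
        #include <vector>

        int main()
        {
            // Files known to the catalogue vs. files actually present on the storage element.
            std::set<std::string> catalogue = {"/atlas/data/f1", "/atlas/data/f2", "/atlas/data/f3"};
            std::set<std::string> storage   = {"/atlas/data/f1", "/atlas/data/f3", "/atlas/data/f9"};

            std::vector<std::string> dark, lost;
            std::set_difference(storage.begin(), storage.end(),
                                catalogue.begin(), catalogue.end(),
                                std::back_inserter(dark));     // on disk but not registered: "dark data"
            std::set_difference(catalogue.begin(), catalogue.end(),
                                storage.begin(), storage.end(),
                                std::back_inserter(lost));     // registered but missing from disk

            for (const auto& f : dark) std::cout << "dark data: " << f << "\n";
            for (const auto& f : lost) std::cout << "lost file: " << f << "\n";
            return 0;
        }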
      • 29
        dCache with tape storage for High Energy Physics applications
        An interface between dCache and the local Tivoli Storage Manager (TSM) tape storage facility has been developed at the University of Victoria (UVic) for High Energy Physics (HEP) applications. The interface is responsible for transferring the data from disk pools to tape and retrieving data from tape to disk pools. It also checks the consistency between the PNFS filename space and the TSM database. The dCache system, consisting of a single admin node with two pool nodes, is configured to have two read pools and one write pool. The pools are attached to the TSM storage that has a capacity of about 100TB. This system is being used in production at UVic as part of a Tier A site for BaBar Tau analysis. An independent dCache system is also in production for the storage element (SE) of the ATLAS experiment as a part of Canadian Tier-2 sites. This system does not currently employ a tape storage facility, however, it can be added in the future.
        Speaker: Dr Ashok Agarwal (University of Victoria, Victoria, BC, Canada)
        Poster
      • 30
        Development and Commissioning of the CMS Tier0
        The CMS Tier 0 is responsible for handling the data in the first period of its life, from being written to a disk buffer at the CMS experiment site in Cessy by the DAQ system, to the time the transfer from CERN to one of the Tier 1 computing centres completes. It contains all automatic data movement, archival and processing tasks run at CERN. This includes the bulk transfers of data from Cessy to a Castor disk pool at CERN, repacking the data into Primary Datasets, and their storage to tape and export to the Tier 1 centres. It also includes a first reconstruction pass over all data and the tape archival and export to the Tier 1 centres of the reconstructed data. While performing these tasks, the Tier 0 has to maintain redundant copies of the data and flush it through the system within a narrow time window to avoid data loss. With data taking being imminent, this aspect of the CMS computing effort becomes of the utmost importance. We discuss and explain here the work developing and commissioning the CMS Tier 0 undertaken over the last year.
        Speaker: Dirk Hufnagel (Conseil Europeen Recherche Nucl. (CERN))
      • 31
        DIRAC Secure Distributed Platform
        DIRAC, the LHCb community Grid solution, provides access to a vast amount of computing and storage resources to a large number of users. In DIRAC users are organized in groups with different needs and permissions. In order to ensure that only allowed users can access the resources and to enforce that there are no abuses, security is mandatory. All DIRAC services and clients use secure connections that are authenticated using certificates and grid proxies. Once a client has been authenticated, authorization rules are applied to the requested action based on the presented credentials. These authorization rules and the list of users and groups are centrally managed in the DIRAC Configuration Service. Users submit jobs to DIRAC using their local credentials. From then on, DIRAC has to interact with different Grid services on behalf of this user. DIRAC has a proxy management service where users upload short-lived proxies to be used when DIRAC needs to act on behalf of them. Long duration proxies are uploaded by users to MyProxy service, and DIRAC retrieves new short delegated proxies when necessary. This contribution discusses the details of the implementation of this security infrastructure in DIRAC.
        Speaker: Mr Adrian Casajus Ramo (Departament d' Estructura i Constituents de la Materia)
        Paper
        Poster
      • 32
        Distributed Processing and Analysis of ALICE data at distributed Tier2-RDIG
        A. Bogdanov (3), L. Malinina (2), V. Mitsyn (2), Y. Lyublev (9), Y. Kharlov (8), A. Kiryanov (4), D. Peresounko (5), E. Ryabinkin (5), G. Shabratova (2), L. Stepanova (1), V. Tikhomirov (3), W. Urazmetov (8), A. Zarochentsev (6), D. Utkin (2), L. Yancurova (2), S. Zotkin (8) - (1) Institute for Nuclear Research of the Russian Academy of Sciences, Troitsk, Russia; (2) Joint Institute for Nuclear Research, Dubna, Russia; (3) Moscow Engineering Physics Institute, Moscow, Russia; (4) Petersburg Nuclear Physics Institute, Gatchina, Russia; (5) Russian Research Center "Kurchatov Institute", Moscow, Russia; (6) Saint-Petersburg State University, Saint-Petersburg, Russia; (7) Skobeltsyn Institute of Nuclear Physics, Moscow, Russia; (8) Institute for High Energy Physics, Protvino, Russia; (9) Institute for Theoretical and Experimental Physics, Moscow, Russia (this activity is supported by CERN-INTAS grant 7484). The readiness of the Tier-2s for the processing and analysis of LHC data is at present a subject of concern and of control by the LHC experiment managements. According to the ALICE computing model [1], the main Tier-2 tasks are the production of simulated data and the analysis of both simulated and experimental data. The Russian sites, combined into the distributed Tier-2 RDIG (Russian Data Intensive Grid) [2], have been participating in the ALICE Grid activity since 2004. The ALICE Grid activity is based on AliEn [3], using LCG (EGEE) middleware [4] through an interface. The stable operation of AliEn with the LCG middleware has been tested and demonstrated over the last few years. For adequate processing of ALICE data during LHC operation, the stability of data processing and analysis with more modern services such as CREAM-CE and pure xrootd needs to be tested. The major subject of this report is a demonstration of the possibility of producing the simulated data necessary for the complex analysis of the forthcoming LHC data, and of performing this analysis itself. The usage of the CPU and disk resources pledged by RDIG for the ALICE Grid activity is discussed. The installation, testing and stable operation support of new services such as CREAM-CE and pure xrootd at the RDIG sites are also discussed, showing the advantage of using these services for ALICE tasks. In addition, information is presented about the installation, testing and support of a parallel analysis facility based on PROOF [5] for the dedicated use of the Russian ALICE community, together with examples of the application of this facility to the analysis of simulated and reconstructed ALICE data for the first LHC physics. [1] ALICE Collaboration, Technical Design Report of Computing, CERN-LHCC-2005-018; [2] http://www.egee-rdig.ru/; [3] P. Saiz et al., Nucl. Instrum. Methods A502 (2003) 437-440; http://alien.cern.ch/; [4] http://www.eu-egee.org/; [5] F. Rademakers et al., http://indico.cern.ch/contributionDisplay.py?contribId=307&sessionId=31&confId=3580
        Speaker: Galina Shabratova (Joint Inst. for Nuclear Research (JINR))
        Poster
      • 33
        Dynamic Virtual AliEn Grid Sites on Nimbus with CernVM
        Infrastructure-as-a-Service (IaaS) providers allow users to easily acquire on-demand computing and storage resources. For each user they provide an isolated environment in the form of Virtual Machines which can be used to run services and deploy applications. This approach, also known as 'cloud computing', has proved to be viable for a variety of commercial applications. Currently there are many IaaS providers on the market, the biggest of them being Amazon with its 'Amazon Elastic Compute Cloud (Amazon EC2)' service. The question arises whether scientific communities can benefit from the IaaS approach, and how existing projects can take advantage of cloud computing. Will there be a need to make any changes to existing services and applications? How can services and applications (e.g., grid infrastructure or other distributed tools) currently used by scientists be integrated into infrastructures offered by IaaS providers? In this contribution we describe some answers to these questions. We show how cloud computing resources can be used within the AliEn Grid framework, developed by the CERN ALICE experiment, for performing simulation, reconstruction and analysis of physics data. We use the baseline virtual software appliance for the LHC experiments developed by the CernVM project. The appliance provides a complete, portable and easy to configure user environment for developing and running LHC data analysis locally and on the Grid, independent of the physical software and hardware platform. We deploy these appliances on Science Clouds resources that use the Nimbus project to enable deployment of VMs on remote resources. We further use Nimbus tools for one-click deployment of a dynamically configurable AliEn Grid site on the Science Cloud of the University of Chicago.
        Speaker: Predrag Buncic (CERN)
        Poster
      • 34
        Enabling Virtualization for Atlas Production Work through Pilot Jobs
        Omer Khalid, Paul Nilsson, Kate Keahey, Markus Schulz --- Given the proliferation of virtualization technology in every technological domain, we have been investigating how to enable virtualization in the LCG Grid in order to bring in virtualization benefits such as isolation, security and environment portability, using virtual machines as job execution containers. There are many different ways to go about this, but as our candidate workload is the ATLAS experiment, we chose to enable virtualization through pilot jobs, which in the ATLAS case means the PanDA pilot framework. In our approach, once a pilot has acquired a resource slot on the grid, it verifies whether the server supports virtual machines. If it does, it proceeds to the standard phases of job download and environment preparation and finally deploys the virtual machine. We have taken a holistic approach in our implementation where all the I/O takes place outside of the virtual machine, on the host OS. Once all the data have been downloaded, the PanDA pilot packages the job in the virtual machine and launches it for execution. Upon termination, the PanDA pilot running on the host machine updates the server, stores the job output on an external SE and then cleans up to make the host slot available for the next job execution. Installing and maintaining ATLAS releases on the worker nodes is the biggest issue, especially how they can be made available to the virtual machine job execution container. In our implementation, the PanDA pilot takes an existing ATLAS release installation and attaches it to the virtual machine as a read-only block device before starting it, thus enabling the job to execute. Similarly, the base images for the virtual machines are generic to make sure that they are usable for large sets of jobs, while keeping control in the hands of the system administrators, as the PanDA pilot only uses the images made available by them. In this way, the pilot never loses the slot but at the same time enables virtualization on the grid in a systematic and coherent manner. An additional advantage of this approach is that only the computational overhead of virtualization, which is minimal, is incurred; the more significant overhead of I/O in a virtual machine is avoided by downloading/uploading in the host environment rather than in the virtual machine.
        Speaker: Mr omer khalid (CERN)
        Poster
      • 35
        Ensuring Data Consistency Over CMS Distributed Computing System
        CMS utilizes a distributed infrastructure of computing centers to custodially store data, to provide organized processing resources, and to provide analysis computing resources for users. Integrated over the whole system, even in the first year of data taking, the available disk storage approaches 10 petabytes of space. Maintaining consistency between the data bookkeeping, the data transfer system, and physical storage is an interesting technical and operations challenge. In this presentation we will discuss the CMS effort to ensure that data is consistently available at all computing centers. We will discuss the technical tools that monitor the consistency of the catalogs and the physical storage as well as the operations model used to find and solve inconsistencies.
        Speaker: Paul Rossman (Fermi National Accelerator Lab. (Fermilab))
        Poster
      • 36
        EVE - Event Visualization Environment of the ROOT framework
        EVE is a high-level visualization library using ROOT's data-processing, GUI and OpenGL interfaces. It is designed as a framework for object management, offering hierarchical data organization, object interaction and visualization via GUI and OpenGL representations. Automatic creation of 2D projected views is also supported. On the other hand, it can serve as an event visualization toolkit satisfying most HEP requirements: visualization of geometry, simulated and reconstructed data such as hits, clusters, tracks and calorimeter information. Special classes are available for visualization of raw data. The object-interaction layer allows for easy selection and highlighting of objects and their derived representations (projections) across several views (3D, Rho-Z, R-Phi). Object-specific tooltips are provided in both GUI and GL views. The visual-configuration layer of EVE is built around a database of template objects that can be applied to specific instances of visualization objects to ensure consistent object presentation. The database can be retrieved from a file, edited during the framework operation and stored to file. The EVE prototype was developed within the ALICE collaboration and was included into ROOT in December 2007. Since then all EVE components have reached maturity. EVE is used as the base of the AliEve visualization framework in ALICE, of Fireworks, the physics-oriented event display in CMS, and as the visualization engine of FairRoot in FAIR.
        Speaker: Matevz Tadel (CERN)
        Poster
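        A minimal ROOT macro in the flavour of the EVE interfaces is sketched below, assuming a ROOT build with the Eve libraries enabled; the point positions are random stand-ins rather than detector data.
        // eve_hits.C -- minimal illustrative ROOT macro; assumes ROOT with Eve support.
        void eve_hits()
        {
           TEveManager::Create();                        // start the EVE environment and GL viewer

           auto hits = new TEvePointSet("FakeHits");     // container for 3D point markers
           for (int i = 0; i < 100; ++i)
              hits->SetNextPoint(gRandom->Gaus(0, 10),   // x [cm]
                                 gRandom->Gaus(0, 10),   // y [cm]
                                 gRandom->Gaus(0, 50));  // z [cm]
           hits->SetMarkerColor(kYellow);

           gEve->AddElement(hits);                       // hand the object to EVE's scene graph
           gEve->Redraw3D(kTRUE);                        // reset the camera and draw
        }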
      • 37
        Evolution of the ATLAS Computing Model
        Despite the all too brief availability of beam-related data, much has been learned about the usage patterns and operational requirements of the ATLAS computing model since Autumn 2007. Bottom-up estimates are now more detailed, and cosmic ray running has exercised much of the model in both duration and volume. Significant revisions have been made in the resource estimates, and in the usage of those resources. In some cases, this represents an optimization while in others it attempts to counter lack of functionality in the available middleware. There are also changes reflecting the emerging roles of the different data formats. The model continues to evolve with a heightened focus on end-user performance, and the state of the art after a major review process over winter 08/09 will be presented.
        Speaker: Prof. Roger Jones (Lancaster University)
        Poster
      • 38
        Experience Building and Operating the CMS Tier-1 Computing Centers
        The CMS Collaboration relies on 7 globally distributed Tier-1 computing centers located at large universities and national laboratories for a second custodial copy of the CMS RAW data and primary copy of the simulated data, data serving capacity to Tier-2 centers for analysis, and the bulk of the reprocessing and event selection capacity in the experiment. The Tier-1 sites have a challenging role in CMS because they are expected to ingest and archive data from both CERN and regional Tier-2 centers, while they export data to a global mesh of Tier-2s at rates comparable to the raw export data rate from CERN. The combined capacity of the Tier-1 centers is more than twice the resources located at CERN, and efficiently utilizing this large distributed resource represents a challenge. In this presentation we will discuss the experience building, operating, and utilizing the CMS Tier-1 computing centers. We will summarize the facility challenges at the Tier-1s, including the stable operation of CMS services, the ability to scale to large numbers of processing requests and large volumes of data, and the ability to provide custodial storage and high performance data serving. We will also present the operations experience utilizing the distributed Tier-1 centers from a distance: transferring data, submitting data serving requests, and submitting batch processing requests.
        Speaker: Claudio Grandi (INFN Bologna)
        Poster
      • 39
        Experience with ATLAS MySQL Panda DataBase service
        The PanDA distributed production and analysis system has been in production use for ATLAS data processing and analysis since late 2005 in the US, and globally throughout ATLAS since early 2008. Its core architecture is based on a set of stateless web services served by Apache and backed by a suite of MySQL databases that are the repository for all Panda information: active and archival job queues, dataset and file catalogs, site configuration information, monitoring information, system control parameters, and so on. This database system is one of the most critical components of PanDA, and has successfully delivered the functional and scaling performance required by PanDA, currently operating at a scale of half a million jobs per week, with much growth still to come. In this paper we describe the design and implementation of the PanDA database system, its architecture of MySQL servers deployed at BNL and CERN, backup strategy and monitoring tools. The system has been developed, thoroughly tested, and brought to production to provide highly reliable, scalable, flexible and available database services for ATLAS Monte Carlo production, reconstruction and physics analysis.
        Speakers: Dr Tomasz Wlodek (Brookhaven National Laboratory (BNL)), Dr Yuri Smirnov (Brookhaven National Laboratory (BNL))
      • 40
        Experience with Server Self Service Center (S3C)
        CERN has had a successful experience with running the Server Self Service Center (S3C) for virtual server provisioning, based on Microsoft Virtual Server 2005. With the introduction of Windows Server 2008 and its built-in hypervisor-based virtualization (Hyper-V) there are new possibilities for the expansion of the current service. Observing a growing industry trend of provisioning Virtual Desktop Infrastructure (VDI), we gather ideas on how the desktop infrastructure could take advantage of thin client technology combined with virtual desktops hosted on the Hyper-V infrastructure. The talk will cover our experience of running the Server Self Service Centre, the steps for the migration to the Hyper-V based infrastructure and the Virtual Desktop Infrastructure implementation.
        Speaker: Juraj Sucik (CERN)
        Poster
      • 41
        First experience in operating the population of the "condition database" for the CMS experiment
        Reliable population of the conditions database is critical for the correct operation of the online selection as well as of the offline reconstruction and analysis of data. We describe here the system put in place in the CMS experiment to populate the database and make conditions data promptly available both online, for the high-level trigger, and offline, for reconstruction. The system has been designed for high flexibility to cope with very different data sources and uses POOL-ORA technology to store data in an object format that best matches the object-oriented C++ programming paradigm used in the CMS offline software. To ensure consistency among the various subdetectors, a dedicated package, PopCon (Populator of Condition Objects), is used to store data online. The data are then automatically streamed to the offline database and are hence immediately accessible offline worldwide. This mechanism was used intensively during 2008 in the test runs with cosmic rays. The experience of these first months of operation will be discussed in detail.
        Speaker: Mr Michele De Gruttola (INFN, Sezione di Napoli - Universita & INFN, Napoli/ CERN)
        Poster
      • 42
        FROG : The Fast & Realistic OpenGl Event Displayer
        FROG is a generic framework dedicated to visualizing events in a given geometry. It has been written in C++ and uses the cross-platform OpenGL libraries. It can be used for any particular physics experiment or detector design. The code is very light and very fast and can run on various operating systems. Moreover, FROG is self-contained and does not require the installation of ROOT or experiment software (e.g. CMSSW) libraries on the user's computer. The presentation will describe the principle of the algorithm and its many functionalities, such as: 3D and 2D visualization, graphical user interface, mouse interface, configuration files, production of pictures in various formats, integration of personal objects, and so on. Finally, the application of FROG to physics experiments, such as the CMS experiment, will be described. http://projects.hepforge.org/frog/ https://twiki.cern.ch/twiki/bin/view/CMS/FROG
        Speaker: Loic Quertenmont (Universite Catholique de Louvain)
        Poster
      • 43
        Geant4 Testing Integration into the LCG Nightly Builds System
        Geant4 is a toolkit to simulate the passage of particles through matter, and is widely used in HEP, in medical physics and for space applications. Ongoing developments and improvements require regular integration testing for new or modified code. The current system uses a customised version of the Bonsai Mozilla tool to collect and select tags for testing, a set of shell and Perl scripts to submit the building of the software and the running of the tests on a set of Unix platforms, and the Tinderbox Mozilla tool to collect and display test results. Mac OS and Windows are not integrated in this system. Geant4 integration testing is being integrated into the LCG applications area nightly builds system. The LCG nightly builds system, based on CMT and on Python scripts, supports testing on many different platforms, including Windows and Mac OS. The CMT configuration management tool is responsible for configuring the build and test environment and the external dependencies in a structured and modular way, giving fine control over the options for the build and for the execution of tests. For the testing itself, the LCG nightly builds system uses QMTest, a test suite providing tools to test software and to present the test outcome in different formats. We are working to integrate this tool with the Geant4 tests and to improve the presentation of test results, so that outputs and formats different from the default ones can be produced. Further improvements include 'on-the-fly' automatic tag testing, parallel execution of tests, better use of server time, automatic testing of patches and efficiency improvements.
        Speaker: Victor Diez Gonzalez (Univ. Rov. i Virg., Tech. Sch. Eng.-/CERN)
        Poster
      • 44
        Geant4 Qt visualization driver
        Qt is a powerful cross-platform application framework: free (even on Windows) and used by many people and applications. For this reason, recent developments in the Geant4 visualization group include a new driver based on the Qt toolkit. The Qt library has OpenGL support, so all 3D scenes can be moved with the mouse (as in the OpenInventor driver). This driver aims to provide all the features already present in the other drivers and adds some new ones, for example a movie recording feature, very useful for making movies, debugging geometry, and so on.
        Speaker: Mr Laurent GARNIER (LAL-IN2P3-CNRS)
        Poster
      • 45
        GLANCE Traceability - Web System for Equipment Traceability and Radiation Monitoring for the ATLAS
        During the operation, maintenance, and dismantling periods of the ATLAS Experiment, the traceability of all detector equipment must be guaranteed for logistic and safety reasons. The running of the Large Hadron Collider will expose the ATLAS detector to radiation. Therefore, CERN must follow specific regulations from the French and Swiss authorities for equipment removal, transport, repair, and disposal. GLANCE Traceability, implemented in C++ and Java/Java3D, has been developed to fulfill these requirements. The system registers each piece of equipment and associates it with either a functional position in the detector or a zone outside the underground area through a 3D graphical user interface. Radiation control of the equipment is performed using a radiation monitor connected to the system: the local background is stored and the threshold is calculated automatically. The system classifies the equipment as non-radioactive if its radiation dose does not exceed that limit value. History for both location traceability and radiation measurements is kept, and multiple pieces of equipment can be managed simultaneously. The software is fully operational and has been used by the Radiation Protection Experts of ATLAS since the first beam of the LHC. Initially developed for the ATLAS detector, the flexibility of the system has allowed its adaptation to the LHCb detector.
        Speaker: Mr Luiz Henrique Ramos De Azevedo Evora (CERN)
        Poster
      • 46
        gLExec and MyProxy integration in the ATLAS/OSG PanDA Workload Management System.
        Worker nodes on the grid exhibit great diversity, making it difficult to offer uniform processing resources. A pilot job architecture, which probes the environment on the remote worker node before pulling down a payload job, can help. Pilot jobs become smart wrappers, preparing an appropriate environment for job execution and providing logging and monitoring capabilities. PanDA (Production and Distributed Analysis), an ATLAS and OSG workload management system, follows this design. However, in the simplest (and most efficient) pilot submission approach of identical pilots carrying the same identifying grid proxy, end-user accounting by the site can only be done with application-level information (PanDA maintains its own end-user accounting), and end-user jobs run with the identity and privileges of the proxy carried by the pilots, which may be seen as a security risk. To address these issues, we have enabled Panda to use gLExec, a tool provided by EGEE which runs payload jobs under an end-user's identity. End-user proxies are pre-staged in a credential caching service, MyProxy, and the information needed by the pilots to access them is stored in the Panda DB. gLExec then extracts from the user's proxy the proper identity under which to run. We describe the deployment, installation, and configuration of gLExec, and how PanDA components have been augmented to use it. We describe how difficulties were overcome, and how security risks have been mitigated. Results are presented from OSG and EGEE Grid environments performing ATLAS analysis using PanDA and gLExec.
        Speaker: Dr Jose Caballero (Brookhaven National Laboratory (BNL))
        Poster
      • 47
        H1 Grid Production Tool for Monte Carlo Production
        The H1 Collaboration at HERA has entered the period of high-precision analyses based on the final data sample. These analyses require a massive production of simulated Monte Carlo (MC) events. The H1 MC framework, created by the H1 Collaboration, is a software framework for mass MC production on the LCG Grid infrastructure and on local batch systems. The aim of the tool is full automation of the MC production workflow, from the experiment-specific parts (preparation of input files, running reconstruction and post-processing calculations) and the management of the MC jobs on the Grid, up to the copying of the resulting files from the Grid to the H1 tape storage. The H1 MC framework has a modular structure, with a separate module for each specific task. Communication between modules is done via a central database. Jobs are created as fully autonomous and fault-tolerant services for the reconstruction processes and can run on 32- and 64-bit LCG Grid architectures. While running on the Grid they can be continuously monitored using the R-GMA service. The experiment software is downloaded by the jobs from a set of Storage Elements using the LFC catalog. Monitoring of the H1 MC activity and detection of problems with submitted jobs and grid sites is performed by regular checks of the job states in the database and with the Service Availability Monitoring (SAM) framework. The improved stability of the system has allowed a dramatic increase of the MC production rate, which exceeded two billion events in 2008.
        Speaker: Dr Bogdan Lobodzinski (DESY, Hamburg,Germany)
        Poster
      • 48
        HepMC Visual - an interactive HepMC event browser
        Within the last years, the HepMC data format has established itself as the standard data format for the simulation of high-energy physics interactions and is commonly used by all four LHC experiments. At the energies of the proton-proton collisions at the LHC, a full description of the generation of these events and the subsequent interactions with the detector typically involves several thousand particles and several hundred vertices. Currently, the HepMC libraries only provide a text-based representation of these events. HepMCVisual is a visualization package for HepMC events that allows the user to interactively browse through an event. Intuitive user guiding and the possibility of expanding or collapsing specific branches of the interaction tree allow quick navigation and visualization of the specific parts of the event of interest to the user. Thus, it may be useful not only for physicists trying to understand the structure of single events, but also as a valuable tool for debugging Monte Carlo event generators. Being based on the ROOT graphics libraries, HepMCVisual can be used as a standalone library, as well as interactively from the ROOT console or in combination with the HepMCBrowser interface within the ATLAS software framework. A short description of the user interface and the API will be presented.
        Speaker: Dr Sebastian Böser (University College London)
        Poster
      • 49
        High Performance C++ Reflection
        C++ does not offer access to reflection data: the types and their members as well as their memory layout are not accessible. Reflex adds this capability: it can be used to describe classes and any other types, to look up and call functions, to look up and access data members, and to create and delete instances of types. It is rather unique and attracts considerable interest also outside of high energy physics. Reflex is a fundamental ingredient in the data storage framework of most of the LHC experiments. It is used in a production context after several years of development. Based on this experience a new version of Reflex has been designed, allowing faster lookup, a clearer layout, a hierarchical organization of type catalogs, and a straightforward near-term extension to support multithreaded access. This new API is backed by a newly designed, externally contributed test suite based on CMake. We will present these developments and the plans for the near future.
        Speaker: Axel Naumann (CERN)
        Poster
      • 50
        Improved Cache Coherency Approach for CMS Frontier
        The CMS experiment requires worldwide access to conditions data by nearly a hundred thousand processing jobs daily. This is accomplished using a software subsystem called Frontier. This system translates database queries into HTTP requests, looks up the results in a central database at CERN, and caches the results in an industry-standard HTTP proxy/caching server called Squid. One of the most challenging aspects of any cache system is coherency, that is, ensuring that changes made to the underlying data get propagated out to all clients in a timely manner. Recently, the Frontier system was enhanced to drastically reduce the time for changes to be propagated everywhere, typically as low as 10 minutes for some kinds of data and no more than 60 minutes for the rest of the data, without overloading servers. This was done by taking advantage of an HTTP and Squid feature called "If-Modified-Since", in which the "Last-Modified" timestamp of cached data is sent back to the central server. The server responds to this with a very short message if the data has not been modified, which is the case most of the time, and the cache is re-validated (a minimal sketch of this conditional-request pattern is given after this entry). In order to use this feature, the Frontier server has to send the "Last-Modified" timestamp, but that information is not normally stored by the Oracle databases, so a PL/SQL program was developed to keep track of the modification times of database tables. We discuss the details of this caching scheme and the obstacles overcome, including Oracle database and Squid bugs.
        Speaker: Dr David Dykstra (Fermilab)
        Poster
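        The conditional-request mechanism described above can be illustrated with a short, self-contained sketch. This is not Frontier code: the endpoint URL is hypothetical, and only the generic HTTP "If-Modified-Since"/"Last-Modified" exchange is shown, in which the server answers 304 Not Modified when the cached copy is still valid.
          # Minimal sketch of HTTP cache revalidation with If-Modified-Since;
          # the endpoint is a placeholder, not a real Frontier/Squid URL.
          import urllib.request
          import urllib.error

          URL = "http://frontier.example.org/query?encoded-sql"

          def fetch(url, last_modified=None):
              """Return (payload, last_modified); payload is None if the cache is still valid."""
              req = urllib.request.Request(url)
              if last_modified:
                  req.add_header("If-Modified-Since", last_modified)
              try:
                  with urllib.request.urlopen(req) as resp:
                      return resp.read(), resp.headers.get("Last-Modified")
              except urllib.error.HTTPError as err:
                  if err.code == 304:      # not modified: short reply, cached copy re-validated
                      return None, last_modified
                  raise

          payload, stamp = fetch(URL)          # first call fills the cache
          payload, stamp = fetch(URL, stamp)   # later calls usually get a 304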
      • 51
        Integrating interactive PROOF into a Batch System
        While the Grid infrastructure for the LHC experiments is well suited for batch-like analysis, it does not support the final steps of an analysis on a reduced data set, e.g. the optimization of cuts and the derivation of the final plots. Usually this part is done interactively; however, for the LHC these steps might still require a large amount of data. The German "National Analysis Facility" (NAF) at DESY in Hamburg is envisioned to close this gap. The NAF offers computing resources via the Sun Grid Engine (SGE) workload management system and high-bandwidth data access via the network clustering file system Lustre. From the beginning, it was planned to set up a "Parallel ROOT Facility" (PROOF) to allow users to analyze large amounts of data interactively in parallel. However, a separate central PROOF cluster would be decoupled from the scheduling and accounting of the existing workload management system. Thus, we have developed a setup that interfaces interactive PROOF to the SGE batch system by allowing every user to set up their own PROOF cluster using SGE's parallel environments (a schematic example of using such a personal cluster from ROOT is given after this entry). In addition, this setup circumvents security issues and incompatibilities between different ROOT versions. We will describe this setup and its performance for different analysis tasks. Furthermore, we will present the different ways offered by the CMS offline software to analyze CMS data with PROOF.
        Speaker: Dr Hartmut Stadie (Universität Hamburg)
        Poster
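        For illustration, the fragment below shows how a user-owned PROOF cluster of the kind described above is driven from PyROOT; the master URL, tree name, file path and selector are placeholders and do not describe the actual NAF configuration.
          # Sketch of using a personal PROOF cluster from PyROOT; all names are hypothetical.
          import ROOT

          # Connect to a personal PROOF master whose workers were started
          # through SGE's parallel environment; the connection string is invented.
          proof = ROOT.TProof.Open("user@proof-master.example.org")

          chain = ROOT.TChain("physics")                            # hypothetical tree name
          chain.Add("root://se.example.org//data/sample_*.root")    # hypothetical input files
          chain.SetProof()                        # route TChain::Process through PROOF
          chain.Process("MySelector.C+")          # hypothetical TSelector compiled with ACLiC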
      • 52
        JINR experience in development of Grid monitoring and accounting systems
        Different monitoring systems are now extensively used to keep an eye on the real-time state of each service of distributed grid infrastructures and of the jobs running on the Grid. Tracking the current state of services as well as the history of state changes allows rapid error fixing, planning of future massive productions, revealing regularities of Grid operation and many other things. Along with monitoring, accounting is also an area which shows how the Grid is used. The data considered are statistics on the utilization of Grid sites' resources by virtual organizations and single users. Here we describe our longstanding experience in the successful development and design of Grid monitoring and accounting systems for global grid segments and for local national grid projects in Russia. The main goals of these developments have always been to satisfy the real needs of VO and resource managers and administrators, and to provide interoperable and portable solutions that are used in several grid projects. The provided solutions work with different Grid middleware such as LCG2, gLite, Condor-G and UNICORE.
        Speaker: Dr Vladimir Korenkov (Joint Institute for Nuclear Research (JINR))
        Poster
      • 53
        Job optimization in ATLAS TAG based Distributed Analysis
        The ATLAS experiment is projected to collect over one billion events per year during the first few years of operation. The efficient selection of events for various physics analyses across all appropriate samples presents a significant technical challenge. The ATLAS computing infrastructure leverages the Grid to tackle the analysis across large samples by organizing data in a hierarchical structure and exploiting distributed computing to churn through the computations. This includes the same events at different stages of processing: RAW, ESD (Event Summary Data), AOD (Analysis Object Data), DPD (Derived Physics Data). Event Level Metadata Tags (TAGs) contain extensive information about all events, stored using multiple technologies accessible by POOL and various web services. This allows users to apply selection cuts on quantities of interest across the entire sample to compile a subset of events which are appropriate for their analysis. This paper describes new methods for organizing jobs that use TAG criteria to analyze ATLAS data, based on enhancements to the ATLAS POOL Collection Utilities and the ATLAS distributed analysis systems. It further compares different access patterns to the event data and different ways to partition the workload for event selection and analysis, where analysis is intended as broader event processing, including also event selection and reduction operations known as skimming, slimming and thinning, and DPD making. Specifically, it compares analysis with direct access to the events (AODs, ESDs, ...) to access mediated by different TAG-based event selections. We then compare different ways of splitting the processing to maximize performance.
        Speaker: Marco Mambelli (UNIVERSITY OF CHICAGO)
        Paper
        Poster
      • 54
        Knowledge Management System for ATLAS Scalable Task Processing on the Grid
        In addition to the challenges of computing and data handling, ATLAS and the other LHC experiments place a great burden on users to configure and manage the large number of parameters and options needed to carry out distributed computing tasks. Management of distributed physics data is being made more transparent by dedicated ATLAS grid computing technologies, such as PanDA (a pilot-based job control system). The laborious procedure of steering the data processing application by providing physics parameters and software configurations has remained beyond the scope of large grid projects. This error-prone manual procedure does not scale to the LHC challenges. To reduce human errors and automate the process of populating the ATLAS production database with millions of jobs per year, we developed a system for ATLAS knowledge management ("Knowledgement") of Task Requests (AKTR). AKTR manages the configuration parameters used for massive grid data processing tasks (groups of similar jobs). The system assures a scalable management of ATLAS-wide knowledge of distributed production conditions, and guarantees the reproducibility of results. The use of the AKTR system has resulted in major gains in the efficiency and productivity of the ATLAS production infrastructure.
        Speaker: Dr Pavel Nevski (BNL)
      • 55
        LHCb Full Experiment System Test (FEST09)
        LHCb had been planning to commission its High Level Trigger software and Data Quality monitoring procedures using real collision data from the LHC pilot run. Following the LHC incident on 19th September 2008, it was decided to commission the system using simulated data. This “Full Experiment System Test” consists of: - Injection of simulated minimum bias events into the full HLT farm, after selection by a simulated Level 0 trigger. - Processing in the HLT farm to achieve the output rate expected for nominal LHC luminosity running, sustained over the typical duration of an LHC fill. - Real-time Data Quality validation of the HLT output, and validation of calibration and alignment parameters for use in the reconstruction. - Transmission of the event data, calibration data and book-keeping information to Tier1 sites and full reconstruction of the event data. - Data Quality validation of the reconstruction output. We will report on the preparations and results of FEST09, and on the status of commissioning for nominal LHC luminosity running.
        Speaker: Prof. Marco Cattaneo (CERN)
        Poster
      • 56
        LQCD Workflow Execution Framework: Models, Provenance, and Fault-Tolerance
        Large computing clusters used for scientific processing suffer from systemic failures when operated over long continuous periods for executing workflows. Diagnosing job problems and faults leading to eventual failures in this complex environment is difficult, specifically when the success of a whole workflow might be affected by a single job failure. In this paper, we introduce a model-based, hierarchical, reliable execution framework that encompasses workflow specification, data provenance, execution tracking and online monitoring of each workflow task, also referred to as a participant. The sequence of participants is described in an abstract parameterized view, which is translated into a concrete data-dependency-based sequence of participants with defined arguments. As participants belonging to a workflow are mapped onto machines and executed, periodic and on-demand monitoring of vital health parameters on allocated nodes is enabled according to pre-specified rules. These rules specify conditions that must be true pre-execution, during execution and post-execution. Monitoring information for each participant is propagated upwards through the reflex and healing architecture, which consists of a hierarchical network of decentralized fault management entities, called reflex engines. They are instantiated as state machines or timed automata that change state and initiate reflexive mitigation action(s) upon the occurrence of certain faults. We describe how this cluster reliability framework is combined with the workflow execution framework using formal rules and actions specified within a structure of first-order predicate logic, enabling a dynamic management design that reduces the manual administrative workload and increases cluster productivity. Preliminary results on a virtual setup with injected failures are shown.
        Speaker: Luciano Piccoli (Fermilab)
        Poster
      • 57
        Managing Large Data Productions in LHCb
        LHC experiments are producing very large volumes of data, either accumulated from the detectors or generated via Monte-Carlo modeling. The data should be processed as quickly as possible to provide users with the input for their analysis. Processing multiple hundreds of terabytes of data necessitates generating, submitting and following a huge number of grid jobs running all over the Computing Grid. Manipulation of these large and complex workloads is impossible without powerful production management tools. In LHCb, the DIRAC Production Management System (PMS) is used to accomplish this task. It enables production managers and end-users to deal with all kinds of data generation, processing and storage. Application workflow tools allow jobs to be defined as complex sequences of elementary application steps expressed as Directed Acyclic Graphs. Specialized databases and a number of dedicated software agents ensure automated data-driven job creation and submission. The productions are accompanied by thorough checks of the resulting data integrity. The PMS provides a complete user interface for operations, from the requests generated by the user community through to task completion and bookkeeping. Both a command-line interface and a full-featured Web-based Graphical User Interface allow all the tasks of production definition, control and monitoring to be performed. This facilitates the job of the production managers, allowing a single person to steer all the LHCb production activities. In the paper we will provide a detailed description of the DIRAC PMS components and their interactions with the other DIRAC subsystems. The experience with real large-scale productions will be presented and the further evolution of the system will be discussed.
        Speaker: Alexey Zhelezov (Physikalisches Institut, Universitaet Heidelberg)
        Poster
      • 58
        Mathematical simulation for 3-Dimensional Temperature Visualization on Open Source-based Grid Computing Platform
        The New Iterative Alternating Group Explicit (NAGE) method is a powerful parallel numerical algorithm for multidimensional temperature prediction. The discretization is based on the finite-difference method for partial differential equations (PDEs) of parabolic type. The 3-dimensional temperature visualization is critical since it involves a large computational complexity. The three fundamental applied mathematics issues under consideration are: (i) the accurate modeling of physical systems using finite-difference methods; (ii) the investigation of discretization methods that retain the constraint-preserving properties of the mathematical model; (iii) the performance measurement of parallel algorithms in time and space. This paper proposes the NAGE method as a straightforward transformation from a sequential to a parallel algorithm using domain decomposition and splitting strategies; the process involves the scheduling of communication, the algorithmic work and the mapping of the subdomains onto a number of processors. (A reminder of the underlying parabolic model problem and its finite-difference discretization is given after this entry.) This computational challenge encourages us to utilize the power of high-performance computing, where the computation cannot rely on a single cluster alone. Therefore, this research takes advantage of multiple clusters at geographically different locations, i.e. grid computing. In realizing this concept, we exploit data passing between web services, each connected to one or more clusters, following a service-oriented architecture (SOA) style. Each web service is easily maintainable since there is loose coupling between interacting nodes. The development of this architecture is based on several programming languages: the algorithm is implemented in C, parallelized using the Parallel Virtual Machine (PVM), and Java is used for the web services. The grid computing platform is open-source based and is being developed under a Linux environment. The platform will increase acceleration and scale-out across a virtualized grid. The clusters of processors involved in this platform are built on increasingly large computational hardware with an inexpensive architecture. In conclusion, this grid-based application platform has a bright potential for managing highly scalable and reliable temperature prediction visualization. The efficiency of the application will be measured based on the results of numerical analysis and parallel performance.
        Speakers: Noriza Satam (Department of Mathematics, Faculty of Science,Universiti Teknologi Malaysia), Norma Alias (Institute of Ibnu Sina, Universiti Teknologi Malaysia,)
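        For reference, the parabolic model problem behind this work and a standard explicit finite-difference approximation on a uniform grid with spacing h and time step Δt can be written as follows; this is the generic scheme only, the NAGE group-explicit iteration itself is defined in the paper.
          % Generic 3-D parabolic (heat-type) equation and its explicit finite-difference update
          \frac{\partial U}{\partial t} = \alpha\left(
              \frac{\partial^2 U}{\partial x^2}
            + \frac{\partial^2 U}{\partial y^2}
            + \frac{\partial^2 U}{\partial z^2}\right),
          \qquad
          U^{n+1}_{i,j,k} = U^{n}_{i,j,k}
            + \frac{\alpha\,\Delta t}{h^{2}}\Bigl(
                U^{n}_{i+1,j,k}+U^{n}_{i-1,j,k}
              + U^{n}_{i,j+1,k}+U^{n}_{i,j-1,k}
              + U^{n}_{i,j,k+1}+U^{n}_{i,j,k-1}
              - 6\,U^{n}_{i,j,k}\Bigr).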
      • 59
        Metrics Correlation and Analysis Service
        In a shared computing environment, activities orchestrated by workflow management systems often need to span organizational and ownership domains. In such a setting, common tasks, such as the collection and display of metrics and debugging information, are challenged by the informational entropy inherent to independently maintained and owned software sub-components. Because such an information pool is often disorganized, it becomes a difficult target for business intelligence analysis, i.e. troubleshooting, incident investigation, and trend spotting. The Metrics Correlation and Analysis Service (MCAS) provides an integral solution for system operators and users to uniformly access, transform, and represent disjoint metrics generated by distributed middleware or user services. The proposed software infrastructure assists with indexing and navigation of existing metrics and supplies tools and services to define and store other quantifiable data. The project reuses existing monitoring and data collection software deployments, with the goal of presenting a unified view of metrics data. This paper discusses the MCAS system and places special emphasis on applying integration technologies to assist with the process of formalizing the interaction of users with end applications.
        Speaker: Mr Andrew Baranovski (FNAL)
        Paper
        Poster
      • 60
        Monitoring the ATLAS distributed production
        The ATLAS production system is one of the most critical components in the experiment's distributed system, and this becomes even more true now that real data has entered the scene. Monitoring such a system is a non-trivial task, even more so when two of its main characteristics are the flexibility in the submission of job processing units and the heterogeneity of the resources it uses. In this paper we present the architecture of the monitoring system that is in production today and is being used by ATLAS shifters and experts around the world as a main tool for their daily activities. We describe in detail the different sources of job execution information, and the different tools aggregating system usage into a relevant set of statistics and collecting site and resource status in near real time. The description of the shifters' routine usage of the application gives a clear idea of the tight integration with the rest of both grid and experiment operations tools.
        Speaker: Benjamin Gaidioz (CERN)
      • 61
        Monitoring the world-wide daily computing operations in ATLAS LHC experiment
        The ATLAS distributed computing activities involve about 200 computing centers distributed world-wide and need people on shift covering 24 hours per day. Data distribution, data reprocessing, user analysis and Monte Carlo event simulation run continuously. Reliable performance of the whole ATLAS computing community is of crucial importance to meet the ambitious physics goals of the ATLAS experiment. Distributed computing software and monitoring tools are evolving continuously to achieve this target. The world-wide daily operations shift group is the first responder to all faults, alarms and outages. The shifters are responsible for finding, reporting and following up problems at almost every level of a complex distributed infrastructure and a complex processing model. In this paper we present the operations model, followed by the experience of running the world-wide daily operations group for the past year. We will present the most common problems encountered, and the expected future evolution to provide efficient usage of data, resources and manpower and to improve communication between sites and the experiment.
        Speaker: Dr Xavier Espinal (PIC/IFAE)
        Poster
      • 62
        Organization and Management of ATLAS nightly builds
        The system of automated multi-platform software nightly builds is a major component in the ATLAS collaborative software organization and code approval scheme. Code developers from more than 30 countries use about 25 branches of nightly releases for testing new packages, validation of patches to existing software, and migration to new platforms and compilers. The successful nightly releases are transformed into stable releases used for data processing worldwide. ATLAS nightly builds are managed by the NICOS control tool on a computing farm with 40 powerful multiprocessor nodes. NICOS provides a fully automated framework for the release builds, testing, and creation of distribution kits. The modular structure of NICOS allows for an easy integration of third-party build and validation tools. The ATN test tool is embedded within the nightly system and provides the first results even before the full compilation completes. Several ATLAS test frameworks are synchronized with NICOS jobs and run larger production jobs with the nightly releases. NICOS web pages dynamically provide information about the progress and results of the builds. For faster feedback, e-mail notifications about nightly build problems are automatically distributed to the responsible developers.
        Speaker: Alexander Undrus (BROOKHAVEN NATIONAL LABORATORY, USA)
      • 63
        Parallel computing of ATLAS data with PROOF at the LRZ Munich
        The PROOF (Parallel ROOT Facility) library is designed to perform parallelized ROOT-based analyses with a heterogeneous cluster of computers. The installation, configuration and monitoring of PROOF have been carried out using the Grid-Computing environments dedicated to the ATLAS experiment. A PROOF cluster hosted at the Leibniz Rechenzentrum (LRZ) and consisting of a scalable amount of worker nodes has been exploited in order to conduct the performance tests in the case of interactive ATLAS analyses. Scenarios of various complexities have been considered to exercise PROOF with ATLAS data and evaluate its utilization in actual conditions. The investigation of the PROOF performance has been done by varying the number of parallelized processing units, the amount of simultaneous users, and the type of the file storage. Strategies based on local files, dCache, and Lustre have been compared.
        Speaker: Dr Philippe Calfayan (Ludwig-Maximilians-University Munich)
        Poster
      • 64
        Parallelization of Maximum Likelihood Fit Technique Using MINUIT and RooFit Packages
        MINUIT is the most common package used in high energy physics for the numerical minimization of multi-dimensional functions. The major algorithm of this package, MIGRAD, searches for the minimum by using the gradient of the function. For each minimization iteration, MIGRAD requires the calculation of the first derivatives with respect to each parameter of the function to be minimized. Minimization is required for data analysis problems based on the maximum likelihood technique. Complex likelihood functions, with several free parameters, many independent variables and large data samples, can be very CPU-time consuming. For such a technique the minimization process requires the calculation of the likelihood function (and the corresponding normalization integrals) several times for each minimization iteration. In this presentation we will show how the MINUIT algorithm, the likelihood calculation, and the normalization integral calculation can be easily parallelized using MPI techniques to scale over multiple nodes, or using multi-threading for multiple cores in a single node (a schematic example of the MPI approach is given after this entry). We will present the speed-up improvements obtained in typical physics applications such as complex maximum likelihood fits using the RooFit package. Furthermore, we will also show results of hybrid parallelization between MPI and multi-threading, to take full advantage of multi-core architectures.
        Speaker: Dr Alfio Lazzaro (Universita and INFN, Milano / CERN)
        Poster
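        Since the negative log-likelihood is a sum over events, each MPI rank can evaluate its share of the data and the partial sums can be combined with a reduction before the minimizer sees the total. The sketch below uses mpi4py and NumPy with a made-up Gaussian model; it only illustrates the idea and is not the RooFit/MINUIT implementation presented in this contribution.
          # Sketch of an MPI-parallel negative log-likelihood for a Gaussian model;
          # the data sample and model are placeholders for the real RooFit likelihood.
          import numpy as np
          from mpi4py import MPI

          comm = MPI.COMM_WORLD
          rank, size = comm.Get_rank(), comm.Get_size()

          data = np.random.normal(0.0, 1.0, 1000000)     # stand-in data sample
          local = np.array_split(data, size)[rank]       # this rank's slice of events

          def nll(mu, sigma):
              """Total -log L over all ranks; must be called identically on every rank."""
              partial = 0.5 * np.sum(((local - mu) / sigma) ** 2) \
                        + local.size * np.log(sigma * np.sqrt(2.0 * np.pi))
              return comm.allreduce(partial, op=MPI.SUM)

          # A minimizer (MIGRAD in the real application) would now call nll() repeatedly;
          # every rank executes the same calls so the reductions stay synchronized.
          print(rank, nll(0.0, 1.0))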
      • 65
        Partial Wave Analysis using Graphics Processing Units
        Partial wave analysis is an important tool for determining resonance properties in hadron spectroscopy. For large data samples however, the un-binned likelihood fits employed are computationally very expensive. At the Beijing Spectrometer (BES) III experiment, an increase in statistics compared to earlier experiments of up to two orders of magnitude is expected. In order to allow for a timely analysis of these datasets, additional computing power with short turnover times has to be made available. It turns out that graphics processing units (GPUs) originally developed for 3D computer games have an architecture of massively parallel single instruction multiple data floating point units that is almost ideally suited for the algorithms employed in partial wave analysis. We have implemented a framework for tensor manipulation and partial wave fits called GPUPWA, harnessing the power of GPUs based on the Brook+ framework for general purpose computing on graphics processing units. GPUPWA simplifies the coding of amplitudes in the covariant tensor formalism and other tedious and error-prone tasks involved in partial wave analyses. The user can write a program in pure C++ whilst the GPUPWA classes handle computations on the GPU, memory transfers, caching and other technical details. In conjunction with a recent graphics processor, the framework provides a significant speedup of the partial wave fit compared to legacy FORTRAN code.
        Speaker: Dr Niklaus Berger (Institute for High Energy Physics, Beijing)
        Poster
      • 66
        Petaminer: Using ROOT for Efficient Data Storage in MySQL Database
        High Energy and Nuclear Physics (HENP) experiments store petabytes of event data and terabytes of calibration data in ROOT files. The Petaminer project develops a custom MySQL storage engine to enable the MySQL query processor to directly access experimental data stored in ROOT files. Our project addresses the problem of efficient navigation to petabytes of HENP experimental data described with event-level TAG metadata, which is required by data-intensive physics communities such as the LHC and RHIC experiments. Physicists need to be able to compose a metadata query and rapidly retrieve the set of matching events, where improved efficiency will facilitate the discovery process by permitting rapid iterations of data evaluation and retrieval. Our custom MySQL storage engine enables the MySQL query processor to directly access TAG data stored in ROOT TTrees. As ROOT TTrees are column-oriented, reading them directly provides improved performance over traditional row-oriented TAG databases (a schematic example of such an event-level selection on a TTree is given after this entry). By bringing the flexible and powerful SQL query language to the data stored in ROOT TTrees, the Petaminer approach enables rich MySQL index-building capabilities for further performance optimization. We also studied the feasibility of using the built-in ROOT support for automatic schema evolution to ease the handling of the large volumes of calibration data of a large running experiment stored in MySQL. Over the lifecycle of calibrations, their schema may change. Support for schema changes in relational databases requires effort; in contrast, ROOT provides support for automatic schema evolution. Our approach has the potential to ease the handling of the metadata needed for efficient access to large volumes of calibration data.
        Speakers: Alexandre Vaniachine (Argonne National Laboratory), David Malon (Argonne National Laboratory), Jack Cranshaw (Argonne National Laboratory), Jérôme Lauret (Brookhaven National Laboratory), Paul Hamill (Tech-X Corporation), Valeri Fine (Brookhaven National Laboratory)
        Paper
        Poster
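        As a point of comparison for the column-oriented access that the storage engine exploits, an event-level TAG selection on a ROOT TTree can be expressed directly in PyROOT; the file, tree and variable names below are invented. The Petaminer engine makes the equivalent selection available through a standard SQL WHERE clause in MySQL.
          # Sketch of an event-level TAG selection on a column-oriented ROOT TTree;
          # file, tree and TAG variable names are hypothetical.
          import ROOT

          f = ROOT.TFile.Open("tags.root")
          tags = f.Get("TagTree")                               # hypothetical TAG tree

          selection = "NLooseMuon >= 2 && MissingET > 30000"    # invented TAG quantities
          print("matching events:", tags.GetEntries(selection))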
      • 67
        Pseudo-interactive monitoring in distributed computing
        Distributed computing, and in particular Grid computing, enables physicists to use thousands of CPU days worth of computing every day, by submitting thousands of compute jobs. Unfortunately, a small fraction of such jobs regularly fail; the reasons vary from disk and network problems to bugs in the user code. A subset of these failures result in jobs being stuck for long periods of time. In order to debug such failures, interactive monitoring is highly desirable; users need to browse through the job log files and check the status of the running processes. Batch systems typically don't provide such services; at best, users get job logs at job termination, and even this may not be possible if the job is stuck in an infinite loop. In this paper we present a novel approach of using regular batch system capabilities of Condor to enable users to access the logs and processes of any running job. This does not provide true interactive access, so commands like vi are not viable, but it does allow operations like ls, cat, top, ps, lsof, netstat and dumping the stack of any process owned by the user; we call this pseudo-interactive monitoring. It is worth noting that the same method can be used to monitor Grid jobs in a glidein-based environment. We further believe that the same mechanism could be applied to many other batch systems.
        Speaker: Mr Igor Sfiligoi (Fermilab)
        Poster
      • 68
        Python-based Hierarchical Configuration of LHCb Applications
        The LHCb software, from simulation to user analysis, is based on the Gaudi framework. The extreme flexibility that the framework provides, through its component model and its system of plug-ins, allows us to define a specific application by its behavior more than by its code. The application is then described by configuration files read by the bootstrap executable (shared by all applications). Because of the modularity of the components and the complexity of a typical application, the basic configuration of an application can be a challenging task, made more difficult by the need to let users and developers tune such a configuration. In the last year, to simplify this task, we changed the way we configure applications from static text files to Python scripts. Thanks to the power of Python, we designed an object-oriented hierarchical configuration framework, on top of the initial implementation by the ATLAS collaboration, where applications are defined as high-level configuration entities that use other entities representing the various configuration subsystems or contexts, thus hiding the complexity of the low-level configuration from the user (a schematic illustration of this pattern is given after this entry).
        Speaker: Marco Clemencic (European Organization for Nuclear Research (CERN))
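        The pattern can be illustrated with a schematic fragment in plain Python; the class and property names are invented and do not reproduce the actual LHCb configurables. A high-level application entity sets coherent defaults on lower-level subsystem entities, which the user may still override afterwards.
          # Schematic hierarchical configuration; names are invented and only
          # illustrate the pattern, not the real Gaudi/LHCb configurables.
          class SubsystemConf:
              def __init__(self, **props):
                  self.props = dict(props)
              def set(self, **props):
                  self.props.update(props)

          class TrackingConf(SubsystemConf):
              pass

          class PersistencyConf(SubsystemConf):
              pass

          class AnalysisApplication:
              """High-level entity that configures the subsystems coherently."""
              def __init__(self, data_type="2009"):
                  self.tracking = TrackingConf(fit_model="fast")
                  self.persistency = PersistencyConf(output="analysis.root")
                  if data_type == "MC09":
                      self.tracking.set(fit_model="detailed")   # context-dependent default

          app = AnalysisApplication(data_type="MC09")
          app.persistency.set(output="mytuples.root")           # user-level override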
      • 69
        Readiness of an ATLAS Distributed TIER-2 for the Physics Analysis of the early collision events at the LHC
        ATLAS data taking is due to start in Spring 2009. Given this expectation, this contribution presents a rigorous evaluation of the readiness parameters of the Spanish ATLAS Distributed Tier-2. Special attention is paid to the readiness to perform Physics Analysis from different points of view: network efficiency, data discovery, data management, production of simulated events, user support and distributed analysis. The prototypes of the local computing infrastructures for data analysis, the so-called Tier-3s, attached to the three sites that make up the Tier-2, are described. Several use cases of Distributed Analysis in the GRID system and of local interactive tasks in the non-grid farms are provided in order to evaluate the interplay between both environments and to compare their performance. The sharing of resources between Monte Carlo production and Distributed Analysis activities is also studied. The data storage and management systems chosen are described and results on their performance are given.
        Speaker: Ms Elena Oliver (Instituto de Fisica Corpuscular (IFIC) - Universidad de Valencia)
        Poster
      • 70
        ROOT Graphics.
        The ROOT framework provides many visualization techniques, and lately several new ones have been implemented. This poster will present all the visualization techniques ROOT provides, highlighting the best use that can be made of each of them.
        Speaker: Mr Olivier Couet (CERN)
        Poster
      • 71
        ROOT.NET: Making ROOT accessible from CLR based languages
        ROOT.NET provides an interface between Microsoft’s Common Language Runtime (CLR) and .NET technology and the ubiquitous particle physics analysis tool, ROOT. This tool automatically generates a series of efficient wrappers around the ROOT API. Unlike pyROOT, these wrappers are statically typed and so are highly efficient as compared to the Python wrappers. The connection to .NET means that one gains access to the full series of languages developed for the CLR including functional languages like F# (based on OCaml). Dynamic languages based on the CLR can be used as well, of course (Python, for example). A first attempt at integrating ROOT tuple queries with Language Integrated Query (LINQ) is also described. This poster will describe the techniques used to effect this translation, along with performance comparisons, and examples. All described source code is posted on SourceForge.
        Speaker: Prof. Gordon Watts (UNIVERSITY OF WASHINGTON)
        Poster
      • 72
        Scaling up incident response models to multi-grid security incidents
        Different computing grids may provide services to the same user community, and in addition, a grid resource provider may share its resources across different unrelated user communities. Security incidents are therefore increasingly prone to propagate from one resource center to another, either via the user community or via cooperating grid infrastructures. As a result, related and connected computing grid infrastructures need to collaborate, define and follow compatible security procedures, exchange information and provide a coordinated response to security incidents. However, a large number of security teams may be involved and may need to share information, which not only is difficult to manage, but also increases the likelihood of information leaks. Therefore it is essential to design and implement a carefully structured, tiered communication model to produce an appropriate information flow during security incidents. This presentation describes the necessary changes to the current model, as well as key challenges in achieving a better coordinated response to security incidents affecting grid infrastructures.
        Speaker: Mr romain wartel (CERN)
        Poster
      • 73
        Setting up Tier2 site at Golias/ Prague farm
        The experience of High Energy Nuclear Physics (HENP) collaborations shows that the computing resources available at a single site are often neither sufficient nor able to satisfy the needs of remote collaborators eager to carry out their analyses in the fastest and most convenient way. From latencies in the network connectivity to the lack of interactivity, having a fully functional software stack on local resources is a strong enabler of science opportunities for any local group who can afford the time investment. The situation becomes more complex as vast amounts of data not fitting on local resources are often needed to perform meaningful analysis. The Prague heavy-ion group participating in the RHIC/STAR experiment has been a strong advocate of local computing as the most efficient way of data processing and physics analysis. To create an environment where science can freely expand, a Tier2 computing center was set up at the regional Golias Computing Center for Particle Physics. Golias is the biggest farm in the Czech Republic fully dedicated to particle physics experiments. We report our experience in setting up a fully functional Tier2 center leveraging minimal locally available human and financial resources. We discuss the chosen solutions to address the storage space and analysis issues and their impact on overall functionality. This includes a locally built STAR analysis framework, integration with a local DPM system (as a cost-effective storage solution), the influence of the availability and quality of the network connection to the Tier0 via a dedicated CESNET/ESnet link, and the development of light-weight yet fully automated data transfer tools allowing entire datasets to be moved from BNL (Tier0) to Golias (Tier2). We summarize the impact of the gained computing performance on the efficiency of the offline analysis for the local physics group and show the feasibility of such a solution, which can be used by other groups as well.
        Speaker: Mr Jan KAPITAN (Nuclear Physics Inst., Academy of Sciences, Praha)
        Paper
        Poster
      • 74
        Simulation and reconstruction of cosmic ray showers for the Pierre Auger Observatory on the EGEE grid
        The Pierre Auger Observatory studies ultra-high energy cosmic rays. Interactions of these particles with the nuclei of air gases at energies many orders of magnitude above the current accelerator capabilities induce unprecedented extensive air showers in the atmosphere. Different interaction models are used to describe the first interactions in such showers and their predictions are confronted with measured shower characteristics. We created libraries of cosmic ray showers with more than 35 000 simulated events using CORSIKA with the EPOS or QGSjetII models. These showers are reused several times for the simulation of the detector response at different positions within the detector array. We describe our experience with the installation of the specific software on the grid and with running a large number of jobs on sites supporting the VO auger with dedicated and also opportunistic resources. A web-based dashboard summarizing job states was developed, together with a custom database of the available files with simulated and reconstructed showers.
        Speakers: Ms Jaroslava Schovancova (Institute of Physics, Prague), Dr Jiri Chudoba (Institute of Physics, Prague)
        Poster
      • 75
        SiteDB: Marshalling the people and resources available to CMS
        In a collaboration the size of CMS (approx. 3000 users and almost 100 computing centres of varying size), communication and accurate information about the sites it has access to are vital for coordinating the multitude of computing tasks required for smooth running. SiteDB is a tool developed by CMS to track the sites available to the collaboration, the allocation to CMS of resources available at those sites and the associations between CMS members and the sites (as either a manager/operator of the site or a member of a group associated with the site). It is used to track the roles a person has for an associated site or group. SiteDB eases the coordination load for the operations teams by providing a consistent interface to manage communication with the people working at a site, by identifying who is responsible for a given task or service at a site and by offering a uniform interface to information on CMS contacts and sites. SiteDB provides APIs and reports for other CMS tools to use to access the information it contains, for instance enabling CRAB to use "user friendly" names when black/white listing CEs, providing role-based authentication and authorisation for other web-based services and populating various troubleshooting squads in the external ticketing systems in daily use by CMS Computing operations.
        Speaker: Dr Simon Metson (H.H. Wills Physics Laboratory)
        Poster
      • 76
        Statistical Comparison of CPU performance for LHCb applications on the Grid
        The usage of CPU resources by LHCb on the Grid is dominated by two different applications: Gauss and Brunel. Gauss is the application performing the Monte Carlo simulation of proton-proton collisions. Brunel is the application responsible for the reconstruction of the signals recorded by the detector, converting them into objects that can be used for the later physics analysis of the data (tracks, clusters, ...). Both applications are based on the Gaudi and LHCb software frameworks. Gauss uses Pythia and Geant as underlying libraries for the simulation of the collision and the subsequent passage of the generated particles through the LHCb detector, while Brunel makes use of LHCb-specific code to process the data from each sub-detector. Both applications are CPU bound. Large Monte Carlo productions or data reconstructions running on the Grid are an ideal benchmark to compare the performance of the different CPU models for each case. Since the processed events are only statistically comparable, only a statistical comparison of the achieved performance can be obtained. This contribution presents the result of such a comparison from recent LHCb activities on the Grid. The results are compared for different CPU models, and the dependence on the CPU clock is shown for CPUs of the same family. Further comparisons with the HEPiX WG results and with the benchmarking of LHCb and other LHC experiments are also included.
        Speaker: Dr Ricardo Graciani Diaz (Universidad de Barcelona)
        Paper
        Poster
      • 77
        Status of the Grid Computing for the ALICE Experiment in the Czech Republic
        Czech Republic (CR) has been participating in the LHC Computing Grid project (LCG) ever since 2003 and gradually, a middle-sized Tier2 center has been built in Prague, delivering computing services for national HEP experiments groups including the ALICE project at the LHC. We present a brief overview of the computing activities and services being performed in the CR for the ALICE experiment at the LHC.
        Speaker: Dr Dagmar Adamova (Nuclear Physics Institute AS CR)
        Paper
        Poster
      • 78
        Storm-GPFS-TSM: a new approach to Hierarchical Storage Management for the LHC experiments
        In the framework of WLCG, the Tier-1 computing centres have very stringent requirements in the sector of data storage, in terms of size, performance and reliability. For some years, at the INFN-CNAF Tier-1, we have been using two distinct storage systems: Castor as the tape-based storage solution (also known as the D0T1 storage class in WLCG language) and the General Parallel File System (GPFS), in conjunction with StoRM as the SRM service, for pure disk access (D1T0). At the beginning of 2008 we started to explore the possibility of employing GPFS together with the tape management software TSM as a solution for realizing a tape-disk infrastructure, first implementing a D1T1 storage class (files always on disk with a backup on tape), and then also a D0T1 class (hence also involving active recalls of files from tape to disk). The first StoRM-GPFS-TSM D1T1 system is nowadays already in production at CNAF for the LHCb experiment, while a prototype D0T1 system is under development and study. We describe the details of the new D1T1 and D0T1 implementations, discussing the differences between the Castor-based solution and the StoRM-GPFS-TSM one. We also present the results of some performance studies of the novel D1T1 and D0T1 systems.
        Speaker: Pier Paolo Ricci (INFN CNAF)
      • 79
        Testing PROOF Analysis with Pythia8 Generator Level Data
        We study the performance of different ways of running a physics analysis in preparation for the analysis of petabytes of data in the LHC era. Our test cases include running the analysis code in a Linux cluster with a single thread in ROOT, with the Parallel ROOT Facility (PROOF), and in parallel via the Grid interface with the ARC middleware. We use on the order of millions of Pythia8 generator-level QCD multi-jet events to stress the analysis system. The performance of the test cases is reported.
        Speaker: Mr Matti Kortelainen (Helsinki Institute of Physics)
      • 80
        The ATLAS Conditions Database Architecture for the Muon Spectrometer
        The ATLAS Muon Spectrometer is the outer part of the ATLAS detector at the LHC. It has been designed to detect charged particles exiting the barrel and end-cap calorimeters and to measure their momentum in the pseudorapidity range |η| < 2.7. The challenging momentum-measurement performance requires accurate monitoring of detector and calibration parameters and a highly complex architecture to store them. The ATLAS Muon System has started to make extensive use of the Conditions Database to store all the conditions data needed for the reconstruction of events. The ATLAS Collaboration has decided to use the LCG conditions database project 'COOL' as the basis for all its conditions data storage, both at CERN and throughout the worldwide collaboration. The management of the Muon COOL conditions database will be one of the most challenging applications for the Muon System, both in terms of data volumes and rates, and in terms of the variety of data stored. The Muon Conditions database is responsible for almost all of the 'non-event' data and detector quality flags needed for debugging detector operations and for performing reconstruction and analysis. COOL implements an interval-of-validity database, i.e. objects stored or referenced in COOL have an associated start and end time between which they are valid; the data are stored in folders, which are themselves arranged in a hierarchical structure of foldersets (a schematic illustration of the interval-of-validity lookup is given after this entry). The structure is simple and mainly optimised to store and retrieve the object(s) associated with a particular time. In this work, an overview of the entire Muon Conditions Database architecture is given, including the different sources of the data and the storage model used; in addition, the software interfaces are also described.
        Speaker: Dr Monica Verducci (INFN Roma)
        Poster
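        The interval-of-validity idea can be summarized in a few lines of generic Python (an illustration of the concept only, not the COOL API): every stored payload carries a [since, until) range, and a lookup returns the payload whose range contains the requested time.
          # Generic interval-of-validity lookup: payloads are valid for [since, until).
          import bisect

          class IOVFolder:
              def __init__(self):
                  self._since = []      # sorted 'since' boundaries
                  self._items = []      # (since, until, payload), same order

              def store(self, since, until, payload):
                  i = bisect.bisect_left(self._since, since)
                  self._since.insert(i, since)
                  self._items.insert(i, (since, until, payload))

              def retrieve(self, time):
                  i = bisect.bisect_right(self._since, time) - 1
                  if i >= 0 and self._items[i][1] > time:
                      return self._items[i][2]
                  raise KeyError("no object valid at %r" % time)

          folder = IOVFolder()
          folder.store(0, 1000, {"t0": 12.3})    # hypothetical calibration payloads
          folder.store(1000, 2000, {"t0": 12.6})
          print(folder.retrieve(1500))           # -> {'t0': 12.6}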
      • 81
        The ATLAS Distributed Data Management Central Catalogues and steps towards scalability and high availability
        The ATLAS Distributed Data Management system, Don Quijote2 (DQ2), has been in use since 2004. Its goal is to manage tens of petabytes of data per year, distributed among the WLCG. One of the most critical components of DQ2 is the central catalogues, which comprise a set of web services with a database back-end and a distributed memory object caching system. This component has proven to be very reliable and to fulfill ATLAS requirements regarding performance and scalability. In this paper we present the architecture of the DQ2 central catalogues component and the implementation decisions regarding performance, scalability, replication and memory usage (a schematic read-through caching example is given after this entry). The exploitation of techniques and features of the Oracle database which hosts the application is described, together with an overview of the disaster recovery strategy that needs to be in place to address the requirement of high availability.
        Speaker: Pedro Salgado (CERN)
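        The role of the distributed memory object cache in front of the database back-end can be illustrated with a simple read-through pattern; the sketch is schematic Python, and the key layout and back-end call are invented.
          # Read-through cache in front of a catalogue database; placeholders only.
          import time

          CACHE = {}        # stands in for a distributed memory object cache
          TTL = 300         # cache entry lifetime in seconds

          def lookup_in_database(dataset):
              # placeholder for the real catalogue query against the back-end
              return ["SITE_A_DATADISK", "SITE_B_DATADISK"]

          def dataset_replicas(dataset):
              hit = CACHE.get(dataset)
              if hit and time.time() - hit[0] < TTL:
                  return hit[1]                        # served from cache, no DB round trip
              value = lookup_in_database(dataset)
              CACHE[dataset] = (time.time(), value)    # repopulate on miss or expiry
              return value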
      • 82
        The ATLAS DQ2 Accounting and Storage Usage service
        The DQ2 Distributed Data Management system is the system developed and used by ATLAS for handling very large datasets. It encompasses data bookkeeping, the management of large-scale production transfers as well as end-user data access requests. In this paper, we describe the design and implementation of the DQ2 accounting service. It collects different data usage information in order to display and compare it from the experiment and application perspectives. Today, the DQ2 data volume represents more than 8 petabytes, ~70 million files and 500k dataset replicas, distributed over more than 500 grid storage endpoints.
        Speaker: Dr Vincent Garonne (CERN)
        Poster
      • 83
        The ATLAS METADATA INTERFACE
        AMI is the main interface for searching for ATLAS datasets using physics metadata criteria. AMI has been implemented as a generic database management framework which allows parallel searching over many catalogues, which may have differing schemas, may be distributed geographically, and may use different RDBMS. The main features of the web interface will be described; in particular the powerful graphical query builder. The use of XML/XSLT technology ensures that all commands can be used either on the web or from a command line interface via a web service. We will also discuss how we have been able to use the AMI mechanism to describe database tables which belong to other applications so that the AMI generic interfaces can be used for browsing or querying the information they contain.
        Speaker: Dr Solveig Albrand (LPSC)
        Poster
      • 84
        The ATLAS TAGS Database distribution and management - Operational challenges of a multi-terabyte distributed database system
        The TAG files store summary event quantities that allow a quick selection of interesting events. This data will be produced at a nominal rate of 200 Hz, and is uploaded into a relational database for access from websites and other tools. The estimated database volume is 6 TB per year, making it the largest application running on the ATLAS relational databases, at CERN and at other voluntary sites. The sheer volume and high rate of production make this application a challenge for data and resource management in many respects. This paper will focus on the operational challenges of this system. These include: uploading the data from files to the CERN and remote sites' databases; distributing the TAG metadata that is essential to guide the user through event selection; and controlling resource usage of the database, from the user query load to the strategy for cleaning and archiving old TAG data.
        Speaker: Florbela Viegas (CERN)
        Poster
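        To make the idea of TAG-style event selection concrete, here is a toy relational sketch: a table of per-event summary quantities is queried to pre-select events before touching the full event data. The column names and cuts are invented for illustration and do not reflect the actual ATLAS TAG schema.

          import sqlite3

          db = sqlite3.connect(":memory:")
          db.execute("""CREATE TABLE tag (
              run INTEGER, event INTEGER,
              n_muons INTEGER, missing_et REAL, lead_jet_pt REAL)""")
          db.executemany("INSERT INTO tag VALUES (?, ?, ?, ?, ?)", [
              (90272, 1, 0, 12.5, 35.0),
              (90272, 2, 2, 48.3, 110.2),
              (90272, 3, 1, 95.1, 80.7),
          ])

          # Quick pre-selection on summary quantities; only the selected
          # (run, event) pairs would then be retrieved from the event store.
          selected = db.execute(
              "SELECT run, event FROM tag WHERE n_muons >= 1 AND missing_et > 40").fetchall()
          print(selected)   # -> [(90272, 2), (90272, 3)]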
      • 85
        The CMS Computing Facilities Operations
        The CMS Facilities and Infrastructure Operations group is responsible for providing and maintaining a working distributed computing fabric with a consistent working environment for Data Operations and the physics user community. Its mandate is to maintain the core CMS computing services; ensure the coherent deployment of Grid or site-specific components (such as workload management, file transfer and storage systems); monitor the CMS-specific site availability and efficiency; and systematically trouble-shoot and track facilities-related issues. In recent years, the CMS tiered computing infrastructure has grown significantly and was tested via so-called “data challenges” and used for processing real cosmic data, routinely running 100k jobs per day distributed over more than 50 sites. In this presentation we will focus on operational aspects in the facilities area in view of the LHC startup. In particular, we will report on the experience gained and the progress made with the computing shift procedures, which are running in dedicated CMS centres inside and outside CERN. The collaborative effort of all CMS centres and good communication with CMS sites have proven to be essential ingredients for efficient, sustained distributed data processing.
        Speaker: Dr Daniele Bonacorsi (Universita & INFN, Bologna)
        Poster
      • 86
        The CMS Dataset Bookkeeping Service Query Language (DBSql)
        The CMS experiment has implemented a flexible and powerful approach enabling users to find data within the CMS physics data catalog. The Dataset Bookkeeping Service (DBS) comprises a database and the services used to store and access metadata related to its physics data. In addition to the existing web-based and programmatic APIs, a generalized query system has been designed and built. This query system has a query language that hides the complexity of the underlying database structure. This provides a way of querying the system that is straightforward for CMS data managers and physicists. The DBS Query Language uses the ANTLR tool to build the input query parser and tokenizer; a query builder using a graph representation of the DBS schema then constructs the actual SQL sent to the underlying database. We will describe the design of the query system and provide details of the language components. We will also provide an overview of how this component fits into the overall data discovery system, as well as how it provides access to information about Data Quality and Luminosity.
        Speaker: Dr Lee Lueking (FERMILAB)
        Poster
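        As an illustration of the query-builder idea (a user-level query translated into SQL over a hidden schema), here is a deliberately tiny translator for a made-up "find X where Y" grammar. It uses a regular expression instead of ANTLR and an invented two-table schema; it is not DBSql.

          import re

          # Hypothetical mapping from user-visible names to schema columns.
          COLUMNS = {"dataset": "d.name", "site": "s.name", "era": "d.acquisition_era"}

          def toy_query_to_sql(query):
              """Translate e.g. "find dataset where site = T1_US_FNAL"
              into SQL over an invented two-table schema."""
              m = re.match(r"find (\w+) where (\w+) = (\S+)", query)
              if not m:
                  raise ValueError("unsupported toy query")
              what, attr, value = m.groups()
              return ("SELECT {sel} FROM datasets d JOIN sites s ON d.site_id = s.id "
                      "WHERE {col} = '{val}'").format(
                          sel=COLUMNS[what], col=COLUMNS[attr], val=value)

          print(toy_query_to_sql("find dataset where site = T1_US_FNAL"))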
      • 87
        The CMS experiment workflows on StoRM-based storage at Tier-1 and Tier-2 centers
        The CMS experiment is preparing for data taking with many computing activities, including the testing, deployment and operation of various storage solutions to support the computing workflows of the experiment. Some Tier-1 and Tier-2 centers supporting the collaboration are deploying and commissioning StoRM storage systems: POSIX-based disk storage systems on top of which StoRM implements the Storage Resource Manager (SRM version 2) interface, allowing for standards-based access from the Grid. This paper presents tests made with CMS applications performing reference Tier-N workflows on StoRM storage, the configurations and solutions adopted, and the experience achieved so far in production-level operations.
        Speaker: Dr Andrea Sartirana (INFN-CNAF)
      • 88
        The LHCb data bookkeeping system
        The LHCb Bookkeeping is a system for the storage and retrieval of metadata associated with LHCb datasets, e.g. whether a dataset contains real or simulated data, which running period it is associated with, how it was processed, and all the other relevant characteristics of the files. The metadata are stored in an Oracle database which is interrogated using services provided by the LHCb DIRAC3 infrastructure, which provides security, data streaming, and multi-threaded connections. Users can browse the Bookkeeping database through a command line interface or a Graphical User Interface (GUI). The command line presents a view similar to a file system, and the GUI is implemented on top of this.
        Speaker: Zoltan Mathe (UCD Dublin)
      • 89
        The LHCb Software distribution
        The installation of the LHCb software is handled by a single Python script: install_project.py. This bootstrap script is unique in allowing the installation of software projects on various operating systems (Linux, Windows, MacOSX). It is designed for the deployment of the LHCb software for a single user or for multiple users, in a shared area or on the Grid. It retrieves the software packages and deduces the dependencies using a remote web repository, and thus takes care of the consistency of the installation. Among the various features which have been implemented one can list: the fixing of access permission settings for the installed packages, incremental installation using multiple deployment areas, and a consistency check of the retrieved files. The only prerequisite for the use of this tool is a recent enough version of Python (2.3 or above) and reasonable network access.
        Speaker: Hubert Degaudenzi (European Organization for Nuclear Research (CERN))
        Poster
      • 90
        The new ROOT browser
        Description of the new implementation of the ROOT browser
        Speaker: Bertrand Bellenot (CERN)
        Poster
      • 91
        The nightly build and test system for LCG AA and LHCb software
        The core software stack, both from the LCG Application Area and LHCb, consists of more than 25 C++/Fortran/Python projects built for about 20 different configurations on Linux, Windows and MacOSX. To these projects one can also add about 20 external software packages (Boost, Python, Qt, CLHEP, ...) which also have to be built for the same configurations. In order to reduce the length of the development cycle and increase quality assurance, a framework has been developed for the daily (nightly, actually) build and test of the software. Performing the builds and the tests on several configurations and platforms increases the efficiency of the unit and integration tests. Main features: a flexible and fine-grained setup (full or partial build) through a web interface; the possibility to build several "slots" with different configurations; precise and highly granular reports on a web server; support for CMT projects (but not only) with their cross-dependencies; a scalable client-server architecture for the control machine and its build machines; and copying of the results to a common place to allow an early view of the software stack. The nightly build framework is written in Python for portability and is easily extensible to accommodate new build procedures.
        Speakers: Dr Hubert Degaudenzi (CERN), Karol Kruzelecki (Cracow University of Technology)
        Poster
      • 92
        The offline Data Quality Monitoring system of the ATLAS Muon Spectrometer
        The ATLAS detector has been designed to exploit the full discovery potential of the LHC proton-proton collider at CERN, at the c.m. energy of 14 TeV. Its Muon Spectrometer (MS) has been optimized to measure final state muons from those interactions with good momentum resolution (3-10% for momenta of 100 GeV/c to 1 TeV/c). In order to ensure that the hardware, DAQ and reconstruction software of the ATLAS MS are functioning properly, Data Quality Monitoring (DQM) tools have been developed for both the online and the offline environment. The offline DQM is performed on histograms of quantities of interest which are filled in the ATLAS software framework ATHENA during different levels of processing - raw hit, reconstructed object (segment and track) and higher (physics) level. These histograms can then be displayed and browsed by shifters and experts using various macros. They are also given as input to the Data Quality Monitoring Framework (DQMF) application, which applies simple algorithms and/or comparisons with reference histograms to set a status flag, which is propagated to a global status and saved in a database. A web display of DQMF results is also available. This initial processing is done on a subset of the data (express stream) within a few hours of the run and, depending on the data quality, the full statistics are then processed. The offline muon DQM structure and content, as well as the corresponding tools developed, are presented, with examples from the commissioning of the MS with cosmic rays.
        Speaker: Ilektra Christidi (Physics Department - Aristotle Univ. of Thessaloniki)
        Poster
      • 93
        The Open Science Grid -- Operational Security in a Highly Connected World
        Open Science Grid stakeholders invariably depend on multiple infrastructures to build their community-based distributed systems. To meet this need, OSG has built new gateways with TeraGrid, Campus Grids, and Regional Grids (NYSGrid, BrazilGrid). This has brought new security challenges for the OSG architecture and operations. The impact of security incidents now has a larger scope and demands a coordinated response. Operationally, we took first steps towards building an incident sharing community among our peer grids. To reach higher-education user communities, especially HEP researchers, outside the grids, OSG members joined REN-ISAC. We also defined (jointly with EGEE) a set of operational security tools and began implementation. And, because across the infrastructures certificate hygiene is a top priority, we worked with the IGTF (International Grid Trust Federation) to develop risk assessment and incident response processes. Architecturally, we analyzed how proxy credentials are treated end-to-end in the OSG infrastructure. We discovered that the treatment of proxies, after a job is finished, has some shortcomings. Given long proxy lifetimes, a breach of a host can affect multiple users and grids. Finally, we are working on a banning service that can deny access to resources by suspect users at the gatekeeper. We designed this site service to receive alerts from a central banning service managed by the security team in cases of emergencies. We envision that coupled with our operational efforts, this service would be a first-line defense against security incidents.
        Speaker: Dr Mine Altunay (FERMILAB)
      • 94
        The ROOT event recorder
        Description of the ROOT event recorder, a GUI testing and validation tool.
        Speaker: Bertrand Bellenot (CERN)
        Poster
      • 95
        TMemStat - memory usage debugging and monitoring in ROOT and AliROOT
        Memory monitoring is a very important part of complex project development. Open source tools, such as valgrind, are available for the task; however, their performance penalties make them unsuitable for debugging long, CPU-intensive programs, such as reconstruction or simulation. We have developed the TMemStat tool which, while not providing the full functionality of valgrind, gives developers the possibility to find memory problems even in very large projects, such as the full simulation of the ALICE detector in a high-flux environment. TMemStat uses hooks for allocation and internal gcc functions, and provides detailed information about memory leaks and memory usage, with user-defined frequency or at user-defined watch points.
        Speaker: Anar Manafov (GSI)
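        The hook-and-snapshot approach described above can be illustrated, by analogy, with Python's built-in tracemalloc module: allocations are traced and snapshots compared at user-chosen watch points. This is only a conceptual stand-in, not TMemStat itself.

          import tracemalloc

          tracemalloc.start()                     # install allocation tracing hooks
          before = tracemalloc.take_snapshot()    # first watch point

          leaky = [bytearray(1024) for _ in range(1000)]  # simulate growing allocations

          after = tracemalloc.take_snapshot()     # second watch point
          for stat in after.compare_to(before, "lineno")[:3]:
              # report where memory usage grew between the two watch points
              print(stat)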
      • 96
        TSkim: a tool for skimming ROOT trees
        Like many experiments, FERMI stores its data in ROOT trees. A very common activity of physicists is the tuning of selection criteria which define the events of interest, thus cutting and pruning the ROOT trees so as to extract all the data linked to those specific events. It is rather straightforward to write a ROOT script to skim a single kind of data, for example the reconstructed data. This becomes trickier when some simulated or analysis data must be processed at the same time, because each kind of data is structured with its own rules for file names, file sizes, tree names, identification of events, etc. TSkim has been designed to ease this task. Thanks to a metadata file which says where to find the run and event ids in the different kinds of trees, TSkim is able to collect all the tree elements which match a given ROOT cut. The tool will also help when loading the shared libraries which describe the experiment data, or when pruning the tree branches. Initially a pair of Perl and ROOT scripts, TSkim is today a fully compiled C++ application, enclosing our ROOT know-how and offering a range of features going far beyond the original FERMI requirements. In this talk, we plan to present the features of interest for any ROOT-based experiment, including a new kind of event list, and to emphasize the implementation mechanisms which make it scalable.
        Speaker: David Chamont (Laboratoire Leprince-Ringuet (LLR)-Ecole Polytechnique)
        Poster
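        A minimal PyROOT sketch of the underlying skimming operation: select the entries passing a cut and copy them to a new tree. The file, tree and branch names are placeholders; TSkim itself adds the multi-tree, run/event-id driven machinery described above.

          import ROOT

          # Placeholder input: a ROOT file containing a tree named "recon"
          # with (at least) branches "runId", "eventId" and "energy".
          fin = ROOT.TFile.Open("recon_events.root")
          tree = fin.Get("recon")

          fout = ROOT.TFile("skimmed_events.root", "RECREATE")
          # Copy only the entries passing the selection; TSkim would apply the
          # matching (run, event) selection to the simulated/analysis trees too.
          skimmed = tree.CopyTree("energy > 10.0")
          skimmed.Write()
          fout.Close()
          fin.Close()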
      • 97
        Using Python for Job Configuration in CMS
        In 2008, the CMS experiment made the transition from a custom-parsed language for job configuration to using Python. The current CMS software release has over 180,000 lines of Python configuration code. We describe the new configuration system, the motivation for the change, the transition itself, and our experiences with the new configuration language.
        Speaker: Dr Richard Wilkinson (California Institute of Technology)
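        For readers unfamiliar with the configuration language, here is a minimal example of the kind of Python job configuration in use; the analyzer module, its parameter and the file names are placeholders, not part of any particular CMS workflow.

          import FWCore.ParameterSet.Config as cms

          process = cms.Process("DEMO")

          # Placeholder input file and event count.
          process.source = cms.Source("PoolSource",
              fileNames = cms.untracked.vstring("file:input.root"))
          process.maxEvents = cms.untracked.PSet(input = cms.untracked.int32(100))

          # A hypothetical analyzer module and the path that schedules it.
          process.demoAnalyzer = cms.EDAnalyzer("DemoAnalyzer",
              minPt = cms.double(5.0))
          process.p = cms.Path(process.demoAnalyzer)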
      • 98
        Validation of software releases for CMS
        The CMS software stack currently consists of more than 2 million lines of code developed by over 250 authors, with a new version being released every week. CMS has set up a release validation process for quality assurance which enables the developers to compare to previous releases and references. This process provides the developers with reconstructed datasets of real data and MC samples. The samples span the whole range of detector effects and important physics signatures to benchmark the performance of the software. They are used to investigate interdependency effects of software packages and to find and fix bugs. The samples have to be available in a very short time after a release is published to fit into the streamlined CMS development cycle. The standard CMS processing infrastructure and dedicated resources at CERN and FNAL are used to achieve a very short turnaround of 24 hours. This talk will present the CMS release validation process and statistics describing the prompt usage of the produced samples. Overall, it will emphasize the importance of a streamlined release validation process for projects with a large code base and a significant number of developers, and can serve as an example for future projects.
        Speaker: Dr Oliver Gutsche (FERMILAB)
        Poster
      • 99
        Visual Physics Analysis VISPA
        VISPA is a novel development environment for high energy physics analyses which enables physicists to combine graphical and textual work. A physics analysis cycle consists of prototyping, performing, and verifying the analysis. The main feature of VISPA is a multipurpose window for visual steering of analysis steps, creation of analysis templates, and browsing physics event data at different steps of an analysis. VISPA follows an experiment-independent approach and incorporates various tools for steering and controlling required in a typical analysis. Connection to different frameworks of high energy physics experiments is achieved by using a Python interface. We present the look-and-feel for an example physics analysis at the LHC, and explain the underlying software concepts of VISPA.
        Speaker: Tatsiana Klimkovich (RWTH Aachen University)
        Poster
      • 100
        Wide Area Network Access to CMS Data Using the Lustre Cluster Filesystem
        The CMS experiment will generate tens of petabytes of data per year, data that will be processed, moved and stored in large computing facilities at locations all over the globe. Each of these facilities deploys complex and sophisticated hardware and software components which require dedicated expertise lacking at many of the universities and institutions wanting access to the data as soon as it becomes available. Also, the standard methods for accessing data remotely rely on grid interfaces and batch jobs that, while powerful, significantly increase the amount of procedural overhead and can impede a remote user’s ability to analyze data interactively, develop and debug code and examine detailed information. We believe that enabling direct but remote access to CMS data will greatly enhance the analysis experience for remote users not situated at a CMS Tier-1 or Tier-2. The Lustre cluster filesystem allows remote machines to mount filesystems over the wide area network as well as over the local area network, as it is more commonly used. It also has an easy-to-deploy client, is reliable and performs exceptionally well. In this paper we report our experience using the Lustre filesystem to access CMS data from servers located a few hundred kilometers away from the physical filesystem. We describe the procedure used to connect two of the Florida Tier-3 sites, located in Miami and Daytona Beach, to a storage element located at the University of Florida's Tier-2 center in Gainesville and its High Performance Computing Center. We include details on the hardware used, kernel modifications and tunings, and report on network bandwidth and system I/O performance, comparing these benchmarks with actual CMS application runs. We also propose a possible scenario for implementing this new method of accessing CMS data in the context of the CMS data management system. Finally, we explore some of the issues concerning remote user access with Lustre, and touch upon security concerns.
        Speaker: Prof. Rodriguez Jorge Luis (Florida Int'l University)
        Paper
        Poster
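        A small sketch of the kind of sequential-read throughput measurement mentioned in the abstract, run against a path on the (locally or WAN-) mounted filesystem. The mount point and file name are assumptions, not the actual Florida setup.

          import os
          import time

          def read_throughput(path, block_size=4 * 1024 * 1024):
              """Sequentially read 'path' and return throughput in MB/s."""
              total = 0
              start = time.time()
              with open(path, "rb", buffering=0) as f:
                  while True:
                      block = f.read(block_size)
                      if not block:
                          break
                      total += len(block)
              elapsed = time.time() - start
              return total / (1024 * 1024) / elapsed

          # Hypothetical file on a Lustre mount point, e.g. /lustre/cms/store/...
          sample = "/lustre/cms/store/sample.root"
          if os.path.exists(sample):
              print("%.1f MB/s" % read_throughput(sample))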
    • Opening Congress Hall

      Congress Hall

      Prague

      Prague Congress Centre 5. května 65, 140 00 Prague 4, Czech Republic
      Convener: Jan Gruntorad (CESNET)
      • 101
        Opening Address
        Speakers: Jiri Drahos (chair of the Academy of Sciences of the Czech Republic), Vaclav Hampl (rector of the Charles University in Prague), Vaclav Havlicek (rector of the Czech Technical University in Prague)
        Video
    • Plenary: Monday Congress Hall

      Congress Hall

      Prague

      Prague Congress Centre 5. května 65, 140 00 Prague 4, Czech Republic

      Live broadcasting at:
      http://prenosy.cesnet.cz/

      Convener: Harvey Newman (CalTech)
      • 102
        The LHC Machine and Experiments: Status and Prospects
        The LHC Machine and Experiments: Status and Prospects
        Speaker: Prof. Sergio Bertolucci (CERN)
      • 103
        WLCG - Can we deliver?
        A personal review of WLCG and the readiness for first real LHC data, highlighting some particular successes, concerns and challenges that lie ahead.
        Speaker: Dr Neil Geddes (RAL)
        Slides
        Video
    • 10:30
      coffee break, exhibits and posters
    • Plenary: Monday Congress Hall

      Congress Hall

      Prague

      Prague Congress Centre 5. května 65, 140 00 Prague 4, Czech Republic

      Live broadcasting at:
      http://prenosy.cesnet.cz/

      Convener: Hiroshi Sakamoto (Tokyo University)
      • 104
        Status and Prospects of LHC Experiments Data Acquisition
        Data acquisition systems are an integral part of their respective experiments. They are designed to meet the needs set by the physics programme. Despite some very interesting differences in architecture, the unprecedented data rates at the LHC have led to many commonalities among the four large LHC data acquisition systems. All of them rely on commercial local area network technology, and more specifically mostly on Gigabit Ethernet. They transport the data from the detector readout boards to large farms of industry-standard servers, where a pure software trigger is run. These four systems will be reviewed, the underlying commonalities will be highlighted and interesting architectural differences will be discussed. In view of a possible LHC upgrade we will briefly discuss the suitability and evolution of the current architectures to fit the needs of the SLHC.
        Speaker: Dr Niko Neufeld (CERN)
        Slides
        Video
      • 105
        Status and Prospects of The LHC Experiments Computing
        Status and Prospects of The LHC Experiments Computing
        Speaker: Prof. Kors Bos (NIKHEF)
        Slides
        Video
      • 106
        LHC data analysis starts on a Grid – What’s next?
        For various reasons the computing facility for LHC data analysis has been organised as a widely distributed computational grid. Will this be able to meet the requirements of the experiments as LHC energy and luminosity ramp up? Will grid operation become a basic component of science infrastructure? Will virtualisation and the cloud model eliminate the need for complex grid middleware? Will multi-core personal computers relegate the grid to a data delivery service? The talk will look at some of the advantages and some of the drawbacks of the grid approach, and will present a personal view on how things might evolve.
        Speaker: Les Robertson (CERN)
        Slides
        Video
    • 13:00
      lunch
    • Collaborative Tools: Monday Club B

      Club B

      Prague

      Prague Congress Centre 5. května 65, 140 00 Prague 4, Czech Republic
      Convener: Eva Hladka (CESNET)
      • 107
        CMS Centres Worldwide: a New Collaborative Infrastructure
        The CMS Experiment at the LHC is establishing a global network of inter-connected "CMS Centres" for controls, operations and monitoring. These support: (1) CMS data quality monitoring, detector calibrations, and analysis; and (2) computing operations for the processing, storage and distribution of CMS data. We describe the infrastructure, computing, software, and communications systems required to create an effective and affordable CMS Centre. We present our highly successful operations experience with the major CMS Centres at CERN, Fermilab, and DESY during the LHC first beam data-taking and cosmic ray commissioning work. The status of the various centres already operating or under construction in Asia, Europe, Russia, South America, and the USA is also described. We emphasise the collaborative communications aspects. For example, virtual co-location of experts in CMS Centres Worldwide is achieved using high-quality, permanently-running "telepresence" video links. Generic web-based tools have been developed and deployed for monitoring, control, display management and outreach.
        Speaker: Dr Lucas Taylor (Northeastern U., Boston)
      • 108
        DIRAC Secure Web User Interface
        Traditionally, interaction between users and the Grid is done with command line tools. However, these tools are difficult for non-expert users, providing minimal help and generating output that is not always easy to understand, especially in case of errors. Graphical user interfaces are typically limited to providing access to monitoring or accounting information and concentrate on some particular aspects, failing to cover the full spectrum of grid control tasks. To make the Grid more user-friendly, more complete graphical interfaces are needed. Within the DIRAC project we have attempted to construct a web-based user interface that not only provides means for monitoring the system behaviour but also allows users to steer their main activities on the grid. Using DIRAC's web interface a user can easily track jobs and data. It provides access to job information and allows actions to be performed on jobs, such as killing or deleting them. Data managers can define and monitor file transfer activity as well as check requests set by jobs. Production managers can define and follow large data productions and react if necessary by stopping or starting them. The web portal is built following all the grid security standards and using modern Web 2.0 technologies, which allows a user experience similar to that of desktop applications. Details of the DIRAC web portal architecture and user interface will be presented and discussed.
        Speaker: Mr Adrian Casajus Ramo (Departament d' Estructura i Constituents de la Materia)
        Slides
      • 109
        Lecture archiving on a larger scale at the University of Michigan and CERN
        The ATLAS Collaboratory Project at the University of Michigan has been a leader in the area of collaborative tools since 1999. Its activities include the development of standards, software and hardware tools for lecture archiving, and making recommendations for videoconferencing and remote teaching facilities. Starting in 2006 our group became involved in classroom recordings, and in early 2008 we spawned CARMA, a University-wide recording service. This service uses a new portable recording system that we developed. Capture, archiving and dissemination of rich multimedia content from lectures, tutorials and classes are increasingly widespread activities among universities and research institutes. A growing array of related commercial and open source technologies is becoming available, with several new products being introduced in the last couple years. As the result of a new close partnership between U-M and CERN IT, a market survey of these products is being conducted and will be presented. It will inform an ambitious effort in 2009 to equip many CERN rooms with automated lecture archiving systems, on a much larger scale than before. This new technology is being integrated with CERN’s existing webcast, CDS, and Indico applications.
        Speaker: Mr Jeremy Herr (U. of Michigan)
        Slides
      • 110
        Virtual Logbooks as a Tool for Enriching the Collaborative Experience in Large Scientific Projects
        A key feature of collaboration in large-scale scientific projects is keeping a log of what is being done and how - for private use and reuse and for sharing selected parts with collaborators and peers, often distributed geographically on an increasingly global scale. Even better if this log is automatic, created on the fly while a scientist or software developer is working in a habitual way, without the need for extra effort. The CAVES - Collaborative Analysis Versioning Environment System - and CODESH - COllaborative DEvelopment SHell - projects address this problem in a novel way. They build on the concepts of virtual state and virtual transition to enhance the collaborative experience by providing automatic persistent virtual logbooks. CAVES is designed for sessions of distributed data analysis using the popular ROOT framework, while CODESH generalizes the same approach to any type of work on the command line in typical UNIX shells like bash or tcsh. Repositories of sessions can be configured dynamically to record and make available the knowledge accumulated in the course of a scientific or software endeavor. Access can be controlled to define logbooks of private sessions or sessions shared within or between collaborating groups. As a typical use case we concentrate on building working scalable systems for the analysis of the petascale volumes of data expected with the start of the LHC experiments. Our approach is general enough to find applications in many scientific fields.
        Speaker: Dr Dimitri BOURILKOV (University of Florida)
      • 111
        EVO (Enabling Virtual Organizations)
        The EVO (Enabling Virtual Organizations) system is based on a new distributed and unique architecture, leveraging 10+ years of unique experience in developing and operating large distributed production-based collaboration systems. The primary objective is to provide to the High Energy and Nuclear Physics experiments a system/service that meets their unique requirements of usability, quality, scalability, reliability, and cost, necessary for nationally and globally distributed research organizations. The EVO system, officially released in June 2007, includes a better-integrated and more convenient user interface, a richer feature set including higher resolution video and instant messaging, greater adaptability to all platforms and operating systems, and higher overall operational efficiency and robustness. All of these aspects will be particularly important as we enter the startup period of the LHC, because the community will require an unprecedented level of daily collaboration. There will be intense demand for long-distance scheduled meetings, person-to-person communication, group-to-group discussions, broadcast meetings, workshops and continuous presence at important locations such as control rooms and experimental areas. The need to have the collaboration tools totally integrated in the physicists’ working environments will gain great importance. Beyond all these user features, another key enhancement is the collaboration infrastructure network created by EVO, which covers the entire globe and which is fully redundant and resilient to failure. The EVO infrastructure automatically adapts to the prevailing network configuration and status, so as to ensure that the collaboration service runs without disruption. Because we are able to monitor the end-user’s node, we are able to inform the user of any potential or arising problems (e.g. excessive CPU load or packet loss) and, where possible, to fix the problems automatically and transparently on behalf of the user (e.g. by switching to another server node in the network, by reducing the number of video streams received, et cetera). The integration of the MonALISA architecture into this new EVO architecture was an important step in the evolution of the service towards a globally distributed dynamic system that is largely autonomous. The EVO system is now the primary collaboration system used by the LHC and, more generally, by the High Energy and Nuclear Physics community going forward.
        Speaker: Philippe Galvez (California Institute of Technology (CALTECH))
        Slides
      • 112
        High Definition Videoconferencing for High Energy Physics
        We describe the use of professional-quality high-definition (HD) videoconferencing systems for daily HEP experiment operations and large-scale media events. For CMS operations at the Large Hadron Collider, we use such systems for permanently running "telepresence" communications between the CMS Control Room in France and major offline CMS Centres at CERN, DESY, and Fermilab, and with a number of smaller sites worldwide on an as-needed basis. We have also used HD systems for large-scale global media events, such as the LHC First Beam Day event on Sept. 10, 2008, the world's largest scientific press event since the moon landing. For such events, poor quality audio or video signals or equipment failure is simply not an option. We describe the systems we use today and our views on the future of HD videoconferencing and HD telepresence systems in High Energy Physics. We describe how high-quality, easy-to-use, extremely reliable videoconferencing systems may be established in a HEP environment at an affordable cost.
        Speaker: Dr Erik Gottschalk (Fermi National Accelerator Laboratory (FNAL))
    • Distributed Processing and Analysis: Monday Club C

      Club C

      Prague

      Prague Congress Centre 5. května 65, 140 00 Prague 4, Czech Republic
      Convener: Jerome Lauret (BNL)
      • 113
        Towards the 5th LHC VO: The LHC beam studies in the WLCG environment
        Recently, a growing number of applications have been quickly and successfully enabled on the Grid by the CERN Grid application support team. This allowed the applications to achieve and publish large-scale results in a short time, which would otherwise not have been possible. Examples of successful Grid applications include medical and particle physics simulation (Geant4, Garfield), satellite imaging and geographic information for humanitarian relief operations (UNOSAT), telecommunications (ITU), theoretical physics (Lattice QCD, Feynman-loop evaluation), bio-informatics (Avian Flu Data Challenge), and commercial image processing and classification (Imense Ltd.). Based on this successful experience, and that of the 4 LHC VOs, the LHC beam team has decided to run their tracking and collimation applications in the WLCG environment. The large number of jobs, the level of service and the performance requirements, as well as the importance of tracking applications for the four LHC experiments, make the LHC beam community a candidate for the 5th LHC VO. In this talk we present the procedures, tools and services used for enabling the tracking applications in the WLCG environment. We also study the experience of running the LHC tracking applications on the Grid. We draw analogies with the problems that ITER will face in the future in establishing a collaboration within the Grid community and making successful use of Grid resources.
        Speakers: Dr Jakub Moscicki (CERN IT/GS), Dr Patricia Mendez Lorenzo (CERN IT/GS)
        Slides
      • 114
        CMS FileMover: One Click Data
        The CMS experiment has a distributed computing model, supporting thousands of physicists at hundreds of sites around the world. While this is a suitable solution for "day-to-day" work in the LHC era, there are edge use cases that Grid solutions do not satisfy. Occasionally it is desirable to have direct access to a file on a user's desktop or laptop for code development, debugging or examining event displays. We have developed a user-friendly, web-based tool that bridges the gap between the large-scale Grid resources and the smaller, simpler user edge cases. We discuss the development and integration of this new component with existing CMS and Grid services, as well as the constraints we have put in place to prevent misuse. We also explore possible future developments which could turn the current service into a general "low-latency" event delivery service.
        Speaker: Valentin Kuznetsov (Cornell University)
        Slides
      • 115
        Ganga: User-friendly Grid job submission and management tool for LHC and beyond
        Ganga has been widely used for several years in ATLAS, LHCb and a handful of other communities in the context of the EGEE project. Ganga provides a simple yet powerful interface for submitting and managing jobs on a variety of computing backends. The tool helps users configure applications and keep track of their work. With the major release of version 5 in summer 2008, Ganga's main user-friendly features have been strengthened. A new configuration interface, enhanced support for job collections, bulk operations and easier access to subjobs are just a few examples. In addition to the traditional batch and Grid backends such as Condor, LSF, PBS and gLite/EDG, point-to-point job execution via ssh on remote machines is now supported. Ganga is used as an interactive job submission interface for end-users and also as a job submission component for higher-level tools. For example, GangaRobot is used to perform automated, end-to-end testing of the HEP data analysis chain on the Grid. Ganga comes with an extensive test suite covering more than 350 test cases. The development model involves all active developers in the release management shifts, which is an important and novel approach for distributed software collaborations. Ganga 5 is a mature, stable and widely-used tool with long-term support from the HEP community.
        Speaker: Dr Daniel van der Ster (CERN)
        Slides
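        A short sketch of the kind of interactive session described above, assuming the Ganga shell where the Job, Executable and backend objects are predefined; the executable, its arguments and the backend choice are placeholders.

          # Inside the Ganga interactive shell (the objects below are provided by Ganga).
          j = Job()
          j.name = "hello-grid"
          j.application = Executable(exe="/bin/echo", args=["hello from the grid"])
          j.backend = LCG()   # or Local(), LSF(), Condor(), ... depending on the site
          j.submit()

          jobs                # the job registry: list all jobs and their status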
      • 116
        Babar Task Manager II
        The Babar experiment produced one of the largest datasets in high energy physics. To provide for many different concurrent analyses, the data are skimmed into many data streams before analysis can begin, multiplying the size of the dataset both in terms of bytes and number of files. To address this large-scale problem of job management and data control, the Babar Task Manager system was developed. The system proved unable to scale to the size of the problem, and it was desired to distribute the production to many sites and use grid resources to help. A development effort was started, and the Task Manager II was the result. It has now been in production for over a year in Babar, has produced several skim cycles of data at multiple computing centers, and was able to use grid resources. The structure of the system will be presented, along with details on scalability in the number of jobs and on the use of remote sites both with and without grid resources.
        Speaker: Dr Douglas Smith (STANFORD LINEAR ACCELERATOR CENTER)
        Slides
      • 117
        Scalla/xrootd WAN globalization tools: where we are.
        The Scalla/Xrootd software suite is a set of tools and suggested methods useful for building scalable, fault-tolerant and high-performance storage systems for POSIX-like data access. One of the most important recent development efforts is to implement technologies able to deal with the characteristics of Wide Area Networks, and to find solutions that allow data analysis applications to directly access remote data repositories in an efficient way. This contribution describes the current status of the various features and mechanisms implemented in the Scalla/Xrootd software suite which allow the creation of, and efficient access to, 'global' data repositories obtained by aggregating multiple sites through Wide Area Networks. One of these mechanisms is the ability of the clients to efficiently exploit high-latency, high-throughput WANs and access remote repositories in read/write mode for analysis-like tasks. We will also discuss the possibilities of making distant data sub-repositories cooperate. The aim is to give a unique view of their content, and eventually to allow external systems to coordinate and trigger data movements among them. Experience in using Scalla/Xrootd remote data repositories will also be reported.
        Speaker: Dr Fabrizio Furano (Conseil Europeen Recherche Nucl. (CERN))
        Slides
      • 118
        Reprocessing LHC beam and cosmic ray data with the ATLAS distributed Production System
        We present our experience with distributed reprocessing of the LHC beam and cosmic ray data taken with the ATLAS detector during 2008/2009. Raw data were distributed from CERN to ATLAS Tier-1 centers, reprocessed and validated. The reconstructed data were consolidated at CERN and ten WLCG ATLAS Tier-1 centers and made available for physics analysis. The reprocessing was done simultaneously in more than 30 centers using the ATLAS Production System. Several challenging issues were solved, such as scalable access to ATLAS conditions and calibration data, bulk data pre-staging and data distribution in quasi-real-time mode. We also describe the ATLAS distributed production system running at 70 universities and laboratories in Asia, Europe, North America and the Pacific region, with automatic task submission, control and aggregation of results at Tier-1 centers.
        Speaker: Dr Alexei Klimentov (BNL)
        Slides
    • Event Processing: Monday Club E

      Club E

      Prague

      Prague Congress Centre 5. května 65, 140 00 Prague 4, Czech Republic
      Convener: Chris Jones (FNAL)
      • 119
        CMS Software Performance Strategies
        Performance of an experiment's simulation, reconstruction and analysis software is of critical importance to physics competitiveness and making optimum use of the available budget. In the last 18 months the performance improvement program in the CMS experiment has produced more than a ten-fold gain in reconstruction performance alone, a significant reduction in mass storage system load, a reduction in memory consumption and a variety of other gains. We present our application performance analysis methods and our techniques for higher performance memory management, I/O, data persistency, software packaging, code generation, as well as how to reduce total memory usage. We report on specific gains achieved and the main contributing causes. We discuss our estimate of future achievable gains and promising new tools and analysis methods.
        Speaker: Dr Peter Elmer (PRINCETON UNIVERSITY)
        Slides
      • 120
        HEP C++ meets reality -- lessons and tips
        In 2007 the CMS experiment first reported some initial findings on the impedance mismatch between HEP use of C++ and the current generation of compilers and CPUs. Since then we have continued our analysis of the CMS experiment code base, including the external packages we use. We have found that large amounts of C++ code have been written largely ignoring the physical reality of the resulting machine code and run-time execution costs, including and especially software developed by experts. We report on a wide range of issues affecting typical high energy physics code, in the form of coding pattern - impact - lesson - improvement.
        Speaker: Mr Giulio Eulisse (NORTHEASTERN UNIVERSITY OF BOSTON (MA) U.S.A.)
        Slides
      • 121
        The ATLAS Simulation Validation and computing performance studies
        The ATLAS Simulation validation project is done in two distinct phases. The first one is the computing validation, the second being the physics performance that must be tested and compared to available data. The infrastructure needed at each stage of validation is described here. In ATLAS, software development is controlled by nightly builds to check stability and performance. The complete computing performance of the simulation is tested through three types of tests: ATLAS Nightly Tests (ATN), Real Time Tests (RTT) and Full Chain Tests (FCT), each test being responsible for a different level of validation. In this report, tests of robustness, benchmarks of computing performance and tests of basic functionality are described. In addition to the automatic tests, computing time, memory consumption, and output file size are benchmarked in each stable release for a variety of processes, both simple and complex. Single muons, electrons, and charged pions are used, as well as dijets in bins of leading parton pT, the Supersymmetric benchmark point three (SU3), minimum bias, Higgs boson decaying to four leptons, Z → e+e−, Z → µ+µ−, and Z → τ+τ− events.
        Speaker: Zachary Marshall (Caltech, USA & Columbia University, USA)
        Slides
      • 122
        The Virtual Point 1 Event Display for the ATLAS Experiment
        We present an event display for the ATLAS Experiment, called Virtual Point 1 (VP1), designed initially for deployment at Point 1 of the LHC, the location of the ATLAS detector. The Qt/OpenGL based application provides truthful and interactive 3D representations of both event and non-event data, and now serves a general-purpose role within the experiment. Thus, VP1 is used in both online environments (in the control room itself or remotely via a special "live" mode) and offline environments to provide fast debugging and understanding of events, detector status and software. In addition to a flexible plugin infrastructure and a high level of configurability, this multi-purpose role is mainly facilitated by the application being embedded directly in the ATLAS offline software framework, enabling it to use the native Event Data Model directly and thus to run on any source of ATLAS data, or even directly from within e.g. reconstruction jobs. Finally, VP1 provides high-quality pictures and movies, useful for outreach purposes.
        Speaker: Dr Thomas Kittelmann (University of Pittsburgh)
        Slides
      • 123
        Fireworks: A Physics Event Display for CMS
        Fireworks is a CMS event display which is specialized for the physics studies case. This specialization allows the use of a stylized rather than 3D-accurate representation where appropriate. Data handling is greatly simplified by using only reconstructed information and an ideal geometry. Fireworks provides an easy-to-use interface which allows a physicist to concentrate only on the data in which they are interested. Data are presented via graphical and textual views. Cross-view data interpretation is easy since the same object is shown using the same color in all views, and if the object is selected it is highlighted in all views. Objects which have been selected can be further studied by displaying a detailed view of just that object. Physicists can select which events (e.g. require a high-energy muon), which data (e.g. which track list) and which items in a collection (e.g. only high-pt tracks) to show. Once physicists have configured Fireworks to their liking they can save the configuration. Fireworks is built using the Eve subsystem of the CERN ROOT project and CMS's FWLite project. The FWLite project was part of CMS's recent code redesign which separates data classes into libraries separate from the algorithms producing the data and uses ROOT directly for C++ object storage, thereby allowing the data classes to be used directly in ROOT.
        Speaker: Kovalskyi Dmytro (University of California, Santa Barbara)
        Slides
      • 124
        Validation of software releases for CMS
        The CMS software stack currently consists of more than 2 million lines of code developed by over 250 authors, with a new version being released every week. CMS has set up a central release validation process for quality assurance which enables the developers to compare the performance to previous releases and references. This process provides the developers with reconstructed datasets of real data and MC samples. The samples span the whole range of detector effects and important physics signatures to benchmark the performance of the software. They are used to investigate interdependency effects of software packages and to find and fix bugs. This talk will describe the composition of the Release Validation sample sets and list the development groups who requested and use these samples. It especially points out the difficulties of composing coherent sample sets from the various requests for release validation samples. All samples have to fit within the available resource constraints. This is achieved by exploiting synergies between the different requester use cases and sample requests. Common to all use cases are the event processing workflows used to produce the samples. They are modified compared to the production workflows to be better suited for validation and are described in more detail. Overall, the talk will emphasize the importance of a central release validation process for projects with a large code base and a significant number of developers. It will summarize the extent and impact of the 2008 release validation sample production and can serve as an example for future projects.
        Speaker: Oliver Gutsche (FERMILAB)
        Slides
    • Grid Middleware and Networking Technologies: Monday Panorama

      Panorama

      Prague

      Prague Congress Centre 5. května 65, 140 00 Prague 4, Czech Republic
      • 125
        The UK particle physics Grid - status and developments
        During 2008 we have seen several notable changes in the way the LHC experiments have tried to tackle outstanding gaps in the implementation of their computing models. The development of space tokens and changes in job submission and data movement tools are key examples. The first section of this paper will review these changes and the technical/configuration impacts they have had at the site level across the GridPP sites. The second section will look in more detail at challenges that have been faced by the RAL Tier-1 site, and in particular work that has been done to improve the resilience and stability of core services. The third section of the paper will examine recent required changes in the operational model across the UK Tier-1 and Tier-2s as the focus has shifted to better supporting users and understanding how the user view of services differs from that of the infrastructure provider. This will be tackled through the use of several use cases which highlight common problems that still need to be overcome. The fourth and final section of the paper will present an analysis of GridPP metrics used within the project to assess progress, problems and issues.
        Speaker: Dr Jeremy Coles (University of Cambridge - GridPP)
        Slides
      • 126
        ITIL and Grid services at GridKa
        Offering sustainable Grid services to users and other computing centres is the main aim of GridKa, the German Tier-1 centre of the WLCG infrastructure. The availability and reliability of IT services directly influences the customers’ satisfaction as well as the reputation of the service provider, not to mention the economic aspects. It is thus important to concentrate on processes and tools that increase the availability and reliability of IT services. At the German Tier-1 centre GridKa a special working group for ITIL processes exists. This group is responsible for the management of all the IT services offered by the institute. ITIL is a standardized and process-oriented description of the management of IT services. The ITIL model itself consists of several processes. We will show the different ITIL processes, such as Incident, Problem, Change and Configuration Management, and how they are organized at GridKa. The special roles and a list of the tools which are implemented at GridKa to support the customers and the internal staff members will be presented. A special focus will be the distinction between the view from outside and from inside the Steinbuch Centre for Computing and the consequences of this distinction for the ITIL processes.
        Speaker: Tobias Koenig (Karlsruhe Institute of Technology (KIT))
        Slides
      • 127
        Advances in Grid Operations
        A review of the evolution of WLCG/EGEE grid operations. Authors: Maria BARROSO, Diana BOSIO, David COLLADOS, Maria DIMOU, Antonio RETICO, John SHADE, Nick THACKRAY, Steve TRAYLEN, Romain WARTEL. As the EGEE grid infrastructure continues to grow in size, complexity and usage, the task of ensuring the continued, uninterrupted availability of the grid services to the ever-increasing number of user communities becomes more and more challenging. In addition, it is clear that these challenges will only increase with the significant ramp-up, in 2009, of data taking at the Large Hadron Collider, whose main experiments are, through the WLCG service, by far the largest users of the EGEE grid infrastructure. In this paper we discuss the ways in which the processes and tools of grid operations have been appraised and enhanced over the last 18 months in order to meet these challenges without any increase in the size of the team, while at the same time improving the overall level of service that the users experience when using the grid infrastructure. The improvements to the operations procedures and tools include: enhancements to the middleware lifecycle processes; improvements to operations communications channels (both to VOs and to sites); strategies to raise the availability and reliability of sites; improvements in the level of service supplied by the central grid operations tools; improvements to the robustness of core middleware services; enhancements to the handling of trouble tickets; the sharing of best practices; and others. These points are then brought together to describe how the central grid operations team has learned valuable lessons through the day-to-day experience of operating the infrastructure and how operations has evolved as a result. In the last part of the paper, we will examine future plans for further improvements in grid operations, including how we will deal with the unavoidable reduction in the level of effort available for grid operations, as the funding for EGEE comes to an end in early 2010, just as the use of the grid by the LHC experiments will dramatically increase.
        Speakers: Ms Maite Barroso (CERN), Nicholas Thackray (CERN)
        Slides
      • 128
        A Business Model for the Establishment of the European Grid Infrastructure
        International research collaborations increasingly require secure sharing of resources owned by the partner organizations and distributed among different administration domains. Examples of resources include data, computing facilities (commodity computer clusters, HPC systems, etc.), storage space, metadata from remote archives, scientific instruments, sensors, etc. Sharing is made possible via Grid middleware, i.e. software services exposing a uniform interface regardless of the specific fabric-layer resource properties, providing access according to user role and in full compliance with the policies defined by the resource owners. The Grid Infrastructure consists of: distributed resources – funded and owned by national and local resource providers – with their respective usage policies; interoperable middleware services installed and operated by resource providers; the Grid middleware distribution and the testbeds for its certification and integration; the Grid operations, including authentication, authorization, monitoring and accounting; and, finally, user and application support. The European project EGI_DS brings about the creation of a new organizational model, capable of fulfilling the vision of a sustainable European Grid infrastructure for e-Science. The European Grid Initiative (EGI) is the proposed framework which links seamlessly, at a world-wide level, the European national e-Infrastructures operated by the National Grid Initiatives, based on a European Unified Middleware Distribution (UMD) which will be the result of a joint effort of various European Grid middleware consortia. This paper describes the actors contributing to the foundation of the European Grid infrastructure, and the use cases, the mission, the purpose, the offering, and the organizational structure which constitute the EGI business model.
        Speakers: Laura Perini (INFN Milano), Tiziana Ferrari (INFN CNAF)
        Slides
      • 129
        Analysis of the Use, Value and Upcoming Challenges for the Open Science Grid
        The Open Science Grid usage has ramped up more than 25% in the past twelve months, due both to the increase in throughput of the core stakeholders – US LHC, LIGO and Run II – and to an increase in usage by non-physics communities. We present and analyze this ramp-up together with the issues encountered and the implications for the future. It is important to understand the value of collaborative projects such as the OSG in contributing to the scientific community. This needs to be cognizant of the environment of commercial cloud offerings, the evolving and maturing middleware for grid-based distributed computing, and the evolution of science and research dependence on computation. We present a first categorization of OSG value and an analysis across several different aspects of the Consortium’s goals and activities. And last, but not least, we analyze the upcoming challenges of the LHC data analysis ramp-up and our ongoing contributions to the Worldwide LHC Computing Grid.
        Speaker: Mrs Ruth Pordes (FERMILAB)
        Slides
      • 130
        CDF way to Grid
        The CDF II experiment has been taking data at FNAL since 2001. The CDF computing architecture has evolved from initially using dedicated computing farms to using decentralized Grid-based resources on the EGEE grid, Open Science Grid and FNAL Campus grid. In order to deliver high quality physics results in a timely manner to a running experiment, CDF has had to adapt to the Grid with minimal impact on the physicists analyzing the data. The use of portals to access the computing resources has allowed CDF to migrate to Grid computing without changing how the users work. The infrastructure modifications were done in small steps over several years. CDF started from the glidein concept, i.e. submitting Condor-based pilot jobs to the Grid, with the first pilot-based pool set up in 2005 at the CNAF Tier 1 site in Italy, followed shortly by similar pools in North America, Europe and Asia. This pilot job model evolved in OSG into the PanDA submission model of ATLAS and the glideinWMS of CMS, and has recently also been integrated into the CDF infrastructure. In order to access LCG/EGEE resources using the gLite middleware, the CDF middleware has been reimplemented in LcgCAF, a dedicated portal. The evolution of the architecture together with the performance reached by the two portals will be discussed.
        Speaker: Dr Donatella Lucchesi (University and INFN Padova)
        Slides
    • Online Computing: Monday Club D

      Club D

      Prague

      Prague Congress Centre 5. května 65, 140 00 Prague 4, Czech Republic

      Sponsored by ACEOLE

      Convener: Wainer Vandelli (INFN)
      • 131
        CMS Data Acquisition System Software
        The CMS data acquisition system is made of two major subsystems: event building and event filter. This paper describes the architecture and design of the software that processes the data flow in the currently operating experiment. The central DAQ system relies heavily on industry standard networks and processing equipment. Adopting a single software infrastructure in all subsystems of the experiment imposes, however, a number of different requirements. High efficiency and configuration flexibility are among the most important ones. The XDAQ software infrastructure has matured over an eight-year development and testing period and has been shown to cope well with the CMS requirements. We provide performance figures and report on the initial experience with the system at hand.
        Speaker: Dr Johannes Gutleber (CERN)
        Slides
      • 132
        The ATLAS Online High Level Trigger Framework: Experience reusing Offline Software Components in the ATLAS Trigger
        Event selection in the ATLAS High Level Trigger is accomplished to a large extent by reusing software components and event selection algorithms developed and tested in an offline environment. Many of these offline software modules are not specifically designed to run in a heavily multi-threaded online data flow environment. The ATLAS High Level Trigger (HLT) framework, based on the GAUDI and ATLAS ATHENA frameworks, forms the interface layer which allows the execution of the HLT selection and monitoring code within the online run control and dataflow software. While such an approach provides a unified environment for trigger event selection across all of ATLAS, it also poses strict requirements on the reused software components in terms of performance, memory usage and stability. Experience of running the HLT selection software in the different environments, and especially on large multi-node trigger farms, has been gained in several commissioning periods using preloaded Monte Carlo events, in data taking periods with cosmic events and in a short period with proton beams from the LHC. The contribution discusses the architectural aspects of the HLT framework, its performance and its software environment within the ATLAS computing, trigger and data flow projects. Emphasis is also put on the architectural implications for the software of the use of multi-core processors in the computing farms and on the experience gained with multi-threading and multi-process technologies.
        Speaker: Werner Wiedenmann (University of Wisconsin)
        Slides
      • 133
        A common real time framework for SuperKEKB and Hyper Suprime-Cam at Subaru telescope
        Real-time data analysis at next-generation experiments is a challenge because of their enormous data rate and size. The SuperKEKB experiment, the upgraded Belle experiment, will have to process data volumes 100 times larger than the current ones, taken at 10 kHz. Offline-level data analysis in the HLT farm is necessary for efficient data reduction. The real-time processing of huge data volumes is also key for the planned dark energy survey using the Subaru telescope. The main camera for the survey, called Hyper Suprime-Cam, consists of 100 CCDs with 8 mega pixels each, and the total data size is expected to become comparable with that of SuperKEKB. Online tuning of measurement parameters through real-time processing is being planned; in the past this was done empirically. We have started a joint development of the real-time framework to be shared by SuperKEKB and Hyper Suprime-Cam. The parallel processing technique is widely adopted in the framework design to utilize a huge number of network-connected PCs with multi-core CPUs. The parallel processing is performed not only in the trivial event-by-event manner, but also in a pipeline of software modules which are dynamically placed over the distributed computing nodes. The object data flow in the framework is realized by object serialization with object persistence. On-the-fly collection of histograms and N-tuples is supported for run-time data monitoring. The detailed design and the development status of the framework are presented (an illustrative sketch of the pipeline idea follows below).
        Speaker: Mr SooHyung Lee (Korea Univ.)
        Slides
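        The following is a minimal, illustrative C++ sketch of the pipeline idea described in the abstract above: software modules run as parallel stages connected by thread-safe queues, with several identical workers providing the trivial event-by-event parallelism. All type and module names are invented for illustration and are not part of the SuperKEKB/Hyper Suprime-Cam framework.

        // Illustrative sketch only: a minimal event pipeline where software modules
        // run in parallel stages connected by thread-safe queues.
        #include <condition_variable>
        #include <iostream>
        #include <mutex>
        #include <queue>
        #include <thread>
        #include <utility>
        #include <vector>

        struct Event { int id; double energy; };   // stand-in for a serialized event object

        template <typename T>
        class BoundedQueue {                        // hands events from one stage to the next
        public:
            void push(T v) {
                std::lock_guard<std::mutex> lk(m_);
                q_.push(std::move(v));
                cv_.notify_one();
            }
            bool pop(T& v) {                        // returns false once closed and drained
                std::unique_lock<std::mutex> lk(m_);
                cv_.wait(lk, [&]{ return !q_.empty() || closed_; });
                if (q_.empty()) return false;
                v = std::move(q_.front());
                q_.pop();
                return true;
            }
            void close() {
                std::lock_guard<std::mutex> lk(m_);
                closed_ = true;
                cv_.notify_all();
            }
        private:
            std::queue<T> q_;
            std::mutex m_;
            std::condition_variable cv_;
            bool closed_ = false;
        };

        int main() {
            BoundedQueue<Event> raw, calibrated;

            // Stage 1: "unpacking" module producing events (stand-in for the DAQ input).
            std::thread source([&]{
                for (int i = 0; i < 100; ++i) raw.push(Event{i, 1.0 * i});
                raw.close();
            });

            // Stage 2: "calibration" module; several identical workers share one queue.
            std::vector<std::thread> workers;
            for (int w = 0; w < 4; ++w)
                workers.emplace_back([&]{
                    Event e;
                    while (raw.pop(e)) { e.energy *= 1.05; calibrated.push(e); }
                });

            // Stage 3: "monitoring" module filling a histogram-like counter.
            std::thread sink([&]{
                Event e; int n = 0; double sum = 0;
                while (calibrated.pop(e)) { ++n; sum += e.energy; }
                std::cout << "processed " << n << " events, mean energy " << sum / n << "\n";
            });

            source.join();
            for (auto& t : workers) t.join();
            calibrated.close();
            sink.join();
        }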
      • 134
        The LHCb Run Control
        LHCb has designed and implemented an integrated Experiment Control System. The Control System uses the same concepts and the same tools to control and monitor all parts of the experiment: the Data Acquisition System, the Timing and the Trigger Systems, the High Level Trigger Farm, the Detector Control System, the Experiment's Infrastructure and the interaction with the CERN Technical Services and the Accelerator. LHCb's Run Control, the main interface used by the experiment's operator, provides access in a hierarchical, coherent and homogeneous manner to all areas of the experiment and to all its sub-detectors. It allows for automated (or manual) configuration and control, including error recovery, of the full experiment in its different running modes: physics, cosmics, calibration, etc. Different instances of the same Run Control interface are used by the various sub-detectors for their stand-alone activities: test runs, calibration runs, etc. The architecture and the tools used to build the control system, the guidelines and components provided to the developers, as well as the first experience with the usage of the Run Control will be presented.
        Speaker: Dr Clara Gaspar (CERN)
        Slides
      • 135
        The ALICE Online-Offline Framework for the Extraction of Conditions Data
        The ALICE experiment is the dedicated heavy-ion experiment at the CERN LHC and will take data with a bandwidth of up to 1.25 GB/s. It consists of 18 subdetectors that interact with five online systems (DAQ, DCS, ECS, HLT and Trigger). Data recorded are read out by DAQ in a raw data stream produced by the subdetectors. In addition the subdetectors produce conditions data derived from the raw data, i.e. calibration and alignment information, which have to be available from the beginning of the reconstruction and therefore cannot be included in the raw data. The extraction of the conditions data is steered by a system called Shuttle. It provides the link between data produced by the subdetectors in the online systems** and a dedicated procedure per subdetector, called preprocessor, that runs in the Shuttle system. The preprocessor performs merging, consolidation and reformatting of the data. Finally, it stores the data in the Grid Offline Conditions Data Base (OCDB) so that they are available for the Offline reconstruction. The reconstruction of a given run is initiated automatically once the raw data are successfully exported to the Grid storage and the run has been processed in the Shuttle framework. During data taking, a so-called quasi-online reconstruction is performed using the reduced set of conditions data that is already available during the current run. The talk introduces the quasi-online reconstruction strategy within the ALICE online-offline framework, i.e. the Shuttle system. The performance of such a complex system during the ALICE cosmics commissioning and LHC startup is described. Special emphasis is given to operational issues and feedback received. Operational statistics and remaining open issues are presented (an illustrative preprocessor sketch follows below). ** Processing in the ALICE DAQ is discussed in a separate talk
        Speaker: Ms Chiara Zampolli (CERN)
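        As an illustration of the per-subdetector preprocessor idea described in the abstract above, the C++ sketch below shows a small, invented interface: the framework initializes the preprocessor for a run, hands it the online data, and stores the merged result. None of the class or method names are the actual ALICE Shuttle API; they are assumptions made for the example.

        // Illustrative sketch only: the per-subdetector "preprocessor" idea, reduced to
        // a small interface. All names are invented and are not the ALICE Shuttle API.
        #include <iostream>
        #include <map>
        #include <string>
        #include <vector>

        // Conditions object produced by a preprocessor, destined for the conditions database.
        struct ConditionsObject {
            std::string path;                          // hypothetical storage path
            std::map<std::string, double> values;
        };

        // Interface each subdetector implements; the framework drives it once per run.
        class Preprocessor {
        public:
            virtual ~Preprocessor() = default;
            virtual void initialize(int run) = 0;
            // Receives the raw monitoring data gathered online and returns the
            // merged/reformatted conditions to be stored offline.
            virtual ConditionsObject process(const std::vector<double>& onlineData) = 0;
        };

        // Example subdetector preprocessor: merges online samples into a single mean value.
        class ExamplePedestalPreprocessor : public Preprocessor {
        public:
            void initialize(int run) override { run_ = run; }
            ConditionsObject process(const std::vector<double>& onlineData) override {
                double sum = 0;
                for (double v : onlineData) sum += v;
                ConditionsObject obj;
                obj.path = "EXAMPLE/Calib/Pedestals";
                obj.values["mean"] = onlineData.empty() ? 0.0 : sum / onlineData.size();
                obj.values["run"]  = run_;
                return obj;
            }
        private:
            int run_ = 0;
        };

        // Stand-in for the framework: run the preprocessor, then "store" the result.
        int main() {
            ExamplePedestalPreprocessor prep;
            prep.initialize(/*run=*/12345);
            ConditionsObject obj = prep.process({10.1, 9.8, 10.3, 10.0});
            std::cout << "storing " << obj.path << ": mean=" << obj.values["mean"] << "\n";
        }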
      • 136
        The DZERO Level 3 Trigger and DAQ System
        The DZERO Level 3 Trigger and data acquisition system has been running successfully since March 2001, taking data for the DZERO experiment located at the Tevatron at the Fermi National Accelerator Laboratory. Based on commodity parts, it reads out 65 VME front-end crates and delivers the 250 MB of data to one of 1200 processing cores for a high-level trigger decision at a rate of 1 kHz. Accepted events are then shipped to the DZERO online system where they are written to tape. The design is still relatively modern – all data pathways are based on TCP/IP and all components, from the single board computers in the readout crates to the Level 3 trigger farm, are based on commodity items. All parts except for the central network switch have been upgraded during the lifetime of the system. This paper will discuss the performance – particularly as the Tevatron has continued to increase its peak luminosity – and the lessons learned during the upgrade of both the farms and the front-end readout crate processors. We will also discuss the continued evolution of the automatic program that repairs common problems in the DAQ system.
        Speaker: Prof. Gordon Watts (UNIVERSITY OF WASHINGTON)
        Slides
    • Software Components, Tools and Databases: Monday Club A

      Club A

      Prague

      Prague Congress Centre 5. května 65, 140 00 Prague 4, Czech Republic
      Convener: Julius Hrivnac (LAL, Orsay)
      • 137
        The CMS Offline condition database software system
        Non-event data describing detector conditions change with time and come from different data sources. They are accessible by physicists within the offline event-processing applications for precise calibration of reconstructed data as well as for data-quality control purposes. Over the past three years CMS has developed and deployed a software system managing such data. Object-relational mapping and the relational abstraction layer of the LHC persistency framework are the foundation; the offline conditions framework updates and delivers C++ data objects according to their validity. A high-level tag versioning system allows production managers to organize data in a hierarchical view. A scripting API in Python, command-line tools and a web service serve physicists in their daily work. A mini-framework is available for handling data coming from external sources. Efficient data distribution over the worldwide network is guaranteed by a system of hierarchical web caches. The system has been tested and used in all major productions, test-beams and cosmic runs.
        Speaker: Dr Zhen Xie (Princeton University)
        Slides
      • 138
        Advanced Technologies for Scalable ATLAS Conditions Database Access on the Grid
        During massive data reprocessing operations an ATLAS Conditions Database application must support concurrent access from numerous ATLAS data processing jobs running on the Grid. By simulating realistic workflow, ATLAS database scalability tests provided feedback for Conditions DB software optimization and allowed precise determination of the required distributed database resources. In distributed data processing one must take into account the chaotic nature of Grid computing, characterized by peak loads that can be much higher than average access rates. To validate database performance at peak loads, we tested database scalability at very high concurrent job rates. This has been achieved through coordinated database stress tests performed in a series of ATLAS reprocessing exercises at the Tier-1 sites. The goal of database stress tests is to detect the scalability limits of the hardware deployed at the Tier-1 sites, so that server overload conditions can be safely avoided in a production environment. Our analysis of server performance under stress tests indicates that Conditions DB data access is limited by the disk I/O throughput. An unacceptable side-effect of the disk I/O saturation is a degradation of the WLCG 3D Services that update Conditions DB data at all ten ATLAS Tier-1 sites using the technology of Oracle Streams. To avoid such bottlenecks we prototyped and tested a novel approach for database peak load avoidance in Grid computing. Our approach is based upon the proven idea of “pilot” job submission on the Grid: instead of the actual query, the ATLAS utility library first sends a “pilot” query to the database server (a sketch of this idea follows below).
        Speakers: Alexandre Vaniachine (Argonne), Rodney Walker (LMU Munich)
        Slides
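        A minimal sketch of the "pilot query" idea described above, assuming nothing about the actual ATLAS utility library: a cheap query is issued first, and the heavy conditions query is only sent when the measured latency suggests the server is not overloaded; otherwise the client backs off and retries. Function names, thresholds and timings are invented for illustration.

        // Illustrative sketch only: "pilot query" throttling to avoid database peak loads.
        #include <chrono>
        #include <iostream>
        #include <string>
        #include <thread>

        using namespace std::chrono;

        // Stand-in for whatever database client library is actually used: pretend to
        // execute a query and report how long the server took to answer.
        milliseconds runQuery(const std::string& sql) {
            (void)sql;                                            // a real client would send `sql`
            auto start = steady_clock::now();
            std::this_thread::sleep_for(milliseconds(20));        // simulated server latency
            return duration_cast<milliseconds>(steady_clock::now() - start);
        }

        // Send a cheap pilot query first; only issue the heavy conditions query when the
        // measured latency suggests the server is not overloaded, otherwise back off.
        bool fetchConditionsWithPilot(const std::string& heavyQuery) {
            const milliseconds overloadThreshold(500);
            const int maxAttempts = 5;
            for (int attempt = 1; attempt <= maxAttempts; ++attempt) {
                milliseconds pilotLatency = runQuery("SELECT 1");  // lightweight pilot
                if (pilotLatency < overloadThreshold) {
                    runQuery(heavyQuery);                          // server looks healthy
                    return true;
                }
                std::cerr << "server busy (pilot took " << pilotLatency.count()
                          << " ms), backing off, attempt " << attempt << "\n";
                std::this_thread::sleep_for(seconds(attempt * 30));
            }
            return false;                                          // give up; the job can retry later
        }

        int main() {
            if (!fetchConditionsWithPilot("SELECT * FROM conditions WHERE iov = 42"))
                std::cerr << "could not fetch conditions data\n";
        }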
      • 139
        LCG Persistency Framework (POOL, CORAL, COOL) - Status and Outlook
        The LCG Persistency Framework consists of three software packages (POOL, CORAL and COOL) that address the data access requirements of the LHC experiments in several different areas. The project is the result of the collaboration between the CERN IT Department and the three experiments (ATLAS, CMS and LHCb) that are using some or all of the Persistency Framework components to access their data. The POOL package is a hybrid technology store for C++ objects, using a mixture of streaming and relational technologies to implement both object persistency and object metadata catalogs and collections. POOL provides generic components that can be used by the experiments to store both their event data and their conditions data. The CORAL package is an SQL-free abstraction layer for accessing data stored using relational database technologies. It is used directly by experiment-specific applications and internally by both COOL and POOL. The COOL package provides specific software components and tools for the handling of the time variation and versioning of the experiment conditions data. This presentation will report on the status and outlook of developments in each of the three sub-projects. It will also briefly review the usage and deployment models for these software packages in the three LHC experiments contributing to their development.
        Speaker: Andrea Valassi (CERN)
        Slides
      • 140
        Distributed Database Services - a Fundamental Component of the WLCG Service for the LHC Experiments - Experience and Outlook
        Originally deployed at CERN for the construction of LEP, relational databases now play a key role in the experiments' production chains, from online acquisition through to offline production, data distribution, reprocessing and analysis. They are also a fundamental building block for the Tier0 and Tier1 data management services. We summarize the key requirements in terms of availability, performance and scalability and explain the primary solutions that have been deployed both on- and off-line, at CERN and outside, to meet these requirements. We describe how the distributed database services deployed in the Worldwide LHC Computing Grid have met the challenges of 2008 - the two phases of CCRC'08, together with data taking from cosmic rays and the short period of LHC operation. Finally, we list the areas - both in terms of the baseline services as well as key applications and data life cycle - where enhancements have been required for 2009 and summarize the experience gained from 2009 data taking readiness testing - aka "CCRC'09" - together with a prognosis for 2009 data taking.
        Speaker: Dr Maria Girone (CERN)
        Slides
      • 141
        CORAL server: a middle tier for accessing relational database servers from CORAL applications
        The CORAL package is the CERN LCG Persistency Framework common relational database abstraction layer for accessing the data of the LHC experiments that is stored using relational database technologies. A traditional two-tier client-server model is presently used by most CORAL applications accessing relational database servers such as Oracle, MySQL and SQLite. A different model, involving a middle-tier server solution deployed close to the database servers, has recently been discussed. This would provide several advantages over the simple client-server model in the areas of security (authentication via proxy certificates) and of scalability and performance (multiplexing of several incoming connections, etc.). Data caching is also provided by a "proxy server" component deployed close to the database users. A joint development of such a middle tier (CERN, SLAC), known as 'CORAL server', is ongoing. This presentation will report on the status and outlook of the developments, solutions and test results for the new software components relevant to this project.
        Speaker: Dr Andrea Valassi (CERN)
        Slides
      • 142
        An Integrated Overview of Metadata in ATLAS
        Metadata--data about data--arise in many contexts, from many diverse sources, and at many levels in ATLAS. Familiar examples include run-level, luminosity-block-level, and event-level metadata, and, related to processing and organization, dataset-level and file-level metadata, but these categories are neither exhaustive nor orthogonal. Some metadata are known a priori, in advance of data taking or simulation; other metadata are known only after processing--and occasionally, quite late (e.g., detector status or quality updates that may appear after Tier 0 reconstruction is complete). Metadata that may seem relevant only internally to the distributed computing infrastructure under ordinary conditions may become relevant to physics analysis under error conditions ("What can I discover about data I failed to process?"). This talk provides an overview of metadata and metadata handling in ATLAS, and describes ongoing work to deliver integrated metadata services in support of physics analysis.
        Speakers: Dr David Malon (Argonne National Laboratory), Dr Elizabeth Gallas (University of Oxford)
        Slides
    • 16:00
      coffee break, exhibits and posters
    • Distributed Processing and Analysis: Monday Club C

      Club C

      Prague

      Prague Congress Centre 5. května 65, 140 00 Prague 4, Czech Republic
      Convener: Pablo Saiz (CERN)
      • 143
        End-to-end monitoring for data management
        One of the current problem areas for sustainable WLCG operations is data management and data transfer. The systems involved (e.g. Castor, dCache, DPM, FTS, gridFTP, the OPN network) are rather complex and have multiple layers - failures can and do occur in any layer, and due to the diversity of the systems involved, the differences in the information they have available and their log formats, it is currently extremely manpower-intensive to debug problems across these systems. The fact that the information is often located at more than one WLCG site also complicates the problem and increases the latency of problem resolution. Additionally, we lack a good set of monitoring tools to provide a high-level, operations-focused overview of what is happening on the transfer services and where the current top problems are. The services involved have most of the necessary information - we just don't collect all of it, join it and provide a useful view. The paper will describe the current status of a set of operations tools that allow a service manager to debug acute problems across the multiple layers (allowing them to see how a request is handled across all components involved). It will also report on work towards an "operations dashboard" for service managers to show what (and where) the current top problems in the system are.
        Speaker: Sophie Lemaitre (CERN)
      • 144
        Workflow generator and tracking at the rescue of distributed processing. Automating the handling of STAR's Grid production.
        Processing datasets on the order of tens of terabytes is an onerous task, faced by production coordinators everywhere. Users solicit data productions and, especially for simulation data, the vast number of parameters (and sometimes incomplete requests) points to the need to track, control and archive all requests made, so that the production team can handle them in a coordinated way. With the advent of grid computing the parallel processing power has increased, but traceability has also become increasingly problematic due to the heterogeneous nature of Grids. Any one of a number of components may fail, invalidating the job or execution flow at various stages of completion and making the re-submission of a few of the multitude of jobs (while keeping the entire dataset production consistent) a difficult and tedious process. From the definition of the workflow to its execution, there is a strong need for validation, tracking, monitoring and reporting of problems. To ease the process of requesting production workflows, STAR has implemented several components addressing the full workflow consistency. A Web-based online submission request module, implemented using Drupal’s Content Management System API, enforces that all parameters are described in advance in a uniform fashion. Upon submission, all jobs are independently tracked and (sometimes experiment-specific) discrepancies are detected and recorded, providing detailed information on where/how/when the job failed. Aggregate information on success and failure is also provided in near real-time. We will describe this system in full.
        Speaker: Mr Levente HAJDU (BROOKHAVEN NATIONAL LABORATORY)
        Paper
        Slides
      • 145
        CMS Grid Submission Portal
        We present a Web portal for CMS Grid submission and management. Grid portals can deliver complex grid solutions to users without the need to download, install and maintain specialized software, or to worry about setting up site-specific components. The goal is to reduce the complexity of the user grid experience and to bring the full power of the grid to physicists engaged in LHC analysis through a standard web GUI. We describe how the portal exploits standard, off-the-shelf commodity software together with existing grid infrastructures in order to facilitate job submission and monitoring. Currently users are exposed to different flavors of grid middleware, and the installation and maintenance of CMS and Grid specific software is still very complex for most physicists. The goal of the CMS grid submission portal is to hide and integrate the complex infrastructure details that can hinder a user's ability to do science. A rich AJAX user interface provides users with the functionality to create, submit, share and monitor grid submissions. The grid portal is built on a J2EE architecture employing enterprise technologies powered by the JBoss application server. This technology has been used for many years in industry to provide enterprise-class application deployments. The architecture is comprised of three tiers: presentation, business logic and data persistence. The presentation layer currently consists of a Java Server Faces web interface developed with NetBeans Visual Web Page development tools. The business logic layer provides interfaces to existing grid infrastructure such as VOMS, Globus, CRAB and CRABSERVER. This paper describes these developments, work in progress and plans for future enhancements.
        Speaker: Norbert Neumeister (Purdue University)
        Slides
      • 146
        Status of the ALICE CERN Analysis Facility
        The ALICE experiment at the CERN LHC is intensively using a PROOF cluster for fast analysis and reconstruction. The current system (CAF - CERN Analysis Facility) consists of some 120 CPU cores and about 45 TB of local space. One of the most important aspects of data analysis on the CAF is the speed with which it can be carried out. Fast feedback on the collected data can be obtained, which allows quasi-online quality assurance of the data as well as fast analysis that is essential for the success of the experiment. CAF aims to provide fast response in prototyping code for users needing many development iterations. PROOF allows the interactive parallel processing of data distributed on a local cluster via the xrootd protocol. Subsets of selected data can be automatically staged into CAF from the Grid storage systems. The talk will present the current setup, performance tests, a comparison with a previous cluster, and usage statistics. The possibility of using a PROOF setup for parallel data reconstruction is discussed, using the ALICE software framework AliRoot as an example. Furthermore, needed developments, plans and a future scenario for PROOF in a Grid environment are addressed.
        Speaker: Mr Marco Meoni (CERN)
        Slides
      • 147
        CMS Analysis Operations
        During normal data taking CMS expects to support potentially as many as 2000 analysis users. In 2008 there were more than 800 individuals who submitted a remote analysis job to the CMS computing infrastructure. The bulk of these users will be supported at the over 40 CMS Tier-2 centers. Supporting a globally distributed community of users on a globally distributed set of computing clusters is a task that requires reconsidering the normal methods of user support for analysis operations. In 2008 CMS formed an Analysis Support Task Force in preparation for large-scale physics analysis activities. The charge of the task force was to evaluate the available support tools, the user support techniques, and the direct feedback of users, with the goal of improving the success rate and user experience when utilizing the distributed computing environment. The task force determined the tools needed to assess and reduce the number of non-zero exit code applications submitted through the grid interfaces and worked with the CMS Experiment Dashboard developers to obtain the necessary information to quickly and proactively identify issues with user jobs and data sets hosted at various sites. Results of the analysis group surveys were compiled. Reference platforms for testing and debugging problems were established in various geographic regions. The task force also assessed the resources needed to make the transition to a permanent Analysis Operations task. In this presentation the results of the task force will be discussed as well as the CMS analysis operations plans for the start of data taking.
        Speaker: Dr James Letts (Department of Physics-Univ. of California at San Diego (UCSD))
        Slides
      • 148
        Babar production - the final dataset?
        The Babar experiment has been running at the SLAC National Accelerator Laboratory for the past nine years, and has recorded 500 fb-1 of data. The final data run for the experiment finished in April 2008. Once data taking was finished, the final processing of all Babar data was started. This was the largest computing production effort in the history of Babar, including a reprocessing of all recorded data, a full simulation with the latest code versions for all recorded detector conditions, and a full skimming of this data into all current analysis streams. This effort produced the largest rates of CPU use and data production in the history of an already large-scale experiment. The difficulties and successes of this effort will be reported, together with the data volumes, CPU time, and computing centers used.
        Speaker: Dr Douglas Smith (STANFORD LINEAR ACCELERATOR CENTER)
        Slides
    • Event Processing: Monday Club E

      Club E

      Prague

      Prague Congress Centre 5. května 65, 140 00 Prague 4, Czech Republic
      Convener: Elizabeth Sexton-Kennedy (FNAL)
      • 149
        Experience with the CMS EDM
        The re-engineered CMS EDM was presented at CHEP in 2006. Since that time we have gained a lot of operational experience with the chosen model. We will present some of our findings, and attempt to evaluate how well it is meeting its goals. We will discuss some of the new features that have been added since 2006 as well as some of the problems that have been addressed. Also discussed is the level of adoption throughout CMS, which spans the trigger farm up to the final physics analysis. Future plans, in particular dealing with schema evolution and scaling, will be discussed briefly.
        Speaker: Benedikt Hegner (CERN)
        Slides
      • 150
        File Level Provenance Tracking in CMS
        The CMS Offline framework stores provenance information within CMS's standard ROOT event data files. The provenance information is used to track how every data product was constructed including what other data products were read in order to do the construction. We will present how the framework gathers the provenance information, the efforts necessary to minimize the space used to store the provenance in the file and the tools which will be available to use the provenance.
        Speaker: Dr Christopher Jones (Fermi National Accelerator Laboratory)
        Slides
      • 151
        PAT: the CMS Physics Analysis Toolkit
        The CMS Physics Analysis Toolkit (PAT) is presented. The PAT is a high-level analysis layer enabling the development of common analysis efforts across and within Physics Analysis Groups. It aims at fulfilling the needs of most CMS analyses, providing both ease-of-use for the beginner and flexibility for the advanced user. The main PAT concepts are described in detail and some examples from realistic physics analyses are given.
        Speaker: Giovanni Petrucciani (SNS & INFN Pisa, CERN)
        Slides
      • 152
        ROOT: Support For Significant Evolutions of the User Data Model
        One of the main strengths of ROOT I/O is its inherent support for schema evolution. Two distinct modes are supported, one manual via a hand-coded Streamer function and one fully automatic via the ROOT StreamerInfo. One drawback of Streamer functions is that they are not usable by TTrees in split mode. Until now, the automatic schema evolution mechanism could not be customized by the user, and the only mechanism to go beyond the default rules was to revert to using a Streamer function. In ROOT 5.22/00, we introduced a new mechanism which allows user extensions of the automatic schema evolution that can be used in object-wise, member-wise and split modes. This presentation will describe the myriad of possibilities, ranging from the simple assignment of transient members to the complex reorganization of the user's object model (an indicative example of such a rule follows below).
        Speaker: Mr Philippe Canal (Fermilab)
        Slides
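        As an indicative example of the user-extensible automatic schema evolution mentioned above, the sketch below shows the general shape of a dictionary "read rule" that recomputes a new member from on-file members of an older class version. The class and members are invented, and the exact pragma syntax should be checked against the documentation of the ROOT release actually used.

        // Illustrative sketch only: a LinkDef-style "read rule" attached to the dictionary.
        // -- ATrack.h -----------------------------------------------------------------
        // Version 2 of the class stores pt directly; version 1 files stored px and py.
        class ATrack {
        public:
            float pt = 0;           // new data model
            // ClassDef(ATrack, 2) // ROOT dictionary macro would normally appear here
        };

        // -- LinkDef.h ----------------------------------------------------------------
        // When an old (version 1) object is read, compute pt from the on-file px and py;
        // the rule applies in object-wise, member-wise and split mode alike.
        #ifdef __CINT__
        #pragma read sourceClass="ATrack" version="[1]" targetClass="ATrack" \
                source="float px; float py" target="pt" \
                code="{ pt = sqrt(onfile.px*onfile.px + onfile.py*onfile.py); }"
        #endif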
      • 153
        The Software Framework of the ILD detector concept at the ILC detector
        The International Linear Collider is the next large accelerator project in High Energy Physics. The ILD Detector Concept is one of three international working groups that are developing a detector concept for the ILC. It was created by merging the two concept studies LDC and GLD in 2007. ILD uses a modular C++ application framework (Marlin) that is based on the international data format LCIO. It allows the distributed development of reconstruction and analysis software. Recently ILD has produced a large Monte Carlo data set of Standard Model physics and expected new physics signals at the ILC in order to further optimize the detector concept based on the Particle Flow paradigm. This production was only possible by exploiting grid computing resources available for the ILC in the context of the WLCG. In this talk we give an overview of the core framework, focusing on recent developments and improvements needed for the large-scale Monte Carlo production since it was last presented at CHEP2007 (an illustrative processor skeleton follows below).
        Speaker: Dr Frank Gaede (DESY IT)
        Slides
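        The sketch below shows the typical shape of a Marlin processor, the unit in which ILD reconstruction and analysis code is organized. It follows the usual Marlin/LCIO conventions (base class, init/processEvent/end methods, steering-file parameters), but the physics content is invented and the details should be checked against the Marlin release in use.

        // Illustrative sketch only: a minimal Marlin processor counting hits in a collection.
        #include "marlin/Processor.h"
        #include "EVENT/LCEvent.h"
        #include "EVENT/LCCollection.h"
        #include <iostream>
        #include <string>

        class HitCounterProcessor : public marlin::Processor {
        public:
            marlin::Processor* newProcessor() override { return new HitCounterProcessor; }

            HitCounterProcessor() : marlin::Processor("HitCounterProcessor") {
                // Steering-file parameter: which hit collection to count (name is invented).
                registerProcessorParameter("CollectionName",
                                           "Name of the input hit collection",
                                           m_colName, std::string("SiTrackerHits"));
            }

            void init() override { m_nHits = 0; }

            void processEvent(EVENT::LCEvent* evt) override {
                try {
                    EVENT::LCCollection* col = evt->getCollection(m_colName);
                    m_nHits += col->getNumberOfElements();
                } catch (...) {
                    // collection absent in this event; ignore
                }
            }

            void end() override {
                std::cout << name() << ": counted " << m_nHits << " hits in total\n";
            }

        private:
            std::string m_colName;
            long m_nHits = 0;
        };

        // Marlin discovers processors through a global instance of each type.
        HitCounterProcessor aHitCounterProcessor;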
      • 154
        The CMS Computing, Software and Analysis Challenge
        The CMS experiment has performed a comprehensive challenge during May 2008 to test the full scope of offline data handling and analysis activities needed for data taking during the first few weeks of LHC collider operations. It constitutes the first full-scale challenge with large statistics under the conditions expected at the start-up of the LHC, including the expected initial mis-alignments and mis-calibrations for each sub-detector, and event signatures and rates typical for low instantaneous luminosity. Particular emphasis has been given to the prompt reconstruction workflows, and to the procedures for the alignment and calibration of each sub-detector. The latter were performed with restricted latency using the same computing infrastructure that will be used for real data, and the resulting calibration and alignment constants were used to re-reconstruct the data at Tier-1 centres. The presentation addresses the goals and practical experience from the challenge, and the lessons learned in view of LHC data taking are discussed.
        Speaker: Dr Rainer Mankel (DESY)
        Slides
    • Grid Middleware and Networking Technologies: Monday Panorama

      Panorama

      Prague

      Prague Congress Centre 5. května 65, 140 00 Prague 4, Czech Republic
      • 155
        GOCDB, A Topology Repository For A Worldwide Grid Infrastructure
        All grid projects have to deal with topology and operational information like resource distribution, contact lists and downtime declarations. Storing, maintaining and publishing this information properly is one of the key elements of successful grid operations. The solution adopted by the EGEE and WLCG projects is a central repository that hosts this information and makes it available to users and client tools. This repository, known as GOCDB, is used throughout EGEE and WLCG as an authoritative primary source of information for operations, monitoring, accounting and reporting. After giving a short history of GOCDB, the paper describes the current architecture of the tool and gives an overview of its well-established development workflows and release procedures. It also presents different collaboration use cases with other EGEE operations tools and deals with the High Availability mechanism put in place to address failover and replication issues. It describes ongoing work on providing web services interfaces and gives examples of integration with other grid projects, such as the NGS in the UK. The paper finally presents our vision of GOCDB's future and the associated plans to base its architecture on a pseudo object database model, allowing for its distribution across the 11 EGEE regions. This will be one of the most challenging tasks to achieve during the third phase of EGEE in order to prepare for a sustainable European Grid Infrastructure.
        Speaker: Mr Gilles Mathieu (STFC, Didcot, UK)
        Slides
      • 156
        Bringing the CMS Distributed Computing System into Scalable Operations
        Establishing efficient and scalable operations of the CMS distributed computing system critically relies on the proper integration, commissioning and scale testing of the data and workload management tools, the various computing workflows and the underlying computing infrastructure, located at more than 50 computing centres worldwide and interconnected by the Worldwide LHC Computing Grid. Computing challenges periodically undertaken by CMS in the past years with increasing scale and complexity have revealed the need for a sustained effort on computing integration and commissioning activities. The Processing and Data Access (PADA) Task Force was established at the beginning of 2008 within the CMS Computing Programme with the mandate of validating the infrastructure for organized processing and user analysis, including the sites and the workload and data management tools, validating the distributed production system by performing functionality, reliability and scale tests, helping sites to commission, configure and optimize the networking and storage through scale testing data transfers and data processing, and improving the efficiency of accessing data across the CMS computing system from global transfers to local access. This contribution will report on the tools and procedures developed by CMS for computing commissioning and scale testing as well as the improvements accomplished towards efficient, reliable and scalable computing operations. The activities include the development and operation of load generators for job submission and data transfers with the aim of stressing the experiment and Grid data management and workload management systems, site commissioning procedures and tools to monitor and improve site availability and reliability, as well as activities targeted to the commissioning of the distributed production, user analysis and monitoring systems.
        Speaker: Dr Jose Hernandez (CIEMAT)
        Slides
      • 157
        A Dynamic System for ATLAS Software Installation on OSG Grid Sites
        ATLAS Grid production, like many other VO applications, requires the software packages to be installed on remote sites in advance. Therefore, a dynamic and reliable system for installing the ATLAS software releases on Grid sites is crucial to guarantee the timely and smooth start of ATLAS production and reduce its failure rate. In this talk, we discuss the issues encountered in the previous software installation system, and introduce the new approach, which is built upon the new development in the areas of the ATLAS workload management system (PanDA), and software package management system (pacman). It is also designed to integrate with the EGEE ATLAS software installation framework. In the new system, ATLAS software releases are packaged as pacball, a uniquely identifiable and reproducible self-installing data file. The distribution of pacballs to remote sites is managed by ATLAS data management system (DQ2) and PanDA server. The installation on remote sites is automatically triggered by the PanDA pilot jobs. The installation job payload connects to the EGEE ATLAS software installation portal, making the information of installation status easily accessible across OSG and EGEE Grids. The deployment of this new system and its performance in USATLAS production will also be discussed.
        Speaker: Mr Xin Zhao (Brookhaven National Laboratory,USA)
        Slides
      • 158
        Migration of ATLAS PanDA to CERN
        The ATLAS Production and Distributed Analysis System (PanDA) is a key component of the ATLAS distributed computing infrastructure. All ATLAS production jobs, and a substantial amount of user and group analysis jobs, pass through the PanDA system which manages their execution on the grid. PanDA also plays a key role in production task definition and the dataset replication request system. PanDA has recently been migrated from Brookhaven National Laboratory (BNL) to the European Organization for Nuclear Research (CERN), a process we describe here. We discuss how the new infrastructure for PanDA, which relies heavily on services provided by CERN IT, was introduced in order to make the service as reliable as possible and to allow it to be scaled to ATLAS's increasing need for distributed computing. The migration involved changing the backend database for PanDA from MySQL to ORACLE, which impacted upon the database schemas. The process by which the client code was optimised for the new database backend is illustrated by example. We describe the procedure by which the database is tested and commissioned for production use. Operations during the migration had to be planned carefully to minimise disruption to ongoing ATLAS operations. All parts of the migration had to be fully tested before commissioning the new infrastructure, which at times involved careful segmenting of ATLAS grid resources in order to verify the new services at scale. Finally, after the migration was completed, results on the final validation and full scale stress testing of the new infrastructure are presented.
        Speaker: Dr Graeme Andrew Stewart (University of Glasgow)
        Paper
        Slides
      • 159
        Critical services in the LHC computing
        The LHC experiments (ALICE, ATLAS, CMS and LHCb) rely for the data acquisition, processing, distribution, analysis and simulation on complex computing systems, run using a variety of services, provided by the experiment services, the WLCG Grid and the different computing centres. These services range from the most basic (network, batch systems, file systems) to the mass storage services or the Grid information system, up to the different workload management systems, data catalogues and data transfer tools, often internally developed in the collaborations. In this contribution we review the status of the services most critical to the experiments by quantitatively measuring their readiness with respect to the start of the LHC operations. Shortcomings are identified and common recommendations are offered.
        Speaker: Dr Andrea Sciabà (CERN)
        Slides
      • 160
        Status and outlook of the HEP Network
        I will review the status, outlook, recent technology trends and state-of-the-art developments in the major networks serving the high energy physics community in the LHC era. I will also cover the progress in reducing or closing the Digital Divide separating scientists in several world regions from the mainstream, from the perspective of the ICFA Standing Committee on Inter-regional Connectivity.
        Speaker: Prof. Harvey Newman (Caltech)
    • Hardware and Computing Fabrics: Monday Club B

      Club B

      Prague

      Prague Congress Centre 5. května 65, 140 00 Prague 4, Czech Republic
      Convener: Isidro Gonzales Caballero (CERN)
      • 161
        A comparison of HEP code with SPEC benchmark on multicore worker nodes
        The SPEC INT benchmark has been used as a performance reference for computing in the HEP community for the past 20 years. The SPEC CPU INT 2000 (SI2K) unit of performance has been used by the major HEP experiments both in the Computing Technical Design Reports for the LHC experiments and in the evaluation of the Computing Centres. At recent HEPiX meetings several HEP sites have reported disagreements between actual machine performance and the scores reported by SPEC. Our group performed a detailed comparison of Simulation and Reconstruction code performance from the four LHC experiments in order to find a successor to the SI2K benchmark. We analyzed the new benchmarks from the SPEC CPU 2006 suite, both integer and floating point, in order to find the best agreement with the HEP code behaviour, with particular attention paid to reproducing the actual environment of a HEP farm, i.e., each job running independently on each core, and to matching compiler, optimization, percentage of integer and floating point operations, and ease of use.
        Speaker: Michele Michelotto (INFN + Hepix)
        Slides
      • 162
        Experience with low-power x86 processors (ATOM) for HEP usage
        In CERN openlab we have been running tests on a server using a low-power ATOM N330 dual-core/dual-thread processor, deploying both HEP offline and online programs. The talk will report on the results, both for single runs and for maximum-throughput runs, and will also report on the results of thermal measurements. It will also show how the price/performance of an ATOM system compares to that of a Xeon system. Finally, it will make recommendations as to how such low-power systems can be made optimal for HEP usage.
        Speaker: Mr Sverre Jarp (CERN)
        Paper
        Slides
      • 163
        Air Conditioning and Computer Centre Power Efficiency: the Reality
        The current level of demand for Green Data Centres has created a growing market for consultants providing advice on how to meet the requirement for high levels of electrical power and, above all, cooling capacity both economically and ecologically. How should one choose, in the face of the many competing claims, the right concept for a cooling system in order to reach the right power level, efficiency, carbon emissions and reliability, and to ensure flexibility in the face of future computing technology evolution? This presentation will compare and contrast various alternative computer centre cooling solutions, in particular covering examples of old technologies that are returning to favour in the context of the present energy crisis and new products vying for a place in the market, in addition to classic design options.
        Speaker: Tony Cass (CERN)
        Slides
      • 164
        A High Performance Hierarchical Storage Management System For the Canadian Tier-1 Centre at TRIUMF
        We describe in this paper the design and implementation of Tapeguy, a high-performance non-proprietary Hierarchical Storage Management System (HSM) which is interfaced to dCache for efficient tertiary storage operations. The system has been successfully implemented at the Canadian Tier-1 Centre at TRIUMF. The ATLAS experiment will collect a very large amount of data (approximately 3.5 Petabytes each year). An efficient HSM system will play a crucial role in the success of the ATLAS Computing Model, which is driven by intensive large-scale data analysis activities that will be performed on the Worldwide LHC Computing Grid infrastructure around the clock. Tapeguy is Perl-based. It controls and manages data and tape libraries. Its architecture is scalable and includes Dataset Writing control, a Readback Queuing mechanism and I/O tape drive load balancing, as well as on-demand allocation of resources. A central MySQL database records metadata information for every file and transaction (for audit and performance evaluation), as well as an inventory of library elements. Tapeguy Dataset Writing was implemented to group files which are close in time and of similar type. Optional dataset path control dynamically allocates tape families and assigns tapes to them. Tape flushing is based on various strategies: time, thresholds or external callback mechanisms. Tapeguy Readback Queuing reorders all read requests by using a 'scan algorithm', avoiding unnecessary tape loading and unloading. The implementation of priorities will guarantee file delivery to all clients in a timely manner.
        Speaker: Mr Simon Liu (TRIUMF)
        Slides
      • 165
        Fair-share scheduling algorithm for a tertiary storage system
        Any experiment facing petabyte-scale problems is in need of a highly scalable mass storage system (MSS) to keep a permanent copy of its valuable data. But beyond the permanent storage aspects, the sheer amount of data makes complete dataset availability on “live storage” (centralized or aggregated space such as the one provided by Scalla/Xrootd) cost-prohibitive, implying that a dynamic population from the MSS to faster storage is needed. One of the most difficult aspects of dealing with an MSS is the robotic tape component and its intrinsically long access times (latencies), which can dramatically affect the overall performance of any data access system having the MSS as its primary data storage. To speed up the retrieval of such data, one could "organize" the requests according to criteria aiming to deliver maximal data throughput. However, such approaches are often orthogonal to fairness, and a trade-off between quality of service (responsiveness) and throughput is necessary for an optimal and practical implementation of a truly fair-share oriented file restore policy. Starting by explaining the key criteria used to build such a policy, we will present an evaluation and comparison of three different algorithms offering fair-share file restoration from the MSS and discuss their respective merits. We will further quantify the impact of their use on typical file restorations for the RHIC/STAR experimental setup, within a development, analysis and production environment relying on a shared MSS service (a generic scheduling sketch follows below).
        Speaker: Mr Pavel JAKL (Nuclear Physics Inst., Academy of Sciences, Praha)
        Paper
        Slides
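        The C++ sketch below is a generic illustration of the throughput-versus-fairness trade-off discussed above, not one of the algorithms evaluated in the talk: requests are grouped by tape and read in "scan" order (by position) once a tape is mounted, while the choice of which tape to mount next favours the user who has been served least so far. All names and data are invented.

        // Illustrative sketch only: ordering tape restore requests for throughput and fairness.
        #include <algorithm>
        #include <iostream>
        #include <map>
        #include <string>
        #include <vector>

        struct Request {
            std::string user;
            std::string tape;
            long position;          // file position on the tape
            std::string file;
        };

        std::vector<Request> scheduleRestores(std::vector<Request> pending) {
            // 1. Group requests by tape and order each group by position ("scan" order),
            //    so a mounted tape is read in one pass with minimal repositioning.
            std::map<std::string, std::vector<Request>> byTape;
            for (auto& r : pending) byTape[r.tape].push_back(r);
            for (auto& kv : byTape)
                std::sort(kv.second.begin(), kv.second.end(),
                          [](const Request& a, const Request& b) { return a.position < b.position; });

            // 2. Choose which tape to mount next in a user-fair way: always pick the tape
            //    that contains work for the user who has been served least so far.
            std::map<std::string, long> served;        // files already delivered per user
            std::vector<Request> ordered;
            while (!byTape.empty()) {
                std::string bestTape;
                long bestScore = -1;
                for (auto& kv : byTape) {
                    long score = 0;                    // score a tape by its most "starved" user
                    for (auto& r : kv.second) score = std::max(score, 1000000 - served[r.user]);
                    if (score > bestScore) { bestScore = score; bestTape = kv.first; }
                }
                for (auto& r : byTape[bestTape]) { served[r.user]++; ordered.push_back(r); }
                byTape.erase(bestTape);
            }
            return ordered;
        }

        int main() {
            std::vector<Request> pending = {
                {"alice", "T001", 42, "f1"}, {"alice", "T001", 7, "f2"},
                {"bob",   "T002", 3,  "f3"}, {"alice", "T002", 99, "f4"},
            };
            for (const auto& r : scheduleRestores(pending))
                std::cout << r.tape << " pos " << r.position << " -> " << r.user << " " << r.file << "\n";
        }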
      • 166
        Lustre File System Evaluation at FNAL
        As part of its mission to provide integrated storage for a variety of experiments and use patterns, Fermilab's Computing Division examines emerging technologies and reevaluates existing ones to identify the storage solutions satisfying stakeholders' requirements, while providing adequate reliability, security, data integrity and maintainability. We formulated a set of criteria and then analyzed several commercial and open-source storage systems. In this paper we present and justify our evaluation criteria, which have two variants, one for HEP event analysis and one for HPC applications as found in LQCD and Computational Cosmology. We then examine in detail Lustre and compare it to dCache, the predominant (by byte count) storage system for LHC data. After a period of testing we released a Lustre system for use by Fermilab's Computational Cosmology cluster in a limited production environment. The Lattice QCD project will prototype a larger Lustre installation on their Infiniband-based clusters. Finally, we discuss Lustre's fitness for the HEP domain and production environments, and the possible integration of Lustre with GridFTP, SRM, and Enstore HSM.
        Speaker: Stephen Wolbers (FNAL)
        Slides
    • Online Computing: Monday Club D

      Club D

      Prague

      Prague Congress Centre 5. května 65, 140 00 Prague 4, Czech Republic

      Sponsored by ACEOLE

      Convener: Pierre Vande Vyvre (CERN)
      • 167
        Reliable online data-replication in LHCb
        In LHCb, raw data files are created on a high-performance storage system using custom, speed-optimized file-writing software. The file-writing is orchestrated by a database, which represents the life-cycle of a file and is the entry point for all operations related to files, such as run-start, run-stop, file-migration, file-pinning and ultimately file-deletion. File copying to the Tier0 is done using LHCb's standard Grid framework, DIRAC. The file-mover processes also prepare the offline reprocessing by entering the files into the LHCb Bookkeeping database. In all these operations a lot of emphasis has been put on reliability via handshakes, cross-checks and retries. This paper presents the architecture, implementation details, performance results from the LHCb Full System test and the associated tools (command line, web interface).
        Speaker: Daniel Sonnick (University of Applied Sciences Kaiserslautern)
        Slides
      • 168
        ECAL Front-End Monitoring in the CMS experiment
        The CMS detector at LHC is equipped with a high precision lead tungstate crystal electromagnetic calorimeter (ECAL). The front-end boards and the photodetectors are monitored using a network of DCU (Detector Control Unit) chips located on the detector electronics. The DCU data are accessible through token rings controlled by an XDAQ based software component. Relevant parameters are transferred to DCS (Detector Control System) and stored into the Condition DataBase. The operational experience from the ECAL commissioning at the CMS experimental cavern is discussed and summarized.
        Speaker: Mr Matteo Marone (Universita degli Studi di Torino - Universita & INFN, Torino)
        Slides
      • 169
        The ALICE data quality monitoring
        ALICE is one of the four experiments installed at the CERN Large Hadron Collider (LHC), especially designed for the study of heavy-ion collisions. The online Data Quality Monitoring (DQM) is an important part of the data acquisition (DAQ) software. It involves the online gathering, the analysis by user-defined algorithms and the visualization of monitored data. This paper presents the final design, as well as the latest and upcoming features, of ALICE's specific DQM software called AMORE (Automatic MonitoRing Environment). It describes the challenges we faced during its implementation, including performance issues, and how we tested and handled them, in particular by using a scalable and robust publish-subscribe architecture. We also review the ongoing and increasing adoption of this tool within the ALICE collaboration and the measures taken to develop, in synergy with their respective teams, efficient monitoring modules for the sub-detectors. The related packaging and release procedure needed by such a distributed framework is also described. We finally give an overview of the wide range of uses people make of this framework, and we review our own experience, before and during the LHC start-up, when monitoring the data quality on both the sub-detector and the DAQ side in a real-world and challenging environment (a minimal publish-subscribe sketch follows below).
        Speaker: Mr Barthélémy von Haller (CERN)
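        The toy C++ sketch below illustrates the publish-subscribe pattern mentioned in the abstract, with an in-process broker fanning monitored objects out to a display client and an automatic quality check. It is a generic illustration only; the real framework publishes over the network and ships histogram objects, and none of the names below are AMORE's API.

        // Illustrative sketch only: a toy in-process publish-subscribe broker.
        #include <functional>
        #include <iostream>
        #include <map>
        #include <string>
        #include <vector>

        struct MonitorObject { std::string name; double value; };   // stand-in for a histogram

        class Broker {
        public:
            using Handler = std::function<void(const MonitorObject&)>;
            void subscribe(const std::string& topic, Handler h) { subs_[topic].push_back(std::move(h)); }
            void publish(const std::string& topic, const MonitorObject& mo) {
                for (auto& h : subs_[topic]) h(mo);                  // fan out to every subscriber
            }
        private:
            std::map<std::string, std::vector<Handler>> subs_;
        };

        int main() {
            Broker broker;
            // A GUI-like client and an automatic checker both subscribe to the same topic.
            broker.subscribe("TPC/occupancy", [](const MonitorObject& mo) {
                std::cout << "display: " << mo.name << " = " << mo.value << "\n";
            });
            broker.subscribe("TPC/occupancy", [](const MonitorObject& mo) {
                if (mo.value > 0.9) std::cout << "quality check: " << mo.name << " too high!\n";
            });
            // A detector agent publishes its monitored quantity; all subscribers receive it.
            broker.publish("TPC/occupancy", MonitorObject{"occupancy", 0.95});
        }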
      • 170
        Dynamic configuration of the CMS Data Acquisition cluster
        The CMS Data Acquisition cluster, which runs around 10000 applications, is configured dynamically at run time. XML configuration documents determine what applications are executed on each node and over what networks these applications communicate. Through this mechanism the DAQ System may be adapted to the required performance, partitioned in order to perform (test-) runs in parallel, or re-structured in case of hardware faults. This paper presents the CMS DAQ Configurator tool which is used to generate comprehensive configurations of the CMS DAQ system based on a high-level description given by the user. Using a database of configuration templates and a database containing a detailed model of hardware modules, data and control links, compute nodes and the network topology, the tool automatically determines which applications are needed, on which nodes they should run, and over which networks the event traffic will flow. The tool computes application parameters and generates the XML configuration documents as well as the configuration of the run-control system. The performance of the tool and operational experience during CMS commissioning and the first LHC runs are discussed.
        Speaker: Dr Hannes Sakulin (European Organization for Nuclear Research (CERN))
        Slides
      • 171
        The CMS RPC Detector Control System at LHC
        The Resistive Plate Chamber system is composed of 912 double-gap chambers equipped with about 10^4 front-end boards. The correct and safe operation of the RPC system requires a sophisticated and complex online Detector Control System, able to monitor and control 10^4 hardware devices distributed over an area of about 5000 m^2. The RPC DCS acquires, monitors and stores about 10^5 parameters coming from the detector, the electronics, the power system, and the gas and cooling systems. The DCS system and the first results and performance obtained during the 2007 and 2008 CMS cosmic runs will be described here.
        Speaker: Giovanni Polese (Lappeenranta Univ. of Technology)
        Slides
      • 172
        First-year experience with the ATLAS Online Monitoring framework
        ATLAS is one of the four experiments at the Large Hadron Collider (LHC) at CERN, which has been put into operation this year. The challenging experimental environment and the extreme detector complexity required the development of a highly scalable distributed monitoring framework, which is currently being used to monitor the quality of the data being taken as well as the operational conditions of the hardware and software elements of the detector, trigger and data acquisition systems. At the moment the ATLAS Trigger/DAQ system is distributed over more than 1000 computers, which is about one third of the final ATLAS size. At every minute of an ATLAS data taking session the monitoring framework serves several thousand physics events to monitoring data analysis applications, handles more than 4 million histogram updates coming from more than 4 thousand applications, executes 10 thousand advanced data quality checks for a subset of those histograms, and displays histograms and results of these checks on several dozen monitors installed in the main and satellite ATLAS control rooms. This note presents an overview of the online monitoring software framework and describes the experience gained during an extensive commissioning period as well as in the first phase of LHC beam in September 2008. Performance results obtained on the current ATLAS DAQ system will also be presented, showing that the performance of the framework is adequate for the final ATLAS system.
        Speaker: Alina Corso-Radu (University of California, Irvine)
        Slides
    • Software Components, Tools and Databases: Monday Club A

      Club A

      Prague

      Prague Congress Centre 5. května 65, 140 00 Prague 4, Czech Republic
      Convener: Paolo Calafiura (LBNL, Berkeley)
      • 173
        Event Selection Services in ATLAS
        ATLAS has developed and deployed event-level selection services based upon event metadata records ("tags") and supporting file and database technology. These services allow physicists to extract events that satisfy their selection predicates from any stage of data processing and use them as input to later analyses. One component of these services is a web-based Event-Level Selection Service Interface (ELSSI). ELSSI supports event selection by integrating run-level metadata, luminosity-block-level metadata (e.g., detector status and quality information), and event-by-event information (e.g., triggers passed and physics content). The list of events that pass the physicist's cuts is returned in a form that can be used directly as input to local or distributed analysis; indeed, it is possible to submit a skimming job directly from the ELSSI interface using grid proxy credential delegation. Beyond this, ELSSI allows physicists who may or may not be interested in event-level selections to explore ATLAS event metadata as a means to understand, qualitatively and quantitatively, the distributional characteristics of ATLAS data: to see the highest missing ET events or the events with the most leptons, to count how many events passed a given set of triggers, or to find events that failed a given trigger but nonetheless look relevant to an analysis based upon the results of offline reconstruction, and more. This talk provides an overview of ATLAS event-level selection services, with an emphasis upon the interactive Event-Level Selection Service Interface.
        Speakers: Dr Jack Cranshaw (Argonne National Laboratory), Dr Qizhi Zhang (Argonne National Laboratory)
        Slides
      • 174
        The JANA Calibrations and Conditions Database API
        Calibrations and conditions databases can be accessed from within the JANA Event Processing framework through the API defined in its JCalibration base class. This system allows constants to be retrieved through a single line of C++ code (a schematic sketch is given below), with most of the context implied by the run currently being analyzed. The API is designed to support everything from databases to web services to flat files as the backend. A Web Service backend using SOAP has been implemented, which is particularly interesting since it addresses many cybersecurity issues.
        Speaker: David Lawrence (Jefferson Lab)
        Slides
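        The following fragment is only a schematic illustration of the "single line" retrieval described above. The JCalibration base class is taken from the abstract, but the event-loop method name, its signature, the include path and the constants namepath are assumptions made purely for illustration.
        // Schematic sketch (assumed API): retrieving a set of calibration
        // constants keyed by a namepath, with run number, backend and version
        // implied by the framework context.
        #include <string>
        #include <vector>
        #include <JANA/JEventLoop.h>   // include path assumed
        
        void analyze_event(JEventLoop* loop)
        {
            std::vector<double> pedestals;
        
            // One call: the current run, the configured backend (database,
            // web service or flat file) and the constants version are all
            // implied by the framework.
            loop->GetCalib("CDC/pedestals", pedestals);   // hypothetical namepath
        
            // ... use 'pedestals' in the reconstruction of the current event ...
        }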
      • 175
        The HADES Oracle database and its interfaces for experimentalists
        Since 2002 the HADES experiment at GSI has employed an Oracle database for storing all parameters relevant for simulation and data analysis. The implementation features a flexible, multi-dimensional and easy-to-use version management. Direct interfaces to the ROOT-based analysis and simulation framework HYDRA allow for an automated initialization based on actual or historic data, which is needed at all levels of the analysis. Generic data structures, database tables and interfaces were developed to store variable sets of parameters of various types (C types, binary arrays, ROOT-based classes). A snapshot of the data can be stored in a ROOT file for exporting and local access. Web interfaces are used for parameter validation, to show the history of the data and to compare different data sets. They also provide access to additional information not directly used in the analysis (file catalog, beam time logbook, hardware). An interface between the EPICS runtime database and Oracle is realized by a program developed at SLAC. Run-based summary information is provided to allow for fast scans and filtering of the data, indispensable for run validation. Web interfaces as well as interfaces to the analysis exist, making use of e.g. the ROOT graphics package. The database concept reported here is a possible platform for the implementation of a database in FAIR-ROOT, the latter being an advancement/offspring of HYDRA.
        Speaker: Dr Ilse Koenig (GSI Darmstadt)
        Slides
      • 176
        A lightweight high availability strategy for Atlas LCG File Catalogs
        The LCG File Catalog (LFC) is a key component of the LHC Computing Grid (LCG) middleware, as it contains the mapping between all logical and physical file names on the Grid. The Atlas computing model foresees multiple local LFCs hosted at each Tier-1 and at Tier-0, containing all information about files stored in that cloud. As the local LFC contents are presently not replicated, this turns into a dangerous single point of failure for all of the Atlas regional clouds. The issue of central LFC replication has been successfully addressed in LCG by the 3D project, which has deployed a replication environment (based on Oracle Streams technology) spanning the Tier-0 and all Tier-1s. However, this solution is not suitable for Tier-1 - Tier-2 clouds, due to the considerable amount of manpower needed for Oracle Streams administration/management and the high costs of the additional Oracle licenses needed to deploy Streams replication. A more lightweight solution is to copy the LFC Oracle backend information to one or more Tier-2s, exploiting the Oracle Dataguard technology. We present the results of a wide range of feasibility and performance tests run on a Dataguard-based LFC high availability environment, built between the Italian LHC Tier-1 (INFN - CNAF) and an Atlas Tier-2 located at INFN - Roma1. We also explain how this strategy can be deployed on the present Grid infrastructure, without requiring any change to the middleware and in a way that is totally transparent to end users.
        Speaker: Barbara Martelli (INFN)
        Slides
      • 177
        A RESTful web service interface to the ATLAS COOL database
        The COOL database in ATLAS is primarily used for storing detector conditions data, but also status flags, which are uploaded summaries of information indicating the detector reliability during a run. This paper introduces the use of CherryPy, a Python application server which acts as an intermediate layer between a web interface and the database, providing a simple means of storing data in and retrieving it from the COOL database, which has found use in many web applications. The software layer is designed to be RESTful, implementing the common CRUD (Create, Read, Update, Delete) database methods by interpreting the HTTP method (POST, GET, PUT, DELETE) on the server along with a URL identifying the database resource to be operated on (a client-side sketch of this mapping is given below). The format of the data (text, XML, etc.) is also determined via the HTTP protocol. The details of this layer are described along with a popular application demonstrating its use, the ATLAS run list web page.
        Speaker: Dr Shaun Roe (CERN)
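        As a client-side illustration of the CRUD-to-HTTP mapping described in the abstract, the sketch below issues one request per operation using libcurl. The host name and resource path are hypothetical; the actual URL layout is defined by the CherryPy layer and is not reproduced here.
        // Client-side sketch of the REST/CRUD mapping, using libcurl.
        // URL and resource path are placeholders, not the real service layout.
        #include <curl/curl.h>
        #include <string>
        
        // "GET" reads a resource, "PUT" updates it, "POST" creates it and
        // "DELETE" removes it - the CRUD mapping named in the abstract.
        bool cool_request(const std::string& method, const std::string& url,
                          const std::string& body = "")
        {
            CURL* curl = curl_easy_init();
            if (!curl) return false;
            curl_easy_setopt(curl, CURLOPT_URL, url.c_str());
            curl_easy_setopt(curl, CURLOPT_CUSTOMREQUEST, method.c_str());
            if (!body.empty())
                curl_easy_setopt(curl, CURLOPT_POSTFIELDS, body.c_str());
            CURLcode rc = curl_easy_perform(curl);
            curl_easy_cleanup(curl);
            return rc == CURLE_OK;
        }
        
        int main()
        {
            curl_global_init(CURL_GLOBAL_DEFAULT);
            // Read a (hypothetical) set of status flags for one run:
            cool_request("GET", "http://cool-rest.example/statusflags/run/91000");
            curl_global_cleanup();
            return 0;
        }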
      • 178
        The Tile Calorimeter Web Systems for Data Quality Analyses
        The ATLAS detector consists of four major components: inner tracker, calorimeter, muon spectrometer and magnet system. In the Tile Calorimeter (TileCal), there are 4 partitions; each partition has 64 modules and each module has up to 48 channels. During the ATLAS commissioning phase, a group of physicists needs to analyze the Tile Calorimeter data quality, generate reports and update the official database when necessary. The Tile Commissioning Web Systems (TCWS) retrieves information from different directories and databases, executes programs that generate results, stores comments and verifies the calorimeter status. TCWS integrates different applications, each one presenting a unique data view. The Web Interface for Shifters (WIS) supports monitoring tasks by managing test parameters and the overall calorimeter status. The TileComm Analysis application stores plots, automatic analysis results and comments concerning the tests. With the need for increased granularity, a new application was created: the Monitoring and Calibration Web System (MCWS). This application supports data quality analyses at the channel level by presenting the automatic analysis results, the known problematic channels and the channels masked by the shifters. Through the web system it is possible to generate plots and reports related to the channels, identify new bad channels and update the Bad Channels List in the official ATLAS database (COOL DB). The Data Quality Monitoring Viewer (DQM Viewer) displays the automatic data quality results through a dedicated visualization.
        Speaker: Andressa Sivolella Gomes (Universidade Federal do Rio de Janeiro (UFRJ))
        Slides
    • Poster session: whole day
      • 179
        A Filesystem to access CASTOR
        CASTOR provides a powerful and rich interface for managing files and pools of files backed by tape storage. The API is modelled very closely on that of a POSIX filesystem, where the actual I/O is handled by the rfio library. While the API is very close to POSIX, it is still separate, which unfortunately makes it impossible to use standard tools and scripts straight away. This is particularly inconvenient when applications are written in languages other than C/C++, as is frequently the case in web apps. Up to now the only recourse was to use command-line utilities and parse their output, which is clearly a kludge. We have implemented a complete POSIX filesystem to access CASTOR using FUSE (Filesystem in Userspace) and have successfully tested and used it on SLC4 and SLC5 (both in 32 and 64 bit). We call it CastorFS (a structural sketch of such a FUSE layer is given below). In this paper we will present its architecture and implementation, with emphasis on performance and caching aspects.
        Speaker: Alexander Mazurov (CERN)
        Poster
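        For orientation, the skeleton below shows the shape of a FUSE (high-level API) filesystem of the kind described above. Only the structure is sketched; the bodies would forward each call to the CASTOR/rfio client library, which is indicated in comments only since the exact rfio entry points are not given in the abstract. This is not the CastorFS source.
        // Structural sketch of a FUSE filesystem layered over a POSIX-like
        // storage API (placeholders only, not the CastorFS implementation).
        #define FUSE_USE_VERSION 26
        #include <fuse.h>
        #include <sys/stat.h>
        #include <cstring>
        #include <cerrno>
        
        static int fs_getattr(const char* path, struct stat* st)
        {
            std::memset(st, 0, sizeof(*st));
            // Forward to the CASTOR name-server / rfio stat call here.
            (void)path;
            return -ENOENT;   // placeholder
        }
        
        static int fs_read(const char* path, char* buf, size_t size,
                           off_t offset, struct fuse_file_info* fi)
        {
            // Forward to the rfio open/seek/read calls here and fill 'buf'.
            (void)path; (void)buf; (void)size; (void)offset; (void)fi;
            return 0;         // placeholder: number of bytes read
        }
        
        int main(int argc, char* argv[])
        {
            struct fuse_operations ops = {};
            ops.getattr = fs_getattr;
            ops.read    = fs_read;
            return fuse_main(argc, argv, &ops, nullptr);
        }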
      • 180
        A Geant4 physics list for spallation and related nuclear physics applications based on INCL and ABLA models
        We present a new Geant4 physics list prepared for nuclear physics applications in the domain dominated by spallation. We discuss new Geant4 models based on the translation of the INCL intra-nuclear cascade and ABLA de-excitation codes into C++ and used in the physics list (a schematic model-registration example is given below). The INCL model is well established for targets heavier than Aluminium and projectile energies from ~150 MeV up to 2.5-3 GeV. The validity of the Geant4 physics list is demonstrated from the perspective of accelerator-driven systems and the EURISOL project, especially with the neutron double differential cross sections and residual nuclei production. Foreseen improvements of the physics models for the treatment of light targets (Carbon - Oxygen) and light ion beams (up to Carbon) are discussed. An example application utilizing the physics list is introduced.
        Speaker: Mr Aatos Heikkinen (Helsinki Institute of Physics, HIP)
        Poster
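        As a schematic example of what such a physics list does, the fragment below registers a cascade model for proton inelastic interactions, restricted to the validity range quoted in the abstract. The registration pattern (process creation, RegisterMe, energy limits) is standard Geant4; the name of the INCL/ABLA interface class is an assumption made for illustration and depends on the Geant4 version.
        // Schematic registration of an INCL/ABLA-based model for one projectile
        // type; not taken from the physics list presented in this contribution.
        #include "G4ProcessManager.hh"
        #include "G4ProtonInelasticProcess.hh"
        #include "G4InclAblaCascadeInterface.hh"   // class name assumed
        #include "G4SystemOfUnits.hh"
        
        void RegisterProtonInelastic(G4ProcessManager* pmanager)
        {
            G4ProtonInelasticProcess* inelastic = new G4ProtonInelasticProcess();
        
            G4InclAblaCascadeInterface* incl = new G4InclAblaCascadeInterface();  // assumed name
            incl->SetMinEnergy(150.*MeV);   // INCL validity range quoted in the abstract
            incl->SetMaxEnergy(3.*GeV);
        
            inelastic->RegisterMe(incl);
            pmanager->AddDiscreteProcess(inelastic);
        }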
      • 181
        A Monte Carlo study for the X-ray fluorescence enhancement induced by photoelectron secondary excitation
        Well established values for the X-ray fundamental parameters (fluorescence yields, characteristic line branching ratios, mass absorption coefficients, etc.) are very important but not sufficient for an accurate reference-free quantitative X-Ray Fluorescence (XRF) analysis. Secondary ionization processes following photon-induced primary ionizations in matter may contribute significantly to the intensity of the detected fluorescence radiation, introducing significant errors in quantitative XRF analysis if not taken into account properly. In the present work, a newly developed particle/ray-tracing Monte Carlo (MC) simulation code is presented. The code implements appropriate databases for all the physical interactions between X-rays, electrons and matter, leading to the determination of the intensity of the characteristic radiation induced by photoelectrons for any given experimental conditions (sample geometry, incident beam parameters, etc.). In order to achieve acceptable counting statistics for the secondary photoelectron excitation, which is a second-order phenomenon, the MC simulation code is executed on a powerful cluster-computer facility able to host long simulations (up to 20 billion events per exciting energy), thus yielding low relative uncertainties. The final goal is to compare the simulated MC data with highly accurate experimental measurements obtained from well and absolutely calibrated experimental setups. In this way the current description of electron ionization cross sections can be properly assessed, and if systematic differences are observed, this may lead to the determination of corrective electron ionization cross sections versus energy that properly fit the experimental data.
        Speaker: Dimosthenis Sokaras (N.C.S.R. Demokritos, Institute of Nuclear Physics)
      • 182
        A new Data Format for the Commissioning Phase of the ATLAS Detector
        In the commissioning phase of the ATLAS experiment, low-level Event Summary Data (ESD) are analyzed to evaluate the performance of the individual subdetectors and of the reconstruction and particle identification algorithms, and to obtain calibration coefficients. In the GRID model of distributed analysis, these data must be transferred to Tier-1 and Tier-2 sites before they can be analyzed. However, the large size of ESD (~1 MByte/event) constrains the amount of data that can be distributed on the GRID and be made readily available on disks. In order to overcome this constraint and make the data fully available, new data formats - collectively known as Derived Physics Data (DPD) - have been designed. Each DPD format contains a subset of the ESD data, tailored to the specific needs of the subdetector and object reconstruction and identification performance groups. Filtering algorithms perform a selection based on physics content and trigger response, further reducing the data volume. Thanks to these techniques, the total volume of DPD to be distributed on the GRID amounts to 20% of the initial ESD data. An evolution of the tools developed in this context will serve to produce another set of DPDs that are specifically tailored for physics analysis.
        Speaker: Karsten Koeneke (Deutsches Elektronen-Synchrotron (DESY))
        Poster
      • 183
        Adaptive Vertex Reconstruction in CMS
        Reconstruction of interaction vertices is an essential step in the reconstruction chain of a modern collider experiment such as CMS; the primary ("collision") vertex is reconstructed in every event within the CMS reconstruction program, CMSSW. However, the task of finding and fitting secondary ("decay") vertices also plays an important role in several physics cases such as the reconstruction of long-lived particles like kaons, or the identification of b-jets, i.e. the task of b-tagging. A very simple but powerful general-purpose vertex finding algorithm is presented that is based on the well-established adaptive vertex fitter (whose down-weighting of outlying tracks is sketched below) to find and fit primary and secondary vertices.
        Speaker: Dr Rudi Frühwirth (Institut fuer Hochenergiephysik (HEPHY)-Oesterreichische Akademi)
        Poster
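        The adaptive vertex fitter referred to above assigns each track a soft, annealed weight rather than a hard association. For orientation only (this formula is not quoted in the abstract), a commonly used form of the track weight at annealing temperature T is
        w_i = \frac{1}{1 + \exp\left[ (\chi_i^2 - \chi^2_{\mathrm{cut}}) / (2T) \right]}
        where \chi_i^2 is the squared standardized distance of track i from the current vertex candidate and \chi^2_{\mathrm{cut}} is the cut-off; T is lowered according to an annealing schedule, and outlying tracks end up with weights close to zero instead of spoiling the fit.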
      • 184
        ALICE Tier2 at GSI
        GSI Darmstadt is hosting a Tier2 centre for the ALICE experiment, providing about 10% of the ALICE Tier2 resources. According to the computing model, the tasks of a Tier2 centre are scheduled and unscheduled analysis as well as Monte Carlo simulation. To accomplish this, a large water-cooled compute cluster has been set up and configured, currently consisting of 200 CPUs (1500 cores). After intensive I/O tests it was decided to provide on-site storage via a Lustre cluster, at the moment 150 TB of disk space, which is visible from each individual worker node. Additionally an xrootd-managed storage cluster is provided, which also serves as a Grid Storage Element. The central GSI batch farm can be accessed with Grid methods from outside as well as via LSF methods by users inside the centre. Both are used mainly for simulation jobs. Moreover, for interactive access a PROOF analysis facility, GSIAF, is maintained on a subset of the same machines. On these machines the necessary infrastructure has been statically installed, providing each user with 160 PROOF servers and the possibility to analyse 1700 events per second. The alternative of creating a PROOF-on-demand cluster dynamically on the batch farm machines is also supported. The coexistence of interactive processes and batch jobs has been studied and can be dealt with by adjusting the process priorities accordingly. All relevant services are monitored continuously, to a large extent based on MonALISA. Detailed user experience, data transfer activities, as well as future and ramp-up plans are also reported in this presentation. GSI will profit from the expert knowledge it gains during the setup and operation of the ALICE Tier2 centre for the upcoming Tier0 centre for FAIR.
        Speaker: Dr Kilian Schwarz (GSI)
      • 185
        ALICE TPC particle identification, calibration and performance.
        We will present a particle identification algorithm, as well as a calibration and performance study, in the ALICE Time Projection Chamber (TPC) using the dE/dx measurement. New calibration algorithms had to be developed, since the simple geometrical corrections were only adequate at the 5-10% level. The PID calibration consists of the following parts: gain calibration, energy deposit calibration as a function of angle and position, and Bethe-Bloch energy deposit calibration. The gain calibration is done in the space domain (pad-by-pad gain calibration) as well as in the time domain (gain as a function of time, pressure, temperature and gas composition). The energy deposit calibration takes into account the dependence on the track topology (inclination angles with respect to the detection layer and particle position). For the Bethe-Bloch energy calibration, the five parameters of the Bethe-Bloch parametrization used for the TPC PID (one commonly used form is recalled below) were fitted for the TPC gas mixture. The studies were performed on cosmic data, and the comparison with the Monte Carlo simulation showed good results.
        Speaker: Dr Marian Ivanov (GSI)
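        The abstract does not spell out which five-parameter form is fitted; for orientation, a widely used parametrization of the dE/dx dependence on \beta\gamma (often referred to as the ALEPH parametrization) is
        f(\beta\gamma) = \frac{P_1}{\beta^{P_4}} \left( P_2 - \beta^{P_4} - \ln\left( P_3 + \frac{1}{(\beta\gamma)^{P_5}} \right) \right)
        with P_1 ... P_5 the parameters to be determined from data for the given gas mixture.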
      • 186
        ALICE TPC reconstruction performance study
        We will present our studies of the performance of the reconstruction in the ALICE Time Projection Chamber (TPC). The reconstruction algorithm in question is based on the Kalman filter (its generic prediction and update steps are recalled below). The performance is characterized by the resolution in position, angle and momentum as a function of the particle properties (momentum, position). The resulting momentum parametrization is compared with the Monte Carlo simulation, which allows one to disentangle the influence of the material budget and of systematic effects. The presented studies were performed on cosmic data.
        Speaker: Dr Marian Ivanov (GSI)
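        For reference, the generic Kalman-filter track fit alternates a prediction step to the next detector layer with an update using the measurement m_k (this is the textbook form, not the ALICE-specific implementation):
        x_{k|k-1} = F_k x_{k-1}, \qquad C_{k|k-1} = F_k C_{k-1} F_k^{T} + Q_k
        K_k = C_{k|k-1} H_k^{T} \left( V_k + H_k C_{k|k-1} H_k^{T} \right)^{-1}
        x_k = x_{k|k-1} + K_k (m_k - H_k x_{k|k-1}), \qquad C_k = (1 - K_k H_k) C_{k|k-1}
        where F_k is the track propagation, Q_k the process noise (multiple scattering, energy loss), H_k the projection to measurement space and V_k the measurement covariance.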
      • 187
        Alignment of the ATLAS Inner Detector Tracking System
        CERN's Large Hadron Collider (LHC) is the world's largest particle accelerator. ATLAS is one of the two general-purpose experiments, equipped with a charged-particle tracking system built on two technologies, silicon and drift-tube based detectors, which compose the ATLAS Inner Detector (ID). The required precision for the alignment of the most sensitive coordinates of the silicon sensors is just a few microns. Therefore the alignment of the ATLAS ID requires complex algorithms with extensive CPU and memory usage. So far the proposed alignment algorithms have been exercised in several applications. We will present the outline of the alignment approach and results from cosmic ray runs and from a large-scale computing simulation of physics samples mimicking ATLAS operation during real data taking. The full alignment chain is tested using that stream, and alignment constants are produced and validated within 24 hours. Cosmic ray data serve to produce an early alignment of the real ATLAS Inner Detector even before the LHC start-up. Beyond the tracking information, the assembly survey database contains essential information for determining the relative position of one module with respect to its neighbors.
        Speaker: Daniel Kollar
        Poster
      • 188
        Alignment of the LHCb detector with Kalman fitted tracks
        We report on an implementation of a global chi-square algorithm for the simultaneous alignment of all tracking systems in the LHCb detector (the schematic minimization step is recalled below). Our algorithm uses hit residuals from the standard LHCb track fit, which is based on a Kalman filter. The algorithm is implemented in the LHCb reconstruction framework and exploits the fact that all sensitive detector elements have the same geometry interface. A vertex constraint is implemented by fitting tracks to a common point and propagating the change in track parameters to the hit residuals. To remove unconstrained or poorly constrained degrees of freedom (so-called weak modes), the average movements of (subsets of) alignable detector elements can be fixed with Lagrange constraints. Alternatively, weak modes can be removed with a cutoff in the eigenvalue spectrum of the second derivative of the chi-square. As for all LHCb reconstruction and analysis software, the configuration of the algorithm is done in Python and gives detailed control over the selection of alignable degrees of freedom and constraints. We study the performance of the algorithm on simulated events and first LHCb data.
        Speakers: Jan Amoraal (NIKHEF), Wouter Hulsbergen (NIKHEF)
        Poster
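        Schematically, and omitting the track-fit correlations and Lagrange constraint terms handled by the actual implementation, the global chi-square in the alignment parameters \alpha and its linearized minimization step read
        \chi^2(\alpha) = \sum_{\mathrm{hits}} r^{T}(\alpha)\, V^{-1} r(\alpha)
        \Delta\alpha = - \left( \sum \frac{\partial r}{\partial \alpha}^{T} V^{-1} \frac{\partial r}{\partial \alpha} \right)^{-1} \sum \frac{\partial r}{\partial \alpha}^{T} V^{-1} r
        where r are the hit residuals from the track fit and V their covariance; weak modes correspond to (near-)zero eigenvalues of the matrix being inverted.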
      • 189
        AMS Experiment Parallel Event Processing using ROOT/OPENMP scheme
        The ROOT-based event model for the AMS experiment is presented. By adding a few pragmas to the main ROOT code, parallel processing of ROOT chains on local multi-core machines became possible (a simplified sketch of the idea is given below). The scheme does not require any merging of the user-defined output information (histograms, etc.), and no pre-installation procedure is needed. The scalability of the scheme is shown on the example of a real physics analysis application (~20k histograms). A comparison with PROOF-Lite performance for the same application is also presented.
        Speaker: Vitali CHOUTKO (CERN)
        Slides
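        The sketch below illustrates only the general idea: each OpenMP thread works on a disjoint entry range of the same chain with thread-private objects, so no merging is forced by the processing itself. Tree name, file names and histogram content are placeholders, and this is not the AMS implementation described in the abstract.
        // Simplified OpenMP + ROOT sketch (placeholders, not the AMS code).
        #include <omp.h>
        #include "TROOT.h"
        #include "TChain.h"
        #include "TH1D.h"
        
        int main()
        {
            ROOT::EnableThreadSafety();            // available in recent ROOT versions
        
            TChain probe("Event");                 // tree name assumed
            probe.Add("ams_data_*.root");          // input files assumed
            const Long64_t nTotal = probe.GetEntries();
        
            #pragma omp parallel
            {
                // Thread-private chain and histogram: no locking needed, and each
                // thread could write its own output without a merging step.
                TChain chain("Event");
                chain.Add("ams_data_*.root");
                TH1D h("h_rigidity", "rigidity", 100, 0., 100.);
        
                const int      nt    = omp_get_num_threads();
                const int      tid   = omp_get_thread_num();
                const Long64_t first =  tid      * nTotal / nt;
                const Long64_t last  = (tid + 1) * nTotal / nt;
        
                for (Long64_t i = first; i < last; ++i) {
                    chain.GetEntry(i);
                    // ... fill h and other user output from the event data ...
                }
            }
            return 0;
        }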
      • 190
        Application of the Kalman Alignment Algorithm to the CMS Tracker
        One of the main components of the CMS experiment is the Inner Tracker. This device, designed to measure the trajectories of charged particles, is composed of approximately 16,000 planar silicon detector modules, which makes it the biggest of its kind. However, systematic measurement errors, caused by unavoidable inaccuracies in the construction and assembly phase, reduce the precision of the measurements drastically. The geometrical corrections that are therefore required should be known to an accuracy better than the intrinsic resolution of the detector modules, such that special alignment algorithms have to be utilized. The Kalman Alignment Algorithm (KAA) is a novel approach to extract a set of alignment constants from a sufficiently large collection of recorded particle tracks, suited even for a system as big as the CMS Inner Tracker. To show that the method is functional and well understood, and thus expedient for the data-taking period of the CMS experiment, two significant case studies are discussed. Results from detailed simulation studies demonstrate that the KAA is able to align the CMS Inner Tracker under the conditions expected during the LHC start-up phase. Moreover, it has been shown that the associated computational effort can be kept at a reasonable level by deploying the available CMS computing resources to process the data in parallel. Furthermore, an analysis of the first experimental data from cosmic particle tracks, recorded directly after the assembly of the CMS Inner Tracker, shows that the KAA is at least competitive with existing algorithms when applied to real data.
        Speaker: Dr Edmund Widl (Institut für Hochenergiephysik (HEPHY Vienna))
        Poster
      • 191
        ATLAS@Amazon Web Services: Running ATLAS software on the Amazon Elastic Compute Cloud
        We show how the ATLAS offline software is ported to the Amazon Elastic Compute Cloud (EC2). We prepare an Amazon Machine Image (AMI) on the basis of the standard ATLAS platform Scientific Linux 4 (SL4). Then an instance of the SL4 AMI is started on EC2 and we install and validate a recent release of the ATLAS offline software distribution kit. The installed software is archived as an image on the Amazon Simple Storage Service (S3) and can be quickly retrieved and connected to new SL4 AMI instances using the Amazon Elastic Block Store (EBS). ATLAS jobs can then configure against the release kit using the ATLAS configuration management tool (cmt) in the standard way. The output of jobs is exported to S3 before the SL4 AMI is terminated. Job status information is transferred to the Amazon SimpleDB service. The whole process of launching instances of our AMI, starting, monitoring and stopping jobs and retrieving job output from S3 is controlled from a client machine using Python scripts implementing the Amazon EC2/S3 API via the boto library, working together with small scripts embedded in the SL4 AMI. We report our experience with setting up and operating the system using standard ATLAS job transforms.
        Speaker: Stefan Kluth (Max-Planck-Institut für Physik)
        Poster
      • 192
        Automatic TTree creation from Reconstructed Data Objects in JANA
        Automatic ROOT tree creation is achieved in the JANA Event Processing Framework through a special plugin. The janaroot plugin can automatically define a TTree from the data objects passed through the framework without using a ROOT dictionary. Details on how this is achieved as well as possible applications will be presented.
        Speaker: Dr David Lawrence (Jefferson Lab)
      • 193
        Building a Storage Cluster with Gluster
        Gluster, a free cluster file-system scalable to several peta-bytes, is under evaluation at the RHIC/USATLAS Computing Facility. Several production SunFire x4500 (Thumper) NFS servers were dual-purposed as storage bricks and aggregated into a single parallel file-system using TCP/IP as an interconnect. Armed with a paucity of new hardware, the objective was to simultaneously allow traditional NFS client access to discrete systems as well as access to the GlusterFS global namespace without impacting production. Gluster is elegantly designed and carries an advanced feature set including, but not limited to, automated replication across servers, server striping, a fast db backend, and I/O scheduling. GlusterFS exists as a layer above existing file-systems, does not have a single point of failure, supports RDMA, distributes metadata, and is entirely implemented in user space via FUSE. We will provide a background of Gluster along with its architectural underpinnings, followed by a description of our test-bed, environmentals, and performance characteristics.
        Speaker: Robert Petkus (Brookhaven National Laboratory)
      • 194
        Building and Commissioning of the CMS CERN Analysis Facility (CAF)
        The CMS CERN Analysis Facility (CAF) was primarily designed to host a large variety of latency-critical workflows. These break down into alignment and calibration, detector commissioning and diagnosis, and high-interest physics analysis requiring fast turnaround. In addition to the low latency requirement on the batch farm, another mandatory condition is the efficient access to the RAW detector data stored at the CERN Tier-0 facility. The CMS CAF also foresees resources for interactive login by a large number of CMS collaborators located at CERN, as an entry point for their day-by-day analysis. These resources will run on a separate partition in order to protect the high-priority use-cases described above. While the CMS CAF represents only a modest fraction of the overall CMS resources on the WLCG GRID, an appropriately sized user-support service needs to be provided. In this presentation we will describe the building, commissioning and operation of the CMS CAF during the year 2008. The facility was heavily and routinely used by almost 250 users during multiple commissioning and data challenge periods. It reached a CPU capacity of 1.4MSI2K and a disk capacity at the Petabyte scale. In particular, we will focus on the performance in terms of networking, disk access and job efficiency, and extrapolate prospects for the upcoming first year of LHC data taking. We will also present the experience gained and the limitations observed in operating such a large facility, in which well controlled workflows are combined with chaotic-type analysis by a large number of physicists.
        Speaker: Dr Peter Kreuzer (RWTH Aachen IIIA)
        Poster
      • 195
        Calibration of ATLAS Resistive Plate Chambers
        Resistive Plate Chambers (RPC) are used in ATLAS to provide the first level muon trigger in the barrel region. The total size of the system is about 16000 m2, read out by about 350000 electronic channels. In order to reach the required trigger performance, a precise knowledge of the detector working point is necessary, and the high number of readout channels places severe requirements on the analysis tools to be developed. First of all, high-statistics data samples will have to be used as input. Second, the results would be unmanageable without a proper interface to some database technology. Moreover, the CPU power needed for the analysis makes it necessary to use distributed computing resources. A set of analysis tools will be presented, coping with all the critical aspects of this task, ranging from the use of a dedicated data stream (the so-called muon calibration stream), to the automatic job submission on the GRID, to the implementation of an interface to ATLAS' conditions database. Integration with Detector Control System information and the impact of the calibration on the performance of the reconstruction algorithms will be discussed as well.
        Speaker: Andrea Di Simone (INFN Roma2)
        Poster
      • 196
        Calibration of the Barrel Muon DT System of CMS with Cosmic Data
        The calibration process of the Barrel Muon DT System of CMS as developed and tuned during the recent cosmic data run is presented. The calibration data reduction method, the full work flow of the procedure and final results are presented for real and simulated data.
        Speaker: Dr Silvia Maselli (INFN Torino)
        Poster
      • 197
        CASTOR Tape Performance Optimisation at the UK LCG Tier-1
        The UK LCG Tier-1 computing centre located at the Rutherford Appleton Laboratory is responsible for the custodial storage and processing of the raw data from all four LHC experiments; CMS, ATLAS, LHCb and ALICE. The demands of data import, processing, export and custodial tape archival place unique requirements on the mass storage system used. The UK Tier-1 uses CASTOR as the storage technology of choice, which currently handles 2.3PB of disk across 320 disk servers. 18 Sun T10000 tape drives provide the custodial back-end. This paper describes work undertaken to optimise the performance of the CASTOR infrastructure at RAL. Significant gains were achieved and the lessons learned have been deployed at other LHC CASTOR sites. Problems were identified with the performance of tape migration when disk servers were under production-level load. An investigation was launched at two levels; hardware and operating system performance, and the impact of CASTOR tape algorithms and job scheduling. A test suite was written to quantify the low-level performance of disk servers with various tunings applied, and CMS test data coupled with the existing transfer infrastructure was used to verify the performance of the tape system with realistic experimental data transfer patterns. The improvements identified resulted in the instantaneous tape migration rate per drive reaching near line-speed of 100MB/s, a vast improvement on the previous attainable rate of around 16MB/s.
        Speaker: Mr James Jackson (H.H. Wills Physics Laboratory - University of Bristol)
      • 198
        CERN automatic audioconference service
        Scientists all over the world collaborate with the CERN laboratory day by day. They must be able to communicate effectively on their joint projects at any time, so telephone conferences have become indispensable and widely used. The traditional conference system, managed by 6 switchboard operators, was hosting more than 20000 hours and 5500 conferences per year. However, the system needed to be modernized in three ways. Firstly, to ensure researchers' autonomy in the organization of their conferences; secondly, to eliminate the constraints of manual intervention by operators; and thirdly, to integrate the audioconferences into a collaborative framework. To solve this issue, the CERN telecommunications team drew up a specification to implement a new system. After deep analysis, it was decided to use a new Alcatel collaborative conference solution based on the SIP protocol. During 2005/2006 the system was tested as the first European pilot and, based on CERN's recommendations, several improvements were implemented: billing, security, redundancy, etc. The new automatic conference system has been operational since the second half of 2006. It is very popular with users: 39000 calls and 30000 accumulated hours for around 5000 conferences during the last twelve months. Furthermore, to cope with the demand, the capacity of the service is about to be tripled and new features, such as application sharing and on-line presentations, should be proposed in the near future.
        Speaker: Rodrigo Sierra Moral (CERN)
        Poster
      • 199
        CERN GSM monitoring system
        As a result of the tremendous development of GSM services over the last years, the number of related services used by organizations has drastically increased. Therefore, monitoring GSM services is becoming a business critical issue in order to be able to react appropriately in case of incident. In order to provide GSM coverage in all the CERN underground facilities, more than 50 km of leaky feeder cable have been deployed. This infrastructure is also used to propagate VHF radio signals for the CERN fire brigade. Even though CERN's mobile operator monitors the network, it cannot guarantee the availability of GSM services, and certainly not of VHF services, whose signals are carried by the leaky feeder cable. So, a global monitoring system has become critical to CERN. In addition, monitoring this infrastructure will allow its behaviour over time to be characterized, especially with LHC operation. Given that commercial solutions were not yet mature, CERN developed a system based on GSM probes and an application server which collects data from them via the CERN GPRS network. By placing probes in strategic locations and comparing measurements between probes, it is now possible to determine whether there is a GSM or VHF problem on a given leaky feeder cable segment. This system has been successfully working for several months in underground facilities, allowing CERN to inform GSM users and the fire brigade in case of incidents.
        Speaker: Mr Carlos Ghabrous (CERN)
        Poster
      • 200
        ci2i and CMS-TV: Generic Web Tools for CMS Centres
        The CMS Experiment at the LHC is establishing a global network of inter-connected "CMS Centres" for controls, operations and monitoring at CERN, Fermilab, DESY and a number of other sites in Asia, Europe, Russia, South America, and the USA. "ci2i" ("see eye to eye") is a generic Web tool, using Java and Tomcat, for managing: hundreds of display screens in many locations; monitoring content and mappings to displays; CMS Centres' hardware configuration; user login rights and group accounts; screen snapshot services; and operations planning tools. ci2i enables CMS Centre users anywhere in the world to observe displays in other CMS Centres, notably CERN, and manage the content remotely if authorised. Distributed shifts are already happening. "CMS-TV" aggregates arbitrary (live) URLs into a cyclic program that can be watched full-screen in any Web browser. "TV channels" can be trivially created and configured with either specific expert content or outreach displays for public places. All management is done from a simple Web interface with secure authentication. We describe the specific deployment at CERN to manage operations in the CMS Centre @ CERN (more than 850 active users and ever increasing), including aspects of system administration (PXE/AIMS kickstart, gdm auto-login, security, AFS account and ACL management, etc.).
        Speaker: Dr Lucas Taylor (Northeastern U., Boston)
        Poster
      • 201
        CluMan: High-density displays and cluster management
        LHC computing requirements are such that the number of CPU and storage nodes, and the complexity of the services to be managed are bringing new challenges. Operations like checking configuration consistency, executing actions on nodes, moving them between clusters etc. are very frequent. These scaling challenges are the basis for CluMan, a new cluster management tool being designed and developed at CERN. High-density displays such as heat maps, grids or color maps are more and more commonly used in various applications like data visualization or monitoring systems. They allow humans to see, interpret and understand complex and detailed information at a glance. We propose to present the ideas behind the CluMan project, and to show how high density displays are used to help service managers to understand, manage and control the state and behavior of their clusters.
        Speaker: Miroslav Siket (CERN)
        Poster
      • 202
        Cluster Filesystem usage for HEP Analysis
        With the first analysis-capable data from the LHC on the horizon, more and more sites are facing the problem of building a highly efficient analysis facility for their local physicists, mostly attached to a Tier2/3. The most important ingredient for such a facility is the underlying storage system and, in particular, the choice of data management and data access system, well known as the 'filesystem'. At DESY we have built up a facility deploying the HPC-grounded cluster filesystem Lustre, serving as a 'very big and fast playground' for various purposes like compiling large packages, accessing n-tuple data for histogramming or even private MC generation. We will show the actual configuration, measurements and experience from the user perspective together with impressions and measurements from the system perspective.
        Speaker: Martin Gasthuber (DESY)
      • 203
        CMS production and processing system - Design and experiences
        ProdAgent is a set of tools to assist in producing various data products such as Monte Carlo simulation, prompt reconstruction, re-reconstruction and skimming. In this paper we briefly discuss the ProdAgent architecture, and focus on the experience in using this system in recent computing challenges, feedback from these challenges, and future work. The computing challenges have proven invaluable for scaling the system to the level desired for the first LHC physics runs. The feedback from the recent computing challenges resulted in a design review of some of the ProdAgent core components. The results of this review and the mandate to converge development within the data management sub-projects led to the establishment of the WMCore project: a common set of libraries for CMS workflow systems, with the aim of reducing code duplication between sub-projects and increasing maintainability. This paper discusses some of the lessons learned from recent computing challenges and how this experience has been incorporated into the WMCore project. The current ProdAgent development has shifted towards bulk operations (optimizing database performance) and buffered tasks (so as to better handle reliability when interacting with third-party components). Two significant areas of development effort are the migration to a common set of libraries (WMCore) for all CMS workflow systems and a system to split and manage work requests between ProdAgents, to better utilise the available resources.
        Speaker: Mr Stuart Wakefield (Imperial College)
        Poster
      • 204
        Commissioning of the ATLAS Inner Detector software infrastructure with cosmic rays
        T. Cornelissen, on behalf of the ATLAS inner detector software group. Several million cosmic tracks were recorded during the combined ATLAS runs in the autumn of 2008. Using these cosmic ray events as well as first beam events, the software infrastructure of the inner detector of the ATLAS experiment (pixel and microstrip silicon detectors as well as straw tubes with additional transition radiation detection) is being commissioned. The full software chain has been set up in order to reconstruct and analyse these kinds of events. Final detector decoders have been developed, and different pattern recognition algorithms and track fitters have been validated, as well as the various calibration methods. The infrastructure to deal with conditions data coming from the data acquisition, detector control system and calibration runs has been put in place, allowing also the application of alignment and calibration constants. The software has also been essential to monitor the detector performance during data taking. Detector efficiencies, noise occupancies and resolutions are being studied in detail, as well as the performance of the track reconstruction itself.
        Speaker: Johanna Fleckner (CERN / University of Mainz)
        Poster
      • 205
        Commissioning of the ATLAS reconstruction software with first data
        Looking towards first LHC collisions, the ATLAS detector is being commissioned using all types of physics data available: cosmic rays and events produced during a few days of LHC single beam operations. In addition to putting in place the trigger and data acquisition chains, commissioning of the full software chain is a main goal. This is interesting not only to ensure that the reconstruction, monitoring and simulation chains are ready to deal with LHC physics data, but also to understand the detector performance in view of achieving the physics requirements. The recorded data have allowed us to study the ATLAS detector in terms of efficiencies, resolutions, channel integrity, alignment and calibrations. They have also allowed us to test and optimize the sub-systems reconstruction as well as some combined algorithms, such as combined tracking tools and different muon identification algorithms. The status of the integration of the complete software chain will be presented as well as the data analysis results.
        Speaker: Arshak Tonoyan (CERN)
        Poster
      • 206
        Commissioning the CMS Alignment and Calibration Framework
        The CMS experiment has developed a powerful framework to ensure the precise and prompt alignment and calibration of its components, which is a major prerequisite to achieve the optimal performance for physics analysis. The prompt alignment and calibration strategy harnesses computing resources both at the Tier-0 site and the CERN Analysis Facility (CAF) to ensure fast turnaround for updating the corresponding database payloads. An essential element is the creation of dedicated data streams concentrating the specific event information required by the various alignment and calibration workflows. The resulting low latency is required for feeding the resulting constants into the prompt reconstruction process, which is essential for achieving swift physics analysis of the LHC data. The presentation discusses the implementation and the computational aspects of the alignment & calibration framework. Recent commissioning campaigns with cosmic muons, beam halo and simulated data have been used to gain detailed experience with this framework, and results of this validation are reported.
        Speaker: David Futyan (Imperial College, University of London)
        Poster
      • 207
        Customizable Scientific Web-Portal for DIII-D Nuclear Fusion Experiment
        Increasing utilization of the Internet and convenient web technologies has made the web-portal a major application interface for remote participation and control of scientific instruments. While web-portals have provided a centralized gateway for multiple computational services, the amount of visual output often is overwhelming due to the high volume of data generated by complex scientific instruments and experiments. Since each scientist may have different priorities and areas of interest in the experiment, filtering and organizing information based on the individual user’s need can increase the usability and efficiency of a web-portal. DIII-D is the largest magnetic nuclear fusion device in the US. A web-portal has been designed to support the experimental activities of DIII-D researchers worldwide. It offers a customizable interface with personalized page layouts and list of services for users to select. Each individual user can create a unique working environment to fit their own needs and interests. Customizable services are: real-time experiment status monitoring, diagnostic data access, interactive data analysis and visualization. The web-portal also supports interactive collaborations by providing collaborative logbook, shared visualization and online instant messaging services. The DIII-D web-portal development utilizes multi-tier software architecture, and web2.0 technologies, such as AJAX and Django, to develop a highly-interactive and customizable user interface. A set of client libraries was also created to provide a solution for conveniently plugging in new services to the portal. A live demonstration of the system will be presented. Work supported by the U.S. DOE SciDAC program at General Atomics under Cooperative Agreement DE-FC02-01ER25455.
        Speaker: Mr Gheni Abla (General Atomics)
        Poster
      • 208
        Data Driven Approach to Calorimeter Simulation in CMS
        CMS is looking forward to tuning the detector simulation using the forthcoming collision data from the LHC. CMS established a task force in February 2008 in order to understand and reconcile the discrepancies observed between the CMS calorimetry simulation and the test beam data recorded during 2004 and 2006. Within this framework, significant effort has been made to develop a strategy for tuning fast and flexible parametrizations describing showering in the calorimeter with available data from test beams. These parametrizations can be used within the context of the Full as well as the Fast Simulation. The study is extended to evaluate the use of the first LHC collision data, when they become available, to rapidly tune the CMS calorimeter simulation.
        Speaker: Sunanda Banerjee (Fermilab, USA)
        Poster
      • 209
        dCache Storage Cluster at BNL
        Over the last two years, the USATLAS Computing Facility at BNL has managed a highly performant, reliable, and cost-effective dCache storage cluster using SunFire x4500/4540 (Thumper/Thor) storage servers. The design of a discrete storage cluster signaled a departure from a model where storage resides locally on a disk-heavy compute farm. The consequent alteration of data flow mandated a dramatic re-construction of the network fabric. This work will cover all components of our dCache storage cluster (from door to pool) including OS/ZFS file-system configuration, 10GE network tuning, monitoring, and environmentals. Performance metrics will be surveyed within the context of our Solaris 10 production system as well as those rendered during evaluations of OpenSolaris and Linux. Failure modes, bottlenecks, and deficiencies will be examined. Lastly, we discuss competing architectures under evaluation, scaling limits in our current model, and future technologies that warrant close surveillance.
        Speaker: Robert Petkus (Brookhaven National Laboratory)
      • 210
        DeepConference: A complete conference in a picture
        Particle physics conferences lasting a week (like CHEP) can have hundreds of talks and posters presented. Current conference web interfaces (like Indico) are well suited to finding a talk by author or by time-slot. However, browsing the complete material of a modern large conference is not user friendly. Browsing involves continually making the expensive transition between HTML viewing and talk slides (which are either PDF files or some other format). Furthermore, the web interfaces aren't designed for undirected browsing. The advent of multi-core computing and advanced video cards means that we have more processor power available for visualization than at any time in the past. This poster describes a technique of rendering a complete conference's slides and posters as a single very large picture. Standard plug-in software for a browser allows a user to zoom in on a portion of the conference that looks interesting. As the user zooms further, more and more details become visible, allowing the user to make a quick and cheap decision on whether to spend more time on a particular talk. The project, DeepConference, has been implemented as a public web site and can render any conference whose agenda is powered by Indico. The rendering technology is powered by the free download, Silverlight. The poster discusses the implementation and use as well as cross-platform performance and possible future directions. A demo will be shown.
        Speaker: Prof. Gordon Watts (UNIVERSITY OF WASHINGTON)
        Poster
      • 211
        Development of a simulated trigger generator for the ALICE commissioning
        ALICE (A Large Ion Collider Experiment) is an experiment at the LHC (Large Hadron Collider) optimized for the study of heavy-ion collisions. The main aim of the experiment is to study the behavior of strongly interacting matter and the quark-gluon plasma. In order to be ready for the first real physics interactions, the 18 sub-detectors composing ALICE have been tested using cosmic rays and sequences of random triggers used to simulate p-p and heavy-ion interactions. In order to simulate real triggers, the RTG (Random Trigger Generator) has been developed; it is able to provide six concurrent trigger sequences with different probabilities. This paper will describe the hardware that generates the binary stream used as trigger and the software algorithms to create the sequences and to control the hardware. It will describe the tests performed in the laboratory on the random trigger generator to confirm its correct behavior and the details of the installation in the ALICE counting room, where it provides the triggers for all the sub-detectors. It will also discuss the configurations used to simulate several trigger combinations likely to occur with the real beam.
        Speaker: Dr Filippo Costa (CERN)
        Poster
      • 212
        Electronic Calibration of the ATLAS LAr Calorimeter
        The Liquid Argon (LAr) calorimeter is a key detector component in the ATLAS experiment at the LHC, designed to provide precision measurements of electrons, photons, jets and missing transverse energy. A critical element in the precision measurement is the electronic calibration. The LAr calorimeter has been installed in the ATLAS cavern and filled with liquid argon since 2006. The electronic calibration of the readout system has been continuously exercised during the commissioning phase, resulting in a fully commissioned calorimeter and readout with only a small number of problematic channels. A total of only 0.02% of the readout channels are dead beyond repair and 0.4% need special treatment for calibration. Throughout the last two years, a large amount of calibration data has been collected. We present here the LAr electronic calibration scheme, the large-scale acquisition and processing of the calibration data, the measured stability of the pedestal, the pulse shape and the gain, and the expected calibration procedure for LHC running. Various problems observed and addressed during the commissioning phase will also be discussed.
        Speaker: Dr Martin Aleksa (for the LAr conference committee) (CERN)
      • 213
        Enhancing GridFTP and GPFS performances using intelligent deployment
        Many High Energy Physics experiments must share and transfer large volumes of data. Therefore, the maximization of data throughput is a key issue, requiring detailed analysis and setup optimization of the underlying infrastructure and services. In Grid computing, the data transfer protocol called GridFTP is widely used for efficiently transferring data in conjunction with various types of file systems. In this paper, we focus on the interaction and performance issues in a setup which combines the GridFTP server with the IBM General Parallel File System (GPFS), adopted for providing storage management and capable of handling petabytes of data and billions of files. A typical issue is the size of the data blocks read from disk by the GridFTP server version 2.3, which can potentially impair the data transfer rate achievable with an IBM GPFS data block. We propose an experimental deployment of the GridFTP server characterized by being on a Scientific Linux CERN 4 (SLC4) 64-bit platform, having the GridFTP server and IBM GPFS over a Storage Area Network (SAN) infrastructure, aimed at improving data throughput and at serving distributed remote Grid sites. We present the results of data-transfer measurements, such as CPU load, network utilization, and data read and write rates, obtained from several tests at the INFN Tier1 where the described deployment has been set up. During this activity, we have verified a significant improvement of the GridFTP performance (of almost 50%) on SLC4 64-bit over SAN, saturating the Gigabit link with a very low CPU load.
        Speaker: Mrs Elisabetta Ronchieri (INFN CNAF)
        Poster
      • 214
        Experience with LHCb alignment software on first data
        We report results obtained with different track-based algorithms for the alignment of the LHCb detector with first data. The large-area Muon Detector and Outer Tracker have been aligned with a large sample of tracks from cosmic rays. The three silicon detectors --- VELO, TT-station and Inner Tracker --- have been aligned with beam-induced events from the LHC injection line. We compare the results from the track-based alignment with expectations from detector survey.
        Speakers: Marc Deissenroth, Marc Deissenroth (Universität Heidelberg)
        Poster
      • 215
        Experimental validation of the Geant4 ion-ion models for carbon beams interaction at the hadron-therapy energy range (0 - 400 AMeV)
        Geant4 is a Monte Carlo toolkit describing the transport and interaction of particles with matter. Geant4 covers all particles and materials, and its geometry description allows for complex geometries. Initially focused on high energy applications, the use of Geant4 is also growing in different fields like radioprotection, dosimetry, space radiation and external radiotherapy with proton and carbon beams. External radiotherapy using ion beams presents many advantages, both in terms of dose distributions and in biological efficiency, compared to either conventional electron or photon beams as well as to proton therapy. Nevertheless, an efficient and proper use of ions for patient irradiation requires a very accurate understanding of the complex processes governing the interactions of ions with matter, for both electromagnetic and hadronic interactions. In particular, accurate knowledge of secondary neutral and charged particle production is of fundamental importance as it is strictly related to the biological dose released in tissues. The dose released in an ion-therapy treatment cannot, in fact, be correctly evaluated without this information. It has moreover been demonstrated that a lack exists of both experimental data (in terms of accurate double differential production cross sections) and validated nucleus-nucleus models in the particle and energy ranges typical of hadron-therapy applications: light incident ions (up to Carbon) at energies between 0 and 400 AMeV. In this work we report and discuss a set of specific validations we performed to test some of the nucleus-nucleus models currently provided within Geant4. Double differential production cross sections of neutrons and charged particles from 12C beams on different thin targets, obtained using alternative Geant4 models, will be compared to existing published data and to new data acquired by our group in a dedicated experiment performed at INFN/LNS.
        Speaker: Dr Pablo Cirrone (INFN-LNS)
      • 216
        Expression and cut parser for CMS event data
        We present a parser to evaluate expressions and boolean selections that is applied to CMS event data for event filtering and analysis purposes. The parser is based on a Boost Spirit grammar definition and uses the Reflex dictionary for class introspection. It allows a natural definition of expressions and cuts in the user's configuration, and provides good run-time performance compared to other existing parsers.
        Speaker: Luca Lista (INFN Sezione di Napoli)
        Poster
      • 217
        Fast Simulation of the CMS detector at the LHC
        The experiments at the Large Hadron Collider (LHC) will start their search for answers to some of the remaining puzzles of particle physics in 2008. All of these experiments rely on a very precise Monte Carlo simulation of the physical and technical processes in the detectors. A fast simulation has been developed within the CMS experiment which is between 100 and 1000 times faster than its Geant4-based counterpart, at the same level of accuracy. Already now, the fast simulation is essential for the analyses carried out in CMS, because it facilitates studies of high-statistics physics backgrounds and systematic errors that would otherwise be impossible to evaluate. Its simple and flexible design will be a major asset for a quick and accurate tuning on the first data. The methods applied in the fast simulation, both software- and physics-wise, are outlined. This includes the concepts of simulating the interaction of particles with the detector material and the response of the various parts of the detector, namely the silicon tracker, the electromagnetic and hadronic calorimeters and the muon system.
        Speaker: Douglas Orbaker (University of Rochester)
        Poster
      • 218
        Fit of weighted histograms in the ROOT framework.
        Weighted histograms are often used for the estimation of probability density functions in High Energy Physics. The content of each bin of a weighted histogram can be considered as a sum of random variables with a random number of terms. A generalization of Pearson's chi-square statistic for weighted histograms, and for weighted histograms with unknown normalization, has recently been proposed by the first author (the classical statistic that is generalized is recalled after this entry). The use of these statistics makes it possible to fit the parameters of a probability density function. A new implementation of this statistical method has recently been realized within the ROOT statistical framework, using MINUIT for minimization. We describe this statistical method and its new implementation, including some examples of applications. A numerical investigation is presented for fitting various histograms with different numbers of events. Restrictions on the application of the procedure to histograms with low event statistics are also discussed.
        Speakers: Lorenzo Moneta (CERN), Prof. Nikolai GAGUNASHVILI (University of Akureyri, Iceland)
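        As context for the abstract above, the classical Pearson statistic that is generalized there compares the observed contents n_i of an unweighted histogram with m bins and N entries to the expectations N p_i(\theta) of the hypothesis being fitted:
            \chi^2(\theta) = \sum_{i=1}^{m} \frac{\left( n_i - N p_i(\theta) \right)^2}{N p_i(\theta)}
        The weighted-histogram generalization replaces the bin counts by sums of weights and accounts for their variances; its exact form is given in the referenced work and is not reproduced here.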
      • 219
        Forget multicore. The future is manycore - An outlook to the explosion of parallelism likely to occur in the LHC era.
        This talk will start by reminding the audience that Moore's law is very much alive: transistors will continue to double with every new silicon generation, every other year. Chip designers are therefore trying every possible "trick" to put the transistors to good use. The most notable one is to push more parallelism into each CPU: more and longer vectors, more parallel execution units, more cores and more hyperthreading inside each core. In addition, highly parallel graphics processing units (GPUs) are also entering the game and compete efficiently with CPUs in several computing fields. The speaker will try to predict the CPU dimensions we will reach during the LHC era, based on what we have seen in the recent past and the projected roadmap for silicon. He will also discuss the impact on HEP event processing software. Can we continue to rely on event-level parallelism at the process level, or do we need to move to a new software paradigm? Finally, he will show several examples of successful threading of HEP software.
        Speaker: Mr Sverre Jarp (CERN)
      • 220
        Geant4 models for simulation of multiple scattering
        The process of multiple scattering of charged particles is an important component of Monte Carlo transport. At high energy it determines the deviation of particles from their ideal tracks and limits the spatial resolution. Multiple scattering of low-energy electrons determines the energy response and resolution of electromagnetic calorimeters. Recent progress in the development of multiple scattering models within the Geant4 toolkit is presented. The default Geant4 model is based on the Lewis approach and is tuned to the available data. In order to understand the precision of this model and to provide more precise alternatives, new developments have been carried out. The single Coulomb scattering model samples each elastic collision of a charged particle; this model is adequate for low-density media. It is combined with a new multiple scattering model based on the Wentzel scattering function, intended for muons and hadrons. Another new alternative model, based on the Goudsmit-Saunderson formalism, has been developed for sampling electron transport. Comparisons with data are shown (the standard parameterization of the scattering angle is recalled after this entry for reference). The trade-off between precision and CPU performance is discussed, with a focus on LHC detector simulation.
        Speaker: Prof. Vladimir Ivantchenko (CERN, ESA)
        Poster
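        For reference only (the widely used PDG/Highland parameterization, quoted for context rather than as one of the Geant4 models described above), the RMS projected multiple-scattering angle of a particle with momentum p, velocity \beta c and charge number z traversing a thickness x of material with radiation length X_0 is
            \theta_0 = \frac{13.6\ \mathrm{MeV}}{\beta c\, p}\; z\, \sqrt{\frac{x}{X_0}} \left[ 1 + 0.038 \ln\!\left(\frac{x}{X_0}\right) \right]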
      • 221
        HEP Specific Benchmarks of Virtual Machines on multi-core CPU Architectures
        Virtualization technologies such as Xen can be used in order to satisfy the disparate and often incompatible system requirements of different user groups in shared-use computing facilities. This capability is particularly important for HEP applications, which often have restrictive requirements. The use of virtualization adds flexibility, however, it is essential that the virtualization technology place little overhead on the HEP application. We present an evaluation of the practicality of running HEP applications in multiple Virtual Machines (VMs) on a single multi-core Linux system. We use the benchmark suite used by the HEPiX CPU Benchmarking Working Group to give a quantitative evaluation relevant to the HEP community. Benchmarks are packaged inside VMs, and then the VMs are booted onto a single multi-core system. Benchmarks are then simultaneously executed on each VM to simulate highly loaded VMs running HEP applications. These techniques are applied to a variety of multi-core CPU architectures and VM configurations.
        Speaker: Ian Gable (University of Victoria)
        Poster
      • 222
        HepMCAnalyser - a tool for MC generator validation
        HepMCAnalyser is a tool for generator validation and comparisons. It is a stable, easy-to-use and extendable framework allowing easy access to and integration of generator-level analysis. It comprises a class library with benchmark physics processes to analyse HepMC generator output and fill ROOT histograms. A web interface is provided to display all or selected histograms, compare them to references and validate the results based on Kolmogorov tests (a minimal example of such a test in ROOT is sketched after this entry). Steerable example programs can be used for event generation; the default steering is tuned to optimally align the distributions of the different generators. The tool will be used for generator validation by the Generator Services (GENSER) LCG project, e.g. for version upgrades. It is supported on the same platforms as the GENSER libraries and is already in use in ATLAS.
        Speaker: Cano Ay (University of Goettingen)
        Poster
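        A minimal sketch (assuming two ROOT histograms already filled from generator output; the histogram names and the 0.01 threshold are illustrative and not taken from HepMCAnalyser) of a Kolmogorov compatibility check as provided by ROOT:
            #include "TH1D.h"
            #include <cstdio>

            // Returns true if the two distributions are compatible according to
            // ROOT's Kolmogorov test (probability above a chosen threshold).
            bool compatible(const TH1D& test, const TH1D& ref, double threshold = 0.01)
            {
              const double prob = test.KolmogorovTest(&ref);   // p-value of the comparison
              std::printf("Kolmogorov probability: %g\n", prob);
              return prob > threshold;
            }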
      • 223
        High availability using virtualization
        High availability has always been one of the main problems for a data centre. Until now, high availability was achieved by host-by-host redundancy, a highly expensive method in terms of hardware and human costs. A new approach to the problem is offered by virtualization. Using virtualization, it is possible to achieve redundancy for all the services running in a data centre. This new approach to high availability distributes the running virtual machines over the physical servers that are still up and running, by exploiting the features of the virtualization layer: starting, stopping and moving virtual machines between physical hosts. The system (3RC) is based on a finite state machine (a toy sketch of such a state machine is given after this entry) and provides the possibility to restart each virtual machine on any physical host, or to reinstall it from scratch. A complete infrastructure has been developed to install the operating system and middleware in a few minutes. To virtualize the main servers of a data centre, a new procedure has been developed to migrate physical hosts to virtual ones. The whole SNS-PISA Grid data centre is currently running in a virtual environment under the high availability system. As an extension of the 3RC architecture, several storage solutions, from NAS to SAN, have been tested to store and centralize all the virtual disks, in order to guarantee data safety and access from everywhere. Exploiting virtualization and the ability to automatically reinstall a host, we provide a sort of host-on-demand, where action on a virtual machine is taken only when a disaster occurs.
        Speaker: Dr Federico Calzolari (Scuola Normale Superiore - INFN Pisa)
        Poster
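        A toy sketch of the kind of finite state machine mentioned above (illustrative only, not the actual 3RC implementation; the states, events and escalation from restart to reinstall are assumptions):
            // Per-virtual-machine state machine: restart on failure, reinstall from scratch
            // if the restart does not succeed.
            enum VmState { RUNNING, RESTARTING, REINSTALLING };
            enum VmEvent { HEARTBEAT_LOST, RESTART_OK, RESTART_FAILED, REINSTALL_DONE };

            VmState next(VmState s, VmEvent e)
            {
              switch (s) {
                case RUNNING:      return (e == HEARTBEAT_LOST) ? RESTARTING : RUNNING;
                case RESTARTING:   if (e == RESTART_OK)     return RUNNING;
                                   if (e == RESTART_FAILED) return REINSTALLING;  // escalate
                                   return RESTARTING;
                case REINSTALLING: return (e == REINSTALL_DONE) ? RUNNING : REINSTALLING;
              }
              return RUNNING;  // not reached; keeps compilers happy
            }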
      • 224
        ILCSoft reconstruction software for the ILD Detector Concept at ILC
        The International Linear Collider (ILC) is proposed as the next large accelerator project in High Energy Physics. The ILD Detector Concept Study is one of three international groups working on the design of a detector to be used at the ILC. The ILD detector is being optimised to employ the so-called Particle Flow paradigm. Such an approach means that hardware alone will not be able to realise the full resolution of the detector, placing a much greater significance on the reconstruction software than has traditionally been the case at previous lepton colliders. It is therefore imperative that the detector be optimised using a full reconstruction chain employing prototypes of Particle Flow algorithms. To meet this requirement ILD has assembled a full suite of reconstruction algorithms in the software package ILCSoft, ranging from low-level digitisation through to higher-level event analysis such as jet finding and vertexing. The reconstruction software in ILCSoft uses the modular C++ application framework Marlin, which is based on the international data format LCIO. ILCSoft also contains reconstruction packages for the detector prototype test beam studies with the EUDET project. Having developers create reconstruction software for both the full detector and the prototype studies within one single package maximises the shared application of algorithms. In this talk we give an overview of the reconstruction software in ILCSoft.
        Speaker: Dr Steven Aplin (DESY)
      • 225
        Implementation of a Riemann Helical Fit for GlueX track reconstruction
        The future GlueX detector in Hall D at Jefferson Lab is a large acceptance (almost 4pi) spectrometer designed to facilitate the study of the excitation of the gluonic field binding quark--anti-quark pairs into mesons. A large solenoidal magnet will provide a 2.2-Tesla field that will be used to momentum-analyze the charged particles emerging from a liquid hydrogen target. The trajectories of forward-going particles will be measured with a set of four planar cathode strip drift chamber packages with six layers per package. The design naturally separates the track into segments where the magnetic field is relatively constant, thereby opening up the possibility of performing local helical fits to the data within individual packages. We have implemented the Riemann Helical Fit algorithm to fit the track segments. The Riemann Helical Fit is a fast and elegant algorithm, combining a circle fit for determining the transverse momentum with a line fit for determining the dip angle and initial z value, that does not require computation of any derivative matrices (the underlying circle-fit mapping is sketched after this entry). The track segments are then linked together by swimming through the field from one package to the next to form track candidates. A comparison between the Riemann Circle Fit and a simple linear regression method that assumes that the origin is on the circle will be presented. A comparison between the Riemann Helical Fit and a full least-squares fit with a non-uniform magnetic field will also be presented.
        Speaker: Simon Taylor (Jefferson Lab)
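        As a point of reference for the abstract above, a common formulation of the Riemann circle fit (the generic construction, not necessarily the exact GlueX implementation) lifts each measured point onto a paraboloid and reduces the circle fit to a linear plane fit:
            (x_i, y_i) \;\longrightarrow\; (x_i,\, y_i,\, w_i), \qquad w_i = x_i^2 + y_i^2
            c + n_x x + n_y y + n_w w = 0 \quad \text{(plane fitted by linear least squares)}
        Because a circle x^2 + y^2 + a x + b y + c = 0 becomes a plane in the lifted coordinates, the fitted plane projects back onto the best-fit circle, whose radius gives the transverse momentum; the dip angle then follows from a straight-line fit z = z_0 + s \tan\lambda versus the arc length s.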
      • 226
        Improving collaborative documentation in CMS
        Complete and up-to-date documentation is essential for efficient data analysis in a large and complex collaboration like CMS. Good documentation reduces the time spent in problem solving for users and software developers. The scientists in our research environment do not necessarily have the interests or skills of professional technical writers. This results in inconsistencies in the documentation. To improve the quality, we have started a multidisciplinary project involving CMS user support and expertise in technical communication from the University of Turku, Finland. In this paper, we present possible approaches to study the usability of the documentation, for instance, usability tests conducted recently for the CMS software and computing user documentation.
        Speaker: Kati Lassila-Perini (Helsinki Institute of Physics HIP)
        Poster
      • 227
        INSPIRE: a new scientific information system for HEP
        The status of high-energy physics (HEP) information systems has been jointly analyzed by the libraries of CERN, DESY, Fermilab and SLAC. As a result, the four laboratories have started the INSPIRE project – a new platform built by moving the successful SPIRES features and content, curated at DESY, Fermilab and SLAC, into the open-source CDS Invenio digital library software that was developed at CERN. INSPIRE will integrate present acquisition workflows and databases to host the entire body of the HEP literature (about one million records), aiming to become the reference HEP scientific information platform worldwide. It will provide users with fast access to full-text journal articles and preprints, but also material such as conference slides and multimedia. INSPIRE will empower scientists with new tools to discover and access the results most relevant to their research, enable novel text- and data-mining applications, and deploy new metrics to assess the impact of articles and authors. In addition, it will introduce the "Web 2.0" paradigm of user-enriched content in the domain of sciences, with community-based approaches to scientific publishing. INSPIRE represents a natural evolution of scholarly communication built on successful community-based information systems, and it provides a vision for information management in other fields of science. Inspired by the needs of HEP, we hope that the INSPIRE project will be inspiring for other communities.
        Speaker: Radoslav Ivanov (Unknown)
        Paper
        Poster
      • 228
        LHC First Beam Event Display at CMS from online to the World Press - the first 3 minutes
        Geneva, 10 September 2008. The first beam in the Large Hadron Collider at CERN was successfully steered around the full 27 kilometers of the world's most powerful particle accelerator at 10h28 this morning. This historic event marks a key moment in the transition from over two decades of preparation to a new era of scientific discovery. (http://www.interactions.org/cms/?pid=1026796) From 9:44 am CET the attention of the CMS physicists in the control room is drawn to the CMS event display - the "eyes" of the detector. We observe the tell-tale splash events, the beam gas and beam halo muons. We see in real time how the beam events become cleaner and cleaner as the beam is corrected. The article describes the key component of the CMS event display: IGUANA - a well-established generic interactive visualisation framework based on a C++ component model and open-source graphics products. We describe developments since the last CHEP, including: online displays of the first real beam gas and beam halo data from the LHC first beam, flexible interactive configuration, integration with the CMSSW framework, and event navigation and filtering. We give an overview of the deployment and maintenance procedures during commissioning and early detector operation, and how the lessons learnt help us in getting ready for collisions.
        Speaker: Mrs Ianna Osborne (NORTHEASTERN UNIVERSITY)
        Poster
      • 229
        MDT data quality assessment at the Calibration centre for the ATLAS experiment at LHC
        ATLAS is a large multipurpose detector, presently in the final phase of construction at the LHC, the CERN Large Hadron Collider. In ATLAS, muon detection is performed by a huge magnetic spectrometer built with the Monitored Drift Tube (MDT) technology. It consists of more than 1,000 chambers and 350,000 drift tubes, which have to be controlled to a spatial accuracy better than 10 micrometers and an efficiency close to 100%. Automated monitoring of the detector is therefore an essential aspect of the operation of the spectrometer. The quality procedure collects data from online and offline sources and from the "calibration stream" at the calibration centres, situated in Ann Arbor (Michigan), MPI (Munich) and INFN Rome. The assessment at the calibration centres is performed using the DQHistogramAnalyzer utility of the Athena framework. This application checks the histograms in an automated way and, after a further inspection with a human interface, reports results and summaries. In this study a complete description of the entire chain, from the calibration stream up to the database storage, is presented. Special algorithms have been implemented in DQHistogramAnalyzer for the Monitored Drift Tube chambers. A detailed web display is provided for easy data-quality consultation. The analysis flag is stored in an Oracle database using the COOL LCG library, through a C++ object-oriented interface. This quality flag is compared with the online and offline results, produced in a similar way, and the final decision is stored in a DB using a standalone C++ tool. The final DB, which uses the same COOL technology, is accessed by the reconstruction and analysis programs.
        Speaker: Dr Monica Verducci (INFN RomaI)
        Poster
      • 230
        Monte Carlo simulations of spallation experiments
        The Monte Carlo codes MCNPX and FLUKA are used to analyze experiments on simplified Accelerator Driven Systems performed at the Joint Institute for Nuclear Research, Dubna. In the experiments, protons or deuterons with energies in the GeV range are directed onto thick lead targets surrounded by different moderators and neutron multipliers. Monte Carlo simulations of these complex systems are performed using PBS and MPI parallelization. The processing power of some systems and experience with these types of parallelization are presented.
        Speaker: Mitja Majerle (Nuclear Physics institute AS CR, Rez)
        Poster
      • 231
        Multi-threaded Event Reconstruction with JANA
        Multi-threading is a tool that is not only well suited to high-statistics event analysis, but is particularly useful for taking advantage of the next generation of many-core CPUs. The JANA event processing framework has been designed to implement multi-threading through the use of POSIX threads. Thoughtful implementation allows reconstruction packages to be developed that are thread-enabled while requiring little or no knowledge of thread programming by the reconstruction code authors. We show how this design goal is achieved, along with test results showing rate scaling for CPU-bound jobs as well as improved performance on I/O-bound jobs (a generic multi-threaded event-loop skeleton is sketched after this entry).
        Speaker: Dr David Lawrence (Jefferson Lab)
        Poster
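        A minimal, generic sketch of the kind of POSIX-thread event loop such a framework builds on (an illustration only, not the actual JANA API; the Event type, queue and worker count are assumptions):
            #include <pthread.h>
            #include <queue>
            #include <vector>
            #include <cstdio>

            struct Event { int id; };

            static std::queue<Event> g_queue;                      // shared event queue
            static pthread_mutex_t   g_mutex = PTHREAD_MUTEX_INITIALIZER;

            // Each worker repeatedly pops one event and "reconstructs" it.
            void* worker(void*)
            {
              while (true) {
                pthread_mutex_lock(&g_mutex);
                if (g_queue.empty()) { pthread_mutex_unlock(&g_mutex); break; }
                Event evt = g_queue.front();
                g_queue.pop();
                pthread_mutex_unlock(&g_mutex);
                std::printf("reconstructing event %d\n", evt.id);  // placeholder for real work
              }
              return 0;
            }

            int main()
            {
              for (int i = 0; i < 1000; ++i) {                     // fill the queue with dummy events
                Event evt; evt.id = i;
                g_queue.push(evt);
              }
              std::vector<pthread_t> threads(4);                   // one thread per core (4 assumed)
              for (size_t i = 0; i < threads.size(); ++i)
                pthread_create(&threads[i], 0, worker, 0);
              for (size_t i = 0; i < threads.size(); ++i)
                pthread_join(threads[i], 0);
              return 0;
            }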
      • 232
        Muon identification procedure for the ATLAS detector at the LHC using Muonboy reconstruction package and tests of its performance using cosmic rays and single beam data
        ATLAS is one of the four experiments at the Large Hadron Collider (LHC) at CERN. The experiment has been designed to study a large range of physics, including searches for previously unobserved phenomena such as the Higgs boson and supersymmetry. The ATLAS Muon Spectrometer (MS) is optimized to measure final-state muons over a large momentum range, from a few GeV up to the TeV scale. Its momentum resolution varies from 2-3% at 10-100 GeV/c to 10% at 1 TeV, taking into account the high-background environment, the inhomogeneous magnetic field, and the large size of the apparatus (24 m diameter by 44 m length). Robust muon identification and high momentum measurement accuracy are crucial to fully exploit the physics potential of the LHC. The basic principles of the muon reconstruction package "Muonboy" are discussed in this paper. Details of the modifications made to adapt the pattern recognition to the cosmic-ray configuration, as well as its performance with the recent cosmic-ray and single beam data, are presented.
        Speaker: Dr Rosy Nikolaidou (CEA Saclay)
        Poster
      • 233
        Network Information and Monitoring Infrastructure (NIMI)
        Fermilab is a high energy physics research lab that maintains a highly dynamic network which typically supports around 15,000 active nodes. Due to the open nature of the scientific research conducted at FNAL, the portion of the network used to support open scientific research requires high-bandwidth connectivity to numerous collaborating institutions around the world, and must facilitate convenient access by scientists at those institutions. The Network Information and Monitoring Infrastructure (NIMI) is a framework built to help network management personnel and the computer security team monitor and manage the FNAL network. This includes the portions of the network used to support open scientific research as well as the portions used for more tightly controlled administrative and scientific support activities. As an infrastructure, NIMI has been used to build applications such as the Node Directory, the Network Inventory Database and the Computer Security Issue Tracking System (TIssue). These applications have been successfully used by FNAL Computing Division personnel to manage the local network, maintain the necessary level of protection of LAN participants against external threats and respond promptly to computer security incidents. The article discusses the NIMI structure, the functionality of the major NIMI-based applications, the history of the project, its current status and future plans.
        Speaker: Mr Igor Mandrichenko (FNAL)
      • 234
        New development of CASTOR at IHEP
        Some large experiments at IHEP will generate more than 5 petabytes of data in the next few years, which brings great challenges for data analysis and storage. CERN CASTOR version 1 was first deployed at IHEP in 2003, but it is now difficult for it to meet the new requirements. Taking into account issues of management, commercial software dependencies, etc., we decided not to upgrade CASTOR from version 1 to version 2. Instead, based on CASTOR version 1 and MySQL, we developed new open-source software with good scalability, high performance and easy-to-use features. This paper introduces our requirements and the design and implementation of the new stager, which we call DCC (Disk Cache for CASTOR), as well as MySQL 5.x compatibility, LTO4 tape support, the deployment, monitoring, alerting and so on. DCC adopts a database-centric architecture, much like the CASTOR version 2 stager, which makes it more modular and flexible. The detailed design and performance measurements of DCC are also described in this paper.
        Speaker: Dr Yaodong CHENG (Institute of High Energy Physics,Chinese Academy of Sciences)
        Poster
      • 235
        New Developments in File-based Infrastructure for ATLAS Event Selection
        In ATLAS software, TAGs are event metadata records that can be stored in various technologies, including ROOT files and relational databases. TAGs are used to identify and extract events that satisfy certain selection predicates, which can be coded as SQL-style queries. Several new developments in file-based TAG infrastructure are presented. TAG collection files support in-file metadata to store information describing all events in the collection. Event Selector functionality has been augmented to provide such collection-level metadata to subsequent algorithms. The ATLAS I/O framework has been extended to allow computational processing of TAG attributes to select or reject events without reading the event data. This capability enables physicists to use more detailed selection criteria than are feasible in an SQL query. For example, the TAGs contain enough information not only to check the number of electrons, but also to calculate their distance to the closest jet -- a calculation that would be difficult to express in SQL (a simple sketch of such a calculation is given after this entry). Another new development allows ATLAS to write TAGs directly into event data files. This feature can improve performance by supporting advanced event selection capabilities, including computational processing of TAG information, without the need for external TAG file or database access.
        Speaker: Dr Peter Van Gemmeren (Argonne National Laboratory)
        Paper
        Poster
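        An illustrative sketch of the kind of attribute-level computation mentioned above (the attribute layout and names are hypothetical, not the actual ATLAS TAG schema): compute the eta-phi distance from an electron to its closest jet and use it as a selection criterion.
            #include <cmath>
            #include <vector>

            // Minimal eta/phi representation of an object taken from a TAG-like record.
            struct EtaPhi { double eta, phi; };

            // Delta-R distance in eta-phi space, with the phi difference wrapped into [0, pi].
            double deltaR(const EtaPhi& a, const EtaPhi& b)
            {
              const double kPi = 3.14159265358979323846;
              double dphi = std::fabs(a.phi - b.phi);
              if (dphi > kPi) dphi = 2.0 * kPi - dphi;
              const double deta = a.eta - b.eta;
              return std::sqrt(deta * deta + dphi * dphi);
            }

            // Accept the event only if the electron is separated from every jet by more than rMin.
            bool passes(const EtaPhi& electron, const std::vector<EtaPhi>& jets, double rMin = 0.4)
            {
              for (size_t i = 0; i < jets.size(); ++i)
                if (deltaR(electron, jets[i]) < rMin) return false;
              return true;
            }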
      • 236
        Physics and Software validation for ATLAS
        The ATLAS experiment has recently entered the data-taking phase, with the focus shifting from software development to validation. The ATLAS software has to be robust enough to process large datasets and to produce the high-quality output needed for the scientific exploitation of the experiment. The validation process is discussed in this talk, starting from the validation of the nightly builds and pre-releases up to the final validation of the software releases used for data taking and scientific results. A few thousand events are processed every day using the most recent nightly build, and physics and technical histograms are produced automatically. New versions of the software are released every 3 weeks and are validated using a set of 100K events that are monitored by people appointed by each of the ATLAS subsystems. Patch versions of the software can be deployed at the ATLAS Tier0 and on the grid within a 12-24 hour cycle, and a crew of validation shifters continuously monitors bug reports submitted by the operation teams.
        Speakers: Andreu Pacheco (IFAE Barcelona), Davide Costanzo (University of Sheffield), Iacopo Vivarelli (INFN and University of Pisa), Manuel Gallas (CERN)
        Poster
      • 237
        Pixel detector Data Quality Monitoring in CMS
        The silicon pixel detector in CMS contains approximately 66 million channels, and will provide extremely high tracking resolution for the experiment. To ensure the data collected is valid, it must be monitored continuously at all levels of acquisition and reconstruction. The Pixel Data Quality Monitoring process ensures that the detector, as well as the data acquisition and reconstruction chain, is functioning properly. It is critical that the monitoring process not only examine the pixel detector with high enough granularity such that potential problems can be identified and isolated, but also run quickly enough that action can be taken before much data is compromised. We present a summary of the software system we have developed to accomplish this task. We focus on the implementation designed to maximize the amount of available information, and the methodology by which we store persistent information such that known problems can be recorded and historical trends preserved.
        Speaker: Keith Rose (Dept. of Physics and Astronomy-Rutgers, State Univ. of New Jerse)
      • 238
        Powerfarm: a power and emergency management thread-based software tool for the ATLAS Napoli Tier2
        The large potential storage and computing power available in modern grid and data centre infrastructures enable the development of the next-generation grid-based computing paradigm, in which a large number of clusters are interconnected through high speed networks. Each cluster is composed of several, often hundreds of, computers and devices, each with its own specific role in the grid. In such a distributed environment it is of critical importance to ensure and preserve the functioning of the data centre. It is therefore essential to have a management and fault-recovery system that preserves the integrity of the systems both in the presence of serious faults, such as power outages or temperature peaks, and during maintenance operations. In this context, for the ATLAS INFN Napoli Tier2 and for the SCoPE project of the University “Federico II” of Napoli, we developed Powerfarm, a customizable thread-based software system that monitors several parameters, such as the status of power supplies and room and CPU temperatures, and promptly responds to out-of-range values with the appropriate actions. Powerfarm enforces hardware and software dependencies between devices and is able to switch them on and off in the particular order induced by the dependencies. Indeed, Powerfarm makes use of specific parametric plugins in order to manage virtually any kind of device and represents the whole structure by means of XML configuration files. In this respect, Powerfarm may become an indispensable tool for power and emergency management of modern grid and data centre infrastructures.
        Speaker: Dr Alessandra Doria (INFN Napoli)
        Poster
      • 239
        R&D on co-working transport schemes in Geant4
        An R&D project, named NANO5, has recently been launched at INFN to address fundamental methods in radiation transport simulation and to revisit the Geant4 kernel design to cope with new experimental requirements. The project, which gathers an international collaborating team, focuses on simulation at different scales in the same environment. This issue requires novel methodological approaches to radiation transport across the current boundaries of condensed-random-walk and discrete methods: the ability is needed to change the scale at which the problem is described and analyzed within a complex experimental set-up. An exploration is also foreseen of exploiting and extending already existing Geant4 features to apply Monte Carlo and deterministic transport methods in the same simulation environment. The new developments have been motivated by requirements in various physics domains which challenge the conventional application domain of Monte Carlo transport codes like Geant4: ongoing R&D on nanotechnology-based tracking detectors for HEP experiments, radiation effects on components at high-luminosity colliders and in space science, optimization of astrophysics instrumentation, nanodosimetry, investigations of new-generation nuclear power sources, etc. The main features of the project are presented, together with the first prototype developments and results. A new concept introduced in the simulation - mutable physics entities (processes, models or other physics-aware objects), whose state and behavior depend on the environment and may evolve as an effect of it - is illustrated. The interdisciplinary nature of the R&D is described, highlighting the mutual benefits of collaborative contributions and beta-testing in HEP and other physics research domains.
        Speaker: Dr Maria Grazia Pia (INFN GENOVA)
        Poster
      • 240
        RSC: tool for analysis modelling, combination and statistical studies
        RSC is a software framework based on the RooFit technology, developed for the CMS experiment community, whose purpose is to allow the modelling and combination of multiple analysis channels together with the execution of statistical studies. This is performed through a variety of methods described in the literature, implemented as classes. The design of these classes is oriented to the execution of multiple CPU-intensive jobs on batch systems or on the Grid, facilitating the splitting of the calculations and the collection of the results. In addition, the production of plots by means of sophisticated formatting, drawing and graphics manipulation routines is provided transparently to the user. Analyses and their combinations are described in configuration files, thus separating physics inputs from the C++ code. This feature eases the sharing of the input models among the analysis groups, establishing common guidelines to summarise physics results. Maximum statistical advantage can be drawn from the combination of analyses by allowing the definition of common variables, constrained parameters and arbitrary correlations among the different quantities. RSC is therefore meant to complement the existing analyses by means of their combination, thereby obtaining earlier discoveries, sharper limits and more refined measurements of physically relevant quantities.
        Speaker: Mr Danilo Piparo (Universitaet Karlsruhe)
        Poster
      • 241
        Simulations and software tools for the CMS Tracker at SLHC
        The luminosity upgrade of the Large Hadron Collider (SLHC) is foreseen to start in 2013. An eventual factor-of-ten increase in LHC statistics will have a major impact on the LHC physics program. However, while offering the possibility to increase the physics potential, the SLHC will create an extreme operating environment for the detectors, particularly the tracking devices and the trigger system. An increase in the number of minimum-bias events by at least an order of magnitude beyond the levels envisioned for design luminosity creates the need to handle much higher occupancies and, for the innermost layers, unprecedented levels of radiation. This will require a fully upgraded tracking system with higher granularity, while trying not to exceed the material budget and power levels of the current system, and a revision of the current trigger system. Additional trigger information from the rebuilt tracking system could reduce the L1 trigger rate or could be used earlier in the higher level triggers. Detailed simulations are needed to help in the design of the new Tracker and to study the possibility of including tracking information in the L1 trigger system. At the same time, the huge increase in pile-up events imposes severe constraints on the existing software, which needs to be optimized in order to produce realistic studies for the SLHC.
        Speaker: Dr Kristian Harder (RAL)
        Poster
      • 242
        Storm-GPFS-TSM: a new approach to Hierarchical Storage Management for the LHC experiments
        In the framework of WLCG, the Tier-1 computing centres have very stringent requirements in the sector of data storage, in terms of size, performance and reliability. For some years, at the INFN-CNAF Tier-1, we have been using two distinct storage systems: Castor as the tape-based storage solution (the D0T1 storage class in WLCG language) and the General Parallel File System (GPFS), in conjunction with StoRM as an SRM service, for pure disk access (D1T0). At the beginning of 2008 we started to explore the possibility of employing GPFS together with the tape management software TSM as a solution for realizing a tape-disk infrastructure, first implementing a D1T1 storage class (files always on disk with a backup on tape), and then also a D0T1 class (hence also involving active recalls of files from tape to disk). The first StoRM-GPFS-TSM D1T1 system is already in production at CNAF for the LHCb experiment, while a prototype D0T1 system is under development and study. We describe the details of the new D1T1 and D0T1 implementations, discussing the differences between the Castor-based solution and the StoRM-GPFS-TSM one. We also present the results of some performance studies of the novel D1T1 and D0T1 systems.
        Speaker: Luca Dell'Agnello (INFN)
      • 243
        Swiss ATLAS Grid computing in preparation for the LHC collision data
        Computing for ATLAS in Switzerland has two Tier-3 sites with several years of experience, owned by the Universities of Berne and Geneva. They have been used for ATLAS Monte Carlo production, centrally controlled via NorduGrid, since 2005. The Tier-3 sites are under continuous development. In the case of Geneva, the proximity of CERN leads to additional use cases related to the commissioning of the experiment. This work requires processing of the latest ATLAS data using the latest software under development, which is not distributed to grid sites. We rely on the AFS file system for the software, and we plan to rely on ATLAS Distributed Data Management for the latest data. An SRM interface will be installed in Geneva for this purpose. The Swiss Tier-2 at the CSCS centre has a recent and powerful cluster serving three LHC experiments, including ATLAS. The system features two implementations of the grid middleware, NorduGrid ARC and LCG gLite, which operate simultaneously on the same resources. In this talk we will present our implementation choices and our experience with hardware, middleware and ATLAS-specific grid software. We will discuss the requirements of our users and how we meet them. We will present the status of our work and our plans for the ATLAS data-taking period in 2009.
        Speaker: Dr Szymon Gadomski (DPNC, University of Geneva)
        Poster
      • 244
        The ATLAS b-Tagging Infrastructure
        The ATLAS detector, one of the two general-purpose experiments at the Large Hadron Collider, will take high energy collision data for the first time in 2009. Its physics program encompasses everything from Standard Model physics to specific searches for beyond-the-Standard-Model signatures. One important aspect of separating the signal from large Standard Model backgrounds is the accurate identification of jets of particles originating from a bottom quark. A physics analysis in and of itself, b-tagging in ATLAS relies on a series of algorithms based on the unique aspects of bottom quark decay (soft lepton association, long lifetime). This talk gives a brief overview of these algorithms and the software infrastructure required to support them in a production environment like the one found at ATLAS. Some attention will also be paid to the different perspectives of the algorithm writer, who wants to understand exactly how a jet is tagged as being from a bottom quark, and of an analysis user, who only wants to know whether a jet is “tagged” and what the fake rate is.
        Speakers: Prof. Gordon Watts (UNIVERSITY OF WASHINGTON), Dr Laurent Vacavant (CPPM)
      • 245
        The ATLAS Detector Digitization Project for 2009 data taking
        The ATLAS digitization project is steered by a top-level Python digitization package which ensures uniform and consistent configuration across the subdetectors. The properties of the digitization algorithms were tuned to reproduce the detector response seen in lab tests, test beam data and cosmic ray running. Dead channels and noise rates are read from database tables to reproduce the conditions seen in a particular run. The digits are then persistified as Raw Data Objects (RDO), with or without intermediate bytestream simulation depending on the detector type. Emphasis is put on the description of the digitization project configuration, its flexibility in event handling for processing and in the global detector configuration, as well as its variety of options, including detector noise simulation, the random number service, metadata, and the details of the pile-up background events to be overlaid. The LHC beam bunch spacing is also configurable, as are the number of bunch crossings to overlay and the default detector conditions (including noisy channels and dead electronics associated with each detector layout). Cavern background calculation, beam halo and beam gas treatment, and pile-up with real data are also covered in this report.
        Speaker: John Chapman (Dept. of Physics, Cavendish Lab.)
        Poster
      • 246
        The CMS Tracker calibration workflow: experience with cosmic ray data
        The CMS Silicon Strip Tracker (SST) consists of 25000 silicon microstrip sensors covering an area of 210 m2, with about 10 million readout channels. Starting from December 2007 the SST was inserted and connected inside the CMS experiment, and since summer 2008 it has been commissioned using cosmic muons with and without magnetic field. During this data taking the performance of the SST has been carefully studied: the noise of the detector, together with its correlation with strip length and temperature, the data integrity, the S/N ratio, the hit reconstruction efficiency and the calibration constants have all been monitored over time and for different conditions, at the full detector granularity. In this presentation an overview of the SST calibration workflow and the detector performance results will be given.
        Speaker: Simone Frosali (Dipartimento di Fisica - Universita di Firenze)
      • 247
        The CMSSW benchmarking suite: using HEP code to measure cpu performance
        The demanding computing needs of the CMS experiment require thoughtful planning and management of its computing infrastructure. A key factor in this process is the use of realistic benchmarks when assessing the computing power of the different architectures available. In recent years a discrepancy has been observed between the CPU performance estimates given by the reference benchmark for HEP computing (SPECint) and the actual performance of HEP code. Making use of the CPU performance tools from the CMSSW performance suite, comparative CPU performance studies have been carried out on several architectures. A benchmarking suite has been developed and integrated in the CMSSW framework, to allow computing centres and interested third parties to benchmark architectures directly with CMSSW. The CMSSW benchmarking suite can be used out of the box to test and compare several machines in terms of CPU performance and to report the different benchmarking scores (e.g. by processing step) and results at the desired level of detail. In this talk we briefly describe the CMSSW software performance suite, and in detail the CMSSW benchmarking suite client/server design, the performance data analysis and the choice and composition of the benchmark scores. The interesting issues encountered in the use of HEP code for benchmarking will be discussed and CMSSW benchmark results presented.
        Speaker: Dr Gabriele Benelli (CERN PH Dept (for the CMS collaboration))
        Poster
      • 248
        The Effect of the Fragmentation Problem in Decision Tree Learning Applied to the Search for Single Top Quark Production
        Decision tree learning constitutes a suitable approach to classification due to its ability to partition the input (variable) space into regions of class-uniform events, while providing a structure amenable to interpretation (as opposed to other methods such as neural networks). An inherent limitation of decision tree learning, however, is the progressive lessening of the statistical support of the final classifier as clusters of single-class events are split on every partition, a problem known as the fragmentation problem. We describe a software system that measures the degree of fragmentation caused by a decision tree learner on every event cluster. Clusters are found through a decomposition of the data using a technique known as spectral clustering. Each cluster is analyzed in terms of the number and type of partitions induced by the decision tree. Our domain of application lies in the search for single top quark production, a challenging problem due to large backgrounds (such as W+jets and ttbar events), low-energy signals, and low jet multiplicity. The output of the machine-learning software tool consists of a series of statistics describing the degree of classification error attributed to the fragmentation problem.
        Speaker: Roberto Valerio (Cinvestav Unidad Guadalajara)
        Poster
      • 249
        The Introduction of Data Analysis System of MDC for BEPCII/BESIII
        BEPCII/BESIII (Beijing Electron Positron Collider / Beijing Spectrometer) was installed and operated successfully in July 2008 and has been in commissioning since September 2008. The luminosity has now reached 1.3x10^32 cm^-2 s^-1 at beam currents of 489 mA x 530 mA with 90 bunches. About 13 million psi(2S) physics events have been collected by BESIII. The BESIII offline data analysis system has been tested and operated to handle the real experimental data. The data analysis system of the MDC (Main Drift Chamber) includes event reconstruction, track fitting, offline calibration, the event start time algorithm, and Monte Carlo tuning between MC and real data. Among these, the event start time determination is the first step of charged track reconstruction in the MDC and an important part of the BESIII offline data analysis, because of the multi-bunch colliding mode used in BEPCII, the pipelined trigger arrangement used in the BESIII data acquisition system, and the special time measurement method used for the MDC electronics. The performance of the MDC software system, including the tracking efficiency, the CPU consumption, and the preliminary results of the offline calibration and Monte Carlo tuning for real experimental data, is presented. The preliminary MDC performance is: a spatial resolution of about 128 um and a momentum resolution of about 0.81%.
        Speaker: Dr Ma Xiang (Institute of High energy Physics, Chinese Academy of Sciences)
      • 250
        The LHCb track fitting concept and its performance
        The reconstruction of charged particles in the LHCb tracking system consists of two parts. The pattern recognition links the signals belonging to the same particle. The track fitter, running after the pattern recognition, extracts the best parameter estimate from the reconstructed tracks. A dedicated Kalman fitter is used for this purpose (the generic filter equations are recalled after this entry). The track model employed in the fit is based on a trajectory concept originally introduced by the BaBar collaboration, which has been further developed and improved. To cope with the various applications at trigger level and in the offline reconstruction software, the fitter has been designed to be very flexible, so that it can be adapted to the individual requirements in CPU time and resolution. For example, a simplified geometry model has been introduced which speeds up the computation time of the fitter significantly while obtaining almost the same resolution as the full geometry description. We will report on the LHCb fitting concept and present its current performance in various applications based on the latest simulation.
        Speaker: Rodrigues Figueiredo Eduardo (University Glasgow)
        Poster
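        For reference, the generic Kalman filter recursion underlying such track fitters (standard textbook notation, not LHCb-specific: x is the track state, F the transport matrix, Q the process noise from multiple scattering, H the projection onto the measurement m, R the measurement covariance and P the state covariance):
            x_{k|k-1} = F_k\, x_{k-1|k-1}, \qquad P_{k|k-1} = F_k P_{k-1|k-1} F_k^{T} + Q_k
            K_k = P_{k|k-1} H_k^{T} \left( H_k P_{k|k-1} H_k^{T} + R_k \right)^{-1}
            x_{k|k} = x_{k|k-1} + K_k \left( m_k - H_k x_{k|k-1} \right), \qquad P_{k|k} = (I - K_k H_k)\, P_{k|k-1}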
      • 251
        The Offline Software of BESIII Muon detector
        The new spectrometer for the challenging physics in the tau-charm energy region, BESIII, has been constructed and has entered the commissioning phase at BEPCII, the upgraded e+e- collider in Beijing, China, with a peak luminosity of up to 10^33 cm^-2 s^-1. The BESIII muon detector mainly contributes to distinguishing muons from hadrons, especially pions. Resistive Plate Chambers (RPCs) are used in the BESIII muon detector. These RPCs work in streamer mode and are made of a new type of bakelite with melamine treatment instead of linseed oil treatment. The offline software of the BESIII muon detector has been developed and validated preliminarily with cosmic ray data and psi(2S) data. We describe the ideas and implementation of the simulation, reconstruction and calibration packages. The detector commissioning and software validation results are presented, and a comparison between Monte Carlo and data is shown.
        Speaker: Xie Yuguang (Institute of High energy Physics, Chinese Academy of Sciences)
      • 252
        The Online Histogram Presenter for the ATLAS experiment: a modular system for histogram visualization
        The challenging experimental environment and the extreme complexity of modern high-energy physics experiments make online monitoring an essential tool to assess the quality of the acquired data. The Online Histogram Presenter (OHP) is the ATLAS tool to display histograms produced by the online monitoring system. In spite of the name, the Online Histogram Presenter is much more than just a histogram display. To cope with the large amount of data, the application has been designed to actively minimise the network traffic; sophisticated caching, hashing and filtering algorithms reduce memory and CPU usage. The system uses Qt and ROOT for histogram visualisation and manipulation. In addition, histogram visualisation can be extensively customised through configuration files. Finally, its very modular architecture features a lightweight plugin system, allowing extensions to accommodate specific user needs. The Online Histogram Presenter unifies the approach to histogram visualisation inside the ATLAS online environment in a general-purpose, highly configurable, interactive application. After an architectural overview of the application, the paper presents in detail the solutions adopted to increase performance and a description of the plugin system. Examples of OHP use from ATLAS commissioning and the first LHC beam will also be presented.
        Speaker: Andrea Dotti (INFN and Università Pisa)
        Poster
      • 253
        The PetaQCD project
        The study and design of a very ambitious petaflop cluster exclusively dedicated to Lattice QCD simulations started in early '08 among a consortium of 7 laboratories (IN2P3, CNRS, INRIA, CEA) and 2 SMEs. This consortium received a grant from the French ANR agency in July, and the PetaQCD project kickoff is expected to take place in January '09. Building upon several years of fruitful collaborative studies in this area, the aim of the project is to demonstrate that the simulation of a 256x128^3 lattice can be achieved through the HMC software, using a machine with a reasonable cost/reliability/power-consumption balance. It is expected that this machine can be built out of a rather limited number of processors (e.g. between 1000 and 4000), while being capable of a sustained petaflop CPU performance. The proof of concept should be a mock-up cluster built as much as possible with off-the-shelf components, and two particularly attractive axes will be investigated, in addition to fast all-purpose multi-core processors: the use of the new brand of IBM Cell processors (with on-chip accelerators) and the very recent Nvidia GP-GPUs (off-chip co-processors). This cluster will obviously be massively parallel and heterogeneous. Communication issues between processors, implied by the physics of the simulation and the lattice partitioning, will certainly be a major key to the project.
        Speaker: Mr Gilbert Grosdidier (LAL/IN2P3/CNRS)
      • 254
        The RooFit toolkit for data modeling
        RooFit is a library of C++ classes that facilitates data modeling in the ROOT environment. Mathematical concepts such as variables, (probability density) functions and integrals are represented as C++ objects. The package provides a flexible framework for building complex fit models through classes that mimic math operators, and is straightforward to extend. For all constructed models RooFit provides a concise yet powerful interface for fitting (binned and unbinned likelihood, chi^2), plotting and toy Monte Carlo generation, as well as sophisticated tools to manage large-scale projects (a minimal usage sketch is given after this entry). RooFit has matured since 1999 into an industrial-strength tool and has been used in the BaBar experiment's most complicated fits. Recent developments include the ability to persist probability density functions into ROOT files that can easily be shared and used through a simple interface, without the need to distribute code. Model persistence enables the concept of digital publishing of complex physics results and provides a foundation for higher-level statistical tools for the LHC experiments to calculate combined physics results.
        Speaker: Wouter Verkerke (NIKHEF)
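        A minimal RooFit usage sketch (a Gaussian fit to toy data, persisted via a workspace; the variable ranges and file name are illustrative):
            #include "RooRealVar.h"
            #include "RooGaussian.h"
            #include "RooDataSet.h"
            #include "RooWorkspace.h"

            void roofit_sketch()
            {
              RooRealVar x("x", "observable", -10, 10);
              RooRealVar mean("mean", "mean", 0, -10, 10);
              RooRealVar sigma("sigma", "width", 1, 0.1, 5);
              RooGaussian gauss("gauss", "Gaussian PDF", x, mean, sigma);

              RooDataSet* data = gauss.generate(x, 10000);   // toy Monte Carlo generation
              gauss.fitTo(*data);                            // unbinned maximum likelihood fit

              RooWorkspace w("w");                           // persist the model for later reuse
              w.import(gauss);
              w.writeToFile("model.root");
              delete data;
            }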
      • 255
        The Status of the Simulation Project for the ATLAS Experiment in view of the LHC startup
        The simulation suite for ATLAS is in a mature phase, ready to cope with the challenge of the 2009 data. The simulation framework, already integrated in the ATLAS framework (Athena), offers a set of pre-configured applications for full ATLAS simulation, combined test beam setups, cosmic ray setups and old standalone test beams. Each detector component has been carefully described in full detail and its performance monitored. The few still missing pieces of the apparatus (forward and very forward detectors), inert material and services (toroid supports, support rails, detector feet) are about to be integrated in the current simulation suite. The detailed description of ideal and real geometry for each ATLAS subcomponent has made optimization studies and validation possible. Short and medium scale productions are monitored daily through a set of tests for different samples of physics events, and large-scale productions on the Grid verify the robustness of the implementation as well as possible errors only visible with large statistics. Metadata handling is the latest subject of interest for conditions monitoring and recording during the simulation process. A fast shower simulation suite has also been developed in ATLAS, and performance comparisons are part of the overall evaluation.
        Speaker: Zachary Marshall (Caltech, USA & Columbia University, USA)
        Poster
      • 256
        The Use of the TWiki Web in ATLAS
        The ATLAS Experiment, with over 2000 collaborators, needs efficient and effective means of communicating information. The Collaboration has been using the TWiki Web at CERN for over three years and now has more than 7000 web pages, some of which are protected. This number greatly exceeds the number of “static” HTML pages, and in the last year, there has been a significant migration to the TWiki. The TWiki is one example of the many different types of Wiki web which exist. In this talk, a description will be given of the ATLAS TWiki at CERN. The tools used by the Collaboration to manage the TWiki will be described and some of the problems encountered will be explained. A very useful development has been the creation of a set of Workbooks (Users’ Guides) – these have benefitted from the TWiki environment and, in particular, a tool to extract pdf from the associated pages.
        Speaker: Fred Luehring (Indiana University)
        Poster
      • 257
        TMVA - The Toolkit for Multivariate Data Analysis
        The Toolkit for Multivariate Analysis, TMVA, provides a large set of advanced multivariate analysis techniques for signal/background classification. In addition, TMVA now also contains regression analysis, all embedded in a framework capable of handling the pre-processing of the data and the evaluation of the output, thus allowing a simple and convenient use of multivariate techniques. The analysis techniques implemented in TMVA can be invoked easily, and the direct comparison of their performance allows the user to choose the most appropriate technique for a particular data analysis (a minimal training sketch is given after this entry). This talk gives an overview of the TMVA package and presents recently developed features.
        Speaker: Dr Peter Speckmayer (CERN)
        Poster
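        A minimal classification-training sketch with the TMVA Factory interface of that era (the tree pointers, variable names and option strings are illustrative placeholders):
            #include "TFile.h"
            #include "TTree.h"
            #include "TMVA/Factory.h"
            #include "TMVA/Types.h"

            void tmva_sketch(TTree* sigTree, TTree* bkgTree)
            {
              TFile* out = TFile::Open("tmva_output.root", "RECREATE");
              TMVA::Factory factory("TMVAClassification", out, "!V");

              factory.AddVariable("var1", 'F');            // discriminating variables (placeholders)
              factory.AddVariable("var2", 'F');
              factory.AddSignalTree(sigTree, 1.0);         // trees with global event weights
              factory.AddBackgroundTree(bkgTree, 1.0);
              factory.PrepareTrainingAndTestTree("", "SplitMode=Random:NormMode=NumEvents");

              factory.BookMethod(TMVA::Types::kBDT, "BDT", "NTrees=400");  // one example method

              factory.TrainAllMethods();                   // train, test and evaluate all booked methods
              factory.TestAllMethods();
              factory.EvaluateAllMethods();
              out->Close();
            }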
      • 258
        Track Reconstruction in the Muon and Transition Radiation Detectors of the CBM Experiment at FAIR
        The Compressed Baryonic Matter (CBM) experiment at the future FAIR accelerator at Darmstadt is being designed for a comprehensive measurement of hadron and lepton production in heavy-ion collisions at 8-45 AGeV beam energy, producing events with large track multiplicity and high hit density. The setup consists of several detectors, including as tracking detectors the Silicon Tracking System (STS) and the muon detector (MUCH) or, alternatively, a set of Transition Radiation Detectors (TRD). In this contribution, the status of the track reconstruction software, including track finding, fitting and propagation, is presented for MUCH and TRD. Since both the MUCH and TRD detectors have similar designs, in which material layers alternate with detector stations, the track reconstruction algorithm is flexible with respect to its applicability to different detectors. It is an important ingredient in feasibility studies of different physics channels and in the optimization of the detectors. The track propagation algorithm takes into account an inhomogeneous magnetic field and includes an accurate calculation of multiple scattering and energy losses in the detector material. Track parameters and covariance matrices are estimated using the Kalman filter method and a Kalman filter modification that assigns weights to hits and uses simulated annealing. Two different track finders based on track following have been developed from these approaches, one using track branches and one not. The track reconstruction efficiency for central Au+Au collisions at 25 AGeV beam energy, using events from the UrQMD model, is at the level of 93-97% for both detectors.
        Speaker: Mr Andrey Lebedev (GSI, Darmstadt / JINR, Dubna)
        Poster
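        To make the fitting step mentioned above concrete, here is a schematic Kalman filter measurement update for a single detector plane, written in Python with NumPy. It is purely illustrative: a simplified linear measurement model with hypothetical state and measurement dimensions, not the actual CBM propagation and fitting code.

          import numpy as np

          def kalman_update(x, C, m, V, H):
              """One Kalman filter measurement update.
              x: track state vector, C: its covariance,
              m: measurement vector, V: measurement covariance,
              H: projection from state space to measurement space."""
              r = m - H @ x                            # residual
              R = H @ C @ H.T + V                      # residual covariance
              K = C @ H.T @ np.linalg.inv(R)           # Kalman gain
              x_new = x + K @ r                        # updated state
              C_new = (np.eye(len(x)) - K @ H) @ C     # updated covariance
              chi2 = float(r.T @ np.linalg.inv(R) @ r) # contribution to track chi2
              return x_new, C_new, chi2

          # Example: a 5-parameter state (x, y, tx, ty, q/p) updated with a 2D hit.
          x0, C0 = np.zeros(5), np.eye(5)
          H = np.zeros((2, 5)); H[0, 0] = H[1, 1] = 1.0
          x1, C1, chi2 = kalman_update(x0, C0, np.array([0.1, -0.2]),
                                       0.01 * np.eye(2), H)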
      • 259
        TrackInCaloTools: a package for measuring muon energy loss and calorimetric isolation in ATLAS
        Muons in the ATLAS detector are reconstructed by combining the information from the Inner Detector and the Muon Spectrometer (MS), located in the outermost part of the experiment. Until they reach the MS, muons typically traverse 100 radiation lengths (X0) of material, most of it instrumented by the electromagnetic and hadronic calorimeters. A proper account of multiple scattering and energy loss effects is essential for the reconstruction, and the use of the calorimeter measurement can improve the transverse momentum resolution, especially in the case of high energy deposits. On the other hand, the calorimeter activity around a muon, or conversely its isolation, is one of the most powerful features to distinguish W and Z decays from semi-leptonic decays of heavy flavour mesons (containing b and c quarks). The principle of the software that performs these tasks, called TrackInCaloTools, is presented, together with the expected performance for early LHC data in 2009 and the impact on first physics analyses.
        Speaker: Mr Bruno Lenzi (CEA - Saclay)
        Poster
      • 260
        Upgrade and design of the Pluto event generator
        Because experimental setups are usually not able to cover the full solid angle, event generators are very important tools for experiments. Here, theoretical calculations provide valuable input, as they can describe specific distributions for parts of the kinematic variables very precisely. The caveat is that an event has several degrees of freedom which can be correlated. In practice, experimental physicists need a tool which allows almost all kinematic variables to be exchanged through a manageable user interface. Recently, the user-friendly Pluto event generator was re-designed in order to introduce a more modular, object-oriented structure, thereby making additions such as new particles, decays of resonances, new models, and even modules for entire changes easily applicable. Overall consistency is ensured by a plugin and distribution manager. One specific feature of Pluto is that we do not use monolithic decay models but allow for the splitting into different models in a very granular way (e.g. to exchange form factors or total cross sections). This turned out to be a very important tool for checking various scenarios against measured data, which will be outlined with a few examples. To this end, Pluto allows for the attachment of secondary models for all kinds of purposes. Here, a secondary model is an object for a particle/decay returning a (complex) number as a function of a defined number of values (a toy illustration of this idea follows this entry). All models are connected via a relational database. All features can be employed by the user without re-compiling the package, which makes Pluto extremely configurable. In our contribution, we present the new structure of the Pluto event generator, originally intended for work on experiment proposals but now upgraded to allow for the implementation of user-defined functions and models.
        Speaker: Dr Ingo Fröhlich (Goethe-University)
        Poster
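        The following toy Python sketch illustrates the secondary-model idea described above: an object attached to a particle or decay channel that returns a weight as a function of kinematic values, looked up via a small database so that models can be swapped without recompiling anything. Pluto itself is written in C++ on top of ROOT, and all names and numbers below are hypothetical.

          class FormFactorModel:
              """Toy secondary model: returns a weight as a function of q^2."""
              def __init__(self, cutoff):
                  self.cutoff = cutoff

              def __call__(self, q2):
                  # Simple monopole form factor as a stand-in for a real model.
                  return 1.0 / (1.0 - q2 / self.cutoff ** 2)

          # A minimal "model database" connecting decay channels to secondary
          # models; a generator loop would consult it when weighting events.
          model_db = {
              "eta -> e+ e- gamma": FormFactorModel(cutoff=0.72),
          }

          weight = model_db["eta -> e+ e- gamma"](q2=0.1)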
      • 261
        Validation and verification of Geant4 standard electromagnetic physics
        The standard electromagnetic physics packages of Geant4 are used for the simulation of particle transport and HEP detector response. The requirements on the precision and stability of the computations are strong; for example, the calorimeter response for ATLAS and CMS should be reproduced to well within 1%. To maintain and control the long-term quality of the package, software suites for validation and verification have been developed. In this work we describe the main approaches to the validation and the structure of the validation software, and show examples of comparisons between Geant4 simulations and data.
        Speaker: Prof. Vladimir Ivantchenko (CERN, ESA)
        Poster
      • 262
        VETRA - offline analysis and monitoring software platform for the LHCb VELO
        The LHCb experiment is dedicated to studying CP violation and rare decay phenomena. In order to achieve these physics goals, precise tracking and vertexing around the interaction point is crucial. This is provided by the VELO (VErtex LOcator) silicon detector. After digitization, large FPGAs on a dedicated processing board run several algorithms to suppress noise and reconstruct clusters. An off-line software framework, VETRA, has been developed which performs a bit-perfect emulation of this complex processing in the FPGAs. This is a novel development, as the hardware emulation is not standalone but fully integrated into the LHCb software to allow the reconstruction of full data from the detector. This software platform facilitates: developing and understanding the behaviour of the processing algorithms; optimizing the parameters of the algorithms that will be loaded into the FPGAs; and monitoring the performance of the detector. The framework has also been adopted by the Silicon Tracker detector of LHCb, and was successfully used with the first 1500 tracks of data in the VELO obtained from the LHC beam in September 2008. The software architecture and utilisation of the VETRA project will be discussed in detail.
        Speaker: Dr Tomasz Szumlak (Glasgow)
        Poster
      • 263
        Videoconferencing infrastructure at IN2P3
        IN2P3, the institute bringing together the HEP laboratories in France alongside CEA's IRFU, opened a videoconferencing service in 2002 based on an H.323 MCU. The service has grown steadily since then, serving French communities beyond HEP, and now reaches an average of about 30 different conferences a day. The relatively small amount of manpower devoted to this project can be explained by the very sound design and the large array of capabilities of the equipment that replaced the original one in 2005. The service will be described, and its unusual mode of operation, which does not resort to a gatekeeper, will be compared to other setups, notably those put in place by ESnet and DFN. An outline of developments around MCUs that could be of interest to the whole community will be presented. Some issues of integration of this service with other collaborative tools in use today will be discussed.
        Speaker: Christian Helft (LAL/IN2P3/CNRS)
        Poster
      • 264
        Virtuality and Efficiency - Overcoming Past Antinomy in the Remote Collaboration Experience
        Several recent initiatives have been put in place by the CERN IT Department to improve the user experience in remote dispersed meetings and in remote collaboration at large in the LHC communities worldwide. We will present an analysis of the factors which have historically limited the efficiency of remote dispersed meetings and describe the consequent actions which were undertaken at CERN to overcome these limitations. After giving a status update of the different equipment available at CERN to enable virtual sessions and the various collaboration tools which are currently proposed to users, we will focus on the evolution of this market: how the new technological trends (among others, HD videoconferencing, Telepresence, Unified Communications, etc.) can positively impact the user experience, and how to make the best use of them. Finally, by projecting ourselves into the future, we will give some hints as to how to answer the difficult question of selecting the next generation of collaboration tools: which set of tools among the various offers (systems like Vidyo H264 SVC, next generation EVO, Groupware offers, standard H323 systems, etc.) is best suited for our environment and how to unify this set for the common user. This will finally allow us to definitively overcome the past antinomy between virtuality and efficiency.
        Speaker: Mr Joao Fernandes (CERN)
        Poster
    • Plenary: Tuesday Congress Hall

      Congress Hall

      Prague

      Prague Congress Centre 5. května 65, 140 00 Prague 4, Czech Republic

      Live broadcasting at:
      http://prenosy.cesnet.cz/

      Convener: Ian Bird (CERN)
      • 265
        Interoperability - Grids, Clouds and Collaboratories
        The reach and diversity of computationally based Collaboratories continues to expand. The quantity and quality of remote processing and storage continues to advance, with new entrants from the Commercial Clouds and coverage by Campus, Regional and National Grids. Ensuring interoperability across all these computing facilities is an important responsibility for the common infrastructure projects and the community at large. Ruth Pordes is an Associate Head of the Fermilab Computing Division. She has a long history of working on collaborative projects between domain scientists, computing professionals and computer scientists. Ruth is the Executive Director of the Open Science Grid. She is really enjoying the new opportunities of not only supporting the core physics experiments but also bringing the organization and technology experience of these communities to the broader domains of scientific scholarship.
        Speaker: Ruth Pordes (FNAL)
      • 266
        Collaborating at a Distance: Operations Centres, Tools, and Trends
        Commissioning the LHC accelerator and experiments will be a vital part of the worldwide high-energy physics program in 2009. Remote operations centers have been established in various locations around the world to support collaboration on LHC activities. For the CMS experiment, the development of remote operations centers began with the LHC@FNAL ROC and has evolved into a unified approach with multiple operations centers, collectively referred to as CMS Centres Worldwide. An overview of the development of operations centers for CMS will be presented. Other efforts to enhance remote collaboration in high-energy physics will be presented, along with a brief overview of collaborative tools and remote operations capabilities developed in other fields of research. Possible future developments and trends that are sure to make remote collaboration ubiquitous in high-energy physics will be explored.
        Speaker: Dr Erik Gottschalk (FNAL)
        Slides
        Video
      • 267
        Belle Monte-Carlo production on the Amazon EC2 cloud
        The SuperBelle project to increase the luminosity of the KEKB collider by a factor of 50 will search for physics beyond the Standard Model through precision measurements and the investigation of rare processes in flavour physics. The data rate expected from the experiment is comparable to that of a current-era LHC experiment, with commensurate computing needs. Incorporating commercial cloud computing, such as that provided by the Amazon Elastic Compute Cloud (EC2), into the SuperBelle computing model may provide a lower Total Cost of Ownership for the SuperBelle computing solution. To investigate this possibility, we have deployed the complete Belle Monte-Carlo simulation chain on EC2 to benchmark the cost and performance of the service. This presentation will describe how this was achieved, as well as the bottlenecks and costs of large-scale Monte-Carlo production on EC2.
        Speaker: Prof. Martin Sevior (University of Melbourne)
        Slides
        Video
    • 10:30
      coffee break, exhibits and posters
    • Plenary: Tuesday Congress Hall

      Congress Hall

      Prague

      Prague Congress Centre 5. května 65, 140 00 Prague 4, Czech Republic

      Live broadcasting at:
      http://prenosy.cesnet.cz/

      Convener: Chen Gang (Beijing)
      • 268
        Addressing the Challenges of High Performance Computing with IBM Innovation and iDataPlex: Take Advantage of Cooler, Denser, and More Efficient Compute Power
        In 2008, IBM shattered the U.S. patent record, becoming the first company to surpass 4,000 patents in a single year - the 16th consecutive year that IBM has achieved U.S. patent leadership. Come learn how IBM has leveraged our deep Research and Development innovation to deliver the iDataPlex server solution. With over 40 patented innovations, the iDataPlex product is one of the first x86 clean-sheet designs optimized for energy-efficient High Performance Computing. IBM has built iDataPlex from the ground up to maximize data center density, optimize server deployment efficiency, use less energy, be easy to service and lower your high performance computing expenses. The IBM innovation in the iDataPlex solution results in up to 40% less energy consumption (when compared to equivalently configured standard 1U servers), enables you to efficiently deploy racks of servers at a time and offers an option to virtually eliminate special data center air conditioning. This presentation will cover these features and explore the technology behind the iDataPlex High Performance Computing alternative.
        Speaker: Gregg McKnight (IBM)
        Slides
        Video
      • 269
        More Computing with Less energy.
        Today’s processor designs face some significant challenges in the coming years. Compute demands are projected to continue to grow at a compound aggregate growth rate of 45% per year, with seemingly no end in sight. Energy costs, as well as property, plant and equipment costs, also continue to increase. Processor designers can no longer afford to trade increasing power for increasing performance. System designers need to consider the power requirements of the entire system and not just the power associated with the processor. This talk will focus on the challenges and efforts to combine the class of performance traditionally delivered by high performance system designs with the energy efficiency in the class of today’s low power processor platforms.
        Speaker: Dr Steve Pawlowski (Intel)
        Slides
      • 270
        Datacenter Re-Evolution - Change Happens
        "Change is the law of life. And those who look only to the past or present are certain to miss the future" - John F. Kennedy. The Data Center landscape is changing at an incredible rate. Demand is increasing and technology is advancing rapidly, more so than at any other time in our history. Data Center operational cost increases, growing consumption, and the corresponding carbon footprint have increased the executive visibility, and pressure to get the environment under control. The world is now participating and the demand on the datacenters to feed it will not stop. This Revolution requires an Evolution in thinking. This session will dive into the latest trends, industry activities, technologies, and design methodologies that datacenter owners and operators should be aware of. Dean Nelson, Sr Director of Sun Microsystems Global Lab & Datacenter Design Services, and co-founder of Data Center Pulse, an exclusive community of Data Center owners and operators, will be sharing information on Sun's latest datacenter innovations as well as updates on Data Center Pulse activities, including CO2 (http://datacenterpulse.org/ThechillOff), Focus Tracks (http://datacenterpulse.org/Summit2009CalResults), and more.
        Speaker: Prof. Dean Nelson (SUN)
        Slides
        Video
    • 13:00
      lunch
    • Collaborative Tools: Tuesday Club D

      Club D

      Prague

      Prague Congress Centre 5. května 65, 140 00 Prague 4, Czech Republic
      • 271
        Indico Central - Events Organisation, Ergonomics and Collaboration Tools Integration
        While the remote collaboration services at CERN slowly aggregate around the Indico event management software, its new version, the result of a careful maturation process, includes improvements which will set a new reference in its domain. The presentation will focus on the description of the new features of the tool and on the user feedback process which resulted in a marked improvement in usability. We will also describe the interactions with the worldwide community of users and server administrators and the impact this has had on our development process, as well as the tools set in place to streamline the work between the different collaborating sites. A last part will be dedicated to the use of Indico as a central hub for operating other local services around the event organisation (registration, e-payment, audiovisual recording, webcast, room booking, and videoconference support).
        Speaker: Mr Jose Benito Gonzalez Lopez (CERN)
        Slides
      • 272
        Collaborative Tools and the LHC: Some Success, Some Plans
        I report major progress in the field of Collaborative Tools, concerning the organization, design and deployment of facilities at CERN, in support of the LHC. This presentation discusses important steps made during the past year and a half, including the identification of resources for equipment and manpower, the development of a competent team of experts, tightening of the user-feedback loop, and the final design and installation of facilities at CERN. I also summarize current discussions to extend this progress to other services and present my own proposals for future development.
        Speaker: Dr Steven Goldfarb (University of Michigan)
        Paper
        Slides
    • Commercial parallel: Tuesday Panorama

      Panorama

      Prague

      Prague Congress Centre 5. května 65, 140 00 Prague 4, Czech Republic
      • 273
        Value of HPC communities to build efficient HPC clusters: a Sun Microsystems contribution
        Speaker: Dr Philippe Trautmann (Sun Microsystems)
      • 274
        Datacenter - discussion
        Speaker: Prof. Dean Nelson (SUN)
    • Distributed Processing and Analysis: Tuesday Club C

      Club C

      Prague

      Prague Congress Centre 5. května 65, 140 00 Prague 4, Czech Republic
      Convener: Fons Rademakers (CERN)
      • 275
        Scalla As A Full-Fledged LHC Grid SE
        Scalla (also known as xrootd) is quickly becoming a significant part of LHC data analysis as a stand-alone clustered data server (US ATLAS T2 and the CERN Analysis Farm), a globally clustered data sharing framework (ALICE), and an integral part of PROOF-based analysis (multiple experiments). Until recently, xrootd did not fit well into the LHC Grid infrastructure as a Storage Element (SE), largely because it did not provide Storage Resource Manager (SRM) access, making it difficult to justify wider deployment. However, Scalla’s extensible plug-in, event-based architecture, as well as its storage system view, was instrumental in integrating SRM access, using BestMan and Linux/FUSE, in a relatively short time. Today, Scalla can provide full SRM functionality in a way that is independent of the data access system itself. This makes it trivial to migrate to newer versions or run multiple versions of the SRM as they become available. This paper discusses the architectural elements of the SRM-Scalla integration, where new code had to be written, how key SRM features (e.g., static space tokens and Grid quotas) leveraged the pre-existing Scalla infrastructure, and the overall flow of data as well as management information through the system. We also discuss a side-effect of this effort, called xrootdFS, which offers a single file system view of an xrootd cluster, including its performance and what needs to be done to improve it.
        Speaker: Mrs Andrew Hanushevsky (SLAC National Accelerator Laboratory)
        Slides
      • 276
        Distributed analysis with PROOF in ATLAS Collaboration
        The Parallel ROOT Facility, PROOF, is a distributed analysis system which allows one to exploit the inherent event-level parallelism of high energy physics data. PROOF can be configured to work with centralized storage systems, but it is especially effective together with distributed local storage systems - like Xrootd - when data are distributed over the computing nodes. It works efficiently on different types of hardware and scales well from a multi-core laptop to large computing farms. From that point of view it is well suited for both large central analysis facilities and Tier-3 type analysis farms. PROOF can be used in interactive or batch-like regimes. The interactive regime allows the user to work with typically distributed data from the ROOT command prompt and get real-time feedback on the analysis progress and intermediate results (a minimal usage sketch follows this entry). We will discuss our experience with PROOF in the context of ATLAS Collaboration distributed analysis. In particular, we will discuss PROOF performance in various analysis scenarios and in a multi-user, multi-session environment. We will also describe PROOF integration with the ATLAS distributed data management system and the prospects of running PROOF on geographically distributed analysis farms.
        Speaker: Dr Sergey Panitkin (Department of Physics - Brookhaven National Laboratory (BNL))
        Slides
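        As an illustration of the interactive use described above, the following PyROOT sketch opens a PROOF-Lite session on a multi-core machine and processes a chain of files through a selector. The tree, file and selector names are hypothetical; on an analysis farm one would open the remote PROOF master instead of "lite://".

          import ROOT

          # Open a PROOF-Lite session (one worker per core on the local machine);
          # on a farm this would be e.g. ROOT.TProof.Open("user@proof-master").
          proof = ROOT.TProof.Open("lite://")

          chain = ROOT.TChain("CollectionTree")   # assumed tree name
          chain.Add("data/AOD.*.root")            # assumed input files
          chain.SetProof()                        # route Process() through PROOF

          # The selector encapsulates the per-event analysis; it is compiled
          # and distributed to the workers automatically.
          chain.Process("MySelector.C+")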
      • 277
        CMS data quality monitoring: systems and experiences
        In the last two years the CMS experiment has commissioned a full end to end data quality monitoring system in tandem with progress in the detector commissioning. We present the data quality monitoring and certification systems in place, from online data taking to delivering certified data sets for physics analyses, release validation and offline re-reconstruction activities at Tier-1s. We discuss the main results and lessons learnt so far in the commissioning and early detector operation. We outline our practical operations arrangements and the key technical implementation aspects.
        Speaker: Lassi Tuura (Northeastern University)
        Paper
        Slides
      • 278
        Performance of Combined Production And Analysis WMS in DIRAC
        DIRAC, the LHCb community Grid solution, uses generic pilot jobs to obtain a virtual pool of resources for the VO community. In this way, agents can request the highest priority user or production jobs from a central task queue, and VO policies can be applied with full knowledge of current and previous activities. In this paper the performance of the DIRAC WMS will be presented, with emphasis on how the system copes with many varied job requirements. In order to ensure traceability of jobs as well as security, the actual user’s identity has to be established before running the payload workflow. Generic pilot jobs take advantage of the deployment of the gLExec utility in order to achieve this. Experience with gLExec will be described.
        Speaker: Dr Stuart Paterson (CERN)
        Slides
      • 279
        A collaborative analysis framework in use for ALICE experiment
        The ALICE offline group has developed a set of tools that formalize data access patterns and impose certain rules on how individual data analysis modules have to be structured in order to maximize the data processing efficiency at the scale of the whole collaboration. The ALICE analysis framework was developed and extensively tested on MC reconstructed data during the last two years in the ALICE distributed computing environment. The talk will describe the architecture of the framework and the main features making it a success among ALICE users: transparent usage of the computing infrastructure (PROOF, GRID), good data access performance for several concurrent tasks, and ease of use. We will also focus on the experience and results accumulated during this period, discussing the pros and cons of this unifying approach at the data analysis level.
        Speaker: Mr Andrei Gheata (CERN/ISS)
        Slides
    • Event Processing: Tuesday Club E

      Club E

      Prague

      Prague Congress Centre 5. května 65, 140 00 Prague 4, Czech Republic
      Convener: Adele Rimoldi (CERN)
      • 280
        New models for PIXE simulation with Geant4
        The production of particle induced X-ray emission (PIXE) resulting from the de-excitation of an ionized atom is an important physical effect that is not yet accurately modelled in Geant4, nor in other general-purpose Monte Carlo systems. Its simulation concerns use cases in various physics domains – from the precision evaluation of spatial energy deposit patterns to material analysis, low background particle physics experiments and astroparticle physics instrumentation in space science. The correct simulation of PIXE is a challenge for general-purpose Monte Carlo codes: in fact, PIXE is intrinsically a discrete process, while all major Monte Carlo systems rely on condensed transport schemes to handle the infrared divergence of ionization cross sections. We describe our ongoing effort to improve the Geant4 implementation of PIXE. Our activities include a new design of the software model, the creation of an extended and improved database of shell ionization cross sections, investigations into improved particle transport schemes, and techniques to deal with infrared divergence in the context of ionization and atomic relaxation.
        Speaker: Dr Georg Weidenspointner (MPE and MPI-HLL , Munich, Germany)
      • 281
        Validation of Geant4 Hadronic Physics Models at Intermediate Energies
        Geant4 provides a number of physics models at intermediate energies (corresponding to incident momenta in the range 1-20 GeV/c). Recently, these models have been validated with existing data from a number of experiments: (1) inclusive proton and neutron production with a variety of beams (pi^-, pi^+, p) at different energies between 1 and 9 GeV/c on a number of nuclear targets (from beryllium to uranium); (2) inclusive pion/kaon/proton production from 14.6 GeV/c proton beams on nuclear targets (from beryllium to gold); (3) inclusive pion production from pion beams between 3-13 GeV/c on a number of nuclear targets (from beryllium to lead). The results of the simulation/data comparison for different Geant4 models are discussed in the context of validating the models and determining their usage in physics lists for high energy applications. Due to the increasing number of validations becoming available, and the requirement that they be done at regular intervals corresponding to the Geant4 release schedule, automated methods of validation are being developed. These will also be discussed.
        Speaker: Sunanda Banerjee (Fermilab)
        Slides
      • 282
        GEANT4E Track Extrapolation in the Belle Experiment
        We report on the use of GEANT4E, the track extrapolation feature written by Pedro Arce, in the analysis of data from the Belle experiment: (1) to project charged tracks from the tracking devices outward to the particle identification devices, thereby assisting in the identification of the particle type of each charged track, and (2) to project charged tracks from the tracking devices outward to the muon-detection device and then perform progressive Kalman-like track fitting by combining (and correcting) the projected track with the hits in the muon detector. To allow the combination of GEANT4 detector simulation with event reconstruction in one program, we use the novel technique of merging the GEANT4 and GEANT4E physics lists through the instantiation and use of distinct particles for GEANT4E.
        Speaker: Prof. Leo Piilonen (Virginia Tech)
        Slides
      • 283
        ATLAS Upgrade Simulation with the Fast Track Simulation FATRAS
        With the completion of the installation of the ATLAS detector in 2008 and the first days of data taking, the ATLAS collaboration is increasingly focusing on the future upgrade of the ATLAS tracking devices. Radiation damage will make it necessary to replace the innermost silicon layer (b-layer) after about five years of operation. In addition, with future luminosity upgrades of the LHC machine, the current combination of silicon pixel and strip detectors and a transition radiation tracker will surpass the maximum hit occupancy at which pattern recognition is feasible. Therefore the ATLAS collaboration is preparing a replacement with a higher-granularity all-silicon detector. During the last years, a new fast track simulation (FATRAS) has been developed for the ATLAS tracking devices and successfully interfaced with a fast calorimeter simulation, to be part of the standard full and fast simulation "cocktail" needed to comply with both the high statistics of simulated Monte Carlo samples for various physics analyses and the computing budget of the experiment. FATRAS has undergone various validation steps against the full simulation chain to guarantee compatibility and to understand the shortcomings that arise from simplifications traded off for a reduction in CPU consumption. During the design phase of FATRAS, dedicated emphasis has been put on a flexible way of integrating geometry and detector technologies, making it a useful tool to evaluate the impact of different layouts and technologies for the future ATLAS inner tracking devices on the expected detector performance.
        Speaker: Dr Andreas Salzburger (DESY & CERN)
        Slides
      • 284
        Tuning and optimization of the CMS simulation software
        The CMS simulation has been operational within the new CMS software framework for more than three years. While the description of the detector, in particular in the forward region, is being completed, during the last year the emphasis of the work has been put on the fine tuning of the physics output. The existing test beam data for the different components of the calorimetric system have been exploited to adjust different parts of the Geant4 models for hadronic and electromagnetic showers, as well as the custom CMS code used, in close collaboration with the Geant4 developers. Significant improvements have been achieved in describing the data. A consequence of this work has been a sizable deterioration of the computing performance of the code, which is already under close monitoring. A suite of performance analysis tools has been put in place and has been used to drive several optimizations to allow the code to fit the constraints posed by the CMS computing model.
        Speaker: Dr Fabio Cossutti (INFN Trieste)
        Slides
      • 285
        Parallelization of ALICE simulation - a jump through the looking-glass
        HEP computing is approaching the end of an era when simulation parallelisation could be performed simply by running one instance of the full simulation per core. The increasing number of cores and the appearance of hardware-thread support both pose a severe limitation on the memory and memory bandwidth available to each execution unit. Typical simulation and reconstruction jobs of AliRoot differ significantly in memory usage - reconstruction requires approximately three times more memory than simulation. Further, reconstruction accesses memory in a less restrained manner, as it requires access to geometry, alignment and calibration data on practically every step. This makes simulation a more natural candidate for parallelization, especially since the simulation code is relatively stable and we do not expect the reconstruction code to settle until the detector is fully calibrated with real data and understood under stable running conditions. We have chosen a multi-threading solution in which one primary particle and all its secondaries are tracked by a given thread. This model corresponds well to Pb-Pb ion collision simulation, where 60,000 primary particles need to be tracked. After the MC processing of a primary particle is completed, the same thread also performs output buffer compression and passes the compressed buffers to a thread dedicated to output streaming (a schematic sketch of this layout follows this entry). The modifications of ROOT, AliRoot and GEANT3 that were required to perform this task are discussed. The performance of the parallelized version of the simulation under varying running conditions is presented.
        Speaker: Matevz Tadel (CERN)
        Slides
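        The worker/streamer layout described above can be sketched in a few lines of Python: each worker thread tracks one primary particle (and its secondaries), compresses the resulting buffer and hands it over a queue to a single output-streaming thread. This is purely schematic; the actual implementation lives in AliRoot/GEANT3, and the simulate() placeholder stands in for the full MC transport.

          import threading, queue, zlib, pickle

          out_q = queue.Queue()

          def output_streamer(path):
              # Single thread dedicated to writing compressed buffers to disk.
              with open(path, "wb") as f:
                  while True:
                      buf = out_q.get()
                      if buf is None:          # sentinel: no more buffers
                          break
                      f.write(buf)

          def simulate(primary):
              return {"primary": primary, "hits": []}      # placeholder transport

          def track_primary(primary):
              hits = simulate(primary)                     # track primary + secondaries
              out_q.put(zlib.compress(pickle.dumps(hits))) # compress in the worker

          writer = threading.Thread(target=output_streamer, args=("events.dat",))
          writer.start()
          workers = [threading.Thread(target=track_primary, args=(p,)) for p in range(8)]
          for w in workers: w.start()
          for w in workers: w.join()
          out_q.put(None)
          writer.join()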
    • Hardware and Computing Fabrics: Tuesday Club B

      Club B

      Prague

      Prague Congress Centre 5. května 65, 140 00 Prague 4, Czech Republic
      Convener: Jiri Chudoba (FZU)
      • 286
        The ALICE Online Data Storage System
        The ALICE (A Large Ion Collider Experiment) Data Acquisition (DAQ) system has the unprecedented requirement to ensure a very high volume, sustained data stream between the ALICE detector and the Permanent Data Storage (PDS) system, which is used as the main data repository for event processing and offline computing. The key component to accomplish this task is the Transient Data Storage System (TDS), a set of data storage elements with its associated hardware and software components, which supports raw data collection, its conversion into a format suitable for subsequent high-level analysis, the storage of the result using highly parallelized architectures, its access via a cluster file system capable of creating high-speed partitions via its affinity feature, and its transfer to the final destination via dedicated data links. We describe the methods and the components used to validate, test, implement, operate, and monitor the ALICE Online Data Storage system and the way it has been used in the early days of commissioning and operation of the ALICE detector. We will also introduce the future developments needed from next year, when the ALICE Data Acquisition System will shift its requirements from those associated with the test and commissioning phase to those imposed by long-duration data taking periods alternating with shorter validation and maintenance tasks, which will be needed to adequately operate the ALICE Experiment.
        Speaker: Roberto Divià (CERN)
        Slides
      • 287
        Integration of Virtualized Worker Nodes into Batch Systems.
        Today's experiments in HEP use only a limited number of operating system flavours. Their software might only be validated on one single OS platform. Resource providers might have other operating systems of choice for the installation of the batch infrastructure. This is especially the case if a cluster is shared with other communities, or with communities that have stricter security requirements. One solution would be to statically divide the cluster into separate subclusters. In such a scenario, no opportunistic distribution of the load can be achieved, resulting in a poor overall utilization efficiency. Another approach is to make the batch system aware of virtualization, and to provide each community with its favoured operating system in a virtual machine. Here, the scheduler has full flexibility, resulting in a better overall efficiency of the resources. In our contribution, we present a lightweight concept for the integration of virtual worker nodes into standard batch systems (a rough sketch of a per-job virtual machine start-up follows this entry). We demonstrate two prototype implementations, one based on the Sun Grid Engine (SGE), the other using Maui/Torque as a batch system. Both solutions support local as well as Grid job submission. The hypervisor currently used is Xen; a port to another system is easily envisageable. To better handle the different virtual machines on the physical host, a management solution was developed and is shown. We will present first experience from running the two prototype implementations. In a last part, we will show the potential future use of this lightweight concept when integrated into high-level (i.e. Grid) workflows.
        Speaker: Oliver Oberst (Karlsruhe Institute of Technology)
        Slides
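        The following rough Python sketch shows one way a batch prologue could boot a job-specific virtual worker node through the libvirt bindings. It is only an assumption-laden illustration: the connection URI, image path and domain definition are placeholders (and deliberately simplified), and the prototypes described above integrate this with SGE and Maui/Torque in their own way.

          import libvirt

          # Simplified, placeholder Xen domain definition for one job's VM;
          # a real definition needs more elements (kernel, network, etc.).
          DOMAIN_XML = """
          <domain type='xen'>
            <name>job-12345</name>
            <memory unit='MiB'>2048</memory>
            <vcpu>1</vcpu>
            <os><type>linux</type></os>
            <devices>
              <disk type='file' device='disk'>
                <source file='/images/sl4-worker.img'/>
                <target dev='xvda'/>
              </disk>
            </devices>
          </domain>
          """

          conn = libvirt.open("xen:///")          # connect to the local hypervisor
          dom = conn.createXML(DOMAIN_XML, 0)     # boot a transient VM for this job
          print("started", dom.name())
          # The job payload would then be dispatched into the VM; the epilogue
          # would call dom.destroy() to release the physical resources.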
      • 288
        SL(C)5 for HEP - a status report
        The ramping up of available resources for LHC data analysis at the different sites continues. Most sites are currently running SL(C)4. However, this operating system is already rather old, and it is becoming difficult to obtain the hardware drivers required to get the best out of recent hardware. A possible way out is the migration to SL(C)5-based systems where possible, in combination with virtualization methods. The former is typically possible for nodes where the software to run the services is available and tested, while the latter offers a possibility to make use of the new hardware platforms whilst maintaining operating system compatibility. Since autumn 2008, CERN has offered public interactive and batch worker nodes for evaluation by the experiments. For the Grid environment, access is granted by a dedicated CE within a preproduction pilot project. The status of the evaluation, the feedback received from the experiments and the status of the migration will be reviewed, and the status of virtualization of services at CERN will be reported. Beyond this, the migration to a new operating system also offers an excellent opportunity to upgrade the fabric infrastructure used to manage the servers. Upgrades which directly affect users, for example for managing VOBoxes at CERN, will be described.
        Speaker: Ricardo SALGUEIRO DOMINGUES DA SILVA (CERN)
        Slides
      • 289
        The NAF: National Analysis Facility at DESY
        In the framework of a broad collaboration among German particle physicists - the strategic Helmholtz Alliance "Physics at the Terascale" - an analysis facility has been set up at DESY. The facility is intended to provide the best possible analysis infrastructure for researchers of the ATLAS, CMS, LHCb and ILC experiments and also for theory researchers. In the first part of the contribution, we will present the concept of the NAF and its place in the existing distributed Grid landscape of the experiments. In the second part, the building blocks of the NAF will be detailed, with an emphasis on the technical implementation of some parts: the usage of VOMS for separating Grid resources between collaboration-wide and NAF-specific resources; the interactive and batch cluster and its integration with PROOF; the usage of Grid proxies to access workgroup servers and AFS; and the usage and operation of Lustre for fast data access. A special focus is the seamless integration of the facility into the two geographically separated DESY sites and its implications. In the third part, the experience of running the facility for one year will be reported.
        Speakers: Andreas Haupt (DESY), Yves Kemp (DESY)
        Slides
      • 290
        Operational Experience with CMS Tier-2 Sites
        In the CMS computing model, about one third of the computing resources are located at Tier-2 sites, which are distributed across the countries in the collaboration. These sites are the primary platform for user analyses; they host datasets that are created at Tier-1 sites, and users from all CMS institutes submit analysis jobs that run on those data through grid interfaces. They are also the primary resource for the production of large simulation samples for general use in the experiment. As a result, Tier-2 sites have an interesting mix of organized experiment-controlled activities and chaotic user-controlled activities. CMS currently operates about 40 Tier-2 sites in 22 countries, making the sites a far-flung computational and social network. We describe our operational experience with the sites, touching on our achievements, the lessons learned, and the challenges for the future.
        Speaker: Dr Isidro Gonzalez Caballero (Instituto de Fisica de Cantabria, Grupo de Altas Energias)
        Slides
      • 291
        ScotGrid: Providing an Effective Distributed Tier-2 in the LHC Era
        ScotGrid is a distributed Tier-2 centre in the UK with sites in Durham, Edinburgh and Glasgow. ScotGrid has undergone a huge expansion in hardware in anticipation of the LHC and now provides more than 4MSI2K and 500TB to the LHC VOs. Scaling up to this level of provision has brought many challenges to the Tier-2, and we show in this paper how we have adopted new methods of organising the centres - from fabric management and monitoring, to remote management of sites, to management and operational procedures - to meet these challenges. We describe how we have coped with different operational models at the sites, where the Glasgow and Durham sites are managed "in house" while resources at Edinburgh are managed as a central university resource. This required the adoption of a different fabric management model at Edinburgh and a special engagement with the cluster managers. Challenges arose from the different job models of local and grid submission that required special attention to resolve. We show how ScotGrid has successfully provided an infrastructure for ATLAS and LHCb Monte Carlo production. Special attention has been paid to ensuring that user analysis functions efficiently, which has required optimisation of local storage and networking to cope with the demands of user analysis. Finally, although these Tier-2 resources are pledged to the whole VO, we have established close links with our local physics user communities as the best way to ensure that the Tier-2 functions effectively as a part of the LHC grid computing framework.
        Speakers: Dr Graeme Andrew Stewart (University of Glasgow), Dr Michael John Kenyon (University of Glasgow), Dr Samuel Skipsey (University of Glasgow)
        Slides
    • Software Components, Tools and Databases: Tuesday Club A

      Club A

      Prague

      Prague Congress Centre 5. května 65, 140 00 Prague 4, Czech Republic
      Convener: Maria Girone (CERN)
      • 292
        Ajax, XSLT and SVG: Displaying ATLAS conditions data with new web technologies
        The combination of three relatively recent technologies is described, which allows an easy path from database retrieval to interactive web display. SQL queries on an Oracle database can be performed in a manner which directly returns an XML description of the result, and Ajax techniques (Asynchronous Javascript And XML) are used to dynamically inject the data into a web display, accompanied by an XSLT transform template which determines how the data will be formatted. By tuning the transform to generate SVG (Scalable Vector Graphics), a direct graphical representation can be produced in the web page while retaining the database data as the XML source, allowing dynamic links to be generated in the web representation, but programmatic use of the data when accessed from a user application. With the release of the SVG 1.2 Tiny draft specification, the display can also be tailored for display on mobile devices. The technologies are described and a sample application demonstrated, showing conditions data from the ATLAS Semiconductor Tracker. (A small offline illustration of the XML-to-SVG transform follows this entry.)
        Speaker: Dr Shaun Roe (CERN)
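        The XML-to-SVG step described above can also be exercised offline, for example with Python and lxml. The XML tags, values and stylesheet below are invented purely for illustration; in the ATLAS application the XML comes from an Oracle query and the XSLT transform runs in the browser.

          from lxml import etree

          # Toy "conditions" document standing in for the XML returned by the query.
          conditions_xml = etree.XML(
              "<modules><module id='1' noise='3.2'/><module id='2' noise='5.1'/></modules>")

          # Minimal XSLT 1.0 stylesheet turning the module list into an SVG bar chart.
          xslt = etree.XML("""
          <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
            <xsl:template match="/modules">
              <svg xmlns="http://www.w3.org/2000/svg" width="200" height="100">
                <xsl:for-each select="module">
                  <rect x="{(@id - 1) * 40}" y="0" width="30" height="{@noise * 10}"/>
                </xsl:for-each>
              </svg>
            </xsl:template>
          </xsl:stylesheet>""")

          transform = etree.XSLT(xslt)
          svg = transform(conditions_xml)
          print(str(svg))   # an SVG representation of the per-module noise values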
      • 293
        Visualization of the CMS Python Configuration System
        The job configuration system of the CMS experiment is based on the Python programming language. Software modules and their order of execution are both represented by Python objects. In order to investigate and verify configuration parameters and dependencies naturally appearing in modular software, CMS employs a graphical tool. This tool visualizes the configuration objects, their dependencies, and the data flow. Furthermore it can be used for documentation purposes. The underlying software concepts as well as the visualization are presented.
        Speaker: Andreas Hinzmann (RWTH Aachen University)
        Slides
      • 294
        Usage of the Python programming language in the CMS Experiment
        Being a highly dynamic language and allowing reliable programming with quick turnarounds, Python is a widely used programming language in CMS. Most of the tools used in workflow management and the Grid interface tools are written in this language. Most of the tools used in the context of release management are also written in Python: integration builds, release building and deploying, as well as performance measurements. With an interface to the CMS data formats, rapid prototyping of analyses and debugging is an additional use case. Finally, in 2008 the CMS experiment switched to using Python as its configuration language, which increased the amount of Python code in the CMS experiment even further (a small configuration sketch follows this entry). This talk will give an overview of the general usage of Python in the CMS experiment and discuss which features of the language make it well suited to the existing use cases.
        Speaker: Benedikt Hegner (CERN)
        Slides
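        For illustration, a CMSSW-style Python configuration typically looks like the sketch below. It only runs inside a CMSSW environment, and the module label, analyzer type and parameter values are placeholders rather than anything taken from the talk.

          import FWCore.ParameterSet.Config as cms

          process = cms.Process("DEMO")

          # Input source and the number of events to process.
          process.source = cms.Source("PoolSource",
              fileNames = cms.untracked.vstring("file:input.root"))
          process.maxEvents = cms.untracked.PSet(input = cms.untracked.int32(100))

          # A (hypothetical) analysis module and the path that schedules it.
          process.demoAnalyzer = cms.EDAnalyzer("DemoAnalyzer",
              trackLabel = cms.InputTag("generalTracks"))
          process.p = cms.Path(process.demoAnalyzer)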
      • 295
        User-friendly Parallelization of GAUDI Applications with Python
        GAUDI is a software framework in C++ used to build event data processing applications from a set of standard components with well-defined interfaces. Simulation, high-level trigger, reconstruction, and analysis programs used by several experiments are developed using GAUDI. These applications can be configured and driven by simple Python scripts. Given that a considerable amount of existing software has been developed using a serial methodology, and has existed in some cases for many years, the implementation of parallelization techniques at the framework level may offer a way of exploiting current multi-core technologies to maximize performance and reduce latencies without re-writing thousands or millions of lines of code. In the solution we have developed, the parallelization techniques are introduced into the high-level Python scripts which configure and drive the applications, such that the core C++ application code requires no modification and end users need make only minimal changes to their scripts. The developed solution leverages existing generic Python modules that support parallel processing (a generic sketch of this idea follows this entry). Naturally, the parallel version of a given program should produce results consistent with its serial execution. The evaluation of several prototypes incorporating various parallelization techniques is presented and discussed.
        Speaker: Dr Pere Mato (CERN)
        Slides
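        The driver-level idea described above - splitting the work across processes from the Python script while leaving the C++ framework untouched - can be illustrated with the standard multiprocessing module. This is a generic sketch under that assumption, not the actual GaudiPython parallel-processing interface; process_events() merely stands in for configuring and running the application over a slice of the input.

          from multiprocessing import Pool

          def process_events(chunk):
              first, last = chunk
              # In the real case this would configure and run the application
              # over events [first, last); here we just return a dummy partial sum.
              return sum(range(first, last))

          def split(n_events, n_workers):
              # Divide the event range into equal, contiguous chunks.
              step = n_events // n_workers
              return [(i * step, (i + 1) * step) for i in range(n_workers)]

          if __name__ == "__main__":
              with Pool(processes=4) as pool:
                  partial = pool.map(process_events, split(100000, 4))
              # Merging step; the result must match the serial execution.
              total = sum(partial)
              print(total)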
      • 296
        Harnessing multicores: strategies and implementations in ATLAS
        Computers are no longer getting faster: instead, they are acquiring more and more CPUs, each of which is no faster than those of the previous generation. This increase in the number of cores evidently calls for more parallelism in HENP software. While end-users' stand-alone analysis applications are relatively easy to modify, the LHC experiments' frameworks, mostly written with a single 'thread' of execution in mind and with correspondingly large code bases, are more challenging to parallelize. Widespread and ill-considered changes so close to data taking are out of the question: we need clear strategies and guidelines to reap the benefits of the multicore/manycore era while minimizing the code changes. Exploiting parallelism is usually achieved via (a) multithreading or (b) multiple processes (or a combination of both), but each option has its own set of trade-offs in terms of code changes (and code complication) and possible speed-ups. This paper describes the different strategies the Offline and Online communities of the ATLAS collaboration investigated, how they have been implemented and integrated into the Athena framework, and finally how well they perform in terms of speed-ups, memory usage, I/O and code development. We present the work integrated in AthenaMT to harness the High Level Trigger farms' computing power via multithreading, the improvements and modifications applied to Athena/Gaudi in order to raise its thread awareness, and the impact on common design patterns to preserve thread safety across release cycles. With threads sharing the same address space, AthenaMT is the most promising option in terms of speed-ups and memory usage efficiency, at the cost of an increased development load for the code writer, who needs to worry about locks, data races and the like. AthenaMP leverages the 'fork()' system call and the 'Copy On Write' mechanism through the 'multiprocessing' python module, to free oneself from the usual concurrency problems that stem from sharing state, while still retaining part of the memory usage efficiency at the cost of diminished flexibility (compared to threading). We detail the AthenaMP implementation, highlighting the minimal code changes, how they seamlessly blend into the Athena framework, and their interplay with I/O and OS resources. (A bare-bones illustration of the fork/copy-on-write idea follows this entry.)
        Speaker: Dr Sebastien Binet (LBNL)
        Slides
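        The fork/copy-on-write approach can be illustrated with a bare-bones Python sketch: the parent initializes large, mostly read-only state once, then forks workers that share those memory pages until they write to them. This is only a free-standing toy, not the AthenaMP implementation, which drives the same mechanism through the 'multiprocessing' module.

          import os, sys

          # Large, mostly read-only state (think geometry, conditions) initialized
          # once in the parent; after fork() the children share these pages until
          # they write to them (copy-on-write).
          geometry = [0.0] * 5_000_000

          N_WORKERS, N_EVENTS = 4, 1000

          def worker(worker_id):
              # Each child processes its own slice of the event range.
              my_events = range(worker_id, N_EVENTS, N_WORKERS)
              checksum = sum(geometry[e % len(geometry)] for e in my_events)
              sys.exit(0)

          for wid in range(N_WORKERS):
              if os.fork() == 0:        # fork() returns 0 in the child process
                  worker(wid)

          for _ in range(N_WORKERS):
              os.wait()                 # the parent collects the finished children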
      • 297
        Hierarchy Software Development Framework (h-dp-fwk) Project
        The Hierarchy Software Development Framework provides a lightweight tool for building portable, modular applications for performing automated data analysis tasks in batch mode. Design and development activities on the project began in March 2005, and from the very beginning it targeted the case of building experimental data processing applications for the CMD-3 experiment, which was being commissioned at the Budker Institute of Nuclear Physics (BINP, Novosibirsk, Russia). Its design addresses the generic case of a modular data processing application operating within a well-defined distributed computing environment. The main features of the Framework are modularity, built-in message and data exchange mechanisms, XInclude and XML Schema enabled XML configuration management tools, dedicated log management tools, internal debugging tools, support for both dynamic and static module chains, internal DSO version and consistency checking, and a well-defined API for developing specialized frameworks. It is supported on Scientific Linux 4 and 5 and is planned to be ported to other platforms as well. The project is provided with a comprehensive set of technical documentation and users’ guides. The licensing scheme for the source code, binaries and documentation implies that the product is free for non-commercial use. Although the development phase is not over and many features remain to be implemented, the project is considered ready for public use and for creating applications in various fields, including the development of event reconstruction software for small and moderate scale HEP experiments.
        Speaker: Mr Alexander Zaytsev (Budker Institute of Nuclear Physics (BINP))
        Paper
        Slides
    • Online Computing: Tuesday Club D

      Club D

      Prague

      Prague Congress Centre 5. května 65, 140 00 Prague 4, Czech Republic

      Sponsored by ACEOLE

      Convener: Gordon Watts (Washington University)
      • 298
        Event reconstruction in the LHCb Online cluster
        The LHCb experiment at the LHC accelerator at CERN will collide particle bunches at 40 MHz. After a first level of hardware trigger with output at 1 MHz, the physically interesting collisions will be selected by running dedicated trigger algorithms in the High Level Trigger (HLT) computing farm. It consists of up to roughly 16000 CPU cores and 44TB of storage space. Although limited by environmental constraints, the computing power is equivalent to that provided by all Tier-1's to LHCb. The HLT duty cycle follows the LHC collisions, thus it has several months of winter shutdown, as well as several hours a day of interfill gaps. This contribution describes the strategy for using these idle resources for data reconstruction. Due to the specific features of the HLT farm, typical processing à la Tier-1 (1 core - 1 file) is not feasible. A radically different approach has been chosen, based on parallel processing the data in farm slices of O(1000) cores. Single events are read from the input files, distributed to the cluster and merged back into files once they have been processed. A detailed description of this architectural solution and the obtained performance will be presented.
        Speakers: Albert Puig Navarro (Universidad de Barcelona), Markus Frank (CERN)
        Slides
      • 299
        Commissioning of the ATLAS High Level Trigger with Single Beam and Cosmic Rays
        ATLAS is one of the two general-purpose detectors at the Large Hadron Collider (LHC). The trigger system is responsible for making the online selection of interesting collision events. At the LHC design luminosity of 10^34 cm-2s-1 it will need to achieve a rejection factor of the order of 10^-7 against random proton-proton interactions, while selecting with high efficiency events that are needed for physics analyses. After a first processing level using custom electronics based on FPGAs and ASICs, the trigger selection is made by software running on two processor farms, containing a total of around two thousand multi-core machines. This system is known as the High Level Trigger (HLT). To reduce the network data traffic and the processing time to manageable levels, the HLT uses seeded, step-wise reconstruction, aiming at the earliest possible rejection of background events. The recent LHC startup and short single-beam run provided a "stress test" of the system and some initial calibration data. Following this period, ATLAS continued to collect cosmic-ray events for detector alignment and calibration purposes. After giving an overview of the trigger design and its innovative features, this paper focuses on the experience gained from operating the ATLAS trigger with single LHC beams and cosmic-rays.
        Speaker: Dr Alessandro Di Mattia (MSU)
        Slides
      • 300
        The GigaFitter: Performances at CDF and Perspectives for Future Applications
        The Silicon Vertex Trigger (SVT) is a processor developed at the CDF experiment to perform fast and precise online track reconstruction. SVT is made of two pipelined processors, the Associative Memory, finding low precision tracks, and the Track Fitter, refining the track quality with high precision fits. We will describe the architecture and the performance of a next generation track fitter, the GigaFitter, developed to reduce the degradation of the SVT efficiency due to the increasing instantaneous luminosity. The GigaFitter reduces the track parameter reconstruction to a few clock cycles and can perform many fits in parallel, thus allowing high resolution tracking at very high rate. The core of the GigaFitter is implemented in a modern Xilinx Virtex-5 FPGA chip, rich in powerful DSP arrays. The FPGA is housed on a mezzanine board which receives the data from a subset of the tracking detector and transfers the fitted tracks to a Pulsar motherboard for the final corrections. Instead of the current 12 boards, one per each sector of the detector, the final system will be much more compact, consisting of a single GigaFitter Pulsar board equipped with four mezzanine cards receiving the data from the entire tracking detector. Moreover, the GigaFitter's modular structure is adequate to scale to much better performance and is general enough to be easily adapted to future High Energy Physics (HEP) experiments and applications outside HEP.
        Speaker: Dr Silvia Amerio (University of Padova & INFN Padova)
        Slides
      • 301
        Online processing in the ALICE DAQ - The Detector Algorithms
        ALICE (A Large Ion Collider Experiment) is the heavy-ion detector designed to study the physics of strongly interacting matter and the quark-gluon plasma at the CERN Large Hadron Collider (LHC). Some specific calibration tasks are performed regularly for each of the 18 ALICE sub-detectors in order to achieve most accurate physics measurements. These procedures involve events analysis in a wide range of experimental conditions, implicating various trigger types, data throughputs, electronics settings, and algorithms, both during short sub-detector standalone runs and long global physics runs. A framework was designed to collect statistics and compute some of the calibration parameters directly online, using resources of the Data Acquisition System (DAQ), and benefiting from its inherent parallel architecture to process events. This system has been used at the experimental area for one year, and includes more than 30 calibration routines in production. This paper describes the framework architecture and the synchronization mechanisms involved at the level of the Experiment Control System (ECS) of ALICE. The software libraries interfacing detector algorithms (DA) to the online data flow, configuration database, experiment logbook, and offline system are reviewed. The test protocols followed to integrate and validate each sub-detector component are also discussed, including the automatic build system and validation procedures used to ensure a smooth deployment. The offline post-processing and archiving of the DA results is covered in a separate paper.
        Speaker: Vasco Chibante Barroso (CERN)
        Slides
    • Grid Middleware and Networking Technologies: Tuesday Panorama

      Panorama

      Prague

      Prague Congress Centre 5. května 65, 140 00 Prague 4, Czech Republic
      • 302
        Where is the Internet heading to?
        Despite many coordinated efforts to promote the use of IPv6, the migration from IPv4 is far from being up to the expectations of most Internet experts. However, time is running fast and unallocated IPv4 address space should run out within the next 3 years or so. The speaker will attempt to explain the reasons behind the lack of enthusiasm for IPv6, in particular, the lack of suitable migration tools. A short review of the ongoing efforts to re-design the Internet in a clean-slate approach will then be made. Finally, an overview of Internet Governance areas and bodies will be given.
        Speaker: Olivier Martin (Ictconsulting)
        Slides
      • 303
        SVOPME: A Scalable Virtual Organization Privileges Management Environment
        Grids enable uniform access to resources by implementing standard interfaces to resource gateways. Gateways control access privileges to resources using the user's identity and personal attributes, which are available through Grid credentials. Typically, gateways implement access control by mapping Grid credentials to local privileges. In the Open Science Grid (OSG), privileges are granted on the basis of the user's membership in a Virtual Organization (VO). Currently, access privileges are determined solely by the individual sites that own the resources. While this gives sites full control over access rights, it makes VO privileges heterogeneous throughout the Grid and hardly fits the Grid paradigm of uniform access to resources. In addition, there is no automated mechanism for a VO to define and publish privileges specific to the VO, such as the need for outbound network access from the resource. To address these challenges, we are developing the Scalable Virtual Organization Privileges Management Environment (SVOPME), which provides tools for VOs to define and publish desired privileges and assists sites in providing the appropriate access policies. At a site, SVOPME analyzes how access policies are defined for its resources. These policies are then compared with the ones published by the VO, so that sites and VOs can verify policy compliance (a schematic sketch of such a comparison follows this entry). Upon request, SVOPME can generate directives for site administrators on how the local access policies can be amended to achieve such compliance. This paper discusses which access policies are of interest to the OSG community and how SVOPME implements privilege management for the OSG.
        Speaker: Gabriele Garzoglio (FERMI NATIONAL ACCELERATOR LABORATORY)
        Slides
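        A toy sketch of the compliance check mentioned above, with entirely hypothetical privilege names and data structures; the real SVOPME tools derive the site view from the actual gateway and worker-node configuration.

        # Hypothetical sketch: compare the privileges a VO publishes as desired
        # with the privileges a site actually grants, and emit directives for
        # the site administrator. Keys and values are illustrative only.
        vo_desired = {"outbound_network": True, "local_scratch_gb": 10, "max_walltime_hours": 48}
        site_granted = {"outbound_network": False, "local_scratch_gb": 20, "max_walltime_hours": 24}

        def check_compliance(desired, granted):
            directives = []
            for privilege, wanted in desired.items():
                actual = granted.get(privilege)
                if isinstance(wanted, bool):
                    if wanted and not actual:
                        directives.append("enable %s for this VO" % privilege)
                elif actual is None or actual < wanted:
                    directives.append("raise %s from %s to at least %s" % (privilege, actual, wanted))
            return directives

        for d in check_compliance(vo_desired, site_granted):
            print("DIRECTIVE:", d)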
    • 16:00
      coffee break, exhibits and posters
    • Distributed Processing and Analysis: Tuesday Club C

      Club C

      Prague

      Prague Congress Centre 5. května 65, 140 00 Prague 4, Czech Republic
      Convener: Patricia Mendez Lorenzo (CERN)
      • 304
        A generic Job Submission Tool (JST).
        The Job Submission Tool (JST) provides a solution for submitting large numbers of jobs to the Grid in an unattended way. The tool manages Grid submission, bookkeeping and resubmission of failed jobs, and also allows real-time monitoring of the status of each job within the same framework. Its key elements are: a relational database that contains all the tasks to be executed on the Grid (the solution was found to scale up to a few thousand concurrently running jobs); support for launching tasks handled by different applications, including dependencies between tasks, so that if task B depends on task A, JST launches task B only after the correct execution of task A; a submission agent that submits jobs on behalf of the user whenever the database contains tasks to be executed; a job wrapper executed on the worker node (WN), which interacts with the task queue server, manages the execution of the real application on the WN and, in particular, checks whether the application completed without errors; and a web interface that allows the user to submit new runs or new applications through JST. A schematic sketch of this task-queue pattern follows this entry. The tool has been extensively tested for submitting bioinformatics applications to the Grid, but it is general enough to be used in other fields, including particle physics data analysis.
        Speaker: Dr Giacinto Donvito (INFN-Bari)
        Slides
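        A minimal sketch of the task-queue pattern described above, using an in-memory SQLite database and hypothetical table and column names; the real JST uses a central relational database plus Grid submission and monitoring, which are out of scope here.

        # Hypothetical JST-like task queue: tasks live in a database with a
        # status and an optional dependency; an agent hands out a task only
        # when it is pending and its dependency has finished successfully.
        import sqlite3

        db = sqlite3.connect(":memory:")
        db.execute("""CREATE TABLE tasks (
            id INTEGER PRIMARY KEY,
            command TEXT,
            depends_on INTEGER,
            status TEXT DEFAULT 'PENDING')""")
        db.execute("INSERT INTO tasks (id, command, depends_on) VALUES (1, 'prepare input', NULL)")
        db.execute("INSERT INTO tasks (id, command, depends_on) VALUES (2, 'run analysis', 1)")

        def next_task():
            """Return a runnable task: pending, with its dependency (if any) done."""
            row = db.execute("""
                SELECT t.id, t.command FROM tasks t
                LEFT JOIN tasks d ON t.depends_on = d.id
                WHERE t.status = 'PENDING'
                  AND (t.depends_on IS NULL OR d.status = 'DONE')
                LIMIT 1""").fetchone()
            if row:
                db.execute("UPDATE tasks SET status = 'RUNNING' WHERE id = ?", (row[0],))
            return row

        def report(task_id, ok):
            """Called by the job wrapper on the worker node after the payload ends."""
            db.execute("UPDATE tasks SET status = ? WHERE id = ?",
                       ("DONE" if ok else "FAILED", task_id))

        task = next_task()                     # -> task 1 only; task 2 has to wait
        print("running:", task)
        report(task[0], ok=True)
        print("now runnable:", next_task())    # -> task 2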
      • 305
        PROOF-Lite: Exploiting the Power of Many-Core Machines
        PROOF-Lite is an implementation of the Parallel ROOT Facility (PROOF) optimized for many-core machines. It gives ROOT users a straightforward way to exploit all the available cores in parallel for a data analysis or generic computing task controlled via the ROOT TSelector mechanism. PROOF-Lite is, as the name suggests, a lightweight version of PROOF in which the multi-tier architecture has been reduced to two tiers, with the local ROOT client directly interacting with the PROOF workers; by default one gets as many workers as available cores. To maximize performance, PROOF-Lite uses local communication technologies such as UNIX sockets, shared memory and memory-mapped files. PROOF-Lite is a zero-configuration technology: it does not require pre-installation of daemons or configuration files, and it comes as an integral part of ROOT (a short usage sketch follows this entry). In this talk we will show how almost perfect scalability is achieved for CPU-intensive tasks and how the scalability of I/O-intensive tasks is limited by the disk resources. We will also show the large improvements that the new SSD (Solid State Disk) technology brings and how it can be used to achieve almost perfect scalability for I/O-intensive tasks as well.
        Speaker: Gerardo GANIS (CERN)
        Slides
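        A short usage sketch from PyROOT: opening a PROOF-Lite session with the "lite://" connection string and processing a TChain through it is standard ROOT usage, but the tree name, input files and the selector MySelector.C below are placeholders.

        # Minimal PROOF-Lite usage sketch (PyROOT); one worker per core by default.
        import ROOT

        proof = ROOT.TProof.Open("lite://")   # start the local PROOF-Lite session

        chain = ROOT.TChain("events")         # tree name is an assumption
        chain.Add("data/run1.root")           # placeholder input files
        chain.Add("data/run2.root")

        chain.SetProof()                      # route Process() through PROOF-Lite
        chain.Process("MySelector.C+")        # compile and run the TSelector on all workers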
      • 306
        Use of glide-ins in CMS for production and analysis
        With the evolution of various grid federations, Condor glide-ins represent a key feature in providing a homogeneous pool of resources using late-binding technology. The CMS collaboration uses the glide-in based Workload Management System, glideinWMS, for production (ProdAgent) and distributed analysis (CRAB) of the data. The Condor glide-in daemons are submitted to the worker nodes via Condor-G. Once activated, they preserve the master-worker relationship, with each worker first validating the execution environment on its node and then pulling jobs sequentially until its lifetime expires (a conceptual sketch of this pull model follows this entry). The combination of late binding and validation significantly reduces the overall failure rate visible to CMS physicists. We discuss the extensive use of glideinWMS since the computing challenge CCRC08 in order to prepare for the forthcoming LHC data-taking period. The key features essential to the success of large-scale production and analysis at CMS resources across major grid federations, including EGEE, OSG and NorduGrid, are outlined. Use of glide-ins via the CRAB server mechanism and ProdAgent, as well as first-hand experience of using the next-generation CREAM computing element within the CMS framework, is also discussed.
        Speaker: Dr Sanjay Padhi (UCSD)
        Slides
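        A purely conceptual sketch of the late-binding pull model described above, with hypothetical function names; actual glide-ins are Condor daemons started on the worker node, not a Python loop.

        # Conceptual pilot ("glide-in"): validate the worker node first, then
        # pull user jobs until the pilot lifetime expires. All functions are
        # placeholders for the real Condor machinery.
        import time

        def environment_ok():
            return True          # e.g. software area, scratch space, outbound network

        def pull_next_job():
            return None          # ask the central queue; None means nothing matches now

        def run(job):
            pass                 # execute the user payload

        def pilot(lifetime_seconds):
            start = time.time()
            if not environment_ok():
                return           # a broken node never gets user jobs, so users do not see the failure
            while time.time() - start < lifetime_seconds:
                job = pull_next_job()
                if job is None:
                    time.sleep(5)    # wait for work instead of exiting immediately
                    continue
                run(job)             # jobs run sequentially, one at a time

        pilot(lifetime_seconds=10)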
      • 307
        Building a Reliable High Performance PanDA Facility
        PanDA, the ATLAS Production and Distributed Analysis framework, has been identified as one of the most important services provided by the ATLAS Tier-1 facility at Brookhaven National Laboratory (BNL), and has been enhanced into what is now a 24x7x365 production system. During this period, PanDA has remained under active development for additional functionality and bug fixes, and processing requirements have increased geometrically, leading to challenges in service provision. We used a RedHat Satellite system, cfEngine, and custom scripting to streamline the deployment, provisioning, and maintenance of the OS, Grid middleware, and PanDA. We deployed redundant hardware and multiple service instances for each critical PanDA component, and added a high-performance/high-availability capability by introducing a Layer 4/7 smart switch from F5 in front of some components. This cost-effective approach greatly improves throughput and reliability, and prevents any single point of failure caused by hardware, network, Grid middleware, operating system, or local PanDA application issues. Its transparency allows flexible management of the heterogeneous service, with only minimal application-level configuration and coding necessary to support integration with the smart switch. We have also implemented an extensive monitoring and alert system using Ganglia, Nagios (with extensive custom probes), RT (Request Tracker), and a custom-written ticket opening/escalation system. These tools work together to alert us to problems as they occur, and greatly assist in quickly troubleshooting any failures. In summary, our contributions in hardware resilience, extensive monitoring, and automated problem reporting and tracking significantly enhance the reliability of the evolving PanDA system while allowing the PanDA developers ready access to the system for software improvement. Our experience shows that the performance was more than triple that of the legacy PanDA instance, and that any single failure was transparent to ATLAS users.
        Speaker: Dr Dantong Yu (BROOKHAVEN NATIONAL LABORATORY)
        Paper
        Slides
      • 308
        Distributed Analysis in ATLAS using GANGA
        Distributed data analysis using Grid resources is one of the fundamental applications in high energy physics to be addressed and realized before the start of LHC data taking. The demands on resource management are very high: in every experiment up to a thousand physicists will be submitting analysis jobs to the Grid. Appropriate user interfaces and helper applications have to be made available to ensure that all users can use the Grid without expertise in Grid technology. These tools enlarge the number of Grid users from a few production administrators to potentially all participating physicists. The GANGA job management system (http://cern.ch/ganga), developed as a common project between the ATLAS and LHCb experiments, provides and integrates these kinds of tools. GANGA provides a simple and consistent way of preparing, organizing and executing analysis tasks within the experiment analysis framework, implemented through a plug-in system. It allows trivial switching between running test jobs on a local batch system and running large-scale analyses on the Grid, hiding Grid technicalities (a short job-definition sketch follows this entry). We will report on the plug-ins and on our experience of distributed data analysis using GANGA within the ATLAS experiment. Support is provided for all Grids presently used by ATLAS, namely LCG/EGEE, NDGF/NorduGrid, and OSG/PanDA. The integration and interaction of GANGA with the ATLAS data management system DQ2 is a key functionality. Intelligent job brokering is set up by using the job-splitting mechanism together with dataset and file location knowledge, and is aided by an automated system that regularly processes test analysis jobs at all ATLAS DQ2-supported sites. Large numbers of analysis jobs can be sent to the locations of the data, following the ATLAS computing model. Among other things, GANGA supports user analysis of reconstructed data and small-scale production of Monte Carlo data.
        Speaker: Johannes Elmsheuser (Ludwig-Maximilians-Universität München)
        Slides
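        An illustrative GANGA session, typed at the GANGA prompt, switching a trivial job from a local backend to a Grid backend; the executable and the choice of backend are placeholders, and experiment-specific work follows the same pattern with the appropriate application and splitter objects.

        # Sketch of a GANGA session (run inside the ganga prompt).
        j = Job()
        j.application = Executable(exe='/bin/echo', args=['hello from ganga'])
        j.backend = Local()          # quick test on the local machine
        j.submit()

        # The same job definition, now sent to the Grid instead:
        j2 = j.copy()
        j2.backend = LCG()           # or another Grid backend available in the setup
        j2.submit()

        jobs                         # list all jobs and their status in this session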
      • 309
        Performance of an ARC-enabled computing grid for ATLAS/LHC physics analysis and Monte Carlo production under realistic conditions
        A significant amount of the computing resources available to the ATLAS experiment at the LHC are connected via the ARC grid middleware. ATLAS ARC-enabled resources, which include both major computing centres at the Tier-1 level and smaller, local clusters at the Tier-2 and Tier-3 level, have shown excellent performance running heavy Monte Carlo (MC) production for the experiment. However, with the imminent arrival of LHC physics data, it is imperative that the deployed grid middleware can also handle the data access patterns caused by user-defined physics analysis. Such grid jobs can have radically different demands from systematic, centrally controlled MC production. We report on the performance of the ARC middleware, as deployed for ATLAS, in realistic situations with concurrent MC production and physics analysis running on the same resources. Data access patterns for ATLAS MC and physics analysis grid jobs will be shown, together with the performance of various possible storage and file staging models.
        Speaker: Bjoern Hallvard Samset (Fysisk institutt - University of Oslo)
        Slides
    • Event Processing: Tuesday Club E

      Club E

      Prague

      Prague Congress Centre 5. května 65, 140 00 Prague 4, Czech Republic
      Convener: Dotti Andrea (INFN and Università Pisa)
      • 310
        Monte Carlo Generators in Atlas software
        The ATLAS software framework, Athena, is written in C++ and uses Python for job configuration scripts. Physics generators, which provide the four-vectors describing the results of LHC collisions, are in general written by third parties and are not part of Athena; these libraries are linked from the LCG Generator Services (GENSER) distribution. Generators are run from within Athena and put the generated event output into a transient store, in HepMC format, using StoreGate. A common interface, implemented via inheritance from a GeneratorModule class, guarantees common functionality for the basic generation steps and for event formatting, such as stripping partons from the HepMC record while keeping the mother-daughter relationships within the events. The ATLAS detector simulation packages access the truth information in StoreGate, and a TruthHelpers package provides some standard functions for querying and manipulating generator information. Steering is done through the specific interfaces to allow flexible configuration using ATLAS Python scripts (a sketch of such a job-options fragment follows this entry). Interfaces to most general-purpose generators, including Pythia6, Pythia8, Fortran HERWIG, Herwig++ and Sherpa, are provided, as well as to more specialised packages such as Phojet and Cascade. A second type of interface exists for the so-called matrix-element generators, which only generate the particles produced in the hard scattering process and write events in the Les Houches event format. A generic interface to pass these events to Pythia6 and HERWIG for parton showering and hadronisation has been written. In addition, Athena provides interfaces to validation tools such as MCTester, Rivet and HepMCAnalysisTool.
        Speaker: Dr Cano Ay (Goettingen)
        Slides
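        A hedged sketch of what configuring a generator from Athena Python job options can look like; the package, algorithm and property names below (Pythia_i, Pythia, PythiaCommand) are assumptions based on common ATLAS usage and may differ between releases.

        # Hypothetical Athena job-options fragment: add a generator algorithm
        # to the sequence and steer it with generator-specific commands.
        from AthenaCommon.AlgSequence import AlgSequence
        topSequence = AlgSequence()

        from Pythia_i.Pythia_iConf import Pythia    # assumed generator interface package
        topSequence += Pythia()

        topSequence.Pythia.PythiaCommand = [
            "pysubs msel 0",                        # assumed command syntax: disable defaults
            "pysubs msub 102 1",                    # enable one specific hard process
        ]

        theApp.EvtMax = 10                          # generate ten events in this job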
      • 311
        Overview of the LHCb Tracking System and its Performance on Simulation and on First Data
        The LHCb tracking system consists of four major sub-detectors and a dedicated magnet. A sequence of algorithms has been developed to optimally exploit the capabilities of all tracking sub-detectors. Different configurations of the same algorithms are used to reconstruct tracks at the various stages of the trigger system, in the standard offline pattern recognition and under the initial conditions of real data taking, which e.g. still suffer from large misalignments. To cope with all the corresponding requirements, the algorithms have been designed to be extremely flexible and simultaneously optimized for efficiency, purity and CPU consumption. We will give an overview of the LHCb tracking algorithms and report on their performance based on the latest simulations, on cosmic data and on data from beam injection tests.
        Speakers: Eduardo Rodrigues Figueiredo (University of Glasgow), Manuel Schiller (Universität Heidelberg)
        Slides
      • 312
        Data Quality Monitoring for the CMS Silicon Strip Tracker
        The CMS Silicon Strip Tracker (SST), consisting of more than 10 million channels organized in about 16,000 detector modules, is the largest silicon strip tracker ever built for a high energy physics experiment. The Data Quality Monitoring (DQM) system for the Tracker has been developed within the CMS software framework. More than 100,000 monitorable quantities need to be managed by the DQM system, which organizes them in a hierarchical structure reflecting the arrangement of the detector in subcomponents and the various levels of data processing. Monitorable quantities computed at the level of individual detectors are processed to extract automatic quality checks and summary results that can be visualized with specialized graphical user interfaces. In view of the great complexity of the CMS Tracker, the standard visualization tools based on histograms have been complemented with two- and three-dimensional graphical images of the subdetector that can show the whole detector down to single-channel resolution. The functionality of the CMS Silicon Strip Tracker DQM system and the experience acquired during SST commissioning will be discussed.
        Speaker: Maria Assunta Borgia (Unknown)
        Slides
      • 313
        Commissioning of the Muon Track Reconstruction in the ATLAS Experiment
        The Muon Spectrometer for the ATLAS experiment at the LHC is designed to identify muons with transverse momentum greater than 3 GeV/c and measure muon momenta with high precision up to the highest momenta expected at the LHC. The 50-micron sagitta resolution translates into a transverse momentum resolution of 10% for muon transverse momenta of 1 TeV/c. The design resolution requires an accurate control of the positions of the muon detectors and of the distortions of the nominal layout of individual chambers, induced by mechanical stress and thermal gradients during the experiment operation. Accurate calibration of the time to distance relation in the Monitored Drift Tubes is also required to reach design performance. We describe the software chain that implements corrections for the alignment and calibration of the chambers, as well as the algorithms implemented to perform pattern recognition and track fitting in the ATLAS Muon Spectrometer. In particular, we report on the performance of the complete software chain in the context of first single-beam LHC running as well as ATLAS combined cosmics data taking.
        Speaker: Martin Woudstra (University of Massachusetts)
        Slides
      • 314
        A New Tool for Measuring Detector Performance in ATLAS
        The determination of the ATLAS detector performance in data is essential for all physics analyses and even more important to understand the detector during the first data taking period. Hence a common framework for the performance determination provides a useful and important tool for various applications. We report on the implementation of a performance tool with common software solutions for the corresponding data analyses. The tool provides a framework for gathering the input data, a common format of the output data, as well as methods to store the results in a collaboration wide accessible database. The aim is to implement an ATLAS standard that will be used for performance monitoring, physics analyses, and as realistic input to Monte Carlo event simulation. Deployment in every level of LHC data production centers, so-called Tier-1/2/3 centers, is supported. The overall concept of the performance tool, its realization and first experiences will be presented.
        Speakers: Dr Arno Straessner (IKTP, TU Dresden), Dr Matthias Schott (CERN)
        Slides
      • 315
        Parallel ALICE offline reconstruction with PROOF
        Fast feedback from the offline reconstruction is essential for understanding the ALICE detector and the reconstruction software, especially for the first LHC physics studies. For this purpose, an ALICE offline reconstruction based on the Parallel ROOT Facility (PROOF) has been designed and developed. The architecture and implementation are briefly described. Particular attention is given to the access to raw and conditions data, as well as to the handling of the resulting event summary data. The achieved performance is discussed in detail.
        Speaker: Peter Hristov (CERN)
        Slides
    • Grid Middleware and Networking Technologies: Tuesday Panorama

      Panorama

      Prague

      Prague Congress Centre 5. května 65, 140 00 Prague 4, Czech Republic
      • 316
        VOMRS / VOMS Utilization Patterns And Convergence Plan
        The Grid community uses two well-established registration services which allow users to be authenticated under the auspices of Virtual Organizations (VOs). The Virtual Organization Membership Service (VOMS), developed in the context of the Enabling Grids for E-sciencE (EGEE) project, is an Attribute Authority service that issues attributes expressing membership information of a subject within a VO. VOMS allows partitioning users into groups and assigning them roles and free-form attributes, which are then used to drive authorization decisions. The VOMS administrative application, VOMS-Admin, manages and populates the VOMS database with membership information. The Virtual Organization Management Registration Service (VOMRS), developed at Fermilab, extends the basic registration and management functionality present in VOMS-Admin. It implements a registration workflow that requires acceptance of the VO usage policy and membership approval by administrators. VOMRS supports management of multiple grid certificates, handling of users' requests for group and role assignments, and membership status. VOMRS is capable of interfacing to local systems holding personnel information (e.g. the CERN Human Resources Database) and of pulling relevant member information from them. VOMRS synchronizes the relevant subset of information with VOMS. The recent development of new features in VOMS raises the possibility of rationalizing the support and converging on a single solution by continuing and extending the existing collaborations between EGEE and OSG. Such a strategy is supported by WLCG, OSG, US CMS, US ATLAS, and other stakeholders worldwide. In this paper, we analyze the features in use by the major experiments and the registration use cases to be addressed by a mature single solution.
        Speakers: Andrea Ceccanti (INFN CNAF, Bologna, Italy), Tanya Levshina (FERMI NATIONAL ACCELERATOR LABORATORY)
        Slides
      • 317
        The new gLite Authorization System
        The new authorization service of the gLite middleware stack is presented. In the EGEE-II project, the overall authorization study and review gave recommendations that the authorization should be rationalized throughout the middleware stack. As per the accepted recommendations, the new authorization service is designed to focus on EGEE gLite computational components: WMS, CREAM, and glexec. At the same time, the design and implementation of this system keeps in mind other service types such as data management or user portals. This paper will outline the full design for the new gLite Authorization Service which meets the requirements provided in the authorization service requirements document. At a high level this service is designed to allow authorization policies to be administered by policy authorities, evaluated locally or remotely and enforced within an application. The result of a policy evaluation includes the authorization decision and may also include the environment under which a task must execute in order to be considered authorized. This uniform chain of policy management, evaluation and choice of environment gives a large advantage over the current authorization systems present in the gLite middleware stack.
        Speakers: Andrea Ceccanti (CNAF - INFN), John White White (Helsinki Institute of Physics HIP)
        Slides
      • 318
        On the role of integrated distributions in grid computing
        Grid computing as currently understood is normally enabled through the deployment of integrated software distributions which expose specific interfaces to core resources (data, CPU), provide clients and also higher level services. This paper examines the reasons for this reliance on large distributions and discusses whether the benefits are genuinely worth the considerable investment involved in their maintenance. Looking ahead to a context of mature standards, pervasive virtualisation and administrative decentralisation, is it time to embrace alternative models in order to optimally enable a grid infrastructure?
        Speaker: Dr Oliver Keeble (CERN)
        Slides
      • 319
        CDF software distribution on Grid using Parrot
        Large international collaborations that use decentralized computing models are becoming the norm rather than the exception in High Energy Physics. A good computing model for such large and distributed collaborations has to deal with the distribution of experiment-specific software around the world. When the CDF experiment developed its software infrastructure, most computing was done on dedicated clusters; as a result, libraries, configuration files, and large executables were deployed over a shared file system. In order to adapt its computing model to the Grid, CDF decided to distribute its software to all the European Grid sites using Parrot, a user-level application able to attach existing programs to remote I/O systems through the filesystem interface. This choice allows CDF to use a single centralized source of code and a scalable set of caches around Europe to distribute its code efficiently, and it requires almost no interaction with the existing Grid middleware or with local system administrators. This system has been in production for CDF in Europe for more than a year. Here we present the CDF implementation of Parrot and its performance. We will discuss in detail scalability issues and the improved performance obtained with the cache-coherence mechanism that was developed inside CDF and integrated into the Parrot release.
        Speaker: Dr Simone Pagan Griso (University and INFN Padova)
        Slides
      • 320
        Grid Middleware for WLCG - where are we now, and where do we go from here?
        This paper will provide a review of the middleware that is currently used in WLCG and of how it compares to what was initially expected when the project started. The talk will look at some of the lessons to be learned, and at why what is in use today is sometimes quite different from what may have been anticipated. For the future it is clear that finding the effort for long-term support and maintenance is an issue; we will look at some proposals to move away from idiosyncratic middleware towards more mainstream software components, and consider how to make use of other grid or cloud technology developments. In particular the paper will discuss possible future strategies for the management of data and for job submission and management.
        Speaker: Dr Ian Bird (CERN)
        Slides
      • 321
        Modern methods of application code distributions on the Grid
        AliEn is the Grid interface that ALICE has developed for its distributed computing. AliEn provides all the components needed to build a distributed environment, including a file and metadata catalogue, a priority-based job execution model and a file replication system. Another component provided by AliEn is an automatic software package installation service, PackMan. PackMan supports multiple versions of each package, dependencies, different platforms, post-installation methods, and package setup and configuration. The rest of this paper describes how PackMan works, starting with how packages are created and registered in the AliEn file catalogue, how users specify the packages they need, and finally how the packages are installed and configured before the jobs are executed.
        Speaker: Pablo Saiz (CERN)
        Slides
    • Hardware and Computing Fabrics: Tuesday Club B

      Club B

      Prague

      Prague Congress Centre 5. května 65, 140 00 Prague 4, Czech Republic
      Convener: Sverre Jarp (CERN)
      • 322
        Study of Solid State Drives performance in PROOF distributed analysis system
        Solid State Drives (SSDs) are a very promising storage technology for High Energy Physics parallel analysis farms. Their combination of low random access time and relatively high read speed is very well suited to situations where multiple jobs concurrently access data located on the same drive. SSDs also have lower energy consumption and higher vibration tolerance than Hard Disk Drives (HDDs), which makes them an attractive choice in many applications ranging from personal laptops to large analysis farms. The Parallel ROOT Facility (PROOF) is a distributed analysis system which makes it possible to exploit the inherent event-level parallelism of high energy physics data. PROOF is especially efficient together with distributed local storage systems like Xrootd, when data are distributed over the computing nodes. In such an architecture the I/O performance of the local disk subsystem becomes a critical factor, especially when the computing nodes use multi-core CPUs. We will discuss our experience with SSDs in a PROOF environment and compare the performance of HDDs and SSDs in I/O-intensive analysis scenarios (a small random-read benchmark sketch is given after this entry). In particular we will discuss how PROOF system performance scales with the number of simultaneously running analysis jobs.
        Speaker: Dr Sergey Panitkin (Department of Physics - Brookhaven National Laboratory (BNL))
        Slides
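        A toy benchmark sketch of the kind of random-access pattern that favours SSDs over HDDs, assuming a hypothetical pre-existing test file; it is illustrative only and is not the measurement procedure used in the study.

        # Toy random-read benchmark: read small blocks at random offsets of an
        # existing file and report the average latency. The file path and sizes
        # are assumptions; for a fair disk test the OS page cache should be
        # dropped first. Concurrent PROOF workers produce a similar pattern.
        import os, random, time

        PATH = "testfile.dat"        # hypothetical large file on the drive under test
        BLOCK = 64 * 1024            # 64 kB reads
        N_READS = 1000

        size = os.path.getsize(PATH)
        with open(PATH, "rb") as f:
            start = time.time()
            for _ in range(N_READS):
                offset = random.randrange(0, max(1, size - BLOCK))
                f.seek(offset)
                f.read(BLOCK)        # random small read, the worst case for spinning disks
            elapsed = time.time() - start

        print("average random-read latency: %.2f ms" % (1000.0 * elapsed / N_READS))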
      • 323
        Monitoring Individual Traffic Flows in the Atlas TDAQ Network
        The ATLAS data network interconnects up to 2000 processors using up to 200 edge switches and five multi-blade chassis devices. Classical, SNMP-based network monitoring provides statistics on aggregate traffic, but something more is needed to quantify individual traffic flows. sFlow is an industry standard which enables an Ethernet switch to take a sample of the packets traversing it and send them to a collector for permanent storage (a skeleton of such a collector is sketched after this entry). The packet samples are analyzed in software, and conversations at different network layers can be individually traced. Implementing statistical packet sampling in the ATLAS network gives us the ability to identify and examine the causes of unknown traffic patterns. As every switch in ATLAS supports sFlow, there is the potential to concurrently monitor over 4000 ports. Since brief transactions can be important, we operate sFlow at high sampling rates, up to one sample per 512 packets, which, together with the large number of ports in the system, generates a data handling problem of its own. This paper describes how this problem is addressed by making it possible to collect and store data either centrally or in a distributed way, according to need. The developed system consists of a collector, a service exposing the data, and a web interface. The methods used to present the results in a meaningful fashion for system analysts are discussed, and we explore the possibilities and limitations of this diagnostic tool, giving examples of its use in solving system problems that arise during ATLAS data taking.
        Speaker: Mr Rune Sjoen (Bergen University College)
        Slides
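        A bare-bones sketch of an sFlow collector: sFlow agents export packet samples as UDP datagrams, conventionally to port 6343, and a collector has to receive and decode them. The sketch below only counts datagrams per switch and deliberately does not decode the sFlow (XDR) payload, which the real system of course must do.

        # Skeleton of an sFlow collector: receive the UDP datagrams exported by
        # the switches and count them per source switch. Payload decoding into
        # individual packet samples is omitted here.
        import socket
        from collections import Counter

        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        sock.bind(("0.0.0.0", 6343))          # standard sFlow collector port

        datagrams_per_switch = Counter()
        try:
            while True:
                payload, (switch_ip, _port) = sock.recvfrom(65535)
                datagrams_per_switch[switch_ip] += 1
                # A real collector would decode 'payload' here and store the
                # packet samples for per-flow analysis.
        except KeyboardInterrupt:
            for switch, count in datagrams_per_switch.most_common():
                print(switch, count)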
      • 324
        Oracle and storage IOs, explanations and experience at CERN
        The Oracle database system is used extensively in the High Energy Physics community, and access to the storage subsystem is one of its major components. In recent years Oracle has introduced new ways to access and manage the storage subsystem, such as ASM (10.1), Direct NFS (11.1) and Exadata (11.1). This paper presents our experience with the different features linked to storage access and management that we have been using over the past years, together with a comparison of their functionality. It compares the different solutions used at CERN and the Tier-1 sites for Oracle storage.
        Speaker: Mr Eric Grancher (CERN)
        Slides
      • 325
        A Service-Based SLA for the RACF at Brookhaven National Lab
        The RACF provides computing support to a broad spectrum of scientific programs at Brookhaven. The continuing growth of the facility, the diverse needs of the scientific programs and the increasingly prominent role of distributed computing require the RACF to move from a system-based to a service-based SLA with our user communities. A service-based SLA allows the RACF to coordinate the operation, maintenance and development of the facility more efficiently by mapping out a matrix of system and service dependencies and by creating a new, configurable alarm-management layer that automates service alerts and notification of operations staff. This presentation describes the adjustments made by the RACF to transition to a service-based SLA, including the integration of its monitoring software, alarm notification mechanism and service ticket system to make the new SLA a reality. A status update on the implementation of the new SLA will also be given.
        Speakers: Dr Jason Smith (Brookhaven National Laboratory), Ms Mizuki Karasawa (Brookhaven National Laboratory)
        Slides
      • 326
        The Integration of Virtualization into the U.S. ATLAS Tier 1 Facility at Brookhaven
        The RHIC/ATLAS Computing Facility (RACF) processor farm at Brookhaven National Laboratory currently provides over 7200 CPU cores (over 13 million SpecInt2000 of processing power) for computation. Our ability to supply this level of computational capacity in a data centre limited by physical space, cooling and electrical power is primarily due to the availability of increasingly dense multi-core x86 CPUs. In this era of dense and inexpensive multi-core processors, the use of system virtualization has become increasingly important. By virtualizing a single multi-core server into many virtual machines, one can minimize the impact of operating system and service failures. Virtualization can also serve as a useful tool in eliminating resource contention issues on compute nodes. For these reasons, we have split a number of our multi-core systems into virtual Condor batch, interactive and testbed components with Xen. The flexibility offered by virtualization comes at a price, however: a new level of configuration management complexity. This presentation will discuss our experiences with Xen. In particular, we will cover our development of a custom software toolkit to simplify Xen configuration management. This has allowed us to integrate Xen deployments with our existing, automated OS provisioning system and can potentially scale virtualization to thousands of hosts.
        Speakers: Mr Christopher Hollowell (Brookhaven National Laboratory), Mr Robert Petkus (Brookhaven National Laboratory)
        Slides
      • 327
        Analysis of internal network requirements for the distributed Nordic Tier-1
        The Tier-1 facility operated by the Nordic DataGrid Facility (NDGF) differs significantly from other Tier-1s in several respects: it is not located at one or a few sites but is instead distributed throughout the Nordic countries, and it is not under the governance of a single organisation but is instead built from resources under the control of a number of different national organisations. Being physically distributed makes the design and implementation of the networking infrastructure a challenge. NDGF has its own internal OPN connecting the sites participating in the distributed Tier-1. To assess the suitability of the network design and the capacity of the links, we present a model of the internal bandwidth needs of the NDGF Tier-1 and its associated Tier-2 sites (a toy version of such an estimate follows this entry). The model takes the different types of workload into account and can handle different kinds of data management strategies. It has already been used to dimension the internal network structure of NDGF. We also compare the model with real-life measurements.
        Speaker: Dr Josva Kleist (Nordic Data Grid Facility)
        Slides
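        A toy bandwidth estimate in the spirit of the model described above, with entirely made-up workload numbers; the real model distinguishes workload types and data management strategies in far more detail.

        # Toy estimate of inter-site bandwidth needs for a distributed Tier-1:
        # for each workload, rate ~ jobs x data per job x remote fraction /
        # job duration, summed over workloads. All numbers are placeholders.
        workloads = [
            # name,           concurrent jobs, GB read/job, job hours, remote fraction
            ("MC production",  2000,            1.0,         12.0,      0.2),
            ("user analysis",   500,           20.0,          2.0,      0.6),
            ("reprocessing",    300,           10.0,          6.0,      0.3),
        ]

        total_gbit_per_s = 0.0
        for name, jobs, gb_per_job, hours, remote_fraction in workloads:
            gbit_per_s = jobs * gb_per_job * 8 * remote_fraction / (hours * 3600.0)
            total_gbit_per_s += gbit_per_s
            print("%-15s %6.2f Gbit/s" % (name, gbit_per_s))

        print("total           %6.2f Gbit/s (before safety margins)" % total_gbit_per_s)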
    • Online Computing: Tuesday Club D

      Club D

      Prague

      Prague Congress Centre 5. května 65, 140 00 Prague 4, Czech Republic

      Sponsored by ACEOLE

      Convener: Rainer Mankel (CERN)
      • 328
        Data Acquisition Backbone Core DABC Release v1.0.
        For the new experiments at FAIR, new data acquisition concepts have to be developed, such as the distribution of self-triggered, time-stamped data streams over high-performance networks for event building. The Data Acquisition Backbone Core (DABC) is a general-purpose software framework designed for the implementation of such data acquisition systems. It is based on C++ and Java, and a first version is now published and available. The DABC framework can be used to develop data acquisition systems ranging from small setups to high-performance systems, as well as test beds for detector tests, readout component tests, and data flow investigations. It provides event building over fast networks such as InfiniBand or Gigabit Ethernet. All kinds of data channels (front-end systems) are supported by program plug-ins into the functional components of DABC, such as the data input, combiner, scheduler, event builder, analysis and storage components. The DABC kernel is separated from the controlling environment by generic interface classes, and several implementations for configuration and runtime control can be attached. For the first release, both a lightweight batch-like environment and a full runtime control system based on XDAQ (HTTP/SOAP communication) with DIM are provided. Commands and parameters of DABC and its application plug-ins are published by DIM servers or, optionally, on web servers. A generic Java GUI provides dynamic control and visualization of these components; application-specific GUIs can be added. A first set of plug-ins has been implemented to use DABC as an event builder for the front-end components of the GSI standard DAQ system MBS (Multi Branch System). Another implementation covers the connection of DAQ readout chains from detector front-end boards (N-XYTER), linked to read-out controller boards (ROC), over UDP into DABC for event building, archiving and data serving. This was used for data taking in the September 2008 test beam time of the CBM experiment at GSI. The development of key components is supported by the FutureDAQ project of the European Union (FP6 I3HP JRA1).
        Speaker: Dr Hans G. Essel (GSI)
        Slides
      • 329
        Commissioning the ATLAS Inner Detector Trigger
        The ATLAS experiment is one of two general-purpose experiments at the Large Hadron Collider (LHC). It has a three-level trigger, designed to reduce the 40MHz bunch-crossing rate to about 200Hz for recording. Online track reconstruction, an essential ingredient to achieve this design goal, is performed at the software-based second (L2) and third levels (Event Filter, EF), running on farms of commercial PCs. The L2, designed to provide about a 50-fold reduction in the event rate with an average execution time of about 40ms, uses custom fast tracking algorithms, doing complementary pattern recognition on data either from the Si detectors or from the transition-radiation tracker. The EF uses offline software components and has been designed to give about a further 10-fold rate reduction with an average execution time of about 4s. We report on the commissioning of the tracking algorithms during the first operation of the LHC and their performance with cosmic-ray data collected recently in the first combined running with the whole detector fully assembled. We describe customizations to the algorithms to have close to 100% efficiency for cosmic tracks that are used for the alignment of the trackers, since they are normally tuned for tracks originating from around the beampipe.
        Speaker: Dr Mark Sutton (University of Sheffield)
        Slides
      • 330
        Alignment data streams for the ATLAS Inner Detector
        The ATLAS experiment uses a complex trigger strategy to achieve the necessary Event Filter output rate, making it possible to optimize the storage and processing needs of these data. These needs are described in the ATLAS Computing Model, which embraces Grid concepts. The output coming from the Event Filter consists of three main streams: a primary stream, the express stream and the calibration stream. The calibration stream is transferred to the Tier-0 facility, which allows prompt reconstruction of this stream with a minimum latency of 8 hours, producing calibration constants of sufficient quality to permit a first-pass processing. An independent calibration stream has been developed and tested which selects tracks at trigger level 2 after the reconstruction. The stream is composed of raw data in byte-stream format and contains information from limited parts of the detector, in particular only the hit information of the selected tracks. This leads to significantly improved bandwidth usage and storage capability. The stream will be used to derive and, if necessary, update the calibration and alignment constants every 24 hours. Processing is done using specialized algorithms running in the Athena framework on dedicated Tier-0 resources, and the alignment constants are stored and distributed using the COOL conditions database infrastructure. The work addresses in particular the alignment requirements, the needs for track and hit selection, and timing and bandwidth issues.
        Speaker: Belmiro Pinto (Universidade de Lisboa)
        Slides
      • 331
        The CMS Muon System Alignment
        The alignment of the Muon System of CMS is performed using different techniques: photogrammetry measurements, optical alignment and alignment with tracks. For track-based alignment, several methods are employed, ranging from a hit-impact point (HIP) algorithm and a procedure exploiting chamber overlaps to a global fit method based on the Millepede approach. For start-up alignment, cosmic muon and beam halo signatures play a very strong role, in particular as long as available integrated luminosity is still significantly limiting the size of the muon sample from collisions. During the last commissioning runs the first aligned geometries have been produced and validated, and have been used at the CMS offline computing infrastructure in order to perform improved reconstructions. This presentation develops the computational aspects related to the calculation of alignment constants at the CERN Analysis Facility (CAF), the production and population of databases and the validation and performance in the official reconstruction. Also the integration of track-based and other sources of alignment is discussed.
        Speaker: Mr Pablo Martinez Ruiz Del Arbol (Instituto de Física de Cantabria)
        Slides
      • 332
        Online testbench for LHCb High Level Trigger validation
        The High Level Trigger (HLT) and Data Acquisition system selects about 2 kHz of events out of the 40 MHz of beam crossings. The selected events are consolidated into files on onsite storage and then sent to permanent storage for subsequent analysis on the Grid. For local and full-chain tests, a method is needed to exercise the data flow through the High Level Trigger when there are no actual data. In order to test the system under conditions as close as possible to those of data taking, the solution is to inject data at the input of the HLT at a minimum rate of 2 kHz. This is done via a software implementation of the trigger system which sends data to the HLT. The application has to make the data it sends appear as if they came from the real LHCb readout boards. Both simulated data and previously recorded real data can be replayed through the system in this manner. As the data rate is high (~100 MB/s), care has been taken to optimise the emulator for throughput from the SAN. The emulator can be run in stand-alone mode or as a pseudo-subdetector of LHCb, allowing use of all the standard run-control tools, down to the trigger control. The architecture, implementation and performance results of the emulator and of full-chain tests will be presented.
        Speaker: Jean-Christophe Garnier (CERN)
        Slides
      • 333
        Time Calibration of the ATLAS Tile Calorimeter
        The ATLAS Tile Calorimeter is ready for data taking during the proton-proton collisions provided by the Large Hadron Collider (LHC). The Tile Calorimeter is a sampling calorimeter with iron absorbers and scintillators as active medium; the scintillators are read out by wavelength-shifting fibers and PMTs. The LHC provides collisions every 25 ns, putting very stringent requirements on the synchronization of the ATLAS trigger systems and the readout of the on-detector electronics. More than 99% of the readout channels of the Tile Calorimeter have been time-calibrated using laser pulses sent directly to the PMTs. Timing constants can be calculated after corrections for i) the propagation of trigger and clock signals and ii) the differences in laser light paths to the different parts of the calorimeter. The calibration is implemented by i) programming delays in the on-detector electronics for groups of 6 channels, while ii) residual deviations from perfect synchronization are stored in a database and used during the offline reconstruction of the Tile Calorimeter data. From the point of view of trigger and clock signals the Tile Calorimeter is divided into 4 independent sections. The time calibration has been applied in each of the 4 sections to data taken during long ATLAS cosmic runs and during LHC beam time in September 2008, confirming a timing uniformity of 2 ns in each of the 4 calorimeter sections. The remaining delays between the 4 calorimeter sections have been measured i) using laser pulses interleaved with cosmic triggers inside a global ATLAS run and ii) using real LHC events, with consistent results. The main limitations on the precision of the time calibration are presented.
        Speaker: Mr Bjorn Nordkvist (Stockholm University), on behalf of the ATLAS Tile Calorimeter system
        Slides
    • Software Components, Tools and Databases: Tuesday Club A

      Club A

      Prague

      Prague Congress Centre 5. května 65, 140 00 Prague 4, Czech Republic
      Convener: Maria Grazia Pia (INFN)
      • 334
        Recent Developments in the Gaudi Software Framework
        Ten years after its first version, the Gaudi software framework has undergone many changes and improvements, with a consequent increase of the code base. Those changes were almost always introduced preserving backward compatibility and changing the framework itself as little as possible; obsolete code has been removed only rarely. After a release of Gaudi targeted at the 2008 data taking, it was decided to review the code of the framework with the aim of a general consolidation in view of the 2009 data taking. We also took the occasion to introduce improvements that had never been implemented because of their large impact on the rest of the code, as well as framework changes needed to solve intrinsic problems of the implementation that had never been made because they were too disruptive. In this contribution we describe the problems we addressed and the improvements we made to the framework during this review.
        Speaker: Marco Clemencic (European Organization for Nuclear Research (CERN))
        Slides
      • 335
        Job Life Cycle Management libraries for CMS Workflow Management Projects
        Three different projects within CMS produce various workflow-related data products: CRAB (analysis centric), ProdAgent (simulation production centric), and T0 (real-time sorting and reconstruction of real events). Although their data products and workflows are different, they all deal with job life cycle management (creation, submission, tracking, and cleanup of jobs). WMCore provides a set of common libraries to assist the sub-projects with the development of their job life cycle management infrastructure, and incorporates experiences and lessons learned from the sub-projects it serves. WMCore consists of several libraries: a model for associating workflows, jobs and files; modules for building autonomous components; communication, synchronization and database access; and other components usable by all three sub-projects. WMCore does not prescribe how the various data products need to be produced, but enables developers from these sub-projects to focus on this while using the basic building blocks from WMCore. WMCore is a common set of libraries for CMS workflow systems, with the aim of reducing code duplication between sub-projects, increasing maintainability and enabling the developers to focus on the core goals of their respective projects: analysis, production and sorting/reconstruction. This paper introduces the concept of job life cycle management as the common theme in the CMS workflow management projects and gives an overview of the various WMCore libraries.
        Speakers: Mr Frank van Lingen (California Institute of Technology), Mr Stuart Wakefield (Imperial College)
        Slides
      • 336
        ATLAS Data Quality Offline Monitoring
        The ATLAS experiment at the Large Hadron Collider reads out 100 million electronic channels at a rate of 200 Hz. Before the data are shipped to storage and analysis centres across the world, they have to be checked to be free of irregularities which would render them scientifically useless. Data quality offline monitoring provides prompt feedback from the full first-pass event reconstruction at the Tier-0 computing centre and can unveil problems in the detector hardware and in the data processing chain. Detector information and reconstructed proton-proton collision event characteristics are distilled into a few key histograms and numbers which are automatically compared with a reference (a schematic sketch of such an automatic check follows this entry). The results of the comparisons are saved as status flags in a database and are published together with the histograms on a web server. They are inspected by a 24/7 shift crew who can notify on-call experts in case of problems and, in extreme cases, signal that data taking should be aborted. The talk explains the technical realisation of the offline monitoring chain.
        Speaker: Peter Onyisi (University of Chicago)
        Slides
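        A minimal PyROOT sketch of an automatic reference comparison of the kind described above: it fills a monitored histogram, compares it with a reference using a Kolmogorov test, and derives a status flag from a hypothetical threshold; the real framework, thresholds and flag definitions are of course more elaborate.

        # Minimal sketch of an automatic data-quality check: compare a monitored
        # histogram against a reference and turn the result into a status flag.
        # The thresholds and flag names are assumptions for illustration.
        import ROOT

        reference = ROOT.TH1F("ref", "reference", 50, -5, 5)
        monitored = ROOT.TH1F("mon", "monitored", 50, -5, 5)
        reference.FillRandom("gaus", 100000)
        monitored.FillRandom("gaus", 10000)   # in reality filled from reconstructed events

        # Kolmogorov-Smirnov compatibility between monitored and reference shape
        prob = monitored.KolmogorovTest(reference)

        if prob > 0.05:
            flag = "GREEN"        # compatible with reference
        elif prob > 0.001:
            flag = "YELLOW"       # suspicious, needs a look by the shift crew
        else:
            flag = "RED"          # clearly incompatible, alert the on-call expert

        print("KS probability = %.4f -> status %s" % (prob, flag))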
      • 337
        Flexible Session Management in a Distributed System
        Many secure communication libraries used by distributed systems, such as SSL, TLS, and Kerberos, fail to make a clear distinction between the authentication, session, and communication layers. In this paper we introduce CEDAR, the secure communication library used by the Condor High Throughput Computing software, and present the advantages to a distributed computing system resulting from CEDAR's separation of these layers. Regardless of the authentication method used, CEDAR establishes a secure session key, which has the flexibility to be used for multiple capabilities. We demonstrate how a layered approach to security sessions can avoid round trips and the latency inherent in network authentication. The creation of a distinct session management layer allows for optimizations that improve scalability by delegating sessions to other components in the system. This session delegation creates a chain of trust that reduces the overhead of establishing secure connections and enables centralized enforcement of system-wide security policies. Additionally, secure channels based upon UDP datagrams are often overlooked by existing libraries; we show how CEDAR's structure accommodates this as well. As an example of the utility of this work, we show how the use of delegated security sessions and other techniques inherent in CEDAR's architecture enables US CMS to meet its scalability requirements in deploying Condor over large-scale, wide-area grid systems.
        Speaker: Zachary Miller (University of Wisconsin)
        Slides
      • 338
        Global Overview of the current ROOT system
        In the last few years ROOT has continued to consolidate and improve the existing code base and infrastructure. This includes a very smooth transition to SVN, which subsequently enabled us to reorganize the existing libraries into semantic packages, which in turn helps to improve the documentation. We have also continued to improve performance and reduce the memory footprint, for example by introducing a way to split vectors of pointers and by adding support for decompressing the file data buffers in advance in an additional thread. In this presentation we will discuss these developments as well as many other improvements and our plans for the coming year.
        Speaker: Dr Rene Brun (CERN)
        Slides
      • 339
        New ROOT Graphical User Interfaces for fitting
        ROOT, as a scientific data analysis framework, provides extensive capabilities via graphical user interfaces (GUIs) for performing interactive analysis and visualizing data objects such as histograms and graphs. A new interface for fitting has been developed for performing, exploring and comparing fits on data sets such as histograms, multi-dimensional graphs or trees. With this new interface, users can interactively build the fit model function, set parameter values and constraints, and select fit and minimization methods with their options. Functionality for visualizing the fit results is also provided, with the possibility of drawing residuals or confidence intervals. Furthermore, the new fit panel behaves as a standalone application and does not prevent users from interacting with other windows. We will describe in detail the functionality of this user interface, covering as well the new capabilities provided by the fitting and minimization tools recently introduced in the ROOT framework (the scripted equivalent of such a fit is sketched after this entry).
        Speaker: David Gonzalez Maline (CERN)
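        The fit panel drives the same fitting machinery that is available from a script; as an illustration, the PyROOT snippet below fits a Gaussian to a histogram and retrieves the fit result, which is standard ROOT usage (the histogram content here is generated purely for the sake of the example).

        # Scripted counterpart of what the fit panel does interactively: define a
        # model function, fit it to a histogram and inspect the result.
        import ROOT

        h = ROOT.TH1F("h", "example data", 100, -5, 5)
        h.FillRandom("gaus", 10000)                  # example data only

        model = ROOT.TF1("model", "gaus", -5, 5)     # fit model, as chosen in the panel
        result = h.Fit(model, "S")                   # "S" stores and returns the fit result
        fit = result.Get()                           # underlying TFitResult

        print("chi2/ndf =", fit.Chi2(), "/", fit.Ndf())
        for i in range(model.GetNpar()):
            print(model.GetParName(i), "=", fit.Parameter(i), "+/-", fit.ParError(i))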
    • Plenary: Wednesday Congress Hall

      Congress Hall

      Prague

      Prague Congress Centre 5. května 65, 140 00 Prague 4, Czech Republic

      Live broadcasting at:
      http://prenosy.cesnet.cz/

      Convener: Nobuhiko Katayama (KEK)
      • 340
        Optical Networks - Evolution and Future
        Speaker: Prof. Hans Döbbeling (DANTE)
        Video
      • 341
        Grid Security and Identity Management
        Speaker: Mine Altunay (FERMI NATIONAL ACCELERATOR LABORATORY)
        Slides
        Video
      • 342
        Computing for the RHIC Experiments
        Speaker: Prof. Jerome Lauret (BNL)
        Slides
        Video
    • 10:30
      coffee break
    • Plenary: Wednesday Congress Hall

      Congress Hall

      Prague

      Prague Congress Centre 5. května 65, 140 00 Prague 4, Czech Republic

      Live broadcasting at:
      http://prenosy.cesnet.cz/

      Convener: Lothar Bauerdick (Fermilab)
      • 343
        Software Developments Tools and Techniques, Core Performance
        When experiments get close to data taking, the pace of software development becomes frantic, and experiment librarians and software developers rely on performance monitoring and optimization to keep core resource usage (memory and CPU) under control. Performance monitoring and optimization share many tools, but they are distinct processes with very different workflows. In this talk we will outline the challenges currently facing the LHC experiments and review some of the tools and techniques available, based on the experience of several HENP experiments.
        Speaker: Dr Paolo Calafiura (LBL)
        Slides
        Video
      • 344
        Geometry, Reconstruction and Event Visualization
        After more than a decade of software development the LHC experiments have successfully released their offline software for the commissioning with data. Sophisticated detector description models are necessary to match the physics requirements on the simulation, while fast geometries are in use to speed up the high level trigger and offline track reconstruction. The experiments explore modern reconstruction techniques involving novel fitting and pattern recognition ideas to optimize the performance for physics. Calibration and alignment of the detectors using cosmic data has been the focus for the past months. Event displays are an indispensable tool for the monitoring and the debugging of the detectors and of the reconstruction software. The talk will give an overview of recent developments in the LHC experiments in the area of geometry, reconstruction and event visualization.
        Speaker: Prof. Markus Elsing (CERN)
        Slides
        Video
      • 345
        Data and metadata management - distributed data access and distributed databases
        Data and metadata management at the Petabyte scale remains one of the key challenges for the High Energy Physics community. Efficient distribution of, and reliable access to, Petabytes of distributed data in files and relational databases will be required to exploit the physics potential of LHC data and the resources available to the experiments in the Worldwide LHC Computing Grid. In this presentation we will summarise the software and deployment infrastructure for distributed data management at CERN and the WLCG partner sites and review the upcoming challenges for sustainable production deployment. We will focus on common technical challenges for the storage and distribution systems as the experiments ramp up their distributed production and analysis work at CERN and the tier sites, and we will outline the impact of new technologies such as data access protocols, virtualised storage, clustered file systems and changing storage media roles in the medium and long term.
        Speaker: Dr Dirk Duellmann (CERN)
        Slides
        Video
    • 13:00
      lunch box
    • 14:00
      Tours
    • Poster session: whole day
      • 346
        A Collaborative Network Middleware Project by Lambda Station, TeraPaths, and Phoebus
        The TeraPaths, Lambda Station, and Phoebus projects were funded by the Department Of Energy's (DOE) network research program to support efficient, predictable, prioritized petascale data replication in modern high-speed networks, directly address the "last-mile" problem between local computing resources and WAN paths, and provide interfaces to modern, high performance hybrid networks with a low entry barrier for users. Within the framework of the three projects, we successfully developed services that establish on demand and manage true end-to-end, Quality-of-Service (QoS) aware, virtual network paths across multiple administrative network domains, select network paths and gracefully reroute traffic over these dynamic paths, and streamline traffic between packet and circuit networks using transparent gateways. These services function as “network middleware” and improve network QoS and performance for applications, playing a critical role in the effective use of emerging dynamic circuit network services. They provide interfaces to applications, such as dCache SRM, translate network service requests into network device configurations, and coordinate with each other to set up end-to-end network paths. Building upon the success of the three projects, which target the same user community, utilize compatible technologies, and have similar goals, we work together to research and develop the next generation of network middleware. We address challenges such as cross-domain control plane signaling and interoperability, authentication and authorization, topology discovery, and dynamic status tracking. Our roadmap is to co-design network models that ensure effective inter-domain topology discovery and network utilization, utilize the perfSONAR infrastructure to monitor dynamic circuit status and measure performance, enhance Grid authentication and authorization to support inter-domain trust, and integrate our joint work with the Inter-Domain Control plane efforts (IDC). The new network middleware will be deployed and fully vetted in the Large Hadron Collider data movement environment.
        Speaker: Dr Dantong Yu (BROOKHAVEN NATIONAL LABORATORY)
        Paper
      • 347
        A DAQ System for the CAMAC controller CC/NET using DAQ-Middleware
        We report on a DAQ system based on DAQ-Middleware. The system consists of a GUI client application and CC/NET readout programs. CC/NET is a CAMAC crate controller module created in a joint research project between the TOYO Corporation and KEK. Based on pipeline processing, CC/NET can operate at the speed limit of the CAMAC specification. It carries a single-board computer running the Linux operating system, and it is easily accessible over the network because it has a 100 Mbit/s Ethernet interface connector on the front panel. DAQ-Middleware is a system enabling distributed data acquisition for the next generation of nuclear physics experiments at KEK. It comprises several components for data monitoring, data logging, data gathering and so on, and provides all functions necessary for a DAQ system. These DAQ functions are segmented into components, and users of DAQ-Middleware can construct a DAQ system by combining them. The GUI client of this DAQ system follows a simple web-service approach, providing HTTP access from the client to an HTTP server within DAQ-Middleware. We constructed this DAQ system by adding the GUI client application and the CC/NET readout programs to DAQ-Middleware, and we show how convenient it is to use DAQ-Middleware.
        Speaker: Mr Eiji Inoue (KEK)
        Poster
      • 348
        A High Performance Data Transfer Service
        To satisfy the demands of data intensive applications it is necessary to move to far more synergetic relationships between data transfer applications and the network infrastructure. The main objective of the High Performance Data Transfer Service we present is to effectively use the available network infrastructure capacity and to coordinate, manage and control large data transfer tasks between many distributed computing centers. The Fast Data Transfer (FDT) application is used to perform all data transfer tasks. FDT is capable of reading and writing at disk or storage speed over wide area networks using standard TCP. FDT is based on an asynchronous, flexible multithreaded system which is used to continuously stream a dataset through one or more TCP sockets. In the MonALISA framework, we developed a set of collaborating agents which are used to monitor the network topology and its load, the available computing and storage resources and, based on all this information, to efficiently schedule and optimize many concurrent data transfer tasks requested by users or production tools.
        Speaker: Dr Iosif Legrand (CALTECH)
        Poster
      • 349
        A massive BES job submit and management tool
        The operation of the BESIII experiment started in July 2008. More than 5 PB of data will be produced in the coming 5 years. To increase the efficiency of data analysis and simulation, it is sometimes necessary for physicists to split a long job into a number of small jobs and execute them in a distributed mode. A tool has been developed for BESIII job submission and management. With the tool, a large BES job is split into a series of sub-jobs which can be processed in parallel (a schematic sketch of this splitting step follows this entry). A client/server architecture is used in the tool. Two kinds of job client, a web interface and a command-line interface, are provided for job operations. To implement the tool, web services running on several job servers are used to manage the jobs and to accept jobs from the job clients. A load-balancing policy is applied across the job servers. A database is used to keep the information on submitted jobs and to provide the information for the BES dataset. Jobs can be submitted to different computing back-ends, the local batch system as well as the Grid, according to the user's request. Job result checking and re-submission functions are provided in case of job failures.
        Speaker: Dr Jingyan Shi (IHEP)
        Poster
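        The splitting step described in the entry above can be illustrated with a minimal, hypothetical sketch. It is not the BESIII tool itself; the event-range arithmetic, the job-card template and all names are illustrative assumptions.

        # Minimal sketch of splitting one large analysis job into sub-jobs by event range.
        # Illustrative only; the real BESIII tool, its interfaces and its job description
        # format are not reproduced here.

        def split_job(total_events, events_per_subjob):
            """Return a list of (first_event, last_event) ranges covering total_events."""
            ranges = []
            first = 0
            while first < total_events:
                last = min(first + events_per_subjob, total_events) - 1
                ranges.append((first, last))
                first = last + 1
            return ranges

        def make_subjob_cards(template, ranges):
            """Fill a hypothetical job-card template once per sub-job."""
            return [template.format(index=i, first=fe, last=le)
                    for i, (fe, le) in enumerate(ranges)]

        if __name__ == "__main__":
            template = "subjob {index}: process events {first}-{last}"
            for card in make_subjob_cards(template, split_job(1000000, 250000)):
                print(card)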
      • 350
        A Multicore Communication Architecture for Distributed Petascale Computing
        Distributed petascale computing involves analysis of massive data sets in a large-scale cluster computing environment. Its major concern is to efficiently and rapidly move the data sets to the computation and send results back to users or storage. However, the needed efficiency of data movement has hardly been achieved in practice. Present cluster operating systems are usually general-purpose operating systems, typically Linux or some other UNIX variant. UNIX was developed more than three decades ago, when computing systems were all single core. Computation intensive applications and timesharing were the major concerns. Though the UNIX OS family has evolved through the years, UNIX network services are not well prepared for distributed petascale computing. The proliferation of multi-core architectures has added a new dimension of parallelism in computer systems. In this paper, we describe a Multi-core Communication Architecture (MCA) for the distributed petascale computing environment. Our goal is to design OS mechanisms that optimize network I/O operations for multi-core systems. In our proposed architecture, MCA vertically partitions the CPU cores of a multi-core system, dedicating cores to either computation or communication (a schematic illustration of this core partitioning follows this entry). Cores dedicated to communication perform “TCP Onloading.” MCA will dynamically adjust the core partitioning based on detected system load; CPU cores can be dynamically reassigned between communication and computation. Combined with Receive-Side Scaling and flow pinning technologies, MCA would perform flow scheduling to ensure interrupt- and connection-level affinity for TCP/IP processing.
        Speaker: Dr Wenji Wu (Fermi National Accelerator Laboratory)
        Poster
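        As a rough, user-space illustration of the core-partitioning idea described above (MCA itself is an operating-system design, not a script), the sketch below pins one process to an assumed set of "communication" cores and another to "computation" cores on Linux. The core lists are arbitrary assumptions and the example needs a machine with at least six cores.

        # Sketch: partition CPU cores between communication and computation work.
        # Linux-only (os.sched_setaffinity); assumes the host has at least 6 cores.
        # Purely illustrative of the partitioning concept, not the MCA implementation.
        import os
        import multiprocessing as mp

        COMM_CORES = {0, 1}           # assumed cores reserved for network I/O
        COMPUTE_CORES = {2, 3, 4, 5}  # assumed cores reserved for computation

        def communication_task():
            os.sched_setaffinity(0, COMM_CORES)     # restrict this process to the I/O cores
            # ... a network receive/send loop would run here ...
            print("comm task pinned to cores", sorted(os.sched_getaffinity(0)))

        def computation_task():
            os.sched_setaffinity(0, COMPUTE_CORES)  # restrict this process to compute cores
            # ... analysis work would run here ...
            print("compute task pinned to cores", sorted(os.sched_getaffinity(0)))

        if __name__ == "__main__":
            procs = [mp.Process(target=communication_task),
                     mp.Process(target=computation_task)]
            for p in procs:
                p.start()
            for p in procs:
                p.join()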
      • 351
        A new CDF model for data movement based on SRM
        Being a large international collaboration established well before the full development of the Grid as the main computing tool for High Energy Physics, CDF has recently changed and improved its computing model, decentralizing some parts of it in order to be able to exploit the rising number of distributed resources available nowadays. Despite those efforts, while the large majority of CDF Monte Carlo production has moved to the Grid, data processing is still mainly performed in dedicated farms hosted at FNAL, requiring a centralized management of data and Monte Carlo samples needed for physics analysis. This raises the question of how to manage the transfer of produced Monte Carlo samples from remote Grid sites to FNAL in an efficient way; up to now CDF has relied on a non-scalable centralized solution based on dedicated data servers accessed through the rcp protocol, which has proven to be unsatisfactory. A new data transfer model has been designed that uses SRMs as local caches for remote Monte Carlo production sites, interfaces them with SAM, the experiment data catalog, and finally realizes the file movement exploiting the features provided by the data catalog transfer layer. We describe here the model and its integration within the current CDF computing architecture. We discuss the performance gain and the benefits of the new framework in comparison with the old approach.
        Speakers: Dr Gabriele Compostella (CNAF INFN), Dr Manoj Kumar Jha (INFN Bologna)
        Poster
      • 352
        A prototype of a Virtual Analysis Facility: first experiences
        Current Grid deployments for LHC computing (namely the WLCG infrastructure) do not allow efficient parallel interactive processing of data. In order to allow physicists to interactively access subsets of data (e.g. for algorithm tuning and debugging before running over a full dataset), parallel Analysis Facilities based on PROOF have been deployed by the ALICE experiment at CERN and elsewhere. Whereas large Tier-1 centres may afford to build such facilities at the expense of their Grid farms, or exploit the large number of jobs finishing at any given time to quickly collect a number of nodes to temporarily allocate for interactive work, this is likely not to be true for smaller Tier-2 centres. Leveraging the virtualisation of highly performant multi-core machines, it is possible to build a fully virtual Analysis Facility on the same Worker Nodes that compose an existing LCG Grid Farm. Using the Xen paravirtualisation hypervisor, it is then possible to dynamically move resources from the batch instance to the interactive one when needed, minimizing latencies and wasted resources. We present the status of the prototype being developed, and some experience from the very first users.
        Speaker: Stefano Bagnasco (Istituto Nazionale di Fisica Nucleare (INFN))
        Poster
      • 353
        An Assessment of a Model for Error Processing in the CMS Data Acquisition System
        The CMS Data Acquisition System consists of O(1000) interdependent services. A monitoring system providing exception and application-specific data is essential for the operation of this cluster. Due to the number of involved services, the amount of monitoring data is higher than a human operator can handle efficiently. Thus moving the expert knowledge for error analysis from the operator to a dedicated system is a natural choice. This reduces the number of notifications to the operator for simpler visualization and provides meaningful error cause descriptions and suggestions for possible countermeasures. This paper discusses an architecture of a workflow-based hierarchical error analysis system based on Guardians for the CMS Data Acquisition System. Guardians provide a common interface for error analysis of a specific service or subsystem. To provide effective and complete error analysis, the requirements regarding information sources, monitoring and configuration are analyzed. Formats for common notification types are defined and a generic Guardian based on Event-Condition-Action rules is presented as a proof of concept (a schematic sketch of such rules follows this entry).
        Speaker: Mr Roland Moser (CERN and Technical University of Vienna)
        Poster
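        The Event-Condition-Action pattern mentioned above can be sketched generically. The rules, monitoring fields and actions below are invented for illustration and are not the CMS DAQ Guardian rules.

        # Sketch of a generic Event-Condition-Action (ECA) rule evaluation loop.
        # Rule contents, field names and actions are illustrative assumptions only.

        RULES = [
            {   # fires when a service reports too many exceptions in one interval
                "condition": lambda ev: ev.get("exceptions", 0) > 10,
                "action":    lambda ev: print("notify operator: noisy service", ev["service"]),
            },
            {   # fires when a service stops sending heartbeats
                "condition": lambda ev: ev.get("heartbeat_age_s", 0) > 60,
                "action":    lambda ev: print("suggest restart of", ev["service"]),
            },
        ]

        def process_event(event, rules=RULES):
            """Apply every rule whose condition matches the monitoring event."""
            for rule in rules:
                if rule["condition"](event):
                    rule["action"](event)

        if __name__ == "__main__":
            process_event({"service": "evb-ru-042", "exceptions": 37})
            process_event({"service": "evb-bu-007", "heartbeat_age_s": 120})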
      • 354
        Analysing ATLAS user behaviour for improved distributed mass-storage performance
        Unrestricted user behaviour is becoming one of the most critical properties in data intensive supercomputing. While policies can help to maintain a usable environment in clearly directed cases, it is important to know how users interact with the system so that it can be adapted dynamically, automatically and timely. We present a statistical and generative model that can replicate and simulate user behaviour in a large scale data intensive system. This model can help site administrators to anticipate future workload from users and therefore provide accurate local improvements to their storage systems. The theoretical foundation of the model is examined and validated against experimental results from the ATLAS distributed data management system DQ2.
        Speaker: Mr Mario Lassnig (CERN & University of Innsbruck)
        Poster
      • 355
        Analysis of job executions through the EGEE Grid
        The Grid as an environment for large-scale job execution is now moving beyond the prototyping phase to real deployments on national and international scales, providing real computational cycles to application scientists. As the Grid moves into production, characteristics about how users are exploiting the resources and how the resources are coping with production load are essential in understanding how the Grid is working and how it should be modified in order to better meet the needs of the growing user communities. Characteristics such as user submission patterns, average job execution times for the different communities (organized into Virtual Organizations, VOs), and the number of active members within a VO, along with how the Grid infrastructure is coping with this load – both at the resource and middleware level – are vital in order to judge which critical bottlenecks need to be overcome to ensure continued success. In order to better understand these characteristics a full analysis of the Grid is essential. Through related work with the EGEE Real Time Monitor (RTM) we have been able to collect trace logs for over 52 million job executions since September 2005. The RTM provides a near-real time graphical view of the status of the EGEE Grid, requiring privileged access to the databases within EGEE. By recording this information and post-processing it we are able to determine the life cycle of each job submitted through the EGEE. In this paper we analyze these trace logs to determine these characteristics.
        Speaker: Dr Andrew Stephen McGough (Imperial College London)
        Poster
      • 356
        Architecture of the CMS Data Acquisition System
        The CMS event builder assembles events accepted by the first level trigger and makes them available to the high-level trigger. The system needs to handle a maximum input rate of 100 kHz and an aggregated throughput of 100 GBytes/s originating from approximately 500 sources. This paper presents the chosen hardware and software architecture. The system consists of two stages: an initial pre-assembly and several independent Readout Builder (RU builder) slices. The RU builder is based on three separate services: the buffering of event fragments during the assembly, the event assembly, and the data flow manager. A further component is responsible for handling events accepted by the high-level trigger: the Storage Manager compresses and stores the events on disk at a peak rate of 2 GBytes/s and makes them available for offline processing. In addition, events and data-quality histograms are served to online monitoring clients. We discuss the operational experience from the first months of reading out the complete CMS detector.
        Speaker: Dr Vivian ODell (FNAL)
      • 357
        ATLAS DataFlow Infrastructure: recent results from ATLAS cosmic and first-beam data-taking
        The ATLAS DataFlow infrastructure is responsible for the collection and conveyance of event data from the detector front-end electronics to the mass storage. Several optimized and multi-threaded applications fulfill this purpose, operating over a multi-stage Gigabit Ethernet network which is the backbone of the ATLAS Trigger and Data Acquisition System. The system must be able to efficiently transport event data with high reliability, while providing aggregated bandwidths larger than 5 GByte/s and coping with many thousands of network connections. Moreover, routing and streaming capabilities, as well as monitoring and data accounting functionalities, are also fundamental requirements. During 2008, a few months of ATLAS cosmic data-taking and the first experience with the LHC beams provided an unprecedented testbed for the evaluation of the performance of the ATLAS DataFlow, in terms of functionality, robustness and stability. In addition, operating the system far from its design specifications helped exercise its flexibility and contributed to understanding its limitations. Moreover, the integration with the detector and the interfacing with the off-line data processing and management have been able to take advantage of this extended data-taking period as well. In this paper we report on the usage of the DataFlow infrastructure during the ATLAS data-taking. These results, backed up by complementary performance tests, validate the architecture of the ATLAS DataFlow and prove that the system is robust, flexible and scalable enough to cope with the final requirements of the ATLAS experiment.
        Speaker: Dr Wainer Vandelli (Conseil Europeen Recherche Nucl. (CERN))
        Poster
      • 358
        ATLAS Grid Information System
        The ATLAS Distributed Computing system provides a set of tools and libraries enabling data movement, processing and analysis on a grid environment. While reaching a state of maturity high enough for real data taking, it became clear that one component was missing: a service exposing consistent information on site topology, services and resources from all three distinct ATLAS grids (EGEE, OSG, NDGF). In this paper we describe the ATLAS Grid Information System (AGIS), its architecture, implementation choices, security requirements and the static and semi-static data it exposes. We discuss the different information collectors, in many cases specific to a single grid flavour, and the multiple options for remotely accessing the service, along with the available libraries and output formats. Performance results show that the final component is able to serve more than the expected load coming from services and end users.
        Speaker: Raquel Pezoa Rivera (Univ. Tecnica Federico Santa Maria (UTFSM))
        Paper
      • 359
        ATLAS High Level Calorimeter Trigger Software Performance for Cosmic Ray Events
        The ATLAS detector is undergoing an intense commissioning effort with cosmic rays, preparing for the first LHC collisions next spring. Combined runs with all of the ATLAS subsystems are being taken in order to evaluate the detector performance. This is also a unique opportunity for the trigger system to be studied with different detector operation modes, such as different event rates and detector configurations. The ATLAS trigger starts with a hardware based system which tries to identify detector regions where interesting physics objects may be found (e.g. large energy depositions in the calorimeter system). An accepted event will be further processed by more complex algorithms at the second level, where detailed features are extracted (full detector granularity data for small portions of the detector is available). Events accepted at this level will be reprocessed at the so-called event filter level. Full detector data at full granularity is available for offline-like processing with complete calibration to achieve the final decision. This year we could extend the experience by including more algorithms in the second-level and event-filter calorimeter trigger. Clustering algorithms for electrons, photons, taus, jets and missing transverse energy are being commissioned during this combined run period. We report the latest results for such algorithms. Issues such as the identification of hot calorimeter regions, the processing time of the algorithms and data access (especially at the second level) are being evaluated. Intense usage of online and quasi-online (during offline reconstruction of runs) monitoring helps to trace and fix problems.
        Speaker: Denis Oliveira Damazio (Brookhaven National Laboratory)
        Poster
      • 360
        ATLAS Tile Calorimeter Data Preparation for LHC first beam data taking and commissioning data
        TileCal is the barrel hadronic calorimeter of the ATLAS experiment, presently in an advanced state of commissioning with cosmic and single beam data at the LHC accelerator. The complexity of the experiment, the number of electronics channels and the high rate of acquired events require a systematic strategy for preparing the system for data taking. This is done through a precise calibration of the detector, prompt updates of the database reconstruction constants, validation of the data processing and assessment of the system data quality, both with calibration signals and with data obtained with cosmic muons and the first LHC beam. This review will present the developed strategies and tools to calibrate the detector and to monitor the variations of the extracted calibration constants as a function of time; the present plan and future upgrades to deploy and update the detector constants used in reconstruction; the techniques employed to validate the reconstruction software; the set of tools of the present TileCal data quality system and its integration in ATLAS online and offline frameworks.
        Speaker: Dr Luca Fiorini (IFAE Barcelona)
        Poster
      • 361
        Automated agents for management and control of the ALICE Computing Grid
        A complex software environment such as the ALICE Computing Grid infrastructure requires permanent control and management for the large set of services involved. Automating control procedures reduces the human interaction with the various components of the system and yields better availability of the overall system. In this paper we will present how we used the MonALISA framework to gather, store and display the relevant metrics in the entire system from central and remote site services. We will also show the automatic local and global procedures that are triggered by the monitored values. Decision-taking agents are used to restart remote services, alert the operators in case of problems that cannot be automatically solved, submit production jobs, replicate and analyze raw data, resource load-balance and other control mechanisms that optimize the overall work flow and simplify day-to-day operations. Synthetic graphical views for all operational parameters, correlations, state of services and applications as well as the full history of all monitoring metrics are available for the entire system that now encompasses 80 sites all over the world, more than 10000 CPUs and 10PB of storage.
        Speaker: Mr Costin Grigoras (CERN)
        Poster
      • 362
        BESIII Data Acquisition System
        BEPCII is designed with a peak luminosity of 10^33 cm^-2 s^-1. After the Level 1 trigger, the event rate is estimated to be around 4000 Hz at the J/ψ peak. A pipelined front-end electronics system has been designed and developed, and the BESIII DAQ system has been built to satisfy the requirements of event readout and processing at such a high event rate. The BESIII DAQ system consists of about 100 high-performance computers. The system can be divided into two subsystems: approximately 40 readout VME crates as the front end and 42 IBM eServer HS20 blade servers as the online computer farm. The communications between the two subsystems and among computer nodes are realized by high-speed optical links and high-speed Ethernet switches. The BESIII data acquisition system is designed with a capacity of 80 MB/s for reading out data from VME crates. The online computer farm must have a throughput of 50 MB/s, and the capability of writing data to tape must exceed 40 MB/s after the event filtering. Multi-level buffering, parallel processing, high-speed VME readout, high-performance computing and network transmission techniques are introduced in the BESIII DAQ design. The goal of the DAQ design is to achieve high reliability, maintainability, stability, scalability and portability. The system is highly scalable and can be expanded easily when the need arises. The DAQ software used in BESIII was developed based on the framework of the ATLAS TDAQ software. It accomplishes the data collection, assembling, filtering and recording of event data. It also provides additional control and test functions.
        Speaker: Hongyu ZHANG (Experimental Physics Center, Experimental Physics Center, Chinese Academy of Sciences, Beijing, China)
      • 363
        Cloud Storage as a new Storage Class: QoS Characterization and Cost Analysis
        In the storage model adopted by WLCG, the quality of service for a storage capacity provided by an SRM-based service is described by the concept of Storage Class. In this context, two parameters are relevant: the Retention Policy and the Access Latency. With the advent of cloud-based resources, virtualized storage capabilities such as the Amazon Simple Storage Service (Amazon S3) are available. Amazon S3 is a simple Web services interface offering access to the same highly scalable, reliable, fast, inexpensive data storage infrastructure that Amazon uses to run its own global network of Web sites. In our previous work, we have described how Amazon S3 can be exposed via an SRM-based interface. In this paper, we discuss an extension of the definition of Storage Class to include billing aspects. The impact on the GLUE 2.0 information model is also investigated. Finally, we present a storage utilization model enabling evaluation of the potential costs of an AWS S3-based storage system in a Grid, considering different integration strategies (a toy cost-model sketch follows this entry).
        Speaker: Riccardo Zappi (INFN-CNAF)
        Poster
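        A very small cost-model sketch in the spirit of the utilization model described above. All per-GB prices and usage figures are placeholder assumptions, not Amazon's actual tariffs and not the authors' model.

        # Sketch: estimate the monthly cost of keeping a dataset on a cloud storage
        # service. All prices and usage numbers below are placeholder assumptions.

        def monthly_cost(stored_tb, transferred_out_tb, requests_millions,
                         price_per_gb_month=0.15, price_per_gb_out=0.10,
                         price_per_million_requests=1.0):
            """Storage + outbound transfer + request charges for one month."""
            stored_gb = stored_tb * 1024
            out_gb = transferred_out_tb * 1024
            return (stored_gb * price_per_gb_month
                    + out_gb * price_per_gb_out
                    + requests_millions * price_per_million_requests)

        if __name__ == "__main__":
            # e.g. 50 TB resident, 10 TB exported and 2 million requests in a month
            print("estimated monthly cost: $%.2f" % monthly_cost(50, 10, 2))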
      • 364
        CMS results from Computing Challenges and Commissioning of the computing infrastructure
        During February and May 2008, CMS participated in the Combined Computing Readiness Challenge (CCRC'08) together with all other LHC experiments. The purpose of this world-wide exercise was to check the readiness of the computing infrastructure for LHC data taking. Another set of major CMS tests, called the Computing, Software and Analysis challenge (CSA'08) - as well as CMS cosmic runs - was also running at the same time: CCRC augmented the load on computing with additional tests to validate and stress-test all CMS computing workflows at full data taking scale, also trying to extend this to the global WLCG community. CMS exercised most aspects of the CMS computing model, with very comprehensive tests. During May, we moved more than 3.6 Petabytes among more than 300 links in the complex Grid topology. We demonstrated that we can safely move data out of CERN to the Tier-1s, sustaining the required rate of more than 600 MB/s as a daily average for more than seven days in a row, with enough headroom and with hourly peaks of up to 1.7 GB/s. We ran hundreds of simultaneous jobs at each Tier-1 site re-reconstructing and skimming hundreds of millions of events. After re-reconstruction the fresh AOD (Analysis Object Data) has to be synchronized between Tier-1 centers: we demonstrated that the required inter-Tier-1 transfers are achievable within a few days. We also showed that skimmed analysis data sets can be transferred to Tier-2s for analysis with sufficient rate, regionally as well as inter-regionally, achieving all our goals in over 90% of the ~200 links. Simultaneously we also ran a large Tier-2 analysis exercise, where realistic analysis jobs were submitted to a large set of Tier-2 sites by a large number of people to produce a “chaotic” workload across the systems, with more than 400 analysis users in May. Taken together, CMS routinely achieved submission of ~100k jobs/day, with peaks up to 200k jobs/day. All the achieved results are presented and discussed in the paper.
        Speaker: Dr Daniele Bonacorsi (CMS experiment / INFN-CNAF, Bologna, Italy)
      • 365
        Commissioning and first experiences of the ALICE High Level Trigger
        For the ALICE heavy-ion experiment a large cluster will be used to perform the last triggering stages in the High Level Trigger. For the first year of operation the cluster consists of about 100 SMP nodes with 4 or 8 CPU cores each, to be increased to more than 1000 nodes for the later years of operation. During the commissioning phases of the detector and the preparations for first LHC beam, as well as during the periods of first LHC beam, the HLT has already been used extensively to reconstruct, compress, and display data from the different detectors. For example, the HLT has been used to compress SDD data by a factor of 15, losslessly, on the fly at a rate of more than 800 Hz. For the TPC the HLT has been used to reconstruct tracks online and show the reconstructed tracks in an online event display. The event display can also display online reconstructed data from the Dimuon and PHOS detectors. For the latter detector, a first selection mechanism has also been put into place so that only events in which data has passed through the PHOS detector are forwarded to the event display. In the talk we will present experiences and results from these commissioning phases.
        Speaker: Dr Timm Steinbeck (Institute of Physics)
        Poster
      • 366
        Commissioning of a StoRM based Data Management system for ATLAS at INFN sites
        In the framework of WLCG, Tier1s need to manage large volumes of data at the PB scale. Moreover, they need to be able to transfer data from CERN and exchange data with the other centres (both Tier1s and Tier2s) with a sustained throughput of the order of hundreds of MB/s over the WAN, while at the same time offering fast and reliable access to the computing farm. In order to cope with these challenging requirements, at the INFN Tier1 we have adopted a storage model based on StoRM/GPFS/TSM for the D1T0 and D1T1 Storage Classes and on CASTOR for the D0T1. In this paper we present the results of the commissioning tests of this system for the ATLAS experiment, reproducing the real production case with a full matrix transfer from the Tier0 and with all the other involved centers. Notably, the new approach of direct file access from the farm to the data is also covered, showing positive results. GPFS/StoRM has also been successfully deployed, configured and commissioned as the storage solution for an ATLAS INFN Tier2, specifically the one in Milan; the results are shown and discussed in this paper together with those obtained for the Tier1.
        Speaker: Claudia Ciocca (INFN)
      • 367
        Computing for CBM: Requirements and Challenges
        The Compressed Baryonic Matter experiment (CBM) is one of the core experiments to be operated at the future FAIR accelerator complex in Darmstadt, Germany, from 2014 on. It will investigate heavy-ion collisions at moderate beam energies but extreme interaction rates, which give access to extremely rare probes such as open charm or charmonium decays near the production threshold. The high event rates combined with the high track multiplicities result in extreme data rates to be processed in an online first level event selection (FLES) in order to arrive at a reasonable archival rate. Novel concepts for online data processing are necessary to cope with this environment. Even after efficient online event selection, a large data volume (several PB per runtime) will have to be made available for the offline analysis. Event reconstruction algorithms must be fast and precise in order to give access to the rare physics signals. In this presentation, we will discuss the requirements for CBM computing, both online and offline, and present the current status of the preparations, including the simulation and reconstruction framework and algorithms developed so far.
        Speaker: Dr Volker Friese (GSI Darmstadt)
        Poster
      • 368
        Condor Enhancements for a Rapid-response Adaptive Computing Environment for LHC
        A number of recent enhancements to the Condor batch system have been stimulated by the challenges of LHC computing. The result is a more robust, scalable, and flexible computing platform. One product of this effort is the Condor JobRouter, which serves as a high-throughput scheduler for feeding multiple (e.g. grid) queues from a single input job queue. We describe its principles and how it has been used at large scale in CMS production on the Open Science Grid. Improved scalability of Condor is another welcome advance. We describe the scaling characteristics of the Condor batch system under large workloads and when integrating large pools of resources; we then detail how LHC physicists have directly profited under the expanded scaling regime. Finally, we present some practical configurations that we have used to take advantage of Condor's adaptability: many flavors of prioritization, policies for sharing resources in a campus grid, and a good start on supporting a mix of single-core and multi-core jobs.
        Speaker: Daniel Charles Bradley (High Energy Physics)
        Poster
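        To make the routing idea described above concrete, here is a deliberately simplified, schematic sketch of feeding several target queues from a single input queue. It is not Condor JobRouter code or configuration; queue names and the idle-job threshold are invented.

        # Schematic sketch of routing jobs from one input queue to whichever target
        # (e.g. grid) queue currently holds the fewest jobs. Illustration of the
        # concept only, not Condor's JobRouter.
        from collections import deque

        def route(input_queue, target_queues, max_idle_per_target=100):
            """Move jobs to the least-loaded target queue while any target has room."""
            routed = 0
            while input_queue:
                target = min(target_queues, key=lambda name: len(target_queues[name]))
                if len(target_queues[target]) >= max_idle_per_target:
                    break  # all targets are full; leave remaining jobs in the input queue
                target_queues[target].append(input_queue.popleft())
                routed += 1
            return routed

        if __name__ == "__main__":
            jobs = deque("job%03d" % i for i in range(10))
            targets = {"site_A": [], "site_B": [], "site_C": ["old_job"]}
            print(route(jobs, targets), "jobs routed")
            print({name: len(q) for name, q in targets.items()})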
      • 369
        Control Oriented Ontology Language
        The ever-growing heterogeneity of physics experiment control systems presents a real challenge to uniformly describe control system components and their operational details. Control Oriented Ontology Language (COOL) is an experiment control meta-data modeling language that provides a generic means for concise and uniform representation of physics experiment control processes and components, their relationships, rules and axioms. It provides a semantic reference frame that is useful for automating the communication of information for control process configuration, deployment and operation. Additionally, COOL provides precise specification of experiment control system software and hardware components. This paper discusses a control-domain-specific ontology that is built on top of the domain-neutral Resource Description Framework (RDF) model. Specifically, we will discuss the relevant set of ontology concepts, along with the relationships among them, to describe experiment control components and general purpose event-based state machines. COOL has been successfully used to develop a complete and dynamic knowledge base for AFECS experiment control systems.
        Speaker: Vardan Gyurjyan (JEFFERSON LAB)
        Poster
      • 370
        D-Grid Integration reference installation
        D-Grid is the German initiative for building a national computing grid. When its customers want to work within the German grid, they need dedicated software, called ‘middleware’. As D-Grid site administrators are free to choose their middleware according to the needs of their users, the project ‘DGI (D-Grid Integration) reference installation’ was launched. Its purpose is to assist the site administrators in setting up their software solutions. The reference installation follows a strict workflow which begins with installing and configuring several middlewares, continues with verifying their functionality and – if the setup passes these phases – is completed with extensive documentation that will be supplied online to all users. This workflow is repeated every six months, with publication of alpha, beta and final revisions during its progress. The first production release of the reference installation is scheduled for January 15, 2009. It will include manuals for the installation of three job submission frontends: gLite, the Globus Toolkit and UNICORE. Furthermore, two data management frontends, OGSA-DAI and dCache, will also be part of this release.
        Speaker: Xavier Mol (Forschungszentrum Karlsruhe)
        Poster
      • 371
        Data Location-Aware Job Scheduling in the Grid. Application to the GridWay Metascheduler.
        Grid infrastructures nowadays constitute the core of the computing facilities of the biggest LHC experiments. These experiments produce and manage petabytes of data per year and run thousands of computing jobs every day to process that data. It is the duty of metaschedulers to allocate the tasks to the most appropriate resources at the proper time. Our work reviews the policies that have been proposed for the scheduling of grid jobs in the context of very data-intensive applications. We indicate some of the practical problems that such models will face and describe what we consider essential characteristics of an optimum scheduling system: the aim to minimize not only job turnaround time but also data replication, the flexibility to support different virtual organization requirements, and the capability to coordinate the tasks of data placement and job allocation while keeping their execution decoupled. These ideas have guided the development of an enhanced prototype for GridWay, a general-purpose metascheduler that is part of the Globus Toolkit and a member of EGEE's RESPECT program. GridWay's current scheduling algorithm is unaware of data location. Our prototype makes it possible for job requests to express data needs not only as absolute requirements but also as functions for resource ranking (a toy ranking sketch follows this entry). As our tests show, this makes it more flexible than currently used resource brokers for implementing different data-aware scheduling algorithms.
        Speaker: Mr Antonio Delgado Peris (CIEMAT)
        Poster
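        The data-location-aware ranking idea can be illustrated with a toy function that scores candidate sites by the fraction of a job's input files they already hold. The scoring formula, the catalogue layout and all names are assumptions for illustration, not GridWay's actual ranking expression.

        # Toy illustration of data-location-aware resource ranking: prefer sites that
        # already host most of the job's input data, then break ties on free CPU slots.
        # Weighting and catalogue structure are assumptions, not GridWay internals.

        def rank_sites(input_files, site_replicas, site_free_slots):
            """Return candidate sites sorted from best to worst for this job."""
            def score(site):
                held = len(set(input_files) & site_replicas.get(site, set()))
                data_fraction = held / float(len(input_files)) if input_files else 0.0
                # Data locality dominates; free slots only break ties (assumed weighting).
                return (data_fraction, site_free_slots.get(site, 0))
            return sorted(site_replicas, key=score, reverse=True)

        if __name__ == "__main__":
            files = ["f1", "f2", "f3", "f4"]
            replicas = {"siteA": {"f1", "f2", "f3"}, "siteB": {"f1"}, "siteC": set()}
            slots = {"siteA": 5, "siteB": 200, "siteC": 50}
            print(rank_sites(files, replicas, slots))  # siteA first despite fewer free slots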
      • 372
        Data Quality from the Detector Control System at the ATLAS Experiment.
        At the ATLAS experiment, the Detector Control System (DCS) is used to oversee detector conditions and supervise the running of equipment. It is essential that information from the DCS about the status of individual sub-detectors be extracted and taken into account when determining the quality of data taken and its suitability for different analyses. DCS information is written online to the ATLAS conditions database (COOL) and then summarised to provide a status flag for each sub-detector and displayed on the web. We discuss how this DCS information should be used, the technicalities of making this summary, and experience in running the tool during the period of cosmic ray data taking in 2008.
        Speaker: Peter Onyisi (University of Chicago)
        Poster
      • 373
        Data Quality Monitoring Display for the ATLAS Experiment at the LHC
        The start of collisions at the LHC brings with it much excitement and many unknowns. It’s essential at this point in the experiment to be prepared with user-friendly tools to quickly and efficiently determine the quality of the data. Easy visualization of data for the shift crew and experts is one of the key factors in the data quality assessment process. The Data Quality Monitoring Display (DQMD) is a visualization tool for the automatic data quality assessment of the ATLAS experiment. It is the interface through which the shift crew and experts can validate the quality of the data being recorded or processed, be warned of problems related to data quality, and identify the origin of such problems. This tool allows great flexibility for visualization of results from automatic histogram checking through custom algorithms, the configuration used to run the algorithms, and histograms used for the check, with an overlay of reference histograms when applicable. The display also supports visualization of the results in graphical form ie hardware view of the detector to easily detect faulty channels or modules. It provides the shift crew with a checklist before the final assessment of the data is saved to the database, a list of experts to contact in case of problems, and actions to perform in case of failure. This paper describes the design and implementation of the DQMD and discusses experience from its usage and performance during ATLAS commissioning with cosmic ray and single beam data.
        Speaker: Mr Yuriy Ilchenko (SMU)
        Poster
      • 374
        Data transfer over the wide area network with a large round trip time
        A Tier-2 regional center is running at the University of Tokyo in Japan. This center receives a large amount of data of the ATLAS experiment from the Tier-1 center in France. Although the link between the two centers has 10 Gbps bandwidth, it is not a dedicated link but is shared with other traffic, and the round trip time is 280 ms. It is not easy to exploit the available bandwidth on such a link, a so-called long fat network (the bandwidth-delay product arithmetic involved is sketched after this entry). We performed data transfer tests using GridFTP with various combinations of parameters, such as the number of parallel streams and the TCP window size. In addition, we have gained experience with the actual data transfer in our production system, where the Disk Pool Manager (DPM) is used as the Storage Element and the data transfer is controlled by the File Transfer Service (FTS). We report results of the tests and of the daily activity, and discuss the improvement of the data transfer throughput.
        Speaker: Dr Hiroyuki Matsunaga (ICEPP, University of Tokyo)
        Poster
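        The key quantity for such a long fat network is the bandwidth-delay product: with 10 Gbps and a 280 ms round-trip time, roughly 350 MB must be in flight to keep the link full, supplied either by a very large TCP window or by many parallel streams. The sketch below only restates that arithmetic; the 8 MiB per-stream window is an assumed configuration, not a value taken from the contribution.

        # Bandwidth-delay product arithmetic for a long fat network.
        # The link numbers match the abstract above (10 Gbps, 280 ms RTT);
        # the per-stream TCP window size is an assumption for illustration.

        def bandwidth_delay_product_bytes(bandwidth_bps, rtt_seconds):
            """Bytes that must be in flight to keep the link full."""
            return bandwidth_bps * rtt_seconds / 8.0

        if __name__ == "__main__":
            bdp = bandwidth_delay_product_bytes(10e9, 0.280)
            print("BDP: %.0f MB" % (bdp / 1e6))                  # ~350 MB
            window_per_stream = 8 * 1024 * 1024                  # assumed 8 MiB TCP window
            streams = bdp / window_per_stream
            print("parallel streams needed with 8 MiB windows: %.0f" % streams)  # ~42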
      • 375
        Database usage for the CMS ECAL Laser Monitoring System.
        The CMS detector at the LHC is equipped with a high precision electromagnetic crystal calorimeter (ECAL). The crystals experience a transparency change when exposed to radiation during LHC operation, which recovers in the absence of irradiation on a time scale of hours. This change of the crystal response is monitored with a laser system which performs a transparency measurement of each crystal of the ECAL within twenty minutes. The monitoring data is analyzed on a PC farm attached to the central data acquisition system of CMS. After analyzing the raw data, a reduced data set is stored in the Online Master Data Base (OMDS), which is connected to the online computing infrastructure of CMS. The data stored in OMDS, representing the largest data set stored in OMDS for ECAL, contains all necessary information to perform detailed crystal response monitoring as well as an analysis of the dynamics of the transparency change. For the CMS physics event data reconstruction, only a reduced set of information from the transparency measurement is required. This data is stored in the offline Reconstruction Conditions database (ORCOF). To transfer the data from OMDS to ORCOF, the reduced data is transferred to the Off-line Reconstruction Conditions DB On-line subset (ORCON) in a procedure known as Online to Offline transfer, which includes various checks for data consistency. In this talk we describe the laser monitoring workflow and the specifics of the database usage for the ECAL laser monitoring system. The strategies implemented to optimize the data transfer and to perform quality checks are also presented.
        Speaker: Mr Vladlen Timciuc (California Institute of Technology)
        Poster
      • 376
        dCache administration at the GridKa Tier-1-center, ready for data taking
        The dCache installation at GridKa, the German Tier-1 center, is ready for LHC data taking. After years of tuning and dry runs, several software and operational bottlenecks have been identified. This contribution describes several procedures to improve stability and reliability of the Tier-1 storage setup. These range from redundant hardware and disaster planning over fine grained monitoring and automatic fault recovery to 24/7 on-call maintenance; therefore GridKa is expected to meet the required MOU targets with a minimum of administrator control. Prior to updates a mirror setup is used to test and become familiar with new releases. The mirror setup is also used to replay scenarios for which problems have occurred. The role of the mirror system and its use is explained and evaluated. Error reports and trouble tickets are handled in an escalation procedure which involves operators, grid administrators and dCache experts. The workflow for solving tickets and fixing problems is described in detail. Also, we present an analysis and categorization of trouble tickets handled during the last two years that served to improve stability and service of the data management systems.
        Speaker: Dr Silke Halstenberg (Karlsruhe Institute of Technology)
        Poster
      • 377
        dCache NFSv41 - GRID aware industry standard distributed storage
        Starting spring 2009, all WLCG data management services have to be ready and prepared to move terabytes of data from CERN to the Tier 1 centers world wide, and from the Tier 1s to their corresponding Tier 2s. Reliable file transfer services, like FTS, on top of the SRM v2.2 protocol are playing a major role in this game. Nevertheless, moving large chunks of data is only part of the challenge. As soon as the LHC experiments go online, thousands of physicists across the world will start data analysis, to provide first results as soon as possible. At that point in time, local file access becomes crucial. Currently, a large number of local file access protocols is supported by various Storage Systems – dcap, gsidcap, rfio-dpm, rfio-castor, http and xrootd. A standard protocol, usable by any unmodified application assuming POSIX data access, is highly desirable. The NFSv4.1 protocol, defined by the IETF and implemented by various Operating System and Storage Box vendors, e.g. EMC, IBM, Linux, NetApp, Panasas and SUN, provides all necessary functionality: security mechanism negotiation (GSS-API, GSI, X509, UNIX), data access protocol negotiation (NFSv4 mandatory), a clear distinction between metadata (namespace) and data access, support of multiple dataservers, ACLs, client and server crash recovery and much more. The client modules are being developed for the AIX, Linux, and Solaris kernels. NFSv4.1 is an open-standard, industry-backed protocol which easily integrates into the dCache architecture. Together with the new namespace provider, Chimera, dCache provides a native NFSv4.1 implementation. At the most recent NFS “Bakeathon” at Sun Microsystems in September 2008, dCache proved to be compatible with all existing clients. At the time of this presentation, the standard NFSv4.1 client will be part of the Linux kernel distribution. Starting with dCache release 1.9.3, scheduled for January 2009, NFSv4.1 support will be included.
        Speaker: Mr Tigran Mkrtchyan Mkrtchyan (Deutsches Elektronen-Synchrotron DESY)
      • 378
        Dealing with orphans: catalogue synchronisation with SynCat
        In the gLite grid model a site will typically have a Storage Element (SE) that has no direct mechanism for updating any central or experiment-specific catalogues. This loose coupling was a deliberate decision that simplifies SE design; however, a consequence of this is that the catalogues may provide an incorrect view of what is stored on a SE. In this paper, we present work to allow catalogue re-synchronisation. This shows how catalogues can be more certain that the files are stored on the SE and how more transitory metadata may be propagated with low latency; for example, whether a file stored on tape is currently available on disk.
        Speaker: Dr Paul Millar (DESY)
        Poster
      • 379
        Debugging Data Transfers in CMS
        The CMS experiment at CERN is preparing for LHC data taking through several computing preparation activities. In early 2007 a traffic load generator infrastructure for distributed data transfer tests was designed and deployed to equip the WLCG Tiers which support the CMS Virtual Organization with a means for debugging, load-testing and commissioning data transfer routes among CMS Computing Centres. The LoadTest is based upon PhEDEx as a reliable, scalable data set replication system. The Debugging Data Transfers (DDT) Task Force was created to coordinate the debugging of the data transfer links. The task force aimed to commission the most crucial transfer routes among CMS tiers by designing and enforcing a clear procedure to debug problematic links. This procedure moves a link from a debugging phase, in a separate and independent environment, to the production environment once a set of agreed conditions is achieved for that link. The goal was to deliver working transfer routes to Data Operations one by one. The preparation, activities and experience of the DDT Task Force within the CMS experiment are discussed. Common technical problems and challenges encountered during the lifetime of the task force in debugging data transfer links in CMS are explained and summarized.
        Speaker: Dr James Letts (Department of Physics-Univ. of California at San Diego (UCSD))
        Poster
      • 380
        Design of Gluon: an Atom-oriented approach for publishing GLUE 2.0 information
        The GLUE 2.0 specification is an upcoming OGF specification for standard-based Grid resource characterization to support functionalities such as discovery, selection and monitoring. An XML Schema realization of GLUE 2.0 is available; nevertheless, Grids still lack a standard information service interface. Therefore, there is no uniformly agreed solution for exposing resource descriptions. On the other hand, the Atom Syndication Format (ASF) and the Atom Publishing Protocol (AtomPub) are Web standards which enable the publishing and editing of Web resources using RESTful HTTP and XML. These standards have been successfully adopted to provide access to and manipulation of a large variety of information on the Web. For instance, the Google GDATA API, which is based on AtomPub, offers access to most of the Google services. In this paper, we propose to leverage these standards in order to represent GLUE 2.0 information using ASF and to publish it via AtomPub. This provides a uniform mechanism that could be adopted by all Grid services to expose GLUE-based information in a common manner. In this study, we also consider extensibility aspects to support the inclusion of extra information not captured by the GLUE specification.
        Speaker: Dr Sergio Andreozzi (INFN-CNAF)
        Poster
      • 381
        Development and deployment of an integrated T2-oriented monitoring infrastructure
        With the start of the LHC, high-energy physics researchers will begin massive usage of the LHC Tier2s. It is essential to supply physics user groups with a simple and intuitive “user-level” summary of their associated T2 services’ status, showing for example available, busy and unavailable resources. At the same time, site administrators need “technical level” monitoring, namely a view of parameters and details about services and statistics, also with event notification, in order to guarantee full service availability and reliability. Classic cluster monitoring tools only partially cover the T2 needs. The development of supplementary tools (basically a bunch of Bash cronjobs) to control our farm has led, day by day, to an out-and-out new monitoring infrastructure (Mon2), providing both technical views (RAID systems, network, temperature, operating system parameters, etc.) and user-level views (central availability tests, job monitoring…). A central server collects information sent by hosts, publishes it on the Web and via RSS feed and sends configured alarms via e-mail and SMS; interesting parameters can be stored locally into a flat-file transactional SQL database engine to generate plots and help troubleshooting and forensics. Security and easy management are achieved by using public key authentication for data exchange among hosts, using pure HTML on the Web interface, and using no DB servers. The exclusive use of Perl and Bash scripting ensures that site administrators can customize sensors, accounting and e-mail notifications for their own site.
        Speaker: Dr Vincenzo Spinoso (INFN, Bari)
        Poster
      • 382
        EDGeS: The art of bridging EGEE to BOINC and XtremWeb
        Desktop grids, such as XtremWeb and BOINC, and service grids, such as EGEE, are two different approaches for science communities to gather computing power from a large number of computing resources. Nevertheless, little work has been done to combine these two Grid technologies in order to establish a seamless and vast grid resource pool. In this paper we present the EGEE service grid, the BOINC and XtremWeb desktop grids. Then, we present the EDGeS solution to bridge the EGEE service grid with the BOINC and XtremWeb desktop grids.
        Speaker: Gabriel Caillat (LAL, Univ. Paris Sud, IN2P3/CNRS)
        Poster
      • 383
        Efficient Multi-site data movement using Constraint programing for data hungry science
        For the past decade, HENP experiments have been heading towards a distributed computing model in an effort to concurrently process tasks over enormous data sets that have been increasing in size as a function of time. In order to optimize all available (geographically spread) resources and minimize the processing time, it is also necessary to address the question of efficient data transfer and placement. A key question is whether the time penalty for moving the data to the computational resources is worth the presumed gain. Moving towards truly distributed task scheduling, we present a technique based on a Constraint Programming (CP) approach. The CP technique schedules data transfers from multiple resources, considering all available paths of diverse characteristics (capacity, sharing and storage), with minimum user waiting time as the objective (a toy version of this planning problem is sketched after this entry). We introduce a model for planning data transfers to a single destination (data transfer) as well as its extension for an optimal data set spreading strategy (data placement). Several enhancements to the solver of the CP model will be shown, leading to faster schedule computation through symmetry breaking, branch cutting, well-studied principles from the job-shop scheduling field and several heuristics. Finally, we will present the design and implementation of a corner-stone application aimed at moving datasets according to the schedule. Results will include a comparison of the performance and trade-offs between the CP techniques and a peer-to-peer model in a simulation framework, as well as a real-case scenario taken from practical usage of the CP scheduler.
        Speaker: Mr Michal ZEROLA (Nuclear Physics Inst., Academy of Sciences)
        Paper
        Poster
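        To give a feel for the optimization problem (this is not the authors' CP model or solver), the toy sketch below assigns a handful of file transfers to candidate network paths by exhaustively searching for the assignment with the smallest overall completion time. Link capacities, file sizes and path names are invented.

        # Toy version of the multi-path transfer planning problem: assign each file to
        # one of the available paths so that the overall completion time (makespan) is
        # minimal. Brute force over all assignments; the real work uses Constraint
        # Programming and far richer network models. All numbers are invented.
        from itertools import product

        def best_plan(file_sizes_gb, path_rates_gbps):
            """Exhaustively search file-to-path assignments, returning (makespan_s, plan)."""
            paths = list(path_rates_gbps)
            best = None
            for assignment in product(paths, repeat=len(file_sizes_gb)):
                load = {p: 0.0 for p in paths}                  # GB routed over each path
                for size, path in zip(file_sizes_gb, assignment):
                    load[path] += size
                makespan = max(load[p] * 8 / path_rates_gbps[p] for p in paths)  # seconds
                if best is None or makespan < best[0]:
                    best = (makespan, assignment)
            return best

        if __name__ == "__main__":
            files = [120, 80, 60, 40]                  # GB to move
            rates = {"path_A": 10.0, "path_B": 2.5}    # Gbps per path
            makespan, plan = best_plan(files, rates)
            print("plan:", list(zip(files, plan)), "finishes in %.0f s" % makespan)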
      • 384
        Experience Commissioning the ATLAS Distributed Data Management system on top of the WLCG Service
        The ATLAS Experiment at CERN developed an automated system for the distribution of simulated and detector data. This system, which partially consists of various ATLAS-specific services, strongly relies on the WLCG service infrastructure at the level of middleware components, service deployment and operations. Because of the complexity of the system and its highly distributed nature, a dedicated effort was put in place to deliver a reliable service for ATLAS data distribution, offering the necessary performance and high availability and accommodating the main use cases. This contribution will describe the various challenges and activities carried out in 2008 for the commissioning of the system, together with the experience distributing simulated data and detector data. The main commissioning activity was concentrated in two Combined Computing Resource Challenges, in February and May 2008, where it was demonstrated that the WLCG service and the ATLAS system could sustain the peak load of data transfer according to the computing model, for several days in a row, concurrently with other LHC experiment activities. This dedicated effort led to consequent improvements of ATLAS and WLCG services and to daily operation activities throughout the last year. The system has been delivering many hundreds of terabytes of simulated data to WLCG tiers and, since the summer of 2008, more than two petabytes of cosmic and beam data.
        Speaker: Dr Simone Campana (CERN/IT/GS)
        Poster
      • 385
        Experiment-specific monitoring systems, their role in the operation of the WLCG Grid infrastructure. A high-level view of the experiment computing activities on the Grid
        One of the most important conclusions from the analysis of the CCRC08 results and from operational experience after CCRC08 is that the experiment-specific monitoring systems of the LHC experiments are the main sources of monitoring information. They are widely used by people taking computing shifts and are often the first to detect problems of various kinds. Though these systems provide a rather complete and reliable picture of the computing activities of the LHC experiments on the Grid, the diversity of these tools creates certain problems. It is very difficult for non-expert users to find the required information, and it is not always possible to correlate information coming from multiple sources. Currently there is no high-level view of the computing activities of all LHC experiments. The talk will cover current development aimed at providing such a view using information from the experiment-specific monitoring systems.
        Speaker: Julia Andreeva (CERN)
      • 386
        FermiGrid Site AuthoriZation Service
        Fermilab supports a scientific program that includes experiments and scientists located across the globe. To better serve this community, Fermilab has placed its production computing resources in a Campus Grid infrastructure called 'FermiGrid'. The architecture of FermiGrid facilitates seamless interoperation of the multiple heterogeneous Fermilab resources with the resources of other regional, national and international Grids. To assure that only authorized Virtual Organizations and individuals use the Fermilab resources, Fermilab has developed the Site AuthoriZation (SAZ) service, which offers configurable, centralized, site-wide, policy-based Grid access control. We will report on the development, deployment and operational experience of the Site AuthoriZation service within the Fermilab Campus Grid.
        Speaker: Dr Chadwick Keith (Fermilab)
        Poster
      • 387
        Generic monitoring solution for LHC site commissioning activity and LHC computing shifts
        The LHC experiments are going to start collecting data in the spring of 2009. The number of people and centers involved in these experiments sets a new record in the physics community. For instance, in CMS there are more than 3600 physicists and more than 60 centers distributed all over the world. Managing such a large number of distributed sites and services is not a trivial task. Moreover, the definition of proper behavior for a site strongly depends on the software model of each experiment. To make the situation even more difficult, the status of the sites changes dynamically. To be able to cope with such a large-scale heterogeneous infrastructure, it is necessary to have monitoring tools providing a complete and reliable view of the overall performance and status of the sites. The LHC experiments need to follow their computing activities at the sites and want to make sure that the sites provide the required level of reliability and performance. The Site Status Board application has been developed in the Dashboard framework in order to monitor the status of the sites from the perspective of the LHC experiments or any other Virtual Organization. The definition of the status is based on metrics defined by the Virtual Organization. Moreover, the Site Status Board keeps track of how the different metrics have evolved over time. The Site Status Board is generic and can be used by any Virtual Organization; at the moment, it is being used both by the site commissioning activity and by the computing shifts in CMS. In the rest of this paper we describe the details of the Site Status Board implementation and functionality, its use cases and the direction of new developments.
        Speaker: Pablo Saiz (CERN)
        Poster
      • 388
        German Contributions to the CMS Computing Infrastructure
        The CMS computing model anticipates various hierarchically linked tier centres to meet the challenges posed by the enormous amounts of data which will be collected by the CMS detector at the Large Hadron Collider (LHC) at CERN. During the past years, various computing exercises were performed to test the readiness of the computing infrastructure, the Grid middleware and the experiment's software for the startup of the LHC, which took place in September 2008. In Germany, several tier sites have been set up to allow for an efficient and reliable way to simulate possible physics processes as well as to reprocess, analyse and interpret the numerous stored collision events of the experiment. It will be shown that the German computing sites played an important role during the experiment's preparation phase and during data-taking of CMS and that, therefore, scientific groups in Germany will be ready to compete for discoveries in this new era of particle physics. This presentation focuses on the German Tier-1 centre GridKa, located at Forschungszentrum Karlsruhe, and the German CMS Tier-2 federation DESY/RWTH, with installations at the University of Aachen and the research centre DESY. In addition, various local computing resources in Aachen, Hamburg and Karlsruhe are briefly introduced as well. It will be shown that an excellent cooperation between the different German institutions and physicists has led to well-established computing sites which cover all parts of the CMS computing model. The following topics are therefore discussed and the achieved goals and the gained knowledge are depicted: data management and distribution among the different tier sites, Grid-based Monte Carlo production at the Tier-2 as well as Grid-based and locally submitted inhomogeneous user analyses at the Tier-3s. Another important task is to ensure proper and reliable operation 24 hours a day, especially during data-taking. For this purpose, the meta-monitoring tool "HappyFace", which was developed at the University of Karlsruhe, is used in order to allow even non-expert shift crews to monitor and operate a centre continuously and to contact on-call experts if needed.
        Speaker: Dr Armin Scheurer (Karlsruhe Institute of Technology)
        Poster
      • 389
        Gratia: Interpreting Grid Usage Data In a Shared Environment.
        The Open Science Grid's usage accounting solution is a system known as "Gratia". Now that it has been deployed successfully, the Open Science Grid's next accounting challenge is to correctly interpret, and make the best possible use of, the information collected. One such issue is: "Did we use, and/or get credit for, the resource we think we used?" Another example is the problem of ensuring that accounting reports to stakeholders arriving from multiple grids (for example, reports from OSG and the European grid, EGEE, to the Worldwide LHC Computing Grid, WLCG) use consistent normalization criteria and are at the appropriate level of detail. Other interesting challenges include understanding the level of inter-VO sharing of a grid resource and how one accounts for this sharing in cases where the resource has been purchased jointly. Reporting gets even more complex when forwarding (presenting multiple resources under one umbrella interface) and pilot jobs (jobs which accept payloads from multiple users) are taken into account: each poses its own set of challenges. We attempt to enumerate some of these challenges to obtaining meaningful information from the data and how they may be overcome.
        Speaker: Mr Philippe Canal (Fermilab)
      • 390
        Grid topology repository for WLCG Monitoring Infrastructure
        The Worldwide LHC Computing Grid (WLCG) is based on a four-tiered model that comprises collaborating resources from different grid infrastructures such as EGEE and OSG. While grid middleware provides core services on a variety of platforms, monitoring tools like Gridview, SAM, Dashboards and GStat are used for monitoring, visualization and evaluation of the WLCG infrastructure. The topology of the WLCG comprises a set of resources and administrative domains such as sites/services, VOs and their associations. Presently, topology-related information comes from various information providers like GOCDB, CIC, BDII and the OSG resources list, and it needs to be aggregated at the application level. The absence of a single authoritative information provider hampers the effectiveness of aggregation and consumption of data by the applications. It also becomes difficult to pinpoint operational problems when the information is aggregated from various data providers. The end result is that the reliability of the WLCG monitoring tools is adversely affected. To resolve this issue, it is envisaged to have a single WLCG grid topology repository for aggregating and distributing topology-related information. This repository will be extremely useful for tracking the historical information of grid resources and will greatly improve the reliability of the monitoring tools. It will also become much easier to consume and process data in applications, as they will refer to a single source of information. This paper describes the present state of WLCG topology information resources and their existing functional and implementation issues, together with a list of desired future enhancements.
        Speaker: Mr David Collados Polidura (CERN)
        Poster
      • 391
        GridSite
        We present an overview of the current status of the GridSite toolkit, describing the new security model for interactive and programmatic uses introduced in the last year. We discuss our experiences of implementing these internal changes and how they have been prompted by requirements from users and by wider security trends in Grids (such as CSRF). Finally, we explain how these have improved the user experience of the GridPP website, and the wider implications for portals.
        Speaker: Andrew McNab (Unknown)
        Poster
      • 392
        GStat 2.0 Grid Information System Status Monitoring
        Authors: Laurence Field, Felix Ehm, Joanna Huang, Min Tsai. Grid Information Systems are mission-critical components in today's production grid infrastructures. They enable users, applications and services to discover which services exist in the infrastructure, along with further information about the service structure and state. It is therefore important that the information system components themselves are functioning correctly and that the information content is reliable. Grid Status (GStat) is a tool that monitors the structural integrity of the EGEE information system, which is a hierarchical system built out of more than 260 site-level and about 70 global aggregation services. GStat checks the content and presents summary and history displays for Grid operators and system administrators. A major new version, GStat 2.0, aims to build on the experience of GStat in production and to provide additional functionality which enables it to be extended and combined with other tools. This paper describes the new architecture used for GStat 2.0 and how it can be used at all levels to help provide a reliable information system.
        Speaker: Mr Laurence Field (CERN)
        Poster
      • 393
        High Performance Data Transfer and Monitoring for RHIC and USATLAS
        Modern nuclear and high energy physics experiments yield large amounts of data and thus require efficient, high-capacity storage and transfer. BNL, the hosting site for the RHIC experiments and the US center for LHC ATLAS, plays a pivotal role in transferring data to and from other sites in the US and around the world in a tiered fashion for data distribution and processing. Each component in the infrastructure, from the data acquisition system to the local analysis facility, must be monitored, tested and tuned to transfer such a volume of data over such long distances. BNL deploys monitoring tools such as Cacti and Ganglia and testing tools such as iperf, perfSONAR and bbcp, and has also created its own tools: 1) an automatic iperf TCP tuning tool for recording and graphing various combinations of critical TCP parameters to determine the optimal combination, and 2) a hierarchical monitoring tool for performing tests at various middleware levels, such as network, GridFTP and FTS, and graphing them in a web service framework. Through the use of these tools it becomes easier to determine optimal TCP settings and to isolate problems with a network device, host network adapter, disk storage, or a particular layer of the data transfer software stack. Before-and-after results from tuning are usually drastic, often yielding a more than ten-fold increase in transfer rate. This was seen in all of our tests from BNL to and from its USATLAS Tier-2 sites, from BNL RHIC experiment data acquisition systems to the computing center in Japan (CCJ) on behalf of the PHENIX experiment, and to the Korea Institute of Science and Technology Information (KISTI) on behalf of the STAR experiment. The outcomes of this work are being integrated into: 1) the global data taking and reconstruction framework for the RHIC experiments, to leverage the computing resources of RHIC international collaborators, and 2) the ATLAS production and analysis framework, to allow USATLAS regional centers to do data processing. (A minimal sketch of the window-scanning idea behind the tuning tool follows this entry.)
        Speaker: Dantong Yu (BNL)
        Poster
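        A minimal sketch, in Python, of the window-scanning idea behind the automatic TCP tuning tool mentioned above: run the classic iperf client with several TCP window sizes and record the reported throughput. The target host is a placeholder and the parsing assumes iperf's usual "... Mbits/sec" summary line; the real BNL tool records and graphs many more parameter combinations.
        import re
        import subprocess

        HOST = "target.example.org"            # placeholder test endpoint
        WINDOWS = ["64K", "256K", "1M", "4M"]  # TCP window sizes to try

        results = {}
        for window in WINDOWS:
            # classic iperf2 client: -c server, -w window, -t duration, -f m (Mbits)
            out = subprocess.run(["iperf", "-c", HOST, "-w", window, "-t", "10", "-f", "m"],
                                 capture_output=True, text=True, check=False).stdout
            match = re.search(r"([\d.]+)\s+Mbits/sec", out)
            results[window] = float(match.group(1)) if match else None

        for window in WINDOWS:
            print("window %-5s -> %s Mbit/s" % (window, results[window]))
        valid = {w: r for w, r in results.items() if r is not None}
        if valid:
            best = max(valid, key=valid.get)
            print("best window: %s (%.1f Mbit/s)" % (best, valid[best]))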
      • 394
        Highly parallel algorithm for high-pT physics at FAIR-CBM
        An unusually high-intensity beam (10**11 protons/sec) is planned to be extracted onto fixed targets at the FAIR accelerator, at energies up to 90 GeV. Using this beam, the FAIR-CBM experiment provides a unique high-luminosity facility to measure high-pT phenomena with unprecedented sensitivity, exceeding that of previous experiments by orders of magnitude. Using a 1% target, the expected minimum-bias event rate will be in the GHz range. In order to obtain a selective trigger which reduces this rate to the kHz range, one needs an extremely fast algorithm. Since the minimal read-out time for pixel detectors is about 1 microsecond, one could get a 1000-fold pile-up. Proton-Carbon interactions in the STS detector of CBM were simulated assuming 4 pixel and 5 strip detector (x,y) planes, and a highly parallel online algorithm is proposed which could select the high-pT tracks with high efficiency. The proposed mosaic-trigger system is data-driven. The recorded hits are directed toward corridors assigned to the corresponding "mosaics" in the given Si-plane. Narrow tubes are defined on a subset and an exhaustive search is performed on the full set applying content addressable memories (CAMs).
        Speaker: Gyoergy Vesztergombi (Res. Inst. Particle & Nucl. Phys. - Hungarian Academy of Science)
        Poster
      • 395
        Improving data distribution on disk pools for dCache
        Most Tier-1 centers of the LHC Computing Grid use dCache as their storage system. dCache uses a cost model incorporating CPU and space costs for the distribution of data on its disk pools. Storage resources at Tier-1 centers are usually upgraded once or twice a year according to given milestones. One of the effects of this procedure is the accumulation of heterogeneous hardware resources. For a dCache system, a heterogeneous set of disk pools complicates the process of weighting CPU and space costs for an efficient distribution of data. The German Tier-1, GridKa, has decided to optimize its weighting of the costs by simulating the data transfers to and from its disk pools. The simulation focuses on the pools which receive data from outside and transfer it to the tape backend, as these particular pools are important for data reception from the LHC experiments and are a potential bottleneck. This presentation/poster will focus on the results gained from the simulation software and on how they have been used for optimization at the GridKa Tier-1. (A toy sketch of the cost-weighting idea follows this entry.)
        Speaker: Dr Christopher Jung (Forschungszentrum Karlsruhe)
        Poster
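        The sketch below (Python) illustrates only the general idea of weighting a space cost against a performance (CPU/load) cost when choosing a pool for an incoming file; the pool parameters, formulas and weights are invented and are much simpler than dCache's actual cost module or the GridKa simulation.
        pools = {
            # name: (free space in TB, active movers, max movers) -- hypothetical values
            "pool_old_small": (2.0, 3, 20),
            "pool_new_large": (40.0, 15, 100),
            "pool_mid":       (10.0, 8, 50),
        }

        SPACE_WEIGHT = 1.0   # relative importance of remaining space
        CPU_WEIGHT = 2.0     # relative importance of current load

        def cost(free_tb, active, max_movers):
            space_cost = 1.0 / free_tb              # less free space -> higher cost
            cpu_cost = active / float(max_movers)   # busier pool -> higher cost
            return SPACE_WEIGHT * space_cost + CPU_WEIGHT * cpu_cost

        best_pool = min(pools, key=lambda name: cost(*pools[name]))
        print("incoming file is written to", best_pool)
        Tuning the two weights for a heterogeneous set of pools is exactly the kind of question the simulation described above is meant to answer.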
      • 396
        Increasing Performance and Scalability of dCache Storage Resource Manager
        The dCache disk caching file system has been chosen by a majority of the LHC experiments' Tier-1 centers for their data storage needs. It is also deployed at many Tier-2 centers. In preparation for the LHC startup, very large installations of dCache - up to 3 Petabytes of disk - have already been deployed, and the systems have operated at transfer rates exceeding 2000 MB/s over the WAN. As the LHC experiments go into production, it is expected that data storage capacity requirements and data transfer rates will continue to grow beyond currently tested limits. It is estimated that a Tier-1 center serving just the CMS experiment needs to support a sustained data throughput of 800 MB/s. Like any other software that has been in production for years, dCache has faced changes in access profiles and required performance. In order to cope with evolving requirements and with access patterns requiring better performance, the dCache team regularly investigates improving components which might no longer be state of the art. As we did with other dCache components, we are now evaluating a possible redesign of the Storage Resource Manager (SRM) for scalability. The SRM, the main Grid storage interface and a single point of entry into dCache, is one of its most critical components. The SRM needs to be able to scale with increased load and to remain resilient against changing usage patterns. We will present an analysis of the dCache architecture and its performance bottlenecks, with emphasis on the SRM, and the current and future effort to improve scalability and stability in order to satisfy the ever-increasing requirements of the LHC experiments.
        Speaker: Timur Perelmutov (FERMI NATIONAL ACCELERATOR LABORATORY)
        Poster
      • 397
        Increasing the efficiency of tape-based storage backends
        HSM systems such as CERN's Advanced STORage manager (CASTOR) [1] are responsible for storing Petabytes of data which are first cached on disk and then persistently stored on tape media. The contents of these tapes are regularly repacked from older, lower-density media to new-generation, higher-density media in order to free up physical space and ensure long-term data integrity and availability. With the declining price and increasing capacity of disk (and flash memory) based storage, our future vision for tape usage is to move away from serving on-demand, random, per-file access to non-disk-cached files, and towards loosely coupled, efficient bulk data transfers where large-volume data sets are stored and retrieved in aggregations, fully exploiting the stream-based nature of tape media. Mechanisms for grouped migration policies and priorities have been implemented, and an innovative tape format optimized for data aggregations is being developed. This new tape format will also allow the performance of repacking data from old to newer-generation tape media to be increased with substantially reduced hardware costs. In this paper, we will describe the proposed tape format, the improvements in the tape layer architecture, and the changes which will be applied to the architecture of the CASTOR mass storage system. [1] http://cern.ch/castor
        Speaker: Ms Giulia Taurelli (CERN)
        Poster
      • 398
        Information System Evolution
        Authors: Laurence Field, Markus Schulz, Felix Ehm, Tim Dyce. Grid Information Systems are mission-critical components in today's production grid infrastructures. They enable users, applications and services to discover which services exist in the infrastructure, along with further information about the service structure and state. As the Grid Information System is pervasive throughout the infrastructure, it is especially sensitive to the size of the infrastructure. As a grid infrastructure grows, so does the usage of the information system, and there are currently two factors driving this growth. Grid interoperation activities will bring growth shocks to the existing system, both in terms of additional information content and end-user queries. The paradigm of multi-core processors has the potential to bring exponential growth to the number of execution environments in the infrastructure and hence to the number of simultaneous computing activities, which will place a significant additional query load on the system. To ensure that the current information systems are able to handle this increase in scale, it is necessary to re-evaluate their architectures. This paper presents an improved information system architecture which has the potential to meet these future requirements.
        Speaker: Mr Laurence Field (CERN)
        Poster
      • 399
        Integrated production-quality gLite FTS usage in CMS PhEDEx DM system
        PhEDEx, the CMS data-placement system, uses the FTS service to transfer files. Towards the end of 2007, PhEDEx was beginning to show some serious scaling issues, with excessive numbers of processes on the site VOBOX running PhEDEx, poor efficiency in the use of FTS job-slots, high latency for failure-retries, and other problems. The core PhEDEx architecture was changed in May 2008 to eliminate these problems. By introducing cooperative multi-threading we could adopt a more modular approach, resulting in fewer FTS job submissions, a constant file-level monitoring load on the FTS server regardless of throughput, link-level optimisation, fewer processes on the VOBOX, and fewer connections to the central Oracle database.
        Speaker: Dr Tony Wildish (PRINCETON)
      • 400
        Integration of CBM readout controller into DABC framework
        New experiments at FAIR such as CBM require new concepts for data acquisition systems where, instead of a central trigger, self-triggered electronics with time-stamped readout are used. A first prototype of such a system was implemented in the form of a CBM readout controller (ROC) board, which is designed to read time-stamped data from a front-end board equipped with nXYTER chips and transfer those data to a PC via an optical link. As an alternative option, Ethernet can be used for data transfer. A software library (called KNUT) was developed which allows the ROC to be controlled and the data to be read out. A data transfer protocol over UDP was implemented to achieve high data rates with high reliability. A ROOT interface to KNUT provides easy access to the ROC. DABC (Data Acquisition Backbone Core) is a general-purpose software framework for building a wide range of DAQ systems, from simple single-board readouts to complex, multi-node fast event building. Via a flexible plug-in architecture, DABC provides for the integration of different kinds of devices, transports, data formats and analysis algorithms. To get ROC data into DABC, ROC-specific device and transport classes were implemented in DABC (based on the KNUT library). In addition, combiner and time-calibrator modules were implemented to merge data in DABC from several ROCs. The complete readout chain, including three ROCs, UDP data transfer, the DABC readout application and a monitoring GUI, was tested in the first CBM test beam time in September 2008. For the next test beam in summer 2009, the use of optical links and DABC event building is planned.
        Speaker: Dr Sergey Linev (GSI Darmstadt)
        Poster
      • 401
        Interoperability and Scalability within glideinWMS
        Physicists have access to thousands of CPUs in grid federations such as OSG and EGEE. With the start-up of the LHC, it is essential for individuals or groups of users to wrap together available resources from multiple sites across multiple grids under a higher, user-controlled layer in order to provide a homogeneous pool of available resources. One such system is glideinWMS, which is based on the Condor batch system. A general discussion of glideinWMS can be found elsewhere. Here, we focus on recent advances in extending its reach: scalability and the integration of heterogeneous compute elements. We demonstrate that the new developments achieve the design goal of over 10,000 simultaneously running jobs under a single Condor schedd, using strong security protocols across global networks and sustaining a steady-state job completion rate of a few Hz. We also show interoperability across heterogeneous computing elements achieved using client-side methods. We discuss this technique and the challenges in direct access to NorduGrid and CREAM compute elements, in addition to Globus-based systems.
        Speaker: Daniel Bradley (University of Wisconsin)
        Poster
      • 402
        Job Centric Monitoring for ATLAS jobs in the LHC Computing
        Since the Large Hadron Collider (LHC) at CERN, Geneva, began operation in September, the large-scale computing grid LCG (LHC Computing Grid) is meant to process and store the large amounts of data created in simulating, measuring and analyzing particle physics experimental data. Data acquired by ATLAS, one of the four big experiments at the LHC, are analyzed using compute jobs running on the grid and utilizing the ATLAS software framework 'Athena'. The analysis algorithms themselves are written in C++ by the physicists using Athena and the ROOT toolkit. Identifying the reason for a job failure (or even the occurrence of the failure itself) in this context is a tedious, repetitive and - more often than not - unsuccessful task. The debugging of such problems was not foreseen, and tracing back problems is made even more difficult by the fact that the output sandbox, which contains the job's output and error logs, is discarded by the grid middleware if the job fails. Thus, valuable information that could aid in finding the failure reason is lost. These issues result in high job failure rates and less than optimal resource usage. As part of the High Energy Particle Physics Community Grid project (HEPCG) of the German D-Grid Initiative, the University of Wuppertal has developed the Job Execution Monitor (JEM). JEM helps find job failure reasons by two means: it periodically provides vital worker-node system data and it collects job run-time monitoring data. To gather these data, a supervised line-by-line execution of the user job is performed. JEM provides new possibilities to find problems in largely distributed computing grids and to analyze these problems in nearly real time. All monitored information is presented to the user almost instantaneously and is additionally stored in the job's output sandbox for further analysis. As a first step, JEM has been seamlessly integrated into ATLAS' and LHCb's grid user interface 'Ganga'. In this way, submitted jobs are monitored transparently, requiring no additional effort by the user. (A minimal sketch of line-by-line tracing follows this entry.)
        Speaker: Sergey Kalinin (Universite Catholique de Louvain)
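        As a toy illustration of supervised line-by-line execution, the Python sketch below traces each executed line of a payload function with sys.settrace and attaches a snapshot of the worker-node load average. JEM itself monitors arbitrary grid jobs (not only Python) and ships the data to the user; this only shows the underlying idea.
        import os
        import sys

        def tracer(frame, event, arg):
            # called by the interpreter for every traced event; report 'line' events
            if event == "line":
                load1, _, _ = os.getloadavg()   # Unix only
                print("[monitor] %s:%d (load %.2f)"
                      % (frame.f_code.co_filename, frame.f_lineno, load1))
            return tracer

        def payload():
            # stand-in for the user's job
            total = 0
            for i in range(3):
                total += i * i
            return total

        sys.settrace(tracer)      # start supervised execution
        result = payload()
        sys.settrace(None)        # stop tracing
        print("payload result:", result)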
      • 403
        Job execution in virtualized runtime environments in grid
        Grid systems are used for calculations and data processing in various applied areas such as biomedicine, nanotechnology and materials science, cosmophysics and high energy physics, as well as in a number of industrial and commercial areas. The traditional method of job execution in a grid is to run jobs directly on the cluster nodes. This limits the choice of operational environment to the operating system of the node, does not allow resource-sharing policies or job isolation to be enforced, and cannot guarantee a minimal level of available system resources. We propose a new approach to running jobs on cluster nodes in which each grid job runs in its own virtual environment. This allows different operating systems to be used for different jobs on the same cluster nodes, provides better isolation between running jobs and allows resource-sharing policies to be enforced. The implementation of the proposed approach was made in the framework of the gLite middleware of the EGEE/WLCG project and was successfully tested at SINP MSU. The implementation is transparent for the grid user and allows binaries compiled for other operating systems to be submitted using exactly the same interface from the standard gLite user interface node. Virtual machine images with the standard gLite worker node software and a sample MS Windows execution environment were created.
        Speaker: Lev Shamardin (Scobeltsyn Institute of Nuclear Physics, Moscow State University (SINP MSU))
        Poster
      • 404
        Machine assisted histogram classification
        LHCb is one of the four major experiments under completion at the Large Hadron Collider (LHC). Monitoring the quality of the acquired data is important because it allows the verification of the detector performance. Anomalies, such as missing values or unexpected distributions, can be indicators of a malfunctioning detector, resulting in poor data quality. Spotting faulty components can be done either visually, using instruments such as the LHCb Histogram Presenter, or by automated tools. In order to assist detector experts in handling the vast monitoring information resulting from the sheer size of the detector, a graph-theory based clustering tool, combined with machine learning algorithms, is proposed and demonstrated by processing histograms representing 2D event hitmaps. The concept is proven by detecting ion feedback events in the LHCb RICH subdetector. (A toy sketch of similarity-graph clustering follows this entry.)
        Speaker: Somogyi Peter (Technical University of Budapest)
        Poster
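        A toy version, in Python, of the graph-based grouping idea: histograms whose bin contents are strongly correlated are linked in a graph, connected components form clusters, and a histogram left on its own is a candidate anomaly. The histograms, the similarity measure and the threshold are invented; the actual tool works on 2D hitmaps with machine-learning refinements.
        from itertools import combinations

        histos = {
            "chan_A": [10, 52, 98, 51, 11],
            "chan_B": [12, 50, 95, 49, 13],
            "chan_C": [11, 48, 97, 52, 10],
            "chan_D": [90, 10, 5, 9, 88],     # unusual shape
        }

        def correlation(x, y):
            # Pearson correlation of two equally binned histograms
            n = len(x)
            mx, my = sum(x) / n, sum(y) / n
            cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
            vx = sum((a - mx) ** 2 for a in x) ** 0.5
            vy = sum((b - my) ** 2 for b in y) ** 0.5
            return cov / (vx * vy)

        THRESHOLD = 0.95
        edges = {name: set() for name in histos}
        for a, b in combinations(histos, 2):
            if correlation(histos[a], histos[b]) > THRESHOLD:
                edges[a].add(b)
                edges[b].add(a)

        # connected components of the similarity graph by depth-first search
        seen, clusters = set(), []
        for start in histos:
            if start in seen:
                continue
            stack, component = [start], set()
            while stack:
                node = stack.pop()
                if node not in component:
                    component.add(node)
                    stack.extend(edges[node] - component)
            seen |= component
            clusters.append(sorted(component))

        print("clusters:", clusters)   # chan_D ends up alone -> worth inspecting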
      • 405
        Managing distributed grid sites with quattor
        Quattor is a system administration toolkit providing a powerful, portable and modular set of tools for the automated installation, configuration and management of clusters and farms. It is developed as a community effort and provided as open-source software. Today, quattor is being used to manage at least 10 separate infrastructures spread across Europe. These range from massive single-site installations such as CERN (where more than 7000 machines are managed) to highly distributed grid infrastructures such as Grid-Ireland (which is made up of 18 physical installations). In this work we stress the capability of quattor to manage distributed grid sites. Grids increasingly organize their sites using a distributed model, where resources at multiple physical locations appear to the grid as a single logical site. The Quattor Working Group (QWG) templates are developed by a consortium of grid sites. They provide an integrated "configuration distribution" for coordinating multiple collaborating sites running gLite middleware, with support for shared configuration and local customization.
        Speaker: Dr Andrea Chierici (INFN-CNAF)
        Poster
      • 406
        Migration of Monte Carlo Simulation of High Energy Atmospheric Showers to GRID Infrastructure
        The MAGIC telescope, a 17-meter Cherenkov telescope located on La Palma (Canary Islands), is dedicated to the study of the universe in Very High Energy gamma-rays. These particles arrive at the Earth's atmosphere, producing atmospheric showers of secondary particles that can be detected on the ground through their Cherenkov radiation. MAGIC relies on a large number of Monte Carlo simulations for the calibration of the recorded events. The simulations are used to evaluate efficiencies and identify patterns to distinguish between genuine gamma-ray events and unwanted background events. Up to now, these simulations were executed on local queuing systems, resulting in long execution times and a complex organizational task. Due to the parallel nature of these simulations, a Grid-based simulation system is the natural solution. Here, a system which uses the current resources of the MAGIC Virtual Organization on EGEE is proposed. It can easily be generalized to support the simulation of any similar system, such as the planned Cherenkov Telescope Array. The proposed system, based on a client/server architecture, provides the user with a single access point to the simulation environment through a remote graphical user interface, the Client. The Client can be accessed via a web browser, using web service technology, with no additional software installation required on the user side. The Server processes the user request and uses a database for both the data catalog and job management inside the Grid. The design, first production tests and lessons learned from the system will be discussed at the conference.
        Speaker: Mr Adolfo Vazquez (Universidad Complutense de Madrid)
        Poster
      • 407
        Monitoring of the ATLAS LAr Calorimeter
        The ATLAS detector at the Large Hadron Collider is expected to collect an unprecedented wealth of new data at a completely new energy scale. In particular, its Liquid Argon electromagnetic and hadronic calorimeters will play an essential role in measuring final states with electrons and photons and in contributing to the measurement of jets and missing transverse energy. Efficient monitoring will be crucial from the earliest data taking onward and is implemented at multiple levels of the readout and triggering systems. By providing essential information about the performance of each sub-detector and its impact on physics quantities, the monitoring will be crucial in guaranteeing that data are ready for physics analysis. The tools and criteria for monitoring the LAr data in the cosmics data-taking will be discussed. The software developed for the monitoring of collision data will be described, and results of monitoring performance will be presented for data obtained from a full simulation of the data processing that includes the data streams foreseen in ATLAS operation. The status of automated data quality checks will be shown.
        Speaker: Jeremiah Jet Goodson (Department of Physics - State University of New York (SUNY))
        Poster
      • 408
        Monitoring the CMS Data Acquisition System
        The CMS data acquisition system comprises O(10000) interdependent services that need to be monitored in near real time. The ability to monitor a large number of distributed applications accurately and effectively is of paramount importance for operation. Application monitoring entails the collection of a large number of simple and composed values made available by the software components and hardware devices. A key aspect is that the detection of deviations from the specified behaviour is supported in a timely manner, a prerequisite for taking corrective actions efficiently. Given the size and time constraints, efficient application monitoring is an interesting research problem. In this article we highlight the limitations of existing solutions and propose an approach that uses the emerging paradigm of web-service based eventing systems in combination with hierarchical data collection and load balancing. Scalability and efficiency are achieved by a decentralized architecture, splitting up data collection into regions of collections. An implementation following the presented scheme is deployed as the monitoring infrastructure of the CMS experiment at the Large Hadron Collider. All services in this distributed data acquisition system provide standard web service interfaces via XML, SOAP and HTTP. Continuing on this path, we adopted WS-* standards, implementing a monitoring system layered on top of the W3C standards stack. We designed a load-balanced publisher/subscriber system with the ability to include high-speed protocols for efficient data transmission and to serve data in multiple data formats. We discuss the requirements for monitoring in LHC-scale distributed data acquisition systems and shed light on the implementation and its performance.
        Speaker: Luciano Orsini (CERN)
        Poster
      • 409
        Monitoring the DIRAC Distributed System
        DIRAC, the LHCb community Grid solution, is intended to reliably run large data mining activities. The DIRAC system consists of various services (which wait to be contacted to perform actions) and agents (which carry out periodic activities) to direct jobs as required. An important part of ensuring the reliability of the infrastructure is the monitoring and logging of these DIRAC distributed systems. The monitoring is done by collecting information from two sources: one is from pinging the services or keeping track of the regular heartbeats of the agents, and the other is from the analysis of the error messages generated by both agents and services and collected by a logging system. This allows us to ensure that the components are running properly and to collect useful information regarding their operations. The process status monitoring is displayed using the SLS sensor mechanism, which also automatically allows various quantities to be plotted and a history of the system to be kept. A dedicated GridMap interface (ServiceMap) allows production shifters and experts to have an immediate, high-impact view of the status of all LHCb critical services, while offering the possibility to refer to the details of the SLS and SAM sensors. Error types and statistics provided by the logging service can be accessed via dedicated web interfaces on the DIRAC portal or programmatically via the Python-based API and CLI.
        Speaker: Dr Raja Nandakumar (Rutherford Appleton Laboratory)
        Poster
      • 410
        MSG as a core part of the new WLCG monitoring infrastructure
        The MSG (Messaging System for the Grid) is a set of tools that make a message-oriented platform available for communication between grid monitoring components. It has been designed specifically to work with the EGEE operational tools and acts as an integration platform to improve the reliability and scalability of the existing operational services. MSG is a core component as WLCG monitoring moves towards a more automated operational mode. It is being used for integrating job monitoring information from different sources, enabling a flexible link between OSG, SAM and Gridview, or routing information between Nagios and the ROCs.
        Speaker: Mr Daniel Filipe Rocha Da Cunha Rodrigues (CERN)
      • 411
        Network Model for Circuit-Based Services.
        There are a number of active projects to design and develop a data control plane capability that steers traffic onto alternate network paths, instead of the default path provided through standard IP connectivity. Lambda Station, developed by Fermilab and Caltech, is one example of such a solution, and is currently deployed at the US CMS Tier-1 facility at Fermilab and various Tier-2 sites. When the Lambda Station project started, the first challenge that we faced was how to decompose the complex, inter-related functions of the system into smaller, more distinct ones that end users and network administrators could clearly understand. Our task became to represent the network in abstract form and to be able to describe its elements in programming code; in other words, to develop a reference model of the network. In this paper, we will present a three-level model of the network that evolved out of the Lambda Station project. This model is being used to describe the network infrastructure of Fermilab and the local networks of collaborating sites. Based on our model, a site's Lambda Station server can reconfigure the local network infrastructure to redirect selected traffic flows over alternate paths dynamically. An example XML description of the Fermilab circuit service will be presented.
        Speaker: Mr Andrey Bobyshev (FERMILAB)
        Poster
      • 412
        On Lossless Compression of ATLAS Tile Calorimeter Drawer Raw Data
        At present, at a 100 kHz rate, the Tile Calorimeter ROD DSP calculates Amplitude, Time and Quality Factor (QF) parameters using the Optimal Filtering reconstruction method. If the QF is good enough, only the Amplitude, Time and QF are stored; otherwise the data quality is considered bad and it is proposed to store the raw data for further studies. Without any compression, the bandwidth limitation allows up to 9 channels of additional raw data to be sent. Simple considerations show that when the QF is bad due to differences between the standard pulse shape and the current signal, all channels are likely to have a bad QF, so the ability to send only 9 channels of raw data is insufficient. Experiments show that standard compression tools such as RAR cannot successfully deal with this problem because they cannot take advantage of the smooth, curved shape of the raw data and the correlations between channels. A lossless data compression algorithm is proposed which better meets these challenges. The method was checked on SPLASH events (run 87851, containing 26 SPLASH events) and proved sufficient to save the data of all channels using the existing bandwidth. Unlike general-purpose compression tools, the proposed method heavily exploits the geometry-dependent correlations between different channels; on the other hand, it does not require exact information about the pulse shape function to compress the data. The method can therefore be used for recording either bad or piled-up signals. (A toy sketch of delta-based encoding follows this entry.)
        Speaker: Vakhtang Tsiskaridze (Tbilisi State University, Georgia)
        Poster
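        The Python sketch below illustrates only the two correlations the abstract relies on - smoothness along the samples of a channel and similarity between neighbouring channels - by storing deltas instead of raw values. The ADC numbers are invented, and a real encoder would additionally entropy-code the small residuals into the available bandwidth.
        channels = [
            [40, 40, 240, 640, 300, 60, 41],   # ADC samples of channel 0 (invented)
            [41, 42, 238, 635, 298, 61, 42],   # neighbouring channels look alike
            [39, 41, 242, 644, 303, 59, 40],
        ]

        def encode(chans):
            # first channel: sample-to-sample deltas; other channels: delta to previous channel
            out = [[chans[0][0]] + [b - a for a, b in zip(chans[0], chans[0][1:])]]
            for prev, cur in zip(chans, chans[1:]):
                out.append([c - p for p, c in zip(prev, cur)])
            return out

        def decode(residuals):
            first = [residuals[0][0]]
            for d in residuals[0][1:]:
                first.append(first[-1] + d)
            chans = [first]
            for res in residuals[1:]:
                chans.append([p + r for p, r in zip(chans[-1], res)])
            return chans

        residuals = encode(channels)
        assert decode(residuals) == channels    # lossless round trip
        print("residuals:", residuals)          # mostly small numbers, cheap to pack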
      • 413
        On gLite WMS/LB Monitoring and Management through WMSMon
        The Workload Management System (WMS) is the gLite service supporting the distributed production and analysis activities of various HEP experiments. It is responsible for dispatching computing jobs to remote computing facilities by matching job requirements to the resource status information collected from the Grid information services. Given the distributed and heterogeneous nature of the Grid, the monitoring of the job lifecycle and of the aggregate workflow patterns generated by multiple user communities, as well as the reliability of the service, are of great importance. In this paper we deal with the problem of WMS monitoring and management. We present the architecture and implementation of WMSMonitor, a tool for WMS monitoring and management which has been designed to meet the needs of various WMS user categories: administrators, developers, advanced Grid users and performance testers. The tool was successfully deployed to monitor the progress of WMS job submission activities during HEP computing challenges. We also describe how, for each WMS in a cluster, WMSMonitor produces status indexes and a load metric that can be used for automated notification of critical events via Nagios, or for ranking service instances deployed in load-balancing mode.
        Speaker: Daniele Cesini (INFN CNAF)
        Poster
      • 414
        Online event filtering at BESIII
        BEPCII is the electron-positron collider with the highest luminosity in the tau-charm energy region, and BESIII is the corresponding detector with greatly improved detection capability. For this accelerator and detector, the event trigger rate is rather high. In order to reduce the background level and the recording burden on the computers, an online event filtering algorithm has been established. The algorithm classifies background and physics events as fast as possible using the information provided by the different sub-detectors according to their strengths and capabilities. The filter efficiency and processing time are also checked. The running results indicate that the algorithm satisfies the requirements of the online data acquisition system and the corresponding physics analysis. The classification results from online event filtering are also used to estimate the collider luminosity and to monitor the status of the data-taking process.
        Speaker: Chendong FU (IHEP, Beijing)
      • 415
        Optimised access to user analysis data using the gLite DPM
        The ScotGrid distributed Tier-2 now provides more than 4 MSI2K and 500 TB for LHC computing, spread across three sites at Durham, Edinburgh and Glasgow. Tier-2 sites have a dual role to play in the computing models of the LHC VOs. Firstly, their CPU resources are used for the generation of Monte Carlo event data. Secondly, the end-user analysis object data are distributed to the site Grid storage and held on disk ready for processing by physicists' analysis jobs. In this paper we show how we have designed the ScotGrid storage and data management resources in order to optimise access by physicists to LHC data. Within ScotGrid, all sites use the gLite DPM storage manager middleware but use different underlying storage and network hardware. Using the WLCG grid to submit real ATLAS and LHCb analysis code to process VO data stored on the ScotGrid sites, we present a comparison of the different architectures at the sites and of the use of different data access protocols (rfio, xroot, gridftp). The results will be presented from the point of view of the end user (in terms of the number of events processed per second) and from the point of view of the site, which typically wants to minimise the load and the impact that analysis activity has on other users of the system.
        Speaker: Dr Greig Cowan (University of Edinburgh)
        Poster
      • 416
        Optimizing bulk data transfers using network measurements: a practical case.
        The quality of the connectivity provided by the network infrastructure of a Grid is a crucial factor in guaranteeing the accessibility of Grid services, scheduling processing and data transfer activity on the Grid efficiently, and meeting QoS expectations. Yet most Grid applications do not take into consideration the expected performance of the network resources they plan to use. In this paper we describe the effective use of a Grid monitoring framework whose measurements are used to introduce network-aware features in a legacy application. We use a network monitoring framework oriented to Grid infrastructures to measure a small set of network parameters. The testbed deployment covers a metropolitan Grid infrastructure aimed at supporting data-intensive eScience applications like HEP. We describe a real use case consisting of bulk data transfers during the operation of the Grid for the SCoPE project.
        Speaker: Dr Silvio Pardi (INFN)
        Paper
        Poster
      • 417
        OSG: Scalability and Performance of a Deployed Site.
        The Open Science Grid middleware stack has seen intensive development over the past years and has become more and more mature, as increasing numbers of sites have been successfully added to the infrastructure. Considerable effort has been put into consolidating this infrastructure and enabling it to provide a high degree of scalability, reliability and usability. A thorough evaluation of its performance and scalability is important in order to assess the effective deployment of new sites and to compare new versions of the middleware to existing ones. Such scalability tests require testbeds comparable in size to the largest available sites on OSG. In this paper we describe testing procedures, available testbeds, and tools to test and monitor the scalability and performance of a deployed site. We use these methods to review the scalability and performance of a deployed Computing Element (CE) and a deployed User Interface (UI), giving independent measurements for them. The limiting pieces of the infrastructure are singled out, especially in the context of multicore/multiprocessor hardware. A comparison of the different versions of the infrastructure is given.
        Speaker: Dr Jose Antonio Coarasa Perez (Department of Physics - Univ. of California at San Diego (UCSD))
      • 418
        Overview of the GridMap tool and technology for visualizing complex monitoring data
        GridMap (http://gridmap.cern.ch) was introduced to the community at the EGEE'07 conference as a new monitoring tool that provides better visualization of, and insight into, the state of the Grid than previous tools. Since then it has become quite popular in the grid community. Its two-dimensional graphical visualization technique, based on treemaps and coupled with a simple, responsive AJAX-based rich web interface, provides easily readable top-level live views of activities and monitoring status at the grid sites. The advantage of GridMap is that it shows complex monitoring data from different existing monitoring systems integrated on a single page and allows the data to be explored interactively. This talk gives an overview of the GridMap technology and how it is used at CERN and in the academic grid community for visualizing complex monitoring data from the grid infrastructure and applications. What general structures of data can be visualized? How can the data be connected to the GridMap server? How do features like context-sensitive lightweight popups, submaps, drill-down, and links to existing monitoring tools help to manage complexity and improve the user experience? These questions are answered and underpinned by examples of recently developed new applications of GridMap.
        Speaker: Dr Max Böhm (EDS / CERN openlab)
      • 419
        Pilot Factory - a Condor-based System for Scalable Pilot Job Generation in the Panda WMS Framework
        The Panda Workload Management System is designed around the concept of the Pilot Job - a "smart wrapper" for the payload executable that can probe the environment on the remote worker node before pulling down the payload from the server and executing it. Such a design allows for improved logging and monitoring capabilities as well as flexibility in workload management. In the Grid environment (such as the Open Science Grid), Panda Pilot Jobs are submitted to remote sites via mechanisms that ultimately rely on Condor-G. As our experience has shown, in cases where a large number of Panda jobs are simultaneously routed to a particular remote site, the increased load on the head node of the cluster caused by the Pilot Job submission may lead to an overall lack of scalability. We have developed a Condor-inspired solution to this problem which uses the schedd-based glidein, whose mission is to redirect pilots to the native batch system. Once a glidein schedd is installed and running, it can be utilized in exactly the same way as local schedds and therefore, from the user's perspective, pilots thus submitted are quite similar to jobs submitted to the local Condor pool.
        Speaker: Dr Maxim Potekhin (BROOKHAVEN NATIONAL LABORATORY)
        Poster
        Slides
      • 420
        Pilot Framework and the DIRAC WMS
        DIRAC, the LHCb community Grid solution, has pioneered the use of pilot jobs in the Grid. Pilot jobs provide a homogeneous interface to a heterogeneous set of computing resources. At the same time, pilot jobs allow the scheduling decision to be delayed until the last moment, thus taking into account the precise running conditions at the resource and last-moment requests to the system. The DIRAC Workload Management System provides a single scheduling mechanism for jobs with very different profiles. To achieve an overall optimisation, it organizes pending jobs in task queues, both for individual users and for production activities. Task queues are created from jobs having similar requirements. Following the VO policy, a priority is assigned to each task queue. Pilot submission and subsequent job matching are based on these priorities, following a statistical approach. Details of the implementation and the security aspects of this framework will be discussed. (A toy sketch of priority-based queue selection follows this entry.)
        Speaker: Dr Ricardo Graciani Diaz (Universitat de Barcelona)
        Paper
        Poster
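        A toy sketch, in Python, of the queueing idea described above: pending jobs are grouped into task queues by identical requirements, each queue gets a priority, and the queue served by the next pilot is drawn with a priority-weighted random choice. The job fields, priorities and grouping key are invented and do not reproduce DIRAC's actual matching logic.
        import random
        from collections import defaultdict

        pending_jobs = [
            {"owner": "alice", "site": "ANY", "cpu": 3600},
            {"owner": "alice", "site": "ANY", "cpu": 3600},
            {"owner": "bob", "site": "LCG.CERN", "cpu": 86400},
            {"owner": "prod", "site": "ANY", "cpu": 43200},
        ]

        # one task queue per distinct set of requirements
        task_queues = defaultdict(list)
        for job in pending_jobs:
            key = (job["site"], job["cpu"])
            task_queues[key].append(job)

        # hypothetical VO policy: site-agnostic queues get a higher priority
        priorities = {key: 10.0 if key[0] == "ANY" else 5.0 for key in task_queues}

        def pick_queue():
            keys = list(task_queues)
            weights = [priorities[k] for k in keys]
            return random.choices(keys, weights=weights, k=1)[0]

        queue = pick_queue()
        print("next pilot serves queue", queue, "->", task_queues[queue][0]["owner"])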
      • 421
        Planning and tracking computing resources for CMS
        Resource tracking, like usage monitoring, relies on fine-granularity information communicated by each site on the Grid. The data are later aggregated and analysed from different perspectives to yield global figures which are used for decision making. The dynamic information collected from distributed sites must therefore be comprehensive, pertinent and coherent with upstream (planning) and downstream (usage monitoring and controlling) processes. This paper will present the status of monitoring the computing resources for CMS and outline the main problems, including the definition of the metrics for quantifying those resources. It will in particular focus on the recent actions that made this step more reliable and easier to integrate into a wider resource management process.
        Speaker: Dr Marie-Christine Sawley (ETHZ)
      • 422
        Ptolemy: A Scalable LAN Monitoring System
        The Network Engineering team at the SLAC National Accelerator Laboratory is required to manage an increasing number and variety of network devices with a fixed amount of human resources. At the same time, networking equipment has acquired more intelligence to gain introspection and visibility into the network. Making such information readily available for network engineers and user support personnel is still an unsolved problem. Features such as determining device inventories, capabilities, topology and the inter-relation of equipment at all networking layers, as well as performance, are fundamental to the understanding, problem remediation and continued operation of networking services. We have surveyed commercial and open-source products and have concluded that they typically only offer a small subset of the features outlined above. At SLAC, we have established an effort to create an open-source, scalable network monitoring system to meet these needs. Named Ptolemy (Performance, TopOLogy and measurEMent sYstem monitoring), our design creates a clear separation between the collection, storage and presentation of the network services, utilizing open technologies such as the Simple Network Management Protocol (SNMP), XML, and relational and round-robin databases. Ptolemy is currently in production at SLAC, monitoring over 500 network devices and 25,000 network interfaces. In this paper, we describe Ptolemy in detail and demonstrate how it meets SLAC's security, scalability and performance objectives. We discuss its architecture, extensibility, deployment, integration with other infrastructure services, and user interface.
        Speaker: Mr Antonio Ceseracciu (SLAC)
      • 423
        Real Time Flow Analysis for Network Services.
        Emerging dynamic circuit services are being developed and deployed to facilitate high-impact data movement within the research and education communities. These services normally require network awareness in the applications in order to establish an end-to-end path on demand programmatically. This approach has significant difficulties because user applications need to be modified to support the APIs of these services. Considering the highly distributed and complex applications in use for data movement within the High-Energy Physics community, this can be a challenging task. In this paper, we present a different approach to establishing and tearing down dynamic circuits. Instead of forcing network awareness into applications, we are working on developing application awareness in the network. Our application awareness within the network is based on the collection and analysis of network flow data in near real time. The objective of this project is to develop heuristic algorithms that recognize flow patterns with characteristics specific to applications of interest. Once such flows are recognized, our service can initiate steps to modify the offered network services, such as establishing a dynamic circuit to carry the application's traffic. In our paper, we will present up-to-date results and challenges, as well as our practical experience of using such a tool for controlling production traffic between the US CMS Tier-1 facility at Fermilab and various US CMS Tier-2 facilities. (A toy sketch of rate-based flow detection follows this entry.)
        Speaker: Mr Andrey Bobyshev (FERMILAB)
        Poster
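        A minimal sketch (Python) of the kind of heuristic hinted at above: aggregate flow records per source-destination pair over a time window and flag pairs whose sustained rate exceeds a threshold as candidates for a dynamic circuit. The record format, host names, window and threshold are invented; the production service works on real flow exports and can then trigger circuit setup.
        from collections import defaultdict

        WINDOW_S = 60                 # aggregation window in seconds
        THRESHOLD_BPS = 500e6         # 500 Mbit/s sustained counts as a "large flow"

        # (timestamp, src, dst, bytes) tuples, e.g. exported by a flow collector
        flow_records = [
            (0, "cmsstor01.example.gov", "t2.example.edu", 4000000000),
            (20, "cmsstor01.example.gov", "t2.example.edu", 5000000000),
            (30, "desktop.example.gov", "web.example.org", 2000000),
        ]

        traffic = defaultdict(int)
        for ts, src, dst, nbytes in flow_records:
            if ts < WINDOW_S:
                traffic[(src, dst)] += nbytes

        for (src, dst), nbytes in traffic.items():
            rate = nbytes * 8.0 / WINDOW_S
            if rate > THRESHOLD_BPS:
                print("candidate for dynamic circuit: %s -> %s (%.0f Mbit/s)"
                      % (src, dst, rate / 1e6))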
      • 424
        ReSS: Resource Selection Service for National and Campus Grid Infrastructure
        The Open Science Grid (OSG) offers access to hundreds of compute elements (CE) and storage elements (SE) via standard Grid interfaces. The Resource Selection Service (ReSS) is a push-based workload management system that is integrated with the OSG information systems and resources. ReSS integrates standard Grid tools such as Condor, as a brokering service, and the gLite CEMon, for gathering and publishing resource information in Glue Schema format. ReSS is used in OSG by Virtual Organizations (VO) such as US CMS, the Dark Energy Survey (DES), DZero and the Engagement VO. ReSS is also used as a resource selection service for campus grids, such as FermiGrid. VOs use ReSS to automate the resource selection in their workload management systems to run jobs over the grid. In the past year, the system has been enhanced to enable the publication and selection of storage resources and of any special software or software libraries (such as MPI libraries) installed at computing resources. In this paper, we discuss the Resource Selection Service and its typical usage on the two scales of a national cyberinfrastructure grid, such as OSG, and of a campus grid, such as FermiGrid. Additionally, we present workload management system requirements from the coming era of LHC data taking.
        Speaker: Mr Parag Mhashilkar (Fermi National Accelerator Laboratory)
        Poster
      • 425
        Site specific monitoring from multiple information systems – the HappyFace Project
        An efficient administration of computing centres requires sophisticated tools for monitoring the local infrastructure. Sharing such resources in a grid infrastructure, like the Worldwide LHC Computing Grid (WLCG), brings with it a large number of external monitoring systems, each offering information on the status of the services of a grid site. This huge flood of information from many different sources slows down the identification of problems and complicates the local administration. In addition, the web interfaces for access to the site-specific information are often very slow and uncomfortable to use. A meta-monitoring system which automatically queries the different relevant monitoring systems could provide fast and comfortable access to all important information for the local administration. It also becomes feasible to easily correlate information from different sources and to provide easy access for non-expert users. In this paper we describe the HappyFace Project, a modular software framework for this purpose. It queries existing monitoring sources and processes the results to provide a single point of entry for information about a grid site and its specific services. Besides a discussion of its architecture, the experience with HappyFace being used in production for the monitoring of the CMS-specific services at GridKa, the German WLCG Tier-1 centre, is discussed. An illustrative sketch of such a query module follows this entry.
        Speaker: Mr Volker Buege (Inst. fuer Experimentelle Kernphysik - Universitaet Karlsruhe)
        Poster
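        As a purely illustrative sketch of the module idea described above, the snippet below fetches one hypothetical status page, counts failing services and reduces the answer to a simple rating; the URL, XML layout and thresholds are invented for this example and are not part of HappyFace itself.
          # Sketch of a meta-monitoring module: query one external source and
          # derive a simple site rating. URL, XML layout and thresholds are assumed.
          import urllib.request
          import xml.etree.ElementTree as ET

          SOURCE_URL = "http://monitoring.example.org/site_status.xml"  # hypothetical

          def fetch_status(url=SOURCE_URL):
              with urllib.request.urlopen(url, timeout=30) as response:
                  return ET.fromstring(response.read())

          def rate_site(tree):
              """Map the fraction of failing services to a traffic-light rating."""
              services = tree.findall(".//service")
              if not services:
                  return "unknown"
              failed = sum(1 for s in services if s.get("status") != "ok")
              fraction = failed / len(services)
              if fraction == 0:
                  return "happy"
              return "warning" if fraction < 0.25 else "critical"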
      • 426
        SMI++ Object Oriented Framework used for automation and error recovery in the LHC experiments
        In the SMI++ framework, the real world is viewed as a collection of objects behaving as finite state machines. These objects can represent real entities, such as hardware devices or software tasks, or they can represent abstract subsystems. A special language (SML) is provided for the object description. The SML description is then interpreted by a Logic Engine (coded in C++) to drive the Control System. This allows rule-based automation and error recovery. SMI++ objects can run distributed over a variety of platforms, with all communication handled transparently by an underlying communication system - DIM. This framework was first used by the DELPHI experiment at CERN from 1990 and subsequently by the BaBar experiment at SLAC from 1999 for the design and implementation of their experiment control. SMI++ has been adopted at CERN by all LHC experiments in their detector control systems, as recommended by the Joint Controls Project. Since then it has undergone many upgrades to cater for varying user needs. The main features of the framework, and in particular of the SML language, as well as recent and near-future upgrades will be discussed. SMI++ has, so far, been used only by large particle physics experiments. It is, however, equally suitable for any other control application. A conceptual sketch of the finite-state-machine idea follows this entry.
        Speaker: Dr Bohumil Franek (Rutherford Appleton Laboratory)
        Poster
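        The snippet below is only a conceptual Python analogue of the object/finite-state-machine picture described above; it is not SML syntax and does not reflect the SMI++ implementation. It merely shows how a rule attached to a state can trigger an automated recovery action.
          # Conceptual analogue (not SML): an object modelled as a finite state
          # machine, with a rule that fires automatically on entering a state.
          class FSMObject:
              def __init__(self, name, states, initial):
                  self.name = name
                  self.states = states
                  self.state = initial
                  self.rules = []                 # (state, action) pairs

              def when(self, state, action):
                  """Register a rule: run `action` whenever this object enters `state`."""
                  self.rules.append((state, action))

              def transition(self, new_state):
                  assert new_state in self.states
                  self.state = new_state
                  for state, action in self.rules:
                      if state == new_state:
                          action(self)

          hv = FSMObject("HV_CHANNEL", {"ON", "OFF", "TRIPPED"}, "ON")
          hv.when("TRIPPED", lambda obj: obj.transition("OFF"))  # automated recovery rule
          hv.transition("TRIPPED")                               # object ends up in "OFF"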
      • 427
        Something you may have wanted to know about L&B
        Logging and Bookkeeping (L&B) is a gLite subsystem responsible for tracking jobs on the grid. Normally the user interacts with it via the glite-wms-job-status and glite-wms-job-logging-info commands. Here we present other, less widely known but still useful L&B usage patterns which are available with recently developed L&B features. L&B exposes an HTML interface; pointing a web browser (after having loaded grid credentials into it) to a jobid displays the job status details on a simple web page. Similarly, the L&B server endpoint URL shows a clickable list of the user's active jobs and notification handles. A corresponding plain-text interface is available by appending the `?text' modifier to the jobid or the server endpoint, yielding the same data in a predictable key=value form suitable for parsing in scripts; a sketch of querying this interface follows this entry. Making job status information available via RSS feeds is planned for the near future. Apart from actively querying the L&B server, users can also subscribe to notifications on job state changes. Possible criteria range from simple `whatever happens to this job' to `job of this VO user gets resubmitted to another CE'. Notifications are accessed via both an API and a CLI (suitable for scripting). Information gathered by L&B can also attest to the reliability of computing elements, namely by detecting apparent `black holes' where jobs are accepted quickly but fail immediately. This information can be leveraged in the JDL rank expression via a specific ClassAd function to penalize such misbehaving CEs in job matching.
        Speakers: Mr Ales Krenek (CESNET, CZECH REPUBLIC), Mr Jiri Sitera (CESNET, CZECH REPUBLIC), Mr Ludek Matyska (CESNET, CZECH REPUBLIC), Mr Miroslav Ruda (CESNET, CZECH REPUBLIC), Mr Zdenek Sustr (CESNET, CZECH REPUBLIC)
        Poster
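        A minimal sketch of querying the plain-text interface mentioned above is given below. The certificate handling and the exact keys returned are assumptions; a real client would authenticate with the user's grid proxy and should follow the L&B documentation.
          # Fetch a jobid with the `?text' modifier and parse the key=value reply.
          # Certificate paths and returned keys are illustrative assumptions.
          import ssl
          import urllib.request

          def job_status_text(jobid, certfile, keyfile):
              ctx = ssl.create_default_context()
              ctx.load_cert_chain(certfile=certfile, keyfile=keyfile)
              with urllib.request.urlopen(jobid + "?text", context=ctx, timeout=30) as r:
                  lines = r.read().decode().splitlines()
              return dict(line.split("=", 1) for line in lines if "=" in line)

          # status = job_status_text("https://lb.example.org:9000/someJobId",
          #                          "usercert.pem", "userkey.pem")
          # print(status.get("state"))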
      • 428
        SRM and SRB interoperation
        We show how to achieve interoperation between SDSC's Storage Resource Broker (SRB) and the Storage Resource Manager (SRM) implementations used in the Large Hadron Collider Computing Grid. Interoperation is achieved using gLite tools, to demonstrate file transfers between two different grids. This presentation is different from the work demonstrated by the authors and collaborators at SC2007 and SC2008: the principal difference is that we here use the gLite data management tools instead of "plain" Globus Grid tools. We also extend previous work by considering catalogues and file metadata in more detail.
        Speaker: Dr Jens Jensen (STFC-RAL)
        Poster
      • 429
        Supporting multiple VOs and User Groups on one Grid Infrastructure at DESY
        DESY is one of the world-wide leading centers for research with particle accelerators and synchrotron light. In HEP, DESY participates in the LHC as a Tier-2 center, supports ongoing analyses of HERA data, is a leading partner for the ILC, and runs the National Analysis Facility (NAF) for LHC and ILC. For research with synchrotron light, major new facilities are operated and built (FLASH, PETRA III, and XFEL). DESY is facing steadily growing computing demands from various e-science communities with very different requirements and use cases as well as computing strategies and traditions. In HEP, collaborative work in a global context has long been well established, whereas the synchrotron light experiments are just entering the transition from purely local to mostly global computing approaches with huge amounts of data. In order to meet these requirements, a robust and scalable computing infrastructure is needed which is based on well-defined open standards and protocols - the Grid. In the context of the EU project EGEE and the national Grid initiative D-GRID, DESY operates one common gLite Grid infrastructure with currently about 1500 CPU cores and a few hundred terabytes of disk space, plus tape. In the contribution to CHEP'09 we will discuss in depth the conceptual and operational aspects of our multi-VO and multi-community Grid infrastructure. This includes in particular the means for authorization, efficient and fair sharing of resources, and accounting.
        Speaker: Dr Andreas Gellrich (DESY)
        Poster
      • 430
        System Administration of ATLAS TDAQ Computing Environment
        This contribution gives a thorough overview of the activities of the ATLAS TDAQ SysAdmin group, which administers the TDAQ computing environment supporting the High Level Trigger, Event Filter and other subsystems of the ATLAS detector operating at the LHC machine at CERN. The current installation consists of approximately 1500 netbooted nodes managed by more than 60 dedicated servers, about 40 multi-screen user interface machines installed in the control rooms, and various hardware and service monitoring machines as well. In the final configuration, the online computer farm will be capable of hosting tens of thousands of applications running simultaneously. The software distribution needs are met by a two-level NFS-based solution. The hardware and network management systems of ATLAS TDAQ are based on NAGIOS, with a MySQL cluster behind it for accounting and for storing the collected monitoring data, IPMI tools, the CERN LANDB and dedicated tools developed by the group, e.g. ConfdbUI. The user management scheme deployed in the TDAQ environment is founded on an LDAP-based authentication and role management system. External access to the Point 1 facilities is provided by means of gateways supplied with an accounting system as well. Current activities of the group include deployment of a centralized storage system, testing and validating hardware solutions for future use within the ATLAS TDAQ environment including new extreme multi-core blade servers, developing GUI tools for user authentication and role management, testing and validating the SLC5 64-bit OS, and upgrading the existing TDAQ hardware components, authentication servers and gateways.
        Speaker: Mr Alexander Zaytsev (Budker Institute of Nuclear Physics (BINP))
        Paper
        Poster
      • 431
        The ALICE Electronic Logbook
        All major experiments need tools that provide a way to keep a record of events and activities, both during commissioning and operations. In ALICE (A Large Ion Collider Experiment) at CERN, this task is performed by the ALICE Electronic Logbook (eLogbook), a custom-made application developed and maintained by the Data-Acquisition (DAQ) group. Started as a statistics repository, the eLogbook has evolved to become not only a fully functional electronic logbook, but also a massive information repository used to store the conditions and statistics of several online systems. It is currently used by more than 600 users in 30 different countries and plays an important role in daily ALICE collaboration activities. This paper will describe the LAMP (Linux, Apache, MySQL and PHP) based architecture of the eLogbook, the database schema and the relevance of the information stored in the eLogbook to the different ALICE actors, not only for near-real-time procedures but also for long-term data mining and analysis. It will also present the web interface, including the different technologies used, the implemented security measures and the current main features. Finally, it will present the roadmap for the future, including a migration to the web 2.0 paradigm, the handling of the database's ever-increasing data volume and the deployment of data-mining tools.
        Speaker: Vasco Chibante Barroso (CERN)
        Poster
      • 432
        The AliEn-OSG interface
        Once the ALICE experiment starts collecting data, it will gather up to 4 PB of information per year. The data will be analyzed in centers distributed all over the world. Each of these centers might have a different software environment. To be able to use all these resources in a similar way, ALICE has developed AliEn, a Grid layer that provides the same interface independently of the underlying technology. AliEn has plugins to access different Grid implementations. The latest plugin being developed allows ALICE to communicate with the Open Science Grid (OSG). The rest of this paper will present how the AliEn-OSG interface works, the tests that have been done so far, and the plans for the future.
        Speaker: Pablo Saiz (CERN)
        Poster
      • 433
        The ATLAS beam pick-up based timing system
        The ATLAS BPTX stations are comprised of electrostatic button pick-up detectors, located 175 m away along the beam pipe on both sides of ATLAS. The pick-ups are installed as a part of the LHC beam instrumentation and used by ATLAS for timing purposes. The usage of the BPTX signals in ATLAS is twofold; they are used both in the trigger system and for LHC beam monitoring. The ATLAS Trigger System is designed in three levels, each level sequentially refining the selection of events to be saved for further offline analysis. The BPTX signals are discriminated with a constant-fraction discriminator to provide a Level-1 trigger when a bunch passes through ATLAS. Furthermore, the BPTX detectors are used by a stand-alone monitoring system for the LHC bunches and timing signals. The BPTX monitoring software measures the phase between collisions and clock with high accuracy in order to guarantee a stable phase relationship for optimal signal sampling in the sub-detector front-end electronics. In addition to monitoring this phase, the properties of the individual bunches are measured and the structure of the beams is determined. On September 10th, 2008, the first LHC beams reached the ATLAS experiment. During this period with single beam, the ATLAS BPTX system was used extensively to time in the read-out of the sub-detectors. In this paper, we present the performance of the BPTX system with focus on the monitoring system and its measurements of the first LHC beams.
        Speaker: Christian Ohm (Department of Physics, Stockholm University)
        Poster
      • 434
        The ATLAS Distributed Data Management Dashboard
        The ATLAS Distributed Data Management (DDM) system is now at the point of focusing almost all of its effort on operations, after successfully delivering a high quality product which has proved to scale to the extreme requirements of the experiment users. The monitoring effort has followed the same path and is now focusing mostly on the shifters and experts operating the system. In this paper we present the new features that have been added to the ATLAS DDM Dashboard and which have made daily operations considerably simpler. These include the integration with the underlying user operation tools like eLog and GGUS, the collection of detailed site status information from all three distinct grids (EGEE, OSG and NDGF) and the merging of the different DDM activities into a unified interface. These additions have significantly improved the end-user experience. The evaluation of the service usage by shifters shows which areas are the most useful and popular, and where effort is still needed to achieve an even more automated system.
        Speaker: Ricardo Rocha (CERN)
      • 435
        The ATLAS MDT remote calibration centers
        The calibration of the ATLAS MDT chambers will be performed at remote sites, called Remote Calibration Centers. Each center will process the calibration data for the assigned part of the detector and send the results back to CERN for general use in the reconstruction and analysis within 24h from the calibration data taking. In this work we present the data extraction mechanism, the data transfer mechanism and the structure of the remote calibration centers. A particular focus will be given to the processing techniques in the calibration centers, the failover mechanisms and the process control system, called Local Calibration Data Splitter (LCDS). The full architecture has been successfully used during the cosmic data taking runs and has been proven to be powerful, robust and stable enough to cope with the real data taking. The preliminary results on the system performance, obtained during the cosmic data taking runs in 2008, will be discussed and the plans for the real data taking period will be presented.
        Speaker: Alessandro De Salvo (Istituto Nazionale di Fisica Nucleare Sezione di Roma 1)
        Poster
      • 436
        The CMS Event Builder and Storage System
        The CMS event builder assembles events accepted by the first level trigger and makes them available to the high-level trigger. The event builder needs to handle a maximum input rate of 100 kHz and an aggregated throughput of 100 GBytes/s originating from approximately 500 sources. This paper presents the chosen hardware and software architecture. The system consists of 2 stages: an initial pre-assembly reducing the number of fragments by one order of magnitude, and several independent Readout Builder (RU builder) slices. The RU builder is based on 3 separate services: the buffering of event fragments during the assembly, the event assembly, and the data flow manager. A further component is responsible for handling events accepted by the high-level trigger: the Storage Manager (SM) temporarily stores the events on disk at a peak rate of 2 GBytes/s until they are permanently archived offline. In addition, events and data-quality histograms are served by the SM to online monitoring clients. We discuss the operational experience from the first months of reading out cosmic ray data with the complete CMS detector.
        Speaker: Remigius K Mommsen (FNAL, Chicago, Illinois, USA)
        Poster
      • 437
        The commissioning of CMS sites: improving the site reliability
        The computing system of the CMS experiment works using distributed resources from more than 60 computing centres worldwide. These centres, located in Europe, America and Asia, are interconnected by the Worldwide LHC Computing Grid. The operation of the system requires a stable and reliable behaviour of the underlying infrastructure. CMS has established a procedure to extensively test all relevant aspects of a Grid site, such as the ability to efficiently use their network to transfer data, the functionality of all the site services relevant for CMS and the capability to sustain the various CMS computing workflows (Monte Carlo simulation, event reprocessing and skimming, data analysis) at the required scale. This contribution describes in detail the procedure to rate CMS sites depending on their performance, including the complete automation of the program, the description of the monitoring tools, and its impact on improving the overall reliability of the Grid from the point of view of the CMS computing system.
        Speaker: Dr Jose Flix Molina (Port d'Informació Científica, PIC (CIEMAT - IFAE - UAB), Bellaterra, Spain)
        Poster
      • 438
        The Gatherer - a mechanism for integration of monitoring data in ATLAS TDAQ
        The ATLAS experiment's data acquisition system is distributed across the nodes of large farms. Online monitoring and data quality assessment run alongside this system. A mechanism is required that integrates the monitoring data from different nodes and makes it available to the shift crews. This integration includes, but is not limited to, the summation or averaging of histograms and the summation of trigger rates. A prototype of the central component ('Gatherer') in this mechanism was designed in 2004. Extensive testing in subsequent running in 2008 has led to a substantial reimplementation of the Gatherer. Performance milestones have been achieved which ensure the needs of early data taking will be met. We will present a detailed description of the architectural features and performance of the current Gatherer. A toy illustration of the integration operations follows this entry.
        Speaker: Mr Yuriy Ilchenko (SMU)
        Poster
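        The toy illustration below shows the two integration operations named in the abstract - bin-wise summation (or averaging) of histograms and summation of trigger rates - using plain Python lists as stand-ins for the online histogram objects; it is not the Gatherer code itself.
          # Bin-wise summation/averaging of equally-binned histograms and summation
          # of trigger rates from many nodes (lists used as stand-in histograms).
          def sum_histograms(histograms):
              nbins = len(histograms[0])
              assert all(len(h) == nbins for h in histograms)
              return [sum(h[i] for h in histograms) for i in range(nbins)]

          def average_histograms(histograms):
              total = sum_histograms(histograms)
              return [x / len(histograms) for x in total]

          def total_rate(per_node_rates):
              return sum(per_node_rates)

          # sum_histograms([[1, 2, 3], [0, 4, 1]])  ->  [1, 6, 4]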
      • 439
        The gLite Workload Management System
        The gLite Workload Management System (WMS) has been designed and developed to represent a reliable and efficient entry point to high-end services available on a Grid. The WMS translates user requirements and preferences into specific operations and decisions - dictated by the general status of all the other Grid services it interoperates with - while taking responsibility for bringing requests to successful completion. The WMS, conceived to be a robust and scalable service, implements an early-binding approach to meta-scheduling as a neat solution, able to optimise resource access and to satisfy requests for computation together with data. Thanks to the modularity of its design, it can be deployed in different physical layouts according to specific needs. Several added-value features are provided on top of job submission, and different job types are supported, from simple batch jobs to a variety of compound types, all described in this paper. As of late 2008, activity on the WMS has been addressing interoperability with NorduGrid and UNICORE, supporting the JSDL specifications while contributing to the GIN profile design, reviewing support for MPI jobs in a more flexible way and enabling submission to the CREAM CE. Code restructuring has been finalised to accommodate IPv6 compliance and delegation 2.0.0, also bringing a more stable and lightweight User Interface. Portability has been favoured by the transition to ETICS. This paper reports on the present and upcoming releases - the latter being characterized by improved responsiveness and higher throughput. Short- to middle-term plans will be detailed.
        Speaker: Dr Marco Cecchi (INFN)
        Poster
      • 440
        The Service Level Status monitoring of the LHC Experiments Distributed Computing
        This contribution describes how part of the monitoring of the services used in the computing systems of the LHC experiments has been integrated with the Service Level Status (SLS) framework. The LHC experiments are using an increasing number of complex and heterogeneous services: SLS makes it possible to group all these different services and to report their status and availability in a web-based display. It dynamically shows availability, basic information and statistics about these services, as well as their dependencies. The SLS framework has been developed by the CERN-IT/FIO group and currently deals with more than 350 services, including administrative applications, physics and infrastructure services, Grid-related and experiment-specific services. SLS can produce different views for different end users. The service parameter set is highly customizable via a user-friendly XML format and can include subservices and various thresholds to generate alarms of increasing severity; a sketch of this threshold idea follows this entry. Historical data is made available via the web. All information is also retrievable via a programmatic interface and can be imported into other visualization tools, such as Gridmap. SLS is now effectively used to monitor the status of the LHC experiments' services.
        Speaker: Dr Alessandro Di Girolamo (CERN)
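        The snippet below sketches the threshold idea only: a numeric service parameter is compared against successively looser limits to derive an alarm level. The parameter, limits and level names are invented; the real definitions live in the SLS XML configuration.
          # Map a measured availability value to an alarm level via ordered thresholds.
          THRESHOLDS = [            # (lower limit, level), checked in order
              (0.95, "available"),
              (0.80, "degraded"),
              (0.00, "unavailable"),
          ]

          def alarm_level(availability):
              for limit, level in THRESHOLDS:
                  if availability >= limit:
                      return level
              return "unavailable"

          # alarm_level(0.90) -> "degraded"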
      • 441
        The square root of serving multiple experiments is a single dCache
        All four LHC experiments are served by GridKa, the German WLCG Tier-1 at the Steinbuch Centre for Computing of the Karlsruhe Institute of Technology (KIT). Each of the experiments requires a significantly different setup of the dCache data management system. Therefore the use of a single dCache instance for all experiments can have negative effects at different levels, e.g. SRM, space manager and metadata (PNFS). For reasons of performance, operation and administration, GridKa started in 2008 to prepare the splitting of the dCache storage management system into four instances, each serving a single VO. As a first step, after stringent planning, the ATLAS VO is migrated to its own dCache instance. Several checks are made to ensure consistency between the physical and logical data contents, because a total of three different databases are involved in the conversion. The procedures, the tools which have been used and the planning for the ATLAS dCache fork are described, followed by first evaluations of the split environment. Also, the mapping of the different computing models onto the current dCache setup at GridKa is presented.
        Speaker: Dr Doris Ressmann (Karlsruher Institut of Technology)
        Poster
      • 442
        Tile DCS Web System
        The web system described here provides functionality for monitoring the data acquired by the Detector Control System (DCS). The DCS is responsible for overseeing the coherent and safe operation of the ATLAS experiment hardware. In the context of the Hadronic Tile Calorimeter Detector, it controls the power supplies of the readout electronics, acquiring voltage, current, temperature and coolant pressure measurements. Physics data taking requires the stable operation of the power sources. The DCS Web System retrieves data automatically and processes it, extracting statistics for given periods of time. The mean and standard deviation outcomes are stored as XML files and compared to preset thresholds; a sketch of this statistics step follows this entry. Further, a graphical representation of the TileCal barrels indicates the state of the supply system of each detector drawer. Colors are designated for each kind of state, so that problems are easier to find and collaboration members can focus on them. The user can pick a module to see detailed information. It is possible to check the statistics and generate charts of the parameters over time. The DCS Web System also gives information about the latest status of the power supplies: a barrel is colored green whenever its supply system is on, and red otherwise. Furthermore, it is possible to perform customized analyses. The system provides search interfaces where the user can set the module, the parameters, and the time period of interest. Moreover, it produces the output of the retrieved data as charts, XML, CSV or ROOT files, according to the user's choice.
        Speaker: Mr Fernando Guimaraes Ferreira (Univ. Federal do Rio de Janeiro (UFRJ))
        Poster
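        As a sketch of the statistics step described above, the snippet below computes the mean and standard deviation of a monitored parameter over a period, compares the mean with preset limits and emits a small XML summary. The parameter name, limits and XML layout are illustrative assumptions, not the actual Tile DCS format.
          # Compute per-period statistics, compare with limits and emit an XML summary.
          import statistics
          import xml.etree.ElementTree as ET

          def summarize(name, samples, limits):
              mean = statistics.mean(samples)
              std = statistics.stdev(samples) if len(samples) > 1 else 0.0
              low, high = limits
              status = "ok" if low <= mean <= high else "alarm"
              node = ET.Element("parameter", name=name, status=status)
              ET.SubElement(node, "mean").text = "%.3f" % mean
              ET.SubElement(node, "stddev").text = "%.3f" % std
              return ET.tostring(node, encoding="unicode")

          # summarize("drawer_LV_temperature", [41.8, 42.1, 42.0], (35.0, 45.0))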
      • 443
        Tools for offline access and visualization of ATLAS online control and data quality databases
        Data describing the conditions of the ATLAS detector and the Trigger and Data Acquisition system are stored in the Conditions DataBases (CDB), and may range from simple values to complex objects such as online system messages or monitoring histograms. The CDB are deployed on COOL, a common infrastructure for reading and writing conditions data. Conditions data produced online are saved to an intermediate file-based buffer called ONASIC, relieving possible pressure on the online information bus. Configuration data are managed as OKS classes and instances, which are made persistent in the CDB by the OKS2COOL application. At the end of each run, monitoring histograms are stored by a collector process, and references to the histograms' locations are stored in the CDB. NODE is an application capable of reading back the histogram information from the databases, fetching the histograms and presenting them to the user. The three applications developed by our group - ONASIC, OKS2COOL and NODE - share an underlying database API, TIDB2, characterized by its multi-backend plugins and its orientation towards handling scientific objects. Databases created by TIDB2-based tools can be browsed through a graphical application called KTIDBExplorer. There are standing issues in accessing conditions data stored in the CDB by ONASIC and OKS2COOL from the ATLAS offline framework ATHENA. The design of future interfaces to recreate usable detector configurations needs to be evaluated. NODE also presents constraints to be overcome. This paper describes solutions to these problems, along with recent developments of the ONASIC, OKS2COOL and NODE applications, as well as new features and functionalities of the underlying TIDB2 API and explorer.
        Speaker: Mr Lourenço Vaz (LIP - Coimbra)
        Poster
      • 444
        Towards Sustainability: An Interoperability Outline for a Regional ARC based infrastructure in the WLCG and EGEE infrastructures
        Interoperability of grid infrastructures is becoming increasingly important with the emergence of large-scale grid infrastructures based on national and regional initiatives. To achieve interoperability of grid infrastructures, adaptations and bridging of many different systems and services need to be tackled. A grid infrastructure offers services for authentication, authorization, accounting, monitoring and operation, in addition to the services for handling data and computations. This paper presents an outline of the work done to integrate the Nordic Tier-1 and Tier-2 sites, which for the compute part are based on the ARC middleware, into the WLCG grid infrastructure co-operated by the EGEE project. In particular, a thorough description of the integration of the compute services will be presented.
        Speaker: Dr Josva Kleist (Nordic Data Grid Facility)
        Poster
      • 445
        Utilizing Lustre Filesystem with dCache for CMS Analysis
        The CMS experiment is expected to produce a few petabytes of data a year and to distribute them globally. Within the CMS computing infrastructure, most user analyses and the production of Monte Carlo events will be carried out at some 50 CMS Tier-2 sites. How to store the data and allow physicists to access them efficiently has been a challenge, especially for Tier-2 sites with limited storage resources. The CMS experiment, like the other LHC experiments, has been using dCache successfully to manage and distribute large amounts of data. However, since dCache lacks POSIX file access and access via the dCache dcap protocol is relatively slow, there are issues when a large number of users try to access the same files simultaneously. In this paper, we present our new implementation that continues to use dCache as the frontend for data management and distribution and uses the Lustre filesystem as the backend to provide users with direct POSIX file access without going through the dCache file read protocol. The implementation fully utilizes the dCache HSM interface with additional functionality for mapping files between dCache and the Lustre filesystem. Running simple IO-intensive ROOT file dumper analysis jobs shows that jobs process data over 60% faster when the data are read through the Lustre filesystem than when the same data are stored in dCache. Furthermore, Lustre allows users to mount the filesystem remotely, which also provides an alternative way of data access for regional Tier-3 sites. We believe this implementation will bring both an efficient file access technique and flexibility of data hosting in an environment where storage resources are limited.
        Speakers: Prof. Jorge Rodiguez (Florida International University), Dr Yujun Wu (University of Florida)
        Poster
      • 446
        WLCG-specific special features in GGUS
        The user and operations support of the EGEE series of projects can be captioned "regional support with central coordination". Its central building block is the GGUS portal, which acts as an entry point for users and support staff. It also serves as an integration platform for the distributed support effort. As WLCG relies heavily on the EGEE infrastructure, it is important that the support infrastructure covers the WLCG use cases of the grid. During the last year several special features have been implemented in the GGUS portal to meet the requirements of the LHC experiments needing to contact the WLCG grid infrastructure, especially their Tier 1 and Tier 2 centres. This presentation will summarise these special features, with particular focus on the alarm and team tickets and the direct ticket routing, in the context of the overall user and operations support infrastructure. Additionally, we will present the management processes for the user support activity, detailing the options which the LHC VOs have to participate in this process. An outlook will be given on how the user support activity will evolve towards the EGI/NGI model without disrupting the production-quality service provided by EGEE for WLCG.
        Speaker: Torsten Antoni (GGUS, KIT-SCC)
        Poster
    • Plenary: Thursday Congress Hall

      Congress Hall

      Prague

      Prague Congress Centre 5. května 65, 140 00 Prague 4, Czech Republic

      Live broadcasting at:
      http://prenosy.cesnet.cz/

      Convener: Ludek Matyska (CESNET)
      • 447
        The challenge of adapting HEP physics software applications to run on many-core cpus
        Computing in recent years has been characterized by the advent of multi-core CPUs. Effective exploitation of this new kind of computing architecture requires the adaptation of legacy software and eventually a shift of programming paradigms towards massive parallelism. In this talk we will introduce the reasons that brought about the introduction of multi-core hardware and the consequences for system and application software. The activities initiated in HEP to adapt current software will be reviewed before presenting the perspective for the future.
        Speaker: Prof. Vincenzo Innocente (CERN)
        Slides
        Video
      • 448
        Distributed Data Analysis and Tools
        Speaker: Dr Pere Mato (CERN)
        Slides
        Video
      • 449
        The Data Preservation and Long Term Analysis in HEP
        The high energy physics experiments collect data over long periods of time and exploit these data to produce physics publications. The scientific potential of an experiment is in principle defined and exhausted during the collaboration lifetime. However, continuous improvements of the scientific grounds - theory, experiment, simulation, new ideas or unexpected discoveries - may lead to the need to re-analyse the old data. Relevant examples of such analyses exist and are likely to become more frequent in the future. Indeed, while the experimental complexity and the associated costs have continuously increased, many of the present experiments, in particular the ones related to colliders, will provide unique data sets, not likely to be improved upon in the near future. The physics motivation and the technological and strategic aspects of data preservation will be discussed. A review of the present status will be presented, together with a recent collaborative effort towards data preservation in high energy physics.
        Speaker: Dr Cristinel Diaconu (CPPM IN2P3)
        Slides
        Video
    • 10:30
      coffee break, exhibits and posters
    • Summary: Thursday Congress Hall

      Congress Hall

      Prague

      Prague Congress Centre 5. května 65, 140 00 Prague 4, Czech Republic
      Convener: Randall Sobie (Triumf/Victoria)
      • 450
        Summary of the WLCG Collaboration workshop 21-22 March
        This talk will summarize the main points that were discussed - and where possible agreed - at the WLCG Collaboration workshop held in Prague during the weekend immediately preceding CHEP. The list of topics for the workshop includes: * An analysis of the experience with WLCG services from 2008 data taking and processing; * Requirements and schedule(s) for 2009; * Readiness for 2009
        Speakers: Dr Harry Renshall (CERN), Dr Jamie Shiers (CERN)
        Slides
        Video
        Workshop agenda
      • 451
        Summary: Collaborative Tools
        Speaker: Eva Hladka (CESNET)
        Slides
        Video
      • 452
        Summary: Hardware and Computing Fabrics
        Speaker: Sasaki Takashi (KEK)
        Slides
        Video
    • 13:00
      lunch
    • Distributed Processing and Analysis: Thursday Club C

      Club C

      Prague

      Prague Congress Centre 5. května 65, 140 00 Prague 4, Czech Republic
      Convener: Joel Snow (Langston University)
      • 453
        The ATLAS Tier-0: Overview and Operational Experience
        Within the ATLAS hierarchical, multi-tier computing infrastructure, the Tier-0 centre at CERN is mainly responsible for the prompt processing of the raw data coming from the online DAQ system, for archiving the raw and derived data on tape, for registering the data with the relevant catalogues and for distributing them to the associated Tier-1 centres. The Tier-0 is already fully functional. It has successfully participated in all cosmics and commissioning data taking since May 2007, and was ramped up to its foreseen full size, performance and throughput for the cosmics (and short single-beam) run periods between July and October 2008. Data and work flows for collision data taking were exercised in several "Full Dress Rehearsals" (FDRs) in the course of 2008. The transition from an expert-based to a shifter-based system was successfully established in July 2008. This presentation will give an overview of the Tier-0 system, its data and work flows, and its operations model. It will review the operational experience gained in cosmics, commissioning, and FDR exercises during the past year, and it will give an outlook on planned developments and the evolution of the system towards first collision data taking, expected in spring 2009.
        Speaker: Dr Guido NEGRI (CERN)
        Slides
      • 454
        A Comparison of Data-Access Platforms for BaBar and ALICE analysis computing model at the Italian Tier1
        Performance, reliability and scalability in data access are key issues in the context of Grid computing and High Energy Physics (HEP) data analysis. We present the technical details and the results of a large-scale validation and performance measurement achieved at the INFN Tier-1, the central computing facility of the Italian National Institute for Nuclear Research (INFN). The aim of this work is the evaluation of data access activity during analysis tasks within the BaBar and ALICE computing models against two of the most widely used data handling systems in the HEP scenario: GPFS and Scalla/Xrootd.
        Speaker: Fabrizio Furano (Conseil Europeen Recherche Nucl. (CERN))
        Slides
      • 455
        Remote Operation of the global CMS Data and Workflows
        CMS' infrastructure to process, store and analyze data is based on worldwide distributed tiers of computing resources. Monitoring and troubleshooting of all parts of the computing infrastructure, and importantly of the experiment-specific data flows and workflows running on this infrastructure, is essential to guarantee timely delivery of processed data to the physicists. This is especially important during startup and commissioning, where the software and also the computing systems are not yet completely well behaved. This talk will present the operation, monitoring and troubleshooting of the global CMS data and workflow infrastructure from the Fermilab Remote Operation Center (ROC). It will put an emphasis on the description of remote operation protocols and procedures developed during the first cosmics data taking periods. The talk will point out the problems of being physically separated from the infrastructure and the detector operation. It will stress the importance of a well designed infrastructure with a multitude of communication possibilities, from phone calls to state-of-the-art video connections. It will also point out the advantage of being able to provide operational support outside European working hours without putting too much load on the shift personnel. Overall, the talk will describe the success story of remote operation from Fermilab and give recipes for similar future projects.
        Speaker: Dr David Mason (FNAL)
        Slides
      • 456
        User analysis of LHCb data with Ganga
        Ganga (http://cern.ch/ganga) is a job-management tool that offers a simple, efficient and consistent user experience in a variety of heterogeneous environments: from local clusters to global Grid systems. Experiment-specific plugins allow Ganga to be customised for each experiment. This paper will describe these LHCb plugins of Ganga. For LHCb users, Ganga is the job submission tool of choice for the Grid, as the LHCb-specific plugins support an end-to-end analysis, helping the user to perform the complete analysis with the help of Ganga. This starts with support for data selection, where a user can select datasets from the LHCb Bookkeeping system, and continues with the setup of large analysis jobs using tailored plugins for the LHCb core software, where jobs are managed by splitting the analysis and subsequently merging the result files. Furthermore, Ganga offers support for toy Monte Carlos to help users tune their analyses. In addition to describing the Ganga architecture, typical usage patterns within LHCb and experience with the updated LHCb DIRAC 3 WMS will be shown. A sketch of a typical Ganga session follows this entry.
        Speaker: Dr Andrew Maier (CERN)
        Slides
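        Ganga sessions are plain Python; the sketch below shows the end-to-end pattern described above (data selection, job splitting, merging, Grid submission) for an LHCb-style job. It is meant to be typed inside a Ganga session, where these classes are pre-defined; class names, options and the dataset path are illustrative and should be checked against the Ganga/LHCb documentation.
          # Illustrative LHCb-style Ganga job (run inside a Ganga session, where
          # Job, DaVinci, LHCbDataset, SplitByFiles, RootMerger and Dirac exist).
          j = Job(name="my-analysis")
          j.application = DaVinci(optsfile="myAnalysis.py")            # analysis application options
          j.inputdata   = LHCbDataset(["LFN:/lhcb/path/to/file.dst"])  # placeholder dataset
          j.splitter    = SplitByFiles(filesPerJob=10)                 # one subjob per group of files
          j.merger      = RootMerger(files=["histos.root"])            # merge subjob outputs
          j.backend     = Dirac()                                      # submit via the LHCb DIRAC WMS
          j.submit()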
      • 457
        Functional and Large-Scale Testing of the ATLAS Distributed Analysis Facilities with Ganga
        Effective distributed user analysis requires a system which meets the demands of running arbitrary user applications on sites with varied configurations and availabilities. Tracking such a system requires a tool not only to monitor the functional status of each grid site, but also to perform large-scale analysis challenges on the ATLAS grids. This work presents one such tool, the ATLAS GangaRobot, and the results of its use in tests and challenges. For functional testing, the GangaRobot performs daily tests of all sites; specifically, a set of exemplary applications are submitted to all sites and then monitored for success and failure conditions. These results are fed back into Ganga to improve job placement by avoiding currently problematic sites. For analysis challenges, a cloud is first prepared by replicating a number of desired DQ2 datasets across all the sites. Next, the GangaRobot is used to submit and manage a large number of jobs targeting these datasets. The high loads resulting from multiple parallel instances of the GangaRobot expose shortcomings in storage and network configurations. The results from a series of cloud-by-cloud analysis challenges starting in fall 2008 are presented.
        Speaker: Daniel Colin Van Der Ster (Conseil Europeen Recherche Nucl. (CERN))
        Slides
      • 458
        Portal for VO-specific SAM tests and VO-customised site availability
        WLCG relies on the SAM (Service Availability Monitoring) infrastructure to monitor the behaviour of sites and as a powerful debugging tool. SAM is also used by individual experiments and VOs (Virtual Organisations) to submit application-specific tests to the grid. This degree of specificity implies additional requirements in terms of visualisation and manipulation of the test results provided by SAM. Our portal, built on top of SAM, provides a clear and precise view of the test results and makes it possible for VO managers to define several groups of tests, all critical for a specific task. In addition, availability metrics are extracted out of the "raw" test results, giving the opportunity to the experiments to evaluate a site's performance with regard to a given application. We will describe the application itself and the additional functionalities it provides to all the VOs, as well as its use for the experiments' distributed computing operations and for the site administrators.
        Speaker: Pablo SAIZ (CERN)
        Slides
    • Event Processing: Thursday Club E

      Club E

      Prague

      Prague Congress Centre 5. května 65, 140 00 Prague 4, Czech Republic
      Convener: Wolfgang Waltenberger (Institut fuer Hochenergiephysik (HEPHY)-Oesterreichische Akademi)
      • 459
        Ring Recognition and Electron Identification in the RICH detector of the CBM Experiment at FAIR
        The Compressed Baryonic Matter (CBM) experiment at the future FAIR facility at Darmstadt will measure dileptons emitted from the hot and dense phase in heavy-ion collisions. In the case of an electron measurement, a high purity of identified electrons is required in order to suppress the background. Electron identification in CBM will be performed by a Ring Imaging Cherenkov (RICH) detector and Transition Radiation Detectors (TRD). In this contribution we present algorithms and software which have been developed for electron identification in CBM. Efficient and fast ring recognition in the RICH detector is based on the Hough transform method, which has been accelerated considerably compared to a standard implementation (a schematic sketch of the basic idea follows this entry). Ring quality selection is done using an artificial neural network, which is also used for electron identification. Due to optical distortions, ellipse fitting and radius correction routines are used for improved ring radius resolution. These methods allow for a high purity and efficiency of reconstructed electron rings. For momenta above 2 GeV/c, the ring reconstruction efficiency for electrons embedded in central Au+Au collisions at 25 AGeV beam energy is 95%, resulting in an electron identification efficiency of 90% at a pion suppression factor of 500. Including information from the TRD, a pion suppression of 10000 is reached at 80% efficiency. The developed algorithm is very robust in a high ring density environment. Current work focuses on detector layout studies in order to optimize the detector setup while keeping a high performance.
        Speaker: Semen Lebedev (GSI, Darmstadt / JINR, Dubna)
        Slides
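        The snippet below is only the textbook form of the Hough transform ring search mentioned above: each hit votes for possible ring centres lying on a circle of the nominal radius around it, and accumulator cells with many votes become ring-centre candidates. The CBM implementation is a heavily optimized C++ version with ellipse fitting on top; the radius, cell size and vote threshold here are arbitrary.
          # Textbook Hough transform for rings of (roughly) known radius.
          import math
          from collections import Counter

          def hough_ring_centres(hits, radius, cell=0.5, n_angles=64, min_votes=12):
              acc = Counter()
              for (x, y) in hits:
                  for k in range(n_angles):
                      phi = 2.0 * math.pi * k / n_angles
                      cx = round((x + radius * math.cos(phi)) / cell)
                      cy = round((y + radius * math.sin(phi)) / cell)
                      acc[(cx, cy)] += 1          # hit votes for this candidate centre
              return [(cx * cell, cy * cell) for (cx, cy), votes in acc.items()
                      if votes >= min_votes]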
      • 460
        Vertex finding in pile-up rich event for p+p and d+Au collisions at STAR
        Vertex finding is an important part of accurately reconstructing events at STAR, since many physics parameters, such as the transverse momentum of primary particles, depend on the vertex location. Many analyses depend on trigger selection and require an accurate determination of where the interaction that fired the trigger occurred. Here we present two vertex finding methods, the Pile-Up Proof Vertexer (PPV) and a Minuit-based vertexer, and their performance on the 2008 STAR p+p and d+Au data. PPV was developed for use in p+p collisions and uses a 1D truncated log-likelihood method to determine the most probable Z location of the vertex (a schematic sketch of this idea follows this entry). The Minuit vertex finder was developed for Au+Au events and uses the mean dip angle to determine the vertex location. The heart of the Minuit finder is the routine of that name, a tool that finds the minimum value of a multi-parameter function. We will present efficiency versus charged track multiplicity as well as the efficiency for both forward and mid-rapidity triggers. A comparison to a hardware determination of the vertex will also be included.
        Speaker: Mrs Rosi REED (University of California, Davis)
        Paper
        Slides
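        The snippet below is a schematic reading of a 1D truncated log-likelihood scan, not the STAR PPV code: each track contributes a Gaussian log-likelihood in z, truncated from below so that pile-up tracks far from a candidate vertex cannot dominate the sum; the z position maximizing the sum is taken as the vertex. The Gaussian track model, scan range and truncation value are simplifying assumptions.
          # Schematic 1D truncated log-likelihood vertex scan along the beam line.
          def scan_vertex_z(tracks, z_range=(-200.0, 200.0), step=0.5, floor=-5.0):
              """tracks: list of (z0, sigma_z); return the z maximizing the truncated sum."""
              best_z, best_ll = None, float("-inf")
              z = z_range[0]
              while z <= z_range[1]:
                  ll = sum(max(-0.5 * ((z - z0) / sz) ** 2, floor) for z0, sz in tracks)
                  if ll > best_ll:
                      best_z, best_ll = z, ll
                  z += step
              return best_z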
      • 461
        A framework for vertex reconstruction in the ATLAS experiment at LHC
        In anticipation of the first LHC data, a considerable effort has been devoted to ensuring the efficient reconstruction of vertices in the ATLAS detector. This includes the reconstruction of photon conversions, long-lived particles and secondary vertices in jets, as well as the finding and fitting of primary vertices. The implementation of the corresponding algorithms requires a modular design based on the use of abstract interfaces and a common Event Data Model. An enhanced software framework addressing various physics applications of vertex reconstruction has been developed in the ATLAS experiment. The general principles of this framework will be presented. A particular emphasis will be given to the description of the concrete implementations, which are dedicated to diverse methods of vertex reconstruction, and to their expected performance with the early data of ATLAS.
        Speaker: Dr Kirill Prokofiev (CERN)
        Slides
      • 462
        An overview of the b-Tagging algorithms in the CMS Offline software
        The CMS offline software contains a widespread set of algorithms to identify jets originating from the weak decay of b-quarks. Different physical properties of b-hadron decays, such as lifetime information, secondary vertices and soft leptons, are exploited. The variety of selection algorithms ranges from simple and robust ones, suitable for early data taking and online environments such as the trigger system, to highly discriminating ones exploiting all the information available. For the latter, a generic discriminator computing framework has been developed that allows the full power of multivariate analysis techniques to be exploited in a flexible way.
        Speaker: Dr Andrea Bocci (Università and INFN, Pisa)
        Slides
      • 463
        Ideal tau tagging with TMVA multivariate data-analysis toolkit
        We report our experience of using the ROOT package TMVA for multivariate data analysis, for the problem of tau tagging in the framework of heavy charged MSSM Higgs boson searches at the LHC. With a generator-level analysis, we investigate how, in the ideal case, tau tagging could be performed and hadronic tau decays separated from the hadronic jets of the QCD multi-jet background present in LHC experiments. A successful separation of the Higgs signal from the background requires a rejection factor of 10^5 or better against the QCD background. The tau tagging efficiency and background rejection are studied with various MVA classifiers. A minimal booking sketch for such a classifier follows this entry.
        Speaker: Mr Aatos Heikkinen (Helsinki Institute of Physics)
        Slides
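        A minimal PyROOT booking sketch in the spirit of the study above is given below, using the Factory interface of ROOT 5-era TMVA. Tree and variable names, input files and options are placeholders, and the exact interface should be checked against the TMVA users guide for the version in use.
          # Minimal TMVA classification booking via PyROOT (names are placeholders).
          import ROOT

          out = ROOT.TFile("tmva_tau.root", "RECREATE")
          factory = ROOT.TMVA.Factory("TauTagging", out, "!V")

          for var in ("rtau", "isolation", "n_signal_tracks"):   # hypothetical discriminants
              factory.AddVariable(var, "F")

          sig = ROOT.TFile.Open("signal.root").Get("TauTree")    # placeholder input trees
          bkg = ROOT.TFile.Open("qcd.root").Get("TauTree")
          factory.AddSignalTree(sig, 1.0)
          factory.AddBackgroundTree(bkg, 1.0)
          factory.PrepareTrainingAndTestTree(ROOT.TCut(""), "SplitMode=Random")
          factory.BookMethod(ROOT.TMVA.Types.kBDT, "BDT", "NTrees=400")
          factory.TrainAllMethods()
          factory.TestAllMethods()
          factory.EvaluateAllMethods()
          out.Close()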
      • 464
        Offline computing for the Minerva experiment
        The Minerva experiment is a small, fully active neutrino experiment which will run in 2010 in the NuMI beamline at Fermilab. Its offline computing is based on the GAUDI framework. The small Minerva software development team has used the GAUDI code base to produce a functional software environment for the simulation of neutrino interactions generated by the GENIE generator, and for the analysis and display of real data from test beams and a cosmic ray test of 10% of the final detector.
        Speaker: Heidi Schellman (Northwestern University)
        Slides
    • Grid Middleware and Networking Technologies: Thursday Panorama

      Panorama

      Prague

      Prague Congress Centre 5. května 65, 140 00 Prague 4, Czech Republic
      • 465
        PhEDEx Data Service
        The PhEDEx Data Service provides access to information from the central PhEDEx database, as well as certificate-authenticated managerial operations such as requesting the transfer or deletion of data. The Data Service is integrated with the 'SiteDB' service for fine-grained access control, providing a safe and secure environment for operations. A plugin architecture allows server-side modules to be developed rapidly and easily by anyone familiar with the schema, and the data can be returned automatically in a variety of formats for use by different client technologies (a sketch of a simple read-only query follows this entry). Using HTTP access via the Data Service instead of direct database connections makes it possible to build monitoring web pages with complex drill-down operations, suitable for debugging or presentation from many aspects. This will form the basis of the new PhEDEx website in the near future, as well as providing access to PhEDEx information and certificate-authenticated services for other CMS dataflow and workflow management tools such as CRAB, WMCore, DBS and the dashboard. A PhEDEx command-line client tool provides one-stop access to all the functions of the PhEDEx Data Service interactively, for use in simple scripts that do not access the service directly. The client tool provides certificate-authenticated access to managerial functions, so all the functions of the PhEDEx Data Service are available to it. The tool can be expanded by plugins which can combine or extend the client-side manipulation of data from the Data Service, providing a powerful environment for manipulating data within PhEDEx.
        Speaker: Mr Ricky Egeland (Minnesota)
        Slides
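        The snippet below sketches a simple read-only query over HTTP, asking the Data Service for block replica information in JSON and drilling into the reply. The base URL, API name and JSON layout are assumptions to be checked against the PhEDEx Data Service documentation; certificate-authenticated managerial calls are not covered here.
          # Read-only query against the (assumed) PhEDEx data service JSON interface.
          import json
          import urllib.parse
          import urllib.request

          BASE = "https://cmsweb.cern.ch/phedex/datasvc/json/prod/"   # assumed endpoint

          def datasvc(api, **params):
              url = BASE + api + "?" + urllib.parse.urlencode(params)
              with urllib.request.urlopen(url, timeout=60) as response:
                  return json.loads(response.read())["phedex"]

          # reply = datasvc("blockreplicas", dataset="/Primary/Processed/TIER")  # placeholder dataset
          # for block in reply.get("block", []):
          #     print(block["name"], block["files"])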
      • 466
        Data Management Evolution and Strategy at CERN
        Data management components at CERN form the backbone for the production and analysis activities of the experiments at the LHC accelerator. Significant data volumes (15 PB/y) will need to be collected from the online systems, reconstructed and distributed to other sites participating in the Worldwide LHC Computing Grid for further analysis. More recently, significant resources to support local analysis at CERN have also been requested by several experiments, which poses additional requirements on the data management components; these have been extensively tested during the recent combined computing tests of the experiments. In this contribution we report on the experience gained with the existing system and summarise the key requirements for supporting efficient end-user analysis. We will describe the developments which resulted in a first analysis system at CERN and will outline the steps to evolve this system, together with the experiment and deployment requirements, towards the full LHC production phase. We will also outline the longer-term vision towards modular data management components, taking into account the expected medium-term technology and media changes.
        Speaker: Mr Alberto Pace (CERN)
        Poster on Monitoring
        Poster on Tape improvements
        Slides
      • 467
        Data Management in EGEE
        Data management is one of the cornerstones of the distributed production computing environment that the EGEE project aims to provide for an e-Science infrastructure. We have designed and implemented a set of services and client components addressing the diverse requirements of all user communities. The LHC experiments, as the main users, will generate and distribute approximately 15 PB of data per year worldwide using this infrastructure. Another key user community, biomedical projects, has strict security requirements with less emphasis on the volume of data. We maintain three service groups for grid data management: the Disk Pool Manager (DPM) Storage Element (with more than 100 instances deployed world-wide), the LCG File Catalogue (LFC) and the File Transfer Service (FTS), which sustains an aggregated transfer rate of 1.5 GB/s. They are complemented by individual client components and also tools which help coordinate more complex use cases involving multiple services (GFAL-client, lcg_util, hydra-cli). In this paper we show how these services, keeping clean and standard interfaces among each other, can work together to cover the data flow, and how they can be used as individual components to cover diverse requirements. We will also describe areas that we consider for further improvement, both in performance and in functionality.
        Speaker: Ákos Frohner (CERN)
        Slides
      • 468
        dCache ready for LHC production and analysis.
        At the time of CHEP'09, the LHC Computing Grid approach and implementation is rapidly approaching the moment it finally has to prove its feasibility. The same is true for dCache, the grid middle-ware storage component, meant to store and manage the largest share of LHC data outside of the LHC Tier 0. This presentation will report on the impact of recently deployed dCache sub-components, enabling this Storage Element for final LHC production data taking, reconstruction and analysis. We will elaborate on performance improvements caused by redesigned dCache subsystems like the new dCache name space provider (Chimera), a revised SRM front-end and others. Furthermore we will touch on new functionality in dCache, requested by the LHC experiments to simplify large scale Grid Data Management. Most prominent in this area certainly is the introduction of Access Control Lists (ACLs) for the Chimera name space, the protection of SRM spaces as well as shielding the robotic tape system from malicious or accidental misuse by non production VO members. We will present first ideas on the implementation of a generalized group and user quota system in dCache, seamlessly interacting with the SRM space management sub-component. Finally we would like to discuss dCache solutions for the next big challenge in the LHC computing world, the data analysis. In this context we will present a comparison between legacy local data access protocols and modern industry standards e.g. NFS4.1.
        Speaker: Dr Patrick Fuhrmann (DESY)
        Slides
      • 469
        On StoRM performance and scalability
        StoRM is a Storage Resource Manager (SRM) service adopted in the context of WLCG to provide data management capabilities on high-performing cluster and parallel file systems such as Lustre and GPFS. The experience gained in the readiness challenges of the LHC Grid infrastructure proves that scalability and performance of SRM services are key characteristics for providing effective and reliable storage resources. In this paper, the methodology for testing the scalability and performance of the StoRM service is presented, and test results are analyzed and evaluated to provide guidelines for sites to build a correctly scaled StoRM-based storage system. This article presents a service performance analysis approach for web-based Storage Resource Managers such as StoRM. The possible metrics adopted for realistic performance evaluation are presented, together with an analysis of service configuration parameters and of the data flow for different SRM calls. Following the proposed approach, an analysis and the results of performance tests on the StoRM service are presented and compared with typical use cases from real high energy physics experiments. Results show how the system behaves under changing service configurations and deployment layouts. The evaluation of the results defines important guidelines for Grid sites to properly scale the capacity of a StoRM installation depending on size and experiment requirements.
        Speaker: Luca Magnoni (INFN CNAF)
        Slides
      • 470
        Will Clouds Replace Grids? Can Clouds Replace Grids?
        The WLCG service was declared officially open for production and analysis during the LCG Grid Fest held at CERN - with live contributions from around the world - on Friday 3rd October 2008. But the service is not without its problems - services or even sites suffer degradation or complete outages with painful repercussions on experiment activities, and the operations and service model is arguably not sustainable at this level, yet an important element of the funding comes to an end approximately one year after this conference! Cloud computing - which has been referred to as Grid computing with a viable business model - makes ambitious claims. Could it solve all - or even a significant fraction, say Monte Carlo production - of our computing problems? What would be the associated costs and the technical and sociological implications? This presentation analyzes the Strengths, Weaknesses, Opportunities and Threats of these potential rival models from the viewpoint of the current WLCG service. It makes proposals for studies that should be performed - beyond existing, largely paper analyses - and highlights some key differentiators between the two approaches.
        Speaker: Dr Jamie Shiers (CERN)
        Paper
        Slides
    • Grid Middleware and Networking Technologies: Thursday Club B

      Club B

      Prague

      Prague Congress Centre 5. května 65, 140 00 Prague 4, Czech Republic
      • 471
        New strategy for job monitoring on the WLCG scope
        Job processing and data transfer are the main computing activities on the WLCG infrastructure. Reliable monitoring of job processing on the WLCG scope is a complicated task due to the complexity of the infrastructure itself and the diversity of the currently used job submission methods. The talk will describe the new strategy for job monitoring on the WLCG scope, covering primary information sources, the publishing of job status changes, the transport mechanism and visualization.
        Speaker: Julia Andreeva (CERN)
      • 472
        Real Time Monitoring of Grid Job Executions
        In this paper we describe the architecture and operation of the Real Time Monitor (RTM), developed by the Grid team in the HEP group at Imperial College London. This is arguably the most popular dissemination tool within the EGEE Grid, having been used on many occasions including the GridFest and LHC inauguration events held at CERN in October 2008. The RTM gathers information from EGEE sites hosting Logging and Bookkeeping (LB) services. Information is cached locally at a dedicated server at Imperial and made available for clients to use in near real time. The system consists of three main components: the RTM server, the enquirer and an Apache web server which is queried by clients. The RTM server queries the LB servers at fixed time intervals, collecting job-related information and storing it in a local database. Job-related data includes not only the job state (i.e. Scheduled, Waiting, Running or Done) along with timing information, but also other attributes such as the Virtual Organization and the Computing Element (CE) queue, if known. Job data stored in the RTM database is read by the enquirer every minute and converted to an XML format which is stored on a web server. This decouples the RTM server database from the potentially many clients which could otherwise create a bottleneck on the database. This information can be visualized through either a 2D or a 3D Java-based client, with live job data either being overlaid onto a two-dimensional map of the world or rendered in three dimensions over a globe using OpenGL.
        Speaker: Dr Janusz Martyniak (Imperial College London)
        Slides
      • 473
        Evolution of SAM in an enhanced model for monitoring WLCG services
        Authors: David Collados, Judit Novak, John Shade, Konstantin Skaburskas, Wojciech Lapka. It is now four years since the first prototypes of tools and tests started to monitor the Worldwide LHC Computing Grid (WLCG) services. One of these tools is the Service Availability Monitoring (SAM) framework, which superseded the SFT tool and has become a keystone for the monthly WLCG availability and reliability computations. During this time, the grid has evolved into a robust, production-level infrastructure, in no small part thanks to the extensive monitoring infrastructure which includes testing, visualization and reporting. Experience gained with monitoring has led to emerging grid monitoring standards, and provided valuable input for the Operations Automation Strategy aimed at the regionalization of monitoring services. This change in scope, together with an ever-increasing number of services and infrastructures, makes enhancements in the architecture of the existing monitoring tools a necessity. This paper describes the present architecture of SAM, an enhanced and distributed model for monitoring WLCG services, and the changes required in SAM to adopt this new model within the EGEE-III project.
        Speaker: Mr David Collados (CERN)
        Slides
      • 474
        Monitoring and operational management in USLHCNet
        USLHCNet provides transatlantic connections of the Tier1 computing facilities at Fermilab and Brookhaven with the Tier0 and Tier1 facilities at CERN, as well as with Tier1s elsewhere in Europe and Asia. Together with ESnet, Internet2 and GEANT, USLHCNet also supports connections between the Tier2 centers. The USLHCNet core infrastructure uses Ciena Core Director devices that provide time-division multiplexing and packet-forwarding protocols supporting virtual circuits with bandwidth guarantees. The virtual circuits offer the functionality to develop efficient data transfer services with support for QoS and priorities. In this paper we present the distributed service used for monitoring and operational management of the dynamic circuits in the entire USLHCNet network. This distributed service system provides, in near real time, complete topological information for all the circuits, together with resource allocation, usage and accounting; it automatically detects failures in the links and network equipment, generates alarms, and has the functionality to take automatic actions. The system is developed based on the MonALISA framework, which provides a robust monitoring and controlling service-oriented architecture with no single points of failure.
        Speaker: Ramiro Voicu (California Institute of Technology)
        Slides
      • 475
        RSV: OSG Grid Fabric Monitoring and Interoperation with WLCG Monitoring Systems
        The Open Science Grid (OSG) Resource and Service Validation (RSV) project seeks to provide solutions for several grid fabric monitoring problems, while at the same time providing a bridge between the OSG operations and monitoring infrastructure and the WLCG (Worldwide LHC Computing Grid) infrastructure. RSV-based OSG fabric monitoring begins with local resource fabric monitoring, which gives local administrators tools to monitor their status on the OSG without leaving their local monitoring infrastructure. With a set of local grid status probes, the results of which are uploaded to a central collector, a system administrator can monitor and watch for issues in house, while the OSG Operations Center (GOC) can watch from a centralized position. Plugins that relay RSV results to other popular fabric monitoring software (Nagios) give system administrators the flexibility to stay aware of their grid status using their chosen status display interface. Additional probes are easily developed and plugged into the RSV structure, and an emphasis is placed on the community to develop additional probes that fit the needs of different categories of users (VO, User, Software Developer) as needed. From the GOC, results are transmitted to a WLCG message broker in a specified format, from which these records can be translated into critical statistics for the LHC collaborating projects. RSV has succeeded in meeting these initial goals; future development is centered around usability and extending the project's scope and functionality.
        Speaker: Robert Quick (Indiana University)
        Slides
      • 476
        The impact and adoption of GLUE 2.0 in the LCG/EGEE production Grid
        The GLUE information schema has been in use in the LCG/EGEE production Grid since the first version was defined in 2002. In 2007 a major redesign of GLUE, version 2.0, was started in the context of the Open Grid Forum following the creation of the GLUE Working Group. This process has taken input from a number of Grid projects, but as a major user of the version 1 schema LCG/EGEE has had a strong interest in ensuring that the new schema supports its needs. In this paper we discuss the structure of the new schema in the light of the LCG/EGEE requirements and explain how they are met, and where improvements have been achieved compared with the version 1 schema. In particular we consider some difficulties encountered in recent extensions of the use of the version 1 schema to aid resource accounting in LCG, to enable the use of the SRM version 2 storage protocol by the LHC experiments, and to publish information about a wider range of services to improve service discovery. We describe how these needs can be better met by the new schema, and we also discuss the way in which the transition to the new schema is being managed.
        Speaker: Dr Stephen Burke (RUTHERFORD APPLETON LABORATORY)
        Slides
    • Software Components, Tools and Databases: Thursday Club A

      Club A

      Prague

      Prague Congress Centre 5. května 65, 140 00 Prague 4, Czech Republic
      Convener: Pere Mato (CERN)
      • 477
        CernVM - a Virtual Software Appliance for LHC applications
        CernVM is a Virtual Software Appliance to run physics applications from the LHC experiments at CERN. The virtual appliance provides a complete, portable and easy to install and configure user environment for developing and running LHC data analysis on any end-user computer (laptop, desktop) and on the Grid, independently of the operating system and hardware platform (Linux, Windows, MacOS). The aim is to facilitate the installation of the experiment software on a user's computer and to minimize the number of platforms (compiler-OS combinations) on which experiment software needs to be supported and tested, thus reducing the overall cost of software maintenance for the LHC. The CernVM Operating System, based on rPath Linux, fits into a compressed file smaller than 100 MB and represents a common platform that can host the software frameworks of all four LHC experiments. The experiment software stack is brought into the appliance by means of CVMFS, a file system specifically designed for efficient and ‘just in time’ software distribution. In this model, the client downloads only the necessary binaries and libraries as they are referenced for the first time. By doing so, the amount of software that has to be downloaded in order to run the typical experiment tasks in the Virtual Machine is reduced by an order of magnitude. (A minimal sketch of this on-demand, cache-backed access pattern follows this entry.) In this contribution we describe the architecture and implementation of CernVM and CVMFS, as well as plans to evolve CVMFS into a content delivery network using a combination of P2P and HTTP protocols. The CernVM project, which started at the beginning of this year, is funded for a period of four years under the recently approved R&D program at CERN.
        Speaker: Predrag Buncic (CERN)
        Slides
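        A brief, hypothetical C++ sketch of the "download on first reference, then serve from a local cache" behaviour described above. All names (fetch_over_http, CachedRepository, the example URL and cache path) are invented for illustration and are not the actual CVMFS interfaces; a real client would also verify file integrity and use a proper HTTP library.

        #include <filesystem>
        #include <fstream>
        #include <iostream>
        #include <string>

        namespace fs = std::filesystem;

        // Stand-in for a real HTTP download (e.g. via libcurl); here it only writes
        // a placeholder file so that the example is self-contained and runnable.
        static bool fetch_over_http(const std::string& url, const fs::path& dest) {
            std::ofstream out(dest);
            out << "contents of " << url << "\n";
            return static_cast<bool>(out);
        }

        class CachedRepository {
        public:
            CachedRepository(std::string base_url, fs::path cache_dir)
                : base_url_(std::move(base_url)), cache_dir_(std::move(cache_dir)) {}

            // Return a local path for a file, downloading it only the first time it
            // is referenced; later accesses are served from the local cache.
            fs::path open(const std::string& relative_path) {
                fs::path local = cache_dir_ / relative_path;
                if (!fs::exists(local)) {
                    fs::create_directories(local.parent_path());
                    fetch_over_http(base_url_ + "/" + relative_path, local);
                }
                return local;
            }

        private:
            std::string base_url_;
            fs::path cache_dir_;
        };

        int main() {
            CachedRepository repo("http://repository.example.org/sw", "/tmp/sw-cache");
            std::cout << repo.open("lib/libExample.so") << "\n";  // downloaded once
            std::cout << repo.open("lib/libExample.so") << "\n";  // served from cache
        }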
      • 478
        A comparison between xen and kvm
        Virtualization is a proven software technology that is rapidly transforming the IT landscape and fundamentally changing the way that people compute. Recently all major software producers (e.g. Microsoft and RedHat) developed or acquired virtualization technologies. Our institute is a Tier1 for the LHC experiments and is experiencing many benefits from virtualization technologies, such as improved fault tolerance, efficient hardware resource usage and increased security. Currently the virtualization solution adopted is xen, which is well supported by the Scientific Linux distribution widely adopted by the HEP community. Since the HEP linux distribution is based on RedHat ES, we feel the need to investigate the performance and usability differences with the new kvm technology recently acquired by RedHat. The case study of this work will be the LHCb experiment Tier2 site hosted at our institute, where all major grid elements run smoothly on xen virtual machines. We will investigate the impact on performance and stability that a migration to kvm would entail for the Tier2 site, as well as the effort required by a system administrator to carry out the migration.
        Speaker: Dr Andrea Chierici (INFN-CNAF)
        Slides
      • 479
        Virtual Machine Logbook (VML) - Enabling Virtualization for ATLAS
        ATLAS software has been developed mostly on the CERN linux cluster lxplus[1] or on similar facilities at the experiment Tier 1 centers. The fast rise of virtualization technology has the potential to change this model, turning every laptop or desktop into an ATLAS analysis platform. In the context of the CernVM project[2] we are developing a suite of tools and CernVM plug-in extensions to promote the use of virtualization for ATLAS analysis and software development. The Virtual Machine Logbook (VML), in particular, is an application to organize physicists' work on multiple projects, logging their progress and speeding up "context switches" from one project to another. An important feature of VML is the ability to share, with a single "click", the status of a given project with other colleagues. VML builds upon the save and restore capabilities of mainstream virtualization software like VMware, and provides a technology-independent client interface to them. A lot of emphasis in the design and implementation has gone into optimizing the save and restore process, to make it practical to store many VML entries on a typical laptop disk or to share a VML entry over the network. At the same time, taking advantage of CernVM's plugin capabilities, we are extending the CernVM platform to help increase the usability of ATLAS software. For example, we added the ability to start the ATLAS event display on any computer running CernVM simply by clicking a button in a web browser. We want to integrate VML seamlessly with CernVM's unique file system design to distribute ATLAS software efficiently to every physicist's computer. The CernVM File System (CVMFS) downloads files on demand via HTTP and caches them locally for future use. This reduces the download sizes by one order of magnitude, making it practical for a developer to work with multiple software releases on a virtual machine.
        Speaker: Dr Yushu Yao (LBNL)
        Slides
      • 480
        GPUs for event reconstruction in the FairRoot Framework
        FairRoot is the simulation and analysis framework used by the CBM and PANDA experiments at FAIR/GSI. The use of GPUs for event reconstruction in FairRoot will be presented. The fact that the CUDA (Nvidia's Compute Unified Device Architecture) development tools work alongside the conventional C/C++ compiler makes it possible to mix GPU code with general-purpose code for the host CPU; on this basis, some of the reconstruction tasks can be sent to the graphics cards. Moreover, tasks that run on the GPUs can also run in emulation mode on the host CPU, which has the advantage that the same code is used on both CPU and GPU. (A minimal sketch of this shared CPU/GPU code pattern follows this entry.)
        Speaker: Dr Mohammad Al-Turany (GSI DARMSTADT)
        Slides
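        A minimal, illustrative sketch (not the actual FairRoot/CBM reconstruction code) of the pattern described above: the per-hit computation is written once and used both from a CUDA kernel, when the file is compiled with nvcc, and from a plain CPU loop, which mirrors the idea that the same code can run on the GPU or in emulation on the host. The function and variable names are invented for the example.

        #include <cstdio>
        #include <math.h>
        #include <vector>

        #ifdef __CUDACC__
        #define HOST_DEVICE __host__ __device__   // compiled by nvcc: usable on GPU and CPU
        #else
        #define HOST_DEVICE                       // plain C++ compiler: CPU only
        #endif

        // The per-hit computation exists once, shared by both execution paths.
        HOST_DEVICE inline float transverse_radius(float x, float y) {
            return sqrtf(x * x + y * y);
        }

        #ifdef __CUDACC__
        // GPU path: one thread per hit.
        __global__ void radius_kernel(const float* x, const float* y, float* r, int n) {
            int i = blockIdx.x * blockDim.x + threadIdx.x;
            if (i < n) r[i] = transverse_radius(x[i], y[i]);
        }
        #endif

        // CPU path ("emulation"): the same computation in an ordinary loop.
        void radius_cpu(const std::vector<float>& x, const std::vector<float>& y,
                        std::vector<float>& r) {
            for (std::size_t i = 0; i < x.size(); ++i)
                r[i] = transverse_radius(x[i], y[i]);
        }

        int main() {
            std::vector<float> x{3.f, 6.f}, y{4.f, 8.f}, r(x.size());
            radius_cpu(x, y, r);   // the GPU launch is omitted here for brevity
            std::printf("r[0]=%.1f r[1]=%.1f\n", r[0], r[1]);
            return 0;
        }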
      • 481
        Monitoring the CDF analysis farm (CAF)
        We will present the monitoring system for the analysis farm of the CDF experiment at the Tevatron (CAF). All monitoring data is collected in a relational database (PostgreSQL), with SQL providing a common interface to the monitoring data. The display of these monitoring data is done by a web application in the form of Java Server Pages served by the Apache Tomcat server. For the database access we make use of the JSTL tag libraries, and to embed complex charts of all kinds we use the cewolf tag library based on JFreeChart. The framework provides both static views that are updated periodically and interactive views for authorized users.
        Speakers: Dr Hans wenzel (Fermilab), Dr Marian Zvada (Fermilab)
        Slides
      • 482
        Servicing HEP experiments with a complete set of ready integrated and configured common software components
        The LCG Applications Area at CERN provides basic software components for the LHC experiments, such as ROOT, POOL and COOL, which are developed in house, and also a set of "external" software packages (~70) which are needed in addition, such as Python, Boost, Qt, CLHEP, etc. These packages target many different areas of HEP computing, such as data persistency, math, simulation, grid computing, databases and graphics. Other packages provide tools for documentation, debugging, scripting languages and compilers. All these packages are provided in a consistent manner on different compilers, architectures and operating systems. The Software Process and Infrastructure project (SPI) is responsible for the continuous testing, coordination, release and deployment of these software packages. The main driving force for the actions carried out by SPI is the needs of the LHC experiments, but other HEP experiments could also profit from the set of consistent libraries provided and receive a stable and well tested foundation on which to build their experiment software frameworks. This presentation will first provide a brief description of the tools and services provided for the coordination, testing, release, deployment and presentation of the LCG/AA software packages, and then focus on a second set of tools provided for experiments outside the LHC to deploy a stable set of HEP-related software packages, both as a binary distribution and from source.
        Speaker: Dr Stefan Roiser (CERN)
    • Software Components, Tools and Databases: Thursday Club

      Club

      Prague

      Prague Congress Centre 5. května 65, 140 00 Prague 4, Czech Republic
      • 483
        The ALICE Offline Framework – Status and perspectives
        Since 1998 ALICE has been developing the AliRoot framework for Offline computing. This talk will critically review the development and present status of the framework. The current functionality for simulation, reconstruction, alignment, calibration and analysis will be described and commented upon. The integration with the Grid and the PROOF systems will be described and discussed. The talk will also concentrate on the tools and procedures used to maintain, test and distribute the code and libraries, and will comment on the positive and negative aspects of the framework's development. The talk will conclude with the future perspectives of the framework, particularly as far as multicores are concerned.
        Speaker: Peter Hristov (CERN)
        Slides
      • 484
        Computing activities for the Panda experiment at FAIR
        The Panda experiment at the future facility FAIR will provide valuable data for improving our present understanding of the strong interaction. In preparation for the experiments, large-scale simulations for design and feasibility studies are performed using a new software framework, Fair/PandaROOT, which is based on ROOT and the Virtual Monte Carlo (VMC) interface. In this paper, the various novel algorithms and methods for track reconstruction, visualization and higher-level data analysis are presented. Furthermore, a status report and future plans for a high-performance computing environment for Panda will be discussed, exploiting an AliEn-based GRID infrastructure and R&D on various parallelization techniques.
        Speaker: Dr Johan Messchendorp (for the PANDA collaboration) (University of Groningen)
        Slides
      • 485
        Design and performance evaluations of generic programming techniques in an R&D prototype of Geant4 physics
        Geant4 is nowadays a mature Monte Carlo system; new functionality has been extensively added to the toolkit since its first public release in 1998. Nevertheless, its architectural design and software technology features have remained substantially unchanged since their original conception in the RD44 phase of the mid ‘90s. An R&D project has recently been launched at INFN to revisit the Geant4 architectural design in view of addressing new experimental issues in HEP and other related physics disciplines, as well as existing concerns in the present Geant4 technological environment. One of the items in this project investigates the use of generic programming techniques, besides the conventional object-oriented methods currently used in Geant4, to address some significant issues that emerged in the first 10 years' experience of Geant4 experimental applications. Some of the topics addressed are the customization of physics modeling in a simulation application, scattered and tangled concerns across the code, computational performance, intrinsic testing capabilities embedded in physics objects, automated quality assurance features, data handling optimization, etc. Software design features and preliminary results from a new prototype implementation of the Geant4 electromagnetic physics and physics data handling packages exploiting generic programming techniques are illustrated. Performance evaluations concerning the use of dynamic or static polymorphism in concrete simulation use cases are presented (a minimal sketch contrasting the two styles follows this entry). Programming solutions addressing quality assurance issues in Geant4 physics modeling are discussed.
        Speaker: Dr Maria Grazia Pia (INFN GENOVA)
        Slides
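        An illustrative C++ sketch (not Geant4 code; the class names are invented) of the two customisation styles the abstract compares: run-time (dynamic) polymorphism through virtual calls versus compile-time (static) polymorphism through templates.

        #include <iostream>

        // Dynamic polymorphism: the physics model is chosen at run time,
        // at the cost of a virtual call per invocation.
        struct ICrossSection {
            virtual ~ICrossSection() = default;
            virtual double value(double energy) const = 0;
        };

        struct ToyModel : ICrossSection {
            double value(double energy) const override { return 1.0 / (1.0 + energy); }
        };

        double evaluate_dynamic(const ICrossSection& xs, double e) {
            return xs.value(e);   // virtual dispatch
        }

        // Static polymorphism: the model is a template parameter, so the call can
        // be resolved and inlined at compile time, with no virtual dispatch.
        template <class Model>
        double evaluate_static(const Model& xs, double e) {
            return xs.value(e);
        }

        int main() {
            ToyModel model;
            std::cout << evaluate_dynamic(model, 1.0) << " "
                      << evaluate_static(model, 1.0) << "\n";
        }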
      • 486
        Software Validation Infrastructure for the ATLAS Trigger
        The ATLAS trigger system is responsible for selecting the interesting collision events delivered by the Large Hadron Collider (LHC). The ATLAS trigger will need to achieve a ~10^-7 rejection factor against random proton-proton collisions, and still be able to efficiently select interesting events. After a first processing level based on hardware, the final event selection is based on custom software running on two CPU farms, containing around two thousand multi-core machines. This is known as the high-level trigger. Running the trigger online during long periods demands very high quality software. It must be fast, performant, and essentially bug-free. With more than 100 contributors and around 250 different packages, a thorough validation of the HLT software is essential. This relies on a variety of unit and integration tests as well as on software metrics, and uses both in-house and open source software. This paper describes the existing infrastructure used for validating the high-level trigger software, as well as plans for its future development.
        Speaker: Wolfgang Ehrenfeld (DESY)
        Slides
      • 487
        The ATLAS RunTimeTester software
        The ATLAS experiment's RunTimeTester (RTT) is a software testing framework into which software package developers can plug their tests, have them run automatically, and obtain feedback via email and the web. The RTT processes the ATLAS nightly build releases, using acron to launch runs on a dedicated cluster at CERN and submitting user jobs to private LSF batch queues. Running higher-statistics tests, up to 24 hours long, it is thus complementary to ATLAS's ATN framework, which feeds back rapidly on few-event tests run directly on the ATLAS build machines. We will examine the various components of the RTT system, and discuss how developers interact with the RTT and what it offers over and above developer stand-alone testing. A description will be given of the hardware and software environment in which the RTT runs. Scaling issues arising from increased developer usage will be detailed, as well as the adopted solutions. Finally, we provide an overview of future RTT development.
        Speaker: Dr Brinick Simmons (Department of Physics and Astronomy - University College London)
        Slides
    • 16:00
      coffee break, exhibits and posters
    • Distributed Processing and Analysis: Thursday Club C

      Club C

      Prague

      Prague Congress Centre 5. května 65, 140 00 Prague 4, Czech Republic
      Convener: Fabrizio Furano (CERN)
      • 488
        Monitoring the efficiency of user jobs
        Instrumenting jobs throughout their lifecycle is not obvious, as they are quite independent after being submitted, crossing multiple environments and locations until landing on a worker node. In order to measure correctly the resources used at each step, and to compare them with the view from the fabric infrastructure, we propose a solution using the Messaging System for the Grids (MSG) to integrate information coming from different sources.
        Speaker: Ulrich Schwickerath (CERN)
        Slides
      • 489
        Distributed Monte Carlo Production for DZero
        DZero uses a variety of resources on four continents to pursue a strategy of flexibility and automation in the generation of simulation data. This strategy provides a resilient and opportunistic system which ensures an adequate and timely supply of simulation data to support DZero's physics analyses. A mixture of facilities, dedicated and opportunistic, specialized and generic, large and small, grid-job enabled and not, is used to provide a production system that has adapted to newly developing technologies. This strategy has increased the event production rate by a factor of seven and the data production rate by a factor of ten in the last three years, despite diminishing manpower. Common to all production facilities is the SAM (Sequential Access to Metadata) data-grid. Job submission to the grid uses SAMGrid middleware, which may forward jobs to the OSG, the WLCG, or native SAMGrid sites. The distributed computing and data handling system used by DZero will be described, and the results of MC production since the deployment of grid technologies will be presented.
        Speaker: Prof. Joel Snow (Langston University)
        Slides
      • 490
        Automatization of User Analysis Workflow in CMS
        CMS has a distributed computing model, based on a hierarchy of tiered regional computing centres. However, the end physicist is not interested in the details of the computing model or the complexity of the underlying infrastructure, but only in accessing and using the remote services easily and efficiently. The CMS Remote Analysis Builder (CRAB) is the official CMS tool that allows access to the distributed data in a transparent way. We present the current development direction, which is focused on improving the interface presented to the user and on adding intelligence to CRAB so that it can automate more and more of the work done on behalf of the user. We also present the status of the deployment of the CRAB system and the lessons learnt in deploying this tool to the CMS collaboration.
        Speaker: Daniele Spiga (Universita degli Studi di Perugia & CERN)
        Slides
      • 491
        Deployment of Job Priority mechanisms in the Italian cloud of the ATLAS experiment
        An optimized use of the grid computing resources in the ATLAS experiment requires the enforcement of a mechanism of job priorities and of resource sharing among the different activities inside the ATLAS VO. This mechanism has been implemented through the publication of VOViews in the information system and the implementation of fair shares per UNIX group in the batch system. The VOView concept consists of publishing resource information, such as running and waiting jobs, as a function of VO groups and roles. The ATLAS Italian Cloud is composed of the CNAF Tier1 and the Roma Tier2, with farms based on the LSF batch system, and the Tier2s of Frascati, Milano and Napoli, based on PBS/Torque. In this paper we describe how the testing and deployment of the job priorities has been performed in the cloud, where the VOMS-based regional group /atlas/it has been created. We show that the VOViews are published and correctly managed by the WMS, and that the resources allocated to generic VO users, users with the production role and users of the /atlas/it group correspond to the defined shares.
        Speaker: Dr Alessandra Doria (INFN Napoli)
        Slides
      • 492
        PROOF on Demand
        “PROOF on demand” is a set of utilities that allows a PROOF cluster to be started at user request, on a batch farm or on the Grid. It provides a plug-in based system which makes it possible to use different job submission front-ends, such as LSF or the gLite WMS. The main components of “PROOF on demand” are PROOFAgent and PAConsole. PROOFAgent provides the communication layer between the xrootd redirector/PROOF master on the client machine and the PROOF workers on the batch or Grid machines, possibly behind a firewall. PAConsole provides a user-friendly GUI and also makes it easy to manage PROOF worker job submissions to different systems, which can later function as one uniform cluster. Installation is simple and does not require administrator rights, and all the processes run only in user space. “PROOF on demand” gives users who do not have a static PROOF cluster at their institute the possibility to enjoy the full power of interactive analysis with PROOF. (A minimal sketch of attaching a ROOT session to such a cluster follows this entry.)
        Speaker: Mr Anar Manafov (GSI)
        Slides
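        A minimal ROOT macro, using only standard ROOT interfaces, sketching how an analysis session might attach to the PROOF master of a cluster started on demand. The connection string "user@pod-master:21001" is a placeholder; in practice the endpoint would be the one reported by the PoD tools once the cluster is up.

        #include <cstdio>
        #include "TProof.h"

        void connect_pod()
        {
           // Attach to the PROOF master of the dynamically created cluster.
           TProof *proof = TProof::Open("user@pod-master:21001");
           if (!proof || !proof->IsValid()) {
              std::printf("could not attach to the PROOF master\n");
              return;
           }
           proof->Print();   // summary of the session and of the attached workers
           // ... the user's proof->Process(...) call with a selector would follow here
        }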
      • 493
        Challenges for the CMS Computing Model in the First Year
        CMS is in the process of commissioning a complex detector and a globally distributed computing model simultaneously. This represents a unique challenge for the current generation of experiments. Even at the beginning, there are not sufficient analysis or organized processing resources at CERN alone. In this presentation we will discuss the unique computing challenges CMS expects to face during the first year of running and how these influence the baseline computing model decisions. During the early accelerator commissioning periods, CMS will attempt to collect as many events as possible when the beam is on, to provide adequate early commissioning data. Some of these plans involve overdriving the Tier-0 infrastructure during data collection, with recovery when the beam is off. In addition to the larger number of triggered events, there will be pressure in the first year to collect and analyze more complete data formats as the summarized formats mature. The large event formats impact the required storage, bandwidth and processing capacity across all the computing centers. While the understanding of the detector and the event selections is being improved, there will likely be a larger number of reconstruction passes and skims performed by both central operations and individual users. We will discuss how these additional stresses impact the allocation of resources and the changes from the baseline computing model. We will also present the results of the commissioning tests performed to ensure the system can gracefully handle the additional requirements.
        Speaker: Ian Fisk (Fermi National Accelerator Laboratory (FNAL))
        Slides
    • Event Processing: Thursday Club E

      Club E

      Prague

      Prague Congress Centre 5. května 65, 140 00 Prague 4, Czech Republic
      Convener: Szymon Gadomski (University of Geneva)
      • 494
        CMD-3 Detector Offline Software Development
        CMD-3 is the general-purpose cryogenic magnetic detector for the VEPP-2000 electron-positron collider, which is being commissioned at the Budker Institute of Nuclear Physics (BINP, Novosibirsk, Russia). The main aspects of the physics program of the experiment are precision measurements of hadronic cross sections, the study of known vector mesons and the search for new ones, the study of the ppbar and nnbar production cross sections in the vicinity of the threshold, and the search for exotic hadrons in the region of center-of-mass energy below 2 GeV. An essential upgrade of the computing farm and data storage system of the CMD-2 detector (designed for the VEPP-2M collider at BINP) is being carried out to satisfy the needs of the new detector. In this talk I will present the general design overview and the implementation status of the CMD-3 offline software for reconstruction, simulation, visualization and storage management. The software design standards for this project are object-oriented programming techniques, C++ as the main language, Geant4 as the only simulation tool, a Geant4-based detector geometry description, CLHEP-based primary generators, the ROOT toolbox as persistency manager and Scientific Linux as the main platform. A dedicated software development framework (Cmd3Fwk) was implemented as the basic software integration solution and high-level persistency manager. The key features of the framework are modularity, dynamic handling of the data processing chain according to the XML configuration of the reconstruction modules, and on-demand data provisioning mechanisms. (A minimal sketch of such a configurable module chain follows this entry.)
        Speaker: Mr Alexander Zaytsev (Budker Institute of Nuclear Physics (BINP))
        Paper
        Slides
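        An illustrative C++ sketch, assuming invented class names and not the actual Cmd3Fwk API, of the general idea of a dynamically assembled processing chain: modules share a common interface and are executed in the order given by a configuration, mirroring the XML-driven chain described in the abstract.

        #include <iostream>
        #include <memory>
        #include <vector>

        struct Event { int id = 0; };

        // Common interface that every reconstruction module implements.
        class Module {
        public:
            virtual ~Module() = default;
            virtual void process(Event& evt) = 0;
        };

        class HitFinder : public Module {
        public:
            void process(Event& evt) override { std::cout << "hits for event " << evt.id << "\n"; }
        };

        class TrackFitter : public Module {
        public:
            void process(Event& evt) override { std::cout << "tracks for event " << evt.id << "\n"; }
        };

        int main() {
            // A real framework would build this list from the XML configuration;
            // here the chain is assembled by hand for brevity.
            std::vector<std::unique_ptr<Module>> chain;
            chain.push_back(std::make_unique<HitFinder>());
            chain.push_back(std::make_unique<TrackFitter>());

            Event evt{42};
            for (auto& m : chain) m->process(evt);   // run the configured sequence
        }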
      • 495
        Overview of the new ROOT statistical software
        ROOT, a data analysis framework, provides the advanced mathematical and statistical methods needed by the LHC experiments for analyzing their data. In addition, the ROOT distribution includes packages such as TMVA, which provides advanced multivariate analysis tools for both classification and regression, and RooFit for performing data modeling and complex fitting. Recently a large effort has been put into improving these tools to make them suitable for LHC data analysis, by improving both their quality and their performance. Algorithms such as minimization have been parallelized for multi-threaded or multi-node environments. A set of new high-level statistical software tools, RooStats, designed for establishing signal significance, estimating confidence levels and combining analyses, is being developed in close collaboration with the LHC experiments. The final goal is to provide a set of common standard implementations of the statistical methods required by the experiments for the analysis of the LHC data. We present an overview of all these statistical methods, emphasizing the recent developments which have been introduced in the latest ROOT release or are planned for release this year. Examples of these new developments are the new fitting classes of the core ROOT math library, multivariate regression analysis in TMVA, the ability to share and persist the fitting model in RooFit, and the new RooStats classes implementing the tools for calculating confidence intervals and performing hypothesis tests. (A small, generic fitting example follows this entry.)
        Speaker: Lorenzo Moneta (on behalf of the ROOT, TMVA, RooFit and RooStats teams)
        Slides
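        A small self-contained ROOT macro showing the kind of fitting functionality discussed above: fill a histogram with toy data and fit it with a Gaussian. It uses only long-standing ROOT interfaces (TH1F, TF1, TRandom3) and is meant as a generic illustration, not as an example of the new classes themselves.

        #include <cstdio>
        #include "TF1.h"
        #include "TH1F.h"
        #include "TRandom3.h"

        void simple_fit()
        {
           TH1F h("h", "toy data", 100, -5, 5);
           TRandom3 rng(4357);
           for (int i = 0; i < 10000; ++i) h.Fill(rng.Gaus(0.0, 1.0));

           // "gaus" is ROOT's built-in Gaussian; option "Q" suppresses the fit printout.
           h.Fit("gaus", "Q");

           TF1 *g = h.GetFunction("gaus");
           if (g) std::printf("mean = %.3f  sigma = %.3f\n",
                              g->GetParameter(1), g->GetParameter(2));
        }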
      • 496
        BAT - The Bayesian Analysis Toolkit
        The main goals of a typical data analysis are to compare model predictions with data, to draw conclusions on the validity of the model as a representation of the data, and to extract the possible values of parameters within the context of a model. The Bayesian Analysis Toolkit, BAT, is a tool developed to evaluate the posterior probability distribution for models and their parameters. It is based on Bayes' Theorem (written out after this entry) and is realized with the use of Markov Chain Monte Carlo. This gives access to the full posterior probability distribution and enables straightforward parameter estimation, limit setting and uncertainty propagation. BAT is implemented in C++ and allows for a flexible definition of mathematical models and applications. It provides a set of algorithms for numerical integration, optimization and error propagation. Predefined models exist for standard cases. In addition, methods to judge the "goodness-of-fit" of a model are implemented. An interface to ROOT allows for further analysis and graphical display of results. BAT has been developed primarily in the context of data analysis for particle physics experiments. The applications so far range from the extraction of structure functions in ZEUS and the calculation of the sensitivity of GERDA to double beta decay, to the kinematic fitting of top-quark events in ATLAS. Applications in cosmology are also being investigated.
        Speaker: Daniel Kollár (CERN)
        Slides
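        For reference, the relation that BAT evaluates numerically via Markov Chain Monte Carlo is the posterior of Bayes' theorem; in standard notation (D the data, \lambda the parameters of a model, P_0 the prior) it reads:

        P(\vec{\lambda} \mid D) =
          \frac{P(D \mid \vec{\lambda})\, P_0(\vec{\lambda})}
               {\int P(D \mid \vec{\lambda})\, P_0(\vec{\lambda})\, d\vec{\lambda}}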
      • 497
        The Muon High Level Trigger of the ATLAS experiment
        The ATLAS experiment at CERN's Large Hadron Collider has been designed and built for new discoveries in High Energy Physics as well as for precision measurements of Standard Model parameters. To satisfy the limited data acquisition capability at the LHC project luminosity, the ATLAS trigger system will have to select a very small rate of physically interesting events (~200 Hz) among about 40 million events per second. In the case of events containing muons, as described in this work, the first, hardware-based level (LVL1) starts from measurements in the Muon Spectrometer trigger chambers to select Regions of Interest (RoI) where muons produce significant activity. Such RoIs are used as seeds for the two subsequent trigger levels (LVL2 and Event Filter), running on dedicated online farms, which constitute the High Level Trigger (HLT). This seeding strategy is crucial to drastically reduce the total processing time. Within the Muon HLT, a few algorithms are implemented in different steps according to predefined sequences of Feature Extraction (FEX) and Hypothesis (HYPO) algorithms, whose goal is to validate the previously selected muon objects. The ATLAS muon trigger system, thanks to its particular design and to the peculiar structure of the Muon Spectrometer, is able to provide stand-alone muon trigger decisions, which can be further refined by exploiting the muon information coming from the other ATLAS subdetectors. The Muon HLT algorithms are described here in terms of functionality and performance (memory leaks, data volume, code testing and validation) both on simulated and real data, including non-standard trigger configurations (such as cosmic data and LHC start-up scenarios).
        Speaker: Andrea Ventura (INFN Lecce, Universita' degli Studi del Salento, Dipartimento di Fisica, Lecce)
        Slides
      • 498
        ATLAS Tau Trigger: from design challenge to first tests with cosmics
        The ATLAS tau trigger is a challenging component of the online event selection, as it has to apply a rejection of 10^6 in a very short time while retaining a typical signal efficiency of 80%. Whilst in the first, hardware level narrow calorimeter jets are selected, in the second and third software levels the candidates are refined on the basis of simple but fast (second level) and slow but accurate (third level) algorithms. In these two levels, data from various subdetectors are analysed; however, the overall data volume transported through the system (both input subdetector data and trigger output) has to be minimised. The requirements of the tau trigger, together with the performance measured during ATLAS cosmics runs, will be presented. Triggering on tau leptons is a particularly challenging task, as the signature characteristics are not much different from the overwhelming QCD background. Advanced multi-variate optimisation techniques help to find cut-based criteria which are suitable for use at trigger level. However, the procedure will have to be repeated on data. First steps in this direction, taken in the commissioning of the first and second levels, are discussed, as well as preparations for fast commissioning with first LHC data.
        Speaker: Mogens Dam (Niels Bohr Institute)
        Slides
      • 499
        The CMS L1 Trigger Emulation Software
        The CMS L1 Trigger processes the muon and calorimeter detector data using a complex system of custom hardware processors. A bit-level emulation of the trigger data processing has been developed. This is used to validate and monitor the trigger hardware, to simulate the trigger response in Monte Carlo data, and, for some components, to seed higher-level triggers. The multiple use cases are managed using a modular design, implemented within the modular CMS offline software framework. The requirements, design and performance of the emulators are described, as well as the iterative process required to bring the emulators and hardware into agreement.
        Speaker: Vasile Mihai Ghete (Institut fuer Hochenergiephysik (HEPHY))
        Slides
    • Grid Middleware and Networking Technologies: Thursday Club B

      Club B

      Prague

      Prague Congress Centre 5. května 65, 140 00 Prague 4, Czech Republic
      • 500
        Grid Interoperation with ARC middleware for CMS experiment
        The Compact Muon Solenoid (CMS) is one of the LHC (Large Hadron Collider) experiments at CERN. CMS computing relies on different grid infrastructures to provide computational and storage resources. The major grid middleware stacks used for CMS computing are gLite, OSG and ARC (Advanced Resource Connector). The Helsinki Institute of Physics (HIP) builds one of the Tier-2 centers for CMS computing. CMS Tier-2 centers operate software systems for data transfers (PhEDEx), Monte Carlo production (ProdAgent) and data analysis (CRAB). In order to provide the Tier-2 services for CMS, HIP uses tools and components from both the ARC and gLite grid middleware stacks. Interoperation between grid systems is a challenging problem, and HIP uses two different solutions to provide the needed services. The first solution is based on gLite-ARC grid-level interoperability. This makes it possible to use ARC resources in, e.g., CMS data analysis without modifying the CMS data analysis software. The second solution is based on developing specific plugins for ARC in the CMS software.
        Speaker: Dr Jukka Klem (Helsinki Institute of Physics HIP)
        Slides
      • 501
        ARC middleware: evolution towards standards-based interoperability
        The Advanced Resource Connector (ARC) middleware, introduced by NorduGrid, is one of the leading Grid solutions used by scientists worldwide. Its simplicity, reliability and portability, matched by unparalleled efficiency, make it attractive for large-scale facilities like the Nordic DataGrid Facility (NDGF) and its Tier1 center, and also for smaller-scale projects. While well proven in daily production use by a wide variety of sciences, ARC today is still largely based on conventional Grid technologies introduced by Globus a decade ago. In order to guarantee sustainability, true cross-system portability, interoperability and standards compliance, the ARC community is undertaking a massive effort to introduce Web Service based components into the middleware. With support from the EU KnowARC project, key ARC services are being re-implemented in a service-oriented architecture, based on modern Grid standards. Such services include the resource-coupled execution service, the self-healing storage service and the peer-to-peer information system, to name a few. The gradual introduction of these new services and client tools into production middleware releases is carried out together with NDGF and thus ensures a smooth transition to the next generation of Grid middleware. Standard interfaces and the modularity of the new components' design are essential for ARC contributions to the planned Universal Middleware Distribution of EGI. This talk will outline the design principles of the new ARC services, give an update on the status of the middleware development, present the gradual transition process and lay out strategies for future ARC development and deployment.
        Speaker: Dr Oxana Smirnova (Lund University / NDGF)
        Slides
      • 502
        An XACML profile and implementation for Authorization Interoperability between OSG and EGEE
        The Open Science Grid (OSG) and the Enabling Grids for E-sciencE (EGEE) have a common security model, based on Public Key Infrastructure. Grid resources grant access to users because of their membership in a Virtual Organization (VO), rather than on personal identity. Users push VO membership information to resources in the form of identity attributes, thus declaring that resources will be consumed on behalf of a specific group inside the organizational structure of the VO. Resources contact an access policies repository, centralized at each site, to grant the appropriate privileges for that VO group. Despite the commonality of the model, OSG and EGEE use different protocols for the communication between resources and the policy repositories. Middleware developed for one Grid could not naturally be deployed on the other Grid, since the authorization module of the middleware would have to be enhanced to support the other Grid's communication protocol. In addition, maintenance and support for different authorization call-out protocols represents a duplication of effort for our relatively small community. To address these issues, OSG and EGEE initiated a joint project on Authorization Interoperability. The project defined a common communication protocol and attribute identity profile for authorization call-out and provided implementation and integration with major Grid middleware. The activity had resonance with middleware development communities, such as the Globus Toolkit and Condor, who decided to join the collaboration and contribute requirements and software. In this paper, we discuss the main elements of the profile, its implementation, and deployment in EGEE and OSG.
        Speaker: Gabriele Garzoglio (FERMI NATIONAL ACCELERATOR LABORATORY)
        Slides
      • 503
        Deploying distributed network monitoring mesh for LHC Tier-1 and Tier-2 sites
        Fermilab hosts the US Tier-1 center for data storage and analysis of the Large Hadron Collider's (LHC) Compact Muon Solenoid (CMS) experiment. To satisfy operational requirements for the LHC networking model, the networking group at Fermilab, in collaboration with Internet2 and ESnet, is participating in the perfSONAR-PS project. This collaboration has created a collection of network monitoring services targeted at providing continuous network performance measurements across wide-area distributed computing environments. The perfSONAR-PS services are packaged as a bundle, and include a bootable disk capability. We have started on a deployment plan consisting of a decentralized mesh of these network monitoring services at US LHC Tier-1 and Tier-2 sites. The initial deployment will cover all Tier-1 and Tier-2 sites of US ATLAS and US CMS. This paper will outline the basic architecture of each network monitoring service. The service discovery model, interoperability, and basic protocols will be presented. The principal deployment model and available packaging options will be detailed. The current state of deployment and the availability of higher-level user interfaces and analysis tools will also be demonstrated.
        Speaker: Mr Maxim Grigoriev (FERMILAB)
        Paper
        Slides
      • 504
        WAN Dynamic Circuit Support at Fermilab
        Fermilab has been one of the earliest sites to deploy data circuits in production for wide-area, high-impact data movement. The US-CMS Tier-1 Center at Fermilab uses end-to-end (E2E) circuits to support data movement with the Tier-0 Center at CERN, as well as with all of the US-CMS Tier-2 sites. On average, 75% of the network traffic into and out of the Laboratory is carried on E2E circuits. These circuits can provide traffic isolation and, in many cases, guaranteed bandwidth levels. While circuit technologies and services are emerging on a number of fronts, of particular interest is the evolution of dynamic circuit support. The capability to establish a circuit when needed, and tear it down when that need has been satisfied, offers an obvious attraction for large-scale data movement of an irregular or bursty nature, such as in high energy physics. However, E2E circuit support comes at a cost, involving significantly higher complexity and added support effort. This presentation will discuss Fermilab’s experiences with deploying and supporting E2E circuits, with an emphasis on dynamic circuits. The talk will cover the current state of dynamic circuit services within the research and education community, issues with monitoring E2E circuits, and difficulties with troubleshooting in a circuit environment. We will speculate on the future evolution of circuit services, discussing where the problems lie, and what needs to happen for wider deployment to occur.
        Speaker: Mr Philip DeMar (FERMILAB)
    • Grid Middleware and Networking Technologies: Thursday Panorama

      Panorama

      Prague

      Prague Congress Centre 5. května 65, 140 00 Prague 4, Czech Republic
      • 505
        The CREAM-CE: First experiences, results and requirements of the 4 LHC experiments
        In a few months, the four LHC detectors will collect data at a significant rate that is expected to ramp up to around 15 PB per year. To process such a large quantity of data, the experiments have developed over the last years distributed computing models that build on the overall WLCG service. These integrate the different services provided by the gLite middleware into the computing models of the experiments. From the point of view of the middleware, the current LCG-CE used by the four LHC experiments is about to be deprecated. The new CREAM service (Computing Resource Execution And Management) has been approved to replace it. CREAM is a lightweight service created to handle job management operations at the CE level. It is able to accept requests both via the gLite WMS service and via direct submission for transmission to the local batch system. This flexible duality provides the experiments with a large degree of freedom to adapt the service to their own computing models, but at the same time it requires a careful follow-up of the requirements and tests of the experiments to ensure that their needs are fulfilled before real data taking. In this talk we present the current testing results of the four LHC experiments concerning this new service. The experiments' requirements and the expectations towards both the sites and the service itself are exposed in detail. Finally, the operations procedures, which have been elaborated together with the experiment support teams, will also be included in this presentation.
        Speakers: Dr Alessandro di Girolamo (CERN IT/GS), Dr Andrea Sciaba (CERN IT/GS), Dr Elisa Lanciotti (CERN IT/GS), Dr Nicolo Magini (CERN IT/GS), Dr Patricia Mendez Lorenzo (CERN IT/GS), Dr Roberto Santinelli (CERN IT/GS), Dr Simone Campana (CERN IT/GS), Dr Vincenzo Miccio (CERN IT/GS)
        Slides
      • 506
        Use of the gLite-WMS in CMS for production and analysis
        The CMS experiment at the LHC started using the Resource Broker (from the EDG and LCG projects) to submit production and analysis jobs to distributed computing resources of the WLCG infrastructure over six years ago. In 2006 it started using the gLite Workload Management System (WMS) and Logging & Bookkeeping (LB). In the current configuration the interaction with the gLite-WMS/LB happens through the CMS production and analysis frameworks, respectively ProdAgent and CRAB, through a common component, BOSSLite. The important improvements recently made in the gLite-WMS/LB as well as in the CMS tools, and the intrinsic independence of different WMS/LB instances, allow CMS to reach the stability and scalability needed for LHC operations. In particular, the use of a multi-threaded approach in BOSSLite allowed the scalability of the system to be increased significantly. In this work we present the operational setup of CMS production and analysis based on the gLite-WMS, and the performance obtained in the past data challenges and in the normal daily operations of the experiment.
        Speaker: Giuseppe Codispoti (Dipartimento di Fisica)
        Slides
      • 507
        Using CREAM and CEMON for job submission and management in the gLite middleware
        In this paper we describe the use of CREAM and CEMON for job submission and management within the gLite Grid middleware. Both CREAM and CEMON address one of the most fundamental operations of a Grid middleware, that is job submission and management. Specifically, CREAM is a job management service used for submitting, managing and monitoring computational jobs. CEMON is an event notification framework, which can be coupled with CREAM to provide users with asynchronous job status change notifications. Both components have been integrated with the gLite Workload Management System by means of ICE (Interface to CREAM Environment). These software components have been released for production in the EGEE Grid infrastructure and, as far as the CEMon service is concerned, also in the OSG Grid. In this paper we report on the current status of these services, the results achieved, and the issues that still have to be addressed.
        Speaker: Massimo Sgaravatto (INFN Padova)
        Slides
      • 508
        CDF GlideinWMS usage in Grid computing of High Energy Physics
        Many members of large science collaborations already have specialized grids available to advance their research, but the need for more computing resources for data analysis has forced the Collider Detector at Fermilab (CDF) collaboration to move beyond the usage of dedicated resources and start exploiting Grid resources. Nowadays, the CDF experiment relies increasingly on glidein-based computing pools for data reconstruction, and especially for Monte Carlo production and user data analysis, serving over 400 users through the central analysis farm middleware (CAF) on top of the Condor batch system and the CDF Grid infrastructure. Condor is designed with a distributed architecture, and its glidein mechanism of pilot jobs is ideal for abstracting the Grid computing by creating a virtual private computing pool. We would like to present the first production use of the generic pilot-based Workload Management System (glideinWMS), which is an implementation of the pilot mechanism based on the Condor distributed infrastructure. CDF Grid computing uses glideinWMS for its data reconstruction on the FNAL campus Grid, and for user analysis and Monte Carlo production across the Open Science Grid (OSG). We review this computing model and the setup used, including the CDF-specific configuration within the glideinWMS system, which provides powerful scalability and makes Grid computing work like a local batch environment, with the ability to handle more than 10,000 running jobs at a time.
        Speaker: Dr Marian Zvada (Fermilab)
        Slides
      • 509
        DIRAC3 - the new generation of the LHCb grid software
        DIRAC, the LHCb community Grid solution, was considerably re-engineered in order to meet all the requirements for processing the data coming from the LHCb experiment. It covers all the tasks, starting with raw data transportation from the experiment area to grid storage, through data processing, up to the final user analysis. The re-engineered DIRAC3 version of the system includes a fully grid-security-compliant framework for building service-oriented distributed systems; a complete Pilot Job framework for creating efficient workload management systems; and several subsystems to manage high-level operations like data production and distribution management. The user interfaces of the DIRAC3 system, providing rich command line and scripting tools, are complemented by a full-featured Web portal providing users with secure access to all the details of the system status and ongoing activities. We will present an overview of the DIRAC3 architecture, its new innovative features and the achieved performance. Extending DIRAC3 to manage computing resources beyond the WLCG grid will be discussed. Experience with the use of DIRAC3 by user communities other than LHCb, and in application domains other than High Energy Physics, will be shown to demonstrate the general-purpose nature of the system.
        Speaker: Dr Andrei TSAREGORODTSEV (CNRS-IN2P3-CPPM, MARSEILLE)
        Slides
      • 510
        The ALICE Workload Management System: Status before the real data taking
        With the startup of the LHC, the ALICE detector will collect data at a rate that, after two years, will reach 4 PB per year. To process such a large quantity of data, ALICE has developed over ten years a distributed computing environment, called AliEn, integrated with the WLCG environment. The ALICE environment presents several original solutions, which have shown their viability in a number of large exercises of increasing complexity called ALICE Data Challenges. Also during the past Common Computing Readiness Challenge (CCRC’08), proposed during the WLCG workshop in 2007 for the four LHC experiments together, ALICE ran its Full Dress Rehearsal exercises, collecting more than 70 TB in a few weeks and achieving a sustained outgoing data rate of 125 MB/s for more than one week. Within the ALICE distributed computing environment, the AliEn Workload Management System (WMS) was created to submit jobs to the WLCG infrastructure, and it has played a crucial role in achieving the results mentioned above. ALICE has more than 70 sites distributed all over the world, and this WMS, together with the operations management structure defined by the experiment, has demonstrated the reliability and performance needed to begin data taking at the end of the year. In this talk we will focus on the description and current status of the AliEn WMS, emphasizing the latest functionality that has been included to handle from a single entry point the different matchmaking services of WLCG (lcg-RB, gLite WMS) and also the future CREAM-CE; the latter has been extensively tested by the experiment during summer 2008. The talk will describe the AliEn WMS structure and will expose the results achieved in 2008-2009, from the CCRC’08 exercise up to the CREAM-CE testing phase.
        Speakers: Dr Alina Grigoras (CERN PH/AIP), Dr Andreas Joachim Peters (CERN IT/DM), Dr Costin Grigoras (CERN PH/AIP), Dr Fabrizio Furano (CERN IT/GS), Dr Federico Carminati (CERN PH/AIP), Dr Latchezar Betev (CERN PH/AIP), Dr Pablo Saiz (CERN IT/GS), Dr Patricia Mendez Lorenzo (CERN IT/GS), Dr Predrag Buncic (CERN PH/SFT), Dr Stefano Bagnasco (INFN/Torino)
        Slides
    • Online Computing: Thursday Club D

      Club D

      Prague

      Prague Congress Centre 5. května 65, 140 00 Prague 4, Czech Republic

      Sponsored by ACEOLE

      Convener: Clara Gaspar (CERN)
      • 511
        Commissioning the ALICE Experiment
        ALICE (A Large Ion Collider Experiment) is the heavy-ion detector designed to study the physics of strongly interacting matter and the quark-gluon plasma at the CERN Large Hadron Collider (LHC). A large-bandwidth and flexible Data Acquisition System (DAQ) has been designed and deployed to collect sufficient statistics in the short running time available per year for heavy ions and to accommodate very different requirements originating from the 18 sub-detectors. This paper will present the large scale tests conducted to assess the standalone DAQ performance, its interfaces with the other online systems and the extensive commissioning performed in order to be fully prepared for physics data taking. It will review the experience accumulated since May 2007 during the standalone commissioning of the main detectors and the global cosmic runs and the lessons learned from this exposure on the “battle field”. It will also discuss the test protocol followed to integrate and validate each sub-detector with the online systems, and it will conclude with the first results of the LHC injection tests and startup in September 2008. Several other abstracts at this conference present in more detail some elements of the ALICE DAQ system.
        Speaker: Mr Pierre VANDE VYVRE (CERN)
        Slides
      • 512
        The CMS Online Cluster: IT for a Large Data Acquisition and Control Cluster
        The CMS online cluster consists of more than 2000 computers, mostly running Scientific Linux CERN, hosting the 10000 application instances responsible for data acquisition and experiment control on a 24/7 basis. The challenging size of the cluster constrained the design and implementation of the infrastructure:
        - The critical nature of the control applications demands tight security and independence from external networks, including CERN's network, while maintaining high availability of the services;
        - The evolving nature of the acquisition applications requires an easy management and configuration infrastructure suitable for large-scale installation and fast configuration turnaround: any failing computer can be replaced and fully configured automatically from scratch in less than 10 minutes, more than 1000 computers can be reinstalled concurrently in less than 60 minutes, and the infrastructure scales easily to reduce the installation time and accommodate more computers at the same time;
        - The large number of subsystems and users imposes dealing with heterogeneous systems and services;
        - In the next two years the cluster will grow in size by more than 50% as the detector reaches its nominal capacity, which demands easy scalability.
        In this paper we review the tools and solutions used to fulfill these requirements and others arising from the scale of the cluster. Details will be given on the problems encountered and the solutions adopted, ranging from the implementation of redundant and load-balanced network services (DNS, DHCP, LDAP, Kerberos, file serving, proxies...) to the configuration and deployment infrastructure based on quattor.
        Speaker: Dr Jose Antonio Coarasa Perez (Department of Physics - Univ. of California at San Diego (UCSD) and CERN, Geneva, Switzerland)
        Slides
      • 513
        The ATLAS Level-1 Central Trigger System in Operation
        The ATLAS Level-1 Central Trigger (L1CT) electronics is a central part of ATLAS data-taking. It receives the 40 MHz bunch clock from the LHC machine and distributes it to all sub-detectors. It initiates the detector read-out by forming the Level-1 Accept decision, which is based on information from the calorimeter and muon trigger processors, plus a variety of additional trigger inputs from detectors in the forward regions. The L1CT also provides trigger-summary information to the data acquisition and the Level-2 trigger systems for use in higher levels of the selection process, in offline analysis, and for monitoring. In this paper we give an overview of the operational framework of the L1CT with particular emphasis on cross-system aspects. The software framework allows a consistent configuration with respect to the LHC machine, upstream and downstream trigger processors, and the data acquisition. Trigger and deadtime rates are monitored coherently at all stages of processing and are logged by the online computing system for physics analysis, data quality assurance and operational debugging. In addition, the synchronization of trigger inputs is monitored using bunch-by-bunch trigger information. Several software tools allow the relevant information to be displayed efficiently in the control room in a way useful for shifters and experts. We present the overall performance during cosmic-ray data taking with the full ATLAS detector and the experience with first beam in the LHC.
        Speaker: Mr Thilo Pauly (CERN)
        Slides
      • 514
        Development of DAQ-Middleware
        DAQ-Middleware is a software framework for network-distributed DAQ systems based on Robot Technology Middleware, an international standard of the Object Management Group (OMG) in robotics, developed by AIST. A DAQ-Component is the software unit of DAQ-Middleware, and a set of basic components has already been developed: Gatherer is a readout component, Logger is a logging component, Monitor is an analysis component, and Dispatcher connects Gatherer to Logger/Monitor via the data path. The DAQ-Operator component controls these components using the control/status path. An important point is that these components are reusable as DAQ-Components, because the control/status path and the data path, as well as the XML-based system configuration and the XML/HTTP-based system interface, are well defined in the DAQ-Middleware framework (see the sketch below this entry). DAQ-Middleware has been adopted by experiments at J-PARC (Japan Proton Accelerator Research Complex), where commissioning with the first beam was successfully carried out. The functionality of DAQ-Middleware and its status at J-PARC will be presented.
        Speaker: Yoshiji Yasu (High Energy Accelerator Research Organization (KEK))
        Slides
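        To make the component picture above concrete, the following is a minimal, self-contained C++ sketch of the idea only: a Gatherer-like readout component pushes data blocks onto a data path, and a Logger-like component consumes them. All class and method names are invented for illustration; they are not the actual DAQ-Middleware / Robot Technology Middleware API.

        // Hypothetical sketch of the DAQ-Middleware component idea; the
        // interfaces below are invented for illustration and are NOT the
        // real DAQ-Middleware / OpenRTM API.
        #include <cstdint>
        #include <deque>
        #include <iostream>
        #include <utility>
        #include <vector>

        using DataBlock = std::vector<uint8_t>;

        // Stand-in for the framework's data path between two components.
        class DataPath {
        public:
            void push(DataBlock block) { queue_.push_back(std::move(block)); }
            bool pop(DataBlock& block) {
                if (queue_.empty()) return false;
                block = std::move(queue_.front());
                queue_.pop_front();
                return true;
            }
        private:
            std::deque<DataBlock> queue_;
        };

        // Gatherer-like component: reads out (here: fakes) front-end data.
        class Gatherer {
        public:
            explicit Gatherer(DataPath& out) : out_(out) {}
            void cycle() {                     // called once per DAQ cycle
                DataBlock block(64, 0xAB);     // pretend front-end readout
                out_.push(std::move(block));
            }
        private:
            DataPath& out_;
        };

        // Logger-like component: consumes blocks and "writes" them out.
        class Logger {
        public:
            explicit Logger(DataPath& in) : in_(in) {}
            void cycle() {
                DataBlock block;
                while (in_.pop(block))
                    std::cout << "logged " << block.size() << " bytes\n";
            }
        private:
            DataPath& in_;
        };

        int main() {
            DataPath path;                     // the Dispatcher role is omitted
            Gatherer gatherer(path);
            Logger logger(path);
            for (int i = 0; i < 3; ++i) {      // a DAQ-Operator would drive this loop
                gatherer.cycle();
                logger.cycle();
            }
        }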
      • 515
        The CMS ECAL Database services for detector control and monitoring
        The Electromagnetic Calorimeter (ECAL) of the CMS experiment at the LHC is made of about 75000 scintillating crystals. The detector properties must be continuously monitored in order to ensure the extreme stability and precision required by its design. This leads to a very large volume of non-event data to be accessed continuously by shifters, experts, automatic monitoring tasks, detector configuration for the trigger and data acquisition systems, and offline data reconstruction programs. This talk describes the measurements and calibrations taken for slow control, the data handling strategy and workflow, as well as the architecture of the configuration and conditions databases. An important component of the system is a web-based data browser, a software tool used by shifters and experts to visualize the data in a web browser and to keep the detector under control.
        Speaker: Giovanni Organtini (Univ. + INFN Roma 1)
        Slides
      • 516
        First Level Event Selection Package of the CBM Experiment
        The CBM Collaboration is building a dedicated heavy-ion experiment to investigate the properties of highly compressed baryonic matter as produced in nucleus-nucleus collisions at the Facility for Antiproton and Ion Research (FAIR) in Darmstadt, Germany. This requires the collection of a huge number of events, which can only be obtained through very high reaction rates and long data-taking periods. Reaction rates reach up to 10 MHz (minimum bias), which corresponds to a beam intensity of 10^9 beam particles per second on a 1% interaction target. The rare signals are embedded in a large background of charged particles: a typical central Au+Au collision in the CBM experiment will produce up to 700 tracks in the inner tracker. The large track density, together with the presence of a non-homogeneous magnetic field, makes reconstruction and selection of events complicated. A chain of reconstruction procedures has been developed for the first level event selection. It includes a cellular automaton based track finder, Kalman filter based track and decay-particle fitters, and a procedure for the selection of rare physics channels such as open charm. The most time-consuming algorithms are parallelized using the SIMD instruction set (see the sketch below this entry). Having high efficiency and speed, the package is successfully used in the CBM experiment for feasibility studies and detector optimization.
        Speaker: Dr Ivan Kisel (GSI Helmholtzzentrum für Schwerionenforschung GmbH, Darmstadt)
        Slides
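        As an illustration of the SIMD parallelization mentioned above, here is a minimal sketch (not CBM code) in which four track candidates are packed into one SSE register and a one-dimensional Kalman measurement update is applied to all four tracks at once; the actual package vectorizes the full multi-dimensional filter in the same spirit. The numerical inputs are arbitrary.

        // Minimal SIMD sketch: one scalar Kalman measurement update applied
        // to four tracks in parallel, one SSE lane per track.
        #include <xmmintrin.h>   // SSE intrinsics
        #include <cstdio>

        int main() {
            // State estimate x, its variance P, measurement m and measurement
            // variance R for four tracks (arbitrary example values).
            __m128 x = _mm_set_ps(1.0f, 2.0f, 3.0f, 4.0f);
            __m128 P = _mm_set1_ps(0.5f);
            __m128 m = _mm_set_ps(1.2f, 1.8f, 3.3f, 3.9f);
            __m128 R = _mm_set1_ps(0.1f);

            // Kalman gain K = P / (P + R)
            __m128 K = _mm_div_ps(P, _mm_add_ps(P, R));
            // Updated state x' = x + K * (m - x)
            x = _mm_add_ps(x, _mm_mul_ps(K, _mm_sub_ps(m, x)));
            // Updated variance P' = (1 - K) * P
            P = _mm_mul_ps(_mm_sub_ps(_mm_set1_ps(1.0f), K), P);

            float xs[4], Ps[4];
            _mm_storeu_ps(xs, x);
            _mm_storeu_ps(Ps, P);
            for (int i = 0; i < 4; ++i)
                std::printf("track %d: x = %.3f, P = %.3f\n", i, xs[i], Ps[i]);
        }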
    • Software Components, Tools and Databases: Thursday Club A

      Club A

      Prague

      Prague Congress Centre 5. května 65, 140 00 Prague 4, Czech Republic
      Convener: Paolo Calafiura (LBNL, Berkeley)
      • 517
        LLVM-based C++ interpreter for ROOT
        ROOT is planning to replace a large part of its C++ interpreter CINT. The new implementation will be based on the LLVM compiler infrastructure. LLVM is developed by, among others, Apple, Adobe and the University of Illinois at Urbana-Champaign; it is open source. Once available, LLVM will offer an ISO-compliant C++ parser, a bytecode generator and execution engine, a just-in-time compiler, and several back-ends that will allow code to be converted into binaries on all major platforms. Compared to CINT, we expect improvements in the interpreter's correctness, memory and CPU performance, and multithreading support. In this talk we will present the plans for this endeavor. (An illustrative sketch of LLVM-based code generation follows this entry.)
        Speaker: Axel Naumann (CERN)
        Slides
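        As a rough illustration of the "bytecode generator" part of such an interpreter (this is not ROOT or CINT code), the sketch below uses the stable LLVM-C API from C++ to build the IR for a trivial add function and print it. The module and function names are arbitrary, and the build command is only indicative; flags may differ between LLVM versions.

        // Sketch: generate LLVM IR for `int add(int a, int b) { return a + b; }`
        // and print it, i.e. the code-generation half of an LLVM-based interpreter.
        // Build (roughly): clang++ example.cpp $(llvm-config --cxxflags --ldflags --libs core)
        #include <llvm-c/Core.h>

        int main() {
            LLVMModuleRef module = LLVMModuleCreateWithName("interpreted_chunk");

            LLVMTypeRef params[] = { LLVMInt32Type(), LLVMInt32Type() };
            LLVMTypeRef fnType = LLVMFunctionType(LLVMInt32Type(), params, 2, 0);
            LLVMValueRef fn = LLVMAddFunction(module, "add", fnType);

            LLVMBasicBlockRef entry = LLVMAppendBasicBlock(fn, "entry");
            LLVMBuilderRef builder = LLVMCreateBuilder();
            LLVMPositionBuilderAtEnd(builder, entry);
            LLVMValueRef sum = LLVMBuildAdd(builder, LLVMGetParam(fn, 0),
                                            LLVMGetParam(fn, 1), "sum");
            LLVMBuildRet(builder, sum);

            LLVMDumpModule(module);   // print the generated IR

            LLVMDisposeBuilder(builder);
            LLVMDisposeModule(module);
            return 0;
        }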
      • 518
        Organization, Management, and Documentation of ATLAS Offline Software Releases
        We update our CHEP06 presentation on the ATLAS experiment software infrastructure used to build, validate, distribute, and document the ATLAS offline software. The ATLAS collaboration's computational resources and software developers are distributed around the globe in more than 30 countries. The ATLAS offline code base is currently over 5 MSLOC in 10000+ C++ classes organized into about 1700 packages. More than 600 developers contribute code. Since our last report, we have developed a powerful, flexible system to request code versions to be included in software builds, made changes to our software building tools, created a system of nightly builds to validate significant code changes, improved the tools for distributing the code to our computational sites around the world, and made many advances in our tools for documenting our code.
        Speaker: Fred Luehring (Indiana University)
        Slides
      • 519
        Software integration and development tools in the CMS experiment
        The offline software suite of the Compact Muon Solenoid (CMS) experiment must support the production and analysis activities across the distributed computing environment developed by the LHC experiments. This system relies on over 100 external software packages and includes the developments of hundreds of active developers. The applications of this software require consistent and rapid deployment of code releases, a stable code development platform, and effective tools to enable code development as well as production work across the facilities utilized by the experiment. We describe the model used for CMS offline release management and software development, and discuss how the continued growth in development has been facilitated. Recent work has resulted in significant improvements in these areas. We report on the concept and challenges, status, recent improvements and future plans of the CMS offline software development and release integration environment.
        Speaker: David Lange (LLNL)
        Slides
      • 520
        Development, validation and maintenance of Monte Carlo event generators and auxiliary packages in the LHC era
        The Generator Services project collaborates with the Monte Carlo generator authors and with the LHC experiments in order to prepare validated, LCG-compliant code for both the theoretical and experimental communities at the LHC. On one side it provides technical support for the installation and maintenance of the generator packages on the supported platforms; on the other side it participates in the physics validation of the generators. The libraries of the Monte Carlo generators maintained within this project are widely adopted by the LHC collaborations and are used in large-scale productions. The existing testing and validation tools are regularly used, and additional ones are being developed, in particular for the new object-oriented generators. The aim of the validation activity is also to participate in the tuning of the generators in order to provide appropriate settings for proton-proton collisions at LHC energies. This paper presents the current status and future plans of the Generator Services project. The approach used to provide tested Monte Carlo generators for the LHC experiments is discussed, and some of the testing and validation tools are presented.
        Speaker: Mr Dmitri Konstantinov (IHEP Protvino)
        Slides
      • 521
        An update on perfmon and the struggle to get into the Linux kernel
        At CHEP2007 we reported on the perfmon2 subsystem as a tool for interfacing to the PMUs (Performance Monitoring Units) found in the hardware of all modern processors (from AMD, Intel, SUN, IBM, MIPS, etc.). The intent was always to get the subsystem into the Linux kernel by default. The talk will report on how progress is now being made (after long discussions) and will also show the latest additions to the subsystem. In a second part, the speaker will discuss the evolution of the hardware, with special emphasis on new capabilities available in the most recent x86 processors on the market. Examples will be shown using standard HEP benchmarks as well as SPEC2006 benchmarks. (A sketch of user-level PMU counter access follows this entry.)
        Speaker: Mr Andrzej Nowak (CERN)
        Slides
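        For readers unfamiliar with user-level PMU access, the sketch below counts retired instructions around a toy workload using the Linux perf_event_open interface. This is not the perfmon2 API discussed in the talk; it is an assumed stand-in showing what reading a hardware counter from user space looks like.

        // Sketch: count retired instructions for a code section from user space
        // via the Linux perf_event_open syscall (not the perfmon2 API).
        #include <linux/perf_event.h>
        #include <sys/ioctl.h>
        #include <sys/syscall.h>
        #include <unistd.h>
        #include <cstdint>
        #include <cstdio>
        #include <cstring>

        static long perf_event_open(perf_event_attr* attr, pid_t pid, int cpu,
                                    int group_fd, unsigned long flags) {
            return syscall(__NR_perf_event_open, attr, pid, cpu, group_fd, flags);
        }

        int main() {
            perf_event_attr attr;
            std::memset(&attr, 0, sizeof(attr));
            attr.type = PERF_TYPE_HARDWARE;
            attr.size = sizeof(attr);
            attr.config = PERF_COUNT_HW_INSTRUCTIONS;   // retired instructions
            attr.disabled = 1;
            attr.exclude_kernel = 1;

            int fd = perf_event_open(&attr, 0 /* this process */, -1, -1, 0);
            if (fd < 0) { std::perror("perf_event_open"); return 1; }

            ioctl(fd, PERF_EVENT_IOC_RESET, 0);
            ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);

            volatile double x = 0;                      // toy workload to measure
            for (int i = 0; i < 1000000; ++i) x += i * 0.5;

            ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);
            uint64_t count = 0;
            if (read(fd, &count, sizeof(count)) != (ssize_t)sizeof(count)) return 1;
            std::printf("instructions retired: %llu\n", (unsigned long long)count);
            close(fd);
        }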
      • 522
        Log Mining with Splunk
        Robust, centralized system and application logging services are vital to all computing organizations, regardless of size. For the past year, the RHIC/USATLAS Computing Facility (RACF) has dramatically augmented the utility of logging services with Splunk. Splunk is a powerful application that functions as a log search engine, providing fast, real-time access to data from servers, applications, and network devices. Splunk at the RACF is configured to parse system and application log files, script output, snmp traps, alerts, and has been integrated into our Nagios monitoring infrastructure. This work will detail our central log infrastructure vis-à-vis Splunk, examine lightweight agents and example configurations, consider security, and demonstrate functionality. Distributed Splunk deployments or clusters between institutions will be discussed.
        Speaker: Robert Petkus (Brookhaven National Laboratory)
    • Summary: Friday Congress Hall

      Congress Hall

      Prague

      Prague Congress Centre 5. května 65, 140 00 Prague 4, Czech Republic
      Conveners: Milos Lokajicek (Institute of Physics AS CR, Prague), Philippe Charpentier (CERN)
      • 523
        Summary: Online Computing
        Speaker: Volker Guelzow (Unknown)
        live broadcast
        Slides
      • 524
        Summary: Event Processing
        Speaker: Dr Elizabeth Sexton-Kennedy (FNAL)
        live broadcast
        Slides
      • 525
        Summary: Software Components, Tools and Databases
        Speaker: Dr Julius Hrivnac (LAL)
        live broadcast
        Slides
      • 10:30
        coffee break
      • 526
        Summary: Grid Middleware and Networking Technologies
        Speaker: Dr Ales Krenek (MASARYK UNIVERSITY, BRNO, CZECH REPUBLIC)
        live broadcast
        Slides
      • 527
        Summary: Distributed Processing and Analysis
        Speaker: Dagmar Adamova (Nuclear Physics Institute)
        live broadcast
        Slides
      • 528
        Conference summary
        Speaker: Dr Dario Barberis (CERN/Genoa)
        live broadcast
        Slides
    • Closing Congress Hall

      Congress Hall

      Prague

      Prague Congress Centre 5. května 65, 140 00 Prague 4, Czech Republic
      • 529
        Invitation to CHEP2010, Taipei, Taiwan
        Speakers: Stella Shen (Academia Sinica), Vicky, Pei-Hua HUANG (Academia Sinica)
        Slides
        Video
      • 530
        Closing Ceremony
        Speaker: Milos Lokajicek (Institute of Physics)