In recent years, it has become more and more evident that software threat communities are taking an
increasing interest in Grid infrastructures. To mitigate the security risk associated with the increased number of attacks, the Grid software development community needs to scale up its efforts to reduce software vulnerabilities. This can be achieved by introducing security review processes as a standard project management practice.
The Grid Facilities Department of the Fermilab Computing Division has developed a code inspection process, tailored to
reviewing security properties of software. The goal of the process is to identify technical risks associated with an application and their impact.
This is achieved by focusing on the business needs of the application (what it does and protects), on understanding threats and exploit communities (what an exploiter gains), and on uncovering potential vulnerabilities (what defects can be exploited). The desired outcome of the process is an improvement of the quality of the software artifact and an enhanced understanding of possible mitigation strategies for residual risks.
This paper describes the inspection process and lessons learned on applying it to Grid middleware.
(FERMI NATIONAL ACCELERATOR LABORATORY)
A Grid Job Monitoring System
This paper presents a web-based Job Monitoring framework for individual Grid sites that allows users to follow their jobs in detail in quasi-real time. The framework consists of several independent components: (a) a set of sensors that run on the site CE and worker nodes and update a database, (b) a simple yet extensible web services framework and (c) an Ajax-powered web interface with a look-and-feel and controls similar to a desktop application. The monitoring framework supports LSF, Condor and PBS-like batch systems.
This is the first such monitoring system where an X509 authenticated web interface can be seamlessly accessed by both end-users and site administrators. While a site administrator has access to all the possible information, a user can only view the jobs for the Virtual Organizations (VO) he/she is a part of.
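The visibility rule described above can be sketched as a simple filter. This is only an illustration under stated assumptions: the `Job` record, the `visible_jobs` function and the VO names are hypothetical and are not taken from the framework itself, and the X509 authentication step is assumed to have already produced the caller's VO list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    """Hypothetical job record; the real framework's schema is not given."""
    job_id: str
    vo: str
    status: str


def visible_jobs(jobs, user_vos, is_admin=False):
    """Return the jobs a caller may see.

    Site administrators see every job; ordinary users only see jobs
    belonging to the Virtual Organizations they are members of.
    """
    if is_admin:
        return list(jobs)
    allowed = set(user_vos)
    return [job for job in jobs if job.vo in allowed]


jobs = [
    Job("1", "cms", "running"),
    Job("2", "atlas", "queued"),
    Job("3", "cms", "done"),
]
user_view = visible_jobs(jobs, {"cms"})
admin_view = visible_jobs(jobs, set(), is_admin=True)
```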
The monitoring framework design supports several possible deployment scenarios. For a site running a supported batch system, the system may be deployed as a whole, or existing site sensors can be adapted and reused with our web services components. A site may even prefer to build the web server independently and choose to use only the Ajax powered web interface.
Finally, the system is being used to monitor a glideinWMS instance. This broadens its scope significantly, allowing it to monitor jobs over multiple sites.
A minimal xpath parser for accessing XML tags from C++
A minimal xpath 1.0 parser has been implemented within the JANA framework that
allows easy access to attributes or tags in an XML document. The motivating
application was to access geometry information from XML files in
the HDDS specification (derived from ATLAS's AGDD). The system allows
components in the reconstruction package to pick out individual numbers
from a collection of XML files with a single line of C++ code. The
xpath parsing aspect of JANA will be presented along with examples
of both its use and specific tasks where its use would be beneficial.
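The idea of picking a single number out of an XML document with one path expression can be illustrated as follows. This sketch uses Python's standard `xml.etree.ElementTree` module, which supports a limited XPath subset; it is not the JANA C++ interface, and the element names, attribute names and the `get_float` helper are hypothetical stand-ins rather than real HDDS content.

```python
import xml.etree.ElementTree as ET

# Hypothetical geometry document; element and attribute names are
# illustrative only and do not follow the HDDS schema.
XML = """
<geometry>
  <detector name="tracker">
    <layer id="1" radius="5.5"/>
    <layer id="2" radius="7.25"/>
  </detector>
</geometry>
"""


def get_float(root, path, attr):
    """Return one attribute, as a float, from the first node matching path."""
    node = root.find(path)
    if node is None:
        raise KeyError(f"no element matches {path!r}")
    return float(node.attrib[attr])


root = ET.fromstring(XML)
# A single path expression selects the node of interest.
radius = get_float(root, "./detector[@name='tracker']/layer[@id='2']", "radius")
```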
A PanDA Backend for the Ganga Analysis Interface
Ganga provides a uniform interface for running ATLAS user analyses on a number of local, batch, and grid backends. PanDA is a pilot-based production and distributed analysis system developed and used extensively by ATLAS. This work presents the implementation and usage experiences of a PanDA backend for Ganga. Built upon reusable application libraries from GangaAtlas and PanDA, the Ganga PanDA backend allows users to run their analyses on the worldwide PanDA resources, while providing the ability for users to develop simple or complex analysis workflows in Ganga. Further, the backend allows users to submit and manage "personal" PanDA pilots: these pilots run under the user's grid certificate and provide a secure alternative to shared pilot certificates while enabling the usage of local resource allocations.
Daniel Colin Van Der Ster
(Conseil Europeen Recherche Nucl. (CERN))
A Web portal for the Engineering and Equipment Data Management System at CERN
CERN, the European Laboratory for Particle Physics, located in Geneva, Switzerland, has recently started the Large Hadron Collider (LHC), a 27 km particle accelerator. The CERN Engineering and Equipment Data Management Service (EDMS) provides support for managing engineering and equipment information throughout the entire lifecycle of a project. Based on several data management systems, both developed in-house and commercial, this service supports management and follow-up of different kinds of information throughout the lifecycle of the LHC project: design, manufacturing, installation, commissioning data, maintenance and more.
The data collection phase, carried out by specialists, is now being replaced by a phase during which data will be consulted extensively by non-expert users. In order to address this change, a Web portal for the EDMS has been developed. It brings together in one space all the aspects covered by the EDMS: project and document management, asset tracking and safety follow-up.
This paper presents the EDMS Web portal, its dynamic content management and its “one click” information search engine.
Advanced Data Extraction Infrastructure: Web Based System for Management of Time Series Data
During the operation of high energy physics experiments a large amount of slow control data is recorded. It is necessary to examine all collected data to check the integrity and validity of the measurements. With the growing maturity of AJAX technologies it has become possible to construct sophisticated interfaces using web technologies only.
Our solution for handling time series, generally slow control data, has a modular architecture: a backend system for data analysis and preparation, a web service interface for data access, and a fast AJAX web display. To provide fast interactive access, the time series are aggregated over time slices of a few predefined lengths. The aggregated values are stored in a temporary caching database and are then used to create overview data plots. These plots may include an indication of data quality and are generated within a few hundred milliseconds, even when very high data rates are involved. The extensible export subsystem provides data in multiple formats including CSV, Excel, ROOT, and TDMS. The search engine can be used to find periods of time where the readings of selected sensors fall within specified ranges. Using the caching database allows most such lookups to be performed within a second. Based on this functionality, a web interface facilitating fast (Google-maps style) navigation through the data has been implemented.
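The aggregation over predefined time slices, and the range search on the aggregated values, can be sketched as below. This is a minimal in-memory illustration, not the system's implementation: the function names, the choice of min/max/mean/count as the cached aggregates, and the slice lengths are all assumptions, and a real deployment would store the aggregates in a database rather than a dictionary.

```python
from collections import defaultdict


def aggregate(samples, slice_seconds):
    """Aggregate (timestamp, value) samples into fixed-length time slices.

    Returns {slice_start: (min, max, mean, count)}.
    """
    buckets = defaultdict(list)
    for timestamp, value in samples:
        buckets[timestamp - timestamp % slice_seconds].append(value)
    return {
        start: (min(vals), max(vals), sum(vals) / len(vals), len(vals))
        for start, vals in sorted(buckets.items())
    }


def build_cache(samples, slice_lengths):
    """Pre-compute one aggregation level per predefined slice length."""
    return {length: aggregate(samples, length) for length in slice_lengths}


def find_periods(level, low, high):
    """Return slice starts whose values all lie within [low, high]."""
    return [
        start
        for start, (vmin, vmax, _mean, _count) in level.items()
        if low <= vmin and vmax <= high
    ]


# Integer timestamps in seconds; values are arbitrary sensor readings.
samples = [(0, 1.0), (30, 3.0), (60, 5.0), (90, 7.0)]
cache = build_cache(samples, slice_lengths=(60, 120))
matches = find_periods(cache[60], 4.0, 8.0)
```

Because the search only inspects the cached minimum and maximum of each slice, it never touches the raw samples, which is what makes sub-second lookups plausible for long time ranges.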
The solution is currently used by several slow control systems at the Test Facility for Fusion Magnets (TOSKA) and the Karlsruhe Tritium Neutrino (KATRIN) experiment.
(The Institute of Data Processing and Electronics, Forschungszentrum Karlsruhe)
Alternative Factory Model for Event Processing with Data on Demand
Factory models are often used in object oriented
programming to allow more complicated and controlled
instantiation than is easily done with a standard C++ constructor.
The alternative factory model implemented in the
JANA event processing framework addresses issues of
data integrity important to the type of reconstruction
software developed for experimental HENP. The data on demand
feature of the framework makes it well suited for Level-3 trigger
applications. The alternative
factory model employed by JANA will be presented with
emphasis on how it implements a data on demand mechanism
while ensuring the integrity of the data objects passed between
Association Rule Mining on Grid Monitoring Data to Detect Error Sources
Grid computing is associated with a complex, large scale, heterogeneous and distributed environment. The combination of different Grid infrastructures, middleware implementations, and job submission tools into one reliable production system is a challenging task. Given the impracticability of providing an absolutely fail-safe system, strong error reporting and handling is a crucial part of operating these infrastructures.
There are various monitoring systems in place, which are also able to deliver error codes of failed Grid jobs. Nevertheless, the error codes do not always denote the actual source of the error. Instead, a more sophisticated methodology is required to locate problematic Grid elements. In our contribution we propose to mine Grid monitoring data using association rules. With this approach we are able to produce additional knowledge about the Grid elements' behavior by taking correlations and dependencies between the characteristics of failed Grid jobs into account. This technique finds error patterns, expressed as rules, automatically and quickly, which helps trace errors back to their origin. This significantly decreases the time needed for fault recovery and fault removal, improving the Grid's reliability. This work presents the results of investigations on association rule mining algorithms and evaluation methods to find the best rules with respect to monitoring data in a Grid infrastructure.
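To make the notion of an association rule concrete, the sketch below mines single-antecedent rules from a handful of failed-job records using the standard support and confidence measures. It is a toy illustration, not one of the algorithms investigated in this work: the `mine_rules` function, the thresholds, and the `site=`, `error=` and `se=` attributes are hypothetical.

```python
from collections import Counter
from itertools import combinations


def mine_rules(transactions, min_support=0.5, min_confidence=0.8):
    """Find rules lhs -> rhs between single items.

    support    = fraction of transactions containing both items
    confidence = support(lhs and rhs) / support(lhs)
    """
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)


# Hypothetical characteristics of four failed Grid jobs.
failed_jobs = [
    {"site=A", "error=timeout", "se=se1"},
    {"site=A", "error=timeout", "se=se2"},
    {"site=B", "error=auth", "se=se1"},
    {"site=A", "error=timeout", "se=se1"},
]
rules = mine_rules(failed_jobs)
```

In this toy data set every timeout occurs at site A and every failure at site A is a timeout, so both directions of that rule reach a confidence of 1.0, while the storage element is not singled out.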
(Johannes Kepler Universität Linz)
ATLAS Event Metadata Records as a Testbed for Scalable Data Mining
At a data rate of 200 hertz, event metadata records ("TAGs," in ATLAS parlance)
provide fertile grounds for development and evaluation of tools for scalable data mining.
It is easy, of course, to apply HEP-specific selection or classification rules to event records
and to label such an exercise "data mining," but our interest is different.
Advanced statistical methods and tools such as classification, association rule mining,
and cluster analysis are common outside the high energy physics community. These tools can prove
useful, not necessarily for discovery physics, but for learning about our data, our detector, and our software.
A fixed and relatively simple schema makes TAG export to other storage technologies such as
HDF5 straightforward. This simplifies the task of exploiting very-large-scale parallel platforms
such as Argonne National Laboratory's BlueGene/P, currently the largest supercomputer in the world
for open science, in the development of scalable tools for data mining. Using a domain-neutral
scientific data format may also enable us to take advantage of existing data mining components
from other communities.
There is, further, a substantial literature on the topic of one-pass algorithms and stream
mining techniques, and such tools may be inserted naturally at various points in the event data
processing and distribution chain.
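As one well-known example of a one-pass technique, Welford's algorithm computes a running mean and variance while seeing each value only once and keeping constant state. The sketch below is a generic illustration; the `RunningStats` class is a hypothetical name and is not an ATLAS component.

```python
class RunningStats:
    """One-pass (streaming) mean and sample variance, after Welford."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        """Fold one new value into the statistics."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self):
        """Sample variance; defined as 0.0 for fewer than two values."""
        return self._m2 / (self.n - 1) if self.n > 1 else 0.0


stats = RunningStats()
for value in [2, 4, 4, 4, 5, 5, 7, 9]:
    stats.update(value)
```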
This paper describes early experience with event metadata records from ATLAS simulation
and commissioning as a testbed for scalable data mining tool development and evaluation.
(Argonne National Laboratory), Dr Peter Van Gemmeren
(Argonne National Laboratory)
ATLAS Grid Compute Cluster with virtualised service nodes
The ATLAS computing Grid consists of several hundred compute clusters distributed around the world as part of the Worldwide LHC Computing Grid (WLCG). The Grid middleware and the ATLAS software, which have to be installed at each site, often require a certain Linux distribution and sometimes even a specific version thereof.
On the other hand, mostly due to maintenance reasons, computer centres install the same operating system and version on all computers. This might lead to problems with the Grid middleware if the local version is different from the one for which it has been developed.
At RZG we partly solved this conflict by using virtualisation technology for the service nodes. We will present the setup used at RZG and show how it helped to solve the problems described above. In addition we will illustrate the further advantages gained from this setup.
ATLAS operation in the GridKa Tier1/Tier2 cloud
The organisation and operations model of the ATLAS T1-T2 federation/cloud associated with the GridKa T1
in Karlsruhe is described. Attention is paid to cloud level services and the experience gained during
the last years of operation.
The ATLAS GridKa Cloud is large and diverse, spanning 5 countries and 2 ROCs, and currently comprises 13
core sites. A well defined and tested operations model in such a cloud is of the utmost importance.
We have defined the core cloud services required by the ATLAS experiment and ensured that they are performed
in a managed and sustainable manner. Services such as Distributed Data Management involving data
replication, deletion and consistency checks, Monte Carlo Production, software installation and data
reprocessing are described in greater detail.
In addition to providing these central services we have undertaken several cloud level stress tests and developed
monitoring tools to aid with cloud diagnostics. Furthermore we have defined good channels of communication
between ATLAS, the T1 and the T2s and have pro-active contributions from the T2 manpower.
A brief introduction to the GridKa cloud is provided followed by a more detailed
discussion of the operations model and ATLAS services within the cloud.
Finally a summary of our experience gained while running these services is presented.
Authentication and authorisation in CMS' monitoring and computing web services
The CMS experiment at the Large Hadron Collider has deployed numerous web-based services in order to serve the collaboration effectively. We present the two-phase authentication and authorisation system in use in the data quality and computing monitoring services, and in the data- and workload management services. We describe our techniques intended to provide a high level of security with minimum harassment, and how we have applied a role-based authorisation model to a variety of services depending on the task and the strength of the authentication. We discuss the experience of implementing authentication at front-end servers separate from application servers, and the challenges of authenticating both humans and programs effectively. We describe our maintenance procedures and report capacity and performance results.
Automated Testing Infrastructure for LHCb Software Framework Gaudi
An extensive test suite is the first step towards the delivery of robust software, but it is not always easy to implement, especially in projects with many developers. An easy-to-use and flexible infrastructure for writing and executing tests reduces the work each developer has to do to instrument their packages with tests. At the same time, the infrastructure gives the same look and feel to the tests and allows automated execution of the test suite. For Gaudi, we decided to develop the testing infrastructure on top of the free tool QMTest, already used in the LCG Application Area for the routine tests run in the nightly build system. The high flexibility of QMTest allowed us to integrate it in the Gaudi package structure. A specialized test class and some utility functions have been developed to simplify the definition of a test for a Gaudi-based application. Thanks to the testing infrastructure described here, we managed to quickly extend the standard Gaudi test suite and add tests to the main LHCb applications, so that they are executed in the LHCb nightly build system to validate the code.
(European Organization for Nuclear Research (CERN))
Batch efficiency at CERN
A frequent source of concern for resource providers is the efficient use of computing resources in their centres. This has a direct impact on requests for new resources.
There are two different but strongly correlated aspects to be
considered: while users are mostly interested in a good turn-around time for their jobs, resource providers are mostly interested in a high and efficient usage of their available resources.
Both things, the box usage and the efficiency of individual user jobs, need to be closely monitored so that the sources of the inefficiencies can be identified. At CERN, the Lemon monitoring system is used for both purposes. Examples of such sources are poorly written user code, inefficient access to mass storage systems, and dedication of resources to specific user groups.
As a first step towards improvement, CERN has launched a project to develop a scheduler add-on that allows careful overloading of worker nodes that run idle jobs. Results on the impact of these developments on the box efficiency will be presented.
Mr Ricardo Manuel Salgueiro Domingues da Silva
Benchmarking the ATLAS software through the Kit Validation engine
Measuring the performance of the experiment software is important both for choosing the most effective resources and for discovering bottlenecks in the code implementation.
In this work we present the benchmark techniques used to measure the ATLAS software performance through the ATLAS offline testing engine Kit Validation and the online portal Global Kit Validation. The performance measurements, the data collection, the online analysis and display of the results will be presented. The results of the measurement on different platforms and architectures will be shown, giving a full report on the CPU power and memory consumption of the Monte Carlo generation, simulation, digitization and reconstruction of the most CPU-intensive channels. The impact of the multi-core computing on the ATLAS software performance will also be presented, comparing the behavior of different architectures when increasing the number of concurrent processes.
The benchmark techniques described in this paper have been used in the HEPiX group since the beginning of 2008 to help define the performance metrics for High Energy Physics applications, based on the real experiment software.
Alessandro De Salvo
(Istituto Nazionale di Fisica Nucleare Sezione di Roma 1)
Build and test system for FairRoot
One of the challenges of software development for large experiments is to
manage the contributions from globally distributed teams. In order to keep
the teams synchronized a strong quality control is important.
For a software project this means testing, on all supported platforms,
whether the project can be built from source, whether it runs, and
finally whether the program delivers the correct results.
These tests should be done frequently, which immediately makes it
necessary to run the checks automatically.
If the number of different platforms increases it becomes impractical to
have installations of all supported platforms at one site. To overcome this
problem, the best way is to use a client-server architecture: the quality
control runs at the place where a specific platform is
installed and used (client), and only the results are sent to a central server
responsible for processing the data.
The scheme used within FairRoot to fulfill these requirements will be presented.
The configure, build and test framework is based on CMake, an open source
tool that generates standard build files for the different
operating systems and compilers from simple configuration files.
To process and display the gathered data the open source tool CDash is used.
From the generated web pages information about the status of the project at a
given time can be obtained.
CASTOR end-to-end monitoring system
We present the new monitoring system for CASTOR (CERN Advanced
STORage) which allows an integrated view on all the different storage
components. With the massive data-taking phase approaching, CASTOR is
one of the key elements of the software needed by the LHC
experiments. It has to provide a reliable storage machinery for saving
the event data, as well as to enable an efficient reconstruction and
analysis, making the monitoring of the running CASTOR instances
essential. The new CASTOR monitoring system is built around a
dedicated database schema which allows the appropriate queries to be
performed efficiently. The monitoring database is currently
populated using SQL procedures running on the CASTOR Distributed
Logging Facility (DLF) which is a database where the log messages
created by the different CASTOR entities are stored. In future
releases, it is envisaged to move to a SYSLOG-based transport and to
have the monitoring database populated directly by Python
scripts parsing and pre-processing the log messages. A web interface
has been developed for the presentation of the monitoring
information. The different histograms and plots are created using PHP
scripts which query the monitoring database. The modular approach of
the new monitoring system makes it easy to change the method of
populating the monitoring database, or to change the web interface,
without modifying the database itself. After a short introduction
to the CASTOR architecture, we will discuss in detail the CASTOR
monitoring database and present the new web interface.
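The envisaged approach of populating a monitoring database from scripts that parse log messages can be sketched as follows. Everything specific here is an assumption for illustration: the log line layout, the table schema, the daemon names and the use of SQLite are hypothetical and are not taken from CASTOR or its DLF.

```python
import re
import sqlite3

# Hypothetical log layout: "<timestamp> <host> <daemon>: <message>".
LINE_RE = re.compile(r"^(?P<ts>\S+) (?P<host>\S+) (?P<daemon>\w+): (?P<msg>.*)$")


def parse_line(line):
    """Return the parsed fields of one log line, or None if it is malformed."""
    match = LINE_RE.match(line.strip())
    return match.groupdict() if match else None


def populate(conn, lines):
    """Parse log lines and insert the well-formed ones; return the row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS messages "
        "(ts TEXT, host TEXT, daemon TEXT, msg TEXT)"
    )
    rows = [row for row in map(parse_line, lines) if row is not None]
    conn.executemany(
        "INSERT INTO messages VALUES (:ts, :host, :daemon, :msg)", rows
    )
    conn.commit()
    return len(rows)


def counts_per_daemon(conn):
    """Example monitoring query: number of messages per daemon."""
    cursor = conn.execute("SELECT daemon, COUNT(*) FROM messages GROUP BY daemon")
    return dict(cursor)


log_lines = [
    "2009-03-21T10:00:00 node01 stager: request accepted",
    "2009-03-21T10:00:05 node02 scheduler: job scheduled",
    "malformed line",
    "2009-03-21T10:00:09 node01 stager: request finished",
]
conn = sqlite3.connect(":memory:")
inserted = populate(conn, log_lines)
summary = counts_per_daemon(conn)
```

Keeping parsing, loading and querying in separate functions mirrors the modularity described above: the way the database is filled can change without touching the queries that feed the web interface.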
CMS conditions database web application service
The web application service, as part of the conditions database system, serves applications and users outside event processing. The application server is built upon the conditions Python API in the CMS offline software framework. It responds to HTTP requests on various conditions database instances. The main client of the application server is the conditions database web GUI, which currently exposes three main services.
The tag browser allows users to see the availability of the conditions data in terms of their version (tag) and interval of validity (IOV).
The global tag component is used by physicists to inspect the organization of the tags in a given data taking or data production period, while production managers use the web service to produce such tag hierarchies.
The history chart plotting service creates dynamic summary and distribution charts of the payload data in the database. A fast graphical overview of different information greatly helps physicists in monitoring and validating the calibration data stored in the conditions database.
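The relation between a tag and its intervals of validity can be illustrated with a small lookup structure. This is a generic sketch, not the CMS conditions Python API: the `IovSequence` class, the use of run numbers as the validity key, and the payload names are assumptions. Each entry is taken to be valid from its `since` value until the next entry begins.

```python
from bisect import bisect_right


class IovSequence:
    """Ordered intervals of validity for one tag (hypothetical structure)."""

    def __init__(self, entries):
        # entries: iterable of (since, payload) pairs with distinct `since`.
        self._entries = sorted(entries, key=lambda entry: entry[0])
        self._sinces = [since for since, _payload in self._entries]

    def payload_for(self, run):
        """Return the payload whose interval of validity contains `run`."""
        index = bisect_right(self._sinces, run) - 1
        if index < 0:
            raise LookupError(f"no conditions valid for run {run}")
        return self._entries[index][1]


tag = IovSequence([(100, "calib_v2"), (1, "calib_v1"), (250, "calib_v3")])
payload = tag.payload_for(120)
```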
CMS Dashboard Task Monitoring: A user-centric monitoring view.
Dashboard is a monitoring system developed for the LHC experiments in order to provide a view of the Grid infrastructure from the perspective of the Virtual Organisation. The CMS Dashboard provides a reliable monitoring system that enables a transparent view of the experiment activities across different middleware implementations and combines the Grid monitoring data with information that is specific to the experiment. Scientists must be able to monitor the execution status and the application and grid-level messages of their tasks, which may run at any site within the Virtual Organisation. The existing monitoring systems provide this type of information but they are not focused on the user's perspective. Information aimed at individual users is either not easily available at present or non-existent. The CMS Dashboard Task Monitoring project addresses this gap by collecting and exposing a user-centric set of information about submitted tasks. It provides a clear and precise view of the status of a task, including job distribution by site and over time, reasons for failure, and advanced graphical plots, giving a more usable and attractive interface to the analysis and production user. The development was user driven, with physicists invited to test the prototype in order to gather further requirements and identify weaknesses in the application. The solutions implemented and insight into future development plans are presented here.
CMS data quality monitoring web service
A central component of the data quality monitoring system of the CMS experiment at the Large Hadron Collider is a web site for browsing data quality histograms. The production servers in data taking provide access to several hundred thousand histograms per run, both live during online running and from up to several terabytes of archived histograms for the online data taking, Tier-0 prompt reconstruction, prompt calibration and analysis activities, for re-reconstruction at Tier-1s and for release validation. At the present usage level the servers handle in total around a million authenticated HTTP requests per day. We describe the main features and components of the system, our implementation for web-based interactive rendering, and the server design. We give an overview of the deployment and maintenance procedures. We discuss the main technical challenges and our solutions to them, with emphasis on functionality, long-term robustness and performance.
CMS Partial Releases: model, tools, and applications. Online and Framework-light releases.
The CMS Software project CMSSW embraces more than a thousand packages organized in over a hundred subsystems covering the areas of analysis, event display, reconstruction, simulation, detector description, data formats, framework, utilities and tools. The release integration process is highly automated, using tools developed or adopted by CMS. Packaging in rpm format is a built-in step in the software build process.
For several well-defined applications it is highly desirable to have only a subset of the CMSSW full package bundle. For example, High Level Trigger algorithms that run on the Online farm, and need to be rebuilt in a special way, require no simulation, event display, or data analysis functionality. Physics analysis applications in the ROOT environment require only a small number of core libraries and the description of CMS specific data formats.
We present a model of CMS Partial Releases, used for the preparation of customized CMS software builds, including a description of the tools, the implementation, and how we deal with technical challenges, such as resolving dependencies and meeting special requirements for concrete applications in a highly automated fashion.
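Resolving dependencies for a partial release amounts to computing the transitive closure of the packages an application needs. The sketch below shows one way to do this; it is not the CMS tooling, and the `dependency_closure` function, the package names and the dependency map are hypothetical.

```python
def dependency_closure(requested, depends_on):
    """Return every package needed to build the requested packages.

    `depends_on` maps a package name to the packages it depends on directly.
    """
    needed = set()
    stack = list(requested)
    while stack:
        package = stack.pop()
        if package in needed:
            continue  # already handled; also guards against cycles
        needed.add(package)
        stack.extend(depends_on.get(package, ()))
    return needed


# Hypothetical dependency map, for illustration only.
depends_on = {
    "Trigger": ["Framework", "DataFormats"],
    "Framework": ["Utilities"],
    "DataFormats": ["Utilities"],
    "Simulation": ["Framework", "Geometry"],
    "EventDisplay": ["DataFormats", "Graphics"],
}
online_release = dependency_closure(["Trigger"], depends_on)
```

In this toy map a trigger-only release pulls in the framework, data formats and utilities but leaves out simulation and event display, which is the kind of reduction the partial release model aims for.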
CMS Software Build, Release and Distribution --- Large system optimization
The CMS offline software consists of over two million lines of code actively developed by hundreds of developers from all around the world. Optimal builds and distribution of such a large scale system for production and analysis activities for hundreds of sites and multiple platforms are major challenges. Recent developments have not only optimized the whole process but also helped us identify the remaining build and integration issues. We describe how parallel builds of software and minimal distribution size dramatically reduced the time gap between software build and installation on remote sites and how we have improved the performance of the build environment used by developers. In addition, we discuss our work to produce a few big binary products rather than thousands of small ones.
CMS Tier-2 Resource Management
The Tier-2 centers in CMS are the only locations, besides the specialized analysis facility at CERN, where users are able to obtain guaranteed access to CMS data samples. The Tier-1 centers are used primarily for organized processing and storage. The Tier-1s are specified with data export and network capacity to allow the Tier-2 centers to refresh the data in disk storage regularly for analysis. A nominal Tier-2 center will deploy 200 TB of storage for CMS. The CMS expectation for the global Tier-2 capacity is more than 5 PB of usable disk storage. In order to manage such a large and highly distributed resource CMS has tried to introduce policy and structure to the Tier-2 storage and processing.
In this presentation we will discuss the CMS policy for dividing resources between the local community, the individual users, CMS centrally, and focused CMS analysis groups. We will focus on the technical challenges associated with management and accounting as well as the collaborative challenges of assigning resources to the whole community. We will explore the different challenges associated with partitioning dynamic resources like processing and more static resources like storage. We will show the level of dynamic data placement and resource utilization achieved and the level of distribution CMS expects to achieve in the future.
(RWTH Aachen, III. Physikal. Institut B)
CMS Usage of the Open Science Grid and the US Tier-2 centers
The CMS experiment has been using the Open Science Grid, through its US Tier-2 computing centers, from its very beginning for production of Monte Carlo simulations. In this talk we will describe the evolution of the usage patterns indicating the best practices that have been identified. In addition to describing the production metrics and how they have been met, we will also present the problems encountered and mitigating solutions. Data handling and the user analysis patterns on the Tier-2 and OSG computing will be described.
Dr Ajit Kumar Mohapatra
(University of Wisconsin, Madison, USA)
Commissioning Distributed Analysis at the CMS Tier-2 Centers
CMS has identified the distributed Tier-2 sites as the primary location for physics analysis. There is a specialized analysis cluster at CERN, but it represents approximately 15% of the total computing available to analysis users. The more than 40 Tier-2s on 4 continents will provide analysis computing and user storage resources for the vast majority of physicists in CMS. The CMS estimate is that each Tier-2 will be able to support on average 40 people and the global number of analysis jobs per day is between 100k and 200k depending on the data volume and individual activity. Commissioning a distributed analysis system of this scale in terms of distribution and number of expected users is a unique challenge.
In this presentation we will discuss the CMS Tier-2 analysis commissioning activities and user experience. The 4 steps deployed during the Common Computing Readiness Challenge that drove the level of activity and participation to an unprecedented scale in CMS will be presented. We will summarize the dedicated commissioning tests employed to prepare the next generation of CMS analysis server. Additionally, we will present the experience from users and the level of adoption of the tools in the collaboration.
(on behalf of CMS - INFN-BOLOGNA (ITALY))
COOL Performance Optimization and Scalability Tests
The COOL project provides software components and tools for the handling of the LHC experiment conditions data. The project is a collaboration between the CERN IT Department and ATLAS and LHCb, the two experiments that have chosen it as the base of their conditions database infrastructure. COOL supports persistency for several relational technologies (Oracle, MySQL and SQLite), based on the CORAL Relational Abstraction Layer. For both experiments, Oracle is the backend used for the deployment of COOL database services at Tier0 (both online and offline) and Tier1 sites. While the development of new software features is still ongoing, performance optimizations and tests have been the main focus of the project in 2008. This presentation will focus on the results of the proactive scalability tests performed by the COOL team for data insertion and retrieval using samples of simulated conditions data. It will also briefly review the results of stress tests performed by the experiments using the production setups for service deployment.
Cyberinfrastructure for High Energy Physics in Korea
KISTI (Korea Institute of Science and Technology Information) is Korea's national headquarters for supercomputing, networking, Grid and e-Science. We have been working on cyberinfrastructure for high energy physics experiments, especially the CDF and ALICE experiments. We introduce the cyberinfrastructure, which includes resources, Grid and e-Science for these experiments. The goal of e-Science is to study high energy physics anytime and anywhere, even when we are not on site at the accelerator laboratories. The components are data production, data processing and data analysis. Data production means taking both on-line and off-line shifts remotely. Data processing means running jobs anytime, anywhere using Grid farms. Data analysis means working together to publish papers using a collaborative environment such as the EVO (Enabling Virtual Organization) system.
We also present the activities of FKPPL (France-Korea Particle Physics Laboratory) which is the joint laboratory between France and Korea for Grid, ILC, ALICE and CDF experiments. Recently we have constructed FKPPL VO (Virtual Organization). We will present the applications of this VO.
Data Management tools and operational procedures in ATLAS: Example of the German cloud
A set of tools has been developed to carry out the Data Management operations (deletion, movement of data within a site and consistency checks) within the German cloud for ATLAS. These tools, which use local protocols that allow fast and efficient processing, are described and presented in the context of the operational procedures of the cloud. Particular emphasis is put on the consistency checks between the Local Catalogues (LFC) and the files stored on the Storage Element. These checks are crucial to ensure that all the data stored at the sites are actually available to users, and to get rid of unregistered files, also known as Dark Data.
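As an illustration of the catalogue-versus-storage check described above, the following minimal Python sketch compares two file listings with set operations. It is not the tool used in the German cloud: the function name, the input lists and the example paths are all hypothetical, and a real check would obtain the two listings from an LFC dump and a Storage Element dump.

```python
def compare_catalogue_and_storage(catalogue_entries, storage_files):
    """Compare a catalogue listing with a storage listing.

    Returns a pair (missing, dark):
      missing - registered in the catalogue but absent from storage
      dark    - present on storage but not registered ("Dark Data")
    Both inputs are iterables of file paths (hypothetical format).
    """
    registered = set(catalogue_entries)
    stored = set(storage_files)
    missing = sorted(registered - stored)
    dark = sorted(stored - registered)
    return missing, dark


if __name__ == "__main__":
    # Hypothetical example listings, for illustration only.
    catalogue = ["/atlas/data/file_a", "/atlas/data/file_b"]
    storage = ["/atlas/data/file_b", "/atlas/data/file_c"]
    missing, dark = compare_catalogue_and_storage(catalogue, storage)
    print("registered but not on storage:", missing)
    print("on storage but not registered:", dark)
```

Files in the first list are unavailable to users despite being registered, while files in the second list occupy space without being reachable through the catalogue.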
dCache with tape storage for High Energy Physics applications
An interface between dCache and the local Tivoli Storage Manager (TSM) tape storage facility has been developed at the University of Victoria (UVic) for High Energy Physics (HEP) applications. The interface is responsible for transferring the data from disk pools to tape and retrieving data from tape to disk pools. It also checks the consistency between the PNFS filename space and the TSM database. The dCache system, consisting of a single admin node with two pool nodes, is configured to have two read pools and one write pool. The pools are attached to the TSM storage, which has a capacity of about 100 TB. This system is being used in production at UVic as part of a Tier A site for BaBar Tau analysis. An independent dCache system is also in production for the storage element (SE) of the ATLAS experiment as part of the Canadian Tier-2 sites. This system does not currently employ a tape storage facility; however, one can be added in the future.
(University of Victoria, Victoria, BC, Canada)
Development and Commissioning of the CMS Tier0
The CMS Tier 0 is responsible for handling the data in the first period of its life, from being written to a disk buffer at the CMS experiment site in Cessy by the DAQ system, to the time the transfer from CERN to one of the Tier1 computing centres completes. It contains all automatic data movement, archival and processing tasks run at CERN. This includes the bulk transfers of data from Cessy to a Castor disk pool at CERN, repacking the data into Primary Datasets, storage to tape and export to the Tier 1 centres. It also includes a first reconstruction pass over all data and the tape archival and export to the Tier1 centres of the reconstructed data. While performing these tasks, the Tier 0 has to maintain redundant copies of the data and flush it through the system within a narrow time window to avoid data loss. With data taking being imminent, this aspect of the CMS computing effort becomes of the utmost importance. We discuss and explain here the work of developing and commissioning the CMS Tier0 undertaken over the last year.
DIRAC, the LHCb community Grid solution, provides access to a vast amount of computing and storage resources to a large number of users. In DIRAC users are organized in groups with different needs and permissions. Security is mandatory in order to ensure that only allowed users can access the resources and to prevent abuse. All DIRAC services and clients use secure connections that are authenticated using certificates and grid proxies. Once a client has been authenticated, authorization rules are applied to the requested action based on the presented credentials. These authorization rules and the list of users and groups are centrally managed in the DIRAC Configuration Service.
Users submit jobs to DIRAC using their local credentials. From then on, DIRAC has to interact with different Grid services on behalf of this user. DIRAC has a proxy management service to which users upload short-lived proxies to be used when DIRAC needs to act on their behalf. Long-duration proxies are uploaded by users to the MyProxy service, and DIRAC retrieves new short delegated proxies when necessary.
This contribution discusses the details of the implementation of this security infrastructure in DIRAC.
Mr Adrian Casajus Ramo
(Departament d' Estructura i Constituents de la Materia)
Distributed Processing and Analysis of ALICE data at distributed Tier2-RDIG
A. Bogdanov3, L. Malinina2, V. Mitsyn2, Y. Lyublev9, Y. Kharlov8, A. Kiryanov4,
D. Peresounko5, E. Ryabinkin5, G. Shabratova2, L. Stepanova1, V. Tikhomirov3,
W. Urazmetov8, A. Zarochentsev6, D. Utkin2, L. Yancurova2, S. Zotkin8
1 Institute for Nuclear Research of the Russian, Troitsk, Russia;
2 Joint Institute for Nuclear Research, Dubna, Russia;
3 Moscow Engineering Physics Institute, Moscow, Russia;
4 Petersburg Nuclear Physics Institute, Gatchina, Russia;
5 Russian Research Center "Kurchatov Institute", Moscow, Russia;
6 Saint-Petersburg State University, Saint-Petersburg, Russia;
7 Skobeltsyn Institute of Nuclear Physics, Moscow, Russia;
8 Institute for High Energy Physics, Protvino, Russia;
9 Institute for Theoretical and Experimental Physics, Moscow, Russia;
(This activity is supported by CERN-INTAS grant 7484)
The readiness of Tier-2 centres for the processing and analysis of LHC data is currently a matter of concern and close monitoring for the managements of the LHC experiments. According to the ALICE computing model, the main tasks of a Tier-2 are the production of simulated data and the analysis of both simulated and experimental data. Russian sites, combined into the distributed Tier-2 RDIG (Russian Data Intensive GRID), have been participating in the ALICE GRID activity since 2004.
The ALICE GRID activity is based on AliEn, which uses LCG (EGEE) middleware through an interface. The stable operation of AliEn with LCG middleware has been tested and demonstrated over the last few years. To process ALICE data more adequately during LHC operation, the stability of data processing and analysis needs to be tested with more modern services such as CREAM-CE and pure xrootd.
The major subject of this report is to demonstrate that the simulated data needed for the complex analysis of the forthcoming LHC data can be produced, and that this analysis itself can be carried out.
The use of the CPU and disk resources pledged by RDIG for the ALICE GRID activity will be discussed. The installation, testing and stable operation of new services at RDIG sites, such as CREAM-CE and pure xrootd, are discussed in this report, which will show the advantages of these services for ALICE tasks. Information will also be presented on the installation, testing and support of a parallel analysis facility based on PROOF, dedicated to the Russian ALICE community, together with examples of its application to the analysis of simulated and reconstructed ALICE data for the first LHC physics.
 ALICE Collaboration, Technical Design Report of Computing,CERN-LHCC-2005-018
 P. Saiz et al., Nucl. Instrum. Methods A502 (2003) 437-440; http://alien.cern.ch/;
F.Rademakers et al
(Joint Inst. for Nuclear Research (JINR))
Dynamic Virtual AliEn Grid Sites on Nimbus with CernVM
Infrastructure-as-a-Service (IaaS) providers allow users to easily acquire on-demand computing and storage resources. For each user they provide an isolated environment in the form of Virtual Machines which can be used to run services and deploy applications. This approach, also known as 'cloud computing', has proved to be viable for a variety of commercial applications. Currently there are many IaaS providers on the market, the biggest of which is Amazon with its 'Amazon Elastic Compute Cloud (Amazon EC2)' service.
The question arises whether scientific communities can benefit from the IaaS approach, and how existing projects can take advantage of cloud computing. Will there be a need to make any changes to existing services and applications? How can services and applications (e.g., grid infrastructure or other distributed tools), currently used by scientists, be integrated into infrastructures offered by IaaS providers?
In this contribution we describe some answers to these questions. We show how cloud computing resources can be used within the AliEn Grid framework, developed by the ALICE experiment at CERN, for performing simulation, reconstruction and analysis of physics data.
We use the baseline virtual software appliance for the LHC experiments developed by the CernVM project. The appliance provides a complete, portable and easy to configure user environment for developing and running LHC data analysis locally and on the Grid, independent of the physical software and hardware platform. We deploy these appliances on the Science Clouds resources, which use the Nimbus project to enable deployment of VMs on remote resources. We also use Nimbus tools for one-click deployment of a dynamically configurable AliEn Grid site on the Science Cloud of the University of Chicago.
Enabling Virtualization for ATLAS Production Work through Pilot Jobs
Omer Khalid, Paul Nilsson, Kate Keahey, Markus Schulz
Given the proliferation of virtualization technology in every technological domain, we have been investigating how to enable virtualization in the LCG Grid, using virtual machines as job execution containers to bring benefits such as isolation, security and environment portability.
There are many ways to go about this, but since our candidate workload is the ATLAS experiment, we chose to enable virtualization through pilot jobs, which in the ATLAS case means the Panda Pilot Framework. In our approach, once a pilot has acquired a resource slot on the grid, it verifies whether the server supports virtual machines. If it does, the pilot proceeds with the standard phases of job download and environment preparation and finally deploys the virtual machine.
We have taken a holistic approach in our implementation, in which all I/O takes place outside the virtual machine, on the host OS. Once all the data have been downloaded, the Panda pilot packages the job in the virtual machine and launches it for execution. Upon termination, the Panda pilot running on the host machine updates the server, stores the job output on an external SE and then cleans up to make the host slot available for the next job.
Installing and maintaining ATLAS releases on the worker nodes is the biggest issue, in particular how to make them available to the virtual machine job execution container. In our implementation, the Panda pilot takes an existing ATLAS release installation and packages it for the virtual machine as a read-only block device before starting it, thus enabling the job to execute. Similarly, the base images for the virtual machine are generic, so that they are usable for large sets of jobs, while control stays in the hands of the system administrators, as the Panda pilot only uses the images they make available.
In this way, the pilot never loses its slot, while virtualization is enabled on the grid in a systematic and coherent manner. An additional advantage of this approach is that only the computational overhead of virtualization, which is minimal, is incurred; the more significant I/O overhead of a virtual machine is avoided by downloading and uploading in the host environment rather than inside the virtual machine.
Ensuring Data Consistency Over CMS Distributed Computing System
CMS utilizes a distributed infrastructure of computing centers to custodially store data, to provide organized processing resources, and to provide analysis computing resources for users. Integrated over the whole system, even in the first year of data taking, the available disk storage approaches 10 petabytes. Maintaining consistency between the data bookkeeping, the data transfer system, and physical storage is an interesting technical and operational challenge. In this presentation we will discuss the CMS effort to ensure that data is consistently available at all computing centers. We will discuss the technical tools that monitor the consistency of the catalogs and the physical storage, as well as the operations model used to find and resolve inconsistencies.
(Fermi National Accelerator Lab. (Fermilab))
EVE - Event Visualization Environment of the ROOT framework
EVE is a high-level visualization library using ROOT's data-processing, GUI and OpenGL interfaces. It is designed as a framework for object management offering hierarchical data organization, object interaction and visualization via GUI and OpenGL representations. Automatic creation of 2D projected views is also supported. In addition, it can serve as an event visualization toolkit satisfying most HEP requirements: visualization of geometry and of simulated and reconstructed data such as hits, clusters, tracks and calorimeter information. Special classes are available for visualization of raw data.
The object-interaction layer allows for easy selection and highlighting of objects and their derived representations (projections) across several views (3D, Rho-Z, R-Phi). Object-specific tooltips are provided in both GUI and GL views.
The visual-configuration layer of EVE is built around a database of template objects that can be applied to specific instances of visualization objects to ensure consistent object presentation. The database can be retrieved from a file, edited during the framework operation and stored to file.
The EVE prototype was developed within the ALICE collaboration and was included in ROOT in December 2007. Since then all EVE components have reached maturity. EVE is used as the base of the AliEve visualization framework in ALICE, of the Fireworks physics-oriented event display in CMS, and as the visualization engine of FairRoot in FAIR.
Evolution of the ATLAS Computing Model
Despite the all too brief availability of beam-related data, much has been learned about the usage patterns and operational requirements of the ATLAS computing model since Autumn 2007. Bottom-up estimates are now more detailed, and cosmic ray running has exercised much of the model in both duration and volume. Significant revisions have been made in the resource estimates, and in the usage of those resources. In some cases, this represents an optimization while in others it attempts to counter lack of functionality in the available middleware. There are also changes reflecting the emerging roles of the different data formats. The model continues to evolve with a heightened focus on end-user performance, and the state of the art after a major review process over winter 08/09 will be presented.
Experience Building and Operating the CMS Tier-1 Computing Centers
The CMS Collaboration relies on 7 globally distributed Tier-1 computing centers located at large universities and national laboratories for a second custodial copy of the CMS RAW data and the primary copy of the simulated data, data serving capacity to Tier-2 centers for analysis, and the bulk of the reprocessing and event selection capacity in the experiment. The Tier-1 sites have a challenging role in CMS because they are expected to ingest and archive data from both CERN and regional Tier-2 centers, while they export data to a global mesh of Tier-2s at rates comparable to the raw export data rate from CERN. The combined capacity of the Tier-1 centers is more than twice the resources located at CERN, and efficiently utilizing these large distributed resources represents a challenge.
In this presentation we will discuss the experience of building, operating, and utilizing the CMS Tier-1 computing centers. We will summarize the facility challenges at the Tier-1s, including the stable operation of CMS services, the ability to scale to large numbers of processing requests and large volumes of data, and the ability to provide custodial storage and high performance data serving. We will also present the operations experience of utilizing the distributed Tier-1 centers from a distance: transferring data, submitting data serving requests, and submitting batch processing requests.
Experience with ATLAS MySQL Panda DataBase service
The PanDA distributed production and analysis system has been in
production use for ATLAS data processing and analysis since late 2005
in the US, and globally throughout ATLAS since early 2008. Its core
architecture is based on a set of stateless web services served by
Apache and backed by a suite of MySQL databases that are the
repository for all Panda information: active and archival job queues,
dataset and file catalogs, site configuration information, monitoring
information, system control parameters, and so on. This database
system is one of the most critical components of PanDA, and has
successfully delivered the functional and scaling performance
required by PanDA, currently operating at a scale of half a million
jobs per week, with much growth still to come.
In this paper we describe the design and implementation of the PanDA
database system, its architecture of MySQL servers deployed at BNL
and CERN, backup strategy and monitoring tools. The system has been
developed, thoroughly tested, and brought to production to provide
highly reliable, scalable, flexible and available database services
for ATLAS Monte Carlo production, reconstruction and physics
(Brookhaven National Laboratory (BNL)), Dr Yuri Smirnov
(Brookhaven National Laboratory (BNL))
Experience with Server Self Service Center (S3C)
CERN has successfully been running the Server Self Service Center
(S3C) for virtual server provisioning, which is based on Microsoft Virtual
Server 2005. With the introduction of Windows Server 2008 and its built-in hypervisor-based virtualization (Hyper-V), there are new possibilities for expanding the current service.
Observing the growing industry trend of provisioning Virtual Desktop Infrastructure (VDI), we are gathering ideas on how desktop
infrastructure could take advantage of thin-client technology combined with virtual desktops hosted by the Hyper-V infrastructure.
The talk will cover our experience of running the Server Self Service Center,
steps for the migration to the Hyper-V based infrastructure and Virtual Desktop
First experience in operating the population of the "condition database" for the CMS experiment
Reliable population of the condition database is critical for the correct operation of the online selection as well as of the offline reconstruction and analysis of data.
We will describe here the system put in place in the CMS experiment to populate the database and make condition data promptly available online for the high-level trigger and offline for reconstruction.
The system has been designed for high flexibility to cope with very different data sources and uses Pool-ORA technology to store data in an object format that best matches the object-oriented C++ programming paradigm used in CMS offline software. To ensure consistency among the various subdetectors, a dedicated package, PopCon (Populator of Condition Objects), is used to store data online. The data are then automatically streamed to the offline database and so are immediately accessible offline worldwide. This mechanism was used intensively during 2008 in the test runs with cosmic rays. The experience of these first months of operation will be discussed in detail.
Mr Michele De Gruttola
(INFN, Sezione di Napoli - Universita & INFN, Napoli/ CERN)
FROG: The Fast & Realistic OpenGL Event Displayer
FROG is a generic framework dedicated to visualizing events in a given geometry.
It is written in C++ and uses the cross-platform OpenGL libraries. It can be applied to any physics experiment or detector design. The code is very light and very fast and can run on various operating systems. Moreover, FROG is self-contained and does not require ROOT or experiment software (e.g. CMSSW) libraries to be installed on the user's computer.
The slides will describe the principle of the algorithm and its many functionalities, such as 3D and 2D visualization, graphical user interface, mouse interface, configuration files, production of pictures in various formats, integration of personal objects... Finally, the application of FROG to physics experiments, such as the CMS experiment, will be described.
(Universite Catholique de Louvain)
GEANT 4 TESTING INTEGRATION INTO LCG NIGHTLY BUILDS SYSTEM
Geant4 is a toolkit to simulate the passage of particles through
matter, and is widely used in HEP, in medical physics and for space
applications. Ongoing developments and improvements require regular
integration testing for new or modified code.
The current system uses a customised version of the Bonsai Mozilla tool
to collect and select tags for testing, a set of shell and perl
scripts to submit building of the software and running the tests to
a set of Unix platforms and uses the Tinderbox Mozilla tool
to collect and display test results. Mac OS and Windows are not
integrated in this system.
Geant4 integration testing is being integrated into the LCG
applications area nightly builds system.
The LCG nightly builds system, based on CMT and
on Python scripts, supports testing on many different platforms,
including Windows and Mac OS. The CMT configuration
management tool is responsible for the configuration of the build
and test environment and external dependencies
in a structured and modular way, giving fine control
of configuration options for the build and execution of tests.
For the testing itself, the LCG nightly builds system
uses QMTest, a test suite providing tools to test software and
to present the test outcome in different formats. We are working to
integrate this tool with Geant4 tests and to improve the
presentation of test results, so that we can provide outputs other
than the default ones, in different formats.
Further improvements include 'on-the-fly' automatic tag testing,
parallel execution of tests, better use of server time,
automatic testing of patches and efficiency improvements.
Victor Diez Gonzalez
(Univ. Rov. i Virg., Tech. Sch. Eng.-/CERN)
Geant4 Qt visualization driver
Qt is a powerful, cross-platform application framework that is free (even on Windows) and used by many people and applications.
For this reason, the latest developments in the Geant4 visualization group include a new driver based on the Qt toolkit. The Qt library provides OpenGL support, so all 3D scenes can be moved with the mouse (as in the OpenInventor driver).
This driver aims to bring together all the features already present in other drivers and adds some new ones,
for example a movie recording feature, which is very useful for making movies, debugging geometry...
GLANCE Traceability - Web System for Equipment Traceability and Radiation Monitoring for the ATLAS
During the operation, maintenance, and dismantling periods of the ATLAS Experiment, the traceability of all detector equipment must be guaranteed for logistic and safety reasons. The running of the Large Hadron Collider will expose the ATLAS detector to radiation. Therefore, CERN must follow specific regulations from the French and Swiss authorities for equipment removal, transport, repair, and disposal. GLANCE Traceability, implemented in C++ and Java/Java3D, has been developed to fulfill these requirements. The system registers each piece of equipment and associates it with either a functional position in the detector or a zone outside the underground area through a 3D graphical user interface. Radiation control of the equipment is performed using a radiation monitor connected to the system: the local background is stored and the threshold is automatically calculated. The system classifies the equipment as non-radioactive if its radiation dose does not exceed that limit value. The history of both location traceability and radiation measurements is kept, and multiple pieces of equipment can be managed simultaneously. The software is fully operational and has been used by the Radiation Protection Experts of ATLAS since the first beam of the LHC. Although the system was initially developed for the ATLAS detector, its flexibility has allowed it to be adapted for the LHCb detector.
Mr Luiz Henrique Ramos De Azevedo Evora
gLExec and MyProxy integration in the ATLAS/OSG PanDA Workload Management System.
Worker nodes on the grid exhibit great diversity, making it difficult to offer uniform processing resources. A pilot job architecture, which probes the environment on the remote worker node before pulling down a payload job, can help. Pilot jobs become smart wrappers, preparing an appropriate environment for job execution and providing logging and monitoring capabilities.
PanDA (Production and Distributed Analysis), an ATLAS and OSG workload management system, follows this design. However, in the simplest (and most efficient) pilot submission approach of identical pilots carrying the same identifying grid proxy, end-user accounting by the site can only be done with application-level information (PanDA maintains its own end-user accounting), and end-user jobs run with the identity and privileges of the proxy carried by the pilots, which may be seen as a security risk.
To address these issues, we have enabled PanDA to use gLExec, a tool provided by EGEE which runs payload jobs under an end-user's identity. End-user proxies are pre-staged in a credential caching service, MyProxy, and the information needed by the pilots to access them is stored in the PanDA DB. gLExec then extracts from the user's proxy the proper identity under which to run.
We describe the deployment, installation, and configuration of gLExec, and how PanDA components have been augmented to use it. We describe how difficulties were overcome, and how security risks have been mitigated. Results are presented from OSG and EGEE Grid environments performing ATLAS analysis using PanDA and gLExec.
(Brookhaven National Laboratory (BNL))
H1 Grid Production Tool for Monte Carlo Production
The H1 Collaboration at HERA has entered the period of high precision analyses based on the final data sample. These analyses require a massive production of simulated Monte Carlo (MC) events.
The H1 MC framework is software for mass MC production on the LCG Grid infrastructure
and on a local batch system, created by the H1 Collaboration.
The aim of the tool is full automation of the MC production workflow, including the experiment-specific parts (preparation of input files, running reconstruction and postprocessing calculations) and the management of the MC jobs on the Grid, up to the copying of the resulting files from the Grid to the H1 tape storage.
The H1 MC framework has a modular structure, providing a separate module for each specific task. Communication between modules takes place via a central database. Jobs are created as fully autonomous services that are fault-tolerant with respect to the reconstruction processes, and can run on 32- and 64-bit LCG Grid architectures. While running on the grid they can be continuously monitored using
the R-GMA service. Experiment software is downloaded by jobs from a set of Storage Elements using the LFC catalog.
Monitoring of the H1 MC activity and detection of problems with submitted jobs and grid sites is performed by regular checks of the job states obtained from the database and from the Service Availability Monitoring (SAM) framework.
The improved stability of the system has allowed a dramatic increase of the MC production rate, which exceeded two billion events in 2008.
In recent years, the HepMC data format has established itself as the
standard data format for simulation of high-energy physics interactions and is
commonly used by all four LHC experiments. At the energies of the
proton-proton collisions at the LHC, a full description of the generation of
these events and the subsequent interactions with the detector typically
involves several thousand particles and several hundred vertices. Currently, the
HepMC libraries only provide a text-based representation of these events.
HepMCVisual is a visualization package for HepMC events that allows users to
browse through the event interactively. Intuitive user guidance and the
possibility of expanding/collapsing specific branches of the interaction tree
allow quick navigation and visualization of the specific parts of the event of
interest to the user. Thus, it may be useful not only for physics users
trying to understand the structure of single events, but may also be
a valuable tool for debugging Monte Carlo event generators.
Being based on the ROOT graphics libraries, HepMCVisual can be used as a standalone library, as
well as interactively from the ROOT console or in combination with the
HepMCBrowser interface within the ATLAS software framework. A short description
of the user interface and the API will be presented.
(University College London)
High Performance C++ Reflection
C++ does not offer access to reflection data: the types and their members as well as their memory layout are not accessible. Reflex adds that: it can be used to describe classes and any other types, to look up and call functions, to look up and access data members, and to create and delete instances of types. It is rather unique and also attracts considerable interest outside of high energy physics.
Reflex is a fundamental ingredient in the data storage framework of most of the LHC experiments. It is used in a production context after several years of development. Based on this experience a new version of Reflex has been designed, allowing faster lookup, a clearer layout, a hierarchical organization of type catalogs, and a straightforward near-term extension to support multithreaded access. This new API is backed by a newly designed, externally contributed test suite based on CMake. We will present these developments and the plans for the near future.
Improved Cache Coherency Approach for CMS Frontier
The CMS experiment requires worldwide access to conditions data by nearly a hundred thousand processing jobs daily. This is accomplished using a software subsystem called Frontier. This system translates database queries into HTTP, looks up the results in a central database at CERN, and caches the results in an industry-standard HTTP proxy/caching server called Squid. One of the most challenging aspects of any cache system is coherency, that is, ensuring that changes made to the underlying data get propagated out to all clients in a timely manner. Recently, the Frontier system was enhanced to drastically reduce the time for changes to be propagated everywhere, typically as low as 10 minutes for some kinds of data and no more than 60 minutes for the rest of the data, without overloading servers. This was done by taking advantage of an HTTP and Squid feature called "If-Modified-Since", in which the "Last-Modified" timestamp of cached data is sent back to the central server. The server responds to this with a very short message if the data has not been modified, which is the case most of the time, and re-validates the cache. In order to use this feature, the Frontier server has to send the "Last-Modified" timestamp, but that information is not normally stored by the Oracle databases, so a PL/SQL program was developed to keep track of the modification times of database tables. We discuss the details of this caching scheme and the obstacles overcome, including Oracle database and Squid bugs.
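The "If-Modified-Since" revalidation described above can be sketched in a few lines of Python. This is an in-memory illustration of the HTTP mechanism only, not Frontier or Squid code: the classes OriginServer and RevalidatingCache, their methods and the example payloads are hypothetical, and no network traffic is involved.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


class OriginServer:
    """Hypothetical stand-in for the central server: one payload and its modification time."""

    def __init__(self, payload, modified):
        self.payload = payload
        self.modified = modified  # timezone-aware UTC datetime, whole seconds

    def update(self, payload, modified):
        self.payload = payload
        self.modified = modified

    def get(self, if_modified_since=None):
        """Answer like a conditional HTTP GET: returns (status, headers, body)."""
        headers = {"Last-Modified": format_datetime(self.modified, usegmt=True)}
        if if_modified_since is not None:
            since = parsedate_to_datetime(if_modified_since)
            if self.modified <= since:
                # Data unchanged: short reply with no body.
                return 304, headers, b""
        return 200, headers, self.payload


class RevalidatingCache:
    """Hypothetical cache that revalidates its copy on every fetch."""

    def __init__(self, origin):
        self.origin = origin
        self.body = None
        self.last_modified = None  # HTTP date string of the cached copy

    def fetch(self):
        status, headers, body = self.origin.get(self.last_modified)
        if status == 200:
            # New or changed data: replace the cached copy.
            self.body = body
            self.last_modified = headers["Last-Modified"]
        # On 304 the cached copy is still valid and is served as-is.
        return status, self.body


if __name__ == "__main__":
    origin = OriginServer(b"v1", datetime(2008, 12, 1, 12, 0, tzinfo=timezone.utc))
    cache = RevalidatingCache(origin)
    print(cache.fetch())  # first request transfers the full payload
    print(cache.fetch())  # unchanged data is revalidated without a body
```

The point of the design is that the common case, unchanged data, costs only a short exchange, so caches can afford to revalidate often.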
Integrating interactive PROOF into a Batch System
While the Grid infrastructure for the LHC experiments is well suited for batch-like analysis, it does not support the final steps of an analysis on a reduced data set, e.g. the optimization of cuts and derivation of the final plots. Usually this part is done interactively. However, for the LHC these steps might still require a large amount of data. The German "National Analysis Facility" (NAF) at DESY in Hamburg is envisioned to close this gap. The NAF offers computing resources via the Sun Grid Engine (SGE) workload management system and high bandwidth data access via the network clustering file system Lustre. From the beginning, it was planned to set up a "Parallel ROOT Facility" (PROOF) to allow the users to analyze large amounts of data interactively in parallel. However, a separate central PROOF cluster would be decoupled from the scheduling and accounting of the existing workload management system. Thus, we have developed a setup that interfaces interactive PROOF to the SGE batch system by allowing every user to set up their own PROOF cluster using SGE's parallel environments. In addition, this setup circumvents security issues and incompatibilities between different ROOT versions. We will describe this setup and its performance for different analysis tasks. Furthermore, we will present the different ways offered by the CMS offline software to analyze CMS data with PROOF.
JINR experience in development of Grid monitoring and accounting systems
Different monitoring systems are now extensively used to keep an eye on
the real-time state of each service of distributed grid infrastructures and
jobs running on the Grid. Tracking current services’ state as well as
the history of state changes allows rapid error fixing, planning future
massive productions, revealing regularities of Grid operation and many
other things. Along with monitoring, accounting is also an area which
shows how the Grid is used. The data considered are statistics on Grid
sites’ resources utilization by virtual organizations and single users.
Here we describe our longstanding experience in successful development
and design of Grid monitoring and accounting systems for global grid
segments and for local national grid projects in Russia. The main goals
of these developments have always been to satisfy the real needs of VO and resource
managers and administrators, and to provide interoperable and
portable solutions that are used in several grid projects. Provided
solutions work with different Grid middleware like LCG2, gLite,
(Joint Institute for Nuclear Research (JINR))
Job optimization in ATLAS TAG based Distributed Analysis
The ATLAS experiment is projected to collect over one billion events/year during the first few years of operation.
The efficient selection of events for various physics analyses across all appropriate samples presents a significant technical challenge.
ATLAS computing infrastructure leverages the Grid to tackle the analysis across large samples by organizing data in a hierarchical structure and exploiting distributed computing to churn through the computations. This includes the same events at different stages of processing: RAW, ESD (Event Summary Data), AOD (Analysis Object Data), DPD (Derived Physics Data).
Event Level Metadata Tags (TAGs) contain a lot of information about all events stored using multiple technologies accessible by POOL and various web services. This allows users to apply selection cuts on quantities of interest across the entire sample to compile a subset of events which are appropriate for their analysis.
This paper describes new methods for organizing jobs that use TAG criteria to analyze ATLAS data, based on enhancements to the ATLAS POOL Collection Utilities and ATLAS distributed analysis systems.
It further compares different access patterns to the event data and different ways to partition the workload for event selection and analysis, where analysis is understood as broader event processing that also includes the event selection and reduction operations known as skimming, slimming and thinning, as well as DPD making.
Specifically, it compares analysis with direct access to the events (AODs, ESDs, ...) to access mediated by different TAG-based event selections.
We then compare different ways of splitting the processing to maximize performance.
(UNIVERSITY OF CHICAGO)
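The idea of selecting events through TAG metadata and then partitioning the selected events into jobs can be illustrated with a small sketch. It does not use the ATLAS POOL Collection Utilities; the field names `event_id`, `n_muons` and `missing_et`, and both function names, are hypothetical.

```python
from typing import Callable, Dict, List

Tag = Dict[str, float]  # one event-level metadata record (hypothetical fields)


def select_events(tags: List[Tag], cut: Callable[[Tag], bool]) -> List[int]:
    """Return the identifiers of the events whose TAG record passes the cut."""
    return [int(tag["event_id"]) for tag in tags if cut(tag)]


def split_into_jobs(event_ids: List[int], events_per_job: int) -> List[List[int]]:
    """Partition the selected events into consecutive chunks, one per job."""
    if events_per_job < 1:
        raise ValueError("events_per_job must be at least 1")
    return [
        event_ids[start:start + events_per_job]
        for start in range(0, len(event_ids), events_per_job)
    ]
```

The cut runs only over the small metadata records, so each job is handed the list of events it actually needs to read instead of scanning the full sample.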
Knowledge Management System for ATLAS Scalable Task Processing on the Grid
In addition to challenges on computing and data handling, ATLAS and other
LHC experiments place a great burden on users to configure and manage the
large number of parameters and options needed to carry out distributed
Management of distributed physics data is being made more transparent by
dedicated ATLAS grid computing technologies, such as PanDA (a pilot-based
job control system).
The laborious procedure of steering the data processing application by
providing physics parameters and software configurations remained beyond
the scope of large grid projects.
The error-prone manual procedure does not scale to the LHC challenges.
To reduce human errors and automate the process of populating the ATLAS
production database with millions of jobs per year, we developed a system
for ATLAS knowledge management ("Knowledgement") of Task Request (AKTR).
AKTR manages configuration parameters, used for massive grid data
processing tasks (groups of similar jobs). The system assures a scalable
management of ATLAS-wide knowledge of distributed production conditions,
and guarantees reproducibility of results.
Use of the AKTR system resulted in major gains in efficiency and productivity
of the ATLAS production infrastructure.
LHCb Full Experiment System Test (FEST09)
LHCb had been planning to commission its High Level Trigger software and Data Quality monitoring procedures using real collision data from the LHC pilot run. Following the LHC incident on 19th September 2008, it was decided to commission the system using simulated data.
This “Full Experiment System Test” consists of:
- Injection of simulated minimum bias events into the full HLT farm, after selection by a simulated Level 0 trigger.
- Processing in the HLT farm to achieve the output rate expected for nominal LHC luminosity running, sustained over the typical duration of an LHC fill.
- Real time Data Quality validation of the HLT output, validation of calibration and alignment parameters for use in the reconstruction.
- Transmission of the event data, calibration data and book-keeping information to Tier1 sites and full reconstruction of the event data.
- Data Quality validation of the reconstruction output.
We will report on the preparations and results of FEST09, and on the status of commissioning for nominal LHC luminosity running.
LQCD Workflow Execution Framework: Models, Provenance, and Fault-Tolerance
Large computing clusters used for scientific processing suffer from systemic failures when operated over long continuous periods for executing workflows. Diagnosing job problems and faults leading to eventual failures in this complex environment is difficult, specifically when the success of the whole workflow might be affected by a single job failure.
In this paper, we introduce a model-based, hierarchical, reliable execution framework that encompasses workflow specification, data provenance, execution tracking and online monitoring of each workflow task, also referred to as a participant. The sequence of participants is described in an abstract parameterized view, which is translated into a concrete data dependency based sequence of participants with defined arguments.
As participants belonging to a workflow are mapped onto machines and executed, periodic and on-demand monitoring of vital health parameters on allocated nodes is enabled according to pre-specified rules. These rules specify conditions that must be true pre-execution, during execution and post-execution.
Monitoring information for each participant is propagated upwards through the reflex and healing architecture, which consists of a hierarchical network of decentralized fault management entities, called reflex engines. They are instantiated as state machines or timed automata that change state and initiate reflexive mitigation action(s) upon occurrence of certain faults.
We describe how this cluster reliability framework is combined with the workflow execution framework using formal rules and actions specified within a structure of first order predicate logic that enables a dynamic management design that reduces manual administrative workload and increases cluster productivity. Preliminary results on a virtual setup with injected failures are shown.
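A reflex engine of the kind described above can be pictured as a small state machine that reacts to fault events and escalates to its parent when local mitigation is exhausted. The sketch below is only an illustration of that idea: the class name, the rule table layout and the state, event and action names are assumptions, not the framework's actual interface.

```python
from typing import Dict, List, Optional, Tuple

# (current state, event) -> (next state, mitigation action or None)
Rules = Dict[Tuple[str, str], Tuple[str, Optional[str]]]


class ReflexEngine:
    """Toy fault manager: a state machine with an optional parent engine."""

    def __init__(self, name: str, rules: Rules, parent: Optional["ReflexEngine"] = None) -> None:
        self.name = name
        self.rules = rules
        self.parent = parent
        self.state = "ok"
        self.actions: List[str] = []  # mitigation actions taken so far

    def notify(self, event: str) -> None:
        key = (self.state, event)
        if key not in self.rules:
            return  # no rule for this event in the current state
        self.state, action = self.rules[key]
        if action is None:
            return
        self.actions.append(action)
        # Faults that cannot be handled locally are propagated upwards.
        if action == "escalate" and self.parent is not None:
            self.parent.notify("child_failed")


node_rules: Rules = {
    ("ok", "disk_full"): ("mitigating", "clean_scratch"),
    ("mitigating", "disk_ok"): ("ok", None),
    ("mitigating", "disk_full"): ("failed", "escalate"),
}
cluster_rules: Rules = {
    ("ok", "child_failed"): ("rescheduling", "resubmit_participant"),
}
```

With these hypothetical rules, a first `disk_full` event triggers a local clean-up, while a second one marks the node as failed and lets the parent engine reschedule the participant.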
Managing Large Data Productions in LHCb
LHC experiments are producing very large volumes of data, either accumulated from the detectors or generated via Monte Carlo modeling. The data should be processed as quickly as possible to provide users with the input for their analysis. Processing multiple hundreds of terabytes of data necessitates generating, submitting and following a huge number of grid jobs running all over the Computing Grid. Manipulation of these large and complex workloads is impossible without powerful production management tools.
In LHCb, the DIRAC Production Management System (PMS) is used to accomplish this task. It enables production managers and end-users to deal with all kinds of data generation, processing and storage. Application workflow tools allow jobs to be defined as complex sequences of elementary application steps expressed as Directed Acyclic Graphs. Specialized databases and a number of dedicated software agents ensure automated data driven job creation and submission. The productions are accompanied by thorough checks of the resulting data integrity.
With the PMS, a complete user interface is provided for operations ranging from requests generated by the user community to task completion and bookkeeping. Both a command line and a full featured Web based Graphical User Interface allow all the tasks of production definition, control and monitoring to be performed.
This facilitates the job of the production managers, allowing a single person to steer all the LHCb production activities. In the paper we will provide a detailed description of the DIRAC PMS components and their interactions with the other DIRAC subsystems. The experience with real large-scale productions will be presented and further evolution of the system will be discussed.
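Expressing a job as a Directed Acyclic Graph of application steps means that a valid execution order can be derived from the dependencies alone. The following sketch uses Python's standard `graphlib` module (Python 3.9 or later) and is not DIRAC code; the step names and the `execution_order` helper are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Dict, List, Set


def execution_order(workflow: Dict[str, Set[str]]) -> List[str]:
    """Order steps so that each one runs after all the steps it depends on.

    `workflow` maps a step name to the set of steps it depends on.
    A CycleError is raised if the dependencies do not form a DAG.
    """
    return list(TopologicalSorter(workflow).static_order())


# Hypothetical workflow: two independent outputs derived from one reconstruction.
workflow = {
    "simulate": set(),
    "digitize": {"simulate"},
    "reconstruct": {"digitize"},
    "make_histograms": {"reconstruct"},
    "write_summary": {"reconstruct"},
}
```

Steps with no dependency between them, such as the last two here, may appear in either order, which is also what allows a production system to run them concurrently.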
Mathematical simulation for 3-Dimensional Temperature Visualization on Open Source-based Grid Computing Platform
New Iterative Alternating Group Explicit (NAGE) is a powerful parallel numerical algorithm for multidimensional temperature prediction. The discretization is based on the finite difference method for partial differential equations (PDE) of parabolic type. The 3-Dimensional temperature visualization is critical since it involves large-scale computational complexity. The three fundamental applied mathematics issues under consideration are as follows:
i. The accurate modeling of physical systems using finite difference methods.
ii. The investigation of discretization methods that retain constraint-preserving properties of mathematical modeling.
iii. The high performance measurements of parallel algorithms involving time and space.
This paper proposes the NAGE method as a straightforward transformation from a sequential to a parallel algorithm using domain decomposition and splitting strategies. The process involves the scheduling of communication, algometric and mapping the subdomains onto a number of processors.
This computational challenge encourages us to use high performance computing, where the computation cannot rely on a single cluster. This research therefore uses multiple clusters in geographically different locations, an approach known as grid computing. To realize this concept, we consider the advantages of passing data between two web services, each connected to one or more clusters. For this kind of relationship, we chose a service-oriented architecture (SOA) style. The web services are easily maintainable since there is loose coupling between interacting nodes. The architecture is developed in several programming languages: the algorithm is implemented in C and parallelized using Parallel Virtual Machine (PVM), and the web services are developed in Java. The grid computing platform is open source-based and will be developed under a Linux environment. The platform development will increase acceleration and scale out across a virtualized grid. The clusters of processors involved in this platform are built on increasingly large computational hardware with inexpensive architecture. In conclusion, this leading grid-based application platform has strong potential for managing highly scalable and reliable temperature prediction visualization. The efficiency of this application will be measured based on the results of numerical analysis and parallel performance.
(Department of Mathematics, Faculty of Science, Universiti Teknologi Malaysia), Norma Alias
(Institute of Ibnu Sina, Universiti Teknologi Malaysia)
Metrics Correlation and Analysis Service
In a shared computing environment, activities orchestrated by workflow management systems often need to span organizational and ownership domains. In such a setting, common tasks, such as the collection and display of metrics and debugging information, are challenged by the informational entropy inherent to independently maintained and owned software sub-components. Because such an information pool is often disorganized, it becomes a difficult target for business intelligence analysis, i.e. troubleshooting, incident investigation, and trend spotting.
The Metrics Correlation and Analysis Service (MCAS) provides an integral solution for system operators and users to uniformly access, transform, and represent disjoint metrics, generated by distributed middleware or user services. The proposed software infrastructure assists with indexing and navigation of existing metrics and it supplies tools and services to define and store other quantifiable data. The Project reuses existing monitoring and data collection software deployments, with the goal of presenting a unified view of metrics data.
This paper discusses the MCAS system and places special emphasis on applying integration technologies to assist with the process of formalizing the interaction of users with end applications.
Monitoring the ATLAS distributed production
The ATLAS production system is one of the most critical components in the experiment's distributed system, and this becomes even more true now that real data has entered the scene.
Monitoring such a system is a non-trivial task, even more so when two of its main characteristics are the flexibility in the submission of job processing units and the heterogeneity of the resources it uses.
In this paper we present the architecture of the monitoring system that is in production today and being used by ATLAS shifters and experts around the world as a main tool for their daily activities. We describe in detail the different sources of job execution information, the different tools aggregating system usage into a relevant set of statistics and collecting site and resource status at near real time. The description of the shifter's routine usage of the application gives a clear idea of the tight integration with the rest of both grid and experiment operations tools.
Monitoring the world-wide daily computing operations in ATLAS LHC experiment
The ATLAS distributed computing activities involve about 200 computing centers distributed world-wide and need people on shift covering 24 hours per day. Data distribution, data reprocessing, user analysis and Monte Carlo event simulation run continuously. Reliable performance of the whole ATLAS computing community is of crucial importance to meet the ambitious physics goals of the ATLAS experiment. Distributed computing software and monitoring tools are evolving continuously to achieve this target. The world-wide daily operations shift group are the first responders to all faults, alarms and outages. The shifters are responsible for finding, reporting and following up problems at almost every level of a complex distributed infrastructure and a complex processing model. In this paper we present the operations model, followed by the experience of running the world-wide daily operations group for the past year. We will present the most common problems encountered, and the expected future evolution to provide efficient usage of data, resources and manpower and to improve communication between sites and the experiment.
Organization and Management of ATLAS nightly builds
The system of automated multi-platform software nightly builds is a major
component in ATLAS collaborative software organization and code approval
scheme. Code developers from more than 30 countries use about 25
branches of nightly releases for testing new packages, validation of patches to
existing software, and migration to new platforms and compilers. The successful
nightly releases are transformed into stable releases used for data processing
worldwide. ATLAS nightly builds are managed by the NICOS control tool on a
computing farm with 40 powerful multiprocessor nodes. NICOS provides a fully
automated framework for the release builds, testing, and creation of
distribution kits. The modular structure of NICOS allows for an easy integration
of third-party build and validation tools. The ATN test tool is embedded
within the nightly system and provides the first results even before the full
compilation completes. Several ATLAS test frameworks are synchronized with
NICOS jobs and run larger production jobs with the nightly releases. NICOS
web pages dynamically provide information about the progress and results of
the builds. For faster feedback, e-mail notifications about nightly build
problems are automatically distributed to responsible developers.
(BROOKHAVEN NATIONAL LABORATORY, USA)
Parallel computing of ATLAS data with PROOF at the LRZ Munich
The PROOF (Parallel ROOT Facility) library is designed to perform parallelized
ROOT-based analyses with a heterogeneous cluster of computers.
The installation, configuration and monitoring of PROOF have been carried out
using the Grid-Computing environments dedicated to the ATLAS experiment.
A PROOF cluster hosted at the Leibniz Rechenzentrum (LRZ) and consisting of a
scalable number of worker nodes has been exploited in order to conduct the
performance tests in the case of interactive ATLAS analyses. Scenarios of
various complexities have been considered to exercise PROOF with ATLAS data
and evaluate its utilization in actual conditions. The investigation of the
PROOF performance has been done by varying the number of parallelized
processing units, the number of simultaneous users, and the type of the file
storage. Strategies based on local files, dCache, and Lustre have been
Parallelization of Maximum Likelihood Fit Technique Using MINUIT and RooFit Packages
MINUIT is the most common package used in high energy physics for numerical minimization of multi-dimensional functions. The major algorithm of this package, MIGRAD, searches for the minimum by using the gradient function. For each minimization iteration, MIGRAD requires the calculation of the first derivatives for each parameter of the function to be minimized.
Minimization is required for data analysis problems based on the maximum likelihood technique. Complex likelihood functions, with several free parameters, many independent variables and large data samples, can be very CPU-time consuming. For such a technique the minimization process requires the calculation of the likelihood function (and corresponding normalization integrals) several times for each minimization iteration.
In this presentation we will show how the MINUIT algorithm, the likelihood calculation, and the calculation of the normalization integrals can be easily parallelized using MPI techniques to scale over multiple nodes, or using multiple threads on the cores of a single node. We will present the speed-up improvements obtained in typical physics applications such as complex maximum likelihood fits using the RooFit package. Furthermore, we will also show results of hybrid parallelization combining MPI and multi-threading, to take full advantage of multi-core architectures.
(Universita and INFN, Milano / CERN)
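The two ingredients named above, a likelihood evaluated as a sum of partial sums over data chunks and a gradient that needs several likelihood evaluations per iteration, can be sketched without MINUIT, RooFit or MPI. The example below swaps in Python's standard thread pool purely to show the partitioning; for pure-Python code threads give no real speed-up, and the Gaussian model, function names and `n_workers` parameter are assumptions.

```python
import math
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence


def gaussian_nll_chunk(chunk: Sequence[float], mu: float, sigma: float) -> float:
    """Negative log-likelihood of one data chunk for a Gaussian model."""
    norm = math.log(sigma) + 0.5 * math.log(2.0 * math.pi)
    return sum(0.5 * ((x - mu) / sigma) ** 2 + norm for x in chunk)


def parallel_nll(data: Sequence[float], mu: float, sigma: float, n_workers: int = 4) -> float:
    """Split the sample across workers and add up the partial sums."""
    chunks = [data[i::n_workers] for i in range(n_workers)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partial = pool.map(gaussian_nll_chunk, chunks, [mu] * n_workers, [sigma] * n_workers)
        return sum(partial)


def numerical_gradient(data: Sequence[float], params: Sequence[float], step: float = 1e-6) -> List[float]:
    """Central-difference first derivatives, one pair of evaluations per parameter."""
    gradient = []
    for i in range(len(params)):
        up, down = list(params), list(params)
        up[i] += step
        down[i] -= step
        gradient.append((parallel_nll(data, *up) - parallel_nll(data, *down)) / (2.0 * step))
    return gradient
```

Because the derivative with respect to each parameter is independent of the others, the loop in `numerical_gradient` is a second place where the work could be distributed, in addition to the split over the data sample.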
Partial Wave Analysis using Graphics Processing Units
Partial wave analysis is an important tool for determining resonance properties in hadron spectroscopy. For large data samples however, the un-binned likelihood fits employed are computationally very expensive. At the Beijing Spectrometer (BES) III experiment, an increase in statistics compared to earlier experiments of up to two orders of magnitude is expected. In order to allow for a timely analysis of these datasets, additional computing power with short turnover times has to be made available.
It turns out that graphics processing units (GPUs) originally developed for 3D computer games have an architecture of massively parallel single instruction multiple data floating point units that is almost ideally suited for the algorithms employed in partial wave analysis. We have implemented a framework for tensor manipulation and partial wave fits called GPUPWA, harnessing the power of GPUs based on the Brook+ framework for general purpose computing on graphics processing units. GPUPWA simplifies the coding of amplitudes in the covariant tensor formalism and other tedious and error-prone tasks involved in partial wave analyses. The user can write a program in pure C++ whilst the GPUPWA classes handle computations on the GPU, memory transfers, caching and other technical details. In conjunction with a recent graphics processor, the framework provides a significant speedup of the partial wave fit compared to legacy FORTRAN code.
(Institute for High Energy Physics, Beijing)
Petaminer: Using ROOT for Efficient Data Storage in MySQL Database
High Energy and Nuclear Physics (HENP) experiments store petabytes of event data and terabytes of calibration data in ROOT files. The Petaminer project develops a custom MySQL storage engine to enable the MySQL query processor to directly access experimental data stored in ROOT files.
Our project addresses the problem of efficient navigation to petabytes of HENP experimental data described with event-level TAG metadata, which is required by data intensive physics communities such as the LHC and RHIC experiments. Physicists need to be able to compose a metadata query and rapidly retrieve the set of matching events, where improved efficiency will facilitate the discovery process by permitting rapid iterations of data evaluation and retrieval. Our custom MySQL storage engine enables the MySQL query processor to directly access TAG data stored in ROOT TTrees. As ROOT TTrees are column-oriented, reading them directly provides improved performance over traditional row-oriented TAG databases. By applying the flexible and powerful SQL query language to the data stored in ROOT TTrees, the Petaminer approach will enable rich MySQL index-building capabilities for further performance optimization.
We studied the feasibility of using the built-in ROOT support for automatic schema evolution to ease the handling of large volumes of calibration data of a large working experiment stored in MySQL. Over the lifecycle of calibrations, their schema may change. Supporting schema changes in relational databases requires effort. In contrast, ROOT provides support for automatic schema evolution. Our approach has the potential to ease the handling of the metadata needed for efficient access to large volumes of calibration data.
(Argonne National Laboratory), David Malon
(Argonne National Laboratory), Jack Cranshaw
(Argonne National Laboratory), Jérôme Lauret
(Brookhaven National Laboratory), Paul Hamill
(Tech-X Corporation), Valeri Fine
(Brookhaven National Laboratory)
Pseudo-interactive monitoring in distributed computing
Distributed computing, and in particular Grid computing, enables physicists to use thousands of CPU days worth of computing every day, by submitting thousands of compute jobs.
Unfortunately, a small fraction of such jobs regularly fail; the reasons vary from disk and network problems to bugs in the user code. A subset of these failures results in jobs being stuck for long periods of time. In order to debug such failures, interactive monitoring is highly desirable; users need to browse through the job log files and check the status of the running processes.
Batch systems typically don't provide such services; at best, users get job logs at job termination, and even this may not be possible if the job is stuck in an infinite loop.
In this paper we present a novel approach of using regular batch system capabilities of Condor to enable users to access the logs and processes of any running job. This does not provide true interactive access, so commands like vi are not viable, but it does allow operations like ls, cat, top, ps, lsof, netstat and dumping the stack of any process owned by the user; we call this pseudo-interactive monitoring.
It is worth noting that the same method can be used to monitor Grid jobs in a glidein-based environment.
We further believe that the same mechanism could be applied to many other batch systems.
Python-based Hierarchical Configuration of LHCb Applications
The LHCb software, from simulation to user analysis, is based on the Gaudi framework. The extreme flexibility that the framework provides, through its component model and the system of plug-ins, allows us to define a specific application by its behavior more than by its code. The application is then described by some configuration files read by the bootstrap executable (shared by all applications). Because of the modularity of the components and the complexity of a typical application, the basic configuration of an application can be a challenging task, made more difficult by the need for users and developers to be able to tune the configuration. In the last year, to simplify the task, we changed the way we configure applications from static text files to Python scripts. Thanks to the power of Python, we designed an object-oriented hierarchical configuration framework, on top of the initial implementation by the ATLAS collaboration, where the applications are defined as high level configuration entities that use other entities representing the various configuration subsystems or contexts, thus hiding the complexity of the low level configuration from the user.
(European Organization for Nuclear Research (CERN))
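The notion of a high level configuration entity that drives lower level ones can be illustrated with a short sketch. It is not the Gaudi or LHCb configuration API: the class names `ConfigEntity`, `TrackingConf` and `ReconstructionApp`, the option names and the values `"2009"` and `"MC09"` are all hypothetical.

```python
from typing import Any, Dict, List


class ConfigEntity:
    """Base class: named options with defaults, plus optional sub-entities."""

    defaults: Dict[str, Any] = {}

    def __init__(self, **overrides: Any) -> None:
        unknown = set(overrides) - set(self.defaults)
        if unknown:
            raise KeyError(f"unknown option(s): {sorted(unknown)}")
        self.options = {**self.defaults, **overrides}

    def children(self) -> List["ConfigEntity"]:
        """Lower level entities configured on behalf of this one."""
        return []

    def apply(self) -> Dict[str, Any]:
        """Flatten this entity and its children into low level options."""
        flat = {f"{type(self).__name__}.{key}": value for key, value in self.options.items()}
        for child in self.children():
            flat.update(child.apply())
        return flat


class TrackingConf(ConfigEntity):
    defaults = {"data_type": "2009", "fast": False}


class ReconstructionApp(ConfigEntity):
    defaults = {"data_type": "2009", "max_events": -1}

    def children(self) -> List[ConfigEntity]:
        # The application passes its own settings down to its subsystems.
        return [TrackingConf(data_type=self.options["data_type"])]
```

In this sketch the user sets `data_type` once on the application, and the subsystem inherits it, so the low level options never have to be edited by hand.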
Readiness of an ATLAS Distributed TIER-2 for the Physics Analysis of the early collision events at the LHC
The ATLAS data taking is due to start in Spring 2009. Given this expectation, this contribution presents a rigorous evaluation of the readiness parameters of the Spanish ATLAS Distributed Tier-2.
Special attention will be paid to the readiness to perform Physics Analysis from different points of view: Network Efficiency, Data Discovery, Data Management, Production of Simulated events, User Support and Distributed Analysis.
The prototypes of the local computing infrastructures set up for data analysis, the so-called Tier-3, attached to the three sites that make up the Tier-2 are described. Several use cases of Distributed Analysis in the GRID system and local interactive tasks in the non-grid farms are provided in order to evaluate the interplay between both environments and to compare the different performances.
The sharing between Monte Carlo Production and Distributed Analysis activities is also studied. The Data Storage and Management systems chosen are described and results on their performance are given.
(Instituto de Fisica Corpuscular (IFIC) - Universidad de Valencia)
The ROOT framework provides many visualization techniques. Lately several new ones have been implemented. This poster will present all the visualization techniques ROOT provides, highlighting how each of them is best used.
ROOT.NET: Making ROOT accessible from CLR based languages
ROOT.NET provides an interface between Microsoft’s Common Language Runtime (CLR) and .NET technology and the ubiquitous particle physics analysis tool, ROOT. This tool automatically generates a series of efficient wrappers around the ROOT API. Unlike pyROOT, these wrappers are statically typed and so are highly efficient as compared to the Python wrappers. The connection to .NET means that one gains access to the full series of languages developed for the CLR including functional languages like F# (based on OCaml). Dynamic languages based on the CLR can be used as well, of course (Python, for example). A first attempt at integrating ROOT tuple queries with Language Integrated Query (LINQ) is also described. This poster will describe the techniques used to effect this translation, along with performance comparisons, and examples. All described source code is posted on SourceForge.
(UNIVERSITY OF WASHINGTON)
Scaling up incident response models to multi-grid security incidents
Different computing grids may provide services to the same user
community, and in addition, a grid resource provider may share its
resources across different unrelated user communities.
Security incidents are therefore increasingly prone to propagate from
one resource center to another, either via the user community or via
cooperating grid infrastructures.
As a result, related and connected computing grid infrastructures need
to collaborate, define and follow compatible security procedures,
exchange information and provide a coordinated response to security
incidents. However, a large number of security teams may be involved and
may need to share information, which not only is difficult to manage,
but also increases the likelihood of information leaks.
Therefore it is essential to design and implement a carefully
structured, tiered, communication model to produce an appropriate
information flow during security incidents. This presentation exposes
necessary changes to the current model, as well as key challenges to
achieve a better coordinated response to security incidents affecting
Setting up a Tier2 site at the Golias/Prague farm
The experience of High Energy Nuclear Physics (HENP) collaborations shows that the computing resources available from a single site are often neither sufficient nor able to satisfy the needs of remote collaborators eager to carry out their analysis in the fastest and most convenient way. Given issues ranging from latencies in network connectivity to the lack of interactivity, having a fully functional software stack on local resources is a strong enabler of science opportunities for any local group that can afford the time investment. The situation becomes more complex as vast amounts of data that do not fit on local resources are often needed to perform meaningful analysis.
The Prague heavy-ion group participating in the RHIC/STAR experiment has been a strong advocate of local computing as the most efficient way of data processing and physics analysis. To create an environment where science can freely expand, a Tier2 computing center was set up at the regional Golias Computing Center for Particle Physics. Golias is the biggest farm in the Czech Republic fully dedicated to particle physics experiments. We report our experience in setting up a fully functional Tier2 center leveraging minimal locally available human and financial resources. We discuss the chosen solution to address the storage space and analysis issue and the impact on overall functionality. This includes a locally built STAR analysis framework, integration with a local DPM system (as a cost-effective storage solution), the influence of the availability and quality of the network connection to Tier0 via a dedicated CESNET/ESnet link, and the development of light-weight yet fully automated data transfer tools allowing entire datasets to be moved from BNL (Tier0) to Golias (Tier2). We will summarize the impact of the gained computing performance on the efficiency of the offline analysis for the local physics group and show the feasibility of such a solution, which can be used by other groups as well.
(Nuclear Physics Inst., Academy of Sciences, Praha)
Simulation and reconstruction of cosmic ray showers for the Pierre Auger Observatory on the EGEE grid
The Pierre Auger Observatory studies ultra-high energy cosmic rays.
Interactions of these particles with the nuclei of air gases at energies
many orders of magnitude above the current accelerator capabilities induce
unprecedented extensive air showers in the atmosphere. Different interaction
models are used to describe the first interactions in such showers and their
predictions are confronted with measured shower characteristics.
We created libraries of cosmic ray showers with more than 35 000 simulated events
using CORSIKA with EPOS or QGSjetII models. These showers are reused several times
for simulation of detector response at different positions within the detector array.
We describe our experience with installation of the specific software on the grid
and running a large number of jobs on sites supporting the VO auger with dedicated and
also opportunistic resources. A web based dashboard for summary of job states was developed together with a custom database of available files with simulated and reconstructed showers.
(Institute of Physics, Prague), Dr Jiri Chudoba
(Institute of Physics, Prague)
SiteDB: Marshalling the people and resources available to CMS
In a collaboration the size of CMS (approx. 3000 users, and almost 100 computing centres of varying size), communication and accurate information about the sites it has access to are vital in co-ordinating the multitude of computing tasks required for smooth running. SiteDB is a tool developed by CMS to track sites available to the collaboration, the allocation to CMS of resources available at those sites and the associations between CMS members and the sites (as either a manager/operator of the site or a member of a group associated with the site). It is used to track the roles a person has for an associated site or group. SiteDB eases the co-ordination load for the operations teams by providing a consistent interface to manage communication with the people working at a site, by identifying who is responsible for a given task or service at a site and by offering a uniform interface to information on CMS contacts and sites.
SiteDB provides APIs and reports that other CMS tools use to access the information it contains, for instance enabling CRAB to use "user friendly" names when black/white listing CEs, providing role-based authentication and authorisation for other web-based services and populating various troubleshooting squads in external ticketing systems in daily use by CMS Computing operations.
Statistical Comparison of CPU performance for LHCb applications on the Grid
The usage of CPU resources by LHCb on the Grid is dominated by two different applications: Gauss and Brunel. Gauss is the application that performs the Monte Carlo simulation of proton-proton collisions. Brunel is the application responsible for the reconstruction of the signals recorded by the detector, converting them into objects that can be used for later physics analysis of the data (tracks, clusters, ...).
Both applications are based on the Gaudi and LHCb software frameworks. Gauss uses Pythia and Geant as underlying libraries for the simulation of the collision and the subsequent passage of the generated particles through the LHCb detector, while Brunel makes use of LHCb-specific code to process the data from each sub-detector. Both applications are CPU bound.
Large Monte Carlo productions or data reconstructions running on the Grid are an ideal benchmark to compare the performance of the different CPU models for each case. Since the processed events are only statistically comparable, only a statistical comparison of the achieved performance can be obtained.
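A statistical comparison of this kind can be sketched as follows. This is a minimal illustration and not the LHCb analysis itself: it assumes each job reports its CPU model and a CPU time per event, the model labels and numbers are made up, and each model is summarized by its mean and the standard error of the mean.

```python
import math
from collections import defaultdict


def summarize(samples):
    """Group (cpu_model, seconds_per_event) pairs by model.

    Returns {cpu_model: (mean, standard_error_of_mean, number_of_jobs)}.
    """
    groups = defaultdict(list)
    for model, value in samples:
        groups[model].append(value)

    summary = {}
    for model, values in groups.items():
        n = len(values)
        mean = sum(values) / n
        # Sample variance (n - 1 in the denominator); zero for a single job.
        var = sum((v - mean) ** 2 for v in values) / (n - 1) if n > 1 else 0.0
        summary[model] = (mean, math.sqrt(var / n), n)
    return summary


def significance(a, b):
    """Difference of two means in units of its combined standard error."""
    combined = math.hypot(a[1], b[1])
    if combined == 0:
        raise ValueError("both standard errors are zero")
    return (a[0] - b[0]) / combined


# Hypothetical per-job measurements: (CPU model label, seconds per event).
jobs = [("A", 10.0), ("A", 12.0), ("B", 20.0), ("B", 22.0), ("B", 24.0)]
result = summarize(jobs)
```

With real productions each model would have many thousands of jobs, which is what makes the means comparable even though no two jobs process identical events.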
This contribution will present the results of such a comparison from recent LHCb activities on the Grid. The results are compared for different CPU models, and the dependence on the CPU clock is shown for CPUs of the same family. Further comparisons with HEPIX WG results and with benchmarking by LHCb and other LHC experiments are also included.
Dr Ricardo Graciani Diaz
(Universidad de Barcelona)
Status of the Grid Computing for the ALICE Experiment in the Czech Republic
The Czech Republic (CR) has been participating in the LHC Computing Grid
project (LCG) since 2003, and a middle-sized Tier2 center
has gradually been built in Prague, delivering computing services for national HEP
experiment groups including the ALICE project at the LHC. We present a
brief overview of the computing activities and services being performed in
the CR for the ALICE experiment at the LHC.
(Nuclear Physics Institute AS CR)
Storm-GPFS-TSM: a new approach to Hierarchical Storage Management for the LHC experiments
In the framework of WLCG, the Tier-1 computing centres have very stringent requirements for data storage, in terms of size, performance and reliability.
For some years, at the INFN-CNAF Tier-1 we have been using two distinct storage systems: Castor as a tape-based storage solution (also known as the D0T1 storage class in the WLCG language) and the General Parallel File System (GPFS), in conjunction with StoRM as an SRM service, for pure disk access (D1T0). In 2008 we started to explore the possibility of employing GPFS together with the tape management software TSM as a solution for realizing a tape-disk infrastructure, first implementing a D1T1 storage class (files always on disk with a backup on tape), and then also a D0T1 class (hence also involving active recalls of files from tape to disk). The first StoRM-GPFS-TSM D1T1 system is already in production at CNAF for the LHCb experiment, while a prototype of the D0T1 system is under development and study. We describe the details of the new D1T1 and D0T1 implementations, discussing the differences between the Castor-based solution and the StoRM-GPFS-TSM one. We also present the results of some performance studies of the novel D1T1 and D0T1 systems.
Pier Paolo Ricci
Testing PROOF Analysis with Pythia8 Generator Level Data
We study the performance of different ways of running a physics analysis in preparation for the analysis of petabytes of data in the LHC era. Our test cases include running the analysis code in a Linux cluster with a single thread in ROOT, with the Parallel ROOT Facility (PROOF), and in parallel via the Grid interface with the ARC middleware. We use of the order of millions of Pythia8 generator level QCD multi-jet events to stress the analysis system. The performances of the test cases are reported.
(Helsinki Institute of Physics)
The ATLAS Conditions Database Architecture for the Muon Spectrometer
The ATLAS Muon Spectrometer is the outer part of the ATLAS detector at the LHC. It has been designed to detect charged particles exiting the barrel and end-cap calorimeters and to measure their momentum in the pseudorapidity range |η| < 2.7. The challenging performance required in momentum measurement calls for accurate monitoring of detector and calibration parameters and a highly complex architecture to store them.
The ATLAS Muon System has started to make extensive use of the Conditions Database to store all the conditions data needed for the reconstruction of the events.
As decided by the ATLAS Collaboration, the LCG conditions database project 'COOL' is the basis for all its conditions data storage, both at CERN and throughout the worldwide collaboration. The management of the Muon COOL conditions database will be one of the most challenging applications for the Muon System, in terms of data volumes and rates as well as the variety of data stored. The Muon Conditions database is responsible for the storage of almost all the 'non-event' data and detector quality flags needed for debugging detector operations and for performing reconstruction and analysis. COOL implements an interval of validity database, i.e. objects stored or referenced in COOL have an associated start and end time between which they are valid. The data is stored in folders, which are themselves arranged in a hierarchical structure of foldersets. The structure is simple and mainly optimised to store and retrieve object(s) associated with a particular time. In this work, an overview of the entire Muon Conditions Database architecture is given, including the different sources of the data and the storage model used. The software interfaces are also described.
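The interval-of-validity idea can be illustrated with a small sketch. It is not the COOL API: the class name IovFolder, its methods and the folder path below are hypothetical, and the half-open interval convention [since, until) is an assumption made for the example.

```python
import bisect


class IovFolder:
    """Toy interval-of-validity folder: each payload is valid for [since, until)."""

    def __init__(self):
        self._since = []    # sorted interval start times
        self._records = []  # (since, until, payload), same order as _since

    def store(self, since, until, payload):
        if since >= until:
            raise ValueError("since must be earlier than until")
        i = bisect.bisect_left(self._since, since)
        # Reject overlaps with the neighbouring intervals so lookups stay unambiguous.
        if i > 0 and self._records[i - 1][1] > since:
            raise ValueError("overlaps the previous interval")
        if i < len(self._records) and self._records[i][0] < until:
            raise ValueError("overlaps the next interval")
        self._since.insert(i, since)
        self._records.insert(i, (since, until, payload))

    def find(self, time):
        """Return the payload valid at the given time, or None if there is none."""
        i = bisect.bisect_right(self._since, time) - 1
        if i >= 0:
            since, until, payload = self._records[i]
            if since <= time < until:
                return payload
        return None


# Folders are keyed by a path, mimicking a hierarchy of foldersets (hypothetical path).
folders = {"/MUON/ALIGN": IovFolder()}
folders["/MUON/ALIGN"].store(0, 10, "alignment v1")
folders["/MUON/ALIGN"].store(10, 20, "alignment v2")
```

Keeping the start times sorted means a lookup for a given time is a binary search, which matches the stated goal of retrieving the object associated with a particular time.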
The ATLAS Distributed Data Management Central Catalogues and steps towards scalability and high availability
The ATLAS Distributed Data Management system, Don Quijote2 (DQ2), has
been in use since 2004.
Its goal is to manage tens of petabytes of data per year, distributed
among the WLCG.
One of the most critical components of DQ2 is the central catalogues
component, which comprises a set of web services with a database back-end and a
distributed memory object caching system.
This component has proven to be very reliable and to fulfill ATLAS
requirements regarding performance and scalability.
In this paper we present the architecture of the DQ2 central
catalogues component and implementation decisions regarding
performance, scalability, replication and memory usage. The
exploitation of techniques and features of the Oracle database which
hosts the application is described together with an overview of the
disaster recovery strategy that needs to be in place to address the
requirement of high availability.
The ATLAS DQ2 Accounting and Storage Usage service
The DQ2 Distributed Data Management system is the system developed and used by ATLAS for handling very large datasets. It encompasses data bookkeeping, the management of large-scale production transfers and end-user data access requests.
In this paper, we will describe the design and implementation of the DQ2 accounting service. It collects different kinds of data usage information in order to show and compare them from the experiment and application perspectives. Today, the DQ2 data volume represents more than 8 petabytes, ~70 million files and 500 k dataset replicas, distributed over more than 500 grid storage endpoints.
The ATLAS METADATA INTERFACE
AMI is the main interface for searching for ATLAS datasets using physics metadata criteria.
AMI has been implemented as a generic database management framework which allows parallel searching over many catalogues, which may have differing schema, and may be distributed geographically, using different RDBMS.
The main features of the web interface will be described, in particular the powerful graphic query builder. The use of XML/XSLT technology ensures that all commands can be used either on the web or from a command line interface via a web service.
We will also discuss how we have been able to use the AMI mechanism to describe database tables which belong to other applications so that the AMI generic interfaces can be used for browsing or querying the information they contain.
The ATLAS TAGS Database distribution and management - Operational challenges of a multi-terabyte distributed database system
The TAG files store summary event quantities that allow a quick selection of interesting events.
This data will be produced at a nominal rate of 200 Hz, and is uploaded into a relational database for access from websites and other tools.
The estimated database volume is 6TB per year, making it the largest application running on the ATLAS relational databases, at CERN and at other voluntary sites.
The sheer volume and high rate of production make this application a challenge for data and resource management in many respects.
This paper will focus on the operational challenges of this system. These include:
uploading the data from files to the databases at CERN and at remote sites;
distributing the TAG metadata that is essential to guide the user through event selection;
controlling resource usage of the database, from the user query load to the strategy of cleaning and archiving of old TAG data.
The CMS Computing Facilities Operations
The CMS Facilities and Infrastructure Operations group is responsible for providing and maintaining a working distributed computing fabric with a consistent working environment for Data operations and the physics user community. Its mandate is to maintain the core CMS computing services; ensure the coherent deployment of Grid or site specific components (such as workload management, file transfer and storage systems); monitor the CMS specific site availability and efficiency; systematically trouble-shoot and track facilities related issues.
In recent years, the CMS tiered computing infrastructure has grown significantly. It was tested via so-called “data challenges” and used for processing real cosmic data, routinely running 100k jobs per day distributed over more than 50 sites. In this presentation we will focus on operational aspects in the facilities area in view of the LHC startup. In particular, we will report on the experience gained and the progress made in the computing shift procedures, which are running in dedicated CMS centres inside and outside CERN. The collaborative effort of all CMS centres and good communication with CMS sites have proven to be essential ingredients for efficient, sustained distributed data processing.
The CMS Dataset Bookkeeping Service Query Language (DBSql)
The CMS experiment has implemented a flexible and powerful approach enabling users to find data within the CMS physics data catalog. The Dataset Bookkeeping Service (DBS) comprises a database and the services used to store and access metadata related to its physics data. In addition to the existing web-based and programmatic APIs, a generalized query system has been designed and built. This query system has a query language that hides the complexity of the underlying database structure. This provides a way of querying the system that is straightforward for CMS data managers and physicists. The DBS Query Language uses the ANTLR tool to build the input query parser and tokenizer; a query builder then uses a graph representation of the DBS schema to construct the actual SQL sent to the underlying database. We will describe the design of the query system and provide details of the language components. We will also provide an overview of how this component fits into the overall data discovery system, and how it provides access to information about Data Quality and Luminosity.
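The query-builder step can be sketched as a shortest-path search over a schema graph. This is only an illustration under stated assumptions: the table names, columns and join conditions are hypothetical and do not reproduce the DBS schema, and the ANTLR parsing stage is skipped by passing the selected and filtered columns directly.

```python
from collections import deque

# Hypothetical schema graph: each edge carries the join condition between two tables.
SCHEMA = {
    ("dataset", "block"): "block.dataset_id = dataset.id",
    ("block", "file"): "file.block_id = block.id",
    ("file", "lumi"): "lumi.file_id = file.id",
}


def neighbours(table):
    """Yield (adjacent_table, join_condition) for every edge touching table."""
    for (a, b), condition in SCHEMA.items():
        if a == table:
            yield b, condition
        elif b == table:
            yield a, condition


def join_path(start, goal):
    """Breadth-first search for the shortest chain of joins from start to goal."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        table, path = queue.popleft()
        if table == goal:
            return path
        for nxt, condition in neighbours(table):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [(nxt, condition)]))
    raise ValueError(f"no join path from {start} to {goal}")


def build_sql(select, where_column):
    """Build SQL from 'table.column' strings; the filter value is bound as :value."""
    select_table = select.split(".")[0]
    where_table = where_column.split(".")[0]
    parts = [f"SELECT {select} FROM {select_table}"]
    for table, condition in join_path(select_table, where_table):
        parts.append(f"JOIN {table} ON {condition}")
    parts.append(f"WHERE {where_column} = :value")
    return " ".join(parts)


sql = build_sql("file.name", "dataset.path")
```

The user only names what to select and what to filter on; the joins through intermediate tables come from the graph, which is how such a language can hide the database structure.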
The CMS experiment workflows on StoRM-based storage at Tier-1 and Tier-2 centers
The CMS experiment is preparing for data taking in many computing activities, including the testing, deployment and operation of various storage solutions to support the computing workflows of the experiment. Some Tier-1 and Tier-2 centers supporting the collaboration are deploying and commissioning StoRM storage systems. These are POSIX-based disk storage systems on top of which StoRM implements the Storage Resource Manager (SRM version 2) interface, allowing standards-based access from the Grid. This paper presents some tests made with CMS applications performing reference Tier-N workflows on StoRM storage, the configurations and solutions adopted and the experience achieved so far in production-level operations.
The LHCb data bookkeeping system
The LHCb Bookkeeping is a system for the storage and retrieval of meta data associated with LHCb datasets, e.g. whether it is real or simulated data, which running period it is associated with, how it was processed and all the other relevant characteristics of the files.
The meta data are stored in an Oracle database which is interrogated using services provided by the LHCb DIRAC3 infrastructure, which provides security, data streaming, and multi-threaded connections. Users can browse the Bookkeeping database through a command line interface or Graphical User Interface (GUI). The command line presents a view similar to a file system and the GUI is implemented on top of this.
The LHCb Software distribution
The installation of the LHCb software is handled by a single Python script: install_project.py. This bootstrap script is unique in allowing the installation of software projects on various operating systems (Linux, Windows, MacOSX). It is designed for LHCb software deployment for a single user or for multiple users, in a shared area or on the Grid. It retrieves the software packages and deduces the dependencies using a remote web repository and thus takes care of the consistency of the installation.
Among the various features which have been implemented one can list: the fixing of access permission settings for the installed packages, incremental installation using multiple deployment areas and the consistency check of the retrieved files.
The only prerequisites for the use of this tool are a recent enough version of the Python language (2.3 and above) and reasonable network access.
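Deducing an installation order from declared dependencies can be sketched with a depth-first traversal. This is not the code of install_project.py: the function name, the project names and the dependency table are hypothetical, and in a real tool the table would be read from the remote repository.

```python
def install_order(project, dependencies):
    """Return the projects to install, each one after all of its dependencies.

    dependencies maps a project name to the list of projects it needs.
    """
    order = []
    visiting = set()  # projects on the current traversal path (cycle detection)
    done = set()      # projects already placed in the order

    def visit(name):
        if name in done:
            return
        if name in visiting:
            raise ValueError("dependency cycle at " + name)
        visiting.add(name)
        for dep in dependencies.get(name, ()):
            visit(dep)
        visiting.discard(name)
        done.add(name)
        order.append(name)

    visit(project)
    return order


# Hypothetical dependency table for illustration only.
DEPENDENCIES = {
    "App": ["Framework", "Core"],
    "Framework": ["Core"],
    "Core": [],
}
plan = install_order("App", DEPENDENCIES)
```

Only the projects reachable from the requested one are listed, so a partial installation never pulls in unrelated packages.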
(European Organization for Nuclear Research (CERN))
The new ROOT browser
Description of the new implementation of the ROOT browser
The nightly build and test system for LCG AA and LHCb software
The core software stack from both the LCG Application Area and LHCb consists of more than
25 C++/Fortran/Python projects built for about 20 different configurations on Linux, Windows
and MacOSX. To these projects, one can also add about 20 external software packages (Boost, Python, Qt,
CLHEP, ...) which also have to be built for the same configurations. In order to reduce the
time of the development cycle and improve quality assurance, a framework has been developed for
the daily (nightly actually) build and test of the software.
Performing the build and the tests on several configurations and platforms increases
the efficiency of the unit and integration tests. The main features of the framework are:
- flexible and fine grained setup (full, partial build) through a web interface
- possibility to build several "slots" with different configurations
- precise and highly granular reports on a web server
- support for CMT projects (but not only) with their cross-dependencies.
- scalable client-server architecture for the control machine and its build machines
- copy of the results in a common place to allow early view of the software stack
The nightly build framework is written in Python for portability and is easily extensible to
accommodate new build procedures.
(CERN), Karol Kruzelecki
(Cracow University of Technology)
The offline Data Quality Monitoring system of the ATLAS Muon Spectrometer
The ATLAS detector has been designed to exploit the full discovery potential of the LHC proton-proton collider at CERN, at the c.m. energy of 14 TeV. Its Muon Spectrometer (MS) has been optimized to measure final state muons from those interactions with good momentum resolution (3-10% for momenta of 100 GeV/c to 1 TeV/c).
In order to ensure that the hardware, DAQ and reconstruction software of the ATLAS MS are functioning properly, Data Quality Monitoring (DQM) tools have been developed both for the online and the offline environment. The offline DQM is performed on histograms of quantities of interest which are filled in the ATLAS software framework ATHENA during different levels of processing: raw hit, reconstructed object (segment and track) and higher (physics) level. Those histograms can then be displayed and browsed by shifters and experts using various macros. They are also given as input to the Data Quality Monitoring Framework (DQMF) application, which applies simple algorithms and/or comparisons with reference histograms to set a status flag, which is propagated to a global status and saved in a database. A web display of DQMF results is also available. This initial processing is done on a subset of data (express stream) within a few hours of the run, and depending on the data quality, the full statistics are then processed.
The offline muon DQM structure and content, as well as the corresponding tools developed, are presented, with examples from the commissioning of the MS with cosmic rays.
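The step of comparing a histogram with a reference and propagating a status flag can be sketched as follows. This is not the DQMF interface: the function names, the three flag values, the simplified chi-square per bin and the thresholds are all assumptions chosen for illustration.

```python
FLAGS = ["green", "yellow", "red"]  # ordered from best to worst


def chi2_per_bin(hist, reference):
    """Simplified shape comparison of two histograms given as lists of bin counts.

    The reference is scaled to the same total as hist; bins where the scaled
    reference is empty are skipped, which a real check would treat more carefully.
    """
    if len(hist) != len(reference) or not hist:
        raise ValueError("histograms must be non-empty with equal binning")
    total_hist, total_ref = sum(hist), sum(reference)
    if total_hist == 0 or total_ref == 0:
        raise ValueError("empty histogram")
    scale = total_hist / total_ref
    chi2, used = 0.0, 0
    for h, r in zip(hist, reference):
        expected = r * scale
        if expected > 0:
            chi2 += (h - expected) ** 2 / expected
            used += 1
    return chi2 / used


def flag_for(value, warn=2.0, error=5.0):
    """Map a test statistic to a flag; the thresholds are hypothetical."""
    if value < warn:
        return "green"
    if value < error:
        return "yellow"
    return "red"


def global_flag(flags):
    """The global status is the worst of the individual flags."""
    return max(flags, key=FLAGS.index)


same_shape = flag_for(chi2_per_bin([100, 100, 100, 100], [200, 200, 200, 200]))
distorted = flag_for(chi2_per_bin([10, 90], [50, 50]))
overall = global_flag([same_shape, distorted])
```

Taking the worst flag as the global status means a single bad histogram is enough to draw a shifter's attention to the run.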
(Physics Department - Aristotle Univ. of Thessaloniki)
The Open Science Grid -- Operational Security in a Highly Connected World
Open Science Grid stakeholders invariably depend on multiple
infrastructures to build their community-based distributed systems.
To meet this need, OSG has built new gateways with TeraGrid, Campus
Grids, and Regional Grids (NYSGrid, BrazilGrid). This has brought new
security challenges for the OSG architecture and operations. The
impact of security incidents now has a larger scope and demands a
Operationally, we took first steps towards building an incident
sharing community among our peer grids. To reach higher-education user
communities, especially HEP researchers, outside the grids, OSG
members joined REN-ISAC. We also defined (jointly with EGEE) a set of
operational security tools and began implementation. And, because
across the infrastructures certificate hygiene is a top priority, we
worked with the IGTF (International Grid Trust Federation) to develop
risk assessment and incident response processes.
Architecturally, we analyzed how proxy credentials are treated
end-to-end in the OSG infrastructure. We discovered that the treatment
of proxies, after a job is finished, has some shortcomings. Given long
proxy lifetimes, a breach of a host can affect multiple users and
Finally, we are working on a banning service that can deny access to
resources by suspect users at the gatekeeper. We designed this site
service to receive alerts from a central banning service managed by
the security team in cases of emergencies. We envision that coupled
with our operational efforts, this service would be a first-line
defense against security incidents.
The ROOT event recorder
Description of the ROOT event recorder, a GUI testing and validation tool.
TMemStat - memory usage debugging and monitoring in ROOT and AliROOT
Memory monitoring is a very important part of complex project development.
Open Source tools, such as valgrind, are available for the task; however, their performance penalties make them unsuitable for debugging long, CPU-intensive programs, such as reconstruction or simulation. We have developed the TMemStat tool, which, while not providing the full functionality of valgrind, gives developers the possibility to find memory problems even in very large projects, such as full simulation of the ALICE detector in a high flux environment. TMemStat uses hooks for alloc and internal gcc functions, and provides detailed information about memory leaks and memory usage, with user-defined frequency or at user-defined watch points.
TSKIM : a tool for skimming ROOT trees
Like many experiments, FERMI stores its data in ROOT trees. A very common activity of physicists is the tuning of the selection criteria which define the events of interest, followed by cutting and pruning the ROOT trees to extract all the data linked to those specific events. It is rather straightforward to write a ROOT script to skim a single kind of data, for example the reconstructed data. This turns out to be trickier if you also want to process some simulated or analysis data at the same time, because each kind of data is structured with its own rules concerning file names, file sizes, tree names, identification of events, etc. TSkim has been designed to ease this task. Thanks to a meta-data file which says where to find the run and event ids in the different kinds of trees, TSkim is able to collect all the tree elements which match a given ROOT cut. The tool also helps when loading the shared libraries which describe the experiment data, or when pruning the tree branches. Initially a pair of PERL and ROOT scripts, TSkim is today a fully compiled C++ application, encapsulating our ROOT know-how and offering a range of features going far beyond the original FERMI requirements. In this talk, we plan to present the features of interest for any ROOT-based experiment, including a new kind of event list, and emphasize the implementation mechanisms which make it scalable.
(Laboratoire Leprince-Ringuet (LLR)-Ecole Polytechnique)
Using Python for Job Configuration in CMS
In 2008, the CMS experiment made the transition
from a custom-parsed language for job configuration
to using Python. The current CMS software release
has over 180,000 lines of Python configuration code.
We describe the new configuration system, the
motivation for the change, the transition
itself, and our experiences with the new
(California Institute of Technology)
Validation of software releases for CMS
The CMS software stack currently consists of more than 2 million lines of code developed by over 250 authors, with a new version being released every week. CMS has set up a release validation process for quality assurance which enables the developers to compare with previous releases and references.
This process provides the developers with reconstructed datasets of real data and MC samples. The samples span the whole range of detector effects and important physics signatures to benchmark the performance of the software. They are used to investigate interdependency effects of software packages and to find and fix bugs.
The samples have to be available in a very short time after a release is published to fit into the streamlined CMS development cycle. The standard CMS processing infrastructure and dedicated resources at CERN and FNAL are used to achieve a very short turnaround of 24 hours.
This talk will present the CMS release validation process and statistics describing the prompt usage of the produced samples.
Overall, it will emphasize the importance of a streamlined release validation process for projects with a large code base and a significant number of developers, and it can serve as an example for future projects.
Visual Physics Analysis VISPA
VISPA is a novel development environment for high energy physics analyses which enables physicists to combine graphical and textual work. A physics analysis cycle consists of prototyping, performing, and verifying the analysis. The main feature of VISPA is a multipurpose window for visual steering of analysis steps, creation of analysis templates, and browsing physics event data at different steps of an analysis. VISPA follows an experiment-independent approach and incorporates various tools for steering and controlling required in a typical analysis. Connection to different frameworks of high energy physics experiments is achieved by using a Python interface. We present the look-and-feel for an example physics analysis at the LHC, and explain the underlying software concepts of VISPA.
Wide Area Network Access to CMS Data Using the Lustre Cluster Filesystem
The CMS experiment will generate tens of petabytes of data per year, data that will be processed, moved and stored in large computing facilities at locations all over the globe. Each of these facilities deploys complex and sophisticated hardware and software components which require dedicated expertise lacking at many of the universities and institutions wanting access to the data as soon as it becomes available. Also, the standard methods for accessing data remotely rely on grid interfaces and batch jobs that, while powerful, significantly increase the amount of procedural overhead and can impede a remote user’s ability to analyze data interactively, develop and debug code and examine detailed information.
We believe that enabling direct but remote access to CMS data will greatly enhance the analysis experience for remote users not situated at a CMS Tier1 or Tier2.
The Lustre cluster filesystem allows remote servers to mount filesystems over the wide-area network as well as over the local-area network, where it is more commonly used. It also has an easy-to-deploy client, is reliable and performs exceptionally well. In this paper we report our experience using the Lustre filesystem to access CMS data from servers located a few hundred kilometers away from the physical filesystem. We describe the procedure used to connect two of the Florida Tier3 sites, located in Miami and Daytona Beach, to a storage element located at the University of Florida’s Tier2 center and High Performance Computing Center in Gainesville. We include details on the hardware used and on kernel modifications and tunings, report on network bandwidth and system I/O performance, and compare these benchmarks with actual CMS application runs. We also propose a possible scenario for implementing this new method of accessing CMS data in the context of the CMS data management system. Finally we explore some of the issues concerning remote user access with Lustre, and touch upon security concerns.
Prof. Rodriguez Jorge Luis
(Florida Int'l University)
Prague Congress Centre
5. května 65, 140 00 Prague 4, Czech Republic
(chair of the Academy of Sciences of the Czech Republic), Vaclav Hampl
(rector of the Charles University in Prague), Vaclav Havlicek
(rector of the Czech Technical University in Prague)
Plenary: Monday, Congress Hall
The LHC Machine and Experiments: Status and Prospects
A personal review of WLCG and the readiness for first real LHC data, highlighting some particular successes, concerns and challenges that lie ahead.
coffee break, exhibits and posters
Plenary: Monday, Congress Hall
Status and Prospects of LHC Experiments Data Acquisition
Data Acquisition systems are an integral part of their respective experiments. They are designed to meet the needs set by the physics programme. Despite some very interesting differences in the architecture, the unprecedented data rates at the LHC have led to a lot of commonalities among the four large LHC data acquisition systems. All of them rely on commercial local area network technology, and more specifically mostly on Gigabit Ethernet. They transport the data from the detector readout boards to large farms of industry-standard servers, where a pure software trigger is run. These four systems will be reviewed, the underlying commonalities will be highlighted and interesting architectural differences will be discussed. In view of a possible LHC upgrade we will briefly discuss the suitability and evolution of the current architectures to fit the needs of SLHC.
Status and Prospects of The LHC Experiments Computing
LHC data analysis starts on a Grid – What’s next?
For various reasons the computing facility for LHC data analysis has been organised as a widely distributed computational grid. Will this be able to meet the requirements of the experiments as LHC energy and luminosity ramp up? Will grid operation become a basic component of science infrastructure? Will virtualisation and the cloud model eliminate the need for complex grid middleware? Will multi-core personal computers relegate the grid to a data delivery service? The talk will look at some of the advantages and some of the drawbacks of the grid approach, and will present a personal view on how things might evolve.
Collaborative Tools: Monday, Club B
CMS Centres Worldwide: a New Collaborative Infrastructure
The CMS Experiment at the LHC is establishing a global network of inter-connected "CMS Centres" for controls, operations and monitoring. These support: (1) CMS data quality monitoring, detector calibrations, and analysis; and (2) computing operations for the processing, storage and distribution of CMS data.
We describe the infrastructure, computing, software, and communications systems required to create an effective and affordable CMS Centre. We present our highly successful operations experiences with the major CMS Centres at CERN, Fermilab, and DESY during the LHC first beam data-taking and cosmic ray commissioning work. The status of the various centres already operating or under construction in Asia, Europe, Russia, South America, and the USA is also described.
We emphasise the collaborative communications aspects. For example, virtual co-location of experts in CMS Centres Worldwide is achieved using high-quality permanently-running "telepresence" video links. Generic Web-based tools have been developed and deployed for monitoring, control, display management and outreach.
Traditionally, users interact with the Grid through command line tools. However, these tools are difficult for non-expert users: they provide minimal help and generate output that is not always easy to understand, especially in case of errors. Graphical User Interfaces are typically limited to providing access to monitoring or accounting information and concentrate on particular aspects, failing to cover the full spectrum of grid control tasks.
To make the Grid more user friendly, more complete graphical interfaces are needed. Within the DIRAC project we have attempted to construct a Web based User Interface that provides the means not only to monitor the system behavior but also to steer the main user activities on the grid. Using DIRAC's web interface a user can easily track jobs and data. It provides access to job information and allows users to perform actions on jobs such as killing or deleting them. Data managers can define and monitor file transfer activity as well as check requests set by jobs. Production managers can define and follow large data productions and react if necessary by stopping or starting them.
The Web portal is built following all the grid security standards and uses modern Web 2.0 technologies, which make it possible to achieve a user experience similar to that of desktop applications. Details of the DIRAC Web Portal architecture and User Interface will be presented and discussed.
Mr Adrian Casajus Ramo
(Departament d'Estructura i Constituents de la Materia)
Lecture archiving on a larger scale at the University of Michigan and CERN
The ATLAS Collaboratory Project at the University of Michigan has been a leader in the area of collaborative tools since 1999. Its activities include the development of standards, software and hardware tools for lecture archiving, and making recommendations for videoconferencing and remote teaching facilities. Starting in 2006 our group became involved in classroom recordings, and in early 2008 we spawned CARMA, a University-wide recording service. This service uses a new portable recording system that we developed. Capture, archiving and dissemination of rich multimedia content from lectures, tutorials and classes are increasingly widespread activities among universities and research institutes. A growing array of related commercial and open source technologies is becoming available, with several new products having been introduced in the last couple of years. As the result of a new close partnership between U-M and CERN IT, a market survey of these products is being conducted and will be presented. It will inform an ambitious effort in 2009 to equip many CERN rooms with automated lecture archiving systems, on a much larger scale than before. This new technology is being integrated with CERN’s existing webcast, CDS, and Indico applications.
(U. of Michigan)
Virtual Logbooks as a Tool for Enriching the Collaborative Experience in Large Scientific Projects
A key feature of collaboration in large scale scientific projects is
keeping a log of what is being done and how - for private use and
reuse and for sharing selected parts with collaborators and peers,
often distributed geographically on an increasingly global scale.
Even better if this log is automatic, created on the fly while
a scientist or software developer is working in a habitual way,
without the need for extra efforts. The CAVES - Collaborative Analysis
Versioning Environment System - and CODESH - COllaborative DEvelopment
SHell - projects address this problem in a novel way. They build on the
concepts of virtual state and virtual transition to enhance the
collaborative experience by providing automatic persistent virtual
logbooks. CAVES is designed for sessions of distributed data analysis
using the popular ROOT framework, while CODESH generalizes the same
approach for any type of work on the command line in typical UNIX
shells like bash or tcsh. Repositories of sessions can be configured
dynamically to record and make available the knowledge accumulated in
the course of a scientific or software endeavor. Access can be
controlled to define logbooks of private sessions or sessions shared
within or between collaborating groups. As a typical use case we
concentrate on building working scalable systems for analysis of
Petascale volumes of data expected with the start of the LHC
experiments. Our approach is general enough to find applications
in many scientific fields.
(University of Florida)
EVO (Enabling Virtual Organizations)
The EVO (Enabling Virtual Organizations) system is based on a new distributed and unique architecture, leveraging the 10+ years of unique experience of developing and operating large distributed production based collaboration systems. The primary objective is to provide the High Energy and Nuclear Physics experiments with a system/service that meets their unique requirements of usability, quality, scalability, reliability, and cost necessary for nationally and globally distributed research organizations.
The EVO system, which will be officially released during June 2007, includes a
better-integrated and more convenient user interface, a richer feature set including higher resolution video and instant messaging, greater adaptability to all platforms and operating systems, and higher overall operational efficiency and robustness. All of these aspects will be particularly important as we are entering the startup period of the LHC because the community will require an unprecedented level of daily collaboration. There will be intense demand for long distance scheduled meetings, person-to-person communication, group-to-group discussions, broadcast meetings, workshops and continuous presence at important locations such as control rooms and experimental areas. The need to have the collaboration tools totally integrated in the physicists’ working environments will gain great importance.
Beyond all these user-features, another key enhancement is the collaboration
infrastructure network created by EVO, which covers the entire globe and which is
fully redundant and resilient to failure. The EVO infrastructure automatically adapts
to the prevailing network configuration and status, so as to ensure that the
collaboration service runs without disruption. Because we are able to monitor the
end-user’s node, we are able to inform the user of any potential or arising problems (e.g. excessive CPU load or packet loss) and, where possible, to fix the problems automatically and transparently on behalf of the user (e.g. by switching to another server node in the network, by reducing the number of video streams received, et cetera). The integration of the MonALISA architecture into this new EVO architecture was an important step in the evolution of the service towards a globally distributed dynamic system that is largely autonomous.
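The automatic remediation described above can be pictured as a simple decision rule over the monitored node status. The following is only a minimal sketch of that idea and not EVO's actual logic: the thresholds, the `NodeStatus` fields and the action names are all hypothetical, since the abstract gives no concrete values or interfaces.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical thresholds; the abstract does not state any concrete values.
MAX_PACKET_LOSS = 0.05  # fraction of packets lost
MAX_CPU_LOAD = 0.90     # fraction of CPU in use


@dataclass
class NodeStatus:
    """Snapshot of an end-user node as seen by monitoring (assumed fields)."""
    cpu_load: float
    packet_loss: float
    video_streams: int


def choose_actions(status: NodeStatus) -> List[str]:
    """Return the remedial actions suggested for the given node status."""
    actions = []
    if status.packet_loss > MAX_PACKET_LOSS:
        # Network trouble: try another server node in the network.
        actions.append("switch_server")
    if status.cpu_load > MAX_CPU_LOAD and status.video_streams > 1:
        # Overloaded client: receive fewer video streams.
        actions.append("reduce_video_streams")
    return actions
```

For instance, a node reporting 95% CPU load and 10% packet loss while receiving four streams would be advised both to switch server and to reduce the number of streams, whereas a healthy node yields an empty list.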
The EVO system is now the primary collaboration system used by the LHC and, more generally, by the High Energy and Nuclear Physics community going forward.
(California Institute of Technology (CALTECH))
High Definition Videoconferencing for High Energy Physics
We describe the use of professional-quality high-definition (HD) videoconferencing systems for daily HEP experiment operations and large-scale media events.
For CMS operations at the Large Hadron Collider, we use such systems for permanently running "telepresence" communications between the CMS Control Room in France and major offline CMS Centres at CERN, DESY, and Fermilab, and with a number of smaller sites worldwide on an as-needed basis. We have also used HD systems for large-scale global media events, such as the LHC First Beam Day event on Sept. 10, 2008, the world's largest scientific press event since the moon landing. For such events, poor quality audio or video signals or equipment failure is simply not an option.
We describe the systems we use today and our views on the future of HD videoconferencing and HD telepresence systems in High Energy Physics. We describe how high-quality, easy-to-use, extremely reliable videoconferencing systems may be established in a HEP environment at an affordable cost.
(Fermi National Accelerator Laboratory (FNAL))
Towards the 5th LHC VO: The LHC beam studies in the WLCG environment
Recently a growing number of applications have been quickly and successfully enabled on the Grid by the CERN Grid application support team. This allowed the applications to achieve and publish large-scale results in a short time, which otherwise would not have been possible.
Examples of successful Grid applications include medical and particle physics simulation (Geant4, Garfield), satellite imaging and geographic information for humanitarian relief operations (UNOSAT), telecommunications (ITU), theoretical physics (Lattice QCD, Feynman-loop evaluation), bioinformatics (Avian Flu Data Challenge), and commercial image processing and classification (Imense Ltd.).
Based on this successful experience, and that of the 4 LHC VOs, the LHC beam team has decided to run their tracking and collimation applications in the WLCG environment. The large number of jobs, the level of service and the performance requirements, as well as the importance of tracking applications for the four LHC experiments, make the LHC beam community a candidate for the 5th LHC VO.
In this talk we present the procedures, tools and services used for enabling the tracking applications in the WLCG environment. We also study the experience of running the LHC tracking applications on the Grid. We draw analogies with the problems that ITER will have to face in the future to establish a collaboration within the Grid community and make successful use of the Grid resources.
(CERN IT/GS), Dr Patricia Mendez Lorenzo
CMS FileMover: One Click Data
The CMS experiment has a distributed computing model, supporting thousands of physicists at hundreds of sites around the world. While this is a suitable solution for "day to day" work in the LHC era, there are edge use-cases that Grid solutions do not satisfy. Occasionally it is desirable to have direct access to a file on a user's desktop or laptop, for code development, debugging or examining event displays.
We have developed a user-friendly, web based tool that bridges the gap between the large scale Grid resources and the smaller, simpler user edge cases. We discuss the development and integration of this new component with existing CMS and Grid services, as well as the constraints we have put in place to prevent misuse. We also explore possible future developments which could turn the current service into a general "low-latency" event delivery service.
Ganga: User-friendly Grid job submission and management tool for LHC and beyond
Ganga has been widely used for several years in ATLAS, LHCb and a handful of other communities in the context of the EGEE project. Ganga provides a simple yet powerful interface for submitting and managing jobs on a variety of computing backends. The tool helps users configure applications and keep track of their work. With the major release of version 5 in summer 2008, Ganga's main user-friendly features have been strengthened. A new configuration interface, enhanced support for job collections, bulk operations and easier access to subjobs are just a few examples. In addition to the traditional batch and Grid backends such as Condor, LSF, PBS and gLite/EDG, point-to-point job execution via ssh on remote machines is now supported. Ganga is used as an interactive job submission interface for end-users and also as a job submission component for higher-level tools. For example, GangaRobot is used to perform automated, end-to-end testing of the HEP data analysis chain on the Grid. Ganga comes with an extensive test suite covering more than 350 test cases. The development model involves all active developers in the release management shifts, which is an important and novel approach for distributed software collaborations. Ganga 5 is a mature, stable and widely-used tool with long-term support from the HEP community.
Dr Daniel van der Ster
Babar Task Manager II
The Babar experiment produced one of the largest datasets in high
energy physics. To provide for many different concurrent analyses
the data is skimmed into many data streams before analysis can begin,
multiplying the size of the dataset both in terms of bytes and number
of files. To address this large scale problem of job management and
data control, the Babar Task Manager system was developed. The system
proved unable to scale to the size of the problem, and there was
a wish to distribute the production to many sites and use grid
resources to help. A development effort was started, and the Task
Manager II was the result. This has now been in production for over
a year in Babar, and produced several skim cycles of data, at multiple
computing centers, and was able to use grid resources. The structure
of the system will be presented, along with details on scalability to
number of jobs, and use of remote sites both with and without grid
(STANFORD LINEAR ACCELERATOR CENTER)
Scalla/xrootd WAN globalization tools: where we are.
The Scalla/Xrootd software suite is a set of tools and suggested methods useful for building scalable, fault tolerant and high performance storage systems for POSIX-like data access. One of the most important recent development efforts is to implement technologies able to deal with the characteristics of Wide Area Networks, and to find solutions that allow data analysis applications to directly access remote data repositories in an efficient way. This contribution describes the current status of the various features and mechanisms implemented in the Scalla/Xrootd software suite, which make it possible to create and efficiently access 'global' data repositories, obtained by aggregating multiple sites through Wide Area Networks. One of these mechanisms is the ability of the clients to efficiently exploit high-latency high-throughput WANs and access remote repositories in read/write mode for analysis-like tasks. We will also discuss the possibilities of making distant data sub-repositories cooperate. The aim is to give a unique view of their content, and eventually allow external systems to coordinate and trigger data movements among them. Experience in using Scalla/Xrootd remote data repositories will also be reported.
Reprocessing LHC beam and cosmic ray data with the ATLAS distributed Production System
We present our experience with distributed reprocessing of the LHC beam
and cosmic ray data taken with the ATLAS detector during 2008/2009.
Raw data were distributed from CERN to ATLAS Tier-1 centers, reprocessed
and validated. The reconstructed data were consolidated at CERN and ten WLCG
ATLAS Tier-1 centers and made available for physics analysis.
The reprocessing was done simultaneously in more than 30 centers
using the ATLAS Production System. Several challenging issues were
solved, such as scalable access to ATLAS conditions and calibration data,
bulk data prestaging and data distribution in quasi real time mode.
We also describe the ATLAS distributed production system running
at 70 Universities and Labs in Asia, Europe, North America and Pacific
region with automatic task submission, control and aggregation of results at
Event Processing: Monday, Club E
CMS Software Performance Strategies
Performance of an experiment's simulation, reconstruction and analysis
software is of critical importance to physics competitiveness and making
optimum use of the available budget. In the last 18 months the performance
improvement program in the CMS experiment has produced more than a ten-fold
gain in reconstruction performance alone, a significant reduction in mass
storage system load, a reduction in memory consumption and a variety
of other gains. We present our application performance analysis methods
and our techniques for higher performance memory management, I/O, data
persistency, software packaging, code generation, as well as how to
reduce total memory usage. We report on specific gains achieved and the
main contributing causes. We discuss our estimate of future achievable
gains and promising new tools and analysis methods.
HEP C++ meets reality -- lessons and tips
In 2007 the CMS experiment first reported some initial findings on the
impedance mismatch between HEP use of C++ and the current generation
of compilers and CPUs. Since then we have continued our analysis of
the CMS experiment code base, including the external packages we use.
We have found that large amounts of C++ code have been written largely
ignoring any physical reality of the resulting machine code and run
time execution costs, including and especially software developed by
experts. We report on a wide range of issues affecting typical high energy
physics code, in the form of coding pattern - impact - lesson - improvement.
(NORTHEASTERN UNIVERSITY OF BOSTON (MA) U.S.A.)
The ATLAS Simulation Validation and computing performance studies
The ATLAS Simulation validation project is done in two distinct phases. The first is the computing validation; the second is the physics performance, which must be tested and compared to available data. The infrastructure needed at each stage of validation is described here. In ATLAS, software development is controlled by nightly builds to check stability and performance. The complete computing performance of the simulation is tested through three types of tests: ATLAS Nightly Tests (ATN), Real Time Tests (RTT) and Full Chain Tests (FCT), each test being responsible for different levels of validation. In this report tests on robustness, benchmarking computing performance and basic functionality are described. In addition to automatic tests, computing time, memory consumption, and output file size are benchmarked in each stable release in a variety of processes both simple and complex. Single muons, electrons, and charged pions are used, as well as dijets in bins of leading parton pT, Supersymmetric benchmark point three (SU3), minimum bias, Higgs boson decaying to four leptons, Z → e+e−, Z → µ+µ−, and Z → τ+τ− events.
(Caltech, USA & Columbia University, USA)
The Virtual Point 1 Event Display for the ATLAS Experiment
We present an event display for the ATLAS Experiment, called Virtual Point
1 (VP1), designed initially for deployment at point 1 of the LHC, the
location of the ATLAS detector. The Qt/OpenGL based application provides
truthful and interactive 3D representations of both event and non-event
data, and now serves a general-purpose role within the experiment. Thus,
VP1 is used in both online (in the control room itself or remotely via a
special "live" mode) and offline environments to provide fast debugging
and understanding of events, detector status and software. In addition to
a flexible plugin infrastructure and a high level of configurability, this
multi-purpose role is mainly facilitated by the application being embedded
directly in the ATLAS offline software framework, enabling it to use the
native Event Data Model directly, and thus run on any source of ATLAS
data, or even directly from within e.g. reconstruction jobs. Finally, VP1
provides high-quality pictures and movies, useful for outreach purposes.
(University of Pittsburgh)
Fireworks: A Physics Event Display for CMS
Fireworks is a CMS event display which is specialized for the physics
studies case. This specialization allows the use of a stylized rather
than 3D accurate representation when it's appropriate. Data handling
is greatly simplified by using only reconstructed information and
ideal geometry. Fireworks provides an easy to use interface which
allows a physicist to concentrate only on the data in which they are
interested. Data is presented via graphical and textual views. Cross
view data interpretation is easy since the same object is shown using
the same color in all views and if the object is selected it is
highlighted in all views. Objects which have been selected can be
further studied by displaying a detailed view of just that object.
Physicists can select which events (e.g. require a high energy muon),
what data (e.g. which track list) and which items in a collection
(e.g. only high-pt tracks) to show. Once the physicist has configured
Fireworks to their liking they can save the configuration. Fireworks
is built using the Eve subsystem of the CERN ROOT project and CMS's
FWLite project. The FWLite project was part of CMS's recent code
redesign which separates data classes into libraries separate from
algorithms producing the data and uses ROOT directly for C++ object
storage thereby allowing the data classes to be used directly in ROOT.
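The three selection levels mentioned in this abstract (which events, which collection, which items within it) can be illustrated with a small filter. This is a toy sketch under stated assumptions, not the Fireworks or FWLite interface: the event layout, the function names and the cut values are all hypothetical.

```python
# Toy events: each maps a collection name to a list of objects (plain dicts).
events = [
    {"muons": [{"energy": 55.0}], "tracks": [{"pt": 1.2}, {"pt": 12.0}]},
    {"muons": [{"energy": 3.0}], "tracks": [{"pt": 8.0}]},
    {"muons": [], "tracks": [{"pt": 20.0}]},
]


def has_high_energy_muon(event, min_energy=20.0):
    """Event-level selection: require at least one energetic muon."""
    return any(mu["energy"] > min_energy for mu in event["muons"])


def select(events, event_filter, collection, item_filter):
    """Apply the three selection levels and return the items to display."""
    shown = []
    for event in events:
        if not event_filter(event):           # which events
            continue
        candidates = event[collection]        # what data
        shown.append([obj for obj in candidates if item_filter(obj)])  # which items
    return shown


# Show only high-pt tracks from events that contain a high energy muon.
shown = select(events, has_high_energy_muon, "tracks", lambda t: t["pt"] > 5.0)
```

Only the first toy event passes the muon requirement, and of its two tracks only the one above the (assumed) pt cut is kept for display.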
(University of California, Santa Barbara)
Validation of software releases for CMS
The CMS software stack currently consists of more than 2 million lines of
code developed by over 250 authors with a new version being released every
week. CMS has set up a central release validation process for quality
assurance which enables the developers to compare the performance to
previous releases and references.
This process provides the developers with reconstructed datasets of real
data and MC samples. The samples span the whole range of detector effects
and important physics signatures to benchmark the performance of the
software. They are used to investigate interdependency effects of software
packages and to find and fix bugs.
This talk will describe the composition of the Release Validation sample
sets and list the development groups who requested and use these samples. It
especially points out the difficulties of composing coherent sample sets from
the various requests for release validation samples. All samples have to fit
within the available resource constraints. This is achieved by exploiting
synergies between the different requester use cases and sample requests.
Common to all use cases are the event processing workflows used to produce
the samples. They are modified compared to the production workflows to be
better suited for validation and described in more detail.
Overall, the talk will emphasize the importance of a central release
validation process for projects with a large code base and significant
number of developers. It will summarize the extent and impact of the 2008
release validation sample production and can function as an example for
Grid Middleware and Networking Technologies: Monday, Panorama
The UK particle physics Grid - status and developments
During 2008 we have seen several notable changes in the way the LHC experiments have tried to tackle outstanding gaps in the implementation of their computing models. The development of space tokens and changes in job submission and data movement tools are key examples. The first section of this paper will review these changes and the technical/configuration impacts they have had at the site level across the GridPP sites. The second section will look in more detail at challenges that have been faced by the RAL Tier-1 site, and in particular work that has been done to improve the resilience and stability of core services. The third section of the paper will examine required recent changes in the operational model across the UK Tier-1 and Tier-2s as the focus has shifted to better supporting users and understanding how the user view of services differs from that of the infrastructure provider. This will be tackled through several use cases which highlight common problems that still need to be overcome. The fourth and final section of the paper will present an analysis of GridPP metrics used within the project to assess progress, problems and issues.
(University of Cambridge - GridPP)
ITIL and Grid services at GridKa
Offering sustainable Grid services to users and other computing centres is the main aim of GridKa, the German Tier-1 centre of the WLCG infrastructure. The availability and reliability of IT services directly influence the customers’ satisfaction as well as the reputation of the service provider, not to mention the economic aspects. It is thus important to concentrate on processes and tools that increase the availability and reliability of IT services. At the German Tier-1 centre GridKa a special working group for ITIL processes exists. This group is responsible for the management of all the IT services offered by the institute. ITIL is a standardized and process-oriented description for the management of IT services.
The ITIL model itself consists of several processes. We will show the different ITIL processes like Incident, Problem, Change and Configuration Management and how they are organized at GridKa. The special roles and a list of the tools which are implemented at GridKa to support the customers and the internal staff members will be presented. A special focus will be the distinction between the view from outside and inside the Steinbuch Centre for Computing and the consequences of this distinction for the ITIL processes.
(Karlsruhe Institute of Technology (KIT))
Advances in Grid Operations
A review of the evolution of WLCG/EGEE grid operations
Authors: Maria BARROSO, Diana BOSIO, David COLLADOS, Maria DIMOU, Antonio RETICO, John SHADE, Nick THACKRAY, Steve TRAYLEN, Romain WARTEL
As the EGEE grid infrastructure continues to grow in size, complexity and usage, the task of ensuring the
continued, uninterrupted availability of the grid services to the ever increasing number of user communities becomes more and more challenging. In addition, it is clear that these challenges will only
increase with the significant ramp‐up, in 2009, of data taking at the Large Hadron Collider; the main experiments of which are, through the WLCG service, by far the largest users of the EGEE grid
infrastructure. In this paper we discuss the ways in which the processes and tools of grid operations have been appraised and enhanced over the last 18 months in order to meet these challenges without any
increase in the size of the team, while at the same time improving the overall level of service that the users experience when using the grid infrastructure. The improvements to the operations procedures and tools
include: enhancements to the middleware lifecycle processes; improvements to operations communications channels (both to VOs and to sites); strategies to raise the availability and reliability of
sites; improvements in the level of service supplied by the central grid operations tools; improvements to the robustness of core middleware services; enhancements to the handling of trouble tickets; sharing of best
practices; and others.
These points are then brought together to describe how the grid central operations team has learned valuable lessons through the day‐to‐day experience of operating the infrastructure and
how operations has evolved as a result of this. In the last part of the paper, we will examine the future plans for further improvements in grid operations, including how we will deal with the unavoidable
reduction in the level of effort available for grid operations, as the funding for EGEE comes to an end in early 2010, just as the use of the grid by the LHC experiments will dramatically increase.
(CERN), Nicholas Thackray
A Business Model for the Establishment of the European Grid Infrastructure
International research collaborations increasingly require secure sharing of resources owned by the partner organizations and distributed among different administration domains. Examples of resources include data, computing facilities (commodity computer clusters, HPC systems, etc.), storage space, metadata from remote archives, scientific instruments, sensors, etc. Sharing is made possible via Grid middleware, i.e. software services exposing a uniform interface regardless of the specific fabric-layer resource properties, providing access according to user role and in full compliance with the policies defined by the resource owners.
The Grid Infrastructure consists of: distributed resources – funded and owned by national and local resource providers – with their respective usage policies, interoperable middleware services installed and operated by resource providers, the Grid middleware distribution and the testbeds for its certification and integration, the Grid operations including authentication, authorization, monitoring and accounting, and, finally, user and application support.
The European project EGI_DS brings about the creation of a new organizational model, capable of fulfilling the vision of a sustainable European Grid infrastructure for e-Science. The European Grid Initiative (EGI) is the proposed framework which seamlessly links, at a world-wide level, the European national e-Infrastructures operated by the National Grid Initiatives. It is based on a European Unified Middleware Distribution (UMD), which will be the result of a joint effort of various European Grid middleware consortia.
This paper describes the actors contributing to the foundation of the European Grid infrastructure, and the use cases, the mission, the purpose, the offering, and the organizational structure which constitute the EGI business model.
(INFN Milano), Tiziana Ferrari
Analysis of the Use, Value and Upcoming Challenges for the Open Science Grid
The Open Science Grid usage has ramped up more than 25% in the past twelve months due to both the increase in throughput of the core stakeholders – US LHC, LIGO and Run II – and increase in usage by non-physics communities. We present and analyze this ramp up together with the issues encountered and implications for the future.
It is important to understand the value of collaborative projects such as the OSG in contributing to the scientific community. This needs to be cognizant of the environment of commercial cloud offerings, the evolving and maturing middleware for grid based distributed computing, and the evolution in science and research dependence on computation. We present a first categorization of OSG value and analysis across several different aspects of the Consortium’s goals and activities.
And last, but not least, we analyze the upcoming challenges of LHC data analysis ramp up and our ongoing contributions to the World Wide LHC Computing Grid.
CDF way to Grid
The CDF II experiment has been taking data at FNAL since 2001. The CDF computing architecture has evolved from initially using dedicated computing farms to using decentralized Grid-based resources on the EGEE grid, Open Science Grid and FNAL Campus grid.
In order to deliver high quality physics results in a timely manner to a running experiment, CDF has had to adapt to the Grid with minimum impact on the physicists analyzing the data. The use of portals to access the computing resources has allowed CDF to migrate to Grid computing without changing how the users work. The infrastructure modifications were made in small steps over several years.
CDF started with the glidein concept, i.e. submitting Condor-based pilot jobs to the Grid, with the first pilot-based pool in 2005 at the CNAF Tier 1 site in Italy, followed shortly by similar pools in North America, Europe and Asia. This pilot job model evolved in OSG into the PANDA submission model of ATLAS and the glideinWMS of CMS, and has recently also been integrated into the CDF infrastructure. In order to access LCG/EGEE resources using the gLite middleware, the CDF middleware has been reimplemented as LcgCAF, a dedicated portal.
The evolution of the architecture together with the performance reached by the two portals will be discussed.
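The pilot job idea referred to in this abstract is that a placeholder job is sent to a Grid site first and, once it is running there, pulls real user jobs from a central queue. The sketch below illustrates only that general pull pattern; it does not use or reproduce Condor glidein, glideinWMS or any CDF interface, and the site names, job names and `max_jobs` limit are hypothetical.

```python
from collections import deque


def run_pilot(site, job_queue, max_jobs=2):
    """Simulate a pilot that has started on `site` and pulls user jobs.

    The pilot keeps taking jobs from the shared queue until the queue is
    empty or it has handled `max_jobs` jobs (an assumed per-pilot limit).
    """
    done = []
    while job_queue and len(done) < max_jobs:
        job = job_queue.popleft()   # the user job is bound to a site only now
        done.append((site, job))
    return done


# Central queue of user jobs; users never choose a site themselves.
queue = deque(["analysis-1", "analysis-2", "analysis-3"])

results = []
for site in ["site-A", "site-B"]:   # one pilot lands on each site
    results.extend(run_pilot(site, queue))
```

Because user jobs are assigned only when a pilot is already running, a site where the pilot never starts simply takes no work, which is one reason this model can hide Grid details from the physicists submitting the jobs.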
(University and INFN Padova)
Online Computing: Monday, Club D
Sponsored by ACEOLE
CMS Data Acquisition System Software
The CMS data acquisition system is made of two major subsystems: event building and event filter.
This paper describes the architecture and design of the software that processes the data
flow in the currently operating experiment. The central DAQ system relies heavily on industry
standard networks and processing equipment. Adopting a single software infrastructure in
all subsystems of the experiment imposes, however, a number of different requirements.
High efficiency and configuration flexibility are among the most important ones. The XDAQ software
infrastructure has matured over an eight-year development and testing period and has proven
able to cope well with the CMS requirements. We provide performance figures and report on the initial
experience with the system at hand.
The ATLAS Online High Level Trigger Framework: Experience reusing Offline Software Components in the ATLAS Trigger
Event selection in the ATLAS High Level Trigger is accomplished to a large extent by reusing software components and event selection algorithms developed and tested in an offline environment. Many of these offline software modules are not specifically designed to run in a heavily multi-threaded online data flow environment. The ATLAS High Level Trigger (HLT) framework, based on the GAUDI and ATLAS ATHENA frameworks, forms the interface layer that allows the execution of the HLT selection and monitoring code within the online run control and dataflow software. While such an approach provides a unified environment for trigger event selection across all of ATLAS, it also poses strict requirements on the reused software components in terms of performance, memory usage and stability. Experience of running the HLT selection software in the different environments, and especially on large multi-node trigger farms, has been gained in several commissioning periods using preloaded Monte Carlo events, in data taking periods with cosmic events and in a short period with proton beams from the LHC. The contribution discusses the architectural aspects of the HLT framework, its performance and its software environment within the ATLAS computing, trigger and data flow projects. Emphasis is also put on the architectural implications for the software of the use of multi-core processors in the computing farms and on the experience gained with multi-threading and multi-process technologies.
(University of Wisconsin)
A common real time framework for SuperKEKB and Hyper Suprime-Cam at Subaru telescope
Real time data analysis at next generation experiments is a challenge because of their enormous data rate and size. The SuperKEKB experiment, the upgraded Belle experiment, is required to process 100 times more data than the current one, taken at 10 kHz. Offline-level data analysis is necessary in the HLT farm for efficient data reduction.
Real time processing of huge data volumes is also key to the planned dark energy survey using the Subaru telescope. The main camera for the survey, called Hyper Suprime-Cam, consists of 100 CCDs
with 8 megapixels each, and the total data size is expected to become comparable with that of SuperKEKB. Online tuning of measurement parameters, which was done empirically in the past, is planned using the real time processing.
We started a joint development of the real time framework to be shared both by SuperKEKB and Hyper Suprime-Cam. The parallel processing technique is widely adopted in the framework design to utilize a huge number of network-connected PCs with multi-core CPUs. The parallel processing is performed not only in the trivial event-by-event manner, but also in the pipeline of the software modules which are dynamically placed over the distributed computing nodes. The object data flow in the framework is realized by the object serializing technique with the object persistence. On-the-fly collection of histograms and N-tuples is supported for the run-time data monitoring.
The detailed design and the development status of the framework are presented.
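The pipeline idea described in this abstract can be illustrated with a minimal sketch. It is not the SuperKEKB/Hyper Suprime-Cam framework itself: the module names (unpack, reconstruct), the event layout and the use of JSON as the serialization format are all assumptions made for illustration. Each stage receives a serialized object, runs one software module on it and emits a serialized object again, so that stages could in principle be placed on different nodes.

```python
import json


def unpack(event):
    """Hypothetical module: keep only positive raw samples as 'hits'."""
    event["hits"] = [x for x in event["raw"] if x > 0]
    return event


def reconstruct(event):
    """Hypothetical module: sum the hits into a single 'energy' value."""
    event["energy"] = sum(event["hits"])
    return event


def run_stage(module, payload):
    """Deserialize an event, run one module on it, serialize the result.

    The string going in and out stands in for what would travel over the
    network between computing nodes (JSON is an assumption of this sketch).
    """
    event = json.loads(payload)
    return json.dumps(module(event))


def run_pipeline(modules, event):
    """Pass one event through every module in order, serializing between stages."""
    payload = json.dumps(event)
    for module in modules:
        payload = run_stage(module, payload)
    return json.loads(payload)


result = run_pipeline([unpack, reconstruct], {"event_id": 1, "raw": [3, -1, 4]})
print(result["energy"])
```

Because every stage only sees serialized input and output, event-by-event parallelism and pipeline parallelism can be combined without the modules knowing where they run.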
The LHCb Run Control
LHCb has designed and implemented an integrated Experiment Control System. The Control System uses the same concepts and the same tools to control and monitor all parts of the experiment: the Data Acquisition System, the Timing and the Trigger Systems, the High Level Trigger Farm, the Detector Control System, the Experiment's Infrastructure and the interaction with the CERN Technical Services and the Accelerator.
LHCb's Run Control, the main interface used by the experiment's operator, provides access in a hierarchical, coherent and homogeneous manner to all areas of the experiment and to all its sub-detectors. It allows for automated (or manual) configuration and control, including error recovery, of the full experiment in its different running modes: physics, cosmics, calibration, etc.
Different instances of the same Run Control interface are used by the various sub-detectors for their stand-alone activities: test runs, calibration runs, etc.
The architecture and the tools used to build the control system, the guidelines and components provided to the developers, as well as the first experience with the usage of the Run Control will be presented.
The ALICE Online-Offline Framework for the Extraction of Conditions Data
The ALICE experiment is the dedicated heavy-ion experiment at the CERN LHC and will take data with a bandwidth of up to 1.25 GB/s. It consists of 18 subdetectors that interact with five online systems (DAQ, DCS, ECS, HLT and Trigger). Data recorded are read out by DAQ in a raw data stream produced by the subdetectors. In addition, the subdetectors produce conditions data derived from the raw data, i.e. calibration and alignment information, which have to be available from the beginning of the reconstruction and therefore cannot be included in the raw data. The extraction of the conditions data is steered by a system called Shuttle. It provides the link between data produced by the subdetectors in the online systems** and a dedicated procedure per subdetector, called preprocessor, that runs in the Shuttle system. The preprocessor performs merging, consolidation and reformatting of the data. Finally, it stores the data in the Grid Offline Conditions Data Base (OCDB) so that they are available for the Offline reconstruction. The reconstruction of a given run is initiated automatically once the raw data are successfully exported to the Grid storage and the run has been processed in the Shuttle framework. During data taking, a so-called quasi-online reconstruction is performed using the reduced set of conditions data that is already available during the current run.
The talk introduces the quasi-online reconstruction strategy within the ALICE online-offline framework, i.e. the Shuttle system. The performance of such a complex system during the ALICE cosmics commissioning and LHC startup is described. Special emphasis is given to operational issues and feedback received. Operational statistics and remaining open issues are presented.
** Processing in the ALICE DAQ is discussed in a separate talk
The DZERO Level 3 Trigger and data acquisition system has been running successfully since March 2001, taking data for the DZERO experiment located at the Tevatron at the Fermi National Accelerator Laboratory. Based on commodity parts, it reads out 65 VME front end crates and delivers the 250 MB of data to one of 1200 processing cores for a high level trigger decision at a rate of 1 kHz. Accepted events are then shipped to the DZERO online system where they are written to tape. The design is still relatively modern: all data pathways are based on TCP/IP, and all components, from the single board computer in the readout crates to the Level 3 trigger farm, are based on commodity items. All parts except for the central network switch have been upgraded during the lifetime of the system. This paper will discuss the performance, particularly as the Tevatron has continued to increase its peak luminosity, and the lessons learned during the upgrade of both the farms and the front end readout crate processors. We will also discuss the continued evolution of the automatic program that repairs common problems in the DAQ system.
(UNIVERSITY OF WASHINGTON)
Software Components, Tools and Databases: MondayClub A
The CMS Offline condition database software system
Non-event data describing detector conditions change with time and
come from different data sources. They are accessible by physicists
within the offline event-processing applications for precise calibration of reconstructed data as well as for data-quality control purposes.
Over the past three years CMS has developed and deployed a software
system managing such data. Object-relational mapping and the relational
abstraction layer of the LHC persistency framework are the foundation;
the offline condition framework updates and delivers C++ data objects according to their validity. A high-level tag versioning system allows production managers to organize data in a hierarchical view. A scripting API in Python, command-line tools and a web service serve physicists in their daily work. A mini-framework is available for handling data coming from external sources. Efficient data distribution over the worldwide network is guaranteed by a system of hierarchical web caches.
The system has been tested and used in all major productions, test-beams and cosmic runs.
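Delivering data objects "according to their validity" is commonly done with interval-of-validity lookups. The sketch below shows one way this could work, assuming that a tag is an ordered list of (first valid run, payload) pairs and that a payload stays valid until the next entry begins. The class name ConditionsTag and its methods are hypothetical and are not the CMS condition framework API.

```python
import bisect


class ConditionsTag:
    """Hypothetical tag: an ordered sequence of (first_valid_run, payload)."""

    def __init__(self, name):
        self.name = name
        self._since = []      # first run number for which each payload is valid
        self._payloads = []   # payloads, parallel to self._since

    def append(self, since, payload):
        """Add a payload valid from run `since` until the next entry starts."""
        if self._since and since <= self._since[-1]:
            raise ValueError("intervals must be appended in increasing order")
        self._since.append(since)
        self._payloads.append(payload)

    def get(self, run):
        """Return the payload whose validity interval contains `run`."""
        i = bisect.bisect_right(self._since, run) - 1
        if i < 0:
            raise LookupError(f"no payload valid for run {run} in tag {self.name}")
        return self._payloads[i]


tag = ConditionsTag("pedestals_v1")   # tag name is illustrative only
tag.append(1, "calib_A")
tag.append(100, "calib_B")
print(tag.get(99), tag.get(100))
```

Keeping the interval starts sorted makes each lookup a binary search, which matters when many jobs ask for conditions for arbitrary runs.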
Advanced Technologies for Scalable ATLAS Conditions Database Access on the Grid
During massive data reprocessing operations an ATLAS Conditions Database application must support concurrent access from numerous ATLAS data processing jobs running on the Grid. By simulating a realistic workflow, ATLAS database scalability tests provided feedback for Conditions DB software optimization and allowed precise determination of the required distributed database resources. In distributed data processing one must take into account the chaotic nature of Grid computing, characterized by peak loads that can be much higher than average access rates. To validate database performance at peak loads, we tested database scalability at very high concurrent job rates. This has been achieved through coordinated database stress tests performed in a series of ATLAS reprocessing exercises at the Tier-1 sites. The goal of the database stress tests is to detect scalability limits of the hardware deployed at the Tier-1 sites, so that server overload conditions can be safely avoided in a production environment. Our analysis of server performance under stress tests indicates that Conditions DB data access is limited by the disk I/O throughput. An unacceptable side-effect of the disk I/O saturation is a degradation of the WLCG 3D Services that update Conditions DB data at all ten ATLAS Tier-1 sites using the technology of Oracle Streams. To avoid such bottlenecks we prototyped and tested a novel approach for database peak load avoidance in Grid computing. Our approach is based upon the proven idea of “pilot” job submission on the Grid: instead of the actual query, the ATLAS utility library first sends a “pilot” query to the database server.
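The abstract does not say what the pilot query returns or how the client reacts to it. The sketch below is therefore only one possible reading, under two stated assumptions: the pilot query reports the current server load as a fraction between 0 and 1, and an overloaded server causes the client to wait with exponential back-off before trying again. The function name run_with_pilot and all of its parameters are hypothetical.

```python
import time


def run_with_pilot(pilot, query, max_load=0.8, retries=5, delay=1.0, sleep=time.sleep):
    """Send a cheap pilot query first and run the real query only if the load allows.

    pilot  -- callable returning the server load as a fraction (assumption)
    query  -- callable that runs the actual, expensive query
    sleep  -- injectable so that the back-off can be tested without waiting
    """
    for attempt in range(retries):
        if pilot() <= max_load:
            return query()
        # Server is busy: wait longer after each failed attempt.
        sleep(delay * (2 ** attempt))
    raise RuntimeError("server still overloaded after all retries")


# Usage with stand-in callables instead of a real database connection.
loads = iter([0.95, 0.90, 0.50])
rows = run_with_pilot(lambda: next(loads), lambda: ["row1", "row2"], sleep=lambda s: None)
print(rows)
```

The point of the pattern is that the pilot is far cheaper than the real query, so jobs arriving during a peak spread themselves out in time instead of saturating the disk I/O.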
LCG Persistency Framework (POOL, CORAL, COOL) - Status and Outlook
The LCG Persistency Framework consists of three software packages (POOL, CORAL and COOL) that address the data access requirements of the LHC experiments in several different areas. The project is the result of the collaboration between the CERN IT Department and the three experiments (ATLAS, CMS and LHCb) that are using some or all of the Persistency Framework components to access their data. The POOL package is a hybrid technology store for C++ objects, using a mixture of streaming and relational technologies to implement both object persistency and object metadata catalogs and collections. POOL provides generic components that can be used by the experiments to store both their event data and their conditions data. The CORAL package is an SQL-free abstraction layer for accessing data stored using relational database technologies. It is used directly by experiment-specific applications and internally by both COOL and POOL. The COOL package provides specific software components and tools for the handling of the time variation and versioning of the experiment conditions data. This presentation will report on the status and outlook of developments in each of the three sub-projects. It will also briefly review the usage and deployment models for these software packages in the three LHC experiments contributing to their development.
Distributed Database Services - a Fundamental Component of the WLCG Service for the LHC Experiments - Experience and Outlook
Originally deployed at CERN for the construction of LEP, relational databases now play a key role in the experiments' production chains, from online acquisition through to offline production, data distribution, reprocessing and analysis. They are also a fundamental building block for the Tier0 and Tier1 data management services. We summarize the key requirements in terms of availability, performance and scalability and explain the primary solutions that have been deployed both on- and off-line, at CERN and outside, to meet these requirements.
We describe how the distributed database services deployed in the Worldwide LHC Computing Grid have met the challenges of 2008 - the two phases of CCRC'08, together with data taking from cosmic rays and the short period of LHC operation.
Finally, we list the areas - both in terms of the baseline services as well as key applications and data life cycle - where enhancements have been required for 2009 and summarize the experience gained from 2009 data taking readiness testing - aka "CCRC'09" - together with a prognosis for 2009 data taking.
CORAL server: a middle tier for accessing relational database servers from CORAL applications
The CORAL package is the CERN LCG Persistency Framework common relational database abstraction layer for accessing the data of the LHC experiments that is stored using relational database technologies.
A traditional two-tier client-server model is presently used by most CORAL applications accessing relational database servers such as Oracle, MySQL, SQLite.
A different model, involving a middle tier server solution deployed close to the database servers, has recently been discussed. This would provide several advantages over the simple client-server model in the areas of security (authentication via proxy certificates) and of scalability and performance (multiplexing for several incoming connections, etc.). Data caching is also provided by a "proxy server" component deployed close to the database users.
A joint development of such a middle tier (CERN, SLAC), known as 'CORAL server', is ongoing.
This presentation will report on the status and outlook of the developments, solutions and test results for the new software components relevant to this project.
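The caching role of a proxy placed close to the database users can be sketched as follows. This is not the CORAL server protocol or API: the class CachingProxy, its query method and the backend callable are hypothetical, and the sketch assumes read-only queries whose results do not change while cached.

```python
class CachingProxy:
    """Hypothetical read-only query cache sitting between clients and a backend."""

    def __init__(self, backend):
        self._backend = backend   # callable(sql, params) that reaches the real server
        self._cache = {}
        self.hits = 0
        self.misses = 0

    def query(self, sql, params=()):
        """Return cached rows when available, otherwise forward to the backend."""
        key = (sql, tuple(params))
        if key in self._cache:
            self.hits += 1
        else:
            self.misses += 1
            self._cache[key] = self._backend(sql, tuple(params))
        return self._cache[key]


def fake_backend(sql, params):
    """Stand-in for a database round trip."""
    return [("payload", params)]


proxy = CachingProxy(fake_backend)
proxy.query("SELECT data FROM conditions WHERE run = ?", (42,))
proxy.query("SELECT data FROM conditions WHERE run = ?", (42,))
print(proxy.hits, proxy.misses)
```

Many jobs at one site typically ask for the same data, so even such a simple cache can remove most round trips to the central database servers.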
An Integrated Overview of Metadata in ATLAS
Metadata--data about data--arise in many contexts, from many diverse sources,
and at many levels in ATLAS.
Familiar examples include run-level, luminosity-block-level, and event-level metadata, and,
related to processing and organization, dataset-level and file-level metadata,
but these categories are neither exhaustive nor orthogonal.
Some metadata are known a priori, in advance of data taking or simulation; other metadata
are known only after processing--and occasionally, quite late (e.g., detector status
or quality updates that may appear after Tier 0 reconstruction is complete).
Metadata that may seem relevant only internally to the distributed computing infrastructure
under ordinary conditions may become relevant to physics analysis under error conditions
("What can I discover about data I failed to process?").
This talk provides an overview of metadata and metadata handling in ATLAS, and
describes ongoing work to deliver integrated metadata services in support of physics
(Argonne National Laboratory), Dr Elizabeth Gallas
(University of Oxford)
coffee break, exhibits and posters
Distributed Processing and Analysis: MondayClub C
End-to-end monitoring for data management
One of the current problem areas for sustainable WLCG operations is in the
area of data management and data transfer. The systems involved (e.g.
Castor, dCache, DPM, FTS, gridFTP, OPN network) are rather complex and have
multiple layers - failures can and do occur in any layer and due to the
diversity of systems involved, the differences in the information they have
available and their log formats it is currently extremely manpower-intensive
to debug problems across these systems. That the information is often
located on more than one WLCG site also complicates the problem and
increases the latency in problem resolution. Additionally, we lack a good
set of monitoring tools to provide a high-level operations-focused overview
of what is happening on the transfer services, and where the current top
problems are. The services involved have most of the necessary information
- we just don't collect all of it, join it and provide a useful view.
The paper will describe the current status of a set of operations tools
that allow a service manager to debug acute problems through the multiple
layers (allowing them to see how a request is handled across all components
involved). It will also report on work towards an "operations dashboard" for service managers to show what (and where) the current top problems in the system are.
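Collecting the information that the services already have and joining it into one view can be sketched as a merge of per-layer log records on a shared request identifier. The record fields used here (request_id, time, layer, status) and the function names are assumptions for illustration; the real services each have their own log formats, which would first need to be parsed into such records.

```python
from collections import defaultdict


def trace_requests(*layer_logs):
    """Merge records from several layers into one time-ordered list per request."""
    timeline = defaultdict(list)
    for log in layer_logs:
        for record in log:
            timeline[record["request_id"]].append(record)
    for records in timeline.values():
        records.sort(key=lambda r: r["time"])
    return dict(timeline)


def first_failure(records):
    """Return the earliest record that is not 'ok', or None if all succeeded."""
    for record in records:
        if record["status"] != "ok":
            return record
    return None


# Hypothetical records from two layers of a transfer.
fts_log = [{"request_id": "r1", "time": 1, "layer": "FTS", "status": "ok"}]
gridftp_log = [{"request_id": "r1", "time": 2, "layer": "gridFTP", "status": "timeout"}]

traces = trace_requests(fts_log, gridftp_log)
print(first_failure(traces["r1"])["layer"])
```

Counting the layer of the first failure over many requests gives the kind of "current top problems" summary that an operations dashboard could display.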
Workflow generator and tracking at the rescue of distributed processing. Automating the handling of STAR's Grid production.
Processing datasets on the order of tens of terabytes is an onerous task, faced by production coordinators everywhere. Users request data productions and, especially for simulation data, the vast number of parameters (and sometimes incomplete requests) points to the need for tracking, controlling and archiving all requests so that the production team can handle them in a coordinated way.
With the advent of grid computing, parallel processing power has increased, but traceability has also become increasingly problematic due to the heterogeneous nature of Grids. Any one of a number of components may fail, invalidating the job or execution flow at various stages of completion, and re-submitting a few of the multitude of jobs (while keeping the entire dataset production consistent) is a difficult and tedious process. From the definition of the workflow to its execution, there is a strong need for validation, tracking, monitoring and reporting of problems.
To ease the process of requesting production workflows, STAR has implemented several components addressing the full workflow consistency. A Web-based online submission request module, implemented using Drupal’s Content Management System API, ensures that all parameters are described in advance in a uniform fashion. Upon submission, all jobs are independently tracked and (sometimes experiment-specific) discrepancies are detected and recorded, providing detailed information on where, how and when the job failed. Aggregate information on success and failure is also provided in near real-time. We will describe this system in full.
(BROOKHAVEN NATIONAL LABORATORY)
CMS Grid Submission Portal
We present a Web portal for CMS Grid submission and management. Grid portals can deliver complex grid solutions to users without the need to download, install and maintain specialized software, or to worry about setting up site-specific components. The goal is to reduce the complexity of the user grid experience and to bring the full power of the grid to physicists engaged in LHC analysis through a standard web GUI.
We describe how the portal exploits standard, off-the-shelf commodity software together with existing grid infrastructures in order to facilitate job submission and monitoring. Currently, users are exposed to different flavors of grid middleware, and the installation and maintenance of CMS- and Grid-specific software is still very complex for most physicists. The goal of the CMS grid submission portal is to hide and integrate the complex infrastructure details that can hinder a user's ability to do science. A rich AJAX user interface lets users create, submit, share and monitor grid submissions. The grid portal is built on a J2EE architecture employing enterprise technologies powered by the JBoss application server. This technology has been used for many years in industry to provide enterprise class application deployments. The architecture comprises three tiers: presentation, business logic and data persistence. The presentation layer currently consists of a JavaServer Faces web interface developed with NetBeans Visual Web Page development tools. The business logic layer provides interfaces to existing grid infrastructure such as VOMS, Globus, CRAB and CRABSERVER.
This paper describes these developments, work in progress and plans for future enhancements.
Status of the ALICE CERN Analysis Facility
The ALICE experiment at CERN LHC is intensively using a PROOF cluster for fast analysis and reconstruction. The current system (CAF - CERN Analysis Facility) consists of some 120 CPU cores and about 45 TB of local space. One of the most important aspects of the data analysis on the CAF is the speed with which it can be carried out. Fast feedback on the collected data can be obtained, which allows quasi-online quality assurance of the data as well as fast analysis that is essential for the success of the experiment. CAF aims to provide fast response in prototyping code for users needing many development iterations. PROOF allows the interactive parallel processing of data distributed on a local cluster via the xrootd protocol. Subsets of selected data can be automatically staged in CAF from the Grid storage systems.
The talk will present the current setup, performance tests, a comparison with a previous cluster, and usage statistics. The possibility of using a PROOF setup for parallel data reconstruction is discussed, using the ALICE software framework AliRoot as an example. Furthermore, needed developments, plans and the future scenario of PROOF in a Grid environment are addressed.
CMS Analysis Operations
During normal data taking CMS expects to support potentially as many as 2000 analysis users. In 2008 there were more than 800 individuals who submitted a remote analysis job to the CMS computing infrastructure. The bulk of these users will be supported at the more than 40 CMS Tier-2 centers. Supporting a globally distributed community of users on a globally distributed set of computing clusters is a task that requires reconsidering the normal methods of user support for analysis operations.
In 2008 CMS formed an Analysis Support Task Force in preparation for large scale physics analysis activities. The charge of the task force was to evaluate the available support tools, the user support techniques, and the direct feedback of users with the goal of improving the success rate and user experience when utilizing the distributed computing environment. The task force determined the tools needed to assess and reduce the number of non-zero exit code applications submitted through the grid interfaces and worked with the CMS Experiment Dashboard developers to obtain the necessary information to quickly and proactively identify issues with user jobs and data sets hosted at various sites. Results of the analysis group surveys were compiled. Reference platforms for testing and debugging problems were established in various geographic regions. The task force also assessed the resources needed to make the transition to a permanent Analysis Operations task. In this presentation the results of the task force will be discussed as well as the CMS analysis operations plans for the start of data taking.
(Department of Physics-Univ. of California at San Diego (UCSD))
Babar production - the final dataset?
The Babar experiment has been running at the SLAC National Accelerator
Laboratory for the past nine years, and has measured 500 fb-1 of data.
The final data run for the experiment finished in April 2008. Once
data taking was finished, the final processing of all Babar data was started.
This was the largest computing production effort in the history of
Babar, including a reprocessing of all measured data, a full simulation
with latest code versions for all measured detector conditions, and a
full skimming of this data into all current analysis streams for use.
This effort ended up producing the largest rates of CPU use and data
production in the history of an already large scale experiment. The
difficulties and successes of this effort will be reported, along with the
data sizes, CPU time, and computing centers used.
(STANFORD LINEAR ACCELERATOR CENTER)
Event Processing: MondayClub E
Experience with the CMS EDM
The re-engineered CMS EDM was presented at CHEP in 2006. Since that time we have gained a lot of operational experience with the chosen model. We will present some of our findings, and attempt to evaluate how well it is meeting its goals. We will discuss some of the new features that have been added since 2006 as well as some of the problems that have been addressed. Also discussed is the level of adoption throughout CMS, which spans the trigger farm up to the final physics analysis. Future plans, in particular dealing with schema evolution and scaling, will be discussed briefly.
File Level Provenance Tracking in CMS
The CMS Offline framework stores provenance information within CMS's standard ROOT event data files. The provenance information is used to track how every data product was constructed including what other data products were read in order to do the construction. We will present how the framework gathers the provenance information, the efforts necessary to minimize the space used to store the provenance in the file and the tools which will be available to use the provenance.
(Fermi National Accelerator Laboratory)
PAT: the CMS Physics Analysis Toolkit
The CMS Physics Analysis Toolkit (PAT) is presented. The PAT is a high-level analysis layer enabling the development of common analysis efforts across and within Physics Analysis Groups. It aims at fulfilling the needs of most CMS analyses, providing both ease-of-use for the beginner and flexibility for the advanced user. The main PAT concepts are described in detail and some examples from realistic physics analyses are given.
(SNS & INFN Pisa, CERN)
ROOT: Support For Significant Evolutions of the User Data Model
One of the main strengths of ROOT I/O is its inherent support for schema evolution. Two distinct modes are supported, one manual via a hand-coded Streamer function and one fully automatic via the ROOT StreamerInfo. One drawback of Streamer functions is that they are not usable by TTrees in split mode. Until now, the automatic schema evolution mechanism could not be customized by the user, and the only mechanism to go beyond the default rules was to revert to using the Streamer function. In ROOT 5.22/00, we introduced a new mechanism which allows user extensions of the automatic schema evolution that can be used in object-wise, member-wise and split modes. This presentation will describe the myriad possibilities, ranging from the simple assignment of transient members to the complex reorganization of the user's object model.
The Software Framework of the ILD detector concept at the ILC detector
The International Linear Collider is the next large accelerator project in
High Energy Physics.
The ILD Detector Concept is one of three international working groups that
are developing a detector concept for the ILC. It has been created by merging the two
concept studies LDC and GLD in 2007.
ILD uses a modular C++ application framework (Marlin) that is
based on the international data format LCIO. It allows the distributed
development of reconstruction and analysis software.
Recently ILD has produced a large Monte Carlo data set of Standard Model physics and
expected new physics signals at the ILC in order to further optimize the detector
concept based on the Particle Flow paradigm. This production was only possible by
exploiting grid computing resources available for ILC in the context of the WLCG.
In this talk we give an overview of the core framework
focusing on recent developments and improvements needed for the large
scale Monte Carlo production since it was last presented at CHEP2007.
The CMS Computing, Software and Analysis Challenge
The CMS experiment has performed a comprehensive challenge during May 2008 to test the full scope of offline data handling and analysis activities needed for data taking during the first few weeks of LHC collider operations. It constitutes the first full-scale challenge with large statistics under the conditions expected at the start-up of the LHC, including the expected initial mis-alignments and mis-calibrations for each sub-detector, and event signatures and rates typical for low instantaneous luminosity. Particular emphasis has been given to the prompt reconstruction workflows, and to the procedures for the alignment and calibration of each sub-detector. The latter were performed with restricted latency using the same computing infrastructure that will be used for real data, and the resulting calibration and alignment constants were used to re-reconstruct the data at Tier-1 centres. The presentation addresses the goals and practical experience from the challenge, and the lessons learned in view of LHC data taking are discussed.
Grid Middleware and Networking Technologies: MondayPanorama
GOCDB, A Topology Repository For A Worldwide Grid Infrastructure
All grid projects have to deal with topology and operational information like resource distribution, contact lists and downtime declarations. Storing, maintaining and publishing this information properly is one of the key elements of successful grid operations. The solution adopted by the EGEE and WLCG projects is a central repository that hosts this information and makes it available to users and client tools. This repository, known as GOCDB, is used throughout EGEE and WLCG as an authoritative primary source of information for operations, monitoring, accounting and reporting. After giving a short history of GOCDB, the paper describes the current architecture of the tool and gives an overview of its well established development workflows and release procedures. It also presents different collaboration use cases with other EGEE operations tools and deals with the High Availability mechanism put in place to address failover and replication issues. It describes ongoing work on providing web services interfaces and gives examples of integration with other grid projects, such as the NGS in the UK. The paper finally presents our vision of GOCDB's future and associated plans to base its architecture on a pseudo object database model, allowing for its distribution across the 11 EGEE regions. This will be one of the most challenging tasks to achieve during the third phase of EGEE in order to prepare for a sustainable European Grid Infrastructure.
(STFC, Didcot, UK)
Bringing the CMS Distributed Computing System into Scalable Operations
Establishing efficient and scalable operations of the CMS distributed
computing system critically relies on the proper integration,
commissioning and scale testing of the data and workload management
tools, the various computing workflows and the underlying computing
infrastructure located at more than 50 computing centres worldwide
interconnected by the Worldwide LHC Computing Grid.
Computing challenges periodically undertaken by CMS in the past years
with increasing scale and complexity have revealed the need for a
sustained effort on computing integration and commissioning
activities. The Processing and Data Access (PADA) Task Force was
established at the beginning of 2008 within the CMS Computing
Programme with the mandate of validating the infrastructure for
organized processing and user analysis including the sites and the
workload and data management tools, validating the distributed
production system by performing functionality, reliability and scale
tests, helping sites to commission, configure and optimize the
networking and storage through scale testing data transfers and data
processing, and improving the efficiency of accessing data across the
CMS computing system from global transfers to local access.
This contribution will report on the tools and procedures developed by
CMS for computing commissioning and scale testing as well as the
improvements accomplished towards efficient, reliable and scalable
computing operations. The activities include the development and
operation of load generators for job submission and data transfers
with the aim of stressing the experiment and Grid data management and
workload management systems, site commissioning procedures and tools
to monitor and improve site availability and reliability, as well as
activities targeted to the commissioning of the distributed
production, user analysis and monitoring systems.
A Dynamic System for ATLAS Software Installation on OSG Grid site
ATLAS Grid production, like many other VO applications, requires the
software packages to be installed on remote sites in advance. Therefore,
a dynamic and reliable system for installing the ATLAS software releases
on Grid sites is crucial to guarantee the timely and smooth start of
ATLAS production and reduce its failure rate.
In this talk, we discuss the issues encountered in the previous software
installation system, and introduce the new approach, which is
built upon the new development in the areas of the ATLAS workload
management system (PanDA), and software package management system
(pacman). It is also designed to integrate with the EGEE ATLAS software
In the new system, ATLAS software releases are packaged as pacball, a
uniquely identifiable and reproducible self-installing data file. The
distribution of pacballs to remote sites is managed by ATLAS data
management system (DQ2) and PanDA server. The installation on remote
sites is automatically triggered by the PanDA pilot jobs. The
installation job payload connects to the EGEE ATLAS software
installation portal, making the information of installation status
easily accessible across OSG and EGEE Grids.
The deployment of this new system and its performance in USATLAS
production will also be discussed.
(Brookhaven National Laboratory, USA)
Migration of ATLAS PanDA to CERN
The ATLAS Production and Distributed Analysis System (PanDA) is a key
component of the ATLAS distributed computing infrastructure. All ATLAS
production jobs, and a substantial amount of user and group analysis
jobs, pass through the PanDA system which manages their execution on
the grid. PanDA also plays a key role in production task definition
and the dataset replication request system. PanDA has recently been
migrated from Brookhaven National Laboratory (BNL) to the European
Organization for Nuclear Research (CERN), a process we describe here.
We discuss how the new infrastructure for PanDA, which relies heavily
on services provided by CERN IT, was introduced in order to make the
service as reliable as possible and to allow it to be scaled to
ATLAS's increasing need for distributed computing.
The migration involved changing the backend database for PanDA from
MySQL to ORACLE, which impacted upon the database schemas. The process
by which the client code was optimised for the new database backend is
illustrated by example. We describe the procedure by which the
database is tested and commissioned for production use.
Operations during the migration had to be planned carefully to
minimise disruption to ongoing ATLAS operations. All parts of the
migration had to be fully tested before commissioning the new
infrastructure, which at times involved careful segmenting of ATLAS
grid resources in order to verify the new services at scale.
Finally, after the migration was completed, results on the final
validation and full scale stress testing of the new infrastructure are
Dr Graeme Andrew Stewart
(University of Glasgow)
Critical services in the LHC computing
The LHC experiments (ALICE, ATLAS, CMS and LHCb) rely for the data acquisition, processing, distribution, analysis and simulation on complex computing systems, run using a variety of services, provided by the experiment services, the WLCG Grid and the different computing centres. These services range from the most basic (network, batch systems, file systems) to the mass storage services or the Grid information system, up to the different workload management systems, data catalogues and data transfer tools, often internally developed in the collaborations.
In this contribution we review the status of the services most critical to the experiments by quantitatively measuring their readiness with respect to the start of the LHC operations. Shortcomings are identified and common recommendations are offered.
Status and outlook of the HEP Network
I will review the status, outlook, recent technology trends and
state of the art developments in the major networks serving the
high energy physics community in the LHC era.
I will also cover the progress in reducing or closing the Digital Divide
separating scientists in several world regions from the mainstream,
from the perspective of the ICFA Standing Committee on
Prague Congress Centre
5. května 65, 140 00 Prague 4, Czech Republic
Isidro Gonzales Caballero
A comparison of HEP code with SPEC benchmark on multicore worker nodes
The SPEC INT benchmark has been used as a performance reference for computing in the HEP community for the past 20 years. The SPEC CPU INT 2000 (SI2K) unit of performance has been used by the major HEP experiments both in the Computing Technical Design Report for the LHC experiments and in the evaluation of the Computing Centres. At recent HEPiX meetings several HEP sites have reported disagreements between actual machine performances and the scores reported by SPEC.
Our group performed a detailed comparison of Simulation and Reconstruction code performances from the four LHC experiments in order to find a successor to the SI2K benchmark.
We analyzed the new benchmarks from the SPEC CPU 2006 suite, both integer and floating point, in order to find the best agreement with the HEP code behaviour, with particular attention paid to reproducing the actual environment of a HEP farm, i.e., each job running independently on each core, and matching compiler, optimization, percentage of integer and floating point operations, and ease of use.
(INFN + Hepix)
Experience with low-power x86 processors (ATOM) for HEP usage
In CERN openlab we have been running tests with a server using a low-power ATOM N330 dual-core/dual-thread processor, deploying both HEP offline and online programs.
The talk will report on the results, both for single runs and for maximum-throughput runs, and will also report on the results of thermal measurements. It will also show how the price/performance of an ATOM system compares to that of a Xeon system. Finally, it will make recommendations as to how such low-power systems can be made optimal for HEP usage.
Air Conditioning and Computer Centre Power Efficiency: the Reality
The current level of demand for Green Data Centres has created a growing market for consultants providing advice on how to meet the requirement for high levels of electrical power and, above all, cooling capacity both economically and ecologically. How should one choose, in the face of the many competing claims, the right concept for a cooling system in order to reach the right power level, efficiency, carbon emissions, reliability and to ensure flexibility in the face of future computing technology evolution?
This presentation will compare and contrast various alternative computer centre cooling solutions, in particular covering examples of old technologies that are returning to favour in the context of the present energy crisis and new products vying for a place in the market in addition to classic design options.
A High Performance Hierarchical Storage Management System for the Canadian Tier-1 Centre at TRIUMF
We describe in this paper the design and implementation of Tapeguy, a high performance non-proprietary Hierarchical Storage Management System (HSM) which is interfaced to dCache for efficient tertiary storage operations. The system has been successfully implemented at the Canadian Tier-1 Centre at TRIUMF. The ATLAS experiment will collect a very large amount of data (approximately 3.5 Petabytes each year). An efficient HSM system will play a crucial role in the success of the ATLAS Computing Model, which is driven by intensive large-scale data analysis activities that will be performed on the Worldwide LHC Computing Grid infrastructure around the clock.
Tapeguy is Perl-based. It controls and manages data and tape libraries. Its architecture is scalable and includes Dataset Writing control, a Readback Queuing mechanism and I/O tape drive load balancing as well as on-demand allocation of resources. A central MySQL database records metadata information for every file and transaction (for audit and performance evaluation), as well as an inventory of library elements. Tapeguy Dataset Writing was implemented to group files which are close in time and of similar type. Optional dataset path control dynamically allocates tape families and assigns tapes to them. Tape flushing is based on various strategies: time, threshold or external callback mechanisms. Tapeguy Readback Queuing reorders all read requests by using a 'scan algorithm', avoiding unnecessary tape loading and unloading. Implementation of priorities will guarantee file delivery to all clients in a timely manner.
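The readback reordering described above can be illustrated with a minimal Python sketch. It is not Tapeguy's Perl implementation: the request fields (tape, position) and the rule of serving the already-mounted tape first are assumptions made for illustration. The idea shown is only that requests are grouped per tape and each tape is read in a single forward pass.

```python
from collections import defaultdict
from typing import NamedTuple


class ReadRequest(NamedTuple):
    file_name: str
    tape: str      # tape label (hypothetical field)
    position: int  # file position on the tape (hypothetical field)


def scan_order(requests, mounted_tape=None):
    """Group requests by tape and read each tape in one forward pass."""
    by_tape = defaultdict(list)
    for req in requests:
        by_tape[req.tape].append(req)
    # Serve the already-mounted tape first to avoid an unload/load cycle.
    tapes = sorted(by_tape, key=lambda t: (t != mounted_tape, t))
    ordered = []
    for tape in tapes:
        ordered.extend(sorted(by_tape[tape], key=lambda r: r.position))
    return ordered


requests = [
    ReadRequest("a.root", "T002", 40),
    ReadRequest("b.root", "T001", 90),
    ReadRequest("c.root", "T002", 10),
    ReadRequest("d.root", "T001", 5),
]
ordered = scan_order(requests, mounted_tape="T002")
print([r.file_name for r in ordered])
```

In arrival order the four requests would alternate between the two tapes; after reordering each tape is mounted once.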
Fair-share scheduling algorithm for a tertiary storage system
Any experiment facing petabyte-scale problems needs a highly scalable mass storage system (MSS) to keep a permanent copy of its valuable data. But beyond the permanent storage aspects, the sheer amount of data makes complete dataset availability on “live storage” (centralized or aggregated space such as the one provided by Scalla/Xrootd) cost prohibitive, implying that a dynamic population from MSS to faster storage is needed. One of the most difficult aspects of dealing with MSS is the robotic tape component and its intrinsically long access times (latencies), which can dramatically affect the overall performance of any data access system having MSS as its primary data storage.
To speed the retrieval of such data, one could "organize" the requests according to criteria aimed at delivering maximal data throughput. However, such approaches are often at odds with fairness, and a tradeoff between quality of service (responsiveness) and throughput is necessary for an optimal and practical implementation of a truly fair-share oriented file restore policy. Starting from an explanation of the key criteria used to build such a policy, we will present an evaluation and comparison of three different algorithms offering fair-share file restoration from MSS and discuss their respective merits. We will further quantify their impact on typical file restoration for the RHIC/STAR experimental setup, within a development, analysis and production environment relying on a shared MSS service.
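The tradeoff between throughput and fairness can be made concrete with a small Python sketch. It does not reproduce any of the three algorithms evaluated in the paper; the scoring rule, the fairness_weight parameter and the request layout are assumptions chosen only to show how one tunable weight shifts the choice between "serve the under-served user" and "avoid a tape remount".

```python
def pick_next(queue, served, mounted_tape, fairness_weight=0.5):
    """Return the queued request with the best combined score.

    queue:  list of (user, tape, file_name) tuples
    served: dict mapping user -> number of files already restored
    """
    total = sum(served.values()) or 1

    def score(request):
        user, tape, _ = request
        # Users who have received a small share so far score higher.
        fairness = 1.0 - served.get(user, 0) / total
        # Requests on the mounted tape need no remount, so they score higher.
        throughput = 1.0 if tape == mounted_tape else 0.0
        return fairness_weight * fairness + (1 - fairness_weight) * throughput

    return max(queue, key=score)


queue = [("alice", "T001", "f1"), ("bob", "T002", "f2")]
served = {"alice": 9, "bob": 1}

fair_choice = pick_next(queue, served, "T001", fairness_weight=0.9)
fast_choice = pick_next(queue, served, "T001", fairness_weight=0.1)
print(fair_choice, fast_choice)
```

With a high fairness weight the under-served user is picked even though a remount is needed; with a low weight the request on the mounted tape wins.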
(Nuclear Physics Inst., Academy of Sciences, Praha)
Lustre File System Evaluation at FNAL
As part of its mission to provide integrated storage for a variety of experiments and use patterns, Fermilab's Computing Division examines emerging technologies and reevaluates existing ones to identify the storage solutions satisfying stakeholders' requirements, while providing adequate reliability, security, data integrity and maintainability. We formulated a set of criteria and then analyzed several commercial and open-source storage systems.
In this paper we present and justify our evaluation criteria, which have two variants, one for HEP event analysis and one for HPC applications as found in LQCD and Computational Cosmology. We then examine in detail Lustre and compare it to dCache, the predominant (by byte count) storage system for LHC data.
After a period of testing we released a Lustre system for use by Fermilab's Computational Cosmology cluster in a limited production environment. The Lattice QCD project will prototype a larger Lustre installation on their Infiniband-based clusters.
Finally, we discuss Lustre's fitness for the HEP domain and production environments, and the possible integration of Lustre with GridFTP, SRM, and Enstore HSM.
Online Computing: MondayClub D
Sponsored by ACEOLE
Pierre Vande Vyvre
Reliable online data-replication in LHCb
In LHCb raw data files are created on a high-performance storage
system using a custom, speed-optimized file-writing software. The
file-writing is orchestrated by a database, which represents the
life-cycle of a file and is the entry point for all operations related
to files such as run-start, run-stop, file-migration, file-pinning
and ultimately file-deletion.
File copying to the Tier0 is done using LHCb's standard Grid
framework, DIRAC. The file-mover processes also prepare the
Offline-reprocessing by entering the files into the LHCb Bookkeeping
database. In all these operations a lot of emphasis has been put on
reliability via handshakes, cross-checks and retries.
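As an illustration of the cross-check-and-retry pattern mentioned above, here is a minimal Python sketch. It is not the LHCb file-mover or DIRAC code: the function names, the use of SHA-256 and the local file copy standing in for a Grid transfer are all assumptions.

```python
import hashlib
import os
import shutil
import tempfile


def file_checksum(path):
    """Checksum a file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_with_retries(src, dst, max_attempts=3):
    """Copy src to dst, verify the checksum, and retry on failure.

    Returns the number of attempts that were needed.
    """
    expected = file_checksum(src)
    for attempt in range(1, max_attempts + 1):
        try:
            shutil.copyfile(src, dst)
        except OSError:
            continue  # transfer failed, try again
        if file_checksum(dst) == expected:  # cross-check source against copy
            return attempt
    raise RuntimeError(f"copy of {src} failed after {max_attempts} attempts")


with tempfile.TemporaryDirectory() as workdir:
    source = os.path.join(workdir, "run_0001.raw")
    target = os.path.join(workdir, "run_0001.copy")
    with open(source, "wb") as handle:
        handle.write(b"raw event data" * 1000)
    attempts_used = copy_with_retries(source, target)
    copy_matches = file_checksum(source) == file_checksum(target)
```

Only after the checksum comparison succeeds would a real system mark the file as safely migrated and eligible for deletion at the source.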
This paper presents the architecture, implementation details,
performance results from the LHCb Full System test and associated
tools (command line, web-interface).
(University of Applied Sciences Kaiserslautern)
ECAL Front-End Monitoring in the CMS experiment
The CMS detector at LHC is equipped with a high precision lead tungstate
crystal electromagnetic calorimeter (ECAL).
The front-end boards and the photodetectors are monitored using a network
of DCU (Detector Control Unit) chips located on the detector electronics.
The DCU data are accessible through token rings controlled by an XDAQ
based software component.
Relevant parameters are transferred to DCS (Detector Control System) and
stored into the Condition DataBase.
The operational experience from the ECAL commissioning at the CMS experimental
cavern is discussed and summarized.
(Universita degli Studi di Torino - Universita & INFN, Torino)
The ALICE data quality monitoring
ALICE is one of the four experiments installed at the CERN Large Hadron Collider (LHC), especially designed for the study of heavy-ion collisions.
The online Data Quality Monitoring (DQM) is an important part of the data acquisition (DAQ) software. It involves the online gathering, the analysis by user-defined algorithms and the visualization of monitored data.
This paper presents the final design, as well as the latest and upcoming features, of the ALICE-specific DQM software called AMORE (Automatic MonitoRing Environment).
It describes the challenges we faced during its implementation, including performance issues, and how we tested and handled them, in particular by using a scalable and robust publish-subscribe architecture.
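For readers unfamiliar with the publish-subscribe pattern, the following Python sketch shows its core idea: publishers and consumers only know topic names, not each other. It is a generic in-process illustration, not AMORE code; the Broker class and the topic names are hypothetical.

```python
from collections import defaultdict


class Broker:
    """Minimal in-process publish-subscribe broker."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, payload):
        """Deliver payload to every subscriber; return how many received it."""
        callbacks = self._subscribers.get(topic, [])
        for callback in callbacks:
            callback(payload)
        return len(callbacks)


broker = Broker()
shifter_view = []
expert_view = []
broker.subscribe("tpc/occupancy", shifter_view.append)
broker.subscribe("tpc/occupancy", expert_view.append)

delivered = broker.publish("tpc/occupancy", {"mean": 0.12})
undelivered = broker.publish("its/noise", {"mean": 0.03})
```

Because a publisher never addresses its consumers directly, new monitoring clients can be added without changing the producing side, which is what makes the pattern attractive for scaling.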
We also review the on-going and increasing adoption of this tool amongst the ALICE collaboration and the measures taken to develop, in synergy with their respective teams, efficient monitoring modules for the sub-detectors.
The related packaging and release procedure needed by such a distributed framework is also described.
Finally, we give an overview of the wide range of uses people make of this framework, and we review our own experience, before and during the LHC start-up, when monitoring the data quality on both the sub-detector and the DAQ side in a real-world and challenging environment.
Dynamic configuration of the CMS Data Acquisition cluster
The CMS Data Acquisition cluster, which runs around 10000 applications, is configured dynamically at run time. XML configuration documents determine what applications are executed on each node and over what networks these applications communicate. Through this mechanism the DAQ System may be adapted to the required performance, partitioned in order to perform (test-) runs in parallel, or re-structured in case of hardware faults.
This paper presents the CMS DAQ Configurator tool which is used to generate comprehensive configurations of the CMS DAQ system based on a high-level description given by the user. Using a database of configuration templates and a database containing a detailed model of hardware modules, data and control links, compute nodes and the network topology, the tool automatically determines which applications are needed, on which nodes they should run, and over which networks the event traffic will flow. The tool computes application parameters and generates the XML configuration documents as well as the configuration of the run-control system. The performance of the tool and operational experience during CMS commissioning and the first LHC runs are discussed.
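The generation step can be sketched in Python as follows. This is only an illustration under stated assumptions: the element and attribute names, the role names and the two small dictionaries standing in for the template and hardware databases are hypothetical and do not reflect the actual XDAQ configuration schema or the Configurator's data model.

```python
import xml.etree.ElementTree as ET


def build_configuration(nodes, template):
    """Expand a per-role template over a list of (host, role) nodes."""
    root = ET.Element("Configuration")
    for host, role in nodes:
        for app in template[role]:
            ET.SubElement(
                root,
                "Application",
                {"host": host, "class": app["class"], "network": app["network"]},
            )
    return root


# Hypothetical stand-ins for the template and hardware databases.
template = {
    "readout": [{"class": "ReadoutUnit", "network": "event-net"}],
    "builder": [
        {"class": "BuilderUnit", "network": "event-net"},
        {"class": "FilterUnit", "network": "control-net"},
    ],
}
nodes = [("ru-01", "readout"), ("bu-01", "builder")]

root = build_configuration(nodes, template)
xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```

Re-partitioning or excluding a faulty node then amounts to changing the node list and regenerating the document, rather than editing XML by hand.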
(European Organization for Nuclear Research (CERN))
The CMS RPC Detector Control System at LHC
The Resistive Plate Chamber system is composed
of 912 double-gap chambers equipped with about 10^4 frontend
boards. The correct and safe operation of the RPC system
requires a sophisticated and complex online Detector Control
System, able to monitor and control 10^4 hardware devices
distributed on an area of about 5000 m^2. The RPC DCS acquires,
monitors and stores about 10^5 parameters coming from the
detector, the electronics, the power system, the gas, and cooling
systems. The DCS and its first results and performance, obtained during the 2007 and 2008 CMS cosmic runs, will be described here.
(Lappeenranta Univ. of Technology)
First-year experience with the ATLAS Online Monitoring framework
ATLAS is one of the four experiments at the Large Hadron Collider (LHC) at CERN, which has been put in operation this year. The challenging experimental environment and the extreme detector complexity required the development of a highly scalable distributed monitoring framework, which is currently being used to monitor the quality of the data being taken as well as the operational conditions of the hardware and software elements of the detector, trigger and data acquisition systems. At the moment the ATLAS Trigger/DAQ system is distributed over more than 1000 computers, which is about one third of the final ATLAS size. During every minute of an ATLAS data-taking session the monitoring framework serves several thousand physics events to monitoring data analysis applications, handles more than 4 million histogram updates coming from more than 4 thousand applications, executes 10 thousand advanced data quality checks for a subset of those histograms, and displays histograms and the results of these checks on several dozen monitors installed in the main and satellite ATLAS control rooms.
This note presents an overview of the online monitoring software framework, and describes the experience which was gained during an extensive commissioning period as well as during the first phase of LHC beam in September 2008. Performance results obtained on the current ATLAS DAQ system will also be presented, showing that the performance of the framework is adequate for the final ATLAS system.
(University of California, Irvine)
Software Components, Tools and Databases: MondayClub A
Event Selection Services in ATLAS
ATLAS has developed and deployed event-level selection services based upon event metadata records ("tags")
and supporting file and database technology.
These services allow physicists to extract events that satisfy their selection predicates from any stage
of data processing and use them as input to later analyses.
One component of these services is a web-based Event-Level Selection Service Interface (ELSSI).
ELSSI supports event selection by integrating run-level metadata, luminosity-block-level metadata
(e.g., detector status and quality information), and event-by-event information (e.g., triggers passed and physics
content). The list of events that pass the physicist's cuts is returned in a form that can be used directly as input
to local or distributed analysis; indeed, it is possible to submit a skimming job directly from the ELSSI interface
using grid proxy credential delegation. Beyond this, ELSSI allows physicists who may or may not be interested
in event-level selections to explore ATLAS event metadata as a means to understand, qualitatively and quantitatively,
the distributional characteristics of ATLAS data: to see the highest missing ET events or the events with the most
leptons, to count how many events passed a given set of triggers, or to find events that failed a given trigger but
nonetheless look relevant to an analysis based upon the results of offline reconstruction, and more.
This talk provides an overview of ATLAS event-level selection services, with an emphasis upon the interactive
Event-Level Selection Service Interface.
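The kind of tag-based selection described above can be sketched in a few lines of Python. The tag fields, trigger names and values below are invented for illustration and are not the ATLAS tag schema; the sketch only shows a predicate applied to compact per-event metadata, returning event references usable as input to a later job.

```python
# Hypothetical event tags: compact per-event metadata records.
tags = [
    {"run": 1, "event": 10, "missing_et": 85.0, "n_leptons": 2, "triggers": {"mu20"}},
    {"run": 1, "event": 11, "missing_et": 12.5, "n_leptons": 0, "triggers": {"jet50"}},
    {"run": 2, "event": 7, "missing_et": 140.0, "n_leptons": 1, "triggers": {"mu20", "jet50"}},
    {"run": 2, "event": 8, "missing_et": 60.0, "n_leptons": 3, "triggers": set()},
]


def select(tags, predicate):
    """Return (run, event) references for tags satisfying the predicate."""
    return [(t["run"], t["event"]) for t in tags if predicate(t)]


# Events with large missing ET that passed a given trigger.
selected = select(tags, lambda t: t["missing_et"] > 50 and "mu20" in t["triggers"])

# Exploratory queries on the metadata itself.
highest_met = max(tags, key=lambda t: t["missing_et"])
n_passed_jet50 = sum(1 for t in tags if "jet50" in t["triggers"])
```

The selection never touches the full event data; only the resulting list of references is handed to the analysis or skimming job.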
(Argonne National Laboratory), Dr Qizhi Zhang
(Argonne National Laboratory)
The JANA Calibrations and Conditions Database API
Calibrations and conditions databases can be accessed from within the JANA Event Processing framework through the API defined in its JCalibration base class. This system allows constants to be retrieved through a single line
of C++ code with most of the context implied by the run currently being analyzed. The API is designed to support everything from databases to web
services to flat files for the backend. A Web Service backend using SOAP has been implemented which is particularly interesting since it addresses many cybersecurity issues.
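The design idea of one retrieval call with the run context implied, on top of interchangeable backends, can be sketched as follows. This is a Python illustration only: the real API is the C++ JCalibration base class, and the class names, the get method, the namepath string and the table layout below are hypothetical and are not the JANA interface.

```python
from abc import ABC, abstractmethod


class Calibration(ABC):
    """Backend-independent access to constants for one run."""

    def __init__(self, run_number):
        self.run_number = run_number

    @abstractmethod
    def get(self, namepath):
        """Return the constants stored under namepath for this run."""


class InMemoryCalibration(Calibration):
    """Stand-in backend; a database, web service or file backend
    would implement the same get() method."""

    def __init__(self, run_number, table):
        super().__init__(run_number)
        # table: {namepath: [(first_run, last_run, constants), ...]}
        self._table = table

    def get(self, namepath):
        for first_run, last_run, constants in self._table[namepath]:
            if first_run <= self.run_number <= last_run:
                return constants
        raise KeyError(f"no constants for {namepath} in run {self.run_number}")


table = {"cdc/gains": [(1, 999, [1.00, 1.02]), (1000, 1999, [0.98, 1.05])]}
calib = InMemoryCalibration(run_number=1500, table=table)

# The user code is a single call; the run context is implied.
gains = calib.get("cdc/gains")
```

Swapping the backend changes only which subclass is constructed, not the line of analysis code that retrieves the constants.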
The HADES Oracle database and its interfaces for experimentalists
Since 2002 the HADES experiment at GSI has employed an Oracle database for storing all parameters relevant for simulation and data analysis. The implementation features a flexible, multi-dimensional and easy-to-use version management. Direct interfaces to the ROOT-based analysis and simulation framework HYDRA allow for an automated initialization based on actual or historic data, which is needed at all levels of the analysis. Generic data structures, database tables and interfaces were developed to store variable sets of parameters of various types (C-types, binary arrays, ROOT-based classes). A snapshot of the data can be stored in a ROOT file for exporting and local access. Web interfaces are used for parameter validation, to show the history of the data and to compare different data sets. They also provide access to additional information not directly used in the analysis (file catalog, beam time logbook, hardware). An interface between the EPICS runtime database and Oracle is realized by a program developed at SLAC. Run-based summary information is provided to allow for fast scans and filtering of the data, which is indispensable for run validation. Web interfaces as well as interfaces to the analysis exist to make use of, e.g., the ROOT graphics package. The database concept reported here is a possible platform for the implementation of a database in FAIR-ROOT, the latter being an advancement/offspring of HYDRA.
A lightweight high availability strategy for Atlas LCG File Catalogs
The LCG File Catalog (LFC) is a key component of the LHC Computing Grid (LCG) middleware, as it contains the mapping between all logical and physical file names on the Grid. The Atlas computing model foresees multiple local LFCs hosted in each Tier-1 and Tier-0, containing all information about files stored in that cloud. As the local LFC contents are presently not replicated, this results in a dangerous single point of failure for all of the Atlas regional clouds. The issue of central LFC replication has been successfully addressed in LCG by the 3D project, which has deployed a replication environment (based on Oracle Streams technology) spanning the Tier-0 and all Tier-1s. However, this solution is not suitable for Tier-1 - Tier-2 clouds, due to the considerable amount of manpower needed for Oracle Streams administration/management and the high costs of the additional Oracle licenses needed to deploy Streams replication.
A more lightweight solution is to copy the LFC Oracle backend information to one or more Tier-2s, exploiting the Oracle Dataguard technology. We present the results of a wide range of feasibility and performance tests run on a Dataguard-based LFC high availability environment, built between the Italian LHC Tier-1 (INFN - CNAF) and an Atlas Tier-2 located at INFN - Roma1. We also explain how this strategy can be deployed on the present Grid infrastructure, without requiring any change to the middleware and in a way that is totally transparent to end users.
A RESTful web service interface to the ATLAS COOL database
The COOL database in ATLAS is primarily used for storing detector conditions data, but also status flags, which are uploaded summaries of information indicating the detector reliability during a run. This paper introduces the use of CherryPy, a Python application server which acts as an intermediate layer between a web interface and the database, providing a simple means of storing to and retrieving from the COOL database which has found use in many web applications. The software layer is designed to be RESTful, implementing the common CRUD (Create, Read, Update, Delete) database methods by interpreting the HTTP method (POST, GET, PUT, DELETE) on the server along with a URL identifying the database resource to be operated on. The format of the data (text, XML, etc.) is also determined by the HTTP protocol. The details of this layer are described along with a popular application demonstrating its use, the ATLAS run list web page.
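The mapping of HTTP methods onto CRUD operations can be sketched without any web framework. The following Python example is an assumption-laden illustration, not the CherryPy layer described in the paper: the ResourceStore class, the URL layout and the in-memory dictionary standing in for COOL are hypothetical; only the HTTP methods and status codes are standard.

```python
class ResourceStore:
    """Dispatch an HTTP method plus URL onto a CRUD operation."""

    def __init__(self):
        self._data = {}  # stand-in for the conditions database

    def handle(self, method, url, body=None):
        """Return an (HTTP status code, response body) pair."""
        if method == "POST":  # Create
            if url in self._data:
                return 409, None  # Conflict: resource already exists
            self._data[url] = body
            return 201, body
        if method == "GET":  # Read
            if url not in self._data:
                return 404, None
            return 200, self._data[url]
        if method == "PUT":  # Update
            if url not in self._data:
                return 404, None
            self._data[url] = body
            return 200, body
        if method == "DELETE":  # Delete
            if url not in self._data:
                return 404, None
            del self._data[url]
            return 204, None
        return 405, None  # Method Not Allowed


store = ResourceStore()
created = store.handle("POST", "/runs/1/status", {"flag": "green"})
read = store.handle("GET", "/runs/1/status")
updated = store.handle("PUT", "/runs/1/status", {"flag": "red"})
deleted = store.handle("DELETE", "/runs/1/status")
missing = store.handle("GET", "/runs/1/status")
```

The URL names the resource and the method names the operation, so the same URL serves all four CRUD actions.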
The Tile Calorimeter Web Systems for Data Quality Analyses
The ATLAS detector consists of four major components: inner tracker, calorimeter, muon
spectrometer and magnet system. In the Tile Calorimeter (TileCal), there are 4 partitions, each partition
has 64 modules and each module has up to 48 channels. During the ATLAS commissioning phase, a
group of physicists needs to analyze the Tile Calorimeter data quality, generate reports and update the
official database, when necessary. The Tile Commissioning Web Systems (TCWS) retrieves
information from different directories and databases, executes programs that generate results, stores
comments and verifies the calorimeter status. TCWS integrates different applications, each one
presenting a unique data view. The Web Interface for Shifters (WIS) supports monitoring tasks by
managing test parameters and the overall calorimeter status. The TileComm Analysis stores plots, automatic
analysis results and comments concerning the tests. With the need for increased granularity, a new
application was created: the Monitoring and Calibration Web System (MCWS). This application
supports data quality analyses at the channel level by presenting the automatic analysis results, the
known problematic channels and the channels masked by the shifters. Through the web system, it is
possible to generate plots and reports, related to the channels, identify new bad channels and update the
Bad Channels List at the ATLAS official database (COOL DB). The Data Quality Monitoring Viewer
(DQM Viewer) displays the data quality automatic results through an oriented visualization.
Andressa Sivolella Gomes
(Universidade Federal do Rio de Janeiro (UFRJ))
CASTOR provides a powerful and rich interface for managing files and pools
of files backed by tape-storage. The API is modelled very closely on that of
a POSIX filesystem, where part of the actual I/O is handled by the rfio
library. While the API is very close to POSIX it is still separated, which
unfortunately makes it impossible to use standard tools and scripts straight
away. This is particularly inconvenient when applications are written in
languages other than C/C++ such as is frequently the case in web-apps.
Until now the only recourse was to use command-line utilities and
parse their output, which is clearly a kludge.
We have implemented a complete POSIX filesystem to access CASTOR using FUSE
(Filesystem in Userspace) and have successfully tested and used this on SLC4
and SLC5 (both in 32 and 64 bit). We call it CastorFS. In this paper we
will present its architecture and implementation, with emphasis on
performance and caching aspects.
A Geant4 physics list for spallation and related nuclear physics applications based on INCL and ABLA models
We present a new Geant4 physics list prepared for nuclear physics applications
in the domain dominated by spallation.
We discuss new Geant4 models based on the translation of
INCL intra-nuclear cascade and ABLA de-excitation codes in C++
and used in the physics list.
The INCL model is well established for targets heavier than Aluminium
and projectile energies from ~ 150 MeV up to 2.5 GeV ~ 3 GeV.
Validity of the Geant4 physics list is demonstrated from the perspective of accelerator driven systems
and EURISOL project, especially with the neutron double differential cross sections and residual
Foreseen improvements of the physics models for the treatment of light targets (Carbon - Oxygen)
and light ion beams (up to Carbon) are discussed.
An example application utilizing the physics list is introduced.
(Helsinki Institute of Physics, HIP)
A Monte Carlo study for the X-ray fluorescence enhancement induced by photoelectron secondary excitation
Well established values for the X-ray fundamental parameters (fluorescence yields, characteristic lines branching ratios, mass absorption coefficients, etc.) are very important but not adequate for an accurate reference-free quantitative X-Ray Fluorescence (XRF) analysis. Secondary ionization processes following photon induced primary ionizations in matter may contribute significantly to the intensity of the detected fluorescence radiation introducing significant errors in quantitative XRF analysis, if not taken into account properly.
In the present work, a newly developed particle/ray-tracing Monte Carlo (MC) simulation code is presented. The code implements appropriate databases for all the physical interactions among x-rays, electrons and matter, leading to the determination of the intensity of the characteristic radiation induced by photoelectrons for any given experimental conditions (sample geometry, incident beam parameters, etc.).
In order to achieve acceptable counting statistics for the secondary photoelectron excitation, which is a second-order phenomenon, the MC simulation code is executed on a powerful cluster-computer facility, which is able to host long simulations (up to 20 billion events per exciting energy), thus yielding low relative uncertainties. The final goal is to compare the simulated MC data with highly accurate experimental measurements obtained from well-characterized, absolutely calibrated experimental setups. In this way the current description of electron ionization cross sections can be properly assessed; if systematic differences are observed, the comparison may lead to the determination of corrected electron ionization cross sections versus energy that properly fit the experimental data.
(N.C.S.R. Demokritos, Institute of Nuclear Physics)
A new Data Format for the Commissioning Phase of the ATLAS Detector
In the commissioning phase of the ATLAS experiment, low-level Event Summary Data (ESD) are analyzed to evaluate the performance of the individual subdetectors, the performance of the reconstruction and particle identification algorithms, and obtain calibration coefficients. In the GRID model of distributed analysis, these data must be transferred to Tier-1 and Tier-2 sites before they can be analyzed. However, the large size of ESD (~1 MByte/event) constrains the amount of data that can be distributed on the GRID and be made readily available on disks. In order to overcome this constraint and make the data fully available, new data formats - collectively known as Derived Physics Data (DPD) - have been designed. Each DPD format contains a subset of the ESD data, tailored to specific needs of the subdetector and object reconstruction and identification performance groups. Filtering algorithms perform a selection based on physics contents and trigger response, further reducing the data volume. Thanks to these techniques, the total volume of DPD to be distributed on the GRID amounts to 20% of the initial ESD data. An evolution of the tools developed in this context will serve to produce another set of DPDs that are specifically tailored for physics analysis.
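The two reduction steps described above, event filtering and keeping only a subset of each event's content, can be illustrated with a short Python sketch. The event fields, trigger names and the selection rule are invented for illustration and do not correspond to actual ESD or DPD contents or to the ATLAS software.

```python
def make_dpd(events, keep_fields, passes):
    """Filter events, then keep only the listed fields of each survivor."""
    return [
        {field: event[field] for field in keep_fields}
        for event in events
        if passes(event)
    ]


# Hypothetical ESD-like records with content from several subdetectors.
esd = [
    {"event": 1, "trigger": "mu", "muon_hits": [3, 5], "calo_cells": [0.1] * 50, "tracks": [1, 2]},
    {"event": 2, "trigger": "jet", "muon_hits": [], "calo_cells": [0.2] * 50, "tracks": [4]},
    {"event": 3, "trigger": "mu", "muon_hits": [7], "calo_cells": [0.3] * 50, "tracks": []},
]

# A DPD for a muon performance group: muon-triggered events, muon content only.
muon_dpd = make_dpd(
    esd,
    keep_fields=("event", "muon_hits", "tracks"),
    passes=lambda ev: ev["trigger"] == "mu",
)
```

Each performance group would apply its own field list and selection, so the same ESD input yields several smaller, group-specific outputs.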
Reconstruction of interaction vertices is an essential step in the reconstruction chain of a modern collider experiment such as CMS; the primary ("collision") vertex is reconstructed in every event within the CMS reconstruction program, CMSSW.
However, the task of finding and fitting secondary ("decay") vertices also plays an important role in several physics cases such as the reconstruction of long-lived particles like Kaons, or the identification of b-jets, i.e. the task of b-tagging.
A very simple but powerful general-purpose vertex finding algorithm is presented that is based on the well-established adaptive vertex fitter to find and fit primary and secondary vertices.
GSI Darmstadt is hosting a Tier2 centre for the ALICE experiment providing about 10% of ALICE Tier2 resources. According to the computing model the tasks of a Tier2 centre are scheduled and unscheduled analysis as well as Monte Carlo simulation. To accomplish this a large water cooled compute cluster has been set up and configured consisting of currently 200 CPUs (1500 Cores). After intensive I/O tests it has been decided to provide on site storage via a Lustre cluster, at the moment 150 TB disk space, which is visible from each individual worker node. Additionally an xrootd managed storage cluster is provided which serves also as a Grid Storage Element. The central GSI batch farm can be accessed with Grid methods from outside as well as via LSF methods for users from the inside of the centre. Both are used mainly for simulation jobs. Moreover for interactive access a PROOF analysis facility, GSIAF, is maintained on a subset of the same machines. On these machines the necessary infrastructure has been statically installed providing to each user 160 PROOF servers and the possibility to analyse 1700 events per second. The alternative of creating a PROOF on demand cluster dynamically on the batch farm machines is also supported. The coexistence of interactive processes and batch jobs has been studied and can be dealt with by adjusting the process priorities accordingly. All relevant services are monitored continuously, to a large extent based on MonALISA.
Detailed user experience, data transfer activities, as well as future and ramp up plans are reported also in this presentation.
GSI will profit from the expert knowledge it will gain during the set up
and operation of the ALICE Tier2 centre for the upcoming Tier0 centre for
ALICE TPC particle identification, calibration and performance.
We will present a Particle identification algorithm, as well as a calibration and performance study in the ALICE Time Projection Chamber (TPC) using the dEdx measurement. New calibration algorithms had to be developed, since the simple geometrical corrections were only suitable at 5-10% level. The PID calibration consists of the following parts: gain calibration, energy deposit calibration as a function of angle and position and Bethe-Bloch energy deposit calibration. The gain calibration is done in the space domain (pad-by-pad gain calibration), as well as in the time domain (gain as a function of time, pressure, temperature and gas composition).
The energy deposit calibration is done, taking into account the particle dependence on the track topology (inclination angles with respect to the detection layer and particle position).
For the Bethe-Bloch energy calibration, five parameters of the Bethe-Bloch formula, which are used for the TPC PID, were fitted for the TPC gas mixture.
The studies were performed on the cosmic data, and the comparison with the MonteCarlo simulation showed good results.
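The abstract does not state which functional form was fitted. One widely used five-parameter form is the ALEPH-style parametrization of the Bethe-Bloch curve, and the minimal sketch below assumes it. The parameter values are illustrative placeholders, not the values obtained in this study, and the function name is hypothetical.

```python
import math

# Illustrative parameter values only; they are NOT the fitted values from this study.
ILLUSTRATIVE_PARAMS = (0.076176, 10.632, 1.3279e-5, 1.8631, 1.9479)


def bethe_bloch_aleph(beta_gamma, p1, p2, p3, p4, p5):
    """Five-parameter ALEPH-style parametrization of the mean dE/dx (arbitrary units)."""
    beta = beta_gamma / math.sqrt(1.0 + beta_gamma * beta_gamma)
    beta_p4 = beta ** p4
    return p1 / beta_p4 * (p2 - beta_p4 - math.log(p3 + beta_gamma ** (-p5)))


# Evaluate below, near and far above the minimum-ionizing region.
for bg in (0.5, 3.5, 1000.0):
    print(bg, round(bethe_bloch_aleph(bg, *ILLUSTRATIVE_PARAMS), 3))
```

With these placeholder values the curve falls steeply at low beta-gamma, passes through a minimum of order one near beta-gamma of a few, and then rises slowly, which is the qualitative shape a PID calibration has to reproduce for the actual gas mixture.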
ALICE TPC reconstruction performance study
We will present our studies of the performance of the reconstruction in the ALICE Time Projection Chamber (TPC). The reconstruction algorithm in question is based on the Kalman filter. The performance is characterized by the resolution in position, angle and momentum as a function of particle properties (momentum, position). The resulting momentum parametrization is compared with the Monte Carlo simulation, which makes it possible to disentangle the influence of the material budget from that of systematic effects.
The presented studies were performed on the cosmic data.
Alignment of the ATLAS Inner Detector Tracking System
CERN's Large Hadron Collider (LHC) is the world's largest particle accelerator. ATLAS is one of the two general purpose experiments, equipped with a charged particle tracking system built on two technologies: silicon and drift tube based detectors, composing the ATLAS Inner Detector (ID). The required precision for the alignment of the most sensitive coordinates of the silicon sensors is just a few microns. Therefore the alignment of the ATLAS ID requires complex algorithms with extensive CPU and memory usage. So far the proposed alignment algorithms have been exercised on several applications. We will present the outline of the alignment approach and results from Cosmic Ray runs and large scale computing simulation of physics samples mimicking the ATLAS operation during real data taking. The full alignment chain is tested using that stream and alignment constants are produced and validated within 24 hours. Cosmic ray data serve to produce an early alignment of the real ATLAS Inner Detector even before the LHC start up. Beyond all tracking information, the assembly survey data base contains essential information in order to determine the relative position of one module with respect to its neighbors.
Alignment of the LHCb detector with Kalman fitted tracks
We report on an implementation of a global chisquare algorithm
for the simultaneous alignment of all tracking systems in the
LHCb detector. Our algorithm uses hit residuals from the
standard LHCb track fit which is based on a Kalman filter. The
algorithm is implemented in the LHCb reconstruction framework
and exploits the fact that all sensitive detector elements have
the same geometry interface. A vertex constraint is implemented
by fitting tracks to a common point and propagating the change
in track parameters to the hit residuals. To remove
unconstrained or poorly constrained degrees of freedom
(so-called weak modes) the average movements of (subsets of)
alignable detector elements can be fixed with Lagrange
constraints. Alternatively, weak modes can be removed with a
cutoff in the eigenvalue spectrum of the second derivative of
the chisquare. As for all LHCb reconstruction and analysis
software the configuration of the algorithm is done in python
and gives detailed control over the selection of alignable
degrees of freedom and constraints. We study the performance
of the algorithm on simulated events and first LHCb data.
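The eigenvalue cutoff mentioned above can be illustrated on a deliberately tiny toy problem. In the sketch below, two alignable elements are constrained only through residuals that measure the difference of their offsets, so a common shift of both is a weak mode with zero eigenvalue. The 2x2 closed-form eigen decomposition, the function names and the numbers are assumptions made for illustration and have nothing to do with the actual LHCb implementation.

```python
import math


def eigen_sym_2x2(a, b, d):
    """Eigenpairs of the symmetric matrix [[a, b], [b, d]] (closed form)."""
    mean, half_diff = 0.5 * (a + d), 0.5 * (a - d)
    root = math.hypot(half_diff, b)
    theta = 0.5 * math.atan2(2.0 * b, a - d)
    c, s = math.cos(theta), math.sin(theta)
    # (c, s) belongs to the larger eigenvalue, (-s, c) to the smaller one.
    return [(mean + root, (c, s)), (mean - root, (-s, c))]


def solve_with_cutoff(hessian, gradient, cutoff):
    """Solve H * delta = g using only eigenmodes with eigenvalue above the cutoff."""
    (a, b), (_, d) = hessian
    delta = [0.0, 0.0]
    for eigenvalue, vec in eigen_sym_2x2(a, b, d):
        if eigenvalue <= cutoff:
            continue  # weak mode: leave this direction unchanged
        coeff = (vec[0] * gradient[0] + vec[1] * gradient[1]) / eigenvalue
        delta[0] += coeff * vec[0]
        delta[1] += coeff * vec[1]
    return delta


# Toy setup: each residual r measures (offset_1 - offset_2) only.
residuals = [0.11, 0.09, 0.10, 0.12, 0.08]
n = len(residuals)
hessian = [[n, -n], [-n, n]]                  # half the second derivative of chi-square
gradient = [sum(residuals), -sum(residuals)]  # minus half the first derivative at zero
delta = solve_with_cutoff(hessian, gradient, cutoff=1e-6)
print(delta)
```

The solution fixes the relative offset to the mean residual while the unconstrained common shift stays at zero. A Lagrange constraint on the average movement would reach the same result by a different route.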
(NIKHEF), Wouter Hulsbergen
AMS Experiment Parallel Event Processing using ROOT/OPENMP scheme
The ROOT based event model for the AMS experiment is presented. By adding a few pragmas to the main ROOT code, parallel processing of ROOT chains on local multi-core machines became possible. The scheme does not require any merging of the user defined output information (like histograms, etc). No pre-installation procedure is needed either. The scalability of the scheme is shown on the example of a real physics analysis application (~20k histograms). A comparison with the ProofLite performance for the same application is also made.
Application of the Kalman Alignment Algorithm to the CMS Tracker
One of the main components of the CMS experiment is the Inner Tracker. This device, designed to measure the trajectories of charged particles, is composed of approximately 16,000 planar silicon detector modules, which makes it the biggest of its kind. However, systematic measurement errors, caused by unavoidable inaccuracies in the construction and assembly phase, reduce the precision of the measurements drastically. The geometrical corrections that are therefore required should be known to an accuracy that is better than the intrinsic resolution of the detector modules, so special alignment algorithms have to be utilized.
The Kalman Alignment Algorithm (KAA) is a novel approach to extract a set of alignment constants from a sufficiently large collection of recorded particle tracks, suited even for a system as big as the CMS Inner Tracker. To show that the method is functional and well understood, and thus expedient for the data-taking period of the CMS experiment, two significant case studies are discussed. Results from detailed simulation studies demonstrate that the KAA is able to align the CMS Inner Tracker under the conditions expected during the LHC start-up phase. Moreover, it has been shown that the associated computational effort can be kept at a reasonable level by deploying the available CMS computing resources to process the data in parallel. Furthermore, an analysis of the first experimental data from cosmic particle tracks, recorded directly after the assembly of the CMS Inner Tracker, shows that the KAA is at least competitive to existing algorithms when applied to real data.
(Institut für Hochenergiephysik (HEPHY Vienna))
ATLAS@Amazon Web Services: Running ATLAS software on the Amazon Elastic Compute Cloud
We show how the ATLAS offline software is ported on the Amazon Elastic Compute Cloud (EC2). We prepare an Amazon Machine Image (AMI) on the basis of the standard ATLAS platform Scientific Linux 4 (SL4). Then an instance of the SLC4 AMI is started on EC2 and we install and validate a recent release of the ATLAS offline software distribution kit. The installed software is archived as an image on the Amazon Simple Storage Service (S3) and can be quickly retrieved and connected to new SL4 AMI instances using the Amazon Elastic Block Store (EBS). ATLAS jobs can then configure against the release kit using the ATLAS configuration management tool (cmt) in the standard way. The output of jobs is exported to S3 before the SL4 AMI is terminated. Job status information is transferred to the Amazon SimpleDB service. The whole process of launching instances of our AMI, starting, monitoring and stopping jobs and retrieving job output from S3 is controlled from a client machine using python scripts implementing the Amazon EC2/S3 API via the boto library working together with small scripts embedded in the SL4 AMI. We report our experience with setting up and operating the system using standard ATLAS job transforms.
(Max-Planck-Institut für Physik)
Automatic TTree creation from Reconstructed Data Objects in JANA
Automatic ROOT tree creation is achieved in the JANA
Event Processing Framework through a special plugin.
The janaroot plugin can automatically define a TTree
from the data objects passed through the framework
without using a ROOT dictionary. Details on how this
is achieved as well as possible applications will be
Building a Storage Cluster with Gluster
Gluster, a free cluster file-system scalable to several peta-bytes, is under evaluation at the RHIC/USATLAS Computing Facility. Several production SunFire x4500 (Thumper) NFS servers were dual-purposed as storage bricks and aggregated into a single parallel file-system using TCP/IP as an interconnect. Armed with a paucity of new hardware, the objective was to simultaneously allow traditional NFS client access to discrete systems as well as access to the GlusterFS global namespace without impacting production.
Gluster is elegantly designed and carries an advanced feature set including, but not limited to, automated replication across servers, server striping, fast db backend, and I/O scheduling. GlusterFS exists as a layer above existing file-systems, does not have a single-point-of-failure, supports RDMA, distributes metadata, and is entirely implemented in user space via FUSE.
We will provide a background of Gluster along with its architectural underpinnings, followed by a description of our test-bed, environmentals, and performance characteristics.
(Brookhaven National Laboratory)
Building and Commissioning of the CMS CERN Analysis Facility (CAF)
The CMS CERN Analysis Facility (CAF) was primarily designed to host a large variety of latency-critical workflows. These break down into alignment and calibration, detector commissioning and diagnosis, and high-interest physics analysis requiring fast-turnaround. In addition to the low latency requirement on the batch farm, another mandatory condition is the efficient access to the RAW detector data stored at the CERN Tier-0 facility. The CMS CAF also foresees resources for interactive login by a large number of CMS collaborators located at CERN, as an entry point for their day-by-day analysis. These resources will run on a separate partition in order to protect the high-priority use-cases described above. While the CMS CAF represents only a modest fraction of the overall CMS resources on the WLCG GRID, an appropriately sized user-support service needs to be provided.
In this presentation we will describe the building, commissioning and operation of the CMS CAF during the year 2008. The facility was heavily and routinely used by almost 250 users during multiple commissioning and data challenge periods. It reached a CPU capacity of 1.4MSI2K and a disk capacity at the Petabyte scale. In particular, we will focus on the performances in terms of networking, disk access and job efficiency and extrapolate prospects for the upcoming LHC first year data taking. We will also present the experience gained and the limitations observed in operating such a large facility, in which well controlled workflows are combined with chaotic type analysis by a large number of physicists.
(RWTH Aachen IIIA)
Calibration of ATLAS Resistive Plate Chambers
Resistive Plate Chambers (RPC) are used in ATLAS to provide the first
level muon trigger in the barrel region. The total size of the system is
about 16000 m2, readout by about 350000 electronic channels.
In order to reach the needed trigger performance, a precise knowledge of
the detector working point is necessary, and the high number of readout
channels calls for severe requirements on the analysis tools to be
developed. First of all, high-statistics data samples will have to be
used as input. Second, the results would be unmanageable without a
proper interface to some database technology. Moreover, the CPU power
needed for the analysis makes it necessary to use distributed computing
A set of analysis tools will be presented, coping with all the critical
aspects of this task, ranging from the use of a dedicated data stream
(the so-called muon calibration stream), to the automatic job submission
on the GRID, to the implementation of an interface to ATLAS' conditions
database. Integration with Detector Control System information and
impact of the calibration on the performance of the reconstruction
algorithms will be discussed as well.
Andrea Di Simone
Calibration of the Barrel Muon DT System of CMS with Cosmic Data
The calibration process of the Barrel Muon DT System of CMS as developed and tuned during the recent cosmic data run is presented. The calibration data reduction method, the full work flow of the procedure and final results are presented for real and simulated data.
CASTOR Tape Performance Optimisation at the UK LCG Tier-1
The UK LCG Tier-1 computing centre located at the Rutherford Appleton Laboratory is responsible for the custodial storage and processing of the raw data from all four LHC experiments: CMS, ATLAS, LHCb and ALICE. The demands of data import, processing, export and custodial tape archival place unique requirements on the mass storage system used. The UK Tier-1 uses CASTOR as the storage technology of choice, which currently handles 2.3PB of disk across 320 disk servers. 18 Sun T10000 tape drives provide the custodial back-end. This paper describes work undertaken to optimise the performance of the CASTOR infrastructure at RAL. Significant gains were achieved and the lessons learned have been deployed at other LHC CASTOR sites.
Problems were identified with the performance of tape migration when disk servers were under production-level load. An investigation was launched at two levels: hardware and operating system performance, and the impact of CASTOR tape algorithms and job scheduling. A test suite was written to quantify the low-level performance of disk servers with various tunings applied, and CMS test data coupled with the existing transfer infrastructure was used to verify the performance of the tape system with realistic experimental data transfer patterns. The improvements identified resulted in the instantaneous tape migration rate per drive reaching near line-speed of 100MB/s, a vast improvement on the previously attainable rate of around 16MB/s.
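To put the two per-drive rates quoted above into perspective, the short calculation below converts them into a speed-up factor and into the time needed to migrate a fixed volume on one drive. The 1 TB volume is an arbitrary example, and the calculation assumes the quoted instantaneous rates could be sustained, which the abstract does not claim.

```python
def migration_hours(volume_tb, rate_mb_per_s):
    """Hours needed to write volume_tb (decimal TB) at a sustained rate in MB/s."""
    return volume_tb * 1_000_000 / rate_mb_per_s / 3600


old_rate, new_rate = 16.0, 100.0  # MB/s per drive, as quoted in the text
speedup = new_rate / old_rate

print("speed-up factor:", speedup)
print("hours per TB before:", round(migration_hours(1, old_rate), 2))
print("hours per TB after:", round(migration_hours(1, new_rate), 2))
```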
(H.H. Wills Physics Laboratory - University of Bristol)
CERN automatic audioconference service
Scientists all over the world collaborate with the CERN laboratory day by day. They must be able to communicate effectively on their joint projects at any time, so telephone conferences have become indispensable and widely used. The traditional conference system, managed by 6 switchboard operators, was hosting more than 20000 hours and 5500 conferences per year. However, the system needed to be modernized in three ways. Firstly, to ensure researchers' autonomy in the organization of their conferences; secondly, to eliminate the constraints of manual intervention by operators; and thirdly, to integrate the audioconferences into a collaborative framework.
To solve this issue, the CERN telecommunications team drew up a specification to implement a new system. After deep analysis, it was decided to use a new Alcatel collaborative conference solution based on the SIP protocol. During 2005/2006 the system was tested as the first European pilot and, based on CERN’s recommendations, several improvements were implemented: billing, security, redundancy, etc.
The new automatic conference system has been operational since the second half of 2006. It is very popular with users: 39000 calls and 30000 accumulated hours for around 5000 conferences during the last twelve months. Furthermore, to cope with the demand, the capacity of the service is about to be tripled and new features, such as application sharing and on-line presentation, should be offered in the near future.
Rodrigo Sierra Moral
CERN GSM monitoring system
As a result of the tremendous development of GSM services over the last years, the number of related services used by organizations has drastically increased. Therefore, monitoring GSM services is becoming a business critical issue in order to be able to react appropriately in case of incident.
In order to provide GSM coverage in all the CERN underground facilities, more than 50 km of leaky feeder cable have been deployed. This infrastructure is also used to propagate VHF radio signals for CERN’s fire brigade. Even though CERN’s mobile operator monitors the network, it cannot guarantee the availability of GSM services, and certainly not of VHF services, where signals are carried by the leaky feeder cable. So, a global monitoring system has become critical to CERN. In addition, monitoring this infrastructure will make it possible to characterize its behaviour over time, especially with LHC operation.
Given that commercial solutions were not yet mature, CERN developed a system based on GSM probes and an application server which collects data from them via the CERN GPRS network. By placing probes in strategic locations and comparing measurements between probes, it is now possible to determine if there is a GSM or VHF problem on one leaky feeder cable segment.
This system has been successfully working for several months in underground facilities, allowing CERN to inform GSM users and fire brigade in case of incidents.
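The abstract does not describe how measurements from different probes are compared. One simple way to localize a fault, sketched below under stated assumptions, is to walk along the probes in cable order and report the first pair where a healthy probe is followed by a degraded one. The probe names, signal levels, baseline values and the 10 dB threshold are all hypothetical.

```python
def find_faulty_segment(readings, baseline, max_drop_db=10.0):
    """Return the (upstream, downstream) probe pair bracketing a fault, or None.

    readings and baseline are lists of (probe_name, level_dbm) in order along
    the cable, starting at the end where the signal is injected. An upstream
    value of None means the first probe itself is already degraded.
    """
    previous = None
    for (name, level), (_, expected) in zip(readings, baseline):
        if expected - level > max_drop_db:
            # The fault lies between the last healthy probe and this one.
            return (previous, name)
        previous = name
    return None


# Hypothetical reference levels recorded when the cable was known to be healthy.
baseline = [("P1", -60.0), ("P2", -65.0), ("P3", -70.0), ("P4", -75.0)]
# Hypothetical current readings: the signal collapses between P2 and P3.
readings = [("P1", -61.0), ("P2", -66.0), ("P3", -95.0), ("P4", -101.0)]

print(find_faulty_segment(readings, baseline))
```

Comparing each probe with its own baseline rather than with a fixed absolute level keeps the check independent of the normal attenuation along the cable.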
ci2i and CMS-TV: Generic Web Tools for CMS Centres
The CMS Experiment at the LHC is establishing a global network of inter-connected "CMS Centres" for controls, operations and monitoring at CERN, Fermilab, DESY and a number of other sites in Asia, Europe, Russia, South America, and the USA.
"ci2i" ("see eye to eye") is a generic Web tool, using Java and Tomcat, for managing: hundreds of displays screens in many locations; monitoring content and mappings to displays; CMS Centres' hardware configuration; user login rights and group accounts; screen snapshot services; and operations planning tools. ci2i enables CMS Centre users anywhere in the world to observe displays in other CMS Centres, notably CERN, and manage the content remotely if authorised. Distributed shifts are already happening.
"CMS-TV" aggregates arbitrary (live) URLs into a cyclic program that can be watched full-screen in any Web browser. "TV channels" can be trivially created and configured with either specific expert content or for outreach displays in public places. All management is done from a simple Web interface with secure authentication.
We describe the specific deployment at CERN to manage operations in the CMS Centre @ CERN (more than 850 active users and ever increasing) including the aspects of system administration (PXE aims kickstart, gdm auto-login, security, afs account and acl management, etc.).
(Northeastern U., Boston)
CluMan: High-density displays and cluster management
LHC computing requirements are such that the number of CPU and storage nodes, and the complexity of the services to be managed are bringing new challenges. Operations like checking configuration consistency, executing actions on nodes, moving them between clusters etc. are very frequent. These scaling challenges are the basis for CluMan, a new cluster management tool being designed and developed at CERN.
High-density displays such as heat maps, grids or color maps are more and more commonly used in various applications like data visualization or monitoring systems. They allow humans to see, interpret and understand complex and detailed information at a glance.
We propose to present the ideas behind the CluMan project, and to show how high density displays are used to help service managers to understand, manage and control the state and behavior of their clusters.
Cluster Filesystem usage for HEP Analysis
With the first analysis-capable data from the LHC on the horizon, more
and more sites are facing the question of how to build a highly
efficient analysis facility for their local physicists, mostly
attached to a Tier2/3. The most important ingredient for such a
facility is the underlying storage system and here the selected option
for the data management and data access system - well known as
'Filesystem'. At DESY we have built up a facility deploying the HPC
grounded cluster filesystem Lustre, serving as a 'very big and fast
playground' for various purposes like compiling large packages,
accessing n-tuple data for histogramming or even private mc
generation. We will show the actual configuration, measurements and
experience from the user perspective together with impressions and
measures from the system perspective.
CMS production and processing system - Design and experiences
ProdAgent is a set of tools to assist in producing various data products such as Monte Carlo simulation, prompt reconstruction, re-reconstruction and skimming.
In this paper we briefly discuss the ProdAgent architecture, and focus on the experience in using this system in recent computing challenges, feedback from these challenges, and future work. The computing challenges have proven invaluable for scaling the system to the level desired for the first LHC physics runs. The feedback from the recent computing challenges resulted in a design review of some of the ProdAgent core components. Results of this review, and the mandate to converge development within the data management sub projects, led to the establishment of the WMCore project: a common set of libraries for CMS workflow systems, with the aim of reducing code duplication between sub projects, and increasing maintainability. This paper discusses some of the lessons learned from recent computing challenges and how this experience has been incorporated into the WMCore project.
The current ProdAgent project has shifted towards bulk operations (optimizing database performance) and buffered tasks (so as to better handle reliability when interacting with third party components). Two significant areas of development effort are the migration to a common set of libraries (WMCore) for all CMS workflow systems and a system to split and manage work requests between ProdAgents - to better utilise the available resources.
Commissioning of the ATLAS Inner Detector software infrastructure with cosmic rays
T Cornelissen on behalf of the ATLAS inner detector software group
Several million cosmic tracks were recorded during the combined ATLAS runs in Autumn of 2008. Using these cosmic ray events as well as first beam events, the software infrastructure of the inner detector of the ATLAS experiment (pixel and microstrip silicon detectors as well as straw tubes with additional transition radiation detection) is being commissioned.
The full software chain has been set up in order to reconstruct and
analyse this kind of events. Final detector decoders have been
developed, different pattern recognition algorithms and track fitters
have been validated as well as the various calibration
methods. The infrastructure to deal with conditions data coming from
the data acquisition, detector control system and calibration runs
has been put in place, allowing also to apply alignment and calibration
The software has also been essential to monitor the detector
performance during data taking. Detector efficiencies, noise
occupancies and resolutions are being studied in detail as well as the performance of the track reconstruction itself.
(CERN / University of Mainz)
Commissioning of the ATLAS reconstruction software with first data
Looking towards first LHC collisions, the ATLAS detector is being commissioned using all types of physics data available: cosmic rays and events produced during a few days of LHC single beam operations. In addition to putting in place the trigger and data acquisition chains, commissioning of the full software chain is a main goal. This is interesting not only to ensure that the reconstruction, monitoring and simulation chains are ready to deal with LHC physics data, but also to understand the detector performance in view of achieving the physics requirements. The recorded data have allowed us to study the ATLAS detector in terms of efficiencies, resolutions, channel integrity, alignment and calibrations. They have also allowed us to test and optimize the sub-systems reconstruction as well as some combined algorithms, such as combined tracking tools and different muon identification algorithms. The status of the integration of the complete software chain will be presented as well as the data analysis results.
Commissioning the CMS Alignment and Calibration Framework
The CMS experiment has developed a powerful framework to ensure the precise and prompt alignment and calibration of its components, which is a major prerequisite to achieve the optimal performance for physics analysis. The prompt alignment and calibration strategy harnesses computing resources both at the Tier-0 site and the CERN Analysis Facility (CAF) to ensure fast turnaround for updating the corresponding database payloads. An essential element is the creation of dedicated data streams concentrating the specific event information required by the various alignment and calibration workflows. The resulting low latency is required for feeding the resulting constants into the prompt reconstruction process, which is essential for achieving swift physics analysis of the LHC data. The presentation discusses the implementation and the computational aspects of the alignment & calibration framework. Recent commissioning campaigns with cosmic muons, beam halo and simulated data have been used to gain detailed experience with this framework, and results of this validation are reported.
(Imperial College, University of London)
Customizable Scientific Web-Portal for DIII-D Nuclear Fusion Experiment
Increasing utilization of the Internet and convenient web technologies has made the web-portal a major application interface for remote participation and control of scientific instruments. While web-portals have provided a centralized gateway for multiple computational services, the amount of visual output often is overwhelming due to the high volume of data generated by complex scientific instruments and experiments. Since each scientist may have different priorities and areas of interest in the experiment, filtering and organizing information based on the individual user’s need can increase the usability and efficiency of a web-portal.
DIII-D is the largest magnetic nuclear fusion device in the US. A web-portal has been designed to support the experimental activities of DIII-D researchers worldwide. It offers a customizable interface with personalized page layouts and list of services for users to select. Each individual user can create a unique working environment to fit their own needs and interests. Customizable services are: real-time experiment status monitoring, diagnostic data access, interactive data analysis and visualization. The web-portal also supports interactive collaborations by providing collaborative logbook, shared visualization and online instant messaging services.
The DIII-D web-portal development utilizes multi-tier software architecture, and web2.0 technologies, such as AJAX and Django, to develop a highly-interactive and customizable user interface. A set of client libraries was also created to provide a solution for conveniently plugging in new services to the portal. A live demonstration of the system will be presented.
*Work supported by U.S. DOE SciDAC program at General Atomics under Cooperative Agreement DE-FC02-01ER25455.
Data Driven Approach to Calorimeter Simulation in CMS
CMS is looking forward to tuning its detector simulation using the forthcoming collision data from the LHC. CMS established a task force in February 2008 in order to understand and reconcile the discrepancies observed between the CMS calorimetry simulation and the test beam data recorded during 2004 and 2006. Within this framework, significant effort has been made to develop a strategy of tuning fast and flexible parametrizations describing showering in the calorimeter with available data from test beams. These parametrizations can be used within the context of Full as well as Fast Simulation. The study is extended to evaluate the use of first LHC collision data, when it becomes available, to rapidly tune the CMS calorimeter simulation.
dCache Storage Cluster at BNL
Over the last two years, the USATLAS Computing Facility at BNL has managed a highly performant, reliable, and cost effective dCache storage cluster using SunFire x4500/4540 (Thumper/Thor) storage servers. The design of a discrete storage cluster signaled a departure from a model where storage resides locally on a disk-heavy compute farm. The consequent alteration of data flow mandated a dramatic re-construction of the network fabric.
This work will cover all components of our dCache storage cluster (from door to pool) including OS/ZFS file-system configuration, 10GE network tuning, monitoring, and environmentals. Performance metrics will be surveyed within the context of our Solaris 10 production system as well as those rendered during evaluations of OpenSolaris and Linux. Failure modes, bottlenecks, and deficiencies will be examined.
Lastly, we discuss competing architectures under evaluation, scaling limits in our current model, and future technologies that warrant close surveillance.
(Brookhaven National Laboratory)
DeepConference: A complete conference in a picture
Particle physics conferences lasting a week (like CHEP) can have hundreds of talks and posters presented. Current conference web interfaces (like Indico) are well suited to finding a talk by author or by time-slot. However, browsing the complete material in a modern large conference is not user friendly. Browsing involves continually making the expensive transition between HTML viewing and talk-slides (which are either PDF files or some other format). Furthermore, the web interfaces aren’t designed for undirected browsing. The advent of multi-core computing and advanced video cards means that we have more processor power available for visualization than at any time in the past. This poster describes a technique of rendering a complete conference’s slides and posters as a single very large picture. Standard plug-in software for a browser allows a user to zoom in on a portion of the conference that looks interesting. As the user zooms in further, more and more details become visible, allowing the user to make a quick and cheap decision on whether to spend more time on a particular talk. The project, DeepConference, has been implemented as a public web site and can render any conference whose agenda is powered by Indico. The rendering technology is powered by the free download, Silverlight. The poster discusses the implementation and use as well as cross platform performance and possible future directions. A demo will be shown.
(UNIVERSITY OF WASHINGTON)
Development of a simulated trigger generator for the ALICE commissioning
ALICE (A Large Ion Collider Experiment) is an experiment at the LHC (Large Hadron Collider) optimized for the study of heavy-ion collisions.
The main aim of the experiment is to study the behavior of strongly interacting matter and the quark-gluon plasma. In order to be ready for the first real physics interactions, the 18 sub-detectors composing ALICE have been tested using cosmic rays and sequences of random triggers used to simulate p-p and heavy-ion interactions.
In order to simulate real triggers, the RTG (Random Trigger Generator) has been developed; it is able to provide 6 concurrent sequences of triggers with different probabilities.
This paper will describe the hardware that generates the binary stream used as trigger and the software algorithms to create the sequences and to control the hardware. It will describe the tests performed in the laboratory on the random trigger generator to confirm its correct behavior and the details of the installation in the counting room of ALICE where it provides the triggers for all the sub-detectors.
It will also discuss the configurations used to simulate several trigger combinations likely to happen with the real beam.
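The idea of several concurrent trigger sequences with different probabilities can be illustrated in software. The sketch below is not the RTG hardware algorithm; it assumes independent sequences in which each clock cycle fires with a fixed probability, and the six probability values are purely illustrative.

```python
import random


def trigger_sequences(probabilities, n_clock_cycles, seed=None):
    """Return one list of 0/1 values per trigger class; a 1 marks a fired trigger.

    Assumption: each sequence is an independent Bernoulli process.
    """
    rng = random.Random(seed)
    return [
        [1 if rng.random() < p else 0 for _ in range(n_clock_cycles)]
        for p in probabilities
    ]


# Six hypothetical per-cycle firing probabilities (illustrative values only).
PROBABILITIES = [0.5, 0.2, 0.1, 0.05, 0.01, 0.001]

sequences = trigger_sequences(PROBABILITIES, n_clock_cycles=100_000, seed=42)

# Measured firing rate of each sequence, to compare with the requested one.
rates = [sum(seq) / len(seq) for seq in sequences]
```

Comparing `rates` with `PROBABILITIES` is the software analogue of a laboratory check that each sequence fires at the requested rate.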
Electronic Calibration of the ATLAS LAr Calorimeter
The Liquid Argon (LAr) calorimeter is a key detector component in the ATLAS experiment at the LHC, designed to provide precision measurements of electrons, photons, jets and missing transverse energy. A critical element in the precision measurement is the electronic calibration.
The LAr calorimeter has been installed in the ATLAS cavern and filled with liquid argon since 2006. The electronic calibration of the readout system has been continuously exercised in the commissioning phase, resulting in a fully commissioned calorimeter with its readout and a small number of problematic channels. A total of only 0.02% of the readout channels are dead beyond repair and 0.4% need special treatment for calibration. Throughout the last two years, a large amount of calibration data has been collected. We present here the LAr electronic calibration scheme, large scale acquisition and processing of the calibration data, the measured stability of the pedestal, the pulse shape and the gain, and the expected calibration procedure for LHC running. Various problems observed and addressed during the commissioning phase will also be discussed.
Dr Martin Aleksa (for the LAr conference committee)
Enhancing GridFTP and GPFS performances using intelligent deployment
Many High Energy Physics experiments must share and transfer large volumes of data. Therefore, the maximization of data throughput is a key issue, requiring detailed analysis and setup optimization of the underlying infrastructure and services. In Grid computing, the data transfer protocol called GridFTP is widely used for efficiently transferring data in conjunction with various types of file systems. In this paper, we focus on the interaction and performance issues in a setup that combines a GridFTP server with the IBM General Parallel File System (GPFS), adopted for providing storage management and capable of handling petabytes of data and billions of files. A typical issue is the size of the data blocks read from disk by the GridFTP server version 2.3, which can potentially impair the data transfer threshold achievable with an IBM GPFS data block. We propose an experimental deployment of the GridFTP server on a Scientific Linux CERN 4 (SLC4) 64-bit platform, with the GridFTP server and IBM GPFS running over a Storage Area Network (SAN) infrastructure, aimed at improving data throughput and serving distributed remote Grid sites. We present the results of data-transfer measurements, such as CPU load, network utilization, and data read and write rates, obtained by performing several tests at the INFN Tier1, where the described deployment has been set up. During this activity, we have verified a significant improvement of the GridFTP performance (of almost 50%) on SLC4 64-bit over SAN, saturating the Gigabit link with a very low CPU load.
Experience with LHCb alignment software on first data
We report results obtained with different track-based
algorithms for the alignment of the LHCb detector with first
data. The large-area Muon Detector and Outer Tracker have been
aligned with a large sample of tracks from cosmic rays. The
three silicon detectors --- VELO, TT-station and Inner Tracker
--- have been aligned with beam-induced events from the LHC
injection line. We compare the results from the track-based
alignment with expectations from detector survey.
Marc Deissenroth
Experimental validation of the Geant4 ion-ion models for carbon beams interaction at the hadron-therapy energy range (0 - 400 AMeV)
Geant4 is a Monte Carlo toolkit describing transport and interaction of particles with matter. Geant4 covers all particles and materials, and its geometry description allows for complex geometries.
Initially focused on high energy applications, the use of Geant4 is also growing in different fields such as radioprotection, dosimetry, space radiation and external radiotherapy with proton and carbon beams.
External radiotherapy using ion beams presents many advantages, both in terms of dose distributions and in biological efficiencies, compared to conventional electron or photon beams as well as to proton therapy. Nevertheless, an efficient and proper use of ions for patient irradiation requires a very accurate understanding of the complex processes governing interactions of ions with matter for both electromagnetic and hadronic interactions.
In particular, accurate knowledge of the production of secondary neutral and charged particles is of fundamental importance, as it is strictly related to the biological dose released in tissues. The dose released in an ion-therapy treatment cannot, in fact, be correctly evaluated without this information.
It has moreover been demonstrated that both experimental data (in terms of accurate double differential production cross sections) and validated nucleus-nucleus models are lacking in the particle and energy ranges typical of hadron-therapy applications: light incident ions (up to carbon) at energies between 0 and 400 AMeV.
In this work we report and discuss a set of specific validations we performed to test some of the nucleus-nucleus models currently provided in Geant4. Double differential production cross sections of neutrons and charged particles from 12C beams on different thin targets, obtained using alternative Geant4 models, will be compared to existing published data and to new data acquired by our group in a dedicated experiment performed at INFN/LNS.
Expression and cut parser for CMS event data
We present a parser to evaluate expressions and boolean selections that is applied to CMS event data for event filtering and analysis purposes. The parser is based on a Boost Spirit grammar definition and uses the Reflex dictionary for class introspection. The parser allows a natural definition of expressions and cuts in the user's configuration, and provides good run-time performance compared to other existing parsers.
(INFN Sezione di Napoli)
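The concept of evaluating a textual cut against an object through introspection can be sketched in a few lines. This is not the CMS parser, which is written in C++ with Boost Spirit and Reflex; the sketch below assumes Python syntax for the cut string (`and`, `or`, `not`), uses the standard `ast` module as the grammar, and resolves variable names with `getattr`. The `Muon` class and its attributes are hypothetical.

```python
import ast
import operator
from dataclasses import dataclass

_BIN_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
            ast.Mult: operator.mul, ast.Div: operator.truediv}
_CMP_OPS = {ast.Lt: operator.lt, ast.LtE: operator.le, ast.Gt: operator.gt,
            ast.GtE: operator.ge, ast.Eq: operator.eq, ast.NotEq: operator.ne}
_FUNCS = {"abs": abs, "min": min, "max": max}


def evaluate(expression, obj):
    """Evaluate an arithmetic or boolean expression against the attributes of obj."""
    return _eval(ast.parse(expression, mode="eval").body, obj)


def _eval(node, obj):
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return node.value
    if isinstance(node, ast.Name):
        # Introspection step: a bare name is looked up as an attribute of obj.
        return getattr(obj, node.id)
    if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
        return _BIN_OPS[type(node.op)](_eval(node.left, obj), _eval(node.right, obj))
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
        return -_eval(node.operand, obj)
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.Not):
        return not _eval(node.operand, obj)
    if isinstance(node, ast.BoolOp):
        values = (_eval(v, obj) for v in node.values)
        return all(values) if isinstance(node.op, ast.And) else any(values)
    if isinstance(node, ast.Compare) and all(type(op) in _CMP_OPS for op in node.ops):
        left = _eval(node.left, obj)
        for op, comparator in zip(node.ops, node.comparators):
            right = _eval(comparator, obj)
            if not _CMP_OPS[type(op)](left, right):
                return False
            left = right
        return True
    if (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
            and node.func.id in _FUNCS):
        return _FUNCS[node.func.id](*[_eval(arg, obj) for arg in node.args])
    raise ValueError(f"unsupported syntax: {ast.dump(node)}")


@dataclass
class Muon:  # hypothetical event-data class, for illustration only
    pt: float
    eta: float
    charge: int


mu = Muon(pt=25.0, eta=-1.2, charge=-1)
passes = evaluate("pt > 20 and abs(eta) < 2.4", mu)
```

Restricting the evaluator to a whitelist of node types keeps a user-supplied cut string from executing arbitrary code, which is one reason to prefer a dedicated grammar over a general `eval`.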
Fast Simulation of the CMS detector at the LHC
The experiments at the Large Hadron Collider (LHC) will start their search for answers to some of the remaining puzzles of particle physics in 2008. All of these experiments rely on a very precise Monte Carlo Simulation of the physical and technical processes in the detectors.
A fast simulation has been developed within the CMS experiment, which is between 100 and 1000 times faster than its Geant4-based counterpart, at the same level of accuracy. The fast simulation is already essential for the analyses carried out in CMS, because it facilitates studies of high statistics physics backgrounds and systematic errors that would otherwise be impossible to evaluate.
Its simple and flexible design will be a major asset toward a quick and accurate tuning on the first data.
The methods applied in the fast simulation, both software- and physics-wise, are outlined. This includes the concepts of simulating the interaction of particles with the detector material and the response of the various parts of the detector, namely the silicon tracker, the electromagnetic and hadron calorimeters and the muon system.
(University of Rochester)
Fit of weighted histograms in the ROOT framework.
Weighted histograms are often used for the estimation of probability density functions in High Energy Physics. The bin contents of a weighted histogram can be considered as a sum of random variables with a random number of terms. A generalization of Pearson’s chi-square statistic for weighted histograms and for weighted histograms with unknown normalization has been recently proposed by the first author. The use of these statistics provides the possibility of fitting the parameters of probability density functions. A new implementation of this statistical method has been recently realized within the ROOT statistical framework using the MINUIT algorithm for minimization. We will describe this statistical method and its new implementation, including some examples of applications. A numerical investigation is presented for fitting various histograms with different numbers of events. Restrictions related to the application of the procedure to histograms with small event statistics are also discussed.
(CERN), Prof. Nikolai GAGUNASHVILI
(University of Akureyri, Iceland)
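The statement that a bin content is a sum of random variables can be made concrete. The sketch below does not implement the generalized chi-square statistic of the abstract; it only shows, under the usual assumption that the sum of squared weights estimates the variance of a bin content, the two per-bin quantities that any fit of a weighted histogram starts from. The function name and the binning convention are illustrative.

```python
def fill_weighted_histogram(values, weights, n_bins, low, high):
    """Return (sum_w, sum_w2): per-bin sums of weights and of squared weights.

    sum_w[i] is the bin content; sum_w2[i] is a common estimate of its variance.
    Entries outside [low, high) are ignored.
    """
    sum_w = [0.0] * n_bins
    sum_w2 = [0.0] * n_bins
    width = (high - low) / n_bins
    for x, w in zip(values, weights):
        if low <= x < high:
            # Guard against floating-point rounding at the upper edge.
            i = min(int((x - low) / width), n_bins - 1)
            sum_w[i] += w
            sum_w2[i] += w * w
    return sum_w, sum_w2


contents, variances = fill_weighted_histogram(
    values=[0.1, 0.2, 0.6, 1.5],
    weights=[1.0, 2.0, 0.5, 3.0],
    n_bins=2, low=0.0, high=1.0,
)
```

For unit weights the two lists coincide, which recovers the familiar Poisson case where the variance of a bin equals its content.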
Forget multicore. The future is manycore - An outlook to the explosion of parallelism likely to occur in the LHC era.
This talk will start by reminding the audience that Moore's law is very much alive. Transistor counts will continue to double with every new silicon generation, every other year. Chip designers are therefore trying every possible "trick" for putting the transistors to good use. The most notable one is to push more parallelism into each CPU: more and longer vectors, more parallel execution units, more cores and more hyperthreading inside each core. In addition, highly parallel graphics processing units (GPUs) are also entering the game and compete efficiently with CPUs in several computing fields. The speaker will try to predict the CPU dimensions we will reach during the LHC era, based on what we have seen in the recent past and the projected roadmap for silicon. He will also discuss the impact on HEP event processing software. Can we continue to rely on event-level parallelism at the process level, or do we need to move to a new software paradigm? Finally, he will show several examples of successful threading of HEP software.
Geant4 models for simulation of multiple scattering
The process of multiple scattering of charged particles is an important component of Monte Carlo transport. At high energy it defines the deviation of particles from ideal tracks and limits the spatial resolution. Multiple scattering of low-energy electrons defines the energy response and resolution of electromagnetic calorimeters. Recent progress in the development of multiple scattering models within the Geant4 toolkit is presented. The default Geant4 model is based on the Lewis approach and is tuned to the available data. In order to understand the precision of this model and to provide more precise alternatives, new developments were carried out. The single Coulomb scattering model samples each elastic collision of a charged particle. This model is adequate for low-density media. It is combined with the new multiple scattering model based on the Wentzel scattering function. This model is intended for muons and hadrons. Another new alternative model, based on the Goudsmit-Saunderson formalism, has been developed for sampling of electron transport. Comparisons with the data are shown. The trade-off between precision and CPU performance is discussed with a focus on LHC detector simulation.
HEP Specific Benchmarks of Virtual Machines on multi-core CPU Architectures
Virtualization technologies such as Xen can be used in order to satisfy the disparate and often incompatible system requirements of different user groups in shared-use computing facilities. This capability is particularly important for HEP applications, which often have restrictive requirements. The use of virtualization adds flexibility; however, it is essential that the virtualization technology place little overhead on the HEP application. We present an evaluation of the practicality of running HEP applications in multiple Virtual Machines (VMs) on a single multi-core Linux system. We use the benchmark suite used by the HEPiX CPU Benchmarking Working Group to give a quantitative evaluation relevant to the HEP community. Benchmarks are packaged inside VMs, and then the VMs are booted onto a single multi-core system. Benchmarks are then simultaneously executed on each VM to simulate highly loaded VMs running HEP applications. These techniques are applied to a variety of multi-core CPU architectures and VM configurations.
(University of Victoria)
HepMCAnalyser - a tool for MC generator validation
HepMCAnalyser is a tool for generator validation and comparisons.
It is a stable, easy-to-use and extendable framework
allowing for easy access/integration to generator level analysis.
It comprises a class library with benchmark physics processes to analyse
HepMC generator output and to fill ROOT histograms. A web-interface is
provided to display all or selected histograms, compare
to references and validate the results based on Kolmogorov Tests.
Steerable example programs can be used for event generation.
The default steering is tuned to optimally align the distributions of the different generators.
The tool will be used for generator validation by the Generator Services
(GENSER) LCG project e.g. for version upgrades. It is supported on the same
platforms as the GENSER libraries and is already in use at ATLAS.
(University of Goettingen)
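A Kolmogorov-type comparison of a histogram with its reference can be sketched as follows. This is not the code used by HepMCAnalyser; it assumes two histograms with identical binning and computes only the Kolmogorov distance, that is, the largest difference between the two normalized cumulative distributions. Converting the distance into a probability, and choosing a validation threshold, are left out.

```python
from itertools import accumulate


def kolmogorov_distance(hist, reference):
    """Maximum absolute difference between the normalized cumulative
    distributions of two histograms given as lists of bin contents."""
    if len(hist) != len(reference):
        raise ValueError("histograms must have the same number of bins")
    total_h, total_r = sum(hist), sum(reference)
    if total_h <= 0 or total_r <= 0:
        raise ValueError("histograms must have a positive integral")
    cdf_h = [c / total_h for c in accumulate(hist)]
    cdf_r = [c / total_r for c in accumulate(reference)]
    return max(abs(a - b) for a, b in zip(cdf_h, cdf_r))


# Two hypothetical four-entry histograms with the same three bins.
distance = kolmogorov_distance([1, 1, 2], [2, 1, 1])
```

A distance of 0 means the two shapes agree bin by bin after normalization, while a distance of 1 means the two histograms do not overlap at all.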
High availability using virtualization
High availability has always been one of the main problems for a data center. Until now, high availability has been achieved by host-by-host redundancy, a highly expensive method in terms of hardware and human costs. A new approach to the problem can be offered by virtualization.
Using virtualization, it is possible to achieve a redundancy system for all the services running in a data center. This new approach to high availability makes it possible to distribute the running virtual machines over only those servers that are up and running, by exploiting the features of the virtualization layer: start, stop and move virtual machines between physical hosts.
The system (3RC) is based on a finite state machine, providing the possibility to restart each virtual machine on any physical host, or to reinstall it from scratch. A complete infrastructure has been developed to install the operating system and middleware in a few minutes. To virtualize the main servers of a data center, a new procedure has been developed to migrate physical hosts to virtual ones.
The whole SNS-PISA Grid data center is currently running in a virtual environment under the high availability system.
As an extension of the 3RC architecture, several storage solutions, from NAS to SAN, have been tested to store and centralize all the virtual disks, to guarantee data safety and access from everywhere.
Exploiting virtualization and the ability to automatically reinstall a host, we provide a sort of host on demand, where action on a virtual machine is performed only when a disaster occurs.
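A finite state machine of the kind mentioned above can be written as a transition table. The abstract does not list the states or events used by 3RC, so every state and event name below is hypothetical; the sketch only illustrates the restart-first, reinstall-as-fallback policy described in the text.

```python
# Hypothetical states and events; the real 3RC state machine is not documented here.
TRANSITIONS = {
    ("running", "failure_detected"): "failed",
    ("failed", "restart"): "restarting",
    ("restarting", "boot_ok"): "running",
    ("restarting", "boot_failed"): "reinstalling",
    ("reinstalling", "install_ok"): "running",
}


def next_state(state, event):
    """Return the state reached from `state` on `event`, or raise ValueError."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"no transition from {state!r} on {event!r}") from None


def run(events, state="running"):
    """Feed a sequence of events to the machine and return the final state."""
    for event in events:
        state = next_state(state, event)
    return state


# A virtual machine fails, cannot be restarted, and is reinstalled from scratch.
final_state = run(["failure_detected", "restart", "boot_failed", "install_ok"])
```

Keeping the policy in a table rather than in nested conditionals makes it easy to audit which recovery actions are possible from each state.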
ILCSoft reconstruction software for the ILD Detector Concept at ILC
The International Linear Collider is proposed as the next large accelerator project in High Energy Physics. The ILD Detector Concept Study is one of three international groups working on designing a detector to be used at the ILC. The ILD Detector is being optimised to employ the so-called Particle Flow paradigm. Such an approach means that hardware alone will not be able to realise the full resolution of the detector, placing a much greater significance on the reconstruction software than has traditionally been the case at previous lepton colliders. This means that it is imperative that the detector is optimised using a full reconstruction chain employing prototypes of Particle Flow Algorithms. To meet this requirement ILD has assembled a full reconstruction suite of algorithms contained in the software package ILCSoft, ranging from low level digitisation through to higher level event analysis, such as jet finders and vertexing. The reconstruction software in ILCSoft uses the modular C++ application framework Marlin, which is based on the international data format LCIO. ILCSoft also contains reconstruction packages for the detector prototype test beam studies with the EUDET project. Having developers create reconstruction software for both the full detector and prototype studies within one single package maximises the application of algorithms. In this talk we give an overview of the reconstruction software in ILCSoft.
Implementation of a Riemann Helical Fit for GlueX track reconstruction
The future GlueX detector in Hall D at Jefferson Lab is a large acceptance (almost 4pi) spectrometer
designed to facilitate the study of the excitation of the gluonic field
binding quark--anti-quark pairs into mesons.
A large solenoidal magnet will provide a 2.2-Tesla field that will be used
to momentum-analyze the charged particles emerging from a liquid hydrogen
target. The trajectories
of forward-going particles will be measured with a set of
four planar cathode strip drift chamber packages with six layers per package.
The design naturally separates the track into segments where the magnetic
field is relatively constant, thereby opening up the possibility of performing
local helical fits to the data within individual packages. We have
implemented the Riemann Helical Fit algorithm to fit the track segments.
The Riemann Helical Fit is a fast and elegant algorithm combining a circle fit
for determining the transverse momentum and a line fit for determining the dip angle and
initial z value that does not require computation of any derivative matrices.
The track segments are then linked together by swimming through the field from
one package to the next to form track candidates. A comparison between
the Riemann Circle Fit and a simple linear regression method that assumes that
the origin is on the circle will be presented. A comparison between
the Riemann Helical Fit and a full least-squares fit with a non-uniform
magnetic field will also be presented.
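The simple linear regression used as a baseline above can be sketched under one reading of the abstract: a circle constrained to pass through the origin satisfies x^2 + y^2 = 2ax + 2by, which is linear in the centre coordinates (a, b). The code below is an illustration of that idea, not the GlueX implementation and not the Riemann fit itself. The conversion from radius to transverse momentum uses the standard relation p_T [GeV/c] = 0.299792458 B [T] R [m] for a unit-charge particle.

```python
import math


def fit_circle_through_origin(points):
    """Least-squares fit of a circle passing through (0, 0).

    Minimises sum((x^2 + y^2 - 2*a*x - 2*b*y)^2) over the centre (a, b)
    and returns (a, b, radius).
    """
    sxx = sxy = syy = sxr = syr = 0.0
    for x, y in points:
        r2 = x * x + y * y
        sxx += x * x
        sxy += x * y
        syy += y * y
        sxr += x * r2
        syr += y * r2
    det = sxx * syy - sxy * sxy
    if abs(det) < 1e-12:
        raise ValueError("points are collinear with the origin; fit is degenerate")
    # Normal equations: 2*a*sxx + 2*b*sxy = sxr and 2*a*sxy + 2*b*syy = syr.
    a = (sxr * syy - syr * sxy) / (2.0 * det)
    b = (syr * sxx - sxr * sxy) / (2.0 * det)
    return a, b, math.hypot(a, b)


def transverse_momentum(radius_m, field_tesla):
    """p_T in GeV/c for a unit-charge particle in a uniform field."""
    return 0.299792458 * field_tesla * radius_m


# Points lying exactly on the circle with centre (3, 4) and radius 5.
a, b, radius = fit_circle_through_origin([(6, 0), (0, 8), (6, 8), (8, 4)])
pt = transverse_momentum(radius_m=1.0, field_tesla=2.2)
```

Because the model is linear in (a, b), no iteration and no derivative matrices are needed, which is the same practical advantage the abstract attributes to the Riemann fit.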
Improving collaborative documentation in CMS
Complete and up-to-date documentation is essential for efficient data analysis in a large and complex collaboration like CMS. Good documentation reduces the time spent in problem solving
for users and software developers.
The scientists in our research environment do not necessarily have the interests or skills of professional technical writers. This results in inconsistencies in the documentation. To improve the quality, we have
started a multidisciplinary project involving CMS user support and expertise in technical communication from the University of Turku, Finland.
In this paper, we present possible approaches to study the usability of the documentation, for instance, usability tests conducted recently for the CMS software and computing user documentation.
(Helsinki Institute of Physics HIP)
INSPIRE: a new scientific information system for HEP
The status of high-energy physics (HEP) information systems has been jointly analyzed by the libraries of CERN, DESY, Fermilab and SLAC. As a result, the four laboratories have started the INSPIRE project – a new platform built by moving the successful SPIRES features and content, curated at DESY, Fermilab and SLAC, into the open-source CDS Invenio digital library software that was developed at CERN.
INSPIRE will integrate present acquisition workflows and databases to host the entire body of the HEP literature (about one million records), aiming to become the reference HEP scientific information platform worldwide. It will provide users with fast access to full-text journal articles and preprints, but also material such as conference slides and multimedia. INSPIRE will empower scientists with new tools to discover and access the results most relevant to their research, enable novel text- and data-mining applications, and deploy new metrics to assess the impact of articles and authors. In addition, it will introduce the "Web 2.0" paradigm of user-enriched content in the domain of sciences, with community-based approaches to scientific publishing.
INSPIRE represents a natural evolution of scholarly communication built on successful community-based information systems, and it provides a vision for information management in other fields of science. Inspired by the needs of HEP, we hope that the INSPIRE project will be inspiring for other communities.
LHC First Beam Event Display at CMS from online to the World Press - the first 3 minutes
Geneva, 10 September 2008. The first beam in the Large Hadron Collider at CERN was successfully steered around the full 27 kilometers of the world's most powerful particle accelerator at 10h28 this morning. This historic event marks a key moment in the transition from over two decades of preparation to a new era of scientific discovery. (http://www.interactions.org/cms/?pid=1026796)
From 9:44 am CET the attention of the CMS physicists in the control room is drawn to the CMS event display - the "eyes" of the detector. We observe the tell-tale splash events, the beam gas and beam halo muons. We see in real time how the beam events become cleaner and cleaner as the beam is corrected.
The article describes the key component of the CMS event display: IGUANA - a well-established generic interactive visualisation framework based on a C++ component model and open-source graphics products. We describe developments since the last CHEP, including: online displays of the first real beam gas and beam halo data from the LHC first beam, flexible interactive configuration, integration with the CMSSW framework, event navigation and filtering. We give an overview of the deployment and maintenance procedures in the commissioning and early detector operation, and of how the lessons learnt help us in getting ready for collisions.
MDT data quality assessment at the Calibration centre for the ATLAS experiment at LHC
ATLAS is a large multipurpose detector, presently in the final phase of construction at LHC, the CERN Large Hadron Collider accelerator. In ATLAS, muon detection is performed by a huge magnetic spectrometer, built with the Monitored Drift Tube (MDT) technology. It consists of more than 1,000 chambers and 350,000 drift tubes, which have to be controlled to a spatial accuracy better than 10 micrometers and an efficiency close to 100%. Automated monitoring of the detector is therefore an essential aspect of the operation of the spectrometer. The quality procedure collects data from online and offline sources and from the "calibration stream" at the calibration centres, situated in Ann Arbor (Michigan), MPI (Munich) and INFN Rome. The assessment at the Calibration Centres is performed using the DQHistogramAnalyzer utility of the Athena package. This application checks the histograms in an automated way and, after a further inspection with a human interface, reports results and summaries. In this study a complete description of the entire chain, from the calibration stream up to the database storage, is presented. Special algorithms have been implemented in the DQHistogramAnalyzer for the Monitored Drift Tube chambers. A detailed web display is provided for easy data quality consultation. The analysis flag is stored inside an Oracle Database using the COOL LCG library, through a C++ object-oriented interface. This quality flag is compared with the online and offline results, produced in a similar way, and the final decision is stored in a DB using a standalone C++ tool. The final DB, which uses the same COOL technology, is accessed by the reconstruction and analysis programs.
Monte Carlo simulations of spallation experiments
Monte Carlo codes MCNPX and FLUKA are used to analyze the experiments on
simplified Accelerator Driven Systems, which are performed at the Joint
Institute for Nuclear Research Dubna. In these experiments, protons or
deuterons with energies in the GeV range are directed onto thick lead
targets surrounded by different moderators and neutron multipliers. Monte
Carlo simulations of these complex systems are performed using PBS and MPI
parallelization. The processing power of some systems and experience with
such types of parallelization are presented.
(Nuclear Physics institute AS CR, Rez)
Multi-threaded Event Reconstruction with JANA
Multi-threading is a tool that is not only well suited to high statistics
event analysis, but is particularly useful for taking advantage of the
next generation many-core CPUs. The JANA event processing framework has
been designed to implement multi-threading through the use of POSIX
threads. Thoughtful implementation allows reconstruction packages to be
developed that are thread enabled while requiring little or no knowledge
of thread programming by the reconstruction code authors. How this design
goal is achieved along with test results showing rate scaling for CPU bound
jobs as well as improved performance on I/O bound jobs will be shown.
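The design goal stated above, reconstruction code that needs no knowledge of thread programming, can be illustrated with a small sketch. JANA itself is a C++ framework built on POSIX threads; the Python code below only mirrors the division of labour, with the threading confined to a framework-side function and the reconstruction written as a plain function of one event. The event layout and function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def reconstruct(event):
    """Author-side code: a plain function of one event, with no thread logic.

    It touches no shared mutable state, which is what makes it safe to call
    from several threads at once.
    """
    hits = event["hits"]
    return {"id": event["id"], "n_hits": len(hits), "energy": sum(hits)}


def process_events(events, n_threads=4):
    """Framework-side code: distribute events over a pool of worker threads."""
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        # map() hands events to free threads and returns results in input order.
        return list(pool.map(reconstruct, events))


# Hypothetical events: event i carries i hits of unit energy.
events = [{"id": i, "hits": [1.0] * i} for i in range(5)]
results = process_events(events)
```

In standard CPython the global interpreter lock limits the speed-up for CPU-bound pure-Python work, so this sketch shows the structure rather than the rate scaling reported for the C++ framework.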
Muon identification procedure for the ATLAS detector at the LHC using Muonboy reconstruction package and tests of its performance using cosmic rays and single beam data
ATLAS is one of the four experiments at the Large Hadron Collider (LHC) at CERN. This experiment has been designed to study a large range of physics including searches for previously unobserved phenomena such as the Higgs Boson and supersymmetry. The ATLAS Muon Spectrometer (MS) is optimized to measure final state muons in a large momentum range, from a few GeV up to TeV. Its momentum resolution varies from 2-3% at 10-100 GeV/c to 10% at 1 TeV, taking into account the high level background environment, the inhomogeneous magnetic field, and the large size of the apparatus (24 m diameter by 44 m length). Robust muon identification and high momentum measurement accuracy are crucial to fully exploit the physics potential of the LHC.
The basic principles of the muon reconstruction package "Muonboy" are discussed in this paper. Details of the modifications done in order to adapt the pattern recognition to the cosmic-ray configuration as well as its performance with the recent cosmic-rays and single beam data are presented.
Network Information and Monitoring Infrastructure (NIMI)
Fermilab is a high energy physics research lab that maintains a highly dynamic
network which typically supports around 15,000 active nodes.
Due to the open nature of the scientific research conducted at FNAL,
the portion of the network used to support open scientific research
requires high bandwidth connectivity to numerous collaborating institutions
around the world, and must facilitate convenient access by scientists
at those institutions. Network Information and Monitoring
Infrastructure (NIMI) is a framework built to help network management
personnel and the computer security team monitor and manage the FNAL network.
This includes the portions of the network used to support open scientific
research as well as the portions for more tightly controlled administrative
and scientific support activities.
As an infrastructure, NIMI has been used to build such applications as Node Directory, Network Inventory Database and Computer Security Issue Tracking System (TIssue). These applications have been successfully used by FNAL
Computing Division personnel to manage the local network, maintain the necessary
level of protection of LAN participants against external threats and
promptly respond to computer security incidents.
The article will discuss NIMI structure, functionality of major NIMI-based
applications, history of the project, its current status and future plans.
New development of CASTOR at IHEP
Some large experiments at IHEP will generate more than 5 Petabytes of data in the next few years, which brings great challenges for data analysis and storage. CERN CASTOR version 1 was first deployed at IHEP in 2003, but it now struggles to meet the new requirements. Taking into account issues of management, commercial software, etc., we did not upgrade CASTOR from version 1 to version 2. Instead, based on CASTOR version 1 and MySQL, we developed new open-source software with good scalability, high performance and easy-to-use features. This paper introduces our requirements, the design and implementation of the new stager, which we call DCC (disk cache for CASTOR), MySQL 5.x compatibility, LTO4 tape support, deployment, monitoring, alerting and so on. DCC adopts a database-centric architecture just like the CASTOR version 2 stager, which makes it more modular and flexible. The detailed design and performance measurements of DCC will also be described in this paper.
(Institute of High Energy Physics,Chinese Academy of Sciences)
New Developments in File-based Infrastructure for ATLAS Event Selection
In ATLAS software, TAGs are event metadata records that can be stored in various technologies, including ROOT files and relational databases. TAGs are used to identify and extract events that satisfy certain selection predicates, which can be coded as SQL-style queries.
Several new developments in file-based TAG infrastructure are presented.
TAG collection files support in-file metadata to store information describing all events in the collection. Event Selector functionality has been augmented to provide such collection-level metadata to subsequent algorithms.
The ATLAS I/O framework has been extended to allow computational processing of TAG attributes to select or reject events without reading the event data. This capability enables physicists to use more detailed selection criteria than are feasible in an SQL query. For example, the TAGs contain enough information not only to check the number of electrons, but also to calculate their distance to the closest jet--a calculation that would be difficult to express in SQL.
Another new development allows ATLAS to write TAGs directly into event data files. This feature can improve performance by supporting advanced event selection capabilities, including computational processing of TAG information, without the need for external TAG file or database access.
Dr Peter Van Gemmeren
(Argonne National Laboratory)
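The electron-to-jet distance example mentioned above shows why computational selection goes beyond what is convenient in SQL. The sketch below assumes a hypothetical TAG record in which electrons and jets are stored as (eta, phi) pairs, and an illustrative isolation threshold; neither the attribute names nor the values come from the ATLAS TAG schema.

```python
import math


def delta_r(eta1, phi1, eta2, phi2):
    """Angular distance sqrt(d_eta^2 + d_phi^2), with d_phi wrapped to [-pi, pi)."""
    dphi = (phi1 - phi2 + math.pi) % (2.0 * math.pi) - math.pi
    return math.hypot(eta1 - eta2, dphi)


def select(tag, min_electrons=1, min_delta_r=0.4):
    """Accept an event if it has enough electrons, each far from every jet."""
    electrons = tag["electrons"]
    jets = tag["jets"]
    if len(electrons) < min_electrons:
        return False
    for e_eta, e_phi in electrons:
        closest = min(
            (delta_r(e_eta, e_phi, j_eta, j_phi) for j_eta, j_phi in jets),
            default=math.inf,  # no jets: the electron is trivially isolated
        )
        if closest < min_delta_r:
            return False
    return True


# Hypothetical TAG records holding only (eta, phi) pairs.
isolated = {"electrons": [(0.0, 0.0)], "jets": [(1.0, 2.0)]}
overlapping = {"electrons": [(0.0, 0.0)], "jets": [(0.0, 0.1)]}
accepted = [select(isolated), select(overlapping)]
```

The decision uses only the metadata record, so a rejected event never needs to be read from the event data file.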
Physics and Software validation for ATLAS
The ATLAS experiment recently entered the data taking phase, with the
focus shifting from software development to validation.
The ATLAS software has to be both robust enough to process large datasets and
to produce the high-quality output needed for the experiment's scientific
exploitation. The validation process is discussed in this talk,
starting from the validation of the nightly builds and pre-releases to
the final validation of software releases used for data taking and
A few thousand events are processed every day using the most recent
nightly build and physics and technical histograms are processed
automatically. New versions of the software are released every 3 weeks
and are validated using a set of 100K events that are monitored by
people appointed by each of the ATLAS subsystems. Patch versions of the
software can be deployed at the ATLAS Tier0 and on the grid within a
12-24 hour cycle, and a crew of validation shifters continuously
monitors bug reports that are submitted by the operation teams.
(IFAE Barcelona), Davide Costanzo
(University of Sheffield), Iacopo Vivarelli
(INFN and University of Pisa), Manuel Gallas
Pixel detector Data Quality Monitoring in CMS
The silicon pixel detector in CMS contains approximately 66 million
channels, and will provide extremely high tracking resolution for the experiment. To ensure the data collected is valid, it must be monitored continuously at all levels of acquisition and reconstruction. The Pixel Data Quality Monitoring process ensures that the detector, as well as the data acquisition and reconstruction chain, is functioning properly. It is critical that the monitoring process not only examine the pixel detector with high enough granularity such that potential problems can be identified and isolated, but also run quickly enough that action can be taken before much data is compromised. We present a summary of the software system we have developed to accomplish this task. We focus on the implementation designed to maximize the amount of available information, and the methodology by which we store persistent information such that known problems can be recorded and historical trends preserved.
(Dept. of Physics and Astronomy-Rutgers, State Univ. of New Jersey)
Powerfarm: a power and emergency management thread-based software tool for the ATLAS Napoli Tier2
The large potential storage and computing power available in modern grid and data centre infrastructures enables the development of the next generation grid-based computing paradigm, in which a large number of clusters are interconnected through high speed networks. Each cluster is composed of several, often hundreds of, computers and devices, each with its own specific role in the grid. In such a distributed environment, it is of critical importance to ensure and preserve the functioning of the data centre. It is therefore essential to have a management and fault recovery system that preserves the integrity of the systems both in the presence of serious faults, such as power outages or temperature peaks, and during maintenance operations. In this context, for the ATLAS INFN Napoli Tier2 and for the SCoPE project of the University “Federico II” of Napoli, we developed Powerfarm, a customizable thread-based software system that monitors several parameters, such as the status of power supplies and room and CPU temperatures, and promptly responds to out-of-range values with the appropriate actions. Powerfarm enforces hardware and software dependencies between devices and is able to switch them on/off in the order induced by the dependencies. Powerfarm makes use of specific parametric plugins in order to manage virtually any kind of device and represents the whole structure by means of XML configuration files. Powerfarm may thus become an indispensable tool for power and emergency management of modern grid and data centre infrastructures.
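The dependency-ordered switching described above can be sketched as a topological sort. This is a minimal Python illustration and not Powerfarm code: the device names and the dependency map are hypothetical, and the standard-library `graphlib.TopologicalSorter` (Python 3.9 or later) stands in for whatever ordering logic Powerfarm actually uses.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each device lists the devices it needs running first.
DEPENDENCIES = {
    "ups": [],
    "cooling": ["ups"],
    "network_switch": ["ups"],
    "storage_server": ["network_switch", "cooling"],
    "worker_node": ["storage_server"],
}


def power_on_order(dependencies):
    """Return devices so that each one appears after everything it depends on."""
    # static_order() raises graphlib.CycleError if the dependencies are circular.
    return list(TopologicalSorter(dependencies).static_order())


def power_off_order(dependencies):
    """Shut down in reverse: dependants go off before the devices they rely on."""
    return list(reversed(power_on_order(dependencies)))
```

Calling `power_off_order(DEPENDENCIES)` during an emergency would switch off the worker node first and the UPS-backed infrastructure last.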
R&D on co-working transport schemes in Geant4
An R&D project, named NANO5, has been recently launched at INFN to address fundamental methods in radiation transport simulation and revisit Geant4 kernel design to cope with new experimental requirements.
The project, which gathers an international team of collaborators, focuses on simulation at different scales in the same environment. This issue requires novel methodological approaches to radiation transport across the current boundaries of condensed-random-walk and discrete methods: the ability is needed to change the scale at which the problem is described and analyzed within a complex experimental set-up.
An exploration is also foreseen about exploiting and extending already existing Geant4 features to apply Monte Carlo and deterministic transport methods in the same simulation environment.
The new developments have been motivated by requirements in various physics domains, which challenge the conventional application domain of Monte Carlo transport codes like Geant4: ongoing R&D for nanotechnology-based tracking detectors for HEP experiments, radiation effects on components at high luminosity colliders and in space science, optimization of astrophysics instrumentation, nanodosimetry, investigations of new generation nuclear power sources etc.
The main features of the project are presented, together with the first prototype developments and results. A new concept introduced in the simulation – mutable physics entities (process, model or other physics-aware object), whose state and behavior depend on the environment and may evolve as an effect of it, is illustrated. The interdisciplinary nature of the R&D is described, highlighting the mutual benefits of collaborative contributions and beta-testing in HEP and other physics research domains.
Dr Maria Grazia Pia
RSC: tool for analysis modelling, combination and statistical studies
RSC is a software framework based on the RooFit technology and developed for the CMS experiment community. Its scope is to allow the modelling and combination of multiple analysis channels together with the accomplishment of statistical studies. This is performed through a variety of methods described in the literature and implemented as classes. The design of these classes is oriented to the execution of multiple CPU-intensive jobs on batch systems or on the Grid, facilitating the splitting of the calculations and the collection of the results. In addition, the production of plots by means of sophisticated formatting, drawing and graphics manipulation routines is provided transparently for the user.
Analyses and their combinations are characterised in configuration files, thus separating physics inputs from the C++ code. The deployment of such a feature eases the sharing of the input models among the analysis groups establishing common guidelines to summarise Physics results.
A maximum statistical advantage can be drawn from the analyses combination allowing the definition of common variables, constrained parameters and arbitrary correlations among the different quantities.
RSC is therefore meant to complement the existing analyses by means of their combination therewith obtaining earlier discoveries, sharper limits and more refined measurements of physically relevant quantities.
Simulations and software tools for the CMS Tracker at SLHC
The luminosity upgrade of the Large Hadron Collider (SLHC) is foreseen starting from 2013. An eventual factor-of-ten increase in LHC statistics will have a major impact in the LHC Physics program. However, the SLHC as well as offering the possibility to increase the physics potential will create an extreme operating environment for the detectors, particularly the tracking devices and the trigger system. An increase in the number of minimum-bias events by at least an order of magnitude beyond the levels envisioned for design luminosity creates the need to handle much higher occupancies and for the innermost layers unprecedented levels of radiation.
This will require a fully upgraded tracking system giving a higher granularity, while trying not to exceed the material budget and power levels of the current system, and a revision of the current trigger system. Additional trigger information from the rebuilt tracking system could reduce the L1 trigger rate or could be used earlier in the higher level triggers. Detailed simulations are needed to help in the design of the new Tracker and to study the possibility of including tracking information in the L1 trigger system. At the same time, the huge increase in pile-up events imposes severe constraints on the existing software, which needs to be optimized in order to produce realistic studies for SLHC.
StoRM-GPFS-TSM: a new approach to Hierarchical Storage Management for the LHC experiments
In the framework of WLCG, the Tier-1 computing centres have
very stringent requirements in the sector of the data storage,
in terms of size, performance and reliability.
For some years, at the INFN-CNAF Tier-1 we have been using
two distinct storage systems: Castor as tape-based storage
solution (also known as the D0T1 storage class in the WLCG language) and the General Parallel File System (GPFS), in conjunction with StoRM as an SRM service, for pure disk access (D1T0). Since 2008 we have been exploring the possibility of employing GPFS together with the tape management software TSM as a solution for realizing a tape-disk infrastructure, first implementing a
D1T1 storage class (files always on disk with a backup on tape), and then also a D0T1 (hence involving also active recalls of files from tape to disk). The first StoRM-GPFS-TSM D1T1 system is already in production at CNAF for the
LHCb experiment, while a prototype of D0T1 system is under
development and study. We describe the details of the new D1T1
and D0T1 implementations, discussing the differences between
the Castor-based solution and the StoRM-GPFS-TSM one. We also
present the results of some performance studies of the novel
D1T1 and D0T1 systems.
Swiss ATLAS Grid computing in preparation for the LHC collision data
Computing for ATLAS in Switzerland has two Tier-3 sites with several years of experience, owned by the Universities of Berne and Geneva. They have been used for ATLAS Monte Carlo production, centrally controlled via the NorduGrid, since 2005. The Tier-3 sites are under continuous development.
In the case of Geneva, the proximity of CERN leads to additional use cases related to the commissioning of the experiment. The work requires processing of the latest ATLAS data using the latest software under development, which is not distributed to grid sites. We rely on the AFS file system to access the software and we are planning to rely on the ATLAS Distributed Data Management for the latest data. An SRM interface will be installed in Geneva for this purpose.
The Swiss Tier-2 at the CSCS centre has a recent and powerful cluster, serving three LHC experiments, including ATLAS. The system features two implementations of the grid middleware, NorduGrid ARC and the LCG gLite, which operate simultaneously on the same resources.
In this talk we will present our implementation choices and our experience with hardware, middleware and ATLAS-specific grid software. We will discuss the requirements of our users and how we meet them. We will present the status of our work and our plans for the ATLAS data taking period in 2009.
(DPNC, University of Geneva)
The ATLAS b-Tagging Infrastructure
The ATLAS detector, one of the two collider experiments at the Large Hadron Collider, will take high energy collision data for the first time in 2009. A general purpose detector, its physics program encompasses everything from Standard Model physics to specific searches for beyond-the-standard-model signatures. One important aspect of separating the signal from large Standard Model backgrounds is the accurate identification of jets of particles originating from a bottom quark. This identification is a physics analysis in and of itself, and ATLAS has developed a series of algorithms based on the unique aspects of bottom quark decay (soft lepton association, long lifetime). This talk gives a brief overview of these algorithms and the software infrastructure required to support them in a production environment like the one found at ATLAS. Some attention will also be paid to the different perspectives of the algorithm writer, who wants to understand exactly how a jet is tagged as being from a bottom quark, and an analysis user, who is only curious to know if a jet is “tagged” and what the fake rate is.
(UNIVERSITY OF WASHINGTON), Dr Laurent Vacavant
The ATLAS Detector Digitization Project for 2009 data taking
The ATLAS digitization project is steered by a top-level Python digitization package which ensures uniform and consistent configuration across the subdetectors. The properties of the digitization algorithms were tuned to reproduce the detector response seen in lab tests, test beam data and cosmic ray running. Dead channels and noise rates are read from database tables to reproduce conditions seen in a particular run. The digits are then persistified as Raw Data Objects (RDO) with or without intermediate bytestream simulation depending on the detector type. Emphasis is put on the description of the digitization project configuration, its flexibility in event handling for processing and in the global detector configuration, as well as its variety of options including detector noise simulation, random number service, metadata and details of pile-up background events to be overlaid. The LHC beam bunch spacing is also configurable, as well as the number of bunch crossings to overlay and the default detector conditions (including noisy channels, dead electronics associated with each detector layout). Cavern background calculation, beam halo and beam gas treatment, and pile-up with real data are also part of this report.
(Dept. of Physics, Cavendish Lab.)
The CMS Tracker calibration workflow: experience with cosmic ray data
The CMS Silicon Strip Tracker (SST) consists of 25000 silicon microstrip sensors covering an area of 210 m^2, with 10 million readout channels. Starting in December 2007 the SST was inserted and connected inside the CMS experiment, and since summer 2008 it has been commissioned using cosmic muons with and without magnetic field. During this data taking the performance of the SST has been carefully studied: the noise of the detector, together with its correlations with the strip length and the temperature, the data integrity, the S/N ratio, the hit reconstruction efficiency and the calibration constants have all been monitored over time and under different conditions, at the full detector granularity. In this presentation an overview of the SST calibration workflow and the detector performance results will be given.
(Dipartimento di Fisica - Universita di Firenze)
The CMSSW benchmarking suite: using HEP code to measure CPU performance
The demanding computing needs of the CMS experiment require thoughtful planning and management of its computing infrastructure. A key factor in this process is the use of realistic benchmarks when assessing the computing power of the different architectures available. In recent years a discrepancy has been observed between the CPU performance estimates given by the reference benchmark for HEP computing (SPEC INT) and the actual performance of HEP code. Making use of the CPU performance tools from the CMSSW performance suite, comparative CPU performance studies have been carried out on several architectures. A benchmarking suite has been developed and integrated in the CMSSW framework, to allow computing centers and interested third parties to benchmark architectures directly with CMSSW. The CMSSW benchmarking suite can be used out of the box to test and compare several machines in terms of CPU performance and to report, with the desired level of detail, the different benchmarking scores (e.g. by processing step) and results. In this talk we describe briefly the CMSSW software performance suite, and in detail the CMSSW benchmarking suite client/server design, the performance data analysis and the choice and composition of the benchmark scores. The interesting issues encountered in the use of HEP code for benchmarking will be discussed and CMSSW benchmark results presented.
(CERN PH Dept (for the CMS collaboration))
The Effect of the Fragmentation Problem in Decision Tree Learning Applied to the Search for Single Top Quark Production
Decision tree learning constitutes a suitable approach to classification due to its ability to partition the input (variable) space into regions of class-uniform events, while providing a structure amenable to interpretation (as opposed to other methods such as neural networks). But an inherent limitation of decision tree learning is the progressive lessening of the statistical support of the final classifier as clusters of single-class events are split on every partition, a problem known as the fragmentation problem. We describe a software system that measures the degree of fragmentation caused by a decision tree learner on every event cluster. Clusters are found through a decomposition of the data using a technique known as Spectral Clustering. Each cluster is analyzed in terms of the number and type of partitions induced by the decision tree. Our domain of application lies in the search for single top quark production, a challenging problem due to large backgrounds (similar to W+jets and tt̄ events), low-energy signals, and a low number of jets. The output of the machine-learning software tool consists of a series of statistics describing the degree of classification error attributed to the fragmentation problem.
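One simple way to picture a per-cluster fragmentation statistic is sketched below in Python. It is an assumption-laden illustration, not the system described in the abstract: the function name, the input format (one cluster label and one decision-tree leaf label per event) and the two summary numbers are all hypothetical choices.

```python
from collections import Counter, defaultdict


def fragmentation_by_cluster(cluster_ids, leaf_ids):
    """Summarise how each event cluster is split across decision-tree leaves.

    Returns {cluster: {"n_leaves": int, "largest_fraction": float}}, where
    largest_fraction is the share of the cluster's events in its most populated leaf.
    """
    if len(cluster_ids) != len(leaf_ids):
        raise ValueError("cluster_ids and leaf_ids must have the same length")

    # Count, for every cluster, how many of its events land in each leaf.
    per_cluster = defaultdict(Counter)
    for cluster, leaf in zip(cluster_ids, leaf_ids):
        per_cluster[cluster][leaf] += 1

    summary = {}
    for cluster, leaves in per_cluster.items():
        total = sum(leaves.values())
        summary[cluster] = {
            "n_leaves": len(leaves),
            "largest_fraction": max(leaves.values()) / total,
        }
    return summary
```

A cluster spread over many leaves with a small `largest_fraction` is heavily fragmented, so each leaf that touches it has little statistical support from that cluster.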
The Introduction of Data Analysis System of MDC for BEPCII/BESIII
The BEPCII/BESIII (Beijing Electron Positron Collider / Beijing Spectrometer) was installed and operated successfully in July 2008 and has been in commissioning since Sep. 2008. The luminosity has now reached 1.3*10^32 cm^-2s^-1 at 489mA*530mA with 90 bunches. About 13M psi(2S) physics events have been collected by BESIII.
The offline data analysis system of BESIII has been tested and operated to handle the real experimental data. The data analysis system of the MDC (Main Drift Chamber) includes event reconstruction, track fitting, offline calibration, the event start time algorithm and Monte Carlo tuning between the MC data and real data. Among them, the Event Start Time Determination is the first step of charged track reconstruction in the MDC. It is an important process in the charged particle track reconstruction of the BESIII offline data analysis because of the multi-bunch colliding mode used in BEPCII, the pipeline arrangement of the trigger system used in the BESIII data acquisition system, and the special time measurement method used for the MDC electronics system.
The performance of the MDC software system, including the tracking efficiency, the CPU consumption, and the preliminary results of offline calibration and Monte Carlo tuning of the MDC for real experimental data, is presented. The preliminary performance of the MDC is as follows: the spatial resolution is about 128 μm and the momentum resolution is about 0.81%.
(Institute of High energy Physics, Chinese Academy of Sciences)
The LHCb track fitting concept and its performance
The reconstruction of charged particles in the LHCb tracking
systems consists of two parts. The pattern recognition links
the signals belonging to the same particle. The track fitter
running after the pattern recognition extracts the best
parameter estimate out of the reconstructed tracks. A dedicated
Kalman-Fitter is used for this purpose. The track model
employed in the fit is based on a trajectory concept originally
introduced by the BaBar collaboration, which has been further
developed and improved. To cope with various applications at
trigger level and in the offline reconstruction software, the
fitter has been designed to be very flexible so that it can be adapted to
the individual requirements in CPU time and resolution. E.g. a
simplified geometry model has been introduced which speeds up
the computation time of the fitter significantly, obtaining
almost identical resolution to the full geometry description.
We will report on the LHCb fitting concept and present its
current performance in various applications based on the latest
The new spectrometer for the challenging physics in the tau-charm energy region, BESIII, has been constructed and has entered the commissioning phase at BEPCII, the upgraded e+e- collider with peak luminosity up to 10^33cm^-2s^-1 in Beijing, China. The BESIII muon detector will mainly contribute to distinguishing muons from hadrons, especially pions. Resistive Plate Chambers (RPCs) are used in the BESIII muon detector. These RPCs work in the streamer mode and are made of a new type of bakelite material with melamine treatment instead of linseed oil treatment. The offline software of the BESIII muon detector has been developed successfully and preliminarily validated with cosmic ray data and Ψ(2S) data. We describe the ideas and implementation of the simulation, reconstruction and calibration packages. The detector commissioning and software validation results are presented. Comparisons between Monte Carlo and data are shown.
(Institute of High energy Physics, Chinese Academy of Sciences)
The Online Histogram Presenter for the ATLAS experiment: a modular system for histogram visualization
The challenging experimental environment and the extreme complexity of modern high-energy physics experiments make online monitoring an essential tool to assess the quality of the acquired data.
The Online Histogram Presenter (OHP) is the ATLAS tool to display histograms produced by the online monitoring system. In spite of the name, the Online Histogram Presenter is much more than just a histogram display. To cope with the large amount of data, the application has been designed to actively minimise the network traffic; sophisticated caching, hashing and filtering algorithms reduce memory and CPU usage. The system uses Qt and ROOT for histogram visualisation and manipulation. In addition, histogram visualisation can be extensively customised through configuration files. Finally, its very modular architecture features a lightweight plugin system, allowing extensions to accommodate specific user needs.
The Online Histogram Presenter unifies the approach to histogram visualisation inside the ATLAS online environment in a general purpose, highly configurable, interactive application. After an architectural overview of the application, the paper presents in detail the solutions adopted to increase the performance and describes the plugin system. Examples of OHP use from ATLAS commissioning and first LHC beam will also be presented.
(INFN and Università Pisa)
The PetaQCD project
The study and design of a very ambitious petaflop cluster exclusively dedicated to Lattice QCD simulations started in early ’08 among a consortium of 7 laboratories (IN2P3, CNRS, INRIA, CEA) and 2 SMEs. This consortium received a grant from the French ANR agency in July, and the PetaQCD project kickoff is expected to take place in January ’09. Building upon several years of fruitful collaborative studies in this area, the aim of this project is to demonstrate that the simulation of a 256x128^3 lattice can be achieved through the HMC software, using a machine with a reasonable cost/reliability/power consumption. It is expected that this machine can be built out of a rather limited number of processors (e.g. between 1000 and 4000), although capable of a sustained petaflop CPU performance.
The proof-of-concept should be a mock-up cluster built as much as possible with off-the-shelf components, and two particularly attractive axes will be mainly investigated, in addition to fast all-purpose multi-core processors: the use of the new brand of IBM-Cell processors (with on-chip accelerators) and the very recent Nvidia GP-GPUs (off-chip co-processors). This cluster will obviously be massively parallel, and heterogeneous. Communication issues between processors, implied by the Physics of the simulation and the lattice partitioning, will certainly be a major key to the project.
The RooFit toolkit for data modeling
RooFit is a library of C++ classes that facilitate data modeling in the ROOT environment. Mathematical concepts such as variables, (probability density) functions and integrals are represented as C++ objects. The package provides a flexible framework for building complex fit models through classes that mimic math operators, and is straightforward to extend. For all constructed models RooFit provides a concise yet powerful interface for fitting (binned and unbinned likelihood, chi^2), plotting and toy Monte Carlo generation, as well as sophisticated tools to manage large scale projects. RooFit has matured since 1999 into an industrial strength tool and has been used in the BABAR experiment's most complicated fits. Recent developments include the ability to persist probability density functions into ROOT files that can be easily shared and used with a simple interface, without the need to distribute code. Model persistence enables the concept of digital publishing of complex physics results and provides a foundation for higher level statistical tools for the LHC experiments to calculate combined physics results.
The Status of the Simulation Project for the ATLAS Experiment in view of the LHC startup
The Simulation suite for ATLAS is in a mature phase, ready to cope with the challenge of the 2009 data. The simulation framework, already integrated in the ATLAS framework (Athena), offers a set of pre-configured applications for full ATLAS simulation, combined test beam setups, cosmic ray setups and old standalone test-beams. Each detector component has been carefully described in full detail and its performance monitored. The few still-missing pieces of the apparatus (forward and very forward detectors), inert material and services (toroid supports, support rails, detector feet) are about to be integrated in the current simulation suite. Detailed descriptions of the ideal and real geometry for each ATLAS subcomponent made optimization studies and validation possible. Short and medium scale productions are monitored daily through a set of tests for different samples of physics events, and large scale productions on the Grid verify the robustness of the implementation and expose possible errors only visible with large statistics. Metadata handling is the latest subject of interest for monitoring and recording conditions during the simulation process. A fast shower simulation suite was also developed in ATLAS and performance comparisons are part of the overall evaluation.
(Caltech, USA & Columbia University, USA)
The Use of the TWiki Web in ATLAS
The ATLAS Experiment, with over 2000 collaborators, needs efficient and effective means of communicating information. The Collaboration has been using the TWiki Web at CERN for over three years and now has more than 7000 web pages, some of which are protected. This number greatly exceeds the number of “static” HTML pages, and in the last year, there has been a significant migration to the TWiki.
The TWiki is one example of the many different types of Wiki web which exist. In this talk, a description will be given of the ATLAS TWiki at CERN. The tools used by the Collaboration to manage the TWiki will be described and some of the problems encountered will be explained.
A very useful development has been the creation of a set of Workbooks (Users’ Guides) – these have benefitted from the TWiki environment and, in particular, a tool to extract pdf from the associated pages.
TMVA - The Toolkit for Multivariate Data Analysis
The toolkit for multivariate analysis, TMVA, provides a large set of advanced multivariate analysis techniques for signal/background classification. In addition, TMVA now also contains regression analysis, all embedded in a framework capable of handling the pre-processing of the data and the evaluation of the output, thus allowing a simple and convenient use of multivariate techniques. The analysis techniques implemented in TMVA can be invoked easily and the direct comparison of their performance allows the user to choose the most appropriate for a particular data analysis. This talk gives an overview of the TMVA package and presents recently developed features.
Track Reconstruction in the Muon and Transition Radiation Detectors of the CBM Experiment at FAIR
The Compressed Baryonic Matter (CBM) experiment at the future FAIR accelerator at Darmstadt is being designed for a comprehensive measurement of hadron and lepton production in heavy-ion collisions from 8-45 AGeV beam energy, producing events with large track multiplicity and high hit density. The setup consists of several detectors including as tracking detectors the silicon tracking system (STS), the muon detector (MUCH) or alternatively a set of Transition Radiation Detectors (TRD).
In this contribution, the status of the track reconstruction software including track finding, fitting and propagation is presented for MUCH and TRD.
Since both MUCH and TRD detectors have similar designs, where material layers alternate with detector stations, the track reconstruction algorithm is flexible with respect to its applicability to different detectors. It is an important ingredient for feasibility studies of different physics channels and for the optimization of the detectors.
The track propagation algorithm takes into account an inhomogeneous magnetic field and includes accurate calculation of multiple scattering and energy losses in the detector material. Track parameters and covariance matrices are estimated using the Kalman filter method and a Kalman filter modification that assigns weights to hits and uses simulated annealing. Two different track finder methods based on track following are developed, one using track branches and one not.
The track reconstruction efficiency for central Au+Au collisions at 25 AGeV beam energy using events from the UrQMD model is at the level of 93-97% for both detectors.
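The Kalman filter estimation mentioned in this abstract can be illustrated in one dimension. The Python sketch below is a textbook scalar filter under stated assumptions, not the CBM implementation: the state is a single track coordinate, `process_noise` loosely stands in for multiple scattering between stations, and the per-hit `weight` (which simply scales the hit variance) is a hypothetical simplification of the weighted-hit modification.

```python
def predict(x, p, process_noise):
    """Propagate to the next station: the estimate is unchanged, its variance grows."""
    return x, p + process_noise


def update(x, p, measurement, meas_var, weight=1.0):
    """Combine the prediction with a hit; a weight below 1 inflates the hit variance."""
    if weight <= 0.0:
        return x, p  # a zero-weight hit is ignored
    effective_var = meas_var / weight
    gain = p / (p + effective_var)  # Kalman gain for a scalar state
    x_new = x + gain * (measurement - x)
    p_new = (1.0 - gain) * p
    return x_new, p_new


def filter_hits(hits, x0, p0, process_noise, meas_var):
    """Run predict/update over a sequence of (measurement, weight) pairs."""
    x, p = x0, p0
    for measurement, weight in hits:
        x, p = predict(x, p, process_noise)
        x, p = update(x, p, measurement, meas_var, weight)
    return x, p
```

In an annealing-style scheme, the weights would be recomputed from the hit residuals between passes of `filter_hits`, so that outlying hits contribute progressively less.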
TrackInCaloTools: a package for measuring muon energy loss and calorimetric isolation in ATLAS
Muons in the ATLAS detector are reconstructed by combining the information from the Inner Detectors and the Muon Spectrometer (MS), located in the outermost part of the experiment. Until they reach the MS, muons typically traverse 100 radiation lengths (X0) of material, most of it instrumented by the electromagnetic and hadronic calorimeters.
A proper account of multiple scattering and energy loss effects is essential for the reconstruction, and the use of the calorimeter measurement can improve the transverse momentum resolution, especially in the case of high energy deposits.
On the other hand, the calorimeter activity around a muon, or conversely its isolation, is one of the most powerful features to distinguish W and Z decays from semi-leptonic decays of heavy flavour mesons (containing b and c quarks).
The principle of the software that performs these tasks, called TrackInCaloTools, is presented, together with the expected performance for early LHC data in 2009 and the impact in first physics analysis.
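The two quantities discussed in this abstract, the energy deposited along the muon path and the calorimeter activity around it, can be pictured with a simple cone sum. The Python sketch below is illustrative only and is not the TrackInCaloTools algorithm: the cell format (`eta`, `phi`, `energy`), the straight-line association of cells to the muon direction and the cone radii are all assumptions.

```python
import math


def delta_r(eta1, phi1, eta2, phi2):
    """Distance in the eta-phi plane, with the phi difference wrapped into [-pi, pi)."""
    dphi = (phi1 - phi2 + math.pi) % (2 * math.pi) - math.pi
    return math.hypot(eta1 - eta2, dphi)


def cone_energies(muon, cells, core_radius=0.05, cone_radius=0.3):
    """Return (core_energy, isolation_energy) around a muon direction.

    core_energy sums cells within core_radius (a proxy for the muon's own deposit);
    isolation_energy sums cells between core_radius and cone_radius.
    """
    core = 0.0
    isolation = 0.0
    for cell in cells:
        dr = delta_r(muon["eta"], muon["phi"], cell["eta"], cell["phi"])
        if dr < core_radius:
            core += cell["energy"]
        elif dr < cone_radius:
            isolation += cell["energy"]
    return core, isolation
```

A small isolation energy relative to the muon momentum would favour a W or Z decay, while a large value suggests a muon inside a heavy flavour jet.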
(CEA - Saclay)
Upgrade and design of the Pluto event generator
Because experimental setups are usually not suited to cover the
full solid angle, event generators are very important tools for
experiments. Here, theoretical calculations provide valuable input as they
can describe specific distributions for parts of the kinematic variables very
precisely. The caveat is that an event has several degrees of freedom
which can be correlated. In practice, experimental physicists need a tool at
hand which allows for the exchange of almost all kinematic variables with a
manageable user interface.
Recently, the user-friendly Pluto event generator was re-designed in order to
introduce a more modular, object-oriented structure, thereby making additions
such as new particles, decays of resonances, new models up to modules for
entire changes easily applicable. Overall consistency is ensured by a plugin-
and distribution manager.
One specific feature of Pluto is that we do not use monolithic decay models
but allow for the splitting into different models in a very granular way
(e.g. to exchange form factors or total cross sections). This turned out to be
a very important tool in order to check various scenarios against measured
data, which will be outlined with a few examples.
Therefore Pluto allows for the attachment of secondary models for all kinds of
purposes. Here, a secondary model is an object for a particle/decay returning
a (complex) number as a function of a defined number of values. All models are
connected via a relational database.
All features can be employed by the user without re-compiling the package,
which makes Pluto extremely configurable.
In our contribution, we present the new structure for the Pluto event
generator, originally intended to work for experiment proposals but now
upgraded to allow for the implementation of user-defined functions and models.
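The idea of a granular secondary model, as described above, can be
illustrated with a minimal sketch. It is written in Python for brevity and is
not Pluto code: the class names SecondaryModel and ModelRegistry, the decay
label, the "formfactor" purpose string and the pole-like toy function are all
hypothetical, chosen only to show an object attached to a decay that returns
a complex number for a fixed number of input values and that can be exchanged
without touching the rest of the setup.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class SecondaryModel:
    """Hypothetical secondary model: a complex-valued function of n_args values."""
    decay: str                    # label of the particle/decay it is attached to
    purpose: str                  # e.g. "formfactor" (illustrative name)
    n_args: int                   # defined number of input values
    func: Callable[..., complex]

    def value(self, *args: float) -> complex:
        # Enforce the declared number of input values before evaluating.
        if len(args) != self.n_args:
            raise ValueError(
                f"{self.purpose} expects {self.n_args} values, got {len(args)}"
            )
        return complex(self.func(*args))


class ModelRegistry:
    """Toy lookup table keyed by (decay, purpose); stands in for the database."""

    def __init__(self) -> None:
        self._models: Dict[Tuple[str, str], SecondaryModel] = {}

    def attach(self, model: SecondaryModel) -> None:
        # Attaching a model with an existing key exchanges the previous one.
        self._models[(model.decay, model.purpose)] = model

    def get(self, decay: str, purpose: str) -> SecondaryModel:
        return self._models[(decay, purpose)]


DECAY = "eta -> gamma e+ e-"  # illustrative decay label

registry = ModelRegistry()

# Start with a constant (point-like) form factor.
registry.attach(SecondaryModel(DECAY, "formfactor", 1, lambda q2: 1.0))
flat = registry.get(DECAY, "formfactor").value(0.25)

# Exchange it for a toy pole-like form factor; 0.5 is an arbitrary scale.
registry.attach(
    SecondaryModel(DECAY, "formfactor", 1, lambda q2: 1.0 / (1.0 - q2 / 0.5))
)
pole = registry.get(DECAY, "formfactor").value(0.25)
```

Only the attached object changes between the two evaluations; the code that
looks up and evaluates the model stays the same, which is the point of
splitting a decay model into exchangeable parts.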
Validation and verification of Geant4 standard electromagnetic physics
The standard electromagnetic physics packages of Geant4 are used for the simulation of particle transport and HEP detector response. The requirements on the precision and stability of the computations are stringent: for example, the calorimeter response for ATLAS and CMS should be reproduced to well within 1%. To maintain and control the long-term quality of the package, software suites for validation and verification have been developed. In this work we describe the main approaches to validation and the structure of the validation software, and show examples of comparisons between Geant4 simulation and data.
VETRA - offline analysis and monitoring software platform for the LHCb VELO
The LHCb experiment is dedicated to studying CP violation and rare decay phenomena.
In order to achieve these physics goals, precise tracking and vertexing around
the interaction point is crucial. This is provided by the VELO (VErtex LOcator)
silicon detector. After digitization, large FPGAs are employed to run several
algorithms to suppress noise and reconstruct clusters. This is performed by an FPGA-based
processing board. An off-line software framework, VETRA, has been developed which
performs a bit-perfect emulation of this complex processing in the FPGAs. This is a novel
development as this hardware emulation is not standalone but rather is fully
integrated into the LHCb software to allow the reconstruction of full data
from the detector. This software platform facilitates: developing and understanding
the behaviour of the processing algorithms; optimizing the parameters of the algorithms
that will be loaded into the FPGA; and monitoring the performance of the detector.
This framework has also been adopted by the Silicon Tracker detector of LHCb.
This framework was successfully used with the first 1500 tracks of data
in the VELO obtained from the LHC beam in September 2008.
The software architecture and utilisation of the VETRA project will be discussed in detail.
Videoconferencing infrastructure at IN2P3
IN2P3, the institute bringing together HEP laboratories in France along with CEA's IRFU, opened a videoconferencing service in 2002 based on an H323 MCU. This service has grown steadily since then, serving French communities beyond HEP, to reach an average of about 30 different conferences a day. The relatively small amount of manpower that has been devoted to this project can be explained by the very sound design and the large array of capabilities of the equipment that replaced the original one in 2005.
The service will be described. Its original mode of operation, which unlike more traditional setups does not rely on a gatekeeper, will be compared to others, notably those put in place by ESnet and DFN.
An outline of developments around MCUs that could be of interest to the whole community will be presented. Some issues of integration of this service with other collaborative tools in use today will be discussed.
Virtuality and Efficiency - Overcoming Past Antinomy in the Remote Collaboration Experience
Several recent initiatives have been put in place by the CERN IT Department to improve the user experience in remote dispersed meetings and remote collaboration at large in the LHC communities worldwide. We will present an analysis of the factors which historically limited the efficiency of remote dispersed meetings and describe the actions which were consequently undertaken at CERN to overcome these limitations. After giving a status update on the different equipment available at CERN to enable virtual sessions and on the various collaboration tools currently offered to users, we will focus on the evolution of this market: how the new technological trends (among others, HD videoconferencing, Telepresence and Unified Communications) can positively impact the user experience, and how to make the best use of them. Finally, looking to the future, we will give some hints on how to answer the difficult question of selecting the next generation of collaboration tools: which set of tools among the various offers (systems like Vidyo H264 SVC, next generation EVO, Groupware offers, standard H323 systems, etc.) is best suited to our environment, and how to unify this set for the common user. This will allow us to definitively overcome the past antinomy between virtuality and efficiency.
Plenary: Tuesday, Congress Hall
Prague Congress Centre
5. května 65, 140 00 Prague 4, Czech Republic
Interoperability - Grids, Clouds and Collaboratories
The reach and diversity of computationally based Collaboratories continue to expand. The quantity and quality of remote processing and storage continue to advance, with new entrants from the Commercial Clouds and coverage by Campus, Regional and National Grids. Ensuring interoperability across all these computing facilities is an important responsibility for the common infrastructure projects and the community at large.
Ruth Pordes is an Associate Head of the Fermilab Computing Division. She has a long history of working on collaborative projects between domain scientists, computing professionals and computer scientists. Ruth is the Executive Director of the Open Science Grid. She is really enjoying the new opportunities of not only supporting the core physics experiments but also bringing the organization and technology experience of these communities to the broader domains of scientific scholarship.
Collaborating at a Distance: Operations Centres, Tools, and Trends
Commissioning the LHC accelerator and experiments will be a vital part of the worldwide high-energy physics program in 2009. Remote operations centers have been established in various locations around the world to support collaboration on LHC activities. For the CMS experiment the development of remote operations centers began with the LHC@FNAL ROC and has evolved into a unified approach with multiple operations centers, collectively referred to as CMS Centres.
An overview of the development of operations centers for CMS will be presented. Other efforts to enhance remote collaboration in high-energy physics will be presented, along with a brief overview of collaborative tools and remote operations capabilities developed in other fields of research. Possible future developments and trends that are sure to make remote collaboration ubiquitous in high-energy physics will be explored.
Belle Monte-Carlo production on the Amazon EC2 cloud
The SuperBelle project to increase the Luminosity of the KEKB collider
by a factor of 50 will search for Physics beyond the Standard Model through
precision measurements and the investigation of rare processes in
Flavour Physics. The data rate expected from the experiment is
comparable to a current-era LHC experiment, with commensurate Computing
needs. Incorporating commercial cloud computing, such as that provided by
the Amazon Elastic Compute Cloud (EC2), into the SuperBelle computing
model may provide a lower Total Cost of Ownership for the SuperBelle
To investigate this possibility, we have deployed the complete Belle
Monte-Carlo simulation chain on EC2 to benchmark the cost and
performance of the service. This presentation will describe how this was
achieved as well as the bottlenecks and costs of large-scale Monte-Carlo
production on EC2.
(University of Melbourne)
coffee break, exhibits and posters
Plenary: Tuesday, Congress Hall
Addressing the Challenges of High Performance Computing with IBM Innovation and iDataPlex: Take Advantage of Cooler, Denser, and More Efficient Compute Power
In 2008 IBM shattered the U.S. patent record, becoming the first company to
surpass 4,000 patents in a single year - the 16th consecutive year that IBM
has achieved U.S. patent leadership. Come learn how IBM has leveraged our
deep Research and Development innovation to deliver the iDataPlex server
solution. With over 40 patented innovations, the iDataPlex product is one
of the first clean-sheet x86 designs optimized for energy efficient High
IBM has built iDataPlex from the ground up to maximize data center density,
optimize server deployment efficiency, use less energy, be easy to
service and lower your high performance computing expenses. The IBM
innovation in the iDataPlex solution results in up to 40% less energy
consumption (when compared to equivalently configured standard 1U servers),
enables you to efficiently deploy racks of servers at a time and offers an
option to virtually eliminate special data center air conditioning. This
presentation will cover these features and explore the technology behind
the iDataPlex High Performance Computing alternative.