2–9 Sept 2007
Victoria, Canada
Europe/Zurich timezone
Please book accommodation as soon as possible.

Session

Computer facilities, production grids and networking

CF
3 Sept 2007, 14:00
Victoria, Canada

Conveners

Computer facilities, production grids and networking: CF 1

  • Kors Bos (NIKHEF)

Computer facilities, production grids and networking: CF 2

  • Kors Bos (NIKHEF)

Computer facilities, production grids and networking: CF 3

  • Kors Bos (NIKHEF)

Computer facilities, production grids and networking: CF 4

  • Kors Bos (NIKHEF)

Computer facilities, production grids and networking: CF 5

  • Kors Bos (NIKHEF)

Computer facilities, production grids and networking: CF 6

  • Kors Bos (NIKHEF)

Computer facilities, production grids and networking: CF 7

  • Kors Bos (NIKHEF)


  1. Dr Jamie Shiers (CERN)
    03/09/2007, 14:00
    Computer facilities, production grids and networking
    oral presentation
    This talk summarises the main lessons learnt from deploying WLCG production services, with a focus on reliability, scalability and accountability, which together lead to manageability and usability. Each topic is analysed in turn. Techniques for zero user-visible downtime during the main service interventions are described, together with pathological cases that need special treatment. The...
  2. Dr Markus Schulz (CERN)
    03/09/2007, 14:20
    Computer facilities, production grids and networking
    oral presentation
    Today's production Grids connect large numbers of distributed hosts using high throughput networks and hence are valuable targets for attackers. In the same way users transparently access any Grid service independently of its location, an attacker may attempt to propagate an attack to different sites that are part of a Grid. In order to contain and resolve the incident, and since such an...
  3. Mrs Ruth Pordes (FERMILAB)
    03/09/2007, 14:40
    Computer facilities, production grids and networking
    oral presentation
    The Open Science Grid (OSG) is receiving five years of funding across six program offices of the Department of Energy Office of Science and the National Science Foundation. OSG is responsible for operating a secure production-quality distributed infrastructure, a reference software stack including the Virtual Data Toolkit (VDT), extending the capabilities of the high throughput virtual...
  4. Dr Jeremy Coles (RAL)
    03/09/2007, 15:00
    Computer facilities, production grids and networking
    oral presentation
    Over the last few years, UK research centres have provided significant computing resources for many high-energy physics collaborations under the guidance of the GridPP project. This paper reviews recent progress in the Grid deployment and operations area including findings from recent experiment and infrastructure service challenges. These results are discussed in the context of how GridPP...
  5. Dr Pavel Murat (Fermilab)
    03/09/2007, 15:20
    Computer facilities, production grids and networking
    oral presentation
    The CDF II detector at Fermilab has been taking physics data since 2002. The architecture of the CDF computing system has evolved substantially over the years of data taking and has now reached a stable configuration that will allow the experiment to process and analyse the data until the end of Run II. We describe the major architectural components of the CDF offline computing - dedicated...
  6. Mr Lars Fischer (Nordic Data Grid Facility)
    03/09/2007, 15:40
    Computer facilities, production grids and networking
    oral presentation
    The Tier-1 facility operated by the Nordic DataGrid Facility (NDGF) differs significantly from other Tier-1s in several respects: it is not located at one or a few sites but is instead distributed throughout the Nordic countries, and it is not under the governance of a single organization but is instead a "virtual" Tier-1 built out of resources under the control of a number of different national...
  7. Dr Richard Mount (SLAC)
    03/09/2007, 16:30
    Computer facilities, production grids and networking
    oral presentation
    The PetaCache project started at SLAC in 2004 with support from DOE Computer Science and the SLAC HEP program. PetaCache focuses on using cost-effective solid state storage for the hottest data under analysis. We chart the evolution of metrics such as accesses per second per dollar for different storage technologies and deduce the near inevitability of a massive use of solid-state...
  8. Dr Giuseppe Lo Presti (CERN/INFN)
    03/09/2007, 16:50
    Computer facilities, production grids and networking
    oral presentation
    In this paper we present the architecture design of the CERN Advanced Storage system (CASTOR) and its new disk cache management layer (CASTOR2). Mass storage systems at CERN have evolved over time to meet growing requirements, both in terms of scalability and fault resiliency. CASTOR2 has been designed as a Grid-capable storage resource sharing facility, with a database-centric...
  9. Dr Horst Goeringer (GSI)
    03/09/2007, 17:10
    Computer facilities, production grids and networking
    oral presentation
    GSI in Darmstadt (Germany) is a center for heavy ion research and hosts an ALICE Tier2 center. For the future FAIR experiments at GSI, CBM and PANDA, the planned data rates will reach those of the current LHC experiments at CERN. For more than ten years gStore, the GSI Mass Storage System, has been successfully in operation. It is a hierarchical storage system with a unique name...
  10. Paul Avery (University of Florida)
    03/09/2007, 17:30
    Computer facilities, production grids and networking
    oral presentation
    UltraLight is a collaboration of experimental physicists and network engineers whose purpose is to provide the network advances required to enable and facilitate petabyte-scale analysis of globally distributed data. Existing Grid-based infrastructures provide massive computing and storage resources, but are currently limited by their treatment of the network as an external, passive, and...
  11. Ms Alessandra Forti (University of Manchester)
    03/09/2007, 17:50
    Computer facilities, production grids and networking
    oral presentation
    The HEP department of the University of Manchester has purchased a 1000-node cluster. The cluster is dedicated to running EGEE and LCG software and currently supports 12 active VOs. Each node is equipped with 2x250 GB disks, for a total of 500 GB; there is no tape storage behind them, nor are RAID arrays used. Three different storage solutions are currently being deployed to...
  12. Dr Ian Fisk (FNAL)
    04/09/2007, 11:00
    Computer facilities, production grids and networking
    oral presentation
    In preparation for the start of the experiment, CMS has conducted computing, software, and analysis challenges to demonstrate the functionality, scalability, and usability of the computing and software components. These challenges are designed to validate the CMS distributed computing model by demonstrating the functionality of many components simultaneously. In the challenges CMS...
  13. Mr Michel Jouvin (LAL / IN2P3)
    04/09/2007, 11:20
    Computer facilities, production grids and networking
    oral presentation
    Quattor is a tool aimed at the efficient management of fabrics with hundreds or thousands of Linux machines, while remaining easy enough to use for smaller clusters. It was originally developed inside the European Data Grid (EDG) project. It is now in use at more than 30 grid sites running gLite middleware, ranging from small LCG T3 sites to very large ones like CERN. The main goals and specific...
  14. Torsten Antoni (Forschungszentrum Karlsruhe)
    04/09/2007, 11:40
    Computer facilities, production grids and networking
    oral presentation
    The organization and management of the user support in a global e-science computing infrastructure such as EGEE is one of the challenges of the grid. Given the widely distributed nature of the organisation, and the spread of expertise for installing, configuring, managing and troubleshooting the grid middleware services, a standard centralized model could not be deployed in EGEE. This...
  15. Mr Antonio Retico (CERN)
    04/09/2007, 12:00
    Computer facilities, production grids and networking
    oral presentation
    Grids have the potential to revolutionise computing by providing ubiquitous, on demand access to computational services and resources. They promise to allow for on demand access and composition of computational services provided by multiple independent sources. Grids can also provide unprecedented levels of parallelism for high-performance applications. On the other hand, grid...
  16. Dirk Duellmann (CERN)
    05/09/2007, 14:00
    Computer facilities, production grids and networking
    oral presentation
    Relational database services are a key component of the computing models for the Large Hadron Collider (LHC). A large proportion of non-event data including detector conditions, calibration, geometry and production bookkeeping metadata require reliable storage and query services in the LHC Computing Grid (LCG). Also core grid services to catalogue and distribute data cannot operate...
  17. Dr Xavier Espinal (PIC/IFAE)
    05/09/2007, 14:20
    Computer facilities, production grids and networking
    oral presentation
    In preparation for first data at the LHC, a series of Data Challenges of increasing scale and complexity has been performed. Large quantities of simulated data have been produced on three different Grids, integrated into the ATLAS production system. During 2006, the emphasis moved towards providing stable continuous production, as is required in the immediate run-up to first data, and...
  18. Mr Jose Hernandez Calama (CIEMAT)
    05/09/2007, 14:40
    Computer facilities, production grids and networking
    oral presentation
    Monte Carlo production in CMS has received a major boost in performance and scale since last CHEP conference. The production system has been re-engineered in order to incorporate the experience gained in running the previous system and to integrate production with the new CMS event data model, data management system and data processing framework. The system is interfaced to the two...
  19. Smirnov Yuri (Brookhaven National Laboratory)
    05/09/2007, 15:20
    Computer facilities, production grids and networking
    oral presentation
    The Open Science Grid infrastructure provides one of the largest distributed computing systems deployed in the ATLAS experiment at the LHC. During the CSC exercise in 2006-2007, OSG resources provided about one third of the worldwide distributed computing resources available in ATLAS. About half a petabyte of ATLAS MC data is stored on OSG sites. About 2000k SpecInt2000 of CPU capacity is available...
  20. Mr Dave Evans (Fermi National Laboratory)
    05/09/2007, 15:40
    Computer facilities, production grids and networking
    oral presentation
    The CMS production system has undergone a major architectural upgrade from its predecessor, with the goals of reducing the operations manpower requirement and preparing for the large-scale production required by the CMS physics plan. This paper discusses the CMS Monte Carlo Workload Management architecture. The system consists of three major components: ProdRequest, ProdAgent, and ProdMgr...
  21. Mr Philip DeMar (FERMILAB)
    05/09/2007, 16:30
    Computer facilities, production grids and networking
    oral presentation
    Fermilab hosts the American Tier-1 Center for the LHC/CMS experiment. In preparation for the startup of CMS, and building upon extensive experience supporting Tevatron experiments and other science collaborations, the Laboratory has established high-bandwidth, end-to-end (E2E) circuits with a number of US-CMS Tier2 sites, as well as other research facilities in the collaboration. These...
  22. Mr Maxim Grigoriev (FERMILAB)
    05/09/2007, 16:50
    Computer facilities, production grids and networking
    oral presentation
    The LHC experiments will start very soon, creating immense data volumes capable of demanding allocation of an entire network circuit for task-driven applications. Circuit-based alternate network paths are one solution to meeting the LHC high bandwidth network requirements. The Lambda Station project is aimed at addressing growing requirements for dynamic allocation of alternate network...
  23. Dr Matt Crawford (FERMILAB)
    05/09/2007, 17:10
    Computer facilities, production grids and networking
    oral presentation
    Due to shortages of IPv4 address space - real or artificial - many HEP computing installations have turned to NAT and application gateways. These workarounds carry a high cost in application complexity and performance. Recently a few HEP facilities have begun to deploy IPv6 and it is expected that many more must follow within several years. While IPv6 removes the problem of address...
  24. Mr Maxim Grigoriev (FERMILAB)
    05/09/2007, 17:30
    Computer facilities, production grids and networking
    oral presentation
    End-to-end (E2E) circuits are used to carry high impact data movement into and out of the US CMS Tier-1 Center at Fermilab. E2E circuits have been implemented to facilitate the movement of raw experiment data from Tier-0, as well as processed data to and from a number of the US Tier-2 sites. Troubleshooting and monitoring those circuits presents a challenge, since the circuits typically...
  25. Dr Luc Goossens (CERN)
    05/09/2007, 17:50
    Computer facilities, production grids and networking
    oral presentation
    ATLAS is a multi-purpose experiment at the LHC at CERN, which will start taking data in November 2007. Handling and processing the unprecedented data rates expected at the LHC (at nominal operation, ATLAS will record about 10 PB of raw data per year) poses a huge challenge to the computing infrastructure. The ATLAS Computing Model foresees a multi-tier hierarchical model to perform this...
  26. Dr Lukas Nellen (I. de Ciencias Nucleares, UNAM)
    06/09/2007, 14:00
    Computer facilities, production grids and networking
    oral presentation
    The EELA project aims at building a grid infrastructure in Latin America and at attracting users to this infrastructure. The EELA infrastructure is based on the gLite middleware, developed by the EGEE project. A test-bed, including several European and Latin American countries, was set up in the first months of the project. Several applications from different areas, especially...
  27. Dr Alexei Klimentov (BNL)
    06/09/2007, 14:20
    Computer facilities, production grids and networking
    oral presentation
    The ATLAS Distributed Data Management Operations Team unites experts from Tier-1 and Tier-2 computer centers. The group is responsible for all day-to-day ATLAS data distribution between the different sites and centers. In our paper we describe the ATLAS DDM operations model and address the data management and operations issues. A series of Functional Tests has been conducted in the past and is in...
  28. Dr Daniele Bonacorsi (INFN-CNAF, Bologna, Italy)
    06/09/2007, 14:40
    Computer facilities, production grids and networking
    oral presentation
    The CMS experiment is gaining experience towards data taking in several computing preparation activities, and a roadmap towards a mature computing operations model stands as a primary target. The responsibility of the Computing Operations projects in the complex CMS computing environment spans a wide area and aims at integrating the management of the CMS Facilities Infrastructure,...
  29. Luca dell'Agnello (INFN-CNAF)
    06/09/2007, 15:00
    Computer facilities, production grids and networking
    oral presentation
    Performance, reliability and scalability in data access are key issues in the context of HEP data processing and analysis applications. The importance of these topics is even greater when considering the quantity of data and the request load that an LHC data center has to support. In this paper we give the results and the technical details of a large-scale validation,...
  30. Jan van ELDIK (CERN)
    06/09/2007, 15:20
    Computer facilities, production grids and networking
    oral presentation
    This paper presents work, both completed and planned, for streamlining the deployment, operation and re-tasking of Castor2 instances. We present a summary of what has recently been done to reduce the human intervention necessary for bringing systems into operation; including the automation of Grid host certificate requests and deployment in conjunction with the CERN Trusted CA and...
  31. Mr Timur Perelmutov (FERMILAB)
    06/09/2007, 15:40
    Computer facilities, production grids and networking
    oral presentation
    The Storage Resource Manager (SRM) and WLCG collaborations recently defined version 2.2 of the SRM protocol, with the goal of satisfying the requirements of the LHC experiments. The dCache team has now finished the implementation of all SRM v2.2 elements required by the WLCG. The new functions include space reservation, more advanced data transfer, and new namespace and permission...
  32. Dr Maxim Potekhin (BROOKHAVEN NATIONAL LABORATORY)
    06/09/2007, 16:30
    Computer facilities, production grids and networking
    oral presentation
    The simulation program for the STAR experiment at the Relativistic Heavy Ion Collider at Brookhaven National Laboratory is growing in scope and responsiveness to the needs of the research conducted by the Physics Working Groups. In addition, there is significant ongoing R&D activity aimed at future upgrades of the STAR detector, which also requires extensive simulation support. The...
  33. Igor Sfiligoi (Fermilab)
    06/09/2007, 16:50
    Computer facilities, production grids and networking
    oral presentation
    Pilot jobs are becoming increasingly popular in the Grid world. Experiments like ATLAS and CDF are using them in production, while others, like CMS, are actively evaluating them. Pilot jobs enter Grid sites using a generic pilot credential, and once on a worker node, call home to fetch the job of an actual user. However, this operation mode poses several new security problems when...
  34. Dr Simone Campana (CERN/IT/PSS)
    06/09/2007, 17:10
    Computer facilities, production grids and networking
    oral presentation
    The ATLAS experiment has been running continuous simulated event production for more than two years. A considerable fraction of the jobs is submitted daily and handled via the gLite Workload Management System, which overcomes several limitations of the previous LCG Resource Broker. The gLite WMS has been tested very intensively for the LHC experiments' use cases for more than six months,...
  35. Mr Sergey Chechelnitskiy (Simon Fraser University)
    06/09/2007, 17:30
    Computer facilities, production grids and networking
    oral presentation
    SFU is responsible for running two different clusters - one designed for WestGrid internal jobs with its specific software, and the other running ATLAS jobs only. In addition to the different software configuration, the ATLAS cluster requires a different networking configuration. We would also like the flexibility of running jobs on different hardware. That is why it has been...
  36. Ms Zhenping Liu (BROOKHAVEN NATIONAL LABORATORY)
    06/09/2007, 17:50
    Computer facilities, production grids and networking
    oral presentation
    The BNL ATLAS Computing Facility needs to provide a Grid-based storage system with the following requirements: a total of one gigabyte per second of incoming and outgoing data rate between BNL and the ATLAS T0, T1 and T2 sites; thousands of reconstruction/analysis jobs accessing locally stored data objects; three petabytes of disk/tape storage in 2007, scaling up to 25 petabytes by 2011; and a...