HEPiX Fall 2013 Workshop

America/Detroit
340 West Hall, University of Michigan
1085 S University Ave, Ann Arbor, MI 48109 US
Helge Meinhard (CERN), Robert Ball (University of Michigan (US)), Sandy Philpott (JLAB), Shawn Mc Kee (University of Michigan (US))
Description

HEPiX Fall 2013 at University of Michigan, Ann Arbor

The HEPiX forum brings together worldwide Information Technology staff, including system administrators, system engineers, and managers from the High Energy Physics and Nuclear Physics laboratories and institutes, to foster a learning and sharing experience between sites facing scientific computing and data challenges. Participating sites include BNL, CERN, DESY, FNAL, IN2P3, INFN, JLAB, NIKHEF, RAL, SLAC, TRIUMF and many others.

Thanks to everyone for a great conference!

Hosted by ATLAS Great Lakes Tier 2 at the University of Michigan in Ann Arbor, Michigan.
    • 08:30 09:00
      Registration 30m 340 West Hall

      coffee and morning snack

    • 09:00 09:30
      Miscellaneous 340 West Hall

      Conveners: Dr Helge Meinhard (CERN), Sandy Philpott (JLAB)
      • 09:00
        Welcome address 20m
        Speaker: Homer Neal (University of Michigan (US))
        Slides
      • 09:20
        Workshop logistics 10m
        Speaker: Dr Shawn Mc Kee (University of Michigan (US))
        Slides
    • 09:30 10:30
      Site reports 340 West Hall

      Conveners: Michele Michelotto (Universita e INFN (IT)), Sebastien Gadrat (CC-IN2P3 - Centre de Calcul (FR))
      • 09:30
        INFN-T1 site report 15m
        An update on what's going on at the INFN-T1 center.
        Speaker: Andrea Chierici (INFN-CNAF)
        Slides
      • 09:45
        Nikhef Site Report 15m
        Fall 2013 site report
        Speaker: Paul Kuipers (Nikhef)
        Slides
      • 10:00
        The Caltech CMS Tier2 15m
        The Caltech Tier2 is a major site providing substantial and reliable computational and storage resources to CMS, combining production processing of simulated events, support for US CMS physics analysis, and computing, software systems, and network developments. Caltech continues to lead in several key areas of LHC computing and software aimed at enabling grid-based data analysis, as well as global-scale networking. An update on the status of the Caltech Tier 2 is given together with the synergistic activities in networking operations and R&D at Caltech.
        Speaker: Dr Dorian Kcira (California Institute of Technology (US))
        Slides
      • 10:15
        AGLT2 Site Report 15m
        Fall 2013 ATLAS Great Lakes Tier 2 site report covering recent network updates, work with AFS on ZFS, new provisioning with cobbler and cfengine, recent experiences with dCache issues, and the usual statistics/status information.
        Speaker: Mr Benjeman Meekhof (University of Michigan)
        Slides
    • 10:30 11:00
      Break 30m 340 West Hall

    • 11:00 12:30
      Site reports 340 West Hall

      Conveners: Michele Michelotto (Universita e INFN (IT)), Sebastien Gadrat (CC-IN2P3 - Centre de Calcul (FR))
      • 11:00
        BNL RACF Site Report 15m
        Brookhaven National Lab (BNL) will present the site report for the RHIC-ATLAS Computing Facility (RACF)
        Speakers: Dr Ofer Rind (BROOKHAVEN NATIONAL LABORATORY), Dr Tony Wong (Brookhaven National Laboratory)
        Slides
      • 11:15
        NDGF Site Report 15m
        Overview of recent developments in the distributed NDGF Tier1. Might include a closer look at running ATLAS computing on a couple of different HPC resources.
        Speaker: Erik Mattias Wadenstein (Unknown)
        Slides
      • 11:30
        DESY Site report 15m
        Fall 2013 DESY Site report
        Speaker: Wolfgang Friebel (Deutsches Elektronen-Synchrotron (DE))
        Slides
      • 11:45
        RAL Tier1 Site Report 15m
        Update from RAL
        Speaker: Martin Bly (STFC-RAL)
        Slides
      • 12:00
        The University of Wisconsin Madison CMS T2 site report 15m
        As a major WLCG/OSG T2 site, the University of Wisconsin Madison CMS T2 has provided very productive and reliable services for CMS Monte Carlo production/processing and large-scale global CMS physics analysis, using high-throughput computing, a highly available storage system, and scalable distributed software systems. The close integration of the CMS-specific T2 resources with those of the UW campus grid (GLOW), and the strong collaboration with the UW Condor Team, have resulted in many significant contributions to CMS computing and the global grid user community. The UW CMS T2 continues to lead and contribute to several key areas of LHC computing/software aimed at providing CMS data access to users transparently and round the clock, and serves as a testbed for expanding CMS MC production by successfully integrating commercial cloud resources into the T2 for trial use. An update on the current status of and activities at the UW Tier-2 will be presented.
        Speaker: Ajit Kumar Mohapatra (University of Wisconsin (US))
        Slides
      • 12:15
        PDSF at NERSC - Site Report HEPiX Fall 2013 15m
        PDSF (Parallel Distributed Systems Facility) has been in continuous operation at NERSC since 1996 on dedicated and ever-changing hardware, supporting a broad user base in the high energy physics community. We will describe recent and ongoing changes in the underlying architecture of PDSF. We are moving to a model where the PDSF cluster will consist of dedicated front-end servers and a set of compute nodes drawn from a backend cluster, unified by a batch system. User storage will be migrated from a dedicated GPFS cluster to the site-wide NERSC Global Filesystem, facilitating the use of other NERSC clusters which have a faster, lower-latency interconnect for parallel jobs. These changes should help minimize downtime when the cluster is moved to a new building at Lawrence Berkeley Laboratory in 2014/15.
        Speaker: James Botts (L)
        Slides
    • 12:30 14:00
      Lunch 1h 30m 340 West Hall

    • 14:00 15:15
      IT facilities and business continuity 340 West Hall

      Convener: Wayne Salter (CERN)
      • 14:00
        The CSC Kajaani datacentre 25m
        In early 2013, CSC (the IT Center for Science in Finland) powered up its new datacentre in Kajaani, 600 km north of its offices. The datacentre focuses on energy efficiency and cost-cutting, but the design and implementation were not easy.
        Speaker: Ulf Tigerstedt (CSC Oy)
        Slides
      • 14:25
        Operating Dedicated Data Centers - Is it cost-effective? 25m
        The advent of cloud computing centers such as Amazon's EC2 and Google's Computing Engine has elicited comparisons with dedicated computing clusters. Discussions on appropriate usage of cloud resources (both academic and commercial) and costs have ensued. This presentation discusses a detailed analysis of the costs of operating and maintaining the RACF (RHIC and ATLAS Computing Facility) compute cluster at Brookhaven National Lab and compares them with the cost of cloud computing resources under various usage scenarios. An extrapolation of likely future cost effectiveness of dedicated computing resources is also presented.
        Speaker: Dr Tony Wong (Brookhaven National Laboratory)
        Slides
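        As an illustrative aside (the symbols below are generic assumptions, not figures from the talk), analyses of this kind typically reduce the dedicated-cluster side to a cost per core-hour of the form

            C_{core-hour} \approx (CapEx / T_{life} + OpEx_{year}) / (N_{cores} \times 8766 \times U)

        where CapEx is amortized over the hardware lifetime T_{life}, OpEx_{year} covers power, cooling, space and staff, 8766 is the average number of hours in a year, and U is the mean utilization; the result is then compared against a cloud provider's price per core-hour under the usage scenarios considered.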
      • 14:50
        Safety in the Data Center 25m
        We describe a recent rack installation incident at the RACF and its effects on facility operations. A draft proposal to address safety issues will also be discussed.
        Speaker: Dr Tony Wong (Brookhaven National Laboratory)
        Slides
    • 15:15 15:40
      Coffee Break 25m 340 West Hall

    • 15:40 17:00
      Basic IT services 340 West Hall

      Convener: Dr Helge Meinhard (CERN)
      • 15:40
        AI monitoring update 20m
        Speaker: Massimo Paladin (CERN)
        Slides
      • 16:00
        Building a Puppet Infrastructure @ DESY 25m
        Starting in 2012, DESY has been extending its IT systems infrastructure to make use of the Puppet configuration management system developed by Puppet Labs. The main focus of this talk is to share the experience gained in this program of work and to summarize the current status and outlook of the Puppet infrastructure at the DESY site.
        Speaker: Jan Engels (Deutsches Elektronen-Synchrotron (DE))
        Slides
      • 16:25
        Automatic server registration and burn-in framework 25m
        This talk provides an overview of CERN IT's automatic server registration and burn-in framework. It will mainly focus on the reasons behind the development of such a framework and on the implementation details. A detailed walkthrough of the process stages will be presented, along with the first results from the acceptance of about 1,500 servers. Finally, the talk will underline the benefits of automating the server registration and certification process.
        Speaker: Afroditi Xafi (CERN)
        Slides
    • 18:00 20:00
      Welcome Reception 2h 340 West Hall

    • 08:30 09:00
      Registration 30m 340 West Hall

      coffee and morning snack

    • 09:00 10:30
      Site reports 340 West Hall

      Conveners: Michele Michelotto (Universita e INFN (IT)), Sebastien Gadrat (CC-IN2P3 - Centre de Calcul (FR))
      • 09:00
        GridKa Site Report 15m
        Current status and latest news at GridKa, e.g. hardware status, storage systems, and the batch system.
        Speaker: Andreas Petzold (KIT - Karlsruhe Institute of Technology (DE))
        Slides
      • 09:15
        Jefferson Lab Site Report 15m
        An update on high performance and scientific computing activities since the Spring 2012 meeting.
        Speaker: Sandy Philpott (JLAB)
        Slides
      • 09:30
        UK GridPP Tier 2s Status Report 15m
        An update from the UK GridPP Tier 2s
        Speaker: Dr Chris Brew (STFC - Science & Technology Facilities Council (GB))
        Slides
      • 09:45
        Fermilab Site Report - Fall 2013 HEPiX 15m
        Fermilab site report - Fall 2013 HEPiX.
        Speaker: Dr Keith Chadwick (Fermilab)
        Slides
      • 10:00
        CERN Site report 15m
        News from CERN since the Bologna Workshop.
        Speaker: Dr Arne Wiebalck (CERN)
        Slides
      • 10:15
        IHEP Site Report 15m
        Speaker: Jingyan SHI
        Slides
    • 10:30 11:00
      Coffee break 30m 340 West Hall

    • 11:00 12:30
      Computing and batch systems 340 West Hall

      Conveners: Gilles Mathieu (CNRS), Michele Michelotto (Universita e INFN (IT)), Wolfgang Friebel (Deutsches Elektronen-Synchrotron (DE))
      • 11:00
        HS06 - A first look at the Performance per Watt 30m
        I started making measurements of power consumption when running the HEP-SPEC06 benchmark. A few slides on the move from SL5 to SL6.
        Speaker: Dr Michele Michelotto (Universita e INFN (IT))
        Slides
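        As a reminder of the metric in question (an illustrative definition with made-up numbers, not results from the talk), the performance-per-Watt figure is the machine's aggregate HEP-SPEC06 score divided by its average power draw under benchmark load,

            \mathrm{HS06/W} = \mathrm{HS06}_{machine} / \bar{P}_{load}

        so a hypothetical node scoring 350 HS06 while drawing 250 W under load would deliver 1.4 HS06/W.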
      • 11:30
        Future of Batch Processing at CERN 30m
        The CERN Batch System comprises 4000 worker nodes, 60 queues and various types of large user communities. In light of the recent developments driven by the Agile Infrastructure and more demanding processing needs, the Batch System will be faced with increasingly challenging scalability and flexibility needs. To prepare for these high expectations, the CERN Batch Team has been designing a framework to easily subject the Batch System to different types of strain and scalability tests. This framework has been the testbed for a number of candidate batch systems, one of which will foster the future of batch processing at CERN. So far, SLURM, Condor and Grid Engine have been under evaluation. In this talk, we present the design of this test framework and the initial results of our evaluation of the aforementioned batch systems, from a scalability and an administrative perspective.
        Speaker: Jerome Belleman (CERN)
        Slides
      • 12:00
        HPC Activities at CERN 30m
        While the majority of physics computing needs are covered sufficiently by the CERN batch services, some other applications have special requirements and therefore need special treatment. In this presentation we will discuss which applications these are and the methodology we used to gather more in-depth statistics, and we will present some preliminary results. Finally, some future steps and thoughts will be presented.
        Speaker: Ioannis Agtzidis (CERN)
        Slides
    • 12:30 12:45
      Group photo 340 West Hall

      HEPiX Group Photo outside on the steps of Randall Lab

      Convener: Shawn Mc Kee (University of Michigan (US))
    • 12:45 14:00
      Lunch 1h 15m 340 West Hall

    • 14:00 15:30
      Computing and batch systems 340 West Hall

      Conveners: Gilles Mathieu (CNRS), Michele Michelotto (Universita e INFN (IT)), Wolfgang Friebel (Deutsches Elektronen-Synchrotron (DE))
      • 14:00
        Batch System Development and Evolution at RACF 30m
        Scheduling jobs with heterogeneous resource requirements to a pool of computers with heterogeneous resources is a challenging task and Condor is beginning to tackle this in the most generic form. Integrating so-called partitionable slots (a batch resource able to be sliced along a variety of dimensions, from RAM to CPUs, to disks or even GPUs) with the rest of Condor's accounting and matchmaking mechanisms is a significant challenge. At BNL we began working closely with the Condor team to integrate Condor's Hierarchical Accounting Groups with partitionable slots in a consistent and correct manner. The results of that work are presented in this talk.
        Speaker: William Strecker-Kellogg (Brookhaven National Lab)
        Slides
      • 14:30
        Batch system status at the RAL Tier 1 30m
        The RAL Tier 1 maintains a batch farm with close to 10000 job slots that is used by all the LHC VOs as well as a number of smaller users. We have increasingly found that our existing batch system is unable to cope with the demands placed on it by our users. During the past year work has been carried out evaluating alternative technologies to our existing Torque/Maui batch system and preparing a new system for use in production. This talk will discuss why we decided to migrate to a new batch system, the selection process, testing carried out, and our experiences with the new batch system so far.
        Speaker: Andrew David Lahiff (STFC - Science & Technology Facilities Council (GB))
        Slides
      • 15:00
        Grid Engine: One Roadmap 30m
        The presentation will cover the status and future of Grid Engine: specifically, a summary of what Univa offers Grid Engine users, what we have done since SGE 6.2u5, and what is included in our soon-to-be-released 8.2.0 version. Additionally, several case studies of Grid Engine users from different industries will be presented to show the value Univa provides to our users.
        Speaker: Cameron Brunner (U)
        Slides
    • 15:30 16:00
      Coffee break 30m 340 West Hall

    • 16:00 17:30
      Security and networking 340 West Hall

      Conveners: Dave Kelsey (STFC - Science & Technology Facilities Council (GB)), Dr Shawn Mc Kee (University of Michigan (US))
      • 16:00
        Mobility at CERN 30m
        With the dramatic increase of wireless-capable devices, Wi-Fi connectivity has become an essential network service on a par with the traditional cabled network. The evolution of the CERN Wi-Fi infrastructure will be presented, including the BYOD strategy and the integration of eduroam. The deployment considerations for large conference rooms and underground facilities will also be addressed.
        Speaker: Sebastien Ceuterickx (CERN)
        Slides
      • 16:30
        Latest changes on CERN networks 30m
        The latest changes on the CERN network infrastructure will be presented. This includes the deployment of IPv6, the network connectivity for the Data Centre extension at Wigner and the upgrade of the network infrastructure for Business Continuity. The second part of the talk will give an overview of the implementation of a new, safety-related wireless network (TETRA).
        Speaker: Sebastien Ceuterickx (CERN)
        Slides
      • 17:00
        Deploying perfSONAR-PS for WLCG: An Overview 30m
        The WLCG infrastructure has evolved from its original restrictive network topology, based on the MONARC model, to a more interconnected system, where data movement between regions or countries does not necessarily need to involve T1 centers. While this evolution brought obvious advantages, especially in terms of flexibility for the LHC experiments' data management systems, it also raises the question of how to monitor and troubleshoot the increasing number of possible network paths, in order to provide a global, reliable network service. The perfSONAR network monitoring system (specifically the perfSONAR-PS implementation) has been evaluated and agreed as a proper solution to cover the WLCG network monitoring use cases: it allows WLCG to plan and execute latency and bandwidth tests between any instrumented endpoints through a central scheduling configuration, it allows archiving of the metrics in a local database, it provides a programmatic and a web-based interface exposing the test results, and it also provides a graphical interface for remote management operations. In this presentation we will discuss our activity to deploy a perfSONAR-PS based network monitoring infrastructure, in the scope of the WLCG Operations Coordination initiative: we will motivate the main choices agreed in terms of configuration and management, describe the additional tools we developed to complement the standard packages, and present the status of the deployment, together with the possible future evolution.
        Speaker: Shawn Mc Kee (University of Michigan (US))
        Slides
    • 17:30 19:30
      HEPiX Board Meeting 348 West Hall

      Board meeting. VC capabilities available

      Conveners: Dr Helge Meinhard (CERN), Sandy Philpott (JLAB)
    • 09:00 10:30
      Security and networking 340 West Hall

      Convener: Dave Kelsey (STFC - Science & Technology Facilities Council (GB))
      • 09:00
        The HEPiX IPv6 Working Group 30m
        An update on the activities of the group in IPv6 testing and planning since the Bologna meeting.
        Speaker: Dave Kelsey (STFC - Science & Technology Facilities Council (GB))
        Slides
      • 09:30
        Security update 30m
        This presentation provides an update of the security landscape since the last meeting. It describes the main vectors of compromises in the academic community and presents interesting recent attacks. It also covers security risks management in general, as well as the security aspects of the current hot topics in computing, for example identity federation and virtualisation.
        Speaker: Mr Romain Wartel (CERN)
        Slides
      • 10:00
        Identity Management in Future Scientific Collaborations 30m
        Scientific collaborations are evolving to a model where large, multi-dimensional data sets are analyzed in whole or in part by relatively small groups of researchers. These groups are often without the expertise and/or resources to develop and maintain a sophisticated IT infrastructure and represent the growing "long tail of science". The presentation will discuss the evolving structures and new approaches, policies, and services needed to lower the barriers to providing Identity Management to future collaborations.
        Speaker: Bob Cowles (Indiana University / CACR)
        Slides
    • 10:30 11:00
      Coffee break 30m 340 West Hall

    • 11:00 12:30
      Security and networking 340 West Hall

      Conveners: Dave Kelsey (STFC - Science & Technology Facilities Council (GB)), Dr Shawn Mc Kee (University of Michigan (US))
      • 11:00
        Federated Identity Management for HEP 30m
        There is much activity in the area of identity management for research communities. This talk will present the current status of this work and explore possible future options for WLCG and HEP more generally.
        Speaker: Dave Kelsey (STFC - Science & Technology Facilities Council (GB))
        Slides
      • 11:30
        Evolution of the OSG Authentication Model 30m
        The Open Science Grid (OSG) has undergone some changes in its authentication model for user job submission. This talk will outline changes already implemented as well as future plans as they stand now, together with the use cases that motivated them.
        Speaker: Kevin Hill
        Slides
      • 12:00
        Technical security tips and techniques 30m
        This is an interactive technical session aimed at system administrators, presenting several Linux tools and tips that can be used both to drastically reduce the chance of root compromise and to increase the amount of information available during forensics. (This session is not necessarily meant to be a general presentation for the audience and can also be a dedicated BoF.)
        Speaker: Mr Romain Wartel (CERN)
        Slides
    • 12:30 14:00
      Lunch 1h 30m 340 West Hall

    • 14:00 15:30
      Basic IT services 340 West Hall (University of Michigan)

      Convener: Dr Helge Meinhard (CERN)
      • 14:00
        Using control system tools for operation and debugging 30m
        When investigating a problem, one typically needs to gather and correlate information from disparate sources: operating system, batch system, data transfer tools, and so on. We investigate the use of Control System Studio to gather information from multiple places. When the appropriate hooks are created, this should allow the end user to create ad-hoc ways to mix and match data without the need of writing programs or web pages.
        Speaker: Mr Gabriele Carcassi (Brookhaven National Laboratory (US))
        Slides
      • 14:30
        Logstash and ElasticSearch deployment scenario at GSI 30m
        This talk will present some use cases for Logstash and ElasticSearch. In particular we will show how we are using these tools to collect, parse, index and analyze logs of different services: Grid Engine, SSH, Apache and Cisco firewalls, among others. Moreover, the introduction of version three of Kibana, as an HTML-plus-Javascript interface, has improved the analytics capabilities of ElasticSearch.
        Speaker: Matteo Dessalvi (GSI)
        Slides
      • 15:00
        The migration from ELFms/Quattor to Agile Infrastructure 30m
        The life cycle of ELFms (Extremely Large Fabric management system) is reaching its end. This set of tools provided to manage machines in the CERN Computer Centre has reached end-of-life, and a new Configuration Management System is going to take its place. The new Configuration Management System will drastically change the way we manage machines in the CERN Computer Centre, and a migration needs to be implemented. To ensure a smooth migration, the CERN Configuration Management System team has been carrying out several activities to anticipate the switch-off of ELFms. In this talk we will introduce the new Configuration Management System and its principal components. We will compare it with the ELFms/Quattor system and provide details of how the migration from ELFms/Quattor is being implemented, including what has been done so far and what the plans for the future are.
        Speaker: Vitor Emanuel Gomes Gouveia (CERN)
        Slides
    • 15:30 16:00
      Coffee break 30m 340 West Hall

    • 16:00 17:25
      Basic IT services 1324 East Hall (University of Michigan)

      Convener: Dr Helge Meinhard (CERN)
      • 16:00
        Puppet at Fermilab- Managing a Large Heterogeneous Environment 25m
        Puppet has been in use by the Fermilab Experiments Facilities department to support computing for a variety of experiments for the last several years. This presentation will discuss our experience deploying, refining, and upgrading Puppet to scale to thousands of systems, across different experiments, servers, batch nodes, and workstations. It will also describe our efforts to build an understandable and maintainable Puppet repository that is flexible and usable by multiple administrators.
        Speakers: Edward Simmonds (Fermilab), Tyler Parsons (Fermilab)
      • 16:25
        Puppet at USCMS-T1 and FermiLab 25m
        Over the past year, the USCMS-T1 project has decided to jump head-first into Puppet as our primary configuration management tool. We would like to talk about what's worked, what hasn't worked, and how we've been able to work with the other Fermilab teams to share our experiences without necessarily sharing a code base.
        Speaker: Timothy Michael Skirvin (F)
        Slides
      • 16:50
        Unified Communications, the new IP telephony @ CERN 25m
        The CERN Telephone service is moving towards unified communications. New IP Phone devices connected to the Lync IP Phone Service enhance classic telephony by adding many features like IM, presence, voice mailbox, call delegation, etc. The complete integration with Exchange allows the mailbox to be used as call log history, voice mailbox, etc. In addition, connecting the IP Phone to your computer via USB widens the range of possibilities by profiting from the integration of the device with the operating system: click-to-call from web pages or emails, etc. The service is also available from anywhere with software clients on many platforms (desktops & laptops, smartphones), allowing calls as if you were in your office, conference call meetings, instant setup in any meeting room, etc. What about future plans? Lync 2013 and its integration with Skype will further extend the possibilities that this solution offers.
        Speaker: Fernando Moreno Pascual (CERN)
        Slides
    • 17:45 22:45
      HEPiX Dinner 5h

      Dinner at the Henry Ford. Buses begin departing from Church St at 17:45. Buses begin returning from the Henry Ford at 10 PM (35-40 minute ride)

    • 09:00 10:45
      Storage and file systems 340 West Hall

      Conveners: Dr Arne Wiebalck (CERN), Mr Peter van der Reest (DESY)
      • 09:00
        Why specify and monitor hard disk drive workloads 45m
        The reliability of hard disk drives (HDD) has been quantified historically by a mean time to failure (MTTF), or an annualized failure rate (AFR), defined at a specified operating temperature, and an assumed functional duty cycle. We provide justification for replacing the ambiguous concept of duty cycle with the readily quantifiable "workload", which is defined as the total amount of data read from or written to the drive per unit time. This relatively subtle change leads to the conclusion that MTTF alone is insufficient to describe the field reliability of HDDs. This results from the fact that HDD failure rates are more tightly coupled to the total amount of data transferred rather than the total power-on-time. It immediately follows that a metric based on Mean Petabytes to Failure (MPbF) is most appropriate to quantify the intrinsic reliability of HDDs. Until MPbF is accepted as the critical measure of product quality, however, WD will specify both the MTTF and the maximum workload at which the product meets the MTTF requirement to unambiguously define the HDD reliability. Furthermore, WD will offer a new drive feature, the Drive Workload Monitor (DWM), that will facilitate retrieval of the total data transferred at any point of the HDD lifetime.
        Speaker: Dr Amit Chattopadhyay (Western Digital Corporation)
        Slides
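        To relate the two ratings discussed in the abstract (an illustrative calculation with invented numbers, not vendor specifications), a data-based failure metric follows from the time-based one through the workload W at which the drive is rated:

            \mathrm{MPbF} \approx \mathrm{MTTF} \times W

        For example, a drive specified at an MTTF of 1,000,000 hours for a workload of 180 TB/year (about 0.18 PB / 8766 h \approx 2.1 \times 10^{-5} PB/h) would correspond to roughly 20 PB transferred per failure on average.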
      • 09:45
        HEPiX Bit Preservation Working Group 30m
        The goal of the HEPiX Bit Preservation Working Group is to share ideas, practices and experience on bit stream preservation activities across sites providing long-term and large-scale archive services. Different aspects should be covered, such as: technology used for long-term archiving, definition of reliability, mitigation of data loss risks, monitoring/verification of the archive contents, procedures for recovering unavailable and/or lost data, and procedures for archive migration to new-generation technology. The Working Group is producing a survey of existing practices across HEPiX and WLCG sites responsible for large-scale long-term archiving, and will present its initial findings at the HEPiX Fall 2013 workshop.
        Speaker: German Cancio Melia (CERN)
        Slides
      • 10:15
        USCMS T1 and LPC Data Storage Challenges and Solutions 30m
        The CMS T1 facility at Fermilab manages many tens of petabytes of data for CMS. This talk will present some historical information on the solutions used to store this data as well as information on the new solutions we are in the process of implementing and how we got to where we are now.
        Speaker: Lisa Ann Giacchetti (Fermi National Accelerator Lab. (US))
        Slides
    • 10:45 11:15
      Coffee break 30m 340 West Hall

    • 11:15 12:30
      Storage and file systems 340 West Hall

      Conveners: Dr Arne Wiebalck (CERN), Mr Peter van der Reest (DESY)
      • 11:15
        OpenAFS and IPv6: Follow-up on the Bologna discussions / HEPiX survey 15m
        This will be a follow-up of the discussions about OpenAFS and IPv6 we had in Bologna, in particular summarizing input from potential developers on timelines/prices/development models, as well as conclusions from the survey conducted to understand the needs of the HEPiX community regarding the lack of IPv6 in OpenAFS.
        Speaker: Dr Arne Wiebalck (CERN)
        Slides
      • 11:30
        OpenAFS Status Report 30m
        A status report on OpenAFS with a focus on: the 2013 security vulnerabilities (OPENAFS-SA-2013-001, buffer overflows in the OpenAFS fileserver; OPENAFS-SA-2013-002, buffer overflow in the OpenAFS ptserver; OPENAFS-SA-2013-003, brute-force DES attack permits compromise of an AFS cell; OPENAFS-SA-2013-004, vos -encrypt doesn't encrypt connection data); the latest OS platform support, including Windows 8.1, OS X Mavericks, and recent Linux kernels; and Foundation status.
        Speaker: Derrick Brashear (Y)
        Slides
      • 12:00
        Introducing YFS 1.0 30m
        YFS is a Software Defined Storage solution for secure private, public and hybrid cloud storage deployments. YFS 1.0 clients and servers are dual protocol stack providing next generation file system capabilities while maintaining backward compatibility with IBM AFS 3.6 and OpenAFS clients and servers. This talk will highlight the enhanced capabilities of YFS 1.0 vs OpenAFS 1.6.5 including the new high performance IPv6-capable RX implementation, security improvements, improved Windows / OSX application compatibility, Year 2038 safety, and other protocol improvements.
        Speakers: Derrick Brashear (Y), Jeffrey Altman (Your File System Inc.)
        Slides
    • 12:30 14:00
      Lunch 1h 30m 340 West Hall

    • 14:00 15:00
      Storage and file systems 340 West Hall

      Conveners: Dr Arne Wiebalck (CERN), Mr Peter van der Reest (DESY)
      • 14:00
        Solving Small Files Problem in Enstore 30m
        Enstore is a tape-based Mass Storage System originally designed for the Run II Tevatron experiments at FNAL (CDF, D0). Over the years it has proven to be a reliable and scalable data archival and delivery solution, which meets the diverse requirements of a variety of applications including US CMS Tier 1, High Performance Computing, Intensity Frontier experiments, and data backups. Data-intensive experiments like CDF, D0 and US CMS Tier 1 generally produce huge amounts of data stored in files with an average size of a few gigabytes, which is optimal for writing and reading data to/from tape. In contrast, much of the data produced by Intensity Frontier experiments, Lattice QCD and Cosmology is sparse, resulting in the accumulation of large numbers of small files. Reliably storing small files on tape is inefficient because writing file marks takes a significant amount of the overall file-writing time (a few seconds). There are several ways of improving data write rates, but some of them are unreliable, some are specific to the type of tape drive, and they still do not provide transfer rates adequate to the rates offered by tape drives (20% of the drive's potential rate). In order to provide good rates for small files in a transparent and consistent manner, the Small File Aggregation (SFA) feature has been developed to aggregate files into containers which are subsequently written to tape. The file aggregation uses a reliable internal Enstore disk buffer. File grouping is based on policies using file metadata and other user-defined steering parameters. If a small file that is part of a container is requested for read, the whole container is staged into the internal Enstore read cache, thus providing a read-ahead mechanism in anticipation of future read requests for files from the same container. SFA is provided as a service implementing file aggregation and staging transparently to the user. SFA has been used successfully since April 2012 by several experiments. Currently we are preparing to scale up the write/read SFA cache. This paper describes the Enstore Small File Aggregation feature and discusses how it can be scaled in size and transfer rates.
        Speaker: Dr Alexander Moibenko (Fermi National Accelerator Laboratory)
        Slides
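        The aggregation idea described in the abstract can be sketched in a few lines of Python (a schematic illustration only; the function and parameter names are invented here and are not Enstore code): small files are grouped by a metadata-based policy into containers of a few gigabytes, and each full container is written to tape as a single sequential object, so per-file tape marks are avoided.

            # Schematic sketch of policy-based small-file aggregation (invented names,
            # not Enstore code): group small files into few-GB containers and write each
            # container to tape as one sequential archive, avoiding per-file tape marks.
            import tarfile
            from collections import defaultdict

            CONTAINER_TARGET_BYTES = 5 * 1024**3  # assumed target container size

            def aggregate(files, policy_key):
                """files: iterable of (path, size_bytes, metadata) tuples.
                policy_key(metadata) decides which files may share a container."""
                buckets = defaultdict(list)
                for path, size, meta in files:
                    buckets[policy_key(meta)].append((path, size))

                containers = []
                for key, members in buckets.items():
                    current, current_size = [], 0
                    for path, size in members:
                        current.append(path)
                        current_size += size
                        if current_size >= CONTAINER_TARGET_BYTES:
                            containers.append((key, current))
                            current, current_size = [], 0
                    if current:  # flush the partially filled container
                        containers.append((key, current))
                return containers

            def write_container(paths, out_path):
                # One sequential write per container instead of one write (and one
                # file mark) per small file.
                with tarfile.open(out_path, "w") as tar:
                    for p in paths:
                        tar.add(p)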
      • 14:30
        dCache news 30m
        This presentation is intended to bring HEP storage administrators up to speed with ongoing dCache developments and activities. In the context of WLCG, we will report on our collaboration with the xRootd folks in terms of federated storage and monitoring, our efforts to support the strict separation of CMS between disk and tape storage endpoints, and we hope to have the first results on direct NFS 4.1/pNFS access of WLCG grid jobs to the dCache storage elements at DESY. On a related topic we will briefly describe a new module allowing dCache to handle files that are too small to be exchanged individually with a tertiary storage system. In general terms we will elaborate on interesting developments in dCache triggered by our involvement in the German "Large Scale Data Management and Analysis" (LSDMA) project. This includes, but is not limited to, integrating dCache into federated identity infrastructures and, in collaboration with students of the HTW Berlin, providing standard cloud interfaces. Last but not least we will give some insight into the current and future funding structure of the dCache collaboration, the release policy and channels, and our first attempt to provide a concrete cloud storage service.
        Speaker: Dr Patrick Fuhrmann (DESY)
        Slides
    • 15:00 15:30
      Coffee break 30m 340 West Hall

    • 15:30 16:20
      Storage and file systems 340 West Hall

      Conveners: Dr Arne Wiebalck (CERN), Mr Peter van der Reest (DESY)
      • 15:30
        Building an organic block storage service at CERN with Ceph 25m
        This is a report on CERN IT's 3-petabyte Ceph pre-production cluster, set up in the past couple of months, which will initially serve as a storage backend for OpenStack images and volumes (backends for AFS or NFS servers are options we will explore as well). In addition to a discussion of the architecture and configuration of the cluster, we will present results of our functionality tests and some performance numbers, as well as best practices learned.
        Speaker: Dr Arne Wiebalck (CERN)
        Slides
      • 15:55
        Experiences with Ceph at the ATLAS Midwest Tier 2 25m
        This talk will cover our deployment of Ceph, a highly-scalable next-generation distributed filesystem, at Midwest Tier 2 to back-end various projects. We'll talk about our experience deploying Ceph, performance benchmarks, and some thoughts about where we would like to go next.
        Speaker: Lincoln Bryant (University of Chicago (US))
        Slides
    • 16:20 17:15
      IT End User Services 340 West Hall

      Convener: Sandy Philpott (JLAB)
      • 16:20
        Building RPM packages at CERN with Koji 25m
        This talk will show how we used Koji to give the different IT teams flexibility. It will also cover the building of Red Hat packages for Scientific Linux CERN and other Red Hat add-ons. Finally, we will state the limitations of Koji and some workarounds we found.
        Speaker: Thomas Oulevey (CERN)
        Slides
      • 16:45
        Scientific Linux current status update 25m
        This presentation will provide an update on the current status of Scientific Linux, descriptions for some possible future goals, and allow a chance for users to provide feedback on its direction.
        Speaker: Pat Riehecky (Fermilab)
        Slides
    • 09:00 10:15
      Grids, clouds, virtualisation 340 West Hall

      Conveners: Ian Collier (UK Tier1 Centre), Dr Keith Chadwick (Fermilab)
      • 09:00
        CERN Cloud Status 25m
        This talk will provide an update on the CERN private infrastructure-as-a-service cloud, which is now in production based on OpenStack Grizzly. Along with the current status, we will give plans for the next developments and evolution of the service in areas such as block storage using Ceph and NetApp, Kerberos and X.509 support, and scaling to multiple cells across CERN's two data centres.
        Speaker: Thomas Oulevey (CERN)
        Slides
      • 09:25
        IN2P3-CC IAAS cloud status 25m
        The talk presents some past and ongoing testing of IaaS cloud technologies at IN2P3-CC. OpenStack has been deployed for some years now and is involved in a variety of projects. The presentation covers: OpenStack as the CMP of choice; use cases with test and production services, computing, and community cloud; implemented features, hardware and services; impacts on the datacenter; how to implement high availability with OpenStack; OpenStack experience (pros/cons); and what's next.
        Speaker: Mattieu Puel (CNRS)
        Slides
      • 09:50
        FermiCloud update - enabling scientific workflows in the cloud 25m
        In 2010, Fermilab initiated the FermiCloud project to deliver a dynamic and scalable Infrastructure-as-a-Service (IaaS) capability using open source cloud computing frameworks to support the needs of the Fermilab scientific communities. A collaboration of personnel from Fermilab and the Korea Institute of Science and Technology Information (KISTI) has focused significant work over the past 18 months on delivering improvements to the applicability and robustness of FermiCloud, together with specific accomplishments with respect to direct and indirect support of science. The opportunities, challenges and successes of cloud computing at Fermilab will be presented, including GridBursting to FermiCloud (extending the grid through the cloud), idle VM suspension, and FermiCloud interoperability.
        Speaker: Gerard Bernabeu Altayo (F)
        Slides
    • 10:15 10:30
      Coffee break 15m 340 West Hall

    • 10:30 12:10
      Grids, clouds, virtualisation 340 West Hall

      Conveners: Ian Collier (UK Tier1 Centre), Dr Keith Chadwick (Fermilab)
      • 10:30
        PanDA Beyond ATLAS: Workload Management for Data Intensive Science 25m
        The PanDA Production ANd Distributed Analysis system has been developed by ATLAS to meet the experiment's requirements for a data-driven workload management system for production and distributed analysis processing capable of operating at LHC data processing scale. After 7 years of impressively successful PanDA operation in ATLAS there are also other experiments which can benefit from PanDA in the Big Data challenge, with several at various stages of evaluation and adoption. The new project "Next Generation Workload Management and Analysis System for Big Data" is extending PanDA to meet the needs of other data intensive scientific applications in HEP, astro-particle and astrophysics communities, bio-informatics and other fields as a general solution to large scale workload management. PanDA can utilize dedicated or opportunistic computing resources such as grids, clouds, and High Performance Computing facilities, and is being extended to leverage next generation intelligent networks in automated workflow management and brokerage. This presentation will provide an overview, the current status and future plans of the Big PanDA project.
        Speaker: Jaroslava Schovancova (Brookhaven National Laboratory (US))
        Slides
      • 10:55
        Experience with dynamically provisioned worker nodes at the RAL Tier 1 25m
        Even with the growing interest in cloud computing, grid-based submission to traditional batch systems is still the primary way for the experiments to run jobs at WLCG sites. Integrating a batch system with virtualised worker nodes on a cloud potentially offers sites many benefits. At RAL we have recently investigated making opportunistic use of a private StratusLab cloud when it has unused resources and there are idle jobs in the batch system. Our ability to do this is greatly simplified due to our migration of the batch system to HTCondor, currently in progress. Here we describe the work that has been done so far, present preliminary results and discuss some of the issues raised by the testing, including virtualisation overheads, fairshares, virtual machine lifetimes, and monitoring requirements for dynamic environments.
        Speaker: Andrew David Lahiff (STFC - Science & Technology Facilities Council (GB))
        Slides
      • 11:20
        CernVM-FS - Beyond LHC Computing 25m
        In the last three years the CernVM Filesystem (CernVM-FS) has transformed the distribution of experiment software to WLCG grid sites. CernVM-FS removes the need for local software installation jobs at sites and in addition often improves performance at the same time. Furthermore, the use of CernVM-FS standardizes the computing environment across the grid and removes the need for software tagging at sites. Now established and proven to work at scale, CernVM-FS is beginning to perform a similar role for non-LHC computing. We discuss the deployment of a non-LHC Stratum 0 'master' CernVM-FS repository at the RAL Tier 1 and the development of a network of Stratum 1 replicas, somewhat modeled upon the infrastructure developed to support WLCG computing.
        Speaker: Ian Collier (UK Tier1 Centre)
        Slides
      • 11:45
        OpenShift on your own cloud 25m
        OpenShift has three offerings: Origin, Online, and Enterprise. Now you can enjoy the benefits of PaaS in the public cloud or on your own cloud. I will be showing OpenShift Origin, set up locally: what features it has for both admins and users, and how that will help both labs and experiments.
        Speaker: Mr Troy Dawson (Red Hat)
        Slides
    • 12:10 12:30
      Miscellaneous 340 West Hall

      Conveners: Dr Helge Meinhard (CERN), Sandy Philpott (JLAB)
      • 12:10
        Workshop wrap-up 15m
        Speaker: Dr Helge Meinhard (CERN)
        Slides