HEPiX Spring 2014 Workshop

Europe/Paris
Auditorium Marcel Vivargent (LAPP)

9 Chemin de Bellevue, 74940 Annecy-le-Vieux, France. GPS coordinates: N 45° 55' 14.002'' E 6° 9' 33.998''
Frederique Chollet (Centre National de la Recherche Scientifique (FR)), Helge Meinhard (CERN), Sandy Philpott (JLAB)
Description

HEPiX Spring 2014 at LAPP, Annecy-le-Vieux, France

The HEPiX forum brings together worldwide Information Technology staff, including system administrators, system engineers, and managers from the High Energy Physics and Nuclear Physics laboratories and institutes, to foster a learning and sharing experience between sites facing scientific computing and data challenges.
Participating sites include BNL, CERN, DESY, FNAL, IN2P3, INFN, IRFU, JLAB, NIKHEF, PIC, RAL, SLAC, TRIUMF and many others.

 

    • 8:30 AM 9:00 AM
      Registration 30m Indoor patio (ground floor)

    • 9:00 AM 9:30 AM
      Miscellaneous Auditorium Marcel Vivargent

      • 9:00 AM
        LAPP welcome 20m
        Welcome
        Speaker: Nadine Neyroud (LAPP/CNRS)
        Slides
      • 9:20 AM
        Workshop logistics 10m
        Logistics
        Speaker: Frederique Chollet (Centre National de la Recherche Scientifique (FR))
        Slides
    • 9:30 AM 10:30 AM
      Site reports Auditorium Marcel Vivargent

      Conveners: Michele Michelotto (Universita e INFN (IT)), Sebastien Gadrat (CC-IN2P3 - Centre de Calcul (FR))
      • 9:30 AM
        PIC Site Report 15m
        Spring 2014 PIC Tier-1 site report, covering recent updates, improvements in energy efficiency by means of free-cooling techniques, and preparation for new challenges.
        Speaker: Jose Flix Molina (Centro de Investigaciones Energ. Medioambientales y Tecn. (ES))
        Slides
      • 9:45 AM
        Nebraska Site Report 15m
        The Holland Computing Center at the University of Nebraska-Lincoln hosts the state's research computing resources. Several grid-enabled clusters are available to HEP for opportunistic computing, alongside a CMS Tier-2 site. In this presentation, we will cover the recent updates to site networking and the CMS Tier-2 cluster. Particular attention will be paid to: 1. the recent rollout of IPv6 to the production services; 2. software upgrades, including the OSG software stack and planning for RHEL7; 3. progress on the site's 100 Gbps upgrade. We will also explain how the Nebraska Tier-2 updates fit into the broader USCMS activities.
        Speaker: Brian Paul Bockelman (University of Nebraska (US))
        Slides
      • 10:00 AM
        IHEP Site Report 15m
        The site report will give a summary of the IHEP site status, including the local cluster and the EGI site. It will also cover the improvements we have made to distributed computing.
        Speaker: Jingyan Shi (IHEP)
        Slides
      • 10:15 AM
        GRIF and LAL Site Report 15m
        This site report will cover GRIF grid site and LAL internal computing
        Speaker: Mr Michel Jouvin (Universite de Paris-Sud 11 (FR))
        Slides
    • 10:30 AM 11:00 AM
      Coffee break 30m
    • 11:00 AM 12:00 PM
      Site reports Auditorium Marcel Vivargent

      Conveners: Michele Michelotto (Universita e INFN (IT)), Sebastien Gadrat (CC-IN2P3 - Centre de Calcul (FR))
      • 11:00 AM
        KIT Site Report 20m
        KIT Site Report
        Speaker: Manfred Alef (Karlsruhe Institute of Technology (KIT))
        Slides
      • 11:20 AM
        Fermilab Site Report - Spring 2014 HEPiX 20m
        Fermilab Site Report - Spring 2014 HEPiX.
        Speaker: Dr Keith Chadwick (Fermilab)
        Paper
        Slides
      • 11:40 AM
        BNL RACF Site Report 20m
        A summary of developments at BNL's RHIC/ATLAS Computing Facility since the last HEPiX meeting.
        Speaker: Dr Ofer Rind (BROOKHAVEN NATIONAL LABORATORY)
        Slides
    • 12:00 PM 1:30 PM
      Lunch 1h 30m 'Tom Morel' Cafeteria

    • 1:30 PM 3:40 PM
      IT end user services and operating systems Auditorium Marcel Vivargent

      Conveners: Sandy Philpott (JLAB), Connie Sieh (Fermilab)
      • 1:30 PM
        10 Years of Scientific Linux 25m
        SL was announced to the world at the Spring 2004 HEPiX meeting in Edinburgh, so this seems a good moment to review its origins and how it became the preferred Linux of most HEP sites.
        Speaker: Mr Alan Silverman (CERN (retired))
        Slides
      • 1:55 PM
        CentOS and Red Hat 25m
        Speaker: Karanbir Singh
      • 2:20 PM
        Scientific Linux Status and Futures 25m
        Status of Scientific Linux and Futures
        Speaker: Connie Sieh (Fermilab)
        Slides
      • 2:45 PM
        Next Linux version at CERN 25m
        CERN has been maintaining and deploying Scientific Linux CERN since 2004. In January 2014, CentOS and Red Hat announced that they were joining forces in order to provide a common platform for open source community project needs. How does this merger affect the plans for the future CERN Linux version?
        Speaker: Jarek Polok (CERN)
        Slides
      • 3:10 PM
        Discussion about future OS for HEP 30m
    • 3:40 PM 4:10 PM
      Coffee break 30m Indoor patio (ground floor)

    • 4:10 PM 5:00 PM
      IT end user services and operating systems Auditorium Marcel Vivargent

      Conveners: Sandy Philpott (JLAB), Connie Sieh (Fermilab)
      • 4:10 PM
        Issue Tracking & Version Control Services status update 25m
        The current efforts on the issue tracking and version control services at CERN will be presented, with special attention to the new central Git service, the integration between issue tracking and version control, and future service deployments.
        Speaker: Alvaro Gonzalez Alvarez (CERN)
        Slides
      • 4:35 PM
        Windows 8 Integration 25m
        In this presentation we will talk about Windows 8 and how we are integrating it at CERN: the issues we met and how we solved them for our user community. We will focus on issues, customization and deployment. If you have to start a Windows 8 pilot project in your organization, this talk is for you.
        Speaker: Sebastien Dellabella (CERN)
    • 5:00 PM 5:30 PM
      Site reports Auditorium Marcel Vivargent

      Conveners: Michele Michelotto (Universita e INFN (IT)), Sebastien Gadrat (CC-IN2P3 - Centre de Calcul (FR))
      • 5:00 PM
        ASGC site report 15m
        Site report for ASGC
        Speaker: Hung-Te Lee (Academia Sinica (TW))
        Slides
      • 5:15 PM
        QMUL site report 15m
        Site report from Queen Mary University of London.
        Speaker: Christopher John Walker (University of London (GB))
        Slides
    • 6:00 PM 6:10 PM
      Buses depart from LAPP at 18:00 - LAPP - BONLIEU (Annecy Centre) - QUAI DE LA TOURNETTE 10m Auditorium Marcel Vivargent

    • 7:00 PM 8:30 PM
      Boat cruise & Welcome drink 1h 30m aboard the boat "Le Cygne"

    • 9:00 AM 10:30 AM
      Site reports Auditorium Marcel Vivargent

      Conveners: Michele Michelotto (Universita e INFN (IT)), Sebastien Gadrat (CC-IN2P3 - Centre de Calcul (FR))
      • 9:00 AM
        RAL Site Report 20m
        Update for UK Tier1 and RAL
        Speaker: Martin Bly (STFC-RAL)
        Slides
      • 9:20 AM
        INFN-T1 site report 20m
        INFN-T1 site update
        Speaker: Andrea Chierici (INFN-CNAF)
        Slides
      • 9:40 AM
        DESY Site Report 20m
        DESY site report for Spring 2014 HEPiX workshop
        Speaker: Yves Kemp (Deutsches Elektronen-Synchrotron (DE))
        Slides
      • 10:00 AM
        Status report from Tokyo Tier-2 for the one year operation after whole scale system upgrade 15m
        The Tokyo Tier-2 center, located at the International Center for Elementary Particle Physics (ICEPP) of the University of Tokyo, was established as a regional analysis center in Japan for the ATLAS experiment. Official operation within WLCG started in 2007, after several years of development beginning in 2002. In December 2012, we replaced almost all hardware in the third system upgrade, to cope with the further growing data volume of the ATLAS experiment. The number of CPU cores was increased by a factor of two with respect to the previous system (9984 cores, including CPUs for service instances), and the performance of each CPU core improved by 20% according to the HEPSPEC06 benchmark in 32-bit compile mode; the score is estimated as 18.03 per core under Scientific Linux 6 on an Intel Xeon E5-2680 at 2.70 GHz. As of February 2013, 2560 CPU cores and 2.00 PB of disk storage had been deployed for ATLAS, and they have been operated stably with 95% availability in the year of operation since the system upgrade. Since the number of CPU cores per worker node increased from 8 to 16, the local I/O performance of the data staging area could become a bottleneck for job throughput. We evaluated this by preparing a special worker node with an SSD as local storage, in a mixed workload of real ATLAS production and analysis jobs. As a result, we could confirm that the SAS HDDs attached to the nominal worker nodes at Tokyo Tier-2 are not a bottleneck for long batch-type jobs, at least with 16 jobs running concurrently on one node. In this report, we present several results of the local I/O performance evaluation, together with some experience from site operation.
        Speaker: Tomoaki Nakamura (University of Tokyo (JP))
        Slides
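        As an aside, the per-core HEPSPEC06 score and core counts quoted in the abstract allow a quick back-of-the-envelope capacity estimate. The Python sketch below assumes the quoted 18.03 HS06/core applies uniformly to all cores, which is a simplification for illustration rather than an official figure.

        # Rough HS06 capacity estimate from the figures quoted in the abstract.
        # Assumption: the per-core score of 18.03 (SL6, 32-bit compile mode) applies uniformly.
        HS06_PER_CORE = 18.03      # HEPSPEC06 per core on Intel Xeon E5-2680 2.70 GHz
        DEPLOYED_CORES = 2560      # cores deployed for ATLAS
        TOTAL_CORES = 9984         # all cores in the third system, incl. service instances

        deployed_capacity = HS06_PER_CORE * DEPLOYED_CORES
        total_capacity = HS06_PER_CORE * TOTAL_CORES

        print(f"Deployed ATLAS capacity: {deployed_capacity:,.0f} HS06")   # about 46,157 HS06
        print(f"Full system capacity:    {total_capacity:,.0f} HS06")      # about 180,012 HS06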
      • 10:15 AM
        IRFU Saclay site report 15m
        What is new at the IRFU Saclay site?
        Speaker: Pierrick Micout (Unknown)
        Slides
    • 10:30 AM 10:45 AM
      Group photo 15m Auditorium Marcel Vivargent

    • 10:45 AM 11:05 AM
      Coffee break 20m
    • 11:05 AM 12:10 PM
      Site reports Auditorium Marcel Vivargent

      Conveners: Michele Michelotto (Universita e INFN (IT)), Sebastien Gadrat (CC-IN2P3 - Centre de Calcul (FR))
      • 11:05 AM
        CERN Site Report 20m
        News from CERN since the last workshop.
        Speaker: Dr Arne Wiebalck (CERN)
        Slides
      • 11:25 AM
        WIGNER Datacenter - Introduction 15m
        As newcomers to the HEPiX community, WIGNER Datacenter, the newly established scientific computing facility of the WIGNER Research Centre for Physics in Budapest, would like to give an introduction to its background, construction, mission and model of operation. Featuring a long-term sustainable, energy-efficient and high-availability infrastructure, WIGNER Datacenter aims to provide a full range of computing services (including hosting, cluster and cloud based resources) to the scientific community.
        Speaker: Szabolcs Hernath (Hungarian Academy of Sciences (HU))
        Slides
      • 11:40 AM
        Jefferson Lab Site Report 15m
        The JLab talk will cover our current high performance and experimental physics computing status, including node-sharing between clusters for the 12GeV data challenges, Puppet configuration management plans, our latest GPU and MIC environment, workflow tools, LTO6 integration into the mass storage system, initial results of XFS on Linux testing, and plans for a Lustre 2.5 update and LMDS reconfiguration.
        Speaker: Sandy Philpott (JLAB)
        Slides
      • 11:55 AM
        AGLT2 Site Update 15m
        I will present an update on our site since the last report and cover our work with dCache, perfSONAR-PS, VMWare and experience with Cobbler and CFengine3 as our node provisioning system. There will also be an overview of our recent networking changes including the status of our new 100G connection to the WAN. I conclude with a summary of what has worked and what problems we encountered and indicate directions for future work.
        Speaker: Shawn Mc Kee (University of Michigan (US))
        Slides
    • 12:10 PM 12:35 PM
      Basic IT services Auditorium Marcel Vivargent

      Conveners: Dr Helge Meinhard (CERN), Dr Tony Wong (Brookhaven National Laboratory)
      • 12:10 PM
        Managing secrets 25m
        The talk will discuss the problems which arise from managing and distributing secrets such as root passwords, keytabs and certificates at a large site. Secrets are needed when installing and administering compute and storage systems. They should be accessible by authorized admins and from the systems they belong to, and there should be a way to audit the information to enforce the policies of your security department, for example the quality and lifetime of passwords. In the presentation we will describe the workflows at DESY/Hamburg and show the systems we use today and their deficits. Then we will describe our upcoming solution, and the threats we still see.
        Speaker: Sven Sternberger (D)
        Slides
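        To illustrate the kind of policy audit the abstract refers to (password quality and lifetime), here is a minimal Python sketch over a hypothetical secrets inventory; it is not DESY's tooling, and the inventory entries, policy values and field layout are all assumptions made for the example.

        # Illustrative only: audit a hypothetical secrets inventory against a simple
        # age/length policy. Not DESY's actual system.
        from datetime import date, timedelta

        MAX_AGE = timedelta(days=365)   # example policy: rotate secrets at least yearly
        MIN_LENGTH = 12                 # example policy: minimum secret length

        inventory = [
            # (identifier, owner, last_changed, secret_length) -- hypothetical entries
            ("root/host-a", "admins", date(2013, 2, 1), 16),
            ("keytab/batch", "batch-ops", date(2014, 4, 10), 64),
            ("root/host-b", "admins", date(2012, 11, 20), 8),
        ]

        today = date(2014, 5, 20)
        for ident, owner, changed, length in inventory:
            issues = []
            if today - changed > MAX_AGE:
                issues.append("older than the policy allows")
            if length < MIN_LENGTH:
                issues.append("shorter than the policy allows")
            if issues:
                print(f"{ident} (owner: {owner}): " + "; ".join(issues))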
    • 12:35 PM 2:00 PM
      Lunch 1h 25m 'Tom Morel' Cafeteria

    • 2:00 PM 3:40 PM
      Computing and batch systems Auditorium Marcel Vivargent

      Conveners: Gilles Mathieu (CNRS), Michele Michelotto (Universita e INFN (IT)), Wolfgang Friebel (Deutsches Elektronen-Synchrotron (DE))
      • 2:00 PM
        Batch System Review 25m
        The WLCG GDB organized a meeting last March about batch systems. With an audience mostly from grid sites, it was a successful review of the main batch systems used in the community, presented by sites with concrete experience. This presentation will summarize what was presented and the main conclusions of the meeting.
        Speaker: Mr Michel Jouvin (Universite de Paris-Sud 11 (FR))
        Slides
      • 2:25 PM
        A Year of Condor at the RAL Tier 1 25m
        It’s been almost a year since we first started running ATLAS and CMS production jobs at RAL using HTCondor, and 6 months since we fully migrated from Torque/Maui. This talk will discuss our experience so far and future plans.
        Speaker: Ian Peter Collier (STFC - Rutherford Appleton Lab. (GB))
        Slides
      • 2:50 PM
        Future of Batch Processing at CERN 25m
        The CERN Batch System comprises 4000 worker nodes, with 60 queues offering a service to various types of large user communities. In light of the recent developments driven by the Agile Infrastructure and more demanding processing requirements, the Batch System is faced with increasingly challenging scalability and flexibility needs. Last year the CERN Batch Team started to evaluate three candidate batch systems: SLURM, HTCondor and GridEngine. This year, as we are reaching a conclusion, one of our candidates is HTCondor. In this talk we give a short reminder of our requirements and our preliminary results from last year. Then we focus on HTCondor, our experience with it thus far, our testing framework and the results of our performance tests. Finally, we give a summary of the foreseeable challenges we would have to face if we decide to migrate the CERN Batch Service to HTCondor.
        Speaker: Janos Daniel Pek (CERN)
        Slides
      • 3:15 PM
        The Art of Running HTCondor as a batch system 25m
        HTCondor is a well-known platform for distributed high-throughput computing and often resembles the Swiss Army knife of computing: there's a bit of something for everyone. With a user manual weighing in at about 1,100 printed pages, it is no wonder that sysadmins can overlook some of the most exciting features. This presentation will be dedicated to uncovering the hidden gems for running HTCondor as a batch system - useful features that are well-hidden, under-appreciated, or very recently added. This broad overview will include topics in worker node resource management, scripting, monitoring, deployment, and debugging the system.
        Speaker: Brian Paul Bockelman (University of Nebraska (US))
        Slides
    • 3:40 PM 4:10 PM
      Coffee break 30m
    • 4:10 PM 5:25 PM
      Computing and batch systems Auditorium Marcel Vivargent

      Conveners: Gilles Mathieu (CNRS), Michele Michelotto (Universita e INFN (IT)), Wolfgang Friebel (Deutsches Elektronen-Synchrotron (DE))
      • 4:10 PM
        Support for Linux Control Groups 25m
        Current Linux distributions include support for a kernel enhancement called control groups (cgroups). This talk is about how Univa Grid Engine integrates the Linux cgroup subsystems for better resource isolation, utilization, and limitation in the job execution and resource allocation context. Example configurations and use cases for today's NUMA compute nodes are discussed.
        Speaker: Mr Daniel Gruber (Univa)
        Slides
      • 4:35 PM
        CC IN2P3 experience with Univa Grid Engine 25m
        After 20 years of using a home-made batch system named BQS (Batch Queuing System), CC-IN2P3 decided to move to Grid Engine in order to offer the scalability and robustness needed for multi-experiment production, HEP and non-HEP. The site migrated from BQS to Oracle Grid Engine in 2011, then switched to Univa's version only two years later, in June 2013. The talk presents the assessment of the change from Oracle to Univa and gives an overview of the configuration, balancing user requirements against the constraints of the site's infrastructure, especially for multi-core jobs. Finally, plans for the deployment of new features are shown and requests to Univa are explained.
        Speaker: Suzanne Poulat (Centre de calcul IN2P3)
        Slides
      • 5:00 PM
        Scheduling of multicore jobs 25m
        Nowadays, the Worldwide LHC Computing Grid consists of multi- and many-core CPUs, and a lot of work is being undertaken by the experiments and the HEP community in order to use these resources more efficiently. The parallelization of applications has been the main goal so far, in order to allow parallel execution of jobs. However, experiments must also consider how to schedule multicore jobs within the Computing Grid. Given the trend towards many-core architectures, tasks might not scale sufficiently well on a large number of cores. Since non-linear speedup can drastically decrease overall throughput, a scheduler must define the best degree of parallelism for each job. The aim of the presentation is to define the scheduling problem and to present algorithms to solve it. Related problems, such as the estimation of job runtimes, will also be discussed.
        Speaker: Nathalie Rauschmayr (CERN)
        Slides
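        To make the throughput argument concrete, the toy Python calculation below uses Amdahl's law as a stand-in speedup model (an assumption for illustration, not the model or algorithm presented in the talk) and shows why a scheduler has to trade per-job speedup against overall farm throughput when choosing the degree of parallelism.

        # Toy model: Amdahl's-law speedup vs. throughput per core for one job type.
        # The parallel fraction is an assumed example value.
        def speedup(cores, parallel_fraction):
            """Amdahl's law speedup for a job running on `cores` cores."""
            return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / cores)

        def throughput_per_core(cores, parallel_fraction):
            """Work completed per core per unit time, relative to a serial job."""
            return speedup(cores, parallel_fraction) / cores

        PARALLEL_FRACTION = 0.9
        for n in (1, 2, 4, 8, 16, 32, 64):
            print(f"{n:3d} cores: speedup {speedup(n, PARALLEL_FRACTION):5.2f}, "
                  f"throughput/core {throughput_per_core(n, PARALLEL_FRACTION):.2f}")
        # Throughput per core falls as the core count grows, so running every job at
        # maximum width would reduce the total work done by the farm.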
    • 5:30 PM 7:30 PM
      HEPiX Board Meeting Salle des Sommets (3rd floor)

      By invitation only

      Conveners: Dr Helge Meinhard (CERN), Sandy Philpott (JLAB)
    • 9:00 AM 10:40 AM
      Security and networking Auditorium Marcel Vivargent

      Conveners: Dave Kelsey (STFC - Rutherford Appleton Lab. (GB)), Dr Shawn Mc Kee (University of Michigan (US))
      • 9:00 AM
        IPv6 Deployment at FZU in Prague 25m
        At FZU we are continuing the deployment of IPv6 in our testbed as well as in the production network. We are currently running several subclusters of worker nodes and our DPM storage system on dual stack. Production data transfers from DPM to dual-stack worker nodes using lcg-cp are currently running over IPv6. We present our experience with this deployment, the new Nagios sensors needed in this situation, the results of our tests of IPv6 transfers using WebDAV, and some news from our IPv6 testbed.
        Speaker: Marek Elias (Institute of Physics ASCR (FZU))
        Slides
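        For readers unfamiliar with this kind of monitoring, the generic Python probe below checks that a service is reachable over both IPv4 and IPv6; it is only a sketch of the idea behind such dual-stack checks, not the actual Nagios sensors used at FZU, and the host and port are placeholder values.

        # Generic dual-stack reachability probe (illustrative; not the FZU Nagios sensor).
        import socket

        def can_connect(host, port, family):
            """Return True if a TCP connection succeeds for the given address family."""
            try:
                infos = socket.getaddrinfo(host, port, family, socket.SOCK_STREAM)
            except socket.gaierror:
                return False            # no A/AAAA record for this address family
            for af, socktype, proto, _, sockaddr in infos:
                s = socket.socket(af, socktype, proto)
                s.settimeout(5)
                try:
                    s.connect(sockaddr)
                    return True
                except OSError:
                    continue
                finally:
                    s.close()
            return False

        host, port = "www.example.org", 80   # placeholder endpoint
        print("IPv4:", can_connect(host, port, socket.AF_INET))
        print("IPv6:", can_connect(host, port, socket.AF_INET6))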
      • 9:25 AM
        IPv6 status and perfsonar testing in the UK 25m
        IPv6 rollout at UK sites varies from one site where nearly all services are dual stack (Imperial), to others without any IPv6 addresses. The current rollout status will be presented. In addition, results of IPv6 connectivity testing using perfsonar will be discussed.
        Speaker: Christopher John Walker (University of London (GB))
        Slides
      • 9:50 AM
        WLCG perfSONAR-PS Update 25m
        As reported at the last HEPiX meeting, WLCG has been supporting the deployment of perfSONAR-PS Toolkit instances at all WLCG sites over the last year. The WLCG perfSONAR-PS Deployment Task Force wrapped up its work in April 2014. The perfSONAR network monitoring framework was evaluated and agreed to be a proper solution to cover the WLCG network monitoring use cases: it allows WLCG to plan and execute latency and bandwidth tests between any instrumented endpoints through a central scheduling configuration, it allows archiving of the metrics in a local database, it provides programmatic and web-based interfaces exposing the test results, and it provides a graphical interface for remote management operations. In this presentation we will provide an update on the status of perfSONAR in WLCG, future plans for commissioning and maintaining perfSONAR in the scope of the WLCG Operations Coordination initiative, and its role in supporting higher-level services that are under development.
        Speaker: Shawn Mc Kee (University of Michigan (US))
        Slides
      • 10:15 AM
        Measuring WLCG data streams at batch job level 25m
        Batch system monitoring and related system monitoring tools allow tracking data streams at different levels. With the introduction of federated data access to the workflows of WLCG it is becoming increasingly important for data centers to understand specific data flows regarding storage element accesses, firewall configurations, or the scheduling of workflows themselves. For this purpose a proof of concept has been implemented at the GridKa Tier1 center for monitoring data streams of batch jobs. The approach aims for a direct integration into the existing batch system to enhance batch job statistics by adding continuous traffic profiles for WLCG jobs and pilots. The presentation will introduce the general concept of the developed tool and integration into the batch system as well as first results of measurements at GridKa.
        Speaker: Eileen Kuhn (KIT - Karlsruhe Institute of Technology (DE))
        Slides
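        The basic counter-sampling idea behind such traffic profiles can be sketched as follows; this generic Python example reads whole-interface counters from /proc/net/dev, whereas the GridKa prototype described in the abstract attributes traffic to individual batch jobs and integrates with the batch system, which this sketch does not attempt. The interface name and interval are assumptions.

        # Generic network-traffic sampler based on /proc/net/dev (Linux); illustrative only.
        import time

        def read_bytes(interface):
            """Return (rx_bytes, tx_bytes) for the given interface from /proc/net/dev."""
            with open("/proc/net/dev") as f:
                for line in f:
                    if line.strip().startswith(interface + ":"):
                        fields = line.split(":", 1)[1].split()
                        return int(fields[0]), int(fields[8])   # rx bytes, tx bytes
            raise ValueError(f"interface {interface!r} not found")

        INTERFACE = "eth0"   # assumed interface name
        INTERVAL = 10        # seconds between samples

        rx0, tx0 = read_bytes(INTERFACE)
        while True:
            time.sleep(INTERVAL)
            rx1, tx1 = read_bytes(INTERFACE)
            print(f"rx {(rx1 - rx0) / INTERVAL / 1e6:.2f} MB/s, "
                  f"tx {(tx1 - tx0) / INTERVAL / 1e6:.2f} MB/s")
            rx0, tx0 = rx1, tx1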
    • 10:40 AM 11:10 AM
      Coffee break 30m
    • 11:10 AM 12:25 PM
      Security and networking Auditorium Marcel Vivargent

      Convener: Dave Kelsey (STFC - Rutherford Appleton Lab. (GB))
      • 11:10 AM
        Emergency suspension list in WLCG 15m
        The Emergency suspension list (also known as the central banning list) is finally being deployed in WLCG, allowing quick automated responses to incidents. This short presentation will present the goal of this new feature, the technology behind the system and details about the current deployment.
        Speaker: Vincent Brillault (CERN)
        Slides
      • 11:25 AM
        New Windows security at CEA and IRFU 25m
        In 2013/2014 the CEA decided to dramatically change the security of its Windows PCs and the way they are managed. I will explain the new security philosophy, based on two levels: lateral security and escalation security. I will explain the issues this raises for the end users and also for the IT team.
        Speaker: joel surget (CEA/Saclay)
      • 11:50 AM
        Security update 35m
        This presentation provides an update of the security landscape since the last meeting. It describes the main vectors of compromises in the academic community and presents interesting recent attacks. It also covers security risks management in general, as well as the security aspects of the current hot topics in computing, for example identity federation and virtualisation.
        Speaker: Vincent Brillault (CERN)
        Slides
    • 12:30 PM 2:00 PM
      Lunch 1h 30m 'Tom Morel' Cafeteria

    • 2:00 PM 3:40 PM
      IT facilities and business continuity Auditorium Marcel Vivargent

      Conveners: Dr Keith Chadwick (Fermilab), Wayne Salter (CERN)
      • 2:00 PM
        Open Compute at CERN 25m
        The Open Compute Project, OCP (http://www.opencompute.org/), was launched by Facebook in 2011 with the objective of building efficient computing infrastructures at the lowest possible cost. The technologies are released as open hardware designs, with the goal of developing servers and data centers following the model traditionally associated with open source software projects. In order to try out the hardware, we acquired two OCP twin servers (http://hyvesolutions.com/resources/docs/2013Hyve1500Datasheet.pdf) in 2013. The servers have been tested and compared with our production hardware. Some results from this testing will be presented, as well as future plans for a possible larger deployment.
        Speaker: Olof Barring (CERN)
        Slides
      • 2:25 PM
        Shared datacenter in Orsay University: first results 25m
        As presented at past HEPiX meetings, eight labs in the Orsay region/university started a project two years ago to build a new datacenter aimed at replacing the existing, inefficient computing rooms. The project was delivered on time and has been in production since last October. This presentation will summarize the needs that motivated the project, the design choices and the experience of the building phase, and give early feedback after six months of operations. It will also present the future directions for this project and other related initiatives.
        Speaker: Mr Michel Jouvin (Universite de Paris-Sud 11 (FR))
        Slides
      • 2:50 PM
        WIGNER Datacenter - Operational experience 25m
        In this talk we would like to give a summary of the experiences from the first year of operation of the WIGNER Datacenter. We will discuss infrastructure operations, facility management, energy efficiency and value-added hosting services, with a special focus on the CERN@WIGNER project, the hosting of the external capacity of CERN Tier-0 resources. We will highlight some of the difficulties and pitfalls, along with insights and best practices we gathered during our first year of operation.
        Speaker: Szabolcs Hernath (Hungarian Academy of Sciences (HU))
        Slides
        Video
      • 3:15 PM
        Lessons learned after our recent cooling problem 25m
        In March we had a major cooling problem in our computing center and had to completely shut the center down. We learnt a lot from this incident and would like to share the experience with the community.
        Speaker: Andrea Chierici (INFN-CNAF)
        Slides
    • 3:40 PM 4:10 PM
      Coffee break 30m
    • 4:10 PM 4:35 PM
      IT facilities and business continuity Auditorium Marcel Vivargent

      Conveners: Dr Keith Chadwick (Fermilab), Wayne Salter (CERN)
      • 4:10 PM
        Business Continuity at DESY 25m
        A collection of themes and thoughts on business continuity, covering among other things measures, procedures and dependencies.
        Speakers: Mr Peter van der Reest (DESY), Yves Kemp (DESY)
        Slides
    • 4:35 PM 5:50 PM
      Computing and batch systems Auditorium Marcel Vivargent

      Conveners: Gilles Mathieu (CNRS), Michele Michelotto (Universita e INFN (IT)), Wolfgang Friebel (Deutsches Elektronen-Synchrotron (DE))
      • 4:35 PM
        Intel Ivybridge vs. AMD Opteron: performance and power implications 25m
        The RACF has evaluated Intel Ivy Bridge and AMD Opteron CPUs ahead of an anticipated purchase of Linux servers for its RHIC and USATLAS programs in 2014. Price/performance considerations are no longer sufficient, as we must also consider long-term power, cooling and space capacities in the data center. This presentation describes how these long-term considerations are increasingly altering hardware acquisition cycles at BNL.
        Speaker: Dr Tony Wong (Brookhaven National Laboratory)
        Slides
      • 5:00 PM
        Beyond HS06 - Toward a New HEP CPU Benchmark 25m
        The HEPiX Benchmarking Working Group is preparing for the deployment of a successor of the widely used HS06 benchmark:
        - Why we are looking for a replacement of HS06
        - Summary of discussions at GDB
        - Requirements
        - Benchmark candidates
        - Volunteers
        Speaker: Manfred Alef (Karlsruhe Institute of Technology (KIT))
        Slides
      • 5:25 PM
        Evaluation of the Avoton CPU 25m
        At INFN-T1 we are facing the problem of the TCO of computing nodes, which account for the larger part of our electricity bill. Intel recently introduced the Avoton SoC, targeted at the microserver, entry communications infrastructure and cloud storage markets. We benchmarked this CPU and evaluated the possible adoption of this technology in our computing farm.
        Speaker: Andrea Chierici (INFN-CNAF)
        Slides
    • 7:00 PM 11:00 PM
      HEPiX dinner 4h Restaurant 'Moon', Plage de l'Impérial, Annecy

    • 9:00 AM 10:40 AM
      Storage and file systems Auditorium Marcel Vivargent

      Conveners: Dr Arne Wiebalck (CERN), Mr Peter van der Reest (DESY)
      • 9:00 AM
        Batch system data locality via managed caches 25m
        Modern data processing solutions increasingly rely on data locality to achieve high data access rates and scalability. In contrast, common HEP system architectures emphasize uniform resource pools with minimal locality, even allowing for cross-site data access. The concept for the new High Performance Data Analysis (HPDA) Tier-3 at KIT aims at introducing data locality to HEP batch systems: by coordinating dedicated cache drives on worker nodes, existing storage hierarchies are extended into the active batch system. The presentation will illustrate the considerations involved in extending the classic batch architecture and showcase the planned software and hardware architecture of the HPDA T3.
        Speaker: Max Fischer (KIT - Karlsruhe Institute of Technology (DE))
        Slides
      • 9:25 AM
        Ceph at CERN: one year on 25m
        Ceph was introduced at CERN in early 2013 as a potential solution to new use cases (e.g. cloud block storage), while also providing a path toward a consolidated storage backend for other services including AFS, NFS, etc. This talk will present the outcome of the past year of testing and production experience with Ceph. We will present our real operational experience and lessons learned, and review the state of each of the object, block and filesystem components of Ceph. Finally, we will present our plans moving forward, with a discussion of where Ceph is, and is not, a suitable backend for our physics data stores.
        Speaker: Dr Daniel van der Ster (CERN)
        Slides
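        For readers new to Ceph's object layer, a minimal write/read against a RADOS pool using the python-rados bindings looks roughly like the sketch below. It assumes a reachable cluster, a standard /etc/ceph/ceph.conf and a pool named "test"; it illustrates the object interface in general, not CERN's production configuration.

        # Minimal RADOS object write/read using the python-rados bindings (illustrative).
        import rados

        cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
        cluster.connect()
        try:
            ioctx = cluster.open_ioctx("test")          # pool name is an assumption
            try:
                ioctx.write_full("hello-object", b"hello from HEPiX")
                print(ioctx.read("hello-object"))       # b'hello from HEPiX'
            finally:
                ioctx.close()
        finally:
            cluster.shutdown()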
      • 9:50 AM
        Ceph at the UK Tier 1 25m
        We are trialling the use of Ceph both as a file-system and as a cloud storage back end. I will present our experiences so far.
        Speaker: George Ryall (STFC)
        Slides
      • 10:15 AM
        Update on CERN tape status 25m
        CERN stores over 100PB of data on tape via CASTOR and TSM. This talk will present the current status of the CERN tape infrastructure, with a particular focus on tape performance and efficiency and the status of the large media repacking exercise.
        Speaker: German Cancio Melia (CERN)
        Slides
    • 10:40 AM 11:10 AM
      Coffee break 30m
    • 11:10 AM 12:00 PM
      Storage and file systems Auditorium Marcel Vivargent

      Conveners: Dr Arne Wiebalck (CERN), Mr Peter van der Reest (DESY)
      • 11:10 AM
        The DESY Big Data Cloud Service 25m
        DESY IT has implemented a cloud storage service on the basis of dCache. The talk will describe the architecture and the service concepts.
        Speaker: Mr Peter van der Reest (DESY)
        Slides
      • 11:35 AM
        Update on the bit-preservation Working Group 25m
        In this talk, we will provide an update on the bit-level preservation working group's activities, notably the ongoing work on a set of recommendations and on a model for estimating long-term (10, 20, 30 years) archiving cost outlooks.
        Speaker: German Cancio Melia (CERN)
        Slides
    • 12:00 PM 12:25 PM
      Basic IT services Auditorium Marcel Vivargent

      Conveners: Dr Helge Meinhard (CERN), Dr Tony Wong (Brookhaven National Laboratory)
      • 12:00 PM
        Lavoisier: a data aggregation framework 25m
        Many of us need tools for service monitoring adapted to our site's specificities, or tools to do custom processing on user data. Regardless of the use case, we have to develop (or have developed) applications that aggregate, process and format data from heterogeneous data sources. Lavoisier (http://software.in2p3.fr/lavoisier) is a framework which enables building such applications by assembling reusable software components (i.e. plugins). These applications can then be used through a RESTful web service API, a web interface or a command line interface with little effort. The Lavoisier framework is developed by CC-IN2P3 and used by several projects: the Operations Portal of the European Grid Infrastructure (EGI), the VAPOR portal, and some CC-IN2P3 internal tools. The presentation will give an overview of Lavoisier and explain how it can help to easily obtain a maintainable, performant, robust and secure data aggregation application, while focusing on business code.
        Speaker: Mr Sylvain Reynaud (CNRS)
        Slides
    • 12:25 PM 1:55 PM
      Lunch 1h 30m 'Tom Morel' Cafeteria

    • 1:55 PM 2:00 PM
      Announcement 5m Auditorium Marcel Vivargent

      Slides
    • 2:00 PM 3:40 PM
      Basic IT services Auditorium Marcel Vivargent

      Conveners: Dr Helge Meinhard (CERN), Dr Tony Wong (Brookhaven National Laboratory)
      • 2:00 PM
        Agile Infrastructure Monitoring 25m
        The Agile Infrastructure monitoring team is working on new solutions to modernise and improve how monitoring and analytics are done at CERN. We will give an update on these activities, in particular the recent progress on testing and adopting different open source technologies (e.g. Hadoop, Elasticsearch, Flume, Kibana) for the various monitoring architecture layers. We will report on the efforts to build a monitoring and analytics community with participants from different areas: service managers, users, security, management, etc. We will present concrete examples of how this community is using these different solutions to improve their daily activities.
        Speaker: Pedro Andrade (CERN)
        Slides
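        As a flavour of how such a stack is typically queried, the Python snippet below runs a simple full-text search against an Elasticsearch index over its REST API; the host, index and field names are hypothetical and do not reflect CERN's actual monitoring schema.

        # Generic Elasticsearch search over the REST API (hypothetical endpoint and index).
        import json
        import urllib.request

        ES_URL = "http://localhost:9200/logs-2014.05.20/_search"   # assumed endpoint
        query = {"query": {"match": {"message": "error"}}, "size": 10}

        req = urllib.request.Request(
            ES_URL,
            data=json.dumps(query).encode(),
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req) as resp:
            result = json.loads(resp.read().decode())

        print("Total hits:", result["hits"]["total"])
        for hit in result["hits"]["hits"]:
            print(hit["_source"].get("message"))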
      • 2:25 PM
        Scaling Agile Infrastructure, development and change management 25m
        As the Agile Infrastructure project scaled from being a development effort of a few people that could sit in one meeting room to a production service for the CERN computer centre, what changes were needed to our service, tools and workflow? We will look at the technical challenges of scaling the Puppet infrastructure and the development effort of Puppet code, as well as the procedural changes to QA, change management and continuous delivery. How well does Puppet scale to thousands of nodes, and how much work is involved? What tools have been useful to manage an agile workflow? How can we fit a fast-moving development pipeline to different groups with different expectations of the speed of change?
        Speaker: Ben Jones (CERN)
        Slides
      • 2:50 PM
        Agile Infrastructure: an updated overview of IaaS at CERN 25m
        The CERN private cloud has been in production since July 2013 and has grown steadily to 60,000 cores, hosting more than 5,500 virtual machines for 370 users and 140 shared projects. New features such as block storage and IPv6 have been made available this year. This presentation will provide an overview of the current status of the infrastructure and of the plans for the next developments and the evolution of the services. Topics will include the successful migration from OpenStack Grizzly to Havana, the imminent upgrade to Icehouse, IPv6-ready machines and the metering infrastructure.
        Speaker: Stefano Zilli (CERN)
        Slides
      • 3:15 PM
        Field Experience in the Agile Infrastructure 25m
        As the Agile Infrastructure moves forward at CERN, more and more services are migrating to it. New tools are put in place to get the most out of its strengths, while we learn lessons from the problems we hit when converting services from Quattor. A number of services have made significant progress in the migration to the new infrastructure; the batch service, several interactive services, CEs and VOMS are but a few examples. In this talk, we will describe some aspects of the migration process, such as virtualisation, Puppet configuration and alarming. We will discuss the strengths of the Agile Infrastructure which make running services easier, and present the problematic areas together with some of the future projects that are to address them.
        Speaker: Jerome Belleman (CERN)
        Slides
    • 3:40 PM 4:10 PM
      Coffee break 30m
    • 4:10 PM 5:50 PM
      Basic IT services Auditorium Marcel Vivargent

      Conveners: Dr Helge Meinhard (CERN), Dr Tony Wong (Brookhaven National Laboratory)
      • 4:10 PM
        from quattor to puppet, a T2 point of view 25m
        The IRFU site, a member of the GRIF WLCG T2 site, decided to move from Quattor to Puppet in 2012. The migration was almost complete by early April 2014. This talk will focus mainly on the goals, how we achieved what we achieved, the manpower that was required, what we gained with Puppet, and the new challenges that we must now face as a T2 with this management tool.
        Speaker: Mr Frederic Schaer (CEA)
        Slides
      • 4:35 PM
        Quattor Update 25m
        A report on the status of the Quattor toolset, with particular emphasis on recent developments in both the user and development communities.
        Speaker: Ian Peter Collier (STFC - Rutherford Appleton Lab. (GB))
        Slides
      • 5:00 PM
        HEPiX configuration management working group update 25m
        A year ago we began working with other sites to see how we could best share knowledge and effort amongst sites migrating to Puppet. This talk will present a reminder of the working group, its formation and mandate, and how Puppet had already been developing amongst earlier adopters. We will discuss how Puppet module development occurs in the wider Puppet community, and what conventions the working group has agreed upon. The current state of HEP-related development will be explained, along with areas of interest that have been discussed. We will also look at areas in which there are either open questions or perceived barriers to development.
        Speaker: Ben Jones (CERN)
        Slides
      • 5:25 PM
        Cluster Consolidation at NERSC 25m
        This talk will provide a case study of cluster consolidation at NERSC. In 2012, NERSC began deployment of "Mendel", a 500+ node, Infiniband-attached, Linux "meta-cluster" which transparently expands NERSC production clusters and services in a scalable and maintainable fashion. The success of the software automation infrastructure behind the Mendel multi-clustering model encouraged investigation into even more aggressive consolidation efforts. This talk will detail one such effort: under the constraints of a 24x7, disruption-sensitive environment, NERSC staff merged a 400-node legacy production cluster, consisting of multiple hardware generations and ad-hoc software configurations, into Mendel's automation infrastructure. By leveraging the hierarchical management features of the xCAT software package in combination with other open-source and in-house tools, such as Cfengine and CHOS, NERSC abstracted the unique characteristics of both clusters away below a unified management interface. Consequently, both cluster components are now managed as a single, albeit complex, integrated system. Additionally, this talk will provide an update on the PDSF system at NERSC, including improvements to trending data collection and ongoing CHOS development.
        Speaker: Larry Pezzaglia (LBNL)
        Slides
    • 9:00 AM 10:40 AM
      Grids, clouds, virtualisation Auditorium Marcel Vivargent

      Conveners: Ian Peter Collier (STFC - Rutherford Appleton Lab. (GB)), Dr Keith Chadwick (Fermilab)
      • 9:00 AM
        Big Data Transfer over Internet 25m
        In many cases where the Big Data phenomenon arises, there is a need to transfer the data from one point of a computer network to another, and quite often those points are far away from each other. The transfer time is a significant factor, and during this time the characteristics of the data link might change drastically, including one or more interruptions of channel operation during the transfer. A number of known utilities/systems are used for Big Data transfer. The authors investigate which utilities/systems are most suitable for Big Data transfer and which architectural features are most important for such systems. A comparative study of the data transfer methods is of interest. A testbed has been developed to compare the data transfer utilities and to study how the Software Defined Networking (SDN) approach affects Big Data transfer.
        Speaker: Mr Andrey SHEVEL (University of Information Technology, Mechanics, and Optics)
        Slides
      • 9:25 AM
        FermiCloud On-demand Services: Data-Intensive Computing on Public and Private Clouds 25m
        The FermiCloud project exists to provide on-demand computing and data movement services to the various experiments at Fermilab. We face a dynamically changing demand for compute resources and data movement, which we meet by enabling users to run on our own site, remote grid sites, and cloud sites. We also instantiate on-demand data movement and web caching services to support this remote analysis. In this presentation we will summarize some of our recent research results and outline the challenges of our current research projects. These include coordinated launches of compute nodes and data movement servers, interoperability with new commercial clouds, idle machine detection, and exploration of distributed storage models.
        Speaker: Steven Timm (Fermilab)
        Slides
      • 9:50 AM
        RAL Tier 1 Cloud & Virtualisation 25m
        The RAL Tier 1 is now working on deploying a production-quality private cloud to meet the emerging needs of both the Tier 1 and STFC's Scientific Computing Department. This talk will describe the work so far and the roadmap for the coming year. We will also discuss other virtualisation developments.
        Speaker: Ian Peter Collier (STFC - Rutherford Appleton Lab. (GB))
        Slides
      • 10:15 AM
        Enabling multi-cloud resources at CERN within the Helix Nebula project 25m
        Helix Nebula – the Science Cloud is a European public-private-partnership between leading scientific research organisations (notably CERN, EMBL and ESA) and European IT cloud providers. Its goal is to establish a Cloud Computing Infrastructure for the European Research Area and the Space Agencies, serving as a platform for innovation and evolution of a federated cloud framework for e-Science. CERN contributes to the Helix Nebula initiative by providing a flagship use case: the exploitation of cloud resources within the workload management system of the ATLAS and CMS experiments at the Large Hadron Collider. This contribution will summarize the CERN experience in Helix Nebula during the past two years and the lessons learned in deploying applications from ATLAS and CMS with several commercial providers. The integration with the experiment framework will also be explained.
        Speaker: Dr Domenico Giordano (CERN)
        Slides
    • 10:40 AM 11:10 AM
      Coffee break 30m
    • 11:10 AM 11:35 AM
      Grids, clouds, virtualisation Auditorium Marcel Vivargent

      Conveners: Ian Peter Collier (STFC - Rutherford Appleton Lab. (GB)), Dr Keith Chadwick (Fermilab)
      • 11:10 AM
        Experiences with ATLAS and LHCb jobs in Vac virtual machines 25m
        We present experiences with running ATLAS and LHCb production jobs in virtual machines at Manchester and other sites in the UK using Vac. Vac is a self-contained VM management system in which individual hypervisor hosts act as VM factories to provide VMs contextualized for experiments, and offers an alternative to conventional CE/Batch systems and Cloud interfaces to resources. In the Vacuum model implemented by Vac, VMs appear spontaneously at sites, with contextualizations provided by the sites using templates provided by the experiments. This system takes advantage of the pilot job frameworks for managing jobs and cvmfs for managing software distribution, which together lead to these contextualizations being extremely simple in practice. Vac is implemented as a daemon, vacd, which runs on each hypervisor host. Each daemon uses a peer-to-peer UDP protocol to gather information from other Vac daemons at the site about what mix of experiment VMs are already running, and acts autonomously to decide which VMs to start using a policy given in its configuration file. The UDP protocol is also used to avoid starting VMs for experiments which have no work available, by detecting when a VM has been started recently and has stopped immediately because the pilot framework client could find no work. Vac has been running LHCb production jobs since 2013 and in 2014 a suitable ATLAS VM contextualization was developed and has been used to run ATLAS production work too. We present some preliminary comparisons of the efficiency of running LHCb and ATLAS jobs on batch worker nodes and in virtual machines using the same hardware.
        Speaker: Andrew McNab (University of Manchester (GB))
        Slides
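        The peer-to-peer status exchange described above can be pictured with the toy Python sketch below, in which a factory node broadcasts its running-VM mix over UDP and tallies what it hears from its peers. This is purely illustrative: the port, broadcast address and JSON message format are invented for the example and are not Vac's actual wire protocol.

        # Toy peer-to-peer status exchange over UDP (illustrative; not Vac's real protocol).
        import json
        import socket

        PORT = 5995                       # arbitrary port chosen for this example
        status = {"factory": "node01", "running": {"atlas": 3, "lhcb": 5}}

        # Broadcast our own status to the (assumed) subnet broadcast address.
        sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        sender.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sender.sendto(json.dumps(status).encode(), ("10.0.0.255", PORT))
        sender.close()

        # Listen for peers' messages for a while and tally the site-wide VM mix.
        listener = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        listener.bind(("", PORT))
        listener.settimeout(30)
        site_mix = {}
        try:
            while True:
                data, _addr = listener.recvfrom(65536)
                peer = json.loads(data.decode())
                for vo, count in peer["running"].items():
                    site_mix[vo] = site_mix.get(vo, 0) + count
        except socket.timeout:
            pass
        finally:
            listener.close()
        print("Observed site-wide VM mix:", site_mix)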
    • 11:35 AM 11:55 AM
      Miscellaneous Auditorium Marcel Vivargent

      Convener: Dr Helge Meinhard (CERN)
      • 11:35 AM
        Workshop wrap-up 20m
        Wrap-up
        Speaker: Dr Helge Meinhard (CERN)
        Slides