HEPiX Spring 2015 Workshop

Europe/London
Martin Wood Lecture Theatre, Parks Road (Physics Department, Oxford University)

Martin Wood Lecture Theatre, Parks Road

Physics Department, Oxford University

Helge Meinhard (CERN), Peter Gronbech (University of Oxford (GB)), Tony Wong (Brookhaven National Laboratory)
Description

HEPiX Spring 2015 at Oxford University, UK

The HEPiX forum brings together worldwide Information Technology staff, including system administrators, system engineers, and managers from the High Energy Physics and Nuclear Physics laboratories and institutes, to foster a learning and sharing experience between sites facing scientific computing and data challenges.

Participating sites include BNL, CERN, DESY, FNAL, IN2P3, INFN, IRFU, JLAB, NIKHEF, PIC, RAL, SLAC, TRIUMF and many others.

The workshop will be hosted in the Physics Department's Martin Wood Lecture Theatre complex on Parks Road. Accommodation is offered in St Anne's College, approximately 10 minutes' walk away. More details are available on the registration pages.

HEPiX Spring 2015 is proudly sponsored by Viglen and Boston jointly at the platinum level, Western Digital at the gold level, and DDN and AWS at the silver level.

Platinum: Viglen, Boston

Gold: Western Digital

Silver: DDN, AWS (http://aws.amazon.com/)

    • 08:00 09:00
      Registration 1h Martin Wood Lecture Theatre, Parks Road

      Martin Wood Lecture Theatre, Parks Road

      Physics Department, Oxford University

    • 09:00 09:15
      Miscellaneous Martin Wood Lecture Theatre, Parks Road

      Martin Wood Lecture Theatre, Parks Road

      Physics Department, Oxford University

      Conveners: Dr Helge Meinhard (CERN), Dr Tony Wong (Brookhaven National Laboratory)
      • 09:00
        Welcome Address 10m
        Welcome Address
        Speaker: Prof. John Wheater (Oxford University)
      • 09:10
        Workshop Logistics 5m
        Workshop Logistics
        Speaker: Peter Gronbech (University of Oxford (GB))
        Slides
    • 09:15 10:30
      Site Reports Martin Wood Lecture Theatre, Parks Road

      Martin Wood Lecture Theatre, Parks Road

      Physics Department, Oxford University

      Conveners: Michele Michelotto (Universita e INFN (IT)), Sebastien Gadrat (CC-IN2P3 - Centre de Calcul (FR))
      • 09:15
        CSC - IT Center for Science 15m
        • CSC general HPC updates: new Haswell hardware for the supercluster and supercomputer; Slurm/Lustre; taito-shell.csc.fi, a Slurm/sshd/iptables-based interactive shell load balancer that replaced the large-memory (1 TB) interactive nodes; DDN SFA12k.
        • Using the ELK stack (Elasticsearch, Logstash, Kibana) to sort through dCache logs and to search auditd logs (has anybody used shred? root logins from bad IPs?); a query sketch along these lines follows this entry.
        • Updates at T2_FI_HIP (SE and CE for the Helsinki Institute of Physics): dCache updated to 2.10 and PostgreSQL to 9.3; plan to virtualize the dCache door/admin and other nodes (BDII, xrootd AAA), using Ansible for configuration.
        • Hardware: HP 5x DL360 G7 + 26x D2600 has been very stable (7 Seagate 2 TB disks and two P411 controllers replaced over 3 years); next hardware renewal hopefully during 2015, probably HP SL4540.
        • Decommissioning of jade.hip.fi, with resources moved into one of CSC's OpenStack clouds.
        • Converted unused IB ports in dCache pool servers to 10GbE with QSFP-to-SFP+ adapters.
        Speaker: Johan Henrik Guldmyr (Helsinki Institute of Physics (FI))
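        The ELK-based log searching mentioned above (sorting dCache logs, searching auditd logs for things like root logins from unexpected IPs) could be queried along the lines of the sketch below, using the official Elasticsearch Python client; the host, index name, field names and the whitelisted subnet are illustrative assumptions, not CSC's actual schema.
            from elasticsearch import Elasticsearch

            # Connect to the Elasticsearch node behind Kibana (host is a placeholder).
            es = Elasticsearch(["http://logserver.example.org:9200"])

            # Hypothetical query: auditd login records for root in the last 24 hours,
            # excluding a trusted management subnet. Index and field names are assumed.
            query = {
                "query": {
                    "bool": {
                        "must": [
                            {"term": {"auid": "0"}},
                            {"term": {"type": "USER_LOGIN"}},
                            {"range": {"@timestamp": {"gte": "now-24h"}}},
                        ],
                        "must_not": [
                            {"prefix": {"addr": "192.168.1."}},
                        ],
                    }
                },
                "size": 50,
            }

            result = es.search(index="auditd-*", body=query)
            for hit in result["hits"]["hits"]:
                src = hit["_source"]
                print(src.get("@timestamp"), src.get("addr"), src.get("hostname"))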
      • 09:30
        Nikhef site-report 15m
        Spring 2015 site report
        Speaker: Paul Kuipers (Nikhef)
        Slides
      • 09:45
        IHEP Site Report 15m
        The status of the IHEP site, the improvements made, and what is in our plan for this year.
        Speaker: Jingyan Shi (IHEP)
        Slides
      • 10:00
        Site Report - Prague 15m
        We will give an overview of the site and share experience with the following topics: migration of virtualized servers to a new infrastructure, migration from CFEngine to Puppet and Spacewalk as the new systems management solution, and procurement of new hardware (worker nodes and storage servers).
        Speakers: Martin Adam (UJF Rez), Václav Říkal
        Slides
      • 10:15
        INFN-T1 Site report 15m
        Update on INFN-T1
        Speaker: Giuseppe Misurelli (Unknown)
        Slides
    • 10:30 10:55
      Coffee Break 25m Martin Wood Lecture Theatre, Parks Road

      Martin Wood Lecture Theatre, Parks Road

      Physics Department, Oxford University

    • 10:55 12:40
      Site Reports Martin Wood Lecture Theatre, Parks Road

      Martin Wood Lecture Theatre, Parks Road

      Physics Department, Oxford University

      Conveners: Michele Michelotto (Universita e INFN (IT)), Sebastien Gadrat (CC-IN2P3 - Centre de Calcul (FR))
      • 10:55
        Site Report GSI 15m
        Site Report GSI
        Speaker: Walter Schon
        Slides
      • 11:10
        BNL RACF Site Report 15m
        Brookhaven National Lab (BNL) will present the site report for the RHIC-ATLAS Computing Facility (RACF), covering developments over the past 6 months.
        Speaker: William Strecker-Kellogg (Brookhaven National Lab)
        Slides
      • 11:25
        PDSF Site Report and Relocation 15m
        PDSF, the Parallel Distributed Systems Facility, has been serving high energy physics in continuous operation at NERSC since 1996. It is currently a Tier-1 site for STAR, Tier-2 for ALICE and Tier-3 for ATLAS. This site report will describe recent updates to the system and upcoming modifications. PDSF will move this year from its current site to a new building on the LBNL campus, and particular focus will be given to how this will affect computing during the transition.
        Speaker: James Botts (LBNL)
        Slides
      • 11:40
        T2_US_Nebraska Site Report 15m
        Site report covering the status of T2_US_Nebraska and changes / updates since the Fall 2014 meeting.
        Speaker: Garhan Attebury (University of Nebraska (US))
        Slides
      • 11:55
        Jefferson Lab Scientific and High Performance Computing 15m
        Current high performance and experimental physics computing environment updates: core exchanges between USQCD and Experimental Physics clusters for load balancing, job efficiency, and 12 GeV data challenges; Nvidia K80 GPU experiences and updated Intel MIC environment; update on locally developed workflow tools and write-through to tape cache filesystem; status of LTO6 integration into our MSS; ZFS on Linux production environment; and status of our Lustre 2.5 update. It will also indicate JLab's interest in upcoming Ceph BOF sessions and workshops.
        Speaker: Sandy Philpott (JLAB)
        Slides
      • 12:10
        DESY site report 15m
        DESY site report
        Speaker: Mr Peter van der Reest (DESY)
        Slides
      • 12:25
        CCIN2P3 Site Report 15m
        We will present the latest status of the IN2P3 Computer Center. Emphasis will be placed on the infrastructure and system area.
        Speaker: Mr Julien Carpentier (CCIN2P3)
        Slides
    • 12:40 14:00
      Lunch Break 1h 20m Martin Wood Lecture Theatre, Parks Road

      Martin Wood Lecture Theatre, Parks Road

      Physics Department, Oxford University

      Lunch buffet provided for registered participants

    • 14:00 15:40
      End-user Services and Operating Systems Martin Wood Lecture Theatre, Parks Road

      Martin Wood Lecture Theatre, Parks Road

      Physics Department, Oxford University

      Conveners: Sandy Philpott (JLAB), connie sieh (Fermilab)
      • 14:00
        Accelerating Scientific Analysis with SciDB 25m
        SciDB is an open-source analytical database for scalable complex analytics on very large array or multi-structured data from a variety of sources, programmable from Python and R. It runs on HPC, commodity hardware grids, or in a cloud and can manage and analyze terabytes of array-structured data and do complex analytics in-database. We present an overall description of the SciDB framework and describe its implementation at NERSC at Lawrence Berkeley National Laboratory. A case study using SciDB to analyze data from the LUX dark matter detector is described. LUX is a 370 kg liquid xenon time-projection chamber built to directly detect galactic dark matter in an underground laboratory 1 mile under the Black Hills in South Dakota, USA. In the 2013 initial data run, LUX collected 86 million events and wrote 32 TB of data, of which only 160 events are retained for final analysis. The data rate for the new dark matter run starting in 2014 is expected to exceed 250 TB / year. We describe how SciDB is used to dramatically streamline the data collection and analysis, and discuss future plans for a large parallel SciDB array at NERSC. (A toy array-selection sketch follows this entry.)
        Speakers: Lisa Gerhardt (LBNL), Mr Yushu Yao (LBNL)
        Slides
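        To illustrate the kind of array-style selection described above (millions of recorded events reduced to the handful passing analysis cuts), here is a toy sketch using NumPy arrays as a stand-in for SciDB's array model; the field names and cut values are invented, and SciDB would evaluate an equivalent filter in-database rather than in client memory.
            import numpy as np

            # Toy event array standing in for the detector data set; in SciDB this
            # would be a server-side array, not an in-memory structure.
            n_events = 1000000
            rng = np.random.RandomState(42)
            events = np.zeros(n_events, dtype=[("s1_area", "f8"),
                                               ("s2_area", "f8"),
                                               ("drift_time", "f8")])
            events["s1_area"] = rng.exponential(50.0, n_events)
            events["s2_area"] = rng.exponential(2000.0, n_events)
            events["drift_time"] = rng.uniform(0.0, 320.0, n_events)

            # Apply simple selection cuts (thresholds are purely illustrative).
            mask = ((events["s1_area"] > 2.0) & (events["s1_area"] < 30.0) &
                    (events["drift_time"] > 38.0) & (events["drift_time"] < 305.0))
            selected = events[mask]
            print("kept %d of %d events" % (selected.size, n_events))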
      • 14:25
        HEP Software Foundation 25m
        The HEP Software Foundation (HSF) is a one-year-old initiative to foster collaboration in software development in the HEP community and related scientific communities. Launched with a kick-off meeting at CERN in April 2014, the first year has been spent better defining what HSF should be. An HSF workshop was held in January at SLAC and HSF is now entering its "implementation phase". This talk will present the reasons behind HSF, its goals and its organizational model.
        Speaker: Mr Michel Jouvin (Laboratoire de l'Accelerateur Lineaire (FR))
        Slides
      • 14:50
        CERN Search and Social for the Enterprise Web experience 25m
        • Status of CERN Web Services
          • Overview
          • Web Site Life Cycle Management
          • Web Analytics
        • CERN’s Enterprise Social Networking System
          • Motivation & purpose
          • Feature overview: microblogging, profiles, social networking, suggestion systems and discussion forums
        • CERN Search
          • Status
          • Enterprise vs Web search
          • New features
        Speaker: Mr Andreas Wagner (CERN)
        Slides
      • 15:15
        Evolutions in the CERN Conferencing Services Landscape 25m
        A lot of visible and behind-the-scenes actions have been taken in recent months to prepare CERN conferencing services (Indico, Vidyo, the webcast and conference room services) for challenges to come. These services will be described in terms of features and usage statistics. We will present their integration with the CERN layered cloud infrastructure and with other IT base services. We will analyse the feedback gathered through recent user satisfaction surveys and describe the direction taken for their development, including the challenges of opening some of these services to communities outside CERN. Some security aspects will also be highlighted, as well as the importance of cross-service integrations.
        Speaker: Thomas Baron (CERN)
        Slides
    • 15:40 16:05
      Coffee Break 25m Martin Wood Lecture Theatre, Parks Road

      Martin Wood Lecture Theatre, Parks Road

      Physics Department, Oxford University

    • 16:05 17:20
      End-user Services and Operating Systems Martin Wood Lecture Theatre, Parks Road

      Martin Wood Lecture Theatre, Parks Road

      Physics Department, Oxford University

      Conveners: Sandy Philpott (JLAB), connie sieh (Fermilab)
      • 16:05
        Scientific Linux Current Status 25m
        Current Status of Scientific Linux
        Speaker: Connie Sieh (FNAL)
        Slides
      • 16:30
        CERN CentOS 7 Update 25m
        In this talk we will present a brief status update on CERN's work on CentOS 7, the uptake by the various IT services, and the interaction with the upstream CentOS community.
        Speaker: Dr Arne Wiebalck (CERN)
        Slides
      • 16:55
        Getting the most from the farm at the Sanger Institute 25m
        The Wellcome Trust Sanger Institute is a charitably funded genomic research centre. A leader in the Human Genome Project, it is now focused on understanding the role of genetics in health and disease. Large amounts of data are produced at the institute by next-generation sequencing machines. The data are then stored, processed and analysed on the institute's computing cluster. The main compute farm has 14,000 cores and of the order of 20 PB of storage in a mix of NFS, Lustre and iRODS. Two of the main challenges from a systems administration point of view are helping the users to get the best out of the computing resources available and helping them to manage their storage use effectively. We present examples of how the human genetics informatics team is providing tools and reports to facilitate this effort. On the compute side we generate a weekly usage report which is used as a point of discussion in a stand-up-style meeting of the user community, and on the storage side we discuss tools we have developed which provide an easy visualisation of utilisation using treemaps (a sketch of the underlying per-directory scan follows this entry).
        Speaker: Mr Emyr James (Wellcome Trust, Sanger Institute)
        Slides
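        A minimal sketch of the per-directory usage scan that could feed such a treemap is shown below; the scanned path is a placeholder and this is not the Sanger team's actual tooling, which the talk itself describes.
            import os

            def directory_sizes(root, max_depth=2):
                """Return {directory: total_bytes} for directories up to max_depth below root."""
                totals = {}
                root = os.path.abspath(root)
                for dirpath, dirnames, filenames in os.walk(root):
                    size = 0
                    for name in filenames:
                        try:
                            size += os.path.getsize(os.path.join(dirpath, name))
                        except OSError:
                            pass  # file vanished or is unreadable
                    # Roll the size up into this directory and every ancestor we report on.
                    current = dirpath
                    while True:
                        if current[len(root):].count(os.sep) <= max_depth:
                            totals[current] = totals.get(current, 0) + size
                        if current == root:
                            break
                        current = os.path.dirname(current)
                return totals

            if __name__ == "__main__":
                # Placeholder path; the (path, size) table is what a treemap plot would consume.
                for path, size in sorted(directory_sizes("/data/projects").items(),
                                         key=lambda kv: -kv[1]):
                    print("%12.3f GB  %s" % (size / 1e9, path))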
    • 17:20 17:45
      IT Facilities and Business Continuity Martin Wood Lecture Theatre, Parks Road

      Martin Wood Lecture Theatre, Parks Road

      Physics Department, Oxford University

      Conveners: Ian Peter Collier (STFC - Rutherford Appleton Lab. (GB)), Wayne Salter (CERN)
      • 17:20
        Overview of operational issues at CERN in the recent past 25m
        Many of you are aware of the power incident we had on 16th October during the last HEPiX workshop. I will give a detailed explanation of what happened, the impact on IT services and the actions taken to recover from the incident. I will also note some improvements that will be implemented as a result of this incident. I will then go on to discuss other operational incidents that we have had in the recent past, as well as give a summary of the hardware failures.
        Speaker: Wayne Salter (CERN)
        Slides
    • 18:00 21:00
      Welcome Reception 3h Martin Wood Lecture Theatre, Parks Road (The University Museum)

      Martin Wood Lecture Theatre, Parks Road

      The University Museum

      Parks Road, Oxford, OX1 3PW

      http://www.oum.ox.ac.uk/

    • 08:30 09:00
      Registration 30m Martin Wood Lecture Theatre, Parks Road

      Martin Wood Lecture Theatre, Parks Road

      Physics Department, Oxford University

    • 09:00 10:30
      Site Reports Martin Wood Lecture Theatre, Parks Road

      Martin Wood Lecture Theatre, Parks Road

      Physics Department, Oxford University

      • 09:00
        PIC Tier-1 Spring 2015 report 15m
        We will review the status of the PIC Tier-1 as of Spring 2015: the typical site report presented at HEPiX.
        Speaker: Jose Flix Molina (Centro de Investigaciones Energ. Medioambientales y Tecn. (ES))
        Slides
      • 09:15
        Oxford University Site Report 15m
        A site report from the University of Oxford focusing on the integration challenges between the various systems.
        Speaker: Dr Sean Brisbane (University of Oxford)
        Slides
      • 09:30
        CERN Site Report 15m
        News from CERN since the Lincoln meeting.
        Speaker: Dr Arne Wiebalck (CERN)
        Slides
      • 09:45
        DLS site report 15m
        Diamond Light Source site report
        Speaker: Tina Friedrich (Diamond Light Source Ltd)
        Slides
      • 10:00
        KISTI-GSDC Site Report 15m
        The status of the KISTI-GSDC Tier-1 site will be presented, including a brief history of the KISTI-GSDC site, a system summary (configuration management), PBS batch issues, Tier-1 operations and future plans.
        Speaker: Sang Un Ahn (KiSTi Korea Institute of Science & Technology Information (KR))
        Slides
      • 10:15
        RAL Site Report 15m
        Latest updates for the RAL Tier-1.
        Speaker: Martin Bly (STFC-RAL)
        Slides
    • 10:30 10:55
      Coffee Break 25m Martin Wood Lecture Theatre, Parks Road

      Martin Wood Lecture Theatre, Parks Road

      Physics Department, Oxford University

    • 10:55 12:05
      End-user Services and Operating Systems Martin Wood Lecture Theatre, Parks Road

      Martin Wood Lecture Theatre, Parks Road

      Physics Department, Oxford University

      Conveners: Sandy Philpott (JLAB), connie sieh (Fermilab)
      • 10:55
        Update on software collaboration services at CERN 20m
        An update will be given on the status of collaborative tools for software developers: Version Control Services (Git and SVN), Issue Tracking (JIRA), Integration (Jenkins) and documentation (TWiki). The presentation will focus on collaborative aspects for software developers and report on progress since the fall meeting.
        Speaker: Nils Hoimyr (CERN)
        Slides
      • 11:15
        E-Mail-Migration: transition from Exchange and UNIX mail to Zimbra 25m
        After more than ten years of operations the game is over for Exchange 2003 at DESY. Now Zimbra has been set into production and data from both Exchange 2003 and the UNIX mail service is being migrated and consolidated gradually. The architecture of the Zimbra mail service, the migration procedures and some experiences will be presented. Finally we will look at some integration aspects of Zimbra in the existing DESY landscape.
        Speaker: Mr Dirk Jahnke-Zumbusch (DESY)
        Slides
      • 11:40
        Status of volunteer computing at CERN 25m
        Status of LHC@home and volunteer computing at CERN and for the LHC experiments. The presenter will give an update on the volunteer computing strategy for HEP and different scenarios for the use of volunteer cloud computing or other lightweight cloud infrastructures to run experiment code under CernVM on available computing resources. Furthermore, the current status of the CERN BOINC server infrastructure, recent experience from ATLAS and CMS, and the outlook for the service will be described.
        Speaker: Nils Hoimyr (CERN)
        Slides
    • 12:05 12:20
      Site Reports Martin Wood Lecture Theatre, Parks Road

      Martin Wood Lecture Theatre, Parks Road

      Physics Department, Oxford University

      Conveners: Michele Michelotto (Universita e INFN (IT)), Sebastien Gadrat (CC-IN2P3 - Centre de Calcul (FR))
      • 12:05
        FNAL site report 15m
        Site report from Fermilab
        Speakers: Rennie S. Scott (FNAL), connie sieh (Fermilab)
        Slides
    • 12:20 12:45
      Security and Networking Martin Wood Lecture Theatre, Parks Road

      Martin Wood Lecture Theatre, Parks Road

      Physics Department, Oxford University

      Conveners: Dave Kelsey (STFC - Rutherford Appleton Lab. (GB)), Dr Shawn McKee (University of Michigan ATLAS Group)
      • 12:20
        Effects of packet loss and delay on TCP performance 25m
        Following an incident with slow database replication between CERN's data centres, we discovered that even a very low packet-loss rate in the network (of order 0.001%) can significantly penalise long-distance single-stream TCP transfers. We explore the behaviour of multiple TCP congestion control algorithms in a controlled loss and delay environment in order to understand the achievable throughput of TCP data transfers between CERN's remote data centres (a back-of-the-envelope illustration of the loss penalty follows this entry).
        Speaker: Adam Lukasz Krajewski (Warsaw University of Technology (PL))
        Slides
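        A rough feel for why such a small loss rate hurts can be had from the Mathis et al. approximation for loss-limited TCP Reno throughput, rate ≈ (MSS/RTT) × 1.22/√p; the MSS and RTT values below are illustrative assumptions, not the measured figures from the CERN links.
            import math

            def mathis_throughput(mss_bytes, rtt_s, loss_rate):
                """Approximate steady-state TCP Reno throughput (bytes/s) under random loss."""
                return (mss_bytes / rtt_s) * (1.22 / math.sqrt(loss_rate))

            # Illustrative numbers: 1460-byte MSS and a 25 ms round-trip time.
            mss = 1460
            rtt = 0.025
            for loss in (1e-5, 1e-4, 1e-3):
                gbps = mathis_throughput(mss, rtt, loss) * 8 / 1e9
                print("loss %.4f%%: about %.2f Gbit/s per stream" % (loss * 100, gbps))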
    • 12:45 14:00
      Lunch Break 1h 15m Martin Wood Lecture Theatre, Parks Road

      Martin Wood Lecture Theatre, Parks Road

      Physics Department, Oxford University

      Lunch buffet provided for registered participants

    • 14:00 15:40
      Security and Networking Martin Wood Lecture Theatre, Parks Road

      Martin Wood Lecture Theatre, Parks Road

      Physics Department, Oxford University

      Conveners: Dave Kelsey (STFC - Rutherford Appleton Lab. (GB)), Dr Shawn McKee (University of Michigan ATLAS Group)
      • 14:00
        Computer Security update 25m
        This presentation gives an overview of the current computer security landscape. It describes the main vectors of compromise in the academic community, including lessons learnt, and reveals the inner mechanisms of the underground economy to expose how our resources are exploited by organised crime groups, as well as recommendations to protect ourselves. By showing how these attacks are both sophisticated and profitable, the presentation concludes that the only means of mounting an appropriate response is to build a tight international collaboration and trusted information-sharing mechanisms within the community.
        Speaker: Mr Romain Wartel (CERN)
        Slides
      • 14:25
        WLCG Cloud Traceability Working Group 25m
        Report on the initial activities of the WLCG Cloud Traceability Working Group
        Speaker: Ian Peter Collier (STFC - Rutherford Appleton Lab. (GB))
        Slides
      • 14:50
        Recent experiences in operational security: incident prevention and incident handling in the EGI and WLCG infrastructure 25m
        The European Grid Infrastructure (EGI) and Worldwide LHC Computing Grid (WLCG) infrastructures largely overlap and share the majority of security activities. A lot of security-related activity goes on behind the scenes in such a large-scale distributed computing infrastructure. Security incident prevention takes up the larger amount of effort, and this is carried out via security policy definition, software vulnerability handling and monitoring to ensure the infrastructure is as secure as is practical. This talk will describe some recent experiences concerning security, from handling vulnerabilities (including some of the high-profile ones) to the types of incidents which have occurred in recent times. It will also describe some of the changes needed to handle new technologies and trust models which are coming into use.
        Speaker: Linda Ann Cornwall (STFC - Rutherford Appleton Lab. (GB))
        Slides
      • 15:15
        A recent view of OSSEC and Elasticsearch at Scotgrid Glasgow 25m
        OSSEC, the popular HIDS (Host Intrusion Detection System), has been widely used for a number of years. More recently, tools like Elasticsearch, Logstash and Kibana (ELK) have become popular in visualising and working with data such as that aggregated by OSSEC. We report on a recent implementation of OSSEC, coupled to an ELK instance, at the Glasgow site of the UKI-SCOTGRID distributed Tier-2. In particular, we report on our experience of the installation and use of these tools in a puppet deployment context. We cover installation, additional utility scripts deployed as well as the configuration workflow. We broadly cover the specific Grid related rules that have been implemented thus far. This presentation is particularly relevant for sysadmins and security officers interested in a recent view of the installation of this software and our experience with it.
        Speaker: David Crooks (University of Glasgow (GB))
        Slides
    • 15:40 15:50
      Coffee Break 10m Martin Wood Lecture Theatre, Parks Road

      Martin Wood Lecture Theatre, Parks Road

      Physics Department, Oxford University

    • 15:50 17:20
      Security and Networking Martin Wood Lecture Theatre, Parks Road

      Martin Wood Lecture Theatre, Parks Road

      Physics Department, Oxford University

      Conveners: Dave Kelsey (STFC - Rutherford Appleton Lab. (GB)), Dr Shawn McKee (University of Michigan ATLAS Group)
      • 15:50
        Update on WLCG/OSG perfSONAR Infrastructure 25m
        WLCG relies on the network as a critical part of its infrastructure and therefore needs to guarantee effective network usage and prompt detection and resolution of any network issues, including connection failures, congestion and traffic routing. The WLCG Network and Transfer Metrics working group was established to ensure sites and experiments can better understand and fix networking issues. In addition, it aims to integrate and combine all network-related monitoring data collected by the WLCG infrastructure from both network and transfer systems. This has been facilitated by the already existing network of perfSONAR instances, which is being commissioned to operate in full production. Recently, several higher-level services were developed to help bring the perfSONAR network to its full potential. These include a web-based mesh configuration system, which allows all the network tests performed by the instances to be centrally scheduled and managed; a network datastore (esmond), which collects, stores and provides interfaces to access all the network monitoring information from a single place (an example query follows this entry); and perfSONAR infrastructure monitoring, which ensures that the current perfSONAR instances are configured and operated correctly. In this presentation we will provide an update on the status of perfSONAR in WLCG and OSG and highlight the future plans for the commissioning and production operation of perfSONAR in the scope of the WLCG Network and Transfer Metrics working group. We will also present details of the specific higher-level services and provide a summary of the current status of the pilot projects to integrate network and transfer metrics.
        Speaker: Marian Babik (CERN)
        Slides
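        As an example of pulling data out of the network datastore mentioned above, the sketch below queries an esmond archive over its REST interface with the requests library; the host name is a placeholder, and the API path and filter names follow the esmond perfSONAR archive conventions as we recall them, so they should be checked against the deployed version.
            import requests

            BASE = "https://perfsonar-archive.example.org/esmond/perfsonar/archive/"

            params = {
                "event-type": "throughput",
                "time-range": 86400,  # last 24 hours
                "source": "ps-bandwidth.site-a.example.org",
                "destination": "ps-bandwidth.site-b.example.org",
            }

            resp = requests.get(BASE, params=params, timeout=30)
            resp.raise_for_status()

            # Each metadata record lists the event types it stores, with a URI to the data.
            for measurement in resp.json():
                for event in measurement.get("event-types", []):
                    if event.get("event-type") == "throughput":
                        print(measurement.get("source"), "->",
                              measurement.get("destination"), "data at",
                              event.get("base-uri"))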
      • 16:15
        News from the HEPiX IPv6 Working Group 25m
        This talk will present an update from the HEPiX IPv6 Working Group. This will include details of recent testing activities and plans for the deployment of dual-stack data services and monitoring on (at least some of) the WLCG infrastructure.
        Speaker: Dave Kelsey (STFC - Rutherford Appleton Lab. (GB))
        Slides
      • 16:40
        Testing dCache and IPv6 15m
        A look back at testing IPv6 and different versions of dCache as it has evolved from 2.6 to 2.12, and from barely working to working well.
        Speaker: Ulf Bobson Severin Tigerstedt (Helsinki Institute of Physics (FI))
        Slides
      • 16:55
        The IPV6 post office: labeling and sorting everywhere. 25m
        Probably the most prominent change that IPv6 introduces in the semantics of Internet protocol applications is the need to *always* deal with multiple addresses (possibly both IPv4 and IPv6) associated with each network endpoint. A quick overview of how and where addresses are categorised, ordered and preferred is presented, both from the system administrator's and the developer's viewpoint. A few not-too-obvious practical consequences of RFC 3484 are also shown, in an attempt to tame its subtleties (a small resolver example follows this entry).
        Speaker: Francesco Prelz (Università degli Studi e INFN Milano (IT))
        Slides
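        The "always multiple addresses per endpoint" point can be seen directly from the standard resolver interface: getaddrinfo() returns all candidate addresses, already ordered according to the RFC 3484 (now RFC 6724) preference rules, and a robust client simply tries them in that order. The host name below is only an example.
            import socket

            def candidate_addresses(host, port=443):
                """Return the (family, address) candidates in the order the OS prefers them."""
                candidates = []
                # AF_UNSPEC asks for both IPv6 and IPv4; the resolver sorts the results
                # according to its address-selection policy table (RFC 3484 / RFC 6724).
                for family, socktype, proto, canonname, sockaddr in socket.getaddrinfo(
                        host, port, socket.AF_UNSPEC, socket.SOCK_STREAM):
                    label = "IPv6" if family == socket.AF_INET6 else "IPv4"
                    candidates.append((label, sockaddr[0]))
                return candidates

            if __name__ == "__main__":
                # A dual-stacked host name (example); on connection failure a client
                # should fall back to the next candidate in the list.
                for label, addr in candidate_addresses("www.example.org"):
                    print(label, addr)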
    • 17:20 17:45
      Storage and File Systems Martin Wood Lecture Theatre, Parks Road

      Martin Wood Lecture Theatre, Parks Road

      Physics Department, Oxford University

      Conveners: Dr Arne Wiebalck (CERN), Mr Peter van der Reest (DESY)
      • 17:20
        Evaluation of distributed open source solutions in CERN database use cases 25m
        There are terabytes of data stored in relational databases (Oracle) at CERN which in fact do not need a relational model. Moreover, using a relational database management system very often brings a significant overhead in terms of resource utilization. The problem is notably observable for warehouse-type data sets. At the same time, running analytical workloads on such data sets requires a large amount of computing power combined with high storage throughput, a combination which can be achieved with a scalable database system. Introducing this kind of system will not only speed up processing but open new possibilities for data mining as well. This presentation will discuss the advantages of using a distributed architecture like Hadoop for scalable processing of CERN data sets currently stored in Oracle, such as the LHC logging system, SCADA systems or even LHC experiment event data stored in ntuples (a small distributed-processing sketch follows this entry).
        Speaker: Kacper Surdy (CERN)
        Slides
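        As one concrete flavour of the distributed processing discussed above, the sketch below uses Spark (one of the Hadoop-ecosystem engines) to aggregate a CSV export of time-stamped logging values in parallel; the HDFS path, file layout and column meanings are invented for illustration and are not the actual LHC logging schema.
            from pyspark import SparkContext

            sc = SparkContext(appName="logging-aggregation-sketch")

            # Hypothetical CSV export on HDFS: timestamp,variable_name,value
            lines = sc.textFile("hdfs:///user/demo/logging-export/*.csv")

            def parse(line):
                ts, name, value = line.split(",", 2)
                return name, float(value)

            # Per-variable count, sum and maximum, computed across the cluster.
            zero = (0, 0.0, float("-inf"))
            stats = (lines.map(parse)
                          .aggregateByKey(zero,
                                          lambda acc, v: (acc[0] + 1, acc[1] + v, max(acc[2], v)),
                                          lambda a, b: (a[0] + b[0], a[1] + b[1], max(a[2], b[2]))))

            for name, (count, total, maximum) in stats.collect():
                print(name, count, total / count, maximum)

            sc.stop()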
    • 17:45 18:00
      Break 15m Martin Wood Lecture Theatre, Parks Road

      Martin Wood Lecture Theatre, Parks Road

      Physics Department, Oxford University

      no supplies

    • 18:00 19:30
      Security and Networking: IPv6 tutorial for administrators Martin Wood Lecture Theatre, Parks Road

      Martin Wood Lecture Theatre, Parks Road

      Physics Department, Oxford University

      Conveners: Dave Kelsey (STFC - Rutherford Appleton Lab. (GB)), Dr Shawn McKee (University of Michigan ATLAS Group)
      • 18:00
        IPv6 tutorial for administrators 1h 30m
        Speakers: Dave Kelsey (STFC - Rutherford Appleton Lab. (GB)), Dr Shawn McKee (University of Michigan ATLAS Group)
        Slides
    • 08:30 09:00
      Registration 30m Martin Wood Lecture Theatre, Parks Road

      Martin Wood Lecture Theatre, Parks Road

      Physics Department, Oxford University

    • 09:00 09:25
      IT Facilities and Business Continuity Martin Wood Lecture Theatre, Parks Road

      Martin Wood Lecture Theatre, Parks Road

      Physics Department, Oxford University

      Conveners: Ian Peter Collier (STFC - Rutherford Appleton Lab. (GB)), Wayne Salter (CERN)
      • 09:00
        Dust sensors for long term data preservation 25m
        The CERN Computer Center (CC) is a large building that integrates several kilometers of fibers, copper cables and pipes and several complex installations (UPSes, water cooling, heat exchangers...). This evolving building is a large theater with numerous actors:
        • contractors, performing construction work, building maintenance or hardware replacement
        • engineers and technicians, debugging hardware and performing critical upgrades on the servers
        • visitors, touring the computing facilities
        All these external activities are not without consequences for the hosted IT equipment. Last year, the CERN CC tape infrastructure was impacted by an isolated air contamination incident which affected around 125 files on a dozen tapes and two drives. Since long-term data preservation is one of our missions, we need to take this new environmental parameter into account. This presentation will expose the problems and challenges we are facing and, more importantly, the solutions we developed to better monitor the CERN CC environment around our tape libraries, together with strategies to limit the impact of airborne particles on the LHC data.
        Speaker: Julien Leduc (CERN)
        Slides
    • 09:25 10:40
      Storage and File Systems Martin Wood Lecture Theatre, Parks Road

      Martin Wood Lecture Theatre, Parks Road

      Physics Department, Oxford University

      Conveners: Dr Arne Wiebalck (CERN), Mr Peter van der Reest (DESY)
      • 09:25
        Building large storage systems with small units: How to make use of disks with integrated network and CPU 25m
        Recent advances in both hard disks and system-on-a-chip (SoC) designs have enabled the development of a novel form of hard disk: a disk that includes a network interface and an additional ARM processor not involved in low-level disk operations. This setup allows those disks to run an operating system and to communicate with other nodes autonomously using wired Ethernet. No additional hardware or infrastructure is required. The HGST laboratory provided us, the dCache team, with early access to those devices. We investigated how such devices might be used in grid and cloud environments. By deploying dCache software and observing the system's behavior we evaluated how a petabyte-sized storage infrastructure based on these disks might be built, including possible changes to the dCache software. We will present two realistic deployment scenarios for these new disks and compare them to existing deployments at the Deutsches Elektronen-Synchrotron (DESY) research centre, where direct-attached RAID systems are in use. The results of our initial investigations are presented along with an outline of future work.
        Speaker: Yves Kemp (Deutsches Elektronen-Synchrotron (DE))
        Slides
      • 09:50
        ASAP3: New data taking and analysis infrastructure for PETRA III 25m
        PETRA III is DESY's largest ring accelerator and the most brilliant storage-ring-based X-ray radiation source in the world. With its recent extension, new and faster detectors are used for the data acquisition. They exceed previous detectors in terms of data rate and volume; this is highly demanding for the underlying storage system. This talk will present the challenges we faced, the new infrastructure and services based on IBM's GPFS and our first experiences with it.
        Speaker: Stefan Dietrich (DESY)
        Slides
      • 10:15
        BeeGFS at DESY 25m
        The presentation will cover:
        • History and current status of the BeeGFS project (formerly known as FhGFS, originating from Fraunhofer)
        • Design and technology decisions made by the BeeGFS developers
        • BeeGFS setup and operational experience as an InfiniBand-based high-performance cluster file system serving as scratch space for the DESY HPC system
        • Discussion of future usage scenarios and comparison with other products in use at DESY
        Speaker: Yves Kemp (Deutsches Elektronen-Synchrotron (DE))
        Slides
    • 10:40 11:05
      Coffee Break 25m Martin Wood Lecture Theatre, Parks Road

      Martin Wood Lecture Theatre, Parks Road

      Physics Department, Oxford University

    • 11:05 12:45
      Storage and File Systems Martin Wood Lecture Theatre, Parks Road

      Martin Wood Lecture Theatre, Parks Road

      Physics Department, Oxford University

      Conveners: Dr Arne Wiebalck (CERN), Mr Peter van der Reest (DESY)
      • 11:05
        Ceph storage at RAL 25m
        RAL is currently exploring the possibilities offered by Ceph. This talk will describe two of these projects. The first project aims to provide large scale, high throughput storage for experimental data. This will initially be used by the WLCG VOs. A prototype cluster built from old hardware has been in testing since October 2014. The WLCG VOs will continue to need to access their data via familiar protocols. The cluster is therefore accessed via xrootd and GridFTP gateways. We describe the current state of the project, the testing that has already been carried out as well as future plans for the cluster. The second project provides Ceph RBD storage for the cloud infrastructure at RAL. This cluster is on hardware optimised for Ceph and provides low latency data access. This talk will describe the current status of the project as well as our experience running the cluster.
        Speaker: Alastair Dewhurst (STFC - Rutherford Appleton Lab. (GB))
        Slides
      • 11:30
        Ceph operations at CERN 25m
        Ceph has become over time a key component of CERN's Agile Infrastructure by providing storage for the OpenStack service. In this talk, we will briefly introduce Ceph's concepts, our current cluster and the services we provide, such as NFS filers, an object store for the ATLAS experiment and Xroot-to-Ceph gateways. We will then talk about our experience running Ceph, with some real-world examples (everyday operations, lessons learned, upgrades). Finally we will have a look at what is coming for Ceph at CERN and how it could be used to secure data to a remote place or to consolidate storage (Castor).
        Speaker: Herve Rousseau (CERN)
        Slides
      • 11:55
        Status Report on Ceph Based Storage Systems at the RACF 25m
        We review various functionality, performance, and stability tests performed at the RHIC and ATLAS Computing Facility (RACF) at Brookhaven National Laboratory (BNL) in 2014-2015. Tests were run on all three (object storage, block storage and file system) levels of Ceph, using a range of hardware platforms and networking solutions, including 10/40 Gbps Ethernet and IPoIB/4X FDR Infiniband. We also report the current status of a 1 PB scale Ceph-based object storage system, provided with Amazon S3 compliant RADOS Gateway interfaces, that was built for the RACF in 2014 and went through a major hardware upgrade in the beginning of 2015. We present performance measurements of this system and discuss the experience gained while operating it as ATLAS event service storage for the past 8 months. (A short S3-client sketch against such a gateway follows this entry.)
        Speaker: Dr Ofer Rind (BROOKHAVEN NATIONAL LABORATORY)
        Slides
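        Because the RADOS Gateway speaks the S3 protocol, standard S3 tooling can be pointed at such a cluster; below is a minimal sketch using the boto3 client, where the endpoint, credentials, bucket and object names are placeholders rather than details of the RACF installation.
            import boto3

            # Placeholder endpoint and credentials for an S3-compatible RADOS Gateway.
            s3 = boto3.client(
                "s3",
                endpoint_url="https://radosgw.example.org:8080",
                aws_access_key_id="ACCESS_KEY",
                aws_secret_access_key="SECRET_KEY",
            )

            bucket = "event-service-demo"
            s3.create_bucket(Bucket=bucket)

            # Store and retrieve a small object, exactly as one would against Amazon S3.
            s3.put_object(Bucket=bucket, Key="events/run001.json", Body=b'{"nevents": 42}')
            obj = s3.get_object(Bucket=bucket, Key="events/run001.json")
            print(obj["Body"].read())

            for entry in s3.list_objects_v2(Bucket=bucket).get("Contents", []):
                print(entry["Key"], entry["Size"])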
      • 12:20
        Ceph development update 25m
        The Ceph storage system is an open source, highly scalable, resilient data storage service providing object, block and file interfaces. This presentation will introduce what is new in the latest Ceph release, codenamed *Hammer*, and describe the ongoing development activities around CephFS, the Ceph filesystem. An intermediate level of familiarity with large scale storage systems will be assumed.
        Speaker: Mr John Spray (Red Hat, Inc.)
        Slides
    • 12:45 14:00
      Lunch Break 1h 15m Martin Wood Lecture Theatre, Parks Road

      Martin Wood Lecture Theatre, Parks Road

      Physics Department, Oxford University

      Lunch buffet provided for registered participants

    • 13:00 14:00
      Storage and File Systems: BoF session/panel: Ask the CEPH experts Martin Wood Lecture Theatre, Parks Road

      Martin Wood Lecture Theatre, Parks Road

      Physics Department, Oxford University

      Conveners: Dr Arne Wiebalck (CERN), Mr Peter van der Reest (DESY)
      • 13:00
        Panel and BoF session: Ask the CEPH experts 1h
        Speaker: Dr Arne Wiebalck (CERN)
    • 14:00 15:40
      Computing and Batch Systems Martin Wood Lecture Theatre, Parks Road

      Martin Wood Lecture Theatre, Parks Road

      Physics Department, Oxford University

      Conveners: Michele Michelotto (Universita e INFN (IT)), Dr Ofer Rind (BROOKHAVEN NATIONAL LABORATORY), Wolfgang Friebel (Deutsches Elektronen-Synchrotron (DE))
      • 14:00
        Operation and benchmarking of a commercial datacentre 25m
        This contribution describes the usage and benchmarking of a commercial data centre running OpenStack. Different cloud provisioning tools are described, highlighting the pros and cons of each system. A comparison is made between this facility and a standard grid T2 site in terms of job throughput and availability. Usage of the centre's local object store is also described.
        Speaker: Peter Love (Lancaster University (GB))
        Slides
      • 14:25
        Remote evaluation of hardware 25m
        The RHIC-ATLAS Computing Facility (RACF) at BNL has traditionally evaluated hardware on-site, with physical access to the systems. The effort to request evaluation hardware, shipping, set-up and testing has consumed an increasing amount of time, and the process has become less productive over the years. To regain past productivity and shorten the evaluation process, BNL started a pilot project in 2015 to evaluate hardware remotely. This presentation discusses the status of the remote evaluation project and its future prospects.
        Speaker: Dr Tony Wong (Brookhaven National Laboratory)
        Slides
      • 14:50
        Evaluation of Memory and CPU usage via Cgroups of ATLAS workloads running at a Tier-2 25m
        Modern Linux kernels include a feature set called cgroups that enables the control and monitoring of system resources. Cgroups have been enabled on a production HTCondor pool at the Glasgow site of the UKI-SCOTGRID distributed Tier-2. A system has been put in place to collect and aggregate metrics extracted from cgroups on all worker nodes within the Condor pool. From this aggregated data, memory and CPU usage footprints are extracted, from which the resource usage for each type of ATLAS workload can be obtained and studied (a sketch of reading the underlying accounting files follows this entry). This system has been used to identify broken payloads, real-world memory usage, job efficiencies, etc. Additionally, work has begun on near-real-time tracking of running jobs with the goal of proactively identifying and stopping broken payloads before they consume unnecessary CPU time and resources.
        Speaker: Gang Qin (University of Glasgow (GB))
        Slides
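        For reference, the per-job accounting data involved comes straight from the cgroup virtual filesystem; the minimal sketch below assumes a cgroup-v1 layout and an HTCondor-style group path, both of which depend on the local configuration and are given here only as placeholders.
            import os

            CGROUP_ROOT = "/sys/fs/cgroup"           # cgroup v1 mount point (assumption)
            JOB_GROUP = "htcondor/condor_slot1"      # example HTCondor job cgroup (assumption)

            def read_int(controller, filename):
                path = os.path.join(CGROUP_ROOT, controller, JOB_GROUP, filename)
                with open(path) as f:
                    return int(f.read().strip())

            # memory.usage_in_bytes, memory.max_usage_in_bytes and cpuacct.usage are the
            # standard cgroup-v1 accounting files the kernel exposes for each group.
            mem_now = read_int("memory", "memory.usage_in_bytes")
            mem_peak = read_int("memory", "memory.max_usage_in_bytes")
            cpu_ns = read_int("cpuacct", "cpuacct.usage")

            print("memory now:  %.1f MiB" % (mem_now / 2.0**20))
            print("memory peak: %.1f MiB" % (mem_peak / 2.0**20))
            print("cpu time:    %.1f s" % (cpu_ns / 1e9))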
      • 15:15
        Beyond HS06: Toward a New HEP CPU Benchmark - Update March 2015 25m
        In this talk we will provide information about the current status of the preliminary work to relaunch the HEPiX Benchmarking Working Group which will develop the next release of the HEP CPU benchmark.
        Speaker: Manfred Alef (Karlsruhe Institute of Technology (KIT))
        Slides
    • 15:40 16:05
      Coffee Break 25m Martin Wood Lecture Theatre, Parks Road

      Martin Wood Lecture Theatre, Parks Road

      Physics Department, Oxford University

    • 16:05 17:45
      Computing and Batch Systems Martin Wood Lecture Theatre, Parks Road

      Martin Wood Lecture Theatre, Parks Road

      Physics Department, Oxford University

      Conveners: Michele Michelotto (Universita e INFN (IT)), Dr Ofer Rind (BROOKHAVEN NATIONAL LABORATORY), Wolfgang Friebel (Deutsches Elektronen-Synchrotron (DE))
      • 16:05
        Looking for a fast benchmark 25m
        The WLCG community has requested a fast benchmark to quickly assess the performance of a worker node. A good candidate is a Python script used in LHCb (an illustrative timing loop follows this entry).
        Speaker: Dr Michele Michelotto (INFN Padua & CMS)
        Slides
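        The flavour of such a fast benchmark can be conveyed in a few lines: time a fixed CPU-bound workload and turn the elapsed time into a score. This is only an illustration of the approach, not the LHCb script itself; the workload and the units are arbitrary.
            import random
            import time

            def fast_benchmark(iterations=3000000, seed=1):
                """Time a fixed CPU-bound loop and return an arbitrary iterations-per-second score."""
                rng = random.Random(seed)
                start = time.time()
                total = 0.0
                for _ in range(iterations):
                    # Simple floating-point work, a toy stand-in for event generation.
                    u = rng.random()
                    total += u * u - 0.5 * u
                elapsed = time.time() - start
                return iterations / elapsed

            if __name__ == "__main__":
                # Run a few times and keep the best value to reduce the impact of noise.
                best = max(fast_benchmark() for _ in range(3))
                print("score: %.0f (arbitrary units)" % best)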
      • 16:30
        Evaluation of low power Systems on Chip for scientific computing 25m
        Systems on Chip (SoCs), originally targeted at mobile and embedded technology, are becoming attractive for the HEP and HPC scientific communities, given their low cost, huge worldwide shipments, low power consumption and increasing processing power, mostly associated with their GPUs. A variety of development boards are currently available, making it foreseeable to use these power-efficient platforms in a standard computing center configuration. We discuss hardware limitations and programming constraints of low-power solutions, namely ARM-based SoCs and Intel (Avoton) SoCs. Finally, we present some preliminary evaluations of the performance of SoC architectures running non-trivial scientific applications exploiting both the CPU and the GPU available on the low-power chip: results are particularly encouraging for workloads with low memory requirements and/or image processing.
        Speaker: Dr Lucia Morganti (INFN)
        Slides
      • 16:55
        A look beyond x86: OpenPOWER8 & AArch64 25m
        x86 is the uncontested leader for server platforms in terms of market share and is currently the architecture of choice for High Energy Physics applications. But as more and more importance is given to power efficiency, physical density and total cost of ownership, we are seeing new processor architectures emerging and some existing ones becoming more open. With the introduction of AArch64, ARM's 64-bit architecture, coupled with the adoption of industry standards such as UEFI, ACPI and SMBIOS, ARM has for the first time in history the opportunity of becoming a real contender in the server space. Sharing some similarities with the Open Compute Project, and with strong industry backing, the OpenPOWER Foundation aims to create an open ecosystem around the enterprise-centric POWER architecture. I will present the specificities of each of these alternative architectures, highlighting the differentiating server features of each. Performance and power profiling of the following uni-socket platforms will be presented: Intel Xeon E3-1200 v3 (Haswell), Intel Atom (Avoton), Applied Micro X-Gene and OpenPOWER 8.
        Speaker: Liviu Valsan (CERN)
        Slides
      • 17:20
        HEPSPEC analysis across modern architectures 25m
        The talk's coverage will include Xeon Haswell, ARM and Open Compute Platforms
        Speaker: Mr David Power (Boston Ltd.)
        Slides
    • 19:00 22:00
      Social Dinner 3h Balliol College

      Balliol College

      Broad Street OX1 3BJ

      http://www.balliol.ox.ac.uk/

    • 08:30 09:00
      Registration 30m Martin Wood Lecture Theatre, Parks Road

      Martin Wood Lecture Theatre, Parks Road

      Physics Department, Oxford University

    • 09:00 10:40
      Basic IT Services Martin Wood Lecture Theatre, Parks Road

      Martin Wood Lecture Theatre, Parks Road

      Physics Department, Oxford University

      Conveners: Dr Helge Meinhard (CERN), Dr Tony Wong (Brookhaven National Laboratory)
      • 09:00
        Status of Centralized Config Management at the RACF 25m
        It's simple enough to instantiate a new process in an existing environment; it can be much more challenging to foster acceptance of such a process in IT environments and cultures that are traditionally stagnant and resistant to change, and to maintain and optimize that process to ensure it continues to realize optimal benefit. To enhance our computing facility, we've already taken considerable strides toward simplifying, optimizing, and automating our technical deployment and maintenance procedures by analyzing, adopting, and implementing policies and technologies. As our facility evolves, so does technology change: we continuously revisit our environment and needs, evaluate our current tools and processes, and watch the horizon of the IT landscape for new and more optimal technology and solutions. Our configuration management core relies in part upon Puppet: Puppet Labs has developed a new server model, which we have tested and evaluated against our existing Puppet deployment. We've developed an automated testing process based upon Jenkins CI, a continuous integration tool that validates pending changes before they can be pushed into our production environment. We've begun evaluating MCollective, an orchestration framework that may prove useful to our current automation processes by adding functionality such as resource grouping and reporting. We're working with other organizations at our site that share our interest in configuration management to share ideas and refine solutions. In this talk, we present an overview of the current state of configuration management environment in our facility, the technical challenges we currently face, the technology we're evaluating and using to address those challenges, and the direction in which we plan to steer our future efforts.
        Speaker: William Strecker-Kellogg (Brookhaven National Lab)
        Slides
      • 09:25
        Configuration management at CERN: Status and directions 25m
        CERN’s experience of migrating a large site to a Puppet-based and more dynamic Configuration Service will be presented. The presentation will review some of the challenges encountered along the way and describe future plans for how to scale the service and improve the overall automation of operations on the site.
        Speaker: Alberto Rodriguez Peon (Universidad de Oviedo (ES))
        Slides
      • 09:50
        Deployment and usage of MCollective in production 25m
        Marionette Collective, also known as MCollective, is a framework for building server orchestration, monitoring, and parallel job execution. MCollective uses a modern "Publish Subscribe Middleware" for a scalable and fast execution environment. It is a powerful tool in combination with Puppet, due to the good integration. However it can be a challenging task to configure and deploy MCollective in a reliable and secure way. In this talk, we will give an overview of our DESY MCollective setup, the design decisions, intended use and encountered problems.
        Speaker: Stefan Dietrich (DESY)
        Slides
      • 10:15
        Quattor in 2015 25m
        The Quattor community has been maintaining Quattor for over ten years, and with our 19th community workshop recently held, the pace of development continues to increase. This talk will demonstrate why Quattor is more than just a configuration management system, report on recent developments and provide some notable updates and experiences from sites.
        Speaker: James Adams (STFC RAL)
        Slides
    • 10:40 11:05
      Coffee Break 25m Martin Wood Lecture Theatre, Parks Road

      Martin Wood Lecture Theatre, Parks Road

      Physics Department, Oxford University

    • 11:05 12:20
      Basic IT Services Martin Wood Lecture Theatre, Parks Road

      Martin Wood Lecture Theatre, Parks Road

      Physics Department, Oxford University

      Conveners: Dr Helge Meinhard (CERN), Dr Tony Wong (Brookhaven National Laboratory)
      • 11:05
        Subtlenoise: sonification of distributed computing activity 25m
        The dominant monitoring system used in distributed computing consists of visually rich time-series graphs and notification systems for alerting operators when metrics fall outside of accepted values. For large systems this can quickly become overwhelming. In this contribution a different approach is described, using the sonification of monitoring messages with an architecture which fits easily within existing infrastructures (a toy example of the idea follows this entry). The benefits of this approach are described in the context of various computing operations.
        Speaker: Peter Love (Lancaster University (GB))
        Slides
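        As a toy illustration of the idea (not the implementation being presented), the sketch below maps a made-up stream of normalised metric samples onto tone pitches and renders them to a WAV file using only the Python standard library, so an operator would hear the pitch climb as the metric degrades.
            import math
            import struct
            import wave

            SAMPLE_RATE = 44100

            def tone(frequency_hz, duration_s=0.2, volume=0.4):
                """Generate one sine-wave tone as 16-bit mono samples."""
                n = int(SAMPLE_RATE * duration_s)
                return b"".join(
                    struct.pack("<h", int(volume * 32767 *
                                          math.sin(2 * math.pi * frequency_hz * i / SAMPLE_RATE)))
                    for i in range(n))

            def metric_to_frequency(value, low=200.0, high=1200.0):
                """Map a metric in [0, 1] onto an audible pitch range."""
                return low + max(0.0, min(1.0, value)) * (high - low)

            # Made-up samples of some normalised metric (e.g. a job failure fraction).
            samples = [0.05, 0.06, 0.05, 0.30, 0.65, 0.90, 0.40, 0.10]

            with wave.open("monitoring.wav", "wb") as w:
                w.setnchannels(1)
                w.setsampwidth(2)
                w.setframerate(SAMPLE_RATE)
                for s in samples:
                    w.writeframes(tone(metric_to_frequency(s)))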
      • 11:30
        Towards a modernisation of CERN’s telephony infrastructure 25m
        IP-based voice telephony (VoIP) and the SIP protocol are clear examples of disruptive technologies that have revolutionised a previously settled market. In particular, open-source solutions now have the ascendancy in the traditional Private Branch eXchange (PBX) market. We present a possible architecture for the modernisation of CERN's fixed telephony network, highlighting the technical challenges to be addressed and the critical services that must be maintained and then describing how the introduction of open-source call routers based on the SIP protocol and Session Border Controllers (SBC) could foster the introduction of new services and increase the agility of our telephony network to adapt to new communication standards.
        Speaker: Francisco Valentin Vinagrero (CERN)
        Slides
      • 11:55
        Updates from Database Services at CERN 25m
        CERN has a great number of applications that rely on a database for their daily operations. From physics-related databases to the administrative sector, there is a high demand for a database system appropriate to the users' needs and requirements. This presentation gives a summary of the current state of the Database Services at CERN, the work done during LS1 and some insights into the evolution of our services.
        Speaker: Andrei Dumitru (CERN)
        Slides
    • 12:20 12:45
      Computing and Batch Systems Martin Wood Lecture Theatre, Parks Road

      Martin Wood Lecture Theatre, Parks Road

      Physics Department, Oxford University

      Conveners: Michele Michelotto (Universita e INFN (IT)), Dr Ofer Rind (BROOKHAVEN NATIONAL LABORATORY), Wolfgang Friebel (Deutsches Elektronen-Synchrotron (DE))
      • 12:20
        DRMAA2 - An Open Standard for Job Submission and Cluster Monitoring 25m
        • Introduction
        • DRMAA2 in a Nutshell
        • The C Interface: Data Types, Monitoring Sessions, Job Sessions, Working with Jobs, Job Templates, Error Handling and Dealing with Enhancements
        • Getting Started with DRMAA2
        • Example Applications: Job Monitoring Applications and Simple Multi-Clustering
        Speaker: Daniel Gruber (U)
        Slides
    • 12:45 14:00
      Lunch Break 1h 15m Martin Wood Lecture Theatre, Parks Road

      Martin Wood Lecture Theatre, Parks Road

      Physics Department, Oxford University

      Lunch buffet provided for registered participants

    • 14:00 15:40
      Grids, Clouds and Virtualisation Martin Wood Lecture Theatre, Parks Road

      Martin Wood Lecture Theatre, Parks Road

      Physics Department, Oxford University

      Conveners: Brian Paul Bockelman (University of Nebraska (US)), Ian Peter Collier (STFC - Rutherford Appleton Lab. (GB))
      • 14:00
        Cloud @ RAL, an update 25m
        The STFC Scientific Computing Department has been developing an OpenNebula-based cloud underpinned by Ceph block storage. I will describe some of our use cases and our setup, and give a demonstration of our development VM-on-demand service. I will go on to explore some of the problems we have overcome to reach this point. Finally, I will present the work we are doing to use spare capacity on this cloud to run virtual worker nodes.
        Speaker: George Ryall (STFC - Rutherford Appleton Lab.)
        Slides
      • 14:25
        CERN Cloud Report 25m
        This is a report on the current status and future plans of CERN’s OpenStack-based Cloud Infrastructure.
        Speaker: Bruno Bompastor (CERN)
        Slides
      • 14:50
        Ceph vs Local Disk For Virtual Machines 25m
        The Scientific Computing Department at the STFC has been developing a Ceph block storage backed OpenNebula cloud. We have carried out a quantitative evaluation of the performance characteristics of virtual machines which have been instantiated with a variety of different storage configurations (using both Ceph and local disks). I will describe our motivations for this testing, our methodology and present our initial results.
        Speaker: Alexander Dibbo
        Slides
      • 15:15
        The Vacuum model for running jobs in VMs 25m
        The Vacuum model provides a method for managing the lifecycle of virtual machines based on their observed success or failure in finding work to do for their experiment (a sketch of this feedback loop follows this entry). In contrast to centrally managed grid job submission and cloud VM instantiation systems, the Vacuum model gives resource providers direct control over which experiments' VMs or jobs are created and in what proportion. This model also leads to a highly decentralised, feedback-based infrastructure, in which the responsibility of providing VMs for the same experiment can be undertaken by a mix of sites, local groups, national teams, and the experiment's central operations staff. This mixture is well matched to the variety of entities which need to act as a virtual resource provider, due to differences in available funding and objectives. We present three implementations of this model developed by GridPP: Vac, which manages VMs on autonomous hypervisor machines; Vcycle, which manages VMs on IaaS systems such as OpenStack; and a system for managing VMs at an HTCondor site. The Pilot VM architecture originally developed by LHCb is particularly suitable for the lightweight Vacuum approach, and involves the configuration of an environment similar to conventional WLCG batch worker nodes within the VM, followed by the execution of the pilot framework client as with conventional grid jobs. We present Pilot VM designs used to run production jobs for LHCb, ATLAS, CMS, and GridPP DIRAC, where the same VM design is able to run on all three implementations of the Vacuum model for VM management.
        Speaker: Andrew McNab (University of Manchester (GB))
        Slides
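        The feedback at the heart of the model, namely creating more VMs for experiments whose recent VMs found work and backing off for those that did not, can be sketched as below; the bookkeeping and probabilities are purely illustrative and are not Vac's or Vcycle's actual algorithm.
            import random
            from collections import deque

            class ExperimentState:
                """Track recent VM outcomes for one experiment and decide whether to start more."""

                def __init__(self, name, history=20):
                    self.name = name
                    self.outcomes = deque(maxlen=history)   # True = the VM found and ran work

                def record(self, found_work):
                    self.outcomes.append(found_work)

                def start_probability(self):
                    if not self.outcomes:
                        return 1.0                          # no information yet: try optimistically
                    success = float(sum(self.outcomes)) / len(self.outcomes)
                    return max(0.1, success)                # back off, but keep probing occasionally

            def maybe_start_vm(state):
                if random.random() < state.start_probability():
                    print("starting VM for %s" % state.name)
                    return True
                print("backing off for %s" % state.name)
                return False

            # Illustrative cycle: one experiment with work available, one without.
            busy, idle = ExperimentState("expA"), ExperimentState("expB")
            for _ in range(5):
                if maybe_start_vm(busy):
                    busy.record(True)    # its VMs keep finding payloads
                if maybe_start_vm(idle):
                    idle.record(False)   # its VMs shut down without finding work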
    • 15:40 16:05
      Coffee Break 25m Martin Wood Lecture Theatre, Parks Road

      Martin Wood Lecture Theatre, Parks Road

      Physics Department, Oxford University

    • 16:05 17:45
      Grids, Clouds and Virtualisation Martin Wood Lecture Theatre, Parks Road

      Martin Wood Lecture Theatre, Parks Road

      Physics Department, Oxford University

      Conveners: Brian Paul Bockelman (University of Nebraska (US)), Ian Peter Collier (STFC - Rutherford Appleton Lab. (GB))
      • 16:05
        Running ATLAS at scale on Amazon EC2 25m
        Beginning in September 2014, the RACF at Brookhaven National Lab has been collaborating with Amazon's scientific computing group on a pilot project. The goal of this project is to demonstrate the usage of Amazon AWS (EC2, S3, etc.) for real-world ATLAS production, proving the practical and economic feasibility of ATLAS leveraging commercial cloud computing to optimize resource provisioning. This project includes all elements needed for at-scale work: networking, data management, capacity provisioning, VM image creation and management, and alterations/additions to the ATLAS compute infrastructure to leverage AWS (a minimal provisioning sketch follows this entry). Each of these elements is established for all three AWS regions (east, west-1, and west-2). The work done so far, the results of initial scaling tests, and plans for the near future will be presented in detail.
        Speaker: John Hover (Brookhaven National Laboratory (BNL))
        Slides
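        At its simplest, the capacity-provisioning element described above reduces to API calls like the boto3 sketch below; the AMI ID, instance type, counts, key and security group are placeholders, and the real setup layers spot pricing, VPC networking and node contextualisation on top of such calls.
            import boto3

            ec2 = boto3.client("ec2", region_name="us-east-1")

            # Launch a small batch of worker-node instances from a placeholder image.
            response = ec2.run_instances(
                ImageId="ami-00000000",
                InstanceType="m3.xlarge",
                MinCount=1,
                MaxCount=10,
                KeyName="demo-key",
                SecurityGroupIds=["sg-00000000"],
            )
            instance_ids = [i["InstanceId"] for i in response["Instances"]]
            print("launched:", instance_ids)

            # Wait until the capacity is running before handing it to the workload manager...
            ec2.get_waiter("instance_running").wait(InstanceIds=instance_ids)

            # ...and release it again when the queue drains.
            ec2.terminate_instances(InstanceIds=instance_ids)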
      • 16:30
        Best Practices for Big Science in the Cloud 25m
        On the heels of discussing the BNL RACF group's proof of concept on AWS, this session will share best practices for some of the most common AWS services used by big science, such as EC2, VPC, S3, and complex hybrid networking and routing. We will also provide an overview of the AWS Scientific Computing Group, which was created to help global scientific collaborations develop an ecosystem supporting the long tail of science, research and engineering, and to educate these communities on the role AWS can play in making science successful in the cloud.
        Speaker: Mr Dario Rivera (Amazon Web Services)
        Slides
      • 16:55
        OpenStack Heat @ CERN 25m
        Heat, the OpenStack orchestration service, is being deployed at CERN. We will present the overall architecture and the features included in the project, our deployment challenges and our future plans. (A minimal example template follows this entry.)
        Speaker: Bruno Bompastor (CERN)
        Slides
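        To illustrate what Heat orchestrates, here is a minimal, generic Heat Orchestration Template (HOT) built as a Python dict and written out as YAML. The resource type and template version are standard Heat; the image, flavour and key names are placeholders, and the example is not taken from CERN's deployment.

        import yaml  # PyYAML

        # A minimal HOT template describing a single server (illustrative only).
        template = {
            'heat_template_version': '2013-05-23',
            'description': 'Single server, illustrative example',
            'resources': {
                'server': {
                    'type': 'OS::Nova::Server',
                    'properties': {
                        'image': 'SLC6-base',    # placeholder image name
                        'flavor': 'm1.small',    # placeholder flavour
                        'key_name': 'mykey',     # placeholder key pair
                    },
                },
            },
            'outputs': {
                'server_ip': {
                    'description': 'First address of the server',
                    'value': {'get_attr': ['server', 'first_address']},
                },
            },
        }

        with open('server.yaml', 'w') as f:
            yaml.dump(template, f, default_flow_style=False)

        # The resulting server.yaml can then be passed to Heat, e.g. with the
        # 2015-era CLI: heat stack-create -f server.yaml my_stack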
      • 17:20
        STAR Experience with Automated High Efficiency Grid Based Data Production Framework at KISTI/Korea 25m
        In statistically hungry science domains, the data deluges of data taking can be both a blessing and a curse. They allow the winnowing out of statistical errors from known measurements and open the door to new scientific opportunities as the physics program matures, but they are also a testament to the efficiency of the experiment and accelerator and to the skill of their operators. However, the data samples need to be dealt with, and in experiments like those at RHIC the planning for computing resources does not allow for huge increases in computing capacity. One standard strategy has been to share resources across multiple experiments at a given facility. Another has been to use middleware that "glues" resources across the world so that they are able to run the experimental software stack locally (either natively or virtually). In this presentation we will describe a framework STAR has successfully used to reconstruct ~400 TB of data, consisting of over 100,000 jobs submitted to a remote site in Korea from its Tier-0 facility at Brookhaven National Laboratory. The framework automates the full path from taking raw data files off tape to writing physics-ready output back to tape, without operator or remote-site intervention. Through hardening we have demonstrated an efficiency of 99% over a period of 7 months of operation. The high performance is attributed to finite state checking with retries, which makes the system resilient against capricious and fallible infrastructure. (An illustrative retry-loop sketch follows this entry.)
        Speaker: Mr Levente Hajdu (Brookhaven National Laboratory)
        Slides
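        The "finite state checking with retries" mentioned in the abstract can be illustrated with a short, hypothetical Python loop: each file moves through a fixed sequence of states, and any failed step is retried a bounded number of times before the file is flagged for an operator. The state names, helper scripts and limits are invented; this is not the STAR framework itself.

        import subprocess
        import time

        # Hypothetical states a file passes through, and a retry budget per step.
        STATES = ['staged', 'submitted', 'reconstructed', 'archived']
        MAX_RETRIES = 3

        def run_step(filename, state):
            """Run the (hypothetical) command that moves a file into `state`;
            returns True on success."""
            commands = {
                'staged':        ['stage_from_tape.sh', filename],
                'submitted':     ['submit_to_grid.sh', filename],
                'reconstructed': ['check_output.sh', filename],
                'archived':      ['write_to_tape.sh', filename],
            }
            return subprocess.call(commands[state]) == 0

        def process(filename):
            """Drive one file through all states with bounded retries."""
            for state in STATES:
                for attempt in range(1, MAX_RETRIES + 1):
                    if run_step(filename, state):
                        break                    # advance to the next state
                    time.sleep(60 * attempt)     # back off before retrying
                else:
                    return False                 # retries exhausted: needs an operator
            return True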
    • 17:45 19:45
      HEPiX Board Meeting Martin Wood Lecture Theatre, Parks Road

      Martin Wood Lecture Theatre, Parks Road

      Physics Department, Oxford University

      By invitation only

      Conveners: Dr Helge Meinhard (CERN), Dr Tony Wong (Brookhaven National Laboratory)
    • 08:30 09:00
      Registration 30m Martin Wood Lecture Theatre, Parks Road

      Martin Wood Lecture Theatre, Parks Road

      Physics Department, Oxford University

    • 09:00 10:40
      Computing and Batch Systems Martin Wood Lecture Theatre, Parks Road

      Martin Wood Lecture Theatre, Parks Road

      Physics Department, Oxford University

      Conveners: Michele Michelotto (Universita e INFN (IT)), Dr Ofer Rind (BROOKHAVEN NATIONAL LABORATORY), Wolfgang Friebel (Deutsches Elektronen-Synchrotron (DE))
      • 09:00
        Batch Processing at CERN 25m
        The CERN Batch System comprises 4000 worker nodes and 60 queues, and offers a service to various large user communities. In light of the developments driven by the Agile Infrastructure and of more demanding processing requirements, it is faced with increasingly challenging scalability and flexibility needs. The production cluster currently runs IBM/Platform LSF. Over the last few months, an increasing number of large-scale interventions had to take place, revealing critical limitations that we will need to overcome in the future. We have started a project to implement workflows that will help us face these problems.
        Speaker: Jerome Belleman (CERN)
        Slides
      • 09:25
        Grid Engine at GridKa 25m
        The Grid Computing Centre Karlsruhe (GridKa) has been using the Grid Engine batch system since 2011. In this presentation I will talk about our experiences with this batch system, including multi-core job support, and about first experiences with cgroups.
        Speaker: Manfred Alef (Karlsruhe Institute of Technology (KIT))
        Slides
      • 09:50
        SLURM update from the Nordics 25m
        An update on the current status of SLURM usage in the Nordics, as well as on recent developments in improving support for LHC-type jobs, including tuning for efficient scheduling of multi-core grid jobs. An overview of some remaining challenges will also be given, together with a discussion of how to address them.
        Speaker: Erik Mattias Wadenstein (University of Umeå (SE))
        Slides
      • 10:15
        Condor Workshop Summary 25m
        I propose to give a summary of the Condor workshop held at CERN in mid-December.
        Speaker: Mr Michel Jouvin (Laboratoire de l'Accelerateur Lineaire (FR))
        Slides
    • 10:40 11:05
      Coffee Break 25m Martin Wood Lecture Theatre, Parks Road

      Martin Wood Lecture Theatre, Parks Road

      Physics Department, Oxford University

    • 11:05 12:45
      Computing and Batch Systems Martin Wood Lecture Theatre, Parks Road

      Martin Wood Lecture Theatre, Parks Road

      Physics Department, Oxford University

      Conveners: Michele Michelotto (Universita e INFN (IT)), Dr Ofer Rind (BROOKHAVEN NATIONAL LABORATORY), Wolfgang Friebel (Deutsches Elektronen-Synchrotron (DE))
      • 11:05
        Two years of HTCondor at the RAL Tier-1 25m
        After running Torque/Maui for many years, the RAL Tier-1 migrated to HTCondor during 2013 in order to benefit from improved reliability, scalability and additional functionality unavailable in Torque. This talk will discuss the deployment of HTCondor at RAL, our experiences and the evolution of our pool over the past two years, as well as our future plans.
        Speaker: Andrew David Lahiff (STFC - Rutherford Appleton Lab. (GB))
        Slides
      • 11:30
        Future of Batch Processing at CERN 25m
        While we are taking measures to address, in our IBM/Platform LSF cluster, the limitations discussed earlier, we have been working on setting up a new batch system based on HTCondor. There has been some progress with the pilot service which we described at the last HEPiX. We have also been investigating some of the more advanced features which will lead up to the production state of the new CERN Batch Service.
        Speaker: Jerome Belleman (CERN)
        Slides
      • 11:55
        HTCondor within the European Grid and in the Cloud 25m
        With the increasing interest in HTCondor in Europe, an important question for sites considering a migration to HTCondor is how well it integrates with the standard grid middleware, in particular with the information system and APEL accounting. With the increasing interest in and usage of private clouds, how easily a batch system can be integrated with a private cloud is another important question. This talk will discuss both of these topics based on the experience of the RAL Tier-1.
        Speaker: Andrew David Lahiff (STFC - Rutherford Appleton Lab. (GB))
        Slides
      • 12:20
        DrainBoss: A Drain Rate Controller for ARC/HTCondor 25m
        This talk describes DrainBoss, a proportional-integral (PI) controller with conditional logic that strives to maintain the correct ratio between single-core and multi-core jobs in an ARC/HTCondor cluster. DrainBoss can be used instead of the HTCondor defrag daemon. (An illustrative PI control-loop sketch follows this entry.)
        Speaker: Stephen Jones (Liverpool University)
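        A proportional-integral control step of the kind the abstract describes can be sketched in a few lines of Python: the error is the difference between the desired and observed multi-core fraction, and the controller output decides how many extra nodes to set draining. The gains, the 300-second cycle and the anti-windup conditional are illustrative assumptions rather than DrainBoss's actual parameters.

        # Illustrative PI control step for draining nodes; not DrainBoss itself.
        class PIController:
            def __init__(self, setpoint, kp=2.0, ki=0.5):
                self.setpoint = setpoint   # desired multi-core fraction, e.g. 0.5
                self.kp = kp               # proportional gain (assumed value)
                self.ki = ki               # integral gain (assumed value)
                self.integral = 0.0

            def update(self, measured_fraction, dt=300.0):
                """Return how many extra nodes to drain (>= 0) this cycle, given the
                currently measured multi-core fraction and the cycle length in seconds."""
                error = self.setpoint - measured_fraction
                self.integral += error * dt
                output = self.kp * error + self.ki * self.integral
                # Conditional logic: never request negative draining, and bleed off
                # the integral term when no draining is needed (simple anti-windup).
                if output <= 0:
                    self.integral *= 0.5
                    return 0
                return int(round(output))

        # Example cycle: aim for half the slots to be usable by multi-core jobs.
        controller = PIController(setpoint=0.5)
        nodes_to_drain = controller.update(measured_fraction=0.35)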
    • 12:45 13:05
      Miscellaneous Martin Wood Lecture Theatre, Parks Road

      Martin Wood Lecture Theatre, Parks Road

      Physics Department, Oxford University

      Conveners: Dr Helge Meinhard (CERN), Dr Tony Wong (Brookhaven National Laboratory)
      • 12:45
        A word on phishing 5m
        ... at HEPiX
        Speaker: Mr Romain Wartel (CERN)
        Slides
      • 12:50
        Workshop Wrap-Up 15m
        Usual summary and conclusions
        Speaker: Dr Helge Meinhard (CERN)
        Slides