HEPiX Fall 2015 Workshop

America/New_York
Bldg. 510 - Physics Department Large Seminar Room
Brookhaven National Laboratory
Upton, NY 11973
Helge Meinhard (CERN), Tony Wong (Brookhaven National Laboratory)
Description

HEPiX Fall 2015 at Brookhaven National Laboratory (BNL), Upton NY, USA

The HEPiX forum brings together worldwide Information Technology staff, including system administrators, system engineers, and managers from High Energy Physics and Nuclear Physics laboratories and institutes, to foster a learning and sharing experience between sites facing scientific computing and data challenges.

Participating sites include BNL, CERN, DESY, FNAL, IN2P3, INFN, IRFU, JLAB, NIKHEF, PIC, RAL, SLAC, TRIUMF, and many others.

HEPiX Fall 2015 is made possible thanks to support by the following sponsors:

Platinum Sponsor: Western Digital

Gold Sponsors: Hewlett-Packard, Red Hat

Silver Sponsor: DataDirect Networks Storage


Organized and hosted by the RACF at BNL

Notes from Workshop
    • 08:00 09:00
      Registration Bldg. 510 - Physics Department Large Seminar Room

    • 09:00 09:15
      Welcome to BNL Bldg. 510 - Physics Department Large Seminar Room

      • 09:00
        Welcome to BNL 15m
        Speaker: Michael Ernst (Brookhaven National Laboratory)
    • 09:15 10:35
      Site Reports Bldg. 510 - Physics Department Large Seminar Room

    • 10:35 11:00
      Coffee Break 25m Bldg. 510 - Physics Department Seminar Lounge

    • 11:00 12:40
      Site Reports Bldg. 510 - Physics Department Large Seminar Room

      • 11:00
        Oxford and SouthGrid Site Status Report 20m
        Update on Oxford University Particle Physics group computing setup, including short updates from the other member sites of SouthGrid.
        Speaker: Peter Gronbech (University of Oxford (GB))
      • 11:20
        PIC Site Report 20m
        We will review the status of PIC as of Fall 2015. News since the Oxford meeting will be reported.
        Speaker: Jose Flix Molina (Centro de Investigaciones Energ. Medioambientales y Tecn. (ES))
      • 11:40
        PDSF Site Report and Relocation 20m
        PDSF, the Parallel Distributed Systems Facility, has been in continuous operation since 1996, serving high energy physics research. It is currently a Tier-1 site for STAR, a Tier-2 site for ALICE and a Tier-3 site for ATLAS. The PDSF cluster will move early next year from its current site in Oakland to a new building on the LBNL campus. Several racks have already been installed at the new site. This site report will describe recent updates to the system, upcoming modifications, and how the PDSF cluster will operate between the two locations.
        Speaker: Tony Quan (LBL)
      • 12:00
        RAL Site Report 20m
        Update from RAL Tier1.
        Speaker: Martin Bly (STFC-RAL)
      • 12:20
        BEIJING Site Report 20m
        News and updates from IHEP since the Spring HEPiX workshop. In this talk we will present a brief status of the IHEP site, including the computing farm, Grid services, data storage and networking.
        Speaker: Dr Qiulan Huang (Institute of High Energy Physics, Chinese Academy of Sciences)
    • 12:40 14:00
      Lunch Break 1h 20m Berkner Hall cafeteria

    • 14:00 14:20
      Grid, Cloud and Virtualization Bldg. 510 - Physics Department Large Seminar Room

      • 14:00
        Simulating 5 Trillion Events on the Open Science Grid 20m
        We report on a simulation effort using the Open Science Grid which utilized a large fraction of the available OSG resources for about 13 weeks in the first half of 2015. sPHENIX is a proposed upgrade of the PHENIX experiment at the Relativistic Heavy Ion Collider. We collected large proton-proton collision data sets in 2012 and 2013, and plan to carry out a similar study with the upgraded sPHENIX detector further into the 2020s. One important aspect of the study is to understand the different contributions to the forward production of muons that are measured in the detector. This requires a large-scale PYTHIA simulation that matches the integrated luminosity of the PHENIX 2013 data set (about 5000 billion collisions). Since there is no way to pre-select the different parton-level collision types, the simulation selects minimum-bias collisions and then discards those which are not of the desired topology, leading to an acceptable I/O-to-CPU ratio. This made the simulation ideal to run on the Grid, and the Open Science Grid opened up the possibility of carrying it out in a reasonable time frame. The project used about 5 million CPU hours in total. We will report on the experience running the jobs on the OSG, and describe the steps that were taken to automate the production in such a way that the entire simulation could be run by one person.
        Speaker: Martin Lothar Purschke (Brookhaven National Laboratory (US))
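        The generate-then-filter pattern described above can be sketched in a few lines of Python; the event generator and topology fields below are hypothetical stand-ins, not the actual sPHENIX/PYTHIA code, and are meant only to show why discarded events cost CPU but essentially no I/O.
          # Illustrative sketch only: generate_minbias_event() and its fields
          # are hypothetical stand-ins for the real PYTHIA-based workflow.
          import random

          def generate_minbias_event():
              # Stand-in for a PYTHIA minimum-bias event.
              return {"process": random.choice(["light", "charm", "beauty"]),
                      "forward_muon": random.random() < 0.01}

          def is_desired_topology(event):
              # Keep only events that can contribute to forward muon production.
              return event["forward_muon"] and event["process"] in ("charm", "beauty")

          def run_job(n_events, output):
              kept = 0
              for _ in range(n_events):
                  event = generate_minbias_event()
                  if not is_desired_topology(event):
                      continue          # discarded: costs CPU but no I/O
                  output.append(event)  # only selected events are kept for output
                  kept += 1
              return kept

          if __name__ == "__main__":
              out = []
              print(run_job(100000, out), "events kept of 100000 generated")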
    • 14:20 15:20
      End-user Services and Operating Systems Bldg. 510 - Physics Department Large Seminar Room

      • 14:20
        Scientific Linux Status Update 20m
        Scientific Linux status and news.
        Speaker: Bonnie King (Fermilab)
      • 14:40
        CERN CentOS 7 and community update 20m
        In this talk we will present a brief status update on CERN's work on CentOS 7, the uptake by the various IT services, and the interaction with the upstream CentOS community. We will talk about the SIGs status, new projects and work done over the last months, presenting a list of our contributions and feedback about the experience.
        Speaker: Arne Wiebalck (CERN)
      • 15:00
        Software collaboration tools as a stack of services 20m
        The Version Control and Issue Tracking services team at CERN is facing a transition and consolidation of the services provided in order to fulfill the needs of the CERN community as well as maintain the current deployments that are in use. Software and services development is a key activity at CERN that is currently widely carried out using Agile methodologies. Code hosting and review, version control, documentation, issue tracking and continuous integration and delivery are common requirements that each working group applies on a daily basis. As platform and engineering services providers, we are deploying a stack of tools and procedures around GitLab EE for Git-based version control and code review, Atlassian JIRA for issue tracking, Jenkins for continuous integration and TWiki for documentation, which can integrate with each other and allow each working group to build up their own development environment. On the other hand, important software projects at CERN, such as the LHC accelerator controls, CMS online and ATLAS offline software, rely on the legacy SVN service. This poses a challenge for the team to keep both environments and frameworks maintained and up to date in parallel.
        Speaker: Borja Aparicio Cotarelo (CERN)
    • 15:20 15:50
      Coffee Break 30m Bldg. 510 - Physics Department Seminar Lounge

    • 15:50 16:30
      End-user Services and Operating Systems Bldg. 510 - Physics Department Large Seminar Room

      • 15:50
        Using ownCloud to provide mobile access to existing storage 20m
        We present our work deploying an ownCloud gateway in front of our existing home file storage to provide access to mobile clients.
        Speaker: Chris Brew (STFC - Rutherford Appleton Lab. (GB))
      • 16:10
        Self service kiosk for Mac and mobiles 20m
        CERN has recently deployed a Mac self service portal to allow users to easily select software and perform standard configuration steps. This talk will review the requirements, product selection and potential evolution for mobile device management.
        Speaker: Tim Bell (CERN)
    • 16:30 16:50
      IT Facilities and Business Continuity Bldg. 510 - Physics Department Large Seminar Room

      • 16:30
        NERSC Plans for the Computational Research and Theory Building 20m
        The NERSC facility is transitioning to a new building on the LBNL main campus in the 2015 timeframe. This state-of-the-art facility is energy efficient, providing year-round free air and water cooling; it is initially provisioned for 12.5 MW of power and capable of up to 42 MW, and has two office floors, a 20,000 square foot HPC floor with seismic isolation, and a mechanical floor. Substantial completion was scheduled for May 2015. As more supercomputing facilities are investigating water cooling as an alternative, the Computational Research and Theory building (CRT) demonstrates the viability of energy-efficient environmental cooling, leveraging the Bay Area's climate to maintain temperature throughout the HPC floor and offices. The seismic isolation floor ensures that the systems are able to withstand the earthquakes that are known to occur in this geographic area. The backup power system runs on gasoline, can sustain the critical infrastructure of the facility for up to 24 hours, and can be refilled. Environmental data is collected throughout the facility to monitor not only temperature but also humidity, dust, and electrical data from the power distribution units and sub-panels. The move will occur over several months in order to maximize system availability, preserve users' data, minimize the number of outages and minimize move costs.
        Speaker: Elizabeth Bautista (Lawrence Berkeley National Lab)
    • 16:50 17:10
      Miscellaneous Bldg. 510 - Physics Department Large Seminar Room

      • 16:50
        Growing our own systems administrators 20m
        Introducing a non-graduate recruitment path in scientific computing at the Rutherford Appleton Laboratory. Recruiting and retaining high quality staff is an increasing challenge at STFC. We traditionally recruit people with relevant degrees and/or industry experience, but this is becoming increasingly difficult, as is recruiting to our graduate recruitment program. At the same time, steep rises in tuition fees mean that young people face the prospect of building up many tens of thousands of pounds of debt to gain a degree. This creates an opportunity for a different approach. We describe the development of our new computing apprenticeship scheme, which combines part-time study towards a degree qualification with practical work experience in software development and systems administration.
        Speaker: Ian Peter Collier (STFC - Rutherford Appleton Lab. (GB))
    • 17:10 17:25
      Site Reports Bldg. 510 - Physics Department Large Seminar Room

      • 17:10
        LAL Site Report 15m
        Changes at LAL and GRIF in the last 18 months.
        Speaker: Michel Jouvin (Laboratoire de l'Accelerateur Lineaire (FR))
    • 18:00 20:00
      Welcome Reception EastWind Long Island (Wading River, NY)

    • 08:30 09:00
      Registration Bldg. 510 - Physics Department Large Seminar Room

    • 09:00 10:20
      Site Reports Bldg. 510 - Physics Department Large Seminar Room

      • 09:00
        Jefferson Lab Scientific and High Performance Computing 20m
        Current high performance and experimental physics computing environment updates: core exchanges between USQCD and Experimental Physics clusters for load balancing, job efficiency, and 12GeV data challenges; Nvidia K80 GPU experiences and updated Intel MIC environment; update on locally developed workflow tools and write-through to tape cache filesystem; status of LTO6 integration into our MSS; ZFS on Linux production environment, and status of our Lustre 2.5 update.
        Speaker: Sandy Philpott
      • 09:20
        University of Wisconsin Madison CMS T2 site report 20m
        The University of Wisconsin Madison CMS T2 is a major WLCG/OSG T2 site. It has consistently delivered highly reliable and productive services for CMS MC production/processing and large scale CMS physics analysis, using high throughput computing (HTCondor), a highly available storage system (Hadoop), efficient data access using xrootd/AAA, and scalable distributed software systems (CVMFS). The site is a member of the LHCONE community with 100 Gb WAN connectivity, and supports IPv6 networking. An update on the current status and activities (since the last report at the Nebraska meeting) will be presented.
        Speaker: Ajit Kumar Mohapatra (University of Wisconsin (US))
      • 09:40
        Australia-ATLAS site report 20m
        Update on activities at Australia's HEP Tier 2 grid facility.
        Speaker: Lucien Philip Boland (University of Melbourne (AU))
      • 10:00
        NDGF Site Report 20m
        Update on recent events in the Nordic countries
        Speaker: Erik Mattias Wadenstein (University of Umeå (SE))
    • 10:20 10:50
      Coffee Break 30m Bldg. 510 - Physics Department Seminar Lounge

    • 10:50 12:30
      Site Reports Bldg. 510 - Physics Department Large Seminar Room

      • 10:50
        Wigner Datacenter - Site report 20m
        We give an update on the infrastructure, Tier-0 hosting services, Wigner Cloud and other recent developments at the Wigner Datacenter. We will also include a short summary of the Budapest WLCG Tier-2 site status.
        Speaker: Szabolcs Hernath (Hungarian Academy of Sciences (HU))
      • 11:10
        Site report from KEK 20m
        The KEK Computing Research Center supports various accelerator-based science projects in Japan. The Hadron and Neutrino (T2K) experiments at J-PARC have resumed at a good rate after the recovery from the earthquake damage at Fukushima. The Belle II experiment is going to collect 100 PB of raw data within several years. In this talk, we report the current status of our computing facility and our near-future plan of resource provisioning for the KEK experiments. The current situation of the international network for Japan and our preparation of the Grid system will also be introduced.
        Speaker: Tomoaki Nakamura (High Energy Accelerator Research Organization (JP))
      • 11:30
        KIT Site Report 20m
        News about GridKa Tier-1 and other KIT IT projects and infrastructure.
        Speaker: Andreas Petzold (KIT - Karlsruhe Institute of Technology (DE))
      • 11:50
        CSC.fi - Site Report 20m
        - CSC
        - Ansible and HPC
        - Slurm as an interactive shell load balancer
        Speaker: Johan Henrik Guldmyr (Helsinki Institute of Physics (FI))
      • 12:10
        SLAC Scientific Computing Services 20m
        An update on SLAC's central Unix services in support of Scientific Computing and Core Infrastructure. New funding model for FY15 identifies indirect vs direct-funded effort. Socializing the concept of service and service lifecycle. Sustainable business models to address hardware lifecycle replacement: Storage-as-a-Service with GPFS, OpenStack for dev/test environments and cluster provisioning.
        Speaker: Yemi Adesanya
    • 12:30 14:00
      Lunch Break 1h 30m Berkner Hall Cafeteria

    • 14:00 14:40
      Site Reports Bldg. 555 - Chemistry Department Hamilton Seminar room

      • 14:00
        DESY Site Report 20m
        Updates from DESY Zeuthen
        Speaker: Wolfgang Friebel (Deutsches Elektronen-Synchrotron Hamburg and Zeuthen (DE))
      • 14:20
        BNL Site Report 20m
        This site report will discuss the latest developments at the RHIC-ATLAS Computing Facility (RACF).
        Speaker: Dr Shigeki Misawa (Brookhaven National Laboratory)
    • 14:40 15:20
      Security and Networking Bldg. 555 - Chemistry Department Hamilton Seminar Room

      • 14:40
        News from the HEPiX IPv6 Working Group 20m
        This talk will present a status update from the IPv6 working group, including recent testing and the deployment of (some) dual-stack services and monitoring in WLCG.
        Speaker: Dave Kelsey (STFC - Rutherford Appleton Lab. (GB))
      • 15:00
        Status on the IPv6 OSG Software stack tests 20m
        This talk will present the latest results from the IPv6 compatibility tests performed on the OSG Software stack.
        Speaker: Edgar Fajardo Hernandez (Univ. of California San Diego (US))
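        As a rough illustration of the kind of dual-stack checks such testing involves (not the actual OSG test harness; the endpoint below is a placeholder), a service can be probed over both IPv4 and IPv6 with the Python standard library:
          # Minimal dual-stack connectivity probe (illustrative; replace host/port).
          import socket

          def probe(host, port, family):
              try:
                  for af, socktype, proto, _, addr in socket.getaddrinfo(
                          host, port, family, socket.SOCK_STREAM):
                      with socket.socket(af, socktype, proto) as s:
                          s.settimeout(5)
                          s.connect(addr)
                          return True
              except OSError:
                  pass
              return False

          host, port = "example.org", 443  # placeholder endpoint
          print("IPv4:", probe(host, port, socket.AF_INET))
          print("IPv6:", probe(host, port, socket.AF_INET6))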
    • 15:20 15:50
      Coffee Break 30m Bldg. 555 - Chemistry Department Hamilton Seminar Lounge

    • 15:50 17:50
      Security and Networking Bldg. 555 - Chemistry Department Hamilton Seminar Room

      • 15:50
        Network infrastructure for the CERN Datacentre 20m
        With the evolution of transmission technologies, going above 10Gb Ethernet requires a complete renewal of the fibre infrastructure. In the last year the CERN Datacentre has evolved to deal with the expansion of the physical infrastructure inside and outside the local site, to support higher speeds such as 40GbE and 100GbE, and to be ready for future requirements. We will explain the choices we made at CERN in terms of network infrastructure for our datacentre.
        Speaker: Eric Sallaz (CERN)
      • 16:10
        WLCG Network and Transfer Metrics WG After One Year 20m
        It has been approximately one year since the WLCG Network and Transfer Metrics working group was initiated, and we would like to provide a summary of what has been achieved during this first year and discuss future activities planned for the group. As chartered, the working group had a number of objectives:
        - identify and make continuously available relevant network and transfer metrics;
        - document those metrics and their use;
        - facilitate their integration in the middleware and/or experiment tool chains;
        - coordinate commissioning and maintenance of WLCG network monitoring.
        We will report on the status of these objectives and summarize the work that has been done in a number of areas, concluding with our vision for where the working group is likely to go in the future.
        Speaker: Shawn Mc Kee (University of Michigan (US))
      • 16:30
        Update on WLCG/OSG perfSONAR Infrastructure 20m
        WLCG relies on the network as a critical part of its infrastructure and therefore needs to guarantee effective network usage and prompt detection and resolution of any network issues, including connection failures, congestion and traffic routing. The WLCG Network and Transfer Metrics working group was established to ensure sites and experiments can better understand and fix networking issues. In addition, it aims to integrate and combine all network-related monitoring data collected by the WLCG infrastructure from both network and transfer systems. This has been facilitated by the already existing network of perfSONAR instances, which is being commissioned to operate in full production. Recently, several higher level services were developed to help bring the perfSONAR network to its full potential. These include a web-based mesh configuration system, which allows all the network tests performed by the instances to be centrally scheduled and managed; a network datastore (esmond), which collects, stores and provides interfaces to access all the network monitoring information from a single place; and perfSONAR infrastructure monitoring, which ensures that the current perfSONAR instances are configured and operated correctly. In this presentation we will provide an update on how to use and benefit from perfSONAR, including new features in the recent release of version 3.5 of the Toolkit. We will also cover the status of our WLCG/OSG deployment and provide some information on our future plans.
        Speaker: Shawn Mc Kee (University of Michigan (US))
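        For readers who want to explore the network datastore mentioned above, a minimal sketch of a query against an esmond measurement archive over its REST interface is shown below; the host name and filter values are illustrative assumptions, not an official recipe.
          # Illustrative esmond REST query (host and filters are placeholders).
          import requests

          ESMOND = "http://ps-archive.example.org/esmond/perfsonar/archive/"

          # Ask for recent throughput measurement metadata between two test hosts.
          params = {
              "event-type": "throughput",
              "source": "ps-bandwidth-a.example.org",
              "destination": "ps-bandwidth-b.example.org",
              "time-range": 86400,          # last 24 hours
              "format": "json",
          }
          resp = requests.get(ESMOND, params=params, timeout=30)
          resp.raise_for_status()
          for metadata in resp.json():
              print(metadata.get("metadata-key"), metadata.get("event-types", []))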
      • 16:50
        Using VPLS for VM mobility 20m
        With the virtualization of the data centre, there is a need to move virtual machines transparently across racks when the physical servers are being decommissioned. We will present the solution being tested at CERN, using VPLS in an MPLS network.
        Speaker: Carles Kishimoto Bisbe (CERN)
      • 17:10
        Computer Security update 20m
        This presentation provides an update on the global security landscape since the last HEPiX meeting. It describes the main vectors of compromise in the academic community, including lessons learnt, and presents interesting recent attacks while providing recommendations on how best to protect ourselves. It also covers security risk management in general, as well as the security aspects of the current hot topics in computing. By showing how the attacks we are facing are both sophisticated and profitable, the presentation concludes that the only means to mount an appropriate response is to build a tight international collaboration and trusted information-sharing mechanisms within the community. This talk is based on contributions and input from the CERN Computer Security Team.
        Speaker: Liviu Valsan (CERN)
      • 17:30
        Building a large scale Security Operations Centre 20m
        The HEP community is facing an ever increasing wave of computer security threats, with more and more recent attacks showing a very high level of complexity. Having a centralised Security Operations Centre (SOC) in place is paramount for the early detection and remediation of such threats. Key components and recommendations to build an appropriate monitoring and detection Security Operation Centre will be presented, as well as means to obtain and share relevant and accurate threat intelligence information. The presentation concludes that the key to achieve an appropriate response is to both build an efficient security infrastructure and a tight international collaboration, enabling information to be shared globally with trusted partners, and in particular between the various HEP sites.
        Speaker: Liviu Valsan (CERN)
    • 08:30 08:50
      Registration Bldg. 510 - Physics Department Large Seminar Room

    • 08:50 10:20
      Grid, Cloud and Virtualization Bldg. 510 - Physics Department Large Seminar Room

      • 08:50
        Introduction to GDB 10m
        This presentation is a general introduction to the GDB; it will also describe the opportunities for cooperation with HEPiX and how sites can participate in the GDB.
        Speaker: Michel Jouvin (Laboratoire de l'Accelerateur Lineaire (FR))
      • 09:00
        CERN Cloud Status 20m
        This presentation will provide an update of the activities in the CERN Cloud service since the Oxford workshop.
        Speaker: Arne Wiebalck (CERN)
      • 09:20
        Optimisations of the Compute Resources in the CERN Cloud Service 20m
        This talk will summarise our activities related to the optimisation of the virtualised compute resources in our OpenStack-based infrastructure. In particular, we will discuss some of the issues we've encountered and the various optimisations we've applied to bring the virtual resources as close as possible to bare-metal performance.
        Speaker: Arne Wiebalck (CERN)
      • 09:40
        Automated performance testing framework 20m
        In the CERN Cloud Computing project, there is a need to ensure that the overall performance of hypervisors and virtual machines does not decrease due to configuration changes, or just because of the passage of time. This talk will outline an automated performance framework currently being developed, which will allow performance of virtual machines and hypervisors to be graphed and linked.
        Speaker: Sean Crosby (University of Melbourne (AU))
      • 10:00
        An initial evaluation of Docker at the RACF 20m
        Application containers have become a competitive alternative to virtualized servers. Containers allow applications to be written once, distributed across a heterogeneous environment (i.e. cloud, remote data centers) and executed transparently on multiple platforms without the performance overhead commonly found on virtual systems. We present an initial evaluation of Docker, along with a description of the testbeds and the sample applications used to simulate a distributed computing environment, and we examine the pros and cons of Docker containers in the RACF environment.
        Speaker: Christopher Hollowell (Brookhaven National Laboratory)
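        A minimal sketch of the kind of comparison such an evaluation involves, assuming Docker is installed and using a toy workload and a public image as placeholders for a real benchmark, is to time the same command on the bare host and inside a container:
          # Illustrative timing of a workload on the host vs. in a Docker container.
          # The "python:2.7" image and the toy workload are placeholders.
          import subprocess, time

          WORKLOAD = ["python", "-c", "sum(i*i for i in range(10**7))"]

          def timed(cmd):
              start = time.time()
              subprocess.check_call(cmd)
              return time.time() - start

          host_time = timed(WORKLOAD)
          container_time = timed(["docker", "run", "--rm", "python:2.7"] + WORKLOAD)
          print("host: %.2fs  container: %.2fs  overhead: %.1f%%"
                % (host_time, container_time,
                   100.0 * (container_time - host_time) / host_time))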
    • 10:20 10:50
      Coffee Break 30m Bldg. 510 - Physics Department Seminar Lounge

    • 10:50 12:30
      Grid, Cloud and Virtualization Bldg. 510 - Physics Department Large Seminar Room

      • 10:50
        Benchmarking commercial cloud resources 20m
        Performance measurements and monitoring are essential for the efficient use of computing resources, as they allow selecting and validating the most effective resources for a given processing workflow. In a commercial cloud environment, exhaustive resource profiling has additional benefits due to the intrinsic variability of a virtualised environment. In this context, resource profiling via initial benchmarking quickly allows issues to be identified and mitigated. Ultimately it provides additional information to compare the presumed performance of invoiced resources with the actual delivered performance as perceived at the client level. In this report we will discuss the experience acquired in benchmarking commercial cloud resources during production activities such as the recent ATLAS Monte Carlo production at Helix Nebula cloud providers. The workflow put in place to collect and analyse performance metrics will be described. Results of the comparison study among commonly used benchmark metrics will also be reported. These benchmarks range from generic open-source benchmarks (encoding algorithms and kernel compilation) to experiment-specific benchmarks (ATLAS KitValidation) and fast benchmarks based on random number generation.
        Speaker: Domenico Giordano (CERN)
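        The "fast benchmarks based on random number generation" mentioned above can be as simple as timing a fixed amount of pseudo-random-number work on each virtual machine and comparing the rates; the toy sketch below is illustrative and not the actual CERN benchmark suite.
          # Toy "fast benchmark": random numbers generated per second.
          # Illustrative only; real cloud benchmarking would repeat runs and
          # collect results centrally for comparison across providers.
          import random, time

          def rng_benchmark(n=5000000):
              start = time.time()
              acc = 0.0
              for _ in range(n):
                  acc += random.random()
              elapsed = time.time() - start
              return n / elapsed, acc

          rate, _ = rng_benchmark()
          print("random numbers per second: %.0f" % rate)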
      • 11:10
        A comparison of performance between KVM and Docker instances in OpenStack 20m
        Cloud computing enables flexible provisioning of computing resources by utilizing virtual machines on demand, and can provide an elastic data analysis environment. At KEK we plan to integrate cloud-computing technology into our batch cluster system in order to provide heterogeneous clusters dynamically, to support various kinds of data analyses, and to enable elastic resource allocation among the various projects supported at KEK. Container-based virtualization provides an alternative approach to virtual machines with a reduction in the overhead associated with the hardware and operating system resources on the host machine. We have evaluated the performance of virtual machines and containers in a cloud environment and have also investigated the performance of collocated instances on the same host machine. This talk will present a comparison of performance between KVM and Docker instances in OpenStack by using Rally.
        Speaker: Wataru Takase (KEK)
      • 11:30
        The Open Science Grid: Physics Experiments to Campus Researchers 20m
        The Open Science Grid (OSG) ties together computing resources from a broad set of research communities, connecting their resources to create a large, robust computing grid; this computing infrastructure started with large HEP experiments such as ATLAS, CDF, CMS, and Dzero, and it has evolved so that it now also enables the scientific computation of many non-physics researchers. OSG has been funded by the Department of Energy Office of Science and the National Science Foundation since 2006 to meet the US LHC community's computational needs and support other science communities. The large physics experiments played a key role in evolving the job submission methods in OSG, resulting in the implementation of pilot-based architectures that improved the ease of job submission while improving site utilization. An important aspect of these overlay systems is the change in the trust relationship for sites, which now depend more on the science communities (i.e. Virtual Organizations) than on the individual researcher. The OSG has further extended this trust model to remove many dependencies on PKI certificates without reducing security and job traceability; today, many researchers use OSG without having a PKI certificate. OSG continues to evolve these access methods as part of its drive to improve access to Distributed High Throughput Computing (DHTC) for US researchers and increase delivery of DHTC to users from a wide variety of disciplines at campus and research institutions. Presently, the OSG Open Facility delivers over 150 million computing wall hours annually to researchers who are not already owners of the OSG computing sites; this is primarily accomplished by harvesting and organizing the temporarily unused capacity (i.e. opportunistic cycles) from the various sites in the OSG. We present an overview of the infrastructure developed to accomplish this, from access methods to harvesting opportunistic resources to providing user support to a diverse set of researchers. In addition, we discuss the architectures that have been developed for small research communities and individual researchers to use the OSG for science and to share their own compute clusters with the broader OSG community. We believe that expanded access to DHTC is an essential tool for scientific innovation, and OSG continues to expand these services for researchers from all science domains.
        Speaker: Chander Sehgal (Fermilab)
      • 11:50
        Improving IaaS resources to accommodate scientific applications 20m
        INDIGO-DataCloud aims at developing a data and computing platform targeted at scientific communities, integrating existing standards and open software solutions. INDIGO proposes:
        - to build up a PaaS solution leveraging existing resources and e-Infrastructures, since mere access to IaaS resources has been demonstrated not to be a realistic option for most research communities;
        - to improve the virtualisation layer provided by the already widespread IaaS offerings.
        In order to build such a science-oriented PaaS, several improvements are needed at the IaaS layer. However, the approach of the project is to ensure that those improvements are beneficial enough on their own and are not focused on just enabling the upper PaaS layer. Therefore, we find that they can be interesting for the HEP community and for WLCG in particular. In this presentation we will give an overview of the developments that we consider of interest for the HEPiX community, namely:
        - Container support in cloud stacks: INDIGO will incorporate specific drivers into the most used cloud management frameworks (i.e. OpenNebula and OpenStack) to support the deployment and execution of containers as first-class resources on the IaaS.
        - Scheduling improvements: currently, scheduling in most cloud middleware is based on first-come, first-served policies. The project will develop new intra-cloud scheduling algorithms that are more appropriate for scientific scenarios. We foresee two different and complementary approaches: fair-share based scheduling with a queuing mechanism similar to current batch scheduling, and a spot-instance based approach for opportunistic usage.
        - IaaS orchestration through standards: we will provide IaaS orchestration using a common standard language (TOSCA) in both OpenStack and OpenNebula, easing the deployment and management of (computing, storage and network) resources at the IaaS layer.
        - Container integration in batch systems: we will study the possibility of executing containers through batch systems, with the aim of providing access to specialized hardware (such as InfiniBand networks and GPGPUs).
        Speaker: Andrea Chierici (INFN-CNAF)
      • 12:10
        First experiences with Mesos at RAL 20m
        Running services in containers managed by a scheduler offers a number of potential benefits compared to traditional infrastructure, such as increased resource utilisation through multi-tenancy, the ability to have elastic services, and improved site availability due to self-healing. At the RAL Tier-1 we have been investigating migration of services to an Apache Mesos cluster running on bare metal. In this model the whole concept of individual machines is abstracted away and services are run on the cluster in ephemeral Docker containers. This talk will describe the setup of our test Mesos infrastructure, including how we deal with service discovery, monitoring, and logging in such a dynamic environment. The challenges faced running grid middleware in this way and future plans will be discussed.
        Speaker: Andrew David Lahiff (STFC - Rutherford Appleton Lab. (GB))
    • 12:30 14:00
      Lunch Break 1h 30m Berkner Hall cafeteria

    • 14:00 15:00
      Grid, Cloud and Virtualization Bldg. 510 - Physics Department Large Seminar Room

      • 14:00
        HTCondor-CE: Managing the Grid with HTCondor 20m
        HTCondor-CE is a special configuration of HTCondor designed to connect compute resources to the wider grid. Leveraging the power of HTCondor, HTCondor-CE is able to provide built-in security measures, end-to-end job tracking, and better integration with overlay job systems. This talk will present an overview of the HTCondor-CE software, its deployment in the Open Science Grid (OSG), and upcoming development plans.
        Speaker: Brian Lin (University of Wisconsin - Madison)
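        To give a concrete flavour of how jobs reach an HTCondor-CE, the sketch below writes a grid-universe submit description pointing at a CE endpoint and hands it to condor_submit; the CE host name is a placeholder and authentication details vary by site.
          # Illustrative submission to an HTCondor-CE via condor_submit.
          # ce.example.org is a placeholder; credentials/auth are site-specific.
          import subprocess, tempfile

          submit_lines = [
              "universe      = grid",
              "grid_resource = condor ce.example.org ce.example.org:9619",
              "executable    = /bin/hostname",
              "output        = job.out",
              "error         = job.err",
              "log           = job.log",
              "queue",
          ]

          with tempfile.NamedTemporaryFile("w", suffix=".sub", delete=False) as f:
              f.write("\n".join(submit_lines) + "\n")
              submit_file = f.name

          subprocess.check_call(["condor_submit", submit_file])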
      • 14:20
        Running ATLAS, CMS, ALICE Workloads on the NERSC Cray XC30 20m
        This presentation will describe work that has been done to make NERSC Cray systems friendlier to data intensive workflows in anticipation of the availability of the NERSC-8 system this autumn. Using shifter, a Docker-like container technology developed at NERSC by Doug Jacobsen and Shane Canon, the process of delivering software stacks to Cray compute nodes has been greatly simplified. Some detail will be given on the method used to deliver CVMFS to the Cray compute nodes using shifter. We found unexpected increases in metadata performance through the use of image file systems. Results of tests with ATLAS G4 simulations and analysis software on the Cray XC30 system, Edison, will be presented.
        Speaker: James Botts (LBNL)
      • 14:40
        Computing resources virtualization at IHEP 20m
        This report will introduce our plan for computing resources virtualization. We will discuss the progress of this project, including the OpenStack-based infrastructure, virtual machine scheduling and measurement, and we will share experience from our testbed, which is deployed with OpenStack Icehouse.
        Speaker: Dr Qiulan Huang (Institute of High Energy Physics, Chinese Academy of Sciences)
    • 15:00 15:20
      Storage and Filesystems Bldg. 510 - Physics Department Large Seminar Room

      • 15:00
        CvmFS deployment status and trends 20m
        CvmFS is a network file system based on HTTP and optimised to deliver experiment software in a fast, scalable, and reliable way. This presentation will review the status of the stratum 0 deployment at CERN, mentioning some of the challenges faced during its migration to ZFS as the underlying file system.
        Speaker: Alberto Rodriguez Peon (Universidad de Oviedo (ES))
    • 15:20 15:50
      Coffee Break 30m Bldg. 510 - Physics Department Seminar Lounge

    • 15:50 17:30
      Storage and Filesystems Bldg. 510 - Physics Department Large Seminar Room

      • 15:50
        Future home directory at CERN 20m
        Several discussions are ongoing at CERN concerning the future of AFS and a possible replacement using the existing CERNBOX service, which is based on ownCloud and the CERN disk storage solution developed for LHC computing (EOS). The talk will mainly review the future plans for the CERNBOX service and the various use cases that have been identified.
        Speaker: Alberto Pace (CERN)
      • 16:10
        Ceph Based Storage Systems at the RACF 20m
        We report on the status of the Ceph based storage systems deployed at the RACF, which currently provide 1 PB of data storage capacity across the object store (with an Amazon S3 compliant RADOS Gateway front end), block storage (RBD), and shared file system (CephFS with a dCache front end) layers of Ceph. The hardware deployment and software upgrade procedures performed in 2015, and the resulting improvements in the performance and functionality of our Ceph clusters, are summarized, and usage statistics for the main clients of our Ceph clusters, including the ATLAS Event Service, are shown. The future prospects for Ceph based storage systems at the RACF are also discussed.
        Speaker: Alexandr Zaytsev (Brookhaven National Laboratory (US))
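        Since the object-store layer is exposed through an S3-compliant RADOS Gateway, standard S3 tooling can be used against it; the sketch below (endpoint URL and credentials are placeholders) stores and retrieves one object with boto3.
          # Illustrative S3 access to a Ceph RADOS Gateway endpoint via boto3.
          # Endpoint URL and credentials below are placeholders.
          import boto3

          s3 = boto3.client(
              "s3",
              endpoint_url="http://radosgw.example.org:7480",
              aws_access_key_id="ACCESS_KEY",
              aws_secret_access_key="SECRET_KEY",
          )

          s3.create_bucket(Bucket="hepix-demo")
          s3.put_object(Bucket="hepix-demo", Key="hello.txt", Body=b"hello ceph")
          obj = s3.get_object(Bucket="hepix-demo", Key="hello.txt")
          print(obj["Body"].read())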
      • 16:30
        Ceph object storage at RAL 20m
        RAL is currently developing storage services powered by a Ceph object storage system. We review the test results and experiences of the newly-installed 5 PB cluster at RAL, as well as our plans for it. Since the aim is to provide large scale storage for experimental data with minimal space overhead, we focus on testing a variety of erasure coding techniques and schemes. We look at functionality and performance testing of XrootD and GridFTP servers that have been adapted to use a Ceph backend, as well as RADOS gateways providing an S3-compatible interface. We also look at the effects of erasure coding on the stability and resiliency of a highly distributed system at this scale.
        Speaker: George Vasilakakos (STFC)
      • 16:50
        Scientific Data Storage at FNAL 20m
        Fermilab stores more than 110 PB of data employing different technologies (dCache, EOS, BlueArc) to address a wide variety of use cases and application domains. This presentation captures the present state of data storage at Fermilab and maps out future directions in storage technology choices at the lab.
        Speaker: Gerard Bernabeu Altayo (Fermilab)
      • 17:10
        Accelerating High Performance Cluster Computing Through the Reduction of File System Latency 20m
        The acceleration of high performance computing applications in large clusters has primarily been achieved with a focus on the cluster itself. Lower latency interconnects, more efficient message passing structures, higher performance processors, and general purpose graphics processing units have been incorporated in recent cluster designs. There has also been a great deal of study regarding processing techniques such as symmetric multi-processing versus efficient message passing to accomplish true parallel processing. There have been, however, only incremental changes in parallel file system technology. Clusters perform input/output operations through gateway servers, and a file creation implies locking operations in all parallel file systems. In fact, a file creation is a serial process which locks and assigns V-nodes, I-nodes and extent lists through one server to complete the operation. For years, web users have explored parallel methods of moving data to get around network connection limitations. Applications such as Napster and BitTorrent have used Distributed Hash Table technology to effectively allow true parallel file operations where "pieces" can be placed to, or gathered from, a number of service nodes arranged in a redundant fashion. This talk will explore the use of Distributed Hash Table technology to service the data needs of a large scale cluster, allowing the same parallelism in data mobility as is assumed in processing. This new paradigm will displace the concept of the gateway server and will allow data intensive operations in a "non-blocking" construct.
        Speaker: Mr David Fellinger (DataDirect Networks, Inc)
    • 18:30 21:30
      Conference Dinner Lombardi's on the Bay (Patchogue, NY)

    • 09:00 10:20
      Storage and Filesystems Bldg. 510 - Physics Department Large Seminar Room

      • 09:00
        Supervised Machine Learning 20m
        Given the amount of inline data generated, large volume hard-drive manufacturing is an appropriate environment to employ contemporary 'Big Data' techniques. It can be used to generate a feedback loop, as well as a feed-forward path. The feedback loop helps fix what is broken. The feed-forward path can be used to predict drive health. In this presentation, I will focus on some work we have started at Western Digital - using Supervised Machine Learning techniques to predict the health of a hard drive and develop a hierarchy of "health grades". As the technique matures, this could fundamentally influence the manufacturing environment as well as WD's customer engagement.
        Speaker: Mr Amit Chattopadhyay (WD)
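        A minimal sketch of the supervised-learning step described above, using scikit-learn on synthetic SMART-style features (the features, labels and model choice are purely illustrative, not WD's actual pipeline):
          # Illustrative drive-health classifier on synthetic SMART-like features.
          # Features, labels and model choice are stand-ins for the real pipeline.
          import numpy as np
          from sklearn.ensemble import RandomForestClassifier
          from sklearn.model_selection import train_test_split

          rng = np.random.RandomState(0)
          n = 1000
          X = np.column_stack([
              rng.poisson(2, n),        # e.g. reallocated sector count
              rng.normal(40, 5, n),     # e.g. temperature
              rng.exponential(100, n),  # e.g. power-on hours (scaled)
          ])
          # Synthetic "health grade": 0 = healthy, 1 = degraded.
          y = (X[:, 0] > 4).astype(int)

          X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
          clf = RandomForestClassifier(n_estimators=100, random_state=0)
          clf.fit(X_train, y_train)
          print("held-out accuracy: %.2f" % clf.score(X_test, y_test))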
      • 09:20
        Quality of Service in storage and the INDIGO-DataCloud project. 20m
        The pressure to provide cheap, reliable and unlimited cloud storage space in the commercial area has provided science with affordable storage hardware and open source storage solutions with low maintenance costs and tuneable performance and durability properties, resulting in different cost models per storage unit. Those models, already introduced by WLCG a decade ago (disk vs tape), are now offered by large companies like Amazon (Block Storage, S3 and Glacier) or Google (Standard, Durable Reduced, Cloud Storage Nearline). Industry appliances or software stacks (e.g. HPSS) offer similar storage properties for locally installed storage. However, other than with SRM for WLCG, those offered storage quality properties don't follow a common description or specification and are hard to compare programmatically. Moreover, they are nowhere near a common way of being negotiated between the requesting client and the providing storage technology, which would be a prerequisite for federating different public and private storage services. To fill this gap, the INDIGO-DataCloud project is proposing a process to agree on common semantics for describing QoS attributes in storage in a consistent way, independently of the API or protocol used. The process involves gathering use cases from scientific communities and creating working groups in international organisations, like RDA, OGF and SNIA, to further discuss possible solutions with other interested parties like EUDAT and EGI. In a second step, based on feedback received, INDIGO will propose an implementation of the defined semantics to steer quality of service in storage as an extension to an existing industry standard, e.g. CDMI. As a proof of concept, INDIGO will implement the proposed solution in storage systems used within the INDIGO project, like dCache and StoRM, and in some typical industry products, as a reference for other systems. We are presenting our work at HEPiX in order to receive feedback from HEP communities on our plans, and because we regard the outcome of our work as beneficial for HEP in general and for WLCG experiments in particular: with the proposed end of the usage of SRM in WLCG, experiments are again left alone with the necessary steering of data quality of service attributes (e.g. disk, tape or both) at the different storage endpoints. This presentation will report on the goals achieved so far and on our next steps.
        Speaker: Patrick Fuhrmann (Deutsches Elektronen-Synchrotron Hamburg and Zeuthen (DE))
      • 09:40
        USATLAS dCache storage system at BNL 20m
        As the US ATLAS Tier-1 computing facility, the RHIC and ATLAS Computing Facility (RACF) at Brookhaven National Laboratory has been operating a very large scale dCache disk storage system with a tape back-end to serve a geographically diverse, worldwide ATLAS scientific community. This talk will present the current state of the USATLAS dCache storage system at BNL. It will describe its structure, configuration, usage, data replication, tape back-end, client access and monitoring methods. Our near-future plan to reconfigure from symmetric to asymmetric replication will also be discussed.
        Speaker: Zhenping Liu (Brookhaven National Laboratory (US))
      • 10:00
        Space usage monitoring for distributed heterogeneous data storage systems. 20m
        Prior to LHC Run 2, CMS collected over 100 PB of physics data on distributed storage facilities outside CERN, and the storage capacity will increase considerably in the coming years. During Run 2 the amount of storage allocated to individual users' analysis data will reach up to 40% of the total space pledged by the CMS sites. The CMS Space Monitoring system has been developed to give a comprehensive view of storage usage across distributed CMS sites, covering both centrally managed experiment data and individual users' data not accounted centrally, in one common namespace.
        We discuss the general architecture and the components of the system, as well as the challenges we met in the process of deploying and operating the system at CMS Tier-1 and Tier-2 sites.
        Speaker: Natalia Ratnikova (Fermilab)
    • 10:20 10:40
      IT Facilities and Business Continuity Bldg. 510 - Physics Department Large Seminar Room

      • 10:20
        The HP IT Data Center Transformation Journey 20m
        In the late 2000s, HP IT consolidated from 85 ancient data centers running old IT technology to six new mega data centers with modern IT running transformed applications. That achievement resulted in (literally) billions saved through a meshing of significantly more efficient data centers, utilization of then-current IT technology, and application transformation. Since achieving that, HP IT has set and achieved higher goals several times, including deploying ultra-efficient modular data centers with containerized infrastructure, cloud and low-power solutions, and further application modernization. The author, the strategist and technologist for HP IT data centers globally, will discuss how, by executing a plan that encompasses the data center facility, modern IT equipment, and application modernization together, the total benefits can be increased well beyond the significant benefits of addressing each individually, using examples from HP IT's own transformation.
        Speaker: Dave Rotheroe (HP)
    • 10:40 11:10
      Coffee Break 30m Bldg. 510 - Physics Department Seminar Lounge

    • 11:10 12:30
      IT Facilities and Business Continuity Bldg. 510 - Physics Department Large Seminar Room

      • 11:10
        Proposal for a new Data Center at BNL 20m
        The methods and techniques of scientific research at Brookhaven National Laboratory are increasingly dependent upon the ability to acquire, analyze, and store vast quantities of data. The needs for data processing equipment and supporting infrastructure are anticipated to grow significantly in the next few years, soon exceeding the capacity of existing data center resources, currently located in Building 515. The Core Facility Revitalization – Phase 1 (CFR-1) project anticipates the partial renovation and revitalization of Building 725 (NSLS-I) for use as a new state of the art computing facility which will house HPC equipment for RHIC-ATLAS, NSLS-II, CFN and CSC and also provide expansion space for future science programs. This presentation summarizes the technical discussions held so far and the status of the project.
        Speaker: Mr Imran Latif (Brookhaven National Laboratory)
      • 11:30
        Asset management in CERN data centres 20m
        This presentation will give an overview of the recent efforts to establish an accurate and consistent inventory and stock management for CERN data centre assets. The underlying tool, Infor EAM (http://www.infor.com/solutions/eam/), was selected because of its wider usage over many years in other areas at CERN. The presentation will focus on the structuring of the IT assets data and how it is prepared for the integration with the spare parts stock management. The talk will also summarize the experience and some lessons learnt from the long and manual process for achieving an accurate recording of the legacy assets. Finally the integration with an on-going undertaking of global assets management at CERN will be described.
        Speaker: Eric Bonfillou (CERN)
      • 11:50
        Energy efficiency upgrades at PIC 20m
        Energy consumption is an increasing concern for data centers. This contribution summarizes recent energy efficiency upgrades at the Port d’Informació Científica (PIC) in Barcelona, Spain which have considerably lowered energy consumption. The upgrades were particularly challenging, as they involved modifying the already existing machine room, which is shared by PIC with the general IT services of the local University (Universitat Autònoma de Barcelona, UAB), with all the services in full operation, as well as the introduction of “free cooling” techniques in a location 20 km from the Mediterranean Sea.
        Speaker: Jose Flix Molina (Centro de Investigaciones Energ. Medioambientales y Tecn. (ES))
      • 12:10
        Energy Services Performance Contracting Abstract - HEPIX Fall 2015 20m
        In today's Federal Information Technology world you are faced with many challenges. To name a few: cyber-attacks, unfunded mandates to consolidate data centers, new IT acquisition laws requiring de-duplication with agency-level CIO oversight implemented by the Federal Information Technology Acquisition Reform Act (FITARA), and aging equipment and data center infrastructure. Each of these challenges requires, in many cases, a significant dollar investment that agencies unfortunately do not have. There is a potential answer that can provide the funding to meet these challenges. The Energy Savings Performance Contract (ESPC), administered by the Department of Energy and the US Army Corps of Engineers, allows agencies to pay back the cost of traditional building infrastructure and, now, IT projects that result in energy savings. Simply put, as energy savings are realized, these dollar savings are used to pay back the cost of the project. The holders of the ESPC, Energy Service Companies (ESCOs), provide the up-front dollars and are paid back through these savings. Using the ESPC is a win-win-win solution: it allows for the creation of jobs, requires no appropriations, and once the project is paid off the savings are realized by the federal agency. We hope you attend HEPiX Fall 2015 to learn more about using the ESPC.
        Speaker: Mr Michael Ross (HP)
    • 12:30 14:00
      Lunch Break 1h 30m Berkner Hall cafeteria

    • 14:00 15:20
      Basic IT Services Bldg. 510 - Physics Department Large Seminar Room

      • 14:00
        Document oriented database infrastructure for monitoring HEP data systems applications 20m
        Within the High Energy Physics (HEP) software infrastructure, a diverse set of data storage and distribution software technologies is used. Despite their heterogeneity, they all provide a capability to trace application event data in order to troubleshoot problems related to the software stack or its usage. The resulting data is written once and stored, in the majority of cases, in a file commonly known as a log file for posterior analysis. An open source infrastructure to collect log data, store it in a document-oriented database, and visualize the processed application log files is explored in this presentation. Experience in deploying the ELK (Elasticsearch, Logstash and Kibana) framework applied to a data storage technology is summarized.
        Speaker: Carlos Fernando Gamboa (Brookhaven National Laboratory (US))
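        As a small illustration of the "collect and store in a document-oriented database" step, a parsed log record can be pushed into Elasticsearch over its REST API; in the ELK stack Logstash normally does this, and the host, index and field names below are placeholders.
          # Illustrative indexing of one parsed log record into Elasticsearch.
          # Host, index name and document fields are placeholders.
          import json, requests

          record = {
              "@timestamp": "2015-10-15T12:34:56Z",
              "host": "storage01.example.org",
              "service": "dcache",
              "level": "ERROR",
              "message": "transfer failed: connection reset by peer",
          }

          resp = requests.post(
              "http://elasticsearch.example.org:9200/storage-logs-2015.10.15/event",
              data=json.dumps(record),
              headers={"Content-Type": "application/json"},
              timeout=10,
          )
          resp.raise_for_status()
          print(resp.json()["_id"])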
      • 14:20
        ELK at NERSC 20m
        The ELK (Elasticsearch, Logstash, Kibana) stack has been chosen as one of the key components of our new centerwide monitoring project. I'll discuss our overall philosophy on monitoring and how ELK fits in, the current structure, and how it is performing. "Centerwide" means everything in the data centers: all hosts, filesystems, most applications, power, cooling, water flow, temperature and particulate counts, starting from the substation and cooling tower down to the node.
        Speaker: Cary Whitney (LBNL)
      • 14:40
        CERN Monitoring Status Update 20m
        This presentation will provide an update of the activities concerning IT Monitoring. This includes the monitoring of data centre infrastructure, hardware monitoring, host monitoring and application monitoring, as well as the tools being used or tested.
        Speaker: Miguel Coelho dos Santos (CERN)
      • 15:00
        Update on Configuration Management at CERN 20m
        An update on CERN's Configuration Service will be presented. This presentation will review the current status of the infrastructure and describe some of the ongoing work and future plans, with a particular focus on automation and continuous integration. Recent efforts to scale and accommodate a higher number of Puppet clients will also be mentioned.
        Speaker: Alberto Rodriguez Peon (Universidad de Oviedo (ES))
    • 15:20 15:50
      Coffee Break 30m Bldg. 510 - Physics Department Seminar Lounge

    • 15:50 16:30
      Basic IT Services Bldg. 510 - Physics Department Large Seminar Room

      Bldg. 510 - Physics Department Large Seminar Room

      Brookhaven National Laboratory

      Upton, NY 11973
      • 15:50
        ITIL Service models in Detector DAQ Computing 20m
        Fermilab has moved from the era of two large multi-decade experiments to hosting several smaller experiments with a shorter lifecycle. Improvements in microcontroller performance have moved computers closer to the experiment Data Acquisition systems where custom electronics were previously used. There are also efforts to standardize DAQ software into reusable products in alignment with offline analysis tools. Nearly all DAQ systems in Fermilab experiments contain Linux computers, with or without special hardware, to collect data, build events, generate online monitoring and ship data to offline storage. The challenge is to productionize DAQ computers (or test stands) that have often been set up by grad students or postdocs and have not always been subject to rigorous best practices. The Experiment Computing Facilities/Scientific Linux Architecture and Management group has had success in onboarding experiment DAQ computer systems to its Experiment Online service, which has reduced duplicate effort, improved resiliency, and introduced industry best practices. Legacy, running and future experiments have adopted our services, each presenting different challenges. Additionally, these systems require sensitivity to users' needs for elevated access and agility in development. We discuss our successes and lessons learned in standing up the Experiment Online service, efficiencies gained from the ITIL service model, and upcoming innovations for Linux-based DAQ computers.
        Speaker: Bonnie King (Fermilab)
      • 16:10
        Quattor Update 20m
        An update of developments and activities in the Quattor community over the last six months.
        Speaker: James Adams (STFC RAL)
    • 16:30 17:30
      Computing and Batch Services Bldg. 510 - Physics Department Large Seminar Room

      Bldg. 510 - Physics Department Large Seminar Room

      Brookhaven National Laboratory

      Upton, NY 11973
      • 16:30
        NVM Express (NVMe): An Overview and Performance Study 20m
        NVMe (Non-Volatile Memory express) is a leading-edge SSD technology in which drives are attached directly to the PCIe bus. Typical SAS/SATA controllers are optimized for use with traditional rotating hard drives, and as such can increase latency and reduce the bandwidth available to attached SSDs. Since NVMe drives bypass a SAS/SATA controller, they can help minimize or eliminate the performance reductions often seen when using these controllers with solid-state storage. NVMe may become a key building block in future applications that require high-performance disk storage. This presentation describes the testbeds used to evaluate this technology in the context of a data center with a heterogeneous user community and a variety of applications, and summarizes the performance results and the pros and cons of NVMe in the RACF environment (a toy throughput sketch follows this entry).
        Speaker: Christopher Hollowell (Brookhaven National Laboratory)
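        Not from the talk itself, but to give a feel for the kind of measurement involved, the sketch below times a large sequential write to a given path and reports throughput. The target path and sizes are placeholders; real evaluations of NVMe versus SAS/SATA SSDs would use tools such as fio with direct I/O and many concurrent threads.
          # Toy sequential-write throughput test; the target path is a placeholder.
          import os
          import time

          TARGET = "/mnt/nvme0/testfile"      # hypothetical mount point of an NVMe drive
          BLOCK = b"\0" * (4 * 1024 * 1024)   # 4 MiB writes
          TOTAL_BYTES = 4 * 1024**3           # write 4 GiB in total

          def sequential_write(path):
              written = 0
              start = time.time()
              with open(path, "wb") as f:
                  while written < TOTAL_BYTES:
                      f.write(BLOCK)
                      written += len(BLOCK)
                  f.flush()
                  os.fsync(f.fileno())        # make sure data reaches the device
              elapsed = time.time() - start
              return written / elapsed / 1024**2   # MiB/s

          if __name__ == "__main__":
              print("Sequential write: %.1f MiB/s" % sequential_write(TARGET))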
      • 16:50
        Status of DESY Batch Infrastructures 20m
        This presentation will provide information on the status of the batch systems at DESY Hamburg. This includes the clusters for GRID, HPC and local batch purposes showing the current state and the activities for upcoming enhancements.
        Speaker: Thomas Finnern (DESY)
      • 17:10
        Update on benchmarking 20m
        Status of the benchmarking working group and of ongoing work on benchmarking within WLCG.
        Speakers: Helge Meinhard (CERN), Michele Michelotto (Universita e INFN, Padova (IT))
        Slides
    • 17:40 19:10
      HEPIX Board Meeting Bldg. 510 - Physics Department room 3-192

      Bldg. 510 - Physics Department room 3-192

      Brookhaven National Laboratory

    • 09:00 10:20
      Computing and Batch Services Bldg. 510 - Physics Department Large Seminar Room

      Bldg. 510 - Physics Department Large Seminar Room

      Brookhaven National Laboratory

      Upton, NY 11973
      • 09:00
        An Easy HTCondor Configuration to Support all Workloads (for some definition of all) 20m
        Scheduling jobs with heterogeneous requirements to a heterogeneous pool of computers is a challenging task. HTCondor does a great job supporting such a general-purpose setup with features like Hierarchical Group Quotas and Partitionable Slots. At BNL we have a model, configuration, and software to handle the administration of such a pool, and in this talk we will share our experience building and running this setup along with the software we've used to make it easy to administer (a small pool-query sketch follows this entry).
        Speakers: William Edward Strecker-Kellogg (Brookhaven National Laboratory (US)), William Strecker-Kellogg (Brookhaven National Lab)
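        As a small, hedged illustration of inspecting such a pool (not BNL's actual tooling), the snippet below uses the HTCondor Python bindings to list partitionable slots and their remaining resources from the collector. Attribute names are standard HTCondor ClassAd attributes; the pool address is whatever the local configuration points at.
          # Sketch: list partitionable slots and their unclaimed CPUs/memory.
          # Requires the 'htcondor' Python bindings.
          import htcondor

          coll = htcondor.Collector()          # uses the locally configured pool
          slots = coll.query(
              htcondor.AdTypes.Startd,
              constraint="PartitionableSlot =?= True",
              projection=["Machine", "Cpus", "Memory"],
          )
          for ad in slots:
              # Cpus/Memory on a partitionable slot reflect what is still unallocated.
              print(ad["Machine"], "cpus:", ad["Cpus"], "memory_mb:", ad["Memory"])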
      • 09:20
        Upgrade to UGE 8.2: Positive effects at IN2P3-CC 20m
        We have been using Univa Grid Engine as our batch scheduling system, to our satisfaction, for four years. This talk focuses on the latest major version, 8.2.1, which was deployed at IN2P3-CC four months ago and provides further scalability improvements. We support about 200 groups and experiments running up to 17,000 jobs simultaneously. Their requirements in terms of computing resources, storage and network differ widely from one group to another, so the batch system must be particularly scalable and reliable, which is what we get with this release. The new read-only threads work independently of the qmaster, which removes the load previously induced by serving status requests. In addition, this version introduces a way to limit user requests, which avoids system overloads; it also comes with new job accounting information and uses a 32-bit range for job IDs. In this talk we will present our assessment of the upgrade and give an overview of the issues resolved during operation of the service. Finally, we will show the plans for deploying new features and the Univa Grid Engine roadmap at IN2P3-CC for the next few months.
        Speaker: Vanessa HAMAR (CC-IN2P3)
        Slides
      • 09:40
        HTCondor Recent Enhancement and Future Directions 20m
        The goal of the HTCondor team is to develop, implement, deploy, and evaluate mechanisms and policies that support High Throughput Computing (HTC) on large collections of distributively owned computing resources. Increasingly, the work performed by the HTCondor developers is driven by its partnership with the High Energy Physics (HEP) community. This talk will present recent changes and enhancements to HTCondor, including interactions with Docker and public cloud services (a brief submit sketch follows this entry). It will also discuss the upcoming HTCondor development roadmap and solicit feedback on it from the HEP community.
        Speaker: Todd Tannenbaum (Univ of Wisconsin-Madison, Wisconsin, USA)
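        As a hedged sketch of the Docker integration mentioned above, the snippet below builds a Docker-universe job description with the Python bindings' Submit class and queues it. The image, command and resource requests are placeholders, and older bindings without the Submit class would use a plain submit file instead.
          # Sketch: queue a Docker-universe job through the HTCondor Python bindings.
          # Assumes a reachable schedd and worker nodes with Docker enabled for HTCondor.
          import htcondor

          sub = htcondor.Submit({
              "universe": "docker",
              "docker_image": "centos:7",          # placeholder container image
              "executable": "/bin/echo",
              "arguments": "hello from inside a container",
              "output": "docker_job.out",
              "error": "docker_job.err",
              "log": "docker_job.log",
              "request_cpus": "1",
              "request_memory": "1024",
          })

          schedd = htcondor.Schedd()
          with schedd.transaction() as txn:        # classic-style submission API
              cluster_id = sub.queue(txn)
          print("submitted cluster", cluster_id)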
      • 10:00
        Non-traditional workloads at the RACF 20m
        The RACF is a key component in BNL's new Computational Science Initiative (CSI). One of CSI's goals is to leverage the RACF's expertise to shorten the time and effort needed to archive, process and analyze data from non-traditional fields at BNL. This presentation describes a concrete example of how the RACF has helped non-traditional workloads run in the RACF computing environment, and how this helps establish a roadmap for future collaborative efforts.
        Speakers: William Edward Strecker-Kellogg (Brookhaven National Laboratory (US)), William Strecker-Kellogg (Brookhaven National Lab)
    • 10:20 10:50
      Coffee Break 30m Bldg. 510 - Physics Department Seminar Lounge

      Bldg. 510 - Physics Department Seminar Lounge

      Brookhaven National Laboratory

    • 10:50 11:10
      Computing and Batch Services Bldg. 510 - Physics Department Large Seminar Room

      Bldg. 510 - Physics Department Large Seminar Room

      Brookhaven National Laboratory

      Upton, NY 11973
      • 10:50
        Future of Batch Processing at CERN 20m
        To allow our batch service at CERN to meet increasingly challenging scalability and flexibility needs, we have chosen to set up a new batch system based on HTCondor. We have deployed a Grid-only pilot service, and the major LHC experiments have started trying it out. While the pilot slowly becomes production-ready, we are laying out a plan for our next major milestone: running local jobs too, along with all the service dependencies they need.
        Speaker: Jerome Belleman (CERN)
    • 11:10 12:30
      Basic IT Services Bldg. 510 - Physics Department Large Seminar Room

      Bldg. 510 - Physics Department Large Seminar Room

      Brookhaven National Laboratory

      Upton, NY 11973
      • 11:10
        Foreman - A full automation solution for a growing facility 20m
        In a rapidly growing facility such as NSLS-II, we use Foreman as an automation tool that integrates with DNS, DHCP, TFTP and Puppet, making installation and provisioning much easier and helping to bring service and server components online in a timely manner. For those who use Puppet Enterprise as a paid external node classifier (ENC), Foreman can also substitute for it. This talk will present the details of the Foreman architecture at NSLS-II and the workflow in a real computing facility (a small API query sketch follows this entry).
        Speaker: Mizuki Karasawa (BNL)
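        As a rough illustration of driving Foreman programmatically (not necessarily how NSLS-II does it), the sketch below lists hosts through Foreman's JSON REST API with basic authentication. The server URL, credentials and returned fields shown are assumptions for the example.
          # Sketch: list hosts known to Foreman via its REST API (API v2).
          # URL and credentials are placeholders.
          import requests

          FOREMAN_URL = "https://foreman.example.org"       # placeholder Foreman server
          AUTH = ("apiuser", "secret")                      # placeholder credentials

          resp = requests.get(
              FOREMAN_URL + "/api/v2/hosts",
              auth=AUTH,
              headers={"Accept": "application/json"},
              params={"per_page": 100},
          )
          resp.raise_for_status()
          for host in resp.json().get("results", []):
              print(host["name"], host.get("ip"), host.get("operatingsystem_name"))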
      • 11:30
        Gitlab and its CI - an intelligent solution for hosting Git repos 20m
        GitLab is an MIT-licensed open-source tool that combines a rich set of features for managing Git repositories, code reviews, issue tracking, activity feeds and wikis. Its most powerful feature, the built-in CI for continuous integration, makes code development more efficient and cost-effective, and it is also a great tool for enhancing communication and collaboration. NSLS-II has a large number of developers and currently hosts more than 300 repositories in Mercurial, and is moving towards new Git repository hosting solutions such as GitLab. This presentation will discuss GitLab in detail and present the GitLab architecture at NSLS-II (a small API example follows this entry).
        Speaker: Mizuki Karasawa (BNL)
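        As a small, hedged example of GitLab's programmability (again, an illustration rather than NSLS-II's actual setup), the snippet below lists the projects visible to a user through the GitLab REST API of that era (v3), authenticating with a private token. The hostname and token are placeholders.
          # Sketch: list projects via the GitLab REST API (v3, as in 2015-era GitLab).
          # GITLAB_URL and TOKEN are placeholders; later GitLab versions use /api/v4.
          import requests

          GITLAB_URL = "https://gitlab.example.org"   # placeholder GitLab instance
          TOKEN = "REPLACE_WITH_PRIVATE_TOKEN"        # placeholder private token

          resp = requests.get(
              GITLAB_URL + "/api/v3/projects",
              headers={"PRIVATE-TOKEN": TOKEN},
              params={"per_page": 50},
          )
          resp.raise_for_status()
          for project in resp.json():
              print(project["path_with_namespace"], project["web_url"])
        A repository opts into GitLab CI simply by carrying a .gitlab-ci.yml file that describes its build stages, which is what makes the integrated CI so convenient for a large developer community.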
      • 11:50
        Host deployment and configuration technologies at SCC 20m
        The Steinbuch Centre for Computing (SCC) at the Karlsruhe Institute of Technology (KIT) serves a number of projects, including the WLCG Tier-1 GridKa, the Large Scale Data Facility (LSDF), and the Smart Data Innovation Lab (SDIL), using bare-metal and virtual compute resources, and provides a variety of storage and computing services to end users. To take control of the provisioning and configuration process, a unified configuration management and deployment infrastructure was set up, based on a stack of open-source solutions such as Puppet, Foreman, GitLab, GitLab-CI and others. Because each project has its own use cases, the infrastructure is built to be customizable. Underlying workflows and software setups ensure that they can be reused and are as uniform as possible. Workflows to provision, set up and manage hosts and services are defined. Users can access the infrastructure using existing configuration templates or provide custom solutions, sharing them with other projects. To keep changes under control, everything is managed with Git/GitLab and Continuous Integration (CI), so that all production and testing services can be rolled back in case of failure. This talk describes the evaluation of the workflows and gives an overview of the components and their connections within the infrastructure.
        Speaker: Dmitry Nilsen
      • 12:10
        Monitoring with InfluxDB and Grafana 20m
        At RAL we have been considering InfluxDB and Grafana as a possible replacement for Ganglia, in particular for application-specific metrics. Here we present our experiences with setting up monitoring for services such as Ceph, FTS3 and HTCondor, and discuss the advantages and disadvantages of InfluxDB and Grafana over Ganglia (a short InfluxDB write sketch follows this entry).
        Speaker: Andrew David Lahiff (STFC - Rutherford Appleton Lab. (GB))
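        To give a flavor of the approach discussed above (illustrative only, not RAL's actual schema), the sketch below writes a single application-level metric into InfluxDB with the influxdb Python client; Grafana would then graph it. The database name, measurement, tags and values are assumptions.
          # Sketch: push one application metric into InfluxDB so Grafana can graph it.
          # Requires the 'influxdb' Python package; names and values are placeholders.
          from influxdb import InfluxDBClient

          client = InfluxDBClient(host="localhost", port=8086, database="services")

          points = [{
              "measurement": "htcondor_jobs",          # assumed measurement name
              "tags": {"schedd": "condor01.example", "state": "running"},
              "fields": {"count": 1234},
          }]
          client.write_points(points)

          # A Grafana panel would then point at the 'services' database with an
          # InfluxQL query such as:
          #   SELECT mean("count") FROM "htcondor_jobs" GROUP BY time(5m), "schedd"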
    • 12:30 13:00
      Closing and HEPIX Business Bldg. 510 - Physics Department Large Seminar Room

      Bldg. 510 - Physics Department Large Seminar Room

      Brookhaven National Laboratory

      Upton, NY 11973