HEPiX Fall 2016 Workshop

Name: HEPiX Fall 2016 Workshop
Start: 2016-10-17T08:00:00-07:00
End: 2016-10-21T14:30:00-07:00
Location: LBNL

17 Oct 2016, 08:00 → 21 Oct 2016, 14:30 US/Pacific

Building 50 Auditorium (LBNL)

Berkeley, CA 94720

Helge Meinhard (CERN), Tony Wong (Brookhaven National Laboratory)

Description

HEPiX Fall 2016 at Lawrence Berkeley National Laboratory, Berkeley, CA, USA

The HEPiX forum brings together worldwide Information Technology staff, including system administrators, system engineers, and managers from High Energy Physics and Nuclear Physics laboratories and institutes, to foster a learning and sharing experience between sites facing scientific computing and data challenges.

Participating sites include BNL, CERN, DESY, FNAL, IHEP, IN2P3, INFN, IRFU, JLAB, KEK, LBNL, NDGF, NIKHEF, PIC, RAL, SLAC, TRIUMF, and many others.

HEPiX Fall 2016 is proudly sponsored by Seagate at the platinum level and Intel and Penguin Computing at the silver level.

Local organisers

hepix2016@lbl.gov

001-510-486-7612

Monday 17 October
- 08:30 → 09:00
  
  Registration Building 50 Auditorium
  
- 09:00 → 09:55
  Miscellaneous Building 50 Auditorium
  
  - 09:00
    
    Logistics & Safety Announcement 10m
    
    Speakers: Helge Meinhard (CERN), Tony Wong (Brookhaven National Laboratory)
  - 09:10
    
    Welcome To NERSC/LBNL 15m
    
    Sudip Dosanjh, NERSC
    
    NERSC-Overview-HiPEX.pdf
  - 09:25
    
    Plans to Support Data-Intensive Computing on the NERSC 8 System 30m
    
    NERSC-Overview-HiPEX.pdf
- 09:55 → 10:40
  Site Report Building 50 Auditorium
  
  - 09:55
    
    JLab Scientific and High Performance Computing 15m
    
    Updates on the JLab high performance and experimental physics computing environments since the Spring 2016 meeting, including recent hardware installations (KNL and Broadwell compute clusters, Supermicro storage), the status of our Intel Lustre upgrade, 12 GeV computing updates, and Data Center modernization progress.
    
    Speaker: Sandy Philpott
    
    HEPiX_LBNL16_JLabSiteRpt.pdf
  - 10:10
    
    BNL Site Report 15m
    
    The site report contains the latest news and updates on computing at BNL.
    
    Speaker: William Strecker-Kellogg (Brookhaven National Lab)
    
    bnl-hepix-site-report-fall-2016.pdf
    
    bnl-hepix-site-report-fall-2016.pptx
  - 10:25
    
    TRIUMF Site Report 15m
    
    Updates on the status of the Canadian Tier-1 and other TRIUMF computing news will be presented.
    
    Speaker: Denice Deatrich
    
    TRIUMF-sitereport.pdf
- 10:40 → 11:10
  
  Coffee Break 30m Building 50 Auditorium
  
- 11:10 → 12:40
  Site Report Building 50 Auditorium
  
  - 11:10
    
    AGLT2 Site Update 15m
    
    We will present an update on our site since the Spring 2016 report, covering our changes in software, tools and operations.
    
    We will also report on our recent significant hardware purchases during summer 2016 and the impact they are having on our site.
    
    We conclude with a summary of what has worked and what problems we encountered and indicate directions for future work.
    
    Speaker: Shawn Mc Kee (University of Michigan (US))
    
    AGLT2SiteReport-HEPiXFall2016.pdf
    
    AGLT2SiteReport-HEPiXFall2016.pptx
  - 11:25
    
    University of Nebraska CMS T2 Site Report 15m
    
    Updates from T2_US_Nebraska covering our experiences operating CentOS 7 + Docker/SL6 worker nodes, banishing SRM in favor of LVS-balanced GridFTP, and some attempts at smashing OpenFlow + GridFTP + ONOS together to live the SDN dream.
    
    Speaker: Garhan Attebury (University of Nebraska-Lincoln (US))
    
    HEPiX Fall 2016 T2_US_Nebraska.pdf
  - 11:40
    
    University of Wisconsin-Madison CMS T2 site report 15m
    
    As a major WLCG/OSG T2 site, the University of Wisconsin-Madison CMS T2 has consistently delivered highly reliable and productive services for large-scale CMS MC production/processing, data storage, and physics analysis for the last 10 years. The site utilises high throughput computing (HTCondor), a highly available storage system (Hadoop), and a scalable distributed software system (CVMFS), and provides efficient data access using xrootd/AAA. The site fully supports IPv6 networking and is a member of the LHCONE community with 100 Gbps WAN connectivity. An update on the activities and developments at the T2 facility over the last year (since the BNL meeting) will be presented.
    
    Speaker: Ajit Mohapatra (University of Wisconsin-Madison (US))
    
    hepix_LBL2016_T2Wisc.pdf
  - 11:55
    
    Status of IHEP Site 15m
    
    This talk will give a brief introduction to the status of the IHEP (CAS) computing center, including the local cluster, the Grid Tier-2 site for ATLAS and CMS, file and storage systems, cloud infrastructure, the planned HPC system, and Internet and domestic networking.
    
    Speaker: Yaodong Cheng (IHEP)
    
    IHEP_site_report_2016_Fall.pdf
    
    IHEP_site_report_2016_Fall.pptx
  - 12:10
    
    KEK Site Report 15m
    
    The new KEK Central Computer system entered service on September 1, 2016, after the renewal of all hardware. In this talk, we introduce the performance of the new system and the improvement of network connectivity with LHCONE.
    
    Speaker: Tomoaki Nakamura (KEK)
    
    2016-10-17_TomoakiNakamura.pdf
  - 12:25
    
    Fermilab Site Report 15m
    
    News and updates from Fermilab.
    
    Speaker: Rennie Scott (Fermilab)
    
    Fermilab - Fall 2016 - HEPiX Presentation.pdf
    
    Fermilab - Fall 2016 - HEPiX Presentation.pptx
- 12:40 → 14:00
  
  Lunch Break 1h 20m Building 50 Auditorium
  
- 14:00 → 14:30
  Site Report Building 50 Auditorium
  
  - 14:00
    
    Tokyo Tier-2 Site Report 15m
    
    The Tokyo Tier-2 site, located at the International Center for Elementary Particle Physics (ICEPP) at the University of Tokyo, provides resources for the ATLAS experiment in WLCG. In December 2015, almost all hardware was replaced with the fourth-generation system. Operational experience with the new system and a migration plan from CREAM-CE + Torque/Maui to ARC-CE + HTCondor will be reported.
    
    Speaker: Tomoe Kishimoto (University of Tokyo (JP))
    
    2016_10_18.pdf
  - 14:15
    
    Australia-ATLAS Site report 15m
    
    Will provide updates on technical and managerial changes to Australia's only HEP grid computing site.
    
    Speaker: Lucien Philip Boland (University of Melbourne (AU))
    
    Lucien_Boland_Australia_site_report_LBNL_HEPIX_2016.pdf
    
    Lucien_Boland_Australia_site_report_LBNL_HEPIX_2016.pptx
- 14:30 → 15:20
  End-User IT Services & Operating Systems Building 50 Auditorium
  
  - 14:30
    
    Scientific Linux Status Update 25m
    
    Scientific Linux status and news.
    
    Speaker: Rennie Scott (Fermilab)
    
    sl_status_fall2016 - Final.pdf
    
    sl_status_fall2016 - Final.pptx
  - 14:55
    
    An e-mail quarantine with open source software 25m
    
    Filtering e-mails for security reasons is a common procedure. At DESY, e-mails with suspicious content are quarantined; users are notified and may request delivery of those e-mails. DESY is in the process of shifting from a commercial product to a quarantine solution made of open source and self-made software. This solution will be presented in the context of DESY's e-mail infrastructure.
    
    Speaker: Mr Dirk Jahnke-Zumbusch (DESY)
    
    47_dirk-jahnke-zumbusch_qpsmtpd.pdf
- 15:20 → 15:50
  
  Coffee Break 30m Building 50 Auditorium
  
- 15:50 → 17:30
  Security & Networking Building 50 Auditorium
  
  - 15:50
    
    Platform Providing Network Awareness to ATLAS and Beyond 25m
    
    With the change of the ATLAS computing model from hierarchical to dynamic, processing tasks are dispatched to sites based not only on availability of resources but also network conditions along the path between compute and storage, which may be topologically and/or geographically distant. We describe a system developed to collect, store, analyze and provide timely access to the network conditions for ATLAS sites, which is also generalized for broader use. We describe the data we collect from four different sources giving orthogonal views of network performance and utilization. The pre-existing ATLAS Distributed Computing Analytics platform is used for data transport and storage. The platform provides interactive monitoring dashboards, and serves as a backend to an alarm and alert system which we have developed for site operators. A co-located Jupyter service is used to perform in-depth interactive data analysis, train different Machine Learning algorithms and test models on historical data. We discuss how the derived knowledge gets used by ATLAS for network anomaly detection, job scheduling and data brokering.
    
    Speaker: Ilija Vukotic (University of Chicago (US))
    
    Platform Providing Network Awareness to ATLAS and Beyond.pdf
    
    Providing Network Awareness to ATLAS And Beyond
  - 16:15
    
    Upgrade of network connection between KEK and SINET 25m
    
    Since April 1 of this year, SINET, the NREN for universities in Japan, has been operating its fifth-generation infrastructure, SINET5. It accepts 100 Gbps connections to the backbone from each institute and newly provides a direct path from Japan to Europe. KEK is connected to SINET with 120 Gbps of total bandwidth, most of which will be used for mass data transfer via LHCONE. We will report how we upgraded the connection and changed the monitoring scheme to maintain the security level.
    
    Speaker: Soh Suzuki
    
    HEPIX2016-20161017-S.Y.Suzuki.pdf
  - 16:40
    
    SDN-enabled Intrusion Detection System 25m
    
    CERN networks are dealing with an ever-increasing volume of network traffic. The traffic leaving and entering CERN has to be precisely monitored and analysed in order to properly protect the networks from potential security breaches. To provide the required monitoring capabilities, the Computer Security team and the Networking team at CERN have joined efforts in designing and deploying a scalable Intrusion Detection System (IDS) setup. The setup features symmetrical load-balancing of monitored traffic across a pool of IDS servers with optional OpenFlow-based traffic shunting (offloading) and selective packet capturing capabilities. An experimental instance has been deployed, and the solution is currently being tested, with good prospects of moving it into production in the near future.
    
    Speaker: Adam Lukasz Krajewski (CERN)
    
    SDN-IDS-HEPIX-10172016.pdf
    
    SDN-IDS-HEPIX-10172016.pptx
  - 17:05
    
    SDN Implementation in IHEP 25m
    
    High energy physics experiments produce huge amounts of raw data, but because network resources are shared, there is no guaranteed bandwidth for each experiment, which can cause link contention. Meanwhile, with the development of cloud computing technologies, IHEP has established a cloud platform based on OpenStack that ensures the flexibility of computing and storage resources, and more and more computing applications have moved to this platform. However, under the traditional network architecture, network capability has become the bottleneck restricting the flexible use of cloud computing.
    This report introduces the SDN implementation at IHEP that addresses these problems: we built a dedicated, elastic network platform based on data center SDN and network virtualization technologies. The SDN@WAN solution at IHEP will also be introduced.
    Finally, test results and future work will be presented and analyzed.
    
    Speaker: Mrs SHAN ZENG (IHEP)
    
    SDN Implementation in IHEP-HEPiX2016Fall.pdf
- 17:30 → 18:00
  Site Report Building 50 Auditorium
  
  - 17:30
    
    SLAC Site Report 15m
    
    Update on SLAC Scientific Computing Services
    
    SLAC’s Scientific Computing Services team provides long-term storage and midrange compute capability for multiple science projects across the lab. The team is also responsible for core enterprise (non-science) Unix infrastructure. A sustainable hardware lifecycle is a key part of the central computing strategy. We continue to push the idea of business models for computing services as an alternative to one-time hardware investments. Seamless cloud bursting for high-throughput batch compute is under development using OpenStack and AWS with VPN.
    
    Speaker: Yemi Adesanya
    
    HEPiX_Oct_2016.pdf
    
    HEPiX_Oct_2016.pptx
  - 17:45
    
    Caltech Site Report 15m
    
    Caltech site report (USCMS Tier 2 site)
    
    Speaker: Wayne Hendricks (California Institute of Technology (US))
    
    Caltech-T2Update-HEPiX.pdf
    
    Caltech-T2Update-HEPiX.pptx
- 18:00 → 21:00
  
  Welcome Reception Lawrence Hall of Science
  
Tuesday 18 October
- 08:30 → 09:00
  
  Registration Building 50 Auditorium
  
- 09:00 → 10:15
  Site Report Building 50 Auditorium
  
  - 09:00
    
    ASGC Site Report 15m
    
    Report on facility deployment, recent activities, collaborations and plans.
    
    Speakers: Eric Yen (Academia Sinica Grid Computing), Felix.hung-te Lee (Academia Sinica (TW))
    
    ASGC_site_report_HEPiXFall2016.odp
    
    ASGC_site_report_HEPiXFall2016.pdf
  - 09:15
    
    CERN Site Report 15m
    
    News from CERN since the DESY workshop.
    
    Speaker: Jerome Belleman (CERN)
    
    sitereport-cern-belleman.pdf
  - 09:30
    
    RAL Site Report 15m
    
    Latest news of activities at the RAL Tier1.
    
    Speaker: Martin Bly (STFC-RAL)
    
    2016-10 HEPiX Berkeley - RAL Site Report.pdf
    
    2016-10 HEPiX Berkeley - RAL Site Report.pptx
  - 09:45
    
    Nikhef Site Report 15m
    
    Update from Nikhef
    
    Speaker: Paul Kuipers (Nikhef)
    
    Nikhef Site Report.pdf
    
    Nikhef Site Report.pptx
  - 10:00
    
    INFN-T1 Status report 15m
    
    A short update on what's going on at the Italian T1 center.
    
    Speaker: Andrea Chierici (INFN-CNAF)
    
    20161018_HEPIX-infn-t1_site_report.pptx
- 10:15 → 10:45
  
  Coffee Break 30m Building 50 Auditorium
  
- 10:45 → 12:30
  Site Report Building 50 Auditorium
  
  - 10:45
    
    NDGF Site Report 15m
    
    News and interesting events from NDGF and NeIC.
    
    Speaker: Erik Mattias Wadenstein (University of Umeå (SE))
    
    20161018-NDGF-Site-Report.pdf
  - 11:00
    
    KIT Site Report 15m
    
    News about GridKa Tier-1 and other KIT IT projects and infrastructure.
    
    Speaker: Andreas Petzold (KIT - Karlsruhe Institute of Technology (DE))
    
    kit-site-report-hepix-fall-2016.pdf
  - 11:15
    
    GSI Site Report 15m
    
    During the last few months, HPC @ GSI has moved servers and services to the new Green IT Cube data center. This included moving users from the old compute cluster to the new one with a new scheduler, and moving several petabytes of data from the old to the new Lustre cluster.
    
    Speaker: Dr Thomas Roth (GSI Darmstadt)
    
    GSI-SiteReport_Fall2016.pdf
  - 11:30
    
    ITER Site Report 15m
    
    Critical to the success of ITER reaching its scientific goal (Q≥10) is a data system that supports the broad range of diagnostics, data analysis, and computational simulations required for this scientific mission. Such a data system, termed ITERDB in this document, will be the centralized data access point and data archival mechanism for all of ITER’s scientific data. ITERDB will provide a unified interface for accessing all types of ITER scientific data regardless of the consumer (e.g., scientist, engineer, plant operations) including interfaces for data management, archiving system administration, and health monitoring capabilities.
    Because ITER is a French basic nuclear installation (INB), ITERDB has two parts: one located in the POZ (Plant Operation Zone) to collect experimental data, and another located in the XPOZ (outside the Plant Operation Zone) to allow offline analysis and storage. In this paper, we focus on the ITERDB-POZ part, the other part still being under design.
    ITER is an international project involving seven DAs (Domestic Agencies), and its procurement model makes integration quite challenging. To smooth integration, we developed the CODAC Core system, a mini-platform based on RHEL and EPICS that simulates the functional CODAC behaviour. Since its first version (2010), it has been extended with new features and new APIs. ITER consists of roughly 200 systems (millions of variables in total). In this paper, we focus on the Data Acquisition Network (DAN). Many systems will stream data over DAN at rates ranging from a few hundred kB/s to 50 GB/s. We describe the various components involved in the data acquisition and storage chain.
    
    Speaker: Lana Abadie (ITER)
    
    HepixLAE.pdf
    
    HepixLAE.pptx
  - 11:45
    T2_FI_HIP Site Report 15m
    
    hardware renewal
    
    dCache and OS upgrade
    
    Ansible
    
    Speaker: Johan Henrik Guldmyr (Helsinki Institute of Physics (FI))
    
    t2_fi_hip_201610.pdf
  - 12:00
    Irfu site report 15m
    
    Windows 10 migration
    
    Network: IPv6
    
    Infrastructure: monitoring
    
    New H2020 call EOSF
    
    Speaker: Sophie Ferry
    
    IRFU_Site_Report_2016 2v.pdf
  - 12:15
    
    Wigner Datacenter - Site report 15m
    
    We give an update on the infrastructure, Tier-0 hosting services, Cloud services and other recent developments at the Wigner Datacenter.
    
    Speaker: Mr Domokos Szabo (Wigner Datacenter)
    
    Site_report_of_Wigner_Datacenter.pdf
    
    Site_report_of_Wigner_Datacenter.pptx
- 12:30 → 14:00
  
  Lunch Break 1h 30m Building 50 Auditorium
  
- 14:00 → 15:40
  Security & Networking Building 50 Auditorium
  
  - 14:00
    
    Plans to support IPv6-only CPU on WLCG - an update from the HEPiX IPv6 Working Group 25m
    
    This report from the HEPiX IPv6 Working Group will present activities during the last 6-12 months. With IPv4 addresses running out and with some sites and Cloud providers now wishing to offer IPv6-only CPU, together with the fact that several WLCG sites are already successfully running production dual-stack storage services, we have a plan to support IPv6-only CPU from April 2017 onwards. This plan will be presented.
    
    Speaker: Dave Kelsey (STFC - Rutherford Appleton Lab. (GB))
    
    Kelsey18oct16v2.pdf
    
    Kelsey18oct16v2.pptx
  - 14:25
    
    Security Update 25m
    
    What’s been happening in security for HEP? We will discuss the recent trends in the ever-changing threat landscape, and the new initiatives being put in place to protect our people, data and services. One such initiative to highlight is our focus on bootstrapping international collaboration within research and academia, encouraging communities to participate in intelligence sharing and incident response. We will also discuss developments in the technologies being used to target us and the rest of the academic community.
    
    Speaker: Hannah Short (CERN)
    
    20161018 HEPiX Security Update.pdf
    
    20161018 HEPiX Security Update.pdf
    
    20161018 HEPiX Security Update.pptx
  - 14:50
    
    Pre-Studies for Wi-Fi service enhancement at CERN 25m
    
    Over the last few years, the number of mobile devices connected to the CERN internal network has increased from a handful in 2006 to more than 10,000 in 2015. Wireless access is no longer a “nice to have” or just for conference and meeting rooms; support for mobility is now expected by most, if not all, of the CERN community. In this context, a full renewal of the CERN Wi-Fi network has been launched in order to provide a state-of-the-art campus-wide Wi-Fi infrastructure. Which technologies can provide an end-user experience comparable, for most applications, to a wired connection? Which solution can cover more than 200 office buildings, representing a total surface of more than 400,000 m², while keeping a single, simple, flexible and open management platform? The presentation will focus on the pre-studies done at CERN to review the full Wi-Fi infrastructure across the campus. Modern demands for Wi-Fi connectivity, as well as the design process of the new CERN Wi-Fi network (RF planning, simulation, site survey), will also be presented.
    
    Speaker: Adam Wojciech Sosnowski (AGH University of Science and Technology (PL))
    
    HEPIX_Fall_2016_-_Pre-Studies_for_Wi-Fi_Service_Enhancement_at_CERN.pdf
    
    HEPIX_Fall_2016_-_Pre-Studies_for_Wi-Fi_Service_Enhancement_at_CERN.pptx
  - 15:15
    
    Wi-Fi service enhancement at CERN 25m
    
    Over the last few years, the number of mobile devices connected to the CERN internal network has increased from a handful in 2006 to more than 10,000 in 2015. Wireless access is no longer a “nice to have” or just for conference and meeting rooms; support for mobility is now expected by most, if not all, of the CERN community. In this context, a full renewal of the CERN Wi-Fi network has been launched in order to provide a state-of-the-art campus-wide Wi-Fi infrastructure. Which technologies can provide an end-user experience comparable, for most applications, to a wired connection? Which solution can cover more than 200 office buildings, representing a total surface of more than 400,000 m², while keeping a single, simple, flexible and open management platform? The presentation will focus on the studies and tests performed at CERN to address these issues, as well as some feedback about the global project organisation.
    
    Speaker: Vincent Ducret (CERN)
    
    HEPIX Fall 2016 - Wi-Fi Service Enhancement at CERN.pdf
    
    HEPIX Fall 2016 - Wi-Fi Service Enhancement at CERN.pptx
- 15:40 → 16:10
  
  Coffee Break 30m Building 50 Auditorium
  
- 16:10 → 17:00
  Security & Networking Building 50 Auditorium
  
  - 16:10
    
    Cloud Services – Network realities 25m
    
    HEP use of cloud services has brought to light various network issues that hamper the full integration of such services with WLCG resources. In this presentation we comment on the issues that have been encountered and present the ongoing actions of the international network community to facilitate the integration of cloud services into the research computing environment.
    
    Speaker: Tony Cass (CERN)
    
    csnr.pptx
  - 16:35
    
    Can we trust eduGAIN? 25m
    
    eduGAIN, the international identity federation, allows users from all over the world to access a globally distributed suite of academic resources. You are most likely already able to use your primary account, from CERN or your home organisation, to tap into these services! Federated Identity Management, the technology underpinning eduGAIN, brings many benefits for users and organisations alike but… how can we trust these users with our HEP services? This is one of the questions that the AARC project (https://aarc-project.eu), in which CERN is a partner, is seeking to answer. We will discuss the measures being put in place to allow WLCG to reap the rewards of eduGAIN without exposing itself to increased risk.
    
    Speaker: Hannah Short (CERN)
    
    20161018 HEPiX Can we trust eduGAIN.pdf
    
    20161018 HEPiX Can we trust eduGAIN.pptx
- 17:00 → 17:25
  Storage and Filesystems Building 50 Auditorium
  
  - 17:00
    
    Deep dive into Spectrum Scale (formerly known as GPFS) 25m
    
    The intent of this presentation is to give current (or potential) users of Spectrum Scale a deep dive into various key components and functions of the product and its usage in High Performance Computing. I will share performance data for problematic filesystem workloads, such as shared directory or file access, and demonstrate some new capabilities added in the 4.2.1 release. I will further explain I/O optimization technologies like LROC and HAWC that allow various kinds of flash technology to be used to accelerate workloads. If time permits, I can also show some of the advanced performance and problem determination capabilities recently added to the product, including a live real-time performance demo.
    
    Speaker: Sven Oehme
    
    Spectrum_Scale-HEPIX_V1a.pdf
- 17:30 → 19:00
  
  Board Meeting Building 59, room 4102
  
Wednesday 19 October
- 08:30 → 09:00
  
  Registration Building 50 Auditorium
  
- 09:00 → 10:15
  Computing and Batch Services Building 50 Auditorium
  
  - 09:00
    HEPiX Benchmarking Working Group - Status Report HEPiX Fall 2016 25m
    
    The HEPiX Benchmarking Working Group was relaunched in spring 2016. Its first tasks are:
    
    Development and proposal of a fast benchmark to estimate the performance of the provided job slot (in traditional batch farms) or VM instance (in cloud environments)
    
    Preliminary work for a successor of the HS06 benchmark
    
    This talk provides a status report of the work done so far.
    
    Speaker: Manfred Alef (Karlsruhe Institute of Technology (KIT))
    
    status-report-2016-10-19.pdf
  - 09:25
    
    Big Data: Genomics vs. Physics 25m
    
    Big data is typically characterized by only a few features, such as Volume, Velocity and Variety. This is a simplification that overlooks many factors that affect the way data is used and managed, factors that can have a profound effect on the computing systems needed to serve different communities.
    
    I compare the computing and data-management needs of the genomics domain with those of big physics experiments, highlight the differences between them and discuss the implications of those differences.
    
    Speaker: Tony Wildish (Lawrence Berkeley National Laboratory)
    
    2016-10-19 HEPiX Big Data Genomics vs Physics.pdf
    
    2016-10-19 HEPiX Big Data Genomics vs Physics.pptx
  - 09:50
    
    JLab's SciPhi-XVI Knights Landing Cluster 25m
    
    Jefferson Lab recently installed a 200-node Knights Landing cluster, becoming an Intel® Parallel Computing Center. This talk will give an overview of the cluster installation and configuration, including its Omni-Path fabric, benchmarking, and integration with Lustre and NFS over InfiniBand.
    
    Speaker: Sandy Philpott
    
    HEPiX_LBNL16_JLabKNL_upd1.pdf
- 10:15 → 10:45
  
  Coffee Break 30m Building 50 Auditorium
  
- 10:45 → 12:25
  Computing and Batch Services Building 50 Auditorium
  
  - 10:45
    
    A Race for the Data Center: POWER8 and AArch64 25m
    
    x86 processors have been the long-time leaders of the server market and x86_64 the uncontested target architecture for the development of High Energy Physics applications. Until a few years ago, interest in alternative server architectures that could compete with x86 in terms of performance, power efficiency and total cost of ownership found no concrete response. However, the past few years have seen the introduction of new processor architectures and initiatives aimed at challenging the leading position of x86. With the introduction in 2011 of the 64-bit ARMv8 Instruction Set Architecture, ARM set the first milestone for its expansion into the server landscape. The OpenPOWER Foundation, founded in 2013, set as its main goal the development of the POWER ecosystem in the server market, initially embracing the POWER8 processor family under this initiative. In 2015 we presented performance and power consumption benchmarks of uni-socket platforms that showed a significant gap between x86 and other competitors (A look beyond x86: OpenPOWER8 & AArch64, HEPiX Spring 2015). Since then, the ecosystem has grown both in terms of availability of hardware platforms and software support. I will present new performance and power consumption results covering recent dual-socket ARMv8 and POWER8 platforms.
    
    Speaker: Marco Guerri (CERN)
    
    A Race for the Data Center
  - 11:10
    
    Dynamical Provisioning of Cloud Computing Resources for Batch Processing 25m
    
    We aim to build a software service for provisioning cloud-based computing resources that can be used to augment users’ existing, fixed resources and meet their batch job demands. This service must be designed to automate the delivery of compute resources (HTCondor execute nodes) to match user job demand in such a way that cloud-based resource utilization is high and, thus, cost per CPU-hour is low. In addition, since this provisioning service will acquire resources on behalf of its users, acting as a third-party buyer for them, it is also our fiduciary responsibility to ensure the system is stable or, at least, that stability can be maintained. In order to assess whether stable resource utilization is possible, a dynamical systems approach is developed to provide a framework for understanding how the provisioning service will respond to user job demand. We will present our latest results on the project and give an overview of the development plan moving forward.
    
    Speaker: Dr Martin Kandes (Univ. of California San Diego (US))
    
    mkandes_hepix_fall_2016.pdf
  - 11:35
    
    What's new in HTCondor? What is upcoming? 25m
    
    The goal of the HTCondor team is to develop, implement, deploy, and evaluate mechanisms and policies that support High Throughput Computing (HTC) on large collections of distributively owned computing resources. Increasingly, the work performed by the HTCondor developers is being driven by its partnership with the High Energy Physics (HEP) community.
    
    This talk will present recent changes and enhancements to HTCondor, including details on some of the enhancements created for the imminent HTCondor v8.6.0 release, changes created on behalf of the HEP community, and advancements on interactions with Docker and public cloud services. It will also discuss the upcoming HTCondor development roadmap, and seek to solicit feedback on the roadmap from HEPiX attendees.
    
    Speaker: Todd Tannenbaum
    
    TannenbaumT_HEPIX_Oct_2016.pdf
    
    TannenbaumT_HEPIX_Oct_2016.pptx
  - 12:00
    
    Profiling data intensive workflows on Genepool and PDSF clusters at NERSC. 25m
    
    NERSC is well known for its user-friendly, large-scale computing environment. Along with the large Cray systems (Edison and Cori), NERSC also supports data-intensive workflows of the Joint Genome Institute, HEP and materials science communities via its Genepool, PDSF and Matgen clusters. These clusters are all provisioned from a single backend cluster, Mendel. This talk will briefly outline the workflows in Mendel and provide a comparative profile of its various applications. It will also summarize various user and system incidents over the last few years of its service. A deeper analysis of the bioinformatics workflow on the Genepool compute cluster, and a plan for testing workflows on a Mendel testbed with a Cori-like environment, will be discussed. Finally, a prospective plan for the future evolution of the Genepool part of Mendel will also be outlined.
    
    Speaker: Dr Bhupender Thakur (NERSC, Lawrence Berkeley National Lab)
    
    NERSC_hepix2016.pdf
- 12:25 → 14:00
  
  Lunch Break 1h 35m Building 50 Auditorium
  
  Building 50 Auditorium
  
  LBNL
  
  Berkeley, CA 94720
- 13:00 → 14:00
  
  BOF session: HPC hardware acquisition practices, software and application porting experiences Building 50 Auditorium
  
  Building 50 Auditorium
  
  LBNL
  
  Berkeley, CA 94720
  
  HPC hardware acquisition practices, software and application porting experiences
- 14:00 → 15:40
  Storage and Filesystems Building 50 Auditorium
  
  Building 50 Auditorium
  
  LBNL
  
  Berkeley, CA 94720
  - 14:00
    
    CEPHFS: a new generation storage platform for Australian high energy physics 25m
    
    In this paper we present a CEPHFS use case implementation at the Center of Excellence for Particle Physics at the TeraScale (CoEPP). CoEPP operates the Australian Tier-2 for ATLAS and joins experimental and theoretical researchers from the Universities of Adelaide, Melbourne, Sydney and Monash. CEPHFS is used to provide a unique object storage system, deployed on commodity hardware and without single points of failure, which Australian HEP researchers in the different CoEPP locations use to store, process and share data, independent of their geographical location. CEPHFS also works in combination with an SRM and XROOTD implementation, integrated into ATLAS Data Management operations, and is used by HEP researchers for XROOTD and/or POSIX-like access to ATLAS Tier-2 user areas. We will provide details on the architecture, its implementation and tuning, and report I/O performance metrics as experienced by different clients deployed over the WAN. We will also explain our plan to collaborate with Red Hat Inc. on extending our current model so that the metadata cluster distribution becomes multi-site aware, such that regions of the namespace can be tied or migrated to metadata servers in different data centers.
    
    Speaker: Goncalo Borges (University of Sydney (AU))
    
    GoncaloBorges-HEPIX16-v3.pdf
  - 14:25
    
    Experience of Development and Deployment of a Large-Scale Ceph-Based Data Storage System at RAL 25m
    
    Over the past two years, a new data storage system, Echo, has been developed at the RAL Tier-1 as a replacement for CASTOR disk-only storage of LHC data. This presentation will share the RAL experience of developing and deploying a new, Ceph-based storage service at the 13 PB scale to the standard required for production use.
    
    This is the first new service we have developed at this scale for some time, and Ceph is a very different technology from our existing storage solution. This presentation will explore the changes required to accommodate such a service: from the location of servers in the data centre; the development of the network topology and its effect on data placement; the design and construction of a system that is more manageable, maintainable and upgradable by a system administrator; the adaptation of existing software to support LHC VO workflows; and the implementation of new software to support industry-standard protocols for both LHC VOs and other user communities. I will also discuss the changes brought by the deployment of a new major OS version and the move from SysVinit to systemd for process management, the changes to monitoring and alerting required to support continuous operation of the service, and the risks and impacts of transitioning to this technology.
    
    Speaker: Bruno Canning (RAL)
    
    Ceph-Experience-at-RAL-final.pdf
    
    Ceph-Experience-at-RAL-final.ppt
  - 14:50
    
    Ceph Based Storage Systems at the RACF 25m
    
    We give a report on the status of the Ceph-based storage systems deployed at the RHIC & ATLAS Computing Facility (RACF), which currently provide 1 PB of data storage capacity for the object store (with an Amazon S3-compatible Rados Gateway front end), block storage (RBD), and shared file system (CephFS with dCache/GridFTP front ends) layers of the Ceph storage system. The hardware and software upgrades performed over the last year are reported, including the results of performance tuning of the cluster's Rados Gateway subsystem to support high-concurrency (up to 24k simultaneous connections), high-granularity (about 1-10 MB payloads per client session), and high-bandwidth (up to 1 GB/s of aggregate bandwidth on the WAN) data transfers via the Amazon S3-compatible API, in order to match the growing requirements of the ATLAS Event Service. The results of boosting the performance of our Ceph clusters with low-latency PCIe NVMe SSD storage devices, and the future plans for our Ceph-based storage systems, are also discussed.
    
    Speaker: Alexandr Zaytsev (Brookhaven National Laboratory (US))
    
    HEPiX2016_a2_RACF_azaytsev_Ceph_v3.pdf
  - 15:15
    
    Resilient dCache and other news 25m
    
    New developments in dCache, in particular resilience features for redundant head-node services, which now allow automatic failover and rolling upgrades with little or no service impact.
    
    Other news covers recent developments in areas such as Ceph support.
    
    Speaker: Erik Mattias Wadenstein (University of Umeå (SE))
    
    20161019-HA-dCache.pdf
- 15:40 → 16:10
  
  Coffee Break 30m Building 50 Auditorium
  
  Building 50 Auditorium
  
  LBNL
  
  Berkeley, CA 94720
- 16:10 → 17:50
  Storage and Filesystems Building 50 Auditorium
  
  Building 50 Auditorium
  
  LBNL
  
  Berkeley, CA 94720
  - 16:10
    
    Effective Data Retrieval from Massive Amounts of Tape-Resident Data 25m
    
    Randomly restoring files from tape degrades read performance, primarily because of frequent tape mounts. The high latency of time-consuming tape mounts and dismounts is a major issue when accessing massive amounts of data from tape storage. BNL's mass storage system currently holds more than 80 PB of data on tape, managed by HPSS. To restore files from HPSS, we use scheduler software called ERADAT. This scheduler was originally based on code from Oak Ridge National Lab, developed in the early 2000s. After major modifications and enhancements, ERADAT now provides advanced HPSS resource management, priority queuing, resource sharing, web-browser visibility of real-time staging activities, and advanced real-time statistics and graphs. ERADAT is integrated with ACSLS and HPSS for near real-time mount statistics and resource control in HPSS. It is also the interface between HPSS and other applications, such as the locally developed Data Carousel, providing fair resource-sharing policies and related capabilities.
    ERADAT has demonstrated strong performance at BNL and other scientific organizations.
    
    Speaker: David Yu (Brookhaven National Laboratory (US))
    
    Efficient Access to Massive Amounts of Tape-Resident Data HEPiX 2016 Fall v2.pdf
    
    Efficient Access to Massive Amounts of Tape-Resident Data HEPiX 2016 Fall v2.pptx
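    
    The ERADAT abstract above rests on one idea: serving recall requests in arrival order causes the same tape to be mounted repeatedly, while batching requests per tape reduces mounts. The following is a minimal Python sketch of that batching idea only, not ERADAT's actual algorithm and not an HPSS interface. The `StageRequest` fields (tape ID, on-tape position) and the "busiest tape first" ordering are hypothetical assumptions made for illustration.

```python
from collections import defaultdict
from typing import NamedTuple


class StageRequest(NamedTuple):
    """Hypothetical recall request: which file, on which tape, at what position."""
    path: str
    tape_id: str
    position: int  # assumed ordinal position of the file on its tape


def count_mounts(requests):
    """Count mounts if requests are served in the given order, assuming a tape
    stays mounted only while consecutive requests target it."""
    mounts = 0
    current = None
    for req in requests:
        if req.tape_id != current:
            mounts += 1
            current = req.tape_id
    return mounts


def schedule_by_tape(requests):
    """Group requests by tape and sort each group by position, so that each
    tape is mounted once and read in a single forward pass."""
    by_tape = defaultdict(list)
    for req in requests:
        by_tape[req.tape_id].append(req)
    ordered = []
    # Hypothetical policy: tapes with the most pending requests go first.
    for tape_id in sorted(by_tape, key=lambda t: (-len(by_tape[t]), t)):
        ordered.extend(sorted(by_tape[tape_id], key=lambda r: r.position))
    return ordered


if __name__ == "__main__":
    arrival_order = [
        StageRequest("/run1/a.root", "T001", 40),
        StageRequest("/run2/b.root", "T002", 5),
        StageRequest("/run1/c.root", "T001", 10),
        StageRequest("/run3/d.root", "T003", 7),
        StageRequest("/run2/e.root", "T002", 2),
        StageRequest("/run1/f.root", "T001", 25),
    ]
    print("mounts in arrival order:", count_mounts(arrival_order))  # 6
    print("mounts after grouping:", count_mounts(schedule_by_tape(arrival_order)))  # 3
```

    A production scheduler also has to weigh request priority, fair share between users or experiments, and drive availability. Pure per-tape batching is just the simplest way to show why mount counts drop.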
  - 16:35
    
    EOS, DPM and FTS developments and plans 25m
    
    The CERN IT-ST Analytics and Development section is responsible for developing the Data Management solutions for disk storage and data transfer, namely EOS, DPM and FTS.
    
    The talk will describe some recent developments in these three software solutions:
    
    EOS
    
    The integration and evaluation of various technologies for the transition from a single active in-memory namespace to a scale-out implementation distributed over many metadata servers. The new architecture aims to separate the data from the application logic and user interface code, providing flexibility and scalability to the namespace component.
    
    DPM
    
    The implementation of a new core daemon (DOME) based on FastCGI and RESTful technologies. This makes it possible to work in a fully SRM-free mode and to implement quotas, free/used space reporting on directories, and volatile pools that can pull files from external sources, which can be used to deploy data caches.
    
    FTS
    
    The extension of FTS to better support data transfer workflows between Grid, Cloud and HPC systems. This includes FTS3 performing protocol translation and efficient third-party transfers over HTTP. One of the core components, the Optimizer, has also been rewritten to allow ranges of active transfers and better exploitation of network resources.
    
    Speaker: Andrea Manzi (CERN)
    
    Hepix_2016.pdf
    
    Hepix_2016.pptx
  - 17:00
    
    ZFS on Linux 25m
    
    ZFS is a combination of file system, logical volume manager, and software RAID system developed by Sun Microsystems for the Solaris OS. ZFS simplifies the administration of disk storage, and on Solaris it has been well regarded for many years for its high performance, reliability, and stability. It is used successfully for enterprise storage administration around the globe, but so far on such systems ZFS has mainly been used to provide storage, such as users' home directories, through NFS and similar network protocols.
    
    Within GridPP, ZFS has also been used before to manage user home directories through NFS. These systems were based on Solaris or similar systems, such as the ones provided by Nexenta. However, most Grid middleware runs on Linux rather than Solaris, so ZFS has so far not been used for Grid storage management or for Grid middleware servers in general.
    
    Since ZFS is now available in a stable version on Linux, I will present our experience with ZFS on Linux since the end of last year, when we started migrating all GridPP storage at our site (about 1 PB) to be managed by the current Linux version of ZFS. As disk capacities grow, RAID6 rebuild times quickly become too long to be feasible, so ZFS's built-in RAID functionality was tested as an alternative to hardware RAID systems, and the results will be presented. I'll also report on other ZFS-specific features such as compression, NFS sharing, and snapshots, and how they work in the Linux port.
    ZFS on Linux could be an efficient and cost-effective alternative to hardware RAID and Solaris-based systems, offering characteristics that no other file system provides, together with real data safety and reliability.
    
    Speaker: Marcus Ebert (University of Edinburgh (GB))
    
    ZFSonLinuxAtScotGrid.pdf
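    
    The rebuild-time argument in the ZFS abstract above can be made concrete with a back-of-the-envelope estimate: rebuilding a failed disk takes roughly its capacity C divided by the sustained rebuild rate r. The numbers below (an 8 TB disk rebuilt at 100 MB/s) are illustrative assumptions, not figures from the talk.

```latex
t_{\text{rebuild}} \approx \frac{C}{r}
  = \frac{8 \times 10^{12}\,\text{B}}{100 \times 10^{6}\,\text{B/s}}
  = 8 \times 10^{4}\,\text{s} \approx 22\,\text{h}
```

    Under production load the effective rate is usually lower, so the array stays degraded even longer. One property often cited in ZFS's favour is that a resilver copies only allocated blocks rather than the whole disk, so a partly filled pool can rebuild faster than this whole-disk estimate.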
  - 17:25
    
    OSiRIS: One Year Update 25m
    
    The OSiRIS (Open Storage Research Infrastructure) project started in September 2015, funded under the NSF CC*DNI DIBBs program (NSF grant #1541335). This program seeks solutions to the challenges many scientific disciplines face with the rapidly increasing size, variety and complexity of the data they must work with. As the data grows, scientists are challenged to manage, share and analyze it, and are diverted from their scientific research to data-access and data-management concerns. Even more problematic is determining how to support many scientists sharing and accessing this ever-increasing amount of data across multiple institutions.
    
    We will describe the progress made during the OSiRIS project's first year. OSiRIS has fully deployed and benchmarked its initial multi-institutional Ceph deployment. This involved developing, deploying and configuring a number of tools to support consistent provisioning, monitoring and management of the distributed OSiRIS infrastructure. We will cover those details and discuss our initial science engagements and near-term plans for our hardware, Ceph, Authentication/Authorization and Software Defined Networking, as well as the longer-term plans for this 5-year project.
    
    Speaker: Shawn Mc Kee (University of Michigan (US))
    
    OSiRIS-HEPiX-Fall2016.pdf
- 18:00 → 21:00
  
  Conference Dinner UC Berkeley Faculty Club
  
  UC Berkeley Faculty Club
  
  LBNL
  
  Berkeley, CA 94720
Thursday 20 October
- 09:00 → 09:05
  
  Miscellaneous: Safety Announcement Building 50 Auditorium
  
  Building 50 Auditorium
  
  LBNL
  
  Berkeley, CA 94720
- 09:05 → 10:20
  Storage and Filesystems Building 50 Auditorium
  
  Building 50 Auditorium
  
  LBNL
  
  Berkeley, CA 94720
  - 09:05
    
    Update from Database Services 25m
    
    With terabytes of data stored in databases and Hadoop at CERN, and a great number of critical applications relying on them, the database service is evolving and the Hadoop service is expanding to adapt to the changing needs and requirements of its users. The demand is high and the scope is broad. This presentation gives an overview of the current state of the database services and of new technologies coming to the Hadoop service to make better use of the latest hardware developments. An update on the Database-On-Demand management model and technologies (MySQL, PostgreSQL) will also be provided.
    
    Speaker: Katarzyna Maria Dziedziniewicz-Wojcik (CERN)
    
    HEPIX-2.pdf
    
    HEPIX-2.pptx
  - 09:30
    
    AFS phaseout at CERN 25m
    
    (Open)AFS has been used at CERN as a general-purpose filesystem for Linux home directories and project space for over 20 years. It has an excellent track record, but is showing its age. It is now slowly being phased out due to concerns about the project's long-term viability. The talk will briefly explain CERN's reasons for phasing out, give an overview of the process, introduce the migration targets for the various use cases (primarily EOS-FUSE), and highlight the challenges (and opportunities) of this migration.
    
    Speaker: Jan Iven (CERN)
    
    CERN_NOAFS_HEPIX2016.pdf
  - 09:55
    
    The future of AFS family file systems in research computing 25m
    
    Since the introduction of Transarc AFS in 1991, the AFS family of file systems has played a role in research computing around the globe.
    
    This talk will discuss the resurgence in development of the AFS family of file systems. A summary of recent development for several family members will be presented including:
    
    AuriStor File System suite of clients and servers
    
    kAFS, the Linux in-tree client and the associated AF_RXRPC socket interface
    
    OpenAFS clients and servers
    
    The talk will describe the potential uses of the /afs file namespace as a persistent storage solution for containers.
    
    Finally, the talk will discuss the Tennessee Open Research storage Cloud (TORC) proposal that was submitted to the U.S. National Science Foundation for funding as part of the Cyber Infrastructure initiative. If funded, TORC will provide a wide-area, high-performance and interoperable storage infrastructure designed for scalable, multi-level federation under cooperative management. TORC will combine the global, federated /afs file namespace and the multi-level security and privacy provided by the AuriStor File System with the high performance, scalability and reliability of L-Store and the Internet Backplane Protocol.
    
    Speaker: Mr Jeffrey Altman
    
    Auristor_Fact_Sheet.pdf
    
    AuriStor-HEPIX-Fall-2016-Future-of-AFS.pdf
    
    kAFS and AF_RXRPC Projects
    
    Logistical Storage
    
    OpenAFS Road Map
    
    Tennessee Open Research Cloud Grant Proposal
- 10:20 → 10:45
  
  Coffee Break & California earthquake drill 25m Building 50 Auditorium
  
  Building 50 Auditorium
  
  LBNL
  
  Berkeley, CA 94720
- 10:45 → 12:25
  IT Facilities and Business Continuity Building 50 Auditorium
  
  Building 50 Auditorium
  
  LBNL
  
  Berkeley, CA 94720
  - 10:45
    
    Deploying Open Compute hardware at CERN 25m
    
    The Open Compute Project (OCP) was launched by Facebook in 2011 with the objective of building efficient computing infrastructures at the lowest possible cost. Specifications and design documents for Open Compute systems are released under open licenses, following the model traditionally associated with open source software projects. In 2014 we presented our plans for a public procurement of a small Open Compute hardware installation, aimed at assessing the maturity of the OCP market and whether it could be a competitor to "traditional" hardware (Open Compute at CERN, HEPiX Spring 2014). In September 2015 we finally deployed six Open Compute racks, populated with CPU servers and storage enclosures, in CERN's Meyrin datacentre. We faced interesting challenges during all phases of the project and at all levels of the stack, from power distribution to hardware monitoring. I will outline some of the hurdles we had to overcome and the lessons we learnt along the way, together with the results obtained during the evaluation of the systems.
    
    Speaker: Marco Guerri (CERN)
    
    Deploying OCP Hardware at CERN
  - 11:10
    
    CERN Computing Facilities Evolution 25m
    
    This talk will give an overview of current activities to expand CERN's computing facilities infrastructure. This will include a description of the 2nd Network Hub currently under construction, as well as its purpose. It will also cover the initial plans for a possible second Data Centre on the CERN site.
    
    Speaker: Wayne Salter (CERN)
    
    B773.mpg
    
    DC Evolution HEPiX Autumn 2016.pdf
    
    DC Evolution HEPiX Autumn 2016.pptx
  - 11:35
    
    The role of dedicated computing centers in the age of cloud computing 25m
    
    BNL anticipates significant growth in scientific programs with large
    computing and data storage needs in the near future and has recently
    re-organized support for scientific computing to meet these needs.
    A key component is the enhanced role of the RHIC-ATLAS Computing
    Facility (RACF) in support of HTC and HPC at BNL.
    
    This presentation discusses the evolving role of the RACF at BNL, in
    light of its growing portfolio of responsibilities and its increasing
    integration with cloud (academic and for-profit) computing activities.
    We also discuss BNL's plan to build a new computing center to support
    the new responsibilities of the RACF and present a summary of the
    cost-benefit analysis, including the types of computing activities
    that benefit most from a local data center vs. cloud computing. This
    analysis is partly based on an updated cost comparison of Amazon EC2
    computing services and the RACF, which was originally conducted in 2012.
    
    Speaker: Tony Wong (Brookhaven National Laboratory)
    
    The_role_of_dedicated_computing_centers_in_age_of_cloud_computing.pdf
    
    The_role_of_dedicated_computing_centers_in_age_of_cloud_computing.pptx
  - 12:00
    
    GreenITCube - Status & Monitoring 25m
    
    The GreenITCube has been in production for half a year now. We want to present our experience so far and what we have learned about the system, and give an outlook for the next couple of months.
    
    In the second part of the talk, we want to give a detailed overview of the infrastructure monitoring. The focus will be on the different systems we have in operation and how we bring all the monitoring data together.
    
    Speaker: Mr Jan Trautmann (GSI Darmstadt)
    
    GreenITCube.pdf
- 12:25 → 14:00
  
  Lunch Break 1h 35m Building 50 Auditorium
  
  Building 50 Auditorium
  
  LBNL
  
  Berkeley, CA 94720
- 14:00 → 15:15
  Basic IT Services Building 50 Auditorium
  
  Building 50 Auditorium
  
  LBNL
  
  Berkeley, CA 94720
  - 14:00
    
    Monitoring HTCondor with Clustered Graphite and Grafana 25m
    
    Grafana is a popular tool for data analytics, and HTCondor generates
    large amounts of time-series data appropriate for the kinds of analysis
    Grafana provides. We use a Graphite cluster, which will be described in
    some detail, as a back-end for metric storage, and adapted some scripts
    from Fermilab for metric gathering. This work is in the context of the
    batch-monitoring working group.
    
    Speaker: William Strecker-Kellogg (Brookhaven National Lab)
    
    bnl-hepix-fall-2016-monitoring-analytics.pdf
    
    bnl-hepix-fall-2016-monitoring-analytics.pptx
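    
    As an illustration of the metric-gathering step described in the Graphite/Grafana abstract above, the sketch below renders HTCondor-style job counts into Graphite's plaintext protocol (one `<metric.path> <value> <unix_timestamp>` line per sample) and can send them over TCP to a carbon receiver. The metric names, the `(group, state)` input shape and the sample values are hypothetical; this is not the BNL or Fermilab scripts, and it does not query HTCondor itself.

```python
import socket
import time


def sanitize(component: str) -> str:
    """Graphite uses '.' as its path separator, so replace dots and spaces
    inside an individual path component."""
    return component.replace(".", "_").replace(" ", "_")


def to_graphite_lines(prefix, metrics, timestamp=None):
    """Render {(group, state): value} into Graphite plaintext-protocol lines.
    The prefix is used as-is, so it may itself contain dots."""
    ts = int(timestamp if timestamp is not None else time.time())
    lines = []
    for (group, state), value in sorted(metrics.items()):
        path = ".".join([prefix, sanitize(group), sanitize(state)])
        lines.append(f"{path} {value} {ts}\n")
    return "".join(lines)


def send_to_carbon(payload, host, port=2003, timeout=5.0):
    """Send a payload to a carbon receiver over TCP. Port 2003 is carbon's
    default plaintext port; a clustered setup may put a relay in front."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(payload.encode("ascii"))


if __name__ == "__main__":
    # Hypothetical job counts per accounting group and state.
    sample = {("group_atlas.prod", "running"): 1200, ("group_atlas.prod", "idle"): 350}
    print(to_graphite_lines("condor.jobs", sample, timestamp=1476900000), end="")
    # send_to_carbon(...) is left uncalled here: it needs a reachable carbon host.
```

    Keeping formatting separate from sending makes the metric naming easy to test, and it lets the same payload go to a carbon relay when the back end is a Graphite cluster.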
  - 14:25
    
    Introduction of load balancers at a Tier-1 site 25m
    
    Historically, the RAL Tier-1 has directly exposed public-facing services to the internet via static DNS entries. This is far from ideal: users experience connection failures during server maintenance (both planned and unplanned), and any change to the servers behind a particular service requires DNS changes. Since April we have been using HAProxy and Keepalived in production to provide a highly available load balancer in front of FTS3, avoiding the issues that result from using DNS aliases. We are also making extensive use of HAProxy and Keepalived for our OpenStack cloud, which is under development. Here we will describe our setup, our experience with load balancers for FTS3 and OpenStack, and our progress and plans for other services.
    
    Speaker: Ian Collier (STFC - Rutherford Appleton Lab. (GB))
    
    HEPiX2016Oct_LoadBalancers_RAL-ADL.pdf
    
    HEPiX2016Oct_LoadBalancers_RAL-ADL.ppt
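    
    The RAL load-balancer abstract above contrasts static DNS aliases, which keep sending clients to a server under maintenance, with a load balancer that routes around unhealthy backends. The toy Python model below shows that behaviour with round-robin selection that skips backends failing a health check. It is not HAProxy code or configuration. The backend names and the `is_healthy` callback are hypothetical, and Keepalived's role (moving a virtual IP between load-balancer hosts via VRRP) is not modelled.

```python
import itertools
from typing import Callable, List


class RoundRobinBalancer:
    """Round-robin over backends, skipping any that fail their health check."""

    def __init__(self, backends: List[str], is_healthy: Callable[[str], bool]):
        self.backends = list(backends)
        self.is_healthy = is_healthy
        self._cycle = itertools.cycle(self.backends)

    def pick(self) -> str:
        # Try each backend at most once per call before giving up.
        for _ in range(len(self.backends)):
            candidate = next(self._cycle)
            if self.is_healthy(candidate):
                return candidate
        raise RuntimeError("no healthy backends available")


if __name__ == "__main__":
    # Hypothetical FTS3 front ends; fts02 is down for maintenance.
    health = {"fts01": True, "fts02": False, "fts03": True}
    lb = RoundRobinBalancer(["fts01", "fts02", "fts03"], lambda b: health[b])
    print([lb.pick() for _ in range(4)])  # ['fts01', 'fts03', 'fts01', 'fts03']
```

    With a static DNS alias, clients resolving to fts02 would simply fail until the DNS entry was changed. Here the unhealthy backend drops out of rotation as soon as its check fails, which is the property the abstract is after.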
  - 14:50
    
    Renewal of Puppet for Australia-ATLAS 25m
    
    Australia-ATLAS has been running Puppet for all infrastructure and Grid nodes since 2012. With the release of Puppet 4 and the move to CentOS 7, we decided to rework our Puppet configuration using what we have learnt over 4 years and best-practice methodologies. This talk will describe the problems we had with the old Puppet configuration, the decisions we made in constructing the new system, and how the new system makes configuration management much easier.
    
    Speaker: Mr Sean Crosby (University of Melbourne (AU))
    
    Sean_Crosby_puppet_renewal.pdf
    
    Sean_Crosby_puppet_renewal.pptx
- 15:15 → 15:45
  
  Coffee Break 30m Building 50 Auditorium
  
  Building 50 Auditorium
  
  LBNL
  
  Berkeley, CA 94720
- 15:45 → 16:35
  Basic IT Services Building 50 Auditorium
  
  Building 50 Auditorium
  
  LBNL
  
  Berkeley, CA 94720
  - 15:45
    
    User/group based access control for ElasticSearch + Kibana 25m
    
    Kibana and ElasticSearch are used for monitoring in many places. However, by default they do not support authentication and authorization features. In the case of single Kibana and ElasticSearch services shared among many users, any user that can access Kibana can retrieve any information from ElasticSearch.
    
    In this talk, we will report on our latest R&D experience in securing the Kibana and ElasticSearch services. We will describe a Kibana plugin that allows Kibana dashboards to be separated by user/group. We will also describe the performance impact of using SearchGuard, an ElasticSearch plugin that enables user/group-based access control.
    
    Speaker: Wataru Takase (High Energy Accelerator Research Organization (JP))
    
    161020_hepix_wataru_takase.pdf
  - 16:10
    
    Adopting Red Hat Satellite 6 for Lifecycle Management 25m
    
    An overview of results and lessons learned from the Fermilab Scientific Linux and Architecture Management (SLAM) group's Satellite 6 Lifecycle Management Project. The SLAM team offers a portfolio of diverse system management services with a small staff. Managing the risk of resource scarcity involves implementing tools and processes that facilitate standardization, reduce complexity, and increase efficiency wherever possible. This short talk will give a brief overview of our experience, the results, and the future of our migration to Satellite 6.1 as our new base for system management.
    
    Speaker: Rennie Scott (Fermilab)
    
    Satellite 6.2-Final Draft.pdf
    
    Satellite 6.2-Final Draft.pptx
- 16:35 → 17:25
  Grid, Cloud and Virtualisation Building 50 Auditorium
  
  Building 50 Auditorium
  
  LBNL
  
  Berkeley, CA 94720
  - 16:35
    
    Chameleon: A Computer Science Testbed as Application of Cloud Computing 25m
    
    Did you ever need hundreds of state-of-the-art nodes that you could use to scalably test new ideas on? Run experiments that are not disrupted by what other users are doing? A platform that allows you to reinstall the operating system, recompile the kernel, and gives you access to the console so that you can debug the system? A place where your research team can easily reproduce experiments carried out weeks ago? A lab where your students can work with different hardware configurations, from Infiniband to GPUs, either as part of a class or homework?
    
    This talk will introduce Chameleon, a large-scale, deeply reconfigurable NSF-funded testbed for Computer Science research and education (www.chameleoncloud.org). The testbed consists of ~600 nodes (~14,000 cores) and a total of 5 PB of disk space hosted at the University of Chicago and TACC, and leverages a 100 Gbps connection between the sites. The hardware consists primarily of homogeneous nodes to support large-scale experiments, but subgroups of those nodes are equipped with additional capabilities including Infiniband networking, high-bandwidth I/O storage nodes, GPUs, and storage hierarchies with a mix of HDDs, SSDs, NVRAM, and high memory. To support Computer Science experiments ranging from operating systems and virtualization to security research, Chameleon provides a configuration system that gives users exclusive access to bare-metal nodes on an “as if it were in your lab” basis, i.e., full control of the software stack including root privileges, kernel customization, and console access. In addition, to facilitate educational and exploratory application projects, Chameleon also provides a KVM cloud.
    
    I will describe user-facing Chameleon capabilities, describe some of the projects that the testbed has supported in the past, and explain how the testbed was built and how it will continue to develop.
    
    Speaker: Kate Keahey (Argonne National Laboratory)
    
    HEPiX.pdf
  - 17:00
    
    Extending the farm to external sites: the INFN Tier-1 experience 25m
    
    The Tier-1 at CNAF is the main INFN computing facility, offering computing and storage resources to more than 30 different scientific collaborations, including the 4 experiments at the LHC. A huge increase in computing needs is foreseen in the coming years, driven mainly by the experiments at the LHC (especially from Run 3, starting in 2021) but also by other upcoming experiments such as CTA.
    While we are considering the upgrade of the infrastructure of our data center, we are also evaluating the possibility of using CPU resources available in other data centers or even leased from commercial cloud providers.
    Hence, at INFN Tier-1 we have pledged a small amount of computing resources (~2000 cores located at the Bari ReCaS) for the WLCG experiments for 2016 and we are testing the use of resources provided by a commercial cloud provider. While the Bari ReCaS data center is directly connected to the GARR network with the obvious advantage of a low latency and high bandwidth connection, in the case of the commercial provider we rely only on the General Purpose Network.
    In this presentation we describe the setup phase and the first results of these installations, started in the last quarter of 2015, focusing on the issues that we had to deal with and discussing the measured results in terms of efficiency.
    
    Speaker: Andrea Chierici (INFN-CNAF)
    
    20161020_HEPIX_extending_farm.pptx
- 17:25 → 17:50
  Security & Networking Building 50 Auditorium
  
  Building 50 Auditorium
  
  LBNL
  
  Berkeley, CA 94720
  - 17:25
    
    Effective and non-intrusive security within NERSC’s Open Science HPC environment 25m
    
    Providing effective and non-intrusive security within NERSC’s Open
    Science HPC environment introduces a number of challenges for both
    researchers and operational personnel. As what constitutes HPC expands
    in scope and complexity, the need for timely and accurate decision
    making about user activity remains unchanged. This growing complexity
    is balanced against a backdrop of routine user and application
    attacks, which remain surprisingly effective over time.
    
    This presentation describes current efforts at NERSC to maintain
    system integrity without getting in the way of the science being done
    here. These efforts include network monitoring, two-factor
    authentication, and SSH and host-based data analysis.
    
    Speaker: Abe Singer (Lawrence Berkeley Lab)
    
    2016-10-HEPIX-NERSC-security.pdf
    
    2016-10-HEPIX-NERSC-security.pptx
Friday 21 October
- 09:00 → 10:15
  Grid, Cloud and Virtualisation Building 50 Auditorium
  
  Building 50 Auditorium
  
  LBNL
  
  Berkeley, CA 94720
  - 09:00
    
    On-demand provisioning of HEP compute resources on cloud sites and shared HPC centers 25m
    
    This contribution reports on solutions, experiences and recent developments in the dynamic, on-demand provisioning of remote computing resources for analysis and simulation workflows. The local resources of a physics institute are extended by private and commercial cloud sites, ranging from desktop clusters and institute clusters to HPC centers.
    
    We report on recent experience from incorporating a remote HPC center (NEMO Cluster, Freiburg University) and resources dynamically requested from a commercial provider (1&1 Internet SE), which have been seamlessly tied together with the ROCED scheduler [1] such that, from the user perspective, local and remote resources form a uniform, virtual computing cluster with a single point-of-entry. On a local test system, the usage of Docker containers has been explored and shown to be a viable and light-weight alternative to full virtualization solutions in trusted environments.
    
    [1] O. Oberst et al., "Dynamic Extension of a Virtualized Cluster by using Cloud Resources", J. Phys.: Conf. Ser. 396(3), 032081 (2012).
    
    Speaker: Andreas Petzold (KIT - Karlsruhe Institute of Technology (DE))
    
    kit-cloud-hepix-lbl-2016.pdf
  - 09:25
    
    Update on HNSciCloud project 25m
    
    Overview of what has happened in HNSciCloud over the last five months
    
    Speaker: Helge Meinhard (CERN)
    
    2016-10-21-HEPiX-HNSciCloud.pdf
  - 09:50
    
    The advances in IHEP Cloud facility 25m
    
    At IHEP, the growing number of large scientific facilities requires more computing resources, and managing resources at this scale requires an efficient and flexible system architecture. Virtual computing through cloud technology is one approach. IHEPCloud is a private IaaS cloud that supports multiple users and multiple projects to achieve virtual computing. In this paper, we describe the infrastructure of the virtual computing cluster at IHEP and discuss the work we have done. We also show performance tests for BES jobs. IHEPCloud has been online since November 2014 and works well, and the performance penalty is acceptable.
    
    Speaker: Tao Cui (IHEP(Institute of High Energy Physics, CAS,China))
    
    The advances in IHEP Cloud facility -HEPIX-Cuitao-final20161021.pdf
    
    The advances in IHEP Cloud facility -HEPIX-Cuitao-final20161021.pptx
- 10:15 → 10:45
  
  Coffee Break 30m Building 50 Auditorium
  
  Building 50 Auditorium
  
  LBNL
  
  Berkeley, CA 94720
- 10:45 → 12:00
  Grid, Cloud and Virtualisation Building 50 Auditorium
  
  Building 50 Auditorium
  
  LBNL
  
  Berkeley, CA 94720
  - 10:45
    
    Running HEP Workloads on the NERSC HPC Systems 25m
    
    Running HEP workloads on a Cray system can be challenging, since these systems typically don't look much like a standard Linux system. This presentation will describe several tools NERSC has deployed to enhance HEP and other data-intensive computing: Shifter, a Docker-like container technology developed at NERSC; the Burst Buffer, a very fast I/O layer; and a software-defined network that allows high-speed connections to the outside world. We will give an overview of the software and hardware architecture, deployment, and performance of these services.
    
    Speaker: Tony Quan (LBL)
    
    TonyQuan_HEPIX2016_Shifter_vF12.pdf
  - 11:10
    
    Further Adventures in Container Orchestration at RAL 25m
    
    We provide an update on our continued experiments with container orchestration at the RAL Tier 1.
    
    Speaker: Ian Collier (STFC - Rutherford Appleton Lab. (GB))
    
    HEPiX2016Oct_Containers_RAL-ADL.pdf
    
    HEPiX2016Oct_Containers_RAL-ADL.ppt
  - 11:35
    
    CSNS Computing Environment Based on OpenStack 25m
    
    OpenStack is open-source software for creating private and public clouds. It controls large pools of compute, storage, and networking resources throughout a data center, managed through a dashboard or via the OpenStack API. Hundreds of the world’s largest brands rely on OpenStack to run their businesses every day, reducing costs and helping them move faster.
    We are applying this computing model to the China Spallation Neutron Source (CSNS) computing environment. Drawing on both research and practice, this paper first introduces the current use of cloud computing in High Energy Physics experiments and the special requirements of CSNS. Second, it presents our design and implementation of a cloud computing platform based on OpenStack, covering the cloud computing system framework, several improvements to OpenStack networking, the storage architecture, and other aspects. Finally, it discusses future prospects for the CSNS cloud computing environment.
    
    Speaker: Yakang Li (IHEP)
    
    HEPiX-CSNS-2016.pdf
- 12:00 → 12:30
  
  Closing and HEPiX Business Building 50 Auditorium
  
  Building 50 Auditorium
  
  LBNL
  
  Berkeley, CA 94720
  
  Convener: Tony Wong (Brookhaven National Laboratory)
  
  HEPIX_Fall_2016_Summary.pdf
  
  HEPIX_Fall_2016_Summary.pptx
