HEPiX Spring 2018 Workshop

Name: HEPiX Spring 2018 Workshop
Start: 2018-05-14T08:30:00-05:00
End: 2018-05-18T18:00:00-05:00
Location: University of Wisconsin-Madison

14 May 2018, 08:30 → 18 May 2018, 18:00 America/Chicago

Chamberlin Hall (University of Wisconsin-Madison)

Chamberlin Hall

University of Wisconsin-Madison

Madison, USA 43°4'25.8024''N 89°24'18.7776''W 43.073834, -89.405216

Helge Meinhard (CERN), Tony Wong (Brookhaven National Laboratory)

Description

HEPiX Spring 2018 at University of Wisconsin, Madison, USA

The HEPiX forum brings together worldwide Information Technology staff, including system administrators, system engineers, and managers from High Energy Physics and Nuclear Physics laboratories and institutes, to foster a learning and sharing experience between sites facing scientific computing and data challenges.

Participating sites include BNL, CERN, DESY, FNAL, IHEP, IN2P3, INFN, IRFU, JLAB, KEK, LBNL, NDGF, NIKHEF, PIC, RAL, SLAC, TRIUMF, many other research labs and numerous universities from all over the world.

The workshop was hosted by the University of Wisconsin-Madison, USA. It was organized by the Physics Department and the Center for High Throughput Computing (CHTC).

HEPiX Spring 2018 was proudly sponsored by DELL and Kingstar Computer (KSC) at the silver level.


Organisers

hepix-2018spring-support@hepix.org

Monday 14 May
- 08:30 → 09:00
  
  Registration 30m Chamberlin Hall
  
- 09:00 → 09:30
  Miscellaneous Chamberlin Hall
  
  
  HPC and AI Innovation Lab.mp4
  
  HPC_N_AI Innovation_Lab.HEPIX.pdf
  - 09:00
    
    Welcome to University of Wisconsin-Madison 15m
    
    HEPIXWelcome.pptx
  - 09:15
    
    Logistics and announcements 15m
    
    hepix_logistics.pdf
- 09:30 → 10:30
  Site reports Chamberlin Hall
  
  - 09:30
    
    Purdue University CMS T2 site report 20m
    
    Through participation in the Community Cluster Program of Purdue University, our Tier-2 center has for many years been one of the most productive and reliable sites for CMS computing, providing both dedicated and opportunistic resources to the collaboration. In this report we will present an overview of the site, review the successes and challenges of the last year of operation, and outline the perspectives and plans for future developments.
    
    Speaker: Stefan Piperov (Purdue University (US))
    
    HEPiX2018_T2_US_Purdue.pdf
    
    Purdue Site Report.mp4
  - 09:50
    
    BNL Site Report 20m
    
    Updates from BNL since the KEK meeting
    
    Speaker: David Yu (BNL)
    
    BNL_Site_Report_HEPiX_Spring_2018.pdf
    
    BNL Site Report.mp4
  - 10:10
    
    AGLT2 Site Update 20m
    
    We will present an update on our site since the Fall 2017 report, covering our changes in software, tools and operations.
    
    Some of the details to cover include the enabling of IPv6 for all of our AGLT2 nodes, our migration to SL7, exploration of the use of Bro/MISP at the UM site, the use of Open vSwitch on our dCache storage and information about our newest hardware purchases and deployed middleware.
    
    We conclude with a summary of what has worked and what problems we encountered and indicate directions for future work.
    
    Speaker: Shawn Mc Kee (University of Michigan (US))
    
    AGLT2SiteReport-HEPiXSpring2018.pdf
    
    AGLT2SiteReport-HEPiXSpring2018.pptx
    
    AGLT2 Site Report.mp4
- 10:30 → 11:00
  
  Coffee break 30m Chamberlin Hall
  
- 11:00 → 12:00
  Site reports Chamberlin Hall
  
  - 11:00
    
    University of Nebraska CMS Tier2 Site Report 20m
    
    Updates from T2_US_Nebraska covering our experiences operating CentOS 7 + Docker/Singularity, random dabbling with SDN to better HEP transfers, involvement with the Open Science Grid, and trying to live the IPv6 dream.
    
    Speaker: Garhan Attebury (University of Nebraska Lincoln (US))
    
    Nebraska Site Report.mp4
    
    T2_US_Nebraska HEPiX 2018 Spring.pdf
  - 11:20
    
    PDSF Site Report 20m
    
    PDSF, the Parallel Distributed Systems Facility, was moved to Lawrence Berkeley National Lab from Oakland, CA in 2016. The cluster has been in continuous operation since 1996, serving high energy physics research. The cluster is a tier-1 site for STAR, a tier-2 site for ALICE and a tier-3 site for ATLAS.
    
    This site report will describe lessons learned and challenges met when migrating from Univa GridEngine to the Slurm scheduler, experiences running containerized software stacks using Shifter, as well as upcoming changes to systems management and the future of PDSF.
    
    Speaker: Georg Rath (Lawrence Berkeley National Laboratory)
    
    PDSF Site Report.mp4
    
    PDSF Site Report.pdf
  - 11:40
    
    IHEP Site Report 20m
    
    The computing center of IHEP maintains an HTC cluster with 10,000 CPU cores and a site with about 15,000 CPU cores and more than 10 PB of storage. The presentation will cover the site's progress and future plans.
    
    Speaker: Jingyan Shi (IHEP)
    
    ihep_site_report_2018spring.pdf
    
    ihep_site_report_2018spring.pptx
    
    IHEP Site Report.mp4
- 12:00 → 14:00
  
  Lunch break 2h Chamberlin Hall
  
- 14:00 → 15:20
  End-user services and operating systems Chamberlin Hall
  
  - 14:00
    
    Run the latest software on a stable environment - A simpler way 20m
    
    What do our users want?
    One group wants the latest version of foo, but the stable version of bar.
    The other group wants the latest version of bar, but the old version of foo.
    
    What have we tried?
    SCL
    SCLs are great in theory, but in practice they are hard on the packagers. They also make developers jump through several hoops. Something developed in an SCL environment often wouldn't translate to a non-SCL environment.
    Containers
    Containers are also great in theory. They are especially good at letting people run their code in exactly the same environment on different machines. But developers still have to jump through many hoops to develop on SCLs. And we often restrict which versions of foo and bar are available in those containers.
    Tarballs and Zip files
    We admit it, that's what our developers are really doing. They are pulling down what they need, from who knows where, writing their code around it, and then asking us (or you) to support it. This is very bad for security, as well as for administrators trying to duplicate the environment on another machine.
    
    Modules - The simpler way
    The easiest way to explain modules is that they are like yum groups, done right.
    Using dnf, admins are able to install nodejs 6 or nodejs 9. They aren't installed over in /opt/; they are installed in their usual place in /usr/.
    Are you ready to move from python36 to python37? Just change the version of the python module and dnf will change it all.
    Users will only be allowed to have one version of python or nodejs, just like normal python or nodejs. But the developers (or their code) won't have to do anything special to use them.
    
    This presentation will go through our ups and downs as we've worked on getting a new technology created to help our users.
    
    https://docs.pagure.org/modularity/
    
    Speaker: Troy Dawson
    
    latest-on-stable.odp
    
    latest-on-stable.pdf
  - 14:20
    
    Scientific Linux update 20m
    
    Updates on the status of Scientific Linux
    
    Speaker: Bonnie King (Fermilab)
    
    SL-HEPiX-May-2018.pdf
    
    SL-HEPiX-May-2018.pptx
  - 14:40
    
    CC-IN2P3 User Portal 20m
    
    CC-IN2P3 is one of the largest academic data centers in France. Its main mission is to provide the particle, astroparticle and nuclear physics community with IT services, including large-scale compute and storage capacities. We are a partner for dozens of scientific experiments and hundreds of researchers who make daily use of these resources. The CC-User Portal project's goal is to develop a web portal providing users with a single entry point to monitor their activity and incidents, to receive vital information, and to find the links they need to access and use our services efficiently.
    
    During HEPiX Fall 2017, we presented our first developments and shared with the community our thoughts on how to display the information to the users. With this presentation we would like to show what is now deployed in production and which features we are already developing to complete the web portal offering and meet the users' needs.
    
    Speaker: Gino Marchetti (CNRS)
    
    CCIN2P3_UserPortal_Marchetti_HEPiX2.pdf
    
    CC IN2P3 User Portal.mp4
  - 15:00
    
    TRIDENT Tool for collecting and understanding performance hardware counters 20m
    
    Trident is a tool that uses low-level metrics derived from hardware counters to understand core, memory and I/O utilisation and bottlenecks. Collecting time series of these low-level counters does not induce significant overhead on the execution of the application.
    
    The Understanding Performance team is investigating a new node characterisation tool, "Trident", that can look at various low-level metrics with respect to the core, memory and I/O. Trident uses a three-pronged approach to analyse a node's utilisation and understand the stress on different parts of the node for a given job. Currently core metrics such as memory bandwidth, core utilisation, active processor cycles, etc., are being collected. Interpretation of this data is often non-intuitive. The tool preprocesses the data to make it usable by developers and site managers without the need for in-depth expertise in CPU and systems architecture details.
    
    Speaker: Servesh Muralidharan (CERN)
    
    HEPiX-Worskshop-May18.pdf
    
    HEPiX-Worskshop-May18.pptx
    
    TRIDENT - collecting and understanding performance hardware counters.mp4
- 15:20 → 15:50
  
  Coffee break 30m Chamberlin Hall
  
- 15:50 → 17:30
  Networking and security Chamberlin Hall
  
  - 15:50
    
    WLCG/OSG Networking Update 20m
    
    WLCG relies on the network as a critical part of its infrastructure and therefore needs to guarantee effective network usage and prompt detection and resolution of any network issues, including connection failures, congestion and traffic routing. The OSG Networking Area is a partner of the WLCG effort and is focused on being the primary source of networking information for its partners and constituents. We will report on the changes and updates that have occurred since the last HEPiX meeting.
    
    The WLCG Network Throughput working group was established to ensure sites and experiments can better understand and fix networking issues. In addition, it aims to integrate and combine all network-related monitoring data collected by the OSG/WLCG infrastructure from both network and transfer systems. This has been facilitated by the already existing network of the perfSONAR instances that is being commissioned to operate in full production.
    
    We will provide a status update on the LHCOPN/LHCONE perfSONAR infrastructure as well as cover recent changes in the higher level services due to reorganisation of the OSG. This will include details on the central service migrations, updates to the dashboards; updates and changes to the Web-based mesh configuration system and details on the newly established pipeline for processing perfSONAR results.
    
    In addition, we will provide an overview of the recent major network incidents that were investigated with the help of perfSONAR infrastructure and provide information on changes that will be included in the next perfSONAR Toolkit version 4.1. We will also cover the status of our WLCG/OSG deployment and provide some information on our future plans.
    
    Speaker: Shawn Mc Kee (University of Michigan (US))
    
    OSG_WLCG-net-update-HEPiXSpring2018.pdf
    
    OSG_WLCG-net-update-HEPiXSpring2018.pptx
    
    WLCG, OSG Networking Update.mp4
  - 16:10
    
    Deployment of IPv6 on WLCG - an update from the HEPiX IPv6 working group 20m
    
    For several years the HEPiX IPv6 Working Group has been testing WLCG services to ensure their IPv6 compliance. The transition of WLCG central and storage services to dual-stack IPv4/IPv6 is progressing well, thus enabling the use of IPv6-only CPU resources as agreed by the WLCG Management Board and presented by us at previous HEPiX meetings.
    
    By April 2018, all WLCG Tier 1 data centres have provided access to their services over IPv6. The LHC experiments have requested all WLCG Tier 2 centres to provide dual-stack access to their storage by the end of LHC Run 2. The working group, driven by the requirements of the LHC VOs to be able to use IPv6-only opportunistic resources, continues to encourage wider deployment of dual-stack services and has been monitoring the transition. We will present the progress of the transition to IPv6.
    
    Speaker: Dave Kelsey (STFC - Rutherford Appleton Lab. (GB))
    
    Deployment of IPv6 on WLCG.mp4
    
    Kelsey14may18.pdf
    
    Kelsey14may18.pptx
  - 16:30
    
    IPv6 Deployment Experience at the GridKa Tier-1 at KIT 20m
    
    Recently, we've deployed IPv6 for the CMS dCache instance at KIT. We've run into a number of interesting problems with the IPv6 setup we had originally chosen. The presentation will detail the lessons we've learned and the resulting redesign of our IPv6 deployment strategy.
    
    Speaker: Andreas Petzold (KIT - Karlsruhe Institute of Technology (DE))
    
    gridka-ipv6-20180511.pdf
    
    IPv6 Deployment at GridKa T1 at KIT.mp4
  - 16:50
    
    Computer Security Update 20m
    
    This presentation provides an update on the global security landscape since the last HEPiX meeting. It describes the main vectors of risk to and compromise in the academic community, including lessons learnt, and presents interesting recent attacks while providing recommendations on how best to protect ourselves. It also covers security risk management in general, as well as the security aspects of current hot topics in computing and computer security.
    
    This talk is based on contributions and input from the CERN Computer Security Team.
    
    Speaker: Stefan Lueders (CERN)
    
    Computer Security Update.mp4
    
    Situational Awareness @ HEPix (2018).pdf
    
    Situational Awareness @ HEPix (2018).pptx
- 18:00 → 21:00
  
  Welcome reception 3h
Tuesday 15 May
- 08:30 → 09:00
  
  Registration 30m Chamberlin Hall
  
- 09:00 → 10:20
  Site reports Chamberlin Hall
  
  - 09:00
    
    DESY Site Report 20m
    
    News about what happened at DESY during the last months
    
    Speaker: Timm Essigke (DESY)
    
    HepixSpring2018.odp
    
    HepixSpring2018.pdf
  - 09:20
    
    CERN Site Report 20m
    
    News from CERN since the HEPiX Fall 2017 workshop at KEK, Tsukuba, Japan.
    
    Speaker: Andrei Dumitru (CERN)
    
    CERN Site Report - HEPiX Spring 2018.pdf
  - 09:40
    
    INFN-T1 Site report 20m
    
    A brief update on the INFN-T1 site: our current status and what remains to be done to reach 100% functionality.
    
    Speaker: Stefano Dal Pra (INFN)
    
    20180514_InfnT1_site_report.pptx
    
    INFN T1 Site Report.mp4
  - 10:00
    
    PIC site report 20m
    
    News from PIC since the HEPiX Fall 2017 workshop at KEK, Tsukuba, Japan.
    
    Speaker: Jose Flix Molina (Centro de Investigaciones Energéticas Medioambientales y Tecno)
    
    PIC_Report_HEPIX_Madison.pdf
    
    PIC Site Report.mp4
- 10:20 → 10:50
  
  Coffee break 30m Chamberlin Hall
  
- 10:50 → 11:50
  Site reports Chamberlin Hall
  
  - 10:50
    
    Nikhef Site Report 20m
    
    Site report from Nikhef
    
    Speaker: Bart van der Wal (Nikhef)
    
    Nikhef_Site_Report_1_2.pdf
    
    Nikhef_Site_Report_1_2.pptx
    
    Nikhef Site Report.mp4
  - 11:10
    
    RAL Site Report 20m
    
    Update on activities at RAL
    
    Speaker: Martin Bly (STFC-RAL)
    
    2018-05 HEPiX Madison - RAL Site Report(2).pdf
    
    2018-05 HEPiX Madison - RAL Site Report(2).pptx
    
    2018-05 HEPiX Madison - RAL Site Report.pdf
    
    2018-05 HEPiX Madison - RAL Site Report.pptx
    
    RAL Site Report.mp4
  - 11:30
    
    FZU site report 20m
    
    Recently we deployed a new cluster of worker nodes with 10 Gbps network connections and new disk servers for DPM and xrootd. I will also discuss the migration from Torque/Maui to the HTCondor batch system.
    
    Speaker: Jiri Chudoba (Acad. of Sciences of the Czech Rep. (CZ))
    
    FZU_site_report-chudoba.pdf
    
    FZU_site_report-chudoba.pptx
    
    FZU Site Report.mp4
- 11:50 → 12:10
  Networking and security Chamberlin Hall
  
  - 11:50
    
    News from the world of Federated Identity Management and AAI 20m
    
    There are many ongoing activities related to the development and deployment of Federated Identities and AAI (Authentication and Authorisation Infrastructures) in research communities and cyberinfrastructures, including WLCG and others. This talk will give a high-level overview of the status of at least some of the current activities in FIM4R, AARC, WLCG and elsewhere.
    
    Speaker: Dave Kelsey (STFC - Rutherford Appleton Lab. (GB))
    
    Kelsey15may18.pdf
    
    Kelsey15may18.pptx
    
    News from the world of Federated Identity Management and AAI.mp4
- 12:10 → 14:00
  
  Lunch break 1h 50m Chamberlin Hall
  
- 14:00 → 15:00
  Networking and security Chamberlin Hall
  
  - 14:00
    
    Network Functions Virtualisation Working Group Update 20m
    
    High Energy Physics (HEP) experiments have greatly benefited from a strong relationship with Research and Education (R&E) network providers and, thanks to projects such as LHCOPN/LHCONE and REN contributions, have enjoyed significant capacities and high-performance networks for some time. RENs have been able to continually expand their capacities to over-provision the networks relative to the experiments' needs and were thus able to cope with the recent rapid growth of the traffic between sites, both in terms of achievable peak transfer rates and in total amount of data transferred. For some HEP experiments this has led to designs that favour remote data access, where the network is considered an appliance with almost infinite capacity. There are reasons to believe that the network situation will change, for both technological and non-technological reasons, starting already in the next few years. Various factors are in play, for example the anticipated growth of non-HEP network usage as other large-data-volume sciences come online, the introduction of cloud and commercial networking and their respective impact on usage policies and security, as well as technological limitations of optical interfaces and switching equipment.
    
    As the scale and complexity of the current HEP network grows rapidly, new technologies and platforms are being introduced, collectively called Network Functions Virtualisation (NFV), ranging from software-based switches such as Open vSwitch and Software Defined Network (SDN) controllers such as OpenDaylight up to full platform-based open solutions such as Cumulus Linux. With many of these technologies becoming available, it’s important to understand how we can design, test and develop systems that could enter existing production workflows while at the same time changing something as fundamental as the network that all sites and experiments rely upon. In this talk we’ll give an update on the Network Functions Virtualisation (NFV) WG that was established at the last HEPiX meeting. We'll provide details on its mandate, objectives and organisation of work, as well as areas of interest that were already discussed and plans for the near-term future.
    
    Speaker: Shawn Mc Kee (University of Michigan (US))
    
    HEPiX Network Functions Virtualisation Working Group Update.pdf
    
    Network Functions Virtualization Working Group Update.mp4
  - 14:20
    
    Recent status of KEK network 20m
    
    The Belle II detector is already taking data in cosmic-ray tests and is about to record beam data. Network connectivity is becoming more important for Belle II than for any other experiment at KEK. It matters not only for data transfer but also for researchers who monitor the condition of the detectors from off-site.
    We will report on the present status of the campus network and the upgrade plan for this summer.
    
    Speaker: Soh Suzuki
    
    HEPiX-2018-Spring-KEK-yamagata-20180516.pdf
    
    Recent status of KEK network.mp4
  - 14:40
    
    Cyberinfrastructure and China Science and Technology Cloud Plan in Chinese Academy of Sciences 20m
    
    The Chinese Academy of Sciences (CAS) has 104 research institutes, 12 branch academies, three universities and 11 supporting organizations in 23 provincial-level areas throughout the country. These institutions are home to more than 100 national key labs and engineering centers as well as nearly 200 CAS key labs and engineering centers. Altogether, CAS comprises 1,000 sites and stations across the country.
    
    As research methods develop, we are entering the fourth paradigm of scientific research: data-intensive science. Data, computing, and the network that links them play an increasingly important role in scientific research. All the institutes have various demands on cyberinfrastructure.
    
    The China Science and Technology Cloud (CSTC) was constructed to meet the needs of the research institutes under the Chinese Academy of Sciences and of the wider scientific and technological community in China. It is an IT-based resource management and cloud service platform with smart resource dispatching and user self-service. It builds a new-generation information infrastructure that is high-speed, dynamic and self-adaptive, speeds up the development of the state-level high-performance computing environment, and provides one-stop service for scientific computing. It integrates cloud computing and cloud storage facilities to enhance the data recovery capabilities of the whole academy’s scientific data assets and application systems. It also integrates and gathers various scientific and technological information resources, and sets up a smart cloud service platform to provide scientific and technological resources and information services.
    
    CSTC has maintained long-term partnerships with world-class research organizations such as the U.S. National Center for Supercomputing Applications and Forschungszentrum Jülich. The predecessor of CSTC, the China Science and Technology Network (CSTNET), was one of the founding organizations of the Global Ring Network for Advanced Application Development (GLORIAD), which connects North America with 10 Gb/s bandwidth.
    Based on CSTC, we are building an expandable basic environment that can host Big Data resources and support Big Data analysis and processing, enabling the management and online processing of massive scientific data. Targeting relevant disciplines, as well as major research projects and special projects of the state and the academy, we will deploy a set of Big Data driven scientific research and application services in fields such as astronomy, biology and high-energy physics.
    
    CSTC plans to build a test bed for network research and big science research, for example a Dynamic Virtual Dedicated Network for VLBI research, a DMZ for Advanced Light Source, and an Open Network Environment for LHC. We look forward to further cooperation with research institutes worldwide.
    
    Speaker: Dr YANG WANG (Computer Network Information Center, Chinese Academy of Sciences)
    
    Cyberinfrastructure and China Science and Technology Cloud Plan in Chinese Academy of Sciences.mp4
    
    Cyberinfrastructure and China Science and Technology Cloud Plan in Chinese Academy of Sciences.pdf
- 15:00 → 15:45
  
  Coffee break 45m Chamberlin Hall
  
- 15:45 → 16:25
  Networking and security Chamberlin Hall
  
  - 15:45
    
    Network status at IHEP and LHCONE progress in China 20m
    
    This talk presents the network status at IHEP and the progress of LHCONE in China.
    
    Speaker: Shan Zeng (Chinese Academy of Sciences (CN))
    
    Network status at IHEP and LHCONE progress in China@HEPiX 2018 Spring.pdf
    
    Network status at IHEP and LHCONE progress in China.mp4
  - 16:05
    
    SDN implementation plan in China Science and Technology Network 20m
    
    Scientific activities generate huge amounts of data that must be transferred to other sites for analysis. Traditional networking infrastructure has a fixed architecture and cannot satisfy such real-time, high-quality transfer requirements.
    
    The China Science and Technology Network (CSTNet) was constructed to meet the needs of the research institutes under the Chinese Academy of Sciences and of the wider scientific and technological community in China. CSTNet plans to construct a new-generation infrastructure using new technologies such as SDN and NFV.
    
    CSTNet has started to build a new NOC operations system for real-time measurement, monitoring and management of network flows. The new system can provide a user interface that lets users submit networking requirements dynamically and adaptively, so that some network configuration tasks take effect promptly without the need to connect to a console.
    
    Dynamic network management will accelerate the integration of cloud computing and cloud storage facilities, because data transfer commands can be delivered and implemented faster, in service of research in fields such as astronomy, biology and high-energy physics.
    
    Speaker: JINGJING LI
    
    SDN implementation plan in China Science and Technology Network.mp4
    
    SDN implementation plan in China Science and Technology Network.pdf
- 16:25 → 17:05
  Storage and file systems Chamberlin Hall
  
  - 16:25
    
    OpenAFS Release Team report 20m
    
    A report from the OpenAFS Release Team on recent OpenAFS releases, including the OpenAFS 1.8.0 release, the first major release in several years. Topics include acknowledgement of contributors, descriptions of issues recently resolved, and a discussion of commits under review for post 1.8.0.
    
    Speaker: Mr Michael Meffie (Sine Nomine Associates)
    
    hepix-2018-openafs-rel-team.pdf
    
    OpenAFS Release Team Report.mp4
  - 16:45
    The OpenAFS Foundation 20m
    
    We would like to have one of the Board members of The OpenAFS Foundation, Inc, speak about this 501(c)(3), US-based, non-profit organization dedicated to fostering the stability and growth of OpenAFS, an open source implementation of the AFS distributed network filesystem. The OpenAFS Foundation adopted a three-fold mission: to attract and increase the community of OpenAFS users, to foster the OpenAFS community of experts, and to nurture and evolve the OpenAFS technology; each will be explained briefly.
    
    We would like to ask for help from the scientific community and ask its researchers to:
    
    Contribute to the OpenAFS code, as such contributions are critical for the survival and improvement of the OpenAFS technology. The Foundation has obtained insurance to protect all contributors from potential liability and infringement of IP lawsuits.
    
    Review code, as their feedback is valuable and appreciated in several ways.
    
    Write documentation for already existing code, which is desperately needed.
    
    Communicate the changes to their computing needs, and how they would like OpenAFS to be even more useful to them in the future.
    
    Help craft code specifications and/or code design.
    
    Become a guardian and as such, an active, long-term champion shaping the future viability and well-being of OpenAFS technology.
    
    Donate and/or identify organizations possibly willing and able to contribute funds to sustain a lean operation and/or to fund specific development efforts to be assigned in an open bid process.
    
    The presenter will be either Todd DeSantis or Margarete Ziemer, both Directors and Board members of the Foundation.
    
    Speaker: Dr Margarete Ziemer (Sine Nomine Associates)
    
    HEPiX2018-OpenAFSFoundation-PresentationSlides.pdf
    
    HEPiX2018-OpenAFSFoundation-PresentationSlides.pptx
    
    The OpenAFS Foundation.mp4
- 17:15 → 19:00
  
  Board meeting 1h 45m Chamberlin Hall (room 4274)
  
  
  By invitation
Wednesday 16 May
- 08:30 → 09:00
  
  Registration 30m Chamberlin Hall
  
- 09:00 → 10:20
  Storage and file systems Chamberlin Hall
  
  - 09:00
    WLCG Archival Storage group report 20m
    
    The group has been formed to tackle two main themes:
    
    establish a knowledge-sharing community for those operating archival storage for WLCG
    
    understand how to monitor usage of archival systems and optimise their exploitation by experiments
    
    I will report on the recent activities of this group.
    
    Speaker: Vladimir Bahyl (CERN)
    
    WLCG Archival Storage Group Report.mp4
    
    WLCG_Archival_Storage_Group_report.pdf
    
    WLCG_Archival_Storage_Group_report.pptx
  - 09:20
    
    The Software Defined Online Storage System at the GridKa WLCG Tier-1 Center 20m
    
    The GridKa computing center serves the ALICE, ATLAS, CMS and LHCb experiments with compute and storage resources as one of the biggest WLCG Tier-1 centers worldwide. It is operated by the Steinbuch Centre for Computing at Karlsruhe Institute of Technology in Germany. In April 2017 a new online storage system was put into operation. In its current stage of expansion it offers the HEP experiments a capacity of 23 petabytes of online storage distributed over 16 redundant storage servers with 3900 disks and 50TB SSDs. The storage is connected via two redundant InfiniBand fabrics to 44 file servers, which in turn are each connected via 40 Gbit/s and several 100 Gbit/s Ethernet uplinks to the GridKa backbone network. The whole storage is partitioned into a few large file systems, one for each experiment, using IBM Spectrum Scale as the software-defined storage base layer. The system offers a combined read-write performance of 70 GByte/s. It can be scaled transparently in both size and performance, making it possible to meet the growing online storage needs of the LHC experiments in particular in the coming years.
    In this presentation we discuss the general architecture of the storage system and present first experiences with the performance of the system in production use. In addition we present the current plans for expansion of the system.
    
    Speaker: Jan Erik Sundermann (Karlsruhe Institute of Technology (KIT))
    
    20180516 - HEPiX 2018 Madison.pdf
    
    20180516 - HEPiX 2018 Madison.pptx
    
    The Software Defined Online Storage System at the GridKa WLCG Tier-1 Center.mp4
  - 09:40
    
    Next generation of large-scale storage services at CERN 20m
    
    CERN IT Storage (IT/ST) group leads the development and operation of large-scale services based on EOS for the full spectrum of use-cases at CERN and in the HEP community. IT/ST group also provides storage for other internal services, such as OpenStack, using a solution based on Ceph. In this talk we present the current operational status, ongoing development work and future architecture outlook for next generation storage services for users based on EOS, a technology developed and integrated at CERN.
    
    EOS is the home for all physics data-stores for LHC and non-LHC experiments (at present 250PB storage capacity). It is designed to operate at high data rates for experiment data-taking while running concurrent complex production work-loads. EOS also provides a flexible distributed storage back-end and architecture with plugins for tape archival (CTA, the evolution of and replacement for CASTOR), synchronization and sharing services (CERNBox) and general-purpose filesystem access for home directories (FUSE for Linux and SMB Gateways for Windows and Mac).
    
    CERNBox is the cloud storage front-end for desktop, mobile and web access focused on personal user files, general-purpose project spaces and smaller physics datasets (at present 12K user accounts and 500M files). CERNBox provides simple and uniform access to storage on all modern devices and operating systems. CERNBox is also a hub for integration with other services: collaborative editing (MS Office365 and the alternatives Collabora and OnlyOffice); web-based analysis (SWAN Jupyter Notebooks with access to computational resources via Spark and Batch); and software distribution via CVMFS.
    
    This storage service ecosystem is designed to provide “total data access”: from end-user devices to geo-aware data lakes for WLCG and beyond. It also provides a foundation for strategic partnerships (AARNet, JRC, …), new communities such as CS3 (Cloud Storage and Synchronization Services) and new application projects such as Up2University (cloud storage ecosystem for education). CERN Storage technology has been showcased to work with commercial cloud providers such as Amazon, T-Systems (Helix Nebula) or COMTRADE (Openlab) and there is an increasing number of external sites testing the CERN storage service stack in their local computing centers.
    
    This strategy is proving very successful with users, and as a result storage services at CERN are seeing exponential growth: CERNBox alone grew by 450% in 2017. Growing overall demand drives the evolution of the service design and implementation of the full ecosystem: EOS core storage as well as CERNBox and SWAN. Recent EOS improvements include a new distributed namespace to provide scaling and high availability; a new robust FUSE module providing client-side caching, lower latency and more IOPS; a new workflow engine; and many more. CERNBox is moving to a micro-service oriented architecture and SWAN is being tested with Kubernetes container orchestration.
    
    New developments come together with a constant effort to streamline QA, testing and documentation as well as reduce manual configuration and operational effort for managing large-scale storage services.
    
    Speaker: Jakub Moscicki (CERN)
    
    Hepix-2018-NewGenStorageCERN.pdf
    
    Next generation of large-scale storage services at CERN.mp4
  - 10:00
    AFS Update: Spring 2018 20m
    
    Last May it was announced that "AFS" was awarded the 2016 ACM System Software Award. This presentation will discuss the current state of the AFS file system family including:
    
    IBM AFS 3.6
    
    OpenAFS
    
    kAFS
    
    AuriStor File System
    
    IBM AFS 3.6 is a commercial product no longer publicly available.
    
    OpenAFS is a fork of IBM AFS 3.6 available under the IBM Public License 1.0. The currently supported release branches are 1.6 and 1.8.
    
    AuriStorFS is a commercial file system that is backward compatible with both IBM AFS 3.6 and OpenAFS clients and servers. AFS cells hosted on AuriStorFS servers benefit from substantial improvements in
    
    kAFS is an AFS and AuriStorFS client distributed as part of the mainline Linux kernel distribution. kAFS shares no source code with IBM AFS or OpenAFS.
    
    Speaker: Mr Jeffrey Altman (AuriStor, Inc.)
    
    AFS Update: Spring 2018.mp4
    
    AFS Update - Spring 2018 (Office 365)
    
    AFS Update - Spring 2018 (PDF)
- 10:20 → 10:50
  
  Coffee break 30m Chamberlin Hall
  
- 10:50 → 11:50
  Storage and file systems Chamberlin Hall
  
  - 10:50
    
    Data To Network: building balanced throughput storage in a world of increasing disk sizes 20m
    
    The ever-decreasing cost of high capacity spinning media has resulted in a trend towards very large capacity storage ‘building blocks’. Large numbers of disks, with up to 60 drives per enclosure being more or less standard, indeed allow for dense solutions, maximizing storage capacity in terms of floor space, and can in theory be packed almost exclusively with disks. The result is building blocks with a theoretical gross capacity of about 180 TByte per unit height when employing 12 TByte disks. This density comes at a cost, though: getting the data to and from the disks, via front-end storage management software and through a network, has different scaling characteristics than the gross storage density, and as a result maintaining performance in terms of throughput per storage capacity is an ever more complex challenge. At Nikhef, and for the NL-T1 service, we aim to maintain 12 MiB/s/TiB combined throughput, supporting at least 2 read streams and 1 write stream per 100 TiB of net storage, from any external network source down to the physical disks. This combined read-write operational pattern in particular poses challenges not usually found in commercial deployments. Yet this is the pattern most commonly seen for our scientific applications in the Dutch National e-Infrastructure.
    In this study we looked at each of the potential bottlenecks in such a mixed-load storage system: network throughput; limitations in the system bus between CPU, network card, and disk subsystem; different disk configuration models (JBOD, erasure encodings, hardware and software RAID); and the effect on processor load in different CPU architectures. We present the results of different disk configurations and show the limitations of commodity redundancy technologies, how they affect processor load in both x86-64 and PowerPC systems, and how the corresponding system bus design impacts overall throughput.
    Combining network and disk performance optimizations, we show how high-density commodity components can be combined to build a cost-cutting system without bottlenecks, offering constant-throughput multi-stream performance with over 700 TiB net in just 10U and able to keep a 100 Gbps network link full, as a reference architecture for everything from a single Data Transfer Node down to a real distributed storage cluster.
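
    The throughput-per-capacity target quoted above can be checked with a few lines of arithmetic. The sketch below is illustrative only: the 12 MiB/s/TiB target, the stream counts, and the 700 TiB and 100 Gbps figures come from the abstract, while the function names and the assumption that a "Gbps" is a decimal gigabit per second are ours.

```python
import math

MIB = 1024 ** 2  # bytes in one MiB


def required_throughput_mib_s(capacity_tib, target_mib_s_per_tib=12.0):
    """Aggregate throughput (MiB/s) needed to meet the per-TiB target."""
    return capacity_tib * target_mib_s_per_tib


def mib_s_to_gbit_s(mib_s):
    """Convert MiB/s to decimal gigabits per second (assumed link unit)."""
    return mib_s * MIB * 8 / 1e9


def minimum_streams(capacity_tib, reads_per_100tib=2, writes_per_100tib=1):
    """Minimum concurrent (read, write) streams for a given net capacity."""
    blocks = capacity_tib / 100.0
    return (math.ceil(blocks * reads_per_100tib),
            math.ceil(blocks * writes_per_100tib))


if __name__ == "__main__":
    need = required_throughput_mib_s(700)
    print(f"{need:.0f} MiB/s = {mib_s_to_gbit_s(need):.1f} Gbit/s")
    print("streams (read, write):", minimum_streams(700))
```

    Under these assumptions the target for 700 TiB works out to roughly 70 Gbit/s with at least 14 read and 7 write streams, which is below the nominal rate of a single 100 Gbps link.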
    
    Speaker: Tristan Suerink (Nikhef National institute for subatomic physics (NL))
    
    Hepix-2018-Madison.pdf
  - 11:10
    
    Operating a large scale distributed XRootd cache across Caltech and UCSD 20m
    
    After the successful adoption of the CMS Federation, an opportunity arose to cache XRootD requests in Southern California. We present the operational challenges and the lessons learned from scaling a federated cache (a cache composed of several independent nodes), first at UCSD, and the scaling and network challenges of extending it to include the Caltech Tier-2 site. This would be a first-of-its-kind multi-site XRootD cache, which could potentially ease data management for CMS.
    
    Speaker: Edgar Fajardo Hernandez (Univ. of California San Diego (US))
    
    Operating a large scale distributed XRootd cache across Caltech and UCSD.mp4
    
    SoCalCache.pdf
  - 11:30
    AFS and Linux Containers 20m
    
    One future model of software deployment and configuration is containerization.
    
    AFS has been used for software distribution for many decades. Its global file namespace, the @sys path component substitution macro which permits file paths to be platform-agnostic, and the atomic publication model ("vos release") have proven to be critical components of successful software distribution systems that scale to hundreds of thousands of systems and have survived multiple OS and processor architecture changes.
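
    The effect of the @sys macro can be illustrated with a small user-space sketch. This is a simplification under stated assumptions: a real AFS cache manager performs the substitution during path lookup, and the system names, the example cell path, the ordered-list behaviour and the `expand_sys` helper below are hypothetical choices made for illustration.

```python
def expand_sys(path, sysnames, exists):
    """Return the first candidate path that exists, or None.

    Each "@sys" path component is replaced by a name from `sysnames`,
    tried in order. `exists` is a callable used to probe candidates,
    standing in for a real file system lookup.
    """
    for name in sysnames:
        parts = [name if part == "@sys" else part for part in path.split("/")]
        candidate = "/".join(parts)
        if exists(candidate):
            return candidate
    return None


if __name__ == "__main__":
    # Hypothetical published tree: only one platform build is present.
    published = {"/afs/example.org/sw/amd64_linux26/bin/tool"}
    resolved = expand_sys(
        "/afs/example.org/sw/@sys/bin/tool",
        ["arm64_linux26", "amd64_linux26"],  # hypothetical system names
        published.__contains__,
    )
    print(resolved)
```

    The same platform-agnostic path then resolves to a different directory on each client platform, so one published path can serve several architectures.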
    
    The AuriStorFS security model consisting of combined-identity authentication, multi-factor authorization, and mandatory security policies permits a global name space to be shared between internal, dmz and cloud; and to store a mix of open and restricted data.
    
    The combination of Linux Containers, the global AFS namespace, and the AuriStorFS security model is powerful, permitting the development of container-based software deployments that can safely bridge internal, dmz and cloud with reduced risk of data leaks.
    
    This session will discuss the most recent updates to AuriStorFS and the Linux kernel implementation of AF_RXRPC socket family and (k)AFS filesystem. A demonstration will be included consisting of:
    
    Containers with binary executable files stored in /afs
    
    Containers mounting private AFS volume for scratch space
    
    AuriStorFS and (k)AFS file system implementations running side-by-side
    
    Linux namespaces for /afs
    
    AuriStorFS milestones since HEPiX Spring 2017 include:
    
    Successful migration and replication of volumes exceeding 5.5TB. The largest production volume so far is 50TB with a 250TB volume
    
    Deployment of a single AuriStorFS cell spanning an internal data center, AWS and GCP with more than 25,000 nodes for distribution of software and configuration data.
    
    Meltdown and Spectre remediation. In response to a nearly 30% performance hit from Meltdown and Spectre, the AuriStor team optimized the Rx stack, Ubik database and fileserver to reduce the number of syscalls by more than 50%.
    
    AES-NI, SSSE3, AVX and AVX2 Intel processor optimization of AES256-CTS-HMAC-SHA1-96 cryptographic operations for kernel cache managers reduces computation time by 64%
    
    AF_RXRPC and kAFS highlights:
    
    IPv6 support for AuriStorFS
    
    dynamic root mount -o dyn
    
    @sys and @cell support
    
    multipage read and write support
    
    local hero directory caching
    
    per file acls (for AuriStorFS)
    
    server failover and busy volume retries
    
    Speaker: Mr Jeffrey Altman (AuriStor, Inc)
    
    AFS and Linux Containers.mp4
    
    Using /afs namespace and Linux Containers (Office 365)
    
    Using /afs namespace and Linux Containers (PDF)
- 11:50 → 12:30
  Computing and batch systems Chamberlin Hall
  
  - 11:50
    
    Benchmarking Working Group. An update 20m
    
    The benchmarking working group holds biweekly meetings. We are focusing on the health of HS06, on fast benchmarks, and on the study of a new benchmark to replace HS06, since SPEC has moved to a new family of benchmarks.
    
    Speaker: Michele Michelotto (Università e INFN, Padova (IT))
    
    Benchmarking Working Group. An update..mp4
    
    michelotto-BenchmarkWG-Hepix2018-Madison.pdf
    
    michelotto-BenchmarkWG-Hepix2018-Madison.pptx
  - 12:10
    
    HSF-WLCG Cost and Performance Modeling Working Group 20m
    
    The working group has been established and is now working towards a cost and performance model that makes it possible to quantitatively estimate the computing resources needed for HL-LHC and map them to the cost at specific sites.
    The group has defined a short and medium term plan and identified the main tasks. Teams with members from experiments and sites have formed around these tasks and started concrete work. We will report on the goals and status of the working group.
    
    Speaker: Jose Flix Molina (Centro de Investigaciones Energéticas Medioambientales y Tecno)
    
    16052018_Cost_and_Performance_Modeling_HEPiX_Madison_Workshop_JFlix.pdf
    
    HSF-WLCG Cost and Performance Modeling Working Group.mp4
- 12:30 → 14:00
  
  Lunch break 1h 30m Chamberlin Hall
  
- 14:00 → 15:40
  Computing and batch systems Chamberlin Hall
  
  - 14:00
    
    Changing Compute Landscape at Brookhaven 20m
    
    Computing is changing at BNL. We will discuss how we are restructuring our Condor pools and integrating them with new tools like Jupyter notebooks and with other resources like HPC systems run with Slurm.
    
    Speaker: William Edward Strecker-Kellogg (Brookhaven National Laboratory (US))
    
    Changing Compute Landscape at Brookhaven.mp4
    
    hepix-2018-spring-bnl.pdf
  - 14:20
    
    News from the DESY batch-clusters 20m
    
    The batch facilities at DESY are currently being enlarged significantly while at the same time being partly migrated from SGE to HTCondor.
    This is a short overview of what is happening on site in terms of Grid, local and HPC cluster development.
    
    Speaker: Christoph Beyer
    
    HEPIX_2018_talk.pdf
    
    News from the DESY batch-clusters.mp4
  - 14:40
    
    Batch on EOS Extra Resources moving towards production 20m
    
    At the last HEPiX meeting we described the results of a proof of concept study to run batch jobs on EOS disk server nodes. We have since moved towards a production-level configuration, and the first pre-production nodes have been set up. Besides its relevance for CERN, this is also a more general step towards a hyper-converged infrastructure.
    
    Speaker: Markus Schulz (CERN)
    
    Batch on EOS Extra Resources moving towards production.mp4
    
    BatchOStoreHEPIX18v2.pdf
    
    BatchOStoreHEPIX18v2.pptx
  - 15:00
    
    Techlab benchmarking web portal 20m
    
    Techlab, a CERN IT project, is a hardware lab providing experimental systems and benchmarking data for the HEP community.
    
    Techlab is constantly on the lookout for new trends in HPC, cutting-edge technologies and alternative architectures, in terms of CPUs and accelerators.
    We believe that in the long run, a diverse offer and a healthy competition in the HPC market will serve science in particular, computing in general, and everyone in the end.
    For this reason, we encourage the use of not-quite-there-yet alternatives to the standard x86 quasi-monopoly, in the hope that in the near future, such alternative architectures can proudly compete, on an equal footing.
    We buy hardware, set it up, test and benchmark it, then make it available to members of the HEP community for porting and testing their scientific applications and algorithms. On a best-effort basis, we try and help users make the best out of the hardware we provide.
    
    To serve as a basis for hardware choice, we run extensive benchmarks on all the systems we can get our hands on, and share the results to help others make fully informed choices when buying hardware that will fit their computing needs. As a means to achieve this, we developed a benchmarking web portal, open to everyone in the HEP community, to upload and publish data about all kinds of hardware. It was built with security in mind, and provides fine-grained access control to encourage even people working on yet-unreleased hardware to contribute.
    As Techlab cannot possibly buy and test everything, it is our hope that this portal gets used by other HEP labs, and the database we build together becomes the 'one-stop shop' for benchmarking.
    
    This presentation both gives an overview of Techlab's benchmarking web portal — what and whom it is designed for, what we hope to achieve with it — and delves into the technology choices of the implementation.
    
    Speaker: Maxime Reis (CERN)
    
    hardwarelabs_benchmarking_website.pdf
    
    HEPiX_just_in_case_the_demo_effect_messes_with_me.mov
    
    Techlab benchmarking web portal.mp4
  - 15:20
    
    What's new in HTCondor? What is upcoming? 20m
    
    The goal of the HTCondor team is to develop, implement, deploy, and evaluate mechanisms and policies that support High Throughput Computing (HTC) on large collections of distributively owned computing resources. Increasingly, the work performed by the HTCondor developers is being driven by its partnership with the High Energy Physics (HEP) community.
    
    This talk will present recent changes and enhancements to HTCondor, including details on some of the enhancements created for the forthcoming HTCondor v8.8.0 release, as well as changes created on behalf of the HEP community. It will also discuss the upcoming HTCondor development roadmap, and seek to solicit feedback on the roadmap from HEPiX attendees.
    
    Speaker: Todd Tannenbaum (University of Wisconsin Madison (US))
    
    TannenbaumT_WhatsNew_HEPiX_Spring_2018.pdf
    
    TannenbaumT_WhatsNew_HEPiX_Spring_2018.pptx
    
    What's new in HTCondor? What is upcoming?.mp4
- 15:40 → 16:10
  
  Coffee break 30m Chamberlin Hall
  
- 16:10 → 17:50
  Computing and batch systems Chamberlin Hall
  
  - 16:10
    
    PDSF - Current Status and Migration to Cori 20m
    
    PDSF, the Parallel Distributed Systems Facility, has been in continuous operation since 1996 serving high energy physics research. It is currently a tier-1 site for STAR, a tier-2 site for ALICE and a tier-3 site for ATLAS. We are in the process of migrating the PDSF workload from a commodity cluster to Cori, a Cray XC40 system. The process will involve preparing containers that will allow the PDSF community to effectively move workloads between systems and other HPC centers. We are still discovering how best to optimize serial jobs in a parallel environment. The goal is to minimize service interruptions as we shut down the existing PDSF by the second half of 2019.
    
    Speaker: Tony Quan (LBL)
    
    Hepix_Spring2018_TonyQuan.pdf
    
    PDSF - Current Status and Migration to Cori.mp4
  - 16:30
    
    Swiss HPC Tier-2 @ CSCS 20m
    
    For the past 10 years, CSCS has been providing computational resources for the ATLAS, CMS, and LHCb experiments on a standard commodity cluster.
    The High Luminosity LHC upgrade (HL-LHC) presents new challenges and demands with a predicted 50x increase in computing needs over the next 8 to 10 years. High Performance Computing capabilities could help to meet these computing demands due to their ability to provide specialized hardware and economies of scale. For the past year, CSCS has been running the Tier-2 workload for these experiments on the flagship system Piz Daint, a Cray XC system.
    
    Speaker: Mr Dino Conciatore (CSCS (Swiss National Supercomputing Centre))
    
    Swiss HPC Tier-2 @ CSCS.mp4
    
    Swiss HPC Tier2 Hepix.pdf
    
    Swiss HPC Tier2 Hepix.pptx
  - 16:50
    
    HPL and HPCG Benchmark on BNL linux farm and SDCC 20m
    
    HPL and HPCG benchmarks have been run on Brookhaven National Laboratory SDCC clusters and various generations of Linux Farm nodes, and compared with HS06 results. While HPL results are more aligned with CPU/GPU performance, HPCG results are impacted by memory performance as well.
    
    Speaker: Dr Zhihua Dong
    
    HPL and HPCG Benchmark on BNL linux farm and SDCC.mp4
    
    HPL-HPCG-BNL.pdf
  - 17:10
    
    Fast Distributed Image Reconstruction using CUDA/MPI 20m
    
    In this work, we present a fast implementation for analytical image reconstruction from projections, using the so-called "backprojection-slice theorem" (BST). BST can produce reliable image reconstructions in a reasonable amount of time, before further decisions are taken. The BST is easy to implement and can be used to take fast decisions about the quality of the measurement, i.e., sample environment and beam-line conditions, among others. A synchrotron facility able to measure a three-dimensional dataset Y within a few seconds needs a fast reconstruction algorithm able to provide a fast "preview" of the tomography within the same amount of time. If the experimental conditions are not satisfactory, the quality of the reconstruction will decrease, and the researcher can decide either to make another scan, or to process the data later using advanced reconstruction algorithms or even high quality segmentation methods. The difficulty here is that inversion algorithms depend on the backprojection operator, which is defined as an average over all the x-rays passing through a given pixel. Backprojection has a high computational complexity of $O(N^3)$ for an image of $N^2$ pixels. The brute-force approach to computing the backprojection operator can be extremely slow, even using a GPU implementation. Sophisticated ray-tracing strategies can also be used to reduce the running time, and other analytical strategies reduce the backprojection complexity to $O(N^2 \log N)$. The BST approach has the same low complexity of $O(N^2 \log N)$ while being easier to implement than its competitors, producing fewer numerical artifacts and following a more traditional "gridding strategy".
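
    To make the $O(N^3)$ cost concrete, here is a minimal pure-Python sketch of the brute-force backprojection baseline described above. It is not the BST algorithm from the talk; the geometry (angles evenly spaced over $[0, \pi)$, detector spanning $[-1, 1]$, nearest-bin lookup) and the function name are assumptions made for illustration.

```python
import math


def backproject(sinogram, n):
    """Brute-force backprojection onto an n x n grid.

    sinogram[a][k] holds the projection value for angle index a and
    detector bin k. Each output pixel is the average, over all angles,
    of the sinogram value of the ray passing through that pixel.
    Rays that fall outside the detector contribute zero.
    """
    n_angles = len(sinogram)
    n_bins = len(sinogram[0])
    image = [[0.0] * n for _ in range(n)]
    for i in range(n):                      # pixel rows    -> y in (-1, 1)
        y = -1.0 + (2.0 * i + 1.0) / n
        for j in range(n):                  # pixel columns -> x in (-1, 1)
            x = -1.0 + (2.0 * j + 1.0) / n
            total = 0.0
            for a in range(n_angles):       # third nested loop: O(N^3) overall
                theta = math.pi * a / n_angles
                t = x * math.cos(theta) + y * math.sin(theta)
                k = math.floor((t + 1.0) / 2.0 * n_bins)
                if 0 <= k < n_bins:
                    total += sinogram[a][k]
            image[i][j] = total / n_angles
    return image


if __name__ == "__main__":
    # A constant sinogram backprojects to 1.0 wherever every ray hits the detector.
    flat = [[1.0] * 16 for _ in range(8)]
    img = backproject(flat, 8)
    print(img[3][3], img[0][0])
```

    With $N$ angles and $N^2$ pixels the three nested loops perform on the order of $N^3$ lookups, which is the cost that Fourier-based approaches with $O(N^2 \log N)$ complexity avoid.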
    
    Speaker: Mr Fernando Furusato (LNLS/CNPEM)
    
    Fast Distributed Image Reconstruction using CUDA MPI.mp4
    
    FernandoFurusatoHEPiX2018.pdf
- 18:00 → 21:00
  
  Banquet 3h
Thursday 17 May
- 08:30 → 09:00
  
  Registration 30m Chamberlin Hall
  
- 09:00 → 10:20
  IT facilities Chamberlin Hall
  
  - 09:00
    
    ExDeMon: a new scalable monitoring tool for the growing CERN infrastructure 20m
    
    When monitoring an increasing number of machines, infrastructure and tools need to be rethought. A new tool, ExDeMon, for detecting anomalies and raising actions, has been developed to perform well on this growing infrastructure. Considerations of the development and implementation will be shared.
    
    Daniel has been working at CERN for more than 3 years as a Big Data developer; he has been implementing different tools for monitoring the computing infrastructure in the organisation.
    
    Speaker: Daniel Lanza Garcia (CERN)
    
    ExDeMon: a new scalable monitoring tool for the growing CERN infrastructure.mp4
    
    ExDeMon HEPiX.pdf
  - 09:20
    
    BNL New Data Center - Status and Plans 20m
    
    BNL is planning a new on-site data center for its growing portfolio of programs in need of scientific computing support. This presentation will provide an update on the status and plans for this new data center.
    
    Speaker: Tony Wong (Brookhaven National Laboratory)
    
    BNL New Data Center - Status and Plans.mp4
    
    BNL_New_Data_Center–Status_and_Plans.pdf
    
    BNL_New_Data_Center–Status_and_Plans.pptx
  - 09:40
    
    Planning new datacenter network architecture 20m
    
    Within the scope of the Wigner Datacenter cloud project we are consolidating our network equipment. We plan to purchase 100 Gbps datacenter switches to meet our current and anticipated future needs. We need an automated, vendor-neutral and easily operable network. This presentation highlights our requirements and design goals, and the candidates we have tested in our lab. We also take the opportunity to introduce our knowledge lab initiative, where we can expand the scope of testing solutions.
    
    Speaker: Ms Szilvia Racz (Wigner Datacenter)
    
    Planning new datacenter network architecture.mp4
    
    Wigner_Datacenter_Planning_new_datacenter_network_architecture.pdf
  - 10:00
    
    INFN-T1 flooding report 20m
    
    On November 9 2017, a major flood occurred in the computing rooms, resulting in an outage of all services for a prolonged period of time.
    In this talk we will go through all the issues we faced in order to recover the services in the quickest and most efficient way; we will analyze in detail the incident and all the steps made to recover the computing rooms, electrical power, network, storage and farming.
    Moreover, we will discuss the hidden dependencies among services discovered during the recovery of the systems and will detail how we solved them.
    
    Speaker: Stefano Dal Pra (INFN)
    
    20180514_InfnT1_flooding_report.pptx
    
    INFN-T1 flooding report.mp4
- 10:20 → 11:05
  
  Coffee break 45m Chamberlin Hall
  
- 11:05 → 11:45
  IT facilities Chamberlin Hall
  
  - 11:05
    
    Evolution of technology and markets 20m
    
    A short review of how technology and markets have evolved in areas relevant for HEP computing
    
    Speaker: Helge Meinhard (CERN)
    
    2018-05-17-HEPiX-TechnologyEvolution.pdf
    
    Evolution of technology and markets.mp4
  - 11:25
    
    Proposal for a technology watch WG 20m
    
    Following up from abstract #117, a proposal to form a working group dedicated to technology watch
    
    Speaker: Helge Meinhard (CERN)
    
    2018-05-17-HEPiX-TechnologyWGProposal.pdf
    
    Proposal for a technology watch WG.mp4
- 11:45 → 12:05
  Basic IT services Chamberlin Hall
  
  - 11:45
    
    Teraflops of Jupyter: A Notebook Based Analysis Portal at BNL 20m
    
    The BNL Scientific Data and Computing Center (SDCC) has begun to deploy a user analysis portal based on JupyterHub. The Jupyter interfaces have back-end access to the ATLAS compute farm via Condor for data analysis, and to the GP-GPU resources on the Institutional Cluster via Slurm, for machine learning applications. We will present the developing architecture of this system, current use cases and results, and discuss future plans.
    
    Speaker: Ofer Rind
    
    BNLJupyterPortal.key
    
    BNLJupyterPortal.pdf
    
    Teraflops of Jupyter: A Notebook Based Analysis Portal at BNL.mp4
- 12:05 → 14:00
  
  Lunch break 1h 55m Chamberlin Hall
  
- 14:00 → 15:40
  Basic IT services Chamberlin Hall
  
  - 14:00
    
    A fully High-availability logs/metrics collector @ CSCS 20m
    
    As the complexity and scale of systems increase, so does the amount of system-level data recorded.
    Managing the vast amounts of log data is a challenge that CSCS solved with the introduction of a centralized log and metrics infrastructure based on Elasticsearch, Graylog, Kibana, and Grafana.
    This is a fundamental service at CSCS that provides easy correlation of events bridging the gap from the computation workload to nodes enabling failure diagnosis.
    Currently, the Elasticsearch cluster at CSCS is handling more than 22'000'000'000 online documents (one year) and another 20'000'000'000 archived. The integrated environment from logging to graphical representation enables powerful dashboards and monitoring displays.
    
    Speaker: Mr Dino Conciatore (CSCS (Swiss National Supercomputing Centre))
    
    A fully High-availability logs, metrics collector @ CSCS.mp4
    
    HA Logs and Metrics Collector Hepix.pdf
    
    HA Logs and Metrics Collector Hepix.pptx
  - 14:20
    
    Monitoring Infrastructure for the CERN Data Centre 20m
    
    Since early 2017, the MONIT infrastructure has provided services for monitoring the CERN data centre, together with the WLCG grid resources, and has been progressively replacing in-house technologies, such as LEMON and SLS, using consolidated open source solutions for monitoring and alarms.
    
    The infrastructure collects data from more than 30k data centre hosts in Meyrin and Wigner sites, with a total volume of 3 TB/day and a rate of 65k documents/sec. It includes OS and hardware metrics, as well as specific IT service metrics. Logs and metrics collection is deployed by default in every machine of the data centre, together with alert reporting. Each machine has a default configuration that can be extended for service-specific data (e.g. for specifically monitoring a database server). Service managers can send custom metrics and logs from their applications to the infrastructure through generic endpoints, and they are provided with an out-of-the-box discovery and visualization interface, data analysis tools and integrated notifications.
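
    The quoted volume and rate imply an average document size, which can be estimated as follows. This is a rough sketch: it assumes decimal terabytes (1 TB = 10^12 bytes) and that both figures are sustained daily averages, neither of which is stated in the abstract, and the function name is illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def average_document_bytes(volume_tb_per_day, docs_per_second):
    """Mean bytes per document for a given daily volume and document rate."""
    total_bytes = volume_tb_per_day * 10 ** 12   # assumes decimal TB
    total_docs = docs_per_second * SECONDS_PER_DAY
    return total_bytes / total_docs


if __name__ == "__main__":
    size = average_document_bytes(3, 65_000)
    print(f"about {size:.0f} bytes per document")
```

    Under these assumptions the average document is on the order of half a kilobyte, which is consistent with a stream dominated by small metric and log records.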
    
    The infrastructure stack relies on open source technologies, developed and widely used by industry and research leaders. Our architecture uses collectd for metric collection, Flume and Kafka for transport, Spark for stream and batch processing, Elasticsearch, HDFS and InfluxDB for search and storage, Kibana and Grafana for visualization, and Zeppelin for analytics. The modularity of collectd gives infrastructure users the flexibility to configure default and service-specific monitoring, and allows custom plugins to be developed and deployed.
    
    This contribution is an updated overview of the monitoring service for the CERN data centre. We present our main use cases for the collection of metrics and logs. Given that the proposed stack of technologies is widely used, and the MONIT architecture is well consolidated, a main objective is to share the lessons learned and find common monitoring solutions within the community.
    
    Speaker: Asier Aguado Corman (Universidad de Oviedo (ES))
    
    2018-05-17-HEPiX Spring 2018 - Monitoring.pdf
    
    2018-05-17-HEPiX Spring 2018 - Monitoring.pptx
    
    Monitoring Infrastructure for the CERN Data Centre.mp4
  - 14:40
    
    First Impressions of Saltstack and Reclass as our new Configuration Management System 20m
    
    In the autumn of 2016 the Nikhef data processing facility (NDPF) found itself at a junction on the road of configuration management. The NDPF was one of the early adopters of Quattor, which has served us well since the early days of the Grid. But whereas grid deployments were then uniquely complex enough to require the likes of Quattor, nowadays a plethora of configuration systems has cropped up to fulfill the needs of the booming industry of cloud orchestration.
    
    Faced with the choice of an overhaul of our Quattor installation to bring our site up-to-date, or an investment to adopt a brand new system, we opted for the latter. And among the many candidates like Chef, Puppet, and Ansible, we chose Saltstack and partnered it with Reclass.
    
    This led to hours of discussions designing the flows and processes of the new system, as well as figuring out how to make the transition from the old to the new. And many more were spent trying out these ideas, pioneering as it were to find a way forward that would work for us.
    
    We would like to present the insights we've gained about Saltstack and configuration management in general, and our design choices in particular. This is very much a work in progress, as we are getting to know our new system while we are implementing more and more of the moving parts. We are gaining invaluable experience with tasks that are otherwise rarely among the day-to-day business of the system administrator, and we would like to share it with our peers.
    
    Speaker: Dennis Van Dok
    
    Presentation slides
    
    saltstack-notes.pdf
    
    saltstack-presentation.pdf
  - 15:00
    
    A smorgasbord of tools around Linux at DESY 20m
    
    In the past, we have developed many smaller and larger tools to help with various aspects of Linux administration at DESY.
    We present some of them in this talk.
    An incomplete list:
    - Two-Factor-Authentication
    - Timeline repositories
    - Making Kernel upgrade notifications (more) audit safe
    - Fail2ban
    
    Speaker: Yves Kemp (Deutsches Elektronen-Synchrotron (DE))
    
    A smorgasbord of tools around Linux at DESY.mp4
    
    talk_kemp_hepix2018.pdf
  - 15:20
    
    Evolution of the Hadoop and Spark platform for HEP 20m
    
    Interest in using Big Data solutions based on the Hadoop ecosystem is constantly growing in the HEP community. This drives the need for increased reliability and availability of the central Hadoop service and underlying infrastructure provided to the community by the CERN IT department.
    This contribution will report on the overall status of the Hadoop platform and the recent enhancements and features introduced in many areas, including service configuration, availability, alerting, monitoring and data protection, in order to meet the new requirements posed by the user community.
    
    Speaker: Zbigniew Baranowski (CERN)
    
    Evolution of the Hadoop and Spark platform for HEP.mp4
    
    HadoopatCERN_Hepix2018spring.pdf
    
    HadoopatCERN_Hepix2018spring.pptx
- 15:40 → 16:25
  
  Coffee break 45m Chamberlin Hall
  
- 16:25 → 17:55
  
  BoF session: Tape storage Chamberlin Hall
  
  Convener: Vladimir Bahyl (CERN)
  
  051818-hepix-jayatilaka.pdf
  
  ATLAS 'Data Carousel' R&D.pdf
  
  ATLAS Data Carousel R&D.mp4
  
  CERN_tape_plans_-_2018-2020.pdf
  
  Tape Storage BoF Session, CERN.mp4
  
  Tape Storage BoF Session, Fermilab.mp4
Friday 18 May
- 08:30 → 09:00
  
  Registration 30m Chamberlin Hall
  
- 09:00 → 10:20
  Clouds, virtualisation, grids Chamberlin Hall
  
  - 09:00
    
    Status update of the CERN private cloud 20m
    
    CERN runs a private OpenStack Cloud with ~300K cores, ~3000 users and a number of OpenStack services. CERN users can build services from a pool of compute and storage resources using OpenStack APIs such as Ironic, Nova, Magnum, Cinder and Manila. On the other hand, CERN cloud operators face operational challenges at scale in order to offer them. In this talk, you will learn about the status of the CERN cloud, new services and plans for expansion.
    
    Speaker: Spyridon Trigazis (CERN)
    
    Hepix_Spring_2018_-_Cloud_Service_Update.pdf
    
    Status update of the CERN private cloud.mp4
  - 09:20
    
    HNSciCloud Status Report 20m
    
    The Helix Nebula Science Cloud (HNSciCloud) Horizon 2020 Pre-Commercial Procurement project (http://www.hnscicloud.eu/) brings together a group of 10 research organisations to procure innovative cloud services from commercial providers to establish a cloud platform for the European research community.
    This three-year project has recently entered its final phase, which will deploy two pilots with a combined capacity of 20,000 cores and 2 PB of storage integrated with the GEANT network at 40 Gbps.
    This presentation will provide an overview of the project, the pilots, the applications being deployed and lessons learned to date.
    
    Speaker: Andreas Petzold (KIT - Karlsruhe Institute of Technology (DE))
    
    hnscicloud-hepix-madison-20180518.pdf
    
    hnscicloud-hepix-madison-20180518.pptx
    
    HNSciCloud Status Report.mp4
  - 09:40
    
    Baremetal provisioning in the CERN cloud 20m
    
    Virtual machines are the technology that formed the modern clouds, private and public; however, physical machines are back in a more cloud-like way. Cloud providers are offering APIs for baremetal server provisioning on demand, and users are leveraging containers for isolation and reproducible deployments. In this talk, I will present one of the newest services in the CERN cloud, Ironic, the Baremetal service of OpenStack. You will learn how the Cloud team improves its operational workflow and accounting and how users can use the same tooling they are used to when working with virtual machines. Finally, you will hear about the recent integration efforts between the container and baremetal services.
    
    Speaker: Spyridon Trigazis (CERN)
    
    Baremetal provisioning in the CERN cloud.mp4
    
    HEPIX Ironic in the CERN Cloud - MAY2018.pdf
  - 10:00
    
    RAL Cloud update 20m
    
    As our OpenStack cloud enters full production, we give an overview of the design and how it leverages the RAL Tier 1 infrastructure & support. We also present some of the new use cases and science being enabled by the cloud platform.
    
    Speaker: Ian Collier (Science and Technology Facilities Council STFC (GB))
    
    RAL Cloud update.mp4
    
    STFC-Cloud-HEPiX-20180517.pdf
    
    STFC-Cloud-HEPiX-20180517.ppt
- 10:20 → 10:50
  
  Coffee break 30m Chamberlin Hall
  
- 10:50 → 11:50
  Clouds, virtualisation, grids Chamberlin Hall
  
  - 10:50
    
    Data analysis as a service 20m
    
    We are seeing an increasingly wide variety of uses being made of Hybrid Cloud (and Grid!) computing technologies at STFC. This talk will focus on the services being delivered to end users and novel integrations with existing local compute and data infrastructure.
    
    Speaker: Mr James Adams (STFC RAL)
    
    Data analysis as a service.mp4
    
    hepix2018-daaas.pdf
  - 11:10
    
    Integration of OpenStack and Amazon Web Service into local batch job system 20m
    
    Cloud computing enables flexible resource provisioning on demand. In collaboration with the National Institute of Informatics (NII), Japan, we have been integrating our local batch job system with clouds to expand its computing resources and to provide heterogeneous clusters dynamically. In this talk, we will introduce our hybrid batch job system, which can dispatch jobs to provisioned instances on on-premise OpenStack and Amazon Web Service as well as to local servers. We will also report some results of performance tests conducted to investigate its scalability.
    
    Speaker: Wataru Takase (High Energy Accelerator Research Organization (JP))
    
    180518_hepix_wataru_takase_kek.pdf
    
    Integration of OpenStack and Amazon Web Service into local batch job system.mp4
  - 11:30
    
    Automatic for the People: Containers for LIGO software development on the Open Science Grid and other diverse computing resources 20m
    
    Distributed research organizations are faced with wide variation in computing environments to support. LIGO has historically resolved this problem by providing RPM/DEB packages for (pre-)production software and coordination between clusters operated by LIGO-affiliated facilities and research groups. This has been largely successful although it leaves a gap in operating system support and in the development process prior to formal point releases.
    
    We describe early developments in LIGO’s use of GitLab, GitHub, and DockerHub to continuously deploy researcher-maintained containers for immediate use on all LIGO clusters, the Open Science Grid, and user workstations. Typical latencies are below an hour, dominated by the build time of the software itself and the client refresh rate of the CernVM File System.
    
    Speaker: Dr Thomas Downes (University of Wisconsin-Milwaukee)
    
    Automatic for the People (Hepix 2018)
    
    Automatic for the People (Hepix 2018).pdf
- 11:50 → 12:10
  
  Miscellaneous: HPC and AI Innovation Lab Chamberlin Hall
  
  HPC and AI Innovation Lab.mp4
  
  HPC_N_AI Innovation_Lab.HEPIX.pdf
- 12:10 → 12:40
  Miscellaneous Chamberlin Hall
  
  - 12:10
    
    Workshop wrap-up 30m
    
    HEPIX _Spring_2018_Summary.pdf
    
    HEPIX _Spring_2018_Summary.pptx
    
    Workshop wrap-up.mp4
