HEPiX Fall 2024 Workshop

Name: HEPiX Fall 2024 Workshop
Start: 2024-11-04T08:00:00-06:00
End: 2024-11-08T17:00:00-06:00
Location: No location set

4 Nov 2024, 08:00 → 8 Nov 2024, 17:00 US/Central

Thurman J. White Forum Building 1704 Asp Avenue Norman, OK 73072

Horst Severini (University of Oklahoma (US)), Ofer Rind (Brookhaven National Laboratory), Jose Flix Molina (CIEMAT - Centro de Investigaciones Energéticas Medioambientales y Tec. (ES)), Tomoaki Nakamura

Description

HEPiX Fall 2024 at the University of Oklahoma, USA

The HEPiX forum brings together worldwide information technology staff, including system administrators, system engineers, and managers from High Energy Physics and Nuclear Physics laboratories and institutes, to foster a learning and sharing experience between sites facing scientific computing and data challenges.

Participating sites include BNL, CERN, DESY, FNAL, IHEP, IN2P3, INFN, IRFU, JLAB, KEK, LBNL, NDGF, NIKHEF, PIC, RAL, SLAC, TRIUMF, and many other research labs and universities from all over the world.

More information about the HEPiX workshops, the working groups (who report regularly at the workshops) and other events is available on the HEPiX Web site.

This workshop will be hosted by the University of Oklahoma and will be held at the Thurman J. White Forum Building on the campus in Norman, Oklahoma.

Participants

44 View full list

67296861913

Ofer Rind

Join via phone

Monday 4 November
- Mon 4 Nov
- Tue 5 Nov
- Wed 6 Nov
- Thu 7 Nov
- Fri 8 Nov
- 08:00 → 09:15
  
  Registration 1h 15m
- 09:15 → 10:30
  Welcome and Workshop Logistics
  
  Convener: Horst Severini (University of Oklahoma (US))
  - 09:15
    
    Welcome and Workshop Logistics 10m
    
    Speaker: Horst Severini (University of Oklahoma (US))
    
    Break.pdf
    
    Break.pptx
    
    HEPiX-Welcome.pdf
    
    HEPiX-Welcome.pdf
    
    HEPiX-Welcome.pptx
    
    OUEventsEmergencyResponsePlan-HEPiX.docx
    
    OUEventsEmergencyResponsePlan-HEPiX.pdf
  - 09:25
    
    Research Computing and Storage Strategy at the University of Oklahoma 45m
    
    Speakers: Henry Neeman (OU), Horst Severini (University of Oklahoma (US))
    
    hepix2024_talk_neeman_20241104.pdf
    
    hepix2024_talk_neeman_20241104.pptx
  - 10:10
    
    Welcome Address from the Physics Department 20m
    
    Speakers: Phillip Gutierrez (University of Oklahoma (US)), Horst Severini (University of Oklahoma (US))
    
    HEPiX-Gutierrez.pdf
- 10:30 → 11:00
  
  Coffee 30m
- 11:00 → 12:00
  Site Reports
  
  Conveners: Andreas Petzold (KIT - Karlsruhe Institute of Technology (DE)), Dr Sebastien Gadrat (CCIN2P3 - Centre de Calcul (FR))
  - 11:00
    
    SWT2 site report 20m
    
    The Southwest Tier-2 (SWT2) consortium is comprised of two data centers
    operated at the University of Texas at Arlington (UTA) and at the
    University of Oklahoma (OU). SWT2 provides distributed computing
    services in support of the ATLAS experiment at CERN. In this
    presentation we will describe the resources at each site (CPU cycles and
    data storage), along with other associated infrastructure required to
    provide these resources. We will conclude with a discussion of plans for
    the future evolution of SWT2.
    
    Speaker: Zachary Booth (University of Texas at Arlington)
    
    SWT2 site report - HEPiX.pdf
    
    SWT2 site report - HEPiX.pptx
  - 11:20
    
    Site report for AGLT2 20m
    
    AGLT2 has a few updates to report since the last HEPix meeting in Spring 2024.
    1) We transitioned from Cobbler and Satellite plus Capsule server for RHEL provision
    2) we transitioned from CFengine to Ansible for configuration management for the RHEL9 nodes.
    3) In order to improve the occupancy of the HTcondor cluster, we started tuning of HTCondor and also new developments of scripts to dynamically adjust the routing rules and monitoring scripts and plots keep track of memory/cpu occupancy.
    
    Speakers: Philippe Laurens (Michigan State University (US)), Shawn Mc Kee (University of Michigan (US)), Dr Wendy Wu (University of Michigan)
    
    AGLT2SiteReport-HEPiXFall2024.pdf
  - 11:40
    
    PIC report 20m
    
    PIC report to HEPIX Fall 2024.
    
    Speaker: Jose Flix Molina (CIEMAT - Centro de Investigaciones Energéticas Medioambientales y Tec. (ES))
    
    HEPIX_Autumn_2024_PIC_Report_JFlix (2).pdf
- 12:00 → 13:30
  
  Lunch 1h 30m
- 13:30 → 14:10
  Site Reports
  
  Conveners: Andreas Petzold (KIT - Karlsruhe Institute of Technology (DE)), Dr Sebastien Gadrat (CCIN2P3 - Centre de Calcul (FR))
  - 13:30
    
    RAL Site Report 20m
    
    An update on activities at the RAL datacentre.
    
    Speaker: Martin Bly (STFC-RAL)
    
    2024-11_04_08-HEPiX Fall Oklahoma - RAL Site Report.pdf
    
    2024-11_04_08-HEPiX Fall Oklahoma - RAL Site Report.pptx
  - 13:50
    
    KEK Site Report 20m
    
    The KEK Central Computer System (KEKCC) is the KEK's largest-scale computer system and provides several services such as Grid and Cloud computing.
    
    Following the procurement policy for the large-scale computer system requested by the government, we have taken a multiple-year contract and replaced the entire system at the end of every contract year. The new system has been in production since September 2024 and will decommission in August 2028.
    
    In this talk, we would like to review the four-year operation and development of the previous system installed in 2020. In addition, we will then show the difference between before and after September 2024, when the system is in production.
    
    Speaker: Go Iwai (KEK)
    
    hepix-fall-2024-kek-site-report-iwai.pdf
- 14:10 → 15:10
  Computing & Batch Services
  
  Conveners: Matthias Jochen Schnepf, Max Fischer (Karlsruhe Institute of Technology), Dr Michele Michelotto (Universita e INFN, Padova (IT))
  - 14:10
    
    Operating the 200 Gbps IRIS-HEP Demonstrator for ATLAS 30m
    
    The ATLAS experiment is currently developing multiple analysis frameworks which leverage the Python data science ecosystem. We describe the setup and operation of the infrastructure necessary to support demonstrations of these frameworks. One such demonstrator aims to process the compact ATLAS data format PHYSLITE at rates exceeding 200 Gbps. Integral to this study was the analysis of network traffic and bottlenecks, worker node scheduling, disk configurations, and the performance of an S3 object store. The demonstration’s performance was measured as the number of processing cores used by the demonstration tests scaled to over 2,000 and as the volume of data accessed in an interactive session approached 200 TB. The presentation will go over the findings and future updates related to the physical infrastructure that supports these demonstrators and what improvements to infrastructure will be made to be better prepared for the future.
    
    Speaker: David Jordan (University of Chicago (US))
    
    Operating the 200 Gbps IRIS-HEP Demonstrator for ATLAS (HEPiX 2024).pdf
    
    Operating the 200 Gbps IRIS-HEP Demonstrator for ATLAS (HEPiX 2024).pptx
  - 14:40
    
    AUDITOR: An Accounting tool for Grid Sites and Opportunistic Resources 30m
    
    More and more opportunistic resources are provided to the Grid. Often behind one Compute Element several opportunistic computing resource provider exists or are additional to the pledged resources of a Grid site. For such use cases and others, we have developed a most flexible multipurpose accounting ecosystem AUDITOR (AccoUnting DatahandlIng Toolbox for Opportunistic Resources).  
    
    AUDITOR is able to individually collect accounting data from multiple resource providers sharing a CE. The collected information can be used for internal accounting or sent to the European Grid Initiative (EGI) accounting portal. We will show some current use-cases and further plans for AUDITOR.
    
    Speaker: Matthias Jochen Schnepf
    
    HEPiX_AUDITOR_Accouting_Tool.pdf
- 15:10 → 15:45
  
  Coffee 35m
- 15:45 → 16:15
  Computing & Batch Services
  
  Conveners: Matthias Jochen Schnepf, Max Fischer (Karlsruhe Institute of Technology), Dr Michele Michelotto (Universita e INFN, Padova (IT))
  - 15:45
    
    HEPiX Benchmarking WG: Status Report 30m
    
    HEPScore23 has been the official benchmark for WLCG sites since April 2023.
    Since then, we have included community feedback and demand. The Benchmarking WG has started a new development effort to expand the Benchmark Suite with modules that can measure server utilization metrics (load, frequency, I/O, power consumption) during the execution of the HEPScore benchmark.
    This enables a closer look at power efficiency and performance in the Grid environment.
    
    We present the current state of our group and an overview of our current studies.
    
    Speaker: Matthias Jochen Schnepf
    
    HEPiX_Benchmarking_WG_Status_Report.pdf
- 16:15 → 17:00
  Miscellaneous
  
  Convener: Horst Severini (University of Oklahoma (US))
  - 16:15
    
    Atmospheric Visibility Estimation From Single Camera Images: A Deep Learning Approach 45m
    
    Atmospheric Visibility Estimation From Single Camera Images: A Deep Learning Approach
    
    Speaker: Prof. Anderw Fagg (University of Oklahoma)
    
    Atmospheric Visibility Estimation From Single Camera Images: A Deep Learning Approach
- 18:00 → 21:00
  
  Welcome Reception 3h
  
  Reception with guided tour.
  
  Fred Jones Fine Arts Museum
  555 Elm Avenue
  Norman, OK 73019
Tuesday 5 November
- Mon 4 Nov
- Tue 5 Nov
- Wed 6 Nov
- Thu 7 Nov
- Fri 8 Nov
- 08:30 → 09:00
  
  Registration 30m
- 09:00 → 09:20
  Site Reports
  
  Conveners: Andreas Petzold (KIT - Karlsruhe Institute of Technology (DE)), Dr Sebastien Gadrat (CCIN2P3 - Centre de Calcul (FR))
  - 09:00
    
    S3DF: SLAC Shared Science Data Facility 20m
    
    A site report on the infrastructure and services that underpin SLAC's data-intensive processing pipelines. The SLAC Shared Science Data Facility hosts the Rubin Observatory DF, LCLS-II and many other experimental and research workflows. Networking and Storage form the core of S3DF with hardware deployed in a modern Stanford datacenter.
    
    Speaker: Adeyemi Adesanya (SLAC)
    
    S3DF_HEPiX_November_2024.pdf
- 09:20 → 10:20
  Operating systems, clouds, virtualisation, grids
  
  Conveners: Mr Dino Conciatore (CSCS (Swiss National Supercomputing Centre)), Michel Jouvin (Université Paris-Saclay (FR))
  - 09:20
    
    Science Cloud based on WLCG Core Technology 30m
    
    This presentation will focus on two topics: 1) status of ATLAS T2 site in Taiwan, and 2) experiences of supporting broader scientific computing over the cloud based on WLCG technology.
    
    Speaker: Eric Yen (Academia Sinica (TW))
    
    ASGCScienceCloud-HEPiXFall2024v4.pdf
  - 09:50
    
    A Cloud-Native Control Plane for Infrastructure and Platform Management 30m
    
    Crossplane is a cloud-native control plane for declarative management of infrastructure and platform resources using Kubernetes-native APIs.
    It enables the integration of infrastructure-as-code practices by reusing existing tools such as Ansible and Terraform, while providing flexible, instanceable "compositions" for defining reusable resource configurations. This approach allows organizations to automate, compose, and manage infrastructure alongside application workloads, streamlining operations in a cloud-native ecosystem.
    
    Speaker: Mr Dino Conciatore (CSCS (Swiss National Supercomputing Centre))
    
    Crossplane - Dino Conciatore - HEPiX.pdf
    
    Crossplane - Dino Conciatore - HEPiX.pptx
- 10:20 → 10:45
  
  Coffee 25m
- 10:45 → 12:15
  Operating systems, clouds, virtualisation, grids
  
  Conveners: Mr Dino Conciatore (CSCS (Swiss National Supercomputing Centre)), Michel Jouvin (Université Paris-Saclay (FR))
  - 10:45
    
    Nebraska Coffea-Casa Analysis Facility Update 30m
    
    The CMS Coffea-Casa analysis facility at the University of Nebraska-Lincoln provides researchers with Kubernetes based Jupyter environments and access to CMS data along with both CPU and GPU resources for a more interactive analysis experience than traditional clusters provide. This talk will cover updates to this facility within the past year and recent experiences with the 200 Gbps challenge.
    
    Speaker: Garhan Attebury (University of Nebraska Lincoln (US))
    
    HEPiX-Fall-2024-Nebraska-Attebury.pdf
  - 11:15
    
    dCache on Kubernetes 30m
    
    dCache is composed by a set of components running in Java Virtual Machines (JVM) and a storage backend, Ceph in this case. CSCS moved these JVMs into containers and developed an Helm Chart to deploy them on a Kubernetes cluster. This cloud native approach makes the deployments and management of new dCache instances easier and faster.
    
    Encountered challenges and future developments will be exposed in this presentation.
    
    Speaker: Elia Luca Oggian (ETH Zurich (CH))
    
    cscs_dcache_hepix.pdf
    
    cscs_dcache_hepix.pptx
  - 11:45
    
    Summary of the Joint Xrootd and FTS Workshop 30m
    
    The 2nd Joint Xrootd and FTS Workshop at STFC in September 2024 covered many interesting topics. This presentation will summarize the discussion on state of affairs of FTS and Xrootd, plan on FTS4, WLCG token support in FTS, future plan on CERN Data Management Client, The Pelican project and Xrootd/Xcache, Xrootd monitoring, etc. It will cover some of the feedback by experiments, especially with regards to the DC24, and future plan by various experiments.
    
    Speaker: Wei Yang (SLAC National Accelerator Laboratory (US))
    
    Highlight of the Joint Xrootd & FTS Workshop.pdf
    
    Highlight of the Xrootd&FTS Workshop
- 12:15 → 13:30
  
  Lunch 1h 15m
- 13:30 → 15:00
  Storage & Filesystems
  
  Conveners: Andrew Pickford (University of Glasgow (GB)), Elvin Alin Sindrilaru (CERN)
  - 13:30
    
    Cost Comparison of On-Premises Storage with S3 Interfaces 30m
    
    Abstract: To evaluate the cost of various on-premises storage solutions with traditional and S3 interfaces, including flash, disk, and tape.
    
    This presentation compares the costs, factors of flash, disk, and tape-based storage systems, including systems that are compatible with AWS S3. Key metrics to be considered include purchase price, power consumption, cooling requirements, product lifetime and performance characteristics. Additionally, the presentation will explore the long-term implications of each storage type and their impact on the environment.
    This presentation will also review current and upcoming technologies that may be leveraged to provide long term exascale storage at a fraction of environment footprint of traditional methods.
    
    Speaker: Mr Nathan Thompson (Spectra Logic)
    
    Spectra & Nate Thompson Hepix 2024 Final_NT.pdf
    
    Spectra & Nate Thompson Hepix 2024 Final_NT.pptx
  - 14:00
    
    Optimising Data Access Analytics: Integrating dCache BillingDB with PIC’s Scalable Big Data Platform 30m
    
    PIC has developed CosmoHub, a scientific platform built on top of Hadoop and Apache Hive, which facilitates scalable reading, writing and managing huge astronomical datasets. This platform supports a global community of scientists, eliminating the need for users to be familiar with Structured Query Language (SQL). CosmoHub officially serves data from major international collaborations, including the Legacy Survey of Space and Time (LSST), the Euclid space mission, the Dark Energy Survey (DES), the Physics of the Accelerating Universe Survey (PAUS), the Gaia ESA Archive, and the Marenostrum Institut de Ciències de l'Espai (MICE) simulations.
    
    This platform is highly scalable and adaptable for various data analytics applications. The recent integration of PIC’s dCache billing database records has enabled the exploration of extensive data access logs at PIC, covering roughly eight years. We will share insights from the analysis of CMS data access at PIC, which involved processing approximately 350 million entries using PIC's Hadoop infrastructure. The current system operates with around 1,000 cores, 10 TiB of RAM, 50 TB of NVMe (for caching), and 2 PiB of usable storage. Data is accessed through HiveQL and Jupyter notebooks, with advanced Python scripts enabling efficient interaction.
    
    This framework significantly accelerated data processing, reducing execution times for plot generation to under a minute - a task that previously took several hours using PIC's PostgreSQL databases. This enhanced performance opens up new possibilities for integrating additional data sources, such as job submissions from the local HTCondor batch system, enabling advanced analytics on large datasets.
    
    Speakers: Jose Flix Molina (CIEMAT - Centro de Investigaciones Energéticas Medioambientales y Tec. (ES)), Mr Marc Santamaria Riba (PIC)
    
    HEPIX_Autumn_2024_dCache_Hive_JFlix.pdf
  - 14:30
    
    Using AI/ML for Data Placement Optimization in a Multi-Tiered Storage System within a Data Center 30m
    
    Scientific experiments and computations, particularly in High Energy Physics (HEP) programs, are generating and accumulating data at an unprecedented rate. Effectively managing this vast volume of data while ensuring efficient data analysis poses a significant challenge for data centers. This paper aims to introduce machine learning algorithms to enhance data storage optimization across various storage media, providing a more intelligent, efficient, and cost-effective approach to data management. We begin by outlining the data collection and preprocessing steps used to explore data access patterns. Next, we describe the design and development of a precise data popularity prediction model using AI/ML techniques. This model forecasts future data popularity based on an analysis of access patterns, enabling optimal data movement and placement. Additionally, the paper evaluates the model's performance using key metrics such as F1 score, accuracy, precision, and recall, alongside a comparison with the Least Recently Used (LRU) strategy. The model achieves an optimal prediction accuracy of up to 92% and an optimal F1 score of 0.47. Finally, we present a prototype use case, leveraging real-world file access data to assess the model’s performance.
    
    Speaker: Qiulan Huang (Brookhaven National Laboratory (US))
    
    Using AI_ML for Data Placement Optimization in a Multi-Tiered Storage System within a Data Center.pdf
- 15:00 → 15:30
  
  Coffee 30m
- 15:30 → 16:00
  Storage & Filesystems
  
  Conveners: Andrew Pickford (University of Glasgow (GB)), Elvin Alin Sindrilaru (CERN)
  - 15:30
    
    Stories from the TSM to HPSS Migration at KIT 30m
    
    In 2020 we started the migration from our TSM-based tape system to HPSS which was finally finished in the summer of 2024. I'll present lessons learned, pitfalls and also the necessary in-house software developments.
    
    Speaker: Andreas Petzold (KIT - Karlsruhe Institute of Technology (DE))
    
    tsm-to-hpss-kit-hepix-2024.pptx
- 16:00 → 17:00
  Site Reports
  
  Conveners: Andreas Petzold (KIT - Karlsruhe Institute of Technology (DE)), Dr Sebastien Gadrat (CCIN2P3 - Centre de Calcul (FR))
  - 16:00
    
    CERN site report 20m
    
    News from CERN since the last HEPiX workshop. This talk gives a general update from services in the CERN IT department.
    
    Speaker: Elvin Alin Sindrilaru (CERN)
    
    CERN Site Report.pdf
  - 16:20
    
    Jefferson Lab Site Report and HPDF Introduction 20m
    
    I will give an report on the Scientific Computing program at Jefferson Lab and a brief introduction to HPDF, the High Performance Data Facility.
    
    Speaker: Bryan Hess
    
    JLab site update - Bryan Hess - Nov 2024.pdf
    
    JLab site update - Bryan Hess - Nov 2024.pptx
  - 16:40
    
    IHEP Site Report 20m
    
    The progress and status of IHEP site since last Hepix.
    
    Speaker: Siqi Hou
    
    HEPiX2024fall-IHEP.pdf
Wednesday 6 November
- Mon 4 Nov
- Tue 5 Nov
- Wed 6 Nov
- Thu 7 Nov
- Fri 8 Nov
- 08:30 → 09:00
  
  Registration 30m
- 09:00 → 10:30
  Topical Session: Security Operations Center (SOC)
  
  Conveners: Dr David Crooks (UKRI STFC), Shawn Mc Kee (University of Michigan (US))
  - 09:00
    
    Impetus and Drivers 15m
    
    Speaker: Dr David Crooks (UKRI STFC)
    
    HEPiX_Security_Operations_Fall_2024.pdf
    
    HEPiX_Security_Operations_Fall_2024.pptx
    
    HEPiX_Security_Operations_Fall_2024_v2.pdf
    
    HEPiX_Security_Operations_Fall_2024_v2.pptx
  - 09:15
    
    Mini Intro: Zeek 10m
    
    Speaker: Aashish Sharma (LBNL)
  - 09:25
    
    Mini-intro: MISP 10m
    
    Speaker: James Acris (STFC)
    
    MISP Intro.pdf
    
    MISP Intro.pptx
  - 09:35
    
    Mini-Intro: pDNSSOC 10m
    
    Speaker: Romain Wartel (CERN)
  - 09:45
    
    WLCG SOC at U Chicago 15m
    
    Speaker: David Jordan (University of Chicago (US))
    
    UChicago Network Threat Detection - HEPiX 2024.pdf
  - 10:00
    
    Discussion 30m
    
    We need to have a discussion about what sites and possible users need and expect.
    The goal is to both clarify details and get guidance for what we should focus on during the afternoon sessions today.
    
    Speakers: Dr David Crooks (UKRI STFC), Liviu Valsan (CERN)
- 10:30 → 11:00
  
  Coffee 30m
- 11:00 → 11:25
  
  Show Us Your Toolbox
  
  Conveners: Dr David Crooks (UKRI STFC), Mr Dino Conciatore (CSCS (Swiss National Supercomputing Centre)), Elia Luca Oggian (ETH Zurich (CH)), Garhan Attebury (University of Nebraska Lincoln (US)), Mary Hester, Ofer Rind (Brookhaven National Laboratory)
  
  Attebury-HEPiX-Fall-2024-YayNay.pdf
- 11:25 → 12:00
  Networking & Security
  
  Conveners: David Kelsey (Science and Technology Facilities Council STFC (GB)), Shawn Mc Kee (University of Michigan (US))
  - 11:25
    
    Computer Security Update 35m
    
    This presentation aims to give an update on the global security landscape from the past year. The global political situation has introduced a novel challenge for security teams everywhere. What's more, the worrying trend of data leaks, password dumps, ransomware attacks and new security vulnerabilities does not seem to slow down.
    We present some interesting cases that CERN and the wider HEP community dealt with in the last year, mitigations to prevent possible attacks in the future and preparations for when inevitably an attacker breaks in.
    
    Speaker: Stefan Lueders (CERN)
    
    Situational+Awareness+@+HEPiX+(2024).pdf
    
    Situational+Awareness+@+HEPiX+(2024).pptx
- 12:00 → 13:30
  
  Lunch 1h 30m
- 13:30 → 15:00
  SOC Hackathon: Technical Details and Working Session 1
  
  Conveners: Dr David Crooks (UKRI STFC), Shawn Mc Kee (University of Michigan (US))
  - 13:30
    
    Topic 1: Issues capturing ALL traffic with Zeek? 30m
    
    We have some sites that have question/potential issues concerning the traffic measurements from Zeek vs SNMP.
    - Should be expect that the Zeek traffic estimate should be close to the SNMP counters from the corresponding switch ports?
    - Is some kind of NIC/hardware offloading hiding traffic from Zeek?
    - Do we have best practice recommendations regarding configurations?
    - What should sites expect regarding Zeek traffic monitoring and traffic estimations?
    
    Speakers: Aashish Sharma (LBNL), Dr David Crooks (UKRI STFC), David Jordan (University of Chicago (US)), Shawn Mc Kee (University of Michigan (US))
  - 14:00
    
    Topic 2: Building a good Zeek alert 30m
    
    What does it take to craft a good Zeek alert? Can we work through an example or two? What is the suggested guidance for doing this?
    
    Speakers: Aashish Sharma (LBNL), Dr David Crooks (UKRI STFC), Romain Wartel (CERN)
  - 14:30
    
    Topic 3: How to deploy pDNSSOC (part 1) 30m
    
    How to deploy pDNSSOC
    Example deployment
    Working session
    
    Speaker: Romain Wartel (CERN)
- 15:00 → 15:30
  
  Coffee 30m
- 15:30 → 17:00
  SOC Hackathon: Technical Details and Working Session 2
  
  Conveners: Dr David Crooks (UKRI STFC), Shawn Mc Kee (University of Michigan (US))
  - 15:30
    
    Topic 1: Configuring sending alerts for various tools 30m
    
    How to enable alerts using webhooks and various applications.
    Sending to SLACK
    Sending to Mattermost
    What about Keybase?
    
    Why not email?
    
    Speakers: Dr David Crooks (UKRI STFC), Liam Atherton, Romain Wartel (CERN)
  - 16:00
    
    Tool options and choices 30m
    
    Zeek, MISP, pDNSSOC, Elasticsearch, Opensearch, Elastiflow, ElastiAlert, other information sources, other tools?
    
    Advantages, capabilities, limitations, concerns....
    
    Let's discuss
    
    Speakers: Dr David Crooks (UKRI STFC), Stefan Lueders (CERN)
- 17:15 → 19:15
  
  HEPiX Board Meeting (Invitation Only) 2h
Thursday 7 November
- Mon 4 Nov
- Tue 5 Nov
- Wed 6 Nov
- Thu 7 Nov
- Fri 8 Nov
- 08:30 → 09:00
  
  Registration 30m
- 09:00 → 10:30
  Topical Session: Carbon & Sustainability in Data Centers
  
  Convener: David Britton (University of Glasgow (GB))
  - 09:00
    
    Exploring the Carbon Compromises 25m
    
    Minimising carbon associated with computing will require compromise. In this presentation I will present the results from simulating a Grid site where the compute is run at reduced frequency when the predicted carbon intensity rises above some threshold. The compromise is a reduction in throughput in exchange for an increased carbon-efficiency for the work that is completed. The presentation will also summarise other, related, work from the Glasgow group.
    
    Speaker: David Britton (University of Glasgow (GB))
    
    241107-HEPIX-Britton.pptx
  - 09:25
    
    Carbon costs of storage: a UK perspective 20m
    
    In order to achieve the higher performance year on year required by the 2030s for future LHC up- grades at a sustainable carbon cost to the environment, it is essential to start with accurate measurements of the state of play. Whilst there have been a number of studies of the carbon cost of compute for WLCG workloads published, rather less has been said on the topic of storage, both nearline and archival. We present a study of the embedded and ongoing carbon costs of storage in multiple configurations, from Tape farms through to SSDs, within the UK Tier-1 and Tier-2s and discuss how this directs future policy.
    
    Speaker: Samuel Cadellin Skipsey
    
    GridPP-Storage-hepix-V1.PPTX
  - 09:45
    
    Case Study: AI Training Power Demand on a GPU-Accelerated Node 20m
    
    Data center sustainability, a phenomenon that has grown in focus due to the continuing evolution of Artificial intelligence (AI)/High Performance Computing (HPC) systems; furthermore, the rampant increase in carbon emissions resulted in an unprecedented rise in Thermal Design Power (TDP) of the computer chips at the Scientific Data and Computing Center (SDCC) at Brookhaven National Laboratory (BNL). With the exponential increase of demand towards the usage of such systems, major challenges have surfaced in terms of productivity, Power Usage Effectiveness (PUE), and thermal/scheduling management.
    
    Deploying AI/HPC infrastructure in data centers will require substantial capital investment. This study quantified the energy footprint of this infrastructure by developing models based on the power demands of AI hardware during training. We measured the instantaneous power draw of an 8-GPU NVIDIA H100 HGX node while training open-source models, including the image classifier and the large language model. The peak power draw observed nearly 18% below the manufacturer’s rated TDP, even with GPUs near full utilization. For the image classifier, increasing the batch size from 512 to 4096 images reduced total training energy consumption by a factor of four when model architecture remained constant. These insights can aid data center operators in capacity planning and provide researchers with energy use estimates. Future studies will explore the effects of cooling technologies and carbon-aware scheduling on AI workload energy consumption.
    
    Speakers: Imran Latif (Brookhaven National Laboratory), Shigeki Misawa (Brookhaven National Laboratory (US))
    
    HEPiX- 11-7-24.pdf
  - 10:05
    
    Smart Procurement Utility 10m
    
    The Smart Procurement Utility is a tool that allows the visualisation of HEPScore/Watt vs HEPScore/unit-cost to guide procurement choices and the compromise between cost and carbon. It uses existing benchmarking data and allows the entry of new benchmarking data. Costs can be entered as relative numbers (percentages relative to a chosen baseline) to generate the cost-related plots.
    
    Speaker: David Britton (University of Glasgow (GB))
    
    241106-HEPIX-Demo.pptx
  - 10:15
    
    Natural job drainage and power reduction in PIC Tier-1 using HTCondor 10m
    
    I will present some preliminary studies and ideas to understand natural job drainage and power reduction in PIC Tier-1, which is using HTCondor. Based on the historical batch system logs, we are simulating natural drainage and understanding how we can modulate the PIC farm without killing jobs.
    
    Speaker: Jose Flix Molina (CIEMAT - Centro de Investigaciones Energéticas Medioambientales y Tec. (ES))
    
    HEPIX_Autumn_2024_HTCondorNaturalDrainage_JFlix.pdf
    
    Notebook_live.pdf
  - 10:25
    
    Discussion 5m
- 10:30 → 11:00
  
  Coffee 30m
- 11:00 → 12:00
  Operating systems, clouds, virtualisation, grids
  
  Conveners: Mr Dino Conciatore (CSCS (Swiss National Supercomputing Centre)), Michel Jouvin (Université Paris-Saclay (FR))
  - 11:00
    
    Purdue CMS Analysis Facility 30m
    
    The Purdue Analysis Facility (Purdue AF) is an advanced computational platform designed to support high energy physics (HEP) research at the CMS experiment. Based on a multi-tenant JupyterHub server deployed on a Kubernetes cluster, Purdue AF leverages the resources of the Purdue CMS Tier-2 computing center to provide scalable, interactive environments for HEP workflows. It supports a full HEP analysis software stack, offers a variety of storage and data access solutions, and integrates modern scale-out tools like Dask Gateway. Since its first deployment in 2023, Purdue AF has been instrumental in numerous published analyses, workshops, and tutorials. We will present the Purdue AF architecture and describe its common use patterns in CMS analyses.
    
    Speaker: Dmitry Kondratyev (Purdue University (US))
    
    hepix2024-purdue-af.pdf
  - 11:30
    
    Transitioning from RHEV to Openshift 30m
    
    A description of our experience deploying Openshift both for container orchestration as well as a replacement for Redhat Enterprise Virtualization.
    
    Speaker: Robert Hancock
    
    RHEV-Openshift-final.pdf
    
    RHEV-Openshift-final.pptx
- 12:00 → 13:30
  
  Lunch 1h 30m
- 13:30 → 15:00
  Networking & Security
  
  Conveners: David Kelsey (Science and Technology Facilities Council STFC (GB)), Shawn Mc Kee (University of Michigan (US))
  - 13:30
    
    One year into the CERN Cyber-Security Audit 30m
    
    This talk presents the findings of the 2023 cybersecurity audit undertaken at CERN, and the resulting plans/progress/accomplishment the Organization experienced in the past 9 months while implementing their recommendations.
    
    Speaker: Stefan Lueders (CERN)
  - 14:00
    
    Firewall under attack: operational security rollercoaster 30m
    
    This talk will walk you through the challenges the ESnet security team faced during an attack against one of its firewalls. It covers the struggle and drama to access the data we needed and, in the end, highlights how nothing quite beats good old-fashioned, down-and-dirty system forensics.
    
    Speakers: Patrick Storm, Romain Wartel (CERN)
  - 14:30
    
    HELP! I have DataCenter Nightmares 30m
    
    With the growing complexity of the IT hardware and software stack, with a move from bare-metal to virtual machines & containers, with the prelevant usage of shared central computing resources for Internet-facing services, provisioning of (internal) user services but also the need for serving industrial control systems (OT) in parallel, the design of data centre architectures and in particular its networks can become more and more challenging. This presentation will introduce the dilemma of creating a highly agile and flexible computer center set-up while still trying to maintain security perimeters within. It is bound to fail.
    
    Speaker: Stefan Lueders (CERN)
    
    DC+Nightmares+@+HEPiX+(2024).pdf
    
    DC+Nightmares+@+HEPiX+(2024).pptx
- 15:00 → 15:30
  
  Coffee 30m
- 15:30 → 17:30
  Networking & Security
  
  Conveners: David Kelsey (Science and Technology Facilities Council STFC (GB)), Shawn Mc Kee (University of Michigan (US))
  - 15:30
    
    Networking Topics for WLCG 30m
    
    We will describe the current activities and plans in WLCG networking, including details about SciTags, the WLCG perfSONAR deployment and the related activities to monitor and analyze our networks. We will also described the related efforts to plan for the upcoming WLCG Network Data Challenge through a series of mini-challenges that incorporate our tools and metrics.
    
    Speaker: Shawn Mc Kee (University of Michigan (US))
    
    Networking Topics for WLCG
    
    Networking Topics for WLCG
    
    Networking Topics for WLCG.pdf
  - 16:00
    
    Getting closer to an IPv6-only WLCG – update from the HEPiX IPv6 Working Group 30m
    
    The HEPiX IPv6 Working Group has been encouraging the deployment of IPv6 in WLCG for many years. At the last HEPiX meeting in Paris we reported that the LHC experiment Tier-2 storage services are now close to 100% IPv6-capable. We had turned our attention to WLCG compute and launched a GGUS ticket campaign for WLCG sites to deploy dual-stack computing elements and worker nodes. At that time 44% of the sites had completed their deployment of dual-stack CPU. The working group has also continued to monitor the use of IPv4 and IPv6 on the LHCOPN. As before we continue to identify uses of legacy IPv4 data transfers and strive to move these to IPv6. A dual-stack network is not a desirable end-point for all this work; we continue to plan the move from dual-stack to IPv6-only.
    
    This talk will present the activities of the working group since April 2024 and our future plans.
    
    Speaker: Martin Bly (STFC-RAL)
    
    IPv6 WG at HEPiX Oklahoma Nov 2024.pdf
    
    IPv6 WG at HEPiX Oklahoma Nov 2024.pptx
  - 16:30
    
    Network tests at CZ Tier-2 30m
    
    The CZ Tier-2 in Prague (the Czech Republic) joined the WLCG Data Challenge 24 and managed to receive and sent more than 2 PB during the second week of the DC24. Since than we upgraded our network connection to LHCONE from 100 to 2x100 Gbps. The LHCONE link uses GEANT connection, which was also upgraded to 2x100 Gbps. During July 2024 we executed dedicated network stress tests between Prague and CERN and we observed maxima close to the link capacity - 200 Gbps.
    
    Speaker: Jiri Chudoba (Czech Academy of Sciences (CZ))
    
    20241107-prg-network-chudoba.pdf
    
    20241107-prg-network-chudoba.pptx
  - 17:00
    
    How AI networking Fabrics are different from today's Data Center fabrics 30m
    
    This presentation looks at what is different about building and deploying AI fabrics. I can if needed remove the Arista logo's from the presentation. I don't see a place to attach the presentation?
    
    Speaker: Paul Gilbert (Arista Networks)
- 18:00 → 21:00
  
  Social Dinner 3h
  
  Gala Dinner at the Sam Noble Museum of Natural History
  2401 Chautauqua Ave.
  Norman, OK 73072-7029
Friday 8 November
- Mon 4 Nov
- Tue 5 Nov
- Wed 6 Nov
- Thu 7 Nov
- Fri 8 Nov
- 08:30 → 09:00
  
  Registration 30m
- 09:00 → 10:30
  Basic and end-user IT services
  
  Conveners: Dennis van Dok (Nikhef), Jingyan Shi (Chinese Academy of Sciences (CN)), Mary Hester
  - 09:00
    
    pkcli: A Framework for Scripts to Manage Applications 30m
    
    System administrators and developers need a way to call application code and other tasks through command line interfaces (CLIs). Some examples include user management (creation, deletion, moderation, etc) or seeding the database for development. We have developed an open source Python framework, pykern.pkcli, that simplifies the creation of these application-specific CLIs. In this talk, I will provide an overview of our framework and share examples of how we've used it to administer our systems. I'll discuss the advantages of using pykern.pkcli over traditional shell scripts, including improvements in development, testing, modification, and distribution. Additionally, I'll present a case study demonstrating how we use one of these scripts to manage user access control for an application and seamlessly share code between the CLI and a web interface.
    
    Speaker: evan carlin (RadiaSoft LLC)
    
    application_clis.pptx.pdf
  - 09:30
    
    CI4FPGA: Continuous Integration for FPGA/SoC Projects 30m
    
    As the complexity of FPGA and SoC development grows, so does the need for efficient and automated processes to streamline testing, building, and collaboration, particularly in large-scale scientific environments such as CERN. This initiative focuses on providing CI infrastructure that is tailored for FPGA development and pre-configured Docker images for essential EDA tools, keeping the learning slope for the more than 100 projected users of the service minimal and using centralized and managed infrastructure that aligns well with CERN's IT services. This centralization facilitates the seamless integration of tools and workflows across diverse experiments, ensuring that development efforts are unified and scalable.
    
    CI4FPGA facilitates testing and building processes by enabling automated pipelines, enhancing collaboration between development teams, and improving overall efficiency. The project frees FPGA designers from the resource-intensive task of maintaining clusters and container images, freeing them up to address key challenges such as automating unit and system-level testing, facilitating shared development of IP cores, among other benefits. One of the features employed is lazy pulling technology, that makes it possible to use scalable VM-based clusters with limited SSD sizes and drastically reduces container image load times from ~15 minutes to ~15 seconds.
    
    Speaker: Carmen Marcos
    
    CI4FPGA-HEPiX.pdf
    
    CI4FPGA-HEPiX.pptx
  - 10:00
    
    CRISP: Collaborative Tools for the ePIC Experiment 30m
    
    This talk describes a project to develop a set of collaborative tools for the upcoming ePIC experiment at the BNL Electron-Ion Collider (EIC). The "Collaborative Research Information Sharing Platform" (CRISP) is built upon an extensible, full-featured membership directory, with CoManage integration and a customized InvenioRDM document repository. The CRISP architecture will be presented, along with plans for future integrations and workflow development.
    
    Speaker: Ofer Rind (Brookhaven National Laboratory)
    
    CRISP HEPiX 2024.pdf
- 10:30 → 11:00
  
  Coffee 30m
- 11:00 → 11:30
  Miscellaneous
  - 11:00
    
    Shifting Hardware Landscape 30m
    
    Advances in computing hardware are essential for future HEP and NP experiments. These advances are seen as incremental improvements in performance metric over time, i.e. everything works the same, just better, faster, and cheaper. In reality, hardware advances and changes in requirements can result in the crossing of thresholds that require a re-evaluation of existing practices. The HEPiX Techwatch working group was created to monitor trends in technology that will impact HEP and NP experiments in the future.
    
    Speaker: Shigeki Misawa (Brookhaven National Laboratory (US))
    
    HEPiX-Fall-2024-Changing-Landscape.pdf
- 11:30 → 12:00
  
  Wrap-up
  
  Convener: Ofer Rind (Brookhaven National Laboratory)
  
  HEPiX Fall 2024 Workshop Wrap-up.pdf

Choose timezone

HEPiX Fall 2024 Workshop

HEPiX Fall 2024 at the University of Oklahoma, USA