HTCondor Workshop Autumn 2025 in Prague

Europe/Prague
FZU - Institute of Physics of the Czech Academy of Sciences

Pod Vodárenskou věží 2531/3 Praha 8 Czechia
Helge Meinhard (CERN), Todd Tannenbaum, Chris Brew (Science and Technology Facilities Council STFC (GB)), Christoph Beyer, Mary Hester, Alexandr Mikula (Czech Academy of Sciences (CZ))
Description

We are very pleased to announce that the 2025 European HTCondor Workshop will be held from Tuesday 16th September to Friday 19th September, at the Institute of Physics of the Czech Academy of Sciences in Prague.

The meeting will start on Tuesday morning and run until lunchtime on the Friday.

The workshop will be an excellent occasion for learning from the experts (the developers!) about HTCondor, exchanging experiences and plans with your colleagues, and providing your feedback to the experts.

The HTCondor Compute Entrypoint (CE) will be covered, as well as token authentication (currently a hot topic), along with general use and administration of HTCondor.

Participation is open to all organisations (including companies) and persons interested in HTCondor, and is by no means restricted to particle physics and/or academia! If you know anyone who might be interested, don't hesitate to make them aware of this opportunity.

The workshop will cover both using and administering HTCondor; topics will be chosen to best match participants' interests.

We would very much like to know about your use of HTCondor in your project, your experience and your plans. Hence you are warmly encouraged to propose a short presentation.

If you have any questions, please contact hepix-2025condorworkshop-support@hepix.org.

The workshop is hosted with the support of The Institute of Physics of the Czech Academy of Sciences, The Nuclear Physics Institute of the Czech Academy of Sciences and Mcomputers.

We are looking forward to a rich, productive workshop.

Chris Brew (STFC - RAL) and Christoph Beyer (DESY), Co-Chairs of the Organising Committee

Alexandr Mikula (FZU), Chair of the Local Organising Committee 

Dagmar Adamova (INP), Local Organising Committee

Todd Tannenbaum, HTCondor Technical Lead, U Wisconsin, Madison, USA

 

Participants
    • 09:00 14:00
      Program committee meeting 5h

      The committee will make final adjustments to the program, discuss open topics for the upcoming week, and check the meeting room and setup.

    • 08:30 09:00
      Registration 30m
    • 09:00 10:30
      Workshop Session
      • 09:00
        FZU Introduction Talk 20m
        Speaker: Michael Prouza (FZU - Institute of Physics of the Czech Academy of Sciences)
      • 09:25
        Nuclear Physics Institute CAS and its role in the WLCG computing in the Czech Republic 20m

        The Nuclear Physics Institute (NPI) of the Czech Academy of Sciences (CAS) conducts research across a broad spectrum of nuclear and particle physics, both experimental and theoretical, as well as applied. It also participates in large-scale international projects. Scientists and students from the Department of Heavy Ion Physics are actively involved in the experiments ALICE@LHC, STAR@RHIC and ePIC@EIC. According to the ALICE Constitution, the ALICE group at NPI provides computing services and resources to support the ALICE collaboration. In this contribution, we briefly introduce NPI, and present an overview of the activities of the NPI ALICE group in the area of computing for ALICE in the Czech Republic.

        Speaker: Dagmar Adamova (Czech Academy of Sciences (CZ))
      • 09:50
        Meeting Housekeeping 5m
        Speaker: Alexandr Mikula (Czech Academy of Sciences (CZ))
      • 10:00
        Round the Room Introductions 30m
    • 10:30 11:00
      Coffee Break 30m
    • 11:00 12:30
      Workshop Session
    • 12:30 14:00
      Lunch 1h 30m
    • 14:00 15:30
      Workshop Session
    • 15:30 16:00
      Coffee 30m
    • 16:00 17:30
      Workshop Session
      • 16:00
        Two decades of HTCondor and CMS success story 20m

        The Compact Muon Solenoid at CERN has successfully leveraged vast amounts of compute resources from a globally distributed infrastructure (primarily the Worldwide LHC Computing Grid) for over two decades, enabling the collaboration to achieve its scientific goals. During this period, a key technology has been the HTCondor Software Suite (HTCSS), which has allowed CMS to manage its High Throughput workloads, an essential capacity for the experiment in order to process, simulate and analyze the enormous datasets produced by the LHC.

        In this contribution, in the year when HTCSS turns 40, we present an overview of the nearly 20-year shared history of HTCondor and CMS. We highlight how the CMS Submission Infrastructure team has continuously upgraded and adapted our HTCondor setup to cover the increasingly large, complex and diverse processing needs of CMS, thanks in no small part to the invaluable support of the HTCondor community throughout the years.

        Speaker: Antonio Perez-Calero Yzquierdo (Centro de Investigaciones Energéticas Medioambientales y Tecnológicas)
      • 16:25
        Can we break HTCondor before HL-LHC does? 20m

        What is Run 4 scale, can we test whether HTCondor reaches it, and what happens when it does?

        Speaker: Antonio Delgado (University of Notre Dame (US))
      • 16:50
        News from the DESY cluster 20m

        This talk is a mix of follow-ups to earlier talks in this series about power modulation of a Condor pool, plus some recent developments that are hopefully "something to write home about".

        Speaker: Christoph Beyer
      • 17:10
        Latest generation CPU overview (Intel/AMD) and selection of GPUs for increased performance 20m

        CPU Intel (Xeon 6th generation)

        CPU AMD (Epyc 5th generation)

        GPUs Nvidia (latest generation)

        How to configure a first CPU node with GPU support

        Speaker: Lukas Vach
    • 18:00 22:00
      Social event: Welcome Drink
      Convener: Alexandr Mikula (Czech Academy of Sciences (CZ))
      • 18:00
        Welcome Drink 4h

        The welcome drink social event will take place in the local microbrewery Cobolis (Burešova 1661/2, Praha 8), around a 7-minute walk from the conference venue.
        The event will start at 6 PM and will include refreshments and a guided tour of the brewery with beer tasting.
        The tour can accommodate only up to 20 persons, so we will be divided into two groups for the tour.

        https://cobolis.cz/

        Speaker: Alexandr Mikula (Czech Academy of Sciences (CZ))
    • 09:00 10:30
      Show Your Toolbox/Lightning Talks: Monitoring
    • 10:30 11:00
      Coffee 30m
    • 11:00 12:30
      Workshop Session
      Convener: Helge Meinhard (CERN)
      • 11:00
        HTC Accounting with AUDITOR 20m

        In the realm of High Throughput Computing (HTC), managing and processing large volumes of accounting data across diverse environments and use cases presents significant challenges. AUDITOR addresses this issue by providing a flexible framework for building accounting pipelines that can adapt to a wide range of needs.
        At its core, AUDITOR serves as a centralized storage solution for accounting records, facilitating data exchange through a REST interface. This enables seamless interaction with the other parts of the AUDITOR ecosystem: the collectors, which gather accounting data from various sources and push it to AUDITOR, and the plugins, which pull data from AUDITOR for subsequent processing. The modular nature of AUDITOR allows for the customization of collectors and plugins to match specific use cases and environments, ensuring a tailored approach to the management of accounting data. The most important collector for HTC is the HTCondor collector, which retrieves the required job ClassAds from HTCondor and converts them into the AUDITOR record structure.
        This presentation will outline the structure of the AUDITOR accounting ecosystem with a special focus on the HTCondor collector, demonstrate existing accounting pipelines and show how AUDITOR could be extended to account for environmentally sustainable computing resources.

        Speaker: Dirk Sammel (University of Freiburg (DE))
      • 11:25
        HTCondor-CE Dashboard 20m
        Speaker: Todd Tannenbaum
      • 11:45
        Snakemake Done Three Ways 20m
        Speaker: Brian Paul Bockelman (University of Wisconsin Madison (US))
      • 12:10
        Architecture Overview - What are the components and processes 20m
        Speaker: Todd Tannenbaum
    • 12:30 14:00
      Lunch 1h 30m
    • 14:00 15:30
      Workshop Session
      Convener: Gregory Thain
      • 14:00
        Introduction to GPUs and novel Architectures Theme 15m
        Speaker: Miron Livny
      • 14:20
        Machine Learning and Scheduling Strategies for Sustainable Computing at PIC 20m

        This work addresses the optimisation of energy use, electricity costs, and CO2 emissions at the PIC WLCG Tier-1 site. With data centre energy demand expected to increase, aligning with WLCG sustainability goals is critical.

        Two main studies were conducted. First, simulated natural job drainages, applied to 2023–2024 PIC utilisation data (HTCondor logs), evaluated halting job acceptance during periods of high electricity prices or emissions. The approach yielded modest savings but significant computational losses, mainly due to non-energy-aware HTCondor scheduling, hardware characteristics, and hyperthreading. More promising strategies include selectively shutting down inefficient nodes or adjusting CPU frequencies at compute nodes.

        Second, an XGBoost model was developed to predict CPU-core reduction after real-time drainage events, using only decision-time features. Using two years of HTCondor information at the site, the model accurately forecast core drops, especially 8–40 h post-drainage, enabling the design of a dynamic CPU resource management system responsive to price and environmental signals.

        These results provide actionable insights for sustainable computing operations at PIC and within the broader WLCG framework.

        Speaker: Jose Flix Molina (CIEMAT - Centro de Investigaciones Energéticas Medioambientales y Tec. (ES))
      • 14:40
        Operating HTCondor for CPU and GPU Workloads at IFIC 20m

        The Instituto de Física Corpuscular (IFIC) is a joint research center of the Spanish National Research Council (CSIC) and the University of Valencia, focused on fundamental physics, from particle physics to cosmology. It hosts over 400 researchers, engineers, and technical staff working on national and international projects.

        In this talk, we will present how IFIC manages two compute clusters, GLUON (CPU) and Artemisa (GPU), using HTCondor. These clusters serve both internal users and external collaborators, and support a wide range of workloads, from classical simulations to deep learning applications. We will describe the general architecture of each pool, our strategies for efficient GPU and CPU resource allocation, the management of usage policies and priorities, as well as some lessons learned from operating a hybrid infrastructure.

        Additionally, we will describe how we handle parallel jobs over InfiniBand in GLUON alongside traditional serial jobs through HTCondor’s vanilla universe.

        Speaker: Miguel Folgado (IFIC / CSIC-UV)
      • 15:00
        HTCondor GPU administration and How We Deploy It In the Real World 30m
        Speaker: Brian Paul Bockelman (University of Wisconsin Madison (US))
    • 15:30 16:00
      Coffee 30m
    • 16:00 16:30
      Workshop Session
      Convener: Chris Brew (Science and Technology Facilities Council STFC (GB))
    • 16:30 17:30
      HTCondor Office Hours
    • 09:00 10:30
      Workshop Session
      Convener: Brian Paul Bockelman (University of Wisconsin Madison (US))
    • 10:30 11:00
      Coffee 30m
    • 11:00 12:30
      Show Your Toolbox/Lightning Talks: Scheduling, Policies and other cool stuff
      Conveners: Chris Brew (Science and Technology Facilities Council STFC (GB)), Christoph Beyer
    • 12:30 14:00
      Lunch 1h 30m
    • 14:00 15:30
      Workshop Session
      Convener: Christoph Beyer
      • 14:00
        Hosting an HTCondor with Container Universe workload on a bare metal Kubernetes cluster 20m

        INFN with its DATAcloud infrastructure provides a scalable network of federated cloud sites. The ICSC project (National Research Center in High-Performance Computing, Big Data, and Quantum Computing), funded by the PNRR (National Recovery and Resilience Plan), was established to drive R&D efforts focused on advancing high-performance computing, simulations, and big data analytics innovation. As part of the ICSC initiative, the INFN Milano computing center has expanded its capacity by deploying a bare metal Kubernetes cluster.

        This contribution describes how to deploy an HTCondor cluster on Kubernetes to run jobs in a Container Universe, with both Docker and Apptainer as container runtimes. This has been achieved via the virtualization of the Condor execute node in a Kubernetes Deployment, to add scalability. These worker nodes have been joined to the existing bare metal HTCondor cluster.

        As a representative use case, a workload generating production events for the LHC ATLAS experiment has been tested with three configurations: Apptainer jobs on bare metal, and Apptainer and Docker jobs inside Kubernetes. The added virtualization layer has not hindered job performance.

        Speakers: Caterina Marcon (Università degli Studi e INFN Milano (IT)), David Rebatto (Università degli Studi e INFN Milano (IT))
      • 14:25
        Managing credentials with the credmon 20m
        Speaker: Brian Paul Bockelman (University of Wisconsin Madison (US))
      • 14:50
        A Job's Journey Through HTCondor - How a job is placed, matched, and run 40m
        Speaker: Gregory Thain (University of Wisconsin - Madison)
    • 15:30 16:00
      Coffee 30m
    • 16:00 17:30
      HTCondor Office Hours
    • 18:30 21:30
      Social event: Conference dinner
      Convener: Alexandr Mikula (Czech Academy of Sciences (CZ))
      • 18:30
        Social dinner 3h
        Speaker: Alexandr Mikula (Czech Academy of Sciences (CZ))
    • 09:30 10:30
      Workshop Session
      • 09:30
        Introduction to Data Management and Movement Theme 15m
        Speaker: Miron Livny
      • 09:50
        Files Common Across Jobs and How to Better Handle Them 20m
        Speaker: Brian Paul Bockelman (University of Wisconsin Madison (US))
      • 10:10
        Disk Usage of Jobs at the EP 20m
        Speaker: Brian Paul Bockelman (University of Wisconsin Madison (US))
    • 10:30 11:00
      Coffee 30m
    • 11:00 12:30
      Workshop Session
      Convener: Chris Brew (Science and Technology Facilities Council STFC (GB))
      • 11:00
        Discussion: Run 4 and HTCSS - What we need 5 years from now 1h

        Let's identify or discuss big upcoming needs / vision for HTCSS. What does the community expect from HTCSS 5 years from now? What changes/demands will Run 4 impose on the computing environment?

        Speaker: Miron Livny
      • 12:10
        Workshop Wrap Up 20m
        Speaker: Chris Brew (Science and Technology Facilities Council STFC (GB))
    • 12:30 14:00
      Lunch 1h 30m