Philosophy and Architecture: What the Manual Won't Tell You
Who are you, where are you from and what do you hope to get out of the workshop?
Troubleshooting: What to do when things go wrong
Practical considerations for GPU Jobs
More and more frameworks now offload compute to accelerators, speeding up ML/AI workloads on GPUs and other devices. Right now, however, users themselves still need to figure out which execution library or acceleration system is best suited to their workloads.
How can we best model this abstraction in HTCondor so that for our users the...
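As a concrete illustration of the status quo (a minimal sketch using the HTCondor Python bindings; the script name and resource numbers are invented for illustration and are not from the talk), today the user must encode the accelerator choice explicitly in the submit description:

    import htcondor

    # Status quo: the user decides up front that the job needs a GPU and
    # spells out the requirements; "train.py" and all numbers are placeholders.
    schedd = htcondor.Schedd()
    job = htcondor.Submit({
        "executable": "/usr/bin/python3",
        "arguments": "train.py",
        "request_cpus": "4",
        "request_memory": "16GB",
        "request_gpus": "1",
        # Match only accelerators with enough device memory, using attributes
        # published by HTCondor's GPU discovery.
        "require_gpus": "GlobalMemoryMb >= 16000",
    })
    schedd.submit(job)

An abstraction of the kind asked about above would let users state the workload rather than make these device-level choices themselves.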
A new user's experience of switching to HTCondor
This presentation will show how the Cosmic Rays group at Nikhef is using HTCondor in their analysis workflows on the local pool.
Dealing with Sources of Data: Choices and the Pros/Cons
The NetApp DataOps Toolkit is a Python library that makes it easy for developers, data scientists, and data engineers to perform various data management tasks. These tasks include provisioning new data volumes or development workspaces almost instantaneously, which improves flexibility in development environment management. In this presentation, we will go over some examples and showcase how these...
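To give a flavour of such a task (a minimal sketch; the volume name and size are made up, and the exact function signatures are assumptions that may differ between toolkit versions), provisioning and cloning a volume looks roughly like this:

    from netapp_dataops.traditional import clone_volume, create_volume

    # Provision a new data volume; name and size are illustrative.
    create_volume(volume_name="project1", volume_size="10TB")

    # Clone it near-instantaneously into a development workspace.
    clone_volume(new_volume_name="project1_dev", source_volume_name="project1")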
Various AI workloads, such as deep learning, machine learning, generative AI, or retrieval-augmented generation, place heavy demands on capacity, compute power, and data transfer performance. This presentation will show how simply a hardware/software stack solution can be deployed, and how it can leverage and/or become part of an AI infrastructure, based on Ansible scripts. In addition, I will discuss two use cases, one on...
CHTC Vision: Compute and Data Together
Pelican Intro
PANEL and Discussion - Pelican and Condor: Flying Together, Birds of a Feather, Don't drop your data!
With the continuing growth of data volumes and computational demands, compute-intensive sciences rely on large-scale, diverse computing resources for running data processing, analysis tasks, and simulation workflows.
These computing resources are often made available to research groups by different resource providers, resulting in a heterogeneous infrastructure.
To make efficient use of those...
The computing workflow of the Virgo Rome Group for the CW search based on the Hough analysis has run for several years using storage and computing resources mainly provisioned by INFN-CNAF and strictly tied to its specific infrastructure. Starting with O4a, the workflow has been adapted to be more general and to integrate with computing centres in the IGWN community. We discuss our...
Operating HTCondor with Kubernetes
During the 20-year history of the Torque batch system at Nikhef, we constructed several command-line tools providing various overviews of what was going on in the system. An example: a tool that could tell us "what are the 20 most recently started jobs?"
mrstarts | tail -20
With HTCondor we wanted the same kind of overviews. Much of this can be accomplished using the HTCondor...
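For instance (a minimal sketch with the HTCondor Python bindings, not the tooling presented in this talk), the "mrstarts | tail -20" overview can be approximated as:

    import htcondor

    # List the 20 most recently started jobs, mirroring "mrstarts | tail -20".
    schedd = htcondor.Schedd()
    ads = schedd.query(
        constraint="JobStatus == 2",  # 2 = running
        projection=["ClusterId", "ProcId", "Owner", "JobStartDate"],
    )
    for ad in sorted(ads, key=lambda a: a.get("JobStartDate", 0))[-20:]:
        print(ad["ClusterId"], ad["ProcId"], ad["Owner"], ad["JobStartDate"])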
HTC from the user perspective - to be chosen from former material
Exploring Job Histories with ElasticSearch and HTCondor AdStash
Quick overview of HTCondor for system administrators
DAGMan: I didn't know it could do that!
This year has been eventful for our research lab: new hardware brought along a host of challenges. We will share the network and architecture changes, and the recent challenges that we are facing.
It's all about scale.
Graphical code editors such as Visual Studio Code (VS Code) have gained a lot of momentum among young researchers in recent years. To ease their workflows, we have developed a VS Code entry point to harness the resources of an HTC cluster from within their IDE.
This entry point allows users to have a "desktop-like" experience within VS Code when editing and testing their code while working in...
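One conceivable shape for such an entry point (purely our own sketch, not necessarily how the tool described here is implemented): reserve a slot with a placeholder job, then let the editor attach to it via condor_ssh_to_job.

    import htcondor

    # Reserve a slot with a long-running placeholder job; the resources and
    # the 8-hour lifetime are arbitrary choices for this sketch.
    schedd = htcondor.Schedd()
    slot = htcondor.Submit({
        "executable": "/bin/sleep",
        "arguments": "8h",
        "request_cpus": "4",
        "request_memory": "8GB",
    })
    result = schedd.submit(slot)

    # Once the job is running, a terminal or an IDE helper can attach to it:
    print(f"condor_ssh_to_job {result.cluster()}.0")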
The new HTCSS Python API: Python Bindings Version 2
HTCondor: What's New / What's Coming Up
Nikhef operates a local compute facility of around 6k cores. For the last two decades, Torque has been the batch system of choice on this cluster.
This year the system has been replaced with HTCondor; in this talk we share some of the concerns, design choices and experiences of the transition from the operator's perspective.
Opportunities and Challenges Courtesy of Linux Cgroups Version 2
The adoption of AMD Instinct™ GPU accelerators in several of the major high-performance computing sites is a reality today, and we'd like to share the pathway that led us here. We'll focus on the characteristics of the hardware and the ROCm software ecosystem, and how they were tuned to match the required compute density and programmability to make this adoption successful, from the discrete GPU to...
In this presentation we will go over the GPU deployment at the NL SARA-MATRIX Grid site. An overview of the setup is shown, followed by some rudimentary performance numbers. Finally, user adoption and how the GPUs are used are discussed.
Breakthroughs in computing systems have made it possible to tackle immense obstacles in simulation environments. As a result, our understanding of the world and universe is advancing at an exponential rate. Supercomputers are now used everywhere—from car and airplane design, oil field exploration, and financial risk assessment, to genome mapping and weather forecasting.
Lenovo’s...
WLCG Token Transition Update (incl. the illustrious return of X.509)
Development and execution of scientific code require increasingly complex software stacks and specialized resources such as machines with huge system memory or GPUs. Such resources have been present in HTC/HPC clusters and used for batch processing for decades, but users struggle with adapting their software stacks and development workflows to those dedicated resources. Hence, it is crucial...
The ALICE experiment at CERN runs a distributed computing model and is part of the Worldwide LHC Computing Grid (WLCG). WLCG uses a tiered distributed grid model. As part of the ALICE experiment's computing grid, we run two Tier2 (T2) sites in the US, at Oak Ridge National Laboratory and Lawrence Berkeley National Laboratory. Computing resource usage and delivery are accounted through OSG...
This presentation will briefly describe the environment that hosts the OSDF cache, its setup, and the software suitable for the MS4 service. It will then lay out in more depth the process of installing the OSDF cache and the challenges that arose during the installation.
In this contribution, I will present an HPC use case facilitated through gateways deployed at PIC. The selected HPC resource is the Barcelona Supercomputing Center, where we encountered some challenges, particularly in the CMS case, which required meticulous and complex work. We had to implement new developments in HTCondor, specifically enabling communication through a shared file system....
The Einstein Telescope (ET) is currently in the early development phase for its computing infrastructure. At present, the only officially provided service is the distribution of data for Mock Data Challenges (using the Open Science Data Federation + CVMFS-for-data), with GitLab used for code management. While the data distribution infrastructure is expected to be managed by a Data Lake...
The Submission Infrastructure team of the CMS experiment at the LHC operates several HTCondor pools, comprising more than 500k CPU cores on average, for the experiment's different user groups. The jobs running in those pools include crucial experiment data reconstruction, physics simulation and user analysis. The computing centres providing the resources are distributed around the world and...
With the latest addition of 4k ARM cores, the ScotGrid Glasgow facility is a pioneering example of a heterogeneous WLCG Tier2 site. The new hardware has enabled large-scale testing by experiments and detailed investigations into ARM performance in a production environment.
I will present an overview of our computing cluster, which uses HTCondor as the batch system combined with ARC-CE as...
Ben will hopefully contribute something