Philosophy and Architecture: What the Manual Won't Tell You
Who are you, where are you from and what do you hope to get out of the workshop?
Troubleshooting: What to do when things go wrong
Practical considerations for GPU Jobs
More and more frameworks now offload compute to accelerators, speeding up ML/AI workloads on GPUs and other devices. Right now, however, users themselves still need to figure out which execution library or acceleration system is best suited to their workloads.
How can we best model this abstraction in HTCondor so that for our users the...
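As a concrete illustration of the status quo (a minimal sketch using the HTCondor Python bindings; the script name and resource numbers are invented for illustration and are not from the talk), today the user must encode the accelerator choice explicitly in the submit description:

    import htcondor

    # Status quo: the user decides up front that the job needs a GPU and
    # spells out the requirements; "train.py" and all numbers are placeholders.
    schedd = htcondor.Schedd()
    job = htcondor.Submit({
        "executable": "/usr/bin/python3",
        "arguments": "train.py",
        "request_cpus": "4",
        "request_memory": "16GB",
        "request_gpus": "1",
        # Match only accelerators with enough device memory, using attributes
        # published by HTCondor's GPU discovery.
        "require_gpus": "GlobalMemoryMb >= 16000",
    })
    schedd.submit(job)

An abstraction of the kind asked about above would let users state the workload rather than make these device-level choices themselves.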
A new user's experience of switching to HTCondor
This presentation will show how the Cosmic Rays group at Nikhef is using HTCondor in their analysis workflows on the local pool.
Dealing with Sources of Data: Choices and the Pros/Cons
The NetApp DataOps Toolkit is a Python library that makes it easy for developers, data scientists, and data engineers to perform various data management tasks. These tasks include provisioning new data volumes or development workspaces almost instantaneously, which improves flexibility in development environment management. In this presentation, we will go over some examples and showcase how these...
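To give a flavour of such a task (a minimal sketch; the volume name and size are made up, and the exact function signatures are assumptions that may differ between toolkit versions), provisioning and cloning a volume looks roughly like this:

    from netapp_dataops.traditional import clone_volume, create_volume

    # Provision a new data volume; name and size are illustrative.
    create_volume(volume_name="project1", volume_size="10TB")

    # Clone it near-instantaneously into a development workspace.
    clone_volume(new_volume_name="project1_dev", source_volume_name="project1")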
Various AI workloads, such as deep learning, machine learning, generative AI, or retrieval-augmented generation, place heavy demands on capacity, compute power, and data transfer performance. This presentation will show how simply a hardware/software stack solution can be deployed, and how it can leverage and/or become part of an AI infrastructure, based on Ansible scripts. In addition, I will discuss two use cases, one on...
CHTC Vision: Compute and Data Together
Pelican Intro
PANEL and Discussion - Pelican and Condor: Flying Together, Birds of a Feather, Don't drop your data!
With the continuing growth of data volumes and computational demands, compute-intensive sciences rely on large-scale, diverse computing resources for running data processing, analysis tasks, and simulation workflows.
These computing resources are often made available to research groups by different resource providers, resulting in a heterogeneous infrastructure.
To make efficient use of those...
The computing workflow of the Virgo Rome Group for the CW search based on the Hough analysis has run for several years using storage and computing resources mainly provisioned by INFN-CNAF and strictly tied to its specific infrastructure. Starting with O4a, the workflow has been adapted to be more general and to integrate with computing centres in the IGWN community. We discuss our...
Operating HTCondor with Kubernetes
During the 20-year history of the Torque batch system at Nikhef, we constructed several command-line tools providing various overviews of what was going on in the system. An example: a tool that could tell us "what are the 20 most recently started jobs?"
mrstarts | tail -20
With HTCondor we wanted the same kind of overviews. Much of this can be accomplished using the HTCondor...
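For instance (a minimal sketch with the HTCondor Python bindings, not the tooling presented in this talk), the "mrstarts | tail -20" overview can be approximated as:

    import htcondor

    # List the 20 most recently started jobs, mirroring "mrstarts | tail -20".
    schedd = htcondor.Schedd()
    ads = schedd.query(
        constraint="JobStatus == 2",  # 2 = running
        projection=["ClusterId", "ProcId", "Owner", "JobStartDate"],
    )
    for ad in sorted(ads, key=lambda a: a.get("JobStartDate", 0))[-20:]:
        print(ad["ClusterId"], ad["ProcId"], ad["Owner"], ad["JobStartDate"])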
HTC from the user perspective - to be chosen from former material
Exploring Job Histories with ElasticSearch and HTCondor AdStash
Quick overview of HTCondor for system administrators
DAGMan: I didn't know it could do that!
This year has been eventful for our research lab: new hardware brought along a host of challenges. We will share the network and architecture changes, and the recent challenges that we are facing.
It's all about scale.
Graphical code editors such as Visual Studio Code (VS Code) have gained a lot of momentum among young researchers in recent years. To ease their workflows, we have developed a VS Code entry point to harness the resources of an HTC cluster from within their IDE.
This entry point allows users to have a "desktop-like" experience within VS Code when editing and testing their code while working in...
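One conceivable shape for such an entry point (purely our own sketch, not necessarily how the tool described here is implemented): reserve a slot with a placeholder job, then let the editor attach to it via condor_ssh_to_job.

    import htcondor

    # Reserve a slot with a long-running placeholder job; the resources and
    # the 8-hour lifetime are arbitrary choices for this sketch.
    schedd = htcondor.Schedd()
    slot = htcondor.Submit({
        "executable": "/bin/sleep",
        "arguments": "8h",
        "request_cpus": "4",
        "request_memory": "8GB",
    })
    result = schedd.submit(slot)

    # Once the job is running, a terminal or an IDE helper can attach to it:
    print(f"condor_ssh_to_job {result.cluster()}.0")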
The new HTCSS Python API: Python Bindings Version 2
HTCondor: What's New / What's Coming Up
Nikhef operates a local compute facility of around 6k cores. For the last two decades, Torque has been the batch system of choice on this cluster.
This year the system has been replaced with HTCondor; in this talk we share some of the concerns, design choices and experiences of the transition from the operator's perspective.
Opportunities and Challenges Courtesy of Linux Cgroups Version 2
The adoption of AMD Instinct™ GPU accelerators in several of the major high-performance computing sites is a reality today, and we'd like to share the pathway that led us here. We'll focus on the characteristics of the hardware and the ROCm software ecosystem, and how they were tuned to match the required compute density and programmability to make this adoption successful, from the discrete GPU to...
In this presentation we will go over the GPU deployment at the NL SARA-MATRIX Grid site. An overview of the setup is shown, followed by some rudimentary performance numbers. Finally, user adoption and how the GPUs are used are discussed.
Breakthroughs in computing systems have made it possible to tackle immense obstacles in simulation environments. As a result, our understanding of the world and universe is advancing at an exponential rate. Supercomputers are now used everywhere—from car and airplane design, oil field exploration, and financial risk assessment, to genome mapping and weather forecasting.
Lenovo’s...
WLCG Token Transition Update (incl. the illustrious return of X.509)
Development and execution of scientific code require increasingly complex software stacks and specialized resources such as machines with huge system memory or GPUs. Such resources have been present in HTC/HPC clusters and used for batch processing for decades, but users struggle with adapting their software stacks and development workflows to those dedicated resources. Hence, it is crucial...
The ALICE experiment at CERN runs a distributed computing model and is part of the Worldwide LHC Computing Grid (WLCG). WLCG uses a tiered distributed grid model. As part of the ALICE experiment's computing grid, we run two Tier2 (T2) sites in the US, at Oak Ridge National Laboratory and Lawrence Berkeley National Laboratory. Computing resource usage and delivery are accounted through OSG...
This presentation will briefly describe the environment that hosts the OSDF cache, its setup, and the software suitable for the MS4 service. It will then lay out in more depth the process of installing the OSDF cache and the challenges that arose during the installation.
In this contribution, I will present an HPC use case facilitated through gateways deployed at PIC. The selected HPC resource is the Barcelona Supercomputing Center, where we encountered some challenges, particularly in the CMS case, which required meticulous and complex work. We had to implement new developments in HTCondor, specifically enabling communication through a shared file system....
The Einstein Telescope (ET) is currently in the early development phase for its computing infrastructure. At present, the only officially provided service is the distribution of data for Mock Data Challenges (using the Open Science Data Federation + CVMFS-for-data), with GitLab used for code management. While the data distribution infrastructure is expected to be managed by a Data Lake...
The Submission Infrastructure team of the CMS experiment at the LHC operates several HTCondor pools, comprising more than 500k CPU cores on average, for the experiment's different user groups. The jobs running in those pools include crucial experiment data reconstruction, physics simulation and user analysis. The computing centres providing the resources are distributed around the world and...
With the latest addition of 4k ARM cores, the ScotGrid Glasgow facility is a pioneering example of a heterogeneous WLCG Tier2 site. The new hardware has enabled large-scale testing by experiments and detailed investigations into ARM performance in a production environment.
I will present an overview of our computing cluster, which uses HTCondor as the batch system combined with ARC-CE as...
Ben will hopefully contribute something