Conveners

Workshop presentations
- Helge Meinhard (CERN)
- Todd Tannenbaum (Univ of Wisconsin-Madison, Wisconsin, USA)
- Christoph Beyer
- Antonio Puertas Gallardo (European Commission)
- Catalin Condurache (Science and Technology Facilities Council STFC (GB))
- Chris Brew (Science and Technology Facilities Council STFC (GB))
- Jose Flix Molina (Centro de Investigaciones Energéticas Medioambientales y Tecno)

Workshop logistics
HTCondor uses the ClassAd language in three different ways. This tutorial will cover the full syntax of the ClassAd language, its uses in HTCondor, and advanced topics in ClassAd usage for system administration and monitoring.
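As a flavor of the language, here is a small machine-ad sketch in new-ClassAd syntax (the attribute values are illustrative, not taken from the talk):

```
[
    /* A machine ad fragment: attributes describing this resource */
    Memory = 2048;
    OpSys  = "LINUX";
    /* Requirements is an expression matched against a candidate job ad:
       TARGET refers to the job ad, MY to this machine ad */
    Requirements = (TARGET.RequestMemory <= MY.Memory)
]
```

The same expression syntax appears in configuration files, submit files, and the ads exchanged during matchmaking, which is what makes the language reusable across the three contexts.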
Clusters running differently sized jobs can easily suffer from fragmentation: Large chunks of free resources are required to run larger jobs, but smaller jobs can block parts of these chunks, making the remainder too small. For example, clusters in the WLCG must provide space for 8-core jobs, while there is a constant pressure of 1-core jobs. Common approaches to this issue are the DEFRAG...
This tutorial covers the basic installation and configuration of the HTCondor system. Theory of operation and system architecture are also covered.
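For orientation, a minimal configuration sketch for a single-node setup (the hostname is a placeholder; a real installation would use its own central manager):

```
# Point this node at the pool's central manager
CONDOR_HOST = central-manager.example.org

# Run an execute slot (STARTD) and a submit point (SCHEDD)
# alongside the master on this node
DAEMON_LIST = MASTER, STARTD, SCHEDD
```

These two knobs are the usual starting point; the tutorial covers the many further policy and security settings layered on top.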
Distinguishing characteristics of High Throughput Computing (HTC), including how it contrasts with High Performance Computing (HPC). When is HTC appropriate, and when is HPC? The talk also covers lessons and best practices learned from running the Open Science Grid, a 100+ institution distributed HTC environment.
The University of Oxford Tier-2 Grid cluster converted to using HTCondor in 2014. At that time, there was no suitable monitoring tool available. The Oxford team developed a command line tool, written in Python, that displays snapshot information about the running jobs. The tool provides the capability of reporting on the number of jobs running on a given node and the efficiency of each job....
An overview of recent developments and future plans in HTCondor.
The HTCondor-CE provides a remote API on top of a local site batch system.
HTCondor has been the primary production batch service at CERN for the last couple of years, passing the 100k core mark last year. The challenge has been to scale the service, not only in terms of the number of resources, but also in terms of the number of heterogeneous use cases. The use cases involve dedicated LHC Tier-0 pools, dedicated resources within standard pools, special CE routes to...
This tutorial covers HTCondor's "Fair Share" mechanisms for assigning resources to users, configuring groups of users with quotas, and other aspects of global policy via the HTCondor negotiator.
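A sketch of what group-quota configuration can look like on the negotiator (group names and quota values are illustrative):

```
# Define two accounting groups with static quotas
GROUP_NAMES = group_atlas, group_cms
GROUP_QUOTA_group_atlas = 400
GROUP_QUOTA_group_cms   = 600

# Let groups use idle surplus beyond their own quota
GROUP_ACCEPT_SURPLUS = True
```

Within each group, HTCondor's fair-share mechanism still arbitrates among individual users based on their recent usage.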
The talk provides some details of special DESY configurations. It focuses on features we need for user registry integration, node maintenance operations, and fair share / quota handling. With the help of job transforms defining job classes, and proper job duration and memory settings, we set up a smooth and transparent operating model.
Haggis is an information system used to map CERN users to HTCondor accounting groups, and to hold information about quota and priority allocation per accounting group, along with information relevant to resource usage accounting. It enforces a tree-like domain model that supports resource mapping under different compute pools. All the data stored in Haggis is completely manageable by the...
How HTCondor deals with network architecture difficulties.
Introduction to the HTCondor python bindings and their use to query HTCondor.
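A minimal query sketch using the bindings (assumes the `htcondor` Python module is installed and a pool collector is reachable; the hostname is a placeholder):

```python
import htcondor

# Contact the pool collector and fetch machine (startd) ads,
# projecting only the attributes we care about
coll = htcondor.Collector("cm.example.org")
ads = coll.query(htcondor.AdTypes.Startd,
                 projection=["Name", "State", "Activity", "Memory"])

for ad in ads:
    print(ad.get("Name"), ad.get("State"), ad.get("Memory"))
```

Projection keeps the query cheap on large pools, since the collector only serializes the requested attributes.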
Configuring an HTCondor cluster and keeping the configuration synchronised can be quite the chore. For this purpose, under the umbrella of HEP-Puppet, sysadmins have gathered to create a simple-to-use Puppet module. With just a few lines of YAML (hiera), you can configure your own HTCondor cluster within minutes (Puppet infrastructure provided). This talk will showcase the module with snippets...
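To illustrate the idea, a hypothetical hiera fragment for a worker node; these key names are purely illustrative and not necessarily the module's actual interface:

```yaml
# Hypothetical hiera data for an HTCondor worker node
# (key names are illustrative, not the real HEP-Puppet parameters)
htcondor::is_worker: true
htcondor::condor_host: 'cm.example.org'
htcondor::number_of_cpus: 16
```

The appeal of the hiera approach is that the per-role differences live in small data files, while the module handles package installation, templating, and daemon restarts.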
Tutorial on using python to submit jobs to HTCondor, concentrating on the 8.7 series improvements in the HTCondor python bindings.
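A submission sketch using the Submit class from the 8.7-series bindings (requires the `htcondor` module and a running schedd; the job description is illustrative):

```python
import htcondor

# Describe the job with the Submit class introduced in the 8.7 series
sub = htcondor.Submit({
    "executable": "/bin/sleep",
    "arguments": "60",
    "request_memory": "128MB",
})

# Queue one instance of the job inside a schedd transaction
schedd = htcondor.Schedd()
with schedd.transaction() as txn:
    cluster_id = sub.queue(txn)

print("submitted cluster", cluster_id)
```

The Submit object accepts the same key/value pairs as a submit description file, which makes it easy to port existing submit files into Python.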
Miron Livny would like to lead a discussion on how to best interface with HTCondor when working inside a Python environment, especially an interactive science-based environment such as Jupyter Notebook / Lab. We have been experimenting with some approaches at UW-Madison that we can share, but what we are looking for is an open discussion of ideas, feedback, and suggestions.
Learn how the Annex allows you to seamlessly expand your HTCondor pool using machines from Amazon EC2.
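A command-line sketch of the idea (the annex name and sizing are illustrative; AWS credentials and the one-time annex setup must already be in place):

```
# Add 10 on-demand EC2 instances to the pool for two hours
condor_annex -count 10 -annex-name MyAnnex -duration 2
```

The acquired instances join the existing pool and can be targeted (or avoided) by jobs via their annex name.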
Based on current trends and past experience, this talk will identify and discuss six key challenge areas that will continue to drive innovation in High Throughput Computing technologies in the years to come.
RAL Tier-1 originally used the PBS batch system for its Grid-related activities. Increased LHC operation requirements exposed scalability problems, so other batch systems were taken into consideration.
In this presentation we review the history of HTCondor at RAL and detail how it evolved from an initial conventional setup with cgroups for resource control to the current use of Docker...
HTCondor is a product, but it is not an application. Like operating systems, networks, database management systems, and security infrastructures, HTCondor is a general system, upon which other applications may be built.
Extra work is needed to create something useful from HTCondor. The extra work depends on the goals of the designer. This talk identifies a few general areas that need to be...
Access to both HTC and HPC facilities is vitally important to the fusion community, not only for plasma modelling but also for advanced engineering and design, materials research, rendering, uncertainty quantification and advanced data analytics for engineering operations. The computing requirements are expected to increase as the community prepares for the next generation facility, ITER....
Discussion of the language used by HTCondor for configuration and job submit files.
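For reference, a minimal submit description file showing the basic shape of the language (file names and resource requests are illustrative):

```
# A minimal submit description file
executable = analyze.sh
arguments  = input_$(Process).dat

output = out.$(Process)
error  = err.$(Process)
log    = job.log

request_cpus   = 1
request_memory = 1GB

# Queue ten instances, numbered 0..9 via $(Process)
queue 10
```

The same `$(macro)` substitution style also appears in HTCondor configuration files, which is part of what the talk covers.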
Request 30 Minute time slot.
DAGMan lets you manage large, complex workflows in HTCondor.
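A small DAG description illustrating the idea (submit file names are illustrative); a four-node "diamond" where B and C run after A, and D runs after both:

```
# diamond.dag -- a four-node diamond workflow
JOB  A  a.sub
JOB  B  b.sub
JOB  C  c.sub
JOB  D  d.sub

PARENT A CHILD B C
PARENT B C CHILD D

# Re-run the final node up to 3 times if it fails
RETRY D 3
```

`condor_submit_dag diamond.dag` then submits a DAGMan job that enforces the dependencies and handles retries.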
We believe that the distributed scientific computing community has unique authorization needs that can be met by utilizing common web technologies, such as OAuth 2.0 and JSON Web Tokens (JWT). The SciTokens team, a collaboration between technology providers including the HTCondor Project and domain scientists, is working to build and demonstrate a new authorization approach at scale.
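To make the token idea concrete, a small stdlib-only sketch of how JWT-style claims travel as base64url-encoded JSON; the claim values are illustrative, not normative SciTokens content:

```python
import base64
import json

# A hypothetical SciToken-style claims payload (values are illustrative)
claims = {
    "iss": "https://demo.scitokens.org",
    "scope": "read:/data write:/data/user",
    "exp": 1700000000,
}

# JWTs carry such claims as base64url-encoded JSON segments
segment = base64.urlsafe_b64encode(json.dumps(claims).encode()).rstrip(b"=")

# A recipient restores the padding and decodes the segment
decoded = json.loads(
    base64.urlsafe_b64decode(segment + b"=" * (-len(segment) % 4))
)
assert decoded == claims
```

In a real token this payload is signed by the issuer, so the capability statement in `scope` can be verified without a callback to a central service.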
In recent times, the CMS HTCondor Global Pool, which unifies access and management to all CPU resources available to the experiment, has been growing in size and evolving in its complexity, as new resources and job submit nodes are being added to the design originally conceived to serve the collaboration during the LHC Run 2. Having achieved most of our milestones for this period, the pool...
Nowadays computational resources come in a wide variety of forms: pilots running on sites, cloud resources, and spare cycles on desktops, laptops and even phones through volunteer computing. Our duty, as the Submission Infrastructure team at CMS, is to be able to use them all.
When it comes to integrating these different models into a single pool of resources, different challenges arise....
Geospatial data are one of the core data sources for scientific and technical support to the European Commission (EC) policies. For instance, the Copernicus programme of the European Union provides a vast amount of Earth Observation (EO) data for monitoring the environment through the Sentinel satellites operated by the European Space Agency. In terms of data management and processing, big...
Discussion of policy expressions available to users when they submit their HTCondor jobs, and expressions available to administrators when they configure HTCondor execute nodes. Time permitting, there will be a demonstration of special-purpose execution slots.
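A sketch of the two sides of the policy language (the expressions are illustrative examples, not recommended defaults):

```
# Submit-side (in the job's submit description file):
# where the job is willing to run, and how to rank matches
requirements = (OpSys == "LINUX") && (Memory >= 2048)
rank         = KFlops

# Execute-side (in condor_config on the worker):
# only start jobs outside 08:00-18:00 local time
# (ClockMin counts minutes since midnight)
START = (ClockMin < 480) || (ClockMin > 1080)
```

Both sides are ordinary ClassAd expressions, so the same operators and attribute references work in each.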
Request 60 Minute slot.
In 2013 the RAL Tier-1 switched its batch farm to using HTCondor. In the years following, several more UK sites have made the switch. The RAL Tier-1 batch farm now provides well over 20,000 job slots, and HTCondor is a key service delivering our pledged resources to the WLCG, now and for the foreseeable future.
New funding opportunities are available to provide computing in the UK to the "long tail"...
An overview of monitoring an HTCondor pool.
Overview of HTCondor's mechanisms in support of job isolation, including Docker, Singularity, cgroups, and namespace mounts.
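As one example of these mechanisms, a submit file sketch for the docker universe (the image and command are illustrative):

```
# Run the job inside a container via the docker universe
universe     = docker
docker_image = centos:7

executable = /bin/cat
arguments  = /etc/os-release
output     = docker.out
error      = docker.err
log        = docker.log

queue
```

From the user's point of view the job behaves like any other; the startd handles pulling the image and confining the job inside the container.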
A setup to share clusters that used to be owned and operated by experimental and theory sub-groups in the Physics Department of the University of Milan is described. Each sub-cluster is configured as a separate Condor Pool, reporting to one additional 'super'-collector. With a few assumptions on the available execution environment, plus mutually agreed priorities for 'local' jobs, this allows...
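A hedged configuration sketch of such a multi-pool arrangement (hostnames are placeholders; the actual Milan setup may differ):

```
# On each sub-pool's collector: forward ads to the shared
# 'super'-collector so the whole federation is visible in one place
CONDOR_VIEW_HOST = super-collector.example.org

# On submit nodes: allow local jobs to flock to the other sub-pools
# when the home pool is full
FLOCK_TO = pool-theory.example.org, pool-exp.example.org
```

Keeping each group's pool separate preserves local ownership and priorities, while the shared collector and flocking provide the federation.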
All members of the LIGO Scientific Collaboration have access to a handful of dedicated LIGO Data Grid clusters which feature HTCondor, system-installed software, the LIGO and Virgo data, and other standard components. Cardiff University also hosts a LIGO Data Grid site, but this is built on top of the shared institutional HPC cluster. In this talk I describe how I used HTCondor, Spack,...
Discussion of the Job Transform language in the HTCondor Schedd.
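A transform sketch in the later native syntax (8.9+; the 8.7 series expressed the same idea in an equivalent ClassAd-based form). It defaults RequestMemory for jobs that did not set one; the value is illustrative:

```
# Schedd-side job transform: give memory-less jobs a default request
JOB_TRANSFORM_NAMES = DefaultMem
JOB_TRANSFORM_DefaultMem @=end
   # DEFAULT only sets the attribute if the job left it undefined
   DEFAULT RequestMemory 2048
@end
```

Because transforms run in the schedd at submit time, site policy is applied uniformly without requiring users to change their submit files.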
Request 30 Minute time slot.