Conveners
Workshop session
- Gregory Thain (University of Wisconsin-Madison)
- Catalin Condurache (EGI Foundation)
- Michel Jouvin (Université Paris-Saclay (FR))
Workshop session
- Helge Meinhard (CERN)
- Christoph Beyer
- Chris Brew (Science and Technology Facilities Council STFC (GB))
Workshop session
- Jose Flix Molina (Centro de Investigaciones Energéticas Medioambientales y Tecnológicas)
- Helge Meinhard (CERN)
- Michel Jouvin (Université Paris-Saclay (FR))
Workshop session
- Michel Jouvin (Université Paris-Saclay (FR))
- Christoph Beyer
- Todd Tannenbaum (Univ of Wisconsin-Madison, Wisconsin, USA)
Workshop session
- Catalin Condurache (EGI Foundation)
- Jose Flix Molina (Centro de Investigaciones Energéticas Medioambientales y Tecnológicas)
- Todd Tannenbaum (Univ of Wisconsin-Madison, Wisconsin, USA)
Workshop session
- Christoph Beyer
- Jose Flix Molina (Centro de Investigaciones Energéticas Medioambientales y Tecnológicas)
- Gregory Thain (University of Wisconsin-Madison)
Workshop session
- Chris Brew (Science and Technology Facilities Council STFC (GB))
- Helge Meinhard (CERN)
- Todd Tannenbaum (Univ of Wisconsin-Madison, Wisconsin, USA)
Workshop session
- Catalin Condurache (EGI Foundation)
- Helge Meinhard (CERN)
- Gregory Thain (University of Wisconsin-Madison)
Workshop session
- Catalin Condurache (EGI Foundation)
- Chris Brew (Science and Technology Facilities Council STFC (GB))
- Gregory Thain (University of Wisconsin-Madison)
Workshop session
- Jose Flix Molina (Centro de Investigaciones Energéticas Medioambientales y Tecnológicas)
- Christoph Beyer
- Chris Brew (Science and Technology Facilities Council STFC (GB))
In recent months HTCondor has been the main workload management system for the Grid environment at CC-IN2P3. The computing cluster consists of ~640 worker nodes of various types, which deliver a total of ~27K execution slots (including hyperthreading). The system supports the LHC experiments (ALICE, ATLAS, CMS, and LHCb) under the umbrella of the Worldwide LHC Computing Grid (WLCG) as a Tier...
CNAF started working with HTCondor during spring 2018, planning to move its Tier-1 Grid site from CREAM-CE and the LSF batch system to HTCondor-CE and HTCondor. The phase-out of CREAM and LSF was completed by spring 2020. This talk describes our experience with the new system, with particular focus on HTCondor.
In 2016 the local (BIRD) and GRID DESY batch facilities were migrated to HTCondor. This talk will cover some of the experiences and developments we have seen over that time, as well as the plans for the future of HTC at DESY.
GRIF is a distributed Tier-2 WLCG site grouping four laboratories in the Paris Region (IJCLab, IRFU, LLR, LPNHE). Multiple HTCondor instances have been deployed at GRIF for several years. In particular, an ARC-CE + HTCondor system provides access to the computing resources of IRFU, and a distributed HTCondor pool, with CREAM-CE and Condor-CE gateways, gives unified access to the IJCLab and LLR...
GlideinWMS is a pilot framework that provides uniform and reliable HTCondor clusters on top of heterogeneous and unreliable resources. The Glideins are pilot jobs that are sent to the selected nodes, test them, set them up as desired by the user jobs, and ultimately start an HTCondor startd to join an elastic pool. These Glideins collect information that is very useful to evaluate the health and...
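As a hedged illustration of how such glidein-provided slots can be inspected once they have joined a pool (not part of the abstract itself), the Python sketch below uses the htcondor bindings; the GLIDEIN_* attribute names are the usual GlideinWMS machine-ad attributes and are assumptions that may differ per factory configuration.

```python
# Sketch only: inspect glidein slots that have joined an HTCondor pool.
# GLIDEIN_Site / GLIDEIN_Entry_Name are typical GlideinWMS machine-ad
# attributes; adjust them to whatever your factory actually publishes.
import htcondor

collector = htcondor.Collector()          # the pool's collector (local default)
glidein_slots = collector.query(
    htcondor.AdTypes.Startd,
    constraint="GLIDEIN_Site =!= undefined",
    projection=["Machine", "GLIDEIN_Site", "GLIDEIN_Entry_Name", "State", "Activity"],
)

for ad in glidein_slots:
    print(ad.get("GLIDEIN_Site"), ad.get("GLIDEIN_Entry_Name"),
          ad.get("Machine"), ad.get("State"), ad.get("Activity"))
```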
The resource needs of high energy physics experiments such as CMS at the LHC are expected to grow in terms of the amount of data collected and the computing resources required to process these data. Computing needs in CMS are addressed through the "Global Pool", a vanilla dynamic HTCondor pool created through the glideinWMS software. With over 250k cores, the CMS Global Pool is the biggest...
CNAF started working with the HTCondor Computing Element in May 2018, planning to move its Tier-1 Grid site from CREAM-CE and the LSF batch system to HTCondor-CE and HTCondor. The phase-out of CREAM and LSF was completed by spring 2020. This talk describes our experience with the new system, with particular focus on HTCondor-CE.
This contribution provides firsthand experience of adopting HTCondor-CE at the German WLCG sites DESY and KIT. Covering two sites plus a remote setup for RWTH Aachen, we share our lessons learned in pushing HTCondor-CE to production. With a comprehensive recap covering the technical setup, a detour into surviving the ecosystem and accounting, and the practical Dos and Don'ts, this contribution is suitable for...
A review of how we run and operate a large multi-purpose HTCondor pool with grid submission, local submission and dedicated resources. We use grid and local submission to drive utilisation of shared resources, and job transforms and routers to ensure jobs end up on the correct resources and are accounted correctly. We will review our automation and monitoring tools, together with integration of...
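As a rough sketch only (standard job-ad attribute names, not the site's actual tooling), one can verify from the schedd side that transformed and routed jobs carry the expected accounting and resource attributes, for example with the htcondor Python bindings:

```python
# Sketch only: check that routed/transformed jobs ended up with the expected
# accounting group and resource requests. Attribute names are the standard
# HTCondor job-ad ones; AccountingGroup may be set by a transform or router.
import htcondor

schedd = htcondor.Schedd()                 # local schedd
jobs = schedd.query(
    constraint="JobStatus == 2",           # running jobs only
    projection=["ClusterId", "ProcId", "Owner",
                "AccountingGroup", "RequestCpus", "RequestMemory"],
)

for job in jobs:
    print(f'{job["ClusterId"]}.{job["ProcId"]}',
          job.get("Owner"),
          job.get("AccountingGroup", "<none>"),
          job.get("RequestCpus"),
          job.get("RequestMemory"))
```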
The Coflu Cluster, also known as the Radio-Protection (RP) Cluster, started in 2007 as an experimental project at CERN involving a few standard desktop computers. It was envisaged to have a job scheduling system and a common storage space so that multiple Fluka simulations could be run in parallel and monitored, utilizing a custom-built and easy-to-use web interface.
The...
The majority of physics analysis jobs at CERN are run on high-throughput computing batch systems such as HTCondor. However, not everyone has access to computing farms, e.g. theorists wanting to make use of CMS Open Data, and for reproducible workflows more backend-agnostic approaches are desirable. The industry standard here is containers orchestrated with Kubernetes, for which computing...
Our HTC cluster using HTCondor was set up at Bonn University in 2017/2018. All infrastructure is fully puppetised, including the HTCondor configuration. OS updates are fully automated, and necessary reboots for security patches are scheduled in a staggered fashion, backfilling all draining nodes with short jobs to maximize throughput. Additionally, draining can also be scheduled for...
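A minimal sketch of the backfill idea, assuming the standard Draining and ExpectedMachineGracefulDrainingCompletion startd attributes (not the Bonn site's actual automation): list draining nodes and the remaining drain window that short jobs would have to fit into.

```python
# Sketch only: how much drain time is left on each draining node, i.e. the
# window a short backfill job would have to fit into. Attribute names are
# assumed to be the standard startd ones, not site-specific.
import time
import htcondor

collector = htcondor.Collector()
draining = collector.query(
    htcondor.AdTypes.Startd,
    constraint="Draining =?= true",
    projection=["Machine", "ExpectedMachineGracefulDrainingCompletion"],
)

now = time.time()
for ad in draining:
    eta = ad.get("ExpectedMachineGracefulDrainingCompletion")
    if eta is None:
        print(ad["Machine"], "drain completion unknown")
    else:
        print(ad["Machine"], f"{(eta - now) / 3600.0:.1f} hours of drain window left")
```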
We're excited to share the launch of the HTCondor offering on the Google Cloud Marketplace, built by Google software engineer Cheryl Zhang with advice and support from the experts at the CHTC. Come see how quickly and easily you can start using HTCondor on Google Cloud with this new solution.
HEPCloud is working to integrate isolated HPC Centers, such as Theta at Argonne National Laboratory, into the pool of resources made available to its user community. Major obstacles to using these centers include limited or no outgoing networking and restrictive security policies. HTCondor has provided a mechanism to execute jobs in a manner that satisfies the constraints and...
The bulk of computing at CERN consists of embarrassingly parallel HTC use cases (Jones, Fernandez-Alavarez et al.); however, for MPI applications, e.g. in Accelerator Physics and Engineering, a dedicated HPC cluster running SLURM is used.
In order to optimize utilization of the HPC cluster, idle nodes in the SLURM cluster are backfilled with Grid HTC workloads. This talk will detail the CondorCE...
Our Tier-2 cluster (ScotGrid, Glasgow) uses HTCondor as its batch system, combined with ARC-CE as a front-end for job submission and ARGUS for authentication and user mapping. On top of this, we have built a central monitoring system based on Prometheus that collects, aggregates and displays metrics on custom Grafana dashboards. In particular, we extract job information by regularly parsing the output of...
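As an illustration of the general approach (not the ScotGrid exporter itself), a small Prometheus exporter can obtain similar job information through the htcondor Python bindings instead of parsing command-line output; the metric names and port below are made up for the example.

```python
# Sketch only: a tiny Prometheus exporter for HTCondor job counts using the
# htcondor Python bindings. Metric names and the port are illustrative.
import time
import htcondor
from prometheus_client import Gauge, start_http_server

running_jobs = Gauge("condor_running_jobs", "Number of running jobs in the schedd")
idle_jobs = Gauge("condor_idle_jobs", "Number of idle jobs in the schedd")

def collect_once(schedd: htcondor.Schedd) -> None:
    ads = schedd.query(constraint="true", projection=["JobStatus"])
    running_jobs.set(sum(1 for ad in ads if ad.get("JobStatus") == 2))
    idle_jobs.set(sum(1 for ad in ads if ad.get("JobStatus") == 1))

if __name__ == "__main__":
    start_http_server(9118)                # endpoint scraped by Prometheus
    schedd = htcondor.Schedd()
    while True:
        collect_once(schedd)
        time.sleep(60)
```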
The Physics Data Processing group at Nikhef is developing a Condor-based cluster, after a 19-year absence from the HTCondor community. This talk will discuss why we are developing this cluster, and present our plans and the results so far. It will also spend a slide or two on the potential to use HTCondor for other services we provide.
Dask is an increasingly popular tool for both low-level and high-level parallelism in the Scientific Python ecosystem. I will discuss efforts at the Center for High Throughput Computing at UW-Madison to enable users to run Dask-based work on our HTCondor pool. In particular, we have developed a "wrapper package" based on existing work in the Dask ecosystem that lets Dask spawn workers in the...
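For context, a hedged sketch of what Dask-on-HTCondor usage can look like with the existing dask-jobqueue HTCondorCluster, which such wrapper work builds on; this is not necessarily the wrapper package's own API, and the resource values are placeholders.

```python
# Sketch using dask-jobqueue's HTCondorCluster (placeholder resource values);
# not the CHTC wrapper package described in the talk.
from dask_jobqueue import HTCondorCluster
from dask.distributed import Client

cluster = HTCondorCluster(cores=1, memory="2 GB", disk="1 GB")
cluster.scale(jobs=10)                     # submit 10 worker jobs to the pool

client = Client(cluster)
futures = client.map(lambda x: x ** 2, range(100))
print(sum(client.gather(futures)))
```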
In this presentation, I will introduce the SciTokens model (https://scitokens.org/) for federated capability-based authorization in distributed scientific computing. I will compare the OAuth and JWT security standards with X.509 certificates, and I will discuss ongoing work to migrate HTCondor use cases from certificates to tokens.
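As a hedged illustration of the token model (not tied to the talk's implementation), the sketch below decodes a SciTokens-style JWT with PyJWT and prints its standard claims; signature verification is skipped here purely for inspection, whereas a real service must validate the token against the issuer's published keys.

```python
# Sketch only: inspect the claims of a SciTokens-style JWT with PyJWT.
# Verification is skipped on purpose; a real service must check the signature
# against the issuer's published keys and enforce audience/expiry.
import jwt  # PyJWT

def describe_token(token: str) -> None:
    claims = jwt.decode(token, options={"verify_signature": False})
    print("issuer:  ", claims.get("iss"))
    print("subject: ", claims.get("sub"))
    print("audience:", claims.get("aud"))
    print("expires: ", claims.get("exp"))
    # SciTokens carry capabilities as space-separated scopes,
    # e.g. "read:/protected write:/protected/user"
    print("scopes:  ", claims.get("scope", "").split())
```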