WLCG Workload Recommendations
Scope of recommendations
These recommendations are intended as a guide for new sites, or for sites looking to change their workload management systems, that is, the batch scheduler or grid CE.
Picking one of the recommended solutions is not mandatory. Sites are free to choose other solutions if they wish, as long as they make their users (LHC experiments in WLCG) and funding agencies happy (for pledged resources this very likely means having working accounting and monitoring).
The recommendations below give some rationale for which to choose, but other factors might be more relevant for your site, such as what nearby sites run or what local expertise you have access to.
Recommended batch systems
- HTCondor - for sites with HTC loads, that is, many jobs each up to one node in size, and especially for sites with a very large number of jobs. A typical dedicated WLCG site looks like this.
- SLURM - for sites with HPC loads, such as multi-node MPI.
Recommended CEs
- HTCondor-CE - makes most sense when connected to HTCondor
  - Caveats:
    - Accounting: please check here
    - This CE type can be used by all LHC experiment frameworks, but check with other communities your site needs to support
- ARC-CE - works with SLURM and HTCondor.
  - Also used for lightweight CEs with file staging to local filesystem instead of a close SE
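
The pairings above can be summarised as a small lookup. The following is only an illustrative sketch: the names `RECOMMENDED_BATCH`, `RECOMMENDED_CES` and `recommend` are hypothetical and not part of any WLCG tool, and the ordering of CEs (HTCondor-CE first for HTCondor) is an assumption drawn from the note that HTCondor-CE makes most sense when connected to HTCondor. Local factors such as nearby sites and available expertise are not modelled.

```python
from typing import List, Tuple

# Hypothetical summary of the recommendations; not an official WLCG tool.
RECOMMENDED_BATCH = {
    "htc": "HTCondor",  # many jobs, each up to one node in size
    "hpc": "SLURM",     # multi-node MPI
}

# CEs listed in the recommendations as working with each batch system.
RECOMMENDED_CES = {
    "HTCondor": ["HTCondor-CE", "ARC-CE"],
    "SLURM": ["ARC-CE"],
}


def recommend(workload: str) -> Tuple[str, List[str]]:
    """Return (batch system, candidate CEs) for a workload type, 'htc' or 'hpc'."""
    key = workload.strip().lower()
    if key not in RECOMMENDED_BATCH:
        raise ValueError(f"unknown workload type: {workload!r} (expected 'htc' or 'hpc')")
    batch = RECOMMENDED_BATCH[key]
    # Return a copy so callers cannot mutate the shared table.
    return batch, list(RECOMMENDED_CES[batch])


if __name__ == "__main__":
    for kind in ("htc", "hpc"):
        batch, ces = recommend(kind)
        print(f"{kind}: batch={batch}, CEs={', '.join(ces)}")
```

A site with mixed workloads, or with strong local expertise in one system, may reasonably choose differently from what this lookup returns.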
Stakeholder statements
Here we will list further statements from experiments or infrastructures that could help sites make informed choices.
For example:
- In the Nordic countries, ARC-CE has good local expertise. -- Mattias Wadenstein for NDGF & NorduGrid collaboration.
Editors
The editors of the recommendations are AlessandraForti, HelgeMeinhard, MaartenLitmaath, and MattiasWadenstein. Please send any stakeholder statements or issues to them.
--
MattiasWadenstein - 2018-01-10