Challenge and Future of Job Distribution at a Multi-VO Grid Site

13 Apr 2015, 14:30
15m
C210 (C210)

Oral presentation, Track 6: Facilities, Infrastructure, Network (Track 6 Session)

Speaker

Birgit Lewendel (Deutsches Elektronen-Synchrotron (DE))

Description

DESY operates a multi-VO Grid site for 20 HEP and non-HEP collaborations and is one of the largest Tier-2 sites worldwide for ATLAS, CMS, LHCb, and BELLE2. All VOs share the computing resources of one common Grid infrastructure according to MoUs and other agreements, and an opportunistic usage model allows free resources to be distributed among the VOs. Currently, the Grid site DESY-HH provides roughly 100 kHS06 in 10,000 job slots, using the PBS/TORQUE batch system. As described at previous CHEP conferences, resource utilization and job scheduling in a multi-VO environment is a major challenge. On the one hand, all job slots should be occupied; on the other hand, jobs with diverging resource usage patterns must be cleverly distributed across the compute nodes to guarantee stability and optimal resource usage. Batch systems such as PBS/TORQUE with the MAUI scheduler scale only up to a few thousand job slots. At DESY-HH, an alternative scheduler was developed and brought into operation. In preparation for LHC Run 2 and the start of BELLE2, the request to support multi-core jobs in particular requires job scheduling strategies that are not available out of the box. Moreover, the operation of workload management systems and pilot factories (DIRAC, PANDA) by the large collaborations calls into question how sites will provide computing resources in the future. Is cloud computing the future? And what then about small VOs and individual users who submit jobs with the standard Grid tools? In this contribution to CHEP2015 we will review what has been learned from operating a large Grid site for many VOs. We will summarize the experience gained with job scheduling at DESY-HH over the last years and describe the limits of the current system. The main focus will be on future scenarios, including alternatives to the local resource management systems (LRMS) that are still widely used.
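The opportunistic usage model can be illustrated with a small sketch. It is hypothetical and is not the DESY-HH scheduler: it assumes each VO has an integer pledge (a percentage of the site, standing in for its MoU share) and a count of waiting jobs. Every VO first receives slots up to its pledge, and any slots left free are then handed out one at a time to the waiting VO that is furthest below its pledge. The names `distribute_slots`, `pledges` and `waiting` are illustrative only.

```python
from fractions import Fraction
from typing import Dict


def distribute_slots(total_slots: int,
                     pledges: Dict[str, int],
                     waiting: Dict[str, int]) -> Dict[str, int]:
    """Hypothetical sketch of an opportunistic share-based slot split.

    pledges: percentage of the site each VO is entitled to (sum <= 100).
    waiting: number of jobs each VO currently has queued.
    """
    # Pass 1: every VO gets up to its pledged share, but never more
    # slots than it has jobs waiting.
    alloc = {vo: min(waiting.get(vo, 0), total_slots * pct // 100)
             for vo, pct in pledges.items()}
    free = total_slots - sum(alloc.values())

    # Pass 2 (opportunistic): give each free slot to the VO with jobs still
    # waiting that is furthest below its pledge (lowest alloc/pledge).
    # Exact Fractions avoid floating-point ties; the VO name breaks ties.
    # VOs with a zero pledge never receive opportunistic slots here.
    while free > 0:
        candidates = [vo for vo, pct in pledges.items()
                      if pct > 0 and alloc[vo] < waiting.get(vo, 0)]
        if not candidates:
            break  # no VO has waiting jobs: the remaining slots stay idle
        vo = min(candidates, key=lambda v: (Fraction(alloc[v], pledges[v]), v))
        alloc[vo] += 1
        free -= 1
    return alloc


if __name__ == "__main__":
    # CMS needs fewer slots than its pledge, so its unused share is
    # redistributed between ATLAS and BELLE2 in proportion to their pledges.
    print(distribute_slots(100,
                           {"ATLAS": 50, "CMS": 30, "BELLE2": 20},
                           {"ATLAS": 80, "CMS": 10, "BELLE2": 40}))
    # → {'ATLAS': 64, 'CMS': 10, 'BELLE2': 26}
```

Handing out free slots one at a time keeps the sketch simple and exactly proportional; a production scheduler would also have to weigh job lengths, memory and multi-core requirements, which this example ignores.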

Primary author

Co-author

Birgit Lewendel (Deutsches Elektronen-Synchrotron (DE))