GridPP Technical Meeting - HTCondor CEs

Europe/London
Virtual Only


Alastair Dewhurst (Science and Technology Facilities Council STFC (GB)), Andrew McNab (University of Manchester), David Colling (Imperial College Sci., Tech. & Med. (GB))
Description

Weekly meeting slot for technical topics. We will try to focus on one topic per meeting. We will announce at the Tuesday Ops meeting whether this meeting is going ahead and, if so, the topic to be discussed.

General area of HTCondorCE APEL support

https://twiki.cern.ch/twiki/bin/view/LCG/HTCondorAccounting

Other links mentioned are:

Specific batch systems supported by HTCondorCE
 https://opensciencegrid.org/docs/compute-element/htcondor-ce-overview/

Notes on Scaling Factors in heterogeneous clusters
https://www.gridpp.ac.uk/wiki/Example_Build_of_an_ARC/Condor_Cluster#Notes_on_Accounting.2C_Scaling_and_Publishing

And...

https://www.gridpp.ac.uk/wiki/Publishing_tutorial

From the meeting:

- PIC have deployed HTCondor CEs in production and have made their patches public.

- HTCondor CE also supports other batch systems, such as SLURM and PBS.

- Steve's solution relies on running an APEL client at the site. Some sites would rather not run the APEL client.

- The other solution sites would like to see involves the HTCondor CE submitting directly to APEL.  HTCondor provides a lot of flexibility (but not infinite!) when producing logs.  It is hoped that it would require relatively little effort to produce a correctly formatted log.  Initially an additional script may be needed but if we work with the HTCondor developers this would hopefully be fully integrated into HTCondor.

Actions:
- Who would like to try deploying Steve’s solution?
Liverpool will set up an HTCondor CE in the next few months; they support everybody apart from CMS and ALICE.
RAL Tier-1 will look at deploying an HTCondor CE around February 2019.
Steve to message PIC to ask whether the solution GridPP is proposing would work for them, and whether they would at some point be prepared to use it.

Matt Doidge, HTCondor CE + SGE.  (Already running CREAM CEs + APEL client)

- Who would like to work on generating the accounting records directly?
Steve has done a prototype of this.
Requirement from APEL: job logs need to be batched up so that the repository only has to handle a few thousand job summaries rather than millions of individual job reports. Adrian will help. It would be nice to add this feature to VAC (Andrew M to talk to Adrian).
We (GridPP) need to talk to HTCondor developers to ask them if they can .  This should be done after it is clear what we need for direct submission.  
There were no volunteers to look at this at this time.  Steve might get round to it once he has tested deploying the HTCondorCE.  We will review this at the next relevant technical meeting.
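
As an illustration of the batching requirement noted above, the following is a minimal Python sketch of collapsing per-job records into summaries before anything is sent to the repository. It is not Steve's prototype and does not use the APEL or HTCondor record formats: the JobRecord fields, the summarise function and the choice of grouping by VO and calendar month are all assumptions made for the example.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class JobRecord:
    # Hypothetical per-job fields; not the APEL or HTCondor schema.
    vo: str
    end_time: int       # job end time, seconds since the Unix epoch (UTC)
    wall_seconds: int
    cpu_seconds: int


def summarise(jobs):
    """Collapse per-job records into one summary per (VO, year, month)."""
    totals = defaultdict(
        lambda: {"jobs": 0, "wall_seconds": 0, "cpu_seconds": 0}
    )
    for job in jobs:
        ended = datetime.fromtimestamp(job.end_time, tz=timezone.utc)
        key = (job.vo, ended.year, ended.month)
        totals[key]["jobs"] += 1
        totals[key]["wall_seconds"] += job.wall_seconds
        totals[key]["cpu_seconds"] += job.cpu_seconds
    # Return a plain dict so callers do not create keys by accident.
    return dict(totals)
```

With this grouping, the number of summaries depends on the number of VOs and months covered rather than on the number of jobs, which is the property the repository needs.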

Next technical meeting in ~February 2019.

    • 11:00 11:45
      Deployment and Testing of HTCondor CEs 45m
      Speakers: Alastair Dewhurst (Science and Technology Facilities Council STFC (GB)), Stephen Jones (Liverpool University)
    • 11:45 12:00
      AoB 15m