GridPP Technical Meeting - HTCondor CEs
Virtual Only
Weekly meeting slot for technical topics. We will try and focus on one topic per meeting. We will announce at the Tuesday Ops meeting if this meeting is going ahead and if so the topic to be discussed.
General area of HTCondorCE APEL support
https://twiki.cern.ch/twiki/bin/view/LCG/HTCondorAccounting
Other links mentioned are:
Specific batch systems supported by HTCondorCE
https://opensciencegrid.org/docs/compute-element/htcondor-ce-overview/
Notes on Scaling Factors in heterogeneous clusters
https://www.gridpp.ac.uk/wiki/Example_Build_of_an_ARC/Condor_Cluster#Notes_on_Accounting.2C_Scaling_and_Publishing
And...
https://www.gridpp.ac.uk/wiki/Publishing_tutorial
From the meeting:
- PIC have deployed HTCondor CEs in production and have made their patches public.
- HTCondor CE also supports other batch systems such as SLURM, PBS etc.
- Steve's solution relies on running an APEL client at the site. Some sites would rather not run the APEL client.
- The other solution, which sites would like to see, involves the HTCondor CE submitting directly to APEL. HTCondor provides a lot of flexibility (but not infinite!) when producing logs. It is hoped that it would require relatively little effort to produce a correctly formatted log. Initially an additional script may be needed, but if we work with the HTCondor developers this would hopefully be fully integrated into HTCondor.
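As a rough illustration of the "additional script" idea in the last point, the sketch below turns the attributes of one finished job into a key/value text record. The input keys follow HTCondor job ClassAd attribute names (ClusterId, ProcId, Owner, RemoteWallClockTime and so on). The output field names and layout, the function name format_job_record and the site name are assumptions made for illustration, not the verified APEL format.

```python
# Illustrative sketch only. Input keys follow HTCondor job ClassAd attribute
# names; the output field names and layout are ASSUMPTIONS, not the verified
# APEL schema. format_job_record is a hypothetical helper.

def format_job_record(ad, site):
    """Turn one completed job's attributes (a plain dict) into a text record."""
    # CPU time is taken as user + system CPU seconds reported for the job.
    cpu_seconds = int(ad.get("RemoteUserCpu", 0) + ad.get("RemoteSysCpu", 0))
    fields = [
        ("Site", site),
        ("LocalJobId", "{}.{}".format(ad["ClusterId"], ad["ProcId"])),
        ("LocalUserId", ad["Owner"]),
        ("GlobalUserName", ad.get("x509userproxysubject", "None")),
        ("WallDuration", int(ad.get("RemoteWallClockTime", 0))),
        ("CpuDuration", cpu_seconds),
        ("Processors", int(ad.get("RequestCpus", 1))),
        ("StartTime", int(ad["JobStartDate"])),
        ("EndTime", int(ad["CompletionDate"])),
    ]
    return "\n".join("{}: {}".format(key, value) for key, value in fields)


if __name__ == "__main__":
    # Made-up example values, not taken from any real job.
    example_ad = {
        "ClusterId": 12, "ProcId": 0, "Owner": "user001",
        "RemoteWallClockTime": 3600.0, "RemoteUserCpu": 3000.0,
        "RemoteSysCpu": 100.0, "RequestCpus": 1,
        "JobStartDate": 1546300800, "CompletionDate": 1546304400,
    }
    print(format_job_record(example_ad, "EXAMPLE-SITE"))
```

The record is built from a plain dict so that the same function could be fed from a history file, a hook or the Python bindings, whichever the site prefers.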
Actions:
- Who would like to try deploying Steve’s solution?
Liverpool will set up an HTCondor CE in the next few months; they support everybody apart from CMS and ALICE.
RAL Tier-1 will look at deploying an HTCondor CE around February 2019.
Steve to message PIC to ask whether the solution GridPP is proposing would work for them, and whether they would at some point be prepared to use it.
Matt Doidge: HTCondor CE + SGE (already running CREAM CEs + APEL client).
- Who would like to work on generating the accounting records directly?
Steve has done a prototype of this.
Requirement from APEL: job logs need to be batched up so that the repository has to handle only a few thousand job summary scripts rather than millions of individual job reports. Adrian will help. It would be nice to add this feature to VAC (Andrew M to talk to Adrian).
We (GridPP) need to talk to HTCondor developers to ask them if they can . This should be done after it is clear what we need for direct submission.
There were no volunteers to look at this at this time. Steve might get round to it once he has tested deploying the HTCondorCE. We will review this at the next relevant technical meeting.
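The batching requirement noted above could be sketched roughly as follows: individual job records are grouped and summed so that only one summary per group is sent. The grouping key (VO, user, year, month), the dict keys and the function name summarise_jobs are assumptions chosen for illustration; the actual summary fields would have to be agreed with APEL.

```python
# Illustrative sketch only. The grouping key and the dict keys used here
# ("vo", "user", "wall", "cpu", "end_time") are ASSUMPTIONS, not the APEL schema.
from collections import defaultdict
from datetime import datetime, timezone


def summarise_jobs(jobs):
    """Collapse per-job records into one summary per (vo, user, year, month)."""
    summaries = defaultdict(lambda: {"jobs": 0, "wall": 0, "cpu": 0})
    for job in jobs:
        # Jobs are assigned to the month in which they ended (UTC).
        end = datetime.fromtimestamp(job["end_time"], tz=timezone.utc)
        key = (job["vo"], job["user"], end.year, end.month)
        summary = summaries[key]
        summary["jobs"] += 1
        summary["wall"] += job["wall"]
        summary["cpu"] += job["cpu"]
    return dict(summaries)


if __name__ == "__main__":
    # Made-up records: two jobs ending in January 2019, one in February 2019.
    jobs = [
        {"vo": "atlas", "user": "u1", "wall": 100, "cpu": 90, "end_time": 1546300800},
        {"vo": "atlas", "user": "u1", "wall": 200, "cpu": 160, "end_time": 1546304400},
        {"vo": "atlas", "user": "u1", "wall": 50, "cpu": 40, "end_time": 1548979200},
    ]
    for key, summary in sorted(summarise_jobs(jobs).items()):
        print(key, summary)
```

With this kind of grouping the number of messages scales with the number of (VO, user, month) combinations rather than with the number of jobs.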
Next technical meeting in ~February 2019.