WLCG Accounting Task Force Meeting

Europe/Zurich
513/R-068 (CERN)


Attended:

Ivan, John, Steve, Costin, Maarten, Julia, Pepe

 

Short conclusions from the discussion:

APEL accounting works fine for the following site configurations:

CREAM CE + HTCondor batch

ARC CE + HTCondor batch.

John will run another check on the CEs that Maarten sent in the chat, which have a CREAM CE + HTCondor configuration, to make sure that APEL accounting works for them.

The use case that still needs to be addressed is HTCondor CE + HTCondor batch.

A detailed description of the situation and the conclusions, from the mail sent by Steve after the meeting:

Re: Accounting Meeting

We've been discussing how to add Jordi Casal's (PIC) extensions for
HTCondorCE into the APEL client. As I promised, I'll summarise the plan
so far, and then describe an inconsistency with respect to portability
that needs to be solved.  Then we can think about what to do;
suggestions welcome.

Background to the problem
****************************

To get up to speed, the background on APEL is as follows.  To collect
data, APEL has two main parts: the parsers and the publisher. The
parsers get the data from both the CE(s) and the underlying batch
system. Data from the CE (global ids, VO info etc.) is generally in BLAH
format (a standard) and is acquired by the generic BLAH log parser,
creating BlahdRecords. Data from the batch system (more specific info,
e.g. durations, cpu counts) is acquired by batch-system specific
log parsers (e.g. for SGE, PBS and HTCondor) and put in
EventRecords. Finally, both BlahdRecords and EventRecords are "joined"
to make the final accounting record.  So (apart from the PIC custom
version, which I deal with below) users of HTCondor have so far had at
least two options with respect to APEL.

1) They could use ARC, which comes with Jura, its own, unrelated APEL
client.

-- or --

2) They could use (e.g.) CREAM CE, which writes BLAH logs. They'd then
use the standard APEL client to get and send the data. Specifically,
they would use the generic BLAH parser to get the CE data (as explained
above) and an existing built-in HTCondor parser (in a file called
"htcondor.py") to get the batch log data.
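
As an illustration of the "join" described above, here is a minimal
sketch. It is not APEL's actual code: the record fields, the choice of a
local job id as the join key, and the function name join_records are all
assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class BlahdRecord:
    """Hypothetical CE-side record (fields are illustrative only)."""
    local_job_id: str
    global_job_id: str
    vo: str


@dataclass
class EventRecord:
    """Hypothetical batch-side record (fields are illustrative only)."""
    local_job_id: str
    wall_seconds: int
    cpu_seconds: int
    cpu_count: int


def join_records(blahd_records, event_records):
    """Pair CE-side and batch-side records on the assumed shared job id."""
    events_by_job = {e.local_job_id: e for e in event_records}
    joined = []
    for b in blahd_records:
        e = events_by_job.get(b.local_job_id)
        if e is None:
            # No batch data for this job, so no accounting record is made.
            continue
        joined.append({
            "global_job_id": b.global_job_id,
            "vo": b.vo,
            "wall_seconds": e.wall_seconds,
            "cpu_seconds": e.cpu_seconds,
            "cpu_count": e.cpu_count,
        })
    return joined
```

A job that appears on only one side produces no joined record in this
sketch; how the real client treats such jobs is not covered here.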


PIC Patch for HTCondorCE
**************************

So that's the history. Now let's consider the PIC patch. When an
HTCondorCE is used with an HTCondor batch system, _all_ the necessary
data for one job is contained in the same log line (which comes from the
condor_history command). This is unusual, and in theory it would no
longer be necessary to have two parsers (a BLAH parser and a batch
system parser). One could implement it all in one.

But the structure of "data from two logs" is deeply ingrained in the
APEL client architecture. It is thus a much simpler change to make the
program parse the same log files twice (!) to mimic having two log
files. In one pass, the CE data is extracted (as if it comes from BLAH)
and put in BlahdRecords. And in the next pass, the batch system data is
extracted with a batch system specific parser script, and put in
EventRecords. Hence the program acts as if there were two log files,
even though there are not. In practice, this works well; the structure
of the program is not destroyed and far fewer lines of code need to be
changed and/or tested. It's a decent approach.
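
The two-pass idea can be sketched as follows. This is only an
illustration under stated assumptions: the sample line, its key names
and its key=value layout are invented for the example and are not the
real condor_history output, and the function names are hypothetical.

```python
# Hypothetical log line; the real condor_history format may differ.
SAMPLE = ("JobId=42.0;GlobalJobId=ce01#42.0;VO=atlas;"
          "WallSeconds=3600;CpuSeconds=3400;Cpus=1")

# Assumed split of the fields between the two "views" of one line.
CE_KEYS = ("JobId", "GlobalJobId", "VO")
BATCH_KEYS = ("JobId", "WallSeconds", "CpuSeconds", "Cpus")


def to_fields(line):
    """Split one key=value;key=value line into a dict of strings."""
    return dict(item.split("=", 1) for item in line.strip().split(";") if item)


def parse_ce_view(line):
    """Pass 1: keep only the CE-style data, as if it came from BLAH."""
    fields = to_fields(line)
    return {key: fields[key] for key in CE_KEYS}


def parse_batch_view(line):
    """Pass 2: keep only the batch-system data."""
    fields = to_fields(line)
    return {key: fields[key] for key in BATCH_KEYS}


def two_pass(lines):
    """Read the same lines twice, producing two separate record lists."""
    lines = list(lines)
    blahd_like = [parse_ce_view(line) for line in lines]
    event_like = [parse_batch_view(line) for line in lines]
    return blahd_like, event_like
```

Each line is deliberately parsed twice: the cost is some repeated work,
and the benefit is that the downstream code still sees two independent
record streams.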

But there are difficulties.  First, HTCondorCE does not write BLAH
files, so the patch uses a new "BLAH-type" parser (HTCondorCEParser)
which can be used instead of the real BLAH parser. It has the same
backend as the original BLAH log parser, but it accepts condor_history
data as input.

Second, the existing batch log parser for HTCondor (file htcondor.py,
class name HTCondorParser) was unsatisfactory for PIC on several
grounds. PIC needed to build in a function to apply a scaling factor
(this is necessary for heterogeneous clusters). PIC also made the
parsing slightly more explicit by using key/value pairs, and changed
the delimiter from a pipe (|) to a semi-colon. However, these changes
(to htcondor.py) make the new batch log parser for HTCondor
incompatible with existing sites using (say) option 2 above. In short,
the new htcondor.py file clobbers the old htcondor.py file.

Conclusions
*************

So, to make this patch portable, we need to:

a) stop this interference with existing sites using (e.g.) CREAM/HTCondor;

b) turn the tag for the scaling factor (presently a fixed tag called
MATCH_EXP_PICScaling) into a variable name, set via some parameter to
the program;

c) test it all out, and make sure any new version is fully compatible
with PIC (who perhaps don't want to maintain their own version
forever), existing CREAM/HTCondor sites, and the other users.

All these changes are quite doable.  As Maarten suggests, one way is to
add a new class for the new PIC HTCondorParser, leaving the old system
as it is.
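
A rough sketch of that "new class beside the old one" approach, with the
scaling tag passed in as a parameter, might look like this. Both classes
are stand-ins written for this note, not the real htcondor.py code: the
class names, the field names, the three-field pipe layout, and the
assumption that scaling multiplies the durations are all hypothetical.

```python
class LegacyHTCondorParser:
    """Stand-in for the existing parser: positional, pipe-delimited."""

    def parse(self, line):
        # Assumed three-field layout, for illustration only.
        job_id, wall, cpu = line.strip().split("|")
        return {"JobId": job_id,
                "WallSeconds": int(wall),
                "CpuSeconds": int(cpu)}


class ScaledHTCondorParser:
    """Stand-in for the new parser: key=value pairs, semi-colon delimited."""

    def __init__(self, scaling_tag=None):
        # The tag name is configuration, not a constant in the code.
        self.scaling_tag = scaling_tag

    def parse(self, line):
        fields = dict(item.split("=", 1)
                      for item in line.strip().split(";") if item)
        factor = 1.0
        if self.scaling_tag:
            # A missing tag means "no scaling" in this sketch.
            factor = float(fields.get(self.scaling_tag, 1.0))
        # Assumption: the factor is applied by multiplying the durations.
        return {"JobId": fields["JobId"],
                "WallSeconds": round(int(fields["WallSeconds"]) * factor),
                "CpuSeconds": round(int(fields["CpuSeconds"]) * factor)}
```

A site such as PIC would then construct
ScaledHTCondorParser(scaling_tag="MATCH_EXP_PICScaling"), while existing
CREAM/HTCondor sites keep using the legacy class unchanged.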

-------

Steve kindly agreed to work on the implementation. When it is ready, it should be tested by several sites. The final code should be hosted in the UMD repository.

After this is done, we can finally conclude that all sites with an HTCondor batch system have a working solution for APEL accounting.

 

 

 

 

    • 15:30 - 16:00
      Evaluation of available HTCondor accounting tools (30m)
      Speaker: Stephen Jones (Liverpool University)
    • 16:00 - 16:10
      Discussion (10m)