WLCG Accounting Task Force Meeting

Europe/Zurich
513/R-068 (CERN)


Attended:

Adrian, Alessandra, John, Steve, Pepe, Gavin, Andrew, Antonio, Costin, Ivan, Julia

 

Discussion related to introduction:

Pepe confirmed that he is working with Miguel to validate the T0 accounting numbers.

Pepe said that PIC is working on an APEL parser for HTCondor, which could certainly be reused by other sites. As soon as the parser is validated, it can be shared with any sites that are interested. Julia asked Pepe to give a presentation once the parser is ready and has been tried in production.

Andrew said that at the GDB steering group meeting it was decided to review the situation with the fast benchmark db12 at the April GDB. By then we might have a better idea of how experiments other than ALICE and LHCb evaluate this benchmark. ATLAS has tried it and has seen good agreement with the simulation payloads so far. CMS has not done so yet. Pepe said that CMS would have an internal discussion next Tuesday, after which some work might be scheduled.

Andrew said that at the GDB steering group meeting it was suggested that the accounting task force give a presentation assessing the accounting implications of switching to the db12 benchmark, should such a decision be taken. Julia said that although she had scheduled some discussion of this for the present meeting, there would most probably not be enough time, and it would be better to postpone it to the next meeting so that there is more time to prepare. Everyone agreed.

 

Discussion related to Andrew’s presentation:

The approach presented by Andrew can be applied to clouds, including commercial clouds.

Julia asked why it is not enabled in production for the opportunistic resources. Andrew explained that this is to avoid mixing the opportunistic resources with the pledged ones. Another possible issue is double counting: resource usage accounted through the VAC/VCycle accounting method may also be included in the summary report of a site such as CERN. We need to understand whether and how double counting can be avoided.

Julia asked how APEL distinguishes pledged from opportunistic usage. John explained that this is done via the definition of a particular CE. The T1/T2 or opportunistic attribute is resolved at the level of the EGI accounting portal by checking GocDB/REBUS information.

Julia said that, from the experiment point of view, it would be nice to be able to show pledged and opportunistic usage on the same plot and to compare the contribution of every category of resources. John said that it might be possible to implement this in the UI of the accounting portal. Ivan confirmed.

 

Integration of the opportunistic resource accounting into APEL

ATLAS input

Discussion is ongoing in ATLAS on how to solve the benchmarking issue for clouds and HPC. They do not run pilots. Should the benchmark be run with every job? Where should the results be stored? The most evident solution would be to record the benchmarking info in the job report and to store it in the panda db in the per-job record.

Regarding topology, not every resource has an associated endpoint registered in GocDB. However, there is a panda queue and a sort of fake site registered in AGIS. Again, it is not always possible to decouple opportunistic usage from the pledged one.

ALICE input

ALICE does account for all of its usage, but is not particularly interested in integrating opportunistic usage into APEL.

Currently, conversion from wallclock time to work is done using a HEPSPEC06 performance factor looked up by CPU model. However, the table mapping CPU model to HEPSPEC06 is not complete. The plan is to move towards running the fast benchmark db12 in every job slot.
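The conversion described above can be sketched roughly as follows. This is only an illustration, not the ALICE implementation: the table name `HS06_PER_CORE`, the CPU model names, the factor values and the function `work_hs06_hours` are all hypothetical. It shows why an incomplete table is a problem: usage on an unknown CPU model cannot be converted.

```python
from typing import Optional

# Hypothetical, deliberately incomplete table mapping CPU model to a
# HEPSPEC06 factor per core. Names and values are placeholders, not
# measured benchmark results.
HS06_PER_CORE = {
    "example-cpu-a": 10.0,
    "example-cpu-b": 12.5,
}


def work_hs06_hours(wallclock_hours: float, cores: int,
                    cpu_model: str) -> Optional[float]:
    """Convert wallclock time to work in HS06-hours.

    Returns None when the CPU model is missing from the table, which is
    the gap that running a fast benchmark in every job slot would close.
    """
    factor = HS06_PER_CORE.get(cpu_model)
    if factor is None:
        return None
    return wallclock_hours * cores * factor


if __name__ == "__main__":
    print(work_hs06_hours(2.0, 4, "example-cpu-a"))
    print(work_hs06_hours(1.0, 1, "unlisted-cpu"))
```

With a per-slot benchmark, the table lookup would be replaced by a factor measured in the job slot itself.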

 

CMS input

If the evaluation of the fast benchmark db12 for CMS payloads is positive, it can be a way to benchmark opportunistic resources. The benchmarking factor can then be reported to the Dashboard with every job. Benchmarking would be handled on the Dashboard side, and the accounting summary would then be reported to APEL. Benchmarking is currently the missing piece.

 

LHCb input

LHCb looks to be well advanced thanks to the VAC/VCycle APEL accounting implementation presented by Andrew.

 

Some of the open questions:

How do we identify a resource for accounting?

A job can be submitted via a CE at site A while being processed at site B. Against which site do we account such usage?

Do we account against the instance which paid or against the one which actually provided the resource?

 

CERN input

From the site perspective, it would be useful to show in the APEL accounting portal the additional resources that sites provide above pledge, and to show these resources differently from the pledged ones.

For experiments, it is very important to know what level of reliability they can count on for a particular resource. This defines their retry policy in case of failure, and also more general aspects such as which workflows they can schedule to a particular resource. This is an interesting topic for discussion at the WLCG operations coordination meeting on the 2nd of March. Gavin agreed to give a presentation which might steer a discussion on classifying resources by their reliability and, correspondingly, on the different ways the experiments can use them.

The next meeting is on the 9th of March.

The main topic is the accounting implications of switching to the db12 benchmark.

Since MJF might be one of the important conditions for adoption of the db12 benchmark, Julia asked Andrew to present the current status of the MJF deployment at the next WLCG operations coordination meeting on the 2nd of March. Andrew agreed.

 
