WLCG Accounting Task Force Meeting

Europe/Zurich
513/R-068 (CERN)

Description

Zoom link:

https://cern.zoom.us/j/63355885303

Password is sent to the mailing list

 

Attended by:

Alessandro and Catalin (EGI), Adrian (APEL), Ivan (EGI Accounting Portal), Helge (HEP-SCORE task force), Pepe (CMS, PIC and GDB), Jarka (CERN), Concezio (LHCb), Maarten (ALICE and WLCG Ops), Julia (WLCG Ops)

Discussion after Helge's introduction

Jarka asked how we are going to run the benchmark and collect its metrics.

Helge: We do not expect any major problems. The benchmark will be self-contained, a tarball or similar, which a site simply downloads and runs. It will be used not only for accounting but also for purchasing, similarly to the current HS06 benchmark.

Jarka: Maybe the experiments will also be interested in running it?

Helge: The benchmark will run for a few hours, so it is unlikely that experiments will run it in their workflows. It should not be confused with a fast benchmark. Although it is possible, I do not see a need for the experiments to run the benchmark.

Julia: What are EGI's plans for the transition to the new benchmark?

Catalin and Alessandro: There is no concrete plan for the moment, but we do plan to migrate. It would be good to coordinate with WLCG and do the migration at the same time.

Julia: Yes, fully agree. We should work together to coordinate the migration.

Julia asked the experiment representatives what they think about the accounting strategy during the migration: do we need to support two benchmarks (the old and the new one) in the accounting flow during the transition period?

CMS answer was sent by mail (see below) and confirmed by Pepe:

Input from CMS

The current CMS view on accounting in relation to the new benchmark can be summarized as follows:
 

First of all, it is hard to say exactly how CMS will proceed without a fixed plan for hepscore deployment. As we don't have a complete benchmark yet, it's difficult to predict when we could start defining pledges in terms of the new benchmark, and when it would be useful for sites to plan purchases according to those pledges, so that what they buy and install is not undervalued (or overvalued).

Concerning accounting of CPU usage, CMS pilots, each managing multiple payload jobs, would get charged based on their CPU usage (walltime) at the sites' CEs and local batch systems; only now it would not be in HS06 hours, but scaled with the new benchmark according to the resource rating. For CMS to do internal accounting per payload job, we would need to have the resource rating available, either as an average for the site/cluster/queue, or even per WN. If that information is available at pilot startup time, then our pilots could gather it and propagate it to our payload jobs (for example, we currently get HS06 for each WN for each of our pilots at sites that make that information available via the Machine/Job Features mechanism).

From a pragmatic point of view, the position of CMS is clear: we'll need to have the two benchmarks coexisting for a while, so that hopefully we can do a smooth transition rather than a switch from one to the other on a given date. This should have an impact on the accounting portal, as accounting information should thus be available in terms of both benchmarks.
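
As an illustration of the scaling described in the CMS input, the following minimal sketch converts the wallclock time of one payload job into benchmark-normalised hours for two coexisting benchmarks. The function name, the benchmark labels and the per-core ratings are hypothetical values chosen for the example; they are not taken from any site or from the Machine/Job Features mechanism.

```python
def normalised_hours(wall_seconds, cores, rating_per_core):
    """Return wallclock work in benchmark-hours.

    wall_seconds:    wallclock duration of the payload job
    cores:           number of cores allocated to the job
    rating_per_core: benchmark score of one core of the worker node
    """
    return wall_seconds / 3600.0 * cores * rating_per_core


# Hypothetical per-core ratings of one worker node under both benchmarks.
wn_ratings = {"HS06": 10.0, "new_benchmark": 12.0}

# A payload job that ran for 2 hours on 4 cores, expressed in both units.
work = {name: normalised_hours(7200, 4, rating)
        for name, rating in wn_ratings.items()}
print(work)  # {'HS06': 80.0, 'new_benchmark': 96.0}
```

Keeping the raw walltime and core count and applying the rating last means the same job can be reported in either benchmark during the coexistence period.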

Concezio for LHCb: Yes, we need to support both in parallel during the transition period.

Maarten for ALICE confirmed the same

Adrian presented various scenarios for implementation of the modifications required in the APEL workflow.

Everyone agreed that we need to support two benchmarks. It is up to the APEL team to decide between two scenarios. In the first, the new benchmark is introduced already at the level of the APEL client and propagated through the full chain. In the second, only wallclock consumption is sent with the job record or summary record, and the wallclock work based on the new benchmark is calculated in the APEL repository. The second scenario might provide more flexibility if we need to introduce yet another benchmark later (e.g. for GPUs), but it requires more work at the repository level (and more time to implement) and introduces a dependency of the repository on a global source of benchmark information for all clusters. This source could be the top-level BDII, GocDB or CRIC; CRIC has the limitation that it currently serves only WLCG sites. We have enough time ahead of us, roughly until the end of this year, before the actual transition happens, but we should not wait: we should start implementing the required changes now, so that we are ready when the transition happens.
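
The second scenario can be sketched roughly as follows. This is only an illustration under stated assumptions, not the APEL implementation: the record fields, the cluster names, the rating values and the `CLUSTER_RATINGS` table (standing in for whatever global information source is chosen) are all hypothetical.

```python
from collections import defaultdict

# Hypothetical per-cluster ratings, as they might be fetched from a global
# information source. Cluster names and values are made up for the example.
CLUSTER_RATINGS = {
    "cluster-a": {"HS06": 10.0, "new_benchmark": 12.0},
    "cluster-b": {"HS06": 8.0},  # not yet rated with the new benchmark
}


def wallclock_work(records, ratings):
    """Sum wallclock work per (cluster, benchmark) in benchmark-hours.

    Each record carries only raw consumption: cluster, wall_hours, cores.
    The benchmark scaling is applied here, on the repository side.
    """
    totals = defaultdict(float)
    for rec in records:
        for bench, score in ratings.get(rec["cluster"], {}).items():
            totals[(rec["cluster"], bench)] += (
                rec["wall_hours"] * rec["cores"] * score
            )
    return dict(totals)


records = [
    {"cluster": "cluster-a", "wall_hours": 2.0, "cores": 4},
    {"cluster": "cluster-a", "wall_hours": 1.0, "cores": 8},
    {"cluster": "cluster-b", "wall_hours": 3.0, "cores": 1},
]
totals = wallclock_work(records, CLUSTER_RATINGS)
```

In this sketch, adding a further benchmark later only means adding another entry to the ratings table; the records sent by the sites stay unchanged, which is the flexibility mentioned above. The cost is the lookup dependency on the ratings source.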

Comment from Maarten:

We know that in the coming years we will need to adapt to a new environment (GPUs, etc.).
For now we can concentrate on the actual task of migrating to a new benchmark.
We might not solve all problems in one go. We need to take a well-calculated decision on how to introduce a new benchmark, also taking service sustainability into account.
As for future needs for GPU benchmarking, this task could possibly be funded by IRIS.
 
Maarten:
If the calculation happens in the repository, what about scalability of the service?
 
Adrian:
The APEL central service is in much better shape regarding hardware than before. One calculation cycle takes ~40 minutes, so this should not be a problem.
 
Jarka asked how long it takes to show fresh data in the accounting portal.
Adrian: 1 day
 
Julia asked Ivan about the effort required to show consumption with two benchmarks in the repository.
 
Ivan said that a schema change and some changes in the UI would be required, but that this is not a big deal and should not be too difficult.
Julia and Maarten mentioned that if we need new plots comparing consumption in the two benchmarks, this can be done outside the EGI portal, in WAU for example. What is important is that the EGI portal lets the user select a benchmark in the UI and shows consumption in that benchmark.
 
Conclusions:
 
Everyone agreed that we need to support two benchmarks in the accounting workflow during the transition period
 
EGI and WLCG will do the transition to the new benchmark at the same time and will coordinate this transition together
 
Most of the work will be required from the APEL development team. It is up to them to decide which implementation to choose (the options are described above). In the coming weeks, Adrian will investigate the possible scenarios and assess which one is better.
 
We have roughly until the end of this year before the actual transition happens, but we should not wait: we should start implementing the required modifications now, to be ready when the transition starts.
 
We will present the results of this discussion at the GDB, probably with some additional input from the APEL development team.
 
 
 
 
 
 



 
