Alessandro and Catalin (EGI), Adrian (APEL), Ivan (EGI Accounting Portal), Helge (HEP-SCORE task force), Pepe (CMS, PIC and GDB), Jarka (CERN), Concezio (LHCb), Maarten (ALICE and WLCG Ops), Julia (WLCG Ops)
Discussion after Helge's introduction
Jarka asked how the benchmark will be run and how its metrics will be collected.
Helge: We do not expect any major problems. The benchmark will be self-contained, a tarball or similar, which a site just downloads and runs. It will be used not only for accounting but also for purchasing, similarly to HS06 today.
Jarka: Might the experiments also be interested in running it?
Helge: The benchmark will run for a few hours, so experiments are unlikely to run it in their workflows. It should not be confused with a fast benchmark. Although it is possible, I do not see a need for the experiments to run it.
Julia: What are the plans of the EGI for transition to a new benchmark?
Catalin and Alessandro: There is no concrete plan at the moment, but EGI does plan to migrate. It would be good to coordinate with WLCG and migrate at the same time.
Julia: Yes, fully agree. We should work together to coordinate the migration.
Julia asked the experiment representatives what they think about the accounting strategy during the migration: do we need to support two benchmarks (the old and the new one) in the accounting flow during the transition period?
CMS answer was sent by mail (see below) and confirmed by Pepe:
Input from CMS
First of all, it is hard to say exactly how CMS will proceed without a fixed plan for hepscore deployment. As we don't have a complete benchmark yet, it's difficult to predict when we could start defining pledges in terms of the new benchmark, and when it would be useful for sites to plan purchases according to those pledges, so that what they buy and install is neither undervalued nor overvalued.
Concerning accounting of CPU usage, CMS pilots, each managing multiple payload jobs, would get charged based on their CPU usage (walltime) at sites' CEs and local batch systems, only now it would not be in HS06 hours, but scaled with the new benchmark according to the resource rating. For CMS to do internal accounting per payload job, we would need to have the resource rating available, either as an average for the site/cluster/queue, or even per WN. If that information is available at pilot startup time, then our pilots could gather it and propagate it to our payload jobs (for example, we currently get HS06 for each WN for each of our pilots at sites that make that information available via the Machine/Job Features mechanism).
From a pragmatic point of view, the position of CMS is clear: we'll need to have the two benchmarks coexisting for a while, so that we can hopefully make a smooth transition rather than a switch from one to the other on a given date. This will have an impact on the accounting portal, as accounting information should then be available in terms of both benchmarks.
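As an illustration of the scaling described in the CMS input (not part of the CMS mail), the per-payload accounting could be sketched as below. This is a minimal sketch under stated assumptions: the `Payload` type, the `scaled_work` function, the benchmark label "HEPSCORE" and all rating values are hypothetical, and the rating is assumed to be a per-core value that is taken per WN when available and otherwise falls back to the site/cluster/queue average.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Payload:
    """One payload job run by a pilot (hypothetical structure)."""
    job_id: str
    walltime_hours: float
    cores: int


def scaled_work(payload: Payload,
                wn_rating: Dict[str, float],
                site_rating: Dict[str, float]) -> Dict[str, float]:
    """Return walltime work per benchmark, in benchmark-hours.

    wn_rating:   per-core rating of this worker node, keyed by benchmark
                 name; may be missing some or all benchmarks.
    site_rating: per-core site/cluster/queue average, used as fallback.
    """
    work = {}
    for bench, fallback in site_rating.items():
        # Prefer the per-WN rating; otherwise use the site average.
        per_core = wn_rating.get(bench, fallback)
        work[bench] = payload.walltime_hours * payload.cores * per_core
    return work


# Illustrative values only: the WN publishes HS06 but not the new benchmark.
job = Payload(job_id="example", walltime_hours=10.0, cores=8)
result = scaled_work(job,
                     wn_rating={"HS06": 12.0},
                     site_rating={"HS06": 10.0, "HEPSCORE": 11.0})
```

Reporting both values for each payload is what would allow the two benchmarks to coexist during the transition.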
Concezio for LHCb: Yes, we need to support both in parallel during the transition period.
Maarten confirmed the same for ALICE.
Adrian presented various scenarios for implementation of the modifications required in the APEL workflow.
Everyone agreed that we need to support two benchmarks. It is up to the APEL team to decide between two scenarios: either the new benchmark is introduced already at the level of the APEL client and propagated through the full chain, or only the wallclock consumption is sent with the job record or summary record, and the wallclock work based on the new benchmark is calculated in the APEL repository.
The second scenario might provide more flexibility if yet another benchmark has to be introduced later (e.g. for GPUs), but it requires more work at the repository level (and more time to implement) and introduces a dependency of the repository on a global source of benchmark information for all clusters. This could be the top-level BDII, GocDB or CRIC. CRIC has the limitation that it currently serves only WLCG sites.
We have enough time ahead of us, roughly until the end of this year, before the actual transition happens. However, we should not wait for it: the required changes should be implemented now, so that we are ready when the transition happens.
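To make the second scenario more concrete, the repository-side calculation could look roughly like the sketch below. It is not the APEL implementation: the record layout, the `summarise` function and the benchmark table are hypothetical, and the table stands in for whatever global source (BDII, GocDB or CRIC) would eventually provide per-cluster ratings. Records are assumed to carry only the cluster name, wallclock hours and core count.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical global benchmark table: (cluster, benchmark) -> per-core rating.
BenchmarkTable = Dict[Tuple[str, str], float]
# Hypothetical record: (cluster, wallclock_hours, cores).
Record = Tuple[str, float, int]


def summarise(records: Iterable[Record],
              table: BenchmarkTable,
              benchmarks: List[str]
              ) -> Tuple[Dict[str, float], Set[Tuple[str, str]]]:
    """Compute total wallclock work per benchmark from raw wallclock records.

    Returns the totals and the set of (cluster, benchmark) pairs for which
    no rating was found, so that gaps are reported rather than silently
    counted as zero.
    """
    totals = {b: 0.0 for b in benchmarks}
    missing: Set[Tuple[str, str]] = set()
    for cluster, wallclock_hours, cores in records:
        for b in benchmarks:
            rating = table.get((cluster, b))
            if rating is None:
                missing.add((cluster, b))
                continue
            totals[b] += wallclock_hours * cores * rating
    return totals, missing


# Illustrative values only: cluster "c2" has not yet published the new benchmark.
table = {("c1", "HS06"): 10.0, ("c1", "HEPSCORE"): 12.0, ("c2", "HS06"): 11.0}
records = [("c1", 2.0, 4), ("c2", 1.0, 8)]
totals, missing = summarise(records, table, ["HS06", "HEPSCORE"])
```

In this layout, adding another benchmark later only means adding entries to the table and a name to the list, which is the flexibility noted for the second scenario. The cost is the dependency on the table being complete for all clusters.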
Comment from Maarten: