WLCG Accounting Task Force Meeting

Europe/Zurich
513/R-068 (CERN)

Attended:

Adrian, Greg, Costin, Dimitrios, Panos, Gorka, Florentia, Alessandro, Pepe, Alessandra, Jordi, John, Julia

 

Comments/discussion: APEL plans and WLCG requirements

APEL plans

Julia asked what the change to the Argo Messaging System would imply for sites. Adrian replied that some configuration changes on the APEL client would be required.

Julia asked what the timeline is for the short-term and long-term plans. Adrian explained that short term means within a year, while the long-term plans run until the end of the project.

Gorka asked whether access to the EUDAT accounting will be restricted. Adrian replied that everyone would have access.

There was a question about how many sites have already enabled APEL storage space accounting. Adrian answered 140.

After the meeting Julia checked the numbers presented in the portal for some of those sites. They agree well with the ATLAS storage space accounting portal.

WLCG requirements

Split of raw wallclock and scaled wallclock for validation. Adrian thinks it is better to avoid additional processing in the repository to calculate raw wallclock from the start and end time stamps. John said that sites which generate summary reports themselves, as CERN does, would need to enable this split in their summaries.

There is no problem with using an alternative information source for the topology.

Alessandro suggested enabling the injection of data from the experiment-specific accounting systems, so that accounting data collected through APEL can be compared in the portal with accounting data from the experiments.
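
To illustrate the split discussed above, here is a minimal sketch of how a site producing its own summaries might derive both values. The record layout, the field names and the per-job `scaling_factor` are assumptions made for this example; they are not the APEL record schema.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, Iterable


@dataclass
class JobRecord:
    # Hypothetical job record; the field names are illustrative only.
    start: datetime
    end: datetime
    scaling_factor: float  # assumed benchmark-based factor for the node


def raw_wallclock_seconds(job: JobRecord) -> float:
    # Raw wallclock is the elapsed time between start and end time stamps.
    return (job.end - job.start).total_seconds()


def scaled_wallclock_seconds(job: JobRecord) -> float:
    # Scaled wallclock applies the (assumed) scaling factor to the raw value.
    return raw_wallclock_seconds(job) * job.scaling_factor


def summarise(jobs: Iterable[JobRecord]) -> Dict[str, float]:
    # Keep raw and scaled totals side by side so they can be validated
    # against each other without reprocessing individual job records.
    jobs = list(jobs)
    return {
        "raw_wallclock_s": sum(raw_wallclock_seconds(j) for j in jobs),
        "scaled_wallclock_s": sum(scaled_wallclock_seconds(j) for j in jobs),
    }


example_jobs = [
    JobRecord(
        start=datetime(2018, 1, 1, 10, 0, tzinfo=timezone.utc),
        end=datetime(2018, 1, 1, 11, 0, tzinfo=timezone.utc),
        scaling_factor=2.0,
    ),
    JobRecord(
        start=datetime(2018, 1, 1, 12, 0, tzinfo=timezone.utc),
        end=datetime(2018, 1, 1, 12, 30, tzinfo=timezone.utc),
        scaling_factor=1.5,
    ),
]
example_summary = summarise(example_jobs)
```

Carrying both totals in the summary means the repository does not have to recompute raw wallclock from individual time stamps, which is the extra processing Adrian preferred to avoid.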

Discussion regarding HTCondor accounting

The solution developed at CERN (probably apart from the condor-history parser) is not currently used by CERN. The current CERN implementation does not look like one that other sites could re-use in a straightforward way.

The PIC solution looks generic enough and could be re-used by other sites, though it is not clear how many sites have done so. Pepe mentioned Ciemat. Taiwan also expressed interest, but Pepe is not sure whether they finally applied it. The PIC solution does not cover accounting for jobs submitted locally.

Julia asked how many HTCondor sites (not OSG ones) already report to APEL. John checked GocDB after the meeting: apart from Ciemat, only PIC and CERN have labelled services as HTCondorCE. A number of other sites mention HTCondor, but they all seem to use it with ARC CE. Possibly the sites which enquired about HTCondorCE ended up with ARC. John will check the APEL DB.

Adrian mentioned that a CE parser for HTCondor is already in the APEL repository, though he was not sure whether it has been used by any site. Alessandra mentioned that the important point is deployment, which should be as straightforward as it is for CREAM accounting.

Alessandro suggested getting in touch with Brian in order to understand how HTCondor accounting works for OSG sites, in particular the part which generates data and reports to Gratia. Julia will do this.

To conclude, it looks like we do not have a clear idea of what EGI sites do to enable HTCondor accounting: whether they know about recipes such as the one from PIC, implement something on their own, or just wait. Clearly, someone should own the problem, otherwise there won't be any progress.

We need to get more information (OSG, etc.) and come back to this discussion in order to have a concrete plan.

WLCG Storage Space Accounting

Pepe asked how many sites have already deployed SRR. Julia explained that storage providers are currently working on the implementation. Those who have progressed the most are DPM and EOS, but we have not yet reached the point where sites start deployment. This will require a coordination effort. The current implementation of the WLCG storage space accounting tool is based on the methods which are available (like SRM queries). However, it foresees a transparent switch to SRR as soon as SRR is deployed at a particular site.

Julia asked Pepe whether PIC had started to work on the implementation of the reporting of metrics for tape storage. Pepe confirmed that they were progressing.

Dimitrios said that the WLCG monitoring team is working to enable an API for ElasticSearch, which is currently used as a storage backend.

 

Wrong CPU efficiency for CMS T2s in the EGI portal

Pepe went through various problems which caused wrong efficiency (0 CPU reports, CPU > Wallclock, etc.) and the reasons found for them. Alessandra suggested keeping track of all those issues and their solutions on the wiki which she started, so that the experience is shared.

The main discussion was then about possible improvements in detecting and debugging such issues, and about who should be in charge of this.

Alessandra expressed her concern that there is no manpower to follow up on all those problems on a regular basis.

After a long discussion, there were several suggestions which could improve the situation:

1). Expand the current validation view in SSB by adding CPU consumption and CPU efficiency. Add the possibility for sites to subscribe to SSB alarms, raised when an alarm condition is met. Pepe and John will come up with a proposal for what the alarm condition should be.

2). Julia suggested that experiments insist that sites check the accounting reports and the SSB validation view regularly. The role of the task force team is to help with debugging and to provide tools which allow problems to be detected (like the SSB validation view).
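
The kinds of problems listed earlier (0 CPU reports, CPU > Wallclock) lend themselves to simple automated checks. The sketch below is only an illustration: the actual alarm condition was left for Pepe and John to propose, so the function names, the problem labels and the `min_efficiency` threshold are all hypothetical. It also assumes that the wallclock value is already multiplied by the number of cores, so that CPU time above wallclock is suspicious.

```python
from typing import List, Optional


def cpu_efficiency(cpu_s: float, wall_s: float) -> Optional[float]:
    # CPU efficiency is CPU time divided by wallclock time.
    # Returns None when wallclock is not positive, as the ratio is undefined.
    if wall_s <= 0:
        return None
    return cpu_s / wall_s


def check_report(cpu_s: float, wall_s: float,
                 min_efficiency: float = 0.1) -> List[str]:
    # min_efficiency is a placeholder threshold, not an agreed alarm condition.
    problems = []
    if wall_s > 0 and cpu_s == 0:
        # Wallclock was reported but no CPU time at all.
        problems.append("zero_cpu")
    if cpu_s > wall_s:
        # Assumes wall_s already accounts for the number of cores.
        problems.append("cpu_exceeds_wallclock")
    eff = cpu_efficiency(cpu_s, wall_s)
    if eff is not None and 0 < eff < min_efficiency:
        problems.append("low_efficiency")
    return problems
```

For example, `check_report(0, 1000)` flags a zero CPU report, while `check_report(800, 1000)` returns an empty list. A check of this kind could run per site and per month and feed whatever alarm mechanism is eventually agreed.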

    • 15:00 - 15:20
      APEL plans 20m
      Speakers: Adrian Coveney, Greg Corbett
      • Integration with EUDAT services
        • Enable EUDAT to send summarized accounting data from the EUDAT accounting collector to the Accounting Repository to have it displayed in the Accounting Portal
        • Custom views for EUDAT should be developed in the Accounting Portal.
        • Deployment of a single Accounting Repository for EGI/EUDAT
        • Adopt/integrate a common user interface
      • Dataset accounting into production and interoperable with EUDAT services
      • Enable the use of the Argo Messaging System for communication
      • Enhancing Cloud Accounting
        • New usage record for public IP
        • New usage record for Block Storage
      • GPGPU Accounting
        • Create a prototype system for cloud based GPGPUs
      • Storage accounting
        • Implement a summary record format to improve the efficiency of communication between Repository and Portal
        • Add view in the Accounting Portal
        • Enable interoperability with the WLCG storage usage format
      • Improve the usability of the Accounting Portal
      • Proactive maintenance activities, such as simplifying the service, improving Portal update frequency, and enhancing the monitoring of the Accounting system so that users can more rapidly assess and diagnose issues with their publishing of accounting data
    • 15:20 - 15:40
      HTCondor accounting discussion 20m
    • 15:40 - 16:00
      WLCG Storage Space Accounting update 20m
      Speaker: Mr Dimitrios Christidis (University of Patras (GR))
    • 16:00 - 16:20
      Following up on problems with wrong low efficiency of the CMS sites in the EGI accounting portal 20m
      Speaker: Jose Flix Molina (Centro de Investigaciones Energéticas Medioambientales y Tecno)