WLCG Information System Evolution Task Force

Name: WLCG Information System Evolution Task Force
Start: 2018-10-18T15:00:00+02:00
End: 2018-10-18T16:25:00+02:00
Location: CERN

Thursday 18 Oct 2018, 15:00 → 16:25 Europe/Zurich

513/R-068 (CERN)

Description

Meeting to discuss the evolution of the WLCG Information System

Attended by:

Alessandra, Adrian, Alessandro Paolini, Linda Ann Cornwall, Andrew, Alastair, David Crooks, Matthew, Laurence, Balazs, Maarten, Julia

------

Presentation by Alastair, followed by discussion

Julia pointed out that the JSON structure under discussion does not foresee publishing dynamic data. Not all CEs currently in use allow querying the number of pending and running jobs. All CEs do, however, allow querying the status of jobs submitted by a particular user, so aggregating and interpreting those results is possible, although it might mean some additional work for the experiments. Maarten mentioned that ALICE would like to avoid this. Alessandra said that we could consider publishing a separate JSON file for dynamic data if required. Alessandro P. pointed out that the load on the CE is currently available only via the BDII.

Everyone agreed that today's meeting and Alastair's presentation focus on the static data.

Julia asked whether the requirements for accounting have been considered (APEL takes the conversion factor from the BDII).

Three possible options were proposed: add the conversion factor to GocDB, add it to the JSON file, or add it directly to the CE configuration. Maarten suggested that the APEL client go through all the options and use the one that is available at the site. Adrian confirmed that it could be possible to take the conversion factor from a source other than the BDII.
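Maarten's suggestion of trying each option in turn could look roughly like the sketch below. It is only an illustration: the reader functions, their order of precedence and the returned value are all hypothetical, and nothing here reflects the actual APEL client code.

```python
from typing import Callable, Optional

# A source reader returns the conversion factor, or None when the site
# does not provide it through that channel.
Source = Callable[[], Optional[float]]


def first_available(sources: list[tuple[str, Source]]) -> Optional[tuple[str, float]]:
    """Return (source name, factor) from the first source that has a value."""
    for name, read in sources:
        value = read()
        if value is not None:
            return name, value
    return None


# Hypothetical stand-ins; real readers would query GocDB, fetch the JSON
# file, or parse the CE configuration.
def from_gocdb() -> Optional[float]:
    return None  # pretend the factor is not set in GocDB


def from_json_file() -> Optional[float]:
    return 10.5  # made-up value


def from_ce_config() -> Optional[float]:
    return None


# The order below is an assumption, not something agreed at the meeting.
sources = [("gocdb", from_gocdb), ("json", from_json_file), ("ce_config", from_ce_config)]
result = first_available(sources)
print(result)
```

With this approach a site only needs to fill in one of the three places, and the client reports which one it used.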

REBUS also depends on the BDII. We need to check exactly what it takes from the BDII, but it certainly takes the capacity and the number of logical CPUs.

As at the previous meeting, Matthew highlighted EGI's concern that the migration to a BDII-less information flow has an impact on EGI operations and should therefore be done carefully, considering all implications and making sure that all tools are ready for such a migration. In particular, he mentioned monitoring and pointed to the related GGUS ticket (https://ggus.eu/index.php?mode=ticket_info&ticket_id=137202; it looks like not everyone can access it, Julia cannot). Allocating effort for changing CSIRT/monitoring has not yet been considered in the working plan.

Maarten also mentioned SAM, which uses the BDII to get the queues for LHCb and ALICE. Julia said that this should be solved soon by CRIC providing a new VO feed that includes queues. After the meeting Julia checked the LHCb VO feed: it has queues. So only the ALICE VO feed is currently missing queues, which should be fixed in the VO feed generated by CRIC (work in progress).

Alastair asked about the name of the GocDB attribute in which the location of the JSON file would be published; the suggestion is "InformationSystem". There was no immediate reaction; people may think more about it and come up with other suggestions.

Alessandro Paolini asked why the JSON format under consideration does not use the GLUE2 standard. Balazs replied that the initial intention was to be consistent with the standard where possible, but to keep the format as simple as possible. We wanted the minimum set of information required for WLCG; we do not want to reproduce the GLUE schema.

There was a long discussion on whether to stay with a flat structure or to go for a hierarchy that would avoid data replication. In the end it was decided to concentrate on content first, test it by describing some sites with complex computing configurations, and then reconsider whether the structure can be improved by making it hierarchical.

Alessandra will send the configuration of her ARC to Balazs, so that we can see whether the proposed JSON structure is good enough.

Andrew made the point that the JSON configuration for a given CE should be self-contained and should be published per CE, so that this configuration alone is enough to submit jobs to that CE. Any complicated aggregation work at the site to create an all-CEs JSON should be avoided. Julia pointed out that the name of the compute resource, which identifies a set of more or less homogeneous resources that can be accessed via multiple CEs, should be consistent across the JSON descriptions of all those CEs, so that aggregators such as CRIC can process the information properly at the site level.
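To illustrate why the resource name has to match across CEs, here is a minimal sketch of how an aggregator might group per-CE descriptions. The field names ("ce", "site", "resource_name"), host names and site name are hypothetical and do not represent the agreed JSON structure or how CRIC actually works.

```python
from collections import defaultdict

# Hypothetical per-CE descriptions, one document per CE.
ce_descriptions = [
    {"ce": "ce01.example.org", "site": "EXAMPLE-SITE", "resource_name": "cluster-a"},
    {"ce": "ce02.example.org", "site": "EXAMPLE-SITE", "resource_name": "cluster-a"},
    {"ce": "ce03.example.org", "site": "EXAMPLE-SITE", "resource_name": "cluster-b"},
]


def group_by_resource(descriptions):
    """Map (site, resource_name) to the sorted list of CEs that front it."""
    groups = defaultdict(list)
    for d in descriptions:
        groups[(d["site"], d["resource_name"])].append(d["ce"])
    return {key: sorted(ces) for key, ces in groups.items()}


grouped = group_by_resource(ce_descriptions)
for (site, resource), ces in sorted(grouped.items()):
    print(site, resource, ces)
```

If ce02 published the name as "Cluster-A" instead, this grouping would see three compute resources rather than two, and the capacity of the shared cluster could be counted twice at the site level.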

Review of the set of questions and confirmations based on the Google Doc comments:

Structure: go for flat for the moment

cs_id can be dropped

For accounting and the conversion factor, the number of logical CPUs and the overall HEPSPEC capacity of the cluster should be added; from these numbers one can calculate a weighted average, which can be used to convert time to work

Site name should be added

No naming convention for resource_name is needed; site admins need to make sure that the name they assign to resource_name is unique within the scope of a given site (ASCII)

No close SE is needed (confirmed by Andrew)

'maxmemory' is for single logical core

'maxrunningjobs' to be dropped for the time being; whether it is needed can be reconsidered later

'maxcputime' drop

units to be added to all numeric values (minutes and GB)

timestamp of the modification: initially suggested to be added for the whole file; later Andrew suggested by mail adding it at the block level

'cs_' prefix to be dropped

It was decided to drop the number of cores. However, while compiling the minutes, Julia thinks that in this context the number of cores is not the overall number but the number per node. We should re-discuss whether we still need it
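The accounting item in the list above (number of logical CPUs and overall HEPSPEC capacity) can be sketched as follows. This is one possible reading of "weighted average", namely total HEPSPEC divided by total logical CPUs; the field names and all numbers are made up for illustration and are not part of the agreed structure.

```python
# Hypothetical flat records for two compute resources at one site.
resources = [
    {"resource_name": "cluster-a", "logical_cpus": 1000, "hepspec_capacity": 10000.0},
    {"resource_name": "cluster-b", "logical_cpus": 3000, "hepspec_capacity": 36000.0},
]


def hepspec_per_logical_cpu(records):
    """Average HEPSPEC per logical CPU, weighted by the number of logical CPUs."""
    total_cpus = sum(r["logical_cpus"] for r in records)
    total_hepspec = sum(r["hepspec_capacity"] for r in records)
    if total_cpus == 0:
        raise ValueError("no logical CPUs published")
    return total_hepspec / total_cpus


def time_to_work(cpu_time, factor):
    """Convert CPU time to work; the result is in HEPSPEC times the input time unit."""
    return cpu_time * factor


factor = hepspec_per_logical_cpu(resources)
print(factor)                       # 11.5
print(time_to_work(100.0, factor))  # 1150.0
```

Dividing the summed capacity by the summed CPU count is the same as averaging the per-cluster values (10 and 12 here) weighted by their CPU counts, which is why only the two totals need to be published.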

Next steps

1). Julia circulates the minutes and waits for people to comment and provide feedback on the file structure

2). After considering the comments on the minutes, Julia changes the structure in the Google Doc (one week from now), following the discussion at the meeting and the feedback on the minutes.

3). We need to test the structure on several sites with complex configurations. These might be several UK sites (Alessandra and Alastair will follow up); IN2P3, Nikhef and CERN are other good candidates (Julia and Maarten will follow up)

4). After getting feedback from the test sites, arrange the next meeting to review the feedback and hopefully agree on the version proposed for implementation by other sites.

5). The preliminary date for the next meeting is 15 November

- 15:00 → 15:30
  
  UK experience with publishing CE description in json format 30m
  
  Speaker: Alastair Dewhurst (Science and Technology Facilities Council STFC (GB))
  
  BdiiDecomissioning20181018.pdf
- 15:30 → 15:50
  
  Discussion of the JSON structure 20m
  
  CEdescriptionOpenQuestions.pdf
  
  CEdescriptionOpenQuestions.pptx
  
  google doc for CE json structure and discussion
- 15:50 → 16:00
  
  Update on the SRR implementations by the storage middleware providers 10m
  
  Speaker: Julia Andreeva (CERN)
  
  SRR implementation twiki