WLCG Information System Evolution Task Force

Name: WLCG Information System Evolution Task Force
Start: 2017-01-12T15:30:00+01:00
End: 2017-01-12T17:00:00+01:00
Location: CERN

Thursday 12 Jan 2017, 15:30 → 17:00 Europe/Zurich

513/R-068 (CERN)


Description

Meeting to discuss the evolution of the WLCG Information System

Attended:

Stephen, Jarka, Kyle, Andrew, Stephan, Giuseppe, Salvatore, Pablo, Alberto, Marian, Alessandro, Julia, Maarten

News

Salvatore joined the WLCG operations and experiment liaison team and will work 100% on CRIC

Stephan, Giuseppe, Salvatore, Alessandro and Julia agreed that we need to organize regular CRIC meetings to discuss CMS needs/requirements and implementation. Stephan and Giuseppe will be at CERN in February, which looks like a good opportunity for a first round of meetings.

Storage object proposal

Alessandro described the plan to introduce an object describing a storage service, which should be able to provide a complete description with protocols, quotas, ACLs, application-level info for a particular quota, etc.

We might stay consistent with the GLUE2 schema. The problem with it is that it offers an almost unlimited set of possibilities, and people get lost in how to use this flexibility properly. The idea is to restrict the description to a certain scenario.
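To make the idea of a restricted storage description more concrete, here is a minimal sketch of what such an object could look like. It is not the proposal presented at the meeting: all class names, field names and values below are hypothetical and only illustrate the ingredients mentioned above (protocols, quotas, ACLs, application-level info per quota).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Protocol:
    """One access protocol of the storage service (hypothetical fields)."""
    name: str       # e.g. "xrootd"
    endpoint: str   # access URL for this protocol


@dataclass
class Quota:
    """One quota (space allocation) with its ACLs and application-level info."""
    name: str
    path: str
    size_bytes: Optional[int] = None
    acls: List[str] = field(default_factory=list)
    app_info: Dict[str, str] = field(default_factory=dict)


@dataclass
class StorageService:
    """Restricted description of a storage service (illustrative only)."""
    name: str
    site: str
    protocols: List[Protocol] = field(default_factory=list)
    quotas: List[Quota] = field(default_factory=list)

    def protocol_names(self) -> List[str]:
        """Return the sorted names of all protocols the service offers."""
        return sorted(p.name for p in self.protocols)


# Made-up example instance; it does not describe any real site.
example = StorageService(
    name="EXAMPLE-SE",
    site="EXAMPLE-SITE",
    protocols=[
        Protocol("xrootd", "root://se.example.org:1094"),
        Protocol("https", "https://se.example.org:443"),
    ],
    quotas=[
        Quota(
            name="vo-data",
            path="/vo/data",
            size_bytes=10**15,
            acls=["vo:read", "vo/production:write"],
            app_info={"label": "example"},
        )
    ],
)
```

Fixing a small set of fields like this is one way to avoid the open-ended flexibility of a general schema; which fields are really needed is exactly what the FZK use case is meant to clarify.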

There was a discussion about where this description should be hosted.

LHCb wants to consume data from GocDB but needs only very basic info; the rest is complemented in DIRAC.

For ATLAS and CMS there should be a complete description, and the most suitable place for it looks to be CRIC. On the other hand, GocDB still needs to contain the storage endpoint for purposes like downtime declaration. We might need to understand over time how this information is split (or maybe partly duplicated) between GocDB and CRIC.

Alessandro thinks the complete storage object description should be in core CRIC. There are various options for how this information is published into CRIC.

Julia suggested that for the next meeting Alessandro (ATLAS) and Andrew (LHCb) pick FZK storage as a complicated use case: Alessandro proposes his way of describing it, Andrew explains what LHCb needs, and we evaluate whether Alessandro's proposal works for both LHCb and ATLAS.

VOfeed discussion

1) The transition of the CMS and ALICE VOfeeds from the monitoring team to the experiments (Stephan for CMS, Maarten for ALICE) has been confirmed. At the next meeting we need to check how it is going.

2) How to enforce consistency in the service type naming convention through the complete chain (OIM/GocDB, GLUE2, SAM3, ETF, VOfeed).

The proposal from Andrew is to pick up the name from the GLUE2 schema and to use it consistently everywhere. There are two things which have to be considered:

a) How do we enforce consistency for already existing service types?

This might imply changes in several places (OIM, SAM3, experiment systems like DIRAC or AGIS).

OIM might be difficult to convince. Julia will check whom to contact and whether they agree with the proposal.

Pablo should check what it implies for SAM3.

Marian thinks that there is no impact on ETF, since the service type name is not used by ETF.

We might need to check all other potentially affected applications.

b) Policy for the introduction of new service types.

Maarten suggests that the IS evolution task force should take responsibility for name approval of new service types and check for consistency along the chain.
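A consistency check along the chain could be automated in a simple way. The sketch below is only an illustration under stated assumptions: it supposes that the approved service type names (taken from GLUE2, as Andrew proposed) and the names used by each component can be collected into plain sets. The function name and all example names are hypothetical.

```python
from typing import Dict, List, Set


def find_unapproved(approved: Set[str],
                    used_by: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """Return, per component, the sorted service type names not in the approved set.

    Components that use only approved names are left out of the result.
    """
    report = {}
    for component, names in used_by.items():
        unknown = names - approved
        if unknown:
            report[component] = sorted(unknown)
    return report


# Made-up placeholder names; they are not real service types.
approved_names = {"type.a", "type.b"}
names_in_use = {
    "GocDB": {"type.a", "type.b"},
    "SAM3": {"type.a", "TYPE_B"},
    "VOfeed": {"type.a", "type-c"},
}
report = find_unapproved(approved_names, names_in_use)
```

In this example `report` flags `TYPE_B` for SAM3 and `type-c` for VOfeed, while GocDB does not appear because all of its names are approved.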

3) Production/non-production status of services in the VOfeed

Marian suggested having a production flag in the VOfeed: if it is true, the service is in production; if it is false, it is not. If the flag is absent, the service is considered to be in production.

ETF will test all services regardless of the flag value. However, there will be changes on the SAM3 side: it should ignore non-production services in the availability/reliability calculations. This implies that experiments need to change their profiles if they want to use the flag. Pablo would probably need to introduce an additional metric to expose the flag value, and also to check what taking this metric into account implies for the availability/reliability calculation algorithm.
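The flag semantics described above can be sketched as follows. This is an illustration only, not the SAM3 algorithm: the function names are hypothetical, the flag is assumed to arrive as an optional string attribute, any value other than an explicit "false" is assumed to mean production, and availability is reduced to a plain fraction of passing services.

```python
from typing import List, Optional, Tuple


def is_production(flag: Optional[str]) -> bool:
    """Interpret the production flag; a missing flag means production.

    Assumption: only an explicit "false" (case-insensitive) marks a
    service as non-production.
    """
    if flag is None:
        return True
    return flag.strip().lower() != "false"


def toy_availability(results: List[Tuple[Optional[str], bool]]) -> Optional[float]:
    """Fraction of production services whose test passed.

    `results` holds (flag, test_passed) pairs for every tested service.
    All services are tested, but non-production ones are ignored here.
    Returns None when no production service is present.
    """
    considered = [passed for flag, passed in results if is_production(flag)]
    if not considered:
        return None
    return sum(considered) / len(considered)


# Three tested services: no flag (production), explicit non-production,
# and explicit production.
value = toy_availability([(None, True), ("false", False), ("true", False)])
```

Here the failing non-production service does not lower the result: only the first and third services are counted, so `value` is 0.5.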

4) Status of adding queues in the VOfeed

ATLAS implemented it on preproduction, but Marian pointed out that changes to the format have been agreed since then, so Alessandro has to follow up with Marian to adapt it accordingly.

LHCb is working on it.

For CMS and ALICE the work has not started. It will be done after the transition of responsibilities from the monitoring team to the experiments.

Changes in REBUS

Following up on an exchange with Brian, Julia asked Pablo whether it would be possible to introduce two new site attributes in REBUS (HS06 power of the cluster and APEL factor) and to have a collector from OIM or Gratia, wherever OSG keeps this information. Pablo said that REBUS is frozen and he is very hesitant to make any changes to the schema. After some discussion, it was decided to propose to Brian to wait for the implementation of CRIC. The issue does not look urgent, since this information is used only by the job monitoring Dashboard for the conversion of time to work. CMS does not use the HS06 Dashboard distributions, so this might be fine.

Action list

1). Object description for FZK storage (Andrew and Alessandro). To be discussed at the next meeting.

2). Check with OSG whether they agree with the proposal of having a consistent naming convention for service types, which might imply changes on their side (Julia).

Check which other components are potentially affected: Pablo will check for SAM3; for GocDB the convention looks to be followed already, but the policy for introducing new service types needs to be discussed with them (next meeting?). Anything else?

3). Transition of VOfeed responsibility from the monitoring team to ALICE and CMS (Pablo, Stephan, Maarten). Check how it is going at the next meeting.

4). Adding queues to the VOfeed. Check status at the next meeting.

5). Changes required in SAM3 in order to take into account the production/non-production flag of the service. Pablo to report at the next meeting.

6). Check with Brian whether we can postpone publishing the HS06 power of the cluster and the APEL factor until CRIC is in place (Julia).

Next meeting will take place on the 2nd of February.

- 15:30 → 15:35
  
  Introduction and news 5m
  
  Speaker: Julia Andreeva (CERN)
- 15:35 → 16:05
  
  Storage object proposal 30m
  
  Speaker: Alessandro Di Girolamo (CERN)
  
  Storage description
- 16:05 → 16:35
  
  VOfeed discussion 30m