DOMA / TPC Meeting

Europe/Zurich
Description

Topic: WLCG DOMA TPC Meeting

Join Zoom Meeting
https://cern.zoom.us/j/99836057922?pwd=ZFhWN3NpYi9oZmwvM3pIRE9zdzFnZz09

Meeting ID: 998 3605 7922
Passcode: 733660
One tap mobile
+41315280988,,99836057922# Switzerland
+41432107042,,99836057922# Switzerland

Dial by your location
        +41 31 528 09 88 Switzerland
        +41 43 210 70 42 Switzerland
        +41 43 210 71 08 Switzerland
        +33 1 7037 2246 France
        +33 1 7037 9729 France
        +33 1 8699 5831 France
Find your local number: https://cern.zoom.us/u/aeB4ArMgmT

Data Challenges

Data challenges are meant to progressively check that we have everything we need for the future, including the networking infrastructure and the monitoring. This activity is closely related to TPC because a lot of the pieces we need to run these challenges are being developed and deployed here.

We need to make sure the work on data challenges contributes to building up a permanent infrastructure, whether it is the network connections sites have to set up or the monitoring we need.

We need to decide on the desired outcome of the first challenge and how to proceed towards the next ones.

SRM+http

Mihai presented progress on the code development and asked if there is any interest in having the clients macaroon-enabled. There is clearly interest, particularly for low-level debugging, and Andrea points out that the clients have to be "token"-enabled because macaroons are only one type of token.

Future Uniform tape access

Paul has started a Google doc for the requirements. We agree the developers will do a first round of comments among themselves, and then all stakeholders, i.e. experiments and sites, will comment further.

Experiment production

  • CMS: Diego reports they are going to ramp up, and the presentation at the CMS S&C week was met with a lot of offers for support.
  • LHCb: Chris has already enabled all LHCb storage sites. They are all working apart from RAL and 2 other sites in the UK (Glasgow and QMUL).
  • ATLAS: Alessandra has involved the cloud squads to complete the transition to TPC and also to change the WAN/LAN read/write and delete activities, because a number of sites are still using SRM and gsiftp for those too. While WebDAV is required for TPC, for WAN/LAN it is up to the site to use WebDAV or root. Petr reports on new bugs he has found, and there is a discussion about INFN-T1 and how they can test their efficiency themselves. The RAL situation is also discussed because right now it is going to be very difficult for them to meet the 31st May deadline. The discussion then extends to the problems xrootd is having in reaching production quality. It is going to be continued offline with all the interested parties present.

Token Authorization testbed

Andrea has been working on the ATLAS/CMS IAM instances, and the OIDC amnesiac testbed, which was down last week, has been recovered.

Chat

Saul Youssef to Everyone (3:11 pm)
BTW, where are the documents that Alessandra and Christoph are referring to?
Rizart Dona to Everyone (3:11 pm)
https://indico.cern.ch/event/1009737/contributions/4237202/attachments/2196208/3713431/DOMA%20general%20Data%20Challenges%202021_02_24.pdf
Me to Everyone (3:11 pm)
Linked from the minutes
Saul Youssef to Everyone (3:12 pm)
Thanks
Diego Davila to Everyone (3:13 pm)
also this one: https://docs.google.com/document/d/1lMG4dfiPo9bPf-tAO0bINDAuEUIloC45Y-vwu1E9_Xw/edit
Shawn P McKee to Everyone (3:31 pm)
A reminder that the next packet marking meeting is today March 3rd, 11-Noon Eastern time, 5-6 PM CERN time. (coming up in ~30 minutes)

The meeting URL is https://indico.cern.ch/event/1013141/     We would like to have some level of packet marking in place for the upcoming infrastructure challenges...
Paul Millar (DESY) to Everyone (3:31 pm)
https://docs.google.com/document/d/1xioJmM1cr9iWaTd-8cpM7f6h3wP4qNfvAGxcHWSvOQs/edit?usp=sharing
Christophe Haen to Everyone (3:37 pm)
Xroot bug

https://github.com/xrootd/xrootd/issues/1404
Mihai Patrascoiu to Everyone (3:40 pm)
EOS LHCb (contains the fix) runs on EOS v4.8.40 + xrootd v4.12.8
Oliver Keeble to Everyone (3:41 pm)
https://github.com/xrootd/xrootd/issues/1404
Mihai Patrascoiu to Everyone (3:42 pm)
The fix is in xrootd-4.12.8
Riccardo Di Maria to Everyone (3:49 pm)
there is an its testing suite in escape that could be exploited, I think
wrt FTS
Mihai Patrascoiu to Everyone (3:50 pm)
Petr, could I ask why FTS DB access directly is needed?
I think it's been answered. Thanks :)
Petr Vokac to Everyone (3:52 pm)
Because I don't have FTS connected to the external monitoring.
Also I was doing different (error) aggregations than the one available in FTS Web interface
Paul Millar (DESY) to Everyone (3:55 pm)
Sorry, I have to leave for another meeting.
Mihai Patrascoiu to Everyone (3:55 pm)
The 403 Permission Denied was an EOS-specific error.
It doesn't affect other sites (and is not related to XRootd)
Diego Davila to Everyone (4:02 pm)
I have to leave to another meeting
Me to Everyone (4:02 pm)
https://its.cern.ch/jira/projects/DOMATPC/issues/DOMATPC-1

    • 16:00 16:10
      Network data challenges 10m
      Speaker: Alessandra Forti (University of Manchester (GB))

      Initial thoughts for network data challenges
      * Establish when we want to do this. Once we have a target date we can fill the intermediate steps and adjust the timeline. I would aim for Q3 so if we have some delay we can still correct things.

      * We need to complete the transition to HTTP-TPC for disk. I know we said May 31, but we now have one more motivation: this is needed for these challenges.

      * SRM+http: the data carousel is not a data challenge, but they do overlap. By Q3, hopefully much earlier, we should be able to integrate SRM+http in standard workflows. What do we need for the monitoring here? Do we have something already?

      * Shared monitoring: as said at DOMA general, the most important thing to get right is the shared monitoring. We need to look at the various experiments' monitoring and see what we can feed into a central dashboard. We also need to decide which metrics we need, and to try to get the site monitoring at least from the Tier1s.

      Mattermost channel for more communication

      DOMA general presentation, HL-LHC data challenges metrics document
       

    • 16:10 16:20
      SRM+HTTP tape access 10m
      Speaker: Mihai Patrascoiu (CERN)

      Gfal2 - retrieve SE tokens

      • absorbed functionality from x509-scitokens-issuer-client package
        • built-in requesting of macaroons
        • removes dependency to package
      • development should reach FTS3-Devel by next DOMA-TPC
        • same testing done on FTS3-Devel-Next should move to FTS3-Devel after deployment
           

      FTS

      • FTS already has the server config option to enable/disable retrieving of macaroons (`se_token` branch)
        • RetrieveSEToken=<true|false> (default true) ;  requires FTS server restart
        • when the Gfal2 developments go to FTS3-Devel, so will this change
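
      The default-true behaviour of the option above can be sketched as follows. This is a minimal illustration, not FTS code: only the RetrieveSEToken name and its default come from the minutes, while the assumption that the server config is a flat list of key=value lines, the placeholder [global] section, and the function name are hypothetical.

```python
import configparser


def retrieve_se_token_enabled(config_text: str) -> bool:
    """Return the RetrieveSEToken setting, falling back to True when absent.

    Assumes (hypothetically) that the option appears as a plain key=value
    line before any section header in the server configuration.
    """
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep option names case-sensitive
    # Prepend a placeholder section so section-less key=value lines parse.
    parser.read_string("[global]\n" + config_text)
    return parser.getboolean("global", "RetrieveSEToken", fallback=True)


if __name__ == "__main__":
    print(retrieve_se_token_enabled("RetrieveSEToken=false\n"))
    print(retrieve_se_token_enabled(""))
```

      Since the minutes note that changing the option requires an FTS server restart, a check like this would only reflect the running server once it has been restarted.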

      gfal-macaroon CLI?

      • Above developments allow Gfal2 to exchange a proxy certificate for a macaroon (for a given path).
        • Is there interest to have this in python bindings and Gfal2 CLI?

      E.g.: gfal-macaroon gfal-se-token [--issuer <issuer>] [--validity <validity>] </path> [read(default)|write|<activity_list>]
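
      To make the proposed CLI arguments more concrete, here is a rough sketch of the kind of request a client might build for a given path. It assumes the dCache-style macaroon request convention (a POST with Content-Type application/macaroon-request and a JSON body carrying activity caveats and an ISO 8601 validity); the read/write presets, the function name and the example URL are hypothetical, and this is not the Gfal2 implementation. The request is only described, not sent, and presenting the proxy certificate would be left to the HTTP client.

```python
import json

# Illustrative presets mapping the CLI-style keywords to activity names.
# The exact mapping is an assumption made for this sketch.
ACTIVITY_PRESETS = {
    "read": ["DOWNLOAD", "LIST"],
    "write": ["UPLOAD", "MANAGE", "LIST"],
}


def build_macaroon_request(url, activities="read", validity_minutes=30):
    """Describe (without sending) an HTTP request asking for a macaroon."""
    names = ACTIVITY_PRESETS.get(activities)
    if names is None:
        # Treat anything else as a comma-separated activity list.
        names = [a.strip().upper() for a in activities.split(",") if a.strip()]
    body = {
        "caveats": ["activity:" + ",".join(names)],
        "validity": f"PT{validity_minutes}M",
    }
    return {
        "method": "POST",
        "url": url,
        "headers": {"Content-Type": "application/macaroon-request"},
        "body": json.dumps(body),
    }


if __name__ == "__main__":
    request = build_macaroon_request(
        "https://storage.example.org/data/file", "read", validity_minutes=60
    )
    print(request["body"])
```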

       

    • 16:20 16:30
      Future uniform tape access 10m
      Speakers: Oliver Keeble (CERN), Paul Millar
    • 16:30 16:45
      Experiments production 15m
      Speakers: Alessandra Forti (University of Manchester (GB)), Diego Davila Foyo (Univ. of California San Diego (US)), Petr Vokac (Czech Technical University (CZ))

      ATLAS

      • Mail to cloud support sent on Monday asking for HTTP-TPC migration of all production storages (DATADISK) by the 31st of May
      • EOS ATLAS instance temporarily back to gsiftp due to XRootD HTTP-TPC bug, but bugfix is being deployed and once that's done we'll again move immediately to HTTP-TPC (probably this/next week)
      • production ready XRootD 5.1.1 (?) packages not yet available
        • we need documentation on how to migrate (US) sites that currently use just a GridFTP server
        • it is a bit worrying, given the 31st of May transition timeline, that we don't yet have any bigger sites with native XRootD
        • still with HTTP-TPC memleaks(?)
      • RAL struggles with its XRootD HTTP-TPC testbed
        • painful to just upload stress test data with HTTP (not TPC) protocol
        • checksum calculation sometimes takes an unacceptable time and often fails
        • more effort necessary to meet 31st of May deadline for HTTP-TPC transition(?)

      CMS

      TAPE testbed

      • Kibana monitoring
      • use dteam proxy to schedule regular Functional Tests TAPE
      • include TAPE SRM+HTTP-TPC and disk HTTP-TPC endpoints
        • currently only INFN-T1
        • we need also dCache tape endpoint and CTA
        • transfer matrix should contain all disk/tape types used in production
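
      The transfer matrix mentioned above can be sketched as every ordered source/destination pair over the endpoint types to be covered. The endpoint labels below are placeholders based on the storage types named in this list, not actual testbed endpoints.

```python
from itertools import permutations

# Placeholder labels for the endpoint types mentioned in the list above.
ENDPOINT_TYPES = [
    "infn-t1-tape",   # the only tape endpoint currently included
    "dcache-tape",    # still needed
    "cta-tape",       # still needed
    "disk-http-tpc",  # disk HTTP-TPC endpoint
]


def transfer_matrix(endpoints):
    """Return every ordered (source, destination) pair of distinct endpoints."""
    return list(permutations(endpoints, 2))


if __name__ == "__main__":
    for source, destination in transfer_matrix(ENDPOINT_TYPES):
        print(f"{source} -> {destination}")
```

      Ordered pairs are used because a transfer from A to B exercises different code paths than one from B to A, for example tape staging on the source versus archiving on the destination.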

      Other topics

      • performance markers don't provide all (necessary?) transfer information, e.g. it seems to me dCache doors can talk internally with the pool node that initiates the transfer => it is completely hidden which internal node tries to initiate the transfer - difficult to remotely diagnose one systematically failing pool node (e.g. with missing CA certs)
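
      To see which fields a marker actually carries, a small parser along these lines could help. It is a sketch assuming markers follow the "Perf Marker" ... "End" key/value layout used by HTTP-TPC; the sample marker, its field values and the documentation-range addresses are made up for illustration and do not come from a real transfer.

```python
# Made-up sample marker; field names follow the assumed HTTP-TPC layout.
SAMPLE_MARKER = """Perf Marker
    Timestamp: 1614783600
    Stripe Index: 0
    Stripe Bytes Transferred: 238745
    Total Stripe Count: 1
    RemoteConnections: tcp:192.0.2.10:1234,tcp:[2001:db8::1]:2345
End
"""


def parse_perf_markers(text):
    """Split a marker stream into a list of {field: value} dictionaries."""
    markers, current = [], None
    for raw in text.splitlines():
        line = raw.strip()
        if line == "Perf Marker":
            current = {}
        elif line == "End":
            if current is not None:
                markers.append(current)
            current = None
        elif current is not None and ":" in line:
            # Split on the first colon only, values may contain colons.
            key, _, value = line.partition(":")
            current[key.strip()] = value.strip()
    return markers


if __name__ == "__main__":
    for marker in parse_perf_markers(SAMPLE_MARKER):
        print(marker.get("RemoteConnections"))
```

      Even with the connection addresses extracted this way, nothing in the sample identifies the internal pool node behind a door, which is the diagnostic gap described above.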
    • 16:45 16:55
      Token Authorization testbed 10m
      Speaker: Andrea Ceccanti (Universita e INFN, Bologna (IT))
    • 16:55 17:00
      http and xrootd protocol news 5m
      Speakers: Brian Paul Bockelman (University of Nebraska Lincoln (US)), Wei Yang (SLAC National Accelerator Laboratory (US))
    • 17:00 17:05
      AOB 5m