DOMA / TPC Meeting

Europe/Zurich
Description

Topic: WLCG DOMA TPC Meeting

Join Zoom Meeting
https://cern.zoom.us/j/99836057922?pwd=ZFhWN3NpYi9oZmwvM3pIRE9zdzFnZz09

Meeting ID: 998 3605 7922
Passcode: 733660
One tap mobile
+41315280988,,99836057922# Switzerland
+41432107042,,99836057922# Switzerland

Dial by your location
        +41 31 528 09 88 Switzerland
        +41 43 210 70 42 Switzerland
        +41 43 210 71 08 Switzerland
        +33 1 7037 2246 France
        +33 1 7037 9729 France
        +33 1 8699 5831 France
Meeting ID: 998 3605 7922
Find your local number: https://cern.zoom.us/u/aeB4ArMgmT

SRM - tape T0 - T1 transfers

Paul (with Andrea and Mihai) presented three possible stages for changing the SRM authorization mechanism so that SRM can be used with HTTP.

  1. X509 + macaroons: will require some reorganisation of the code between gfal and FTS.
  2. X509 + JWT: at first sight needs no development, but after further discussion it may still need some.
  3. JWT only: requires some major changes.

CTA should work because it is based on xrootd and we have HTTP working on it. If we go in the direction of SRM+HTTP, do we also need to take CASTOR into account? It was agreed that CASTOR shouldn't be touched and should keep using gsiftp.

There is no plan to proactively remove gsiftp until we are on more solid ground with other protocols. The plan is to move the infrastructure to other protocols but to keep gsiftp as a backup.

The same applies to SRM. In this way we can prolong the life of SRM, which is a well-known protocol that all T1s know how to deal with. We can think about removing SRM when we have a better QoS experience.

Rucio will need some adjustment too, as it currently assumes gsiftp.

Action: Mihai will prepare a few slides for the next meeting with a plan to enable stages 1 and 2 in FTS/gfal. Andrea volunteered to help with stage 2, so that we have something that can be tested for both setups by the end of the year.

Christophe (LHCb) and Michael (CTA) are on board with the plan.

Experiments production

Petr did some stress testing involving different storages and protocols. The following problems were highlighted:

  • dCache abruptly interrupts transfers when using xrootd. This happens mostly between FZK and US sites, so it might be due to a TCP timeout. There was some discussion about this, but it will continue offline.
  • Abruptly interrupted transfer errors also appear in production transfers for HTTP-TPC and should be looked at.
  • INFN-T1 (but not only INFN-T1) has much worse HTTP-TPC transfer rates than gsiftp.
  • Smoke tests failed when both the storage and the curl client tried to use TLS 1.3. For now a workaround to use TLS 1.2 has been put in place, but we have to worry that client upgrades may start causing failures. We need to understand better how we can avoid this. This concerns clients on CentOS 7 and 8. Alessandra didn't notice any problem when running smoke tests because so far she has run them against DPM, and DPM doesn't use TLS 1.3.
  • Further points are in the contribution minutes.
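As an illustration of the TLS 1.2 workaround mentioned above, the following minimal sketch shows how a client can refuse to negotiate anything newer than TLS 1.2. It uses Python's standard ssl module rather than curl, so it is not the workaround that was actually deployed; the function name make_tls12_capped_context is hypothetical.

```python
import ssl


def make_tls12_capped_context() -> ssl.SSLContext:
    """Return a client-side context that will not negotiate above TLS 1.2.

    Assumption: this only mirrors the idea of the workaround (cap the
    protocol version on the client); the sites' real fix concerned curl
    and the storage endpoints, not Python.
    """
    ctx = ssl.create_default_context()
    # Cap the highest protocol version the client will offer.
    ctx.maximum_version = ssl.TLSVersion.TLSv1_2
    return ctx


ctx = make_tls12_capped_context()
```

A context built this way can be passed to any standard-library client that accepts an SSLContext. The cap would need to be removed again once the TLS 1.3 incompatibility is understood.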

Question from Julia: for the next upgrade at dCache sites, which dCache version should they upgrade to? Answer: the latest, to avoid bugs that have already been fixed. The choice between 5.2.x and 6.2.x is up to the site.

ATLAS has 20 sites configured as active destinations. They use HTTP with any site that has it enabled, so the matrix is quite large.


AOB

  • The WLCG workshop in November will concentrate on storage. There is a call for sites to present their status and plans regarding storage. The aim is to produce a roadmap for storage leading up to HL-LHC. Volunteer sites should add themselves to the Google doc.
  • Paul asked if there is a deadline. [info from after the meeting] The plan is to collect talks within the next two weeks.

 

    • 17:30 17:50
      SRM - tape T0 - T1 transfers 20m
      Speakers: Christophe Haen (CERN), Paul Millar, Petr Vokac (Czech Technical University (CZ))
    • 17:50 18:05
      Experiments production 15m
      Speakers: Alessandra Forti (University of Manchester (GB)), Diego Davila Foyo (Univ. of California San Diego (US)), Petr Vokac (Czech Technical University (CZ))
      • StoRM 1.11.19 with fixed DAVS transfers StoRM-1259 (INFN-T1 now works)
      • EOSATLAS needs XRootD 4.12.5 with fixed root#1293
      • dCache at PIC failed to transfer 0-byte files (GGUS:148795)
        • the site is using Enstore as backend
        • only this configuration is affected; a bugfix is now in the pipeline
      • AGLT2 & BNL dCache 6.2.x: high failure rate for the xrootd protocol
        • transfers get stuck in the middle
        • [ERROR] Server responded with an error: [3012] No response from server after 30 seconds.
        • pool.mover.xrootd.tpc-server-response-timeout
      • INFN-T1 StoRM HTTP limited throughput compared to GridFTP
      • INFN-T1 StoRM still doesn't provide SRM+HTTP
      • We need a big native XRootD site with HTTP-TPC for production tests
      • curl with TLSv1.3 & Java issues
        • issues observed with StoRM / dCache are limited to curl linked with NSS
        • CentOS 8 has curl linked with OpenSSL; hopefully there is no issue there, but this needs to be tested
        • XRootD and DPM use curl for TPC transfers
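To count how often the xrootd timeout quoted above for AGLT2 and BNL appears in transfer logs, the message can be matched with a regular expression. The following is a minimal sketch, assuming the log lines contain the message exactly as quoted in these notes; the function name parse_tpc_timeout is hypothetical.

```python
import re
from typing import Optional, Tuple

# Pattern for the message quoted in the notes; the exact wording in real
# logs is an assumption and may differ between versions.
_TIMEOUT_RE = re.compile(
    r"\[ERROR\] Server responded with an error: \[(\d+)\] "
    r"No response from server after (\d+) seconds\."
)


def parse_tpc_timeout(line: str) -> Optional[Tuple[int, int]]:
    """Return (error_code, timeout_seconds) if the line matches, else None."""
    match = _TIMEOUT_RE.search(line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


sample = (
    "[ERROR] Server responded with an error: [3012] "
    "No response from server after 30 seconds."
)
result = parse_tpc_timeout(sample)
```

Grouping the extracted values by source and destination site would show whether the failures cluster on particular links.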
    • 18:05 18:15
      Token Authorization testbed 10m
      Speakers: Andrea Ceccanti (Universita e INFN, Bologna (IT)), Jaroslav Guenther (CERN)
    • 18:15 18:20
      HTTP Protocol Update 5m
      Speaker: Brian Paul Bockelman (University of Nebraska Lincoln (US))
    • 18:20 18:25
      Xrootd Protocol Update 5m
      Speaker: Wei Yang (SLAC National Accelerator Laboratory (US))
    • 18:25 18:30