WLCG DOMA BDT Meeting

Europe/Zurich
Brian Paul Bockelman (University of Wisconsin Madison (US)), Maria Arsuaga Rios (CERN), Petr Vokac (Czech Technical University in Prague (CZ))
Description

Topic: WLCG DOMA BDT Meeting (twiki)

Videoconference: WLCG DOMA BDT Meeting
Zoom Meeting ID: 69074333781
Host: Petr Vokac
    • 4:30 PM - 4:35 PM
      News 5m
    • 4:35 PM - 4:50 PM
      Packet marking 15m
      Speakers: Marian Babik (CERN), Shawn Mc Kee (University of Michigan (US))

      Preparations for the SC22 demo.

      Successfully tested flow and packet marking with XRootD 5.5.0 and the latest version of flowd. Packet marking was performed by flowd via eBPF-TC (i.e. an external process marked the packets on behalf of XRootD). We verified that the flow label is set on all packets, so both the experiment and the activity can be retrieved from each packet.
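      To illustrate what "experiment and activity can be retrieved per packet" means, here is a minimal Python sketch that packs two small integer IDs into the 20-bit IPv6 flow label and reads them back. The bit layout (5 entropy bits, 9 experiment bits, 6 activity bits), the function names and the example IDs are hypothetical; the authoritative layout is defined by the packet-marking (scitags) technical specification, not here.

```python
import random
from typing import Optional, Tuple

# HYPOTHETICAL layout of the 20-bit IPv6 flow label, for illustration only:
#   bits 19..15: entropy (5) | bits 14..6: experiment (9) | bits 5..0: activity (6)
ENT_BITS, EXP_BITS, ACT_BITS = 5, 9, 6
EXP_SHIFT = ACT_BITS
ENT_SHIFT = ACT_BITS + EXP_BITS


def encode_flow_label(experiment: int, activity: int,
                      entropy: Optional[int] = None) -> int:
    """Pack experiment and activity IDs (plus entropy bits) into a flow label."""
    if not 0 <= experiment < (1 << EXP_BITS):
        raise ValueError("experiment ID does not fit in 9 bits")
    if not 0 <= activity < (1 << ACT_BITS):
        raise ValueError("activity ID does not fit in 6 bits")
    if entropy is None:
        # entropy bits so labels still vary between flows with the same IDs
        entropy = random.getrandbits(ENT_BITS)
    if not 0 <= entropy < (1 << ENT_BITS):
        raise ValueError("entropy does not fit in 5 bits")
    return (entropy << ENT_SHIFT) | (experiment << EXP_SHIFT) | activity


def decode_flow_label(label: int) -> Tuple[int, int]:
    """Recover (experiment, activity) from the flow label of any single packet."""
    experiment = (label >> EXP_SHIFT) & ((1 << EXP_BITS) - 1)
    activity = label & ((1 << ACT_BITS) - 1)
    return experiment, activity


label = encode_flow_label(experiment=2, activity=5, entropy=0)
print(label, decode_flow_label(label))  # 133 (2, 5)
```

      Because every packet of the flow carries the same label, a collector can decode the IDs from any single packet without reassembling the flow.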

      Flowd can now expose all netlink-related information (per flow) via a Prometheus client: TCP/IP statistics such as retransmits, retransmission timeouts, RTT stats, observed PMTU/MSS, pacing/delivery rate, congestion window, bytes sent/acked/received, segments in/out, TCP options, congestion algorithm, receive queue size, time without outstanding data, etc.
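      A sketch of what per-flow statistics look like in the Prometheus text exposition format, assuming each flow is described by its endpoints plus experiment/activity labels. The metric names, label names and the input dictionary shape are hypothetical and do not reflect the names flowd actually exports.

```python
from typing import Dict, List

# HYPOTHETICAL metric names -> (Prometheus type, key in the per-flow record).
METRICS = {
    "tcp_flow_retransmits_total": ("counter", "retransmits"),
    "tcp_flow_rtt_seconds": ("gauge", "rtt_s"),
    "tcp_flow_bytes_acked_total": ("counter", "bytes_acked"),
}
LABELS = ("src", "dst", "experiment", "activity")  # hypothetical label set


def escape_label_value(value: str) -> str:
    """Escape backslash, double quote and newline, as the text format requires."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_flow_metrics(flows: List[Dict]) -> str:
    """Render per-flow TCP statistics in the Prometheus text exposition format."""
    lines = []
    for name, (mtype, key) in METRICS.items():
        lines.append(f"# TYPE {name} {mtype}")
        for flow in flows:
            labels = ",".join(
                f'{label}="{escape_label_value(str(flow[label]))}"' for label in LABELS
            )
            lines.append(f"{name}{{{labels}}} {flow[key]}")
    return "\n".join(lines) + "\n"


flows = [{"src": "[2001:db8::1]:1094", "dst": "[2001:db8::2]:40000",
          "experiment": "atlas", "activity": "production",
          "retransmits": 3, "rtt_s": 0.012, "bytes_acked": 1048576}]
print(render_flow_metrics(flows))
```

      Putting experiment and activity into labels lets the same scrape be aggregated per experiment or per activity on the Prometheus side.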

      We also plan to use the CERN P4 setup to collect flow-label statistics directly on the network equipment (to showcase a possible implementation for R&E networks).

      Meeting notes: flowd works out of the box with recent versions of XRootD. Deployment is under way in the UK, with a plan to extend it to the US after SC22. It is still too early for a massive deployment, which should target DC24; documentation will come later, and currently only active RNTWG participants are involved.

    • 4:50 PM - 5:05 PM
      Transfers with tokens 15m
      Speaker: Francesco Giacomini (INFN CNAF)

      WLCG SAM/ETF storage tests with tokens

      • Plan? Should the WLCG JWT storage compliance tests be included, as suggested by CMS?
      • rely on higher-level tools such as gfal2
      • SAM/ETF transfer tests with tokens 20m

        SAM/ETF CMS plans for transfers with tokens

        • start with the related CRIC configuration
        • configure storage with tokens: start with XRootD, DESY (dCache volunteer), EOS at one of the CMS sites
          • CMS will provide a link to their twiki once they have an example configuration
            • all experiments have very similar requirements
            • link the CMS twiki documentation once it becomes available
            • try to provide similar minimal examples
          • Diego is going to test the WLCG XRootD configuration
            • later also provide a configuration that fits CMS requirements
        • then ready to move to storage testing with SAM tests using tokens
          • current WebDAV tests already include GSI and macaroon tests; just add token tests
          • similar update for xroot probes
          • tests rely on gfal2 and use a small dataset for read/stat/checksum
          • not just positive tests, but also tests for "unexpected open access"
        • it is also necessary to configure an IAM client that the SAM tests can use to obtain a token with storage access
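      The planned token probes need both positive checks and checks for "unexpected open access". A minimal sketch, assuming tokens carry the storage.read/storage.create scopes of the WLCG Common JWT Profile, where the path after the colon covers that path and everything below it: the code derives the expected outcome from the scopes and classifies each observed probe result. The operation names, verdict strings and example paths are hypothetical and not part of the SAM/ETF probes.

```python
from typing import Iterable

# Scope required per probe operation (storage.* names from the WLCG JWT
# profile; the operation keys themselves are HYPOTHETICAL).
REQUIRED_SCOPE = {
    "read": "storage.read",
    "stat": "storage.read",
    "checksum": "storage.read",
    "upload": "storage.create",
}


def path_covered(scope_path: str, path: str) -> bool:
    """True if `path` equals `scope_path` or lies below it, component-wise."""
    base = scope_path.rstrip("/") or "/"
    if base == "/":
        return path.startswith("/")
    return path == base or path.startswith(base + "/")


def token_allows(scopes: Iterable[str], operation: str, path: str) -> bool:
    """Expected outcome: should a token with `scopes` permit `operation` on `path`?"""
    needed = REQUIRED_SCOPE[operation]
    for scope in scopes:
        name, _, scope_path = scope.partition(":")
        if name == needed and scope_path and path_covered(scope_path, path):
            return True
    return False


def classify(expected_allowed: bool, observed_ok: bool) -> str:
    """Verdict for one probe; a success that should have been denied is
    reported as unexpected open access."""
    if expected_allowed == observed_ok:
        return "OK"
    return "UNEXPECTED_OPEN_ACCESS" if observed_ok else "UNEXPECTED_DENIAL"


scopes = ["storage.read:/cms/test", "storage.create:/cms/test/upload"]
print(classify(token_allows(scopes, "read", "/cms/test/file1"), observed_ok=True))  # OK
# An anonymous read (no scopes at all) that nevertheless succeeds:
print(classify(token_allows([], "read", "/cms/test/file1"), observed_ok=True))  # UNEXPECTED_OPEN_ACCESS
```

      The component-wise prefix check matters for negative tests: a scope on /cms/test must not grant access to /cms/testing.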
    • 5:05 PM - 5:20 PM
      Tape REST access 15m
      Speaker: Mihai PATRASCOIU (CERN)

      FTS v3.12.2-rc1 released, bringing full support for the Tape REST API

      • Deployed on FTS3-Pilot
      • Stress-test plans to check the robustness of the CTA implementation
        • CTA testbed configured with virtual tapes => "metadata stress test"
        • send batches of 10k, 100k and 2M FTS transfers
        • FTS limits the number of files in one REST API request to 200 (configurable)
          • to avoid overloading or crashing the REST API
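      With the 200-file limit, the three planned batch sizes translate into 50, 500 and 10,000 REST requests. The sketch below computes that and splits a list of paths into requests of at most 200 files, serialising each batch as a stage request body. The {"files": [{"path": ...}]} body shape is our reading of the WLCG Tape REST API and should be checked against the specification; the helper names and example paths are hypothetical.

```python
import json
from typing import Iterator, List

MAX_FILES_PER_REQUEST = 200  # FTS default from the notes (configurable)


def requests_needed(n_files: int, limit: int = MAX_FILES_PER_REQUEST) -> int:
    """Number of REST requests for n_files when each request carries <= limit files."""
    return (n_files + limit - 1) // limit  # integer ceiling division


def stage_bodies(paths: List[str], limit: int = MAX_FILES_PER_REQUEST) -> Iterator[str]:
    """Yield one JSON stage-request body per batch of at most `limit` paths."""
    for start in range(0, len(paths), limit):
        batch = paths[start:start + limit]
        yield json.dumps({"files": [{"path": p} for p in batch]})


for n in (10_000, 100_000, 2_000_000):
    print(n, "transfers ->", requests_needed(n), "requests")
# 10000 transfers -> 50 requests
# 100000 transfers -> 500 requests
# 2000000 transfers -> 10000 requests
```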

      Golden dCache 8.2 with Tape REST API released

      • FNAL dCache testbed already uses this release
      • it uses real tapes, which makes REST API stress tests complicated (long responses/delays)
        • they may be able to provide virtual tapes

      StoRM

      • work in progress
      • developing proof of concept
    • 5:20 PM - 5:30 PM
      AOB 10m

      Unclear HTTP-TPC error messages

      • moving from GridFTP to WebDAV reduced the clarity of error messages
        • a generic HTTP 500 error is not very useful for diagnosing problems
        • CMS suggests creating a project aimed at improving the situation
          • follow-up discussion on the WLCG DOMA TPC mailing list, with these proposed error categories:
            • account out of quota or filesystem out of space
            • missing/no such directory/file
            • file or server-with-file unavailable
            • permission denied when creating directory/file
            • certificate/token expired
            • other authorization error
            • transfer timeout
            • connection reset/interrupt
            • service on other endpoint unreachable <IP address (or name, if no DNS entry)>
            • number of transfers/connections/requests exceeded
            • canceled/aborted transfer
      • response from FTS developers
        • error messages try to indicate where they come from (storage / gfal / FTS)
        • storage must report reasonable responses in the first place
      • extend the compliance test suite
        • e.g. the same HTTP return code in response to a given storage issue
      • details about unclear XRootD/HTTP error messages should be reported via GitHub issues
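      A compliance check along these lines could compare the status code a storage returns for an injected issue against an agreed table. The mapping below is a hypothetical proposal built from standard HTTP status codes (e.g. 507 Insufficient Storage from WebDAV, 429 Too Many Requests); none of it has been agreed, the category keys are invented for illustration, and categories without an obvious code (such as canceled/aborted transfers) are deliberately left unmapped.

```python
from http import HTTPStatus

# HYPOTHETICAL mapping from mailing-list error categories to HTTP status codes.
EXPECTED_STATUS = {
    "quota_exceeded": HTTPStatus.INSUFFICIENT_STORAGE,   # 507
    "no_such_file": HTTPStatus.NOT_FOUND,                # 404
    "permission_denied": HTTPStatus.FORBIDDEN,           # 403
    "token_expired": HTTPStatus.UNAUTHORIZED,            # 401
    "too_many_requests": HTTPStatus.TOO_MANY_REQUESTS,   # 429
    "file_unavailable": HTTPStatus.SERVICE_UNAVAILABLE,  # 503
    "remote_unreachable": HTTPStatus.BAD_GATEWAY,        # 502
    "transfer_timeout": HTTPStatus.GATEWAY_TIMEOUT,      # 504
}


def check_response(category: str, observed: int) -> str:
    """Compare the observed status code for an injected issue with the table."""
    expected = EXPECTED_STATUS.get(category)
    if expected is None:
        return f"SKIP: no agreed status for {category!r}"
    if observed == expected:  # HTTPStatus is an IntEnum, so this compares as int
        return "PASS"
    return f"FAIL: expected {expected.value} {expected.phrase}, got {observed}"


print(check_response("quota_exceeded", 507))  # PASS
print(check_response("quota_exceeded", 500))  # generic 500 is flagged as FAIL
print(check_response("canceled", 500))        # unmapped category is skipped
```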