WLCG DOMA BDT Meeting

Europe/Zurich
Brian Paul Bockelman (University of Wisconsin Madison (US)), Maria Arsuaga Rios (CERN), Petr Vokac (Czech Technical University in Prague (CZ))
Description

Topic: WLCG DOMA BDT Meeting (twiki)

    • 16:30 16:35
      News 5m

      You can add your contribution to the existing section, or if you have a different or bigger topic to discuss, please let us know before this meeting and we can create a dedicated slot for your contribution.

      Small topics that might be discussed in this meeting

      • dCache 9.2.x and xrootd.root door configuration behavior
        • fixed in 9.2.8
      • WLCG Ops: EGI GOCDB downtime for TAPE service (GGUS:165354)
        • SRM.nearline no longer makes sense for WebDAV endpoints with TAPE REST
          • we really don't want to keep SRM.nearline just to be able to declare downtime for TAPE
          • EGI asked to create new service type (GGUS:165354)
        • Service types for WebDAV endpoints with TAPE REST
          • EGI GOCDB service type
            • disk: webdav
            • tape: wlcg.webdav.tape
          • OSG Topology
            • disk: WebDAV or WebDAV.disk
            • tape: WebDAV.tape
          • CRIC downtime synchronization implemented CRIC-258
            • validated by BNL for OSG Topology downtime import for WebDAV.tape
        • ATLAS still needs a different hostname (alias) because the FTS configuration uses scheme://fqdn
          • tape transfers should be separated in FTS
          • tape (buffer) may need different limits than disks
      • Alma9 xrd client & sites with Root CA signed by SHA1
        • we concluded in issue#2150 that XRootD should make CA validation compatible with other applications (ignore the signature on a self-signed root CA)
        • new SHA1 issues with CRLs - unable to validate / load CRLs signed with SHA1
          • software: fetch-crl (issue#4), dCache (announcement in user-forum mailing list)
          • some SHA1 CAs at least publish CRLs signed with SHA256 => this causes no trouble for fetch-crl and dCache
        • ATLAS asked WLCG to talk with IGTF and provide timeline (WLCG MB#316 Service report)
          • longer term goal: all grid CAs signed by OS trusted CAs
          • ultimate goal: get rid of "globus" /etc/grid-security/certificates
      • FTS optimizer values (different for ATLAS vs. CMS vs. Pilot instance, but same behavior for HTTP-TPC)
        • The FTS Optimizer has two step sizes when increasing connections on the link
          • A conservative step size (default 1, configurable, "OptimizerIncreaseStep" setting), when the Optimizer is set to mode "1"
          • An aggressive step size (default 2, configurable, "OptimizerAggressiveIncreaseStep" setting), when the Optimizer is set to mode "2"
        • Then, the difference between mode "2" and mode "3" is how the number of actives is calculated:
          • In mode "2", number of actives on the link = number of transfers on the link
          • In mode "3", number of actives on the link = number of TCP streams on the link. This was valid in the times of GridFTP, but for HTTP, every transfer only uses one stream
          • In essence, not much difference between mode "2" and mode "3".
        • A lot of functionality applicable only to GridFTP => no longer useful
      • Full TAPE buffer & FTS failing transfers
        • could FTS do better and not push new transfers when the buffer is full? Use WLCG SRR to detect free space in the buffer?
        • do we have enough information for a smarter decision than waiting for failures and relying on the slow optimizer to stop transfers?
        • FTS developers have already thought about improvements to avoid hitting the buffer size limit
          • FTS knows the total size of staged files
      • dCache TAPE REST and non-default webdav.root issue
        • fixed in dCache 9.2.14 with new configuration option frontend.root
        • current implementation requires a different frontend service for each VO with their own doors (dcache#7506)
      • Performance markers missing for several dCache sites (GGUS:165469, dCacheRT#10596)
        • FZK, NDGF, BNL
      • Switch gfal default HTTP library from libneon to libcurl - done at CERN and BNL
      • dCache: removing internal dependencies on SRM space manager -> transition to dCache quotas
        • ATLAS currently relies on multiple spacetokens from each site; how to deal with a single quota space is to be discussed internally
        • future question:
          • make quota calculation more realtime
          • automatic WLCG SRR with quota
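
      The FTS Optimizer notes above (step sizes and the way actives are counted) can be illustrated with a minimal sketch. This is not FTS code: the function and class names are hypothetical, only the two setting names and their defaults come from the notes. Two points are assumptions: that mode "3" uses the same aggressive step as mode "2", and that mode "1" counts actives as transfers.

```python
from dataclasses import dataclass


@dataclass
class OptimizerSettings:
    # Defaults as quoted in the notes; both are configurable in FTS.
    increase_step: int = 1             # "OptimizerIncreaseStep"
    aggressive_increase_step: int = 2  # "OptimizerAggressiveIncreaseStep"


def step_for_mode(mode, settings=None):
    """Return the step used when increasing connections on a link."""
    settings = settings or OptimizerSettings()
    if mode == 1:
        return settings.increase_step
    if mode in (2, 3):
        # Assumption: mode 3 shares the aggressive step with mode 2.
        return settings.aggressive_increase_step
    raise ValueError(f"unknown optimizer mode: {mode}")


def count_actives(mode, streams_per_transfer):
    """Count actives on a link.

    streams_per_transfer holds one integer per running transfer:
    the number of TCP streams that transfer uses.
    """
    if mode == 3:
        # Mode 3: actives = number of TCP streams on the link.
        return sum(streams_per_transfer)
    # Mode 2 (and, by assumption, mode 1): actives = number of transfers.
    return len(streams_per_transfer)


# HTTP transfers use one stream each, so modes 2 and 3 agree.
http_link = [1, 1, 1, 1]
# GridFTP-style transfers with several streams make the modes diverge.
gridftp_link = [4, 4, 2]

for mode in (2, 3):
    print(mode, count_actives(mode, http_link), count_actives(mode, gridftp_link))
```

      With one stream per transfer, both modes report the same number of actives, which matches the remark that there is not much difference between mode "2" and mode "3" for HTTP.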

      Topics for next BDT meeting

      • DC27 archival metadata requirement
        • TAPE family /dev/null for "Data Challenge" activity
        • we need more realistic T0 Export simulation
          • e.g. for sites with complex topology like RAL
          • also other sites use disk buffer in front of tape which is not served by same disknodes & filesystems like DATADISK
      • topics with low priority:
      • XRootD client libraries don't implement Happy Eyeballs
      • XRootD bug in case-sensitive HTTP headers parsing (RFC2616) is not compatible with StoRM HTTP/2 support (HTTP headers in lowercase RFC7540)
        • with HTTP/2, transferheaderauthorization is sent to the passive party as a lowercase authorization header, and XRootD doesn't recognize this header (returns authorization failure)
        • related to FTS upgrade to EL9 which comes with curl (used by gfal) with HTTP/2 support
        • workaround - disable HTTP/2 support on the HTTP-TPC active party (StoRM)
      • HTTP digest handling changed with RFC 9530, which obsoletes RFC 3230 (xrootd#2211)
      • disable grid proxy delegation during FTS HTTP-TPC (by default?)
      • Do we need condor_test_token equivalent for storage?
      • dCache: authzdb -> multimap + omnisession migration issue#6607
      • reminders about existing (HTTP-TPC) related issues
        • dCache: HTTP-TPC performance markers and RemoteConnections dCache#7441
          • different performance markers types? (start, connection established, connection closed, finish, ...)
        • dCache: issue with xroot-tpc and new default XRootD SHA256 signatures dCache#7599
        • StoRM: no support for RemoteConnections in performance markers (ticket?)
        • StoRM: Forbidden TPC push transfers on gclouds platform (STOR-1563)
        • StoRM: support for "stat" with storage.create and storage.modify STOR-1600
        • StoRM: does this storage rely on sufficiently recent CaNL (GGUS:167085)
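
      The HTTP/2 header problem listed above can be illustrated with a minimal sketch. This is not XRootD or StoRM code: all function names are hypothetical and the token value is a placeholder. It shows how an active party forwards a TransferHeader-prefixed header, and why a case-sensitive lookup on the passive side misses the lowercase HTTP/2 form while a case-insensitive lookup finds it.

```python
TRANSFER_HEADER_PREFIX = "transferheader"


def forwarded_headers(client_headers):
    """Headers the active party sends to the passive party.

    Every header whose name starts with "TransferHeader" (in any case)
    is forwarded with that prefix removed.
    """
    forwarded = {}
    for name, value in client_headers.items():
        if name.lower().startswith(TRANSFER_HEADER_PREFIX):
            forwarded[name[len(TRANSFER_HEADER_PREFIX):]] = value
    return forwarded


def strict_lookup(headers, name):
    """Case-sensitive lookup: the behavior that causes the failure."""
    return headers.get(name)


def case_insensitive_lookup(headers, name):
    """Case-insensitive lookup, as HTTP header names require."""
    wanted = name.lower()
    for key, value in headers.items():
        if key.lower() == wanted:
            return value
    return None


# HTTP/1.1 clients usually preserve the mixed-case spelling.
http1 = forwarded_headers({"TransferHeaderAuthorization": "Bearer example-token"})
# HTTP/2 carries header names in lowercase.
http2 = forwarded_headers({"transferheaderauthorization": "Bearer example-token"})

print(strict_lookup(http1, "Authorization"))
print(strict_lookup(http2, "Authorization"))
print(case_insensitive_lookup(http2, "Authorization"))
```

      The strict lookup returns nothing for the HTTP/2 case, which corresponds to the authorization failure described above; the case-insensitive lookup handles both spellings.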
    • 16:35 16:55
      Transfers with tokens 20m
      Speakers: Petr Vokac (Czech Technical University in Prague (CZ)), Francesco Giacomini (INFN CNAF)
    • 16:55 17:05
      Tape REST access 10m
      Speaker: Mihai PATRASCOIU (CERN)

      TAPE REST & Tokens

      ATLAS progress

      • Found an issue with non-default webdav.root (issue#7506)
        • affected sites: PIC, IN2P3-CC, NDGF
      • Status:
        • sites with TAPE REST in production
          • CTA: CERN, RAL
          • dCache: FZK, DESY-HH (T2), BNL-OSG2
        • sites with TAPE REST available
          • dCache (but see issue#7506): IN2P3-CC, NDGF-T1, PIC
          • StoRM: INFN-T1
        • sites without configured TAPE REST
          • dCache: SARA-MATRIX, TRIUMF-LCG2
        • sites with old SE
          • dCache 7.x: RRC-KI-T1
        • we would like to move forward with TAPE REST deployment after DC24, details
          • CERN-PROD (REST) - production (April 2023)
          • BNL-OSG2 (REST) - production - still in test (February 2024)
          • FZK-LCG2 (REST) - production (March 2023)
          • IN2P3-CC (REST) - uses prefix /pnfs/in2p3.fr/data/atlas and needs dCache 9.2.14
          • INFN-T1 (REST) - ready to be tested
          • NDGF-T1 (REST) - they would like to test ENDIT with SRM before moving to REST (January 2024); NDGF uses prefix /pnfs/ndgf.org/data and needs dCache 9.2.14
          • PIC (REST) - uses prefix /pnfs/pic.es/data/atlas and needs dCache 9.2.14; complicated site shared with IFAE T2 (multiple gplazma), and they will also use multiple frontend services
          • RAL-LCG2 (REST) - production (~ May 2023)
          • RRC-KI-T1 (REST) - old dCache 7.x
          • SARA-MATRIX (REST) - REST JSON not yet configured
          • TRIUMF-LCG2 (REST) - REST JSON not yet configured, uses prefix /pnfs/triumf.ca/data and needs dCache 9.2.14
    • 17:05 17:15
      Packet marking 10m
      Speakers: Marian Babik (CERN), Shawn Mc Kee (University of Michigan (US))
    • 17:15 17:25
      WebDAV Error Message Improvement Project & unified error message format 10m

      Discuss with experts improvements in the error messages produced by failed transfers.
      https://twiki.cern.ch/twiki/bin/view/LCG/WebdavErrorImprovement

      Speaker: Stephan Lammel (Fermi National Accelerator Lab. (US))
    • 17:25 17:30
      AOB 5m