WLCG DOMA BDT Meeting

Petr Vokac
    • 4:30 PM 4:35 PM
      News 5m
    • 4:35 PM 5:00 PM
      Tape REST access 25m
      Speaker: Mihai PATRASCOIU (CERN)
      • Minimum required dCache for TAPE REST is 8.2.22
        • issues discovered with FZK recalls from tape (GGUS:161903)
        • dCache updated to allow paths up to 1024 characters long (8.2.21 supports only 255 characters)
          • additional optimizations / improvements available only in the 9.x branch (dCache#7104)
      • ATLAS started deployment campaign
        • BNL mentioned that the dCache developers' recommendation is to wait for 9.2
          • Al - performance improvements in 9.2 to support millions of transfers with TAPE REST
          • P.V. - there is a limit in FTS of ~ 100k staging requests
            • I hope we will also be fine with dCache 8.2
            • no performance issues observed at FZK, which has used TAPE REST since April
              • staging ~ 200k files per month
          • already upgraded testbed to dCache 8.2.25 - started with TAPE REST validation
        • PIC - tape REST configured ~ a month ago
          • shared doors webdav-at1.pic.es for all WebDAV transfers (disk & tape areas)
          • manual tests work fine, but we are still trying to understand whether their namespace organization is compatible with ATLAS plans for tokens / access with capabilities (storage.* scopes)
        • sufficient dCache versions for ATLAS T1 tapes
          • FZK - 8.2.22
          • NDGF-T1 - 8.2.23
          • pic - 8.2.22
          • SARA - 8.2.24
        • difference in behavior in FTS < 3.12.8 regarding the retention of staged files after they are transferred (FTS-1913, resolved)
          • new 3.12.9 release planned for the first week of August
          • 3.12.8 is not going to be deployed at CERN and these FTS instances will jump directly to 3.12.9
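
      The version requirement above can be checked mechanically. The following is a minimal sketch, assuming plain dotted numeric version strings; the helper names (`parse_version`, `meets_minimum`) are hypothetical and not part of dCache or FTS. The site versions are the ones listed in these notes.

```python
# Minimum dCache version for TAPE REST, as stated in the notes above.
MIN_TAPE_REST_VERSION = "8.2.22"

# dCache versions reported for the ATLAS T1 tape sites in the notes above.
SITE_VERSIONS = {
    "FZK": "8.2.22",
    "NDGF-T1": "8.2.23",
    "pic": "8.2.22",
    "SARA": "8.2.24",
}


def parse_version(version: str) -> tuple:
    """Turn '8.2.22' into (8, 2, 22) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def meets_minimum(version: str, minimum: str = MIN_TAPE_REST_VERSION) -> bool:
    """Return True if `version` is at least `minimum`."""
    return parse_version(version) >= parse_version(minimum)


if __name__ == "__main__":
    for site, version in SITE_VERSIONS.items():
        status = "ok" if meets_minimum(version) else "too old"
        print(f"{site}: {version} ({status})")
```

      Comparing integer tuples rather than strings avoids ordering mistakes such as treating "8.2.9" as newer than "8.2.22".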
    • 5:00 PM 5:09 PM
      Transfers with tokens 9m
      Speaker: Francesco Giacomini (INFN CNAF)

      home directories

      • do users access their EOS home area using grid protocols (e.g. root://eosuser.cern.ch//eos/user/[l]/[login])?
        • yes (Maarten mentioned CMS?)
      • do we expect this area to be accessible with grid protocols using tokens?
        • yes, but first we have to solve grid workflows
      • P.V. - the current profile and the CERN EOS namespace organization are not compatible with token access with capabilities
        • we can't set basepath for multiple issuers to /eos/user
        • multiple experiments (at least VO / IAM Admins) would have full access to the data of any CERN user
        • description of storage.read:/home in the profile is not really clear to me

      storage.create issues

      • WLCG JWT profile storage.create definition: Upload data. This includes renaming files if the destination file does not already exist...
      • what does this mean if we don't use file-level token granularity, e.g. for storage.create:/atlasdatadisk/
        • this token can be used to make a mess in the namespace (rename all files in /atlasdatadisk/)
        • rename /atlasdatadisk/mc23_13p6TeV to /atlasdatadisk/RANDOM_STRING directories(?)
        • rename /atlasdatadisk/mc23_13p6TeV/filename to /atlasdatadisk/DIFFERENT_RANDOM_STRING(?)
          • it is not completely clear from the profile what exactly renaming means
          • might be a bit more tricky for object storage, where directories are a somewhat artificial construct
        • still a much better situation than with X.509 - it is impossible to destroy data
      • how can we prevent abuse of this "renaming" functionality?
        • always use tokens with file level access granularity (IAM performance)?
          • ALICE uses file-level token granularity, but they don't use IAM for file transfers
          • cleanest solution with our current WLCG JWT profile
            • for ATLAS this would mean ~ 2.5M storage.create tokens per day on average (~ 30 tokens per second)
        • get rid of "rename" from WLCG JWT profile and storage implementations?
          • Rucio can be configured not to use a different name (with the .rucio.upload suffix) during upload
        • atomic "PUT+CHECKSUM+RENAME" operation (rename after successful transfer)?
      • recovering from damage caused by abuse of the overly broad renaming allowed by e.g. storage.create:/atlasdatadisk/
        • for distributed storage management we also have all metadata e.g. in Rucio
        • with (non-negligible) effort, filesize + checksum could be used to recover the original filename
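
      The two concerns above can be sketched in a few lines. This is only an illustration under stated assumptions: `scope_covers` assumes that a scope path authorizes everything beneath it (path-prefix matching), and the catalog rows in the usage example are made-up values, not real Rucio data. All function names are hypothetical and do not belong to any IAM, Rucio or storage API.

```python
from posixpath import normpath


def scope_covers(scope: str, action: str, path: str) -> bool:
    """Check whether a scope such as 'storage.create:/atlasdatadisk/'
    covers `path` for `action`, assuming path-prefix matching."""
    name, _, base = scope.partition(":")
    if name != action or not base.startswith("/"):
        return False
    base = normpath(base)   # '/atlasdatadisk/' -> '/atlasdatadisk'
    path = normpath(path)   # also collapses '..' components
    return base == "/" or path == base or path.startswith(base + "/")


def build_index(catalog):
    """Index catalog rows (name, size, checksum) by (size, checksum)."""
    index = {}
    for name, size, checksum in catalog:
        index.setdefault((size, checksum), []).append(name)
    return index


def recover_name(index, size, checksum):
    """Return the original name of a renamed file, or None when the
    (size, checksum) pair is unknown or ambiguous."""
    matches = index.get((size, checksum), [])
    return matches[0] if len(matches) == 1 else None


if __name__ == "__main__":
    scope = "storage.create:/atlasdatadisk/"
    # A directory-level scope covers any source and destination below it,
    # which is what makes wholesale renaming possible.
    print(scope_covers(scope, "storage.create", "/atlasdatadisk/mc23_13p6TeV/filename"))
    print(scope_covers(scope, "storage.create", "/atlasdatadisk/RANDOM_STRING"))

    # Hypothetical catalog rows: (name, size in bytes, checksum).
    catalog = [
        ("/atlasdatadisk/mc23_13p6TeV/file_a", 1024, "0a1b2c3d"),
        ("/atlasdatadisk/mc23_13p6TeV/file_b", 2048, "4e5f6a7b"),
    ]
    print(recover_name(build_index(catalog), 2048, "4e5f6a7b"))
```

      Files that share both size and checksum cannot be told apart this way, so `recover_name` returns None for them instead of guessing.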

      FTS tokens design - token validation

      • we should try to follow the design document and ask IAM for all tokens that will be used in FTS transfers
      • validate that IAM provides expected functionality, e.g. token exchange configuration
    • 5:09 PM 5:10 PM
      Packet marking 1m
      Speakers: Marian Babik (CERN), Shawn Mc Kee (University of Michigan (US))

      - The Working group is now focusing on packet pacing (next meeting in Sept.)
      - Dale, Tim, Shawn and Marian have written an RFC draft that will be presented at the upcoming IETF meeting (next week). Opposition is expected from the IETF, but there is always the possibility of publishing it as documentation of the use of the flowlabel field.
      - Draft: https://www.ietf.org/archive/id/draft-cc-v6ops-wlcg-flow-label-marking-02.html
      - WG is also working with the dCache and XRootD developers to follow up on the flow marking (fireflies) implementation


      Meeting notes

      • dCache with fireflies configured at AGLT2
      • fireflies may become available in dCache in next release or two
        • a significant number of sites may send fireflies during DC24
        • most probably just a simple configuration option on/off
        • fireflies sent along the path with data but in addition they can be sent to central collector(s)
      • packet marking is more tricky with dCache (no direct access to sockets in Java)
        • needs flowd service
        • eBPF flow level rewriting - last SC22 showed 200Gb is reachable
        • about 10% of sites are expected to also be configured with packet marking
    • 5:10 PM 5:25 PM
      WebDAV Error Message Improvement Project & unified error message format 15m

      Discuss with experts improvements in the error messages produced by failed transfers.

      Speaker: Stephan Lammel (Fermi National Accelerator Lab. (US))
    • 5:25 PM 5:30 PM
      AOB 5m