DOMA / TPC Meeting

Europe/Zurich
Description

Topic: WLCG DOMA TPC Meeting

Join Zoom Meeting
https://cern.zoom.us/j/99836057922?pwd=ZFhWN3NpYi9oZmwvM3pIRE9zdzFnZz09

Meeting ID: 998 3605 7922
Passcode: 733660
One tap mobile
+41315280988,,99836057922# Switzerland
+41432107042,,99836057922# Switzerland

Dial by your location
        +41 31 528 09 88 Switzerland
        +41 43 210 70 42 Switzerland
        +41 43 210 71 08 Switzerland
        +33 1 7037 2246 France
        +33 1 7037 9729 France
        +33 1 8699 5831 France
Find your local number: https://cern.zoom.us/u/aeB4ArMgmT

    • 16:00 16:10
      Network data challenges 10m
      Speakers: Dr Riccardo Di Maria (CERN), Rizart Dona (CERN)
    • 16:10 16:20
      Future uniform tape access 10m
      Speakers: Cedric Caffy (CERN), Mihai Patrascoiu (CERN)
    • 16:20 16:30
      SRM+HTTP tape access 10m
      Speakers: Mihai Patrascoiu (CERN), Petr Vokac (Czech Technical University (CZ))

      Actions: dedicated meeting with TAPE providers

      Rucio 1.25.4 comes with support for SRM+GridFTP together with the SRM+HTTP protocol

      • only one of these protocols can be configured on RSE
      • FTS transfer protocol preference for SRM must be set to https;gsiftp;root
        • no FTS interface to use different SRM preference for individual transfers
        • SRM+GridFTP used only for storage that doesn't support SRM+HTTP at all
      • this is sufficient to cover ATLAS use-cases - transfers tape <-> disk
        • motivation - Data Challenges with as little GridFTP as possible (RAL Castor system)
        • CMS plans with tape transfers(?)
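The effect of the FTS preference string can be illustrated with a minimal sketch. It assumes that the SRM endpoint returns a transfer URL for the first protocol in the list that it supports; the function name and the example protocol sets are hypothetical and are not part of FTS or Rucio.

```python
from typing import Iterable, Optional

# Preference string quoted in the minutes.
PREFERENCE = "https;gsiftp;root"


def pick_transfer_protocol(preference: str, supported: Iterable[str]) -> Optional[str]:
    """Return the first protocol of the ordered preference list that is supported.

    Hypothetical helper for illustration only; it models the negotiation,
    it does not call FTS or SRM.
    """
    supported_set = set(supported)
    for protocol in preference.split(";"):
        if protocol in supported_set:
            return protocol
    return None


# Storage with SRM+HTTP support: https wins over gsiftp.
print(pick_transfer_protocol(PREFERENCE, ["gsiftp", "https"]))
# Storage without SRM+HTTP support: falls back to gsiftp.
print(pick_transfer_protocol(PREFERENCE, ["gsiftp"]))
```

With this ordering GridFTP is selected only when https is not offered at all, which matches the fallback behaviour described in the list above.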

      New / additional tape bringonline test

      • upload a ~10 TB dataset of 1 GB files to each tape endpoint
      • ask dCache/StoRM administrators to clean these files from disk buffer
        • unfortunately storage administrators can't easily remove individual files and cleanup of whole buffer would certainly affect production
        • use existing old production data(set) with high probability to be on the tape(?)
          • it would be necessary to use production Rucio instance
          • requires config overrides (patches for Rucio) similar to those used e.g. by ATLAS Functional Tests WebDAV(?)
          • we would have to be more careful, but anyway at some point we have to move SRM+HTTP to production
      • add Rucio rule to trigger transfer of NEARLINE file
      • don't reuse files, because after test transfer they'll be ONLINE
      • once we run out of NEARLINE source files ask again for disk buffer cleanup
        • with current test infrastructure all files will be used in ~30 days
        • run fewer tests or ask for bigger space to reduce cleanup requests(?)
      • what would be a good test for transfers with SRM+HTTP TAPE destination(?)
        • is transfer to normal disk instead of disk buffer sufficient(?)
        • how to verify that file really reached tape storage(?)
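The numbers in the test plan above imply a rough budget of NEARLINE source files. The sketch below works it out, assuming decimal units (1 TB = 1000 GB) and that every test transfer consumes exactly one NEARLINE file; both are assumptions, and the function name is hypothetical.

```python
# Figures taken from the minutes; the decimal unit interpretation is an assumption.
DATASET_BYTES = 10 * 10**12   # ~10 TB uploaded to each tape endpoint
FILE_BYTES = 1 * 10**9        # 1 GB per file
DAYS_UNTIL_EXHAUSTED = 30     # all files used in ~30 days


def nearline_budget(dataset_bytes: int, file_bytes: int, days: int):
    """Return (number of files, average files consumed per day)."""
    n_files = dataset_bytes // file_bytes
    return n_files, n_files / days


n_files, per_day = nearline_budget(DATASET_BYTES, FILE_BYTES, DAYS_UNTIL_EXHAUSTED)
print(f"{n_files} files, about {per_day:.0f} transfers per day per endpoint")
```

Under these assumptions, halving the test rate or doubling the dataset size would each roughly double the interval between disk buffer cleanup requests.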

      Keep current Functional Tests TAPE(?)

      • not very useful to test TAPE
      • just SRM+HTTP transfer from tape disk buffer
      • concern that files are not really deleted from tapes
        • test files will be physically stored on tapes for years
        • currently 200GB/day
      • modify to SRM+HTTP tests from disks?
        • e.g. "read timeout" issue is visible also for disks
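To put the 200GB/day figure into perspective, the following sketch estimates how much test data would accumulate on tape if none of it is physically deleted. It assumes a constant rate, decimal units, and a 365-day year; the function name is hypothetical.

```python
DAILY_GB = 200  # current Functional Tests TAPE volume, from the minutes


def accumulated_tb(daily_gb: float, days: int) -> float:
    """Volume in TB written after `days` days at a constant daily rate."""
    return daily_gb * days / 1000


print(f"{accumulated_tb(DAILY_GB, 365):.1f} TB per year")
```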
    • 16:30 16:40
      XrootD 5.1.x news 10m
      Speaker: Wei Yang (SLAC National Accelerator Laboratory (US))
      1. A potential fix for a memory corruption bug in HTTP TPC is under code review
      2. UTA testbed is up. BNL testbed is under cyber security review
      3. Tested Petr's stress code at small scale with UTA

      We will wait for item 1 and then test again.

    • 16:40 16:50
      Experiments production 10m
      Speakers: Alessandra Forti (University of Manchester (GB)), Diego Davila Foyo (Univ. of California San Diego (US)), Petr Vokac (Czech Technical University (CZ))

      ATLAS

      • StoRM
        • sites experience stability issues after moving everything to WebDAV (TPC + job stage-out). Need to tune the configurations
          • Improved documentation for WebDAV doors tuning and monitoring (see storm section)
      • dCache
        • SRR status - mail discussion WLCG + dCache devs
          • hopefully update in next dCache release
        • How to fix files uploaded without right WriteToken GGUS:151836?
      • Still missing
        • (US) XRootD sites (XRootD 5.2rc1)
        • RAL Echo update DOMATPC-2 still not very optimistic
          • critical for September Data Challenges
        • (US) HPC & gridftp DTN
          • we need somebody actively working on this topic
            • work in progress on Rucio + Globus Online integration
            • waiting for XRootD 5.2 RSE installation at BNL
              • multihop from FTS to Globus world via this RSE
          • avoid dependency on legacy gridftp by the end of 2021?
        • T3 sites - deadline end of 2021
        • tapes - September 2021
          • minus RAL CASTOR (autumn 2021 start of migration to CTA)
        • 29/92 sites

      Available Rucio DOMA tests

      CMS

      This week we enabled 'davs' in Prod for T2_US (except Vanderbilt). We found some issues at:

      • Purdue and Florida: permissions on specific paths (fixed)
      • DESY (put in Prod a long time ago): wrong port used (fixed)

      Next week I'm planning to enable 'davs' in Prod for Vanderbilt and the T1s

      Current Status:

      Metric               Sites  Share
      total sites             55
      with davs               50  90.91%
      passes manual tests     44  80.00%
      in Prod                  7  12.73%
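The shares in the CMS status are each count divided by the 55 total sites. A minimal sketch that reproduces them is given here; the counts come from the minutes, while the helper name is hypothetical.

```python
TOTAL_SITES = 55
COUNTS = {
    "with davs": 50,
    "passes manual tests": 44,
    "in Prod": 7,
}


def share(count: int, total: int) -> float:
    """Percentage of `total` represented by `count`, rounded to two decimals."""
    return round(100 * count / total, 2)


for label, count in COUNTS.items():
    print(f"{label:<20} {count:>3} {share(count, TOTAL_SITES):6.2f}%")
```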
    • 16:50 16:55
      StoRM update 5m
      Speaker: Andrea Ceccanti (Universita e INFN, Bologna (IT))

      StoRM 1.11.21

      Released at the end of this week:

      https://issues.infn.it/jira/projects/STOR/versions/16713

      Scripts to update storage usage reports (for sites that do not use quotas or GPFS and want to avoid dus):

      https://github.com/italiangrid/storm-utils/tree/main/space-reporting

      These are also packaged as an RPM:

      https://repo.cloud.cnaf.infn.it/repository/storm-rpm-beta/centos7/storm-utils-1.0.0-0.el7.x86_64.rpm

      StoRM WebDAV configuration documentation improved:

      http://italiangrid.github.io/storm/documentation/sysadmin-guide/1.11.20/installation-guides/webdav/storm-webdav-guide/index.html

    • 16:55 17:00
      Token Authorization testbed 5m
      Speaker: Andrea Ceccanti (Universita e INFN, Bologna (IT))

      Since GitHub Actions disables scheduled runs if there's no activity on the repo, I've also deployed a run of the test suite on our Jenkins:

      https://ci.cloud.cnaf.infn.it/view/wlcg/job/wlcg-jwt-compliance-tests/job/master/

      Reports accessible to anybody.

      The situation on compliance hasn't improved:

      https://ci.cloud.cnaf.infn.it/view/wlcg/job/wlcg-jwt-compliance-tests/job/master/18/artifact/reports/reports/20210505_112038/joint-report.html

    • 17:00 17:05
      AOB 5m