DOMA / TPC Meeting
Topic: WLCG DOMA TPC Meeting
Join Zoom Meeting
https://cern.zoom.us/j/99836057922?pwd=ZFhWN3NpYi9oZmwvM3pIRE9zdzFnZz09
Meeting ID: 998 3605 7922
Passcode: 733660
-
-
1
Network data challenges
Speakers: Dr Riccardo Di Maria (CERN), Rizart Dona (CERN)
- WLCG DOMA OpenStack project created; it will host the machines that run the tests, etc.
- Repo to host testing code: https://gitlab.cern.ch/wlcg-doma/data-challenge-2021
- JIRA to track the activities
- WLCG Grafana Org, Data Challenges folder: https://monit-grafana.cern.ch/dashboards/f/qY7d-gjMz/data-challenges
- We now have access via this org to the data sources that are described here
- Users that want edit access to this folder should contact Monit via a SNOW ticket
- Starting with FTS based data sources
-
2
Future uniform tape access
Speakers: Cedric Caffy (CERN), Mihai Patrascoiu (CERN)
-
3
SRM+HTTP tape access
Speakers: Mihai Patrascoiu (CERN), Petr Vokac (Czech Technical University (CZ))
Actions: dedicated meeting with TAPE providers
Rucio 1.25.4 comes with support for SRM+GridFTP together with the SRM+HTTP protocol
- only one of these protocols can be configured on RSE
- FTS transfer protocol preference for SRM must be set to https;gsiftp;root
- no FTS interface to use different SRM preference for individual transfers
- SRM+GridFTP used only for storage that doesn't support SRM+HTTP at all
- this is sufficient to cover ATLAS use-cases - transfers tape <-> disk
- motivation - Data Challenges with as little as possible GridFTP (RAL Castor system)
- CMS plans with tape transfers(?)
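The preference rule above can be illustrated with a short sketch. This is not FTS or Rucio code: the function name `pick_transfer_protocol` and the per-endpoint protocol sets are hypothetical, and the selection logic (first protocol in the preference list that both endpoints support) is an assumption made for illustration. Only the preference string `https;gsiftp;root` comes from the notes.

```python
def pick_transfer_protocol(preference, source_supported, dest_supported):
    """Return the first protocol in the ';'-separated preference string
    that both endpoints support, or None if there is no common protocol.

    Hypothetical helper: the real negotiation happens between FTS and
    the SRM endpoints, this only mimics the ordering idea.
    """
    for proto in preference.split(";"):
        proto = proto.strip()
        if proto and proto in source_supported and proto in dest_supported:
            return proto
    return None


# Preference string quoted in the notes.
SRM_PREFERENCE = "https;gsiftp;root"

# Both endpoints offer HTTP, so HTTP wins even though GridFTP is available.
both_http = pick_transfer_protocol(
    SRM_PREFERENCE, {"https", "gsiftp"}, {"https", "gsiftp", "root"}
)

# The source has no HTTP support at all, so the transfer falls back to GridFTP.
gridftp_only_source = pick_transfer_protocol(
    SRM_PREFERENCE, {"gsiftp"}, {"https", "gsiftp"}
)

print(both_http)            # https
print(gridftp_only_source)  # gsiftp
```

Under this assumption, putting `https` first means SRM+GridFTP is only chosen for storage that does not offer SRM+HTTP at all, which matches the intent stated in the list.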
New / additional tape bringonline test
- upload ~ 10TB dataset with 1GB files to each tape endpoint
- ask dCache/StoRM administrators to clean these files from disk buffer
- unfortunately storage administrators can't easily remove individual files and cleanup of whole buffer would certainly affect production
- use existing old production data(set) with high probability to be on the tape(?)
- it would be necessary to use production Rucio instance
- require similar config overwrites (patches for Rucio) used e.g. by ATLAS Functional Tests WebDAV(?)
- we would have to be more careful, but anyway at some point we have to move SRM+HTTP to production
- add Rucio rule to trigger transfer of NEARLINE file
- don't reuse files, because after test transfer they'll be ONLINE
- once we run out of NEARLINE source files ask again for disk buffer cleanup
- with current test infrastructure all files will be used in ~30 days
- run fewer tests or ask for bigger space to reduce cleanup requests(?)
- what would be good test for transfers with SRM+HTTP TAPE destination(?)
- is transfer to normal disk instead of disk buffer sufficient(?)
- how to verify that file really reached tape storage(?)
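The trade-off between test frequency, pool size, and cleanup requests can be made concrete with a small calculation. This is a sketch under assumptions that are not stated in the notes: that the ~30 day figure applies to a pool of the proposed size (~10 TB of 1 GB files), that decimal units are used (1 TB = 1000 GB), and that each NEARLINE file is consumed exactly once. The function name `pool_lifetime_days` is hypothetical.

```python
def pool_lifetime_days(pool_tb, file_gb, files_per_day):
    """Days until every NEARLINE file in the pool has been recalled once."""
    n_files = round(pool_tb * 1000 / file_gb)  # decimal units assumed
    return n_files / files_per_day


# Figures from the notes: ~10 TB dataset, 1 GB files, exhausted in ~30 days.
n_files = round(10 * 1000 / 1)
implied_files_per_day = n_files / 30  # inferred rate, not quoted in the notes

print(f"{n_files} files, about {implied_files_per_day:.0f} recalls per day")

# Either halving the test rate or doubling the space doubles the time
# between disk buffer cleanup requests.
half_rate = pool_lifetime_days(10, 1, implied_files_per_day / 2)
double_space = pool_lifetime_days(20, 1, implied_files_per_day)
print(f"{half_rate:.0f} days, {double_space:.0f} days")
```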
Keep current Functional Tests TAPE(?)
- not very useful to test TAPE
- just SRM+HTTP transfer from tape disk buffer
- concern that files are not really deleted from tapes
- test files will be physically stored on tapes for years
- currently 200GB/day
- modify to SRM+HTTP tests from disks?
- e.g. "read timeout" issue is visible also for disks
-
4
XRootD 5.1.x news
Speaker: Wei Yang (SLAC National Accelerator Laboratory (US))
-
5
Experiments production
Speakers: Alessandra Forti (University of Manchester (GB)), Diego Davila Foyo (Univ. of California San Diego (US)), Petr Vokac (Czech Technical University (CZ))
ATLAS
- StoRM
- sites experience stability issues after moving everything to WebDAV (TPC + job stage-out). Need to tune the configurations
- Improved documentation for WebDAV doors tuning and monitoring (see storm section)
- dCache
- SRR status - mail discussion WLCG + dCache devs
- hopefully update in next dCache release
- How to fix files uploaded without right WriteToken GGUS:151836?
- Still missing
- (US) XRootD sites (XRootD 5.2rc1)
- RAL Echo update DOMATPC-2 still not very optimistic
- critical for September Data Challenges
- (US) HPC & gridftp DTN
- we need somebody actively working on this topic
- work in progress on Rucio + Globus Online integration
- waiting for XRootD 5.2 RSE installation at BNL
- multihop from FTS to Globus world via this RSE
- avoid dependency on legacy gridftp by the end of 2021?
- T3 sites - deadline end of 2021
- tapes - September 2021
- minus RAL CASTOR (autumn 2021 start of migration to CTA)
- 29/92 sites
Available Rucio DOMA tests
- Full transfer matrix tested
- Are all these tests still relevant?
- Experiments rely on their own monitoring
- Test parameters modification(?) suggestions(?)
- Tests
- Functional Tests WebDAV & XRootD, 1GB every hour (28 & 16 sites)
- Functional Tests OIDC, 1GB every hour (7 sites)
- Functional Tests TAPE, 1GB every hour (10 endpoints)
- Stress Tests WebDAV & XRootD, 250x 4GB transfers every 4 hours (6 & 4 sites)
- 0.5PB/week with 1.5Gb/s average throughput in/out per participating site
- Stress Tests WebDAV NFiles, 10000x 1KB transfers once a day (7 sites)
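The stress test figures mix data volumes and network rates, so a unit conversion helps when comparing them. The sketch below only converts the numbers quoted in the list, assuming decimal units (1 GB = 10^9 bytes) and a 7-day week. The notes do not say how per-link volumes add up to the 0.5 PB/week total, so the sketch does not attempt to reconcile them; both function names are hypothetical.

```python
SECONDS_PER_WEEK = 7 * 24 * 3600


def gbps_to_tb_per_week(gbps):
    """Volume in TB moved in one week at a sustained rate given in Gb/s."""
    bytes_per_second = gbps * 1e9 / 8
    return bytes_per_second * SECONDS_PER_WEEK / 1e12


def batch_rate_gbps(n_files, file_gb, period_hours):
    """Average rate in Gb/s if n_files of file_gb each are spread over the period."""
    gigabits = n_files * file_gb * 8
    return gigabits / (period_hours * 3600)


# 1.5 Gb/s average throughput, as quoted per participating site.
weekly_tb = gbps_to_tb_per_week(1.5)

# One stress test batch: 250 transfers of 4 GB every 4 hours.
batch_gbps = batch_rate_gbps(250, 4, 4)

print(f"{weekly_tb:.1f} TB/week at 1.5 Gb/s")
print(f"{batch_gbps:.2f} Gb/s average for one 250 x 4 GB batch per 4 hours")
```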
CMS
This week we enabled 'davs' in Prod for T2_US (except Vanderbilt). We found some issues at:
- Purdue and Florida: permissions on specific paths (fixed)
- DESY (put in Prod a long time ago): wrong port used (fixed)
Next week I'm planning to enable 'davs' in Prod for Vanderbilt and the T1s
Current Status:
total sites          55
with davs            50   90.91%
passes manual tests  44   80.00%
in Prod               7   12.73%
- StoRM
-
6
StoRM update
Speaker: Andrea Ceccanti (Universita e INFN, Bologna (IT))
StoRM 1.11.21
Released at the end of this week:
https://issues.infn.it/jira/projects/STOR/versions/16713
Scripts to update the storage usage report (for sites that do not use quotas or GPFS and want to avoid running du):
https://github.com/italiangrid/storm-utils/tree/main/space-reporting
These are also packaged as an RPM:
https://repo.cloud.cnaf.infn.it/repository/storm-rpm-beta/centos7/storm-utils-1.0.0-0.el7.x86_64.rpm
StoRM WebDAV configuration documentation improved:
-
7
Token Authorization testbed
Speaker: Andrea Ceccanti (Universita e INFN, Bologna (IT))
Since GitHub Actions disables scheduled runs when there is no activity on the repo, I've also deployed a run of the test suite on our Jenkins:
https://ci.cloud.cnaf.infn.it/view/wlcg/job/wlcg-jwt-compliance-tests/job/master/
Reports accessible to anybody.
The situation on compliance hasn't improved:
https://ci.cloud.cnaf.infn.it/view/wlcg/job/wlcg-jwt-compliance-tests/job/master/18/artifact/reports/reports/20210505_112038/joint-report.html
-
8
AOB
-