DOMA / TPC Meeting
Topic: WLCG DOMA TPC Meeting
Join Zoom Meeting
https://cern.zoom.us/j/99836057922?pwd=ZFhWN3NpYi9oZmwvM3pIRE9zdzFnZz09
Meeting ID: 998 3605 7922
Passcode: 733660
One tap mobile
+41315280988,,99836057922# Switzerland
+41432107042,,99836057922# Switzerland
Dial by your location
+41 31 528 09 88 Switzerland
+41 43 210 70 42 Switzerland
+41 43 210 71 08 Switzerland
+33 1 7037 2246 France
+33 1 7037 9729 France
+33 1 8699 5831 France
Find your local number: https://cern.zoom.us/u/aeB4ArMgmT
16:00 → 16:10
SRM+HTTP tape access (10m)
Speakers: Mihai Patrascoiu (CERN), Petr Vokac (Czech Technical University (CZ))
Only minor progress on SRM+HTTPS
- new test matrix with disk servers (dCache, DPM, EOS, Echo, StoRM, XRootD)
- fixed a Rucio issue with srm+https (rucio#4650)
- SH_* RSE with protocols: srm+https, https, davs
- 100MB file transfers every ~20 minutes (initial tests used 1MB files; test bigger 5GB files?)
- no issues directly related to srm+https
- BNL srm+https destination failures caused by wrong RSE spacetoken configuration
- open question: which spacetoken can be used for the dteam VO with srm://dcsrm.usatlas.bnl.gov:8443/srm/managerv2?SFN=/pnfs/usatlas.bnl.gov/users/hiroito/testtpc
- TRIUMF: most probably the same firewall issue that we also found with production transfers
- RAL Echo still problematic
- EOS: firewall/configuration(?) issue on p06636710r84969.cern.ch
- dCache prometheus: occasional MAKE_PARENT failure (most probably related to deploying a new release)
SRM+HTTP tape test - plan:
0) periodic tests only for SRM+HTTP and DISK (e.g. to understand occasional timeouts seen for transfer from DISK) - DONE
1) upload 100TB dataset (1GB, 10GB files) to several dCache/StoRM/EOS/DPM DISK storages (Rucio RSEs)
2) (Rucio) FTS transfer dataset with SRM+HTTP to StoRM TAPE endpoint (INFN-T1) and dCache TAPE endpoint (FNAL or BNL?)
3) ask tape admins to verify that uploaded files are in the right state, that they reached tape, and that everything generally looks fine (or use SFO?)
4) ask tape admins to clean up dataset files from the disk buffer (so that bringonline in the next step does real work)
5) (Rucio) FTS transfer with SRM+HTTP from TAPE storage to DISK endpoints dCache/StoRM/EOS/DPM
6) repeat only if we find an issue, starting from the point where the issue was found
It will be necessary to synchronize with tape storage admins (site ready for 2, local verification 3-4, site ready for 4). I can imagine that all steps may take a few weeks to complete, but first I would like to understand 0 (I'll replace tape with disk for [1]); meanwhile I'll prepare the test dataset.
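As a rough sizing aid for step 1, the sketch below estimates how many files a 100TB dataset implies. It assumes decimal units (1TB = 10^12 bytes) and, since the plan does not state the mix of 1GB and 10GB files, it only gives the two extremes; the helper name `file_count` is illustrative.

```python
# Assumption: decimal units; the plan does not say whether TB/GB or TiB/GiB is meant.
TB = 10**12
GB = 10**9


def file_count(dataset_bytes, file_bytes):
    """Number of files of a given size needed to reach the dataset size (rounded up)."""
    return -(-dataset_bytes // file_bytes)  # ceiling division on integers


dataset = 100 * TB
only_1gb = file_count(dataset, 1 * GB)    # upper bound: every file is 1GB
only_10gb = file_count(dataset, 10 * GB)  # lower bound: every file is 10GB

print(f"between {only_10gb} and {only_1gb} files")
```

Any real mix of the two file sizes falls between these two bounds.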
16:10 → 16:20
Future uniform tape access (10m)
Speakers: Oliver Keeble (CERN), Paul Millar
16:20 → 16:30
XRootD 5 news (10m)
Speaker: Wei Yang (SLAC National Accelerator Laboratory (US))
16:30 → 16:40
Experiments production (10m)
Speakers: Alessandra Forti (University of Manchester (GB)), Diego Davila Foyo (Univ. of California San Diego (US)), Petr Vokac (Czech Technical University (CZ))
ATLAS
- HTTP-TPC migration status after May 31 deadline for disk storages
- a week old HTTP-TPC status for DOMA General
- XRootD 5.2 still in OSG development repository
- Echo testbed on XRootD 5.2 - still slow
- Problem with FTS transfer limits (INC2803669)
- not possible to set a limit per site, only per protocol
- max transfers is SUM(gsiftp limit + davs limit)
- this will be solved by dropping other TPC protocols
- we need tapes on HTTP-TPC first
- transfer limits don't seem to be enforced correctly
- we saw more active transfers than configured storage limit
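The effect of per-protocol limits noted above can be illustrated with a minimal sketch. The limit values and function names are hypothetical and are not FTS configuration or API; the sketch only shows why the effective cap on a storage is the sum of its protocol limits, and why keeping a single TPC protocol makes the protocol limit act as the site limit.

```python
# Hypothetical per-protocol limits for one storage; the numbers are illustrative only.
protocol_limits = {"gsiftp": 100, "davs": 100}


def effective_max_transfers(limits):
    """With limits set per protocol, the storage can see the sum of all of them."""
    return sum(limits.values())


def effective_after_dropping(limits, kept_protocols):
    """Effective cap once only the listed protocols remain in use."""
    return sum(v for name, v in limits.items() if name in kept_protocols)


print(effective_max_transfers(protocol_limits))             # both protocols active
print(effective_after_dropping(protocol_limits, {"davs"}))  # HTTP-TPC only
```

This does not address the separate observation that more transfers were active than the configured limit allows.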
CMS
- Total sites: 53
- In production: 30 (57%)
- Have passed manual tests: 8/23
- Have a WebDAV endpoint: 12/23
- Do NOT have a WebDAV endpoint: 3/23
- This week we might be able to move 6 more sites to production
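The CMS numbers can be cross-checked with a few lines of arithmetic. The sketch assumes that the three "/23" categories partition the 23 sites not yet in production, which is consistent with the totals but is not stated explicitly; the variable names are illustrative.

```python
total_sites = 53
in_production = 30
not_in_production = total_sites - in_production  # the "/23" denominator

# Assumption: these three categories are disjoint and cover all remaining sites.
passed_manual_tests = 8
have_webdav_endpoint = 12
no_webdav_endpoint = 3
categories_sum = passed_manual_tests + have_webdav_endpoint + no_webdav_endpoint

share_now = round(100 * in_production / total_sites)
share_after = round(100 * (in_production + 6) / total_sites)  # if 6 more sites move

print(not_in_production, categories_sum, share_now, share_after)
```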
FTS / Gfal2
- introduced the LOG_SENSITIVE=<true|false> configuration option (FTS-1663)
16:40 → 16:50
Network data challenges (10m)
Speakers: Dr Riccardo Di Maria (CERN), Rizart Dona (CERN)
Monitoring Updates
- Limitations in measuring accurate throughput from the FTS aggregated data (ref.)
- Lack of metadata fields in the FTS aggregated data, needed to separate testing traffic from normal traffic (ref.). Correction: this metadata exists in the aggregated data, thanks to Nick Smith for pointing it out.
- CMS Rucio data → retention policy of 1 month (hosted in the short-term Elasticsearch cluster)
- In contrast, ATLAS Rucio data → retention policy of 1+ year (hosted in InfluxDB)
- A longer retention policy for CMS might be needed if we want long-term monitoring → alternatively, just use the FTS aggregated data and do not rely on Rucio data sources
16:50 → 17:00
Token Authorization testbed (10m)
Speaker: Andrea Ceccanti (Universita e INFN, Bologna (IT))
17:00 → 17:05
AOB (5m)