WLCG DOMA BDT Meeting
Europe/Zurich
Brian Paul Bockelman (University of Wisconsin Madison (US)), Maria Arsuaga Rios (CERN), Petr Vokac (Czech Technical University in Prague (CZ))
Description
Topic: WLCG DOMA BDT Meeting (twiki)
- 16:30 → 16:35
News 5m
- 16:35 → 17:00
Tape REST access 25m
Speaker: Mihai PATRASCOIU (CERN)
- Minimum required dCache for TAPE REST is 8.2.22
- issues discovered with FZK recalls from tape (GGUS:161903)
- dCache updated to allow paths up to 1024 characters long (8.2.21 supports only 255 characters)
- additional optimizations / improvements available only in the 9.x branch (dCache#7104)
- ATLAS started deployment campaign
- BNL mentioned that the dCache developers' recommendation is to wait for 9.2
- Al - performance improvements in 9.2 to support millions of transfers with TAPE REST
- P.V. - there is a limit in FTS of ~ 100k staging requests
- I hope we will also be fine with dCache 8.2
- no performance issues observed at FZK, which has used TAPE REST since April
- staging ~ 200k files per month
- already upgraded testbed to dCache 8.2.25 - started with TAPE REST validation
- PIC - tape REST configured ~ a month ago
- shared doors webdav-at1.pic.es for all WebDAV transfers (disk & tape areas)
- manual tests work fine, but we are still trying to understand whether their namespace organization is compatible with ATLAS plans for tokens / access with capabilities (storage.* scopes)
- sufficient dCache versions for ATLAS T1 tapes
- FZK - 8.2.22
- NDGF-T1 - 8.2.23
- PIC - 8.2.22
- SARA - 8.2.24
- difference in behavior in FTS < 3.12.8 regarding the retention of staged files after they are transferred (FTS-1913, resolved)
- new 3.12.9 release planned for the first week of August
- 3.12.8 is not going to be deployed at CERN; these FTS instances will jump directly to 3.12.9
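The version figures above can be checked mechanically. The following is a minimal sketch, assuming purely numeric dotted version strings; the helper names (`parse_version`, `supports_tape_rest`, `sites_below_minimum`) are hypothetical and not part of dCache or FTS. Comparing tuples of integers avoids the string-comparison trap where "8.2.9" would sort after "8.2.22".

```python
# Minimum dCache version for TAPE REST, as stated in these minutes.
MIN_TAPE_REST = "8.2.22"

# ATLAS T1 tape sites and the dCache versions reported in these minutes.
SITES = {
    "FZK": "8.2.22",
    "NDGF-T1": "8.2.23",
    "PIC": "8.2.22",
    "SARA": "8.2.24",
}


def parse_version(text):
    """Turn a dotted version such as '8.2.22' into a tuple of ints.

    Assumption: versions are purely numeric (no '-rc1' style suffixes).
    """
    return tuple(int(part) for part in text.split("."))


def supports_tape_rest(version, minimum=MIN_TAPE_REST):
    """True if 'version' is at least the minimum required version."""
    return parse_version(version) >= parse_version(minimum)


def sites_below_minimum(sites, minimum=MIN_TAPE_REST):
    """Return the sorted names of sites whose version is too old."""
    return sorted(
        name for name, version in sites.items()
        if not supports_tape_rest(version, minimum)
    )


if __name__ == "__main__":
    print("Sites below minimum:", sites_below_minimum(SITES))
```

With the versions listed above, no site falls below the minimum.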
- 17:00 → 17:09
Transfers with tokens 9m
Speaker: Francesco Giacomini (INFN CNAF)
home directories
- do users access their EOS home area using grid protocols (e.g. root://eosuser.cern.ch//eos/user/[l]/[login])?
- yes (Maarten mentioned CMS?)
- do we expect this area to be accessible with grid protocols using tokens?
- yes, but first we have to solve grid workflows
- P.V. - the current profile and the CERN EOS namespace organization are not compatible with token access with capabilities
- we can't set basepath for multiple issuers to /eos/user
- multiple experiments (at least VO / IAM Admins) would have full access to the data of any CERN user
- description of storage.read:/home in the profile is not really clear to me
storage.create issues
- WLCG JWT profile storage.create definition: Upload data. This includes renaming files if the destination file does not already exist...
- what does this mean in case we don't use file level token granularity, e.g. for storage.create:/atlasdatadisk/
- this token can be used to make a mess in the namespace (rename all files in /atlasdatadisk/)
- rename directories, e.g. /atlasdatadisk/mc23_13p6TeV to /atlasdatadisk/RANDOM_STRING (?)
- rename /atlasdatadisk/mc23_13p6TeV/filename to /atlasdatadisk/DIFFERENT_RANDOM_STRING (?)
- it is not completely clear from the profile what exactly renaming means
- might be a bit more tricky for object storage, where directories are a somewhat artificial construct
- still a much better situation than with X.509 - impossible to destroy data
- how can we prevent abuse of this "renaming" functionality?
- always use tokens with file level access granularity (IAM performance)?
- ALICE uses file level token granularity, but they don't use IAM for file transfers
- cleanest solution with our current WLCG JWT profile
- this would mean for ATLAS ~ 2.5M storage.create tokens per day on average (~ 30 tokens per second)
- get rid of "rename" from WLCG JWT profile and storage implementations?
- Rucio can be configured not to use a different name (with .rucio.upload suffix) during upload
- atomic "PUT+CHECKSUM+RENAME" operation (rename after successful transfer)?
- recovering from damage caused by abusing too wide renaming allowed with e.g. storage.create:/atlasdatadisk/
- for distributed storage management we have all metadata also e.g. in Rucio
- with (non-negligible) effort, file size + checksum could be used to recover the original filename
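The concerns above can be sketched in a few lines. This is only an illustration under stated assumptions: scopes are matched by path prefix at directory boundaries, a rename is treated as permitted when both source and destination fall under a storage.create scope (the minutes note that the profile is not clear on this), and the catalogue is a plain dictionary standing in for metadata kept e.g. in Rucio. All function names and the checksum strings are hypothetical.

```python
import posixpath

# Token issuance rate quoted above: ~2.5M storage.create tokens per day.
TOKENS_PER_DAY = 2_500_000
TOKENS_PER_SECOND = TOKENS_PER_DAY / 86_400  # roughly 29 per second


def scope_allows(scope, action, path):
    """True if a scope such as 'storage.create:/atlasdatadisk/' covers 'path'.

    Assumption: the scope path authorizes itself and everything below it.
    """
    name, _, base = scope.partition(":")
    if name != f"storage.{action}":
        return False
    base = posixpath.normpath(base or "/")
    path = posixpath.normpath(path)
    return base == "/" or path == base or path.startswith(base + "/")


def rename_allowed(scope, src, dst):
    """Assumed rename rule: both ends must be covered by storage.create."""
    return scope_allows(scope, "create", src) and scope_allows(scope, "create", dst)


def recover_names(catalog, found):
    """Map current paths back to catalogued names via (size, checksum).

    catalog: {original_path: (size, checksum)}, e.g. exported from Rucio.
    found:   {current_path: (size, checksum)}, e.g. from a storage dump.
    Only unambiguous matches are returned.
    """
    by_fingerprint = {}
    for original, fingerprint in catalog.items():
        by_fingerprint.setdefault(fingerprint, []).append(original)
    recovered = {}
    for current, fingerprint in found.items():
        candidates = by_fingerprint.get(fingerprint, [])
        if len(candidates) == 1:
            recovered[current] = candidates[0]
    return recovered


if __name__ == "__main__":
    wide = "storage.create:/atlasdatadisk/"
    src = "/atlasdatadisk/mc23_13p6TeV/filename"
    dst = "/atlasdatadisk/DIFFERENT_RANDOM_STRING"
    print("wide scope permits rename:", rename_allowed(wide, src, dst))
    print("file scope permits rename:", rename_allowed(f"storage.create:{src}", src, dst))
    print("tokens per second: %.1f" % TOKENS_PER_SECOND)
```

Under these assumptions a directory-wide scope permits the rename while a file-level scope does not, which is the trade-off against the IAM token rate discussed above. Files with identical size and checksum cannot be told apart, so `recover_names` skips them.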
FTS tokens design - token validation
- we should try to follow the design document and ask IAM for all tokens that will be used in FTS transfers
- validate that IAM provides expected functionality, e.g. token exchange configuration
- token exchange (example)
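For orientation, a token exchange request as defined in RFC 8693 is a form-encoded POST to the issuer's token endpoint. The sketch below only builds the request body and sends nothing. The grant type and token type URNs come from RFC 8693; the function name, the token value, the scope and the audience are placeholders, and client authentication (typically sent separately, e.g. via HTTP Basic) is left out. Whether a given IAM instance accepts exactly these parameters is part of what the validation above would need to confirm.

```python
from urllib.parse import parse_qs, urlencode

# Identifiers defined by RFC 8693 (OAuth 2.0 Token Exchange).
TOKEN_EXCHANGE_GRANT = "urn:ietf:params:oauth:grant-type:token-exchange"
ACCESS_TOKEN_TYPE = "urn:ietf:params:oauth:token-type:access_token"


def build_token_exchange_body(subject_token, scope, audience):
    """Return the form-encoded body of an RFC 8693 token exchange request.

    subject_token: the access token being exchanged (placeholder here).
    scope:         space-separated scopes requested for the new token.
    audience:      intended recipient of the new token (placeholder here).
    """
    params = {
        "grant_type": TOKEN_EXCHANGE_GRANT,
        "subject_token": subject_token,
        "subject_token_type": ACCESS_TOKEN_TYPE,
        "requested_token_type": ACCESS_TOKEN_TYPE,
        "scope": scope,
        "audience": audience,
    }
    return urlencode(params)


if __name__ == "__main__":
    body = build_token_exchange_body(
        subject_token="PLACEHOLDER_ACCESS_TOKEN",
        scope="storage.read:/atlasdatadisk/ offline_access",
        audience="https://storage.example.org",  # hypothetical endpoint
    )
    # Show the decoded parameters instead of the raw encoded string.
    for key, values in parse_qs(body).items():
        print(key, "=", values[0])
```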
- 17:09 → 17:10
Packet marking 1m
Speakers: Marian Babik (CERN), Shawn Mc Kee (University of Michigan (US))
- The Working group is now focusing on packet pacing (next meeting in Sept.)
- Dale, Tim, Shawn and Marian have written an RFC draft that will be presented at the upcoming IETF meeting (next week). Opposition is expected from the IETF, but there is always the possibility of publishing it as documentation of the use of the flow label field.
- Draft: https://www.ietf.org/archive/id/draft-cc-v6ops-wlcg-flow-label-marking-02.html
- WG is also working with the dCache and XRoot developers to follow up on the flow marking (fireflies) implementation
Meeting notes
- dCache with fireflies configured at AGLT2
- fireflies may become available in dCache in next release or two
- a significant number of sites may send fireflies during DC24
- most probably just a simple configuration option on/off
- fireflies sent along the path with data but in addition they can be sent to central collector(s)
- packet marking is more tricky with dCache (no direct access to sockets in Java)
- needs flowd service
- eBPF flow level rewriting - SC22 showed that 200 Gb/s is reachable
- expected to have 10% of sites also configured with packet marking
- 17:10 → 17:25
WebDAV Error Message Improvement Project & unified error message format 15m
Discuss with experts improvements in the error messages produced by failed transfers.
https://twiki.cern.ch/twiki/bin/view/LCG/WebdavErrorImprovement
Speaker: Stephan Lammel (Fermi National Accelerator Lab. (US))
- 17:25 → 17:30