You can add your contribution to an existing section, or if you have a different / bigger topic to discuss, please let us know before this meeting and we can create a dedicated slot for your contribution.
Small topics that might be discussed in this meeting
- dCache 9.2.x and `xrootd.root` door configuration behavior
- WLCG Ops: EGI GOCDB downtime for TAPE service (GGUS:165354)
- SRM.nearline no longer makes sense for WebDAV endpoints with TAPE REST
- we really don't want to keep SRM.nearline just to be able to declare downtime for TAPE
- EGI asked to create new service type (GGUS:165354)
- Service types for WebDAV endpoints with TAPE REST
- EGI GOCDB service type
- disk: webdav
- tape: wlcg.webdav.tape
- OSG Topology
- disk: WebDAV or WebDAV.disk
- tape: WebDAV.tape
- CRIC downtime synchronization implemented CRIC-258
- validated by BNL for OSG Topology downtime import for WebDAV.tape
- ATLAS still needs a different hostname (alias) because FTS configuration uses scheme://fqdn
- tape transfers should be separated in FTS
- tape (buffer) may need different limits than disk
- Alma9 xrd client & sites with Root CA signed by SHA1
- we concluded in issue#2150 that XRootD should make CA validation compatible with other applications (ignore the signature on a self-signed root CA)
- new SHA1 issues with CRLs - unable to validate / load CRLs signed with SHA1
- software: fetch-crl (issue#4), dCache (announcement in user-forum mailing list)
- some SHA1 CAs at least publish CRLs signed with SHA256 => these don't cause trouble for fetch-crl and dCache
- ATLAS asked WLCG to talk with IGTF and provide a timeline (WLCG MB#316 Service report)
- longer term goal: all grid CAs signed by OS trusted CAs
- ultimate goal: get rid of "globus" `/etc/grid-security/certificates`
- FTS optimizer values (different for ATLAS vs. CMS vs. Pilot instance, but same behavior for HTTP-TPC)
- The FTS Optimizer has two step sizes when increasing connections on the link
- A conservative step size (default 1, configurable, "OptimizerIncreaseStep" setting), when the Optimizer is set to mode "1"
- An aggressive step size (default 2, configurable, "OptimizerAggressiveIncreaseStep" setting), when the Optimizer is set to mode "2"
- Then, the difference between mode "2" and mode "3" is how the number of actives is calculated:
- In mode "2", number of actives on the link = number of transfers on the link
- In mode "3", number of actives on the link = number of TCP streams on the link. This was valid in the GridFTP era, but with HTTP every transfer uses only one stream
- In essence, there is not much difference between mode "2" and mode "3".
- A lot of the functionality was applicable only to GridFTP => no longer useful
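The step logic above can be sketched in a few lines (a hypothetical simplification, not actual FTS code; the defaults mirror the settings described above):

```python
# Hypothetical sketch of the optimizer growth step described above.
# Defaults mirror "OptimizerIncreaseStep" (1) and
# "OptimizerAggressiveIncreaseStep" (2); not actual FTS code.
def next_actives(current_actives: int, mode: int,
                 increase_step: int = 1,
                 aggressive_step: int = 2) -> int:
    """New number of actives when the optimizer decides to grow a link."""
    if mode == 1:
        return current_actives + increase_step  # conservative growth
    # Modes 2 and 3 both use the aggressive step; they differ only in how
    # actives are counted (transfers vs. TCP streams), which for HTTP-TPC
    # (one stream per transfer) is the same number.
    return current_actives + aggressive_step

print(next_actives(10, 1))  # 11
print(next_actives(10, 3))  # 12
```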
- Full TAPE buffer & FTS failing transfers
- could FTS do better and not push new transfers when the buffer is full? Use WLCG SRR to detect free space in the buffer?
- do we have enough information for a more clever decision than waiting for failures and relying on the slow optimizer to stop transfers?
- FTS developers have already thought about improvements to avoid hitting the buffer size limit
- FTS knows the total size of staged files
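The kind of decision asked about above could look roughly like this (a sketch only; all names are hypothetical, and it assumes the buffer capacity can be read from WLCG SRR):

```python
# Hypothetical pre-submission check for tape transfers; names are
# illustrative, not FTS or SRR API.
def should_submit(file_size: int, staged_bytes: int,
                  buffer_capacity: int, headroom: float = 0.1) -> bool:
    """Decide whether to push another tape transfer.

    staged_bytes: total size of files FTS has already staged into the buffer
    buffer_capacity: buffer size, e.g. taken from the site's WLCG SRR
    headroom: fraction of the buffer kept free as a safety margin
    """
    free = buffer_capacity - staged_bytes
    return file_size <= free - headroom * buffer_capacity

# 100 GB buffer, 80 GB already staged: a 5 GB file fits, a 15 GB file does not
print(should_submit(5 * 10**9, 80 * 10**9, 100 * 10**9))   # True
print(should_submit(15 * 10**9, 80 * 10**9, 100 * 10**9))  # False
```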
- dCache TAPE REST and non-default `webdav.root` issue
- fixed in dCache 9.2.14 with new configuration option `frontend.root`
- current implementation requires a different frontend service for each VO with their own doors dcache#7506
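For illustration, a per-VO setup as required by the current implementation might look like the following layout fragment (hostnames, paths, and the port property are assumptions; only `frontend.root` comes from the notes above):

```
# Hypothetical dCache layout fragment: one frontend service per VO
[frontend-atlas]
[frontend-atlas/frontend]
frontend.root = /pnfs/example.org/data/atlas
frontend.net.port = 3880

[frontend-cms]
[frontend-cms/frontend]
frontend.root = /pnfs/example.org/data/cms
frontend.net.port = 3881
```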
- Performance markers missing for several dCache sites GGUS:165469, dCacheRT#10596
- Switch gfal default HTTP library from libneon to libcurl - done at CERN and BNL
- dCache: removing internal dependencies on SRM space manager -> transition to dCache quotas
- ATLAS - currently relies on multiple spacetokens from each site - to be discussed internally how to deal with a single quota space
- future questions:
- make quota calculation more real-time
- automatic WLCG SRR with quota
Topics for next BDT meeting
- `storage.stage` implementation
- add reference to previous discussion and conclusions
- https://issues.infn.it/jira/browse/STOR-1605
- WLCG: stage is not a superset of read; users still need read to download data
- compliance tests
- CMS testsuite built into SAM tests (xroot, WebDAV)
- ATLAS WebDAV testsuite for simple interactive overview (e.g. dCache, EOS, StoRM, XRootD)
- XRootD (EOS) allow explicit configuration of authorization strategy (xrootd#2121) only since 5.7.0 (xrootd#2205)
- EOS can't be used safely with tokens (unless you are fine with giving DESTROY privileges to all VO members)
- fixed in EOS 5.2.26(?) which supports scitokens with `authorization_strategy = capability`
- new problems with `storage.create` which doesn't work correctly (xrootd#2364)
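On the earlier point that stage is not a superset of read: under the WLCG JWT profile, a client that brings a file online and then downloads it needs both scopes in its token, e.g. (path illustrative):

```
scope=storage.stage:/atlas/datatape storage.read:/atlas/datatape
```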
- what are the "optimal" FTS staging queue size limits for different implementations?
- should FTS consider buffer size & file size in the queue?
- StoRM seems to struggle with long queues (100k+)
- is there a maximum for dCache & CTA?
- should we still set a staging queue limit in FTS for dCache? Or can dCache deal internally with an "infinite" staging queue size?
- Discussion about more robust FTS behavior for transfers from TAPE in case of a full buffer
- it doesn't make sense to schedule new transfers when the buffer in front of TAPE is full
- FTS would need more details in WLCG SRR to make more reasonable decisions
- P.V. note - more details in private email thread "control write throughput to tape buffer"
- DC27 archival metadata requirement
- TAPE family `/dev/null` for "Data Challenge" activity
- we need a more realistic T0 Export simulation
- e.g. for sites with complex topology like RAL
- also other sites use a disk buffer in front of tape which is not served by the same disknodes & filesystems as DATADISK
- topics with lower priority:
- XRootD client libraries don't implement happy eyeballs
- XRootD bug: case-sensitive HTTP header parsing (RFC 2616) is not compatible with StoRM HTTP/2 support (HTTP headers in lowercase, RFC 7540)
- HTTP/2 `transferheaderauthorization` is sent to the passive party as `authorization` header and XRootD doesn't recognize this header (returns authorization failure)
- related to FTS upgrade to EL9 which comes with curl (used by gfal) with HTTP/2 support
- workaround - disable HTTP/2 support on the HTTP-TPC active party (StoRM)
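The incompatibility boils down to header-name case: HTTP/2 (RFC 7540) requires lowercase header names, while HTTP/1.x treats them case-insensitively. A minimal sketch of the difference (hypothetical helper, not XRootD code):

```python
# HTTP/2 (RFC 7540) sends header names in lowercase; a parser that
# matches names case-sensitively (the behavior described above) then
# fails to find "Authorization". A tolerant lookup instead:
def find_header(headers: dict, name: str):
    lowered = {k.lower(): v for k, v in headers.items()}
    return lowered.get(name.lower())

h2_headers = {"authorization": "Bearer abc"}  # as received over HTTP/2
print(find_header(h2_headers, "Authorization"))  # Bearer abc
```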
- HTTP digest handling changed with RFC 9530 which obsoletes RFC 3230 xrootd#2211
- disable grid proxy delegation during FTS HTTP-TPC (by default?)
- Do we need a `condor_test_token` equivalent for storage?
- dCache: authzdb -> multimap + omnisession migration issue#6607
- reminders about existing (HTTP-TPC) related issues
- dCache: HTTP-TPC performance markers and RemoteConnections dCache#7441
- different performance markers types? (start, connection established, connection closed, finish, ...)
- dCache: issue with xroot-tpc and new default XRootD SHA256 signatures dCache#7599
- StoRM: no support for RemoteConnections in performance markers (ticket?)
- StoRM: Forbidden TPC push transfers on gclouds platform (STOR-1563)
- StoRM: support for "stat" with storage.create and storage.modify STOR-1600
- StoRM: does this storage rely on sufficiently recent CaNL (GGUS:167085)