# WLCG Archival Storage Group
Monday 16 Apr 2018
https://indico.cern.ch/event/722405/
# Present
Christoph Wissing
Dorin Lobontu
Doris Ressmann KIT
Enrico Fattibene
Gene Oleynik
Jens Jensen
Pierre-Emm
Vanessa Acin PIC
Vladimir Sapunenko
Xin Zhao
Dimitrios Christidis
David Yu
Rob Appleyard
"Tim"
Andreas Petzold
Oliver Keeble
Vladimir Bahyl
Andrea Manzi
# Presentation from Andrea Manzi on FTS staging management
Q: We see files being purged from the buffer before they are used (or transferred).
There is a pin lifetime, but it is not implemented everywhere and not guaranteed to work.
Where can this be fixed?
Should FTS prioritise the WAN transfers?
One option is to configure a limit on the number of staging requests sent to SEs,
but tape systems want bulk requests.
Let the admin set pin lifetimes and disk cache space.
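For context, a minimal sketch of how a staging job can carry a pin lifetime when submitted through the FTS3 REST Python bindings; the endpoint, SURLs and timeout values below are placeholders, and whether the storage honours the pin is, as noted above, not guaranteed.

```python
# A minimal sketch using the FTS3 REST Python bindings; the endpoint, SURLs
# and timeout values are illustrative placeholders only.
import fts3.rest.client.easy as fts3

endpoint = "https://fts3.example.org:8446"            # hypothetical FTS3 server
context = fts3.Context(endpoint)                      # picks up the X.509 proxy from the environment

transfers = [
    fts3.new_transfer(
        "srm://tape-se.example.org/path/file1",       # source on the tape endpoint
        "gsiftp://disk-se.example.org/path/file1",    # destination on disk
    ),
]

# bring_online: how long FTS waits for the stage request to complete (seconds)
# copy_pin_lifetime: how long the staged copy should stay pinned in the buffer (seconds)
job = fts3.new_job(transfers, bring_online=86400, copy_pin_lifetime=3600)

job_id = fts3.submit(context, job)
print("Submitted staging job", job_id)
```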
Q (Jens) - we need a report every quarter on the major FTS metrics.
Gave up on the new Grafana dashboard; the old one was really nice.
How can I get a 3-month period?
Andrea - this Grafana dashboard keeps data for only 1 month.
Create a GGUS ticket.
Q (Vladimir S.) -
Clarification from the FTS team:
Submitting 2000 files to FTS immediately results in 10 bulk staging requests.
Submitting 2001 files sends 10 bulk requests, then waits 300 s before submitting the 11th.
We get a high request rate from LHCb, which is good!
Keep in mind the size of the disk buffer and the WAN transfer rate (see the sketch below).
Should FTS delete the file?
We can do 600 MB/s for CMS and have 300 TB of buffer.
We should limit the number of requests by data volume.
FTS also does scheduled transfers internal to the site;
here, data volume doesn't matter.
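A back-of-the-envelope sketch of the two quantities discussed in this question: the number of bulk staging requests a submission generates (assuming the 200-files-per-bulk size and 300 s wait implied by the 2000/2001 example above), and how long the quoted 300 TB CMS buffer would last when drained at 600 MB/s over the WAN.

```python
# Back-of-the-envelope sketch; the 200-files-per-bulk size and the 300 s
# partial-bulk wait are inferred from the 2000/2001-file example above,
# not taken from FTS documentation.
BULK_SIZE = 200           # files per bulk staging request (assumed)
PARTIAL_BULK_WAIT = 300   # seconds before an incomplete bulk is flushed (assumed)

def bulk_requests(n_files):
    """Number of immediate bulks and whether a delayed partial bulk follows."""
    return n_files // BULK_SIZE, n_files % BULK_SIZE > 0

full, partial = bulk_requests(2001)
msg = f"2001 files -> {full} immediate bulk requests"
if partial:
    msg += f" + 1 more after {PARTIAL_BULK_WAIT} s"
print(msg)

# Buffer turnover: how long 300 TB of buffer lasts if drained at 600 MB/s over the WAN
buffer_bytes = 300e12     # 300 TB
wan_rate = 600e6          # 600 MB/s
print(f"Buffer drains in ~{buffer_bytes / wan_rate / 86400:.1f} days")   # ~5.8 days
```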
Q: What is the WAN transfer workflow at all sites? tape -> tape buffer -> disk system -> WAN?
Q (KIT) - clarify state and stage + transfer
For reprocessing, FTS should not delete the file.
True, but for the transfer case it might be useful.
For reprocessing there is typically a buffer -> "disk system" transfer, so two copies.
FTS does this copy.
KIT - this double copy isn't done any more (we ditched it),
but it used to be triggered internally within dCache.
# Other orchestrators
Christoph for CMS/PhEDEx
Every archival site has to deploy a local stager agent.
It is then up to the site to configure and implement the process.
The agent is fed from a database.
Frequency of running, bulk size etc. are under site control.
This config is supposed to be available within CMS:
sites are requested to commit their PhEDEx config to the CERN GitLab.
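To illustrate the pattern just described (a site-local agent fed from a database, with run frequency and bulk size under site control), here is a purely hypothetical sketch; the database schema, the staging command and all parameters are invented for illustration and do not reflect the actual PhEDEx stager agent.

```python
# Purely hypothetical sketch of a site-local stager agent loop; the table
# name, staging command and parameters are invented and do not reflect the
# actual PhEDEx stager agent.
import sqlite3
import subprocess
import time

POLL_INTERVAL = 600   # seconds between runs (under site control)
BULK_SIZE = 500       # files per bulk staging request (under site control)

def wanted_files(db):
    """Files the central system has marked as wanted but not yet requested from tape."""
    cur = db.execute(
        "SELECT pfn FROM stage_requests WHERE state = 'wanted' LIMIT ?", (BULK_SIZE,)
    )
    return [row[0] for row in cur.fetchall()]

def stage_bulk(pfns):
    """Hand one bulk of files to the local tape system (command is hypothetical)."""
    subprocess.run(["site-stage-bulk"] + pfns, check=True)

def mark_requested(db, pfns):
    db.executemany(
        "UPDATE stage_requests SET state = 'requested' WHERE pfn = ?",
        [(p,) for p in pfns],
    )
    db.commit()

if __name__ == "__main__":
    db = sqlite3.connect("stager.db")
    while True:
        batch = wanted_files(db)
        if batch:
            stage_bulk(batch)
            mark_requested(db, batch)
        time.sleep(POLL_INTERVAL)
```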
# Discussion on survey results regarding limits.
How about just asking where the limits would be if they were expressed as data volume?
Reached the "backing off" section; pick up from here next time.
# Vlado and metrics
For those sites that are not submitting metrics, what's the reason? Too difficult? Don't see the benefit?
CNAF - too busy with the data centre! Can start looking at this now.
RAL - lack of time, busy with CASTOR upgrades;
the backend needs to be updated, hopefully by end of April;
there is a frontend producing GLUE2, which would then be updated to produce JSON.
IN2P3 - has to fix account problems, then will provide this info;
some metrics are hard to compute (average tape remounts);
will give the basics first.
Discussion - why so many tape remounts? (see the sketch at the end of this section)
KIT probably has higher remounts as it cannot process as many requests.
David (BNL) - what do you want from tape drive performance comparisons?
Vlado - can your site use the maximum performance of your drives?
David - will provide best and worst cases.
Vlado - will have to normalise for differences between drive types; to be discussed in detail.
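On the "average tape remounts" point above: one possible way a site could approximate such a metric from a mount log; this is only a sketch under an assumed log format, not how any of the sites actually compute it.

```python
# Hypothetical sketch of an "average mounts per tape per day" metric derived
# from a simple mount log; the log format (ISO timestamp and tape label per
# line) is an assumption, not any site's real format.
from collections import Counter

def average_mounts_per_tape_per_day(path):
    mounts = Counter()      # mounts per tape label
    days = set()            # calendar days covered by the log
    with open(path) as log:
        for line in log:
            timestamp, label = line.split()
            mounts[label] += 1
            days.add(timestamp[:10])
    if not mounts:
        return 0.0
    return sum(mounts.values()) / (len(mounts) * len(days))

# Example: print(average_mounts_per_tape_per_day("mounts.log"))
```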
# AOB
Jens - how do we update the 'technology overview' page?
Let Vlado know by email.
# Discussion
How could FTS back off if there is no buffer space left?
Only worth it if there is pinning;
otherwise "no space" doesn't mean "we can't write to the buffer".
NB: the stager queue is shared between VOs, while the disk buffer is not.
FTS backoff might starve the VO's presence in the shared queue, and when FTS starts resubmitting the VO is at a disadvantage.
FTS can perhaps know what volume of staging requests is in its own queue, but it cannot know what volume is waiting in the stager queue.
Will add some questions to the survey to clarify whether clients (e.g. FTS) should be throttling submissions based on free buffer space.
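A sketch of what client-side throttling on buffer volume could look like if the survey concludes it is wanted: only submit a new staging bulk while the volume already staged plus the volume in flight stays below a headroom threshold. The capacity figure and the idea that the client can learn the staged-but-not-transferred volume are assumptions; as noted above, FTS today only knows its own queue.

```python
# Sketch of the idea discussed above; the figures and the way the in-flight
# volume would be learned are assumptions, not an existing FTS feature.
BUFFER_CAPACITY = 300e12   # bytes of disk buffer at the site (example figure)
HEADROOM = 0.8             # only fill the buffer to 80% via new submissions

def can_submit(bulk_volume, staged_not_transferred, in_flight):
    """
    bulk_volume: bytes in the bulk we would like to submit next
    staged_not_transferred: bytes staged to the buffer but not yet moved over the WAN
    in_flight: bytes of staging requests submitted but not yet staged
    """
    projected = staged_not_transferred + in_flight + bulk_volume
    return projected <= HEADROOM * BUFFER_CAPACITY

# Example: hold back a 50 TB bulk if 200 TB are already staged and 30 TB are in flight
print(can_submit(50e12, 200e12, 30e12))   # False: 280 TB > 240 TB of headroom
```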