# WLCG Archival Storage Group
Monday 16 Apr 2018

# Present

Christoph Wissig
Dorin Lobontu
Doris Ressmann KIT
Enrico Fattiene
Gene Oleynik
Jens Jensen
Vanessa Acin PIC
Vladimir Spunenko
Xin Zhao
Dimitrios Christidis
David Yu
Rob Appleyard
Andreas Petzold
Oliver Keeble
Vladimir Bahyl
Andrea Manzi

# Presentation from Andrea Manzi on FTS staging management

Q : We see files being purged from buffer before they are used (or transferred)
  There's a pin lifetime, not always implemented, and not guaranteed to work
  Where can this be fixed?
    FTS prioritising the WAN transfers?
    Configure limit on number of staging requests to SEs
      But tapes want bulk requests
    Let admin set pin lifetimes, disk cache space

Q (Jens) - need a report every quarter on major FTS metrics
    Gave up on the new Grafana
      Old one was really nice
      How can I get a 3 month period?
    Andrea - this Grafana dashboard keeps data only for 1 month
    Create a GGUS ticket

Q (Vladimir S.) -
   Clarification from FTS team:
      Submitting 2000 files to FTS immediately results in 10 bulk requests
      Submitting 2001 files sends 10 bulk requests, then waits 300s before submitting the 11th
        We get a high request rate from LHCb, good!
   Keep in mind the size of the buffer disk and the WAN transfer rate
      Should FTS delete the file?
      We can do 600MB/s for CMS and have 300TB of buffer.
   We should limit the number of requests by data volume.
  FTS does scheduled transfers internal to the site
    Here, volume doesn't matter  
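
The batching behaviour described by the FTS team above can be sketched roughly as follows. This is an illustration only, not FTS code: the bulk size of 200 is inferred from "2000 files results in 10 bulk requests", and the function name, its parameters, and the handling of a third or later burst (each delayed by a further 300s) are assumptions not stated in the meeting.

```python
def bulk_schedule(n_files, bulk_size=200, burst=10, wait_s=300):
    """Return a list of (submit_time_s, files_in_request) tuples.

    Hypothetical model: files are packed into bulk requests of up to
    `bulk_size` files; the first `burst` requests go out at t=0 and
    each following group of `burst` requests waits a further `wait_s`.
    """
    schedule = []
    remaining = n_files
    index = 0
    while remaining > 0:
        size = min(bulk_size, remaining)
        # Integer division groups requests into bursts of `burst`.
        submit_time = (index // burst) * wait_s
        schedule.append((submit_time, size))
        remaining -= size
        index += 1
    return schedule


if __name__ == "__main__":
    for n in (2000, 2001):
        plan = bulk_schedule(n)
        print(n, "files ->", len(plan), "bulk requests, last one:", plan[-1])
```

With these assumed defaults, 2000 files produce 10 requests at t=0, while 2001 files produce an 11th request holding a single file at t=300s.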

Q What is the WAN transfer workflow at all sites? tape -> tape buffer -> disk system -> WAN?

Q (KIT) - clarify "stage" vs "stage + transfer"
    For reprocessing, FTS should not delete the file
      True, but for the transfer case it might be useful.
    For reprocessing, typically there is a buffer->"disk system" transfer, so two copies
      FTS does this copy
    KIT - this double copy isn't done (we ditched it).
      but this used to be triggered internally within dCache

# Other orchestrators

Christoph for CMS/PhEDEx
  Every archival site has to deploy a local stager agent
    Then up to the site to configure and implement the process
  Agent is fed from a db
    Frequency of running, bulk size etc are under site control
  This config is supposed to be available within CMS
    Sites are requested to commit their PhEDEx config to CERN GitLab

# Discussion on survey results regarding limits.

How about just asking where the limits would be if they were expressed as data volume?

Reached the "backing off" section. Pick up from here next time

# Vlado and metrics

For those sites who are not submitting metrics, what's the reason? Too difficult? Don't see the benefit?

CNAF - too busy with data centre!!! Can start looking at this now
RAL - lack of time, busy with CASTOR upgrades
    backend needs updating, hopefully end of April
    have a frontend producing GLUE2, this would then be updated to produce JSON
IN2P3 - has to fix account problems then will provide this info
  some metrics hard to compute (average tape remounts)
    give basics first

Discussion - why so many tape remounts?
KIT probably has higher remounts as it can't process as many requests

David BNL - what do you want from tape drive performance comparisons?
  Vlado - can your site use max performance of your drives?
    David - will provide best and worst cases.
  Vlado - will have to normalise difference between different drives, to discuss in detail.


Jens - how do we update the 'technology overview' page?
  Let Vlado know by email.

# Discussion

How could FTS back off if there's no buffer space left?
  Only worth it if there's pinning
  Otherwise "no space" doesn't mean "we can't write to buffer"
  NB stager queue is shared between VOs, while disk buffer is not
    FTS backoff might starve the VO's presence in the shared queue, so when FTS starts resubmitting, the VO is at a disadvantage
  FTS can maybe know what volume of stage requests is in its queue, but it can't know what volume is waiting in the stager queue.

Will add some questions to survey to clarify whether clients (e.g. FTS) should be throttling submissions based on free buffer space.
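
To make the "throttle by free buffer space" idea concrete, here is a minimal sketch of volume-based admission on the client side. It is an illustration under strong assumptions, not an FTS feature: `StageRequest`, `select_by_volume` and `buffer_fill_time_s` are hypothetical names, and the free buffer figure is assumed to be known to the client, which (as noted above) is not generally the case for the stager queue. It also only makes sense where pinning is in effect.

```python
from dataclasses import dataclass


@dataclass
class StageRequest:
    name: str
    size_bytes: int


def select_by_volume(pending, free_buffer_bytes, in_flight_bytes=0):
    """Split pending requests into (accepted, deferred) by data volume.

    Greedy first-fit: a request that does not fit in the remaining
    budget is deferred, but later, smaller requests may still be accepted.
    """
    budget = free_buffer_bytes - in_flight_bytes
    accepted, deferred = [], []
    for req in pending:
        if req.size_bytes <= budget:
            accepted.append(req)
            budget -= req.size_bytes
        else:
            deferred.append(req)
    return accepted, deferred


def buffer_fill_time_s(buffer_bytes, rate_bytes_per_s):
    """Seconds needed to fill (or drain) a buffer at a constant rate."""
    return buffer_bytes / rate_bytes_per_s


if __name__ == "__main__":
    # Figures quoted earlier in the minutes: 300TB buffer, 600MB/s,
    # assuming decimal units (1 TB = 10**12 bytes, 1 MB = 10**6 bytes).
    seconds = buffer_fill_time_s(300 * 10**12, 600 * 10**6)
    print(seconds, "s, i.e. about", round(seconds / 86400, 1), "days")
```

With the quoted 300TB buffer and 600MB/s rate, the fill time comes out at 500000 s, a little under six days, which gives a feel for how much staged volume a buffer of that size can absorb before files would need to be evicted.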


