CTA deployment meeting

Europe/Zurich
600/R-001 (CERN)
Michael Davis (CERN)

Backpressure

Backpressure for Retrieval

This is the simplest case and has the least serious consequences (i.e. no possibility of data loss).

The Tape Server checks whether there is enough free space on the destination disk system and "reserves" space for the files which have been popped from the queue. If there is insufficient space, the tape is dismounted (or, if this is a new mount, the mount does not happen) and the retrieve requests are requeued.
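A minimal sketch of this check, assuming a POSIX-like view of the disk buffer (the function and parameter names are illustrative, not the actual CTA implementation):

    import os

    def try_reserve(buffer_path, file_sizes_bytes, already_reserved_bytes):
        # Equivalent of "df": how much space is free on the destination disk?
        st = os.statvfs(buffer_path)
        free_bytes = st.f_bavail * st.f_frsize
        needed = sum(file_sizes_bytes)  # files popped from the retrieve queue
        # Space already promised to other in-flight recalls is not yet
        # visible in "df", so subtract it before deciding.
        if free_bytes - already_reserved_bytes >= needed:
            return True   # reservation made; proceed with the recall
        return False      # dismount the tape (or skip a new mount) and
                          # requeue the retrieve requests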

Backpressure for Archival

If we are writing to the disk buffer faster than we can archive to tape, the buffer will gradually fill up. Once it is full, there will be a "no free space" error which will be reported to the client (SFO, FTS, etc.). It is up to the client to hold on to the disk files and retry when buffer space is available again.
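For illustration, a client-side retry loop might look like the following sketch (the write_to_buffer callable and the backoff policy are hypothetical; the actual retry policy is up to each client):

    import errno, time

    def archive_with_retry(write_to_buffer, file_path, max_wait_s=3600):
        # Hold on to the local disk file and retry while the buffer is full.
        delay_s = 30
        deadline = time.monotonic() + max_wait_s
        while True:
            try:
                return write_to_buffer(file_path)
            except OSError as e:
                if e.errno != errno.ENOSPC or time.monotonic() > deadline:
                    raise                        # a different error, or we gave up
                time.sleep(delay_s)              # back off while the drain
                delay_s = min(delay_s * 2, 600)  # to tape frees buffer space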

If everything is working well, we can keep up: in recent tests, writing at 5.6 GB/s to 17 tape drives used only about 10 TB of buffer.

The required amount of disk buffer (to cover long continuous periods of data taking and to survive outages of the tape infrastructure) will be set by the Service Level Agreement with the experiment.

Backpressure for Simultaneous Archival and Retrieval

Julien's solution, worked out with Andreas:

Have two EOS spaces overlaid on the same physical disk, one for archival and one for retrieval:

  • default space (for archival)
  • staging space (for retrieval)

We can set independent quotas for each space. The quota is set as an absolute number of bytes per space (not a percentage).

For the archival use case, we should set a quota (e.g. the equivalent of 30% of the available space). When the quota is reached, the client will receive write failure errors and will back off.

For the retrieval use case, we do not need to set a quota: when the Tape Server makes its reservation, it uses df to query the available space and checks whether there is enough to make the reservation. If the reservation cannot be made, nothing is written, so we should never hit a quota. Instead, we configure the "free disk space threshold" for the reservation to e.g. 30% of the available space.

Conclusion

  • Set a quota for the archival space (A)
  • Set a free space reservation limit for the retrieval space (B) in the tape server config file
  • These limits must not overlap (i.e. A + B < 100% of the available space)
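As a worked example of the non-overlap constraint (the numbers are illustrative; EOS quotas are set in bytes):

    TOTAL_BYTES = 10**15             # e.g. a 1 PB SSD buffer

    ARCHIVE_QUOTA_FRACTION = 0.30    # limit A, enforced by the EOS quota
    RETRIEVE_LIMIT_FRACTION = 0.30   # limit B, enforced by the tape server's
                                     # free disk space threshold

    # The limits must not overlap, otherwise one activity can consume
    # the headroom intended for the other:
    assert ARCHIVE_QUOTA_FRACTION + RETRIEVE_LIMIT_FRACTION < 1.0

    # EOS quotas are absolute byte counts (not percentages), so convert:
    archive_quota_bytes = int(TOTAL_BYTES * ARCHIVE_QUOTA_FRACTION)
    retrieve_limit_bytes = int(TOTAL_BYTES * RETRIEVE_LIMIT_FRACTION)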

This does not solve the case where two channels are writing (e.g. DAQ + reprocessing), with the risk that the DAQ is blocked by another write process filling the disk. In general we expect such cases to be rare and to be handled by the tape operators. If we want automated protection, we could implement priority channels (e.g. by storage class).

Special considerations for ALICE

  • We need a per-space conversion rule to move files from the staging space (SSD) to the spinner space.
  • This converter could potentially fail if the spinner space is full.

Post-meeting update: the problem is not solved

The proposed solution depended on the quota/available disk space for the default and staging spaces being accounted for separately. However, this is not the case: whether the disk system is queried for space A or space B, it returns the same number (the space available on the underlying physical SSDs).

This means that if the archival quota is set to 30% and retrievals have filled up 30% of the disk, then both the archival and retrieval backpressure mechanisms will block further writes.
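A toy model of the interference, assuming the retrieval threshold is configured so that recalls may occupy at most 30% of the disk (illustrative numbers only):

    TOTAL = 100                # physical SSD capacity, arbitrary units
    ARCHIVE_QUOTA = 30         # limit A on the default space
    RETRIEVE_LIMIT = 30        # limit B on the staging space

    used_by_retrievals = 30    # recalls have filled 30% of the disk
    used_by_archivals = 0      # no archival data has been written at all

    # Shared accounting: both spaces report the same usage and free space.
    used = used_by_retrievals + used_by_archivals
    free = TOTAL - used

    archive_blocked = used >= ARCHIVE_QUOTA            # True: the quota is
                                                       # "full" although no
                                                       # archival data exists
    retrieve_blocked = free <= TOTAL - RETRIEVE_LIMIT  # True: threshold hit

    # With separate accounting, archival usage would be 0 and archival
    # writes would still be accepted.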

Minutes
    • 14:00–14:10  Revised Timeline (10m)

      w/c 17 Feb

      • Finish ATLAS repack in CASTOR
      • Release CTA v1.2
      • We will use EOS 4.6.8 for recall tests
      • Re-migrate ATLAS

      w/c 24 Feb

      • Join ATLAS reprocessing campaign

      w/c 2 Mar/9 Mar

      • Write stress test with ATLAS
      • Simultaneously run our own write/recall/delete tests on the instance
      • Test dual-copy tape pools
      • Tier-1 export test
      • Test with XRootD 4.11.2 and EOS 4.6.9, to allow testing of TPC proxy delegation

      w/c 16 Mar

      • Integration test with ATLAS online:
        • Writing without FTS
        • The "is file safely on tape" test
        • "What happens when the buffer is full" test (in particular, ensure that online writes cannot be blocked by failed writes from elsewhere or by recalls filling the buffer)

      w/c 23 Mar

      • One week "cool off" period with no writes to CASTOR, to ensure all files have made it to tape and to check that no further data is being written

      w/c 30 Mar

      • Provisional date for EOSCTAATLAS to go into production.
    • 14:10–14:40  Technical discussion about Backpressure (30m)

      We need to take a look at all of the backpressure/rate-limiting mechanisms in play across the system components (Rucio/FTS/XRootD/EOS/CTA) and work out which cases are not yet covered.

      • Two weeks ago, Cédric presented the backpressure mechanism as it is implemented in the tapeserver.
      • The FTS team will be present for a discussion of backpressure on the experiment side.
      • The main issue is to be sure that archival from the DAQ can never be blocked by stuck retrieves or by other archival activity.
      • Clarify which problems have already been solved and which remain open.
      • Discussion of solutions.
    • 14:40–14:45  Other unresolved problems (5m)

      List issues that still need to be resolved:

      • Zero-length files