FTS + CTA + CMS

Current situation

CMS treats an FTS transfer as a single unit: the file starts at the source, and the transfer ends only when the file is on tape at the destination.

Implementation-wise, FTS has to split the data movement into two distinct phases:

  1. Move the data from source disk --> destination disk
  2. Monitor the file on the destination disk until it makes it to tape

 

Failures often arise in phase 1. When this happens, FTS cleans up the destination file. Very rarely, either the clean-up fails or the transfer fails in such a way that it never reaches the clean-up stage. In those cases, a remnant file is left at the destination.

Sometimes, the transfer fails in the monitoring-to-tape phase (phase 2), either because of an archiving error or because the archive timeout expired. FTS has no mechanism to clean up the destination file in this phase, so in all such cases the remnant file is left at the destination.
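The failure paths described above can be summarised in a small sketch. This is only a model of the notes, not FTS code: the function name, the `Outcome` enum and the three boolean inputs are hypothetical.

```python
from enum import Enum


class Outcome(Enum):
    DONE = "file on tape"
    CLEANED = "copy failed, destination cleaned up"
    REMNANT = "remnant file left at destination"


def transfer_outcome(copy_ok: bool, cleanup_ok: bool, archived_in_time: bool) -> Outcome:
    """Hypothetical model of the two phases of a disk --> tape transfer."""
    # Phase 1: source disk --> destination disk
    if not copy_ok:
        # FTS tries to clean up the destination file; very rarely this fails
        return Outcome.CLEANED if cleanup_ok else Outcome.REMNANT
    # Phase 2: monitor the file until it is on tape
    if not archived_in_time:
        # No clean-up mechanism exists for this phase
        return Outcome.REMNANT
    return Outcome.DONE
```

In this model, any phase 2 failure ends in `Outcome.REMNANT`, which is the case the rest of these notes focus on.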

Problem

When transfers fail, CMS retries with the same logical filename. If a remnant file exists at the destination, the retry hits the `File exists and overwrite is not enabled` error. This puts Rucio <--> FTS in a submission loop, with no forward progress.

Invariants

Note

In the past, FTS developed the `Destination file report` feature for Rucio. When a transfer fails with `File exists and overwrite is not enabled`, FTS provides a JSON document with the following destination file attributes:

{
    "filesize": <filesize>,
    "checksum": <checksum>,
    "on_disk": <on_disk_boolean>,
    "on_tape": <on_tape_boolean>
}

 

This feature still requires a later retry with overwrite enabled. It also cannot cover the case where the transfer fails because the `archive timeout` expired.
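As an illustration of how a client could consume such a report, here is a minimal sketch. The four field names come from the JSON above; the class, the function names and the value types (integer size, string checksum) are assumptions, and the sample values in the usage line are made up.

```python
import json
from dataclasses import dataclass


@dataclass
class DestinationFileReport:
    filesize: int   # assumed to be a number of bytes
    checksum: str   # assumed to be a string
    on_disk: bool
    on_tape: bool


def parse_report(raw: str) -> DestinationFileReport:
    """Parse the JSON destination file report into a typed object."""
    data = json.loads(raw)
    return DestinationFileReport(
        filesize=int(data["filesize"]),
        checksum=str(data["checksum"]),
        on_disk=bool(data["on_disk"]),
        on_tape=bool(data["on_tape"]),
    )


def is_disk_only_remnant(report: DestinationFileReport) -> bool:
    """True when the destination file sits on the disk buffer but not on tape."""
    return report.on_disk and not report.on_tape


# Made-up sample values, for illustration only
sample = parse_report(
    '{"filesize": 1024, "checksum": "deadbeef", "on_disk": true, "on_tape": false}'
)
```

A client could use `is_disk_only_remnant(sample)` to decide whether a retry with overwrite is worth submitting.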

Outcome

Two main problems were identified:

  1. Remnant files on the destination disk buffer
  2. Archiving backlog causing many problems

 

The proposed approach is to tackle them step by step. The remnant files on the destination buffer can be handled by FTS introducing the --overwrite-disk option (overwrite the replica only if it is solely on disk), with Rucio retrying transfers with --overwrite-disk enabled. This removes the remnant-file problem from the equation and lets the archiving backlog problems surface.

The archiving backlog problem will be addressed separately, in a future discussion.

Follow-up discussions

Participants:      
Dima (CMS), Rahul (CMS), Julien (CTA), Mihai (FTS), Joao (FTS)

---

Dima: Why does FTS need the --overwrite flag?

Mihai: By default, FTS won't delete existing destination files unless instructed to do so

Julien: LHCb does overwrite everywhere until transfer succeeds (disk and tape)

Rahul: Who cleans up the "dark data" on tape when a file is overwritten?

Julien: When a file is deleted on tape, it leaves a hole in the tape. However, the "space" is freed from the experiment quota. The "holes" in the tapes are recovered once the tapes are repacked. Tape repack is an internal procedure and experiments are not involved with site tape lifecycle management

Dima:       
For disk transfers, CMS would use --overwrite everywhere because there's no point in investigating why a disk --> disk transfer failed.

For tape:      
- Fermilab does not want the --overwrite flag (reason: a decision by the Fermilab sysadmins)
- CTA is OK with --overwrite

Julien: Certain directories could even be protected against deletion. This approach should be reviewed separately and should not interfere with DM requirements

Julien had a look: not a single directory is protected against deletion on the eosctacms instance

Files on tape should be compared with the Rucio namespace regularly to spot anomalies. Even if some files are wrongly deleted, they are only deleted from the namespace. They can still be recovered during a quarantine period that should be agreed with the experiment
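A minimal sketch of such a comparison, assuming both sides can be dumped as sets of logical filenames. The function name and the result keys are hypothetical.

```python
def find_anomalies(tape_files: set[str], rucio_files: set[str]) -> dict[str, set[str]]:
    """Compare a tape namespace dump with the Rucio namespace."""
    return {
        # On tape but unknown to Rucio ("dark data")
        "dark_on_tape": tape_files - rucio_files,
        # Known to Rucio but absent from tape (possibly wrongly deleted)
        "missing_from_tape": rucio_files - tape_files,
    }
```

Files reported under `missing_from_tape` would be the candidates for recovery within the quarantine period.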

---

Dima: Storage sites should provide a backpressure mechanism

Julien: *Will present a proposal for archive backpressure at the next CMS week*

If CMS dumps a few petabytes to tape sites, disk buffers might not be big enough to absorb all the data before it is moved to tape. Also, even if disk buffers are big enough, there may not be enough tape bandwidth to move the data to tape before archive_timeout expires
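The two constraints can be illustrated with a rough worst-case estimate, assuming all the data lands on the buffer at once and is drained to tape at a constant rate. The function name, its parameters and the numbers in the example are made up for illustration and do not describe any real site.

```python
def archive_backlog_check(data_tb: float, buffer_tb: float,
                          tape_rate_gb_s: float, archive_timeout_h: float) -> dict:
    """Worst-case estimate: the whole volume arrives on the buffer at once."""
    # Hours needed to drain the whole volume to tape (1 TB = 1000 GB)
    drain_hours = data_tb * 1000 / tape_rate_gb_s / 3600
    return {
        "buffer_ok": data_tb <= buffer_tb,
        "drain_hours": drain_hours,
        "timeout_ok": drain_hours <= archive_timeout_h,
    }


# Made-up numbers: 3 PB dumped, 5 PB buffer, 10 GB/s to tape, 24 h timeout
result = archive_backlog_check(3000, 5000, 10, 24)
```

With these made-up numbers the buffer is large enough, but draining takes about 83 hours, so the files archived last would exceed a 24 h archive timeout.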

CMS needs to monitor the move to tape because CMS "does not simply trust" that all the files that were written to disk will be moved to tape

Dima: How far can CMS trust the sites, so that it could stop setting the "check on tape" flag?

Julien: CMS must continue to set the "check on tape" flag. SLAs should be refined further with every site. For the CERN CTA tape endpoint: a file that does not make it to tape after 24h should be overwritten
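The 24h rule for the CERN CTA endpoint could be expressed as a simple check. This is a sketch only: the function name and arguments are hypothetical, and the timeout would differ per site depending on the agreed SLA.

```python
from datetime import datetime, timedelta

# 24h is the value quoted for the CERN CTA tape endpoint
ARCHIVE_TIMEOUT = timedelta(hours=24)


def may_overwrite(written_at: datetime, now: datetime, on_tape: bool,
                  timeout: timedelta = ARCHIVE_TIMEOUT) -> bool:
    """True when a file is still not on tape after the archive timeout."""
    return (not on_tape) and (now - written_at >= timeout)
```

The `timeout` parameter is left overridable because the notes expect SLAs to be refined site by site.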

---

Mihai:

Suggests always setting --overwrite in Rucio retries.

- Mihai proposes --overwrite-on-disk
   - FTS will check if the file is on tape before deleting
       - If the file is on tape: FTS does not delete it and says so in the error message
       - If the file is on disk only: FTS deletes it and starts the copy again
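The proposed behaviour amounts to a small decision function. The sketch below only mirrors the bullets above; it is not the FTS implementation, and the function name, inputs and returned strings are hypothetical.

```python
def overwrite_on_disk_decision(exists: bool, on_tape: bool) -> str:
    """Hypothetical decision logic for a retry with --overwrite-on-disk."""
    if not exists:
        # Nothing at the destination: plain copy
        return "copy"
    if on_tape:
        # Never delete a file that already made it to tape; report it instead
        return "fail: destination file is already on tape"
    # File is on disk only: remove the remnant and start the copy again
    return "delete-and-copy"
```

The key design point is that the tape check happens before any deletion, so a replica that already made it to tape is never removed.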

Dima:

Is on board with the idea.

- It will solve many disk-disk problems
- It will also provide more meaningful error messages in Central Monitoring:
   - Fewer "File already exists" error messages
   - More "Archive timeout expired" error messages

Julien: Even if the file is already on tape, there is no problem: the old copy is deleted from the namespace and therefore no longer counts against the CMS quota (see the earlier discussion of tape holes)

Mihai: The --overwrite-on-disk option will go in the next FTS release

---

Julien: CTA needs archive-timeout to be small (i.e. 24 hours), because RAW data must be on tape quickly at T0: if a file does not make it to tape within 24h, it can be overwritten