SRE Data Durability

Europe/Zurich
31/S-023 (CERN)

Maria Arsuaga Rios (CERN)
Description

Follow-up on EOS SRE data durability.

- Understand the case found in restic where the checksum is 0 but the file size is != 0.

- Repair the plain layout for versioning files (skipped for the moment).

- Think about how to better automate cases such as "all replicas corrupted", to help speed up draining when these kinds of leftovers are present. When the replicas are dropped, the entry in the namespace is removed (confirmed by a test from Cristi).

- Change the cron job time from 20:00 to 15:00 for the automatic repair of filesystems with failed drains.

- However, keep the three objectives as the top priorities (last slide):

  • fsck tests pps -> backup -> cms
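
As an illustration of the first item above (checksum 0 with file size != 0), a minimal Python sketch of how such entries could be flagged for inspection. The record shape (`FileRecord` with `path`, `size`, `checksum`) and the helper names are hypothetical and do not reflect the EOS namespace schema or any EOS tool; the checksum is assumed to be available as a hex string.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class FileRecord:
    # Hypothetical record shape; field names are illustrative only.
    path: str
    size: int       # file size in bytes
    checksum: str   # hex-encoded checksum (assumed representation)


def is_zero_checksum(checksum: str) -> bool:
    """Return True if the hex checksum is non-empty and made only of '0' digits."""
    digits = checksum.strip().lower()
    if digits.startswith("0x"):
        digits = digits[2:]
    return digits != "" and set(digits) == {"0"}


def find_suspicious(records: Iterable[FileRecord]) -> List[FileRecord]:
    """Select records whose checksum is all zeros although the file is not empty."""
    return [r for r in records if r.size != 0 and is_zero_checksum(r.checksum)]


if __name__ == "__main__":
    sample = [
        FileRecord("/a", 10, "00000000"),  # non-empty file, zero checksum: flagged
        FileRecord("/b", 0, "00000000"),   # empty file: not flagged
        FileRecord("/c", 10, "1a2b3c4d"),  # non-zero checksum: not flagged
    ]
    for record in find_suspicious(sample):
        print(record.path)
```

The filter only marks candidates for a closer look; whether a flagged entry is really corrupted still has to be confirmed against the stored replicas.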
There are minutes attached to this event.