Description
The processing of ALICE experiment data relies on the quality and integrity of the stored data. Currently, ALICE uses a distributed file crawler that periodically evaluates samples of files from each storage element to gather statistics on the number of corrupted or inaccessible files. The main limitation of this approach is that it cannot provide a comprehensive overview of a storage element's status, since the results are based on a random selection of files. This presentation describes a new solution for the ALICE File Consistency Check System. The new approach overcomes the limitations of the file crawler by using the powerful consistency checking tools provided by EOS. The idea behind this project is to collect all the existing errors on an EOS instance from the reports generated by the FSCK command, with the goal of reconciling the contents of the local storage with the central catalogue and, where possible, recovering lost content from other replicas. The output of the FSCK report command will be accessed through the new HTTP interface available in the latest versions of EOS.
Hence, this solution not only produces a more accurate integrity analysis but also automates the recovery of lost data.
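The processing pipeline described above can be sketched as follows. This is a minimal illustration, not the actual ALICE implementation: the JSON line layout of the FSCK report and the choice of which error tags are repairable from replicas are assumptions made for the example, and any real deployment would consume the report through the EOS HTTP interface mentioned in the abstract.

```python
import json

def summarize_fsck_report(report_lines):
    """Group FSCK error entries by error tag.

    Each input line is assumed (for illustration) to be a JSON object of
    the form {"tag": "<error_tag>", "fxid": "<hex file id>", "fsid": <int>}.
    Returns a mapping: error tag -> list of affected file ids.
    """
    by_tag = {}
    for line in report_lines:
        entry = json.loads(line)
        by_tag.setdefault(entry["tag"], []).append(entry["fxid"])
    return by_tag

def files_repairable_from_replicas(by_tag,
                                   repairable_tags=("rep_missing_n",
                                                    "rep_diff_n")):
    """Return the file ids whose errors could be fixed using healthy
    replicas elsewhere; the set of repairable tags here is an assumption."""
    fxids = set()
    for tag in repairable_tags:
        fxids.update(by_tag.get(tag, ()))
    return sorted(fxids)

# Example with two hypothetical report entries: one file with a missing
# replica (recoverable from another copy) and one with a checksum mismatch.
sample_report = [
    '{"tag": "rep_missing_n", "fxid": "0001abcd", "fsid": 12}',
    '{"tag": "m_cx_diff", "fxid": "0002ef01", "fsid": 7}',
]
summary = summarize_fsck_report(sample_report)
to_repair = files_repairable_from_replicas(summary)
```

A per-tag summary like this is what would drive the reconciliation step: entries whose errors involve missing or divergent replicas are queued for re-replication, while the remainder are flagged for comparison against the central catalogue.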