A. Sciaba, L. Duflot, J. Walder, F. Wuerthwein, O. Smirnova, J. Elmsheuser, X. Espinal, O. Gutsche, L. Sexton-Kennedy, P. Millar, R. Di Maria, Tigran, D. Weitzel, H. Severini, D. Lange, D. Smith
Presentation from Frank Wuerthwein:
- Extrapolation of data volumes to HL-LHC:
- 0.5 EB of RAW written per year per experiment
- Processing 0.5 EB over 100 days means 5 PB/day per experiment, i.e. 10 PB/day for ATLAS+CMS -> ~1 Tb/s sustained reading speed, with the US T1s holding a 30-40% share : THIS IS A BIG CHALLENGE
- Example of optimisation : Minimise disk buffer in Tape@T1 (carousel model) and buffer at remote processing site (like HPC)
-> Co-schedule the chain
o Opportunity coming to run tests benefitting from the FABRIC project in the US (a few 1 Tb/s links coming in the next 4 years)
--> Proposal to organise US ATLAS+CMS+WLCG to run a challenge, over a single day, with the goal of processing a total of 10 PB of input data
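A quick back-of-the-envelope check of these numbers (a sketch only; it assumes the RAW volume is 0.5 EB per experiment per year, which is what the 10 PB/day and 10-million-job figures imply):

```python
# Back-of-the-envelope check of the HL-LHC processing numbers.
# ASSUMPTIONS: 0.5 EB RAW per experiment per year, a 100-day
# processing campaign, 1 GB average RAW file size, one job per file.

RAW_PER_EXPERIMENT_EB = 0.5   # RAW written per year per experiment
CAMPAIGN_DAYS = 100           # length of the reprocessing campaign
EXPERIMENTS = 2               # ATLAS + CMS

raw_pb = RAW_PER_EXPERIMENT_EB * 1000              # 1 EB = 1000 PB
pb_per_day = raw_pb / CAMPAIGN_DAYS * EXPERIMENTS  # combined daily input
print(f"combined input rate: {pb_per_day:.0f} PB/day")  # -> 10 PB/day

# Sustained read rate in Tb/s: 1 PB = 8000 Tb, 1 day = 86400 s
tb_per_s = pb_per_day * 8000 / 86400
print(f"sustained read rate: {tb_per_s:.2f} Tb/s")      # -> 0.93 Tb/s

# One job per 1 GB RAW file (1 PB = 1e6 GB)
jobs_per_day = pb_per_day * 1e6
print(f"jobs per day: {jobs_per_day:.0f}")              # -> 10 million
```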
+ L. Duflot : Very challenging numbers
o Current max performance of the ATLAS carousel is 50 TB/day -> identify the limiting points
o 1 job per 1 GB Raw file -> 10 million jobs per day
+ B. Jayatilaka : Confirms that this would be a factor 25 compared to today
Frank wants to focus on consolidating all the numbers.
+ O. Smirnova : Extrapolate how many tape drives are needed.
+ L. Duflot : Propose to integrate this exercise with Data Carousel team
+ Given current limitations on TAPE recall -> propose to factorise the different steps for the moment : for example, start with the 10 PB of input already on DISK.
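The tape-drive extrapolation raised above can be sketched as follows; the per-drive rate is a hypothetical assumption (~300 MB/s, roughly modern LTO-class hardware), not a number from the meeting:

```python
# Rough count of tape drives needed to sustain the challenge rate.
# ASSUMPTION: ~300 MB/s sustained per drive (hypothetical; the
# meeting did not quote a drive rate), and no staging inefficiency.

TARGET_PB_PER_DAY = 10
DRIVE_MB_PER_S = 300  # hypothetical per-drive sustained rate

target_mb_per_s = TARGET_PB_PER_DAY * 1e9 / 86400  # 1 PB = 1e9 MB
drives = target_mb_per_s / DRIVE_MB_PER_S
print(f"~{drives:.0f} drives")  # order of a few hundred drives
```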
Next steps (before summer vacation) :
+ Each experiment and WLCG should internally discuss if they want to be part of it.
+ Collect scalability issues from the different components (FTS, Rucio, ...)
+ Check if both experiments can currently process 10 PB per day from DISK with all Grid capacity (IO limitation). If not, run derivations instead, which run faster.
+ Next DOMA Access meeting : Collect the numbers in a single document and invite each contributing team to present their vision of, and current issues with, such a challenge
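For the DISK IO check above, the per-site load can be estimated with a simple sketch; the site count is a hypothetical assumption, not a number from the meeting:

```python
# Per-site IO load if the 10 PB/day is spread across the Grid.
# ASSUMPTION: ~100 contributing sites (hypothetical; the meeting
# does not give a site count) sharing the load evenly.

SITES = 100
TOTAL_PB_PER_DAY = 10

per_site_gb_per_s = TOTAL_PB_PER_DAY * 1e6 / 86400 / SITES  # 1 PB = 1e6 GB
print(f"~{per_site_gb_per_s:.2f} GB/s sustained read per site")
```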