People on Vidyo: Andreas Petzold, Daniele Spiga, David Smith, Diego Ciangottini, Elizabeth Sexton-Kennedy, Eric Fede, Frank Wuerthwein, Gonzalo Merino, Johannes Elmsheuser, Kaushik De, Laurent Duflot, Markus Schulz, Oxana Smirnova, Horst Severini, Riccardo Di Maria, Sabine Chrystel Crepe, Stephane Jezequel, Xavier Espinal
* Frank Wuerthwein (Univ. of California San Diego (US)) - Proposed New Scope of DOMA Access
- discussed at
- reported here again; please have a look at the slides
* Stephane Jezequel (LAPP-Annecy CNRS/USMB (FR)) - Discussion on proto datalake
- follow up of document presented by Xavier (
- this proposal has not yet been presented to or discussed within the experiments, so it is not endorsed by any of them
- request from the last meeting: provide a more realistic timescale
- important to quantify the gain from the proposed datalake organisation
- objective of today's presentation: as a first step, converge on a proposal from the DOMA ACCESS community for concrete tests to run up to the Run-3 restart (early 2022)
- maintain or reduce the manpower needed to operate the Grid storage infrastructure at HL-LHC scale, including data transfers
- datalake notion: storage infrastructure providing full replica of input analysis data; production and analysis workflow should deal with datalake organisation
- need to find a solution for isolated sites (Chile, Australia, Eastern Asia)
- this proposal assumes that RUCIO is already deployed everywhere for everything and that the HammerCloud infrastructure is available for performance measurements
- this proposal fits perfectly with the activity being carried out in the ESCAPE project
- if a testbed is deployed in DOMA, it is necessary to define manpower, maximise reuse of existing tools and monitoring, and minimise interference with production activity
- is it possible to complete the exercises before LHC data-taking restarts in early 2022?
- 3 or more sites providing storage in the EU and US are needed (there are already proposals from ES, IT, FR and DE)
- use only xrootd/http protocols for disk
-> this is already a call for sites to contribute
Timescale (relies on the availability of Rucio experts) :
- Build list of volunteering sites during summer 2020
2020-2021 : Can only be a small set of sites (10-20 ?)
- Build datalake during autumn 2020
New sites can be integrated on the fly over 2020 since the data are temporary
- use RUCIO production instance of DOMA (or ATLAS/CMS)
- asynchronous transfers relying on DOMA-TPC FTS infra
- datalake monitoring + functional tests: a pre-prototype exists for the ESCAPE storage infrastructure, similar to what WLCG has for FTS/perfSONAR/..
- 3 or more sites/federations with state-less storage (caching)
- 3 or more sites/federations accessing data remotely (storage-less sites)
- calibrated jobs (analysis and production ?) triggered by HC infrastructure of experiment -> matrix of success to be defined
- demonstrate feasibility to plug in external resources (HPC, cloud)
Timescale (ATLAS, CMS) : Late 2020-Spring 2021
- Start first HC tests accessing datalake in late 2020
- Measure performances for calibrated jobs
- HPC/Commercial cloud : Not later than mid 2021
- injection of “primary” data to the lake
- run the required processing to produce analysis-like datasets; exercise the file workflow: after the data is produced, it is moved to a different QoS
- distribution of the analysis-like datasets to the caches of the processing sites
Timescale : Spring-Summer 2021
- based on the defined workflows, collect metrics and compare them with the current infrastructure: overall time for completion, storage used, bandwidth used, CPU efficiency, operational burden (sites and experiments)
Timescale : Autumn 2021 → Summer 2022
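A minimal sketch of how the metric comparison above could be tabulated, assuming each workflow is run once on the current infrastructure and once on the datalake. The class name, field names and units are illustrative assumptions, and the numbers are placeholders, not measurements. Operational burden is left out because the minutes do not say how it would be quantified.

```python
from dataclasses import dataclass, fields


@dataclass
class WorkflowMetrics:
    """Numeric metrics listed in the minutes; names and units are hypothetical."""
    completion_hours: float  # overall time for completion
    storage_tb: float        # storage used
    bandwidth_tb: float      # data moved over the network
    cpu_efficiency: float    # CPU time / wall time, between 0 and 1


def relative_change(baseline: WorkflowMetrics, candidate: WorkflowMetrics) -> dict:
    """Return (candidate - baseline) / baseline for every metric.

    Assumes every baseline value is non-zero.
    """
    changes = {}
    for f in fields(WorkflowMetrics):
        base = getattr(baseline, f.name)
        cand = getattr(candidate, f.name)
        changes[f.name] = (cand - base) / base
    return changes


# Placeholder values for illustration only.
current = WorkflowMetrics(10.0, 100.0, 50.0, 0.80)
datalake = WorkflowMetrics(12.0, 60.0, 75.0, 0.72)

for name, change in relative_change(current, datalake).items():
    print(f"{name}: {change:+.0%}")
```

A negative change is an improvement for time, storage and bandwidth, and a degradation for CPU efficiency, so the sign has to be read per metric.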
Open questions -> please see slides
- feedback on today’s presentation needs to be collected
- translate the ideas into a document
- experiments are under pressure and it is not straightforward to get manpower from them; how much work is necessary to prototype this proposal?
- this should not be postponed, but there is a concern that it will last more than 1.5 years; the feeling is that the prototype is more of a pre-production service
- US-T1s could enter the discussion as well, since a similar topic has already been discussed internally
- shared namespaces for ATLAS and CMS when the same caches and origins are used is a non-trivial issue
- structural differences should be understood between ATLAS and CMS
- a better analysis of the manpower and expertise needed for this project is necessary to understand whether it is feasible
- RUCIO multi-VO support should be the priority
- making HC less experiment-dependent would be a plus