DOMA / ACCESS Meeting

Name: DOMA / ACCESS Meeting
Start: 2020-06-30T17:30:00+02:00
End: 2020-06-30T18:50:00+02:00
Location: CERN

Tuesday 30 Jun 2020, 17:30 → 18:50 Europe/Zurich

513/1-024 (CERN)

Frank Wuerthwein (Univ. of California San Diego (US)), Ilija Vukotic (University of Chicago (US)), Markus Schulz (CERN), Stephane Jezequel (LAPP-Annecy CNRS/USMB (FR)), Xavier Espinal (CERN)

People on Vidyo: Andreas Petzold, Daniele Spiga, David Smith, Diego Ciangottini, Elizabeth Sexton-Kennedy, Eric Fede, Frank Wuerthwein, Gonzalo Merino, Johannes Elmsheuser, Kaushik De, Laurent Duflot, Markus Schulz, Oxana Smirnova, Horst Severini, Riccardo Di Maria, Sabine Chrystel Crepe, Stephane Jezequel, Xavier Espinal

* Frank Wuerthwein (Univ. of California San Diego (US)) - Proposed New Scope of DOMA Access

- discussed at https://indico.cern.ch/event/932079/

- reported here again; please have a look at the slides

* Stephane Jezequel (LAPP-Annecy CNRS/USMB (FR)) - Discussion on proto datalake

- follow-up to the document presented by Xavier (https://docs.google.com/document/d/1ZzyycM6Sli6cFQelF3VfEs9OEDbnEbaSmIOpaZ5fOgY)

- this proposal has not yet been presented to or discussed within the experiments, so none of them has endorsed it

- request from the last meeting: provide a more realistic timescale

- important to quantify the gain from the proposed datalake organisation

- objective of today's presentation: as a first step, converge on a proposal from the DOMA ACCESS community for concrete tests to run before the Run-3 restart (early 2022)

- goal: keep constant or reduce the manpower needed to operate the Grid storage infrastructure at HL-LHC scale, including data transfers

- datalake notion: a storage infrastructure providing a full replica of the input analysis data; production and analysis workflows should cope with the datalake organisation

- need to find a solution for isolated sites (Chile, Australia, East Asia)

- this proposal assumes that RUCIO is already deployed everywhere for everything, and that the HammerCloud infrastructure is available for performance measurements

- this proposal fits perfectly with the activity being carried out in the ESCAPE project

- if a testbed is deployed in DOMA, it is necessary to define the manpower, maximise reuse of existing tools and monitoring, and minimise interference with production activity

- is it possible to complete the exercises before LHC data-taking restarts in early 2022?

STEP 1

- 3 or more sites providing storage in the EU and US are needed (there are already proposals from ES, IT, FR and DE)

- use only xrootd/http protocols for disk

-> this is already a call for sites to contribute

Timescale (relies on the availability of Rucio experts):

- Build list of volunteering sites during summer 2020

2020-2021: can only be a small set of sites (10-20?)

- Build datalake during autumn 2020

New sites can be integrated on the fly during 2020, since the data are temporary

STEP 1b

- use RUCIO production instance of DOMA (or ATLAS/CMS)

- asynchronous transfers relying on DOMA-TPC FTS infra

- datalake monitoring + functional tests: a pre-prototype exists for the ESCAPE storage infrastructure, similar to what WLCG has for FTS/perfSONAR/..

STEP 1c

- 3 or more sites/federations with state-less storage (caching)

- 3 or more sites/federations accessing data remotely (storage-less sites)

- calibrated jobs (analysis and production?) triggered by the experiment's HC infrastructure -> matrix of success to be defined

- demonstrate the feasibility of plugging in external resources (HPC, cloud)

Timescale (ATLAS, CMS) : Late 2020-Spring 2021

- Start first HC tests accessing datalake in late 2020

- Measure performance for calibrated jobs

- HPC/Commercial cloud: not later than mid 2021

STEP 2

- injection of “primary” data to the lake

- run the required processing to produce analysis-like datasets; exercise the file workflow: after the data is produced, it is moved to a different QoS

- distribution of the analysis-like datasets to the caches of the processing sites

Timescale : Spring-Summer 2021

STEP 3

- based on the defined workflows, collect metrics and compare them with those of the current infrastructure: overall time to completion, storage used, bandwidth used, CPU efficiency, operational burden (sites and experiments)

Timescale : Autumn 2021 → Summer 2022
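
The STEP 3 comparison could be expressed as a relative change per metric between the current infrastructure and the datalake. The following is only a minimal sketch: the class name, field names, units and all numbers are hypothetical placeholders, not measurements or an agreed definition from the meeting.

```python
from dataclasses import dataclass, asdict


@dataclass
class WorkflowMetrics:
    """Metrics listed in STEP 3; names and units are illustrative assumptions."""
    completion_time_h: float   # overall time for completion
    storage_used_tb: float     # storage used
    bandwidth_used_tb: float   # data moved over the network
    cpu_efficiency: float      # CPU time / wall-clock time, between 0 and 1
    operation_hours: float     # operational burden (sites and experiments)


def relative_change(baseline: WorkflowMetrics, candidate: WorkflowMetrics) -> dict:
    """Return (candidate - baseline) / baseline for every metric.

    Assumes every baseline value is non-zero.
    """
    base = asdict(baseline)
    cand = asdict(candidate)
    return {name: (cand[name] - base[name]) / base[name] for name in base}


# Placeholder values only, chosen to make the arithmetic easy to follow.
current = WorkflowMetrics(100.0, 50.0, 20.0, 0.80, 10.0)
datalake = WorkflowMetrics(90.0, 40.0, 30.0, 0.76, 8.0)

changes = relative_change(current, datalake)
for name, change in changes.items():
    print(f"{name}: {change:+.1%}")
```

A negative value is an improvement for every metric except CPU efficiency, where higher is better, so the sign has to be read per metric.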

Open questions -> please see slides

Conclusion

- feedback on today’s presentation needs to be collected

- translate the ideas into a document

Discussion

- the experiments are under pressure and it is not straightforward to obtain manpower from them; how much work is necessary to prototype this proposal?

- this should not be postponed, but there is a concern that it will take more than 1.5 years; the feeling is that the prototype is closer to a pre-production service

- US-T1s could join the discussion as well, since a similar topic has already been discussed internally

- sharing namespaces between ATLAS and CMS when the same caches and origins are used is a non-trivial issue

- structural differences between ATLAS and CMS should be understood

- a better analysis of the manpower and expertise needed for this project is necessary to understand whether it is feasible

- RUCIO multi-VO support should be the priority

- making HC less experiment-dependent would be a plus

- 17:30 → 17:35
  
  Introduction 5m
  
  DOMA-General
- 17:35 → 18:15
  
  Discussion on proto datalake 40m
  
  Speaker: Stephane Jezequel (LAPP-Annecy CNRS/USMB (FR))
  
  Datalake Challenge Update