DOMA / ACCESS Meeting

Europe/Zurich
513/1-024 (CERN)

Frank Wuerthwein (Univ. of California San Diego (US)), Ilija Vukotic (University of Chicago (US)), Markus Schulz (CERN), Stephane Jezequel (LAPP-Annecy CNRS/USMB (FR)), Xavier Espinal (CERN)

People on Vidyo: Alessandra Forti, Amitoj Singh, Andrea Sciaba, David Lange, Edgar Fajardo Hernandez, Frank Wuerthwein, Horst Severini, Ilija Vukotic, Johannes Elmsheuser, Justas Balcas, Kaushik De, Laurent Duflot, Markus Schulz, Nikola Hardi, Riccardo Di Maria, Stephan Lammel, Stephane Jezequel, Xavier Espinal

 

* Xavier Espinal (CERN) and Frank Wuerthwein (Univ. of California San Diego (US)) - News

- first meeting of the season: presentation of the near-term plan

- storage workshop in late November in parallel with WLCG-HSF meeting

- discussions in DOMA-ACCESS to be seen in view of this workshop

-- today, DataLake (DL) prototyping

-- in 2 weeks, archival bandwidth for T1s, to have raw numbers to work from

--- this will trigger a discussion during the November workshop

-- next meetings, presentations on the impact of the DL model on the total cost of ownership - important to hear from sites

 

* Edgar Fajardo Hernandez (Univ. of California San Diego (US)) - US CMS Data Lake Prototype Proposal

- please see slides - here just a summary

- proposal for a US CMS DL prototype, following the WLCG DL definition

- Rucio and FTS used by default to manage data transfer between lakes

- single entry point to the DL

- data access from processing resources performed via streaming from either caches or the lake origin

- injection: offer a single https interface that Rucio can command

- tracking: Rucio

- GeoIP to choose the closest cache

- data tiers NANOAOD/SIM

- DL deployed with XRootD doors and caches via k8s

- Auth via SciTokens

- write to the DL

-- 3 different storage systems (JBOD)

-- XRootD TPC (third-party copy) from site/lake to DL

-- no data movement inside the DL

- read from DL

-- all NANOAOD/SIM in the DL

-- data read through caches

-- AAA: no plan yet

-- open question: enable redirection to the CMS data federation?

- differences from today: "each T2 and T1 is its own data lake" vs. "US T1/T2 infrastructure as one lake"

- advantages: no replication within the US; all access-relevant replication is via caches; internal optimisation not transparent to the outside

- proposal to ignore tape archival for this prototype

- however, there are 2 extreme ways of fitting tape in

-- each QoS, e.g. tape archive including its buffer space, is its own DL

-- the entry point supports multiple QoS

- hardware requirements: ideally 3 caches; 2 or more sites as origins

- benchmarking goals: exercise deletions and measure missed deletions; exercise data input and data removal via FTS; exercise NANOAOD application access 

- proposed timeline on slide 15 (January 2021 - September 2021)

- concerns 

-- consistency (dirty deletes)

-- sociological aspect for user transition

-- no data movement inside the lake

-- metrics of success are fuzzy

- consistency problem -> XRootD-centric or Rucio-centric solution: XRootD is preferred

- assuming the prototype is successful, merge the Caltech and UCSD CMS namespaces

- transform the lessons learned from the DL prototype into a production DL for Run 3

- user area still a problem

- Q&A

-- how to monitor whether the system is performant?

--- job success rate or efficiency; the baseline for comparison still needs thought

-- regarding the workflow management system?

--- job submission and monitoring not yet finalised

--- NANOAOD does not need CMSSW/CRAB

--- global pool in US will be used (local batch system but grid) -> direct control of ENV variables

-- estimate of storage saving?

--- the answer will come in the October presentation

-- cache known by Rucio?

--- XRootD is the preferred solution since the caches are hidden from Rucio

-- official CMS transition plan to Rucio - timescale?

--- NANOAOD is already complete
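
Note on the benchmarking goal "exercise deletions and measure missed deletions": one possible way to quantify this is to compare the file list the catalogue believes is in the lake with a listing taken from the storage itself after a deletion campaign. The sketch below is only an illustration under stated assumptions, not the method proposed in the talk: the function name `deletion_report`, its arguments and the example paths are all hypothetical, and it works on plain lists of file names rather than calling Rucio or XRootD.

```python
def deletion_report(catalog_files, storage_files, deletion_requests):
    """Compare catalogue state with a storage listing after deletions.

    All three arguments are iterables of file names (hypothetical inputs,
    e.g. a catalogue dump, a storage dump and the list of files whose
    deletion was requested).
    """
    catalog = set(catalog_files)
    storage = set(storage_files)
    requested = set(deletion_requests)

    # Deletion was requested, but the file is still present on storage.
    missed = requested & storage
    # Present on storage but unknown to the catalogue.
    dark = storage - catalog
    # Expected by the catalogue but absent from storage.
    lost = catalog - storage

    # Fraction of requested deletions that did not take effect.
    missed_rate = len(missed) / len(requested) if requested else 0.0

    return {
        "missed": sorted(missed),
        "dark": sorted(dark),
        "lost": sorted(lost),
        "missed_rate": missed_rate,
    }


# Illustrative input only; the paths are made up.
report = deletion_report(
    catalog_files=["/store/a.root", "/store/b.root"],
    storage_files=["/store/a.root", "/store/c.root", "/store/d.root"],
    deletion_requests=["/store/c.root", "/store/e.root"],
)
print(report)
```

A missed-deletion rate of this kind could serve as one concrete number alongside job success rate, given the concern that the metrics of success are fuzzy.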

 
