DOMA / ACCESS Meeting

Europe/Zurich
513/1-024 (CERN)

Frank Wuerthwein (Univ. of California San Diego (US)), Ilija Vukotic (University of Chicago (US)), Markus Schulz (CERN), Stephane Jezequel (LAPP-Annecy CNRS/USMB (FR)), Xavier Espinal (CERN)

People on Vidyo: Bo Jayatilaka, Daniele Spiga, David Smith, Diego Ciangottini, Diego Davila, Elizabeth Sexton-Kennedy, Eric Fede, Frank Wuerthwein, Ilija Vukotic, James William Walder, Johannes Elmsheuser, Laurent Duflot, Markus Schulz, Nikola Hardi, Nikolai Marcel Hartmann, Oxana Smirnova, Horst Severini, Paul Millar, Riccardo Di Maria, Stephane Jezequel, Xavier Espinal

Ilija Vukotic (University of Chicago (US)) - ATLAS Update on Virtual Placement Tests:

  • caches are generally useful: they reduce WAN traffic, reduce latency (increasing CPU efficiency), and cost less to run
  • issues: caches work only if files are accessed multiple times, the current scheduling of jobs “to where the data is” does not work, and multiple protocols are still used to move data
  • several ways to deploy caches: wn-local-disk (pcache), small (cpu) site w/o pledged storage far from ST site, and large site/hpc w/o pledged storage
  • Virtual Placement in summary:
    • during RUCIO registration, every dataset gets assigned to N sites in the same region;
    • the assignment is random, with the probability that a site gets the dataset proportional to the fraction of CPUs that the site contributes to ATLAS;
    • datasets are not actually copied to any of these N sites but exist only in the “lake”;
    • PanDA assigns a job that needs this dataset as input to the first of these N sites (N = 3 here); if that site is in outage, the job is assigned to the second site on the list; once the job is there, it accesses the data through the cache.
  • As a result: much higher cache-hit rate.
  • VP service: an engine doing the assignments, a REDIS DB to store Virtual Placements, and a REST API to configure and access placements
  • currently external to RUCIO, but if useful should become part of RUCIO
  • IRL Tests: configuration in slides
  • issues handled: bad origins, and sites w/o xroot as a primary protocol for WAN reads
  • VP service instrumented to report all requests (3.5 Hz) and replies to ES@UChicago
  • RUCIO traces are also collected
  • more on scheduling at slide 10
  • XCache reports: 71.4% cache hit probability, 65.5% data delivered in following accesses, 59.6% data delivered from xcache disk
  • MWT2 - rate and sparseness: average ~170k files in cache; ~72% fill factor of the files in cache; and part of the jobs do copy2scratch
  • Caches comparison at slide 14
  • LRZ-LMU xcache:
    • big site with well-supported xcache that worked in direct mode until 7 days ago with ~30% cache hit rate (files & data);
    • now in VP mode: 50% file cache hit rate and 75% data delivered from cache.
  • Future:
    • XCache developments: update to support CRC once ready; fix for ROOT TChain::Add
    • origin fixes - constant load
    • VP: study the effect of changing load on the cache; a large site served only via xcache; deployment in front of an HPC
    • far future: multi-node xcache support; moving the VP service into RUCIO; adaptive caching instead of the LRU currently used
  • Adaptive caching: a gain of 5% from changing the caching model would reduce WAN traffic by 25%
  • Proposal for using reinforcement learning (more on slides)
  • Plan:
    • get data (in ES);
    • pre-process data;
    • create (OpenAI) environments - discrete action and continuous action;
    • train different actors (deep-Q network (DQN) or Dueling DQN for discrete action and Actor-Critic model for continuous action);
    • comparison with LRU.
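
The Virtual Placement assignment summarised earlier in this talk can be illustrated with a minimal Python sketch. It is not the VP service code: the function names, the site names, and the CPU shares are hypothetical, a plain dict stands in for the REDIS DB, and drawing the N sites one at a time without replacement is an assumption about how "proportional to the CPU fraction" is applied.

```python
import random

# Hypothetical stand-in for the REDIS DB that memorizes Virtual Placements.
placements = {}


def assign_virtual_placement(dataset, site_cpu_share, n_sites=3, rng=None):
    """Assign a dataset to an ordered list of up to n_sites distinct sites.

    Each site is drawn with probability proportional to its CPU share.
    Sites are drawn one at a time without replacement (an assumption),
    so the list order is also the order of preference for job assignment.
    """
    rng = rng or random.Random()
    remaining = dict(site_cpu_share)
    chosen = []
    while remaining and len(chosen) < n_sites:
        sites = list(remaining)
        weights = [remaining[s] for s in sites]
        pick = rng.choices(sites, weights=weights, k=1)[0]
        chosen.append(pick)
        del remaining[pick]
    placements[dataset] = chosen
    return chosen


def pick_site(dataset, sites_in_outage=()):
    """Return the first virtually placed site that is not in outage, else None."""
    for site in placements.get(dataset, []):
        if site not in sites_in_outage:
            return site
    return None


if __name__ == "__main__":
    # Illustrative CPU fractions only; these are not real ATLAS numbers.
    shares = {"SITE_A": 0.4, "SITE_B": 0.3, "SITE_C": 0.2, "SITE_D": 0.1}
    order = assign_virtual_placement("dataset_1", shares, rng=random.Random(1))
    print("placement:", order)
    print("job goes to:", pick_site("dataset_1", sites_in_outage={order[0]}))
```

Because every job for a given dataset is sent to the same preferred site, repeated accesses land on the same cache, which is the mechanism behind the higher cache-hit rate reported above.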

DOMA ACCESS White Paper and HL-LHC review document preparations

  • on-going and being finalised
  • the appendix is in a different document
  • draft is public for comments
    • 17:30 17:35
      Introduction 5m
      Speakers: Frank Wuerthwein (Univ. of California San Diego (US)), Ilija Vukotic (University of Chicago (US)), Stephane Jezequel (LAPP-Annecy CNRS/USMB (FR))
    • 17:35 17:55
      ATLAS update on Virtual Placement tests 20m
      Speaker: Ilija Vukotic (University of Chicago (US))
    • 18:25 18:35
      AOB 10m