DOMA / ACCESS Meeting

Europe/Zurich
513/1-024 (CERN)

Frank Wuerthwein (Univ. of California San Diego (US)), Ilija Vukotic (University of Chicago (US)), Markus Schulz (CERN), Stephane Jezequel (LAPP-Annecy CNRS/USMB (FR)), Xavier Espinal (CERN)

o XCache testing at FZK_LMU (Nikolai Hartmann)

- Testbed built with an 'old' (2012) disk server. Still very encouraging results. Some other configurations suggested by Ilija will be tested in the coming month(s).

- Data access performance is in line with previous measurements at CERN, again hinting at the usefulness of caches for efficient latency hiding.

- Next step is also to understand the non-negligible failure rate.

- CERN (David Smith) will set up a similar testbed making use of a controlled environment with simulation capabilities (latency, bandwidth, etc.).

o Virtual Placement and scheduling with caches (I. Vukotic)

- Very nice model and results based on simulation. Ilija would like a cross-check from another person.

- Ilija promised to include a 'Sequence Diagram' to explain the algorithm

o Distributed dCache project (P. Millar) 

- Include a cache layer within the dCache infrastructure.

- Testbed being built with DESY and Kurchatov

    • 17:30 17:35
      Introduction 5m
      Speakers: Frank Wuerthwein (Univ. of California San Diego (US)), Ilija Vukotic (University of Chicago (US)), Stephane Jezequel (LAPP-Annecy CNRS/USMB (FR))
      HOW 2019 (JLAB) presentations:
       
      - DOMA ACCESS working group presentation (30’) Wed  21/03 15:00 
      1. DOMA ACCESS working group scope and mandate 
      2. Briefly mention the topics we covered during the many talks we had (experiments, storage, xcache tests, etc…)
      3. Strawman summary 
      4. Data Formats
      5. Recent results from xcache deployments and studies performed
      6. Future steps 
      - Data Provisioning for non-Grid Resources (HPCs, Clouds) (15’) Tue 20/03 16:00
       
      • ATLAS and CMS common presentation to see the experiments' vision and experiences
    • 17:35 17:50
      XCache testing at FZK_LMU 15m
      Speaker: Nikolai Hartmann (Ludwig Maximilians Universitat (DE))
    • 17:50 18:10
      Virtual Placement and scheduling with caches 20m
      Speaker: Ilija Vukotic (University of Chicago (US))
    • 18:10 18:25
      distributed dCache project 15m
      Speaker: Paul Millar
    • 18:25 18:30
      AOB 5m

      Cache studies

      Currently we have several XCache instances in operation. Most of them are one-off configurations used for specific use cases. Load measurements on these are of interest, but they tell us little about which setup would be needed to support a typical T2 without persistent storage. It is therefore necessary to conduct systematic studies that reflect the usage patterns as we observe them, on reference hardware setups, so that the capacity of the cache services needed when we transition to diskless T2s can be estimated realistically.

      The storage usage patterns are known from the data popularity logs kept by ATLAS and CMS. By simulating the data access activities recorded in these logs we can already get a good approximation of the required size of a site cache, the fraction of data served from the cache, and the overall average bandwidth.
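
      As a rough illustration, such a trace-driven estimate could look like the sketch below. The CSV log format with "filename,bytes" fields, the LRU eviction policy, and the 100 TB example size are illustrative assumptions, not the actual popularity-log schema or the production cache policy.

      # Minimal sketch of a trace-driven cache simulation (assumed log format
      # and LRU eviction policy; see the note above).
      import csv
      from collections import OrderedDict

      def simulate_cache(log_path, cache_size_bytes):
          """Replay an access log through an LRU cache model and return the
          byte hit fraction, i.e. the share of traffic the cache would absorb."""
          cache = OrderedDict()            # filename -> file size in bytes
          used = 0
          hit_bytes = total_bytes = 0
          with open(log_path) as f:
              for row in csv.DictReader(f):        # expects filename,bytes columns
                  name, size = row["filename"], int(row["bytes"])
                  total_bytes += size
                  if name in cache:
                      hit_bytes += size
                      cache.move_to_end(name)      # refresh LRU position
                      continue
                  # Miss: data comes from the remote source; evict until it fits.
                  while used + size > cache_size_bytes and cache:
                      _, evicted_size = cache.popitem(last=False)
                      used -= evicted_size
                  cache[name] = size
                  used += size
          return hit_bytes / total_bytes if total_bytes else 0.0

      # Example: estimate the byte hit fraction for a 100 TB site cache.
      # print(simulate_cache("site_access_log.csv", 100e12))

      Scanning a range of cache sizes with the same trace yields the hit-fraction versus cache-size curve needed for the sizing estimate.
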
      For ATLAS, the access patterns for the last step in the analysis chain, the ntuples, are available only at the site. While this is unfortunate, it is not a fundamental problem, and their impact can be addressed in the same way as will be described for those data accesses for which logs are available.
      It is clear that the access patterns will evolve over time; therefore a system is needed that can test the impact of arbitrary patterns.
      From the study of logs alone we cannot learn what setup is needed to handle a given load, since the cache has to carry out several additional operations that are not related to moving data. Therefore it is necessary to measure the performance of a cache under realistic loads.

      The system needs three components:
      Load Generator (LG)
      Cache layer   (C)
      Data Source   (DS)

      The Load Generator mimics the behaviour of the WNs on the site by reading data through the Cache from the Data Source. This is best done by using the data access logs as a program to steer the LGs; for the cache simulations these have already been extracted for individual sites. Since the LGs do nothing more than read data at the average bandwidth of a given workload, as many of them as the bandwidth of the physical WN permits can be run on a single node. Time can be compressed by adding more LGs and thereby advancing through the data more quickly.
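
      A possible shape for such a Load Generator is sketched below. The CSV trace format ("filename,duration_s" per access), the cache endpoint CACHE_URL, and the use of xrdcp on the LG node are assumptions made only for this illustration.

      # Sketch of a Load Generator replaying an access trace through the cache.
      # Assumptions (illustrative only): a CSV trace with filename,duration_s
      # columns, an xrootd cache endpoint at CACHE_URL, and xrdcp installed.
      import csv
      import subprocess
      import time
      from concurrent.futures import ThreadPoolExecutor

      CACHE_URL = "root://xcache.example.org:1094/"   # hypothetical endpoint

      def replay_access(filename, duration_s):
          """Read one file through the cache, then pace the stream so the
          effective bandwidth matches the one recorded for the workload."""
          start = time.time()
          subprocess.run(["xrdcp", "-f", CACHE_URL + filename, "/dev/null"],
                         check=False)
          elapsed = time.time() - start
          if elapsed < duration_s:
              time.sleep(duration_s - elapsed)

      def run_load_generator(trace_path, concurrent_streams):
          """Replay a site trace with a fixed number of parallel read streams."""
          with open(trace_path) as f, ThreadPoolExecutor(concurrent_streams) as pool:
              for row in csv.DictReader(f):
                  pool.submit(replay_access, row["filename"], float(row["duration_s"]))

      # Example: one LG node replaying a site trace with 20 parallel streams.
      # run_load_generator("site_access_trace.csv", concurrent_streams=20)

      Running several such processes per node, up to the node's network bandwidth, and across several nodes compresses time as described above.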

      The Cache layer (C) is an instance of the caching software to be evaluated. It should initially be installed on a reference node type that corresponds to a typical storage node.

      The Data Source is a modified Storage Element. When a file is opened and read, the DS sends an arbitrary pattern of data as a response. This allows very few nodes, without any storage, to emulate the data delivery capability of a large storage system. The integrated bandwidth of the DSs has to exceed the bandwidth of the Cache.
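
      The idea can be illustrated with a toy stand-in: a small server that answers every read with synthetic bytes generated on the fly, so nothing is stored. A real DS would be a modified storage element speaking the xrootd protocol; the HTTP server, port, and ?size query parameter below are assumptions for the illustration only.

      # Toy Data Source: serves an arbitrary data pattern for any requested file
      # without storing anything. HTTP and the ?size=<bytes> parameter are
      # illustrative stand-ins for a modified xrootd storage element.
      from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
      from urllib.parse import urlparse, parse_qs

      CHUNK = 1024 * 1024            # stream the payload in 1 MiB chunks

      class SyntheticDataHandler(BaseHTTPRequestHandler):
          def do_GET(self):
              query = parse_qs(urlparse(self.path).query)
              size = int(query.get("size", ["1048576"])[0])   # default 1 MiB
              self.send_response(200)
              self.send_header("Content-Length", str(size))
              self.end_headers()
              sent = 0
              while sent < size:               # arbitrary pattern, never stored
                  chunk = min(CHUNK, size - sent)
                  self.wfile.write(b"\0" * chunk)
                  sent += chunk

      if __name__ == "__main__":
          # Serve synthetic data on port 8000 until interrupted.
          ThreadingHTTPServer(("", 8000), SyntheticDataHandler).serve_forever()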


      The load on the Cache node will be monitored, either by prmon or by other suitable tools.

      To measure the capability of the Cache, the number of active LGs will be increased until one of the resources used by the Cache is saturated.
      The impact of the number and type of disks, as well as of different network connectivity, can be explored. Different cache versions and implementations can also be compared in a quantitative way.
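
      A sketch of this ramp-up procedure is given below. The note proposes prmon for the monitoring; the psutil sampling, the CPU-only saturation threshold, the step size, and the start_lg_batch() placeholder are stand-ins used only to keep the sketch self-contained.

      # Sketch of the ramp-up: add LG streams step by step and watch the cache
      # node's resources until one saturates. Thresholds, step sizes, and the
      # start_lg_batch() helper are illustrative; prmon (or similar) would be
      # used for the real monitoring.
      import time
      import psutil

      SATURATION_CPU = 90.0          # percent; assumed saturation threshold
      STEP_DURATION = 300            # seconds of stable load per step

      def start_lg_batch(n_streams):
          """Placeholder: in a real setup this would (re)configure the LG nodes
          to run n_streams parallel read streams against the cache."""
          pass

      def resource_snapshot(interval=5):
          """Sample CPU utilisation and disk/network throughput on the cache node."""
          disk0, net0 = psutil.disk_io_counters(), psutil.net_io_counters()
          cpu = psutil.cpu_percent(interval=interval)     # blocks for `interval` s
          disk1, net1 = psutil.disk_io_counters(), psutil.net_io_counters()
          return {
              "cpu_percent": cpu,
              "disk_read_MBps": (disk1.read_bytes - disk0.read_bytes) / interval / 1e6,
              "net_sent_MBps": (net1.bytes_sent - net0.bytes_sent) / interval / 1e6,
          }

      def ramp_up(start=5, step=5, maximum=200):
          """Increase the number of active LGs until the CPU saturates and
          return the last sustainable level (disk and network checks would be
          added in the same way)."""
          n = start
          while n <= maximum:
              start_lg_batch(n)
              time.sleep(STEP_DURATION)
              sample = resource_snapshot()
              print(n, sample)
              if sample["cpu_percent"] > SATURATION_CPU:
                  return n - step
              n += step
          return maximum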