Cache studies

Currently we operate several Xcache instances. Most of them are one-off configurations built for specific use cases. Load measurements on these are of interest, but they tell us little about which setup would be needed to support a typical T2 without persistent storage. It is therefore necessary to conduct systematic studies on reference hardware setups that reflect the usage patterns we actually observe, so that the capacity of the cache services needed for the transition to diskless T2s can be estimated realistically.

The storage usage patterns are known from the data popularity logs kept by ATLAS and CMS. By replaying the data access activities from these logs in a simulation, we can already obtain a good approximation of the required size of a site cache and of the fraction of the overall average bandwidth that is served from the cache (a sketch of such a replay is given below).
For ATLAS, the access patterns for the last step in the analysis chain, the ntuples, are known only to the site itself. While this is unfortunate, it is not a fundamental problem, and the impact of these accesses can be addressed in the same way as described for the data accesses for which logs are available.
It is clear that the access patterns will evolve over time; a system is therefore needed that can test the impact of arbitrary patterns.
From the study of logs alone we do not learn what setup is needed to handle a given load, since the cache has to carry out several additional operations that are not related to moving data. It is therefore necessary to measure the performance of a cache under realistic loads.
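As an illustration, the following is a minimal sketch of such a replay simulation, assuming a simple LRU eviction policy and an access log reduced to (filename, size) records in time order; the class and function names are illustrative, not an existing tool:

```python
from collections import OrderedDict

class LRUCacheSim:
    """Replay an access log against a simulated cache of fixed size."""

    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.used = 0
        self.files = OrderedDict()  # filename -> size, kept in LRU order

    def access(self, name, size):
        """Return True on a cache hit, False on a miss."""
        if name in self.files:
            self.files.move_to_end(name)   # refresh LRU position
            return True
        if size > self.capacity:           # never cache files larger than the cache
            return False
        # Miss: evict least-recently-used files until the new file fits.
        while self.used + size > self.capacity:
            _, evicted_size = self.files.popitem(last=False)
            self.used -= evicted_size
        self.files[name] = size
        self.used += size
        return False

def hit_fraction(log, capacity_bytes):
    """log: iterable of (filename, size_bytes) tuples in time order."""
    sim = LRUCacheSim(capacity_bytes)
    hit_bytes = total_bytes = 0
    for name, size in log:
        total_bytes += size
        if sim.access(name, size):
            hit_bytes += size
    return hit_bytes / total_bytes if total_bytes else 0.0
```

Scanning a range of capacities with such a replay yields, for a given site, the cache size beyond which the hit fraction no longer improves.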

The system needed consists of three components:
Load Generator (LG)
Cache layer   (C)
Data Source   (DS)

The Load Generator mimics the behaviour of the WNs on the site by reading data through the Cache from the Data Source. This is best done by using the data access logs as a program that steers the LGs; for the cache simulations these logs have already been extracted for individual sites. Since the LGs do nothing more than read data at the average bandwidth of a given workload, as many of them can be run on a node as the node's network bandwidth permits. Time can be compressed by adding more LGs and thereby advancing through the data more quickly.
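A minimal sketch of an LG, assuming the log has been reduced to (start_offset_seconds, url) records, that data is read over HTTP through the Cache, and that a simple sleep-based throttle is sufficient; the record format and access protocol are assumptions for illustration:

```python
import time
import urllib.request

def run_load_generator(records, target_mbps, compression=1.0):
    """Replay access-log records by reading through the Cache.

    records:      iterable of (start_offset_s, url) in time order,
                  with the urls pointing at the Cache layer
    target_mbps:  average read bandwidth of the emulated workload
    compression:  >1 advances through the log faster than real time
    """
    chunk = 1 << 20  # read in 1 MiB chunks
    t0 = time.monotonic()
    for start_offset, url in records:
        # Wait until this access is due, scaled by the compression factor.
        due = t0 + start_offset / compression
        if (delay := due - time.monotonic()) > 0:
            time.sleep(delay)
        with urllib.request.urlopen(url) as resp:
            while data := resp.read(chunk):
                # Throttle so the LG matches the workload's average bandwidth.
                time.sleep(len(data) * 8 / (target_mbps * 1e6))
```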

The Cache layer (C) is an instance of the caching software to be evaluated. It should initially be installed on a reference node type that corresponds to a typical storage node.

The Data Source is a modified Storage Element. When a file is opened and read, the DS responds with an arbitrary pattern of data. This makes it possible, with very few nodes and without any storage, to emulate the data delivery capability of a large storage system. The integrated bandwidth of the DSs has to exceed the bandwidth of the Cache.
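A minimal sketch of such a data source, written here as a plain HTTP server that answers every read with a generated byte pattern; encoding the requested file size in the URL path and using HTTP rather than the xrootd protocol are simplifications for illustration:

```python
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

class SyntheticDataHandler(BaseHTTPRequestHandler):
    """Serve any requested path with generated bytes instead of real data.

    The file size is taken from the last path component, e.g.
    GET /some/dataset/104857600 returns 100 MiB of synthetic data.
    """

    def do_GET(self):
        try:
            size = int(self.path.rstrip("/").rsplit("/", 1)[-1])
        except ValueError:
            self.send_error(400, "last path component must be a size in bytes")
            return
        self.send_response(200)
        self.send_header("Content-Length", str(size))
        self.end_headers()
        pattern = b"\x5a" * (1 << 20)  # arbitrary 1 MiB pattern, no disk involved
        sent = 0
        while sent < size:
            n = min(len(pattern), size - sent)
            self.wfile.write(pattern[:n])
            sent += n

if __name__ == "__main__":
    ThreadingHTTPServer(("", 8000), SyntheticDataHandler).serve_forever()
```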


The load on the Cache node will be monitored, either by prmon or by other suitable tools.
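prmon periodically records CPU, memory, disk, and network counters for a process tree. As an illustration of the quantities to be recorded, a minimal sampler based on the psutil library (an alternative sketch, not a replacement for prmon) could look like this:

```python
import time
import psutil

def sample_cache_node(pid, interval=30):
    """Periodically print CPU, memory, disk, and network counters
    for the cache process and its host node."""
    proc = psutil.Process(pid)
    proc.cpu_percent()  # first call primes the per-process CPU counter
    while True:
        time.sleep(interval)
        disk = psutil.disk_io_counters()
        net = psutil.net_io_counters()
        print(time.time(),
              proc.cpu_percent(),       # process CPU usage in percent
              proc.memory_info().rss,   # resident memory in bytes
              disk.read_bytes, disk.write_bytes,
              net.bytes_recv, net.bytes_sent,
              flush=True)
```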

To measure the capability of the Cache, the number of active LGs will be increased until one of the resources used by the Cache is saturated (a sketch of this procedure is given below).
The impact of the number and type of disks as well as of different network connectivity can be explored. Different versions and cache implementations can also be compared in a quantitative way.
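A minimal sketch of this ramp-up, assuming each LG is started as a separate process and using total CPU usage as a stand-in for the full saturation check over all monitored resources; the command, threshold, and intervals are illustrative:

```python
import subprocess
import time
import psutil

def saturated(threshold=95.0):
    """Illustrative check: in practice this would query the monitoring
    of the Cache node for CPU, disk, and network utilisation."""
    return psutil.cpu_percent(interval=5) > threshold

def ramp_until_saturated(lg_command, step_interval=300, max_lgs=200):
    """Start LGs one at a time until a Cache resource saturates;
    the count reached measures the capability of the Cache setup."""
    lgs = []
    try:
        while len(lgs) < max_lgs and not saturated():
            lgs.append(subprocess.Popen(lg_command))
            time.sleep(step_interval)  # let the added load settle
        return len(lgs)
    finally:
        for lg in lgs:
            lg.terminate()
```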