The rapidly increasing amount of data delivered by current experiments in high energy physics (HEP) challenges both end users and providers of computing resources. Rising data rates and the complexity of analyses require huge datasets to be processed, and short turnaround cycles are essential for efficient analysis. This places new demands on the provisioning of resources and infrastructure, since existing approaches are difficult to adapt to HEP requirements and workflows.
The CMS group at KIT has developed a prototype that enables data locality for HEP analysis processing via coordinated caches. This concept addresses key issues of HEP data analyses:
- Caching mitigates the bottleneck of data transfers by combining local high-performance devices with large background storage.
- Throughput is optimized by identifying and allocating the critical data within user workflows.
- The setup integrates transparently into the batch system, including the use of container technology.
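The coordination idea behind the list above can be illustrated with a minimal sketch: per-file access statistics decide which files are pinned to a fast local cache, while everything else is served from the remote background storage. All names here (`CoordinatedCache`, `request`, the capacity parameter) are hypothetical illustrations, not the prototype's actual interface.

```python
from collections import Counter

class CoordinatedCache:
    """Minimal sketch of frequency-based cache coordination (assumed
    behavior): keep the most frequently requested files on the fast
    local device, fall back to the remote background storage."""

    def __init__(self, capacity):
        self.capacity = capacity        # number of files the local device holds
        self.access_counts = Counter()  # per-file request statistics
        self.cached = set()             # files currently held locally

    def request(self, filename):
        """Record the access and report where the file is served from."""
        self.access_counts[filename] += 1
        self._allocate()
        return "local" if filename in self.cached else "remote"

    def _allocate(self):
        # Pin the `capacity` hottest files; everything else stays remote.
        hottest = {f for f, _ in self.access_counts.most_common(self.capacity)}
        self.cached = hottest
```

In this toy model, a dataset that is accessed repeatedly by a user workflow ends up being served from the local device, while rarely used files do not displace it, which captures the intent of allocating only the critical data.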
This prototype has sped up user analyses by several factors, but it is limited in scope; our focus is therefore to extend the setup to serve a wider range of analyses and a larger pool of resources. Since it is currently a static setup with hardware and software under our own control, new developments aim not only at scaling it up, but also at making it flexible enough for volatile resources such as cloud computing. Data storage and computing farms are usually deployed by different providers, which leads to data delocalization and a strong dependence on interconnection transfer rates. Here, a caching solution combines both systems into a high-performance setup and enables fast processing of throughput-bound analysis workflows.