21–25 Aug 2017
University of Washington, Seattle
US/Pacific timezone

Opportunistic data locality for HEP analysis workflows

22 Aug 2017, 16:00
45m
The Commons (Alder Hall)


Poster Track 1: Computing Technology for Physics Research Poster Session

Speaker

Christoph Heidecker (KIT - Karlsruhe Institute of Technology (DE))

Description

The rapidly growing volume of data delivered by current high energy physics (HEP) experiments challenges both end users and providers of computing resources. Higher data rates and more complex analyses require huge datasets to be processed, and short turnaround cycles are essential for analyses to proceed efficiently. This places new demands on the provisioning of resources and infrastructure, since existing approaches are difficult to adapt to HEP requirements and workflows.
The CMS group at KIT has developed a prototype that enables data locality for HEP analysis processing via coordinated caches. This concept solves key issues of data analysis in HEP:

  • Caching reduces the data-transfer bottleneck by combining local
    high-performance storage devices with large background storage.
  • Throughput is optimized by selecting and allocating the critical
    data within user workflows.
  • Transparent integration into the batch system, including the use
    of container technology, solves compatibility issues.
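
The idea of selecting critical data for a limited cache can be illustrated with a minimal sketch. This is not the prototype's actual algorithm, which the abstract does not describe. It assumes a simple greedy policy that ranks files by accesses per gigabyte, and the names `FileStats` and `select_for_cache` as well as the example file names and numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileStats:
    """Hypothetical bookkeeping record for one input file."""
    name: str
    size_gb: float
    accesses: int  # how often recent jobs read this file


def select_for_cache(files, capacity_gb):
    """Greedily pick files with the most accesses per GB until the cache is full.

    Assumption: access counts per file are available; a real system would
    need to collect them from the batch system or the jobs themselves.
    """
    ranked = sorted(files, key=lambda f: f.accesses / f.size_gb, reverse=True)
    chosen, used = [], 0.0
    for f in ranked:
        # Skip files that no longer fit; smaller ones further down may still fit.
        if used + f.size_gb <= capacity_gb:
            chosen.append(f.name)
            used += f.size_gb
    return chosen


example_files = [
    FileStats("a.root", size_gb=10, accesses=50),
    FileStats("b.root", size_gb=5, accesses=5),
    FileStats("c.root", size_gb=20, accesses=200),
]
cached = select_for_cache(example_files, capacity_gb=30)
```

With these illustrative numbers, the frequently read `c.root` and `a.root` fill the 30 GB cache, while the rarely read `b.root` stays on background storage.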

The prototype has sped up user analyses severalfold but is limited in scope, so our focus is to extend the setup to serve a wider range of analyses and a larger pool of resources. Because it is currently a static setup with hardware and software under our own control, new developments aim not only to extend it but also to make it flexible enough for volatile resources such as cloud computing. Data storage and computing farms are usually deployed by different providers, which leads to data delocalization and makes performance strongly dependent on the transfer rates of the interconnection. Here, a caching solution combines both systems into a high-performance setup and enables fast processing of throughput-dependent analysis workflows.
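
The influence of the interconnection on throughput can be made concrete with a back-of-the-envelope estimate. The sketch below assumes a simple model in which a fraction of the data is read from the local cache and the rest over the network, with reads not overlapping. The function name `effective_rate` and the example rates are hypothetical and are not measurements from the prototype.

```python
def effective_rate(hit_fraction, cache_rate, remote_rate):
    """Average read rate when `hit_fraction` of the bytes come from the cache.

    Simplified model (assumption): the time per byte is the weighted sum of
    the cache and remote read times, so the rates combine harmonically.
    Both rates must use the same unit, e.g. MB/s.
    """
    if not 0.0 <= hit_fraction <= 1.0:
        raise ValueError("hit_fraction must be between 0 and 1")
    time_per_unit = hit_fraction / cache_rate + (1.0 - hit_fraction) / remote_rate
    return 1.0 / time_per_unit


# Illustrative numbers only: a 1000 MB/s local cache and a 100 MB/s remote link.
no_cache = effective_rate(0.0, cache_rate=1000.0, remote_rate=100.0)
mostly_cached = effective_rate(0.9, cache_rate=1000.0, remote_rate=100.0)
```

In this model the remote link dominates until most of the data is served locally, which is why caching the data a workflow reads most pays off for throughput-dependent analyses.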

Primary authors

Günter Quast (KIT - Karlsruhe Institute of Technology (DE))
Manuel Giffels (KIT - Karlsruhe Institute of Technology (DE))
Max Fischer (GSI - Helmholtzzentrum für Schwerionenforschung GmbH (DE))
Matthias Jochen Schnepf (KIT - Karlsruhe Institute of Technology (DE))
Christoph Heidecker (KIT - Karlsruhe Institute of Technology (DE))
