6–8 Jun 2016
Europe/London timezone

CVMFS for Data Federations

7 Jun 2016, 14:20
20m

Speaker

Brian Paul Bockelman (University of Nebraska (US))

Description

A data federation is a cooperating set of storage resources transparently accessible across a wide area network via a common namespace. These are often implemented through a redirector hierarchy: clients query a centralized endpoint for a given file, and this redirector locates an available storage resource, then redirects the client to that remote resource.
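The redirector lookup described above can be sketched in a few lines of Python. This is a minimal illustration only, not the implementation of any particular federation: the class names `StorageResource` and `Redirector`, the `locate` method, and the `site-a.example` / `site-b.example` hosts are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class StorageResource:
    """Hypothetical storage endpoint holding a subset of the federation's files."""
    url: str
    files: Set[str] = field(default_factory=set)
    available: bool = True

    def locate(self, path: str) -> Optional[str]:
        # Answer only if this endpoint is up and actually holds the file.
        if self.available and path in self.files:
            return self.url + path
        return None


@dataclass
class Redirector:
    """Hypothetical redirector: holds no data, only knows whom to ask."""
    children: List[object] = field(default_factory=list)

    def locate(self, path: str) -> Optional[str]:
        # Ask each child (a storage resource or a lower-level redirector)
        # and hand back the first location that can serve the file.
        for child in self.children:
            target = child.locate(path)
            if target is not None:
                return target
        return None


# Two sites behind a regional redirector, behind a top-level redirector.
site_a = StorageResource("root://site-a.example", {"/store/data/file1.root"})
site_b = StorageResource("root://site-b.example", {"/store/data/file2.root"})
regional = Redirector([site_a, site_b])
top = Redirector([regional])

# The client only ever names the file; the hierarchy finds where it lives.
print(top.locate("/store/data/file2.root"))
```

The client in this sketch never learns which sites exist; it receives only the final location, which is what makes the federation's namespace common across sites.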

Data federations are increasingly used to distribute large volumes of physics data. For example, the Compact Muon Solenoid (CMS) experiment has approximately 20 PB of analysis data available through its "Any Data, Any Time, Anywhere" (AAA) federation.

However, the namespace of AAA is extremely limited: it is equivalent to just an HTTP GET. There are no directory listings and no authoritative size or checksum information, even though this information is known to CMS and available in the underlying storage systems and across several services. As a result, the federation is user-hostile for data discovery.
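To make the limitation concrete, the following Python sketch contrasts a GET-only namespace with one backed by a metadata catalog. It is an assumption-laden toy, not how AAA or CVMFS is implemented: `GetOnlyNamespace`, `CatalogNamespace`, `FileInfo`, and the example paths are hypothetical, and SHA-1 is chosen here only for illustration.

```python
import hashlib
from typing import Dict, List, NamedTuple


class FileInfo(NamedTuple):
    size: int
    checksum: str


class GetOnlyNamespace:
    """Hypothetical GET-only view: a full path in, bytes out, nothing else."""

    def __init__(self, blobs: Dict[str, bytes]) -> None:
        self._blobs = dict(blobs)

    def get(self, path: str) -> bytes:
        # The caller must already know the exact path; there is no way to browse.
        return self._blobs[path]


class CatalogNamespace(GetOnlyNamespace):
    """Same data, plus a metadata catalog recorded ahead of time."""

    def __init__(self, blobs: Dict[str, bytes]) -> None:
        super().__init__(blobs)
        # Record size and checksum once, so clients need not fetch data to learn them.
        self._info = {
            path: FileInfo(len(data), hashlib.sha1(data).hexdigest())
            for path, data in blobs.items()
        }

    def listdir(self, directory: str) -> List[str]:
        # Collect the first path component below `directory` for every known file.
        prefix = directory.rstrip("/") + "/"
        names = {
            path[len(prefix):].split("/", 1)[0]
            for path in self._info
            if path.startswith(prefix)
        }
        return sorted(names)

    def stat(self, path: str) -> FileInfo:
        return self._info[path]


blobs = {
    "/store/run1/a.root": b"abc",
    "/store/run1/b.root": b"defgh",
    "/store/run2/c.root": b"x",
}
catalog = CatalogNamespace(blobs)
print(catalog.listdir("/store"))          # discover directories without prior knowledge
print(catalog.stat("/store/run1/b.root")) # authoritative size and checksum
```

With only `get`, a user has to learn file names from some other service; with `listdir` and `stat`, ordinary directory-walking tools become usable for discovery.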

In this presentation, we will discuss a series of improvements made to the CVMFS core to marry a user-friendly, CVMFS-based POSIX namespace with data federations. We will demonstrate a set of CVMFS repositories of increasing complexity that utilize these new CVMFS features. These repositories serve as frontends for data federations for OSG, LIGO, and CMS.

Finally, we will discuss plans to grow this work - in terms of scale (data volume), efficiency, and features used in production.

Summary

An effort to utilize CVMFS's scalable namespace features to provide a POSIX interface for data federations.

Primary author

Brian Paul Bockelman (University of Nebraska (US))

Co-authors

Dave Dykstra (Fermi National Accelerator Lab. (US))
Derek John Weitzel (University of Nebraska (US))