Speaker
Marco Mambelli
(University of Chicago)
Description
A Data Skimming Service (DSS) is a site-level service for rapid event filtering and
selection from locally resident datasets based on metadata queries to associated
"tag" databases. In US ATLAS, we expect most if not all of the AOD-based datasets to
be replicated to each of the five Tier 2 regional facilities in the US Tier 1
"cloud" coordinated by Brookhaven National Laboratory. A complete dataset will be
on the order of several terabytes, and providing easy, quick access to
skimmed subsets of these data will be vital to physics working groups. Typically,
physicists will be interested in portions of the complete datasets, selected
according to event-level attributes (number of jets, missing E_t, etc.) and content
(specific analysis objects for subsequent processing).
In this paper we describe methods used to classify data (metadata tag generation) and
to store these results in a local database. Next we discuss a general framework
which includes methods for accessing this information, defining skims, specifying
event output content, accessing locally available storage through a variety of
interfaces (SRM, dCache/dccp, gridftp), accessing remote storage elements as
specified, and submitting user jobs through local or grid schedulers.
The advantages of the DSS are the ability to quickly "browse" datasets and design
skims, for example, pre-adjusting cuts to get to a desired skim level with minimal
use of compute resources, and to encode these analysis operations in a database for
re-analysis and archival purposes. Additionally, the framework has provisions to
operate autonomously in the event that external, central resources are not available,
and to provide, as a reduced package, a minimal skimming service tailored to the
needs of small Tier 3 centers or individual users.
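The "browse before you skim" idea can be illustrated with a minimal sketch. It assumes a hypothetical tag table with one row per event (the table name event_tags, its columns, the sample values, and the function names are all illustrative and are not the DSS schema or API), and it uses an in-memory SQLite database in place of the site's tag database. Cuts are tuned by counting rows in the tag table, so no event data files are opened until a skim is actually run.

```python
import sqlite3

# Hypothetical sample tags: (run, event, n_jets, missing_et, file_ref).
# Values and file names are invented for illustration only.
SAMPLE_TAGS = [
    (1, 1, 2, 35.0, "aod_001.root"),
    (1, 2, 4, 80.0, "aod_001.root"),
    (1, 3, 1, 12.5, "aod_002.root"),
    (1, 4, 3, 55.0, "aod_002.root"),
]


def build_tag_db(rows):
    """Create an in-memory tag database (assumed schema, one row per event)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE event_tags ("
        "run INTEGER, event INTEGER, n_jets INTEGER, "
        "missing_et REAL, file_ref TEXT)"
    )
    conn.executemany("INSERT INTO event_tags VALUES (?, ?, ?, ?, ?)", rows)
    return conn


def skim_fraction(conn, min_jets, min_missing_et):
    """Return the fraction of tagged events passing the cuts (tags only)."""
    total = conn.execute("SELECT COUNT(*) FROM event_tags").fetchone()[0]
    passed = conn.execute(
        "SELECT COUNT(*) FROM event_tags "
        "WHERE n_jets >= ? AND missing_et >= ?",
        (min_jets, min_missing_et),
    ).fetchone()[0]
    return passed / total if total else 0.0


def select_events(conn, min_jets, min_missing_et):
    """Return (run, event, file_ref) for events passing the cuts."""
    return conn.execute(
        "SELECT run, event, file_ref FROM event_tags "
        "WHERE n_jets >= ? AND missing_et >= ? ORDER BY run, event",
        (min_jets, min_missing_et),
    ).fetchall()
```

In this sketch a user would call skim_fraction repeatedly with different thresholds until the skim level looks right, then pass the final cuts to select_events to obtain the event list and file references for the actual skimming job.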
Primary author
Marco Mambelli
(University of Chicago)
Co-authors
David Malon
(Argonne National Laboratory)
Jack Cranshaw
(Argonne National Laboratory)
Jerry Gieraltowski
(Argonne National Laboratory)
Edward May
(Argonne National Laboratory)
Robert Gardner
(University of Chicago)