CHEP 2016 Conference, San Francisco, October 8-14, 2016

Name: CHEP 2016 Conference, San Francisco, October 8-14, 2016
Start: 2016-10-10T08:00:00-07:00
End: 2016-10-14T18:00:00-07:00
Location: San Francisco Marriott Marquis

America/Los_Angeles timezone

Analyzing how we do Analysis and consume data, Results from the SciDAC-Data Project.

10 Oct 2016, 15:45

15m

Sierra C (San Francisco Marriott Marquis)

Oral Track 7: Middleware, Monitoring and Accounting

One of the principal goals of the Dept. of Energy-funded SciDAC-Data project is to analyze the more than 410,000 high energy physics “datasets” that have been collected, generated and defined over the past two decades by experiments using the Fermilab storage facilities. These datasets have been used as the input to over 5.6 million recorded analysis projects, for which detailed analytics have been gathered. The analytics and meta-information for these datasets and analysis projects are being combined with knowledge of their role in the HEP analysis chains of major experiments to understand how modern computing and data delivery are being used.

We present the first results of this project, which examine in detail how the CDF, DØ and NOνA experiments have organized, classified and consumed petascale datasets to produce their physics results. The results include the analysis of the correlations in dataset/file overlap, data usage patterns, data popularity, dataset dependency and temporary dataset consumption. The results provide critical insight into how workflows and data delivery schemes can be combined with different caching strategies to more efficiently perform the work required to mine these large HEP data volumes and to understand the physics analysis requirements for the next generation of HEP computing facilities.

In particular, we present a detailed analysis of the NOνA data organization and consumption model corresponding to their first and second oscillation results (2014-2016) and a first look at the analysis of the Tevatron Run II experiments. We present statistical distributions for the characterization of these data and data-driven models describing their consumption.

Primary Keyword (Mandatory)	Data model
Secondary Keyword (Optional)	Computing facilities
Tertiary Keyword (Optional)	Data processing workflows and frameworks/pipelines

Dr Andrew Norman (Fermilab)

Dr Adam Lyon (Fermilab), Dr Aristeidis Tsaris (Fermilab), Dr Leonidas Aliaga Soplin (Fermilab), Dr Misbah Mubarak (Argonne), Dr Pengfei Ding (Fermilab), Dr Robert Ross (Argonne)

scidac_data.pdf
