Computing in High Energy and Nuclear Physics (CHEP) 2012

Name: Computing in High Energy and Nuclear Physics (CHEP) 2012
Start: 2012-05-21T06:00:00-04:00
End: 2012-05-25T18:00:00-04:00
Location: New York City, NY, USA

21–25 May 2012

New York City, NY, USA

US/Eastern timezone

Support

chep2012@bnl.gov

ATLAS Data Caching based on the Probability of Data Popularity

22 May 2012, 13:30

4h 45m

Rosenthal Pavilion (10th floor) (Kimmel Center)

Rosenthal Pavilion (10th floor)

Kimmel Center

Poster Distributed Processing and Analysis on Grids and Clouds (track 3) Poster Session

Mikhail Titov (University of Texas at Arlington (US))

Efficient distribution of physics data over ATLAS grid sites is one of the most important tasks for user data processing. ATLAS' initial static data distribution model over-replicated some unpopular data and under-replicated popular data, creating heavy disk space loads while under-utilizing some processing resources due to low data availability. Thus, a new data distribution mechanism was implemented, PD2P (PanDA Dynamic Data Placement) within the production and distributed analysis system PanDA that dynamically reacts to user data needs [1], basing dataset distribution principally on user demand. Data deletion is also demand driven, reducing replica counts for unpopular data [2]. This dynamic model has led to substantial improvements in efficient utilization of storage and processing resources. Based on this experience, in this work we seek to further improve data placement policy by investigating in detail how data popularity is calculated. For this it is necessary to precisely define what data popularity means, what types of data popularity exist, how it can be measured, and most importantly, how the history of the data can help to predict the popularity of derived data. We introduce locality of the popularity: a dataset may be only of local interest to a subset of clouds/sites or may have a wide (global) interest. We also extend the idea of the “data temperature scale” model [3] and a popularity measure. Using the ATLAS data replication history, we devise data distribution algorithms based on popularity measures and past history. Based on this work we will describe how to explicitly identify why and how datasets become popular and how such information can be used to predict future popularity. [1] Kaushik De, Tadashi Maeno, Torre Wenaus, Alexei Klimentov, Rodney Walker, Graeme Stewart, “PD2P – PanDA Dynamic Data Placement”, ATLAS Notes, CERN [2] Angelos Molfetas, Fernando Barreiro Megino, Andrii Tykhonov, Vincent Garonne, Simone Campana, Mario Lassnig, Martin Barisits, Gancho Dimitrov, Florbela Tique Aires Viegas, “Popularity framework to process dataset tracers and its application on dynamic replica reduction in the ATLAS experiment”, CHEP, Taipei, Taiwan, October 18-22, 2010 [3] Alexei Klimentov, “ATLAS data over Grid (data replication, placement and deletion policy)”, ATLAS Notes, CERN, March 17, 2009

Student? Enter 'yes'. See http://goo.gl/MVv53	yes

Dr Alexei Klimentov (Brookhaven National Laboratory (US)) Dr Gergely Zaruba (University of Texas at Arlington (US)) Dr Kaushik De (University of Texas at Arlington (US)) Mikhail Titov (University of Texas at Arlington (US))

Poster

mtitov-poster-chep2012.pdf

Computing in High Energy and Nuclear Physics (CHEP) 2012

Support

ATLAS Data Caching based on the Probability of Data Popularity

Rosenthal Pavilion (10th floor)

Kimmel Center

Speaker

Description

Primary authors

Presentation materials

Choose timezone

Computing in High Energy and Nuclear Physics (CHEP) 2012

Support

Speaker

Description

Primary authors

Presentation materials

Share this page

Direct link

Social networks

Calendaring