13–17 Feb 2006
Tata Institute of Fundamental Research
Europe/Zurich timezone

A skimming procedure to handle large datasets at CDF

13 Feb 2006, 16:20
20m
D406 (Tata Institute of Fundamental Research)

Homi Bhabha Road Mumbai 400005 India
oral presentation, Distributed Data Analysis

Speakers

Dr Donatella Lucchesi (INFN Padova), Dr Francesco Delli Paoli (INFN Padova)

Description

The CDF experiment has a new trigger that selects events according to the significance of the track impact parameters. This trigger has been used to select a sample of events enriched in b and c mesons, which is used for several important physics analyses such as Bs mixing. The dataset is about 20 TBytes in size, corresponding to an integrated luminosity of 1 fb⁻¹ collected by CDF.

CDF has developed a skimming procedure that reduces the dataset by selecting only events that contain B mesons in specific decay modes. The rejected events are almost entirely background, which ensures that no signal is lost while the processing time is reduced by a factor of 10.

The procedure is based on SAM (Sequential Access via Metadata), the CDF data handling system. Each file of the original dataset is read via SAM and processed on the CDF users' farm at Fermilab. The outputs are stored and cataloged via SAM in a temporary disk location at Fermilab so that they can later be concatenated. In the final step, the concatenated output is copied to Italy, where it is stored and cataloged permanently on disks hosted at the Tier 1 centre. The skimmed data are available in Italy to the CDF collaboration, and users can access them via the Italian CDF farm.

We will describe the procedure used to skim the data and concatenate the output, and the method used to check that each input file is processed once and only once. The tool that copies data from the users' farm to the temporary and permanent disk locations was developed by CDF and consists of user authentication plus a transfer layer. Users allowed to perform the copy are mapped in a gridmap file and authenticated with the Globus Security Infrastructure (GSI). The performance of the tool and the definition and use of a remote permanent disk location will be described in detail.
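The abstract does not say how the once-and-only-once check is implemented. As a minimal sketch of the idea, assuming the list of dataset files and a log of processed files are both available as plain lists of file names, the check could look like the Python below. The function name `check_exactly_once` and the file names are hypothetical and are not part of SAM or any CDF tool.

```python
from collections import Counter


def check_exactly_once(input_files, processed_log):
    """Compare the dataset file list with a log of processed files.

    Hypothetical helper for illustration only; not a SAM or CDF API.
    Returns three sorted lists:
      missing    - input files that were never processed
      duplicated - files that appear more than once in the log
      unexpected - logged files that are not part of the dataset
    All three lists are empty when every input file was processed
    exactly once and nothing else was processed.
    """
    counts = Counter(processed_log)
    expected = set(input_files)
    seen = set(counts)

    missing = sorted(expected - seen)
    duplicated = sorted(name for name, n in counts.items() if n > 1)
    unexpected = sorted(seen - expected)
    return missing, duplicated, unexpected


if __name__ == "__main__":
    # Made-up file names, used only to exercise the function.
    dataset = ["file_a", "file_b", "file_c"]
    log = ["file_a", "file_b", "file_b", "file_d"]
    missing, duplicated, unexpected = check_exactly_once(dataset, log)
    print("missing:", missing)
    print("duplicated:", duplicated)
    print("unexpected:", unexpected)
```

A skim would be considered complete under this sketch only when all three lists are empty; any entry in `missing` or `duplicated` points to a file that needs to be reprocessed or whose extra output needs to be discarded before concatenation.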

Primary author

Dr Donatella Lucchesi (INFN Padova)

Co-authors

Dr Armando Fella (INFN Pisa), Dr Francesco Delli Paoli (INFN Padova), Dr Massimo Casarsa (INFN Trieste), Dr Saverio Da Ronco (INFN Padova)
