A T3 non-grid end-user analysis model based on prior installed Grid Infrastructure.
Presented by Fabrizio FURANO
Track: Computing Technology for Physics Research
An unprecedented amount of data will soon come out of CERN’s Large Hadron Collider (LHC). Large user communities will immediately demand data access for physics analysis. Although the Grid and its distributed infrastructure allow geographically distributed data mining and analysis, user analysis activity will concentrate heavily where the data resides, nullifying, to some extent, the grid paradigm itself. The LHCb (Large Hadron Collider beauty) experiment’s computing model envisages data distribution restricted to selected centers, known as Tier-1 centers. More generally, because of limited storage capacity, none of the LHC experiments’ computing models envisages the distribution of all data across all sites. Driven by the need to avoid overloading the few sites where data resides, and by the need to exploit storage facilities at non-Tier-1 sites (e.g. Tier-2 and Tier-3 sites), this work proposes a model to copy data on demand from grid centers for local use. This makes it possible to tap into storage facilities that would otherwise go unused. Once available at the site, the data is used by the institute’s scientific community to perform local analysis. This can be done using dedicated computing resources, accessible through special site batch system queues already provided by the site’s middleware installation. With this solution, local physics communities will be in a position to define their own priorities by running on their own resources and, at the same time, the risk of crowded batch queues on remote systems (e.g. the LSF at CERN) is minimized. Conscious of the need to keep a consistent interface for end-user analysis in both the LHCb and ATLAS user communities, work and studies have been carried out to integrate the model with the Ganga interface and to customize Ganga to submit local non-grid jobs.
Ganga allows the user to transparently change where the job is submitted, either the local cluster or the Grid, without changing the job description. Finally, this paper presents a first working prototype, a proof of concept of a new model for end-user analysis meant to take full advantage of the potential storage in the Grid and to give local communities immediate end-user analysis over the LHC data. In particular, it includes mechanisms for downloading and storing data, registering the data in local or remote catalogs, and accessing the data through the middleware installation already in place.
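
The two ideas above, a job description that stays fixed while the submission target changes, and data that is copied on demand and then registered in a catalog, can be illustrated with a minimal Python sketch. This is not the Ganga API and not the prototype's implementation: every name here (`JobDescription`, `Job`, `LocalCatalog`, `stage_on_demand`, `resolve_inputs`, the backend labels, and the `/local/storage` path) is a hypothetical stand-in, and the actual file transfer is deliberately left out.

```python
from dataclasses import dataclass, replace
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class JobDescription:
    """What to run and on which logical files; independent of where it runs."""
    executable: str
    input_lfns: Tuple[str, ...]


@dataclass(frozen=True)
class Job:
    """A description plus a submission target (hypothetical labels)."""
    description: JobDescription
    backend: str  # assumed values: "local" or "grid"


class LocalCatalog:
    """Hypothetical catalog mapping logical file names to local replicas."""

    def __init__(self) -> None:
        self._replicas: Dict[str, str] = {}

    def register(self, lfn: str, local_path: str) -> None:
        self._replicas[lfn] = local_path

    def lookup(self, lfn: str) -> Optional[str]:
        return self._replicas.get(lfn)


def stage_on_demand(lfn: str, catalog: LocalCatalog,
                    storage_root: str = "/local/storage") -> str:
    """Return the local replica path, registering one only if none exists yet."""
    existing = catalog.lookup(lfn)
    if existing is not None:
        return existing
    local_path = storage_root + "/" + lfn.lstrip("/")
    # A real system would copy the file from a grid center at this point.
    catalog.register(lfn, local_path)
    return local_path


def resolve_inputs(job: Job, catalog: LocalCatalog) -> List[str]:
    """Local jobs read staged replicas; grid jobs keep logical file names."""
    if job.backend == "local":
        return [stage_on_demand(lfn, catalog)
                for lfn in job.description.input_lfns]
    return list(job.description.input_lfns)
```

In this sketch, switching a job from the local cluster to the Grid amounts to `replace(job, backend="grid")`: the `JobDescription` object is reused untouched, which mirrors the property the abstract attributes to Ganga.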