Advances in Distributed High Throughput Computing for the Fabric for Frontier Experiments Project at Fermilab

16 Apr 2015, 11:30
15m
B250 (B250)

Oral presentation

Track 4: Middleware, software development and tools, experiment frameworks, tools for distributed computing

Track 4 Session

Speaker

Parag Mhashilkar (Fermi National Accelerator Laboratory)

Description

The FabrIc for Frontier Experiments (FIFE) program is an ambitious, major-impact initiative within the Fermilab Scientific Computing Division designed to lead the computing model development for Fermilab experiments and external projects. FIFE is a collaborative effort between physicists and computing professionals to provide computing solutions for experiments of varying scale, needs, and infrastructure from all areas of particle physics, from neutrino to collider to astroparticle physics. The major focus of the FIFE project is a single, unified high throughput computing solution for all experiments, achieved through the development, deployment, and integration of a workload management system (WMS), a data management layer, and a database access layer that work seamlessly at sites in the Open Science Grid (OSG), at commercial and community cloud providers (e.g. Amazon AWS and FermiCloud), and on local computing farms. Two development areas where FIFE has made significant progress are job submission and reference data distribution. Jobsub is a redesigned, scalable, reliable, and robust tiered job submission system that integrates with the GlideinWMS workload management system to run complex scientific workflows on grids and clouds. Jobsub handles functions such as site selection via GlideinWMS, credential management, and data transfers, while GlideinWMS provisions the computing resources. Through the development of Alien Cache for CVMFS, the FIFE program has considerably expanded the capabilities of CVMFS for reference data distribution. Beyond job submission and reference data distribution, the FIFE project has also made significant progress integrating services into experiment computing operations, such as a flexible data transfer client and access to opportunistic resources on the Open Science Grid. The progress with current experiments and plans for expansion to additional projects will be discussed.
FIFE has taken the leading role in defining the computing model for Fermilab experiments, has aided in the design of experiments beyond Fermilab, and will continue to lead the direction of high throughput computing for future physics experiments worldwide.

Primary authors

Arthur Kreymer (FERMILAB)
Dave Dykstra (Fermi National Accelerator Lab. (US))
Dennis Box (F)
Dr Gabriele Garzoglio (FERMI NATIONAL ACCELERATOR LABORATORY)
Ken Herner (SUNY Stony Brook)
Dr Michael Kirby (Fermi National Accelerator Laboratory)
Parag Mhashilkar (Fermi National Accelerator Laboratory)
Mrs Tanya Levshina (FERMILAB)
