13–17 Feb 2006
Tata Institute of Fundamental Research
Europe/Zurich timezone

Failure Management in the London Distributed Tier 2

15 Feb 2006, 09:00
9h 10m
Tata Institute of Fundamental Research

Homi Bhabha Road Mumbai 400005 India
poster Grid middleware and e-Infrastructure operation Poster

Speakers

Dr David Colling (Imperial College London)
Dr Olivier van der Aa (Imperial College London)

Description

The LCG [1] has adopted a hierarchical Grid computing model with a Tier 0 centre at CERN, national Tier 1 centres and regional Tier 2 centres. The roles of the different Tier centres are described in the LCG Technical Design Report [2], and the levels of service required from each level of Tier centre are described in the LCG Memorandum of Understanding [3]. Many of the Tier 2 centres are formed by federating the resources of geographically distributed institutes in a given region. The institutes within such a federation are able to provide different levels of resources and typically have different levels of expertise. Providing a good level of service in such situations is challenging. In this context, the London Tier 2 (LT2) [4] is one of the four federated Tier 2 centres within the GridPP [5] collaboration in the UK. The LT2 is distributed between five institutes in the London area and currently totals around 1 million SpecInt2000 [6]. In this paper we analyze how we can minimize the time to solve LT2 failures within the constraints of the available human resources and their mobility. The analysis takes into account the time to travel between institutes, the types of problem each support person can solve, and their availability. We demonstrate how to create a hierarchy of support staff to solve an identified problem. We also provide an estimate of the time to solve future LT2 failures, based on failure rates extracted from the monitoring information and known response times. We suggest this failure management method as a model for any distributed Tier 2.

[1] LCG, http://lcg.web.cern.ch/LCG/
[2] LHC Computing Grid, Technical Design Report, LCG-TDR-001, CERN-LHCC-2005-024.
[3] LCG Memorandum of Understanding, http://lcg.web.cern.ch/LCG/C-RRB/MoU/LCG_T0-2_draft_final_051012.pdf
[4] LT2, http://www.gridpp.ac.uk/tier2/london/
[5] GridPP, UK computing for particle physics, http://www.gridpp.ac.uk/
[6] Spec Int 2000, http://www.spec.org/cpu2000/
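
The kind of analysis described above can be illustrated with a minimal sketch. It is not the authors' method: it simply assumes that the time to solve a failure is the sum of the time until a support person is free, the travel time to the affected institute, and a typical fix time for that type of problem. Every name and number below (the `Supporter` class, `site_a`, `site_b`, the problem types, the travel, fix and failure-rate figures) is a hypothetical placeholder, not data from the LT2.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Supporter:
    name: str                # hypothetical staff label
    home_site: str           # institute where the person is normally based
    skills: frozenset        # problem types this person can solve
    available_in_min: float  # minutes until the person is free to start


def time_to_solve(person, site, problem, travel_min, fix_min):
    """Estimated minutes until `person` fixes `problem` at `site` (None if unskilled)."""
    if problem not in person.skills:
        return None
    if person.home_site == site:
        travel = 0.0
    else:
        # Travel times are assumed symmetric, so they are keyed by an unordered pair.
        travel = travel_min[frozenset({person.home_site, site})]
    return person.available_in_min + travel + fix_min[problem]


def support_hierarchy(staff, site, problem, travel_min, fix_min):
    """Rank the staff able to solve `problem` by estimated time to solve, fastest first."""
    estimates = [(time_to_solve(p, site, problem, travel_min, fix_min), p.name) for p in staff]
    return sorted((t, name) for t, name in estimates if t is not None)


def mean_time_to_solve(failure_rate, best_time):
    """Failure-rate-weighted mean of the best time to solve for each problem type."""
    total_rate = sum(failure_rate.values())
    return sum(rate * best_time[p] for p, rate in failure_rate.items()) / total_rate


# Illustrative, made-up inputs.
staff = [
    Supporter("person_1", "site_a", frozenset({"storage", "batch"}), 30),
    Supporter("person_2", "site_b", frozenset({"storage"}), 0),
    Supporter("person_3", "site_b", frozenset({"network"}), 0),
]
travel_min = {frozenset({"site_a", "site_b"}): 45}
fix_min = {"storage": 60, "batch": 20, "network": 40}

# Who should be called, and in what order, for a storage failure at site_a?
hierarchy = support_hierarchy(staff, "site_a", "storage", travel_min, fix_min)

# Expected time to solve a future failure at site_a, given assumed weekly failure rates.
failure_rate = {"storage": 0.5, "batch": 1.5}
best_time = {
    p: support_hierarchy(staff, "site_a", p, travel_min, fix_min)[0][0]
    for p in failure_rate
}
expected = mean_time_to_solve(failure_rate, best_time)
```

In this toy setup the ranked list plays the role of the support hierarchy for one identified problem, and `expected` stands in for the estimated time to solve future failures. A real analysis would take the failure rates from monitoring data and the response times from operational experience, as the abstract states.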

Primary authors

Dr Alex Martin (Queen Mary, University of London)
Ms Alice Fage (University College London)
Dr Ben Waugh (University College London)
Dr David Colling (Imperial College London)
Mr David McBride (Imperial College London)
Dr Gianfranco Sciacca (University College London)
Dr Giuseppe Mazza (Queen Mary, University of London)
Dr Grigori Rybkine (Royal Holloway, University of London)
Dr Henry Nebrensky (Brunel University)
Mrs Mona Aggarwal (Imperial College London)
Dr Olivier van der Aa (Imperial College London)
Dr Paul Kyberd (Brunel University)
Dr Rand Duncan (Brunel University)
Dr Simon George (Royal Holloway, University of London)
Dr William Hay (University College London)