LCG Workshop on Operational Issues

chaired by Ian Bird
from 2 November 2004 to 4 November 2004 (Europe/Zurich)
at CERN
Description
Currently over 80 sites are connected to the LCG grid and over 8000 processors are available to run a variety of applications. During the Data Challenges of the LHC experiments the grid middleware has proven to be stable, though incomplete. It seems a good moment to shift the attention from getting more reliable software to getting more reliable grid operation. In the end we need an infrastructure which is always operational, where software upgrades can be done regularly and in a controlled way, where bugs can be fixed quickly and efficiently, and where users can get support when and where needed. To discuss how to achieve this, a workshop will be organised at CERN from 2 to 4 November. We would like to make this a real workshop, with one plenary session only, followed by many small dedicated meetings each focused on just one aspect.

For this open workshop, the people responsible for the operation of the major LCG centers are invited, as well as the people responsible for the EGEE Operational Management Center, the Regional Operation Centers and the Core Infrastructure Centers. The people with real hands-on experience of operations should come to propose solutions for the bottlenecks we will identify, as should the managers who can assign the resources and manpower to make these solutions a reality.
The format of the workshop is outlined below. The IT Auditorium can hold approximately 100 people, so we would like to ask you to register by filling in the form at the following web address:
http://lcg.web.cern.ch/LCG/SC2/LCGWorkshop/LCGWorshopReg.asp
Although this is an open workshop, the organizers reserve the right to make some choices in case of oversubscription.
  • Tuesday, 2 November 2004
    • 09:00 - 18:00 Plenary Session I
      Convener: Kors Bos
      Location: 40-SS-C01
      • 09:00 Introduction to the workshop 30'
        Speaker: Ian Bird (CERN (IT-GD))
        Material: transparencies powerpoint file pdf file
      • 09:30 Current LCG/EGEE Operations 1h0'
        What are the issues?
        Speaker: Stephen Burke (RAL)
        Material: transparencies powerpoint file pdf file
      • 10:30 COFFEE 30'
      • 11:00 Security incident response 30'
        Summary of OSG/EGEE/LCG incident response procedures and plans + discussion
        Speaker: Dave Kelsey (RAL)
        Material: transparencies powerpoint file pdf file
      • 11:30 Overview of the deployment process 30'
        Speaker: Markus Schulz (CERN (IT-GD))
        Material: transparencies powerpoint file
      • 12:00 Operations management 30'
        Overview of site testing, problem follow-up and escalation. Local vs remote control of sites.
        Discussion of escalation procedures. How to handle bad sites?
        Speaker: Piotr Nyczyk (CERN (IT-GD))
        Material: transparencies powerpoint file
      • 12:30 LUNCH 1h0'
      • 13:30 Grid3 Operations 30'
        Speaker: Doug Pearson
        Material: transparencies powerpoint file pdf file
      • 14:00 Sharing work and responsibilities 30'
        Summary of what was discussed at CHEP. How to share the work between the CICs and other GOCs.
        Speaker: John Gordon (RAL)
        Material: transparencies powerpoint file pdf file
      • 14:30 Globus Monitoring 30'
        Speaker: Jennifer Schopf (ANL)
        Material: transparencies powerpoint file pdf file
      • 15:00 Monitoring Frameworks 30'
        Overview of R-GMA monitoring framework. What information needs to be published by each site?
        Speaker: Min Tsai
        Material: transparencies powerpoint file pdf file
      • 15:30 Monitoring in LCG 30'
        Speaker: Dave Kant (RAL)
        Material: transparencies powerpoint file pdf file
      • 16:00 COFFEE 30'
      • 16:30 LCG/EGEE User Support - GGUS 30'
        Speaker: Torsten Antoni/Holger Marten (FZK)
        Material: transparencies powerpoint file pdf file
      • 17:00 User Support in grid.it 30'
        Speaker: Marco Velato
        Material: more information powerpoint file pdf file
      • 17:30 Accounting, current status 30'
        Speaker: Luciano Gaido
        Material: more information pdf file transparencies powerpoint file pdf file
  • Wednesday, 3 November 2004
    • 09:00 - 13:30 Plenary Session II
      Convener: Kors Bos
      Location: 40-SS-C01
      • 09:00 Issues from current experience 30'
        Speaker: Andreu Pacheco
        Material: transparencies powerpoint file pdf file
      • 09:30 LCG new manual installation 30'
        Speaker: Laurence Field
        Material: transparencies powerpoint file pdf file
      • 10:00 Quattor in LCG-2 30'
        Speaker: Jeff Templon (NIKHEF)
        Material: transparencies powerpoint file pdf file
      • 10:30 COFFEE 30'
      • 11:00 Batch systems 30'
        Experience and use of Torque/Maui, fairshares
        Speaker: Steve Traylen (RAL)
        Material: transparencies powerpoint file
      • 11:30 Definitions of the Working Groups 1h0'
        Speaker: WG conveners
        Material: more information powerpoint file pdf file
      • 12:30 LUNCH 1h0'
    • 13:30 - 18:00 Working Group 1: Operational Security
      Convener: Ian Neilson
      Location: 40-SS-C01
      Material: more information pdf file
    • 13:30 - 18:00 Working Group 2: Operational Support
      Convener: Ian Bird
      Location: 60-6-002
      Material: more information powerpoint file word file pdf file
    • 13:30 - 18:00 Working Group 3: User Support
      Convener: Flavia Donno
      Location: 13-2-005
      Material: more information powerpoint file pdf file
    • 13:30 - 18:00 Working Group 4: Fabric Management
      Convener: Davide Salomoni
      Location: 160-1-009
    • 13:30 - 18:00 Working Group 5: Software Management
      Convener: Steve Traylen
      Location: 104-R-B09
      Material: text unknown type file
  • Thursday, 4 November 2004
    • 08:50 - 12:30 Working Group 1: Operational Security
      Convener: Ian Neilson
      Location: 14-4-002
    • 08:55 - 12:30 Working Group 2: Operational Support
      Convener: Ian Bird
      Location: 40-SS-C01
    • 09:00 - 12:30 Working Group 4: Fabric Management
      original room was: 13-1-017
      Convener: Davide Salomoni
      Location: 13-3-005
    • 09:00 - 12:30 Working Group 3: User Support
      Convener: Flavia Donno
      Location: 13-2-005
    • 09:10 - 12:30 Working Group 5: Software Management
      Convener: Steve Traylen
      Location: 13-3-005
    • 13:30 - 17:00 Plenary Session III
      The working groups report on what they have achieved. The slides should show an operational plan, with a schedule for when it can be achieved and the names of the people responsible for it.
      Convener: Kors Bos
      Location: 40-SS-C01
      • 13:30 Report from WG 1 30'
        Speaker: Ian Neilson (CERN)
        Material: more information powerpoint file pdf file
      • 14:00 Report from WG 2 30'
        Speaker: Ian Bird (CERN)
        Material: transparencies powerpoint file pdf file
      • 14:30 Report from WG 3 30'
        Speaker: Flavia Donno (CERN)
        Material: transparencies powerpoint file pdf file
      • 15:00 COFFEE 30'
      • 15:30 Report from WG 4 30'
        Speaker: Davide Salomoni (NIKHEF)
        Material: more information powerpoint file pdf file
      • 16:00 Report from WG 5 30'
        Speaker: Steve Traylen (RAL)
        Material: more information powerpoint file pdf file
      • 16:30 Summary 30'
        Speaker: Ian Bird (CERN)
    • 17:00 - 18:00 DRINK
      Location: 501--