Operations team & Sites

EVO - GridPP Operations team meeting

- This is the biweekly ops & sites meeting.
- The intention is to run the meeting in EVO: http://evo.caltech.edu/evoGate/. Join the meeting in the GridPP Community area.
- The phone bridge number is +44 (0)161 306 6802 (CERN number +41 22 76 71400). The phone bridge ID is 126540 with code: 4880.

Apologies: Mark, Raja
    • 11:00 11:20
      Experiment problems/issues 20m
      Review of weekly issues by experiment/VO

      - LHCb
        LHCb is running mostly smoothly in the UK. Issues:
        1. Various CVMFS errors at different sites, followed up through GGUS tickets.
        2. An interesting 3-day oscillation in running jobs at RAL (Tier-1). Trying to understand its origins and implications.

      - CMS

      - ATLAS
        UKI-LT2-IC-HEP: Long-standing problem with a missing release needs some dedicated testing because of the different setup IC has. Waiting on AdS to supply some code.
        UKI-SOUTHGRID-OX-HEP: Problems with FTS timeout settings. Brian has now changed them to a longer time.

        Downtimes
        UKI-NORTHGRID-SHEF-HEP: university power cut
        UKI-SOUTHGRID-BHAM-HEP: disruptive installation of new air-conditioning units
        UKI-SOUTHGRID-RALPP: site router maintenance
        RAL-LCG2: site router maintenance

        CVMFS
        * The timeout problem now has two tickets, one for ATLAS and one for CVMFS:
          https://savannah.cern.ch/bugs/?95420
          https://savannah.cern.ch/support/?129468
          Jakob thinks he has found a solution and has a test version of CVMFS for it.
        * Another bug we are looking at is CVMFS hanging every now and then. This affects LHCb too, so Raja might want to take a look.
          https://savannah.cern.ch/bugs/?92112

        Transfer errors
        UKI-NORTHGRID-SHEF-HEP and UKI-SOUTHGRID-RALPP had a problem with jobs accumulating in the transferring state after the RAL downtime last week. This was caused by FTS reporting the same error code for two different errors, which confused Site Services. The problem has been noted and reported to the WLCG meeting.

      - Other
    • 11:20 11:40
      Meetings & updates 20m
      With reference to: http://www.gridpp.ac.uk/wiki/Operations_Bulletin_Latest
      - Tier-1 status
      - Accounting
      - Documentation
      - Interoperation
      - Monitoring
      - On-duty
      - Rollout
      - Security
      - Services
      - Tickets
      - Tools
      - VOs
      - Site updates
    • 11:40 11:50
      GDB overview 10m
    • 11:50 12:00
      glexec & ARGUS 10m
    • 12:00 12:01
      AOB 1m
    • 12:01 12:21
      VOMS 20m