UKI Monthly Operations Meeting (TB-SUPPORT)

Europe/Zurich
EVO - UKI monthly meeting (TB-SUPPORT) (find it in the GridPP community)

Description
Monthly review and discussion meeting for those involved with UKI EGEE/WLCG deployment and operations.
Minutes
    • 10:30 10:50
      Site status review and issues 20m
      - Accounting
        UKI-LT2-QMUL - stopped mid-September
        UKI-NORTHGRID-MAN-HEP - stopped early September
        UKI-SOUTHGRID-BHAM-HEP - stopped early/mid-September
      - Monitoring status
        Summary can be seen here: http://hepwww.ph.qmul.ac.uk/~lloyd/gridpp/sam.html
        Particular problem with UKI-NORTHGRID-SHEF-HEP: 0% for the last week and month.
        Last week saw <60% success for: UKI-LT2-Brunel, UKI-LT2-QMUL
      - Site concerns
    • 10:50 10:55
      Current experiment/VO activities and issues 5m
      - ATLAS
      - CMS
      - LHCb
      - Regional VOs - including camont and totalep
      - Other, such as biomed
    • 10:55 11:05
      Recent GDB/MB/DTEAM discussions 10m
      * GDB at the end of August: http://indico.cern.ch/conferenceOtherViews.py?view=cdsagenda&confId=17747
        -- Look at US centres. These tend to support only one experiment.
        -- 1. The top-level Grid Security Policy v5.7 was approved by the GDB.
        -- 2. The Site Operations Policy v1.4 was approved by the GDB in May 07. It has now been approved by OSG, although they have some outstanding issues for inclusion in the next release. v1.4 will now proceed to the EGEE PEB and WLCG MB.
        -- 3. The VO Operations Policy requires further iterations before returning to the GDB in October, hopefully to be approved.
        -- 4. The Pilot Jobs Policy is still under discussion (v0.2), but it is planned to produce v1.0 in October after EGEE07 and bring it to the GDB the following week (October 10th).
        - Discussion about middleware readiness: FTS 2.0 ready; LFC OK; SRM 2.2 looking at production readiness for early 2008; gLite WMS almost ready; LCG-CE being ported to SL4; CREAM early 2008; 3D going well.
        - glexec still an issue.
      * MB - Main concerns remain accounting, site reliability and metrics. There was also an LHCC referees meeting recently (http://indico.cern.ch/conferenceDisplay.py?confId=19115).
      * DTEAM - Main discussions have been: next stage in site testing; security awareness and processes; glexec; outcomes from the CHEP/WLCG workshop. Quarterly reports are coming again!
      * PMB - Focus is currently on preparation for an upcoming GridPP Oversight Committee.
      * OPS (important extracts from the LCG Bulletin) - LCG-RB to Maintenance Mode: following the release of the 3.1 WMS, developers in JRA1 and integrators in SA3 are no longer releasing updates to the LCG-RB middleware component. The 3.1 WMS was finally deployed to the production infrastructure, causing some surprise to service managers and their users due to the differences between the LCG-RB and WMS 3.1. In particular, support for the glite-job-* commands that many are familiar with is not available in this upgrade.
All Experiments were reminded that Sites will configure VO access via pool accounts for the PRD and SGM roles, unless the Experiment specifically states in its VO ID card (in the CIC Portal) that it wants single accounts for these roles. ATLAS asked all Sites supporting them *not* to install the Python32 library for them. This topic will be discussed at the next ATLAS taskforce meeting, and the resulting policy will be communicated to the ATLAS Sites.
    • 11:05 11:15
      SL4 on WNs - progress 10m
      - Status noted at Tuesday's DTEAM meeting:
      * Oxford - new machines running SL4
      * Glasgow - already done
      * Manchester - some WNs on SL4; one of the CEs to be upgraded
      * Lancs - already upgraded
      * Shef / Liv - have other issues just now
      * Birm - some SL4 in testing; production staying for now
      * Bris - development with SL4
      * Camb - SEs on SL4, WNs still SL3
      * RAL PPD - >50% SL4 already
      * T1 - 60-80% SL4 just now
    • 11:15 11:25
      General discussion 10m
      - Thoughts after the GridPP 19 panel discussions
      - Requests for training and items to be covered at the HEPSYSMAN/monitoring meeting in London on 31st October
      - Thoughts on the SRM2.2 deployment workshop (13/14th November - NeSC)
    • 11:25 11:30
      AOB 5m
      1) Future workshop dates to be aware of:
         3rd WLCG Collaboration Workshop, April 21-25 2008, CERN
         4th WLCG Collaboration Workshop, March 21-22 2009, Prague, before CHEP09
      2) Sites with these CEs should check their VOView publishing (not consistent with CE info):
         ce.epcc.ed.ac.uk
         ce00.hep.ph.ic.ac.uk
         epgce1.ph.bham.ac.uk
         fal-pygrid-18.lancs.ac.uk
         hep-ce.cx1.hpc.ic.ac.uk
         heplnx207.pp.rl.ac.uk
         lcgce01.phy.bris.ac.uk
         mars-ce2.mars.lesc.doc.ic.ac.uk
         pc90.hep.ucl.ac.uk
         serv03.hep.phy.cam.ac.uk
         svr016.gla.scotgrid.ac.uk
         t2ce02.physics.ox.ac.uk
         Fuller explanation from the ops meeting: "there is a mismatch between the all-inclusive information published for a CE and the information published in the VOViews. As an example, the queue mentioned below supports only ATLAS; therefore, the number of waiting jobs in the inclusive view should be the same as the one for the ATLAS VOView. But it is not. The VOView publishes all zeroes. Moreover, there are some queues where the numbers of waiting jobs for all views do not add up to the total published in the inclusive view. In total more than 130 ATLAS queues are affected, among which almost all T1s. Since the WMS uses the information in the VOView, and the latter is generally the wrongly published one, ATLAS is submitting jobs almost randomly, with accumulation of jobs at small sites. The issue is extremely severe."
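      The two failure modes described above (VOViews publishing all zeroes while the inclusive CE entry reports waiting jobs, and per-VO counts not summing to the inclusive total) can be checked mechanically once the published numbers are in hand. The sketch below is a minimal, hypothetical consistency check, not part of any GridPP tooling; the function name and the illustrative numbers are assumptions, though the attribute being compared corresponds to GLUE 1.x GlueCEStateWaitingJobs.

```python
# Hypothetical sketch: flag the VOView inconsistencies described in the
# ops-meeting quote. Input numbers here are illustrative, not real BDII data.

def check_voviews(ce_waiting, voview_waiting):
    """Return a list of problems for one CE queue.

    ce_waiting     -- waiting-job count from the inclusive CE entry
    voview_waiting -- dict mapping VO name -> waiting-job count in its VOView
    """
    problems = []
    # Per-VO counts should add up to the inclusive total.
    total = sum(voview_waiting.values())
    if total != ce_waiting:
        problems.append(
            f"VOView sum ({total}) does not match inclusive count ({ce_waiting})"
        )
    # A queue with waiting jobs should not publish all-zero VOViews.
    if ce_waiting > 0 and all(v == 0 for v in voview_waiting.values()):
        problems.append("VOViews publish all zeroes while CE reports waiting jobs")
    return problems

# Example: an ATLAS-only queue reporting 42 waiting jobs inclusively
# but zero in its ATLAS VOView -- the failure mode quoted above.
print(check_voviews(42, {"atlas": 0}))
```

      A site could feed this from an ldapsearch against its site BDII; the point is only that both mismatches are cheap to detect automatically.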