Operations team & Sites

Name: Operations team & Sites
Start: 2012-06-12T11:00:00+01:00
End: 2012-06-12T12:16:00+01:00
Location: EVO - GridPP Operations team meeting

Tuesday 12 Jun 2012, 11:00 → 12:16 Europe/London


Description

- This is the biweekly ops & sites meeting.
- The intention is to run the meeting in EVO: http://evo.caltech.edu/evoGate/. Join the meeting in the GridPP Community area.
- The phone bridge number is +44 (0)161 306 6802 (CERN number +41 22 76 71400). The phone bridge ID is 126540 with code: 4880.

Apologies:


Meetings and updates

====================

General updates

---------------

Pre-GDB on WN security on now.

GDB tomorrow, details linked from https://www.gridpp.ac.uk/wiki/Operations_Bulletin_Latest

Dave Crooks as T2 rep.

T2 reliability and availability stats are out for May.

UMD mirroring - most sites have a local private mirror, so not much need is seen for an NGI-level mirror.

Tier-1 update

-------------

Castor outage tomorrow (Wednesday), and FTS outage. Separate updates, but synchronised. WMS-1 update.

Outage on the morning of Tue 19th for site access router replacement (also on the bulletin).

Storage

-------

CHEP digest up.

GridPP DPM tools are now in EPEL, so should be easier to install.

Accounting

----------

HEP-SPEC06 benchmarks on wiki - starting to get an overview of the new kit. Will check the publishing at the end of the month.

Documentation

-------------

Tools for handling VOMS info; volunteers wanted for testing.

Interop

-------

Sites not publishing UserDNs - let Stuart P know the reason.

On Duty

-------

Busy last week with new CA certs. Quieter now.

Services

--------

perfSONAR - aim for 4 sites, and work out a testing matrix.

Tickets

-------

On the bulletin.

Glasgow is having problems with storage (more in the ATLAS update).

Tools

-----

Backup Nagios up at https://gridppnagios.lancs.ac.uk/nagios

Can sites check whether it's working for their site? It would only be used if the Oxford one falls over - it is configured the same as the primary.

Also: can sites check https://pprc.qmul.ac.uk/~walker/votable.html to see if it (derived from publishing) matches the site admins' expectations.

Experiments and updates

=======================

LHCb

----

What is the status of the new CEs in Glasgow? Fully in production.

CMS

---

ICHEP preparations. Imperial had problems with local users overloading it; very full at the moment.

ATLAS

-----

3 tickets: Durham, Imperial, and Glasgow. Durham and Imperial already covered.

Glasgow: 2 iffy disk servers, set offline. Ongoing issues with hot spots, mitigated with reduced peak analysis jobs (capped). Pending SL4 -> SL5 transition should rebalance things a bit, and help.

QMUL has some dark data problems, about 15TB; investigation ongoing.
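
As a rough illustration of what a dark data check involves (dark data being files present on the storage element but not registered in the experiment's catalogue), the comparison can be sketched as a set difference. The function name, paths, and sizes below are hypothetical; a real check would use dumps from the SE and from the catalogue.

```python
def find_dark_data(storage_files, catalogue_paths):
    """Return (dark_paths, total_bytes) for files on storage but not in the catalogue.

    storage_files: dict mapping path -> size in bytes (hypothetical SE dump).
    catalogue_paths: iterable of paths known to the catalogue (hypothetical dump).
    """
    known = set(catalogue_paths)
    dark = sorted(path for path in storage_files if path not in known)
    total = sum(storage_files[path] for path in dark)
    return dark, total


# Made-up example data, for illustration only.
storage_dump = {
    "/atlas/data/file1": 2_000_000_000,
    "/atlas/data/file2": 3_000_000_000,
    "/atlas/data/orphan": 1_500_000_000,
}
catalogue_dump = ["/atlas/data/file1", "/atlas/data/file2"]

dark_paths, dark_bytes = find_dark_data(storage_dump, catalogue_dump)
print(dark_paths, dark_bytes / 1e12, "TB")
```

Summing the sizes of the unregistered files gives the kind of total quoted above.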

Ewan asks: Cambridge analysis queue went offline then into test yesterday, but looks good now - is HammerCloud marking things online at the right times? Cambridge is not running jobs that Elena can see. Offline follow-up needed. There might not be enough jobs with the current version of the software, in which case the queue is not getting marked online.

CVMFS problem at Cambridge yesterday. More CVMFS issues? Question over what time zone things are in.

Some discussion on cache cleaning vs cvmfs_fsck; emails to be forwarded for follow-up.

Other VOs: Snowplus

-------------------

Noted that to get MyProxy to work, they needed to upload a proxy twice - once with a password for workstation use, and again without a password for automated renewal.

Site roundtable

===============

QMUL: Problem with 10 Gb cards - driver locks up after a while.

Glasgow: SE problems - overloading and hotspots.

Lancaster: Should have a fix for APEL/LSF.

Tier-1: Looking into FTS timeouts for start and end of jobs.

Manchester: Installing perfSONAR - looking at the way RAL did it.

RHUL: Working on APEL issues.

Oxford: Networking improvements - 10G line nearly there, working on long-range tuning for TCP.
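
For the long-range TCP tuning Oxford mentions, a common starting point is the bandwidth-delay product (BDP): the amount of data that must be in flight to keep a link full, which gives a rough lower bound for TCP buffer sizes. A minimal sketch follows; the 10 Gb/s figure is from the notes, while the 20 ms round-trip time is an assumed example value, not a measurement.

```python
def bandwidth_delay_product_bytes(bandwidth_bits_per_s, rtt_seconds):
    """Bytes that must be in flight to fill a link of the given speed and RTT."""
    return bandwidth_bits_per_s * rtt_seconds / 8


link_bps = 10e9         # 10 Gb/s line, as in the notes
assumed_rtt_s = 0.020   # assumed 20 ms round trip, illustrative only

bdp = bandwidth_delay_product_bytes(link_bps, assumed_rtt_s)
print(f"BDP is roughly {bdp / 1e6:.0f} MB")
```

With these assumed numbers the BDP is about 25 MB, far above typical default TCP buffer sizes, which is why long paths at 10G usually need explicit tuning.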

AOB

===

Vidyo vs EVO for future meetings? Janet pays for EVO, at least for the moment, so we are likely to stick with EVO for now. Vidyo updates might well improve its experience.


- 11:00 → 11:20
  
  Experiment problems/issues 20m
  
  Review of weekly issues by experiment/VO
  
  - LHCb
  - CMS
  - ATLAS
  - Other
- 11:20 → 11:45
  
  Meetings & updates 25m
  
  With reference to: http://www.gridpp.ac.uk/wiki/Operations_Bulletin_Latest
  
  - Tier-1 status
  - Accounting
  - Documentation
  - Interoperation
  - Monitoring
  - On-duty
  - Rollout
  - Security
  - Services
  - Tickets
  - Tools
  - VOs
  - Site updates
- 11:45 → 12:04
  
  Site roundtable 19m
  
  - Input from sites on current priorities and concerns.
- 12:04 → 12:05
  
  Actions 1m
  
  To be completed: https://www.gridpp.ac.uk/wiki/Operations_Team_Action_items
  
  Completed: https://www.gridpp.ac.uk/wiki/Operations_Team_Completed_Actions
- 12:05 → 12:06
  
  AOB 1m
