Attendees:
Jeremy (chair+minutes)
Duncan
Stephen
Derek
Mingchao
Brian
Jens
Raja
Greig
11:00 Experiment problems/issues:
LHCb: Raja reported that LHCb are starting the move away from DIRAC2 for production. It is hoped that Ganga
developers will be available next week. DIRAC2 uses the SRMv1 endpoints and is still used for low level free
construction. It is hoped to have a DIRAC3 environment ready by the end of August with the end of September being the
absolute deadline - SRMv1 instances are to be switched off on this timescale.
The second item for LHCb to mention was the UK CA changes. There was one issue with a user not being able to move
smoothly - the user was the only one in the LHCb VOMS not registered with both old and new CA DNs. Another user did not
receive the notice - the CERN spam filter killed the notification.
The notification issue prompted a short discussion on communicating with users. There will always be some issues like
this, which means straightforward emails to individuals or lists are not enough.
JC noted that Steve L's test page now incorporated the LHCb SAM test results but that the tests themselves had
recently stopped running. RN explained that they were being moved to the DIRAC3 framework.
JC asked about the software area corruption that had been seen at several UK sites. RN said this had built up over
time - sites were not used for a long period - but the problems were being systematically discovered and resolved.
CMS: No report
ATLAS: No report
Other: There were no other VO issues arising.
11:20 ROC update:
EGEE SA1 meeting today: http://indico.cern.ch/conferenceDisplay.py?confId=38432
Topics: Admin matters; Update on EGI blueprint (being re-written as a 30 page document); SLA roadmap.
JC explained that the EGI proposal was going to be rewritten as the current presentation was not deemed strong
enough. Areas would also be broken out in the new document (i.e. SA1 and its role would become clearer).
WLCG update
*****************
MB was last week - nothing new to report here: http://indico.cern.ch/conferenceDisplay.py?confId=33704. Most of the
MB material was covered under the GDB discussion last week. Benchmarking is probably the most relevant and urgent for
T2s.
EGEE-WLCG-OSG ops meeting
******************************
Agenda: http://indico.cern.ch/conferenceDisplay.py?confId=38629
- Releases:-
For the PPS: 2008-07-28: glexec tests in PPS:
Service available on several CEs in PPS.
(list available at:
https://pps-private-wiki.egee.cesga.es/gocdb/user1.cgi?inputVal=40
selecting nodes at version >= gLite 3.1 PPS-update31)
Still no feedback received from users.
2008-07-28: release of gLite3.1 PPS Update34 to PPS in preparation
This update will contain
* DPM and LFC 1.6.11 (see details in PATCH:1987)
* dCache 1.8.0-15p5 with new YAIM module for configuration
For production: 2008-07-23: release of gLite3.1 Update28 in preparation
This update, to be released on the 29th of July, will contain
* glite-CONDOR_utils for lcg-CE
- Discussion (our request) on WMS performance.
-- 3.1 SL4 found to be more stable than 3.0
-- several countries use a round robin setup for the host machines
--> What do we need to do next?
- Current recommended storage versions
https://twiki.cern.ch/twiki/bin/view/LCG/GSSDCCRCBaseVersions
GC noted that most sites were now pretty well up-to-date, but there were old "additional" DPMs around that should be
removed. Manchester now have a working DPM, but nobody has tried a WN distribution for DPM. There is nothing intrinsic
to DPM that will prevent it, but the lack of tools for managing such a configuration may become an issue.
BD suggested that setting up two SEs at the site would allow some form of replication.
The move will leave just a few sites (RALPP and IC) using dCache. They are managing following the latest updates. GC's
suggestion that the sites were "content" with dCache was slightly disputed by DR. DR also commented that although UCL
is not functioning smoothly at the moment it is a small site and should not consume too much attention. JC mostly
agreed but added that if a user placed data on a specific site, even a small site, then the expectation would be for
good SE service levels.
GC noted a few sites were recently impacted by the CA changes: RHUL, UCL, Durham and Cambridge.
Ticket status
***************
https://gus.fzk.de/download/escalationreports/roc/html/20080728_EscalationReport_ROCs.html
Two tickets. On 35089 DR mentioned that this was awaiting a change by the fabric team at RAL. He was going to remind
them after the meeting.
37185 concerned a CDF request for re-enablement at sites. The original ROC ticket had been broken into child tickets
for each affected site; unfortunately it was not clear which of those were now closed. JC was to follow up.
11:35 Hardware purchase advice & sharing (05')
JC talked through the structure of the page set up to share hardware procurement information: . He asked everyone to
take a look and provide feedback/comments today as this needed to be shared ASAP since most sites were about to
procure new equipment. No feedback was given during the meeting and no new areas were suggested.
11:40 Follow up on quarterly reports & readiness reviews (10')
- Any further feedback on the QRs?
There was no additional feedback. DR mentioned that the LondonGrid report was still not final. JC understood but said
he would start compiling the overall view from the reports at the end of the week.
- The readiness review documents are here: https://www.gridpp.ac.uk/tier2/Readiness_Reviews/index.html
The plan was to take a quick look at some but various access problems became apparent. DR received an access denied
error. JC got an error message when trying to open the London overview report. JC to investigate.
11:50 Web-page review (05')
Starting with the high-level deployment pages http://www.gridpp.ac.uk/deployment/, JC looked at each main area. Most
of the high level pages are JC's responsibility (Overview, Status, Meetings, etc.) so he will check last updates
and modify pages which are out of date. All team members need to check the Contacts page. It was noted that Mingchao
did not yet appear.
Looking at the wiki areas starting from http://www.gridpp.ac.uk/wiki/Main_Page
* The T2 coordinators need to check the T2 and site entries - noted that London sites are not directly linked
* Grid services are nearly all T1 pages - DR to review
* Tier-2 support -> Experiment. T2 coordinators and experiment reps to review
* Tier-2 support -> Middleware. Andrew to check data management; Greig storage (quite up-to-date in most areas);
Batch systems - DR since many T1 entries; Update tools - is this used? AF instigated?; Virtualisation - ok;
Workarounds - static now since moved on.
* PPS. static
* VO Support. Presumably this is Sergey's area though some falls back to T2Cs.
* Security - Mingchao to review and update. MM asked about how to update the left margin links. JC suggested
contacting AM but SB said he would be able to help -> MM to liaise with SB.
* Monitoring - AF for the links and AE for the Nagios part
* Availability - Static views but JC looking at feeding daily graphs onto a comment page (existing action)
* Hardware - new section
* Service challenge - prompted question about whether this old information should be archived somehow? Decided to
leave it where it is - users of the pages can check the date for relevance.
* Deployment team area - some sections like issues log not used. Others ok.
11:55 Actions review (10')
Updates recorded in wiki.
12:05 AOB (05')
- UKI meeting on Thursday
-- CA move is already on the agenda
-- Any urgent items for discussion given LHC/experiment status? None suggested!
Meeting closed at 12:00.
Chat window content:
[10:54:58] Mingchao Ma joined EVO
[10:59:13] Derek Ross joined EVO
[11:04:59] Jens Jensen joined EVO
[11:24:41] Stephen Burke joined EVO
[11:28:46] Jens Jensen new status: Away
[11:35:01] Jens Jensen new status: Available
[12:01:58] Brian Davies left EVO
[12:02:02] Stephen Burke left EVO