28-R-15 (CERN conferencing service (joining details below))
Nick Thackray
Description
grid-operations-meeting@cern.ch
Weekly OSG, EGEE, WLCG infrastructure coordination meeting.
We discuss the weekly running of the production grid infrastructure based on weekly reports from the attendees. The reported issues are discussed, assigned to the relevant teams, followed up and escalated when needed. The meeting is also the forum for the sites to get a summary of the weekly WLCG activities and plans.
Attendees:
OSG operations team
EGEE operations team
EGEE ROC managers
WLCG coordination representatives
WLCG Tier-1 representatives
other site representatives (optional)
GGUS representatives
VO representatives
To dial in to the conference:
a. Dial +41227676000
b. Enter access code 0148141
OR click HERE (Please specify your name & affiliation in the web-interface)
From: Asia Pacific and Central Europe
To: SouthEast Europe and DECH
Report from Asia Pacific:
Nothing to report.
Report from CE:
Additional information about the ticket set to 'Case transfered to political instances':
Site name: UKI-LT2-QMUL
ROC : UKI
Ticket id: 8997 (GGUS id: 40945)
Problem : RGMA-host-cert-valid
The ticket has been in 'Case transfered to political instances' status since 2008-09-30 with no progress. The node is now in scheduled downtime.
b)
PPS Report & Issues
Please find Issues from EGEE ROCs and general info in:
d)
EGEE issues coming from ROC reports
ROC CE: Some site admins complain that they cannot fill in their weekly reports: the detail link is empty, so it is impossible to check why a failure appeared. Some have even suggested that the reports show failures that never happened.
ROC France: For information: IN2P3-CC T1 has now succeeded in configuring its CEs' GIP to restrict CMS access to both VOMS:/cms/Role=production and VOMS:/cms/Role=lcgadmin.
This configuration works only with a gLite WMS, but CMS agreed to it because its production is handled entirely through the gLite WMS. Some CMS monitoring problems still have to be solved, but CMS production job submission has been shown to work with this configuration.
The way the configuration was made (with the help of Steve Traylen) can be found in GGUS ticket #37102.
That solution is close to Steve's proposal explained in the wiki page below:
http://goc.grid.sinica.edu.tw/gocwiki/How_to_publish_queues_with_access_restricted_to_a_FQAN.
However, the page needs some modifications. Steve, could you please update your page?
ROC France: Between 10/10 and 13/10, various SAM failures were raised, but they seem to be false alerts. Moreover, no details were provided on the SAM test details web page. See for example:
https://lcg-sam.cern.ch:8443/sam/sam.py?funct=TestResult&nodename=lyogrid02.in2p3.fr&vo=OPS&testname=CE-sft-job&testtimestamp=1223606240
https://lcg-sam.cern.ch:8443/sam/sam.py?funct=TestResult&nodename=cclcgceli05.in2p3.fr&vo=OPS&testname=CE-sft-lcg-rm&testtimestamp=1223599915
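When cross-checking such alerts, it can help to decode the query string of the SAM links. The following is a minimal sketch, assuming that `testtimestamp` is a Unix epoch value in seconds (the links do not state this explicitly). The helper name `describe_sam_link` is hypothetical and not part of any SAM tool.

```python
from datetime import datetime, timezone
from urllib.parse import urlsplit, parse_qs

# The two example links quoted in the ROC France report.
SAM_URLS = [
    "https://lcg-sam.cern.ch:8443/sam/sam.py?funct=TestResult"
    "&nodename=lyogrid02.in2p3.fr&vo=OPS&testname=CE-sft-job"
    "&testtimestamp=1223606240",
    "https://lcg-sam.cern.ch:8443/sam/sam.py?funct=TestResult"
    "&nodename=cclcgceli05.in2p3.fr&vo=OPS&testname=CE-sft-lcg-rm"
    "&testtimestamp=1223599915",
]


def describe_sam_link(url):
    """Extract node, test name and test time from a SAM result link.

    Assumption: 'testtimestamp' is seconds since the Unix epoch.
    """
    query = parse_qs(urlsplit(url).query)
    timestamp = int(query["testtimestamp"][0])
    return {
        "node": query["nodename"][0],
        "test": query["testname"][0],
        "when_utc": datetime.fromtimestamp(timestamp, tz=timezone.utc),
    }


for url in SAM_URLS:
    info = describe_sam_link(url)
    print(info["node"], info["test"], info["when_utc"].isoformat())
```

Under that assumption, both timestamps decode to 2008-10-10 (UTC), which is consistent with the reported period.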
ROC SWE: PIC comment: in the GridMap monitoring (http://gridmap.cern.ch), if one clicks the "show SI2k" button in the "topology view" section, the sites are scaled with respect to the "total cpus" value in SI2k units, which appears to be computed by simply multiplying the number of published job slots by GlueHostBenchmarkSI00. As most of the clusters are not homogeneous, this is not correct. GlueHostBenchmarkSI00 is just the value to which internal accounting is normalized.
LIP comment: There are failures shown on the ROC report for LIP-Lisbon CEs, but none of them appeared in SAM. Is there a synchronization problem between the ROC report and the SAM DB (10/11/12 of October, ce02.pic.pt)?
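The PIC comment on GridMap scaling can be illustrated with a small calculation. This is only a sketch: the sub-cluster figures and the published benchmark value below are invented for illustration, and the function names are hypothetical. It is not a description of how GridMap is actually implemented.

```python
from dataclasses import dataclass


@dataclass
class SubCluster:
    job_slots: int        # number of job slots in this sub-cluster
    si2k_per_slot: float  # benchmark of one slot, in SI2k (illustrative)


def naive_total_si2k(subclusters, published_benchmark):
    """Total job slots times a single published benchmark value."""
    total_slots = sum(sc.job_slots for sc in subclusters)
    return total_slots * published_benchmark


def per_subcluster_total_si2k(subclusters):
    """Sum of slots times benchmark, taken sub-cluster by sub-cluster."""
    return sum(sc.job_slots * sc.si2k_per_slot for sc in subclusters)


# Hypothetical non-homogeneous site: 100 older slots and 300 newer ones.
site = [SubCluster(100, 1000.0), SubCluster(300, 2000.0)]
# Hypothetical single value that accounting is normalized to.
published = 1500.0

naive = naive_total_si2k(site, published)
weighted = per_subcluster_total_si2k(site)
print(naive, weighted)
```

With these made-up numbers the single-value product gives 600000 SI2k, whereas the per-sub-cluster sum gives 700000 SI2k, so the two only agree when the cluster is homogeneous or the published value happens to equal the slot-weighted average.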
e)
gLite 3.0 services NOW OBSOLETE
glite-SE_classic
glite-VOBOX
glite-WMS
glite-PX
glite-MON
An announcement of this retirement is already on the gLite 3.0 page:
http://glite.web.cern.ch/glite/packages/R3.0/
This corresponds to the procedure (until we have a new one) that was discussed in the ops meeting in February 2008: https://twiki.cern.ch/twiki/bin/view/EGEE/WlcgOsgEgeeOpsMinutes2008x02x25#Support_for_gLite_3_0_services
3
WLCG Items
a)
WLCG issues coming from ROC reports
ROC Russia: ATLAS has requested that its files be cleaned up, at least at Russian sites. The following procedure is proposed (DPM version):
delete all files in the specified directories,
clean the database using the dpns-rm command.
All external links are the responsibility of the ATLAS VO.
There are two questions:
Why should site managers ever do this, while VO administrators have enough access rights to do this themselves?
Is it a procedure approved by WLCG project management?
b)
WLCG Service Interventions (with dates / times where known)