28-R-15 (CERN conferencing service (joining details below))
Maite Barroso Lopez (CERN)
firstname.lastname@example.org
Weekly OSG, EGEE, WLCG infrastructure coordination meeting.
We discuss the weekly running of the production grid infrastructure based on weekly reports from the attendees. The reported issues are discussed, assigned to the relevant teams, followed up and escalated when needed. The meeting is also the forum for the sites to get a summary of the weekly WLCG activities and plans.
OSG operations team
EGEE operations team
EGEE ROC managers
WLCG coordination representatives
WLCG Tier-1 representatives
other site representatives (optional)
To dial in to the conference:
a. Dial +41227676000
b. Enter access code 0148141
OR click HERE (Please specify your name & affiliation in the web-interface)
The problems with Asia Pacific have now been resolved.
Two sites (NE and AP) have overdue alarms, but I have
informed both of them about this.
No issues to report to the WLCG meeting, except to inform
them that the AP problems are now resolved.
Italy, France and UKI had not validated their ROC reports as of the 14:00 deadline.
Reports show no major operational issues encountered during the reporting period, and no points to raise at this meeting.
FZK-LCG2 wishes to convey the following INFO: Planned downtime at FZK-LCG2 on 10-09-2009, 07:00 - 08:00 UTC.
The LFC service lfc-fzk.gridka.de (not the LHCb LFC) will be down while it is split into an ATLAS instance (atlas-lfc-fzk.gridka.de) and a non-ATLAS instance (lfc-fzk.gridka.de, as before).
SEE ROC: At the previous operations meeting, the following issue was briefly discussed: “WLCG MB agreed on 4th of August to ask for the SL5 migration at all Sites, including the Tier-2 Sites.” As far as we know, MPI is still not supported by glite-3.2 (see https://gus.fzk.de/ws/ticket_info.php?ticket=47422).
We understand that this affects only the WLCG sites (at the moment), but since many users/teams in our region depend on the MPI capability of the Grid, we think that this issue should be given higher priority by the developers.
SWE ROC: We'd like to certify a site that runs only central services (WMS, LFC, etc.); the site has no storage or computing backend. Is this possible from the point of view of OPS?
Last reminder that the default DPM used for SAM tests will be upgraded to SL4 next Monday 7th of September, and that sites with obsolete client S/W will start failing tests.
SAM MPI tests will NOT be activated
There are pending tickets for SL5
Notification of new gstat beta version (see attached material)
Seven sites are running legacy gLite releases; those not upgraded by next week will be moved to suspended/uncertified status until they upgrade:
Site               Host                       Version
EENet              kriit.eenet.ee             3.0.2
HK-HKU-CC-01       ce.grid.hku.hk             3.0.2
JP-KEK-CRC-01      dg10.cc.kek.jp             3.0.2
Taiwan-IPAS-LCG2   atlasce.phys.sinica.edu.tw 3.0.2
Taiwan-NCUCC-LCG2  ce.cc.ncu.edu.tw           3.0.2
TW-NTCU-HPC-01     host001.hpc.ntcu.edu.tw    3.0.2
UKI-LT2-RHUL       ce1.pp.rhul.ac.uk          3.0.2
An update to the EGEE SA1 OAT release is now available in the usual
repositories.
No changes to the YAIM configuration are required, but after the
"yum update" of your packages it is necessary to at least rerun ncg.pl,
e.g. via a YAIM rerun.
Changes to grid-monitoring-probes-org.bdii probes with NCG providing
configuration for them.
Probe details: http://goc.grid.sinica.edu.tw/gocwiki/NagiosProbe
Addition of org.gstat.CE and org.gstat.SE probes. These provide the
sanity checks similar to those the gstat1 web interface provided. These are
the gstat2 probes.
In particular, these look for greater compliance with the WLCG/EGEE GLUE schema.
Nagios probe results that are collected via the messaging system now
have their status prefixed with the hostname from where the test was executed.
For example, if a ROC submits a WN test to a site via a CE, the probe
result, once transmitted to the site Nagios via the messaging service,
will appear as before as service "org.sam.WN-Bi-dteam-roc" on the CE node,
but the status line now contains the WN name, e.g.
lxbra3908.cern.ch: OK: getCE:
indicating that lxbra3908 was the WN where the test was executed.
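For sites that post-process these results, the prefixed status line can be split into its parts. The following is only a minimal sketch, not part of the OAT release: the function and type names are hypothetical, and it assumes the status line has the form "<host>: <STATE>: <detail>" as in the example above, with <STATE> being one of the standard Nagios states. The fallback for lines without a hostname prefix is likewise an assumption.

```python
from typing import NamedTuple, Optional

# Standard Nagios service states.
NAGIOS_STATES = {"OK", "WARNING", "CRITICAL", "UNKNOWN"}


class ProbeStatus(NamedTuple):
    executed_on: Optional[str]  # host where the test ran (e.g. the WN), if present
    state: str                  # Nagios state reported by the probe
    detail: str                 # remainder of the status line


def parse_status_line(line: str) -> ProbeStatus:
    """Split a status line into (host, state, detail).

    Hypothetical helper: assumes "<host>: <STATE>: <detail>" and falls
    back to "<STATE>: <detail>" when no hostname prefix is present.
    """
    parts = [p.strip() for p in line.split(":", 2)]
    # Prefixed form: the second field is the Nagios state.
    if len(parts) >= 2 and parts[1] in NAGIOS_STATES:
        detail = parts[2] if len(parts) > 2 else ""
        return ProbeStatus(parts[0], parts[1], detail)
    # Unprefixed form: the first field is the Nagios state.
    if parts[0] in NAGIOS_STATES:
        detail = line.split(":", 1)[1].strip() if ":" in line else ""
        return ProbeStatus(None, parts[0], detail)
    raise ValueError(f"unrecognised status line: {line!r}")


if __name__ == "__main__":
    print(parse_status_line("lxbra3908.cern.ch: OK: getCE:"))
```

Keeping the hostname as a separate field makes it possible to group failures by the WN that ran the test, while the service itself stays attached to the CE node.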
We plan to shortly deploy to the production message brokers a fix for a
bug that at times can cause consumers to fail to get messages.
The OSG supporter wrote in the diary of GGUS ticket 49970 that the problem is solved, hence the ticket will be closed.
However, the corresponding OIM ticket 7148
is in Status: Support Agency. Therefore the GGUS ticket cannot be closed.
Please adapt the ticket status and put a comprehensive text in the Solution field for the GGUS Knowledge Data Base.