28-R-15 (CERN conferencing service (joining details below))
firstname.lastname@example.org Weekly OSG, EGEE, WLCG infrastructure coordination meeting.
We discuss the weekly running of the production grid infrastructure based on weekly reports from the attendees. The reported issues are discussed, assigned to the relevant teams, followed up and escalated when needed. The meeting is also the forum for the sites to get a summary of the weekly WLCG activities and plans.
OSG operations team
EGEE operations team
EGEE ROC managers
WLCG coordination representatives
WLCG Tier-1 representatives
other site representatives (optional)
To dial in to the conference:
a. Dial +41227676000
b. Enter access code 0140768
Do the other federations have experience with multi-valued LCG_GFAL_INFOSYS?
Suggest that SAM extend the RM test timeout with the introduction of multi-valued LCG_GFAL_INFOSYS. This setting allows the test to fail over, but the test will probably take longer to execute.
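To illustrate why a multi-valued LCG_GFAL_INFOSYS can lengthen the test, here is a minimal Python sketch. It assumes the variable holds a comma-separated list of host:port endpoints and that each endpoint is tried in order with its own timeout. The function names, the default port 2170 and the `query` callable are hypothetical and are not part of SAM or GFAL.

```python
from typing import Callable, List, Optional, Tuple

DEFAULT_PORT = 2170  # assumption: default port used when an entry has no ":port"


def parse_infosys(value: str) -> List[Tuple[str, int]]:
    """Split a comma-separated 'host:port' list into (host, port) pairs."""
    endpoints = []
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        host, _, port = item.partition(":")
        endpoints.append((host, int(port) if port else DEFAULT_PORT))
    return endpoints


def worst_case_seconds(value: str, per_endpoint_timeout: float) -> float:
    """Upper bound on lookup time if every endpoint has to time out."""
    return len(parse_infosys(value)) * per_endpoint_timeout


def query_with_failover(
    value: str,
    query: Callable[[str, int, float], str],
    per_endpoint_timeout: float,
) -> Optional[str]:
    """Try each endpoint in order; return the first successful answer."""
    for host, port in parse_infosys(value):
        try:
            return query(host, port, per_endpoint_timeout)
        except OSError:  # includes TimeoutError and socket timeouts
            continue  # fail over to the next endpoint
    return None
```

With two endpoints and a 30 second timeout per endpoint, `worst_case_seconds` gives 60 seconds, so a test timeout tuned for a single endpoint may no longer be sufficient.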
FYI: a ticket has been created (GGUS Ticket ID# 37754) reporting that SAM does not recognize SE downtime. The answer was that this is just an error in the visualization layer and that GridView scores are properly updated, but this report also doesn't recognize the downtime.
Air conditioning trouble at IN2P3-CC due to excessive heat.
DESY: What is the procedure in case users use site resources in a denial-of-service manner?
Contacting and/or banning the user is an immediate solution, but not a scalable one.
The case in question is a memory fork bomb on a gLite WN (torque client). Do generic Linux or torque/maui configurations or tools exist to prevent these, or at least to monitor them?
We would appreciate feedback from other ROCs/Sites.
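As one illustration of a generic Linux mechanism relevant to this question, per-process resource limits can cap the number of processes and the address space of a job. The Python sketch below is not a torque/maui configuration and was not proposed at the meeting. The helper names and the default values are hypothetical. Note that RLIMIT_NPROC counts processes per user ID, not per job.

```python
import resource
import subprocess
from typing import Callable, List


def clamp_to_hard_limit(requested: int, hard: int) -> int:
    """An unprivileged process may not raise a limit above the hard limit."""
    if hard == resource.RLIM_INFINITY:
        return requested
    return min(requested, hard)


def make_limiter(max_procs: int, max_address_space: int) -> Callable[[], None]:
    """Build a function that lowers the limits in the child before exec."""
    def apply_limits() -> None:
        for which, wanted in (
            (resource.RLIMIT_NPROC, max_procs),       # processes per user
            (resource.RLIMIT_AS, max_address_space),  # bytes of address space
        ):
            _, hard = resource.getrlimit(which)
            value = clamp_to_hard_limit(wanted, hard)
            resource.setrlimit(which, (value, value))
    return apply_limits


def run_limited(
    cmd: List[str],
    max_procs: int = 256,                    # hypothetical default
    max_address_space: int = 4 * 1024 ** 3,  # hypothetical default: 4 GiB
) -> int:
    """Run cmd with the limits applied; return its exit status."""
    limiter = make_limiter(max_procs, max_address_space)
    return subprocess.run(cmd, preexec_fn=limiter).returncode
```

A job wrapper could call `run_limited(["/path/to/job.sh"])`, where the path is a placeholder. A fork bomb running under these limits fails on fork or allocation once a limit is reached, instead of exhausting the worker node.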
[ROC Northern Europe]:
A bug report about a crashing glite-proxy-renewd was submitted on June 11th (GGUS ticket 37334). It is still in assigned status. Could someone have a look at it?
[ROC South Eastern Europe]:
AEGIS-01 and AEGIS-07 are asking if one monbox can handle the accounting for two sites.
<big> WLCG issues coming from ROC reports </big>
Many jobs (from Alice and Atlas) had to be cancelled to solve a problem which resulted from a massive job submission by Atlas (>30'000 jobs).
<big>WLCG Service Interventions (with dates / times where known) </big>
<big> Status of deployment of FTM at tier-1 sites </big>
Which LCG tier-1 sites have successfully deployed FTM?
For those tier-1 sites which have not deployed FTM, when is this planned to take place? The experiments want this because the FTM publishes transfer logs to GridView (thanks Steve ;o)
ASGC: Already deployed and operational.
BNL: Already deployed and operational.
CNAF: Installed last week but still being tested.
DE-KIT (FZK/GridKa): Already deployed and operational.
IN2P3-CC: Not yet installed. Hope to have it in place during July.
NDGF: Not installed. Will take at least 3 weeks if needed.
PIC: A test instance is being deployed now and is planned to be in production by mid-July.
RAL: Already deployed and operational.
SARA: Intend to install FTM early in July.
TRIUMF: Already deployed and operational.
<big> WLCG Operational Review </big>
Harry Renshall / Jamie Shiers
<big> Alice report </big>
<big> Atlas report </big>
<big> CMS report </big>
<big> LHCb report </big>
1. IN2P3 gsidcap file access issue:
The problem has finally been understood: the global GSI environment gets corrupted by
multiple connections into the same gsidcap door. A new patch (1.8.0-15p8, out next
week) will cure this problem and has to be rolled out very quickly.
2. SARA SRMv1: no pools configured.
<big>Recommended base versions for storage services:</big>
(OSG - Indiana University)
Discussion of open tickets for OSG
Review of action items (5m)
list of actions
Suggestion to use EVO rather than the CERN conferencing system in the future.
We could use the EGEE community which exists in EVO: