grid-operations-meeting@cern.ch Weekly OSG, EGEE, WLCG infrastructure coordination meeting.
We discuss the running of the production grid infrastructure, based on weekly reports from the attendees. The reported issues are discussed, assigned to the relevant teams, followed up and escalated when needed. The meeting is also the forum for the sites to get a summary of the weekly WLCG activities and plans.
Attendees:
OSG operations team
EGEE operations team
EGEE ROC managers
WLCG coordination representatives
WLCG Tier-1 representatives
other site representatives (optional)
GGUS representatives
VO representatives
VRVS "Sky" room will be available 15:30 until 18:00 CET
actionlist
minutes
28-R-15
1
Feedback on last meeting's minutes
Minutes
2
Grid-Operator-on-Duty handover
From France (backup: South East) to Italy (backup: Russia)
We will start moving to per-service upgrades.
First one, probably released today:
- CE: lcg-info-dynamic-scheduler fix for host/queue name matching
next ones in the queue:
- FTS
- DPM/LFC
- UI and WN
5
Change of format of operations meeting
1) EGEE Items
Grid-Operator-on-Duty handover
Any other items/announcements specific to EGEE (e.g. updates to middleware)
Issues coming from VO and ROC reports (ROC reports not received)
2) OSG Items
Issues coming from OSG
3) WLCG Items
Upcoming SC4 Activities
Any other general WLCG items
WLCG related Issues coming from experiment VOs and Tier-1/Tier-2 reports (VO reports + Tier 1 reports not received)
4) Review of action items
5) Feedback on last meeting's minutes
6) AOB
6
REMINDER: update to the 1.8 IGTF CA package on every service node, not only the WNs (the WNs are the only nodes checked by the SFTs)
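A minimal sketch of the version check a site admin might run on each node; the version strings below are stand-ins (on a real node the installed value would come from the CA package's rpm query), so treat this as an illustration, not the official procedure.

```shell
# Compare an installed CA package version against the required 1.8
# release. "installed" is a hypothetical stand-in for rpm output.
required="1.8"
installed="1.7"

# Version-sort the two strings; if the required version sorts first,
# the installed one is new enough.
oldest=$(printf '%s\n%s\n' "$required" "$installed" | sort -V | head -n 1)
if [ "$oldest" = "$required" ]; then
    echo "CA package at $installed: up to date"
else
    echo "CA package at $installed: update to $required needed"
fi
```

With the example values above this reports that an update is needed.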
7
Bug 16625: 10-50 times speedup for lcg-info-generic.
Reports were not received from:
ROCs: UKI (holiday)
Tier-1s (reports attached): BNL
VOs:
CE ROC: Improvements to gLite update release process needed.
1.A) (The "4444 jobs" problem. The bug affects all sites with a special character in the domain name or in a queue name.)
gLite updates for production sites should not contain packages that are known to have bugs. The package lcg-info-dynamic-scheduler released with gLite 3.0.2 contained a well-known bug that affects CEs whose hostname contains the character '-' or whose queue names contain underscores, uppercase letters or numbers. This bug is not listed as a known issue on any download page (e.g. http://glite.web.cern.ch/glite/packages/R3.0/deployment/lcg-CE/3.0.3/lcg-CE-3.0.3-update.html).
Since the release date this issue has generated at least three tickets:
https://gus.fzk.de/pages/ticket_details.php?ticket=11681&from=allt
https://gus.fzk.de/pages/ticket_details.php?ticket=11619&from=allt
https://savannah.cern.ch/bugs/?func=detailitem&item_id=19233
There are many such CEs in the central BDII, so more tickets are likely. The worst part is that the problem was already reported in May:
https://savannah.cern.ch/bugs/?func=detailitem&item_id=17716
and the patch has been available since July:
https://savannah.cern.ch/patch/?func=detailitem&item_id=754
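To illustrate the class of bug described above: a name check that only accepts lowercase letters, digits and dots silently rejects valid CE host and queue names. The pattern below is a hedged sketch, not the actual lcg-info-dynamic-scheduler code.

```shell
# Hypothetical over-strict pattern: only lowercase letters, digits, dots.
pattern='^[a-z0-9.]+$'

good="ce1.example.org"
bad_host="ce-1.example.org"    # '-' in the hostname triggers the bug
bad_queue="Long_Queue2"        # underscore, uppercase letter, digit

echo "$good"      | grep -Eq "$pattern" && echo "accepted: $good"
echo "$bad_host"  | grep -Eq "$pattern" || echo "rejected: $bad_host"
echo "$bad_queue" | grep -Eq "$pattern" || echo "rejected: $bad_queue"
```

Both "rejected" lines fire for the hyphenated hostname and the mixed-case queue name, which is exactly the matching failure the tickets report.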
1.B) Updates should be coordinated with central services (GSTAT is affected here). For example, MyProxy's ServiceType changed from 'myproxy' to 'MyProxy' in YAIM 3.0.0-17 (released in June), and GSTAT still issues a warning on PROX nodes because it looks for 'myproxy' (https://gus.fzk.de/pages/ticket_details.php?ticket=11653&from=allt).
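A minimal sketch of the mismatch: the published value is taken from the report above, while the case-insensitive check is only one assumed way a monitor could accept both spellings.

```shell
# Sites now publish 'MyProxy'; GSTAT still compares against 'myproxy'.
published="MyProxy"

# Exact match, as the monitor does it today: fails for the new value.
[ "$published" = "myproxy" ] || echo "exact match fails"

# Case-insensitive match: accepts both old and new spellings.
echo "$published" | grep -qi '^myproxy$' && echo "case-insensitive match passes"
```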
2. DECH: Is there a way to clean up the RBs' MySQL databases of (very) old entries? The database files are already many GB in size. (DESY-HH)
3. LHCb: I'd simply like to put more pressure on the GridKA site admins, whose site is failing reconstruction jobs for the ongoing DC06.
There is a GGUS ticket (#11599) describing the problem, whose priority was set to severe.
Problems with lcg-gt at GRIDKA in DC06
Detailed description:
Dear Site Manager,
For several days now, when we (LHCb) try to run Reconstruction DC06 jobs at your site, on data we have just transferred there, we run into the following situation:
when the jobs issue lcg-gt commands to obtain the appropriate TURL for the dcap protocol to be used by the application, a large fraction of them time out (killed by our own wrapper after 30 seconds), and thus the input data for the jobs cannot be resolved.
This same logic has been working fine at your site in the past, and it is also working at the other Tier-1s (PIC, RAL, IN2P3) and at CERN.
Please investigate the problem and let us know if we can help you to debug the issue.
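The 30-second wrapper mentioned in the ticket could look roughly like the sketch below; this is an illustration of the timeout logic, not LHCb's actual wrapper, and the SURL in the usage comment is hypothetical.

```shell
# Run a command in the background and kill it if it exceeds the given
# number of seconds; the exit status reflects whether it finished.
run_with_timeout() {
    secs=$1; shift
    "$@" &
    cmd_pid=$!
    # Watchdog: after $secs seconds, kill the command if still running.
    ( sleep "$secs"; kill "$cmd_pid" 2>/dev/null ) &
    watchdog_pid=$!
    wait "$cmd_pid"
    rc=$?
    kill "$watchdog_pid" 2>/dev/null
    return $rc
}

# Usage against the storage element (hypothetical SURL):
# run_with_timeout 30 lcg-gt srm://<se-host>/<path-to-file> dcap
```

A non-zero return here corresponds to the "timed out by our own wrapper" case in the report, where the TURL is never obtained.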