WLCG-OSG-EGEE Operations meeting

Europe/Zurich
28-R-15 (CERN conferencing service (joining details below))


Nick Thackray, Steve Traylen
Description
grid-operations-meeting@cern.ch
Weekly OSG, EGEE, WLCG infrastructure coordination meeting.
We discuss the weekly running of the production grid infrastructure based on weekly reports from the attendees. The reported issues are discussed, assigned to the relevant teams, followed up and escalated when needed. The meeting is also the forum for the sites to get a summary of the weekly WLCG activities and plans.
Attendees:
  • OSG operations team
  • EGEE operations team
  • EGEE ROC managers
  • WLCG coordination representatives
  • WLCG Tier-1 representatives
  • other site representatives (optional)
  • GGUS representatives
  • VO representatives
  • To dial in to the conference:
    a. Dial +41227676000
    b. Enter access code 0140768

    NB: Reports were not received in advance of the meeting from:

  • ROCs: Asia-Pacific, Russia
  • VOs: ALICE, CMS, BIOMED
  • list of actions
    Minutes
      • 16:00 - 16:05
        Feedback on last meeting's minutes 5m
      • 16:01 - 16:30
        EGEE Items 29m
        • Grid-Operator-on-Duty handover 5m
          From: Taiwan / Germany-Switzerland
          To: UK-Ireland / SouthWesternEurope


          NB: Please can the grid ops-on-duty teams submit their reports no later than 12:00 UTC (14:00 Swiss local time).

          Issues:
          1. Ticket Stats:
            • New : 16
            • Extend the date : 26
            • Close : 34
            • Quarantine : 6
            • 2nd mail : 11
          2. Some sites had not updated their CA RPM to the latest version, so I extended the date on their tickets.
          3. Some problematic nodes had been turned off in GOCDB, so the SAM tests did not run on them. I closed the corresponding tickets, for example:
            • ticket 6379 FRJ-IF
            • ticket 6469 INFN-TORINO
            • ticket 6468 INFN-T1
            • ticket 6464 ENEA-INFO
            • ticket 6456 BNL-LCG2
            • ticket 6455 Beijing-LCG2
        • PPS Report & Issues
          PPS reports were not received from these ROCs:

          Issues from EGEE ROCs:
          1. Nothing to report
        • EGEE issues coming from ROC reports 5h
          1. Central-Europe: Do the gLite software procedures include a security vulnerability check? If so, who carries it out?
          2. IN2P3:
            1. We are experiencing memory problems with SL4_32. As a consequence, we had to reduce the number of job slots per WN. At present we have only 7 job slots for 8 CPUs (16 GB RAM) per machine. We may have to reduce the number of job slots again, as the problem still appears occasionally. We are also trying to solve the problem at the OS level, but no solution has been found so far. In any case, this problem should disappear with SL4_64, so the question is: when can we get an official release of the gLite WN for SL4_64? Would it be possible to try a pre-release?
            2. Many CMS jobs (1500) were lost because of a mini black hole created by 10 misconfigured WNs. The problem was hard to identify because the SL4_32 memory problems were occurring at the same time. This mini black hole is now closed.
        • Removal of lcg-job-monitor.cern.ch 5m
          The unused and non-working service https://lcg-job-monitor.cern.ch:8443/job-monitor/job-monitor.cgi will be removed later this week.
        • gLite Release News 15m
          A new null release was made: gLite 3.1 Update 5. It removed some third-party packages from the gLite externals that were available from DAG anyway.
        • gLite Progress to SL4 10m
          Speaker: Oliver Keeble (CERN)
        • Lifetime of SL3 Services after an SL4 Release 5m
          SA3 wishes to stop support for the SL3 version of a service one month after an SL4 release of that service is made. Deployment activities within SA1 require that any problems with the SL4 service delay the expiration of the SL3 service.
        • End of life for SLC3 15m
          SLC3 reaches end of life in two days' time, after which updates will no longer be guaranteed. Updates for SL3 remain available until 2010.

          Current situation: searching for SLC3 sites in the information system, we see the following distribution of SubClusters by country domain name:

          • 2 at
          • 1 bg
          • 1 ca
          • 6 ch
          • 2 cz
          • 1 de
          • 6 es
          • 1 gr
          • 1 hk
          • 25 it
          • 2 jp
          • 2 lv
          • 2 pl
          • 1 pt
          • 2 ro
          • 6 ru
          • 1 se
          • 1 sg
          • 1 sk
          • 4 tw
          • 2 uk
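
A per-country tally like the one above can be reproduced from any list of SubCluster hostnames. The following is only a minimal sketch: the function name `count_by_country` and the example hostnames are hypothetical, and it assumes the hostnames have already been extracted from the information system (the query itself is not shown).

```python
from collections import Counter


def count_by_country(hostnames):
    """Tally hostnames by their last DNS label (the country domain)."""
    counts = Counter()
    for host in hostnames:
        # Normalise case and drop whitespace or a trailing root dot.
        host = host.strip().rstrip(".").lower()
        if "." not in host:
            continue  # skip entries with no domain part
        counts[host.rsplit(".", 1)[1]] += 1
    return counts


# Hypothetical SubCluster hostnames, for illustration only.
example_hosts = [
    "ce01.example.it",
    "ce02.example.it",
    "grid.example.ch",
    "CE.Example.ES.",
]

# Print one "<count> <country>" line per country domain, sorted by domain.
for country, n in sorted(count_by_country(example_hosts).items()):
    print(n, country)
```

Generic domains such as .org or .edu would be counted under their own label, so sites registered under them would need to be mapped to a country by hand.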
      • 16:30 - 17:00
        WLCG Items 30m
  • CMS service
    • No Report Given
    Speaker: Mr Daniele Bonacorsi (CNAF-INFN BOLOGNA, ITALY)
  • LHCb service
    • Last week we had a problem at CNAF because the shared area was not working. The problem was related to the migration of the shared areas to GPFS. This suggests that any important change in site configuration should always be broadcast at a high level.
    Speaker: Dr Roberto Santinelli (CERN/IT/GD)
  • ALICE service
    • No Report Given.
    Speaker: Dr Patricia Mendez Lorenzo (CERN IT/GD)
  • WLCG Service Coordination
    • WLCG Service Reliability workshop, CERN, November 26 - 30 - agenda - wiki
    • Common Computing Readiness Challenge - CCRC'08 - Meetings schedule
    • CMS CSA07 has been extended until mid-November.
    • The ATLAS M5 detector cosmics run has started and will run until 5 November. Data for reconstruction and export are not expected until later this week.
    Speaker: Harry Renshall / Jamie Shiers
  • 16:55 - 17:00
    OSG Items 5m
  • 17:00 - 17:05
    Review of action items 5m
    list of actions
  • 17:10 - 17:15
    AOB 5m