WLCG-OSG-EGEE Operations meeting

Europe/Zurich
28-R-15 (CERN conferencing service (joining details below))


Description
grid-operations-meeting@cern.ch
Weekly OSG, EGEE, WLCG infrastructure coordination meeting.
We discuss the weekly running of the production grid infrastructure, based on weekly reports from the attendees. The reported issues are discussed, assigned to the relevant teams, followed up, and escalated when needed. The meeting is also the forum for the sites to get a summary of the weekly WLCG activities and plans.
Attendees:
  • OSG operations team
  • EGEE operations team
  • EGEE ROC managers
  • WLCG coordination representatives
  • WLCG Tier-1 representatives
  • other site representatives (optional)
  • GGUS representatives
  • VO representatives
  To dial in to the conference:
    a. Dial +41227676000
    b. Enter access code 0157610


    actionlist
    minutes
      • 16:00 17:40
        WLCG-OSG-EGEE Operations Meeting 28-R-15


        • 16:00
          Feedback on last meeting's minutes 5m
        • 16:05
          EGEE Items 25m
          • Grid-Operator-on-Duty handover 5m
            From CERN ROC (backup: DECH ROC) to AsiaPacific ROC (backup: CentralEurope ROC)
            Tickets:
            Open 103
            Closed 112
            Modified 197

            Notes:
            1. An SSH upgrade caused JS problems at many sites, which generated a lot of tickets
            2. The dashboard was intermittently not working
            3. SFT results were not available towards the end of the week
            4. To see PPS results, add '&vo=ops' to the end of the SFT URL
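Note 4 above can be sketched in code as follows. This is only a minimal illustration: the helper name `with_vo` and the example URL are hypothetical, since the agenda does not give the actual SFT address. It appends `vo=ops` to whatever query string the URL already has.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_vo(url: str, vo: str = "ops") -> str:
    """Return `url` with a `vo` query parameter appended (hypothetical helper)."""
    parts = urlsplit(url)
    # Keep existing parameters, dropping any earlier `vo` so it is not duplicated.
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True) if k != "vo"]
    query.append(("vo", vo))
    return urlunsplit(parts._replace(query=urlencode(query)))


# The URL below is a placeholder, not the real SFT report address.
example = with_vo("https://sft.example.org/report.cgi?site=all")
```

If the URL has no query string yet, the parameter is added after a `?` rather than a `&`.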
          • Update on SLC4 migration 5m
            Questions from last week:
            Q: Is it compatible with the native SLC4 compiler?
            Q: Will the porting priority be on 3.0 or 3.1?
            Q: How long will SLC3 gLite be supported?
            Speaker: Alberto Di Meglio
          • Release of new version of CIC Portal 5m
            Speaker: Gilles Mathieu
          • EGEE issues coming from ROC reports 10m
            Reports were not received from these ROCs: Asia-Pacific; France; North Europe
            Reports were not received from these non-HEP VOs: BioMed

            1. As discussed in previous meetings, RB maintenance concepts need to be provided by the middleware developers/designers as a fundamental feature of production-quality software. We have raised this problem many times, but it is not yet solved. The corresponding ticket was closed without the request being accepted.
              Closing a port in the firewall disables the service, but the user then simply gets an error.
              See GGUS ticket: 13476
              (DECH ROC)


            2. As discussed in previous meetings, garbage collection and cleanup of persistent services such as the RB need to be addressed. The corresponding GGUS tickets are not progressing at all. Is escalation needed?
              See GGUS ticket 13474
              (DECH ROC)
        • 16:30
          OSG Items 20m
          No items for discussion.
        • 16:35
          WLCG Items 45m
          • WLCG Service Report 15m
          • WLCG Service Commissioning report and upcoming activities 15m
            Speaker: Harry Renshall
          • CNAF - Castor2 problems again, transfers fail with messages related to SRM
          • SARA - same old problem with the certificates of the VoBox, all transfers are failing, nothing has changed
          • RAL - problems with gridftp and the storage pool, all transfers are failing right now


          • ATLAS:
            ATLAS is currently running a Tier-0 scaling test. The main goal is to test the Tier-0 Management System and the export of the data from Tier-0 to Tier-1s and, on a voluntary basis, from Tier-1s to Tier-2s.
            The test was supposed to end last week, but it has been extended to the end of this week. These Tier-0 scaling tests should be run regularly, and we would like to have a new 3-week period starting around January 15th.
            In parallel, a new MC production of 20M events is scheduled from now up to the end of December. We are currently in the validation phase, based on 1M events. Once validated, the production will run at full speed.
            For next year we intend to run MC production continuously.
            The goal is to simulate ~50M events in the first quarter and to double every quarter.
            We also foresee:
          • a Calibration Data Challenge, ~ March 07
          • a "Dress Rehearsal" defined as a complete exercise of the full chain from trigger to analysis, ~June 07
          • We are putting in place "Data Transfer Functional" tests. The goal is to check the Tier-0->Tier-1->Tier-2 transfer chain. A first one was run in September; a new one is scheduled for this month, after the Tier-0 scaling test. For next year we would like another test just after the Tier-0 scaling test of January 2007. We would also like to study the possibility of having this kind of test as part of SAM.
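The ATLAS MC production goal quoted above (~50M events in the first quarter, doubling every quarter) can be worked through as a small calculation. This is only a sketch under two assumptions not stated in the agenda: that "double" means each quarter's target is twice the previous quarter's, and that the horizon is four quarters. The function name `quarterly_targets` is hypothetical.

```python
def quarterly_targets(first_quarter_m: int = 50, quarters: int = 4) -> list[int]:
    """Per-quarter event targets in millions, doubling each quarter.

    Assumption: each quarter's target is twice the previous quarter's.
    Assumption: four quarters; the agenda gives no explicit horizon.
    """
    return [first_quarter_m * 2**q for q in range(quarters)]


targets = quarterly_targets()
total = sum(targets)
print(targets, total)  # [50, 100, 200, 400] 750
```

Under these assumptions the yearly total would be roughly 750M simulated events. A different reading of "double" (for example, doubling the cumulative total) would give different numbers.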
  • 17:15
    Review of action items 20m
  • 17:35
    AOB 5m