ROC Managers' Meeting - Note the change of week

Europe/Zurich
CERN

Nick Thackray
Description
Actions: https://edms.cern.ch/document/753089
Minutes: https://edms.cern.ch/document/829338
This meeting is 10:00 to 11:30 UTC (11:00 to 12:30 Swiss local time).
Phone number: +41 22 767 6000
Access code: 0147097
Or use this link: https://audioconf.cern.ch/call/0147097
The conference call opens 5 minutes before the meeting starts.
    • 11:00 - 14:00
      ROC Managers' Meeting
      • 11:00
        Admin matters 15m
        • Quarterly Report
        • Meetings
        • Milestones
        • Deliverables
        • Moderators
        Speaker: Alistair Mills
        more information
      • 11:00
        Feedback on minutes of last meeting
      • 11:15
        GGUS reports 10m
        Speaker: Diana Bosio / Maria Dimou
        • TPM Monitoring
          TPM monitoring reports
        • GGUS ticket escalation report
          Report
      • 11:25
        Site down-time procedures & tools 10m
        • Proposed EGEE Procedures for Service Interruptions
          Speaker: Maite Barroso
          document
        • Proposal for site downtime broadcasting and reporting
          Speaker: Helene Cordier
      • 11:35
        SAM Critical Tests 20m
        • Review of current set of critical tests
          CE
            • Replica Management
            • BrokerInfo
            • Software Version (WN)
            • CA certs version
            • CSH test
            • Test if the service host certificate is valid.
            • Job submission
          gCE
            • Job submission
            • BrokerInfo
            • CA certs version
            • CSH test
            • Replica Management
            • Software Version (WN)
          SE
            • Copy and register a file to the SE
            • Copy a file back from the SE
            • Delete a file from the SE
          SRM
            • Delete a file from the SRM (advisory-delete) using lcg-del
            • Copy a file back from the SRM (get) using lcg-cp
            • Store file to SRM (put) using lcg-cr
          BDII
            • BDII Node Check
          site-BDII
            • GIIS Perf Check
            • GIIS Sanity Check
          FTS
            • FTS in BDII according to lcg-infosites
            • List FTS channels
            • Test if the service host certificate is valid.
          LFC
            • Check we can do a lfc-ls on /grid/
            • Check we can create a file in the LFC for this VO
            • Test if the service host certificate is valid.
          MyProxy
            • Test if the service host certificate is valid.
          RB
            • Time for RB to match make a simple GOC test job
            • Test if the service host certificate is valid.
          gRB
            • Test if the service host certificate is valid.
          RGMA
            • Test if the service host certificate is valid.
          VOBOX
            • Check we can do gsissh to the node
          VOMS
            • Test if the service host certificate is valid.
        • Procedure for making SAM tests critical
          Procedure to decide on new critical tests for ops VO:
          1. A request comes from a VO/ROC.
          2. The SAM team evaluates the request; if there is no objection:
          3. The SAM team (and/or a selected ROC) produces a report of which sites in which regions are failing.
          4. The report is submitted to the ROC managers and COD with a proposal to make the test critical.
          5. The change is announced at the ops meeting 2 weeks in advance; by that point the test must be frozen, returning "Error", "Warn" or "OK" exactly as it will once critical. Follow up with the ROCs/sites failing the test (who follows up is to be decided on a case by case basis).
          6. The SAM team or the test maintainer broadcasts the announcement 1 week and 1 day before the agreed date on which the test becomes critical.
          7. 2 weeks after step 5, the test is set to Critical.
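The timing in steps 5 to 7 can be sketched as a small date calculation. This is only an illustration, not part of the proposed procedure: the function name, the dictionary keys and the example date are hypothetical, and it assumes that "1 week and 1 day before" means two separate broadcasts (one a week ahead, one a day ahead) rather than a single broadcast 8 days ahead.

```python
from datetime import date, timedelta


def critical_test_timeline(announcement: date) -> dict[str, date]:
    """Derive the key dates from the ops-meeting announcement (step 5).

    Assumption: the test becomes critical exactly 2 weeks after the
    announcement (step 7), and the broadcasts in step 6 are sent
    1 week and 1 day before that date.
    """
    critical = announcement + timedelta(weeks=2)
    return {
        "announcement": announcement,
        "broadcast_one_week_before": critical - timedelta(weeks=1),
        "broadcast_one_day_before": critical - timedelta(days=1),
        "critical": critical,
    }


if __name__ == "__main__":
    # Hypothetical announcement date, chosen only for illustration.
    for name, day in critical_test_timeline(date(2007, 5, 7)).items():
        print(f"{name}: {day.isoformat()}")
```

If step 6 is instead meant as a single broadcast, the two broadcast entries would collapse into one at `critical - timedelta(days=8)`.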
      • 11:55
        Change request procedures for core operational tools 10m
        We need to have a Change Request (CR) procedure for each of the grid operations services
        (SAM/FCR, SAM Admin's page, CIC portal, GOC DB, GGUS, GStat).
        Below is a "straw-man" of points that should be included in the CR procedure of each service.


        1)       Single point of entry to the process

          • For example, email address, web form, etc.
          • CRs arriving by any other route will be rejected

        2)       Standard form/template so that it is clear for the requestor what information they must provide:

          • Name of requestor
          • e-mail of requestor
          • VO of requestor
          • Title/summary of request
          • Full description of request
          • Priority from requestor's point of view

        3)       Publicly viewable list of new (unprioritized) requests, i.e. a "wish list"

          • Web page, etc.
          • CRs should be given a unique ID

        4)       Regular review of requests by stakeholders (incl. ROC managers)

        5)       Publicly viewable list of prioritized requests (schedule of work)

          • Web page, etc.
          • Each item should include:
            • Level-of-effort estimates
            • Estimates of completion date
            • Indication of progress made

        6)       Clear contact point for questions, feedback, etc.
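
As an illustration of points 2 and 3, the standard template could be modelled as a record with the six listed fields plus a unique ID. This is a minimal sketch under stated assumptions: the class name, field names, the running-counter ID scheme and the `missing_fields` helper are all hypothetical and are not taken from any of the operational tools.

```python
from dataclasses import dataclass, field
from itertools import count

# Hypothetical ID scheme: a simple running counter starting at 1.
_next_id = count(1)


@dataclass
class ChangeRequest:
    """One change request, with the fields listed in point 2."""
    requestor_name: str
    requestor_email: str
    requestor_vo: str
    title: str
    description: str
    requestor_priority: str
    # Point 3: every CR gets a unique ID when it is created.
    cr_id: int = field(default_factory=lambda: next(_next_id))


def missing_fields(cr: ChangeRequest) -> list[str]:
    """Return the names of template fields that were left blank."""
    return [
        name
        for name, value in vars(cr).items()
        if isinstance(value, str) and not value.strip()
    ]
```

A single point of entry (point 1) could then reject any request for which `missing_fields` returns a non-empty list, so the requestor sees exactly which information is still needed.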

      • 12:05
        Change of headings in RC/ROC reports 10m
        After the appearance of the new "Availability reports" heading in the RC/ROC reports, we have noticed that many sites are confused when filling in the different text boxes on both the RC and ROC reports. To make it clearer, we suggest renaming the headings as follows.
        ROC reports (the same suggestion applies to RC reports):
          • Non-Functional Sites: SHALL WE REMOVE THIS FIELD? Is any ROC making use of it?
          • Scheduled Downtime: LEAVE AS IT IS
          • Major Operational Issues Encountered During the Reporting Period: CHANGE TO "Site Operational Report" and add as explanation: "It should cover what happened at the site during the week, e.g. interventions and upgrades without downtime."
          • Points to Raise at the Operations Meeting: LEAVE AS IT IS
          • Availability report: LEAVE AS IT IS and add as explanation: "This field should be filled in by sites with unavailability of 2 hours or longer."
      • 12:15
        Review of action items 10m
        list of actions
      • 12:25
        AOB 5m
        • SAM review
        • ROCs to remind sites to sign up for the Operations Workshop in June