ROC Managers' Meeting - Note the change of week
Europe/Zurich
CERN
Nick Thackray
Description
Actions, https://edms.cern.ch/document/753089
Minutes, https://edms.cern.ch/document/829338
This meeting is 10:00 to 11:30 UTC (11:00 to 12:30 Swiss local time).
Phone number is: +41 22 767 6000
Access code is: 0147097
Or click here:
https://audioconf.cern.ch/call/0147097
The conference call opens 5 minutes before the meeting starts.
11:00
→
14:00
ROC Managers' Meeting
- 11:00
-
11:00
Feedback on minutes of last meeting 1m
- 11:15
- 11:25
-
11:35
SAM Critical Tests 20m
- Review of current set of critical tests
  CE
  - Replica Management
  - BrokerInfo
  - Software Version (WN)
  - CA certs version
  - CSH test
  - Test if the service host certificate is valid.
  - Job submission
  gCE
  - Job submission
  - BrokerInfo
  - CA certs version
  - CSH test
  - Replica Management
  - Software Version (WN)
  SE
  - Copy and register a file to the SE
  - Copy a file back from the SE
  - Delete a file from the SE
  SRM
  - Delete a file from the SRM (advisory-delete) using lcg-del
  - Copy a file back from the SRM (get) using lcg-cp
  - Store file to SRM (put) using lcg-cr
  BDII
  - BDII Node Check
  site-BDII
  - GIIS Perf Check
  - GIIS Sanity Check
  FTS
  - FTS in BDII according to lcg-infosites
  - List FTS channels
  - Test if the service host certificate is valid.
  LFC
  - Check we can do a lfc-ls on /grid/
  - Check we can create a file in the LFC for this VO
  - Test if the service host certificate is valid.
  MyProxy
  - Test if the service host certificate is valid.
  RB
  - Time for RB to match make a simple GOC test job
  - Test if the service host certificate is valid.
  gRB
  - Test if the service host certificate is valid.
  RGMA
  - Test if the service host certificate is valid.
  VOBOX
  - Check we can do gsissh to the node
  VOMS
  - Test if the service host certificate is valid.
- Procedure for making SAM tests critical
  Procedure to decide on new critical tests for the ops VO:
  1. A request comes from a VO or ROC.
  2. The SAM team evaluates the request. If there is no objection:
  3. The SAM team (and/or a selected ROC) produces a report of which sites in which regions are failing the test.
  4. The report is submitted to the ROC managers and COD with a proposal to make the test critical.
  5. The change is announced at the ops meeting 2 weeks in advance. By this point the test must be frozen, returning "Error", "Warn" or "OK" exactly as it will once critical. ROCs/sites failing the test are followed up (who follows up is decided on a case-by-case basis).
  6. The SAM team or the test maintainer broadcasts the announcement 1 week and 1 day before the agreed date on which the test becomes critical.
  7. 2 weeks after step 5, the test is set to Critical.
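The timing in steps 5 to 7 can be sketched as a small date calculation. This is only an illustration, not part of any SAM tooling: the function name, the dictionary keys and the example date are hypothetical, and it assumes that "1 week and 1 day before" means two separate broadcasts (one a week before, one a day before the agreed date).

```python
from datetime import date, timedelta


def critical_test_timeline(announcement: date) -> dict:
    """Derive the key dates of the procedure from the ops-meeting announcement date.

    Assumption: the test becomes critical exactly 2 weeks after the
    announcement (step 7), and the broadcasts in step 6 are two separate
    reminders, 1 week and 1 day before that date.
    """
    critical = announcement + timedelta(weeks=2)
    return {
        "announced_at_ops_meeting": announcement,
        "broadcast_one_week_before": critical - timedelta(weeks=1),
        "broadcast_one_day_before": critical - timedelta(days=1),
        "set_to_critical": critical,
    }


# Hypothetical announcement date, chosen only for the example.
timeline = critical_test_timeline(date(2007, 5, 7))
for step, day in timeline.items():
    print(f"{step}: {day.isoformat()}")
```

Under this reading the one-week broadcast always falls midway between the announcement and the date the test becomes critical.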
-
-
11:55
Change request procedures for core operational tools 10m
We need to have a Change Request (CR) procedure for each of the grid operations services (SAM/FCR, SAM Admin's page, CIC portal, GOC DB, GGUS, GStat).
Below is a "straw-man" of points that should be included in the CR procedure of each service.
1) Single point of entry to the process
- For example, email address, web form, etc.
- CRs arriving by any other route will be rejected
2) Standard form/template so that it is clear for the requestor what information they must provide:
- Name of requestor
- e-mail of requestor
- VO of requestor
- Title/summary of request
- Full description of request
- Priority from requestor's point of view
3) Publicly viewable list of new (unprioritized) requests, i.e. a "wish list"
- Web page, etc.
- CRs should be given a unique ID
4) Regular review of requests by stakeholders (incl. ROC managers)
5) Publicly viewable list of prioritized requests (schedule of work)
- Web page, etc.
- Each item should include:
  - Level-of-effort estimates
  - Estimates of completion date
  - Indication of progress made
6) Clear contact point for questions, feedback, etc.
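Points 2 and 3 of the straw-man could be modelled roughly as below. This is a minimal sketch under stated assumptions, not a proposal for any of the listed tools: the class names, field names and the "CR-0001" ID format are all hypothetical; only the list of required fields comes from the template above.

```python
from dataclasses import dataclass, fields
from itertools import count


@dataclass
class ChangeRequest:
    """The fields a requestor must provide (point 2 of the straw-man)."""
    requestor_name: str
    requestor_email: str
    requestor_vo: str
    title: str
    description: str
    requestor_priority: str


def missing_fields(cr: ChangeRequest) -> list:
    """Return the names of template fields that were left empty."""
    return [f.name for f in fields(cr) if not str(getattr(cr, f.name)).strip()]


class WishList:
    """List of new, unprioritized requests, each with a unique ID (point 3)."""

    def __init__(self, prefix: str = "CR"):
        self.prefix = prefix          # hypothetical ID prefix
        self._ids = count(1)          # simple running number
        self.requests = {}

    def submit(self, cr: ChangeRequest) -> str:
        """Reject incomplete requests; otherwise store the CR and return its ID."""
        missing = missing_fields(cr)
        if missing:
            raise ValueError("incomplete change request, missing: " + ", ".join(missing))
        cr_id = f"{self.prefix}-{next(self._ids):04d}"
        self.requests[cr_id] = cr
        return cr_id


wish_list = WishList()
first_id = wish_list.submit(ChangeRequest(
    "A. Requestor", "requestor@example.org", "ops",
    "Example request", "Longer description of the request.", "medium",
))
print(first_id)
```

Rejecting incomplete forms at submission mirrors the single-point-of-entry rule: a request either enters the wish list with an ID or does not enter at all.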
-
12:05
Change of headings in RC-ROC reports 10m
After the appearance of the new "Availability report" heading in the RC/ROC reports, we have noticed that many sites are confused when filling in the different text boxes on both the RC and ROC reports. To make it clearer, we would like to suggest renaming the headings as follows.
ROC reports (the same suggestion applies to RC reports):
- Non-Functional Sites: SHALL WE REMOVE THIS FIELD? Is any ROC making use of it?
- Scheduled Downtime: LEAVE AS IT IS
- Major Operational Issues Encountered During the Reporting Period: CHANGE TO "Site Operational Report" and add as explanation: "It should cover what happened at the site during the week, e.g. interventions and upgrades without downtime."
- Points to Raise at the Operations Meeting: LEAVE AS IT IS
- Availability report: LEAVE AS IT IS and add as explanation: "This field should be filled in by sites with unavailability of 2 hours or longer."
- 12:15
-
12:25
AOB 5m
- SAM review
- ROCs to remind sites to sign up for the Operations Workshop in June