- ROD team update
- Nagios status
- Tier-1 update
Note that RAL was closed both Monday and Tuesday (29/30 August) of last week, so there was no Tier-1 representative at the last DTEAM meeting. During the long weekend services ran as normal. There were some intermittent SAM test failures (on the Atlas SRM and on the non-CREAM CE, CE06).
Over the weekend (Sunday 4th Sep) there were load issues on the Castor Atlas instance (MCTape service class). The Atlas FTS channels to RAL were reduced (in the end down to 25% of nominal values). These were raised back to 50% of nominal values on Monday, and to 100% this morning.
There was a failure of the RAL Site Access Router that broke network connectivity into RAL from 01:10 to 08:10 on the morning of Monday 5th September. The callout mechanisms that should have notified someone of this failure did not work, resulting in the long site outage. The problem was resolved when staff returned to work on Monday. This also made the GOCDB unavailable for the same time window.
We have an At Risk tomorrow on the LFC, FTS & 3D services while regular Oracle updates are applied. These are done in a rolling manner across the nodes so should not result in any downtime.
We have seen intermittent errors for the SAM tests on our one non-CREAM CE (lcgce06) for the last week or so. The cause is not yet understood.
- Security update
-- T2 issues
For how long have sites been using hyperthreading?
-- General notes.
We are reviewing GOCDB roles ahead of the NGI_UK move. Please check your site entries and report any needed updates to Jeremy by Wednesday this week.
BDII crashes - OpenLDAP versions.
- Tickets
Direct link: http://tinyurl.com/3jjnvca
If the direct link is not working, use the indirect link: https://ggus.eu/ws/ticket_search.php (select support unit 'ROC_UK/Ireland' and Creation Date 'Any')