PuppetDB Outage post-mortem and Summary of Impacted Services
28 - online
21 - in person
Minutes summary
- For DBOD, there were two issues to be tackled. The first one was restoring the databases for Puppet manually, and the second was fixing the issue with ProxySQL.
- The presented timeline of the incident's impact on Hadoop contained a mismatch in times, which has since been fixed.
- Not only LanDB but every application behind SSO could be affected, at least partially.
- The infrastructure appears to have been well prepared for some failures, but not for this particular type (empty fact). Had we been ready for it, simply failing the Puppet run would have been sufficient. This safeguard has now been implemented.
- Attendees were reminded of the security concerns around relying on the data stored in PuppetDB.