Symptoms
- Downtime is set on a check and this downtime is visible in the UI
- Downtime exists for the check in the runtime.nagios_scheduleddowntime database table
- Notifications are sent for the check despite the above
Cause: disparity in downtime state between the database and datastore
When downtime is scheduled for a check or host, it is set in two places: the runtime.nagios_scheduleddowntime database table and the datastore of the collector(s) monitoring the host.
Downtime displayed in the UI comes from the runtime.nagios_scheduleddowntime table. However, it is the downtime stored in the collector datastore that actually suppresses notifications.
If the datastore on the collector(s) monitoring the host is reinstalled or otherwise corrupted, the downtime is ‘forgotten’ by the collector and so is not enforced.
Collector datastore corruption can happen when collectors are redeployed to different clusters.
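The disparity described above can be modelled as two sets of downtime records: one read by the UI (the database) and one used to suppress notifications (the collector datastore). The Python sketch below is illustrative only. The Downtime record, its fields, and the helper functions missing_on_collector and is_suppressed are hypothetical and do not reflect the actual schema of runtime.nagios_scheduleddowntime or the collector datastore format. It also assumes that host-level downtime (check set to None) covers every check on the host.

```python
from dataclasses import dataclass
from typing import Optional, Set


# Hypothetical record shape, for illustration only; the real table columns
# and collector datastore format are not described in this article.
@dataclass(frozen=True)
class Downtime:
    host: str
    check: Optional[str]  # None = host-level downtime (assumed to cover all checks)
    start: int  # epoch seconds, inclusive
    end: int  # epoch seconds, exclusive


def missing_on_collector(database: Set[Downtime], collector: Set[Downtime]) -> Set[Downtime]:
    """Downtime the UI shows (database) that the collector does not know about."""
    return database - collector


def is_suppressed(collector: Set[Downtime], host: str, check: str, now: int) -> bool:
    """Only downtime held in the collector datastore suppresses notifications."""
    return any(
        d.host == host and d.check in (check, None) and d.start <= now < d.end
        for d in collector
    )


# Example: the collector datastore was wiped by a reinstall.
database = {Downtime("web01", "HTTP", 1000, 2000)}
collector = set()  # downtime 'forgotten' by the collector

print(missing_on_collector(database, collector))  # the UI still shows this downtime
print(is_suppressed(collector, "web01", "HTTP", 1500))  # notification is NOT suppressed
```

In this model the symptom appears exactly when missing_on_collector returns a non-empty set: the UI still lists the downtime, but is_suppressed is False for the affected check.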
Solution: recreate_downtime
Using the recreate_downtime support script, downtime stored in the database can be recreated in the datastores of the collectors.
See our page on using the recreate_downtime script.