Problem:
The newly added host repeatedly transitions to a DOWN state, with its service checks returning CRITICAL. However, the host remains reachable and responds to ping requests.
Rechecking the host temporarily resolves the issue: the host returns to an UP state, but shortly afterwards it reverts to DOWN and the cycle repeats.
This behavior suggests an intermittent or configuration-related issue rather than an actual connectivity problem.
Possible cause:
If the host is reachable and operational (for example, you can access it via SSH or other means) but continues to appear as DOWN in Opsview, the issue may be related to the Host Check Command in use. Different devices and environments can respond differently to standard ICMP (ping) checks. In some cases, firewalls, network policies, rate-limiting, or latency can cause ICMP to be blocked or delayed, resulting in false DOWN states even though the host is actually available.
Possible solution:
Update the Host Check Command to one that is more appropriate for the device or environment being monitored. For example, you may choose a TCP-based host check (such as checking a specific open port) or another protocol that better reflects the actual availability of the host from the monitoring server’s perspective. This helps ensure the check aligns with your network policies, security restrictions, and expected latency.
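To illustrate what a TCP-based availability test does, here is a minimal Python sketch. It is not the Opsview check itself; the function name `tcp_host_is_up` and its default timeout are assumptions made for this example. It can also be run by hand from the monitoring server or collector to confirm that a given port is reachable before you switch the Host Check Command.

```python
import socket


def tcp_host_is_up(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    Hypothetical helper for illustration only; it is not part of Opsview.
    """
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

For example, `tcp_host_is_up("192.0.2.10", 22)` would report whether SSH on that (placeholder) address accepts connections from the machine where the script runs.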
Additional option (custom Host Check Command):
If the available host check commands do not meet your requirements, you may create a custom Host Check Command by developing or using a custom plugin. This allows you to tailor the availability check to your specific environment and monitoring needs, ensuring more accurate host status reporting.
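As a rough sketch of what such a custom plugin could look like, the Python example below assumes the Nagios-style plugin convention: print a single status line and return exit code 0 (OK), 1 (WARNING), 2 (CRITICAL), or 3 (UNKNOWN). The script name `check_tcp_host.py`, the function names, and the thresholds are all hypothetical; check the plugin documentation for your Opsview version before relying on any of it.

```python
import socket
import time

# Exit codes under the Nagios-style plugin convention (assumed here).
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_tcp(host, port, warn_s=1.0, crit_s=3.0):
    """Return (exit_code, status_line) based on how long a TCP connect takes.

    warn_s and crit_s are illustrative thresholds; crit_s doubles as the
    connection timeout.
    """
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=crit_s):
            elapsed = time.monotonic() - start
    except OSError as exc:
        return CRITICAL, f"CRITICAL - cannot connect to {host}:{port} ({exc})"
    if elapsed >= warn_s:
        return WARNING, f"WARNING - connected to {host}:{port} in {elapsed:.3f}s"
    return OK, f"OK - connected to {host}:{port} in {elapsed:.3f}s"


def main(argv):
    """Expect 'HOST PORT'; print one status line and return the exit code."""
    if len(argv) != 2 or not argv[1].isdigit() or not 0 < int(argv[1]) < 65536:
        print("UNKNOWN - usage: check_tcp_host.py HOST PORT")
        return UNKNOWN
    code, line = check_tcp(argv[0], int(argv[1]))
    print(line)
    return code


# When saved as a script, the entry point would be:
#     import sys; sys.exit(main(sys.argv[1:]))
```

Keeping the check logic in `check_tcp` and the argument handling in `main` makes it straightforward to swap in a different probe (for example an application-level request) without changing how the result is reported.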
Suggested improvement / best practice:
Before changing the Host Check Command, it’s also worth verifying:
- Whether ICMP (ping) is allowed between the monitoring server or collector and the host.
- Whether any firewall rules, network ACLs, or security policies block or rate-limit ICMP.
- Whether the timeout and retry settings of the current host check are adequate. Increasing them can sometimes resolve false DOWN states caused by latency.
This approach helps avoid unnecessary changes while ensuring host availability is monitored accurately.
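The effect of the retry setting can be sketched in a few lines of Python. This is only an illustration of the idea, not how Opsview implements its check scheduling; `state_after_retries`, `max_attempts`, and `retry_interval_s` are names invented for the example.

```python
import time
from typing import Callable


def state_after_retries(
    probe: Callable[[], bool],
    max_attempts: int = 3,
    retry_interval_s: float = 0.0,
) -> str:
    """Return "UP" on the first successful probe, "DOWN" only if every attempt fails."""
    for attempt in range(1, max_attempts + 1):
        if probe():
            return "UP"
        if attempt < max_attempts:
            # Wait before the next attempt so a brief delay or drop can clear.
            time.sleep(retry_interval_s)
    return "DOWN"
```

With a probe that fails twice and then succeeds, a single attempt yields DOWN while three attempts yield UP, which is why a host on a slow or rate-limited path may flap until the retry count or timeout is raised.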
Related article(s):
- Hosts, Host Groups, and Host Check Commands: docs.itrsgroup.com/docs/opsview/cloud/configuration/hosts-groups/host/index.html#host-check-command
If you need further help:
- Please contact our support team via the chat service box on any of our websites or raise a support request.
- Make sure you provide us with:
- Background of the issue or request.
- Use cases, requirements, business impact, etc.
- Encountered error messages.
- Log files or diagnostic files.
- Screenshots.
- Any other important information relevant to your inquiry.