Related to:
A service check suddenly no longer performs a check or is returning incorrect output.
Problem
- A service check has stopped running checks or the service shows incorrect or unexpected output such as "UNKNOWN."
Possible Cause(s)
- The service configuration was changed with missing or incorrect arguments.
- The check_command used has changed.
- An upgrade or migration was done.
- A monitoring agent is down or has changed.
- Missing files for a custom plugin.
Possible Solution(s)
General checking.
If the UI shows the error "Changes could not be saved", try undoing the changes by clicking the "Undo" button. If you are prompted for a "Complete re-import", click it. Then redo your change and save again.
An upgrade or migration was done.
If you recently upgraded your system, check that core services for the Monitor environment are running. Core services include Naemon and Merlin. The following commands can help determine the status of these services:
# systemctl status naemon
# systemctl status merlind
To start or restart these services, replace "status" with either "start" or "restart." As an alternative, you can also use the command below to start or restart most core components:
# mon start
# mon restart
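The status checks above can be scripted so that both services are checked in one pass. The following is a minimal sketch, assuming a systemd-based host; the function name check_core_services is illustrative and not part of OP5 Monitor.

```shell
# Sketch: report whether each core OP5 Monitor service is active.
# check_core_services is an illustrative name, not an OP5 command.
check_core_services() {
    failed=0
    for svc in naemon merlind; do
        # "is-active --quiet" prints nothing and returns 0 only if the unit is active.
        if systemctl is-active --quiet "$svc"; then
            echo "$svc: running"
        else
            echo "$svc: NOT running"
            failed=1
        fi
    done
    # Non-zero exit status if at least one service is not active.
    return "$failed"
}
```

Run check_core_services as root after sourcing the snippet. A non-zero exit status means at least one of the services needs to be started or restarted.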
If you recently migrated your environment from an older OS or Monitor version, check that the IP address or hostname of your new server matches the old server's IP address or hostname (if you are reusing the same details). When the new environment runs checks against other hosts, the monitoring agents on those hosts may expect communication from a different IP address or hostname. See the next section for information on checking monitoring agents on other hosts.
Check on the monitoring agent.
If you are using monitoring agents on your hosts such as NRPE or SNMP to run checks, double-check that the Monitor environment is allowed to communicate with the host in question.
See links in the "Related Articles" section below for more information on monitoring agent setup.
For Linux hosts running NRPE, locate the file /etc/nagios/nrpe.cfg of the host running the agent (i.e., the host being monitored) and check that the allowed_hosts field includes your OP5 Monitor environment's IP address or hostname.
[sandbox@centos7 ~]# grep "allowed_hosts" /etc/nagios/nrpe.cfg
allowed_hosts=127.0.0.1,::1,op5-master-system
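The same check can be wrapped in a small helper that answers yes or no for a specific address. This is only a sketch: nrpe_allows is a hypothetical function name, it matches entries as exact strings (it does not interpret network ranges or resolve hostnames), and if the file contains several allowed_hosts lines it reads only the last one, which is an assumption.

```shell
# Sketch: succeed if HOST is listed in the allowed_hosts entry of an nrpe.cfg file.
# nrpe_allows is a hypothetical helper; it does exact string matching only.
nrpe_allows() {
    cfg=$1
    host=$2
    # Take the last uncommented allowed_hosts line, drop the key and any spaces.
    list=$(grep -E '^[[:space:]]*allowed_hosts[[:space:]]*=' "$cfg" | tail -n 1 \
           | sed -e 's/^[^=]*=//' -e 's/[[:space:]]//g')
    # Wrap list and host in commas so only whole entries match.
    case ",$list," in
        *",$host,"*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, on the monitored host: nrpe_allows /etc/nagios/nrpe.cfg op5-master-system && echo "allowed". Remember that NRPE must be restarted after nrpe.cfg is edited for the change to take effect.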
For Windows hosts running NSClient, you will need to check the registry settings. The key for NSClient settings is [HKEY_LOCAL_MACHINE\SOFTWARE\NSClient++\settings\default]. Edit the setting "allowed hosts" to specify the IP address or hostname of the Monitor environment.
For Linux hosts running SNMP, check the files /var/lib/net-snmp/snmpd.conf and/or /etc/snmp/snmpd.conf, depending on whether SNMP v2c or v3 is used.
For SNMP v2c, ensure that the correct community string is used. For SNMP v3, ensure that the user and credentials defined in the SNMP setup are the ones used on your service. Also check that the Monitor environment is allowed to communicate with the host and has sufficient access to the data in the host's SNMP tree.
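One way to confirm the SNMP settings is to query the host from the Monitor server with the Net-SNMP command-line tools. The sketch below assumes snmpget is installed on the Monitor server; the function names are illustrative, and the v3 example assumes the authPriv security level with SHA authentication and AES privacy, which may differ from your agent's setup.

```shell
# Sketch: request sysDescr.0 (OID 1.3.6.1.2.1.1.1.0) from a host.
# snmp_probe_v2c and snmp_probe_v3 are illustrative names.
snmp_probe_v2c() {
    host=$1
    community=$2
    # -t 5: five-second timeout, -r 1: one retry.
    snmpget -v 2c -c "$community" -t 5 -r 1 "$host" 1.3.6.1.2.1.1.1.0
}

snmp_probe_v3() {
    host=$1
    user=$2
    auth_pass=$3
    priv_pass=$4
    # Assumption: authPriv with SHA/AES; adjust to match the agent's configuration.
    snmpget -v 3 -l authPriv -u "$user" -a SHA -A "$auth_pass" \
            -x AES -X "$priv_pass" -t 5 -r 1 "$host" 1.3.6.1.2.1.1.1.0
}
```

A timeout suggests that the host is unreachable, the agent is not listening, or the community string or access rules reject the Monitor server. An authentication error on v3 points to the user or credentials.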
Checking the configuration of your service
In many cases, the service's configuration page can point you to where the problem lies. It provides a "Test this check" function. If the check returns an error, the test shows the full error text and any leads regarding the problem.
For example, suppose the check is SNMP-based (SNMP is used as the monitoring agent) and the error indicates that there is no response from the host. In this case, you would refer back to the host's SNMP configuration to see whether OP5 Monitor is allowed to communicate with the host and has access to the SNMP tree.
Another common configuration error is incorrect check_command_args options. If you are unsure about the syntax or arguments that a check_command is expecting, you can click on the "Syntax help" button of the service's configuration page for guidance.
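When reviewing check_command_args, it can help to see how the argument string is split. In Naemon, arguments are separated by "!" and passed to the command definition as $ARG1$, $ARG2$, and so on. The helper below is a hypothetical sketch that prints this mapping; it does not handle escaped "!" characters.

```shell
# Sketch: show how a "!"-separated argument string maps to $ARG1$, $ARG2$, ...
# show_check_args is a hypothetical helper; escaped "\!" is not handled.
show_check_args() {
    printf '%s\n' "$1" \
        | awk -F'!' '{ for (i = 1; i <= NF; i++) printf "$ARG%d$=%s\n", i, $i }'
}
```

For example, show_check_args 'public!80!90' lists three arguments. Comparing that list with the order shown by "Syntax help" can reveal a missing or misplaced argument.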
Review custom plugin setup.
If you run custom plugins and have recently upgraded or migrated, it is possible that your custom plugins were not migrated. Double-check your /opt/plugins/custom directory to see if your plugins are present. If not, transfer or migrate them over. You can include these custom files in the op5-backup process by following the details in this article.
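If you have a list of the plugin file names from the old server, the check can be scripted. The following is a minimal sketch; check_custom_plugins is an illustrative name, and the executable-bit check is an addition based on the fact that a plugin must be executable to run.

```shell
# Sketch: report expected plugins that are missing or not executable.
# check_custom_plugins is an illustrative name; pass the directory first,
# then the plugin file names you expect to find there.
check_custom_plugins() {
    dir=$1
    shift
    status=0
    for name in "$@"; do
        if [ ! -f "$dir/$name" ]; then
            echo "missing: $name"
            status=1
        elif [ ! -x "$dir/$name" ]; then
            echo "not executable: $name"
            status=1
        fi
    done
    return "$status"
}
```

For example: check_custom_plugins /opt/plugins/custom check_foo check_bar, where check_foo and check_bar are placeholder names for your own plugins.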
Related Articles
- How to configure NSClient++ in the Windows Registry
- How to configure NSClient++ from the Windows command prompt
- How to configure a Linux server for SNMP monitoring
- How to install NRPE agent on CentOS and RHEL
- How to install NRPE agent on Debian and Ubuntu?
If Issue Persists
- Please contact our Client Services team via the chat box available on any of our websites or by email at support@itrsgroup.com
- Make sure you provide us with:
- Log files or configuration files of Naemon and/or the monitoring agent used
- Screenshots of the problem
- The output of running "Test this check" on the service configuration page
- Software versions, both for ITRS and others
- Any troubleshooting steps from this article that you have already tried.