Related to:
Netprobe crashing, disconnecting, empty dataviews, Gateway-probedata shows down probes
Problem
- You notice a Netprobe has disconnected from the Gateway and monitored data is missing
- You have set up a monitor-of-monitors and it reports that a Netprobe is down in the Gateway-Probe dataview
Possible Cause(s)
Root Cause 1: The server the Netprobe is running on has crashed or been disconnected from the network
Root Cause 2: The Netprobe has triggered a Memory Protection restart
Root Cause 3: The Netprobe has terminated because of an error condition
Root Cause 4: The Netprobe depends on many external APIs and libraries for its plugins, such as database libraries, middleware, Java, etc. A fault in one of these can cause the Netprobe to fail as well
Root Cause 5: A shortage of resources on the server prevents the Netprobe from continuing to run. Typically this is a disk space issue, but it can also be triggered by a memory shortage, where the Operating System terminates processes based on a selection algorithm that the Netprobe has little control over
Root Cause 6: AV (Antivirus) or SELinux is enforcing policies that interfere with the Netprobe
Possible Solution(s)
- Solution Root Cause 1 - Establish if the server the Netprobe normally runs on is up and accessible
- If you have a Gateway Probe plugin configured in your Gateway then you can check the connectionStatus column in its dataview. Normally it says Up, and any other status reflects a problem. The full list of values is given in the documentation for the plugin and is reproduced here:
- Unknown
- Up
- Down
- Unreachable
- Rejected
- Removed
- Suspended
Note: The connection state 'Unreachable' indicates that the probe is unreachable, but not necessarily the server hosting it. For example, the probe might be unresponsive or its port might be in use.
This means you cannot use the status alone to judge the health of the server itself. If the connectionStatus is Down or Unreachable, check both the server health and the Netprobe process; the other non-Up states point to a different issue
- Solution Root Cause 2 - If the log file says something like "ERROR: NetProbe Restart Message" then this is most likely the Netprobe Memory Protection feature protecting your server from a potential sudden increase in memory usage by the Netprobe. The linked document explains more, including how to monitor and, if necessary, tune the settings.
- Solution Root Cause 3 - If you find the server is OK but the Netprobe process itself is not running then first check the Netprobe log file. The last entries in the log should give a hint as to the cause of the process termination, which could be a crash but may also be intentional, such as a SIGTERM signal from a kill command or, on Windows, the Service being stopped.
- Solution Root Cause 4 - Check the version of the Netprobe (and other Geneos components) to ensure that you are up-to-date with releases and system requirements. We operate in a continuous release cycle with fixes and improvements.
- Solution Root Cause 5 - Check the server in general to ensure it is "healthy" and has enough disk space, memory, file descriptors, etc.
- Solution Root Cause 6 - Make sure your Antivirus solution or SELinux has been configured with the correct exclusions or policies to allow the operation of the Netprobe.
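The first checks above can be scripted as a rough triage aid. The following Python sketch is illustrative only and is not part of any Geneos tooling: the function names are hypothetical, the status values are the ones listed above, and the log markers are assumptions (the restart text is taken from the example above, while the "SIGTERM" wording may differ in real log files).

```python
import shutil

# connectionStatus values as listed in this article.
KNOWN_STATUSES = {
    "Unknown", "Up", "Down", "Unreachable", "Rejected", "Removed", "Suspended",
}


def next_step_for_status(status):
    """Map a connectionStatus value to the follow-up suggested in the article."""
    if status not in KNOWN_STATUSES:
        raise ValueError(f"unexpected connectionStatus: {status!r}")
    if status == "Up":
        return "no action: probe is connected"
    if status in ("Down", "Unreachable"):
        return "check server health and the Netprobe process"
    return "investigate another issue"


def classify_log_tail(lines):
    """Return hints found in the last log lines (marker strings are assumptions)."""
    hints = []
    for line in lines:
        if "NetProbe Restart Message" in line:
            hints.append("memory protection restart")
        elif "SIGTERM" in line:  # assumed wording; real logs may differ
            hints.append("terminated by signal")
    return hints


def free_disk_fraction(path="."):
    """Fraction of free space on the filesystem that holds `path`."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total


if __name__ == "__main__":
    print(next_step_for_status("Unreachable"))
    print(classify_log_tail(["ERROR: NetProbe Restart Message"]))
    print(f"free disk: {free_disk_fraction('.'):.0%}")
```

The disk check covers only one of the resources mentioned for Root Cause 5; memory and file descriptor limits need platform-specific checks.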
Related Articles
If Issue Persists
- Please contact our Client Services team via the chat service box available on any of our websites or via email to support@itrsgroup.com
- Make sure you provide the below information to us:
- Netprobe logs
- Core or crash dumps, if any exist. The logs may provide enough information to narrow down any known issues
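Gathering these files into a single archive can make them easier to attach. The sketch below is one possible way to do that in Python; the function name and the example file paths are hypothetical, so substitute the actual locations of your Netprobe log and any dump files.

```python
import tarfile
from pathlib import Path


def bundle_support_files(candidates, archive_path):
    """Add every existing file in `candidates` to a gzip-compressed tar archive.

    Missing files are skipped. Returns the names of the files included.
    """
    included = []
    with tarfile.open(archive_path, "w:gz") as archive:
        for candidate in candidates:
            path = Path(candidate)
            if path.is_file():
                # Store only the file name so the archive has no absolute paths.
                archive.add(str(path), arcname=path.name)
                included.append(path.name)
    return included


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    files = ["netprobe.log", "core"]
    print(bundle_support_files(files, "netprobe-support.tar.gz"))
```

Because files are stored under their base names, two inputs with the same file name from different directories would collide; rename one of them first if that applies.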