If this happens, first check the gateway and netprobe logs for error messages or other clues. This article describes additional debug options that can be enabled.
Gateways and netprobes check each other with a heartbeat mechanism. The default interval is 70 seconds for netprobes and 75 seconds for gateways (both options can be adjusted in the Gateway Setup Editor if needed). If a reply is not received within this time, the connection is terminated and re-established.
If a netprobe becomes disconnected from the gateway, one possibility is that a particular sampler or plugin is taking a long time to process. The cause then needs to be narrowed down to that sampler; typical culprits include very large monitored files or complex regular expressions. In one case, a user monitored a remote NAS with a wildcarded filename, and problems started once the folder had grown to tens of thousands of files. At that point the "ls -l" command took minutes to return, so a problem was to be expected.
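To check whether a wildcarded path is slow to enumerate, as in the NAS case above, the listing can be timed outside of Geneos. The following is a minimal sketch, not a Geneos tool: the function name is hypothetical, and it only approximates what "ls -l" does by expanding the pattern and calling stat on every match.

```python
import glob
import os
import time


def time_wildcard_listing(pattern):
    """Return (match_count, elapsed_seconds) for expanding a wildcard pattern.

    Each match is also stat'ed, because "ls -l" stats every entry and that is
    often the slow part on a remote file system.
    """
    start = time.perf_counter()
    count = 0
    for path in glob.iglob(pattern):
        try:
            os.stat(path)
        except OSError:
            continue  # entry vanished or is unreadable; skip it
        count += 1
    return count, time.perf_counter() - start
```

For example, calling time_wildcard_listing("/mnt/nas/logs/*.log") (a hypothetical path) returns the number of matching files and the time taken. An elapsed time approaching the 70-second heartbeat interval would suggest that a sampler monitoring this path is a likely cause.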
The netprobe includes debug options that can be enabled to troubleshoot issues like this. In the Gateway Setup Editor, select Probes => (Probe Name) => Debug tab. Then press the Add New button and add SAMPLING with Setting *, as in the screen capture.
The netprobe log will then contain the start and end times of each sampler execution. This helps identify whether a particular sampler is taking a long time to finish.
<Mon Sep 21 12:45:03> DEBUG: SAMPLING:* IN : CPU[CPU]
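On a busy netprobe these entries can be hard to scan by eye, so a small script can pair start and end lines and report slow samplers. The sketch below rests on assumptions: the article only shows an "IN" line, so the matching end line is assumed to have the same format with "OUT" in place of "IN", and the function names, threshold and year argument are illustrative only. Adjust the pattern to the lines that actually appear in your log.

```python
import re
from datetime import datetime

# Matches the line format shown above. The "OUT" direction is an ASSUMPTION;
# only an "IN" line appears in the article.
LINE_RE = re.compile(
    r"^<(?P<ts>[^>]+)> DEBUG: SAMPLING:\S* (?P<dir>IN|OUT)\s*: (?P<sampler>.+?)\s*$"
)


def parse_ts(text, year):
    # Log timestamps carry no year, so one is supplied explicitly.
    return datetime.strptime(f"{year} {text}", "%Y %a %b %d %H:%M:%S")


def slow_samplers(lines, threshold_seconds=10.0, year=2000):
    """Return (slow, pending).

    slow    -- list of (sampler, elapsed_seconds) at or above the threshold
    pending -- sorted names of samplers with an IN line but no OUT line yet
    """
    started = {}
    slow = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # not a SAMPLING debug line
        ts = parse_ts(m["ts"], year)
        name = m["sampler"]
        if m["dir"] == "IN":
            started[name] = ts
        elif name in started:
            elapsed = (ts - started.pop(name)).total_seconds()
            if elapsed >= threshold_seconds:
                slow.append((name, elapsed))
    return slow, sorted(started)
```

A sampler that appears in the pending list at the time of a disconnection, or whose elapsed time is close to the heartbeat interval, is a good candidate for further investigation.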
After the sampler is identified, the gateway configuration should be reviewed for any issues with that sampler.