With Geneos you can interrogate a shared memory segment using the RMC plugin. By default, the plugin runs an rmcTest and, only if that succeeds, follows it with an rmcGet.
Why would you skip the rmcTest?
If the shared memory segment is faulty, the rmcTest fails and the plugin never reaches the rmcGet, so no data is retrieved.
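The gating described above can be modelled in a few lines. This is a hypothetical sketch of the control flow only: `run_sample`, `faulty_test` and `get_data` are illustrative stand-ins, not functions of the RMC plugin or of Geneos. It shows why a failing test blocks retrieval, and what skipping the test changes.

```python
from typing import Callable, Optional


def run_sample(
    rmc_test: Callable[[], bool],
    rmc_get: Callable[[], dict],
    skip_test: bool = False,
) -> Optional[dict]:
    """Hypothetical model of the default flow: rmcTest gates rmcGet."""
    if not skip_test and not rmc_test():
        # A failed test (e.g. against a faulty segment) means rmcGet never runs.
        return None
    return rmc_get()


# Hypothetical stand-ins for the two plugin steps.
def faulty_test() -> bool:
    return False


def get_data() -> dict:
    return {"MemoryStats/time": "12:00:05"}


print(run_sample(faulty_test, get_data))                  # None
print(run_sample(faulty_test, get_data, skip_test=True))  # {'MemoryStats/time': '12:00:05'}
```

With the test in place, one bad result hides all the data. Skipping it moves the responsibility for detecting a dead segment elsewhere, which is where the timing record comes in.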
What could make the shared memory faulty?
The most common cause we have seen is an ADH or ADS that has been restarted manually with a kill command. When the proper stop and start scripts are used, the shared memory segment for the process is cleared out. If not, the segment remains and is re-attached to the new process instance in a faulty state. The best check before starting the ADH or ADS is to run the ipcs -a command and make sure that the semaphore key (50 for the ADH, 52 for the ADS) no longer appears in the table.
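The ipcs check above can be scripted before a restart. The sketch below is a minimal, hedged example. It assumes the Linux (util-linux) layout of ipcs output, with section banners such as "------ Semaphore Arrays --------" and hexadecimal keys in the first column. It also assumes that the keys "50" and "52" from the text appear as 0x00000050 and 0x00000052. That mapping is a guess, so verify it against your own ipcs table. The sample output and the names `semaphore_keys`, `live_semaphore_keys`, `ADH_KEY` and `ADS_KEY` are illustrative, not part of Geneos.

```python
import subprocess
from typing import Set


def semaphore_keys(ipcs_output: str) -> Set[int]:
    """Collect the keys listed under the semaphore section of `ipcs -a` output."""
    keys: Set[int] = set()
    in_semaphores = False
    for line in ipcs_output.splitlines():
        if line.startswith("------"):
            # Banner lines mark each section; only the semaphore one matters here.
            in_semaphores = "Semaphore" in line
            continue
        fields = line.split()
        # Skip the column header ("key semid ...") and blank lines.
        if in_semaphores and fields and fields[0].startswith("0x"):
            keys.add(int(fields[0], 16))
    return keys


def live_semaphore_keys() -> Set[int]:
    """Run `ipcs -a` on this host and parse its semaphore section."""
    result = subprocess.run(["ipcs", "-a"], capture_output=True, text=True, check=True)
    return semaphore_keys(result.stdout)


# Illustrative output only, not captured from a real host.
SAMPLE = """\
------ Semaphore Arrays --------
key        semid      owner      perms      nsems
0x00000050 32768      rmds       600        1
"""

ADH_KEY = 0x50  # assumption: the ADH key "50" is displayed as 0x00000050
ADS_KEY = 0x52  # assumption: the ADS key "52" is displayed as 0x00000052

print(ADH_KEY in semaphore_keys(SAMPLE))  # True: a stale ADH semaphore is still present
print(ADS_KEY in semaphore_keys(SAMPLE))  # False
```

Parsing is kept separate from running the command so the check can be tested against saved output. On other Unix flavours the ipcs layout differs, and the parser would need adjusting.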
Where does the timing record fit in?
If you are running a box that is particularly prone to faulty shared memory, you can replace the rmcTest with heartbeat detection; this is what the timing record is for. The ADH and the ADS both publish $host.$instance.<component>.admin.MemoryStats/time every 5 seconds, and as long as we continue to read this heartbeat, we assume that the memory segment is alive and that the other values it holds are current.
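The heartbeat idea can be sketched as a staleness check on the timing value. This is not how the RMC plugin implements it. It is a minimal illustration that assumes the segment is alive while the MemoryStats/time value keeps changing. The 15-second tolerance (three missed 5-second updates) is an assumption, as are the names `timing_record` and `Heartbeat` and the example host and instance.

```python
from typing import Optional


def timing_record(host: str, instance: str, component: str) -> str:
    """Build the heartbeat record name in the $host.$instance.<component> pattern."""
    return f"{host}.{instance}.{component}.admin.MemoryStats/time"


class Heartbeat:
    """Treat the segment as alive while the timing value keeps changing."""

    def __init__(self, max_age: float = 15.0) -> None:
        # Assumption: tolerate three missed 5-second updates before declaring it stale.
        self.max_age = max_age
        self.last_value: Optional[str] = None
        self.last_change: Optional[float] = None

    def observe(self, value: str, now: float) -> None:
        """Record a sampled value; only a new value counts as a beat."""
        if value != self.last_value:
            self.last_value = value
            self.last_change = now

    def alive(self, now: float) -> bool:
        """True if the value changed within the last max_age seconds."""
        return self.last_change is not None and now - self.last_change <= self.max_age


hb = Heartbeat()
hb.observe("12:00:00", now=0.0)
hb.observe("12:00:05", now=5.0)
print(hb.alive(now=10.0))          # True
hb.observe("12:00:05", now=25.0)   # same value again: no new beat
print(hb.alive(now=25.0))          # False: 20 s since the last change
print(timing_record("myhost", "rmds1", "ads"))  # hypothetical host/instance/component
```

Comparing values instead of trusting any successful read means a segment that is still readable but frozen is reported as dead.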