Question
How to remove or reset a specific service graph (a service's stored performance data)?
Answer
Removing a service check from a host does not remove its related graphs, because PNP has no way to check whether the parent check still exists. Cleaning up is usually simple, but steps for more stubborn cases are also provided:
1) The Basic Way
Removing all stored graphing data (performance data, or perfdata) from a service or host check is normally a trivial operation: just remove the files related to the check. These are found in the following directory on the OP5 server, replacing ${hostname} with the name of the relevant host:
/opt/monitor/op5/pnp/perfdata/${hostname}/
The above directory contains two files for every check with processed perfdata, one with an rrd extension and one with an xml extension. The RRD file contains the perfdata used to create the graphs, and the XML file contains metadata used by the npcd system daemon. Both files may be removed.
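The basic removal can be sketched as a small shell helper. This is only an illustration: the function name remove_check_files and the PERFDATA_ROOT variable are hypothetical, and the default path is the directory given above.

```shell
#!/bin/sh
# Hypothetical helper, not part of OP5. PERFDATA_ROOT defaults to the
# perfdata directory named in this article and can be overridden.
PERFDATA_ROOT="${PERFDATA_ROOT:-/opt/monitor/op5/pnp/perfdata}"

# remove_check_files HOSTNAME CHECKNAME
# Deletes the .rrd and .xml files of a single check. Both names must be
# given exactly as they appear in the file system.
remove_check_files() {
    host_dir="$PERFDATA_ROOT/$1"
    for ext in rrd xml; do
        file="$host_dir/$2.$ext"
        if [ -f "$file" ]; then
            rm -- "$file" && echo "removed $file"
        else
            echo "not found: $file" >&2
        fi
    done
}
```

To find the exact check name first, list the host's directory (for example, ls "$PERFDATA_ROOT/myhost/") and then call remove_check_files myhost PING, where myhost and PING are placeholder names.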
2) The Advanced Way
Sometimes removing the files as explained above does not remove all perfdata. To reduce disk I/O, the RRD files are not updated in real time. Instead, the updates are sent to a system daemon called rrdcached, which holds them in a volatile cache and writes them to the RRD files on disk periodically, at a varying interval. A flush also occurs every time a new graph is generated, such as when looking at a service graph in the web interface. Even if the RRD file has been removed, it might reappear when rrdcached flushes its cache. This could also lead to old and unexpected perfdata being written alongside new perfdata.
The following steps describe safe removal of all perfdata from a check, including any cached data:
- The npcd system daemon processes all new, raw perfdata and sends it to the rrdcached daemon. Pausing this process ensures that no race condition can occur between the flush and removal steps below. The process can be temporarily paused by shutting down npcd using the following command:
# service npcd stop
- Force rrdcached to flush any cached perfdata belonging to a specific RRD file by sending it a FLUSH command. This is done using inter-process communication (IPC) via rrdcached's UNIX socket file, which the unixcat command makes simple. Make sure to substitute ${hostname} and ${checkname} with the host and check names as they appear in the file system:
# echo FLUSH /opt/monitor/op5/pnp/perfdata/${hostname}/${checkname}.rrd | unixcat /opt/monitor/var/rrdtool/rrdcached/rrdcached.sock
- It is now safe to remove the RRD and XML files. As with the previous step, make sure to substitute the ${hostname} and ${checkname} strings:
# rm /opt/monitor/op5/pnp/perfdata/${hostname}/${checkname}.{rrd,xml}
- Start the npcd daemon again to make sure that any new perfdata will be processed:
# service npcd start
IMPORTANT: If npcd is not started again, graphs will stop receiving new data and will look increasingly empty. Unprocessed perfdata will also pile up on the system, reaching gigabytes in large setups.
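The four steps above can be combined into one function so that npcd is not accidentally left stopped. This is a minimal sketch, not an official OP5 tool: the function name reset_check_graph and the PERFDATA_ROOT and RRDCACHED_SOCK variables are hypothetical, and their defaults are the paths used in the steps above. It must be run as root.

```shell
#!/bin/sh
# Sketch only. Defaults are the paths from the steps above; both
# variables are illustrative and can be overridden.
PERFDATA_ROOT="${PERFDATA_ROOT:-/opt/monitor/op5/pnp/perfdata}"
RRDCACHED_SOCK="${RRDCACHED_SOCK:-/opt/monitor/var/rrdtool/rrdcached/rrdcached.sock}"

# reset_check_graph HOSTNAME CHECKNAME
# Names must be given exactly as they appear in the file system.
reset_check_graph() {
    base="$PERFDATA_ROOT/$1/$2"

    # Stop npcd so that no new perfdata reaches rrdcached during cleanup.
    service npcd stop || return 1

    # Ask rrdcached to write out anything it still holds for this file.
    echo "FLUSH $base.rrd" | unixcat "$RRDCACHED_SOCK" \
        || echo "warning: flush failed for $base.rrd" >&2

    # Remove the RRD and XML files (no error if they are already gone).
    rm -f -- "$base.rrd" "$base.xml"

    # Start npcd again whether or not the flush succeeded.
    service npcd start
}
```

A call such as reset_check_graph myhost PING (placeholder names) then performs the stop, flush, remove, and start sequence in order.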