This article walks you through using an event handler in OP5 Monitor to restart services, processes, and daemons that have stopped responding. We will focus on Windows targets (hosts) using the NSClient++ agent, also known as NSCP.
Our example will restart Microsoft's Internet Information Services (IIS) on a Windows server. Rather than make you bounce from link to link, we will cover all the steps in order, from installing the NSCP agent in OP5 mode to getting automated confirmation that the process restarted. You can of course use event handlers to initiate any kind of action upon a state change of a host or a service, such as writing an entry to a log or power cycling an appliance server. Once you are comfortable with the main steps, you can review the more advanced steps in the section below to add support for additional arguments from the OP5 server side.
Please note: This document is in transition -
- NSCP has improved dramatically since this article's first publication in 2013. Thus we are revising and simplifying the steps to match its new configuration approach and modern implementation for multiple scripting languages. You can read a lot more at the NSCP article about external scripts.
- We also provide more Unix-centric steps [in an article to be linked later]. However, NSCP also supports Linux and has a better authentication setup than the traditional NRPE agent. We will therefore improve this document over time to include both Linux and Windows steps. OP5 would like its customers to feel as comfortable as possible with NSCP as their agent of choice, as our preferred agent, and as the simplest way to get large and small server farm deployments underway and in compliance.
Overview - the three main parts
Upon a state change of a host or service an event handler can be executed. Every host and service object has three related advanced settings that control this behavior:
event_handler - the command to call
event_handler_args - the arguments to pass along to the command
event_handler_enabled - the on-off switch for the event handler
The event handler that you define will be executed whenever a host or service state change occurs. An event handler is a command just like your other check commands: a configuration object with a name and a command_line that may be configured to handle arguments. The typical event handler is a reference to a script.
The script that the event handler calls should be able to perform more than one action, depending on which kind of state change just occurred. It's a good idea to include logging in your event handler script. A typical action performed by an event handler script is to call an NRPE (Nagios Remote Plugin Executor) command on a remote server. The remote command can, for example, execute a local script that restarts a service or process.
The IIS example
In this example we assume that we already have a service configured that tests the availability of a website. If this service goes critical, we want to use an event handler to restart the IIS service on the remote server.
Important files, commands, and addresses
|Windows target script||restartiis.bat|
|Event handler script||restart_winsvc.sh|
|NRPE command name||restart_iis|
|Windows host (target server)||192.168.1.8|
On the Windows target
First we script and test the restart action for the IIS service.
- Create a script that stops and starts the IIS service. Save your script to the following path and filename:
C:\Program Files\NSClient++\scripts\custom\. This script must meet the standards for a Nagios plugin response, such as including an explicit 'OK' reply upon completion and a return value. Here is an example that meets the NRPE and OP5 criteria:
- Create or edit
- Add the modules section just as below to turn on the External Scripts module in NSCP, then add a reference like the one below to create a tag that holds the NSCP-based path for the batch script:
- Restart the NSClient++ system service to reload the configuration.
- Test your restart action end-to-end from the command line interface at your OP5 server (via ssh or directly at console):
/opt/plugins/check_nrpe -H 192.168.1.8 -c restart_iis
- If your test successfully restarted the IIS service, you can now continue with the scripting and configuration on the OP5 Monitor server.
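When testing from the command line, it can help to look at the exit code as well as the text output, because Nagios-style plugins report their state through the exit code (0 for OK, 1 for WARNING, 2 for CRITICAL, 3 for UNKNOWN). The snippet below is only a sketch of how you might do that; the `plugin_state` helper is a hypothetical name used for illustration and is not part of OP5 Monitor or NSCP.

```shell
# Map a plugin exit code to the conventional Nagios state name.
# (plugin_state is a hypothetical helper, used here for illustration only.)
plugin_state() {
    case "$1" in
        0) echo "OK" ;;
        1) echo "WARNING" ;;
        2) echo "CRITICAL" ;;
        *) echo "UNKNOWN" ;;
    esac
}

# Example (not run here): call the NRPE command, then report its state.
#   /opt/plugins/check_nrpe -H 192.168.1.8 -c restart_iis
#   echo "State: $(plugin_state $?)"
```

If the restart script on the Windows target follows the plugin conventions, a successful restart should come back as OK.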
On the OP5 Monitor server
First create a script to which your new event handler command can point. The script must be able to handle different state-types and have some kind of logging. An example follows:
This simple script executes an NRPE command on a remote server if it is called with a couple of arguments where the first one is CRITICAL and the second one is HARD. It logs an informational line plus the output from the execution to a log file.
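A minimal sketch of a script with this behavior might look like the following. It assumes the arguments arrive in the order state, state type, host address, NRPE command. The log file location and the `CHECK_NRPE` and `LOGFILE` overrides are assumptions made for this illustration, so adjust them to your own environment.

```shell
#!/bin/bash
# restart_winsvc.sh - sketch of an event handler that calls an NRPE command.
# Arguments: <state> <state type> <host address> <NRPE command>

# Assumed defaults; both can be overridden through the environment.
CHECK_NRPE="${CHECK_NRPE:-/opt/plugins/check_nrpe}"
LOGFILE="${LOGFILE:-/tmp/restart_winsvc.log}"   # assumed log location

handle_event() {
    local state="${1:-}" state_type="${2:-}" host="${3:-}" nrpe_command="${4:-}"
    local output status=0

    # Act only on a confirmed (HARD) CRITICAL state; ignore everything else.
    if [ "$state" != "CRITICAL" ] || [ "$state_type" != "HARD" ]; then
        return 0
    fi

    # Informational line first, then the output of the remote command.
    echo "$(date '+%Y-%m-%d %H:%M:%S') $host is $state/$state_type, running $nrpe_command" >> "$LOGFILE"
    output=$("$CHECK_NRPE" -H "$host" -c "$nrpe_command" 2>&1) || status=$?
    echo "$(date '+%Y-%m-%d %H:%M:%S') exit code $status: $output" >> "$LOGFILE"
}

handle_event "$@"
```

Keeping the logic in a function makes it easy to try the script by hand with different state combinations before wiring it into a command definition.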
Event handlers will still run during downtimes
If a service or host object is currently in downtime, this will not stop event handlers from running. If you don't want event handlers to fire for objects in downtime, this logic needs to be incorporated into the event handler script itself. For more information, see the command_name definition below.
Now place this script in
Change the permissions to 755:
chmod 755 /opt/plugins/custom/restart_winsvc.sh
Use Configure in OP5 Monitor and add a new "command" (Configure -> Commands).
Configure your new command like this:
|command_line:||$USER1$/custom/restart_winsvc.sh $SERVICESTATE$ $SERVICESTATETYPE$ $HOSTADDRESS$ restart_iis|
The macros above are just examples of what you may want to pass to your script. If you also want to check whether the object is in downtime, so that the script can exit immediately in that situation, you may want to include the $HOSTDOWNTIME$ and $SERVICEDOWNTIME$ macros, each of which resolves to a number. If this number is greater than zero, the object is in downtime.
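The downtime test can be sketched as a small shell function. This example assumes you appended $HOSTDOWNTIME$ and $SERVICEDOWNTIME$ to the command_line as the fifth and sixth arguments; those positions and the `in_downtime` name are hypothetical choices for illustration.

```shell
# Succeeds (returns 0) when either downtime depth is greater than zero.
# Missing values are treated as 0, meaning "not in downtime".
in_downtime() {
    local host_depth="${1:-0}" service_depth="${2:-0}"
    [ "$host_depth" -gt 0 ] || [ "$service_depth" -gt 0 ]
}

# In the event handler, before doing any work (argument positions assumed):
#   in_downtime "$5" "$6" && exit 0
```

Placing this check at the very top of the script keeps the handler from logging or calling NRPE at all while the object is in downtime.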
When you have saved your new command, edit the host where you want to use your new event handler command and select the service where you want to use it. Click the Advanced link, set the parameter "event_handler" to "restart_winsvc", and Save your configuration. You should now have a working event handler.
As of version 7.1.6, event handlers are executed on all peers in a load balanced/high availability setup. If you use peering and event handlers, make sure to implement logic in your scripts to prevent execution by more than one peer.
Adding support for arguments
The above configuration can be modified to accept arguments, such as which service should be restarted. This will allow you to pass different services as tags to control SQL Server's service or any other process.
A Windows service name is the short name of a service, found in Manage this computer -> Services -> Properties -> "Service Name" (not the Display Name).
It's important to understand that adding support for arguments involves modifying the whole transport of data between the OP5 Monitor server and the target Windows server.
The configuration changes
On the Windows target
Add the following to 'custom.ini' to allow NSCP's NRPE module to accept commands with arguments:
This makes NRPE able to process requests with arguments.
Make a copy of restartiis.bat to restartservice.bat and change it to process a supplied argument.
Change your previously configured command definition in
restart_service="C:\Program Files\NSClient++\scripts\restartservice.bat" $ARG1$
Please note that we add $ARG1$ and change the name of the command.
Restart the NSCP service to load the changes.
On the OP5 Monitor server
First we need to modify the event handler script. An example follows.
The following has been added to the event handler script:
We have also changed the line where we execute
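As a rough sketch of this kind of change, the function below forwards a service name to the remote NRPE command using the `-a` option of check_nrpe. It assumes the service name is passed to the event handler as an extra argument; the `restart_service_via_nrpe` name and the `CHECK_NRPE` override are hypothetical and only serve the illustration.

```shell
# Assumed default plugin path; can be overridden through the environment.
CHECK_NRPE="${CHECK_NRPE:-/opt/plugins/check_nrpe}"

# Forward the Windows service name to the remote NRPE command as an argument.
# Arguments: <host address> <NRPE command> <service name>
restart_service_via_nrpe() {
    local host="${1:-}" nrpe_command="${2:-}" service_name="${3:-}"

    # Refuse to call the remote command without a service name.
    if [ -z "$service_name" ]; then
        echo "No service name supplied" >&2
        return 3
    fi

    # -a passes the argument list that the remote side receives as $ARG1$.
    "$CHECK_NRPE" -H "$host" -c "$nrpe_command" -a "$service_name"
}
```

A call such as `restart_service_via_nrpe 192.168.1.8 restart_service W3SVC` would then ask the target to restart the service named W3SVC, the service name of the IIS World Wide Web Publishing Service.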
Now we need to edit the command restart_winsvc (Configure -> Commands).
The new command line should now look like this:
$USER1$/custom/restart_winsvc.sh $SERVICESTATE$ $SERVICESTATETYPE$ $HOSTADDRESS$ restart_service $ARG1$
Here we have added $ARG1$ and changed which NRPE command we call.
Finally we need to change which arguments are supplied by event_handler_args in the service definition (Configure -> select host -> Services for host -> select service -> Advanced):
You should now have a working configuration where you can supply, as an event handler argument, the name of the Windows service to restart when problems occur.