Purpose
This article describes how OP5 Monitor, Naemon or Nagios can be used with the check_vmware_v2 plugin to monitor your VMware vSphere infrastructure.
What can be monitored?
The plugin can monitor the infrastructure managed by a VMware vCenter server or a VMware ESXi server, such as its datacenters, clusters, hosts and virtual machines. It can check a wide range of metrics and statuses depending on the targeted infrastructure, for example CPU load, memory usage, network activity and runtime states.
The plugin uses the Python SDK for the VMware vSphere API, which makes it compatible with the previous four versions of vSphere. The Python SDK does not need to be installed on the OP5 Monitor server for the plugin to work.
Installation
The check_vmware_v2 plugin should be installed by default on Monitor 8. It is also available for manual installation from the op5 Monitor Updates repository:
yum install naemon-check-vmware
The check_vmware_v2 plugin is not supported on previous versions of Monitor.
Note: It is currently not possible to use this plugin on a slim poller.
Configuration
The service configuration files are installed in /etc/op5/check_vmware.
The service is installed with a working configuration, but you may want to change, for example, the port used by the check_vmware service process or the vCenter authentication details known to the service.
Service configuration
The service.cfg file is used to specify configuration variables for the service WSGI application.
Most of the available configuration variables are documented by the respective library they belong to:
Service-specific configuration
The Check VMware service defines the following additional configuration variables.
- SERVERS
The SERVERS configuration can be used to set up default authentication credentials for the service, so that the username and password do not have to be passed via the CLI.
SERVERS should be set to a Python dict containing the authentication details for the vSphere servers the service should know about. See the example below:
SERVERS = {
    'vcenter.op5.com': {
        'username': 'username',
        'password': 'password',
        'ignore_ssl': False
    }
}
Specifying a single host in SERVERS makes that host the default host used when the --host option is not specified via the CLI.
- VSAN_SDK_ENABLED
The VSAN_SDK_ENABLED configuration controls whether or not the service attempts to load the Virtual SAN Management SDK, which enables more vSAN related checks. By default this configuration is set to False. Set it to True to enable the vSAN SDK.
Note that the following vSAN related checks do not require the vSAN SDK to function and are always available: vsan.health, vsan.usage and vsan.disk_usage. All the other vSAN related checks depend on the vSAN SDK.
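The two service-specific variables described above can be combined in one service.cfg. The fragment below is a minimal sketch, not the shipped configuration: the hostnames, usernames and passwords are placeholders, and the reading of ignore_ssl as "skip certificate verification" is an assumption based on the option's name.

```python
# Sketch of a service.cfg fragment. All hostnames and credentials below are
# placeholders, not values taken from a real installation.
SERVERS = {
    'vcenter.example.com': {
        'username': 'monitor',
        'password': 'secret',
        'ignore_ssl': False,
    },
    'esxi1.example.com': {
        'username': 'monitor',
        'password': 'secret',
        # Assumption: True skips certificate verification for this host.
        'ignore_ssl': True,
    },
}

# Ask the service to load the Virtual SAN Management SDK (default is False).
VSAN_SDK_ENABLED = True
```

With more than one entry in SERVERS, the single-host default described above no longer applies, so it is probably safest to pass --host explicitly on every check.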
Gunicorn configuration
The gunicorn.cfg file is used to configure the Gunicorn HTTP server that runs the service as a WSGI application. This is where we can configure the host and port the service runs on.
See http://docs.gunicorn.org/en/stable/settings.html for documentation on the available Gunicorn configuration.
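As an illustration of the host and port setting mentioned above, here is a minimal sketch of a gunicorn.cfg. It assumes the file uses Gunicorn's usual Python-syntax settings. bind and workers are standard Gunicorn setting names, but the address, port and worker count shown are hypothetical and are not the defaults shipped with the plugin.

```python
# Sketch of a gunicorn.cfg fragment; the values are illustrative assumptions.

# Address and port the check_vmware service listens on (hypothetical values).
bind = '127.0.0.1:8080'

# Number of worker processes handling requests (hypothetical value).
workers = 2
```

Restart the check_vmware service after editing the file so that Gunicorn picks up the new settings.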
Starting the service
The plugin has a service that runs in the background to avoid making a new HTTP connection to VMware for every check. The plugin service also needs a running Redis instance that is used for caching.
Depending on your operating system release, run the commands below.
Starting the service on EL6
service redis start
service check_vmware start
And to make sure the services start after a system reboot:
chkconfig redis on
chkconfig check_vmware on
Starting the service on EL7
systemctl start redis
systemctl start check_vmware
And to make sure the services start after a system reboot:
systemctl enable redis
systemctl enable check_vmware
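Since the plugin service depends on a running Redis instance, it can be useful to confirm that Redis is accepting connections after starting it. The following is a minimal sketch, assuming Redis listens on its default port 6379 on the local machine; adjust the host and port if your instance is configured differently.

```python
import socket


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


# Assumption: Redis runs locally on its default port 6379.
print("redis reachable:", port_open("127.0.0.1", 6379))
```

This only checks that something accepts TCP connections on the port; it does not verify that the listener is actually Redis.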
Running the check command
The check command is installed in /opt/plugins/check_vmware_v2.
Run check_vmware_v2 --help to list all the command options.
Available counters
The --list-counters and --list-all-counters options can be used to list all available counters.
The --list-counters option lists counters actually available on a specified managed entity.
The --list-all-counters option lists all counters known by the vCenter server, but they may not necessarily be available on any accessible managed entity.
Add the --verbose option for more information about the listed counters.
Examples
- Check the current CPU MHz usage of a virtual machine:
> /opt/plugins/check_vmware_v2 --host vcenter.op5.com -t vm -n pn-dhcp-debian8 cpu.usagemhz.average
CHECK_VMWARE OK - cpu.usagemhz.average is 10MHz | 'cpu.usagemhz.average'=10MHz
- Check the current memory usage of a host system:
> /opt/plugins/check_vmware_v2 --host vcenter.op5.com -t host -n labesxi1.it.op5.com mem.usage.average
CHECK_VMWARE OK - mem.usage.average is 28.8% | 'mem.usage.average'=28.8%
- Check the current memory usage of a host system without going through the vCenter:
> /opt/plugins/check_vmware_v2 --host labesxi1.it.op5.com mem.usage.average
CHECK_VMWARE OK - mem.usage.average is 28.8% | 'mem.usage.average'=28.8%
- Check the current storage usage of a host system and list all its datastores:
> /opt/plugins/check_vmware_v2 --host vcenter.op5.com -t host -n labesxi1.it.op5.com vmfs.usage -vvv
CHECK_VMWARE OK - all 7 results are ok
ok: labesxi1-local: 14.83% (19.06GB / 128.5GB)
ok: LabStorage1: 29.32% (600.46GB / 2047.75GB)
ok: LabStorage2: 24.77% (507.14GB / 2047.75GB)
ok: LabStorage4: 24.67% (505.27GB / 2047.75GB)
ok: LabStorage3: 24.13% (494.18GB / 2047.75GB)
ok: LabStorage5: 25.68% (525.85GB / 2047.75GB)
ok: LabStorage6: 24.04% (492.2GB / 2047.75GB)
| 'labesxi1-local'=14.83% LabStorage1=29.32% LabStorage2=24.77% LabStorage3=24.13% LabStorage4=24.67% LabStorage5=25.68% LabStorage6=24.04%
- Check the current storage usage of a specific datastore:
> /opt/plugins/check_vmware_v2 --host vcenter.op5.com -t host -n labesxi1.it.op5.com vmfs.usage.LabStorage1
CHECK_VMWARE OK - LabStorage1: 29.32% (600.46GB / 2047.75GB) | LabStorage1=29.32%
- Check the current storage usage of a list of specific datastores:
> /opt/plugins/check_vmware_v2 --host vcenter.op5.com -t host -n labesxi1.it.op5.com vmfs.usage.LabStorage1,LabStorage2
CHECK_VMWARE OK - all 2 results are ok | LabStorage1=29.32% LabStorage2=24.77%
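The single-line outputs above follow the usual plugin layout of status text, a "|" separator and performance data in label=value form. The sketch below shows one way such a line could be split in a script. The function name parse_plugin_output is hypothetical, it handles single-line output only, and it ignores any threshold fields after a ";".

```python
import re
import shlex

# Numeric value followed by an optional unit such as "MHz" or "%".
_VALUE_RE = re.compile(r"^(?P<value>-?\d+(?:\.\d+)?)(?P<unit>[^;]*)")


def parse_plugin_output(line):
    """Split one output line into (status_text, {label: (value, unit)})."""
    text, _, perf = line.partition("|")
    metrics = {}
    # shlex.split removes the single quotes around labels such as
    # 'cpu.usagemhz.average' while keeping each label=value pair together.
    for token in shlex.split(perf):
        label, _, raw = token.rpartition("=")
        match = _VALUE_RE.match(raw)
        if label and match:
            metrics[label] = (float(match.group("value")), match.group("unit"))
    return text.strip(), metrics


line = ("CHECK_VMWARE OK - cpu.usagemhz.average is 10MHz "
        "| 'cpu.usagemhz.average'=10MHz")
status, metrics = parse_plugin_output(line)
print(status)
print(metrics)
```

The multi-line output produced with -vvv would need extra handling, because the performance data appears on a separate line there.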
Troubleshooting
Bad magic numbers
The service may fail to start with a log entry similar to the following:
ImportError: bad magic number in 'check_vmware': b'\x03\xf3\r\n'
The likely cause is that the installation contains .pyc files compiled for a different version of the Python interpreter than the one now in use. This can happen after a system upgrade if the service previously ran on a lower Python version. The solution is to remove all .pyc files from the installation so that the new Python interpreter is forced to recompile them.
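The cleanup can be scripted. The following is a minimal sketch that deletes every .pyc file below a given directory. The function name remove_pyc is hypothetical, and the installation directory of the service is not stated in this article, so it has to be supplied by you.

```python
from pathlib import Path


def remove_pyc(root):
    """Delete every .pyc file below root and return the removed paths."""
    removed = []
    # Collect the matches first so the tree is not modified while walking it.
    for path in list(Path(root).rglob("*.pyc")):
        if path.is_file():
            path.unlink()
            removed.append(path)
    return removed


# Usage (the path is a placeholder for the actual installation directory):
# remove_pyc("/path/to/check_vmware/installation")
```

Restart the check_vmware service afterwards so that the interpreter recompiles the modules.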