Purpose

This article describes how OP5 Monitor, Naemon or Nagios can be used with the check_vmware_v2 plugin to monitor your VMware vSphere infrastructure.

What can be monitored?

The plugin can monitor infrastructure managed by a VMware vCenter server or a VMware ESXi server, such as datacenters, clusters, hosts and virtual machines. It can check a wide range of metrics and statuses depending on the targeted infrastructure, for example CPU load, memory usage, network activity and runtime states.

The plugin uses the Python SDK for the VMware vSphere API, which makes it compatible with the previous four versions of vSphere. The Python SDK does not need to be installed on the OP5 Monitor server for the plugin to work.

Installation

The check_vmware_v2 plugin should be installed by default on Monitor 8. It is also available for manual installation from the op5 Monitor Updates repository:

yum install naemon-check-vmware

The check_vmware_v2 plugin is not supported on previous versions of Monitor.

Note: It is currently not possible to use this plugin on a slim poller.

Configuration

The service configuration files are installed in /etc/op5/check_vmware.

The service is installed with a working configuration, but you may want to change, for example, the port used by the check_vmware service process or the vCenter authentication details known to the service.

Service configuration

The service.cfg file is used to specify configuration variables for the service WSGI application.

Most of the available configuration variables are documented by the respective library they belong to:

Service-specific configuration

The Check VMware service defines the following additional configuration variables.

SERVERS

The SERVERS configuration can be used to set up default authentication credentials for the service, so that the username and password do not have to be passed via the CLI. SERVERS should be set to a Python dict containing the authentication details for the vSphere servers the service should know about. See the example below:
```
SERVERS = {
    'vcenter.op5.com': {
        'username': 'username',
        'password': 'password',
        'ignore_ssl': False
    }
}
```
Specifying a single host in SERVERS makes that host the default host used when the --host option is not specified via the CLI.
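
As a sketch of a configuration that lists more than one server, the fragment below defines two entries. The host names and credentials are placeholders, and the `default_host` helper is hypothetical: it only illustrates the single-host rule described above and is not part of the service.

```python
# Placeholder hosts and credentials; replace with your own.
SERVERS = {
    'vcenter.example.com': {
        'username': 'monitor',
        'password': 'secret',
        'ignore_ssl': False,
    },
    'esxi1.example.com': {
        'username': 'monitor',
        'password': 'secret',
        'ignore_ssl': False,
    },
}


def default_host(servers):
    """Hypothetical helper: return the only configured host, else None."""
    if len(servers) == 1:
        return next(iter(servers))
    return None
```

With two entries as above there is no single host to fall back on, so the --host option would presumably have to be given on the command line.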

VSAN_SDK_ENABLED

The VSAN_SDK_ENABLED configuration controls whether or not the service attempts to load the Virtual SAN Management SDK, which enables more vSAN related checks. By default this configuration is set to False. Set it to True to enable the vSAN SDK.

Note that the following vSAN related checks do not require the vSAN SDK to function and are always available: vsan.health, vsan.usage and vsan.disk_usage. All the other vSAN related checks depend on the vSAN SDK.

Gunicorn configuration

The gunicorn.cfg file is used to configure the Gunicorn HTTP server that runs the service as a WSGI application. This is where you configure the host and port the service runs on.

See http://docs.gunicorn.org/en/stable/settings.html for documentation on the available Gunicorn configuration.
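
A minimal sketch of what gunicorn.cfg might contain, assuming the file uses Gunicorn's usual Python-syntax settings. `bind` and `workers` are standard Gunicorn settings; the address, port and worker count shown here are placeholder values, not the defaults shipped with the plugin.

```python
# Address and port the service listens on (placeholder values).
bind = '127.0.0.1:8080'

# Number of worker processes handling requests (placeholder value).
workers = 2
```

Restart the check_vmware service after changing this file so that the new settings take effect.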

Starting the service

The plugin has a service that runs in the background to avoid making a new HTTP connection to VMware for every check. The plugin service also needs a running Redis instance that is used for caching.

Depending on your platform, run the commands below.

Starting the service on EL6

service redis start

service check_vmware start

And to make sure the services start after a system reboot:

chkconfig redis on

chkconfig check_vmware on

Starting the service on EL7

systemctl start redis

systemctl start check_vmware

And to make sure the services start after a system reboot:

systemctl enable redis

systemctl enable check_vmware
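
On either platform, it can be useful to confirm that Redis accepts connections before troubleshooting the check_vmware service itself. The following is a small sketch, assuming Redis listens on its default port 6379 on the local machine; the `port_open` helper is illustrative and not part of the plugin.

```python
import socket


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == '__main__':
    # 6379 is the default Redis port; adjust if your instance differs.
    print('redis reachable:', port_open('127.0.0.1', 6379))
```

The same helper can be pointed at the host and port configured in gunicorn.cfg to check that the check_vmware service is listening.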

Running the check command

The check command is installed in /opt/plugins/check_vmware_v2.

Run check_vmware_v2 --help to list all the command options.

Available counters

The --list-counters and --list-all-counters options can be used to list all available counters.

The --list-counters option lists counters actually available on a specified managed entity.

The --list-all-counters option lists all counters known by the vCenter server, but they may not necessarily be available on any accessible managed entity.

Add the --verbose option for more information about the listed counters.
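
The sketch below shows one way to script the counter listing from Python. It assumes the managed entity is selected with the same -t and -n options used in the Examples section, and that these can be combined with --list-counters; check the --help output to confirm. The function names are illustrative.

```python
import subprocess

PLUGIN = '/opt/plugins/check_vmware_v2'


def list_counters_command(host, entity_type, name, verbose=False):
    """Build the argument list for listing counters on one managed entity."""
    cmd = [PLUGIN, '--host', host, '-t', entity_type, '-n', name,
           '--list-counters']
    if verbose:
        cmd.append('--verbose')
    return cmd


def run(cmd):
    """Run the plugin and return its exit code and standard output."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.returncode, result.stdout
```

For example, `run(list_counters_command('vcenter.op5.com', 'vm', 'pn-dhcp-debian8'))` would request the counters available on that virtual machine.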

Examples

Check the current CPU MHz usage of a virtual machine:

> /opt/plugins/check_vmware_v2 --host vcenter.op5.com -t vm -n pn-dhcp-debian8 cpu.usagemhz.average

CHECK_VMWARE OK - cpu.usagemhz.average is 10MHz | 'cpu.usagemhz.average'=10MHz

Check the current memory usage of a host system:

> /opt/plugins/check_vmware_v2 --host vcenter.op5.com -t host -n labesxi1.it.op5.com mem.usage.average

CHECK_VMWARE OK - mem.usage.average is 28.8% | 'mem.usage.average'=28.8%

Check the current memory usage of a host system without going through the vCenter:

> /opt/plugins/check_vmware_v2 --host labesxi1.it.op5.com mem.usage.average

CHECK_VMWARE OK - mem.usage.average is 28.8% | 'mem.usage.average'=28.8%

Check the current storage usage of a host system and list all its datastores:

> /opt/plugins/check_vmware_v2 --host vcenter.op5.com -t host -n labesxi1.it.op5.com vmfs.usage -vvv

CHECK_VMWARE OK - all 7 results are ok

ok: labesxi1-local: 14.83% (19.06GB / 128.5GB)

ok: LabStorage1: 29.32% (600.46GB / 2047.75GB)

ok: LabStorage2: 24.77% (507.14GB / 2047.75GB)

ok: LabStorage4: 24.67% (505.27GB / 2047.75GB)

ok: LabStorage3: 24.13% (494.18GB / 2047.75GB)

ok: LabStorage5: 25.68% (525.85GB / 2047.75GB)

ok: LabStorage6: 24.04% (492.2GB / 2047.75GB)

| 'labesxi1-local'=14.83% LabStorage1=29.32% LabStorage2=24.77% LabStorage3=24.13% LabStorage4=24.67% LabStorage5=25.68% LabStorage6=24.04%

Check the current storage usage of a specific datastore:

> /opt/plugins/check_vmware_v2 --host vcenter.op5.com -t host -n labesxi1.it.op5.com vmfs.usage.LabStorage1

CHECK_VMWARE OK - LabStorage1: 29.32% (600.46GB / 2047.75GB) | LabStorage1=29.32%

Check the current storage usage of a list of specific datastores:

> /opt/plugins/check_vmware_v2 --host vcenter.op5.com -t host -n labesxi1.it.op5.com vmfs.usage.LabStorage1,LabStorage2

CHECK_VMWARE OK - all 2 results are ok | LabStorage1=29.32% LabStorage2=24.77%
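
The single-line outputs above follow the conventional Nagios plugin layout: a status text, a pipe character, then performance data as label=value pairs. As a rough sketch of how such a line could be split up in a script, the hypothetical helper below handles one line of output; it does not handle the multi-line output produced with -vvv.

```python
import shlex


def parse_plugin_output(line):
    """Split one line of plugin output into status text and performance data."""
    text, _, perfdata = line.partition('|')
    metrics = {}
    # shlex handles labels wrapped in single quotes, e.g. 'cpu.usagemhz.average'.
    for item in shlex.split(perfdata):
        label, _, value = item.rpartition('=')
        metrics[label] = value
    return text.strip(), metrics
```

Applied to the first example above, the helper returns the status text `CHECK_VMWARE OK - cpu.usagemhz.average is 10MHz` and the mapping `{'cpu.usagemhz.average': '10MHz'}`.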

Troubleshooting

Bad magic numbers

If the service fails to start and there is a log entry saying something like:

ImportError: bad magic number in 'check_vmware': b'\x03\xf3\r\n'

Then the problem is likely that there are .pyc files compiled for a different version of the Python interpreter than the one being used. This could happen after a system upgrade where the service had previously been running on a lower Python version. The solution is to remove all .pyc files from the installation so that the new Python interpreter is forced to re-compile them.
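
The cleanup can be scripted. The following is a minimal sketch that deletes every .pyc file below a given directory; the installation directory is not stated in this article, so `root` is a placeholder that you need to point at your own installation.

```python
from pathlib import Path


def remove_pyc_files(root):
    """Delete every .pyc file below root and return the removed paths."""
    removed = []
    # Collect the matches first so the tree is not modified while it is walked.
    for path in sorted(Path(root).rglob('*.pyc')):
        path.unlink()
        removed.append(path)
    return removed
```

Restart the service afterwards so that the interpreter re-compiles the modules.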
