OP5 Monitor - How to successfully implement network monitoring

Implementing monitoring can be a broad undertaking. This document breaks down what such an implementation consists of, to give you an overview of what needs to be done, whom to involve, and how to prioritize.

Main Goals

If implemented correctly, monitoring can influence and support how you work, what you spend your time on, and how you make decisions. Keep the main goals below in mind so that you prioritize your time wisely when building a monitoring configuration.

a. Get in control of your IT-services

Every IT-service depends on a number of network services, processes, servers, network equipment, and network connections. By monitoring all of these dependencies, you gain an understanding of how different problems affect your services. Using that information, you can make informed decisions on how best to manage and further develop your operations environment.
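As an illustration of the dependency idea, here is a minimal Python sketch that answers the question "which IT-services are affected if this component fails?". The service and component names are hypothetical, and the mapping is a plain dictionary rather than anything read from OP5 Monitor.

```python
# Hypothetical dependency map: each IT-service lists the components it relies on.
DEPENDENCIES = {
    "webmail": ["dns", "imap", "smtp", "mail-server-01", "core-switch"],
    "intranet": ["dns", "http", "web-server-01", "db-server-01", "core-switch"],
    "payroll": ["db-server-01", "app-server-02"],
}


def affected_services(failed_component, dependencies=DEPENDENCIES):
    """Return the sorted names of services that depend on the failed component."""
    return sorted(
        service
        for service, components in dependencies.items()
        if failed_component in components
    )


if __name__ == "__main__":
    # A database server failure touches more than one service.
    print(affected_services("db-server-01"))
```

Writing the dependencies down in this form, even informally, makes it easier to decide which components to monitor first.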

b. Work proactively

With correctly configured thresholds, you get warnings /before/ things stop working. By reacting to warnings, you will already have fixed the problem, or at least be working on it, by the time users start reporting errors.
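The threshold idea can be sketched in a few lines of Python. The exit codes below follow the convention used by Nagios-compatible plugins (0 for OK, 1 for WARNING, 2 for CRITICAL). The function names and the 80/90 percent thresholds are assumptions chosen for illustration, not OP5 defaults.

```python
# Exit codes as used by Nagios-compatible check plugins.
OK, WARNING, CRITICAL = 0, 1, 2


def evaluate(value, warning, critical):
    """Map a measured value to a state; higher values are treated as worse."""
    if value >= critical:
        return CRITICAL
    if value >= warning:
        return WARNING
    return OK


def disk_check(used_percent, warning=80.0, critical=90.0):
    """Hypothetical disk check: return (state, status line) for a usage figure."""
    state = evaluate(used_percent, warning, critical)
    label = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL"}[state]
    return state, f"DISK {label} - {used_percent:.1f}% used"


if __name__ == "__main__":
    # 85% is above the warning threshold but below critical.
    print(disk_check(85.0))
```

The gap between the warning and critical thresholds is the time you have to act before users notice, so it is worth setting the warning level low enough to be useful but high enough to avoid noise.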

By using OLA reporting (Operational Level Agreement reports), you can fix problems /before/ they start affecting your services. An OLA report includes all the dependencies of an IT-service.
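To make the idea of a per-dependency report concrete, the following Python sketch computes availability for each dependency of a service over a reporting period and flags those below a target. This is only an illustration under stated assumptions: the dependency names, the downtime figures, and the 99.5 percent target are hypothetical, and the calculation is not a description of how OP5 Monitor produces its reports.

```python
def availability(downtime_minutes, period_minutes):
    """Availability in percent for one dependency over the reporting period."""
    return 100.0 * (period_minutes - downtime_minutes) / period_minutes


def ola_report(downtime_by_dependency, period_minutes, target_percent=99.5):
    """Return (name, availability %, meets target) rows, sorted by name."""
    rows = []
    for name, downtime in sorted(downtime_by_dependency.items()):
        pct = availability(downtime, period_minutes)
        rows.append((name, round(pct, 3), pct >= target_percent))
    return rows


if __name__ == "__main__":
    period = 30 * 24 * 60  # a 30-day reporting period, in minutes
    downtime = {"dns": 0, "db-server-01": 432}  # hypothetical downtime minutes
    for row in ola_report(downtime, period):
        print(row)
```

A dependency that falls below its target is a candidate for attention even if the IT-service itself has not yet been visibly affected.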

c. Tie the core business closer to IT-operations

By identifying "system owners" in the organization, you can help colleagues understand the importance of IT-systems. An originator/supplier relationship can be established between the system owners and the monitoring administrators, which helps in finding a good routine for adding more monitoring. The work of creating and managing SLA reports (Service Level Agreement reports) can be delegated to the system owners, who can schedule the reports for automatic delivery to the managers responsible for service availability.

Planning the deployment

When planning how to deploy monitoring, split the work that lies ahead into three to ten stages and populate each stage with one to three tasks (types of services to add). Plan for a test and adjustment period of at least a week between the end of one stage and the start of the next. These periods are needed to remedy the errors in your IT-environment that were discovered in each stage, and to confirm that thresholds and check periods are correctly configured. Use alert summary reports to pinpoint what needs to be remedied or adjusted in each test and adjustment period.
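The staging rule above can be sketched as a small Python helper that groups tasks into stages and leaves an adjustment period after each one. The function name, the default of two tasks per stage, and the one-week stage length are assumptions for illustration; only the limits of one to three tasks per stage and at least a week of adjustment come from the text.

```python
from datetime import date, timedelta


def plan_stages(tasks, start, tasks_per_stage=2, stage_days=7, adjust_days=7):
    """Group tasks into stages, each followed by a test and adjustment period."""
    if not 1 <= tasks_per_stage <= 3:
        raise ValueError("use one to three tasks per stage")
    if adjust_days < 7:
        raise ValueError("allow at least a week for testing and adjustment")
    plan = []
    stage_start = start
    for i in range(0, len(tasks), tasks_per_stage):
        stage_end = stage_start + timedelta(days=stage_days)
        next_start = stage_end + timedelta(days=adjust_days)
        plan.append({
            "stage": len(plan) + 1,
            "tasks": tasks[i:i + tasks_per_stage],
            "start": stage_start,
            "end": stage_end,
            "next_stage_earliest": next_start,
        })
        stage_start = next_start
    return plan


if __name__ == "__main__":
    tasks = ["hosts", "environmentals", "ups",
             "network services basic", "agent services"]
    for stage in plan_stages(tasks, date(2024, 1, 1)):
        print(stage["stage"], stage["start"], stage["tasks"])
```

Checking the resulting number of stages against the suggested range of three to ten is a quick sanity test of the plan.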

In the planning stage, the OSI model can be useful, for example as a way to group tasks by layer, from network reachability up to applications.

Order of importance

Below is a list of tasks or service-types to add to your configuration, listed in order of importance. How important it is to monitor each service-type is of course specific to each organization, but the list below can be a good starting point.

In short, you start by pinging your servers and finish by adding log filters that match "bad signs" (early warnings) in your logs.

Task/service-type	Description	Applicable OP5 products	Commonly used plugins
hosts	Check host availability and graph ICMP ping statistics	OP5 Monitor	check_host, check_icmp
environmentals	Monitor and graph temperature, humidity, and floor wetness	OP5 Monitor	check_tempraxe, check_em1
ups	Monitor and graph status, load per phase, and estimated battery runtime	OP5 Monitor	check_snmp, check_apc, check_ups
network services basic	Check availability of network services such as DNS, IMAP, HTTP, and SMTP, and graph their response time	OP5 Monitor	check_tcp, check_dig, check_http, check_imap
agent services	Monitor and graph OS resource utilization (disk, CPU, memory, swap, processes, connections, cache)	OP5 Monitor	check_nt, check_nrpe, check_nwstat
services, daemons, processes and jobs	Monitor Windows services and processes, Unix/Linux daemons and processes, and OS400 subsystems and jobs	OP5 Monitor	check_nt, check_nrpe, check_as400
network services advanced	Advanced monitoring of network services, such as advanced database or website monitoring	OP5 Monitor	check_mysql, check_sql, check_oracle, check_webinject, check_http
graphs for traffic/errors	Monitor and graph traffic (bandwidth usage) and errors/discards on relevant NICs/ports on switches/routers. Locate and remedy sources of broken packets.	OP5 Monitor	check_traffic, check_iferrors, check_snmpif
hardware services	Check hardware status (disk arrays, temperature, power supplies, fans, memory modules)	OP5 Monitor	check_openmanage, check_hpasm, check_snmp, check_snmp_env, check_ipmi_sensor
logs	Collect, centralize, and archive event logs, syslogs, and application logs. Monitor for bad messages.	OP5 Monitor + LogServer extension	check_ls_log, check_log2
