
Nagios: Service checks based on host status

Notice

This article applies to Nagios Core 2.x and 3.x. Luckily, Nagios Core 4 natively inhibits service notifications when the service parent (for instance its host) is not UP. Read about this and other Nagios Core 4 features at Nagios Core 4: Overview.

One might expect that when a host switches to a DOWN or UNREACHABLE state, Nagios stops checking its services: why check them if Nagios itself has determined that the host is not UP?

For better or worse, this is not the case: Nagios keeps running regular checks on the services of a non-UP host. The resulting state of each service check depends on how the check handles the unavailability of its data source.

This behavior has its advantages, but also some disadvantages:

Too much information causes confusion, and a set of service alarms caused by a host failure can hide real problems in services on other hosts.

Resources are consumed running checks that are bound to fail.

The failure of the host and of its services triggers a notification storm.

Therefore it seems desirable, if not for all then at least for many service types, to take some steps to avoid the above problems:

1. Establishing service states to reflect the reality of the situation, such as an UNKNOWN state.

2. Inhibiting notifications related to service state change.

3. Disabling active checks of services while their host is not UP.

These steps should prevent, to a greater or lesser extent, the problems related to misleading information, resource consumption and notification storms.

Howto

So now the question is: how to do it? There are different approaches, each with its pros and cons. Without analyzing them all, the best solution seems to be using Nagios external commands to perform all the previous tasks every time the host status changes.

The required external commands are:

ENABLE_PASSIVE_SVC_CHECKS: Allows the status of a service to be set from an external command. Note that this command does not set the status itself; you must use PROCESS_SERVICE_CHECK_RESULT (read on) to do so.

DISABLE_HOST_SVC_CHECKS, ENABLE_HOST_SVC_CHECKS: Disable/enable active checks for all services of a given host.

PROCESS_SERVICE_CHECK_RESULT: Sets the status value for a given service.

DISABLE_HOST_SVC_NOTIFICATIONS and, additionally, DISABLE_ALL_NOTIFICATIONS_BEYOND_HOST: Disable notifications for all services of a given host and, respectively, for all services of all hosts topologically beyond a given host.

ENABLE_HOST_SVC_NOTIFICATIONS and, additionally, ENABLE_ALL_NOTIFICATIONS_BEYOND_HOST: Do the opposite of the previous commands.
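As an illustration of how such commands reach Nagios, the following minimal Python sketch formats external command lines in the "[timestamp] COMMAND;argument;argument" form and writes them to the external command file. The command file path, the host name "web01", the service name "HTTP" and the helper names format_command and submit are assumptions made for the example; the real path is whatever the command_file directive in nagios.cfg points to.

```python
import time

# Assumed location of the Nagios external command file (the "command_file"
# directive in nagios.cfg); adjust it to the local installation.
COMMAND_FILE = "/usr/local/nagios/var/rw/nagios.cmd"


def format_command(name, *args, timestamp=None):
    """Return one external command line: [timestamp] NAME;arg1;arg2..."""
    if timestamp is None:
        timestamp = int(time.time())
    fields = ";".join([name, *(str(arg) for arg in args)])
    return f"[{timestamp}] {fields}\n"


def submit(line, command_file=COMMAND_FILE):
    """Write an already formatted command line to the command file."""
    with open(command_file, "w") as pipe:
        pipe.write(line)


if __name__ == "__main__":
    # Hypothetical host and service names; printed here instead of submitted.
    print(format_command("DISABLE_HOST_SVC_CHECKS", "web01"), end="")
    print(format_command("PROCESS_SERVICE_CHECK_RESULT",
                         "web01", "HTTP", 3, "Host is not UP"), end="")
```

Keeping formatting separate from submission makes it possible to inspect the generated lines before anything is sent to a running Nagios instance.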

All these commands must be used in a script designed to handle host status changes. This script might take these command-line arguments:

Host name, available through the $HOSTNAME$ host macro.

Host status, available (in numeric format) through the $HOSTSTATEID$ host macro.
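A minimal sketch of how a Python version of such a script could read these two arguments is shown next. It assumes the -h and -s flags used in the command definition in the Configuration section; the function name parse_args and the destination names host and state_id are hypothetical. Since argparse normally reserves -h for its help option, the built-in help is disabled.

```python
import argparse


def parse_args(argv=None):
    """Parse the host name (-h) and the numeric host state (-s)."""
    # add_help=False frees "-h", which argparse otherwise uses for --help.
    parser = argparse.ArgumentParser(
        add_help=False,
        description="Host event handler: adjust services to the host state",
    )
    parser.add_argument("-h", dest="host", required=True,
                        help="host name as passed by Nagios")
    # Numeric host states: 0 = UP, 1 = DOWN, 2 = UNREACHABLE.
    parser.add_argument("-s", dest="state_id", type=int, required=True,
                        choices=(0, 1, 2),
                        help="numeric host state as passed by Nagios")
    return parser.parse_args(argv)


if __name__ == "__main__":
    # Hypothetical invocation, equivalent to: script -h web01 -s 1
    args = parse_args(["-h", "web01", "-s", "1"])
    print(args.host, args.state_id)
```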

The script algorithm could look like this in pseudocode:

if HOSTSTATEID = 0 then
  # Host has changed to an UP status

  # Force status for all host services
  for each host Service
    # Submit an external command to set, as service status,
    # its previous value ($LASTSERVICESTATEID$ macro)
    ExternalCommand(PROCESS_SERVICE_CHECK_RESULT, Service,
                    $LASTSERVICESTATEID:HostName:Service$)
  endfor

  # Enable notifications for all host services
  ExternalCommand(ENABLE_HOST_SVC_NOTIFICATIONS, HostName)

  # Enable active checks for all host services
  ExternalCommand(ENABLE_HOST_SVC_CHECKS, HostName)
else
  # Host has changed to a non-UP status

  # Disable active checks for all host services
  ExternalCommand(DISABLE_HOST_SVC_CHECKS, HostName)

  # Disable notifications for all host services
  ExternalCommand(DISABLE_HOST_SVC_NOTIFICATIONS, HostName)

  # Set UNKNOWN (3) status for all host services
  for each host Service
    ExternalCommand(PROCESS_SERVICE_CHECK_RESULT, Service, 3)
  endfor
endif
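The algorithm above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name handler_commands is hypothetical, the list of services and their previous states is passed in as a dictionary (how the script obtains it is left open), and the plugin output texts are made up for the example. The function only builds the external command lines; writing them to the Nagios command file is a separate step.

```python
import time

UP = 0       # numeric host state for UP
UNKNOWN = 3  # numeric service state for UNKNOWN


def handler_commands(host, host_state_id, last_states, now=None):
    """Build the external command lines for one host state change.

    last_states maps each service description of the host to its
    previous numeric state (an assumption of this sketch).
    """
    stamp = int(time.time()) if now is None else now

    def cmd(name, *args):
        # External command format: [timestamp] NAME;host;arg1;arg2...
        return f"[{stamp}] " + ";".join([name, host, *map(str, args)])

    lines = []
    if host_state_id == UP:
        # Host is UP again: restore the previous state of every service,
        # then re-enable notifications and active checks.
        for service, state in last_states.items():
            lines.append(cmd("PROCESS_SERVICE_CHECK_RESULT", service, state,
                             "Restored after host recovery"))
        lines.append(cmd("ENABLE_HOST_SVC_NOTIFICATIONS"))
        lines.append(cmd("ENABLE_HOST_SVC_CHECKS"))
    else:
        # Host is not UP: stop active checks and notifications,
        # then force every service to UNKNOWN.
        lines.append(cmd("DISABLE_HOST_SVC_CHECKS"))
        lines.append(cmd("DISABLE_HOST_SVC_NOTIFICATIONS"))
        for service in last_states:
            lines.append(cmd("PROCESS_SERVICE_CHECK_RESULT", service, UNKNOWN,
                             "Host is not UP"))
    return lines


if __name__ == "__main__":
    # Hypothetical host "web01" going DOWN (1) with two services.
    for line in handler_commands("web01", 1, {"HTTP": 0, "SSH": 0}):
        print(line)
```

Returning the lines instead of writing them directly keeps the decision logic easy to check without a running Nagios instance.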


Configuration

Once the script is written, you must define a command object so that Nagios can use it:

define command {
    command_name  setSvcStatusByHostStatus
    command_line  <path to the script> -h $HOSTNAME$ -s $HOSTSTATEID$
}

In the previous example, the host name is passed to the script using the -h argument, and the -s argument is used to pass the host status ID.

Finally, the previous command must be set as a host event handler. If the solution is suitable for handling all host status changes, the command must be set as the global host event handler in the Nagios main configuration (usually stored in the nagios.cfg file):

global_host_event_handler = setSvcStatusByHostStatus

If it's not to be used on all hosts, it must be set as event handler for every suitable host:

define host {
    ...
    event_handler  setSvcStatusByHostStatus
    ...
}

Centreon

The previous solution is fully supported by Centreon:

The command definition is no different from any other command. The only thing to consider is defining it as a "check" type command so that it is available in the event handler configuration lists.

You can set the value of global_host_event_handler through the "Global host event handler" field, located on the "Checking options" tab of the Configuration>Nagios>Nagios.cfg menu.

You can set the event_handler directive for each host using the "Event handler" field, located on the "Data management" tab of Configuration>Hosts>(host name).