data_timeout in the merlin.conf on a system that is peered, remember to also set the same value on the peer.
When you run a command like mon node status, peers and pollers in your cluster may sometimes show up as INACTIVE. This happens when Merlin is unable to verify that the node is alive. You can see Merlin actively trying to verify this in neb.log.
There are, however, situations where you know that pollers or peers will be unstable, and where you would rather have Merlin tolerate a node not responding for a while. Perhaps you have a peer or poller that is geographically far away, connected via an unreliable network, or both. For cases like these, you can use the data_timeout setting in Merlin to make it more lenient with this node and allow it a longer time to reconnect before marking it as INACTIVE.
Example configuration
Every Merlin configuration describes all other nodes in the cluster but not the node itself, so each merlin.conf is written from that node's own perspective. For a cluster with 4 nodes, the primary master's configuration will list 3 nodes, since it excludes itself. To increase the number of seconds the master waits before classifying a poller as INACTIVE, add a data_timeout value to that poller's block in the master's merlin.conf:
poller poller01 {
    data_timeout = 600
    hostgroup = foo
    address = poller01
    port = 15551
    takeover = no
    notifies = no
}
The data_timeout value means that this master will wait 600 seconds before actually marking poller01 as INACTIVE.
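For peered systems, the same value should be set on both sides. The following is a minimal sketch of what that could look like, assuming two peers with the hypothetical names peer01 and peer02, and reusing the port and timeout from the poller example above. Adjust the names, addresses, and any other settings to match your own cluster.
In the merlin.conf on peer01:
peer peer02 {
    data_timeout = 600
    address = peer02
    port = 15551
}
In the merlin.conf on peer02:
peer peer01 {
    data_timeout = 600
    address = peer01
    port = 15551
}
With matching values, each peer allows the other the same 600 seconds before marking it as INACTIVE.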