Related to:
An action is not fired properly when states of a dataview are changed in accordance with a defined rule.
Problem
- An action or alert was configured to run on a state change of a data item, but the action does not appear to have run.
Possible Cause(s)
Root Cause 1: The defined Action is erroneous or configured incorrectly.
Root Cause 2: The object is outside of its specified Active Time.
Root Cause 3: The conditions of the rule being used were not met.
Root Cause 4: The action timed out.
Possible Solution(s)
General troubleshooting.
The Gateway log is a good place to start checking for errors. It typically includes entries from the ActionManager (see snippet below). Cross-check lines containing "Firing action" against the times at which state changes occurred to determine whether an action fired. The log also typically shows when an action has finished executing its command or script, and when the action itself has completed. If an action reports an exit code other than 0, note the errors and review the action definition.
2021-08-12 10:09:07.851+0800 INFO: ActionManager Action DataItem 'Mail' generated (variable=/geneos/gateway[(@name="ITRS")]/directory/probe[(@name="gateway01 Host")]/managedEntity[(@name="gateway01 Gateway Hardware")]/sampler[(@name="CPU")][(@type="Infrastructure Defaults")]/dataview[(@name="CPU")]/rows/row[(@name="Average_cpu")]/cell[(@column="percentUtilisation")])
2021-08-12 10:09:07.851+0800 INFO: ActionManager Firing action 'Mail'
2021-08-12 10:09:07.854+0800 INFO: ActionManager Action DataItem 'Mail2' generated (variable=/geneos/gateway[(@name="ITRS")]/directory/probe[(@name="gateway01 Host")]/managedEntity[(@name="gateway01 Gateway Hardware")]/sampler[(@name="CPU")][(@type="Infrastructure Defaults")]/dataview[(@name="CPU")]/rows/row[(@name="Average_cpu")]/cell[(@column="percentUtilisation")])
2021-08-12 10:09:07.854+0800 INFO: ActionManager Firing action 'Mail2'
2021-08-12 10:09:08.001+0800 INFO: ActionManager Finished executing '/opt/itrs/gateway/gateway_scripts/email.pl' with arguments 'it_noc@venetianqa.local'.
2021-08-12 10:09:08.001+0800 INFO: ActionManager Completed action 'Mail2', Exit code: 0
2021-08-12 10:09:08.001+0800 INFO: ActionManager Finished executing '/opt/itrs/gateway/gateway_scripts/testmail.pl' with arguments ''.
2021-08-12 10:09:08.001+0800 INFO: ActionManager Completed action 'Mail', Exit code: 0
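The cross-check described above can be partly automated. The following is a minimal Python sketch, not an ITRS tool: it assumes the log lines look exactly like the sample above ("Firing action '<name>'" and "Completed action '<name>', Exit code: <n>"), and the function names summarise_actions and needs_review are hypothetical. Other Gateway versions may word these lines differently, so adjust the patterns to match your own log.

```python
import re

# Patterns derived from the sample log lines in this article (an assumption;
# verify them against your own Gateway log before relying on the result).
LINE_RE = re.compile(r"^(?P<ts>\S+ \S+) \w+: ActionManager (?P<msg>.*)$")
FIRING_RE = re.compile(r"^Firing action '(?P<name>[^']*)'$")
COMPLETED_RE = re.compile(
    r"^Completed action '(?P<name>[^']*)', Exit code: (?P<code>-?\d+)$"
)


def summarise_actions(lines):
    """Pair each 'Firing action' line with the next 'Completed action' line
    for the same action name. Returns one record per firing, in log order."""
    records = []
    pending = {}  # action name -> firings that have not completed yet
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # not an ActionManager line
        msg = match.group("msg")
        fired = FIRING_RE.match(msg)
        if fired:
            record = {
                "name": fired.group("name"),
                "fired": match.group("ts"),
                "completed": None,
                "exit_code": None,
            }
            records.append(record)
            pending.setdefault(record["name"], []).append(record)
            continue
        done = COMPLETED_RE.match(msg)
        if done and pending.get(done.group("name")):
            record = pending[done.group("name")].pop(0)
            record["completed"] = match.group("ts")
            record["exit_code"] = int(done.group("code"))
    return records


def needs_review(records):
    """Firings that never completed or completed with a non-zero exit code."""
    return [r for r in records if r["exit_code"] != 0]
```

To use it, pass the lines of the Gateway log file to summarise_actions, then print the records returned by needs_review and compare their "fired" timestamps with the times of the state changes you expected to trigger the action.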
Solution Root Cause 1
For Actions that run commands, internal commands, effects, or shared libraries, ensure that these exist and that the user has suitable permissions to access and run them.
For Actions that run scripts, ensure that the script exists and has the necessary permissions, and also check that the script can run on its own on the server and complete without error. Pay attention to the run location, as this dictates which server the script must reside on.
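The existence and permission checks above can be sketched in Python as follows. This is an illustrative helper only (check_script is a hypothetical name, not part of Geneos). It assumes a POSIX host and tests permissions for whichever user runs the snippet, so run it on the server given by the run location and as the same user that executes the action.

```python
import os


def check_script(path):
    """Return a list of problems found with the script at `path`.
    An empty list means the basic checks passed for the current user."""
    if not os.path.isfile(path):
        return [f"{path}: not found or not a regular file"]
    issues = []
    if not os.access(path, os.R_OK):
        issues.append(f"{path}: not readable by the current user")
    if not os.access(path, os.X_OK):
        issues.append(f"{path}: not executable by the current user")
    return issues
```

For example, check_script("/opt/itrs/gateway/gateway_scripts/email.pl") checks the script path that appears in the log snippet earlier in this article. Passing these checks does not prove the script works; it still needs to be run by hand to confirm that it completes without error.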
You can validate the Action by opening the Gateway Setup Editor and clicking "Validate current document." If there are any errors, address them.
Solution Root Cause 2
When an object is outside its specified Active Time, the action is prevented or delayed from firing. Check that the Active Time setup does not exclude a time window in which you expect actions to fire or notifications to be sent. Once the active time period resumes, and if the alert is still valid, it will fire and repeat/escalate as normal.
Solution Root Cause 3
Check that the rule that is supposed to trigger the action is valid and can be evaluated. If a rule is applied to a data item, you can right-click on the data item and click "Show Rules" to see both the rule used and the values of variables.
Solution Root Cause 4
Check that the command or script, if used, can run to completion on the command line. A command run against the Netprobe should complete quickly. Otherwise, it could cause the Netprobe to hang, stopping Netprobe processes such as firing actions or sampling. Other long-running processes, such as Toolkit samplers, could also degrade Netprobe performance, which may cause problems with your action.
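One way to see how long a command or script takes outside Geneos is to run it under a time limit of your own choosing. The sketch below uses Python's standard subprocess module; timed_run and limit_seconds are hypothetical names, and the limit you pass is an assumption that you should set to match the timeout configured for your action.

```python
import subprocess
import time


def timed_run(argv, limit_seconds):
    """Run `argv` and report its duration, exit code, and whether it
    exceeded `limit_seconds` (in which case the process is killed)."""
    start = time.monotonic()
    try:
        proc = subprocess.run(
            argv, capture_output=True, text=True, timeout=limit_seconds
        )
    except subprocess.TimeoutExpired:
        return {
            "timed_out": True,
            "exit_code": None,
            "seconds": time.monotonic() - start,
            "stderr": "",
        }
    return {
        "timed_out": False,
        "exit_code": proc.returncode,
        "seconds": time.monotonic() - start,
        "stderr": proc.stderr,
    }
```

Call it with the same command line and arguments that the action uses. A result with "timed_out" set to True, or a duration close to the limit, suggests the script is too slow to be run reliably as an action.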
Related Articles
- What happens to actions that are firing, then go outside the active time?
- What happens to Actions when a Netprobe goes down?
- Gateway Rules, Actions, and Alerts
If Issue Persists
- Please contact our Client Services team via the chat service box available on any of our websites or via email to support@itrsgroup.com
- Make sure you provide:
  - Gateway and Netprobe logs
  - Rule and Action XML files
  - Full screenshot of the dataviews
  - Any troubleshooting steps from this article that you have already tried