Related to:
Email alert not sent, email action not sent, email action not triggering, email alert delay, email action delay, emails arriving late, email not arriving on time, email alert not received
Problem
Problem 1 - A user sets up a rule that calls an email action when the rule target reaches a severity threshold. The target reaches the threshold, but the user does not receive the email alert.
Problem 2 - The email action called by a rule is delayed and is not sent at the time the issue occurs.
Possible Cause(s)
To determine which of the causes below applies, first find out which rule generated the email alert. Right-click the cell or headline cell that has the severity set and select "Show Rules" from the context menu. This opens the Output window, which shows the name of the rule and highlights in yellow the part of the rule logic that the data item met. Once you have confirmed that the intended rule logic and action are highlighted, search the Gateway log for the email action name and check whether any of the following INFO: ActionManager messages appear.
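The log search can be partly automated. Below is a minimal sketch, assuming the Gateway log is a plain-text file and that the ActionManager messages use the wording quoted in this article (the wording may differ between Gateway versions). The script name `find_action.py`, the `classify` function and the cause labels are hypothetical; they simply map each quoted message to the Problem 1 cause it indicates.

```python
import re
import sys

# Message fragments copied from the ActionManager log lines in this article,
# mapped to the Problem 1 cause they indicate. Wording may vary by Gateway
# version (assumption).
SIGNATURES = [
    ("as a result of a configuration change", "Cause 1: configuration change"),
    ("during the startup of a component", "Cause 2: component startup"),
    ("is snoozed", "Cause 3: item or ancestor snoozed"),
    ("transition from undefined to OK state", "Cause 4: created with OK severity"),
    ("Firing action", "Action fired (check Causes 6 and 7)"),
]

# Matches lines such as: Completed action 'Email Alert', Exit code: 0
EXIT_CODE = re.compile(r"Completed action '(?P<name>[^']*)', Exit code: (?P<code>-?\d+)")


def classify(lines, action_name):
    """Return (cause, line) pairs for ActionManager lines about action_name."""
    quoted = f"'{action_name}'"
    results = []
    for line in lines:
        line = line.rstrip("\n")
        if "ActionManager" not in line or quoted not in line:
            continue
        match = EXIT_CODE.search(line)
        if match and match.group("name") == action_name:
            code = int(match.group("code"))
            if code != 0:  # a non-zero exit code comes from the email script
                results.append((f"Cause 7: script exited with code {code}", line))
            continue
        for fragment, cause in SIGNATURES:
            if fragment in line:
                results.append((cause, line))
                break
    return results


def main(argv):
    if len(argv) != 2:
        print("usage: find_action.py GATEWAY_LOG ACTION_NAME", file=sys.stderr)
        return
    log_path, action = argv
    with open(log_path, encoding="utf-8", errors="replace") as log:
        for cause, line in classify(log, action):
            print(f"{cause}\n    {line}")


if __name__ == "__main__":
    main(sys.argv[1:])
```

For example: `python find_action.py /path/to/gateway.log "Email Alert"`. Lines that only say the action DataItem was "generated" are skipped, because on their own they do not point to a cause.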
Problem 1:
- Possible Cause 1 - If you see the INFO: ActionManager messages below, the critical severity occurred as a result of a configuration change. By default, actions are not fired when a rule condition becomes true as a result of a configuration change.
INFO: ActionManager Action DataItem 'Email Alert' generated (variable=/geneos/gateway[(@name="gateway_name")]/directory/probe[(@name="netprobe_name")]/managedEntity[(@name="managed_entity_name")]/sampler[(@name="sample_name")][(@type="")]/dataview[(@name="Log Files")]/rows/row[(@name="example_log.txt")]/cell[(@column="status")])
INFO: ActionManager Action 'Email Alert' would have fired, but stopped as this was as a result of a configuration change
- Possible Cause 2 - If the Gateway or Netprobe has just started, the action does not fire by default, to avoid false alerts. If this is the case, the Gateway log shows the following INFO: ActionManager messages.
INFO: ActionManager Action DataItem 'Email Alert' generated (variable=/geneos/gateway[(@name="gateway_name")]/directory/probe[(@name="netprobe_name")]/managedEntity[(@name="managed_entity_name")]/sampler[(@name="sample_name")][(@type="")]/dataview[(@name="Log Files")]/rows/row[(@name="example_log.txt")]/cell[(@column="status")])
INFO: ActionManager Action 'Email Alert' would have fired, but stopped as this was during the startup of a component
- Possible Cause 3 - By default, an action fires only if the target of the rule and all its ancestors (any levels above it, such as the dataview and the managed entity) are not snoozed. If you see the INFO: ActionManager messages below, the data item or one of its ancestors has been snoozed, so the action did not fire.
INFO: ActionManager Action DataItem 'Email Alert' generated (variable=/geneos/gateway[(@name="gateway_name")]/directory/probe[(@name="netprobe_name")]/managedEntity[(@name="managed_entity_name")]/sampler[(@name="sample_name")][(@type="")]/dataview[(@name="Log Files")]/rows/row[(@name="example_log.txt")]/cell[(@column="status")])
INFO: ActionManager Action 'Email Alert' did not fire because variable '/geneos/gateway[(@name="gateway_name")]/directory/probe[(@name="netprobe_name")]/managedEntity[(@name="managed_entity_name")]/sampler[(@name="sample_name")][(@type="")]/dataview[(@name="Log Files")]' is snoozed
- Possible Cause 4 - If you see the INFO: ActionManager messages below, the action did not fire because the dataview item was created and transitioned from undefined to OK severity.
INFO: ActionManager Action DataItem 'Email Alert' generated (variable=/geneos/gateway[(@name="gateway_name")]/directory/probe[(@name="netprobe_name")]/managedEntity[(@name="managed_entity_name")]/sampler[(@name="sample_name")][(@type="")]/dataview[(@name="Log Files")]/rows/row[(@name="example_log.txt")]/cell[(@column="status")])
INFO: ActionManager Action 'Email Alert' would have fired, but stopped as this was as a result of transition from undefined to OK state
- Possible Cause 5 - If the Gateway log shows that the Netprobe went down or disconnected, the email action is removed and is not sent.
- Possible Cause 6 - If the Gateway log shows the INFO: ActionManager messages below, indicating that the action fired, check which Gateway version and which CentOS/RHEL version the server is running. If the server runs CentOS/RHEL 8, the Gateway version is lower than GA5.9.x, and /var/log/maillog shows no entries for the email being sent from the server, then the cause is the Gateway version. We have found that the CentOS/RHEL 8 system version of libcrypto.so.1.1 conflicts with the version shipped with the Gateway.
INFO: ActionManager Action DataItem 'Email Alert' generated (variable=/geneos/gateway[(@name="gateway_name")]/directory/probe[(@name="netprobe_name")]/managedEntity[(@name="managed_entity_name")]/sampler[(@name="sample_name")][(@type="")]/dataview[(@name="Log Files")]/rows/row[(@name="example_log.txt")]/cell[(@column="status")])
INFO: ActionManager Firing action 'Email Alert'
2021-09-10 16:50:51.085-0400 INFO: ActionManager Finished executing '/export/home/itrs/geneos-utils-master/system/scripts/email.pl' with arguments ''.
2021-09-10 16:50:51.085-0400 INFO: ActionManager Completed action 'Email Alert', Exit code: 0
- Possible Cause 7 - If the Gateway log shows that the email action completed with an exit code other than 0, the non-zero code is returned by the email script itself.
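For Cause 6, part of the check can be scripted. The sketch below assumes a Linux host with a standard /etc/os-release file; it reports whether the server is CentOS/RHEL 8 and lists the libcrypto.so* files in a Gateway lib64 directory whose path you supply. The Gateway version and the /var/log/maillog entries still need to be checked separately. The script name `check_cause6.py` and the function names are hypothetical.

```python
import glob
import os
import sys


def parse_os_release(text):
    """Parse the KEY=value lines of an os-release file into a dict."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        info[key] = value.strip().strip('"')
    return info


def is_el8(info):
    """True if the os-release data describes CentOS or RHEL major version 8."""
    major = info.get("VERSION_ID", "").split(".")[0]
    return info.get("ID") in ("centos", "rhel") and major == "8"


def bundled_libcrypto(lib_dir):
    """List libcrypto.so* files in the given Gateway lib64 directory."""
    return sorted(glob.glob(os.path.join(lib_dir, "libcrypto.so*")))


def main(argv):
    if len(argv) != 1:
        print("usage: check_cause6.py GATEWAY_LIB64_DIR", file=sys.stderr)
        return
    with open("/etc/os-release", encoding="utf-8") as f:
        info = parse_os_release(f.read())
    print("CentOS/RHEL 8:", is_el8(info))
    for path in bundled_libcrypto(argv[0]):
        print("Bundled with Gateway:", path)


if __name__ == "__main__":
    main(sys.argv[1:])
```

If the server is CentOS/RHEL 8 and bundled libcrypto files are listed, compare the Gateway version against GA5.9.x before applying the solution for Cause 6.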
Problem 2:
- Possible Cause 1 - If the "Show Rules" output shows a delay in the highlighted block, the rule itself has a delay configured, so the action is not triggered immediately.
- Possible Cause 2 - If the highlighted "Show Rules" output references a throttle, the action itself is being throttled. For more information, see our documentation on throttles.
Possible Solution(s)
Problem 1:
- Possible Solution to Cause 1 - To make actions fire following a configuration change, open the main Actions folder in the Gateway Setup Editor, click the Advanced tab, and select the "Fire on configuration change" setting. Note that this setting affects all actions.
- Possible Solution to Cause 2 - To make actions fire right after Gateway or Netprobe startup, open the main Actions folder in the Gateway Setup Editor, click the Advanced tab, and select the "Fire on component startup" setting. Note that this setting affects all actions.
- Possible Solution to Cause 3 - This default behavior can be changed on the Action record itself. Go to the action's Advanced tab and change the Snoozing setting from the default "Fire if item and ancestors not snoozed" to "Always Fire" or "Fire if item not snoozed". For details of how these settings differ, see our documentation. Note that this setting affects all rules that call this action.
- Possible Solution to Cause 4 - To allow an action to fire when a dataview item is created and transitions from undefined to OK severity, open the main Actions folder in the Gateway Setup Editor, click the Advanced tab, and select the "Fire on create with ok severity" setting. Note that this setting affects all actions.
- Possible Solution to Cause 5 - When the Netprobe comes back up, you should see the email alert generated in the Gateway log.
- Possible Solution to Cause 6 - Upgrade the Gateway to version GA5.9.x or higher. If you are unable to upgrade, as a workaround you can remove or rename the libcrypto.so* files in the Gateway's lib64 directory so that the Gateway picks up the system version. However, this may affect other Gateway functionality.
- Possible Solution to Cause 7 - Try executing the email script directly on the Gateway server to confirm whether it exits with the same code. If it does, you will need to troubleshoot and fix the email script.
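To run the email script outside the Gateway, as suggested for Cause 7, a minimal Python sketch that executes a command and reports its exit code is shown below. The file name `run_email.py` and the `run_script` function are hypothetical. When the Gateway runs an action it may pass alert data to the script through environment variables, so a script run by hand can behave differently; `extra_env` is an assumed hook for setting any variables your script expects (their names are not covered in this article).

```python
import os
import subprocess
import sys


def run_script(command, extra_env=None, timeout=60):
    """Run command (a list of arguments) and return (exit_code, stdout, stderr).

    extra_env is merged over the current environment, for scripts that expect
    variables normally supplied by the Gateway.
    """
    env = dict(os.environ)
    if extra_env:
        env.update(extra_env)
    completed = subprocess.run(
        command, capture_output=True, text=True, env=env, timeout=timeout
    )
    return completed.returncode, completed.stdout, completed.stderr


def main(argv):
    if not argv:
        print("usage: run_email.py SCRIPT [ARGS...]", file=sys.stderr)
        return
    code, out, err = run_script(argv)
    print(f"Exit code: {code}")
    if out:
        print("stdout:\n" + out)
    if err:
        print("stderr:\n" + err)


if __name__ == "__main__":
    main(sys.argv[1:])
```

For example: `python run_email.py perl /export/home/itrs/geneos-utils-master/system/scripts/email.pl`. Running it as the same user as the Gateway process is likely to give the closest match to what the Gateway sees; any error text on stderr is a good starting point for fixing the script.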
Problem 2:
- Possible Solution to Cause 1 - Open the rule record, remove the delay from the rule block, and save the change. The next time the rule triggers, the delay should no longer occur. For more information, see our documentation on rule delays.
- Possible Solution to Cause 2 - Open the rule record, remove the throttle reference from the rule block, and save the change. The next time the rule triggers, the action should fire immediately.
If Issue Persists
- Please contact our Client Services team via the chat service available on any of our websites, or by email at support@itrsgroup.com.
- Make sure you provide:
  - The date and time the email action should have triggered
  - If possible, a screenshot of the "Show Rules" output indicating that the rule should have triggered
  - A copy of the Gateway log and, if possible, the log entry lines you looked at during your own initial investigation
  - The XML of the rule and action records
  - The Gateway diagnostics file, if you are able to generate one (if not, we may ask for it depending on the investigation)
  - Any troubleshooting steps from this article that you have already tried