Related to:
Intermittent delivery of critical email alerts.
Problem:
Some email alerts are received normally. Other critical alerts are missing.
User View: Intermittent "gaps" in alerting where a Gateway state change occurs, but no notification is received.
Backend/Logs: The gateway.log shows "ActionManager Completed action... Exit code: 0", suggesting the Gateway successfully handed the task to the operating system. However, other log entries show "Action... did not fire because variable... is snoozed", indicating the Gateway intentionally skipped the action based on its current state.
Possible cause(s):
- Root Cause 1: Active Snooze on Data Items. The most direct cause found in the logs is that the specific data items (cells, rows, or entities) were "snoozed" by a user. When an item is snoozed, the Action Manager identifies the trigger but suppresses the execution of the action.
How to determine: Search the gateway.log for the string: Action 'Action_Name' did not fire because variable 'Path_to_Item' is snoozed.
- Root Cause 2: Rule-Level Throttling. The alert action is referenced by 170+ rules. Some of these rules have "throttling" configured. Throttling limits how often an action can fire within a specific timeframe to prevent alert fatigue.
How to determine: Review the XML configuration for the Gateway. Check the rules section and look for <throttling> tags associated with the rules triggering the email action.
- Root Cause 3: Downstream Mail Server Issues. Since the Gateway logs show an Exit code: 0, the Gateway has successfully executed the local mail command. If the email still doesn't arrive, the issue likely exists within the corporate mail relay or SMTP server.
How to determine: Coordinate with the System Admin or Mail Team to track the specific timestamp of the "successful" Gateway action against the mail server logs.
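On a busy Gateway, the log search for Root Cause 1 can be scripted instead of done by hand. The following is a minimal Python sketch, assuming the log lines match the message format quoted in this article. The names find_snoozed and summarise_log are illustrative only and are not part of any Gateway tooling.

```python
import re
from collections import Counter

# Message format as quoted in this article; adjust it if your
# Gateway version words the message differently.
SNOOZED = re.compile(
    r"Action '(?P<action>[^']+)' did not fire because variable '(?P<item>.+)' is snoozed"
)


def find_snoozed(lines):
    """Return (action, item_path) for each log line reporting a snoozed item."""
    hits = []
    for line in lines:
        match = SNOOZED.search(line)
        if match:
            hits.append((match.group("action"), match.group("item")))
    return hits


def summarise_log(path):
    """Count suppressed actions per (action, item) pair in a log file."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return Counter(find_snoozed(handle))
```

For example, summarise_log("gateway.log").most_common(10) lists the ten most frequently suppressed action and item pairs, which are the first candidates to check for an active snooze.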
Possible solution(s):
- Solution Root Cause 1: Identify the specific data item that failed to alert. Check if the item or its parent (Managed Entity/Probe) is currently snoozed in the Active Console. If the snooze was accidental or is no longer required, unsnooze the item to resume alerts.
- Solution Root Cause 2: Review the logic of the specific rules tied to the missing alerts. If the alerts are critical and must never be skipped, reduce or remove the throttling configuration for those specific rules, or move the critical alerts to a dedicated rule without suppression limits.
- Solution Root Cause 3: If the Gateway logs continue to show Exit code: 0 but no email is received, the Gateway is functioning correctly. The client must trace the hand-off between the Gateway server's local mail agent (e.g., sendmail or postfix) and the internal mail relay to find where the message is being dropped or quarantined.
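For Root Cause 2, reviewing 170+ rules by eye is error-prone, so a short script can shortlist the rules that contain throttling. The sketch below is a rough Python example under stated assumptions: it assumes each rule is an element named rule with a name attribute, and that throttling appears as a nested element named throttling, as this article describes. The real setup schema may use different names, so both tags are parameters. The function name rules_with_throttling is hypothetical.

```python
import xml.etree.ElementTree as ET


def rules_with_throttling(xml_text, rule_tag="rule", throttle_tag="throttling"):
    """Return names of rule elements that contain a throttling element.

    rule_tag, throttle_tag and the "name" attribute are assumptions
    about the setup file layout; change them to match your schema.
    """
    root = ET.fromstring(xml_text)
    names = []
    for rule in root.iter(rule_tag):
        # next() with a default avoids walking the whole subtree
        if next(rule.iter(throttle_tag), None) is not None:
            names.append(rule.get("name", "<unnamed>"))
    return names
```

Reading the setup file with open(...).read() and passing the text to this function gives a list of rule names to compare against the rules tied to the missing alerts. It only inspects the single file it is given, so configuration pulled in from other files would need to be checked separately.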
If you need further help:
- Please contact our support team via the chat service box on any of our websites or raise a support request.
- Make sure you provide us with:
  - Background of the issue or request.
  - Use cases, requirements, business impact, etc.
  - Encountered error messages.
  - Log files or diagnostic files.
  - Screenshots.
  - And other important information relevant to your inquiry.