Gateway disconnecting, Gateway disconnected in Active Console
- Gateways disconnecting from the Active Console
- Root Cause 1 - Gateways are crashing.
- Root Cause 2 - The Gateway has performance or data quality issues because new connections push it into an overloaded state.
- Root Cause 3 - Unhealthy network (or devices) between the gateway host and the remote consoles
- Solution Root Cause 1: Please see the "Gateway crash" article.
- Solution Root Cause 2: Data quality issue
The Gateway log contains entries similar to the following:
2021-07-28 05:53:48.001+0200 WARN: RuleManager Waiting for Max Jobs (45) has taken 30s so far. 45 Jobs in flight
2021-07-28 05:53:58.000+0200 WARN: RuleManager Waiting for Max Jobs (45) has taken 40s so far. 45 Jobs in flight
2021-07-28 05:54:08.000+0200 WARN: RuleManager Waiting for Max Jobs (45) has taken 50s so far. 45 Jobs in flight
2021-07-28 05:53:41.088+0200 INFO: DataQuality Maximum data age : period - 19981 ms, lifetime - 138965 ms
2021-07-28 05:53:41.088+0200 INFO: DataQuality Maximum queue data size: period - 36554 bytes, lifetime - 36554 bytes
2021-07-28 05:53:41.088+0200 INFO: DataQuality Maximum total data size: period - 159504 bytes, lifetime - 529576 bytes
You can find more information about data quality here: Data Quality User Guide
Information on rule evaluation threads: Gateway Reference Guide - Operating Environment
The following article also covers the case where the Gateway cannot keep up with rule evaluations once the Netprobes start connecting: RuleManager Flushing jobs
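As a rough aid for spotting the overload pattern shown above, the following is a minimal sketch that scans Gateway log lines for the RuleManager "Waiting for Max Jobs" warnings and the DataQuality "Maximum data age" entries. The regular expressions are modelled only on the sample lines in this article, and the function name summarise_overload is hypothetical; real log formats may differ between Gateway versions.

```python
import re

# Patterns modelled on the sample log lines in this article (assumption:
# other Gateway versions may format these messages differently).
MAX_JOBS_RE = re.compile(
    r"WARN: RuleManager Waiting for Max Jobs \((\d+)\) has taken (\d+)s so far"
)
DATA_AGE_RE = re.compile(
    r"INFO: DataQuality Maximum data age\s*: period - (\d+) ms, lifetime - (\d+) ms"
)


def summarise_overload(lines):
    """Return the longest RuleManager wait (seconds) and the worst
    per-period data age (milliseconds) found in the given log lines."""
    summary = {"max_jobs": None, "longest_wait_s": 0, "worst_period_age_ms": 0}
    for line in lines:
        match = MAX_JOBS_RE.search(line)
        if match:
            summary["max_jobs"] = int(match.group(1))
            summary["longest_wait_s"] = max(
                summary["longest_wait_s"], int(match.group(2))
            )
            continue
        match = DATA_AGE_RE.search(line)
        if match:
            summary["worst_period_age_ms"] = max(
                summary["worst_period_age_ms"], int(match.group(1))
            )
    return summary


sample = [
    "2021-07-28 05:53:48.001+0200 WARN: RuleManager Waiting for Max Jobs (45) has taken 30s so far. 45 Jobs in flight",
    "2021-07-28 05:54:08.000+0200 WARN: RuleManager Waiting for Max Jobs (45) has taken 50s so far. 45 Jobs in flight",
    "2021-07-28 05:53:41.088+0200 INFO: DataQuality Maximum data age : period - 19981 ms, lifetime - 138965 ms",
]
print(summarise_overload(sample))
```

Long or growing wait times combined with a high data age would be consistent with Root Cause 2, but the thresholds that matter depend on the environment.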
Solution Root Cause 3: Unhealthy network
The two typical disconnection types appear in the Gateway log as follows:
<Fri Dec 1 08:03:49> INFO: Translator ConManager Details: 'writeData()'; 'None'; 0; -808; 'getSockOpt() failed on fd 92 returns 104'
<Fri Dec 1 08:03:51> INFO: UserManager User 'APAC\SG819955' from 10.102.235.171:59467 disconnected. Connection ID 15.
<Fri Dec 1 08:59:54> INFO: Translator ConManager Details: 'checkHeartBeatLastCalled()'; 'tmNow = 1512118794'; 0; -801; 'no reply for 75 seconds'
<Fri Dec 1 08:59:54> INFO: UserManager User 'APAC\b49150' from 10.192.101.142:64039 disconnected. Connection ID 17.
The first is errno 104 (ECONNRESET), which means "Connection reset by peer". This is typical when the remote end (the AC2) is seen to send a TCP RST packet. The reset may also be sent by an intermediate network device or system firewall.
The second is a 75-second heartbeat timeout, which indicates that no data was seen coming from the far end of the connection for that time.
A regular occurrence of both types indicates an unhealthy network (or devices) between the gateway host and the remote consoles. If only one remote console is affected, we advise checking that user's machine.
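To judge whether the disconnects affect one console or many, the Gateway log can be summarised per remote address. The sketch below is illustrative only: the patterns are based on the two sample pairs above, the function name classify_disconnects is hypothetical, and it assumes that each ConManager detail line belongs to the next UserManager "disconnected" line, which may not hold in a busy log.

```python
import re
from collections import Counter, defaultdict

# Patterns modelled on the sample Gateway log lines in this article.
RESET_RE = re.compile(r"getSockOpt\(\) failed on fd \d+ returns 104")
HEARTBEAT_RE = re.compile(r"no reply for (\d+) seconds")
USER_RE = re.compile(r"UserManager User '([^']+)' from ([\d.]+):\d+ disconnected")


def classify_disconnects(lines):
    """Count disconnect types per remote IP address.

    Assumption: a ConManager detail line describes the next
    UserManager 'disconnected' line that follows it.
    """
    pending = None
    by_host = defaultdict(Counter)
    for line in lines:
        if RESET_RE.search(line):
            pending = "reset"
        elif HEARTBEAT_RE.search(line):
            pending = "heartbeat_timeout"
        else:
            match = USER_RE.search(line)
            if match:
                by_host[match.group(2)][pending or "unknown"] += 1
                pending = None
    return dict(by_host)


sample = [
    r"<Fri Dec 1 08:03:49> INFO: Translator ConManager Details: 'writeData()'; 'None'; 0; -808; 'getSockOpt() failed on fd 92 returns 104'",
    r"<Fri Dec 1 08:03:51> INFO: UserManager User 'APAC\SG819955' from 10.102.235.171:59467 disconnected. Connection ID 15.",
    r"<Fri Dec 1 08:59:54> INFO: Translator ConManager Details: 'checkHeartBeatLastCalled()'; 'tmNow = 1512118794'; 0; -801; 'no reply for 75 seconds'",
    r"<Fri Dec 1 08:59:54> INFO: UserManager User 'APAC\b49150' from 10.192.101.142:64039 disconnected. Connection ID 17.",
]
for host, counts in classify_disconnects(sample).items():
    print(host, dict(counts))
```

If the summary shows many different addresses with both disconnect types, that points towards the network; a single address points towards that user's machine.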
In the Active Console log, you may see "error 10053" type disconnects:
2017-12-01 09:00:10 NativeLoggerBridge [INFO] Connector::dropConnection: ConManager::shutdownSocket(167056) --- recv() failed: An established connection was aborted by the software in your host machine.
2017-12-01 09:00:10 NativeLoggerBridge [INFO] CONMGR: Details: writeData(); None; 0; -807; write() failed on fd 164088;
2017-12-01 09:00:10 NativeLoggerBridge [INFO] CONMGR-WINDOWS: WinErr: 10053
Error 10053 (WSAECONNABORTED) is a Windows Sockets error that the OS returns to the application when the TCP/IP connection has been closed outside the application, for example by a firewall or other supervisory application.
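To see how often such errors occur, the Active Console log can be scanned for the WinErr lines. This is a minimal sketch: the pattern is taken from the sample line above, the function name count_winerrs is hypothetical, and the lookup table lists only a few standard Winsock codes rather than everything the Active Console may log.

```python
import re
from collections import Counter

# Pattern modelled on the sample Active Console log line in this article.
WINERR_RE = re.compile(r"CONMGR-WINDOWS: WinErr: (\d+)")

# A few standard Winsock error codes; any other code is reported as "unknown".
WINSOCK_NAMES = {
    10053: "WSAECONNABORTED",
    10054: "WSAECONNRESET",
    10060: "WSAETIMEDOUT",
}


def count_winerrs(lines):
    """Count occurrences of each (code, name) pair in the given log lines."""
    counts = Counter()
    for line in lines:
        match = WINERR_RE.search(line)
        if match:
            code = int(match.group(1))
            counts[(code, WINSOCK_NAMES.get(code, "unknown"))] += 1
    return counts


sample = [
    "2017-12-01 09:00:10 NativeLoggerBridge [INFO] CONMGR: Details: writeData(); None; 0; -807; write() failed on fd 164088;",
    "2017-12-01 09:00:10 NativeLoggerBridge [INFO] CONMGR-WINDOWS: WinErr: 10053",
]
for (code, name), total in count_winerrs(sample).items():
    print(code, name, total)
```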
If Issue Persists
- Please contact our Client Services team via the chat service box available on any of our websites, or by email at firstname.lastname@example.org
- Make sure you provide us with: