These problems include:
- Connectivity problems to the database. These can be:
    - The database server is down, either through a fault or for planned maintenance
    - The network between the gateway and the database server is unreliable
    - The authentication or other connection details are incorrect, including the case where the database user's account has been locked
    - A local dependency has failed, such as the database client libraries not being accessible on gateway start
- Exceeding the gateway's internal database logging queue size. This is usually performance related:
    - The gateway is logging too much data
    - The database server is not keeping up with the number of items being updated
    - Forced Interval is set (and to the same interval) on more data items than the request queue size
    - The database server and Geneos gateway are too far apart on the network, so excessive latency between the two means commits take longer to be confirmed
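The Forced Interval case can be illustrated with a small model of a bounded queue. This is a minimal sketch, not the gateway's implementation: the function name and the assumption that nothing drains from the queue while the burst arrives (a worst case) are hypothetical, and the numbers are chosen only to match the default queue size of 4,000 mentioned in this article.

```python
from collections import deque


def simulate_burst(burst_size: int, queue_capacity: int) -> tuple[int, int]:
    """Return (queued, overflowed) for a burst of items arriving at once.

    Assumption: the database thread drains nothing during the burst,
    which is the worst case for items that share one Forced Interval.
    """
    queue = deque()
    overflowed = 0
    for item in range(burst_size):
        if len(queue) < queue_capacity:
            queue.append(item)
        else:
            # In this model, items that do not fit are the ones that
            # would overflow rather than be logged directly.
            overflowed += 1
    return len(queue), overflowed


# 5,000 data items forced at the same interval against a 4,000-item queue.
queued, overflowed = simulate_burst(burst_size=5000, queue_capacity=4000)
print(f"queued={queued} overflowed={overflowed}")
```

In this model any burst larger than the queue capacity overflows by the difference, regardless of how fast the database is, which is why staggering intervals or enlarging the queue helps.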
In normal operation, the gateway checks for these files and replays them to the database once the condition that caused the original issue has been resolved. The gateway continues to log live data items as a priority, so it may take some time to replay these files. Each dump file is deleted once it has been successfully replayed and the data confirmed as written to the database.
Performance and configuration issues can have a direct impact on the overflow of logged data into dump files. A number of parameters can be tuned to suit local conditions:
- The maxRequestQueueSize is the maximum number of items that can be queued for the gateway thread connected to the database. Increase this from the default of 4,000 if the number of items being logged is large.
- If using Oracle, change the isolationLevel to Read_committed. This setting has little effect on other database architectures.
- The gateway issues a fixed number of INSERT or UPDATE SQL statements before committing the data. A commit can take a significant amount of time to complete (tens of milliseconds is not uncommon), so increasing the number of statements per commit ("per transaction") may result in better throughput, in exchange for a small risk of data loss if there is a failure before the commit completes.
- Although not strictly a performance setting, duplicate timestamps can cause errors, so we advise enabling the Log netprobe sampler time for data items config flag. The gateway then uses the timestamp of the data collection, when available, rather than the arrival time of the data in the gateway. This makes no difference to plugins that may return multiple values for a single data item in the same second, e.g. Statetracker.
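The trade-off in statements per commit can be sketched with some simple arithmetic. The following is an illustrative model only: the function name and the timings (0.5 ms per statement, 20 ms per commit) are assumptions chosen for the example, not measured gateway or database figures.

```python
def statements_per_second(
    statements_per_commit: int, statement_ms: float, commit_ms: float
) -> float:
    """Estimate throughput when each transaction pays one commit cost.

    Assumed model: transaction time = n * statement_ms + commit_ms.
    """
    transaction_ms = statements_per_commit * statement_ms + commit_ms
    return statements_per_commit * 1000.0 / transaction_ms


# Hypothetical timings: 0.5 ms per statement, 20 ms per commit.
for n in (1, 10, 100):
    rate = statements_per_second(n, statement_ms=0.5, commit_ms=20.0)
    # Up to n uncommitted statements are at risk if a failure
    # happens before the commit completes.
    print(f"{n:>3} statements/commit: {rate:8.1f} statements/s, up to {n} at risk")
```

Under these assumed timings, 10 statements per commit gives 400 statements per second, because the fixed commit cost is spread over more statements. The gain flattens as the per-statement time starts to dominate, while the number of statements at risk keeps growing.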