Geneos - Gateway - Gateway database logging, performance and .dat dump files

The gateway creates files in the database/ directory whenever there are problems logging live data to the database. By default, the gateway automatically locates these files and writes their data to the database once the underlying issue is resolved.

These problems include:

Connectivity problems to the database. These can be:
- The database server is down through a fault or for planned maintenance
- The network between the gateway and the database server is unreliable
- The authentication or other connection details are incorrect, including the case where the database user's account has been locked
- A local dependency has failed, such as the database client libraries not being accessible on gateway start
Exceeding the gateway's internal database logging queue size. This is usually performance related, for example:
- The gateway is logging too much data
- The database server is not keeping up with the number of items being updated
- Forced Interval is set (and to the same interval) on more data items than the request queue size
- The database server and Geneos gateway are too far apart, so excessive network latency between the two means commits take longer to be confirmed
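
The queue-related causes above can be reasoned about with a very simple model. The sketch below is illustrative only and is not how the gateway is implemented: it assumes a burst of items arrives at the same instant (as when many items share one Forced Interval) and that steady-state arrival and drain rates are constant. The function names and rates are hypothetical; the 4,000 default is the maxRequestQueueSize default described in this article.

```python
from typing import Optional


def overflow_count(burst_size: int, max_queue_size: int = 4000) -> int:
    """Items that cannot be queued if burst_size items arrive at once.

    Simplification: nothing is drained while the burst arrives.
    """
    return max(0, burst_size - max_queue_size)


def seconds_until_full(arrival_rate: float,
                       drain_rate: float,
                       max_queue_size: int = 4000) -> Optional[float]:
    """Time for an empty queue to fill, given items/second in and out.

    Returns None when the database keeps up (the queue never fills).
    """
    if arrival_rate <= drain_rate:
        return None
    return max_queue_size / (arrival_rate - drain_rate)
```

Under these assumptions, 5,000 items sharing one forced interval would overflow a 4,000-item queue by 1,000 items, and a database that drains 500 items per second while 1,000 arrive would fill the same queue in 8 seconds.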

In normal operation the gateway will check for these files and replay them to the database when the condition causing the original issue has been resolved. The gateway will continue to log live data items as a priority so it may take some time to replay these files. Each dump file is deleted once it has been successfully replayed and the data confirmed as written to the database.
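
To watch the replay backlog drain, it can help to list what is waiting in the database/ directory. The following is a minimal sketch, assuming the dump files carry a .dat extension and sit directly in that directory; the function names are hypothetical and the script only reads file metadata, it does not touch the files.

```python
from pathlib import Path
from typing import List, Tuple


def pending_dump_files(database_dir: str) -> List[Tuple[Path, int]]:
    """Return (path, size in bytes) for each .dat file, oldest first."""
    files = [p for p in Path(database_dir).glob("*.dat") if p.is_file()]
    # Sort by modification time so the oldest dump file comes first.
    files.sort(key=lambda p: p.stat().st_mtime)
    return [(p, p.stat().st_size) for p in files]


def backlog_bytes(database_dir: str) -> int:
    """Total size of all pending dump files in the directory."""
    return sum(size for _, size in pending_dump_files(database_dir))
```

Running `backlog_bytes("database")` periodically from the gateway's working directory should show the total shrinking as files are replayed and deleted.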

Performance and configuration issues can have a direct impact on the overflow of logged data into dump files. A number of parameters can be tuned to suit local conditions:

- maxRequestQueueSize is the number of items that can be held in the queue to the gateway thread connected to the database. Increase this from the default of 4,000 if the number of items being logged is large.
- If using Oracle, change the isolationLevel to Read_committed. This setting has little effect on other database architectures.
- The gateway issues a fixed number of INSERT or UPDATE SQL statements before committing the data. A commit can take a significant amount of time to complete (tens of milliseconds is not uncommon), so increasing the number of statements per commit ("per transaction") may give better throughput in exchange for a small risk of data loss if there is a failure before the commit completes.
- Although not strictly a performance setting, we advise enabling the "Log netprobe sampler time for data items" config flag to reduce errors caused by duplicate timestamps. The gateway then uses the timestamp of the data collection, when available, rather than the arrival time of the data in the gateway. This makes no difference for plugins that may return multiple values for a single data item in the same second, e.g. Statetracker.
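
The trade-off in the statements-per-commit setting can be estimated with some back-of-the-envelope arithmetic. This is a rough sketch with hypothetical timings (per-statement and per-commit costs vary by database and network), not a measurement of the gateway: it assumes each batch costs a fixed time per statement plus one commit.

```python
def rows_per_second(statements_per_commit: int,
                    statement_ms: float,
                    commit_ms: float) -> float:
    """Estimated throughput when each batch pays one commit cost."""
    batch_ms = statements_per_commit * statement_ms + commit_ms
    return statements_per_commit * 1000.0 / batch_ms


def max_rows_at_risk(statements_per_commit: int) -> int:
    """Upper bound on uncommitted statements lost if a failure
    happens just before the commit completes."""
    return statements_per_commit
```

With an assumed 1 ms per statement and 20 ms per commit, one statement per commit gives about 48 rows per second, while 100 statements per commit gives about 833, at the cost of up to 100 uncommitted statements being at risk in a failure.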

Manual Replay

It is also possible to manually read and replay these files with an independent gateway process, using the -process-dump-files command line option, as long as the independent gateway has the same database logging details as the original gateway.
