You may have problems pushing configuration to a poller and see the following errors in the Master's neb log:
[1234567890] 4: stdout: rsync: connection unexpectedly closed (9 bytes received so far) [sender]
[1234567891] 4: stdout: rsync error: error in rsync protocol data stream (code 12) at io.c(605) [sender=3.0.9]
...
[1234567895] 6: NODESTATE: server2: STATE_PENDING -> STATE_NONE: connect() to poller node server2 (123.45.67.8:15551) failed: Connection refused
[1234567896] 6: NODESTATE: server1: STATE_PENDING -> STATE_NONE: connect() to poller node server1 (123.45.67.8:15551) failed: Connection refused
You may also see that temporary files are created but never removed in the oconf directory of the poller:
[monitor@poller1 oconf]$ ll -all
total 261372
drwxrwxr-x 2 monitor apache 4096 Aug 27 12:40 .
drwxrwx--- 7 monitor apache 204800 Aug 12 15:21 ..
-rw------- 1 monitor apache 41853768 Aug 27 12:06 from-master.cfg
-rw------- 1 monitor apache 41758651 Aug 14 09:47 .from-master.cfg.ABCd1e
-rw------- 1 monitor apache 41758651 Aug 17 10:32 .from-master.cfg.ABCd1e.fghiJk
-rw------- 1 monitor apache 41758651 Aug 14 09:48 .from-master.cfg.Lmnopq
-rw------- 1 monitor apache 41747091 Aug 17 10:32 .from-master.cfg.RStuv.WxyZaB
-rw------- 1 monitor apache 41758651 Aug 14 09:50 .from-master.cfg.C2deFg
-rw------- 1 monitor apache 12845056 Aug 27 12:40 .from-master.cfg.H3ijKL.M34PQ5.rsTUvW
An oconf push uses rsync to transfer the configuration files. During the transfer, rsync creates temporary files such as `.from-master.cfg.ABCd1e`. In this scenario, rsync is interrupted or fails midway, so these temporary files are left behind. When rsync is triggered again, it fails to delete the leftover files.
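To see how many of these leftovers have accumulated, a small script can list them with their sizes. The following is a minimal sketch, not part of OP5 Monitor: the function name `find_leftover_temp_files` is hypothetical, the directory is passed in by the caller, and the naming pattern (a dot, the target file name, then one or more random suffixes) is inferred from the listing above.

```python
import os


def find_leftover_temp_files(directory, target_name="from-master.cfg"):
    """Return sorted (name, size_in_bytes) pairs for leftover temp files.

    Assumption: leftovers are hidden files named '.<target_name>.<suffix>',
    as seen in the directory listing in this article.
    """
    prefix = "." + target_name + "."
    leftovers = []
    for entry in os.scandir(directory):
        # Skip the real config file and anything unrelated to target_name.
        if entry.is_file() and entry.name.startswith(prefix):
            leftovers.append((entry.name, entry.stat().st_size))
    return sorted(leftovers)
```

Run it as the 'monitor' user on the poller and pass the path of the poller's oconf directory. The sketch only reports files; it does not delete anything.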
To find out what rsync is doing, you can add a --log-file argument so that rsync writes a log to /tmp/rsync.log.
1. Open /usr/lib64/merlin/mon/oconf.py
2. Change the following line (184):
base_rsync_args = ['-aotzc', '--delete',
into:
base_rsync_args = ['-aotzc', '--delete', '--log-file=/tmp/rsync.log',
(Note: this will be overwritten on updates).
3. When the issue reoccurs, please examine the rsync logfile to find any significant error that may lead to the root cause of the issue.
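On a busy system the log can grow quickly, so it may help to filter out only the error lines. The sketch below assumes the log line layout shown in this article (timestamp, PID in brackets, then `rsync error: ... (code N)`); the names `ERROR_LINE` and `rsync_errors` are hypothetical helpers, not part of rsync or OP5 Monitor.

```python
import re

# Matches lines such as:
# 2020/09/01 10:36:11 [26228] rsync error: <message> (code 12) at io.c(605) [...]
ERROR_LINE = re.compile(
    r"^(?P<timestamp>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) "
    r"\[(?P<pid>\d+)\] rsync error: (?P<message>.*?) \(code (?P<code>\d+)\)"
)


def rsync_errors(lines):
    """Return (timestamp, pid, exit_code, message) for each rsync error line."""
    found = []
    for line in lines:
        match = ERROR_LINE.match(line)
        if match:
            found.append(
                (match["timestamp"], int(match["pid"]),
                 int(match["code"]), match["message"])
            )
    return found
```

Any iterable of lines works as input, so an open file handle for /tmp/rsync.log can be passed directly.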
In this case, the same error appears in the rsync log:
2020/09/01 10:36:11 [26228] rsync error: error in rsync protocol data stream (code 12) at io.c(605) [sender=3.0.9]
This can be caused by any of the following issues, which you need to address in order to resolve the rsync problem:
- You are trying to sync a file larger than 100 GB.
- The destination host's disk is full.
- The sender host has not accepted the destination's SSH host key (as the 'monitor' user on the sender machine, open a standard ssh connection to the destination and accept the key manually).
- The rsync versions on client and server do not match (the OP5 Monitor versions should be the same).
- The remote path does not exist as a directory (e.g. /var/cache/merlin/backups).
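Two of these causes, a missing remote directory and a full disk, can be checked quickly on the destination host. The following is a minimal sketch under stated assumptions: `check_destination` is a hypothetical helper, and `required_bytes` is a value you choose yourself (for example, the size of the configuration file being pushed). It does not cover the SSH key or rsync version causes.

```python
import os
import shutil


def check_destination(path, required_bytes):
    """Return a list of human-readable problems found for a destination path."""
    problems = []
    if not os.path.isdir(path):
        # Without the directory, the free-space check is meaningless.
        problems.append(f"{path} does not exist or is not a directory")
        return problems
    free = shutil.disk_usage(path).free
    if free < required_bytes:
        problems.append(f"{path} has {free} bytes free, {required_bytes} needed")
    return problems
```

An empty list means neither of the two conditions was detected; the remaining causes still need to be checked manually.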