Rsync ate my disk
Rsync has saved my life on various occasions, but it has also brought the whole place to a grinding halt. The cause is that BackupPC sometimes decides it's time for a complete full backup. This entails asking rsync to compute the checksum (hash/md5sum) of every file on storage. Needless to say, the NFS server ends up with I/O wait near 100% and the entire place grinds to a halt.
Various solutions revolving around ionice completely fail to resolve the cache overload. The storage system here uses an SSD cache in front of RAIDed spinning metal, so by the time ionice intervenes the cache has long since been overwhelmed.
The solution comes from Russia. https://habr.com/ru/post/332614/
Essentially, the very clever idea is to use sar to measure I/O wait. Once the averaged figure passes the usable threshold (30%), pkill sends rsync a STOP signal to pause it. When I/O wait drops back below the 30% limit, a CONT signal lets it continue. Simple as that!
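#!/bin/bash
# Pause the target process while averaged %iowait is above 30, resume it below.
# The script as originally copied signalled dd; rsync is what needs throttling here.
TARGET=rsync
INTERVAL=10   # number of 1-second sar samples to average
CNTR=0        # consecutive HIGH readings

while :
do
    # Field 6 of sar's "Average:" line is %iowait; prefix it with HIGH or LOW.
    CUR_LA=$(LANG=C sar 1 $INTERVAL | grep Average | awk '{print $6}' | perl -pe 'if ($_ > 30) { print "HIGH "} else {print "LOW "}')
    echo $CUR_LA
    MARKER=$(echo $CUR_LA | awk '{print $1}')
    if [ "$MARKER" = "LOW" ]
    then
        CNTR=0
        pkill -x --signal CONT "$TARGET"
        continue
    else
        CNTR=$((CNTR + 1))
        pkill -x --signal STOP "$TARGET"
    fi
    # After 5 HIGH readings in a row, let the process run briefly.
    if [ "$CNTR" = "5" ]
    then
        echo "CNTR = $CNTR - CONT / 2 sec / STOP to avoid socket timeouts"
        CNTR=0
        pkill -x --signal CONT "$TARGET"
        sleep 2
        pkill -x --signal STOP "$TARGET"
    fi
done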
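The threshold check at the heart of the script can be tried on its own, without sar installed and without signalling anything. What follows is only a sketch: the function names average_iowait and classify are made up for illustration, and the sample line is invented input in the shape of the "Average:" line that LANG=C sar prints, not a real measurement.

```shell
#!/bin/bash
# Hypothetical helpers; names and sample data are illustrative only.

THRESHOLD=30   # %iowait limit, same value the script above compares against

# Read `LANG=C sar 1 N` style output on stdin and print the averaged %iowait,
# which is field 6 of the "Average:" line.
average_iowait() {
    awk '/^Average/ { print $6 }'
}

# Print HIGH if the given value exceeds THRESHOLD, otherwise LOW.
# awk does the comparison because bash arithmetic is integer-only.
classify() {
    awk -v v="$1" -v t="$THRESHOLD" \
        'BEGIN { if (v + 0 > t + 0) print "HIGH"; else print "LOW" }'
}

# Invented sample in the column order: CPU %user %nice %system %iowait %steal %idle
sample='Average:        all      2.10      0.00      1.35     42.70      0.00     53.85'

iowait=$(printf '%s\n' "$sample" | average_iowait)
echo "$iowait $(classify "$iowait")"
```

Replacing the printf with a real `LANG=C sar 1 10` call gives the same decision the loop makes, which is handy for picking a threshold before letting anything send STOP to rsync.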