Retry loop for redis-cli BGSAVE#306
Conversation
* redis-cli BGSAVE can fail if background rewriting is happening.
It will still exist zero though, causing an infinite loop. Replaced
with for loop.
* The LASTSAVE checking loop now times out after 60 minutes.
* Switched to bash instead of sh for nicer for loops.
|
So, |
rubiojr
left a comment
There was a problem hiding this comment.
So, debuild from package-deb was running the tests as well.
This is intentional @buckelij. Makes sure the debian package (make deb or ./script/package-deb) passes the tests before distributing it (typical/recommended in upstream packages).
That seems redundant, and was causing the checks to timeout.
Can you elaborate? Seems to be fine in master and Travis runs debuild without disabling the tests.
Added some style related comments but other than that, makes sense to me.
| if [ \$n -eq 239 ]; then | ||
| exit 1 | ||
| fi | ||
| done |
There was a problem hiding this comment.
This for loop style is a bit more compact and preferred.
for n in $(seq 240); do
if [ "\$(redis-cli LASTSAVE)" != "\$timestamp" ]; then
break
fi
sleep 15
done
[ "\$(redis-cli LASTSAVE) != "\$timestamp" ] # exits 1 if bgsave didn't work| else | ||
| break | ||
| fi | ||
| done |
There was a problem hiding this comment.
This for loop style is preferred:
for i in $(seq 10); do
if ! redis-cli BGSAVE | grep -q ERR; then
break
fi
sleep 15
done|
@rubiojr made the style changes and re-enabled the test. Something must have been going on with CI earlier -- the travis-ci checks were passing but the backup-utils test was timing out during the deb package build. |
13b91ec to
193d615
Compare
| sleep 15 | ||
| done | ||
| for n in \$(seq 240); do | ||
| sleep 1 |
There was a problem hiding this comment.
I added a sleep here and took a second off the 2nd sleep. The fake redis-cli used in the test environment increments every second, but this was running through too fast and adding an extra 15 seconds to every backup-utils run. That was adding up and causing the tests to time out.
There was a problem hiding this comment.
@buckelij any reason why we can't just replace 240 with 3600 to simplify things?
There was a problem hiding this comment.
And leave only the sleep 1, that is.
|
Ready for review again @rubiojr |
|
🙇 |
Restore public github/doc repo to GHES
ghe-backup-redisdumps can fail and hang indefinitely if BGSAVE is called while Redis "background rewriting" is happening. In these cases BGSAVE will report an error, but still exit 0.This PR adds a
forloop around theredis-cli BGSAVEto retry if BGSAVE reports anERR. I also put theLASTSAVEcheck in aforloop that times out after 60 minutes.I'm not certain how long the
forloopsleeps should be, but I think 60 minutes is more than enough.