Skip to content

Retry loop for redis-cli BGSAVE#306

Merged
buckelij merged 10 commits into
masterfrom
buckelij/redis-retry
May 31, 2017
Merged

Retry loop for redis-cli BGSAVE#306
buckelij merged 10 commits into
masterfrom
buckelij/redis-retry

Conversation

@buckelij

Copy link
Copy Markdown
Member

ghe-backup-redis dumps can fail and hang indefinitely if BGSAVE is called while Redis "background rewriting" is happening. In these cases BGSAVE will report an error, but still exit 0.

This PR adds a for loop around the redis-cli BGSAVE to retry if BGSAVE reports an ERR. I also put the LASTSAVE check in a for loop that times out after 60 minutes.

I'm not certain how long the for loop sleeps should be, but I think 60 minutes is more than enough.

buckelij added 2 commits May 17, 2017 14:35
  * redis-cli BGSAVE can fail if background rewriting is happening.
    It will still exist zero though, causing an infinite loop. Replaced
    with for loop.
  * The LASTSAVE checking loop now times out after 60 minutes.
  * Switched to bash instead of sh for nicer for loops.
@buckelij

Copy link
Copy Markdown
Member Author

So, debuild from package-deb was running the tests as well. That seems redundant, and was causing the checks to timeout. I added DEB_BUILD_OPTIONS=nocheck to debuild in f3a351c.

@rubiojr rubiojr left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, debuild from package-deb was running the tests as well.

This is intentional @buckelij. Makes sure the debian package (make deb or ./script/package-deb) passes the tests before distributing it (typical/recommended in upstream packages).

That seems redundant, and was causing the checks to timeout.

Can you elaborate? Seems to be fine in master and Travis runs debuild without disabling the tests.

Added some style related comments but other than that, makes sense to me.

if [ \$n -eq 239 ]; then
exit 1
fi
done

@rubiojr rubiojr May 26, 2017

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This for loop style is a bit more compact and preferred.

for n in $(seq 240); do
  if [ "\$(redis-cli LASTSAVE)" != "\$timestamp" ]; then
    break
  fi
  sleep 15
done
[ "\$(redis-cli LASTSAVE) != "\$timestamp" ] # exits 1 if bgsave didn't work

else
break
fi
done

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This for loop style is preferred:

for i in $(seq 10); do
  if ! redis-cli BGSAVE | grep -q ERR; then
    break
  fi
  sleep 15
done

@buckelij

Copy link
Copy Markdown
Member Author

@rubiojr made the style changes and re-enabled the test. Something must have been going on with CI earlier -- the travis-ci checks were passing but the backup-utils test was timing out during the deb package build.

@buckelij buckelij force-pushed the buckelij/redis-retry branch from 13b91ec to 193d615 Compare May 30, 2017 23:26
sleep 15
done
for n in \$(seq 240); do
sleep 1

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a sleep here and took a second off the 2nd sleep. The fake redis-cli used in the test environment increments every second, but this was running through too fast and adding an extra 15 seconds to every backup-utils run. That was adding up and causing the tests to time out.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@buckelij any reason why we can't just replace 240 with 3600 to simplify things?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And leave only the sleep 1, that is.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, that's fine. Fixed in 5627d6d.

@buckelij

Copy link
Copy Markdown
Member Author

Ready for review again @rubiojr

@buckelij buckelij merged commit 25c304f into master May 31, 2017
@buckelij buckelij deleted the buckelij/redis-retry branch May 31, 2017 18:13
@rubiojr

rubiojr commented May 31, 2017

Copy link
Copy Markdown
Member

🙇

@snh snh mentioned this pull request Jun 7, 2017
jeluhu pushed a commit that referenced this pull request Jun 12, 2023
Restore public github/doc repo to GHES
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants