Skip to content

Unify the backup & restore process#375

Merged
lildude merged 128 commits into
masterfrom
lildude/unify-strategy
Mar 27, 2018
Merged

Unify the backup & restore process#375
lildude merged 128 commits into
masterfrom
lildude/unify-strategy

Conversation

@lildude

@lildude lildude commented Mar 10, 2018

Copy link
Copy Markdown
Member

Background

Since the introduction of Cluster, backup-utils has effectively maintained three different methods of backing up GitHub Enterprise appliances, whilst at the same time attempted to retain backwards compatibility for all releases of GitHub Enterprise since 11.10.344.

This has resulted in a very messy, hard to maintain and not particularly efficient code base, with the cluster-specific backup functionality not being tested within backup-utils at all. This PR starts to address this by unifying the backup methods used to a single method that is compatible with cluster and single nodes. Switching to this method will effectively be testing the same pathways used for cluster with the need to only write a few more cluster-specific tests for those scenarios not covered by testing for single nodes.

Caveats

The biggest caveat with this change is backup-utils 2.13.0 onwards will only support GitHub Enterprise 2.11.0 and newer as these are the only releases of GitHub Enterprise that contain the necessary functionality.

Prior to this PR being merged, a new branch will be created to allow for easy (manual) maintenance of legacy releases of backup-utils that people may be using to backup 2.10 and older releases. Previous releases will not be removed.

The Future

The intention going forward is to keep the versioning and support for backup-utils releases inline with that of GitHub Enterprise to make it clearer what versions of backup-utils are supported and with which versions of GitHub Enterprise, hopefully addressing concerns like those raised in https://github.com/github/backup-utils/issues/371 .

PRs are in the pipeline for further beautification of the code and completely updating and reformatting the README and documentation. These changes haven't been included in this PR as they detract from the primary purpose of this PR and would turn what is already a pretty large PR into a huge and hard to review PR.

What does this PR do?

In short, this PR:

  • Removes all logic and support for legacy 11.10.xxx GitHub Enterprise releases and GitHub Enterprise releases prior to 2.11.0.
    • Note: This means as the point of merging this PR, backup-utils can only be used to backup & restore GitHub Enterprise 2.11.0 and newer instances.
  • Switches to a single method for backing single nodes and clusters, based on the code used to backup clusters.
  • Unifies the format of the remaining code putting things in a better position for further simplification in future.
  • Removes unused commands and tests.
  • Adds more tests to cover more of the remaining code.
  • Removes duplication within the tests and code.

TODO

  • More 👀 on these changes.
  • More tests.
  • Extensive manual testing with various configuration combinations to make sure I've not missed anything and write tests for those things I have.
  • Make this work with restoring to unconfigured instances - pending changes on the GitHub Enterprise side of things.

... and merge nw tests into remaining tests.
Info/warnings are ignored for the moment.
... and update tests
This will create a path using the temp dir structure of the backup host
which, if a different OS, may not match the filesystem structure on GHE.
@lildude

lildude commented Mar 19, 2018

Copy link
Copy Markdown
Member Author

@github/backup-utils this is now ready for review. The one failing test is due to parallel not being installed on the new CI servers which I'm working to address.

@lildude lildude force-pushed the lildude/unify-strategy branch from 3ba3c9b to 4aa92f7 Compare March 20, 2018 16:59
This makes it easier to test and removes a dependency.
@lildude lildude force-pushed the lildude/unify-strategy branch from 4aa92f7 to 5917bb0 Compare March 20, 2018 17:02
ghe-ssh "$GHE_HOSTNAME" -- /bin/bash >&3 <<EOF
split -l 1000 $to_restore $tempdir/chunk
chunks=\$(find $tempdir/ -name chunk\*)
echo "\$chunks" | xargs -P64 -n1 -I{} /bin/sh -c "cat {} | github-env ./bin/dgit-cluster-restore-finalize"

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Heads up @snh. Since you last looked at this, I've switched out the use of parallel in favour of xargs -P64 for all of the finalize steps (with a few force pushes to hide my mess 😊).

My testing shows this works just as well, but I'd appreciate it if you could specifically double check me here - I'm not sure why you went for parallel over xargs when you first implemented this. 🙇

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lildude I'm concerned that 64 will be a little high in some circumstances, as there are a number of Cluster customers who restore to DR and test clusters that are quite light-on resources wise.

From my quick testing, each dgit-cluster-restore-finalize, without having been feed any routes yet, uses 76MB of PSS memory, which adds up to close to 5GB for 64 of these.

For this reason I'm quite a fan of the approach we were using with parallel to scale the number of processes based on the CPU count, which is often a good indicator of the resources a node has available to it.

Did you have a chance to try and stub parallel? As this runs on the appliance itself, which already had parallel, the dependency on parallel has never really been an issue until now.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm concerned that 64 will be a little high in some circumstances, as there are a number of Cluster customers who restore to DR and test clusters that are quite light-on resources wise.

This would only affect customers who actually have up to 64 CPUs on their node as -P is for maximum number of processes. From the man page (emphasis is mine): "Run up to max-procs processes at a time;".

I'd hope if they're going to assign that many CPUs, they'd allocate an equivalently large amount of memory too as they'd hit other problems long before they hit backup-utils related problems.

For this reason I'm quite a fan of the approach we were using with parallel to scale the number of processes based on the CPU count

I'd have liked to do the same with xargs by not specifying a number or using 0 to use all available procs, but xargs on macOS requires a value > 0 and we still test on macOS, hence I've gone for a rather large value to mimic the same behaviour.

Did you have a chance to try and stub parallel?

I did, but this turned out to be a lot more complex than anticipated and I soon found myself effectively writing a passthrough version of parallel that stripped the args so we could run the subsequent commands. This adds unnecessary complexity and potential tech debt, just for testing, when a simple and just as effective alternate solution works for both actual use and testing.

As this runs on the appliance itself, which already had parallel, the dependency on parallel has never really been an issue until now.

That's because we were never testing this behaviour within backup-utils itself so never encountered any problems until now. We don't use parallel for anything else within GitHub Enterprise either.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed in Slack, I'm wrong here and misunderstood the behaviour. Will revert that change.

@lildude

lildude commented Mar 27, 2018

Copy link
Copy Markdown
Member Author

Just to be 100% clear about the biggest change in this PR, I'm going to state it in a comment of its own, using a mix of @snh's and my own words:

The backup utilities will now only support the current feature release and two previous feature releases. For example, the backup utilities 2.13.0 will support 2.11, 2.12 and 2.13. A new legacy branch will be created, and maintained with critical bug fixes, for customers using older releases.

A new backup-utils release will be made with each GitHub Enterprise feature release in future so backup-utils 2.14 will support GitHub Enterprise 2.12, 2.13 and 2.14, backup-utils 2.15 will support GitHub Enterprise 2.13, 2.14 and 2.15 and so on.

The documentation is being updated to reflect this change in #380

@snh snh left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work. ✨

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants