Unify the backup & restore process#375
Conversation
... and merge nw tests into remaining tests.
Info/warnings are ignored for the moment.
... and update tests
This will create a path using the temp dir structure of the backup host which, if a different OS, may not match the filesystem structure on GHE.
|
@github/backup-utils this is now ready for review. The one failing test is due to |
3ba3c9b to
4aa92f7
Compare
This makes it easier to test and removes a dependency.
4aa92f7 to
5917bb0
Compare
| ghe-ssh "$GHE_HOSTNAME" -- /bin/bash >&3 <<EOF | ||
| split -l 1000 $to_restore $tempdir/chunk | ||
| chunks=\$(find $tempdir/ -name chunk\*) | ||
| echo "\$chunks" | xargs -P64 -n1 -I{} /bin/sh -c "cat {} | github-env ./bin/dgit-cluster-restore-finalize" |
There was a problem hiding this comment.
Heads up @snh. Since you last looked at this, I've switched out the use of parallel in favour of xargs -P64 for all of the finalize steps (with a few force pushes to hide my mess 😊).
My testing shows this works just as well, but I'd appreciate it if you could specifically double check me here - I'm not sure why you went for parallel over xargs when you first implemented this. 🙇
There was a problem hiding this comment.
@lildude I'm concerned that 64 will be a little high in some circumstances, as there are a number of Cluster customers who restore to DR and test clusters that are quite light-on resources wise.
From my quick testing, each dgit-cluster-restore-finalize, without having been feed any routes yet, uses 76MB of PSS memory, which adds up to close to 5GB for 64 of these.
For this reason I'm quite a fan of the approach we were using with parallel to scale the number of processes based on the CPU count, which is often a good indicator of the resources a node has available to it.
Did you have a chance to try and stub parallel? As this runs on the appliance itself, which already had parallel, the dependency on parallel has never really been an issue until now.
There was a problem hiding this comment.
I'm concerned that 64 will be a little high in some circumstances, as there are a number of Cluster customers who restore to DR and test clusters that are quite light-on resources wise.
This would only affect customers who actually have up to 64 CPUs on their node as -P is for maximum number of processes. From the man page (emphasis is mine): "Run up to max-procs processes at a time;".
I'd hope if they're going to assign that many CPUs, they'd allocate an equivalently large amount of memory too as they'd hit other problems long before they hit backup-utils related problems.
For this reason I'm quite a fan of the approach we were using with parallel to scale the number of processes based on the CPU count
I'd have liked to do the same with xargs by not specifying a number or using 0 to use all available procs, but xargs on macOS requires a value > 0 and we still test on macOS, hence I've gone for a rather large value to mimic the same behaviour.
Did you have a chance to try and stub parallel?
I did, but this turned out to be a lot more complex than anticipated and I soon found myself effectively writing a passthrough version of parallel that stripped the args so we could run the subsequent commands. This adds unnecessary complexity and potential tech debt, just for testing, when a simple and just as effective alternate solution works for both actual use and testing.
As this runs on the appliance itself, which already had parallel, the dependency on parallel has never really been an issue until now.
That's because we were never testing this behaviour within backup-utils itself so never encountered any problems until now. We don't use parallel for anything else within GitHub Enterprise either.
There was a problem hiding this comment.
As discussed in Slack, I'm wrong here and misunderstood the behaviour. Will revert that change.
This reverts commit 5917bb0.
This is useful for version comparison.
Additional enforcement will occur on the GHE side of things.
|
Just to be 100% clear about the biggest change in this PR, I'm going to state it in a comment of its own, using a mix of @snh's and my own words: The backup utilities will now only support the current feature release and two previous feature releases. For example, the backup utilities 2.13.0 will support 2.11, 2.12 and 2.13. A new legacy branch will be created, and maintained with critical bug fixes, for customers using older releases. A new backup-utils release will be made with each GitHub Enterprise feature release in future so backup-utils 2.14 will support GitHub Enterprise 2.12, 2.13 and 2.14, backup-utils 2.15 will support GitHub Enterprise 2.13, 2.14 and 2.15 and so on. The documentation is being updated to reflect this change in #380 |
Background
Since the introduction of Cluster, backup-utils has effectively maintained three different methods of backing up GitHub Enterprise appliances, whilst at the same time attempted to retain backwards compatibility for all releases of GitHub Enterprise since 11.10.344.
This has resulted in a very messy, hard to maintain and not particularly efficient code base, with the cluster-specific backup functionality not being tested within backup-utils at all. This PR starts to address this by unifying the backup methods used to a single method that is compatible with cluster and single nodes. Switching to this method will effectively be testing the same pathways used for cluster with the need to only write a few more cluster-specific tests for those scenarios not covered by testing for single nodes.
Caveats
The biggest caveat with this change is backup-utils 2.13.0 onwards will only support GitHub Enterprise 2.11.0 and newer as these are the only releases of GitHub Enterprise that contain the necessary functionality.
Prior to this PR being merged, a new branch will be created to allow for easy (manual) maintenance of legacy releases of backup-utils that people may be using to backup 2.10 and older releases. Previous releases will not be removed.
The Future
The intention going forward is to keep the versioning and support for backup-utils releases inline with that of GitHub Enterprise to make it clearer what versions of backup-utils are supported and with which versions of GitHub Enterprise, hopefully addressing concerns like those raised in https://github.com/github/backup-utils/issues/371 .
PRs are in the pipeline for further beautification of the code and completely updating and reformatting the README and documentation. These changes haven't been included in this PR as they detract from the primary purpose of this PR and would turn what is already a pretty large PR into a huge and hard to review PR.
What does this PR do?
In short, this PR:
TODO