Add create-recovery-plan command #622
This command will allow users to scan a deployment for problems, then write out a YAML-formatted recovery plan, to be used later in the `recover` command. [#185483613] Authored-by: Chris Selzo <cselzo@vmware.com>
The "plan" is not necessary for resolving problems via the director [#185483613] Authored-by: Chris Selzo <cselzo@vmware.com>
The current value of `max_in_flight` is fetched from the director and used as a default. If the value does not change from the default, no `max_in_flight_override` is written to the recovery plan. [#185483613] Authored-by: Chris Selzo <cselzo@vmware.com>
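The defaulting rule in that commit message can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `overrideFor` is a hypothetical helper, and treating an empty answer as "accept the default" is an assumption about how the prompt behaves.

```go
package main

import "fmt"

// overrideFor is a hypothetical helper (not taken from this PR).
// current is the max_in_flight value fetched from the director and offered
// as the prompt default; answer is what the user entered.
// It returns the value to record as max_in_flight_override, and false when
// nothing should be written to the recovery plan.
func overrideFor(current, answer string) (string, bool) {
	// Assumption: an empty answer means the user accepted the default.
	if answer == "" || answer == current {
		return "", false
	}
	return answer, true
}

func main() {
	for _, answer := range []string{"", "1", "5"} {
		if v, ok := overrideFor("1", answer); ok {
			fmt.Printf("answer %q: write max_in_flight_override: %s\n", answer, v)
		} else {
			fmt.Printf("answer %q: no override written\n", answer)
		}
	}
}
```

Returning a boolean alongside the value keeps "no override" distinct from any particular string, so the caller can leave the key out of the plan entirely.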
So, I ran the unit tests, and got one failure in
Not sure if removing the switch case is really necessary but figured I'd at least leave a comment about it. Rest, lgtm.
Given that this is a set of work under fairly active development, is this something we should mark as experimental? I say that without any idea how we'd go about doing that...
The command does print a warning and exits if the director doesn't support it. I don't think it's particularly experimental, but we also haven't done the other half of the command yet. Perhaps we could hold off on merging until
The `%v` format string will do the right thing when the `max_in_flight` value is either a string or an integer. (of course, this just moves the type switch into the standard library) [#185483613] Authored-by: Chris Selzo <cselzo@vmware.com>
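As a small illustration of the `%v` behavior described in that commit, the sketch below formats both an integer and a percentage string without an explicit type switch. `formatMaxInFlight` is a hypothetical name used only for this example and is not the function changed in the PR.

```go
package main

import "fmt"

// formatMaxInFlight is a hypothetical helper (not taken from this PR).
// %v prints the value in its default format, so an int and a string are
// both rendered sensibly without a type switch in the caller.
func formatMaxInFlight(v interface{}) string {
	return fmt.Sprintf("%v", v)
}

func main() {
	fmt.Println(formatMaxInFlight(5))     // integer form of max_in_flight
	fmt.Println(formatMaxInFlight("50%")) // percentage form, passed as a string
}
```

The percent sign in `"50%"` is safe here because it appears in the argument, not in the format string.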
Go ranges over map keys in a random order by design. The `FakeUI` test framework requires you to give answers to ui prompts in a specific order, and does not allow you to react to the prompts with a given input. [#185483613] Authored-by: Chris Selzo <cselzo@vmware.com>
Go returning map keys in random order when ranging strikes again! The

Fixed by sorting the map keys for the problems by type, then ranging over the sorted slice to access the map in a consistent way. Ran the test 150 times over lunch and never had a failure with the fix.

Note that for the user experience, it doesn't really matter; this is purely a testing problem. There are only 6 problem types, so sorting a list of at most 6 does not seem like an expensive operation vs. refactoring how the

Resolved in dd7ff53
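For readers unfamiliar with the pattern, sorting the keys before ranging looks roughly like the sketch below. It is an illustration only, not the code in dd7ff53: `sortedProblemTypes`, the map's value type, and the sample problem type names are all assumptions.

```go
package main

import (
	"fmt"
	"sort"
)

// sortedProblemTypes is a hypothetical helper (not taken from this PR).
// It collects the map keys into a slice and sorts them, so callers can
// visit the map in the same order on every run.
func sortedProblemTypes(problemsByType map[string][]int) []string {
	types := make([]string, 0, len(problemsByType))
	for t := range problemsByType {
		types = append(types, t)
	}
	sort.Strings(types)
	return types
}

func main() {
	// Illustrative data: problem type -> problem IDs.
	problems := map[string][]int{
		"unresponsive_agent": {1, 2},
		"missing_vm":         {3},
		"inactive_disk":      {4},
	}
	// Range over the sorted keys rather than the map itself, so prompts
	// are issued in a deterministic order.
	for _, t := range sortedProblemTypes(problems) {
		fmt.Printf("%s: %d problem(s)\n", t, len(problems[t]))
	}
}
```

With a deterministic prompt order, a test double that expects answers in a fixed sequence can be fed the same sequence on every run.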
@klakin-pivotal do you mind testing the fix? Want to make sure it wasn't just fixed on my machine :).
Marking it "experimental" is more of a warning that the behavior might change before we're done with this track of work. For example, maybe we need to change the structure of the recovery plan before we're done.
Another question I have here is how this relates to the existing cloud-check command. Should we deprecate it in favour of this one at some point? Maybe we can describe the plans around the recover work in https://github.com/cloudfoundry/bosh/discussions like we did for other topics.
✅ CLI vs. old Director

Verified that it emits a proper error message when used with an older Director:
Looks good to me (though were I to do this I'd prefer 1-2 commits rather than 5).
Would you like to click the "squash and merge" button? That'll guarantee 1 commit :)
LGTM
This command will allow users to scan a deployment for problems, then write out a YAML-formatted recovery plan, to be used later in the `recover` command. The CLI prompts per instance group and then per problem type.

Example recovery plan:
Note: Relies on a director with this PR merged. The command fails with "Director does not support this command. Try 'bosh cloud-check' instead" if the director it is targeting does not return instance groups for problems.