-
Notifications
You must be signed in to change notification settings - Fork 207
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Short circuiting backoff #814
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes look ok, I've added a bunch of suggestions. I'd like to see some more thorough unit testing of this feature before we merge though, at the moment it's not obvious that we have test cases in place that cover the new behaviour
pkg/controller/machinehealthcheck/machinehealthcheck_controller.go
Outdated
Show resolved
Hide resolved
pkg/controller/machinehealthcheck/machinehealthcheck_controller.go
Outdated
Show resolved
Hide resolved
pkg/controller/machinehealthcheck/machinehealthcheck_controller_test.go
Outdated
Show resolved
Hide resolved
pkg/controller/machinehealthcheck/machinehealthcheck_controller_test.go
Outdated
Show resolved
Hide resolved
pkg/controller/machinehealthcheck/machinehealthcheck_controller_test.go
Outdated
Show resolved
Hide resolved
pkg/controller/machinehealthcheck/machinehealthcheck_controller_test.go
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this mostly makes sense to me, just a question about the defaults.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because I went through this commit by commit, there may be some comments that are fixed by later commits.
Could you do a rebase and fix up the commits and commit messages so that the commits are atomic and have good descriptions
Please also add a decent description to the PR
No major blockers from my side though, mostly nits/ additional testing required
pkg/controller/machinehealthcheck/machinehealthcheck_controller.go
Outdated
Show resolved
Hide resolved
pkg/controller/machinehealthcheck/machinehealthcheck_controller.go
Outdated
Show resolved
Hide resolved
pkg/controller/machinehealthcheck/machinehealthcheck_controller.go
Outdated
Show resolved
Hide resolved
pkg/controller/machinehealthcheck/machinehealthcheck_controller_test.go
Outdated
Show resolved
Hide resolved
pkg/controller/machinehealthcheck/machinehealthcheck_controller_test.go
Outdated
Show resolved
Hide resolved
pkg/controller/machinehealthcheck/machinehealthcheck_controller_test.go
Outdated
Show resolved
Hide resolved
pkg/controller/machinehealthcheck/machinehealthcheck_controller_test.go
Outdated
Show resolved
Hide resolved
7855ff4
to
bbbd72c
Compare
looks like this might need a rebase? |
Hi @elmiko , any idea if you are planning to merge it soon ? |
good question. given that we don't have a bugzilla associated with this, i'm guessing we are waiting for 4.8 master to open, if that's the case then you have a couple weeks before we could merge it. (cc @JoelSpeed ) |
Yep, this will have to wait until the 4.9 master branch opens towards the end of May |
bbbd72c
to
086f7e8
Compare
Signed-off-by: Michael Shitrit <mshitrit@redhat.com>
086f7e8
to
f1e9ae5
Compare
/retest |
1 similar comment
/retest |
/bugzilla refresh |
@JoelSpeed: No Bugzilla bug is referenced in the title of this pull request. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is pretty much ready to go apart from the default.
Would like to make sure we have support upstream for this too, let's make sure we are having that conversation
install/0000_30_machine-api-operator_07_machinehealthcheck.crd.yaml
Outdated
Show resolved
Hide resolved
Signed-off-by: Michael Shitrit <mshitrit@redhat.com>
/lgtm
Would like to pursue this (at least start the conversation) before we merge this if possible. We have just under 4 weeks til feature freeze, I think we could probably get some consensus one way or another by then |
Adding Marc, I assume he'll be involved in necessary upstream process. |
/retest |
1 similar comment
/retest |
@JoelSpeed , @elmiko |
…backoff # Conflicts: # pkg/controller/machinehealthcheck/machinehealthcheck_controller_test.go
New changes are detected. LGTM label has been removed. |
/test all |
/retest |
/approve I think this is good and aligns with the enhancement document now. I'm keen to see if we can get agreement for this feature upstream as well. If we can hold off merging this for a little bit while we negotiate that, would be appreciated. |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: JoelSpeed The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm, +1 to what @JoelSpeed said
Leaving a link to the upstream discussiom here, I always have to search for it 😉 No activity since the last comment 2 weeks ago... |
…ged_failing # Conflicts: # pkg/apis/machine/v1beta1/zz_generated.deepcopy.go # pkg/controller/machinehealthcheck/machinehealthcheck_controller_test.go
Signed-off-by: Michael Shitrit <mshitrit@redhat.com>
6dad461
to
5fdbfdb
Compare
@mshitrit: The following tests failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
Closing this PR since upstream is distinctly uninterested in this feature and the burden of maintaining the delta indefinitely was considered disproportionate to the benefits gained |
This PR is the implementation of this enhancement.
The main purpose is to allow create a large enough time frame for an admin to fix certain issues by delaying the remediation time for machines that are in a failed stated and has no node.