Short circuiting backoff #814

Closed
wants to merge 5 commits into from

Conversation

mshitrit
Contributor

@mshitrit mshitrit commented Mar 2, 2021

This PR is the implementation of this enhancement.
The main purpose is to give an admin a large enough time frame to fix certain issues by delaying remediation for machines that are in a failed state and have no node.

@openshift-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To complete the pull request process, please assign alexander-demichev after the PR has been reviewed.
You can assign the PR to them by writing /assign @alexander-demichev in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Contributor

@JoelSpeed JoelSpeed left a comment

Changes look ok, I've added a bunch of suggestions. I'd like to see some more thorough unit testing of this feature before we merge, though; at the moment it's not obvious that we have test cases in place that cover the new behaviour.

Contributor

@elmiko elmiko left a comment

this mostly makes sense to me, just a question about the defaults.

pkg/apis/machine/v1beta1/machinehealthcheck_types.go (outdated, resolved)
Contributor

@JoelSpeed JoelSpeed left a comment

Because I went through this commit by commit, there may be some comments that are fixed by later commits.

Could you do a rebase and fix up the commits and commit messages so that the commits are atomic and have good descriptions?

Please also add a decent description to the PR.

No major blockers from my side though, mostly nits and additional testing required.

@elmiko
Contributor

elmiko commented May 5, 2021

looks like this might need a rebase?

@openshift-ci openshift-ci bot added the needs-rebase label May 5, 2021
@mshitrit
Contributor Author

mshitrit commented May 6, 2021

> looks like this might need a rebase?

Hi @elmiko, any idea if you are planning to merge it soon?
(I'd like to know whether to prioritize this)

@elmiko
Contributor

elmiko commented May 6, 2021

> Hi @elmiko, any idea if you are planning to merge it soon?

good question. given that we don't have a bugzilla associated with this, i'm guessing we are waiting for 4.8 master to open, if that's the case then you have a couple weeks before we could merge it. (cc @JoelSpeed )

@JoelSpeed
Contributor

Yep, this will have to wait until the 4.9 master branch opens towards the end of May

@openshift-ci openshift-ci bot removed the needs-rebase label May 9, 2021
Signed-off-by: Michael Shitrit <mshitrit@redhat.com>
@mshitrit
Contributor Author

/retest

@JoelSpeed
Contributor

/bugzilla refresh
/retest

@openshift-ci
Contributor

openshift-ci bot commented Jun 15, 2021

@JoelSpeed: No Bugzilla bug is referenced in the title of this pull request.
To reference a bug, add 'Bug XXX:' to the title of this pull request and request another bug refresh with /bugzilla refresh.

In response to this:

/bugzilla refresh
/retest

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Contributor

@JoelSpeed JoelSpeed left a comment

I think this is pretty much ready to go apart from the default.

Would like to make sure we have support upstream for this too, let's make sure we are having that conversation

Signed-off-by: Michael Shitrit <mshitrit@redhat.com>
@JoelSpeed
Contributor

/lgtm

> Would like to make sure we have support upstream for this too, let's make sure we are having that conversation

Would like to pursue this (at least start the conversation) before we merge this if possible. We have just under 4 weeks til feature freeze, I think we could probably get some consensus one way or another by then

@openshift-ci openshift-ci bot added the lgtm label Jun 29, 2021
@mshitrit
Contributor Author

> /lgtm
>
> > Would like to make sure we have support upstream for this too, let's make sure we are having that conversation
>
> Would like to pursue this (at least start the conversation) before we merge this if possible. We have just under 4 weeks til feature freeze, I think we could probably get some consensus one way or another by then

Adding Marc, I assume he'll be involved in necessary upstream process.
/cc @slintes

@openshift-ci openshift-ci bot requested a review from slintes June 29, 2021 10:59
@mshitrit
Contributor Author

/retest

@openshift-ci openshift-ci bot added the needs-rebase label Aug 17, 2021
@mshitrit
Contributor Author

@JoelSpeed, @elmiko
Is this PR still relevant?

@JoelSpeed
Contributor

As far as I'm aware, this is still an issue that we wanted to solve. As far as I know, nothing else has solved this issue just yet.

CC @slintes @beekhof

…backoff

# Conflicts:
#	pkg/controller/machinehealthcheck/machinehealthcheck_controller_test.go
@openshift-ci
Contributor

openshift-ci bot commented Aug 23, 2021

New changes are detected. LGTM label has been removed.

@openshift-ci openshift-ci bot removed the lgtm and needs-rebase labels Aug 23, 2021
@slintes
Member

slintes commented Oct 11, 2021

/test all

@slintes
Member

slintes commented Oct 20, 2021

/retest

@JoelSpeed
Contributor

/approve

I think this is good and aligns with the enhancement document now. I'm keen to see if we can get agreement for this feature upstream as well. If we can hold off merging this for a little bit while we negotiate that, would be appreciated.

@openshift-ci
Contributor

openshift-ci bot commented Oct 21, 2021

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: JoelSpeed

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved label Oct 21, 2021
Contributor

@elmiko elmiko left a comment

lgtm, +1 to what @JoelSpeed said

@slintes
Member

slintes commented Nov 3, 2021

Leaving a link to the upstream discussion here, I always have to search for it 😉
kubernetes-sigs/cluster-api#3106 (comment)

No activity since the last comment 2 weeks ago...

@openshift-ci openshift-ci bot added and removed the needs-rebase label Nov 3, 2021
…ged_failing

# Conflicts:
#	pkg/apis/machine/v1beta1/zz_generated.deepcopy.go
#	pkg/controller/machinehealthcheck/machinehealthcheck_controller_test.go
Signed-off-by: Michael Shitrit <mshitrit@redhat.com>
@openshift-ci
Contributor

openshift-ci bot commented Nov 5, 2021

@mshitrit: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/e2e-aws-workers-rhel7 | 7855ff4 | link | | /test e2e-aws-workers-rhel7 |
| ci/prow/e2e-vsphere | 5fdbfdb | link | false | /test e2e-vsphere |
| ci/prow/e2e-vsphere-serial | 5fdbfdb | link | true | /test e2e-vsphere-serial |
| ci/prow/e2e-vsphere-upgrade | 5fdbfdb | link | false | /test e2e-vsphere-upgrade |
| ci/prow/e2e-vsphere-operator | 5fdbfdb | link | false | /test e2e-vsphere-operator |
| ci/prow/e2e-gcp-operator | 5fdbfdb | link | false | /test e2e-gcp-operator |
| ci/prow/e2e-aws-disruptive | 5fdbfdb | link | false | /test e2e-aws-disruptive |
| ci/prow/e2e-metal-ipi-ovn-ipv6 | 5fdbfdb | link | false | /test e2e-metal-ipi-ovn-ipv6 |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@mshitrit
Contributor Author

Closing this PR, since upstream is distinctly uninterested in this feature and the burden of maintaining the delta indefinitely was considered disproportionate to the benefits gained.
Jira Link

@mshitrit mshitrit closed this Nov 23, 2021
Labels
approved
6 participants