Bug 1888073: prevent no-op hotlooping on Operators #1816
Conversation
Hi @sjenning. Thanks for your PR. I'm waiting for an operator-framework member to verify that this patch is reasonable to test. If it is, they should reply with `/ok-to-test`. Once the patch is verified, the new status will be reflected by the `ok-to-test` label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
@sjenning: This pull request references Bugzilla bug 1888073, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker. 3 validation(s) were run on this bug
/ok-to-test

/bugzilla refresh
@sjenning: This pull request references Bugzilla bug 1888073, which is valid. 3 validation(s) were run on this bug
/hold Still some things to work out: it isn't recreating the Operator when it is deleted manually, and I think the unit test failure is due to not requeuing on failure.
This PR failed 1 out of 1 times, with 4 individual failed tests and 4 skipped tests. A test is considered flaky if it failed on multiple commits.
Force-pushed from 3d43b56 to bdae8f8
Force-pushed from bdae8f8 to b85df58
Wow, awesome work! Thanks again for finding this and patching it so quickly!
I have one last comment before I sign off. Let me know what you think.
/approve
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: ecordell, njhale, sjenning. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing `/approve` in a comment.
/hold cancel
/retest Please review the full test history for this PR and help us cut down flakes.
/retest |
2 similar comments
/retest Please review the full test history for this PR and help us cut down flakes.
12 similar comments
@sjenning: All pull requests linked via external trackers have merged: Bugzilla bug 1888073 has been moved to the MODIFIED state.
/retest |
/cherry-pick release-4.6
@sjenning: only operator-framework org members may request cherry picks. You can still do the cherry-pick manually.
/cherry-pick release-4.6
@kevinrizza: new pull request created: #1822
xref https://bugzilla.redhat.com/show_bug.cgi?id=1888073

The `Operator` reconciler does not detect when a reconciliation is triggered by its own Update, and thus hotloops until it gets lucky enough to create the exact same rendering of the `Operator` two times in a row. For larger `Operator` resources, this is unlikely, and it will hotloop forever.

This hotloop also puts a lot of strain on the kube-apiserver and etcd, since some `Operator` resources are very large. I observe about 500m of additional CPU usage per master, in addition to the 1 core of CPU used directly by the hotlooping `olm-operator`. `olm-operator` also generates 1-2 MB/s of network traffic in this state.

Before patch, with the ACM Operator installed.

After patch. The usage spike is during ACM installation, but it falls back down after installation is complete, as expected.

@njhale @derekwaynecarr @dinhxuanvu