-
Notifications
You must be signed in to change notification settings - Fork 205
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bug 1864352: Add leader election mechanism to release 4.5 #664
Bug 1864352: Add leader election mechanism to release 4.5 #664
Conversation
Using leader election by default will add stronger guarantees than we have today that only one controller is running at a time to protect against edge cases where the deployment replica could be increased or upgrades with permissive maxSurge. Relevant provider PRs: - openshift/cluster-api-provider-gcp#85 - openshift/cluster-api-provider-aws#315 - openshift/cluster-api-provider-azure#122 - openshift/cluster-api-provider-openstack#108 - openshift/cluster-api-provider-baremetal#81 - openshift/cluster-api-provider-ovirt#55 - openshift#571
The machine-api-controller components are refreshing their lease more than all other components combined. Bringing this to 90s each, will decrease etcd writes at idle.
The machine-api-controller components are refreshing their lease more than all other components combined. Bringing this to 90s each, will decrease etcd writes at idle.
The machine-api-controller components are refreshing their lease more than all other components combined. Bringing this to 90s each, will decrease etcd writes at idle.
The machine-api-controller components are refreshing their lease more than all other components combined. Bringing this to 90s each, will decrease etcd writes at idle.
Enforce this value in MAO deployment will ensure the value won't be overritten by changed CLI defaults in the future.
@elmiko: This pull request references Bugzilla bug 1864352, which is invalid:
Comment In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
Skipping CI for Draft Pull Request. |
i am still testing this out, just wanted to get the cherry-picks in place. |
i've tested the mao, mhc, and machineset components. still working on a test for vsphere. |
we also need to make sure we don't break the baremetal, ovirt, and openstack providers by merging this as it adds the leader election command line flags to the deployments for the provider controllers. |
@elmiko: This pull request references Bugzilla bug 1864352, which is invalid:
Comment In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
i would like to make sure we include these commits once they have landed in the master branch, #675 |
Prevent machine controllers from writing in etcd at idle too often by setting 30s retry and 90s deadline on all renewals. BZ 1858403
Prevent machine controllers from writing in etcd at idle too often by setting 30s retry and 90s deadline on all renewals. BZ 1858403
Prevent machine controllers from writing in etcd at idle too often by setting 60s retry and delay on all renewals. BZ 1858403
Prevent machine controllers from writing in etcd at idle too often by setting 60s retry and delay on all renewals. BZ 1858403
@elmiko: This pull request references Bugzilla bug 1864352, which is invalid:
Comment In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
1 similar comment
@elmiko: This pull request references Bugzilla bug 1864352, which is invalid:
Comment In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/retitle Bug 1864352: Add leader election mechanism to release 4.5 |
@elmiko: This pull request references Bugzilla bug 1864352, which is valid. The bug has been updated to refer to the pull request using the external bug tracker. 6 validation(s) were run on this bug
In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
not quite sure why the unit tests are failing now, but i do see similar failures locally. it looks like something related to the machinehealthcheck tests, @JoelSpeed any thoughts? |
/test unit |
seems like the unit sorted itself out, i think it had to do with the container setup and not the test themselves. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/approve
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: JoelSpeed The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/lgtm |
/retest Please review the full test history for this PR and help us cut down flakes. |
@JoelSpeed any idea how we get the cherry-pick approval to make tide happy? |
Patch manager should see that this PR is ready when they start the next merge window (Wed-Fri) for 4.5.z, then give it the label as appropriate. As this is marked as high it should be relatively high on their list to look at this week so will hopefully merge in the next few days. From our side, our job is done now, we just wait for the patch manager. |
@elmiko: All pull requests linked via external trackers have merged:
Bugzilla bug 1864352 has been moved to the MODIFIED state. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
This cherry-picks the fix from master/4.6 and integrates with: openshift/machine-api-operator#664 Adding 3 new cli arguments for configuring leader elections: -leader-elect -leader-elect-lease-duration int -leader-elect-resource-namespace string Using leader election will add stronger guarantees than we have today that only one controller is running at a time to protect against edge cases where the deployment replica could be increased or upgrades with permissive maxSurge. (cherry picked from commit 5ccc992)
This cherry-picks the fix from master/4.6 and integrates with: openshift/machine-api-operator#664 Adding 3 new cli arguments for configuring leader elections: -leader-elect -leader-elect-lease-duration int -leader-elect-resource-namespace string Using leader election will add stronger guarantees than we have today that only one controller is running at a time to protect against edge cases where the deployment replica could be increased or upgrades with permissive maxSurge. (cherry picked from commit 5ccc992)
This change brings in several cherry picks for the leader election mechanisms for all the controllers in this repo.
3d3b575
c939c92
5fdd6ef
48d9cce
fe7cbdc
495f3e7
0a56184
3f12419
a52a7ec
91251df
303f931
644ed9a
d6ec8b6
a72ad43
fda33fd
because this merge request will change the way that the controllers are deployed (adding leader election command line flags), we should make sure all the providers have the change in place first.
aws: openshift/cluster-api-provider-aws#342
azure: openshift/cluster-api-provider-azure#156
baremetal: openshift/cluster-api-provider-baremetal#106
gcp: openshift/cluster-api-provider-gcp#114
openstack: openshift/cluster-api-provider-openstack#117
ovirt:
vsphere: #664