Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG 1858400: [Performance] Lease refresh period for machine-api-controllers is too high, causes heavy writes to etcd at idle #100

Merged
merged 1 commit into from
Aug 27, 2020

Conversation

elmiko
Copy link

@elmiko elmiko commented Aug 25, 2020

Prevent machine controllers from writing in etcd at idle too often by
setting 120s lease, 20s retry and 110s deadline on all renewals. Higher
values cause tests to flake.

Inspired by openshift/cloud-credential-operator#231
Depends on openshift/machine-api-operator#675

@openshift-ci-robot openshift-ci-robot added bugzilla/severity-high Referenced Bugzilla bug's severity is high for the branch this PR is targeting. bugzilla/invalid-bug Indicates that a referenced Bugzilla bug is invalid for the branch this PR is targeting. labels Aug 25, 2020
@openshift-ci-robot
Copy link

@elmiko: This pull request references Bugzilla bug 1858400, which is invalid:

  • expected the bug to be in one of the following states: NEW, ASSIGNED, ON_DEV, POST, POST, but it is ON_QA instead

Comment /bugzilla refresh to re-evaluate validity if changes to the Bugzilla bug are made, or edit the title of this pull request to link to a different bug.

In response to this:

BUG 1858400: [Performance] Lease refresh period for machine-api-controllers is too high, causes heavy writes to etcd at idle

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot
Copy link

@elmiko: This pull request references Bugzilla bug 1858400, which is invalid:

  • expected the bug to be in one of the following states: NEW, ASSIGNED, ON_DEV, POST, POST, but it is ON_QA instead

Comment /bugzilla refresh to re-evaluate validity if changes to the Bugzilla bug are made, or edit the title of this pull request to link to a different bug.

In response to this:

BUG 1858400: [Performance] Lease refresh period for machine-api-controllers is too high, causes heavy writes to etcd at idle

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@elmiko
Copy link
Author

elmiko commented Aug 25, 2020

/bugzilla refresh

@openshift-ci-robot
Copy link

@elmiko: This pull request references Bugzilla bug 1858400, which is valid. The bug has been updated to refer to the pull request using the external bug tracker.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target release (4.6.0) matches configured target release for branch (4.6.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)

In response to this:

/bugzilla refresh

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot openshift-ci-robot added bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. and removed bugzilla/invalid-bug Indicates that a referenced Bugzilla bug is invalid for the branch this PR is targeting. labels Aug 25, 2020
@Danil-Grigorev
Copy link

@dhellmann Do you have a moment to review? We'd need to backport this implementation into 4.5, but before that, this BZ should be fixed.

@elmiko
Copy link
Author

elmiko commented Aug 25, 2020

@dhellmann Do you have a moment to review? We'd need to backport this implementation into 4.5, but before that, this BZ should be fixed.

i was going to add the backports as my next step, i had been using this bug to report under https://bugzilla.redhat.com/show_bug.cgi?id=1861896

basically i have been backporting the leader elect mechanisms and bringing all these timing changes at once.

@stbenjam
Copy link
Member

/retest

@dhellmann
Copy link

The change looks fine. It sounds like there's a known issue with the CI job that isn't related to this PR.

/approve

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 25, 2020
@elmiko
Copy link
Author

elmiko commented Aug 26, 2020

/retest

@Danil-Grigorev
Copy link

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Aug 27, 2020
@openshift-ci-robot openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 27, 2020
@openshift-ci-robot openshift-ci-robot removed the lgtm Indicates that a PR is ready to be merged. label Aug 27, 2020
@elmiko
Copy link
Author

elmiko commented Aug 27, 2020

updating with 90s retry time
/hold cancel

@openshift-ci-robot openshift-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 27, 2020
@openshift-ci-robot
Copy link

@elmiko: This pull request references Bugzilla bug 1858400, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target release (4.6.0) matches configured target release for branch (4.6.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)

In response to this:

BUG 1858400: [Performance] Lease refresh period for machine-api-controllers is too high, causes heavy writes to etcd at idle

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Prevent machine controllers from writing in etcd at idle too often by
setting 120s lease, 20s retry and 110s deadline on all renewals. Higher
values cause tests to flake.
@elmiko
Copy link
Author

elmiko commented Aug 27, 2020

after some discussion with the team we have agreed that 20s retry period is acceptable for now. i have fixed the pr
/hold cancel

@openshift-ci-robot openshift-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 27, 2020
@openshift-ci-robot
Copy link

@elmiko: This pull request references Bugzilla bug 1858400, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target release (4.6.0) matches configured target release for branch (4.6.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)

In response to this:

BUG 1858400: [Performance] Lease refresh period for machine-api-controllers is too high, causes heavy writes to etcd at idle

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@Danil-Grigorev
Copy link

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Aug 27, 2020
@openshift-ci-robot
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Danil-Grigorev, dhellmann, elmiko

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-robot openshift-merge-robot merged commit 2ee62a5 into openshift:master Aug 27, 2020
@openshift-ci-robot
Copy link

@elmiko: Some pull requests linked via external trackers have merged:

The following pull requests linked via external trackers have not merged:

These pull request must merge or be unlinked from the Bugzilla bug in order for it to move to the next state.

Bugzilla bug 1858400 has not been moved to the MODIFIED state.

In response to this:

BUG 1858400: [Performance] Lease refresh period for machine-api-controllers is too high, causes heavy writes to etcd at idle

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

honza pushed a commit to honza/cluster-api-provider-baremetal that referenced this pull request Feb 7, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. bugzilla/severity-high Referenced Bugzilla bug's severity is high for the branch this PR is targeting. bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. lgtm Indicates that a PR is ready to be merged.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants