
KEP-3094: take toleration/taints into considering when computing skew #3105

Conversation

@kerthcet (Member) commented Jan 5, 2022:

  • One-line PR description: take toleration/taints into considering when computing skew
  • Other comments: None
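For readers skimming the thread: KEP-3094 proposes that pod topology spread take node taints and the pod's tolerations into account when computing skew. A minimal Go sketch of how such a knob might be expressed on the core/v1 TopologySpreadConstraint is shown below; the type and field names (NodeInclusionPolicy, NodeTaintsPolicy) are illustrative of the proposal, not a confirmed final API surface.

```go
// Sketch only: an illustrative, taint-aware policy field for
// TopologySpreadConstraint. Names mirror common Kubernetes API style but are
// not guaranteed to match the merged API.
package sketch

// NodeInclusionPolicy controls whether a node-level property is honored when
// deciding which nodes count toward topology spread skew.
type NodeInclusionPolicy string

const (
	// Ignore keeps the existing behavior: every node matching the topology
	// key is counted, whether or not it is tainted.
	NodeInclusionPolicyIgnore NodeInclusionPolicy = "Ignore"
	// Honor excludes nodes whose taints the incoming pod does not tolerate.
	NodeInclusionPolicyHonor NodeInclusionPolicy = "Honor"
)

// TopologySpreadConstraint is trimmed to the fields relevant to this KEP.
type TopologySpreadConstraint struct {
	MaxSkew           int32
	TopologyKey       string
	WhenUnsatisfiable string // "DoNotSchedule" or "ScheduleAnyway"

	// NodeTaintsPolicy is the proposed knob: when set to Honor, nodes whose
	// taints the pod does not tolerate are skipped while computing skew.
	// A nil value falls back to Ignore, preserving backward compatibility.
	NodeTaintsPolicy *NodeInclusionPolicy
}
```

With a policy like Honor, a node carrying a taint the pod does not tolerate would no longer inflate its topology domain's pod count, which is the skew distortion the PR title refers to.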

@k8s-ci-robot added the cncf-cla: yes (Indicates the PR's author has signed the CNCF CLA) and size/XL (Denotes a PR that changes 500-999 lines, ignoring generated files) labels Jan 5, 2022
@k8s-ci-robot added the kind/kep (Categorizes KEP tracking issues and PRs modifying the KEP directory) and sig/scheduling (Categorizes an issue or PR as relevant to SIG Scheduling) labels Jan 5, 2022
@kerthcet changed the title from "KEP: take toleration/taints into considering when computing skew" to "KEP-3094: take toleration/taints into considering when computing skew" Jan 5, 2022
@kerthcet force-pushed the feature/filter-nodes-in-pod-topology-spread branch from 99f3136 to 4fdf46a on January 5, 2022 08:18
@alculquicondor (Member):

Please let us know when the latest suggestions from kubernetes/kubernetes#106127 are applied

@kerthcet (Member, Author) commented Jan 7, 2022:

> Please let us know when the latest suggestions from kubernetes/kubernetes#106127 are applied

got it.

@kerthcet (Member, Author):

Please take a look for a first-round review, @alculquicondor @Huang-Wei; glad to hear all your advice.

@kerthcet (Member, Author):

also cc @wojtek-t

@kerthcet (Member, Author):

updated the proposal as advised.

@kerthcet force-pushed the feature/filter-nodes-in-pod-topology-spread branch from 9e44d0f to 214d511 on January 13, 2022 05:42
@wojtek-t self-assigned this Jan 17, 2022
@alculquicondor (Member):

/approve
/assign @wojtek-t

- A spike on failure events with keyword "failed spreadConstraint" in scheduler log.

###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?
N/A
@wojtek-t (Member) commented on the snippet above:
It's not N/A - it should definitely be tested. For API changes and things that we actually store in etcd, it's especially important.
Not a blocker for now, but it will be for Beta.

@kerthcet (Member, Author) replied:
thanks Wojciech, updated.

@kerthcet (Member, Author):

@wojtek-t please take a look again, updated as advised, thanks.

@kerthcet (Member, Author):

kindly ping @wojtek-t

@alculquicondor (Member):

@Huang-Wei you are also listed as approver/reviewer. Do you want to take a look?

@Huang-Wei (Member) left a comment:

LGTM overall. Some wording suggestions as well as suggestions on detailing the test scope.

#### Alpha
- Feature implemented behind feature gate.
- Unit and integration tests passed.
@Huang-Wei (Member) commented on the snippet above:
Maybe detail the test scope: validation, defaulting, API field enforcement/removal, functional coverage. There should be examples in other KEPs.

@kerthcet (Member, Author) replied Jan 27, 2022:
I have mentioned it above; let me update the wording to "Unit and integration tests passed as designed in [TestPlan](#test-plan)". Is that enough?

@wojtek-t (Member):

This will require a bit more work for Beta, but it's fine for Alpha.

/lgtm
/approve PRR

@k8s-ci-robot added the lgtm label ("Looks good to me", indicates that a PR is ready to be merged) Jan 27, 2022
@k8s-ci-robot (Contributor):

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: alculquicondor, kerthcet, wojtek-t

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot added the approved label (Indicates a PR has been approved by an approver from all required OWNERS files) Jan 27, 2022
@k8s-ci-robot merged commit b5afb53 into kubernetes:master Jan 27, 2022
@k8s-ci-robot added this to the v1.24 milestone Jan 27, 2022
@kerthcet (Member, Author):

> This will require a bit more work for Beta, but it's fine for Alpha.
>
> /lgtm /approve PRR

I'll update the original issue as a reminder.

@kerthcet deleted the feature/filter-nodes-in-pod-topology-spread branch February 14, 2022 03:14
leelavg added a commit to leelavg/ocs-osd-deployer that referenced this pull request Feb 12, 2023
issue:
- consider three nodes in a zone, where one of the nodes (bigger), which is
  cordoned, has 5 OSDs running and the two other nodes (smaller) are not
  running OSDs
- assume the region has three such zones in the same config
- now, if we evict OSDs from the cordoned node and have a topology spread
  constraint (tsc) at the hostname level, then to satisfy the constraint all
  OSDs would have to run on one of the smaller nodes, which isn't possible
  due to insufficient resources
- because of this, we can never evict pods from the bigger node if the tsc
  takes cordoned nodes into account as well

rc:
- we don't have a way to take tainted nodes into consideration in tsc
  calculations until k8s 1.26 [0]

fix:
- set the tsc at the zone level, which effectively counts the number of OSDs
  running per zone even with cordoned nodes
- as a result we can have 5 OSDs running in a zone irrespective of
  bigger/smaller nodes

[0]: kubernetes/enhancements/pull/3105

Signed-off-by: Leela Venkaiah G <lgangava@redhat.com>
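Since ocs-osd-deployer is a Go operator, the zone-level constraint described in the fix above would look roughly like the sketch below, built with the stock k8s.io/api/core/v1 types. The label selector and helper name are assumptions for illustration, not the deployer's actual code.

```go
// Sketch of a zone-level topology spread constraint: OSD pods are counted per
// zone instead of per hostname, so a cordoned (bigger) node no longer blocks
// eviction. Selector and function name are hypothetical.
package deployer

import (
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func osdZoneSpreadConstraint() corev1.TopologySpreadConstraint {
	return corev1.TopologySpreadConstraint{
		MaxSkew:           1,
		TopologyKey:       "topology.kubernetes.io/zone", // zone level, not kubernetes.io/hostname
		WhenUnsatisfiable: corev1.DoNotSchedule,
		LabelSelector: &metav1.LabelSelector{
			// Hypothetical selector; the real deployer may match OSD pods differently.
			MatchLabels: map[string]string{"app": "rook-ceph-osd"},
		},
	}
}
```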