-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
KEP-3094: take toleration/taints into considering when computing skew #3105
KEP-3094: take toleration/taints into considering when computing skew #3105
Conversation
kerthcet
commented
Jan 5, 2022
- One-line PR description: take toleration/taints into considering when computing skew
- Issue link: Take taints/tolerations into consideration when calculating PodTopologySpread skew #3094
- Other comments: None
99f3136
to
4fdf46a
Compare
keps/sig-scheduling/3094-pod-topology-spread-considering-taints/README.md
Outdated
Show resolved
Hide resolved
Please let us know when the latest suggestions from kubernetes/kubernetes#106127 are applied |
got it. |
Please take a review for the first round @alculquicondor @Huang-Wei , glad to hear all your advices. |
also cc @wojtek-t |
keps/sig-scheduling/3094-pod-topology-spread-considering-taints/README.md
Outdated
Show resolved
Hide resolved
keps/sig-scheduling/3094-pod-topology-spread-considering-taints/README.md
Outdated
Show resolved
Hide resolved
keps/sig-scheduling/3094-pod-topology-spread-considering-taints/README.md
Outdated
Show resolved
Hide resolved
keps/sig-scheduling/3094-pod-topology-spread-considering-taints/README.md
Outdated
Show resolved
Hide resolved
keps/sig-scheduling/3094-pod-topology-spread-considering-taints/README.md
Outdated
Show resolved
Hide resolved
keps/sig-scheduling/3094-pod-topology-spread-considering-taints/README.md
Outdated
Show resolved
Hide resolved
keps/sig-scheduling/3094-pod-topology-spread-considering-taints/README.md
Outdated
Show resolved
Hide resolved
keps/sig-scheduling/3094-pod-topology-spread-considering-taints/README.md
Outdated
Show resolved
Hide resolved
keps/sig-scheduling/3094-pod-topology-spread-considering-taints/kep.yaml
Outdated
Show resolved
Hide resolved
keps/sig-scheduling/3094-pod-topology-spread-considering-taints/kep.yaml
Show resolved
Hide resolved
updated the proposal as advised. |
Signed-off-by: kerthcet <kerthcet@gmail.com>
Signed-off-by: kerthcet <kerthcet@gmail.com>
Signed-off-by: kerthcet <kerthcet@gmail.com>
Signed-off-by: kerthcet <kerthcet@gmail.com>
Signed-off-by: kerthcet <kerthcet@gmail.com>
9e44d0f
to
214d511
Compare
Signed-off-by: kerthcet <kerthcet@gmail.com>
keps/sig-scheduling/3094-pod-topology-spread-considering-taints/README.md
Outdated
Show resolved
Hide resolved
keps/sig-scheduling/3094-pod-topology-spread-considering-taints/README.md
Outdated
Show resolved
Hide resolved
keps/sig-scheduling/3094-pod-topology-spread-considering-taints/README.md
Outdated
Show resolved
Hide resolved
keps/sig-scheduling/3094-pod-topology-spread-considering-taints/README.md
Outdated
Show resolved
Hide resolved
Signed-off-by: kerthcet <kerthcet@gmail.com>
keps/sig-scheduling/3094-pod-topology-spread-considering-taints/README.md
Show resolved
Hide resolved
Signed-off-by: kerthcet <kerthcet@gmail.com>
/approve |
keps/sig-scheduling/3094-pod-topology-spread-considering-taints/README.md
Outdated
Show resolved
Hide resolved
keps/sig-scheduling/3094-pod-topology-spread-considering-taints/README.md
Outdated
Show resolved
Hide resolved
- A spike on failure events with keyword "failed spreadConstraint" in scheduler log. | ||
|
||
###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested? | ||
N/A |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not N/A - it should definitely be tested. For API changes and things that we actually store in etcd it's especially importnt.
Though not a blocker for now, but it will be for Beta.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks Wojciech, updated.
Signed-off-by: kerthcet <kerthcet@gmail.com>
@wojtek-t please take a look again, updated as advised, thanks. |
kindly ping @wojtek-t |
@Huang-Wei you are also listed as approver/reviewer. Do you want to take a look? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM overall. Some wording suggestions as well as suggestions on detailing the test scope.
keps/sig-scheduling/3094-pod-topology-spread-considering-taints/README.md
Outdated
Show resolved
Hide resolved
keps/sig-scheduling/3094-pod-topology-spread-considering-taints/README.md
Outdated
Show resolved
Hide resolved
keps/sig-scheduling/3094-pod-topology-spread-considering-taints/README.md
Outdated
Show resolved
Hide resolved
keps/sig-scheduling/3094-pod-topology-spread-considering-taints/README.md
Outdated
Show resolved
Hide resolved
keps/sig-scheduling/3094-pod-topology-spread-considering-taints/README.md
Outdated
Show resolved
Hide resolved
--> | ||
#### Alpha | ||
- Feature implemented behind feature gate. | ||
- Unit and integration tests passed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe details the test scope: validation, defaulting, API field's enforcement/removal, functional covering. There should be examples in other KEPs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have mentioned above, let me updated the words to Unit and integration tests passed as designed in [TestPlan](#test-plan).
Is it enough?
Signed-off-by: kerthcet <kerthcet@gmail.com>
Signed-off-by: kerthcet <kerthcet@gmail.com>
This will require a bit more work for Beta, but it's fine for Alpha. /lgtm |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: alculquicondor, kerthcet, wojtek-t The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
I'll update the origin issue for reminding. |
issue: - consider we have three nodes in a zone and one of the nodes (bigger) which is cordoned has 5 OSDs running, two other nodes (smaller) are not running OSDs - assume the region has three such zones in same config - now if we evict OSDs from the cordoned node and have tsc at hostname level to satisfy the constraint all OSDs should be running on one of smaller nodes which isn't possible due to less resources - due to this we can't ever evict pods from the bigger node if tsc takes into account of cordoned nodes as well rc: - we don't have a way to take tainted nodes into consideration in tsc calculations until k8s 1.26 [0] fix: - set tsc at zone level which effectively counts number of OSDs running per zone even with cordon nodes - as a result we can have 5 OSDs running in a zone irrespective of bigger/smaller nodes [0]: kubernetes/enhancements/pull/3105 Signed-off-by: Leela Venkaiah G <lgangava@redhat.com>