-
Notifications
You must be signed in to change notification settings - Fork 1.6k
KEP-5471 Extended Toleration Operators for Threshold-Based Placement #5473
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
KEP-5471 Extended Toleration Operators for Threshold-Based Placement #5473
Conversation
2a36559 to
c9e75ba
Compare
|
/cc @dom4ha @sanposhiho |
|
/lgtm |
| - Upgrade | ||
| - Enable the feature gate in both API Server and Scheduler. | ||
| - Downgrade | ||
| - Disable the feature gate in both API Server and Scheduler |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the correct order of the components to enable the feature gate, then? First the kube-apiserver, then the scheduler? Is the downgrade ordering the same?
|
|
||
| Impact on existing pods with Gt/Lt operators when feature is disabled: | ||
|
|
||
| 1. **Already-running pods**: Continue running normally. The kubelet doesn't need to re-evaluate tolerations for running pods. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if somebody wants to update one of the pod's mutable fields/annotations?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As stated in (4.), user won't be able to update the pod at all, even for mutable fields like annotations or labels.
Signed-off-by: Heba Elayoty <heelayot@microsoft.com>
soltysh
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor nits, but PRR is mostly complete for alpha.
| name: flexible-sla-workload | ||
| spec: | ||
| tolerations: | ||
| # Accept nodes with SLA >= 900 (SLA = 900 OR SLA > 900) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: but for consistency Gt is not SLA >= 900, it's SLA > 900, right?
|
|
||
| Extend **core/v1 Toleration** to support **numeric comparison operators** when matching **Node Taints**: | ||
|
|
||
| - New operators: `Lt`, `Gt` (in addition to existing `Equal`/`Exists`). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe worth adding that we already use Lt and Gt for node selectors, so our users are familiar with these.
| - Parse integers only when new operators are used. | ||
| - Existing `Equal`/`Exists` operators execute identical code paths with no additional overhead. | ||
| - Consider caching parsed values in scheduler data structures if performance issues arise | ||
| - Feature gate allows disabling if performance problems occur |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's one additional important mitigation, everybody using numeric values currently will ONLY use the currently available operators. Thus using the numeric operators requires at minimum changing the operator, at which point the validation should kick in and catch the problem. So I hope this should not be a problem. Although the question is what kind of validation currently exists around the operators, if only Exists and Equal were allowed you should be good, if the validation is not that strict the risk is real.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current validation is strict and it explicity rejects any operator that isn't Equal or Exists. So I believe this mitigation is good, wdyt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's great, just add that information to this doc in that case. Strict is always good and helps when we're expanding functionality, like here 😄
|
|
||
| - Clear documentation and examples showing proper numeric taint configuration | ||
| - Enhanced error messages in scheduling events that clearly indicate parsing failures | ||
| - Users can use the metric to set up alerts and monitoring. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How pod with numeric operator will be dealt with in this situation? Iow. node has node.kubernetes.io/sla=high and pod has gt 900, what happens in that case? Are you going to fail the pod? Are you planning to fall-back to the previous behavior?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The pod isn't rejected entirely, but won't match it on that particular taint. I've updated the Notes/Constraints/Caveats section to clear this case and updated the Taint Misconfiguration Detection risk case also.
| - The toleration filter returns `false` (doesn't match) | ||
| - Pod is considered to have untolerated taints | ||
| - Filter returns `UnschedulableAndUnresolvable` status | ||
| - Pod remains in Pending state. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe this answers my previous question about the new operators and how they are treated.
| 4. **General Scheduler Tests:** (`scheduler_test.go`): | ||
| - Dynamic taint addition/removal | ||
| - Pod rescheduling after taint changes | ||
| - Integration with NodeAffinity |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Further in the doc you're mentioning feature gate on/off tests, can you mention it here?
| - Force deletion may be required: `kubectl delete pod <name> --force --grace-period=0` | ||
| 3. **Workload controllers** (Deployments, StatefulSets, etc.): | ||
| - If the pod template uses Gt/Lt operators, the controller cannot create new pods | ||
| - Rolling updates will fail |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe this risk wasn't mentioned earlier. If any of the controllers is trying to use the disabled operators the controller will hot-loop, trying to created a pod that will always fail validation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added this risk to the Risks and Mitigations section.
| - Users might set wrong field or both fields accidentally | ||
| - Complex validation logic for field combinations | ||
| - Memory/storage overhead for additional field | ||
| - API complexity and documentation burden |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll play devil's advocate, have you considered using the current mechanism such that it only works based on existing operators? Iow. Node can publish node.kubernetes.io/sla=950, and pods will just use sla equal 950. What are the pros and cons of such approach?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this. Added this alternative with all pros/cons. PTAL
Signed-off-by: Heba Elayoty <heelayot@microsoft.com>
Signed-off-by: Heba Elayoty <heelayot@microsoft.com>
Signed-off-by: Heba Elayoty <heelayot@microsoft.com>
soltysh
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/approve
the PRR section
| - Parse integers only when new operators are used. | ||
| - Existing `Equal`/`Exists` operators execute identical code paths with no additional overhead. | ||
| - Consider caching parsed values in scheduler data structures if performance issues arise | ||
| - Feature gate allows disabling if performance problems occur |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's great, just add that information to this doc in that case. Strict is always good and helps when we're expanding functionality, like here 😄
|
/label tide/merge-method-squash |
|
/lgtm |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: helayoty, sanposhiho, soltysh The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Uh oh!
There was an error while loading. Please reload this page.