
Conversation

@helayoty
Member

@helayoty helayoty commented Aug 11, 2025

  • One-line PR description: Add numeric comparison operators (Lt, Gt) to Tolerations for SLA-based scheduling with threshold-based taint matching.
  • Other comments: cc @kubernetes/sig-scheduling-misc @kubernetes/sig-apps-misc

@k8s-ci-robot k8s-ci-robot added sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. sig/apps Categorizes an issue or PR as relevant to SIG Apps. labels Aug 11, 2025
@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Aug 11, 2025
@github-project-automation github-project-automation bot moved this to Needs Triage in SIG Apps Aug 11, 2025
@k8s-ci-robot k8s-ci-robot added the kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory label Aug 11, 2025
@k8s-ci-robot k8s-ci-robot requested a review from dom4ha August 11, 2025 21:48
@github-project-automation github-project-automation bot moved this to Needs Triage in SIG Scheduling Aug 11, 2025
@k8s-ci-robot k8s-ci-robot requested a review from macsko August 11, 2025 21:48
@k8s-ci-robot k8s-ci-robot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Aug 11, 2025
@helayoty helayoty force-pushed the helayoty/enable-sla-based-schedule branch from 2a36559 to c9e75ba Compare August 15, 2025 23:18
@helayoty helayoty moved this from Needs Triage to In Progress in SIG Scheduling Aug 15, 2025
@macsko
Member

macsko commented Aug 22, 2025

/cc @dom4ha @sanposhiho

@helayoty helayoty requested a review from everpeace August 22, 2025 15:39
@helayoty helayoty requested a review from stlaz October 13, 2025 17:03
@macsko
Member

macsko commented Oct 14, 2025

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 14, 2025
Comment on lines 659 to 662
- Upgrade
  - Enable the feature gate in both API Server and Scheduler.
- Downgrade
  - Disable the feature gate in both API Server and Scheduler
Member

What's the correct order of the components to enable the feature gate, then? First the kube-apiserver, then the scheduler? Is the downgrade ordering the same?


Impact on existing pods with Gt/Lt operators when feature is disabled:

1. **Already-running pods**: Continue running normally. The kubelet doesn't need to re-evaluate tolerations for running pods.
Member

What if somebody wants to update one of the pod's mutable fields/annotations?

Member Author

As stated in (4), the user won't be able to update the pod at all, even for mutable fields like annotations or labels.

Signed-off-by: Heba Elayoty <heelayot@microsoft.com>
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 14, 2025
@helayoty helayoty requested a review from stlaz October 14, 2025 16:42
Contributor

@soltysh soltysh left a comment

Minor nits, but PRR is mostly complete for alpha.

name: flexible-sla-workload
spec:
tolerations:
# Accept nodes with SLA >= 900 (SLA = 900 OR SLA > 900)
Contributor

Nit: but for consistency Gt is not SLA >= 900, it's SLA > 900, right?


Extend **core/v1 Toleration** to support **numeric comparison operators** when matching **Node Taints**:

- New operators: `Lt`, `Gt` (in addition to existing `Equal`/`Exists`).
Contributor

Maybe worth adding that we already use Lt and Gt for node selectors, so our users are familiar with these.
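
For readers following the thread, here is a minimal Go sketch of how the value comparison for the new operators might behave. It is not the KEP's implementation: the `Operator` type, the constants, and `valueMatches` are hypothetical stand-ins rather than core/v1 types, and base-10 64-bit parsing is an assumption. The direction of the comparison (`Gt` with value 900 tolerates a taint whose value is greater than 900, not equal to it) follows the review comment above.

```go
package main

import (
	"fmt"
	"strconv"
)

// Operator is a simplified stand-in for the toleration operator type.
type Operator string

const (
	OpEqual  Operator = "Equal"
	OpExists Operator = "Exists"
	OpLt     Operator = "Lt"
	OpGt     Operator = "Gt"
)

// valueMatches reports whether a toleration value tolerates a taint value
// under the given operator. Key and effect matching are omitted for brevity.
func valueMatches(op Operator, tolerationValue, taintValue string) bool {
	switch op {
	case OpExists:
		return true
	case OpEqual:
		return tolerationValue == taintValue
	case OpLt, OpGt:
		// Integers are parsed only on the numeric path, so Equal/Exists
		// never pay the parsing cost.
		tol, err := strconv.ParseInt(tolerationValue, 10, 64)
		if err != nil {
			return false
		}
		taint, err := strconv.ParseInt(taintValue, 10, 64)
		if err != nil {
			return false // a non-numeric taint value never matches
		}
		if op == OpGt {
			return taint > tol
		}
		return taint < tol
	}
	return false
}

func main() {
	fmt.Println(valueMatches(OpGt, "900", "950"))  // true
	fmt.Println(valueMatches(OpGt, "900", "900"))  // false: Gt is strict
	fmt.Println(valueMatches(OpLt, "900", "850"))  // true
	fmt.Println(valueMatches(OpGt, "900", "high")) // false
}
```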

- Parse integers only when new operators are used.
- Existing `Equal`/`Exists` operators execute identical code paths with no additional overhead.
- Consider caching parsed values in scheduler data structures if performance issues arise
- Feature gate allows disabling if performance problems occur
Contributor

There's one additional important mitigation: everybody using numeric values currently will ONLY use the currently available operators. Thus, using the numeric operators requires at minimum changing the operator, at which point the validation should kick in and catch the problem. So I hope this should not be a problem. The question, though, is what kind of validation currently exists around the operators: if only Exists and Equal were allowed, you should be good; if the validation is not that strict, the risk is real.

Member Author

The current validation is strict and it explicitly rejects any operator that isn't Equal or Exists. So I believe this mitigation is good, wdyt?

Contributor

That's great, just add that information to this doc in that case. Strict is always good and helps when we're expanding functionality, like here 😄
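
The strict validation discussed in this thread could look roughly like the sketch below. It is an illustration only: `validateOperator` and the `numericOpsEnabled` flag are hypothetical names, not the API server's validation code. What it shows is the property the thread relies on: anything other than `Equal` or `Exists` is rejected unless the feature gate is on, so existing numeric taint values cannot start behaving differently without an explicit operator change.

```go
package main

import "fmt"

// validateOperator is a hypothetical helper. numericOpsEnabled stands in
// for the feature gate discussed in this PR.
func validateOperator(op string, numericOpsEnabled bool) error {
	switch op {
	case "", "Equal", "Exists":
		// An empty operator defaults to Equal in the existing API.
		return nil
	case "Lt", "Gt":
		if numericOpsEnabled {
			return nil
		}
		return fmt.Errorf("operator %q is not allowed while the feature gate is disabled", op)
	default:
		return fmt.Errorf("unsupported operator %q", op)
	}
}

func main() {
	fmt.Println(validateOperator("Gt", false))    // error: gate disabled
	fmt.Println(validateOperator("Gt", true))     // <nil>
	fmt.Println(validateOperator("Equal", false)) // <nil>
}
```

With the gate off, a pod template that uses `Gt` or `Lt` fails this check on every create attempt, which is the situation behind the controller risk raised later in the review.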


- Clear documentation and examples showing proper numeric taint configuration
- Enhanced error messages in scheduling events that clearly indicate parsing failures
- Users can use the metric to set up alerts and monitoring.
Contributor

How will a pod with a numeric operator be dealt with in this situation? IOW, the node has node.kubernetes.io/sla=high and the pod has Gt 900; what happens in that case? Are you going to fail the pod? Are you planning to fall back to the previous behavior?

Member Author

The pod isn't rejected entirely, but its toleration won't match that particular taint. I've updated the Notes/Constraints/Caveats section to clarify this case and also updated the Taint Misconfiguration Detection risk case.

- The toleration filter returns `false` (doesn't match)
- Pod is considered to have untolerated taints
- Filter returns `UnschedulableAndUnresolvable` status
- Pod remains in Pending state.
Contributor

I believe this answers my previous question about the new operators and how they are treated.
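
The four steps quoted above can be sketched as a toy filter. Everything here is a simplified assumption: `Taint`, `Toleration`, `Status`, and `filter` are hypothetical local types rather than the scheduler framework's, only the `Gt` operator is modeled, and taint effects are ignored. The point is the flow for a misconfigured taint such as `node.kubernetes.io/sla=high`: the toleration does not match, the taint counts as untolerated, and the node is reported as `UnschedulableAndUnresolvable`.

```go
package main

import (
	"fmt"
	"strconv"
)

type Taint struct{ Key, Value string }

type Toleration struct{ Key, Operator, Value string }

// Status is a stand-in for the scheduler's filter status codes.
type Status string

const (
	Success                      Status = "Success"
	UnschedulableAndUnresolvable Status = "UnschedulableAndUnresolvable"
)

// gtTolerates models only the Gt operator; a parse failure on either side
// means the toleration does not match.
func gtTolerates(tol Toleration, taint Taint) bool {
	if tol.Operator != "Gt" || tol.Key != taint.Key {
		return false
	}
	want, err := strconv.ParseInt(tol.Value, 10, 64)
	if err != nil {
		return false
	}
	got, err := strconv.ParseInt(taint.Value, 10, 64)
	if err != nil {
		return false
	}
	return got > want
}

// filter returns Success only if every taint is tolerated by some toleration.
func filter(taints []Taint, tolerations []Toleration) Status {
	for _, taint := range taints {
		tolerated := false
		for _, tol := range tolerations {
			if gtTolerates(tol, taint) {
				tolerated = true
				break
			}
		}
		if !tolerated {
			return UnschedulableAndUnresolvable
		}
	}
	return Success
}

func main() {
	tols := []Toleration{{Key: "node.kubernetes.io/sla", Operator: "Gt", Value: "900"}}
	fmt.Println(filter([]Taint{{"node.kubernetes.io/sla", "high"}}, tols)) // UnschedulableAndUnresolvable
	fmt.Println(filter([]Taint{{"node.kubernetes.io/sla", "950"}}, tols))  // Success
}
```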

4. **General Scheduler Tests:** (`scheduler_test.go`):
   - Dynamic taint addition/removal
   - Pod rescheduling after taint changes
   - Integration with NodeAffinity
Contributor

Further in the doc you mention feature gate on/off tests; can you mention them here?

   - Force deletion may be required: `kubectl delete pod <name> --force --grace-period=0`
3. **Workload controllers** (Deployments, StatefulSets, etc.):
   - If the pod template uses Gt/Lt operators, the controller cannot create new pods
   - Rolling updates will fail
Contributor

I believe this risk wasn't mentioned earlier. If any of the controllers is trying to use the disabled operators, the controller will hot-loop, trying to create a pod that will always fail validation.

Member Author

Added this risk to the Risks and Mitigations section.

- Users might set wrong field or both fields accidentally
- Complex validation logic for field combinations
- Memory/storage overhead for additional field
- API complexity and documentation burden
Contributor

I'll play devil's advocate: have you considered using the current mechanism such that it only works based on existing operators? IOW, a node can publish node.kubernetes.io/sla=950, and pods will just use sla Equal 950. What are the pros and cons of such an approach?

Member Author

I like this. Added this alternative with all pros/cons. PTAL

Signed-off-by: Heba Elayoty <heelayot@microsoft.com>
@k8s-ci-robot k8s-ci-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Oct 15, 2025
@helayoty helayoty requested a review from soltysh October 15, 2025 14:32
Signed-off-by: Heba Elayoty <heelayot@microsoft.com>
Signed-off-by: Heba Elayoty <heelayot@microsoft.com>
Contributor

@soltysh soltysh left a comment

/approve
the PRR section

@soltysh
Contributor

soltysh commented Oct 15, 2025

/label tide/merge-method-squash

@k8s-ci-robot k8s-ci-robot added the tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges. label Oct 15, 2025
@sanposhiho
Member

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 16, 2025
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: helayoty, sanposhiho, soltysh

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 16, 2025
@k8s-ci-robot k8s-ci-robot merged commit 2c685f9 into kubernetes:master Oct 16, 2025
4 checks passed
@github-project-automation github-project-automation bot moved this from Needs Review to Done in SIG Apps Oct 16, 2025
@github-project-automation github-project-automation bot moved this from In Progress to Done in SIG Scheduling Oct 16, 2025
@k8s-ci-robot k8s-ci-robot added this to the v1.35 milestone Oct 16, 2025
@helayoty helayoty deleted the helayoty/enable-sla-based-schedule branch November 18, 2025 04:36

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory lgtm "Looks good to me", indicates that a PR is ready to be merged. sig/apps Categorizes an issue or PR as relevant to SIG Apps. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges.

Projects

Archived in project

Development

Successfully merging this pull request may close these issues.

10 participants