Introduce topology into the runtimeClass API #75744
/sig scheduling
212e4e5 to 352d9d5
352d9d5 to d253246
Any KEP or discussion about that?
Ref kubernetes/enhancements#909. This is mostly a placeholder/WIP, as this is still in review.
Please update this to match the KEP (NodeSelectorTerm -> NodeSelector).
d253246 to 8f5e8fd
84b7afc to bd2b7aa
e00cf99 to 02288bc
b6685a9 to 3285db8
/retest
Meeting Notes From the API review meeting today with @thockin
- Why not something more generic? Maybe something like a node capabilities struct (maybe on the node resource), where a pod requests the node capability "runtime windows"?
  - Not considered in this context. Considered in the SchedulingPolicy context.
  - Don't want every complicated (bespoke) feature getting dumped on the scheduler. In this case the scheduler will have a new predicate for RuntimeClass.
- RuntimeClass comes with a list of tolerations. Why is that not sufficient for this scheduling?
  - Taint all Windows nodes as Windows? Because you could have overlap. You could have nodes that run kata, gvisor, and some overlap that run runC. Need tolerations for the union.
- Why is a node selector not good enough?
  - If every pod had a RuntimeClass and you have labels set up, you don't need taints and tolerations. But you could make the same argument for general taints and tolerations (if labels are set up on everything).
  - Taints and tolerations let you set a default instead of just match or not match.
- Why is it insufficient to label the pod with a node selector that says `OS==Windows`?
  - The original motivator was the case of gvisor nodes -- don't want non-gvisor (non-sandboxed) pods -- the taint helps repel default pods.
- If we had defaulting, could we get rid of tolerations?
  - Yes, all `*Class` types should have a default. Worth looking into how `StorageClass` does it.
- Talking about adding scheduler logic to add a selector to intersect with the node selector?
  - Yes. Lets us have a nice error message -- if you can't schedule due to what's on the pod vs. what's on the runtime class.
- Back to the original question: why not make this more generic? Node capabilities, and make RuntimeClass into a node capability. Worth thinking about.
  - The concept of an error message is nice, but if we make it general, we don't need to do this again.
- What is the timeline?
  - Hoping to get it into 1.15 as beta. But we could call it alpha and put it behind a feature gate.
  - A general solution may not be that much more work. Let's set up a follow-up discussion with SIG-Node @dchen1107.
- Why a label selector on nodes -- instead of a first-class thing on nodes?
  - Using labels means a smaller change. Requires you to put the selector in RuntimeClass, so you have to agree on what label is on the node.
- How does the label get on the node?
  - Part of node setup or config (whoever is responsible for configuring the node does it).
  - Another reason: regular gvisor vs. debug gvisor runtime. Add a new runtime class, but don't want to update labels, want to reuse the same labels. "runc overhead" is the more interesting example.
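
To make the selector-intersection point in these notes concrete, here is a minimal Go sketch. It is not the code in this PR. It assumes both the pod's node selector and the RuntimeClass selector are plain label maps (the API under review may use a richer selector type), and the function name `mergeNodeSelectors` and any label keys used with it are hypothetical. It merges the two selectors and returns an error naming each conflicting key, which is the kind of message the notes ask for.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// mergeNodeSelectors combines a pod's node selector with the selector carried
// by its RuntimeClass. Both are modelled as simple label maps, which is an
// assumption made for illustration only.
//
// A key present in both maps with different values can never be satisfied by
// any node, so it is reported as a conflict instead of being merged.
func mergeNodeSelectors(pod, runtimeClass map[string]string) (map[string]string, error) {
	merged := make(map[string]string, len(pod)+len(runtimeClass))
	for k, v := range pod {
		merged[k] = v
	}

	var conflicts []string
	for k, rcVal := range runtimeClass {
		if podVal, ok := pod[k]; ok && podVal != rcVal {
			conflicts = append(conflicts,
				fmt.Sprintf("%s (pod wants %q, RuntimeClass wants %q)", k, podVal, rcVal))
			continue
		}
		merged[k] = rcVal
	}

	if len(conflicts) > 0 {
		// Sort so the error text does not depend on map iteration order.
		sort.Strings(conflicts)
		return nil, fmt.Errorf("pod node selector conflicts with RuntimeClass selector: %s",
			strings.Join(conflicts, "; "))
	}
	return merged, nil
}
```

With this sketch, a pod asking for `os=linux` under a RuntimeClass that requires `os=windows` gets an error that names the `os` key, rather than simply staying unschedulable with no hint about why.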
Open questions:
- Generalize or not? Capability or not?
- Why use the word `topology` -- it has lots of baggage?
  - Not tied to it, copied from StorageClass. Stay away from that word unless you mean physical structure of things, which this isn't representing.
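
The taint and toleration discussion in the notes above can be sketched in the same spirit. This is a simplified Go illustration, not the scheduler's implementation: taints and tolerations are matched by exact key and value only (real Kubernetes tolerations also carry an operator and an effect), and the type and function names are made up for the example. It shows how a tainted gvisor node repels a default pod, while a pod that picks up tolerations from its RuntimeClass can still land there.

```go
package main

// Taint and Toleration are stripped-down, hypothetical stand-ins for the
// Kubernetes types; only key and value are modelled here.
type Taint struct{ Key, Value string }
type Toleration struct{ Key, Value string }

// tolerates reports whether any toleration matches the taint exactly.
func tolerates(tolerations []Toleration, taint Taint) bool {
	for _, t := range tolerations {
		if t.Key == taint.Key && t.Value == taint.Value {
			return true
		}
	}
	return false
}

// schedulable reports whether every taint on the node is tolerated by the
// union of the pod's own tolerations and those supplied by its RuntimeClass.
func schedulable(nodeTaints []Taint, podTolerations, runtimeClassTolerations []Toleration) bool {
	// Copy into a fresh slice so neither input slice is modified.
	all := append(append([]Toleration{}, podTolerations...), runtimeClassTolerations...)
	for _, taint := range nodeTaints {
		if !tolerates(all, taint) {
			return false
		}
	}
	return true
}
```

A node selector alone only attracts matching pods; the taint is what keeps pods without a RuntimeClass off the sandboxed nodes, which is the default-repelling behaviour the notes describe.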
Updated based on API review feedback, according to kubernetes/enhancements#1069.
4d5a0cc to 2e38485
Thanks!
/lgtm
/approve
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: thockin, yastij
What type of PR is this?
/kind api-change
/kind feature
What this PR does / why we need it:
Which issue(s) this PR fixes: Fixes #72413
Special notes for your reviewer:
/assign @tallclair
Does this PR introduce a user-facing change?: