
feature to allow optionally setting taints based on node properties #540

Closed
rptaylor opened this issue Jun 10, 2021 · 15 comments · Fixed by #910


rptaylor commented Jun 10, 2021

What would you like to be added:

It would be nice if NFD could be configured with options to set node taints as well as labels, based on certain features of nodes. Would you consider that in scope of NFD?

Why is this needed:
Cluster operators may wish to automatically taint nodes that have certain features, for example tainting a node that has GPUs so that pods which don't actually need (and therefore don't tolerate) the GPUs are kept off it.
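
For illustration, this is roughly the manual workflow that NFD could automate: the operator taints the GPU nodes (e.g. `kubectl taint nodes <node> nvidia.com/gpu=present:NoSchedule`) and only GPU workloads carry a matching toleration. This is a sketch only; the taint key `nvidia.com/gpu` and the image are example choices, not an NFD convention.

```yaml
# A GPU pod that tolerates a hypothetical nvidia.com/gpu taint.
# Pods without this toleration are kept off the tainted GPU nodes.
apiVersion: v1
kind: Pod
metadata:
  name: cuda-workload
spec:
  containers:
    - name: cuda
      image: nvidia/cuda:12.2.0-base-ubuntu22.04
      resources:
        limits:
          nvidia.com/gpu: 1        # requests the GPU device-plugin resource
  tolerations:
    - key: "nvidia.com/gpu"        # matches the taint set on GPU nodes
      operator: "Exists"
      effect: "NoSchedule"
```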

@rptaylor rptaylor added the kind/feature Categorizes issue or PR as related to a new feature. label Jun 10, 2021

marquiz commented Jun 10, 2021

Hi @rptaylor. Yes, this would be useful and I've been thinking about it myself as part of the work I've done on #464 and particularly #468, both of which are still very much at the prototype level.

I've done some initial experiments and started to wonder whether it should be possible to taint only some of the nodes (a configurable proportion). WDYT? It complicates the implementation quite a bit, though, so maybe that would be a future enhancement.


rptaylor commented Jun 10, 2021

Okay, nice @marquiz. It makes sense to me that NFD could have the flexibility and generality to apply arbitrary properties (taints as well as labels) based on the features of nodes.

What would be the use case to only taint a portion of nodes with a given feature and configuration?


marquiz commented Jun 11, 2021

What would be the use case to only taint a portion of nodes with a given feature and configuration?

Reserving some of the nodes for general usage or alternatively reserving only a fraction of the nodes for special workloads. Dunno if that is useful in practice 🤔


zvonkok commented Jun 11, 2021

Tainting only a subset of nodes in a cluster makes some sense for extended resources. If a Pod requests an extended resource, the ExtendedResourceAdmissionWebhook will automatically add a toleration for the extended resource taint.
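
For context, here is a rough sketch assuming the behaviour of Kubernetes' in-tree ExtendedResourceToleration admission plugin (which I take to be the mechanism meant here): if the taint key equals the extended resource name, any pod that requests that resource gets a toleration injected automatically, so users never write it by hand. The node name and resource are illustrative.

```yaml
# Node side (could be set by NFD or manually, e.g.
#   kubectl taint nodes gpu-node-1 nvidia.com/gpu=present:NoSchedule):
apiVersion: v1
kind: Node
metadata:
  name: gpu-node-1
spec:
  taints:
    - key: nvidia.com/gpu        # same key as the extended resource name
      value: present
      effect: NoSchedule
# Pod side: nothing to write by hand. Because a pod requests nvidia.com/gpu,
# the admission plugin injects the equivalent of:
#   tolerations:
#     - key: nvidia.com/gpu
#       operator: Exists
```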

But we need to be careful about when we apply this taint. In hardware-enablement cases, the taint is usually added at the point where the extended resources are actually exposed.

It also depends on how you want to partition your cluster. We have used taints and tolerations for "hard partitioning", meaning that no workload which does not tolerate the taint is allowed on the node; the taint repels such workloads.

Alternatively there is "soft partitioning", e.g. with priority classes, which allows mixed workloads while special workloads get higher priority.
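
As a rough sketch of that soft-partitioning idea (the class names and values below are made up, not any NFD or Kubernetes convention):

```yaml
# Special workloads get a higher priority instead of exclusive, tainted nodes;
# lower-priority pods can still use the capacity but may be preempted.
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: special-workloads
value: 100000
globalDefault: false
description: "Higher priority for workloads that really need the special hardware"
---
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: general-batch
value: 1000
globalDefault: false
description: "General workloads that may be preempted by special ones"
```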

Another use case would be behavioural partitioning. Say you have one cluster and want to run an AI/ML pipeline: one could imagine tainting some nodes for inference and others for training or data-lake work, resembling a pipeline within one cluster rather than having several clusters, each dedicated to one specific role.

@rptaylor

If the extended resources are equivalent on a number of nodes, making the nodes fungible, it doesn't make sense to me to divide them into separate hard partitions. In a traditional batch system, partitioning creates significant challenges in practice, especially at large scale; this would instead be handled by fair-share scheduling, which is a big missing feature of Kubernetes (I think Volcano may have it). PriorityClasses are not enough.

It is a fundamental trade-off in scheduling theory between latency and throughput. Partitioning will inevitably reduce usage efficiency (throughput) but can improve latency (nodes are reserved for you, so they are available right away). That has to be balanced against the risk of filling up your own partition, and against the probably larger benefit of being able to use other partitions' capacity when it is free, which you would get if all nodes were in one shared pool instead.

Even with a relatively steady-state workload (as opposed to a dynamic and bursty one), wouldn't it be better to use resource quotas for each app (inference/training/etc.) as a floating reservation across any available node, rather than locking certain apps to a specific subset of nodes?

Anyway, my perspective is from a scientific HPC background; other situations could have totally different needs and considerations that I am not familiar with. Best to build a tool that provides sufficient options so anyone can use it however they need :)
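
For what it's worth, that "floating reservation" alternative could look roughly like this: each app gets a per-namespace quota on the GPU extended resource instead of a dedicated set of nodes. The namespace names and counts are purely illustrative.

```yaml
# Capacity floats across whichever nodes happen to be free instead of being
# pinned to node subsets; each namespace is capped at its quota.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: gpu-quota
  namespace: inference
spec:
  hard:
    requests.nvidia.com/gpu: "8"
---
apiVersion: v1
kind: ResourceQuota
metadata:
  name: gpu-quota
  namespace: training
spec:
  hard:
    requests.nvidia.com/gpu: "24"
```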

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 9, 2021

marquiz commented Sep 10, 2021

I have plans to implement this on top of #553.

@rptaylor I think I agree with you above. Partial/proportional tainting is much more complicated, with problematic corner cases (e.g. with cluster auto-scaling), not to mention the problems of optimal scheduling and resource usage you talked about above.

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 10, 2021
@marquiz marquiz added this to the v0.11.0 milestone Nov 26, 2021

marquiz commented Jan 14, 2022

For consistency, I think we'd also need to support this in the nfd-worker config (configuration of the custom source). This means that we need to update our gRPC interface, too, to send the taints from the worker to the master. We probably also need to add an annotation for bookkeeping (similar to nfd.node.kubernetes.io/feature-labels and nfd.node.kubernetes.io/extended-resources).
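
To make the bookkeeping idea concrete, a sketch of what the node metadata could end up looking like. The first two annotations exist in NFD today; the taints annotation name and all values here are hypothetical until this is actually designed.

```yaml
metadata:
  annotations:
    # existing bookkeeping of NFD-managed labels and extended resources
    nfd.node.kubernetes.io/feature-labels: "cpu-cpuid.AVX512F,kernel-version.major"
    nfd.node.kubernetes.io/extended-resources: "vendor.example/resource-a"
    # hypothetical counterpart for NFD-managed taints
    nfd.node.kubernetes.io/taints: "example.com/special=true:NoSchedule"
```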

@marquiz marquiz modified the milestones: v0.11.0, v0.12.0 Mar 17, 2022

marquiz commented Mar 17, 2022

Moving to v0.12.0

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 15, 2022

marquiz commented Jul 8, 2022

We still want this. It's not a huge deal in terms of implementation, but somebody® just has to do it.

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 8, 2022
@fmuyassarov

I'm interested in working on this.
/assign

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 29, 2022
@fmuyassarov

/remove-lifecycle stale
/lifecycle active

@k8s-ci-robot k8s-ci-robot added lifecycle/active Indicates that an issue or PR is actively being worked on by a contributor. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Nov 29, 2022

fmuyassarov commented Nov 29, 2022

This is being reviewed right now in #910.
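
For anyone following along, a sketch of what rule-based tainting could look like, modelled on the existing NodeFeatureRule CRD. The exact field names and the taint key below are assumptions until #910 settles; treat this as illustration, not the final API.

```yaml
apiVersion: nfd.k8s-sigs.io/v1alpha1
kind: NodeFeatureRule
metadata:
  name: taint-gpu-nodes
spec:
  rules:
    - name: "taint nodes that have an NVIDIA PCI device"
      # hypothetical taint fields under review in #910
      taints:
        - key: "example.com/gpu"
          value: "true"
          effect: NoSchedule
      matchFeatures:
        - feature: pci.device
          matchExpressions:
            vendor: {op: In, value: ["10de"]}
```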
