
Validating that flavors of a resource are different #58

Open

ahg-g opened this issue Feb 24, 2022 · 11 comments
Labels

  • kind/feature: Categorizes issue or PR as related to a new feature.
  • lifecycle/frozen: Indicates that an issue or PR should not be auto-closed due to staleness.
  • priority/important-soon: Must be staffed and worked on either currently, or very soon, ideally in time for the next release.

Comments

@ahg-g
Contributor

ahg-g commented Feb 24, 2022

What if we validate that the flavors of a resource in a capacity have at least one common label key with different values?

This effectively forces each flavor to point to a different set of nodes.
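
For illustration only (the flavor names, the label key, and the field shape below are hypothetical examples, not a proposed API): two cpu flavors would pass this validation because they share one key with different values.

```yaml
# Hypothetical flavors for the "cpu" resource: both carry the common key
# "cloud.example.com/instance-class" with different values, so they satisfy
# the proposed validation and select disjoint sets of nodes.
flavors:
  - name: on-demand
    labels:
      cloud.example.com/instance-class: on-demand
  - name: spot
    labels:
      cloud.example.com/instance-class: spot
```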

@ahg-g added the kind/feature and priority/important-soon labels on Feb 24, 2022
@denkensk
Member

denkensk commented Feb 24, 2022

Is the "common label key" officially defined or user-defined?

@ahg-g
Contributor Author

ahg-g commented Feb 24, 2022

User-defined; we can validate that on create.

@alculquicondor
Contributor

I worry that this could be limiting. In GKE, we don't have a label for non-spot VMs; rather, they are defined by the absence of the label. Should we add a label selector instead?

I suspect there might be similar scenarios in other environments.
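
A selector can express "absence of a label" directly. As a rough sketch (the field names are illustrative and the key assumes GKE's spot label; neither is a concrete API proposal), a flavor for non-spot VMs could be defined as:

```yaml
# Rough sketch: match non-spot nodes by the absence of the spot label rather
# than by a label they carry (field names illustrative, not an actual API).
nodeSelectorTerms:
  - matchExpressions:
      - key: cloud.google.com/gke-spot
        operator: DoesNotExist
```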

@ahg-g
Contributor Author

ahg-g commented Mar 1, 2022

Let me state the problem that I am trying to solve with this:

We need to force the scheduler to place the pods on the selected flavor; currently this is done by injecting node affinity for the selected flavor's labels. If a common label doesn't exist, then there is nothing forcing the scheduler to comply with Kueue's decision.
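
To make this concrete, here is a minimal sketch of the injected constraint on the pod spec, assuming a flavor distinguished by the hypothetical label cloud.example.com/instance-class: spot:

```yaml
# Sketch of the node affinity injected into the pod spec once a flavor is
# selected (the label key and value are hypothetical examples).
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: cloud.example.com/instance-class
              operator: In
              values:
                - spot
```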

We discussed adding a selector to flavors instead of labels at the beginning, which would allow having NotIn selectors for the flavors that don't have the selected label. But I think a selector is not intuitive to think about in this context, and it is not consistent with how we identify nodes (which in a sense represent resource flavors).

Ideally, flavors should have labels that uniquely distinguish them; perhaps for flavors with missing labels we inject NotIn affinity constraints.
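
For a flavor that has no distinguishing label, the injected term could instead exclude the values used by the labeled flavors of the same resource, roughly (key and values hypothetical):

```yaml
# Rough sketch: same injection point as above, but excluding the labeled
# flavors instead of requiring a label the flavor doesn't have.
nodeSelectorTerms:
  - matchExpressions:
      - key: cloud.example.com/instance-class
        operator: NotIn
        values:
          - spot
```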

@alculquicondor
Contributor

We can have "flavors have labels that uniquely distinguish them" as a best practice, but it's actually not trivial to enforce if we add a ResourceFlavor CRD #59.

To increase generality of the API, we could still have a label selector but limit the operators to the ones that make sense.
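
As a sketch only (not a concrete API, and the restriction to set-based operators is my assumption of "the ones that make sense"), a flavor could carry a selector that validation limits to In, NotIn, Exists and DoesNotExist:

```yaml
# Hypothetical flavor with a restricted label selector: validation would only
# accept set-based operators (field names illustrative, not an actual API).
name: spot
selector:
  matchExpressions:
    - key: cloud.example.com/instance-class
      operator: In
      values:
        - spot
```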

is not consistent with how we identify nodes (which in a sense represent resource flavors).

Can you expand on this?

@ahg-g
Contributor Author

ahg-g commented Mar 15, 2022

My problem is this: flavors look very artificial if we don't force them to be distinct and to map to different types of nodes. For example, even if we define a taint for the spot flavor, without unique labels, jobs assigned to this flavor could land on on-demand nodes, which is not a good experience.

To increase generality of the API, we could still have a label selector but limit the operators to the ones that make sense.

Hmm, this would allow flavors with overlapping nodes and would not necessarily force flavors to be different.

Can you expand on this?

I think of different flavors as different types of nodes. We don't identify nodes with selectors; we do it with labels.

Mapping flavors to different node types will be the most common use of flavors, and I feel we need to be opinionated about it; I also think this will offer a better out-of-the-box experience.

@alculquicondor
Contributor

I feel we need to be opinionated about it; I also think this will offer a better out-of-the-box experience.

How much of it should actually be enforced vs. stated as best practice?

Since we are going to split ResourceFlavors into a CRD, I don't think it's easy to enforce.

@ahg-g
Contributor Author

ahg-g commented Mar 16, 2022

How much of it should actually be enforced vs. stated as best practice?

I think it needs to be enforced, at least the default should be to enforce it.

Since we are going to split ResourceFlavors into a CRD, I don't think it's easy to enforce.

A bit aggressive, but we can do the following: each flavor will have a label added using a common key (flavor.kueue.x-k8s.io/${resource}) and a value of the flavor name (where ${resource} is "cpu", "memory", etc.).
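
Concretely (a sketch of the convention only, not an implemented API), for a flavor named spot of the cpu resource, the injected constraint would reduce to a plain node selector on the common key:

```yaml
# Sketch of the proposed convention: a per-resource common key whose value is
# the flavor name; Kueue would inject this node selector on admission.
nodeSelector:
  flavor.kueue.x-k8s.io/cpu: spot
```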

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label on Jun 14, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label on Jul 14, 2022
@ahg-g
Contributor Author

ahg-g commented Jul 14, 2022

/lifecycle frozen

@k8s-ci-robot added the lifecycle/frozen label and removed the lifecycle/rotten label on Jul 14, 2022