
Conversation

@pohly (Contributor) commented Jan 9, 2025

What type of PR is this?

/kind feature
/kind api-change

What this PR does / why we need it:

The original limit of 32 seemed sufficient for a single GPU on a node, but for shared, non-local resources it is too low. For example, a ResourceClaim might be used to allocate an interconnect channel that connects all pods of a workload running on several different nodes, in which case the number of pods can be considerably larger.

256 is high enough for currently planned systems. If we need something even higher in the future, an alternative approach might be needed to avoid scalability problems.
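
For reviewers skimming for the actual code change: it boils down to raising one constant in the resource API and the validation that enforces it. A minimal sketch of that idea, with illustrative names rather than the exact identifiers in the tree:

```go
package resourcevalidation

import (
	"k8s.io/apimachinery/pkg/util/validation/field"
)

// Illustrative constant; the real one lives in the resource.k8s.io API types.
// This PR bumps the value from 32 to 256.
const resourceClaimReservedForMaxSize = 256

// validateReservedForLen is a hypothetical stand-in for the status validation:
// it rejects a ResourceClaim whose status.reservedFor list exceeds the limit.
func validateReservedForLen(numConsumers int, fldPath *field.Path) field.ErrorList {
	var allErrs field.ErrorList
	if numConsumers > resourceClaimReservedForMaxSize {
		allErrs = append(allErrs, field.TooMany(fldPath, numConsumers, resourceClaimReservedForMaxSize))
	}
	return allErrs
}
```

On an API server that still enforces the old value of 32, this same check is what rejects updates to claims that already have more consumers, which is the downgrade caveat described under the reviewer notes below.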

Which issue(s) this PR fixes:

Required for NVIDIA GB200 and potentially Google TPU use cases.

Special notes for your reviewer:

Normally, increasing such a limit would have to be done incrementally over two releases. In this case we decided on Slack (https://kubernetes.slack.com/archives/CJUQN3E4T/p1734593174791519) to make an exception: the change goes into the current master branch for 1.33 and gets backported to the next 1.32.x patch release for production usage.

This breaks downgrades to a 1.32 release without this change if there are ResourceClaims with more than 32 consumers in ReservedFor. In practice, such breakage is very unlikely: no workloads yet need that many consumers, and downgrades to a previous patch release are also unlikely. Downgrades to 1.31 were already unsupported when using DRA v1beta1.

Does this PR introduce a user-facing change?

DRA API: the maximum number of pods which can use the same ResourceClaim is now 256 instead of 32. Beware that downgrading a cluster to Kubernetes 1.32.0 is not supported while this relaxed limit is in use, because 1.32.0 would refuse to update ResourceClaims with more than 32 entries in the status.reservedFor field.

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

- [KEP]: https://github.com/kubernetes/enhancements/issues/4381

/assign @thockin
/cc @liggitt @johnbelamaric @klueska

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. kind/feature Categorizes issue or PR as related to a new feature. labels Jan 9, 2025
@k8s-ci-robot k8s-ci-robot added kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. area/code-generation sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. wg/device-management Categorizes an issue or PR as relevant to WG Device Management. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Jan 9, 2025
@pohly (Contributor, Author) commented Jan 9, 2025

/hold

Both this PR and #129544 need all necessary approvals before either of them can be merged.

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 9, 2025
@klueska (Contributor) commented Jan 9, 2025

Thanks @pohly.

As mentioned in the description, we got preliminary support for this (unprecedented) update from @thockin and @liggitt in https://kubernetes.slack.com/archives/CJUQN3E4T/p1734593174791519

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 9, 2025
@k8s-ci-robot (Contributor)

LGTM label has been added.

Git tree hash: f554f25029d1e3e9f7df66e32cb0154e6d9c9fd1

@pohly pohly force-pushed the dra-reserved-for-limit branch from 800dc29 to 1cee368 on January 9, 2025 13:27
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 9, 2025
@k8s-ci-robot k8s-ci-robot requested a review from thockin January 9, 2025 13:27
@ffromani (Contributor) commented Jan 9, 2025

/triage accepted
/priority important-soon

fixing paperwork

@k8s-ci-robot k8s-ci-robot added the triage/accepted Indicates an issue or PR is ready to be actively worked on. label Jan 9, 2025
@k8s-triage-robot

This PR may require API review.

If so, when the changes are ready, complete the pre-review checklist and request an API review.

Status of requested reviews is tracked in the API Review project.

@pohly (Contributor, Author) commented Jan 9, 2025

/hold

Let me test one slow test more thoroughly which didn't run in the presubmit job...

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 9, 2025
@liggitt (Member) commented Jan 9, 2025

/retest

@pohly (Contributor, Author) commented Jan 9, 2025

Let me test one slow test more thoroughly which didn't run in the presubmit job...

This needs a bit more time. I'll continue tomorrow.

We want to be sure that the maximum number of pods per claim are actually
scheduled concurrently. Previously the test just made sure that they ran
eventually.

Running 256 pods only works on more than 2 nodes, so network-attached resources
have to be used. This is what the increased limit is meant for anyway. Because
of the tightened validation of node selectors in 1.32, the E2E test has to
use MatchExpressions because they allow listing node names.
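
As a rough illustration of the selector shape this refers to, here is a hedged sketch (hypothetical node names and label key; not the actual E2E code) of a node selector that lists several node names through MatchExpressions on the hostname label, which is the form the tightened 1.32 validation still permits for multiple nodes:

```go
package main

import (
	"fmt"

	v1 "k8s.io/api/core/v1"
)

func main() {
	// Hypothetical sketch, not the actual test code: restrict the
	// network-attached allocation to a set of named nodes by matching the
	// hostname label, which may carry multiple values in one expression.
	selector := &v1.NodeSelector{
		NodeSelectorTerms: []v1.NodeSelectorTerm{{
			MatchExpressions: []v1.NodeSelectorRequirement{{
				Key:      "kubernetes.io/hostname",
				Operator: v1.NodeSelectorOpIn,
				Values:   []string{"worker-0", "worker-1", "worker-2"},
			}},
		}},
	}
	fmt.Println(selector)
}
```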
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. lgtm "Looks good to me", indicates that a PR is ready to be merged. labels Jan 10, 2025
@k8s-ci-robot k8s-ci-robot requested a review from sttts January 10, 2025 08:50
@k8s-ci-robot (Contributor)

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: klueska, pohly, thockin

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added area/test sig/node Categorizes an issue or PR as relevant to SIG Node. sig/testing Categorizes an issue or PR as relevant to SIG Testing. labels Jan 10, 2025
@pohly (Contributor, Author) commented Jan 10, 2025

/hold cancel

E2E testing was updated to match the intended usage and passes for me locally with a kind cluster.

/test pull-kubernetes-node-e2e-containerd

Failed to get scheduled.

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 10, 2025
@pohly (Contributor, Author) commented Jan 10, 2025

I was a bit confused about which of the updated DRA jobs run "slow" tests. The good news is that the updated "on multiple nodes with network-attached resources supports sharing a claim sequentially" test did run in https://prow.k8s.io/view/gs/kubernetes-ci-logs/pr-logs/pull/129543/pull-kubernetes-kind-dra/1877639295167107072 and passed, so this PR should be good to merge.

Whether it should be included in a presubmit is a separate discussion - it really is slow at > 12 minutes (partly due to gomega.Consistently, partly because running all the pods takes time).
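
For anyone curious why gomega.Consistently makes this so slow, below is a hedged sketch (assumed helper name, label selector, and durations; not the actual test code) of the kind of check the updated test performs: it asserts that the expected number of pods sharing the claim are Running at the same time and that this keeps holding over a fixed window, instead of merely waiting for each pod to run eventually.

```go
package dratest

import (
	"context"
	"time"

	"github.com/onsi/gomega"
	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// assertPodsRunConcurrently is a hypothetical helper: it counts Running pods
// that consume the shared ResourceClaim and requires that count to stay at
// the expected value for a whole minute. The repeated polling over that
// window is part of why the test takes so long.
func assertPodsRunConcurrently(ctx context.Context, g gomega.Gomega, cs kubernetes.Interface, namespace string, want int) {
	g.Consistently(ctx, func(ctx context.Context) (int, error) {
		// "app=claim-consumer" is an assumed label; the real test identifies its pods differently.
		pods, err := cs.CoreV1().Pods(namespace).List(ctx, metav1.ListOptions{LabelSelector: "app=claim-consumer"})
		if err != nil {
			return 0, err
		}
		running := 0
		for i := range pods.Items {
			if pods.Items[i].Status.Phase == v1.PodRunning {
				running++
			}
		}
		return running, nil
	}).WithTimeout(time.Minute).WithPolling(5 * time.Second).Should(gomega.Equal(want))
}
```

With 256 pods, the setup plus a sustained check like this adds up, which matches the "> 12 minutes" observation above.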

@klueska (Contributor) commented Jan 10, 2025

Thanks @pohly for updating the tests and verifying that they did indeed run in the CI.

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 10, 2025
@k8s-ci-robot (Contributor)

LGTM label has been added.

Git tree hash: 3a8b83e9997db839a19c2287c91aa2afb4aec258

@pohly (Contributor, Author) commented Jan 10, 2025

/test pull-kubernetes-node-e2e-containerd

"when querying /stats/summary should report resource usage through the stats api" - unrelated.

@k8s-ci-robot k8s-ci-robot merged commit db1da72 into kubernetes:master Jan 10, 2025
16 checks passed
@k8s-ci-robot k8s-ci-robot added this to the v1.33 milestone Jan 10, 2025
k8s-ci-robot added a commit that referenced this pull request Jan 10, 2025
…3-origin-release-1.32

Automated cherry pick of #129543: DRA API: bump maximum size of ReservedFor to 256
@liggitt liggitt moved this to API review completed, 1.33 in API Reviews Dec 3, 2025