DRA: Extend PodResources to include resources from Dynamic Resource Allocation #3695

Open
4 of 8 tasks
klueska opened this issue Dec 14, 2022 · 48 comments
Labels
lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. sig/network Categorizes an issue or PR as relevant to SIG Network. sig/node Categorizes an issue or PR as relevant to SIG Node. stage/alpha Denotes an issue tracking an enhancement targeted for Alpha status wg/device-management Categorizes an issue or PR as relevant to WG Device Management.

Comments

@klueska
Contributor

klueska commented Dec 14, 2022

Enhancement Description

@k8s-ci-robot k8s-ci-robot added the sig/node Categorizes an issue or PR as relevant to SIG Node. label Dec 14, 2022
@klueska klueska moved this to Ongoing Enhancements in @klueska's k8s review queue Jan 30, 2023
@klueska
Contributor Author

klueska commented Feb 2, 2023

/milestone v1.27
/label lead-opted-in

@k8s-ci-robot
Contributor

@klueska: You must be a member of the kubernetes/milestone-maintainers GitHub team to set the milestone. If you believe you should be able to issue the /milestone command, please contact your Milestone Maintainers Team and have them propose you as an additional delegate for this responsibility.

In response to this:

/milestone v1.27
/label lead-opted-in

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot
Contributor

@klueska: Can not set label lead-opted-in: Must be member in one of these teams: [release-team-enhancements release-team-leads sig-api-machinery-leads sig-apps-leads sig-architecture-leads sig-auth-leads sig-autoscaling-leads sig-cli-leads sig-cloud-provider-leads sig-cluster-lifecycle-leads sig-contributor-experience-leads sig-docs-leads sig-instrumentation-leads sig-k8s-infra-leads sig-multicluster-leads sig-network-leads sig-node-leads sig-release-leads sig-scalability-leads sig-scheduling-leads sig-security-leads sig-storage-leads sig-testing-leads sig-windows-leads]

In response to this:

/milestone v1.27
/label lead-opted-in

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@SergeyKanzhelev
Member

/milestone v1.27
/label lead-opted-in

@k8s-ci-robot k8s-ci-robot added this to the v1.27 milestone Feb 2, 2023
@k8s-ci-robot k8s-ci-robot added the lead-opted-in Denotes that an issue has been opted in to a release label Feb 2, 2023
@dchen1107
Member

/label lead-opted-in

I had trouble adding lead-opted-in over the last couple of days. Trying it one more time ...

@SergeyKanzhelev
Member

/stage alpha

@k8s-ci-robot k8s-ci-robot added the stage/alpha Denotes an issue tracking an enhancement targeted for Alpha status label Feb 3, 2023
@marosset
Contributor

marosset commented Feb 3, 2023

Hello @klueska 👋, Enhancements team here.

Just checking in as we approach enhancements freeze at 18:00 PDT on Thursday 9th February 2023.

This enhancement is targeting stage alpha for v1.27 (correct me if otherwise).

Here's where this enhancement currently stands:

  • KEP readme using the latest template has been merged into the k/enhancements repo.
  • KEP status is marked as implementable for latest-milestone: v1.27
  • KEP readme has an updated, detailed test plan section filled out
  • KEP readme has up to date graduation criteria
  • KEP has a production readiness review that has been completed and merged into k/enhancements.

For this enhancement, it looks like #3738 will address the remaining requirements.

The status of this enhancement is marked as at risk. Please keep the issue description up-to-date with appropriate stages as well.
Thank you!

@marosset
Contributor

marosset commented Feb 9, 2023

This enhancement meets all of the requirements to be tracked in v1.27.
Thanks!

@marosset marosset moved this from At Risk to Tracked in 1.27 Enhancements Tracking Feb 9, 2023
@marosset marosset added the tracked/yes Denotes an enhancement issue is actively being tracked by the Release Team label Feb 9, 2023
@marosset
Contributor

marosset commented Mar 8, 2023

Hi @klueska 👋,

Checking in as we approach 1.27 code freeze at 17:00 PDT on Tuesday 14th March 2023.

Please ensure the following items are completed:

  • All PRs to the Kubernetes repo that are related to your enhancement are linked in the above issue description (for tracking purposes).
  • All PRs are fully merged by the code freeze deadline.

For this enhancement, it looks like the following PRs are open and need to be merged before code freeze:

Please let me know if there are any other PRs in k/k I should be tracking for this KEP.

As always, we are here to help should questions come up. Thanks!

@klueska
Contributor Author

klueska commented Mar 8, 2023

This is a dependent PR for the one you listed -- I have updated the description to include it:
kubernetes/kubernetes#115912

@Rishit-dagli
Member

Hi @klueska 👋, I’m reaching out from the 1.27 Release Docs team. This enhancement is marked as ‘Needs Docs’ for the 1.27 release.

Please follow the steps detailed in the documentation to open a PR against the dev-1.27 branch in the k/website repo. This PR can be just a placeholder at this time, and must be created by March 16. For more information, please take a look at Documenting for a release to familiarize yourself with the documentation requirements for the release.
Please feel free to reach out with any questions. Thanks!

@klueska
Contributor Author

klueska commented Mar 15, 2023

Docs placeholder added in description

@SergeyKanzhelev
Member

Based on the SIG Node meeting on 05/02/2023, we do NOT plan this for the 1.28 release. Please comment otherwise.

@klueska
Contributor Author

klueska commented May 7, 2023

@moshe010 I don't think we can progress this to beta until DRA itself progresses to beta.

@moshe010
Contributor

moshe010 commented May 7, 2023

@klueska I wasn't in the SIG Node meeting on 05/02/2023 in which this was discussed, and I never requested that this go to beta in 1.28. In the KEP we stated the following: [1]
alpha: "v1.27"
beta: "v1.30"
stable: "v1.32"

[1] - https://github.com/kubernetes/enhancements/pull/3915/files#diff-11e83115a85d63622d7dcdb3732b43f918caa972ff7f034a431f642227d22b2aL30-L32

@Atharva-Shinde Atharva-Shinde removed this from the v1.27 milestone May 14, 2023
@Atharva-Shinde Atharva-Shinde removed the tracked/yes Denotes an enhancement issue is actively being tracked by the Release Team label May 14, 2023
@aojea
Member

aojea commented Oct 8, 2024

My understanding is that this KEP seems to target an out-of-band networking plugin approach, which makes sense in a DRA classic environment IIUIC.

Since then, DRA has evolved and we are now making networking requirements part of the DRA efforts https://github.com/LionelJouin/kubernetes/commits/KEP-4817/ and making it easy to integrate networking plugins directly into DRA https://github.com/aojea/dra-network-driver-template .

My question is: with the new approach, where DRA classic is removed and networking plugins can hook directly into DRA, is this still needed?

@klueska
Contributor Author

klueska commented Oct 8, 2024

From my perspective, this KEP is important independent of its (possible) usage for networking. The PodResourcesAPI is designed to surface the full set of resources that are allocated to a pod, and thus including DRA allocated resources is a natural extension of this. For example, NVIDIA's DCGM prometheus exporter relies on the information provided via this API to link GPU metrics back to the pods that are consuming those GPUs.
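
To make the use case above concrete, here is a minimal Go sketch of how a monitoring agent might index DRA-allocated devices back to the pods that own them. The struct types are simplified, hypothetical stand-ins loosely modeled on the PodResources response (they are not the generated gRPC types), and `IndexDevices`, `Owner`, and the `example.com/gpu=gpu-0` device name are all assumptions made for illustration.

```go
package main

import "fmt"

// Hypothetical, simplified stand-ins for the PodResources response messages.
// Field names are assumptions for illustration, not the generated gRPC types.
type ClaimResource struct {
	CDIDevices []string // fully qualified CDI device names
}

type DynamicResource struct {
	ClaimName      string
	ClaimNamespace string
	ClaimResources []ClaimResource
}

type ContainerResources struct {
	Name             string
	DynamicResources []DynamicResource
}

type PodResources struct {
	Name       string
	Namespace  string
	Containers []ContainerResources
}

// Owner identifies the pod and container a device is allocated to.
type Owner struct {
	Namespace string
	Pod       string
	Container string
}

// IndexDevices builds a lookup from CDI device name to its owning container,
// so per-device metrics can be labeled with pod information.
func IndexDevices(pods []PodResources) map[string]Owner {
	index := make(map[string]Owner)
	for _, pod := range pods {
		for _, c := range pod.Containers {
			for _, dr := range c.DynamicResources {
				for _, cr := range dr.ClaimResources {
					for _, dev := range cr.CDIDevices {
						index[dev] = Owner{Namespace: pod.Namespace, Pod: pod.Name, Container: c.Name}
					}
				}
			}
		}
	}
	return index
}

func main() {
	// Sample data standing in for a response from the kubelet.
	pods := []PodResources{{
		Name:      "trainer-0",
		Namespace: "ml",
		Containers: []ContainerResources{{
			Name: "worker",
			DynamicResources: []DynamicResource{{
				ClaimName:      "gpu-claim",
				ClaimNamespace: "ml",
				ClaimResources: []ClaimResource{{CDIDevices: []string{"example.com/gpu=gpu-0"}}},
			}},
		}},
	}}
	index := IndexDevices(pods)
	fmt.Printf("%+v\n", index["example.com/gpu=gpu-0"])
}
```

A real consumer would obtain the pod list from the kubelet's PodResources gRPC endpoint rather than from in-memory sample data; the indexing step would look much the same.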

@ArangoGutierrez
Contributor

/cc

@aojea
Member

aojea commented Oct 8, 2024

I don't disagree with exposing the resources through the PodResourcesAPI and letting the consumers work with that.

I was commenting on the second goal, which is related to networking. We are making the integration with networking native, so encouraging that as a goal sounds contradictory to the current efforts. My complaint is not about the KEP; it is about this paragraph ...

To allow the DRA feature to work with CNIs that require complex network devices such as RDMA. DRA resource drivers will allocate the resources, and the meta-plugin will read the allocated CDI Devices using the PodResources API. The meta-plugin will then inject the device-id of these CDI Devices as CNI arguments and invoke other CNIs (just as it does for devices allocated by the device plugin today).

If you remove that goal and use more generic text such as "allow node components to use the PodResourcesAPI to use the DRA information to develop new features and integrations", then this is a SIG Node only KEP 😄
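
For readers unfamiliar with the flow in the quoted KEP paragraph, the following Go sketch shows roughly what "inject the device-id of these CDI Devices as CNI arguments" could look like. It assumes CDI device names of the form `vendor/class=device`; the `deviceID` argument key and the helpers `SplitCDIName` and `DeviceIDArgs` are hypothetical and not taken from any existing meta-plugin.

```go
package main

import (
	"fmt"
	"strings"
)

// SplitCDIName splits a fully qualified CDI device name of the assumed form
// "vendor/class=device" into its kind ("vendor/class") and device parts.
func SplitCDIName(name string) (string, string, error) {
	kind, device, ok := strings.Cut(name, "=")
	if !ok || kind == "" || device == "" {
		return "", "", fmt.Errorf("malformed CDI device name %q", name)
	}
	return kind, device, nil
}

// DeviceIDArgs returns one "deviceID=<device>" argument per CDI device.
// The "deviceID" key is a hypothetical choice for this sketch.
func DeviceIDArgs(cdiDevices []string) ([]string, error) {
	args := make([]string, 0, len(cdiDevices))
	for _, name := range cdiDevices {
		_, device, err := SplitCDIName(name)
		if err != nil {
			return nil, err
		}
		args = append(args, "deviceID="+device)
	}
	return args, nil
}

func main() {
	// The device name below is made up for illustration.
	args, err := DeviceIDArgs([]string{"example.com/nic=rdma0"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(args)
}
```

In this sketch the meta-plugin would pass each resulting argument to the delegate CNI it invokes; how that hand-off happens is outside what the thread specifies.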

@johnbelamaric
Member

@haircommander can you un-opt-in this one? I will remove it from my PRR board. Thanks

@ArangoGutierrez
Contributor

@ArangoGutierrez had said he was going to take this one over.

That said, there is no implementation work needed at the moment. All that is needed is to update the KEP to be in line with the latest code that is already merged.

I'm working on an update PR for the KEP and hope to have the PR link by tomorrow.

@haircommander
Contributor

/remove-milestone v1.32
/remove-label lead-opted-in

@haircommander haircommander moved this from Considered for release to Not for release in SIG Node 1.32 KEPs planning Oct 8, 2024
@k8s-ci-robot k8s-ci-robot removed the lead-opted-in Denotes that an issue has been opted in to a release label Oct 8, 2024
@ArangoGutierrez
Contributor

#4913

@pacoxu
Member

pacoxu commented Oct 10, 2024

According to #4913, beta is planned for "v1.33" and stable for "v1.36".

So this is not milestone v1.32?

@klueska
Contributor Author

klueska commented Oct 10, 2024

Correct -- #4913 was just an update to the KEP to reflect the current state of the code base while remaining in alpha.

@dipesh-rawat dipesh-rawat moved this from At risk for enhancements freeze to Deferred in 1.32 Enhancements Tracking Oct 10, 2024
@tjons
Contributor

tjons commented Oct 11, 2024

/milestone clear

@k8s-ci-robot k8s-ci-robot removed this from the v1.32 milestone Oct 11, 2024
@haircommander haircommander moved this from Not for release to Removed in SIG Node 1.32 KEPs planning Oct 11, 2024
@pohly
Contributor

pohly commented Nov 19, 2024

/wg device-management

@k8s-ci-robot k8s-ci-robot added the wg/device-management Categorizes an issue or PR as relevant to WG Device Management. label Nov 19, 2024
@haircommander haircommander moved this from Triage to Not for release in SIG Node 1.33 KEPs planning Jan 28, 2025
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 17, 2025
@mbrow137 mbrow137 moved this from 🆕 New to 🏗 In Progress in Dynamic Resource Allocation Feb 18, 2025
Projects
Status: Tracked
Status: No status
Status: Not for release
Status: 📋 Backlog
Status: Deferred
Status: Removed