Add new Dynamic Resource Allocation examples #49079

nojnhuh · 2024-12-13T17:58:06Z

Description

This PR adds new Dynamic Resource Allocation examples to complement the existing Concept page. It adds one new Task document showing various use cases and one new Tutorial document comparing workloads configured via DRA and via device plugins.

This PR is currently a work-in-progress as we iterate on the higher-level details of the new docs.

Issue

Closes: #

k8s-ci-robot · 2024-12-13T17:58:18Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign reylejano for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

content/en/docs/OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

netlify · 2024-12-13T18:06:45Z

✅ Pull request preview available for checking

Built without sensitive environment variables

| Name | Link |
| --- | --- |
| 🔨 Latest commit | `ff618c0` |
| 🔍 Latest deploy log | https://app.netlify.com/sites/kubernetes-io-main-staging/deploys/675cb4337f213d000897d41c |
| 😎 Deploy Preview | https://deploy-preview-49079--kubernetes-io-main-staging.netlify.app |

To edit notification comments on pull requests, go to your Netlify site configuration.

nojnhuh · 2024-12-13T18:09:15Z

content/en/docs/tutorials/dynamic-resource-allocation/comapring-dra-device-plugin.md

+
+## {{% heading "prerequisites" %}}
+
+* An NVIDIA GPU-enabled cluster with GPU Operator installed


Something vendor-specific like this might not fit well in these docs, but NVIDIA GPUs are probably the most common DRA use case right now, and I think NVIDIA's device plugin and DRA driver currently make for the most apples-to-apples comparison of these APIs.

Can we use tabs to let people take part even with different vendors?

I think that would work well if we can map the same use cases 1:1 across vendors, for example if AMD eventually exposes a similar GPU and a "security/network isolation" device like NVIDIA's IMEX channels. Different use cases might be better expressed as separate sections, but we can definitely play around with how those look once we can produce more examples here.

sftim

Thanks! A tutorial will really help people learn.

sftim · 2024-12-13T18:49:09Z

content/en/docs/tasks/configure-pod-container/assign-dra-resource.md

Early feedback: this sounds like it belongs in the Tutorials section, not Tasks.

Why:

- this suggests deploying a whole new cluster

- if we had cluster admins saying "look, I'm in a hurry, just show me how to deploy the Example Hardware driver, where are the docs?" then a task would be the right fit; this isn't like that though

There may be specific other tasks to cover, though.

For example:

- how do I troubleshoot resource allocation?

- how do I check what devices are allocatable?

- how do I find out about the utilization of my dynamically-allocated devices?

All of those are questions we could cover with a task page (typically a task page per question, though sometimes we combine them).

Taking another look at the example driver, I'm thinking it's likely we can describe workable examples here given "a cluster with DRA enabled" without requiring the exact kind cluster the driver's docs describe. I forgot that the example driver doesn't publish the Helm chart anywhere though, so it needs to be built locally. If we can publish that chart somewhere publicly like GitHub and we find that any DRA-enabled cluster works, do you think that would simplify the setup enough to justify keeping this as a Task?

+1 to those other topics, I think those would be great to include. I'll add placeholders for those.

Nope. I will never meet a cluster admin who wants guidance on setting up the example driver in their existing production cluster.

If you'd never do it outside of a learning context, it's unlikely to be a task.

Sounds good, I've split out the examples requiring the example driver into a new tutorial.

sftim · 2024-12-13T18:49:48Z

content/en/docs/tasks/configure-pod-container/assign-dra-resource.md

@@ -0,0 +1,70 @@
+---
+title: Assign Resources to Containers and Pods with Dynamic Resource Allocation


-title: Assign Resources to Containers and Pods with Dynamic Resource Allocation
+title: Learn About Dynamic Resource Allocation

?

sftim · 2024-12-13T18:50:31Z

content/en/docs/tasks/configure-pod-container/assign-dra-resource.md

+
+## Deploy an example DRA driver
+
+- Reproduce the steps from https://github.com/kubernetes-sigs/dra-example-driver?tab=readme-ov-file#demo to create a cluster and install the driver


early feedback: we avoid sending people to GitHub repos to discover parts of the documentation

When I fill out this section, I intend to essentially copy-paste some of the steps from the linked doc into this one, so the steps here would be things like "run this script" instead of "follow the steps in this linked document." Is that in line with your suggestion here?

sftim · 2024-12-13T19:11:49Z

Something to watch out for: a minority of readers may arrive at these pages looking for in-place changes to the resource allocations for existing Pods.

Now, we (Kubernetes) don't call that DRA, but the reader may not know this. So a clear page introduction for each guide will help readers spot if they are looking in the wrong place.

- split examples requiring driver into new tutorial
- add troubleshooting section to task

Add new Dynamic Resource Allocation examples

319a8a2

k8s-ci-robot added the do-not-merge/work-in-progress and cncf-cla: yes labels Dec 13, 2024

k8s-ci-robot requested review from pohly and shannonxtreme December 13, 2024 17:58

k8s-ci-robot added the language/en and size/L labels Dec 13, 2024


feedback

ff618c0

- split examples requiring driver into new tutorial
- add troubleshooting section to task
