Questions about GPU time-sharing on Kubernetes #404

Open
jxl4650152 opened this issue May 17, 2023 · 5 comments
Labels
lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale.


jxl4650152 commented May 17, 2023

1. Issue or feature description

Is it possible to enable time-sharing on select GPUs within a single node, rather than all of them?
If not all GPUs on a single node support time-slicing (for example, a Kepler K80), what behavior can be expected from the plugin?

Member

elezar commented Jun 2, 2023

The device plugin does not currently have a mechanism to expose different devices as different resource types. It is also not possible to apply sharing settings on a per-GPU basis.

Note that the plugin will still allow a sharing setting to be applied to GPUs that may not support this feature; in effect, it reports the same device multiple times to the Kubelet. The behaviour of containers that are started on the same device where time-slicing is not supported will depend on the application and the device, but should mirror what happens when two applications that access the same device are started on the host.
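
To make the "same device reported multiple times" behaviour concrete, here is a minimal Python sketch. It is not the plugin's code: the function names, the `-replica-N` naming scheme, and the GPU identifiers are all hypothetical and chosen only for illustration. It models a single replica count being applied to every GPU on a node, whether or not the hardware supports time-slicing.

```python
from typing import Dict, List


def advertise_replicas(gpu_ids: List[str], replicas: int) -> List[str]:
    """Return one advertised entry per (GPU, replica) pair.

    Hypothetical model: the same replica count is applied to every GPU,
    since sharing settings cannot be set on a per-GPU basis.
    """
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    # The "-replica-N" suffix is an illustrative naming scheme, not the plugin's.
    return [f"{gpu}-replica-{i}" for gpu in gpu_ids for i in range(replicas)]


def count_per_gpu(advertised: List[str]) -> Dict[str, int]:
    """Count how many advertised entries map back to each physical GPU."""
    counts: Dict[str, int] = {}
    for entry in advertised:
        gpu = entry.rsplit("-replica-", 1)[0]
        counts[gpu] = counts.get(gpu, 0) + 1
    return counts


# Illustrative node: one GPU that supports time-slicing and one that may not.
node_gpus = ["gpu-a100-0", "gpu-k80-0"]
advertised = advertise_replicas(node_gpus, replicas=2)
print(advertised)
print(count_per_gpu(advertised))
```

In this model both GPUs end up with the same number of schedulable entries, which is why two containers can land on a device that does not support time-slicing.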

Author

jxl4650152 commented Jun 2, 2023

Thank you for your response. We now have a clear understanding of the limitations of this plugin. We are using this repo for managing GPUs on Kubernetes, and we have also noticed another repo related to DRA (Dynamic Resource Allocation), which offers greater flexibility and richer functionality. That raises another question: will GPUs on Kubernetes be well supported through the DRA approach in the future? I'm not sure whether this is the right place to ask. If not, could you please let me know where I should ask it?

Contributor

klueska commented Jun 3, 2023

Yes, DRA will be well supported. We see it as the future of GPU support in Kubernetes.

Please see:
https://m.youtube.com/watch?v=_fi9asserLE&ab_channel=CNCF%5BCloudNativeComputingFoundation%5D

This issue is stale because it has been open 90 days with no activity. This issue will be closed in 30 days unless new comments are made or the stale label is removed.

@github-actions github-actions bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 28, 2024

frittentheke commented Jul 1, 2024

As @elezar said, this is not (yet) possible.

I myself would also like to only enable time-slicing on a subset of GPUs. Is there any chance we could make this issue here a feature request @elezar? Or is this capability never coming to the device plugin in favor of DRA? That seems quite a while out though - NVIDIA/k8s-dra-driver#131
