Questions about GPU time-sharing on Kubernetes #404

jxl4650152 · 2023-05-17T05:34:29Z

1. Issue or feature description

Is it possible to enable time-sharing on selected GPUs within a single node, rather than on all of them?
If some GPUs on a node do not support time-slicing, such as the Kepler-based K80, what behavior can be expected from the plugin?

elezar · 2023-06-02T12:56:58Z

The device plugin does not currently have a mechanism to expose different devices as different resource types. It is also not possible to apply sharing settings on a per-GPU basis.

Note that the plugin will still allow a sharing setting to be applied to GPUs that may not support this feature, and it effectively reports the same device multiple times to the Kubelet. The behaviour of two containers started on the same device where time-slicing is not supported will depend on the application and the device, but should mirror what happens when two applications that access the same device are started on the host.
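To make "reports the same device multiple times" concrete, here is a minimal Go sketch of the idea. It is not the plugin's actual code: the `advertisedDevice` type, the `replicate` function, and the `::N` ID suffix are all assumptions made for illustration. It only shows that one replica count is applied uniformly to every GPU on the node, with no check for whether a given GPU supports time-slicing.

```go
package main

import "fmt"

// advertisedDevice is a hypothetical stand-in for one entry in the device
// list that a plugin reports to the Kubelet.
type advertisedDevice struct {
	ID  string // unique ID as seen by the Kubelet
	GPU string // underlying physical GPU
}

// replicate returns `replicas` advertised entries for each physical GPU.
// The same count is applied to every GPU: there is no per-GPU setting and
// no check for whether the GPU supports time-slicing.
func replicate(gpus []string, replicas int) []advertisedDevice {
	if replicas < 1 {
		replicas = 1
	}
	devices := make([]advertisedDevice, 0, len(gpus)*replicas)
	for _, gpu := range gpus {
		for i := 0; i < replicas; i++ {
			devices = append(devices, advertisedDevice{
				ID:  fmt.Sprintf("%s::%d", gpu, i), // ID format is an assumption
				GPU: gpu,
			})
		}
	}
	return devices
}

func main() {
	// Two physical GPUs with 2 replicas each yield 4 advertised devices.
	for _, d := range replicate([]string{"GPU-a", "GPU-b"}, 2) {
		fmt.Printf("%s -> %s\n", d.ID, d.GPU)
	}
}
```

With two GPUs and a replica count of 2, the Kubelet would see four schedulable entries, so two containers can land on the same physical GPU regardless of whether that GPU time-slices between them.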

jxl4650152 · 2023-06-02T16:24:07Z

Thank you for your response. We now have a clear understanding of the limitations of this plugin. We are using this repo to manage GPUs on Kubernetes, and we have also noticed another repo related to DRA (Dynamic Resource Allocation), which offers greater flexibility and richer functionality. This raises another question: will using GPUs on Kubernetes through the DRA approach be strongly supported in the future? I'm not sure if it's appropriate to ask this question here. If not, could you please let me know where I should ask it?

klueska · 2023-06-03T07:18:52Z

Yes, DRA will be well supported. We see it as the future of GPU support in Kubernetes.

Please see:
https://m.youtube.com/watch?v=_fi9asserLE&ab_channel=CNCF%5BCloudNativeComputingFoundation%5D

github-actions · 2024-02-28T04:25:07Z

This issue is stale because it has been open 90 days with no activity. This issue will be closed in 30 days unless new comments are made or the stale label is removed.

frittentheke · 2024-07-01T15:25:14Z

As @elezar said, this is not (yet) possible; see

k8s-device-plugin/api/config/v1/replicas.go

Line 61 in 35ad180

if setsDevices {

I myself would also like to enable time-slicing on only a subset of GPUs. Is there any chance we could turn this issue into a feature request, @elezar? Or is this capability never coming to the device plugin in favor of DRA? That seems quite a while out though - NVIDIA/k8s-dra-driver#131

github-actions bot added the lifecycle/stale label on Feb 28, 2024.
