v0.15.0 added nodeSelector "nvidia.com/mps.capable": "true"? #1085
Comments
@markusl the nvidia.com/mps.capable nodeSelector should only be defined for the MPS control daemon DaemonSet, and is only applicable if MPS is used to apply space partitioning to existing GPUs. See: k8s-device-plugin/deployments/helm/nvidia-device-plugin/templates/daemonset-mps-control-daemon.yml (lines 207 to 208 at b6b81a6).
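For reference, the referenced template lines apply a selector for the MPS label; a minimal sketch of that selector (an approximation, not a verbatim quote of the chart at b6b81a6):

```yaml
nodeSelector:
  # Only schedule the MPS control daemon on nodes explicitly labeled as MPS-capable.
  nvidia.com/mps.capable: "true"
```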
Thanks for the quick answer! I am using AWS CDK for the deployment, which pulls the Helm chart into the cluster automatically:

```ts
cluster.addHelmChart('nvidia-device-plugin', {
  chart: 'nvidia-device-plugin',
  repository: 'https://nvidia.github.io/k8s-device-plugin',
  namespace: 'kube-system',
  version: '0.17.0', // <- causes nodeSelector with "nvidia.com/mps.capable": "true" to appear
});
```

This has changed between 0.14.0 and 0.15.0 as far as I can tell. Is there something specific that I need to configure to prevent the nodeSelector from appearing?
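If you do want to pin where the plugin schedules, Helm values can be passed through the same CDK call via its `values` option; a sketch of such an override, assuming the chart exposes a top-level `nodeSelector` value (not verified against 0.17.0):

```yaml
# Hypothetical values override (sketch): pin the device plugin to nodes that
# already carry the gpu.present label instead of relying on auto-detected labels.
nodeSelector:
  nvidia.com/gpu.present: "true"
```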
The device plugin DaemonSet does not have any affinity for the MPS label; only the MPS control DaemonSet has such an affinity. Here's the current affinity setting, as of 0.17.0, of the device plugin's DaemonSet:
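A sketch of roughly what that affinity looks like, reconstructed from the chart's defaults (not a verbatim quote of 0.17.0):

```yaml
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        # NFD labels nodes that expose an NVIDIA PCI device (vendor ID 10de).
        - matchExpressions:
            - key: feature.node.kubernetes.io/pci-10de.present
              operator: In
              values: ["true"]
        # Setting this label manually forces the plugin onto a node, e.g. for
        # autoscaled nodes that have not yet been labeled by NFD.
        - matchExpressions:
            - key: nvidia.com/gpu.present
              operator: In
              values: ["true"]
        # (The actual chart may include additional terms, e.g. for Tegra-based systems.)
```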
Xref: the chart's default values, in which this affinity is defined.
Hi!
I have successfully used v0.14.0 with AWS EKS to correctly identify the GPUs of AL2 instances. However, with newer versions (starting from v0.15.0), it seems that the daemonset unexpectedly targets only "MPS capable" nodes:
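The rendered spec showed a selector along these lines (a sketch of what was reported, not a verbatim quote of the 0.15.x output):

```yaml
nodeSelector:
  nvidia.com/mps.capable: "true"
```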
However, in previous versions, the affinity configuration looked like this:
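Roughly, the relevant part of that affinity (a sketch, not a verbatim quote of the 0.14.x chart):

```yaml
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            # Setting this label on a node forces the plugin to run there.
            - key: nvidia.com/gpu.present
              operator: In
              values: ["true"]
```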
This allows us to apply the label 'nvidia.com/gpu.present': 'true' to force-run the daemon on instances created by an AWS ASG, which lets us scale from zero. Could you please document the recommended way to run the daemonset on the required nodes when using AWS EKS and the Cluster Autoscaler, which scales GPU instances from zero?
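One common pattern for scale-from-zero (a sketch, not an official recommendation from this project): apply the nvidia.com/gpu.present=true label to the GPU nodes and mirror it onto the ASG as a Cluster Autoscaler node-template tag, so the autoscaler knows the label will exist before any node in the group is running:

```yaml
# Sketch: CloudFormation-style tags on the GPU Auto Scaling Group. The
# node-template tag tells Cluster Autoscaler which labels nodes in this group
# will carry, so pending GPU pods can trigger a scale-up from zero.
Tags:
  - Key: k8s.io/cluster-autoscaler/node-template/label/nvidia.com/gpu.present
    Value: "true"
    PropagateAtLaunch: true
```

The nodes themselves still need the actual Kubernetes label (for example via the node group's label configuration or the kubelet's --node-labels flag) so the affinity term matches once the node joins the cluster.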
Best regards,
Markus