Hi, I have reproduced this issue (#421). I deployed the gpu-operator and then created a pod with securityContext.privileged=false. The pod is running, yet all GPUs are still exposed to it. Why are all GPUs visible to the pod?
@CuiDengdeng can you attach the toolkit config file /usr/local/nvidia/toolkit/nvidia-container-runtime/config.toml? Also, please paste the output of nvidia-smi from within the test pod showing that it has access to all GPUs. Enabling debug mode in the toolkit config file and attaching the nvidia-container-runtime log will help as well.
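For reference, a rough sketch of how that information can be collected; the pod name, namespace, and log path below are placeholders and assumptions based on the toolkit's defaults rather than values confirmed in this thread:

# On the GPU worker node: dump the toolkit config generated by the operator
cat /usr/local/nvidia/toolkit/nvidia-container-runtime/config.toml

# From the test pod: list the GPUs the container can actually see
kubectl exec -n default cuda-vectoradd -- nvidia-smi -L

# After enabling debug logging in config.toml and recreating the pod,
# attach the runtime log (path as configured in config.toml)
cat /var/log/nvidia-container-runtime.log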
@shivamerla thanks, I have solved this problem, but I would like to know which component sets the NVIDIA_VISIBLE_DEVICES environment variable to all when the pod does not request a GPU.
@CuiDengdeng The NVIDIA_VISIBLE_DEVICES environment variable is set to all in the official CUDA images (nvcr.io/nvidia/cuda), so if your container image builds off the CUDA image, this envvar will be set.
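A quick way to confirm this is to print the environment baked into a CUDA base image; the tag below is only an example, not one used in this issue:

# NVIDIA_VISIBLE_DEVICES=all is defined in the image itself
docker run --rm nvcr.io/nvidia/cuda:11.7.1-base-ubuntu20.04 env | grep NVIDIA_VISIBLE_DEVICES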
I am closing this issue since you have indicated your problem has been solved.
The template below is mostly useful for bug reports and support questions. Feel free to remove anything which doesn't apply to you and add more information where it makes sense.
Important Note: NVIDIA AI Enterprise customers can get support from NVIDIA Enterprise support. Please open a case here.
1. Quick Debug Information
Ubuntu 20.04
2. Issue or feature description
I reproduced this issue (#421). I deployed the gpu-operator with the settings below and then created a pod with securityContext.privileged=false. The pod is running, but all GPUs are still exposed to it. Why are all GPUs visible to the pod?
3. Steps to reproduce the issue
helm install nvidia/gpu-operator \
  --version=v22.9.0 \
  --generate-name \
  --create-namespace \
  --namespace=gpu \
  --set driver.enabled=false \
  --set devicePlugin.env[0].name=DEVICE_LIST_STRATEGY \
  --set devicePlugin.env[0].value="volume-mounts" \
  --set toolkit.env[0].name=ACCEPT_NVIDIA_VISIBLE_DEVICES_ENVVAR_WHEN_UNPRIVILEGED \
  --set-string toolkit.env[0].value='false' \
  --set toolkit.env[1].name=ACCEPT_NVIDIA_VISIBLE_DEVICES_AS_VOLUME_MOUNTS \
  --set-string toolkit.env[1].value='true' \
  --wait
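To double-check that the two toolkit environment variables above were actually applied, the generated config on the node can be inspected. This assumes they map to same-named lowercase keys in the config file mentioned earlier in the thread:

# Expected, based on the values set above:
#   accept-nvidia-visible-devices-envvar-when-unprivileged = false
#   accept-nvidia-visible-devices-as-volume-mounts = true
grep -i accept-nvidia-visible-devices /usr/local/nvidia/toolkit/nvidia-container-runtime/config.toml

The test pod manifest used for the reproduction follows.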
apiVersion: v1
kind: Pod
metadata:
  name: cuda-vectoradd
spec:
  restartPolicy: OnFailure
  containers:
    - name: cuda-vectoradd
      image: docker.io/library/nginx:latest
      resources:
        limits:
          cpu: 900m
          nvidia.com/gpu: 1
        requests:
          cpu: 900m
          nvidia.com/gpu: 1
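With the volume-mounts device-list strategy configured above, the devices actually handed to this pod can be cross-checked from inside the container. The mount path below is the device plugin's default for that strategy and the pod name is taken from the manifest; both are used here as assumptions:

# Only the allocated GPU should show up as a per-device mount
kubectl exec cuda-vectoradd -- ls /var/run/nvidia-container-devices

# Compare with what the driver reports (only works if the image ships nvidia-smi)
kubectl exec cuda-vectoradd -- nvidia-smi -L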
4. Information to attach (optional if deemed irrelevant)
kubectl get pods -n OPERATOR_NAMESPACE
kubectl get ds -n OPERATOR_NAMESPACE
kubectl describe pod -n OPERATOR_NAMESPACE POD_NAME
kubectl logs -n OPERATOR_NAMESPACE POD_NAME --all-containers
nvidia-smi from the driver container: kubectl exec DRIVER_POD_NAME -n OPERATOR_NAMESPACE -c nvidia-driver-ctr -- nvidia-smi
journalctl -u containerd > containerd.log
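A sketch of collecting the items above into files in one pass, with the placeholders replaced by the namespace and pod name used in this reproduction:

ns=gpu                                # operator namespace from the helm install above
kubectl get pods -n "$ns" -o wide     > operator-pods.txt
kubectl get ds -n "$ns"               > operator-daemonsets.txt
kubectl describe pod cuda-vectoradd   > pod-describe.txt
journalctl -u containerd              > containerd.log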