Description
Triton Inference Server does not properly recognize or utilize both GPUs (RTX 3090 and RTX 3080 Ti) in a Kubernetes Helm deployment. Although the host system detects both GPUs, Triton only uses GPU 0 and reports the following error:
UNAVAILABLE: Invalid argument: instance group gptj_model2_1 of model gptj_model2 specifies invalid or unsupported GPU id 1. GPUs with at least the minimum required CUDA compute compatibility of 6.000000 are: 0.
This persists despite both GPUs meeting the CUDA compute capability requirement (8.6 for both).
Triton Information
Using Triton container from NGC: nvcr.io/nvidia/tritonserver:23.10-py3
To Reproduce
Environment Setup:
Host system with NVIDIA RTX 3090 (GPU 0) and RTX 3080 Ti (GPU 1).
Installed NVIDIA driver version: 560.35.03.
CUDA version: 12.6.
Kubernetes Setup:
Installed NVIDIA device plugin and validated that both GPUs are visible to Kubernetes.
Deployed Triton using Helm with the following values.yaml:
image:
  imageName: nvcr.io/nvidia/tritonserver:23.08-py3
  numGpus: 2
  serverArgs:
    - '--model-repository=/models'
    - '--log-verbose=1'
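One thing worth ruling out: if the chart does not translate numGpus into a GPU resource limit of 2 on the pod spec, Kubernetes will only grant the container one GPU, and Triton would see exactly this symptom. This is a hypothetical sketch of what the rendered pod spec should contain (field names per the standard NVIDIA device plugin convention; whether the chart maps numGpus this way is an assumption to verify with kubectl get pod -o yaml):

```yaml
# Expected in the rendered Triton container spec (verify, do not assume):
resources:
  limits:
    nvidia.com/gpu: 2   # must be 2, otherwise the pod only receives one GPU
```

If the rendered spec shows nvidia.com/gpu: 1 (or no GPU limit at all), the Helm values are not being applied as intended.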
Model Configuration (config.pbtxt):
Tried the following configurations:
instance_group [
  {
    count: 2
    kind: KIND_GPU
    gpus: [0, 1]
  }
]

and

instance_group [
  {
    count: 2
    kind: KIND_GPU
  }
]
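As a diagnostic variant (not a fix), splitting the group so each GPU gets its own entry can help isolate which id Triton rejects; this is a hypothetical config.pbtxt fragment following the same instance_group syntax as above:

```
instance_group [
  {
    count: 1
    kind: KIND_GPU
    gpus: [0]
  },
  {
    count: 1
    kind: KIND_GPU
    gpus: [1]
  }
]
```

If only the second entry fails, the model loads on GPU 0 and the error pinpoints GPU 1 as invisible to the Triton process.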
Execute kubectl exec -it -- nvidia-smi:
Only GPU 0 is shown as in use.
GPU 1 is not utilized.
Expected behavior
Both GPUs (RTX 3090 and RTX 3080 Ti) should be visible to Triton.
Triton should utilize GPU 1 for model instances when specified in the instance_group configuration.
No errors related to unsupported GPU IDs should occur, as both GPUs meet the CUDA compute capability requirement (8.6).
Additional Information
Host NVIDIA-SMI Output:
+-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
|-----------------------------------------+------------------------+----------------------+
| 0 NVIDIA GeForce RTX 3090 Off | 00000000:01:00.0 Off | N/A |
| 0% 49C P8 38W / 370W | 24046MiB / 24576MiB | 0% Default |
+-----------------------------------------+------------------------+----------------------+
| 1 NVIDIA GeForce RTX 3080 Ti Off | 00000000:04:00.0 Off | N/A |
| 0% 48C P8 30W / 370W | 15MiB / 12288MiB | 0% Default |
+-----------------------------------------+------------------------+----------------------+
Kubernetes Pod NVIDIA-SMI Output:
+-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
|-----------------------------------------+------------------------+----------------------+
| 0 NVIDIA GeForce RTX 3090 Off | 00000000:01:00.0 Off | N/A |
| 0% 49C P8 38W / 370W | 24046MiB / 24576MiB | 0% Default |
+-----------------------------------------+------------------------+----------------------+
Even so, the container logs show:
2024-11-20 11:20:08.618231: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1636] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 9805 MB memory: -> device: 0, name: NVIDIA GeForce RTX 3080 Ti, pci bus id: 0000:04:00.0, compute capability: 8.6
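Notably, this log line reports device 0 as the RTX 3080 Ti (pci bus id 0000:04:00.0), which is physical GPU 1 on the host. That is the classic symptom of device-visibility remapping (e.g. via CUDA_VISIBLE_DEVICES or the device plugin granting a single GPU): CUDA renumbers the visible devices from 0, so instance_group gpus ids refer to logical, not host, ids. A minimal sketch simulating that renumbering rule in plain Python (function name is hypothetical, not a Triton API):

```python
def logical_to_physical(visible_devices: str) -> dict:
    """Simulate CUDA's renumbering: devices listed in CUDA_VISIBLE_DEVICES
    are exposed to the process as logical ids 0..N-1 in list order."""
    ids = [int(x) for x in visible_devices.split(",") if x.strip()]
    return dict(enumerate(ids))

# With CUDA_VISIBLE_DEVICES=1, the 3080 Ti (physical 1) appears as logical GPU 0,
# and a config asking for GPU id 1 becomes invalid -- matching the error above.
print(logical_to_physical("1"))    # {0: 1}
print(logical_to_physical("0,1"))  # {0: 0, 1: 1}
```

Under this reading, Triton's "GPUs ... are: 0" message is consistent with the container having been handed exactly one (remapped) device rather than with a compute capability problem.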