Triton Server Utilizes Only One GPU Despite Two GPUs Available on Node #7818

Open

jmarchel7bulls opened this issue Nov 20, 2024 · 0 comments

Description
Triton Inference Server does not properly recognize or utilize both GPUs (RTX 3090 and RTX 3080 Ti) in a Kubernetes Helm deployment. Although the host system detects both GPUs, Triton only uses GPU 0 and reports the following error:

UNAVAILABLE: Invalid argument: instance group gptj_model2_1 of model gptj_model2 specifies invalid or unsupported GPU id 1. GPUs with at least the minimum required CUDA compute compatibility of 6.000000 are: 0.

The error persists even though both GPUs meet the minimum required CUDA compute capability (both are 8.6).

Triton Information
Using Triton container from NGC: nvcr.io/nvidia/tritonserver:23.10-py3
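
A quick way to rule out problems below the Kubernetes layer (not part of the original report) is to run the same image directly with Docker; with the NVIDIA Container Toolkit configured, this should list both GPUs:

docker run --rm --gpus all nvcr.io/nvidia/tritonserver:23.10-py3 nvidia-smi -L

If both devices show up here but not inside the pod, the problem is in the Kubernetes/Helm layer rather than in Triton itself.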

To Reproduce

Environment Setup:
    Host system with NVIDIA RTX 3090 (GPU 0) and RTX 3080 Ti (GPU 1).
    Installed NVIDIA driver version: 560.35.03.
    CUDA version: 12.6.

Kubernetes Setup:
    Installed NVIDIA device plugin and validated that both GPUs are visible to Kubernetes.
    Deployed Triton using Helm with the following values.yaml:

image:
  imageName: nvcr.io/nvidia/tritonserver:23.08-py3
  numGpus: 2
serverArgs:
  - '--model-repository=/models'
  - '--log-verbose=1'
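
For context: Helm charts for GPU workloads normally turn a GPU count from values.yaml into a nvidia.com/gpu resource limit on the container, so the rendered Deployment should contain something like the sketch below (standard Kubernetes fields, not copied from the actual chart templates, so worth verifying with helm template):

resources:
  limits:
    nvidia.com/gpu: 2

If the rendered pod spec ends up requesting only one GPU, the NVIDIA device plugin will expose only one device inside the container, which would match the single-GPU nvidia-smi output shown further below.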

Model Configuration (config.pbtxt):

The following configurations were tried:

instance_group [
  {
    count: 2
    kind: KIND_GPU
    gpus: [0, 1]
  }
]

instance_group [
  {
    count: 2
    kind: KIND_GPU
  }
]
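
As a variant (a sketch, not a configuration tried in this report), an explicit one-instance-per-GPU placement can be expressed with one group per device:

instance_group [
  {
    count: 1
    kind: KIND_GPU
    gpus: [ 0 ]
  },
  {
    count: 1
    kind: KIND_GPU
    gpus: [ 1 ]
  }
]

Any form that references GPU id 1 would be expected to fail the same way here, since the error above is raised because Triton does not see a device with id 1 at all.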

Execute kubectl exec -it <pod-name> -- nvidia-smi inside the Triton pod:
    Only GPU 0 is listed.
    GPU 1 does not appear and is never utilized.
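
A few additional checks inside the pod can narrow this down (standard kubectl/nvidia-smi usage; <pod-name> is a placeholder for the Triton pod):

kubectl exec -it <pod-name> -- nvidia-smi -L
kubectl exec -it <pod-name> -- nvidia-smi --query-gpu=index,name,compute_cap --format=csv
kubectl exec -it <pod-name> -- env | grep -i nvidia

The first two show which devices and compute capabilities the container actually sees (the compute_cap query needs a reasonably recent driver, which 560.35.03 is); the last shows NVIDIA_VISIBLE_DEVICES, which the NVIDIA device plugin normally sets to exactly the GPUs allocated to the pod.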

Expected behavior
Both GPUs (RTX 3090 and RTX 3080 Ti) should be visible to Triton.
Triton should utilize GPU 1 for model instances when specified in the instance_group configuration.
No errors about unsupported GPU IDs should occur, since both GPUs meet the minimum CUDA compute capability requirement (8.6).

Additional Information

Host NVIDIA-SMI Output:

+-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
|-----------------------------------------+------------------------+----------------------+
| 0 NVIDIA GeForce RTX 3090 Off | 00000000:01:00.0 Off | N/A |
| 0% 49C P8 38W / 370W | 24046MiB / 24576MiB | 0% Default |
+-----------------------------------------+------------------------+----------------------+
| 1 NVIDIA GeForce RTX 3080 Ti Off | 00000000:04:00.0 Off | N/A |
| 0% 48C P8 30W / 370W | 15MiB / 12288MiB | 0% Default |
+-----------------------------------------+------------------------+----------------------+

Kubernetes Pod NVIDIA-SMI Output:

+-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
|-----------------------------------------+------------------------+----------------------+
| 0 NVIDIA GeForce RTX 3090 Off | 00000000:01:00.0 Off | N/A |
| 0% 49C P8 38W / 370W | 24046MiB / 24576MiB | 0% Default |
+-----------------------------------------+------------------------+----------------------+

Even so, the server logs show:

2024-11-20 11:20:08.618231: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1636] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 9805 MB memory: -> device: 0, name: NVIDIA GeForce RTX 3080 Ti, pci bus id: 0000:04:00.0, compute capability: 8.6
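
One more thing worth checking (not shown in this report) is how many GPUs the pod actually requested, since the device plugin only exposes the devices allocated to it; <pod-name> is again a placeholder:

kubectl get pod <pod-name> -o jsonpath='{.spec.containers[0].resources.limits}'

If the limits contain only a single nvidia.com/gpu (or none at all), the container is handed one device, which would explain both the single-GPU nvidia-smi output above and Triton rejecting GPU id 1.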
