Triton Server Utilizes Only One GPU Despite Two GPUs Available on Node #7818

Open

jmarchel7bulls opened this issue Nov 20, 2024 · 0 comments

Description
Triton Inference Server does not properly recognize or utilize both GPUs (RTX 3090 and RTX 3080 Ti) in a Kubernetes Helm deployment. Although the host system detects both GPUs, Triton only uses GPU 0 and reports the following error:

UNAVAILABLE: Invalid argument: instance group gptj_model2_1 of model gptj_model2 specifies invalid or unsupported GPU id 1. GPUs with at least the minimum required CUDA compute compatibility of 6.000000 are: 0.

The error persists even though both GPUs meet the minimum required CUDA compute capability (both are 8.6).

Triton Information
Using Triton container from NGC: nvcr.io/nvidia/tritonserver:23.10-py3
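
A quick way to rule out problems below the Kubernetes layer (not part of the original report) is to run the same image directly with Docker; with the NVIDIA Container Toolkit configured, this should list both GPUs:

docker run --rm --gpus all nvcr.io/nvidia/tritonserver:23.10-py3 nvidia-smi -L

If both devices show up here but not inside the pod, the problem is in the Kubernetes/Helm layer rather than in Triton itself.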

To Reproduce

Environment Setup:
    Host system with NVIDIA RTX 3090 (GPU 0) and RTX 3080 Ti (GPU 1).
    Installed NVIDIA driver version: 560.35.03.
    CUDA version: 12.6.

Kubernetes Setup:
    Installed NVIDIA device plugin and validated that both GPUs are visible to Kubernetes.
    Deployed Triton using Helm with the following values.yaml:

image:
  imageName: nvcr.io/nvidia/tritonserver:23.08-py3
  numGpus: 2
serverArgs:
  - '--model-repository=/models'
  - '--log-verbose=1'
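
For context: Helm charts for GPU workloads normally turn a GPU count from values.yaml into a nvidia.com/gpu resource limit on the container, so the rendered Deployment should contain something like the sketch below (standard Kubernetes fields, not copied from the actual chart templates, so worth verifying with helm template):

resources:
  limits:
    nvidia.com/gpu: 2

If the rendered pod spec ends up requesting only one GPU, the NVIDIA device plugin will expose only one device inside the container, which would match the single-GPU nvidia-smi output shown further below.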

Model Configuration (config.pbtxt):

The following configurations were tried:

instance_group [
  {
    count: 2
    kind: KIND_GPU
    gpus: [0, 1]
  }
]

instance_group [
  {
    count: 2
    kind: KIND_GPU
  }
]
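
As a variant (a sketch, not a configuration tried in this report), an explicit one-instance-per-GPU placement can be expressed with one group per device:

instance_group [
  {
    count: 1
    kind: KIND_GPU
    gpus: [ 0 ]
  },
  {
    count: 1
    kind: KIND_GPU
    gpus: [ 1 ]
  }
]

Any form that references GPU id 1 would be expected to fail the same way here, since the error above is raised because Triton does not see a device with id 1 at all.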

Execute kubectl exec -it <pod-name> -- nvidia-smi inside the Triton pod:
    Only GPU 0 is listed.
    GPU 1 does not appear and is never utilized.
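
A few additional checks inside the pod can narrow this down (standard kubectl/nvidia-smi usage; <pod-name> is a placeholder for the Triton pod):

kubectl exec -it <pod-name> -- nvidia-smi -L
kubectl exec -it <pod-name> -- nvidia-smi --query-gpu=index,name,compute_cap --format=csv
kubectl exec -it <pod-name> -- env | grep -i nvidia

The first two show which devices and compute capabilities the container actually sees (the compute_cap query needs a reasonably recent driver, which 560.35.03 is); the last shows NVIDIA_VISIBLE_DEVICES, which the NVIDIA device plugin normally sets to exactly the GPUs allocated to the pod.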

Expected behavior
Both GPUs (RTX 3090 and RTX 3080 Ti) should be visible to Triton.
Triton should utilize GPU 1 for model instances when specified in the instance_group configuration.
No errors about unsupported GPU IDs should occur, since both GPUs meet the minimum CUDA compute capability requirement (8.6).

Additional Information

Host NVIDIA-SMI Output:

+-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
|-----------------------------------------+------------------------+----------------------+
| 0 NVIDIA GeForce RTX 3090 Off | 00000000:01:00.0 Off | N/A |
| 0% 49C P8 38W / 370W | 24046MiB / 24576MiB | 0% Default |
+-----------------------------------------+------------------------+----------------------+
| 1 NVIDIA GeForce RTX 3080 Ti Off | 00000000:04:00.0 Off | N/A |
| 0% 48C P8 30W / 370W | 15MiB / 12288MiB | 0% Default |
+-----------------------------------------+------------------------+----------------------+

Kubernetes Pod NVIDIA-SMI Output:

+-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
|-----------------------------------------+------------------------+----------------------+
| 0 NVIDIA GeForce RTX 3090 Off | 00000000:01:00.0 Off | N/A |
| 0% 49C P8 38W / 370W | 24046MiB / 24576MiB | 0% Default |
+-----------------------------------------+------------------------+----------------------+

Even so, the server logs show:

2024-11-20 11:20:08.618231: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1636] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 9805 MB memory: -> device: 0, name: NVIDIA GeForce RTX 3080 Ti, pci bus id: 0000:04:00.0, compute capability: 8.6
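
One more thing worth checking (not shown in this report) is how many GPUs the pod actually requested, since the device plugin only exposes the devices allocated to it; <pod-name> is again a placeholder:

kubectl get pod <pod-name> -o jsonpath='{.spec.containers[0].resources.limits}'

If the limits contain only a single nvidia.com/gpu (or none at all), the container is handed one device, which would explain both the single-GPU nvidia-smi output above and Triton rejecting GPU id 1.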
