
Add intrinsic GPU config env-vars for AI runtimes #3253

Closed
achimnol opened this issue Dec 16, 2024 · 0 comments
achimnol commented Dec 16, 2024

Synopsis

Let's add the following predefined environment variables in the session containers.
These are becoming more important as we automate the deployment of inference models on top of fractionally scaled GPU resources, where the number of physical GPU devices visible inside a container may vary arbitrarily depending on the fraction configuration.

Prerequisite: #2909

New predefined environment variables

  • GPU_TYPE: The device plugin name. For instance, "cuda".
  • GPU_COUNT: The number of accelerator devices visible inside the container. With fractional scaling, this may differ arbitrarily from the number of physical GPUs on the host, depending on the fraction ratio configuration.
    • N_GPUS: An alias analogous to N_CPUS.
  • GPU_CONFIG: A comma-separated list of colon-paired device-index-to-memory-size mappings (e.g., "0:16g,1:16g").
    • TF_GPU_MEMORY_ALLOC: Same as GPU_CONFIG, but with the sizes represented in MiB without the size suffix (e.g., "0:16384,1:16384").
  • GPU_MODEL_NAME: A special env-var that exposes the actual GPU model name, if the plugin is configured to expose it. Runtimes may use it to choose appropriate optimization strategies and algorithms.
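For reference, a consumer inside the container could parse the proposed GPU_CONFIG format like this. This is a hypothetical sketch, not shipped code; the issue only shows the "g" (GiB) suffix, so the other suffixes handled here are assumptions:

```python
# Hypothetical consumer-side sketch: parse the proposed GPU_CONFIG value
# ("0:16g,1:16g") into a device-index -> bytes mapping.
# Suffixes beyond "g" are assumptions extrapolated from common conventions.
_SUFFIXES = {"k": 1024, "m": 1024**2, "g": 1024**3, "t": 1024**4}

def parse_gpu_config(value: str) -> dict[int, int]:
    """Parse "0:16g,1:16g" into {0: 17179869184, 1: 17179869184}."""
    result: dict[int, int] = {}
    for item in value.split(","):
        idx, _, size = item.strip().partition(":")
        size = size.lower()
        if size and size[-1] in _SUFFIXES:
            num_bytes = int(float(size[:-1]) * _SUFFIXES[size[-1]])
        else:
            num_bytes = int(size)  # assume plain bytes when no suffix
        result[int(idx)] = num_bytes
    return result
```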

Implementation detail

Backend.AI's system design allows attaching GPUs from multiple different vendors to a single session (though in practice this is uncommon), while the environment variables above can only describe a single GPU type.

For now, I am going to use the first-seen compute device plugin (excluding intrinsic ones).
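A rough agent-side sketch of that selection rule: pick the first non-intrinsic compute device plugin and derive all the env-vars from its allocated devices. The plugin keys and device attribute names (device_id, memory_size, model_name) here are hypothetical placeholders, not Backend.AI's actual internal API:

```python
# Sketch only: "first-seen compute device plugin, excluding intrinsic ones".
# INTRINSIC_PLUGINS and the Device fields are assumptions for illustration.
from dataclasses import dataclass

INTRINSIC_PLUGINS = {"cpu", "mem"}  # assumed intrinsic plugin keys

@dataclass
class Device:
    device_id: int
    memory_size: int  # bytes
    model_name: str

def build_gpu_envs(plugins: dict[str, list[Device]]) -> dict[str, str]:
    # dicts preserve insertion order, so iteration yields plugins first-seen.
    for key, devices in plugins.items():
        if key in INTRINSIC_PLUGINS or not devices:
            continue
        mib = {d.device_id: d.memory_size // (1024**2) for d in devices}
        return {
            "GPU_TYPE": key,
            "GPU_COUNT": str(len(devices)),
            "N_GPUS": str(len(devices)),
            # emit sizes in MiB form; "0:16g" GiB form is equally valid
            "GPU_CONFIG": ",".join(f"{i}:{m}m" for i, m in mib.items()),
            "TF_GPU_MEMORY_ALLOC": ",".join(f"{i}:{m}" for i, m in mib.items()),
            "GPU_MODEL_NAME": devices[0].model_name,
        }
    return {}
```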

Application Relevance

vLLM

https://docs.vllm.ai/en/stable/models/engine_args.html

  • --device {auto,cuda,neuron,cpu,openvino,tpu,xpu,hpu} → determined from $GPU_TYPE
  • --tensor-parallel-size 1 → determined from $GPU_COUNT
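Illustratively, a launcher wrapper could translate the env-vars into vLLM engine arguments like this. Passing GPU_TYPE straight through as the --device value is an assumption that holds for "cuda" but may need a mapping table for other plugin names:

```python
# Hypothetical launcher sketch: derive vLLM CLI arguments from the
# proposed env-vars. Not part of vLLM or Backend.AI itself.
def vllm_args_from_env(env: dict[str, str]) -> list[str]:
    args: list[str] = []
    if "GPU_TYPE" in env:
        # assumption: plugin names match vLLM --device choices (true for "cuda")
        args += ["--device", env["GPU_TYPE"]]
    if "GPU_COUNT" in env:
        args += ["--tensor-parallel-size", env["GPU_COUNT"]]
    return args
```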

TensorFlow

https://rasa.com/docs/rasa/tuning-your-model/#restricting-absolute-gpu-memory-available

  • TF_GPU_MEMORY_ALLOC → auto-set based on $GPU_CONFIG
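The GPU_CONFIG-to-TF_GPU_MEMORY_ALLOC derivation is just a unit conversion: same index:size pairs, sizes in MiB with no suffix. A sketch, handling only the "g" and "m" suffixes (other suffixes are out of scope here):

```python
# Sketch: "0:16g,1:16g" -> "0:16384,1:16384" (MiB, no suffix), as described
# in the env-var list above. Suffix coverage beyond g/m is an assumption.
def to_tf_gpu_memory_alloc(gpu_config: str) -> str:
    factors = {"g": 1024, "m": 1}  # suffix -> MiB multiplier
    pairs = []
    for item in gpu_config.split(","):
        idx, _, size = item.strip().partition(":")
        mib = int(float(size[:-1]) * factors[size[-1].lower()])
        pairs.append(f"{idx}:{mib}")
    return ",".join(pairs)
```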
@achimnol achimnol added comp:agent Related to Agent component urgency:blocker IT SHOULD BE RESOLVED BEFORE NEXT RELEASE! labels Dec 16, 2024
@achimnol achimnol added this to the 24.09 milestone Dec 16, 2024
@achimnol achimnol self-assigned this Dec 16, 2024