Synopsis
Let's add the following predefined environment variables to session containers.
These become more important as we automate the deployment of inference models as much as possible on top of fractionally scaled GPU resources, where the number of GPU devices visible inside a container may be arbitrarily different from the number of physical GPUs.
Prerequisite: #2909
New predefined environment variables
GPU_TYPE: The device plugin name. For instance, "cuda".
GPU_COUNT: The number of accelerator devices visible inside the container. For instance, fractional GPUs may expose a completely different number of GPU devices in the container depending on the fraction ratio configuration.
N_GPUS: Analogous to N_CPUS.
GPU_CONFIG: A comma-separated list of device-index-to-memory-size mappings, with each pair joined by a colon (e.g., "0:16g,1:16g"); see the parsing sketch after this list.
TF_GPU_MEMORY_ALLOC: Same as GPU_CONFIG but with the sizes represented in MiB without the size suffix (e.g., "0:16384,1:16384").
GPU_MODEL_NAME: A special env-var to expose the actual GPU model name, if the plugin is configured to expose it. Runtimes may use this to choose appropriate optimization strategies and algorithms.
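For concreteness, here is a minimal sketch of how a runtime inside the container could consume these variables. This is hypothetical consumer code, not part of Backend.AI; the helper name and the size-suffix table are assumptions.

```python
import os

# Hypothetical helper: parse GPU_CONFIG ("0:16g,1:16g") into a
# {device_index: bytes} mapping, assuming the usual binary size
# suffixes (k/m/g/t).
_SUFFIXES = {"k": 2**10, "m": 2**20, "g": 2**30, "t": 2**40}

def parse_gpu_config(value: str) -> dict[int, int]:
    result: dict[int, int] = {}
    if not value:
        return result
    for pair in value.split(","):
        idx, _, size = pair.partition(":")
        if size[-1].lower() in _SUFFIXES:
            nbytes = int(size[:-1]) * _SUFFIXES[size[-1].lower()]
        else:
            nbytes = int(size)  # no suffix: assume plain bytes
        result[int(idx)] = nbytes
    return result

gpu_type = os.environ.get("GPU_TYPE")              # e.g., "cuda"
gpu_count = int(os.environ.get("GPU_COUNT", "0"))  # e.g., 2
gpu_config = parse_gpu_config(os.environ.get("GPU_CONFIG", ""))
# e.g., {0: 17179869184, 1: 17179869184} for "0:16g,1:16g"
```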
Implementation detail
Backend.AI's system design allows attaching multiple GPUs from different vendors to a single session (though this is uncommon in practice), but the environment variables above can only describe a single GPU type.
For now, I am going to use the first-seen compute device plugin (excluding the intrinsic ones).
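A rough sketch of that selection logic, with illustrative names (the actual agent structures differ):

```python
# Illustrative only: pick the first-seen compute device plugin while
# skipping the intrinsic device types (CPU and memory), and use its
# key as the value of GPU_TYPE.
INTRINSIC_DEVICES = {"cpu", "mem"}

def pick_gpu_plugin_key(compute_plugins: dict[str, object]) -> str | None:
    for key in compute_plugins:  # dicts preserve insertion (registration) order
        if key not in INTRINSIC_DEVICES:
            return key           # e.g., "cuda"
    return None                  # no accelerator plugin loaded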
Application Relevance
vLLM
https://docs.vllm.ai/en/stable/models/engine_args.html
--device {auto,cuda,neuron,cpu,openvino,tpu,xpu,hpu} → determined from $GPU_TYPE
--tensor-parallel-size 1 → determined from $GPU_COUNT
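For example, a launcher wrapper could map the variables onto vLLM's CLI flags roughly like this. This is a sketch: the model name is a placeholder, and the service definition would decide the actual entrypoint and arguments.

```python
import os
import subprocess

# Sketch: translate the proposed env vars into vLLM engine arguments.
device = os.environ.get("GPU_TYPE", "auto")      # e.g., "cuda"
tp_size = int(os.environ.get("GPU_COUNT", "1"))  # e.g., 2

subprocess.run(
    [
        "vllm", "serve", "meta-llama/Llama-3.1-8B-Instruct",  # placeholder model
        "--device", device,
        "--tensor-parallel-size", str(tp_size),
    ],
    check=True,
)
```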
TensorFlow
https://rasa.com/docs/rasa/tuning-your-model/#restricting-absolute-gpu-memory-available
TF_GPU_MEMORY_ALLOC → auto-set based on $GPU_CONFIG
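The auto-setting can be a straightforward suffix-to-MiB rewrite of each pair. A minimal sketch, assuming only m/g/t suffixes appear in GPU_CONFIG:

```python
# Sketch: derive TF_GPU_MEMORY_ALLOC ("0:16384,1:16384") from
# GPU_CONFIG ("0:16g,1:16g") by rewriting each size in MiB.
_MIB_PER_SUFFIX = {"m": 1, "g": 1024, "t": 1024 * 1024}

def to_tf_gpu_memory_alloc(gpu_config: str) -> str:
    parts = []
    for pair in gpu_config.split(","):
        idx, _, size = pair.partition(":")
        mib = int(size[:-1]) * _MIB_PER_SUFFIX[size[-1].lower()]
        parts.append(f"{idx}:{mib}")
    return ",".join(parts)

assert to_tf_gpu_memory_alloc("0:16g,1:16g") == "0:16384,1:16384"
```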