
Add intrinsic GPU config env-vars for AI runtimes #3253

Closed
achimnol opened this issue Dec 16, 2024 · 0 comments
achimnol commented Dec 16, 2024

Synopsis

Let's add the following predefined environment variables in the session containers.
These are becoming more important as we automate the deployment of inference models on top of fractionally scaled GPU resources, where the number of physical GPU devices visible inside a container may vary arbitrarily depending on the fraction configuration.

Prerequisite: #2909

New predefined environment variables

  • GPU_TYPE: The device plugin name. For instance, "cuda".
  • GPU_COUNT: The number of accelerator devices visible inside the container. With fractional scaling, this may differ arbitrarily from the number of physical GPUs on the host, depending on the fraction ratio configuration.
    • N_GPUS: An alias analogous to N_CPUS.
  • GPU_CONFIG: A comma-separated list of colon-paired device-index-to-memory-size mappings (e.g., "0:16g,1:16g").
    • TF_GPU_MEMORY_ALLOC: Same as GPU_CONFIG, but with the sizes represented in MiB without the size suffix (e.g., "0:16384,1:16384").
  • GPU_MODEL_NAME: A special env-var that exposes the actual GPU model name, if the plugin is configured to expose it. Runtimes may use it to choose appropriate optimization strategies and algorithms.
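For reference, a consumer inside the container could parse the proposed GPU_CONFIG format like this. This is a hypothetical sketch, not shipped code; the issue only shows the "g" (GiB) suffix, so the other suffixes handled here are assumptions:

```python
# Hypothetical consumer-side sketch: parse the proposed GPU_CONFIG value
# ("0:16g,1:16g") into a device-index -> bytes mapping.
# Suffixes beyond "g" are assumptions extrapolated from common conventions.
_SUFFIXES = {"k": 1024, "m": 1024**2, "g": 1024**3, "t": 1024**4}

def parse_gpu_config(value: str) -> dict[int, int]:
    """Parse "0:16g,1:16g" into {0: 17179869184, 1: 17179869184}."""
    result: dict[int, int] = {}
    for item in value.split(","):
        idx, _, size = item.strip().partition(":")
        size = size.lower()
        if size and size[-1] in _SUFFIXES:
            num_bytes = int(float(size[:-1]) * _SUFFIXES[size[-1]])
        else:
            num_bytes = int(size)  # assume plain bytes when no suffix
        result[int(idx)] = num_bytes
    return result
```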

Implementation detail

Backend.AI's system design allows attaching GPUs from multiple different vendors to a single session (though in practice this is uncommon), while the environment variables above can only describe a single GPU type.

For now, I am going to use the first-seen compute device plugin (excluding intrinsic ones).
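A rough agent-side sketch of that selection rule: pick the first non-intrinsic compute device plugin and derive all the env-vars from its allocated devices. The plugin keys and device attribute names (device_id, memory_size, model_name) here are hypothetical placeholders, not Backend.AI's actual internal API:

```python
# Sketch only: "first-seen compute device plugin, excluding intrinsic ones".
# INTRINSIC_PLUGINS and the Device fields are assumptions for illustration.
from dataclasses import dataclass

INTRINSIC_PLUGINS = {"cpu", "mem"}  # assumed intrinsic plugin keys

@dataclass
class Device:
    device_id: int
    memory_size: int  # bytes
    model_name: str

def build_gpu_envs(plugins: dict[str, list[Device]]) -> dict[str, str]:
    # dicts preserve insertion order, so iteration yields plugins first-seen.
    for key, devices in plugins.items():
        if key in INTRINSIC_PLUGINS or not devices:
            continue
        mib = {d.device_id: d.memory_size // (1024**2) for d in devices}
        return {
            "GPU_TYPE": key,
            "GPU_COUNT": str(len(devices)),
            "N_GPUS": str(len(devices)),
            # emit sizes in MiB form; "0:16g" GiB form is equally valid
            "GPU_CONFIG": ",".join(f"{i}:{m}m" for i, m in mib.items()),
            "TF_GPU_MEMORY_ALLOC": ",".join(f"{i}:{m}" for i, m in mib.items()),
            "GPU_MODEL_NAME": devices[0].model_name,
        }
    return {}
```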

Application Relevance

vLLM

https://docs.vllm.ai/en/stable/models/engine_args.html

  • --device {auto,cuda,neuron,cpu,openvino,tpu,xpu,hpu} → determined from $GPU_TYPE
  • --tensor-parallel-size 1 → determined from $GPU_COUNT
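Illustratively, a launcher wrapper could translate the env-vars into vLLM engine arguments like this. Passing GPU_TYPE straight through as the --device value is an assumption that holds for "cuda" but may need a mapping table for other plugin names:

```python
# Hypothetical launcher sketch: derive vLLM CLI arguments from the
# proposed env-vars. Not part of vLLM or Backend.AI itself.
def vllm_args_from_env(env: dict[str, str]) -> list[str]:
    args: list[str] = []
    if "GPU_TYPE" in env:
        # assumption: plugin names match vLLM --device choices (true for "cuda")
        args += ["--device", env["GPU_TYPE"]]
    if "GPU_COUNT" in env:
        args += ["--tensor-parallel-size", env["GPU_COUNT"]]
    return args
```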

TensorFlow

https://rasa.com/docs/rasa/tuning-your-model/#restricting-absolute-gpu-memory-available

  • TF_GPU_MEMORY_ALLOC → auto-set based on $GPU_CONFIG
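The GPU_CONFIG-to-TF_GPU_MEMORY_ALLOC derivation is just a unit conversion: same index:size pairs, sizes in MiB with no suffix. A sketch, handling only the "g" and "m" suffixes (other suffixes are out of scope here):

```python
# Sketch: "0:16g,1:16g" -> "0:16384,1:16384" (MiB, no suffix), as described
# in the env-var list above. Suffix coverage beyond g/m is an assumption.
def to_tf_gpu_memory_alloc(gpu_config: str) -> str:
    factors = {"g": 1024, "m": 1}  # suffix -> MiB multiplier
    pairs = []
    for item in gpu_config.split(","):
        idx, _, size = item.strip().partition(":")
        mib = int(float(size[:-1]) * factors[size[-1].lower()])
        pairs.append(f"{idx}:{mib}")
    return ",".join(pairs)
```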
@achimnol achimnol added comp:agent Related to Agent component urgency:blocker IT SHOULD BE RESOLVED BEFORE NEXT RELEASE! labels Dec 16, 2024
@achimnol achimnol added this to the 24.09 milestone Dec 16, 2024
@achimnol achimnol self-assigned this Dec 16, 2024