[Misc] Define EP kernel arch list in Dockerfile #25635
Code Review
This pull request defines the TORCH_CUDA_ARCH_LIST in the vllm-base stage of the Dockerfile, ensuring that EP kernel installation defaults to supporting Hopper and Blackwell architectures. It also updates a fallback value for the architecture list. The changes are logical and improve the build process's configurability and defaults. However, there's a redundancy that can be cleaned up.
RUN export TORCH_CUDA_ARCH_LIST="${TORCH_CUDA_ARCH_LIST:-9.0a 10.0a+PTX}" \
    && bash install_python_libraries.sh
Since TORCH_CUDA_ARCH_LIST is now set via an ENV instruction earlier in this build stage (line 288), this export command with a fallback is redundant. The environment variable will already be available to the install_python_libraries.sh script. You can simplify this RUN command by removing the export part.
RUN bash install_python_libraries.sh
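The redundancy noted above comes down to how the shell's `${VAR:-default}` expansion behaves: the fallback is used only when the variable is unset or empty. A minimal sketch, assuming a POSIX shell; the function name `resolve_arch_list` and the value `9.0a` used in the second case are hypothetical and chosen only for illustration, not taken from the Dockerfile:

```shell
#!/bin/sh
# Hypothetical helper: prints the arch list a build step would see,
# using the same fallback expression as the RUN command under review.
resolve_arch_list() {
    printf '%s\n' "${TORCH_CUDA_ARCH_LIST:-9.0a 10.0a+PTX}"
}

# Case 1: the variable is unset, so the fallback value applies.
unset TORCH_CUDA_ARCH_LIST
unset_result=$(resolve_arch_list)

# Case 2: the variable is already set, as an earlier ENV instruction
# would do. The fallback is ignored. "9.0a" is an illustrative value.
TORCH_CUDA_ARCH_LIST="9.0a"
export TORCH_CUDA_ARCH_LIST
set_result=$(resolve_arch_list)

echo "unset: $unset_result"
echo "set:   $set_result"
```

Once the build stage always sets the variable to a non-empty value, only the second case can occur, which is why the fallback in the RUN command no longer has any effect.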
Signed-off-by: Simon Mo <simon.mo@hey.com>
Signed-off-by: xuebwang-amd <xuebwang@amd.com>
Found by @rizar that our container's DeepEP doesn't work on Blackwell, TY!
Summary
Define TORCH_CUDA_ARCH_LIST in the vllm-base stage so EP kernel installation defaults to Hopper and Blackwell support
Testing
https://chatgpt.com/codex/tasks/task_e_68d4c5364338832997fa34fc45f06432