
Conversation

@0xjunhao (Contributor) commented May 13, 2025

Update the Dockerfile to include Blackwell archs and use Ubuntu 24.04 as the base image.

I have tested that this works on an RTX 5090.
For reference, a test build is available at ubicloud/vllm-openai:latest.
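
For reviewers, here is a minimal sketch of the kind of build this enables, assuming vLLM's `torch_cuda_arch_list` build arg and `vllm-openai` target (arch values and tag are illustrative, not the exact diff):

```bash
# Sketch only: extend the CUDA arch list so nvcc also emits kernels for
# Blackwell, i.e. compute capability 10.0 (B100/B200) and 12.0
# (RTX 50-series such as the 5090), alongside the existing targets.
DOCKER_BUILDKIT=1 docker build . \
  --target vllm-openai \
  --tag ubicloud/vllm-openai:latest \
  --build-arg torch_cuda_arch_list='7.0 7.5 8.0 8.6 8.9 9.0 10.0 12.0+PTX'
```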

@github-actions (bot) commented

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, covering a small, essential subset of tests to catch errors quickly. You can run additional CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run full CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

@simon-mo (Collaborator) previously approved these changes May 13, 2025


LGTM! Let's see if it builds, and I'll poll contributors on the Ubuntu 24.04 change.

@0xjunhao force-pushed the archs branch 2 times, most recently from 1bec9dc to ecbded0 on May 13, 2025 at 18:55
@mergify bot added the documentation label on May 13, 2025
@simon-mo dismissed their stale review on May 13, 2025 at 19:07

Seems like the Ubuntu 24.04 upgrade might break people's workflows.

@simon-mo (Collaborator) commented

Is it possible to do this without updating the base image?

@tlrmchlsmth (Member) left a comment

We need to stick to building with older OSes, unfortunately. The reason is that glibc is backwards-compatible but not forwards-compatible: binaries built against an older glibc run on newer systems, but binaries built against a newer glibc will not run on older ones. If we upgrade to Ubuntu 24.04, then vLLM won't work on 22.04, for instance.
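
For concreteness, here is a small sketch of how to check this on a given machine (the `.so` path is illustrative):

```bash
# glibc version on the host; a binary loads only if every GLIBC_x.y
# symbol version it references is at or below this.
ldd --version | head -n1

# List the glibc symbol versions a compiled vLLM extension requires.
# Building on Ubuntu 24.04 (glibc 2.39) pulls in newer symbol versions,
# which then fail to resolve on a 22.04 host (glibc 2.35).
objdump -T /path/to/vllm/_C.abi3.so | grep -o 'GLIBC_[0-9.]*' | sort -Vu
```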

Can the CUDA arch list be extended without upgrading Ubuntu?

@0xjunhao (Contributor, Author) commented

I think so. Fortunately, CUDA 12.8 still supports 20.04. There are two base images there: base was on 20.04 and vllm-base was on 22.04. Should I revert both?
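
A quick sanity check that both bases are still published for CUDA 12.8 (tags illustrative):

```bash
# Sketch only: confirm NVIDIA still publishes CUDA 12.8 images for both
# Ubuntu versions the Dockerfile already uses, so only the CUDA version
# needs to change rather than the OS.
docker pull nvidia/cuda:12.8.0-devel-ubuntu20.04   # "base" build stage
docker pull nvidia/cuda:12.8.0-base-ubuntu22.04    # "vllm-base" runtime stage
```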

@0xjunhao (Contributor, Author) commented

FYI, Ubuntu 20.04 will reach its EOL on May 31, 2025, which is only two weeks away.

@tlrmchlsmth (Member) commented

> FYI, Ubuntu 20.04 will reach its EOL on May 31, 2025, which is only two weeks away.

I think we should consider the OS upgrade separately. Upgrading to Ubuntu 22.04 will put us on glibc 2.35.

This will break vLLM on any system running a glibc older than 2.35, for instance Ubuntu 20.04 (glibc 2.31).

@0xjunhao requested a review from tlrmchlsmth on May 13, 2025 at 20:20
@0xjunhao changed the title from "[CI/Build] Update the Dockerfile to include Blackwell archs and use ubuntu 24.04 as base image" to "[CI/Build] Update the Dockerfile to include Blackwell archs" on May 13, 2025
@0xjunhao force-pushed the archs branch 2 times, most recently from 075ba43 to 812fda5 on May 13, 2025 at 23:09
Signed-off-by: Junhao Li <junhao@ubicloud.com>
@0xjunhao (Contributor, Author) commented

It seems that with the new archs, the image build check is timing out during the FlashInfer stage. Is there a way to increase the timeout limit?

[Screenshot, May 14, 2025: Buildkite image-build check timing out during the FlashInfer stage]

@alew3 commented May 16, 2025

@0xjunhao I tried running the model OpenGVLab/InternVL3-1B-Instruct on an RTX 5090 with the Docker image ubicloud/vllm-openai:latest and got this error:

CUDA error (/__w/xformers/xformers/third_party/flash-attention/hopper/flash_fwd_launch_template.h:175): no kernel image is available for execution on the device

@cchadowitz commented

> @0xjunhao I tried running the model OpenGVLab/InternVL3-1B-Instruct on an RTX 5090 with the Docker image ubicloud/vllm-openai:latest and got this error:
>
> CUDA error (/__w/xformers/xformers/third_party/flash-attention/hopper/flash_fwd_launch_template.h:175): no kernel image is available for execution on the device

I believe I could work around this by building the latest xformers from source.
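
For anyone else hitting this, a sketch of that workaround inside the container (the arch value is illustrative; `TORCH_CUDA_ARCH_LIST` is honored by the xformers build):

```bash
# Sketch only: rebuild xformers from source so its bundled
# flash-attention kernels are compiled for sm_120 (RTX 5090).
export TORCH_CUDA_ARCH_LIST="12.0"
pip install ninja          # optional, speeds up the compile
pip install -v -U --no-deps \
  "git+https://github.com/facebookresearch/xformers.git@main#egg=xformers"
```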

@0xjunhao (Contributor, Author) commented

Discussed with Simon, closing this PR. Please refer to PR #18095.

@0xjunhao closed this on May 16, 2025

Labels

ci/build, documentation

7 participants