[CI] Add E2E Blackwell Quantized MoE Test #25723
Conversation
💡 Codex Review
Here are some automated review suggestions for this pull request.
Code Review
This pull request introduces end-to-end tests for quantized Mixture-of-Experts (MoE) models on Blackwell GPUs, which is a valuable addition for ensuring hardware-specific features work correctly.
My review identified two main issues:
- A critical issue in the CI configuration (.buildkite/test-pipeline.yaml), where the new test job has an incomplete and partially incorrect list of file dependencies. This could prevent the test from running when relevant code, including the test file itself, is modified.
- A high-severity issue in the new test file (tests/quantization/test_blackwell_moe.py), where the GPU capability check should be more flexible to allow running on future GPU architectures.
I have provided specific suggestions to address these points. Overall, the changes are a good step towards validating vLLM on new hardware, and with these fixes, the CI setup and tests will be more robust.
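On the capability-check point, a minimal sketch of the suggested relaxation is shown below. It assumes the test gates on torch.cuda.get_device_capability(); the helper names here are placeholders and the real code in tests/quantization/test_blackwell_moe.py may use a vLLM platform utility instead. The idea is simply to compare with >= so that architectures newer than Blackwell still run the test.

```python
# Hypothetical, illustrative gate only -- not the exact code in the test file.
import pytest
import torch


def is_blackwell_or_newer() -> bool:
    """True on compute capability 10.x (Blackwell) or any newer architecture."""
    if not torch.cuda.is_available():
        return False
    major, _minor = torch.cuda.get_device_capability()
    return major >= 10


# Marker that skips the test (rather than failing) on older GPUs.
requires_blackwell = pytest.mark.skipif(
    not is_blackwell_or_newer(),
    reason="Requires compute capability >= 10.0 (Blackwell or newer)",
)
```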
Increased the max wait time for the server to 600 seconds due to FlashInfer compilation.
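Conceptually, the wait is a readiness poll against the server's /health endpoint with a deadline; the snippet below only illustrates that loop with the new 600-second budget, assuming the requests library, and is not the actual helper the test suite uses.

```python
# Illustrative readiness poll; the real test helper may be implemented differently.
import time

import requests


def wait_for_server(base_url: str, timeout_s: float = 600.0) -> None:
    """Poll the /health endpoint until the server answers or the deadline passes."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            if requests.get(f"{base_url}/health", timeout=5).status_code == 200:
                return  # server is up
        except requests.RequestException:
            pass  # not listening yet; keep polling
        time.sleep(5)
    raise TimeoutError(f"Server at {base_url} not ready after {timeout_s} seconds")
```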
Thanks for the work!
LGTM, thanks for the work!
Purpose
Adds a new "Blackwell Quantized MoE Test" job that is solely meant to run critical MoE models (Llama 4, Qwen, DeepSeek, GPT-OSS) that we have many ways of running on Blackwell, through various quantization backends.
It loads each model with dummy weights and only a few layers to make sure we can pass process_weights_after_loading, torch.compile, and cudagraph capture, and then serve a request.
Test Plan
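For illustration, a rough sketch of the kind of smoke test this enables is shown below. It assumes the vllm serve CLI with --load-format dummy and --hf-overrides to truncate the model to a couple of layers; the model name, port, and layer count are placeholders, and the actual test drives its model list (Llama 4, Qwen, DeepSeek, GPT-OSS variants) through its own fixtures.

```python
# Hypothetical sketch only: start a truncated, dummy-weight vLLM server and send
# one request. Model name, port, and layer count are placeholders.
import json
import subprocess
import time

import requests

MODEL = "Qwen/Qwen3-30B-A3B-FP8"  # placeholder quantized MoE checkpoint
PORT = 8000

server = subprocess.Popen([
    "vllm", "serve", MODEL,
    "--port", str(PORT),
    "--load-format", "dummy",  # skip loading real weights
    "--hf-overrides", json.dumps({"num_hidden_layers": 2}),  # keep only a few layers
    "--max-model-len", "1024",
])
try:
    # Startup exercises process_weights_after_loading, torch.compile, and cudagraph
    # capture; poll /health until the server is ready (up to ~600 s).
    for _ in range(120):
        try:
            if requests.get(f"http://localhost:{PORT}/health", timeout=5).ok:
                break
        except requests.RequestException:
            pass
        time.sleep(5)
    # Finally, make sure we can actually serve a request.
    resp = requests.post(
        f"http://localhost:{PORT}/v1/completions",
        json={"model": MODEL, "prompt": "Hello", "max_tokens": 8},
        timeout=60,
    )
    resp.raise_for_status()
finally:
    server.terminate()
```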
Test Result
Essential Elements of an Effective PR Description Checklist
supported_models.md and examples for a new model.