Conversation

@mgoin mgoin commented Sep 23, 2025

Purpose

Since #25444 made compiling full cudagraphs the default, gpt-oss crashes on Hopper:

vllm serve openai/gpt-oss-20b
...
(EngineCore_DP0 pid=2061997)   File "/home/mgoin/code/vllm/vllm/v1/worker/gpu_model_runner.py", line 3503, in create_attn_groups
(EngineCore_DP0 pid=2061997)     attn_metadata_builders.append(attn_backend.get_builder_cls()(
(EngineCore_DP0 pid=2061997)                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=2061997)   File "/home/mgoin/code/vllm/vllm/v1/attention/backends/flash_attn.py", line 204, in __init__
(EngineCore_DP0 pid=2061997)     raise ValueError(
(EngineCore_DP0 pid=2061997) ValueError: Capture size larger than 992 is not supported for full cuda graph.

Test Plan

Test Result

vllm serve openai/gpt-oss-20b crashes on main but succeeds with this PR
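The crash above comes from requesting a CUDA graph capture size beyond what FlashAttention v3 supports (992) when full cudagraphs are enabled. A minimal sketch of the kind of fix involved is below; the function and constant names are illustrative, not vLLM's actual code:

```python
# Hypothetical sketch: instead of letting an unsupported capture size reach
# the attention backend and raise, clamp the candidate capture sizes to the
# FA3 limit up front. FA3_MAX_CAPTURE_SIZE mirrors the 992 from the error.
FA3_MAX_CAPTURE_SIZE = 992

def clamp_capture_sizes(capture_sizes: list[int],
                        max_size: int = FA3_MAX_CAPTURE_SIZE) -> list[int]:
    """Drop CUDA graph capture sizes above the backend's supported maximum."""
    return [s for s in capture_sizes if s <= max_size]

print(clamp_capture_sizes([256, 512, 992, 1024, 2048]))  # → [256, 512, 992]
```

With sizes above 992 filtered out, graph capture proceeds only for batch sizes the FA3 full-cudagraph path can handle, so startup no longer raises.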



@mgoin mgoin added the bug Something isn't working label Sep 23, 2025
@mergify mergify bot added the gpt-oss Related to GPT-OSS models label Sep 23, 2025
@gemini-code-assist gemini-code-assist bot left a comment
Code Review

This pull request correctly addresses a crash on Hopper GPUs for the gpt-oss model by lowering the maximum CUDA graph capture size to 992, which is the compatibility limit for FlashAttention v3. The change is well-targeted and effective. I have one suggestion to improve the code's maintainability by replacing the hardcoded value 992 with a named constant.
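The maintainability suggestion could look roughly like the following sketch. The constant name and surrounding check are hypothetical, written to mirror the ValueError from the traceback rather than copied from vLLM's source:

```python
# Sketch of the reviewer's suggestion: replace the hardcoded 992 with a
# named module-level constant so the FA3 limit is documented in one place.
# The name FA3_MAX_FULL_CUDAGRAPH_CAPTURE_SIZE is illustrative.
FA3_MAX_FULL_CUDAGRAPH_CAPTURE_SIZE = 992

def validate_capture_size(capture_size: int) -> None:
    """Reject capture sizes FA3 cannot handle with full cuda graphs."""
    if capture_size > FA3_MAX_FULL_CUDAGRAPH_CAPTURE_SIZE:
        raise ValueError(
            f"Capture size larger than {FA3_MAX_FULL_CUDAGRAPH_CAPTURE_SIZE} "
            "is not supported for full cuda graph."
        )
```

Naming the limit also means a future FA version bump only touches one definition instead of every comparison site.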

@vllm-bot vllm-bot merged commit a8ffc4f into vllm-project:main Sep 23, 2025
9 of 14 checks passed
@github-project-automation github-project-automation bot moved this from To Triage to Done in gpt-oss Issues & Enhancements Sep 23, 2025
FeiDaLI pushed a commit to FeiDaLI/vllm that referenced this pull request Sep 25, 2025
yewentao256 pushed a commit that referenced this pull request Oct 3, 2025
gjc0824 pushed a commit to gjc0824/vllm that referenced this pull request Oct 10, 2025
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 10, 2025
choprahetarth pushed a commit to Tandemn-Labs/vllm that referenced this pull request Oct 11, 2025
lywa1998 pushed a commit to lywa1998/vllm that referenced this pull request Oct 20, 2025
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 24, 2025