Skip to content

Conversation

@jiahanc
Copy link
Contributor

@jiahanc jiahanc commented Sep 22, 2025

Purpose

Inherit from #23596

  1. Current gpt-oss eagle3 does not work on https://huggingface.co/nvidia/gpt-oss-120b-Eagle3, mainly config issues like vocab_size and lm_head
  2. The current implementation in spec-dec suppose only 1 attention groups :https://github.com/vllm-project/vllm/blob/main/vllm/v1/worker/gpu_model_runner.py#L1178-L1180.
    And https://github.com/vllm-project/vllm/blob/main/vllm/v1/spec_decode/eagle.py#L194-L199 only use the 1st attention builder.
    This causes accuracy issues because on models that have multiple attentions like gpt-oss, it will select the wrong attention group so draft model will use wrong attention as well as overwrite the KV cache of the target models

Test Plan

lm_eval --model local-completions --tasks gsm8k --model_args model=openai/gpt-oss-120b,base_url=http://0.0.0.0:30000/v1/completions,max_retries=3,tokenized_requests=False,timeout=1200,max_gen_toks=2048,max_length=8192 --batch_size 2048 --trust_remote_code --limit 0.8

Test Result

Before fix
Low AR:

(APIServer pid=21745) INFO 09-22 10:49:59 [metrics.py:96] SpecDecoding metrics: Mean acceptance length: 1.04, Accepted throughput: 77.80 tokens/s, Drafted throughput: 6284.37 tokens/s, Accepted: 778 tokens, Drafted: 62847 tokens, Per-position acceptance rate: 0.035, 0.002, 0.000, Avg Draft acceptance rate: 1.2%

Wrong Accuracy

|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value |   |Stderr|
|-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.0038|±  |0.0019|
|     |       |strict-match    |     5|exact_match|↑  |0.0000|±  |0.0000|

After fix
Correct AR

(APIServer pid=22270) INFO 09-22 10:55:25 [metrics.py:96] SpecDecoding metrics: Mean acceptance length: 2.81, Accepted throughput: 4559.99 tokens/s, Drafted throughput: 7571.32 tokens/s, Accepted: 45604 tokens, Drafted: 75720 tokens, Per-position acceptance rate: 0.755, 0.578, 0.474, Avg Draft acceptance rate: 60.2%

Correct Accuracy

|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value |   |Stderr|
|-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.8362|±  |0.0114|
|     |       |strict-match    |     5|exact_match|↑  |0.5909|±  |0.0151|

Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

@jiahanc jiahanc marked this pull request as ready for review September 22, 2025 16:55
@mergify mergify bot added llama Related to Llama models gpt-oss Related to GPT-OSS models speculative-decoding v1 labels Sep 22, 2025
@benchislett benchislett added the bug Something isn't working label Sep 22, 2025
@jiahanc jiahanc marked this pull request as draft September 22, 2025 17:18
@jiahanc jiahanc marked this pull request as ready for review September 22, 2025 17:24
Copy link
Collaborator

@benchislett benchislett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@github-project-automation github-project-automation bot moved this from To Triage to Ready in gpt-oss Issues & Enhancements Sep 22, 2025
@mgoin mgoin added the ready ONLY add when PR is ready to merge/full CI is needed label Sep 22, 2025
@github-project-automation github-project-automation bot moved this from Ready to In progress in gpt-oss Issues & Enhancements Sep 22, 2025
Copy link
Member

@mgoin mgoin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thank you!

@github-project-automation github-project-automation bot moved this from In progress to Ready in gpt-oss Issues & Enhancements Sep 22, 2025
@mgoin mgoin enabled auto-merge (squash) September 22, 2025 21:07
auto-merge was automatically disabled September 22, 2025 23:08

Head branch was pushed to by a user without write access

@mgoin
Copy link
Member

mgoin commented Sep 23, 2025

@jiahanc The failures in CI seem related

Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>
Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>
Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>
Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>
Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>
@jiahanc jiahanc force-pushed the jiahanc/gpt-oss-eagle3-fix branch from 19d7184 to 677ebdd Compare September 23, 2025 16:26
Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>
Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>
@jiahanc
Copy link
Contributor Author

jiahanc commented Sep 23, 2025

@jiahanc The failures in CI seem related
Fixed and eagle test passes locally

@mgoin mgoin merged commit d5944d5 into vllm-project:main Sep 23, 2025
48 checks passed
FeiDaLI pushed a commit to FeiDaLI/vllm that referenced this pull request Sep 25, 2025
…vllm-project#25406)

Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>
yewentao256 pushed a commit that referenced this pull request Oct 3, 2025
…#25406)

Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
gjc0824 pushed a commit to gjc0824/vllm that referenced this pull request Oct 10, 2025
…vllm-project#25406)

Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>
Signed-off-by: gaojc <1055866782@qq.com>
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 10, 2025
…vllm-project#25406)

Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>
Signed-off-by: xuebwang-amd <xuebwang@amd.com>
choprahetarth pushed a commit to Tandemn-Labs/vllm that referenced this pull request Oct 11, 2025
…vllm-project#25406)

Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>
lywa1998 pushed a commit to lywa1998/vllm that referenced this pull request Oct 20, 2025
…vllm-project#25406)

Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 24, 2025
…vllm-project#25406)

Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>
Signed-off-by: xuebwang-amd <xuebwang@amd.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working gpt-oss Related to GPT-OSS models llama Related to Llama models ready ONLY add when PR is ready to merge/full CI is needed speculative-decoding v1

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

3 participants