[Spec Decode] Fix offline spec_decode.py #23461
Conversation
Code Review
This pull request introduces two fixes to address issues in offline speculative decoding. The first change in examples/offline_inference/spec_decode.py corrects the argument passed to llm.generate for batched prompts, ensuring it receives a list of TokensPrompt objects as expected. The second change in vllm/benchmarks/datasets.py adds a default value for request_id_prefix to prevent AttributeError when calling get_samples. Both changes appear correct and address the underlying bugs. My review did not find any high or critical severity issues with these fixes.
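The two fixes described above can be sketched without pulling in vLLM itself. In this sketch, a plain dict stands in for `vllm.inputs.TokensPrompt` (a mapping with a `prompt_token_ids` key), and `get_samples` is a hypothetical toy version of the real function in `vllm/benchmarks/datasets.py`; names and shapes here are illustrative assumptions, not the actual diff.

```python
from types import SimpleNamespace

def build_batched_prompts(token_id_batches):
    # Fix 1 (sketch): batched prompts are passed to llm.generate as a list of
    # TokensPrompt-style mappings, one per prompt, rather than raw token lists.
    return [{"prompt_token_ids": ids} for ids in token_id_batches]

def get_samples(args, num_requests=3):
    # Fix 2 (sketch): getattr with a default replaces a bare attribute access,
    # so callers that never set request_id_prefix no longer hit AttributeError.
    request_id_prefix = getattr(args, "request_id_prefix", "")
    return [f"{request_id_prefix}{i}" for i in range(num_requests)]

print(build_batched_prompts([[1, 2, 3], [4, 5]]))
# → [{'prompt_token_ids': [1, 2, 3]}, {'prompt_token_ids': [4, 5]}]
print(get_samples(SimpleNamespace()))                            # → ['0', '1', '2']
print(get_samples(SimpleNamespace(request_id_prefix="req-")))    # → ['req-0', 'req-1', 'req-2']
```

The `getattr` default is the minimal way to make an optional attribute safe; the alternative is adding the default where the args object is constructed.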
Should we run
There is a test file
This pull request has merge conflicts that must be resolved before it can be merged.
@ekagra-ranjan Hi Ekagra, thanks for the PR! I just came across this PR and verified that it fixes the bug in the example code. What's the current state of this PR? I think it is a helpful fix.
ywang96 left a comment
Thanks for the fix!
Commits (rebase history; Signed-off-by and Co-authored-by trailers omitted):
- …m-project#22675)
- …ot being called when it should when using quantized FP8 model (vllm-project#22281)
- …Attention Kernel (vllm-project#22703)
- …llm-project#23360)
- …llm-project#22527)
- …ist-to-list conversion (vllm-project#20000)" (vllm-project#23396)
- …oject#23425)
This pull request has merge conflicts that must be resolved before it can be merged.
Superseded by #24257
Purpose
The offline spec_decode.py example was broken by an earlier PR and a more recent one. This PR fixes it.
Test Plan
time VLLM_USE_V1=1 python3 examples/offline_inference/spec_decode.py --method eagle --num_spec_tokens 3 --dataset-name hf --dataset-path philschmid/mt-bench --num-prompts 80 --print-output
Test Result
Output