Skip to content

Conversation

@ekagra-ranjan
Copy link
Contributor

@ekagra-ranjan ekagra-ranjan commented Aug 23, 2025

Purpose

Offline spec_decode.py was broken by this PR and the recent PR. This PR fixes it.

Test Plan

time VLLM_USE_V1=1 python3 examples/offline_inference/spec_decode.py --method eagle --num_spec_tokens 3 --dataset-name hf --dataset-path philschmid/mt-bench --num-prompts 80 --print-output

Test Result

Output

--------------------------------------------------
--------------------------------------------------
total_num_output_tokens: 16956
num_drafts: 7409
num_draft_tokens: 22227
num_accepted_tokens: 9552
mean acceptance length: 2.29
--------------------------------------------------
acceptance at token 0: 0.68
acceptance at token 1: 0.40
acceptance at token 2: 0.21

@mergify mergify bot added documentation Improvements or additions to documentation performance Performance-related issues speculative-decoding labels Aug 23, 2025
Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces two fixes to address issues in offline speculative decoding. The first change in examples/offline_inference/spec_decode.py corrects the argument passed to llm.generate for batched prompts, ensuring it receives a list of TokensPrompt objects as expected. The second change in vllm/benchmarks/datasets.py adds a default value for request_id_prefix to prevent AttributeError when calling get_samples. Both changes appear correct and address the underlying bugs. My review did not find any high or critical severity issues with these fixes.

@ekagra-ranjan
Copy link
Contributor Author

Should we run spec_decode.py with default parameter as an end to end test in vLLM? If so, can someone point me to the right place where this can be added?

@tomasruizt
Copy link
Contributor

There is a test file tests/v1/e2e/test_spec_decode.py. That could be a potential location for and e2e test.

@mergify
Copy link

mergify bot commented Aug 30, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @ekagra-ranjan.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Aug 30, 2025
@wwl2755
Copy link
Contributor

wwl2755 commented Sep 2, 2025

@ekagra-ranjan Hi Ekagra, Thanks for the PR! I just came across this PR and verified it did fix the bug for example code. What's the current state of this PR? I think it is a helpful fix.

@ekagra-ranjan
Copy link
Contributor Author

ekagra-ranjan commented Sep 3, 2025

@wwl2755 - I am waiting for approval. It seems a later PR #23803 partially fixed it since i got merge conflict. I will update my PR now.

Copy link
Member

@ywang96 ywang96 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fix!

@ywang96 ywang96 added the ready ONLY add when PR is ready to merge/full CI is needed label Sep 3, 2025
@ywang96 ywang96 enabled auto-merge (squash) September 3, 2025 23:36
842974287 and others added 14 commits September 4, 2025 15:31
…m-project#22675)

Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com>
Co-authored-by: Simon Mo <simon.mo@hey.com>
Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com>
Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com>
Signed-off-by: ilmarkov <imarkov@redhat.com>
Signed-off-by: ilmarkov <markovilya197@gmail.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: ilmarkov <imarkov@redhat.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com>
…ot being called when it should when using quantized FP8 model (vllm-project#22281)

Signed-off-by: Randall Smith <Randall.Smith@amd.com>
Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com>
…Attention Kernel (vllm-project#22703)

Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com>
Signed-off-by: ycyaw66 <497410282@qq.com>
Co-authored-by: ycyaw66 <497410282@qq.com>
Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com>
…llm-project#23360)

Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com>
…llm-project#22527)

Signed-off-by: feng <fengli1702@gmail.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com>
Co-authored-by: Chenxi Yang <cxyang@meta.com>
Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com>
…ist-to-list conversion (vllm-project#20000)" (vllm-project#23396)

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com>
…oject#23425)

Signed-off-by: AzizCode92 <azizbenothman76@gmail.com>
Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com>
@mergify mergify bot added ci/build deepseek Related to DeepSeek models frontend multi-modality Related to multi-modality (#4194) new-model Requests to new models qwen Related to Qwen models rocm Related to AMD ROCm structured-output v1 labels Sep 4, 2025
@mergify mergify bot added tpu Related to Google TPUs tool-calling labels Sep 4, 2025
@mergify
Copy link

mergify bot commented Sep 4, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @ekagra-ranjan.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@ekagra-ranjan
Copy link
Contributor Author

superseeded by #24257

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ci/build deepseek Related to DeepSeek models documentation Improvements or additions to documentation frontend multi-modality Related to multi-modality (#4194) needs-rebase new-model Requests to new models performance Performance-related issues qwen Related to Qwen models ready ONLY add when PR is ready to merge/full CI is needed rocm Related to AMD ROCm speculative-decoding structured-output tool-calling tpu Related to Google TPUs v1

Projects

Status: Done
Status: Done

Development

Successfully merging this pull request may close these issues.