@ekagra-ranjan
Contributor

@ekagra-ranjan ekagra-ranjan commented Sep 9, 2025

The examples/offline_inference/spec_decode.py script has broken a few times in the past because of changes in datasets or in other parts of the codebase, like this. The script matters because it is how acceptance length (AL) is measured for speculative decoding (SD) methods.

This PR adds the script to CI and ensures two things:

  1. the example script is in working condition
  2. the AL of the default method, i.e., Eagle, is measured during CI as an e2e test, since many SD code paths use Eagle-related components.
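
The second point amounts to a regression check on a measured metric. A minimal sketch of what such a check could look like is below; the function name `check_acceptance_length` and the 5% relative tolerance are hypothetical and are not taken from this PR's diff.

```python
def check_acceptance_length(
    measured: float, expected: float, rel_tol: float = 0.05
) -> None:
    """Fail if the measured AL drifts more than rel_tol from the expected AL.

    NOTE: hypothetical helper for illustration; the name and the default
    tolerance are assumptions, not the values used by the actual test.
    """
    lower = expected * (1 - rel_tol)
    upper = expected * (1 + rel_tol)
    # Put the measured value and the allowed range in the message so that a
    # CI failure is readable without rerunning the script.
    assert lower <= measured <= upper, (
        f"acceptance length {measured:.2f} is outside "
        f"[{lower:.2f}, {upper:.2f}]"
    )


# Passes silently when the measurement matches the reference value.
check_acceptance_length(measured=2.29, expected=2.29)
```

A two-sided bound also flags an unexpectedly high AL, which can indicate that the test configuration changed rather than that the method improved.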

Testing

cmd
time python3 examples/offline_inference/spec_decode.py --test --method eagle --num_spec_tokens 3 --dataset-name hf --dataset-path philschmid/mt-bench --num-prompts 80 --temp 0 --top-p 1.0 --top-k -1 --tp 1 --enable-chunked-prefill

Output

Adding requests: 100%|███████████████████████████████████████████████████████████████████████████████████| 80/80 [00:00<00:00, 12256.43it/s]
Processed prompts: 100%|██████████████████████████| 80/80 [00:02<00:00, 38.61it/s, est. speed input: 3886.77 toks/s, output: 8174.11 toks/s]
--------------------------------------------------
total_num_output_tokens: 16936
num_drafts: 7403
num_draft_tokens: 22209
num_accepted_tokens: 9535
mean acceptance length: 2.29
--------------------------------------------------
acceptance at token 0: 0.68
acceptance at token 1: 0.39
acceptance at token 2: 0.21
Test passed!

real    0m31.999s
user    0m52.270s
sys     0m6.709s
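
The counters in the output above are consistent with each other, which is a quick sanity check when reading such a log. The sketch below recomputes the reported mean acceptance length from the raw counts, assuming AL is one token from the target model per draft step plus the accepted draft tokens per step. That formula is inferred from the numbers shown here, not quoted from the script.

```python
# Values copied from the output above.
num_drafts = 7403
num_draft_tokens = 22209
num_accepted_tokens = 9535
num_spec_tokens = 3  # from --num_spec_tokens 3
per_position_acceptance = [0.68, 0.39, 0.21]

# Every draft step proposes num_spec_tokens tokens.
assert num_draft_tokens == num_drafts * num_spec_tokens

# Assumed formula: 1 token from the target model per step, plus the
# average number of accepted draft tokens per step.
mean_acceptance_length = 1 + num_accepted_tokens / num_drafts
print(f"mean acceptance length: {mean_acceptance_length:.2f}")  # 2.29

# Fraction of all proposed draft tokens that were accepted.
draft_acceptance_rate = num_accepted_tokens / num_draft_tokens
print(f"draft acceptance rate: {draft_acceptance_rate:.2f}")  # 0.43

# The per-position rates sum to roughly the accepted tokens per step,
# up to the rounding already applied in the log.
approx_al = 1 + sum(per_position_acceptance)
assert abs(approx_al - mean_acceptance_length) < 0.02
```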

Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com>
@mergify mergify bot added documentation Improvements or additions to documentation ci/build speculative-decoding labels Sep 9, 2025
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces an end-to-end test for the spec_decode.py example script, integrating it into the CI pipeline to safeguard against regressions in speculative decoding acceptance length. The approach of refactoring the script for testability and adding assertions for a fixed test case is sound. My review focuses on improving the robustness and maintainability of these new tests. I've identified a missing assertion for a critical test parameter and a formatting issue in an assertion message that could hinder debugging. Addressing these points will make the new test more reliable.

Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com>
Contributor

@wwl2755 wwl2755 left a comment

Great job maintaining this! I think it is worthwhile to keep the example scripts valid through CI, since they may be where people get started.

Link to #22992 for visibility.

@ekagra-ranjan ekagra-ranjan changed the title [Spec Decode] Add e2e test for examples/spec_decode.py and prevent breaking Acceptance Length [Spec Decode][CI] Add e2e test for examples/spec_decode.py and prevent breaking Acceptance Length Sep 9, 2025
Collaborator

@benchislett benchislett left a comment

It's not clear to me if this should be EAGLE1 or EAGLE3 or both, but in any case this is good to have.

Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com>
Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com>
@ywang96 ywang96 added the ready ONLY add when PR is ready to merge/full CI is needed label Sep 19, 2025
@ywang96 ywang96 enabled auto-merge (squash) September 19, 2025 19:10
@mergify
mergify bot commented Sep 21, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @ekagra-ranjan.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Sep 21, 2025
Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com>
auto-merge was automatically disabled September 22, 2025 15:49

Head branch was pushed to by a user without write access

Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com>
@mergify mergify bot removed the needs-rebase label Sep 22, 2025
@ywang96 ywang96 merged commit 867ecdd into vllm-project:main Sep 23, 2025
78 checks passed
FeiDaLI pushed a commit to FeiDaLI/vllm that referenced this pull request Sep 25, 2025
…ent breaking Acceptance Length (vllm-project#24531)

Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
yewentao256 pushed a commit that referenced this pull request Oct 3, 2025
…ent breaking Acceptance Length (#24531)

Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
gjc0824 pushed a commit to gjc0824/vllm that referenced this pull request Oct 10, 2025
…ent breaking Acceptance Length (vllm-project#24531)

Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
Signed-off-by: gaojc <1055866782@qq.com>
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 10, 2025
…ent breaking Acceptance Length (vllm-project#24531)

Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
Signed-off-by: xuebwang-amd <xuebwang@amd.com>
choprahetarth pushed a commit to Tandemn-Labs/vllm that referenced this pull request Oct 11, 2025
…ent breaking Acceptance Length (vllm-project#24531)

Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
lywa1998 pushed a commit to lywa1998/vllm that referenced this pull request Oct 20, 2025
…ent breaking Acceptance Length (vllm-project#24531)

Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 24, 2025
…ent breaking Acceptance Length (vllm-project#24531)

Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
Signed-off-by: xuebwang-amd <xuebwang@amd.com>