
Conversation

@qandrew (Contributor) commented Sep 2, 2025

Purpose

While working with vLLM's speculative decoding code, we noticed that throughput is a more useful metric to read than the total number of tokens, so I worked with @Jialin to devise better metrics.

Test Plan

Send a curl request to a vLLM server running speculative decoding.

server

vllm serve facebook/opt-125m \
  --swap-space 16 \
  --disable-log-requests \
  --host :: \
  --dtype float16 \
  --speculative_config \
    "{\"method\":\"ngram\",\"num_speculative_tokens\":5,\"prompt_lookup_min\":5,\"prompt_lookup_max\":10}" \
  2>&1 | tee /data/users/$USER/logs/vllm_serving.$(date +%Y%m%d_%H%M%S).log

Test Result

metrics

[axia@devvm30969.cln0 ~/uv_env/gpt_oss_edit/bin]$ curl http://localhost:8000/v1/completions   -H "Content-Type: application/json"   -d '{
    "model": "facebook/opt-125m",
    "prompt": "Write a short story about plants.",
    "max_tokens": 1000,
    "temperature": 0.7
  }'


(APIServer pid=2236552) INFO 09-09 11:51:35 [metrics.py:96] SpecDecoding metrics: Draft acceptance rate: 88.8%, Mean acceptance length: 5.44, Accepted throughput: 4.37 tokens/s, Drafted throughput: 4.92 tokens/s, Accepted: 262 tokens, Drafted: 295 tokens, Per-position acceptance rate: 0.983, 0.949, 0.915, 0.881, 0.712
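
As a quick sanity check on these numbers (the roughly 60 s interval below is inferred from the counters, not stated in the log):

accepted, drafted = 262, 295
print(accepted / drafted)  # 0.888... -> the reported 88.8% draft acceptance rate
print(accepted / 4.37)     # ~60 s: elapsed time implied by 4.37 accepted tokens/s
print(drafted / 4.92)      # ~60 s again, so the two throughput figures are consistent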

Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as a before/after comparison or e2e results.
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

@github-actions bot commented Sep 2, 2025

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, they only run fastcheck CI, which runs a small, essential subset of CI tests to quickly catch errors.

You can ask your reviewers to trigger select CI tests on top of fastcheck CI.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

If you have any questions, please reach out to us on Slack at https://slack.vllm.ai.

🚀

@qandrew qandrew marked this pull request as draft September 2, 2025 22:32
@gemini-code-assist bot left a comment

Code Review

This pull request updates the speculative decoding metrics to include throughput for drafted and accepted tokens. The changes correctly use time.monotonic() to measure the elapsed time for accurate throughput calculation. The implementation is sound, adding last_log_time to SpecDecodingLogging, calculating throughput in the log method, and updating the log message accordingly. The code handles potential division-by-zero errors. Overall, the changes are a good addition for better performance monitoring, and I have no high or critical severity feedback.
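
For readers following along, here is a minimal sketch of the change as the review describes it; SpecDecodingLogging, last_log_time, time.monotonic(), and the log method are named above, while the counter names and the print-style output are assumptions, not vLLM's actual code:

import time

class SpecDecodingLogging:
    """Sketch only; the real class lives in vLLM's metrics code."""

    def __init__(self):
        self.last_log_time = time.monotonic()  # monotonic clock is immune to wall-clock jumps
        self.num_accepted_tokens = 0           # assumed counter name
        self.num_drafted_tokens = 0            # assumed counter name

    def log(self):
        now = time.monotonic()
        elapsed = now - self.last_log_time
        # Guard the division so back-to-back log() calls cannot divide by zero.
        accepted_tps = self.num_accepted_tokens / elapsed if elapsed > 0 else 0.0
        drafted_tps = self.num_drafted_tokens / elapsed if elapsed > 0 else 0.0
        print(f"Accepted throughput: {accepted_tps:.2f} tokens/s, "
              f"Drafted throughput: {drafted_tps:.2f} tokens/s")
        self.last_log_time = now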

@qandrew qandrew force-pushed the andrew/jialin-spec-logging branch from 75cef51 to 710a8fa Compare September 2, 2025 23:20
@Jialin (Collaborator) left a comment

Looks good to me. Thanks for porting the change to OSS. Please add some screenshots to the test plan.

Signed-off-by: Andrew Xia <axia@meta.com>
@qandrew qandrew force-pushed the andrew/jialin-spec-logging branch from 710a8fa to e2fbcce Compare September 5, 2025 18:26
@qandrew qandrew marked this pull request as ready for review September 5, 2025 18:29
@Jialin (Collaborator) commented Sep 5, 2025

CC @yeqcharlotte @houseroad

@qandrew qandrew requested a review from benchislett as a code owner September 9, 2025 18:53
Signed-off-by: Andrew Xia <axia@meta.com>
@qandrew qandrew requested a review from benchislett September 9, 2025 22:30
@luccafong (Collaborator) left a comment

lgtm!

@benchislett (Collaborator) left a comment

LGTM

@benchislett benchislett enabled auto-merge (squash) September 10, 2025 19:11
@github-actions github-actions bot added the ready label Sep 10, 2025
@benchislett benchislett merged commit 79ac59f into vllm-project:main Sep 11, 2025
38 checks passed
skyloevil pushed a commit to skyloevil/vllm that referenced this pull request Sep 13, 2025
dsxsteven pushed a commit to dsxsteven/vllm_splitPR that referenced this pull request Sep 15, 2025
FeiDaLI pushed a commit to FeiDaLI/vllm that referenced this pull request Sep 25, 2025
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 10, 2025
…ughput (vllm-project#24127)

Signed-off-by: Andrew Xia <axia@meta.com>
Signed-off-by: xuebwang-amd <xuebwang@amd.com>
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 24, 2025
…ughput (vllm-project#24127)

Signed-off-by: Andrew Xia <axia@meta.com>
Signed-off-by: xuebwang-amd <xuebwang@amd.com>

Labels

ready, speculative-decoding, v1
