[V1][Spec Decode] Enable spec decode for top-p & top-k sampling #15063
Conversation
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Looks good. Wondering how we tested it?
Good point. I just wanted to get some initial feedback before adding tests. Will update the PR.
@houseroad @LiuXiaoxuanPKU I've added the tests, and they're passing locally. Could you please review? I'd appreciate including this PR in v0.8.2 if possible.
LGTM, just minor QQ about the test
```python
num_tokens = batch_size * num_draft_tokens
...
# Randomly create unmasked indices.
num_top_p_tokens = int(vocab_size * top_p)
```
A bit confused by the definition of top-p sampling here. Shouldn't it restrict sampling to the smallest set of most probable tokens whose cumulative probability exceeds p, rather than sampling a fixed percentage of the vocabulary as done here?
@LiuXiaoxuanPKU Good catch. It only makes sense when the `int(vocab_size * top_p)` tokens all have equally high logits (e.g., 100) while the others have -100, so this is definitely not general enough.
I've updated it to test top-p more precisely. Could you please take another look?
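For reference, a precise top-p (nucleus) mask can be built by sorting the probabilities and taking an exclusive cumulative sum. The following is a minimal sketch of that idea, not the exact test code added in this PR:

```python
import torch

def top_p_mask(logits: torch.Tensor, p: float) -> torch.Tensor:
    # Keep the smallest set of most-probable tokens whose cumulative
    # probability exceeds p (nucleus sampling). Returns a bool mask
    # over the last (vocab) dimension.
    probs = torch.softmax(logits, dim=-1)
    sorted_probs, sorted_idx = probs.sort(dim=-1, descending=True)
    cum = sorted_probs.cumsum(dim=-1)
    # Exclusive cumsum: a token stays if the mass of strictly more
    # probable tokens has not yet reached p. The most probable token
    # is therefore always kept.
    keep_sorted = (cum - sorted_probs) <= p
    mask = torch.zeros_like(keep_sorted)
    mask.scatter_(-1, sorted_idx, keep_sorted)
    return mask
```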
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
LGTM
This PR enables speculative decoding for requests with top-p & top-k sampling.
It is implemented using `apply_top_k_top_p` to mask the logits of the target model. While this is more expensive than FlashInfer's sorting-free sampling, I think it's a good first step.
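As a rough illustration of that approach (a sketch under assumed semantics, not vLLM's actual `apply_top_k_top_p` kernel), top-k and top-p can be applied as successive masks on the target logits before sampling:

```python
import torch

def mask_top_k_top_p(logits: torch.Tensor, k: int, p: float) -> torch.Tensor:
    # logits: [num_tokens, vocab_size]. Masked-out entries become -inf,
    # so they receive zero probability after the softmax.

    # Top-k: drop everything strictly below the k-th largest logit.
    kth = logits.topk(k, dim=-1).values[..., -1:]
    logits = logits.masked_fill(logits < kth, float("-inf"))

    # Top-p: among the survivors, keep the smallest prefix of the
    # probability-sorted tokens whose cumulative mass exceeds p.
    sorted_logits, sorted_idx = logits.sort(dim=-1, descending=True)
    probs = sorted_logits.softmax(dim=-1)
    cum = probs.cumsum(dim=-1)
    drop = (cum - probs) > p  # exclusive cumsum already reached p
    sorted_logits = sorted_logits.masked_fill(drop, float("-inf"))

    # Undo the sort to restore the original token order.
    return logits.scatter(-1, sorted_idx, sorted_logits)
```

The per-token sort and cumulative sum are what make this more expensive than a sorting-free sampler, which is the trade-off the description mentions.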