
Conversation


@benchislett benchislett commented Mar 12, 2025

This PR changes the structured outputs behaviour in a few ways:

  • Generated speculative draft tokens are validated against the grammar to ensure they satisfy its constraints. This validation should not modify the matcher state.
  • The scheduler generates a bitmask for each speculative draft token, so that the GPU runner can apply the appropriate mask at each position in the speculative sequence, ensuring the sampled bonus token always adheres to the constraints. The simplest way I found to carry this in the payload is to compact the bitmask and interleave the masks for each request's speculative tokens. The GPU runner then always has to unpack the bitmask, since the compacted form skips non-structured requests even when speculative decoding is not enabled.
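To illustrate the compacted, interleaved layout described above, here is a minimal Python sketch of how the GPU-side unpack might scatter compact rows back to a dense per-position layout. All names and sizes here are hypothetical illustrations, not vLLM's actual implementation:

```python
def unpack_bitmask(compact, struct_req_indices, num_reqs, num_spec, vocab):
    """Scatter a compacted, per-request-interleaved grammar bitmask back
    into a dense layout (hypothetical sketch, not vLLM's actual code).

    `compact` has one row per (structured request, position), with the
    (num_spec + 1) rows for each structured request stored adjacently.
    Non-structured requests are absent from `compact`, so the unpack must
    scatter rows to their dense positions and fill everything else with
    an all-ones (unconstrained) mask.
    """
    width = num_spec + 1  # draft positions plus the bonus token
    # Start from an unconstrained mask for every (request, position).
    dense = [[1] * vocab for _ in range(num_reqs * width)]
    for compact_i, req_i in enumerate(struct_req_indices):
        for p in range(width):
            dense[req_i * width + p] = list(compact[compact_i * width + p])
    return dense

# Example: 3 requests, only request 1 is structured, 2 draft tokens,
# toy vocab of 8. The compact payload has 3 rows (2 drafts + 1 bonus).
compact = [[0] * 8 for _ in range(3)]
dense = unpack_bitmask(compact, [1], num_reqs=3, num_spec=2, vocab=8)
```

Because the compacted form omits non-structured requests entirely, this unpack step runs even when no request in the batch uses speculative decoding, which matches the trade-off noted in the bullet above.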

I also tweaked the benchmark file to fix some minor issues I hit when running it locally, and modified the prompt slightly to discourage infinite repetition at temperature == 0. After this fix, I get 100% correctness when benchmarking with speculative decoding enabled.

NOTE: This PR is now compatible with both xGrammar and Guidance backends in V1.
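As a rough sketch of the draft-validation step from the first bullet: advance the matcher through the draft tokens, truncate at the first violation, then roll back so the matcher state is left unmodified. The `ToyMatcher` below is a hypothetical stand-in (real backends such as xGrammar expose a similar accept/rollback interface); none of this is the PR's actual code:

```python
class ToyMatcher:
    """Toy stand-in for a grammar matcher. For illustration only, its
    'grammar' accepts even token ids and rejects odd ones."""

    def __init__(self):
        self._num_accepted = 0

    def accept_token(self, token_id: int) -> bool:
        if token_id % 2 != 0:
            return False
        self._num_accepted += 1
        return True

    def rollback(self, n: int) -> None:
        self._num_accepted -= n


def validate_draft_tokens(matcher, draft_tokens):
    """Keep the longest grammar-valid prefix of the draft, then roll the
    matcher back so validation does not modify its state."""
    num_valid = 0
    for tok in draft_tokens:
        if not matcher.accept_token(tok):
            break
        num_valid += 1
    matcher.rollback(num_valid)  # restore the original matcher state
    return draft_tokens[:num_valid]


matcher = ToyMatcher()
valid = validate_draft_tokens(matcher, [2, 4, 5, 6])  # truncates at 5
```

The rollback at the end is the key point: the scheduler can probe the grammar with draft tokens without committing them, so the matcher is only advanced later for tokens that are actually accepted.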

@github-actions

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, covering a small but essential subset of tests to catch errors quickly. You can run additional CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

@mergify mergify bot added the ci/build label Mar 12, 2025
@mergify

mergify bot commented Mar 12, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @benchislett.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot removed the needs-rebase label Mar 12, 2025
@LiuXiaoxuanPKU LiuXiaoxuanPKU self-assigned this Mar 12, 2025
@WoosukKwon WoosukKwon self-assigned this Mar 12, 2025
@aarnphm aarnphm self-requested a review March 13, 2025 06:58
@russellb
Member

> Let's get this in.
>
> cc @russellb when you have bandwidth.

Yeah, I'm good with it. I just wanted to give @WoosukKwon a chance to do a final review, given that it touches such sensitive code.

@russellb
Member

also this needs to be updated against main. I tried to do it but don't have access to the branch.

Collaborator

@WoosukKwon WoosukKwon left a comment


LGTM!


 def gen_prompt(index: int):
-    return f"Generate an example of a user profile given the following schema: {json.dumps(get_schema(index))}"  # noqa: E501
+    return f"Generate an example of a brief user profile given the following schema: {json.dumps(get_schema(index))}"  # noqa: E501

What is this change for?

@benchislett (Author)

Some models were endlessly repeating outputs on the benchmark prompt; this tweak was enough to tip the scales.

@WoosukKwon WoosukKwon added the ready label Apr 29, 2025
@mergify

mergify bot commented Apr 29, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @benchislett.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Apr 29, 2025
@WoosukKwon
Collaborator

@benchislett Thanks for the PR. Could you please merge from main? That will fix the docker build error.

@mergify mergify bot removed the needs-rebase label Apr 29, 2025
@benchislett
Collaborator Author

Merged from main again.

@mgoin mgoin enabled auto-merge (squash) April 29, 2025 22:17
@mgoin mgoin merged commit 34120f5 into vllm-project:main Apr 30, 2025
47 checks passed
@github-project-automation github-project-automation bot moved this from In review to Done in Structured Output Apr 30, 2025
@russellb
Member

Thank you for the hard work on this PR!

radeksm pushed a commit to radeksm/vllm that referenced this pull request May 2, 2025
RichardoMrMu pushed a commit to RichardoMrMu/vllm that referenced this pull request May 12, 2025
zzzyq pushed a commit to zzzyq/vllm that referenced this pull request May 24, 2025
wangxiyuan pushed a commit to vllm-project/vllm-ascend that referenced this pull request Jul 31, 2025
…ar_bitmask` method (#2022)

### What this PR does / why we need it?
Fix #2033 

Sync vllm-project/vllm#14702 to fix the `grammar_bitmask` IndexError caused by an outdated `apply_grammar_bitmask` method.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Tested by upstream vllm


- vLLM version: v0.10.0
- vLLM main:
vllm-project/vllm@6e599ee

Signed-off-by: ApsarasX <apsarax@outlook.com>
wangxiyuan pushed a commit to vllm-project/vllm-ascend that referenced this pull request Aug 19, 2025
…rammar_bitmask` method (#2314)

### What this PR does / why we need it?
Fix #2033

Sync vllm-project/vllm#14702 to fix the `grammar_bitmask` IndexError caused by an outdated `apply_grammar_bitmask` method.

### Does this PR introduce _any_ user-facing change?
No


Signed-off-by: shen-shanshan <467638484@qq.com>
chopper0126 pushed a commit to chopper0126/vllm-ascend that referenced this pull request Sep 26, 2025
Angazenn pushed a commit to Angazenn/vllm-ascend that referenced this pull request Oct 21, 2025

Labels

ci/build, ready, structured-output, v1

Projects

Status: Done

Development


8 participants