@shen-shanshan
Collaborator

What this PR does / why we need it?

Add apply_grammar_bitmask() method to model runner.

This method is necessary for xgrammar structured output.
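
For readers unfamiliar with the technique, here is a minimal pure-Python sketch of what applying a grammar bitmask to logits means. It is not the code added by this PR. The helper names (`pack_allowed_tokens`, `apply_bitmask_to_row`) are hypothetical, and the layout (32 token ids per word, a set bit meaning "allowed") is an assumption made for illustration.

```python
import math

BITS_PER_WORD = 32  # assumption: one 32-bit word covers 32 token ids


def pack_allowed_tokens(allowed, vocab_size):
    """Pack a set of allowed token ids into a list of 32-bit words (hypothetical helper)."""
    words = [0] * math.ceil(vocab_size / BITS_PER_WORD)
    for token_id in allowed:
        words[token_id // BITS_PER_WORD] |= 1 << (token_id % BITS_PER_WORD)
    return words


def apply_bitmask_to_row(logits, words):
    """Return a copy of logits with every disallowed token set to -inf."""
    masked = []
    for token_id, logit in enumerate(logits):
        word = words[token_id // BITS_PER_WORD]
        allowed = (word >> (token_id % BITS_PER_WORD)) & 1
        masked.append(logit if allowed else -math.inf)
    return masked


# Toy vocabulary of 5 tokens; the grammar currently allows only tokens 1 and 4.
logits = [0.5, 2.0, -1.0, 3.0, 0.1]
words = pack_allowed_tokens({1, 4}, vocab_size=len(logits))
masked = apply_bitmask_to_row(logits, words)
```

After masking, sampling can only pick a token the grammar permits, because every other logit is negative infinity.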

Does this PR introduce any user-facing change?

How was this patch tested?

Signed-off-by: shen-shanshan <467638484@qq.com>
@shen-shanshan
Collaborator Author

@wangxiyuan I have tested this. With this function added, structured output works correctly.

logs:

Processed prompts: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:02<00:00,  2.63s/it, est. speed input: 4.18 toks/s, output: 1.52 toks/s]
--------------------------------------------------
Guided decoding by Choice: Negative
--------------------------------------------------
Processed prompts: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00,  3.56it/s, est. speed input: 114.14 toks/s, output: 28.53 toks/s]
--------------------------------------------------
Guided decoding by Regex: alan_turing@enigma.com
--------------------------------------------------
Processed prompts: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00,  1.25it/s, est. speed input: 27.60 toks/s, output: 27.60 toks/s]
--------------------------------------------------
Guided decoding by JSON: {"brand": "Toyota", "model": "Supra", "car_type": "Coupe"}
--------------------------------------------------
Processed prompts: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00,  1.49it/s, est. speed input: 31.36 toks/s, output: 32.85 toks/s]
--------------------------------------------------
Guided decoding by Grammar: SELECT col_1  from table_1  where col_2 = 1

@MengqingCao
Collaborator

LGTM


# We receive the structured output bitmask from the scheduler, but the
# indices of the requests in the batch may not match the indices of
# the bitmask since the scheduler doesn't know how the gpu runner is
Collaborator

This comment still refers to the GPU.

Collaborator Author

OK, I will modify it later.
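
The code comment under review describes an ordering mismatch: the scheduler builds the bitmask in its own request order, while the runner keeps requests in a different batch order. A minimal sketch of the realignment follows, in plain Python. The function name `reorder_bitmask`, the two index dictionaries, and the choice of an all-bits-set row for requests without structured output are assumptions for illustration, not the code in this PR.

```python
ALLOW_ALL_WORD = 0xFFFFFFFF  # assumption: all 32 bits set means "every token allowed"


def reorder_bitmask(bitmask_rows, scheduler_index, batch_index, words_per_row):
    """Rearrange scheduler-ordered bitmask rows into the runner's batch order.

    bitmask_rows:    rows in scheduler order, each a list of 32-bit words
    scheduler_index: request id -> row position in bitmask_rows
    batch_index:     request id -> position in the runner's batch
    """
    # Start from rows that allow everything, so requests that do not use
    # structured output are left unconstrained.
    reordered = [[ALLOW_ALL_WORD] * words_per_row for _ in range(len(batch_index))]
    for req_id, batch_pos in batch_index.items():
        sched_pos = scheduler_index.get(req_id)
        if sched_pos is not None:
            reordered[batch_pos] = list(bitmask_rows[sched_pos])
    return reordered


# The scheduler lists req-b first; the runner's batch places it last.
scheduler_index = {"req-b": 0, "req-a": 1}
batch_index = {"req-a": 0, "req-plain": 1, "req-b": 2}
bitmask_rows = [[0b0110], [0b0001]]

aligned = reorder_bitmask(bitmask_rows, scheduler_index, batch_index, words_per_row=1)
```

Row `i` of `aligned` now corresponds to row `i` of the logits in the runner's batch, which is the precondition for applying the mask row by row.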


# TODO: compatibility with spec decode.
# NOTE:
# 1. XGrammar bitmask applying only supports CPU and GPU.
Collaborator

ditto

@wangxiyuan wangxiyuan merged commit 65c1f45 into vllm-project:main Apr 18, 2025
15 checks passed
ttanzhiqiang pushed a commit to ttanzhiqiang/vllm-ascend that referenced this pull request Apr 27, 2025
… runner (vllm-project#555)

Angazenn pushed a commit to Angazenn/vllm-ascend that referenced this pull request Oct 21, 2025
… runner (vllm-project#555)
