[V1][Structured Output] Add `apply_grammar_bitmask()` method to model runner #555

shen-shanshan · 2025-04-17T10:04:59Z

What this PR does / why we need it?

Add `apply_grammar_bitmask()` method to model runner.

This method is required for structured output with the `xgrammar` backend.
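Some background for readers new to grammar bitmasks: the structured-output backend produces a packed bitmask over the vocabulary for each request. Applying it means setting the logit of every disallowed token to `-inf` before sampling, so only grammar-legal tokens can be drawn. The following minimal pure-Python sketch shows that masking step. It is not the PR's implementation. It assumes XGrammar's layout, in which each row is a list of 32-bit words and a set bit marks an allowed token. `apply_packed_bitmask` is a hypothetical name.

```python
import math
from typing import List

# Assumption: the mask is packed into 32-bit words, bit j of word w
# covering token id w * 32 + j, with 1 = allowed (XGrammar-style layout).
BITS_PER_WORD = 32


def apply_packed_bitmask(logits: List[List[float]], bitmask: List[List[int]]) -> None:
    """Hypothetical helper: set logits of disallowed tokens to -inf in place.

    logits:  one row of vocab-sized scores per request in the batch.
    bitmask: one row of packed words per request, aligned with `logits`.
    """
    for row_logits, row_mask in zip(logits, bitmask):
        for token_id in range(len(row_logits)):
            word = row_mask[token_id // BITS_PER_WORD]
            # Python's >> is arithmetic, so signed words such as -1
            # (all bits set) still read as "allowed" for every bit.
            allowed = (word >> (token_id % BITS_PER_WORD)) & 1
            if not allowed:
                row_logits[token_id] = -math.inf
```

For example, a row mask of `[0b0101]` over a 4-token vocabulary keeps tokens 0 and 2 and masks tokens 1 and 3. A real runner would do this with a vectorized kernel on tensors rather than Python loops.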

Does this PR introduce any user-facing change?

How was this patch tested?

Signed-off-by: shen-shanshan <467638484@qq.com>

shen-shanshan · 2025-04-18T07:59:21Z

@wangxiyuan I have tested this: with this function added, structured output works correctly.

Logs:

Processed prompts: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:02<00:00,  2.63s/it, est. speed input: 4.18 toks/s, output: 1.52 toks/s]
--------------------------------------------------
Guided decoding by Choice: Negative
--------------------------------------------------
Processed prompts: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00,  3.56it/s, est. speed input: 114.14 toks/s, output: 28.53 toks/s]
--------------------------------------------------
Guided decoding by Regex: alan_turing@enigma.com
--------------------------------------------------
Processed prompts: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00,  1.25it/s, est. speed input: 27.60 toks/s, output: 27.60 toks/s]
--------------------------------------------------
Guided decoding by JSON: {"brand": "Toyota", "model": "Supra", "car_type": "Coupe"}
--------------------------------------------------
Processed prompts: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00,  1.49it/s, est. speed input: 31.36 toks/s, output: 32.85 toks/s]
--------------------------------------------------
Guided decoding by Grammar: SELECT col_1  from table_1  where col_2 = 1

MengqingCao · 2025-04-18T08:44:00Z

LGTM

wangxiyuan · 2025-04-18T08:46:23Z

vllm_ascend/worker/model_runner_v1.py

+
+        # We receive the structured output bitmask from the scheduler, but the
+        # indices of the requests in the batch may not match the indices of
+        # the bitmask since the scheduler doesn't know how the gpu runner is


This comment still refers to the "gpu runner".

OK, I will modify it later.
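The diff comment above explains why the scheduler's bitmask rows cannot be used directly. The scheduler orders them by its own view of the requests, but the runner's batch may place requests in different slots and may also contain requests without structured output. The following minimal sketch shows the reordering step. It assumes the scheduler supplies a `req_id -> mask row` mapping. The names `reorder_bitmask`, `mask_row_by_req` and `batch_req_ids` are hypothetical and are not taken from the PR.

```python
from typing import Dict, List

# A signed 32-bit word with all bits set: every token in that word is allowed.
ALLOW_ALL = -1


def reorder_bitmask(
    bitmask: List[List[int]],
    mask_row_by_req: Dict[str, int],
    batch_req_ids: List[str],
) -> List[List[int]]:
    """Hypothetical helper: return one mask row per batch slot, in batch order.

    bitmask:         rows in the scheduler's order (only constrained requests).
    mask_row_by_req: assumed mapping from request id to its row in `bitmask`.
    batch_req_ids:   request ids in the order the runner laid out the batch.
    """
    num_words = len(bitmask[0]) if bitmask else 0
    reordered = []
    for req_id in batch_req_ids:
        row = mask_row_by_req.get(req_id)
        if row is None:
            # Request without structured output: leave its logits unconstrained.
            reordered.append([ALLOW_ALL] * num_words)
        else:
            reordered.append(list(bitmask[row]))
    return reordered
```

Filling unconstrained slots with an all-ones row is one design choice. An alternative is to pass only the constrained batch indices to the masking kernel and leave the other rows untouched.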

wangxiyuan · 2025-04-18T08:46:38Z

vllm_ascend/worker/model_runner_v1.py

+
+        # TODO: compatibility with spec decode.
+        # NOTE:
+        # 1. XGrammar bitmask applying only supports CPU and GPU.



shen-shanshan mentioned this pull request Apr 17, 2025

[Feature]: Add Support for Guided Decoding (Structured Output) #177 (Closed, 20 tasks)

shen-shanshan added 3 commits April 17, 2025 11:58

add structured output mask apply to ModelRunner

125eb2d

Signed-off-by: shen-shanshan <467638484@qq.com>

update

cd6bab8

Signed-off-by: shen-shanshan <467638484@qq.com>

update

abc52ed

Signed-off-by: shen-shanshan <467638484@qq.com>

shen-shanshan force-pushed the v1-so branch from bddbda1 to abc52ed April 17, 2025 11:59

fix bug

44ac7a4

Signed-off-by: shen-shanshan <467638484@qq.com>

wangxiyuan approved these changes Apr 18, 2025


wangxiyuan reviewed Apr 18, 2025


wangxiyuan merged commit 65c1f45 into vllm-project:main Apr 18, 2025
15 checks passed
