[V1][Structured Output] Add apply_grammar_bitmask() method to model runner
#555
@wangxiyuan I have tested this; with this function added, structured output works well. Logs:
Processed prompts: 100%|██████████| 1/1 [00:02<00:00, 2.63s/it, est. speed input: 4.18 toks/s, output: 1.52 toks/s]
--------------------------------------------------
Guided decoding by Choice: Negative
--------------------------------------------------
Processed prompts: 100%|██████████| 1/1 [00:00<00:00, 3.56it/s, est. speed input: 114.14 toks/s, output: 28.53 toks/s]
--------------------------------------------------
Guided decoding by Regex: alan_turing@enigma.com
--------------------------------------------------
Processed prompts: 100%|██████████| 1/1 [00:00<00:00, 1.25it/s, est. speed input: 27.60 toks/s, output: 27.60 toks/s]
--------------------------------------------------
Guided decoding by JSON: {"brand": "Toyota", "model": "Supra", "car_type": "Coupe"}
--------------------------------------------------
Processed prompts: 100%|██████████| 1/1 [00:00<00:00, 1.49it/s, est. speed input: 31.36 toks/s, output: 32.85 toks/s]
--------------------------------------------------
Guided decoding by Grammar: SELECT col_1 from table_1 where col_2 = 1
LGTM
# We receive the structured output bitmask from the scheduler, but the
# indices of the requests in the batch may not match the indices of
# the bitmask since the scheduler doesn't know how the gpu runner is
GPU comment
OK, I will modify it later.
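For readers following this thread, the index mismatch described in the code comment above can be illustrated with a small sketch. This is not the code from this PR: the function name `reorder_bitmask`, its parameters, and the use of plain Python lists instead of tensors are all assumptions made for illustration. It also assumes that a word with every bit set (`-1` as a signed 32-bit integer) means "no token is masked".

```python
from typing import Dict, List

# Assumption: an int32 word with all bits set leaves every token allowed.
ALLOW_ALL = -1


def reorder_bitmask(
    bitmask: List[List[int]],
    mask_row_of_request: Dict[str, int],
    batch_row_of_request: Dict[str, int],
    batch_size: int,
) -> List[List[int]]:
    """Rearrange scheduler-ordered bitmask rows into the runner's batch order.

    All names here are hypothetical. Rows for requests that have no
    structured output constraint stay filled with ALLOW_ALL.
    """
    words_per_row = len(bitmask[0]) if bitmask else 0
    reordered = [[ALLOW_ALL] * words_per_row for _ in range(batch_size)]
    for req_id, mask_row in mask_row_of_request.items():
        batch_row = batch_row_of_request.get(req_id)
        if batch_row is None:
            # The request is not part of the current batch; skip it.
            continue
        reordered[batch_row] = list(bitmask[mask_row])
    return reordered


# The scheduler ordered the rows as (a, b); the runner batches (b, x, a).
scheduler_bitmask = [[0b0101], [0b0011]]
result = reorder_bitmask(
    scheduler_bitmask,
    mask_row_of_request={"a": 0, "b": 1},
    batch_row_of_request={"b": 0, "x": 1, "a": 2},
    batch_size=3,
)
```

In this example request `x` has no grammar, so its row keeps the allow-all value while `a` and `b` receive their own masks at their batch positions.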
# TODO: compatibility with spec decode.
# NOTE:
# 1. XGrammar bitmask applying only supports CPU and GPU.
ditto
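As background for the NOTE in the snippet above, here is a rough pure-Python sketch of what "applying" a grammar bitmask to logits means. It is not XGrammar's implementation and does not call its API. The function name `apply_bitmask_row` is hypothetical, and the packing layout (32 tokens per word, least significant bit first, a set bit meaning the token is allowed) is an assumption stated here for illustration.

```python
import math
from typing import List

# Assumption: each mask word packs the allow/deny flags of 32 tokens.
BITS_PER_WORD = 32


def apply_bitmask_row(logits: List[float], packed_mask: List[int]) -> List[float]:
    """Return a copy of `logits` with disallowed tokens set to -inf.

    Hypothetical helper: bit (token_id % 32) of word (token_id // 32)
    is assumed to be 1 when the token is allowed by the grammar.
    """
    masked = []
    for token_id, logit in enumerate(logits):
        word = packed_mask[token_id // BITS_PER_WORD]
        allowed = (word >> (token_id % BITS_PER_WORD)) & 1
        masked.append(logit if allowed else -math.inf)
    return masked


# Only tokens 0 and 2 are allowed by the mask 0b0101.
masked_logits = apply_bitmask_row([0.5, 1.0, -2.0, 3.0], [0b0101])
```

After masking, sampling can only pick tokens whose logits are still finite, which is how the grammar constrains the output.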
… runner (vllm-project#555) ### What this PR does / why we need it? Add `apply_grammar_bitmask()` method to model runner. This method is necessary for `xgrammar` structured output. --------- Signed-off-by: shen-shanshan <467638484@qq.com>
What this PR does / why we need it?
Add `apply_grammar_bitmask()` method to model runner. This method is necessary for `xgrammar` structured output.
Does this PR introduce any user-facing change?
How was this patch tested?