Conversation

@benchislett benchislett (Collaborator) commented Nov 3, 2025

Purpose

GPT-OSS can escape structured outputs by emitting tokens between <|channel|>final and <|message|>, for example: <|start|>assistant<|channel|>final <|constrain|>JSON<|message|>content....

This PR updates the parser to look for <|start|>assistant<|channel|>final and then the suffix <|message|>. This doesn't cover all cases, such as tool calling (see #23120), but it does work around a few common issues.
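
As a rough illustration of the matching idea only (the helper names, parameters, and the max_gap window below are assumptions for this sketch, not the actual vLLM implementation):

# Hypothetical sketch: detect the end of reasoning by locating the prefix
# token sequence, then looking for the suffix within a bounded window after it.
def contains_subsequence(haystack: list[int], needle: list[int]) -> int:
    # Return the index just past the first occurrence of needle, or -1.
    for i in range(len(haystack) - len(needle) + 1):
        if haystack[i : i + len(needle)] == needle:
            return i + len(needle)
    return -1

def is_reasoning_end(
    token_ids: list[int],
    prefix_ids: list[int],   # e.g. tokens for "<|channel|>final"
    suffix_ids: list[int],   # e.g. tokens for "<|message|>"
    max_gap: int = 16,       # tolerate a few tokens such as " <|constrain|>JSON"
) -> bool:
    end_of_prefix = contains_subsequence(token_ids, prefix_ids)
    if end_of_prefix == -1:
        return False
    # Only search a short window after the prefix for the suffix.
    window = token_ids[end_of_prefix : end_of_prefix + max_gap + len(suffix_ids)]
    return contains_subsequence(window, suffix_ids) != -1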

Test Plan

Added a unit test for GptOssReasoningParser to target is_reasoning_end specifically. Future work can extend this to cover more cases as needed.

Test Result

Tests now passing.

I also ran the benchmark script:

python3 ../benchmarks/benchmark_serving_structured_output.py --backend openai-chat --model openai/gpt-oss-120b --dataset xgrammar_bench --structured-output-ratio 1.0 --request-rate 100 --num-prompts 100 --port 8046 --endpoint /openai/v1/chat/completions --output-len 900 --save-results

and saw 98% accuracy before this change (the same as with --structured-output-ratio set to 0.0; at least one confirmed failure was due to this feature gap) and 100% accuracy after this change.

Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request addresses a bug in the GptOss reasoning parser, which previously failed to identify the end of a reasoning block when extra tokens appeared between the final channel and the message tag. The fix searches for a prefix and a suffix separately within a bounded token distance, which resolves the issue and improves parsing accuracy. The new unit test suite for is_reasoning_end is a valuable addition that validates the logic across several scenarios.

I have one suggestion to enhance the performance of the is_reasoning_end method, which could become a performance bottleneck with long input sequences.
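
For illustration only (this is not the bot's actual suggestion, and it reuses the hypothetical contains_subsequence helper from the sketch above): one way to keep the check cheap is to bound the scan to the most recently generated tokens, since the end-of-reasoning markers can only appear near the tail of the sequence.

# Illustrative sketch only: do bounded work per call instead of re-scanning
# the whole token sequence on every decoding step.
def is_reasoning_end_bounded(
    token_ids: list[int],
    prefix_ids: list[int],
    suffix_ids: list[int],
    lookback: int = 64,   # assumed window size covering the trailing markers
) -> bool:
    tail = token_ids[-lookback:]
    end_of_prefix = contains_subsequence(tail, prefix_ids)
    if end_of_prefix == -1:
        return False
    return contains_subsequence(tail[end_of_prefix:], suffix_ids) != -1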

@benchislett benchislett added the bug (Something isn't working) label Nov 3, 2025
# The model can output some special tokens between "final" and "<|message|>"
# So we need to look for both sequences to determine the end of reasoning.
self.reasoning_end_token_ids_prefix = self.model_tokenizer.encode(
    "<|start|>assistant<|channel|>final"
)

Contributor

The model will only output <|start|>assistant if it decides to emit a reasoning message; in some situations it may jump straight to the final message. So it may be more reasonable to match just <|channel|>final instead, as that covers both situations.

To cover tool-calling scenarios: as soon as you see to= in the header, the message is a tool call and no longer reasoning.

Both situations are still covered by the <|message|> ending suffix, though.
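
For illustration, a hedged sketch of the header rule described above (the classify_header helper and the example strings are hypothetical, not code from this PR):

# Hypothetical sketch of the rule above: once the header preceding
# "<|message|>" is known, classify what the message body will be.
def classify_header(header: str) -> str:
    if "to=" in header:
        return "tool_call"   # a recipient in the header means a tool call, not reasoning
    if "<|channel|>final" in header:
        return "final"       # the final answer follows the next <|message|>
    return "reasoning"       # e.g. the analysis channel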


@benchislett benchislett (Collaborator, Author) Nov 4, 2025

Updated to cover the first case. I think it would be easiest for me to leave any tool-calling updates to follow-up work, as I have no context on how we handle them in vLLM. I'm also not entirely sure tool calling even uses this code pathway anymore since the work using structured tags.

Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
@github-project-automation github-project-automation bot moved this from To Triage to Ready in gpt-oss Issues & Enhancements Nov 5, 2025

@alecsolder alecsolder (Contributor) left a comment

Looks good to me. I will have a follow-up PR ready soon that adjusts this for tool calling as well.

@yeqcharlotte yeqcharlotte enabled auto-merge (squash) November 6, 2025 07:47
@github-actions github-actions bot added the ready (ONLY add when PR is ready to merge/full CI is needed) label Nov 6, 2025
@yeqcharlotte yeqcharlotte merged commit 1890321 into vllm-project:main Nov 7, 2025
45 checks passed
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Nov 13, 2025
…28000)

Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
Signed-off-by: xuebwang-amd <xuebwang@amd.com>

Labels

bug (Something isn't working), gpt-oss (Related to GPT-OSS models), ready (ONLY add when PR is ready to merge/full CI is needed)

Projects

Status: Done

4 participants