Only enable commentary channel for GPT-OSS when really necessary #23167
Conversation
Code Review
This pull request correctly modifies the logic to enable the commentary channel for GPT-OSS models only when tools are active. This is done by adding a has_tools flag to get_system_message and conditionally including the commentary channel. The changes in serving_chat.py and serving_responses.py correctly pass this new flag. My review includes a suggestion to improve the implementation in harmony_utils.py for better type safety and maintainability.
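As a rough illustration, the described change could look something like the sketch below in harmony_utils.py. The get_system_message name and the has_tools flag come from the review above; the trimmed signature and the with_required_channels call are assumptions about the openai_harmony API, not the PR's actual diff.

```python
# Hedged sketch of the described change, not the PR's exact code; the
# with_required_channels helper is assumed to be openai_harmony's
# channel-configuration API.
from openai_harmony import Message, Role, SystemContent


def get_system_message(has_tools: bool = False) -> Message:
    content = SystemContent.new()
    if not has_tools:
        # No tools of any kind: drop the commentary channel so the model can
        # only emit on analysis (reasoning) and final (user-facing answer).
        content = content.with_required_channels(["analysis", "final"])
    return Message.from_role_and_content(Role.SYSTEM, content)
```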
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. 🚀
If the bug is that "Generation will eventually stop without the requester receiving the tool calling content", can you fix it by sending these messages to the requester properly? Modifying the system prompt can be dangerous, and it's hard to evaluate whether it affects model performance.
In my opinion the actual "bug" is that the model sometimes emits messages on the commentary channel which do not belong there. According to the cookbook (https://cookbook.openai.com/articles/openai-harmony), the channels are supposed to be used as follows: analysis for chain-of-thought reasoning, commentary for tool calls and the preambles to them, and final for the user-facing answer.
Following this, I would say the model emitting tokens on the commentary channel despite no available tools whatsoever is unexpected. In my testing, specifying the valid channels as ["analysis", "final"] using the mechanism provided by the harmony library successfully prevents that behavior, making the model play along nicely with things like Open WebUI's code interpreter. To be honest, it appears to me as if the channel configuration option was added and trained exactly to better guide the model in such scenarios, but that's just speculation. From my preliminary testing I see no indication of degraded benchmark performance, but I would like to invite more people to test the model in this configuration, since I think it is worth understanding. Alternatively, if we wanted to leave commentary on, how do we decide what to do with the non-tool-calling content on the commentary channel? Just push it to regular content by default?
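Roughly, the test looked like the sketch below; with_required_channels is my reading of the harmony library's channel-config mechanism, so treat the exact method name as an assumption.

```python
# Sketch of rendering a conversation with the commentary channel disabled.
from openai_harmony import (
    Conversation,
    HarmonyEncodingName,
    Message,
    Role,
    SystemContent,
    load_harmony_encoding,
)

encoding = load_harmony_encoding(HarmonyEncodingName.HARMONY_GPT_OSS)

# Restrict the model to the reasoning + answer channels when no tools exist
# (with_required_channels is assumed, not verified API).
system = SystemContent.new().with_required_channels(["analysis", "final"])

convo = Conversation.from_messages([
    Message.from_role_and_content(Role.SYSTEM, system),
    Message.from_role_and_content(Role.USER, "approximate pi"),
])

# The rendered system header should now advertise only the two channels
# (e.g. "Valid channels: analysis, final.") instead of all three.
tokens = encoding.render_conversation_for_completion(convo, Role.ASSISTANT)
```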
I prefer to regard it as a message, like one on the analysis channel. Is there any known problem with this solution?
Yes, I think so, because e.g. in the code interpreter example regular message content is expected, not reasoning content. So I think if anything, "extra" commentary content should become regular message content. In the end, I think this is also what is supposed to happen with the "preamble" the model may emit on commentary before multiple tool calls (see the table in the cookbook).
I'm happy to try and compare a solution that handles extra commentary content as regular message content!
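Roughly, what I have in mind looks like the sketch below; the channel/recipient/content attribute names are just my reading of the parsed harmony messages, not vLLM's actual parser.

```python
# Sketch: route parsed assistant messages; commentary without a tool recipient
# is treated as a preamble and becomes regular message content.
def split_harmony_output(messages):
    reasoning, tool_calls, content = [], [], []
    for msg in messages:
        channel = getattr(msg, "channel", None)
        recipient = getattr(msg, "recipient", None)
        if channel == "analysis":
            reasoning.append(msg)
        elif channel == "commentary" and recipient is not None:
            tool_calls.append(msg)  # real tool call, e.g. recipient "functions.foo"
        else:
            content.append(msg)     # final answers and commentary preambles
    return reasoning, tool_calls, content
```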
Can you try to implement better handling of the model output? And can you give me an example?
I think we should wait for the merge of #22386 before doing further work on parsing in harmony utils. Regarding the example: Open WebUI with the code interpreter enabled sends its own code-interpreter prompt together with the user instruction, and I try simple examples like "approximate pi" or similar. With commentary enabled, the success rate for receiving the non-native tool call in message content via chat/completions is pretty low.
To confirm: you want to let the model write some code for your local interpreter, but you don't want it to be either a builtin python tool call or a user-defined tool call, and you want it returned as plain text.
To be clear, that's not something I specifically want to do, but this type of generic function calling (bypassing the API's tools=/tool_calls= mechanism) is what a lot of applications do, and it just doesn't work well with these models. Personally, I'm actually going to use this configuration productively until I know why I shouldn't. Regardless, given the information from the cookbook about the "preambles", parsing regular messages from the commentary channel is still something that should be implemented.
Purpose
In the current configuration, both GPT-OSS models would sometimes - in certain scenarios reliably - emit messages on the commentary channel despite not having access to any tools. This behavior is especially prevalent when the models are used with non-native tool calling, e.g. the "Code Interpreter" tool of Open WebUI, which asks the model to call the tool within <code_interpreter></code_interpreter> tags. Most of the time such calls would not be emitted on the final channel or even the analysis channel, but on commentary. Generation will eventually stop without the requester receiving the tool calling content.

When researching the harmony library, I found that the activated channels are set in the system message and that, by default, analysis, commentary, and final are all active. Because vLLM currently doesn't actively configure the channels, this default is used regardless of active tools. The present PR changes this, such that the commentary channel is only active when there is at least one tool of any kind enabled.
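Roughly, the serving layer then only has to derive the flag from whatever tools are in play; the names below are illustrative placeholders, not vLLM's real attributes.

```python
# Illustrative caller-side sketch; `request` and `builtin_tools_enabled` are
# placeholder names, not the exact vLLM attributes.
def commentary_needed(request, builtin_tools_enabled: bool) -> bool:
    """True if at least one tool of any kind is available to the model."""
    return bool(getattr(request, "tools", None)) or builtin_tools_enabled

# system_msg = get_system_message(
#     has_tools=commentary_needed(request, builtin_tools_enabled))
```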
I don't expect any regression from this, but evaluations should be re-run to see potential effects. If anything, this should improve results.
Test Plan
Testing with Open WebUI via LiteLLM against the /chat/completions and /responses APIs of vLLM (using GPT-OSS 20B), using no tools, "non-native tools", and native tools, and observing the generated system message / activated channels.
Testing whether the Code Interpreter "tool" of Open WebUI can now be used reliably.
Test Result
The commentary channel is only active when tools are passed. The Code Interpreter tool of Open WebUI is now reliably used by the model.