
Conversation

@Ithanil (Contributor) commented Aug 19, 2025

Purpose

In the current configuration, both GPT-OSS models would sometimes (in certain scenarios, reliably) emit messages on the commentary channel despite not having access to any tools. This behavior is especially prevalent when the models are used with non-native tool calling, e.g. the "Code Interpreter" tool of Open WebUI, which asks the model to call the tool within <code_interpreter></code_interpreter> tags. Most of the time, such calls would be emitted neither on the final channel nor even the analysis channel, but on commentary. Generation eventually stops without the requester ever receiving the tool-calling content.

When researching the harmony library, I found that the active channels are set in the system message and that by default analysis, commentary, and final are all active. Because vLLM currently doesn't configure the channels explicitly, this default is used regardless of which tools are active. This PR changes that, so that the commentary channel is only active when at least one tool of any kind is enabled.
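
A minimal sketch of the idea, using the channel helper shown in the openai_harmony README (the has_tools plumbing and the exact shape of vLLM's get_system_message are assumptions here, not the literal diff):

```python
from openai_harmony import Message, Role, SystemContent

def get_system_message(has_tools: bool = False) -> Message:
    # Harmony's default advertises analysis, commentary, and final.
    # Only advertise commentary when at least one tool is actually available.
    channels = (["analysis", "commentary", "final"]
                if has_tools else ["analysis", "final"])
    content = SystemContent.new().with_required_channels(channels)
    return Message.from_role_and_content(Role.SYSTEM, content)
```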

I don't expect any regression from this, but evaluations should be re-run to check for potential effects. If anything, this should improve results.

Test Plan

Testing with Open WebUI via LiteLLM against the /chat/completions and /responses APIs of vLLM (using GPT-OSS 20B), with no tools, "non-native" tools, and native tools, while observing the generated system message / activated channels.

Testing whether the Code Interpreter "tool" of Open WebUI can now be used reliably.
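
For reference, a minimal no-tools request of the kind used in this plan, via the standard OpenAI client pointed at vLLM's server (base URL and model name are assumptions):

```python
from openai import OpenAI

# vLLM exposes an OpenAI-compatible API; base_url and model are assumed here.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
resp = client.chat.completions.create(
    model="openai/gpt-oss-20b",
    messages=[{"role": "user", "content": "approximate pi"}],
)
print(resp.choices[0].message.content)
```

With the fix, such a request should render a system message that does not advertise the commentary channel.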

Test Result

The commentary channel is only active when tools are passed. The Code Interpreter tool of Open WebUI is now used reliably by the model.

@Ithanil requested a review from aarnphm as a code owner on August 19, 2025 08:19
@mergify bot added the frontend and gpt-oss (Related to GPT-OSS models) labels on Aug 19, 2025
@gemini-code-assist bot (Contributor) left a comment


Code Review

This pull request correctly modifies the logic to enable the commentary channel for GPT-OSS models only when tools are active. This is done by adding a has_tools flag to get_system_message and conditionally including the commentary channel. The changes in serving_chat.py and serving_responses.py correctly pass this new flag. My review includes a suggestion to improve the implementation in harmony_utils.py for better type safety and maintainability.

@github-actions bot commented

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default; instead, only fastcheck CI runs, covering a small and essential subset of tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

@Ithanil force-pushed the conditionally_enable_commentary branch from 6dae3bb to 8a9de2d on August 19, 2025 08:23
Signed-off-by: Jan Kessler <jakessle@uni-mainz.de>
@Ithanil force-pushed the conditionally_enable_commentary branch from 950dc3d to 004884b on August 21, 2025 08:11
@Ithanil force-pushed the conditionally_enable_commentary branch from 004884b to de640c1 on August 21, 2025 09:02
Signed-off-by: Jan Kessler <jakessle@uni-mainz.de>
@Ithanil force-pushed the conditionally_enable_commentary branch from de640c1 to afaaa46 on August 21, 2025 09:32
@heheda12345 (Collaborator) left a comment


If the bug is

> Generation eventually stops without the requester ever receiving the tool-calling content.

can you fix it by sending these messages to the requester properly? Modifying the system prompt can be dangerous, and it's hard to evaluate whether it affects model performance.

@Ithanil (Contributor, Author) commented Aug 21, 2025

> If the bug is
>
> > Generation eventually stops without the requester ever receiving the tool-calling content.
>
> can you fix it by sending these messages to the requester properly? Modifying the system prompt can be dangerous, and it's hard to evaluate whether it affects model performance.

In my opinion, the actual "bug" is that the model sometimes emits messages on the commentary channel that do not belong there. According to the cookbook (https://cookbook.openai.com/articles/openai-harmony), the channels are supposed to be used as follows:

| Channel | Purpose |
| --- | --- |
| final | Messages tagged in the final channel are intended to be shown to the end-user and represent the responses from the model. |
| analysis | Messages used by the model for its chain of thought (CoT). Important: messages in the analysis channel do not adhere to the same safety standards as final messages do. Avoid showing these to end-users. |
| commentary | Any function tool call will typically be triggered on the commentary channel, while built-in tools will normally be triggered on the analysis channel. However, occasionally built-in tools will still be output to commentary. Occasionally this channel might also be used by the model to generate a preamble to calling multiple functions. |

Following this, I would say that the model emitting tokens on the commentary channel despite having no available tools whatsoever is unexpected. In my testing, specifying the valid channels as ["analysis", "final"] via the mechanism provided by the harmony library successfully prevents that behavior, making the model play along nicely with things like Open WebUI's code interpreter. To be honest, it appears to me as if the channel configuration option was added and trained exactly to better guide the model in such scenarios, but that's just speculation.
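
For illustration, the rendered harmony system message ends with a valid-channels line; without tools, the change would presumably produce something like this (sketched from the harmony format docs, with the other system fields elided):

```
<|start|>system<|message|>You are ChatGPT, a large language model trained by OpenAI.
...
# Valid channels: analysis, final. Channel must be included for every message.<|end|>
```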

From my preliminary testing, I see no indication of degraded benchmark performance. I would like to invite more people to test the model in this configuration, though; I think it is interesting to understand.

Alternatively, if we wanted to leave commentary on, how do we decide what to do with non-tool-calling content on the commentary channel? Just push it to regular content by default?

@heheda12345 (Collaborator) commented

I prefer to regard it as a message like those in the analysis channel. Is there any known problem with this solution?

@Ithanil (Contributor, Author) commented Aug 21, 2025

> I prefer to regard it as a message like those in the analysis channel. Is there any known problem with this solution?

Yes, I think so: in the code interpreter example, for instance, regular message content is expected, not reasoning content. So if anything, "extra" commentary content should become regular message content. In the end, I think this is also what is supposed to happen with the "preamble" (see the table above) before multiple tool calls:

> **Preambles**
>
> At times the model might choose to generate a "preamble" to inform the user about the tools it is about to call. For example, when it plans to call multiple tools. If this is the case it will generate an assistant message on the commentary channel that, unlike the chain-of-thought, is intended to be shown to the end-user.

I'm happy to try and compare a solution that handles extra commentary content as regular message content!
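
For the sake of discussion, a minimal sketch of such routing (the message shape and all names are made up for illustration; this is not vLLM's actual parser):

```python
from dataclasses import dataclass

@dataclass
class ParsedMessage:
    channel: str                  # "analysis" | "commentary" | "final"
    text: str
    recipient: str | None = None  # e.g. "functions.get_weather" for a tool call

def route(messages: list[ParsedMessage]) -> dict:
    out = {"reasoning": "", "content": "", "tool_calls": []}
    for m in messages:
        if m.channel == "analysis":
            out["reasoning"] += m.text    # chain of thought, not user-facing
        elif m.channel == "commentary" and m.recipient:
            out["tool_calls"].append(m)   # proper function tool call
        else:
            out["content"] += m.text      # final answer or commentary "preamble"
    return out
```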

@heheda12345 (Collaborator) commented

Can you try implementing better handling of the model output? And can you give me an example?

@Ithanil (Contributor, Author) commented Aug 21, 2025

> Can you try implementing better handling of the model output? And can you give me an example?

I think we should wait for #22386 to be merged before doing further work on the parsing in harmony utils.

Regarding the example: Open WebUI with the code interpreter enabled sends the following prompt together with the user instruction:

````python
DEFAULT_CODE_INTERPRETER_PROMPT = """
#### Tools Available

1. **Code Interpreter**: `<code_interpreter type="code" lang="python"></code_interpreter>`
   - You have access to a Python shell that runs directly in the user's browser, enabling fast execution of code for analysis, calculations, or problem-solving.  Use it in this response.
   - The Python code you write can incorporate a wide array of libraries, handle data manipulation or visualization, perform API calls for web-related tasks, or tackle virtually any computational challenge. Use this flexibility to **think outside the box, craft elegant solutions, and harness Python's full potential**.
   - To use it, **you must enclose your code within `<code_interpreter type="code" lang="python">` XML tags** and stop right away. If you don't, the code won't execute.
   - When writing code in the code_interpreter XML tag, Do NOT use the triple backticks code block for markdown formatting, example: ```py # python code ``` will cause an error because it is markdown formatting, it is not python code.
   - When coding, **always aim to print meaningful outputs** (e.g., results, tables, summaries, or visuals) to better interpret and verify the findings. Avoid relying on implicit outputs; prioritize explicit and clear print statements so the results are effectively communicated to the user.
   - After obtaining the printed output, **always provide a concise analysis, interpretation, or next steps to help the user understand the findings or refine the outcome further.**
   - If the results are unclear, unexpected, or require validation, refine the code and execute it again as needed. Always aim to deliver meaningful insights from the results, iterating if necessary.
   - **If a link to an image, audio, or any file is provided in markdown format in the output, ALWAYS regurgitate word for word, explicitly display it as part of the response to ensure the user can access it easily, do NOT change the link.**
   - All responses should be communicated in the chat's primary language, ensuring seamless understanding. If the chat is multilingual, default to English for clarity.

Ensure that the tools are effectively utilized to achieve the highest-quality analysis for the user."""
````

I tried simple examples like "approximate pi" or similar. With commentary enabled, the success rate for receiving the non-native tool call in the message content via /chat/completions is pretty low.
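
To illustrate the failure mode: the tagged code typically comes out as an assistant message on the commentary channel, which /chat/completions then drops from the message content. In harmony's rendering it looks roughly like this (the code itself is made up; the token framing follows the harmony format docs):

```
<|start|>assistant<|channel|>commentary<|message|><code_interpreter type="code" lang="python">
print(4 * sum((-1) ** k / (2 * k + 1) for k in range(1_000_000)))
</code_interpreter><|end|>
```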

@heheda12345 (Collaborator) commented

To confirm: you want to let the model write some code for your local interpreter, but you don't want it to be either a builtin python tool call or a user-defined tool, and you want it returned as plain text.
Note that user-defined tools should be replied to in commentary, so you need to keep that channel if you regard your interpreter as a user-defined tool.
I'm still skeptical about changing the system prompt, but you can do it in your local fork if you really want it.

@Ithanil (Contributor, Author) commented Aug 22, 2025

> To confirm: you want to let the model write some code for your local interpreter, but you don't want it to be either a builtin python tool call or a user-defined tool, and you want it returned as plain text. Note that user-defined tools should be replied to in commentary, so you need to keep that channel if you regard your interpreter as a user-defined tool. I'm still skeptical about changing the system prompt, but you can do it in your local fork if you really want it.

To be clear, that's not something I want to do myself, but this type of generic function calling (bypassing the API's tools= and tool_calls=) is what a lot of applications do, and it just doesn't work well with these models.

Personally, I'm actually going to use this configuration in production until I know why I shouldn't. Regardless, given the information from the cookbook about "preambles", parsing regular messages from the commentary channel is still something that should be implemented.

@Ithanil (Contributor, Author) commented Aug 23, 2025

#23386 (comment)

@Ithanil closed this Sep 13, 2025