[gpt-oss] Harmony changes with container tool support #23386
Conversation
Code Review
This pull request introduces support for container tools, system prompt customization, and session management improvements for Harmony. The changes look good overall, but I've identified a couple of high-severity issues. One is a design flaw in ConversationContext where a method signature in a subclass is incompatible with its abstract base class, violating the Liskov Substitution Principle. The other is a bug in get_system_message that can lead to an incorrect system prompt when instructions are provided without a base model identity. I've provided suggestions to fix both issues.
        
          
vllm/entrypoints/context.py (Outdated)
The signature of cleanup_session in the abstract base class ConversationContext is async def cleanup_session(self) -> None:. However, the implementation in HarmonyContext is async def cleanup_session(self, *args, **kwargs) -> None:. This violates the Liskov Substitution Principle, as the subclass method has a different signature.
While Python's dynamic nature might allow this at runtime, it's a design issue that can be flagged by static analysis tools and lead to confusion. The *args, **kwargs are necessary because AsyncExitStack.push_async_exit can pass exception details to the cleanup function.
To fix this, the signature in the base class and all implementing classes should be consistent. Please update this abstract method and the implementation in SimpleContext to match the one in HarmonyContext.
```diff
 @abstractmethod
-async def cleanup_session(self) -> None:
+async def cleanup_session(self, *args, **kwargs) -> None:
     pass
```
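The point about `AsyncExitStack.push_async_exit` can be checked with a small self-contained sketch (`DemoContext` is a hypothetical stand-in, not vLLM's actual class): when a plain coroutine function is registered, the stack invokes it with the three `__aexit__`-style exception arguments on unwind, so a `cleanup_session(self)` signature would raise a `TypeError`.

```python
import asyncio
from contextlib import AsyncExitStack

class DemoContext:
    """Hypothetical stand-in for a conversation context."""

    async def cleanup_session(self, *args, **kwargs) -> None:
        # push_async_exit registers this as an __aexit__-style callback,
        # so it is invoked with (exc_type, exc, tb) when the stack unwinds.
        print(f"cleanup called with {len(args)} positional args")

async def main() -> None:
    ctx = DemoContext()
    async with AsyncExitStack() as exit_stack:
        exit_stack.push_async_exit(ctx.cleanup_session)
        # ... request handling would happen here ...
    # by this point cleanup_session has already run

asyncio.run(main())  # prints: cleanup called with 3 positional args
```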
        
          
vllm/entrypoints/harmony_utils.py (Outdated)
There is a potential bug here. If model_identity is None (its default) and instructions are provided, sys_msg_content.model_identity will be None. Consequently, current_identity will be None, and the new model identity will be set to the string "None\n<instructions>", which is likely not the intended behavior.
You should handle the case where current_identity is None to avoid prepending the string "None".
```diff
 current_identity = sys_msg_content.model_identity
-sys_msg_content = sys_msg_content.with_model_identity(
-    f"{current_identity}\n{instructions}")
+new_identity = f"{current_identity}\n{instructions}" if current_identity else instructions
+sys_msg_content = sys_msg_content.with_model_identity(new_identity)
```
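The behavior of the suggested fix can be illustrated with a minimal sketch. `DemoContent` here is a hypothetical stand-in for the harmony system-content object (the real `openai_harmony` API may differ); `apply_instructions` mirrors the conditional in the suggestion.

```python
from dataclasses import dataclass, replace
from typing import Optional

@dataclass(frozen=True)
class DemoContent:
    """Hypothetical stand-in for the harmony system-content object."""
    model_identity: Optional[str] = None

    def with_model_identity(self, identity: str) -> "DemoContent":
        return replace(self, model_identity=identity)

def apply_instructions(content: DemoContent, instructions: str) -> DemoContent:
    current_identity = content.model_identity
    # Avoid prepending the literal string "None" when no identity is set.
    new_identity = (f"{current_identity}\n{instructions}"
                    if current_identity else instructions)
    return content.with_model_identity(new_identity)

# With no base identity, the instructions are used directly.
print(apply_instructions(DemoContent(), "Be concise.").model_identity)
# With a base identity, the instructions are appended on a new line.
print(apply_instructions(DemoContent("You are a model."),
                         "Be concise.").model_identity)
```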
> What is AIME25 high without this PR?

> With tools it is similar. The reproduced numbers are not very stable with tools, actually.
We should not change the default behavior.
Using the container tool as a built-in should be opt-in through an environment variable, and ultimately we should get rid of this hack by making the system prompt easier to modify so vLLM is more friendly for research use cases.
+1. I'm also getting other feature requests to customize the system prompt, like #23167, so I think we need to allow users to do that in some way.
Force-pushed from 4754419 to 5eff826 (compare)
            
          
vllm/entrypoints/context.py (Outdated)
What is the container tool?
        
          
vllm/entrypoints/tool_server.py (Outdated)
For the session cleanup logic, maybe just add it here?
Right now tool sessions are not managed in the tool server lifecycle but with the context lifecycle, so it has to be attached there unless we do a refactor.
But I think that regardless of whether we use exit_stack.push_async_exit(context.cleanup_session) or implement the exit logic here, the cleanup will be called when the code goes out of the async with AsyncExitStack() as exit_stack: block?
And I thought different requests should have different sessions, so tool sessions should not be managed in the tool server lifecycle. Is this understanding correct?
This is in the tool server; everything should be managed in the context, as it has the actual management of sessions and is per-request based.
        
          
vllm/entrypoints/harmony_utils.py (Outdated)
If I'm not mistaken, this basically covers #23167, so if the present PR gets merged, mine is obsolete.
        
          
vllm/envs.py (Outdated)
```diff
-    "VLLM_USE_CONTAINER_TOOL":
-    lambda: bool(int(os.getenv("VLLM_USE_CONTAINER_TOOL", "0"))),
-    # Allows harmony instructions to be injected on system messages
-    "VLLM_HARMONY_SYSTEM_INSTRUCTIONS":
-    lambda: bool(int(os.getenv("VLLM_HARMONY_SYSTEM_INSTRUCTIONS", "0"))),
+    "VLLM_GPT_OSS_USE_CONTAINER_TOOL":
+    lambda: bool(int(os.getenv("VLLM_GPT_OSS_USE_CONTAINER_TOOL", "0"))),
+    # Allows harmony instructions to be injected on system messages
+    "VLLM_GPT_OSS_HARMONY_SYSTEM_INSTRUCTIONS":
+    lambda: bool(int(os.getenv("VLLM_GPT_OSS_HARMONY_SYSTEM_INSTRUCTIONS", "0"))),
```
These feel heavily hardcoded to gpt-oss. What about adding a GPT_OSS prefix?
Sure, that might be better. Let me change it on both sides.
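The registry entries discussed above parse "0"/"1" strings into booleans. A minimal sketch of that pattern (`env_flag` is an illustrative helper, not vLLM's API):

```python
import os

def env_flag(name: str, default: str = "0") -> bool:
    # "0" -> False, any other integer string -> True; non-integer
    # values raise ValueError, same as the lambda pattern in envs.py.
    return bool(int(os.getenv(name, default)))

os.environ["VLLM_GPT_OSS_USE_CONTAINER_TOOL"] = "1"
print(env_flag("VLLM_GPT_OSS_USE_CONTAINER_TOOL"))  # True

del os.environ["VLLM_GPT_OSS_USE_CONTAINER_TOOL"]
print(env_flag("VLLM_GPT_OSS_USE_CONTAINER_TOOL"))  # False (default "0")
```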
> This pull request has merge conflicts that must be resolved before it can be merged.
Force-pushed from 4dac698 to 10bb2cf (compare)
Force-pushed from 3afe81f to 503091c (compare)
- Given MCP + stream generator is supported, can you also update responses_stream_generator?
- I think we still need discussion about session cleanup logic. Let's sync offline.
- Can you add VLLM_GPT_OSS_USE_CONTAINER_TOOL and VLLM_GPT_OSS_HARMONY_SYSTEM_INSTRUCTIONS to the recipe?
        
          
vllm/entrypoints/harmony_utils.py (Outdated)
Add a comment to explain the tools? What are the input and output expectations?
        
          
vllm/entrypoints/context.py (Outdated)
Should this be something like raise NotImplementedError("Should not be called.")? So we can force users to carefully implement the cleanup logic.
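The suggestion above can be sketched as follows; the class names are hypothetical stand-ins for the real base and subclass:

```python
import asyncio

class BaseContextSketch:
    """Hypothetical sketch of the suggested base-class behavior."""

    async def cleanup_session(self, *args, **kwargs) -> None:
        # Raising here forces every subclass to implement cleanup
        # explicitly instead of silently inheriting a no-op.
        raise NotImplementedError("Should not be called.")

class HarmonyContextSketch(BaseContextSketch):
    async def cleanup_session(self, *args, **kwargs) -> None:
        pass  # a real implementation would close tool sessions here

try:
    asyncio.run(BaseContextSketch().cleanup_session())
except NotImplementedError as e:
    print(e)  # prints: Should not be called.

asyncio.run(HarmonyContextSketch().cleanup_session())  # no error
```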
> @lacora2017 has imported this pull request. If you are a Meta employee, you can view this in D81562641.
Some minor comments to address.
> For 1, I checked the responses_stream_generator; the only change is in calling
@morgendave this should be VLLM_GPT_OSS_USE_CONTAINER_TOOL
Force-pushed from b58e933 to c285b53 (compare)
> @heheda12345 Regarding cleanup: if it's minor, let's put it up as a follow-up for further discussion and not block this PR. Could you or @morgendave summarize the offline discussion a bit so that we can discuss it broadly as well?
> Entrypoint failure seems related.

> Which failure? I saw only one that was GPU-memory related.
> @lacora2017 has imported this pull request. If you are a Meta employee, you can view this in D81562641.

> This pull request has merge conflicts that must be resolved before it can be merged.
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
> I merged from main to this. Let's wait for CI to run (possibly flaky CI).
Signed-off-by: Lu Fang <30275821+houseroad@users.noreply.github.com>
> The failed test seems related.

> I think the failed test itself is flaky, not a problem with this PR. The output is just asking whether to use Celsius or Fahrenheit. If we remove the option for the temperature in the tool itself, the test passes; verified multiple times.
…3386) Signed-off-by: zhiweiz <zhiweiz@fb.com> Signed-off-by: Aaron Pham <contact@aarnphm.xyz> Signed-off-by: Lu Fang <30275821+houseroad@users.noreply.github.com> Co-authored-by: zhiweiz <zhiweiz@fb.com> Co-authored-by: Aaron Pham <contact@aarnphm.xyz> Co-authored-by: Simon Mo <simon.mo@hey.com> Co-authored-by: Lu Fang <30275821+houseroad@users.noreply.github.com>
…3386) Signed-off-by: zhiweiz <zhiweiz@fb.com> Signed-off-by: Aaron Pham <contact@aarnphm.xyz> Signed-off-by: Lu Fang <30275821+houseroad@users.noreply.github.com> Co-authored-by: zhiweiz <zhiweiz@fb.com> Co-authored-by: Aaron Pham <contact@aarnphm.xyz> Co-authored-by: Simon Mo <simon.mo@hey.com> Co-authored-by: Lu Fang <30275821+houseroad@users.noreply.github.com> Signed-off-by: xuebwang-amd <xuebwang@amd.com>
The container tool was referenced in the structural tag implementation, but ToolNamespaceConfig.container() does not exist in the openai_harmony library, which would cause an AttributeError at runtime if the container tool was used. Changes:

- Remove container tool handling from from_builtin_tool_to_tag() in gptoss_reasoning_parser.py
- Remove enable_container logic from serving_responses.py
- Update tests to remove container tool test cases
- Keep all structural tag improvements for browser and python tools

The container tool support was originally added in PR vllm-project#23386, but the openai_harmony library never implemented ToolNamespaceConfig.container(). This commit removes only the non-functional container references while preserving all working functionality.

🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
Purpose

Test Plan
Tested on gpt-oss with evals and sample requests. The Jupyter python tool score for AIME25 is slightly lower than the model card.

Test Result
- OSS tested
- Parsed messages
- AIME25 high