Conversation

@morgendave
Collaborator

@morgendave morgendave commented Aug 21, 2025

Purpose

  • Support system prompt customization via injections
  • Remove developer messages if all enabled tools are builtin tools
  • Support container tool
  • Support context with session ID on MCP
  • Support MCP session cleanup for container

Test Plan

Tested on gpt-oss with evals and sample requests.
The Jupyter python tool for AIME25 scores slightly lower than the model card.

Test Result

OSS tested
Parsed messages

Harmony messages: [Message(author=Author(role=<Role.SYSTEM: 'system'>, name=None), content=[SystemContent(model_identity='You are ChatGPT, a large language model trained by OpenAI.\nRespond in Chinese.', reasoning_effort=<ReasoningEffort.MEDIUM: 'Medium'>, conversation_start_date='2025-08-14', knowledge_cutoff='2024-06', channel_config=ChannelConfig(valid_channels=['analysis', 'final'], channel_required=True), tools={'browser': ToolNamespaceConfig(name='browser', description='Tool for browsing.\nThe `cursor` appears in brackets before each browsing display: `[{cursor}]`.\nCite information from the tool using the following format:\n`【{cursor}†L{line_start}(-L{line_end})?】`, for example: `【6†L9-L11】` or `【8†L3】`. \nDo not quote more than 10 words directly from the tool output.\nsources=web', tools=[ToolDescription(name='search', description='web search tool', parameters={'properties': {'query': {'type': 'string'}}, 'required': ['query'], 'type': 'object'})])})], channel=None, recipient=None, content_type=None), Message(author=Author(role=<Role.DEVELOPER: 'developer'>, name=None), content=[DeveloperContent(instructions=None, tools=None)], channel=None, recipient=None, content_type=None), Message(author=Author(role=<Role.USER: 'user'>, name=None), content=[TextContent(text="When is Alan Turing's birthday?")], channel=None, recipient=None, content_type=None)]

AIME25 high

Writing report to /tmp/aime25_120b-high_temp1.0_20250814_093031.html
{'chars': 2442.35, 'chars:std': 1036.7757283199358, 'score': 0.9458333333333333, 'score:std': 0.22634628092568448}

@morgendave morgendave requested a review from aarnphm as a code owner August 21, 2025 22:34
@mergify mergify bot added frontend gpt-oss Related to GPT-OSS models labels Aug 21, 2025
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces support for container tools, system prompt customization, and session management improvements for Harmony. The changes look good overall, but I've identified a couple of high-severity issues. One is a design flaw in ConversationContext where a method signature in a subclass is incompatible with its abstract base class, violating the Liskov Substitution Principle. The other is a bug in get_system_message that can lead to an incorrect system prompt when instructions are provided without a base model identity. I've provided suggestions to fix both issues.

Comment on lines 42 to 52
Contributor

Severity: high

The signature of cleanup_session in the abstract base class ConversationContext is async def cleanup_session(self) -> None:. However, the implementation in HarmonyContext is async def cleanup_session(self, *args, **kwargs) -> None:. This violates the Liskov Substitution Principle, as the subclass method has a different signature.

While Python's dynamic nature might allow this at runtime, it's a design issue that can be flagged by static analysis tools and lead to confusion. The *args, **kwargs are necessary because AsyncExitStack.push_async_exit can pass exception details to the cleanup function.

To fix this, the signature in the base class and all implementing classes should be consistent. Please update this abstract method and the implementation in SimpleContext to match the one in HarmonyContext.

Suggested change
- @abstractmethod
- async def cleanup_session(self) -> None:
-     pass
+ @abstractmethod
+ async def cleanup_session(self, *args, **kwargs) -> None:
+     pass
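For context on why the extra parameters are needed: `AsyncExitStack.push_async_exit` invokes its callback the way it would call an `__aexit__` method, i.e. with `(exc_type, exc_value, traceback)` on block exit. A minimal standalone sketch (hypothetical names, not the vLLM code itself):

```python
import asyncio
from contextlib import AsyncExitStack

received_args = []

async def cleanup_session(*args, **kwargs) -> None:
    # push_async_exit calls this like __aexit__, i.e. with
    # (exc_type, exc_value, traceback) when the block exits.
    received_args.append(args)

async def main():
    async with AsyncExitStack() as exit_stack:
        exit_stack.push_async_exit(cleanup_session)

asyncio.run(main())
print(received_args)  # [(None, None, None)] when no exception was raised
```

A `cleanup_session(self) -> None` signature would raise a `TypeError` at exit time, since the stack always passes the three exception arguments.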

Comment on lines 66 to 68
Contributor

Severity: high

There is a potential bug here. If model_identity is None (its default) and instructions are provided, sys_msg_content.model_identity will be None. Consequently, current_identity will be None, and the new model identity will be set to the string "None\n<instructions>", which is likely not the intended behavior.

You should handle the case where current_identity is None to avoid prepending the string "None".

Suggested change
- current_identity = sys_msg_content.model_identity
- sys_msg_content = sys_msg_content.with_model_identity(
-     f"{current_identity}\n{instructions}")
+ current_identity = sys_msg_content.model_identity
+ new_identity = (f"{current_identity}\n{instructions}"
+                 if current_identity else instructions)
+ sys_msg_content = sys_msg_content.with_model_identity(new_identity)

@cadedaniel
Collaborator

What is AIME25 high without this PR?

@morgendave
Collaborator Author

morgendave commented Aug 21, 2025

What is AIME25 high without this PR?

With tools it is similar. Reproduction of the numbers is not very stable with tools, actually.
One thing this made better is that the commentary channel showed up less when it shouldn't.

Comment on lines 629 to 632
Collaborator

We should not change the default behavior.

Using the container tool as built-in should be opt-in through some environment variable, and ultimately we should get rid of this hack by making the system prompt easier to modify so vLLM is more friendly for research use cases.

Collaborator

+1. I'm also getting other feature requests to customize the system prompt, like #23167, so I think we need to allow users to do that in some way.

@morgendave morgendave force-pushed the harmony-gpt-changes branch from 4754419 to 5eff826 Compare August 22, 2025 00:31
Collaborator

What is the container tool?

Collaborator

For the session cleanup logic, maybe just add it here?

Collaborator Author

Right now tool sessions are not managed in the tool server lifecycle but with the context lifecycle, so it has to be attached there unless we do a refactor.

Collaborator

But I think that regardless of whether we use exit_stack.push_async_exit(context.cleanup_session) or implement the exit logic here, the cleanup will be called when the code goes out of the async with AsyncExitStack() as exit_stack: block?

And I thought different requests should have different sessions, so tool sessions should not be managed in the tool server lifecycle. Is this understanding correct?
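For reference, the AsyncExitStack semantics in question can be checked with a small standalone sketch (hypothetical names, mirroring the exit_stack.push_async_exit pattern from the PR): the registered cleanup fires whenever control leaves the block, whether or not an exception was raised.

```python
import asyncio
from contextlib import AsyncExitStack

events = []

async def cleanup_session(*args, **kwargs) -> None:
    # Called on block exit; returning None lets exceptions propagate.
    events.append("cleanup")

async def request_handler(fail: bool) -> None:
    async with AsyncExitStack() as exit_stack:
        exit_stack.push_async_exit(cleanup_session)
        events.append("handling")
        if fail:
            raise RuntimeError("tool call failed")

asyncio.run(request_handler(fail=False))
try:
    asyncio.run(request_handler(fail=True))
except RuntimeError:
    pass

print(events)  # ['handling', 'cleanup', 'handling', 'cleanup']
```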

Collaborator Author

This is in the tool server; everything should be managed in the context, as it has the actual management of sessions and is per-request based.

Contributor

@Ithanil Ithanil Aug 23, 2025

If I'm not mistaken, this basically covers #23167, so if the present PR gets merged, mine is obsolete.

vllm/envs.py Outdated
Comment on lines 1160 to 1165
Collaborator

Suggested change
- "VLLM_USE_CONTAINER_TOOL":
-     lambda: bool(int(os.getenv("VLLM_USE_CONTAINER_TOOL", "0"))),
- # Allows harmony instructions to be injected on system messages
- "VLLM_HARMONY_SYSTEM_INSTRUCTIONS":
-     lambda: bool(int(os.getenv("VLLM_HARMONY_SYSTEM_INSTRUCTIONS", "0"))),
+ "VLLM_GPT_OSS_USE_CONTAINER_TOOL":
+     lambda: bool(int(os.getenv("VLLM_GPT_OSS_USE_CONTAINER_TOOL", "0"))),
+ # Allows harmony instructions to be injected on system messages
+ "VLLM_GPT_OSS_HARMONY_SYSTEM_INSTRUCTIONS":
+     lambda: bool(int(os.getenv("VLLM_GPT_OSS_HARMONY_SYSTEM_INSTRUCTIONS", "0"))),

This logic feels heavily hardcoded to gpt-oss. What about adding a GPT_OSS prefix?
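The env registry entries above follow the pattern of lazily parsing "0"/"1" flags. A standalone sketch of that pattern (generic names, not the actual vllm.envs module):

```python
import os

# Each flag is a lambda so the environment is read at access time,
# not once at import time.
_ENV_FLAGS = {
    "VLLM_GPT_OSS_USE_CONTAINER_TOOL":
        lambda: bool(int(os.getenv("VLLM_GPT_OSS_USE_CONTAINER_TOOL", "0"))),
    "VLLM_GPT_OSS_HARMONY_SYSTEM_INSTRUCTIONS":
        lambda: bool(int(os.getenv(
            "VLLM_GPT_OSS_HARMONY_SYSTEM_INSTRUCTIONS", "0"))),
}

def flag(name: str) -> bool:
    # Evaluate the registered parser for this flag.
    return _ENV_FLAGS[name]()

os.environ["VLLM_GPT_OSS_USE_CONTAINER_TOOL"] = "1"
print(flag("VLLM_GPT_OSS_USE_CONTAINER_TOOL"))  # True
```

Because the values are read lazily, setting the variable after import still takes effect, which is convenient for tests.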

Collaborator Author

Sure, that might be better. Let me change it on both sides.

@mergify

mergify bot commented Aug 27, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @morgendave.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Aug 27, 2025
@morgendave morgendave force-pushed the harmony-gpt-changes branch from 4dac698 to 10bb2cf Compare August 27, 2025 18:01
@mergify mergify bot removed the needs-rebase label Aug 27, 2025
@morgendave morgendave force-pushed the harmony-gpt-changes branch 2 times, most recently from 3afe81f to 503091c Compare September 2, 2025 21:56
Collaborator

@heheda12345 heheda12345 left a comment

  1. Given MCP + stream generator is supported, can you also update responses_stream_generator?
  2. I think we still need discussion about session cleanup logic. Let's sync offline.
  3. Can you add VLLM_GPT_OSS_USE_CONTAINER_TOOL and VLLM_GPT_OSS_HARMONY_SYSTEM_INSTRUCTIONS to the recipe?

Collaborator

Add some comments to explain the tools? What are the input and output expectations?

Collaborator

@houseroad houseroad Sep 3, 2025

Should this be something like raise NotImplementedError("Should not be called.")? That way we force users to carefully implement the cleanup logic.
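A minimal sketch of this suggestion (class names borrowed from the discussion, bodies hypothetical): the abstract base raises, so a subclass that forgets to override, or accidentally forwards to super(), fails loudly instead of silently skipping cleanup.

```python
import asyncio
from abc import ABC, abstractmethod

class ConversationContext(ABC):
    @abstractmethod
    async def cleanup_session(self, *args, **kwargs) -> None:
        # Raising here means an accidental call into the base
        # implementation fails loudly instead of no-opping.
        raise NotImplementedError("Should not be called.")

class SimpleContext(ConversationContext):
    async def cleanup_session(self, *args, **kwargs) -> None:
        # No tool sessions to tear down for plain conversations.
        pass

asyncio.run(SimpleContext().cleanup_session())  # no error
```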

@facebook-github-bot

@lacora2017 has imported this pull request. If you are a Meta employee, you can view this in D81562641.

Collaborator

@houseroad houseroad left a comment

Some minor comments to address.

@morgendave
Collaborator Author

morgendave commented Sep 3, 2025

  1. Given MCP + stream generator is supported, can you also update responses_stream_generator?
  2. I think we still need discussion about session cleanup logic. Let's sync offline.
  3. Can you add VLLM_GPT_OSS_USE_CONTAINER_TOOL and VLLM_GPT_OSS_HARMONY_SYSTEM_INSTRUCTIONS to the recipe?

For 1, I checked responses_stream_generator; the only change is in calling await context.init_tool_sessions, which is included.
For 2, I left a comment; basically the cleanup is already per-request and not managed by the tool server, as it couldn't be.
For 3, it is already in that PR.

Collaborator

@yeqcharlotte yeqcharlotte Sep 3, 2025

@morgendave this should be VLLM_GPT_OSS_USE_CONTAINER_TOOL

@lacora
Contributor

lacora commented Sep 3, 2025

@heheda12345 Regarding cleanup: if it's minor, let's put it up as a follow-up for further discussion rather than blocking this PR. Could you or @morgendave summarize the offline discussion a bit so that we can discuss it broadly as well?

@heheda12345 heheda12345 added this to the v0.10.2 milestone Sep 3, 2025
@simon-mo
Collaborator

simon-mo commented Sep 4, 2025

Entrypoint failure seems related

@morgendave
Collaborator Author

Entrypoint failure seems related

Which failure? I saw only one, which was GPU-memory related.

@facebook-github-bot

@lacora2017 has imported this pull request. If you are a Meta employee, you can view this in D81562641.

@mergify

mergify bot commented Sep 5, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @morgendave.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Sep 5, 2025
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
@mergify mergify bot removed the needs-rebase label Sep 5, 2025
@aarnphm
Collaborator

aarnphm commented Sep 5, 2025

I merged from main to this. Let's wait for CI to run (possibly flaky CI)

simon-mo and others added 2 commits September 5, 2025 18:28
Signed-off-by: Lu Fang <30275821+houseroad@users.noreply.github.com>
@houseroad houseroad enabled auto-merge (squash) September 7, 2025 00:52
@houseroad houseroad disabled auto-merge September 7, 2025 06:39
@houseroad
Collaborator

The failed test seems related.

@lacora
Contributor

lacora commented Sep 8, 2025

I think the failed test itself is flaky, not a problem with this PR.

The output is just asking whether to use Celsius or Fahrenheit.
"ChatCompletionMessage(content='Sure! Would you like the temperature in Celsius or Fahrenheit?', refusal=None, role='assistant', annotations=None, audio=None, function_call=None, tool_calls=[], reasoning_content='We need to call the function get_current_weather with city "Dallas", state "TX", unit maybe default? The user didn't specify unit. We can ask for unit? But we can choose default. Probably ask for unit? The user didn't specify. We can ask: "Would you like Celsius or Fahrenheit?" But we can also default to Fahrenheit for US. Let's ask.')"

If we remove the unit option from the tool itself, the test passes; verified multiple times.

@houseroad houseroad merged commit 170129e into vllm-project:main Sep 9, 2025
52 checks passed
eicherseiji pushed a commit to eicherseiji/vllm that referenced this pull request Sep 9, 2025
…3386)

Signed-off-by: zhiweiz <zhiweiz@fb.com>
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
Signed-off-by: Lu Fang <30275821+houseroad@users.noreply.github.com>
Co-authored-by: zhiweiz <zhiweiz@fb.com>
Co-authored-by: Aaron Pham <contact@aarnphm.xyz>
Co-authored-by: Simon Mo <simon.mo@hey.com>
Co-authored-by: Lu Fang <30275821+houseroad@users.noreply.github.com>
skyloevil pushed a commit to skyloevil/vllm that referenced this pull request Sep 13, 2025
…3386)

Signed-off-by: zhiweiz <zhiweiz@fb.com>
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
Signed-off-by: Lu Fang <30275821+houseroad@users.noreply.github.com>
Co-authored-by: zhiweiz <zhiweiz@fb.com>
Co-authored-by: Aaron Pham <contact@aarnphm.xyz>
Co-authored-by: Simon Mo <simon.mo@hey.com>
Co-authored-by: Lu Fang <30275821+houseroad@users.noreply.github.com>
FeiDaLI pushed a commit to FeiDaLI/vllm that referenced this pull request Sep 25, 2025
…3386)

Signed-off-by: zhiweiz <zhiweiz@fb.com>
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
Signed-off-by: Lu Fang <30275821+houseroad@users.noreply.github.com>
Co-authored-by: zhiweiz <zhiweiz@fb.com>
Co-authored-by: Aaron Pham <contact@aarnphm.xyz>
Co-authored-by: Simon Mo <simon.mo@hey.com>
Co-authored-by: Lu Fang <30275821+houseroad@users.noreply.github.com>
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 10, 2025
…3386)

Signed-off-by: zhiweiz <zhiweiz@fb.com>
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
Signed-off-by: Lu Fang <30275821+houseroad@users.noreply.github.com>
Co-authored-by: zhiweiz <zhiweiz@fb.com>
Co-authored-by: Aaron Pham <contact@aarnphm.xyz>
Co-authored-by: Simon Mo <simon.mo@hey.com>
Co-authored-by: Lu Fang <30275821+houseroad@users.noreply.github.com>
Signed-off-by: xuebwang-amd <xuebwang@amd.com>
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 24, 2025
…3386)

Signed-off-by: zhiweiz <zhiweiz@fb.com>
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
Signed-off-by: Lu Fang <30275821+houseroad@users.noreply.github.com>
Co-authored-by: zhiweiz <zhiweiz@fb.com>
Co-authored-by: Aaron Pham <contact@aarnphm.xyz>
Co-authored-by: Simon Mo <simon.mo@hey.com>
Co-authored-by: Lu Fang <30275821+houseroad@users.noreply.github.com>
Signed-off-by: xuebwang-amd <xuebwang@amd.com>
titanous added a commit to titanous/vllm that referenced this pull request Oct 26, 2025
The container tool was referenced in the structural tag implementation but
ToolNamespaceConfig.container() does not exist in the openai_harmony library,
which would cause AttributeError at runtime if the container tool was used.

Changes:
- Remove container tool handling from from_builtin_tool_to_tag() in gptoss_reasoning_parser.py
- Remove enable_container logic from serving_responses.py
- Update tests to remove container tool test cases
- Keep all structural tag improvements for browser and python tools

The container tool support was originally added in PR vllm-project#23386 but the
openai_harmony library never implemented ToolNamespaceConfig.container().
This commit removes only the non-functional container references while
preserving all working functionality.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
titanous added a commit to titanous/vllm that referenced this pull request Oct 27, 2025
titanous added a commit to titanous/vllm that referenced this pull request Oct 30, 2025
Labels

frontend · gpt-oss (Related to GPT-OSS models) · ready (ONLY add when PR is ready to merge/full CI is needed)
