[gpt-oss] Harmony changes with container tool support #23386
Conversation
Code Review
This pull request introduces support for container tools, system prompt customization, and session management improvements for Harmony. The changes look good overall, but I've identified a couple of high-severity issues. One is a design flaw in ConversationContext where a method signature in a subclass is incompatible with its abstract base class, violating the Liskov Substitution Principle. The other is a bug in get_system_message that can lead to an incorrect system prompt when instructions are provided without a base model identity. I've provided suggestions to fix both issues.
        
          
vllm/entrypoints/context.py (Outdated)
The signature of cleanup_session in the abstract base class ConversationContext is async def cleanup_session(self) -> None:. However, the implementation in HarmonyContext is async def cleanup_session(self, *args, **kwargs) -> None:. This violates the Liskov Substitution Principle, as the subclass method has a different signature.
While Python's dynamic nature might allow this at runtime, it's a design issue that can be flagged by static analysis tools and lead to confusion. The *args, **kwargs are necessary because AsyncExitStack.push_async_exit can pass exception details to the cleanup function.
To fix this, the signature in the base class and all implementing classes should be consistent. Please update this abstract method and the implementation in SimpleContext to match the one in HarmonyContext.
```diff
 @abstractmethod
-async def cleanup_session(self) -> None:
+async def cleanup_session(self, *args, **kwargs) -> None:
     pass
```
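The point about `AsyncExitStack.push_async_exit` can be checked with a small self-contained sketch (`DemoContext` is a hypothetical stand-in, not vLLM's actual class): when a plain coroutine function is registered, the stack invokes it with the three `__aexit__`-style exception arguments on unwind, so a `cleanup_session(self)` signature would raise a `TypeError`.

```python
import asyncio
from contextlib import AsyncExitStack

class DemoContext:
    """Hypothetical stand-in for a conversation context."""

    async def cleanup_session(self, *args, **kwargs) -> None:
        # push_async_exit registers this as an __aexit__-style callback,
        # so it is invoked with (exc_type, exc, tb) when the stack unwinds.
        print(f"cleanup called with {len(args)} positional args")

async def main() -> None:
    ctx = DemoContext()
    async with AsyncExitStack() as exit_stack:
        exit_stack.push_async_exit(ctx.cleanup_session)
        # ... request handling would happen here ...
    # by this point cleanup_session has already run

asyncio.run(main())  # prints: cleanup called with 3 positional args
```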
        
          
vllm/entrypoints/harmony_utils.py (Outdated)
There is a potential bug here. If model_identity is None (its default) and instructions are provided, sys_msg_content.model_identity will be None. Consequently, current_identity will be None, and the new model identity will be set to the string "None\n<instructions>", which is likely not the intended behavior.
You should handle the case where current_identity is None to avoid prepending the string "None".
```diff
 current_identity = sys_msg_content.model_identity
-sys_msg_content = sys_msg_content.with_model_identity(
-    f"{current_identity}\n{instructions}")
+new_identity = f"{current_identity}\n{instructions}" if current_identity else instructions
+sys_msg_content = sys_msg_content.with_model_identity(new_identity)
```
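The behavior of the suggested fix can be illustrated with a minimal sketch. `DemoContent` here is a hypothetical stand-in for the harmony system-content object (the real `openai_harmony` API may differ); `apply_instructions` mirrors the conditional in the suggestion.

```python
from dataclasses import dataclass, replace
from typing import Optional

@dataclass(frozen=True)
class DemoContent:
    """Hypothetical stand-in for the harmony system-content object."""
    model_identity: Optional[str] = None

    def with_model_identity(self, identity: str) -> "DemoContent":
        return replace(self, model_identity=identity)

def apply_instructions(content: DemoContent, instructions: str) -> DemoContent:
    current_identity = content.model_identity
    # Avoid prepending the literal string "None" when no identity is set.
    new_identity = (f"{current_identity}\n{instructions}"
                    if current_identity else instructions)
    return content.with_model_identity(new_identity)

# With no base identity, the instructions are used directly.
print(apply_instructions(DemoContent(), "Be concise.").model_identity)
# With a base identity, the instructions are appended on a new line.
print(apply_instructions(DemoContent("You are a model."),
                         "Be concise.").model_identity)
```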
> What is AIME25 high without this PR?

> With tools it is similar. The reproduced numbers are not very stable with tools, actually.
We should not change the default behavior.
Using the container tool as a built-in should be opt-in through an environment variable, and ultimately we should get rid of this hack by making the system prompt easier to modify so vLLM is more friendly for research use cases.
+1. I'm also getting other feature requests to customize the system prompt, like #23167, so I think we need to allow users to do that in some way.
Force-pushed from 4754419 to 5eff826 (compare)
            
          
vllm/entrypoints/context.py (Outdated)
What is the container tool?
        
          
vllm/entrypoints/tool_server.py (Outdated)
For the session cleanup logic, maybe just add it here?
Right now tool sessions are not managed in the tool server lifecycle but with the context lifecycle, so it has to be attached there unless we do a refactor.
But I think that regardless of whether we use exit_stack.push_async_exit(context.cleanup_session) or implement the exit logic here, the cleanup will be called when the code goes out of the async with AsyncExitStack() as exit_stack: block?
And I thought different requests should have different sessions, so tool sessions should not be managed in the tool server lifecycle. Is this understanding correct?
This is in the tool server; everything should be managed in the context, as it has the actual management of sessions and is per-request based.
        
          
vllm/entrypoints/harmony_utils.py (Outdated)
If I'm not mistaken, this basically covers #23167, so if the present PR gets merged, mine is obsolete.
        
          
vllm/envs.py (Outdated)
```diff
-    "VLLM_USE_CONTAINER_TOOL":
-    lambda: bool(int(os.getenv("VLLM_USE_CONTAINER_TOOL", "0"))),
-    # Allows harmony instructions to be injected on system messages
-    "VLLM_HARMONY_SYSTEM_INSTRUCTIONS":
-    lambda: bool(int(os.getenv("VLLM_HARMONY_SYSTEM_INSTRUCTIONS", "0"))),
+    "VLLM_GPT_OSS_USE_CONTAINER_TOOL":
+    lambda: bool(int(os.getenv("VLLM_GPT_OSS_USE_CONTAINER_TOOL", "0"))),
+    # Allows harmony instructions to be injected on system messages
+    "VLLM_GPT_OSS_HARMONY_SYSTEM_INSTRUCTIONS":
+    lambda: bool(int(os.getenv("VLLM_GPT_OSS_HARMONY_SYSTEM_INSTRUCTIONS", "0"))),
```
These feel heavily hardcoded to gpt-oss. What about adding a GPT_OSS prefix?
Sure, that might be better. Let me change it on both sides.
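The registry entries discussed above parse "0"/"1" strings into booleans. A minimal sketch of that pattern (`env_flag` is an illustrative helper, not vLLM's API):

```python
import os

def env_flag(name: str, default: str = "0") -> bool:
    # "0" -> False, any other integer string -> True; non-integer
    # values raise ValueError, same as the lambda pattern in envs.py.
    return bool(int(os.getenv(name, default)))

os.environ["VLLM_GPT_OSS_USE_CONTAINER_TOOL"] = "1"
print(env_flag("VLLM_GPT_OSS_USE_CONTAINER_TOOL"))  # True

del os.environ["VLLM_GPT_OSS_USE_CONTAINER_TOOL"]
print(env_flag("VLLM_GPT_OSS_USE_CONTAINER_TOOL"))  # False (default "0")
```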
> This pull request has merge conflicts that must be resolved before it can be merged.
Force-pushed from 4dac698 to 10bb2cf (compare)
Force-pushed from 3afe81f to 503091c (compare)
- Given MCP + stream generator is supported, can you also update responses_stream_generator?
- I think we still need discussion about session cleanup logic. Let's sync offline.
- Can you add VLLM_GPT_OSS_USE_CONTAINER_TOOL and VLLM_GPT_OSS_HARMONY_SYSTEM_INSTRUCTIONS to the recipe?
        
          
vllm/entrypoints/harmony_utils.py (Outdated)
Add a comment to explain the tools? What are the input and output expectations?
        
          
vllm/entrypoints/context.py (Outdated)
Should this be something like raise NotImplementedError("Should not be called.")? So we can force users to carefully implement the cleanup logic.
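The suggestion above can be sketched as follows; the class names are hypothetical stand-ins for the real base and subclass:

```python
import asyncio

class BaseContextSketch:
    """Hypothetical sketch of the suggested base-class behavior."""

    async def cleanup_session(self, *args, **kwargs) -> None:
        # Raising here forces every subclass to implement cleanup
        # explicitly instead of silently inheriting a no-op.
        raise NotImplementedError("Should not be called.")

class HarmonyContextSketch(BaseContextSketch):
    async def cleanup_session(self, *args, **kwargs) -> None:
        pass  # a real implementation would close tool sessions here

try:
    asyncio.run(BaseContextSketch().cleanup_session())
except NotImplementedError as e:
    print(e)  # prints: Should not be called.

asyncio.run(HarmonyContextSketch().cleanup_session())  # no error
```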
> @lacora2017 has imported this pull request. If you are a Meta employee, you can view this in D81562641.
Some minor comments to address.
> For 1, I checked the responses_stream_generator; the only change is in calling
@morgendave this should be VLLM_GPT_OSS_USE_CONTAINER_TOOL
Force-pushed from b58e933 to c285b53 (compare)
> @heheda12345 Regarding cleanup: if it's minor, let's put it up as a follow-up for further discussion and not block this PR. Could you or @morgendave summarize the offline discussion a bit so that we can discuss it broadly as well?
> Entrypoint failure seems related.

> Which failure? I saw only one that was GPU-memory related.
> @lacora2017 has imported this pull request. If you are a Meta employee, you can view this in D81562641.

> This pull request has merge conflicts that must be resolved before it can be merged.
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
> I merged from main to this. Let's wait for CI to run (possibly flaky CI).
Signed-off-by: Lu Fang <30275821+houseroad@users.noreply.github.com>
> The failed test seems related.

> I think the failed test itself is flaky, not a problem with this PR. The output is just asking whether to use Celsius or Fahrenheit. If we remove the option for the temperature in the tool itself, the test passes; verified multiple times.
…3386) Signed-off-by: zhiweiz <zhiweiz@fb.com> Signed-off-by: Aaron Pham <contact@aarnphm.xyz> Signed-off-by: Lu Fang <30275821+houseroad@users.noreply.github.com> Co-authored-by: zhiweiz <zhiweiz@fb.com> Co-authored-by: Aaron Pham <contact@aarnphm.xyz> Co-authored-by: Simon Mo <simon.mo@hey.com> Co-authored-by: Lu Fang <30275821+houseroad@users.noreply.github.com>
…3386) Signed-off-by: zhiweiz <zhiweiz@fb.com> Signed-off-by: Aaron Pham <contact@aarnphm.xyz> Signed-off-by: Lu Fang <30275821+houseroad@users.noreply.github.com> Co-authored-by: zhiweiz <zhiweiz@fb.com> Co-authored-by: Aaron Pham <contact@aarnphm.xyz> Co-authored-by: Simon Mo <simon.mo@hey.com> Co-authored-by: Lu Fang <30275821+houseroad@users.noreply.github.com> Signed-off-by: xuebwang-amd <xuebwang@amd.com>
The container tool was referenced in the structural tag implementation, but ToolNamespaceConfig.container() does not exist in the openai_harmony library, which would cause an AttributeError at runtime if the container tool was used. Changes:

- Remove container tool handling from from_builtin_tool_to_tag() in gptoss_reasoning_parser.py
- Remove enable_container logic from serving_responses.py
- Update tests to remove container tool test cases
- Keep all structural tag improvements for browser and python tools

The container tool support was originally added in PR vllm-project#23386, but the openai_harmony library never implemented ToolNamespaceConfig.container(). This commit removes only the non-functional container references while preserving all working functionality.

🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
Purpose

Test Plan
Tested on gpt-oss with evals and sample requests. The Jupyter python tool score for AIME25 is slightly lower than the model card.

Test Result
- OSS tested
- Parsed messages
- AIME25 high