
Add native responses#622

Merged
xingyaoww merged 40 commits into main from response
Oct 4, 2025

Conversation

@enyst
Collaborator

@enyst enyst commented Oct 2, 2025

Native responses API, take two (or four 😅)

  • responses() stateless, with its reasoning sent back to the model and displayed
  • some preparation for stateful, for another PR

This PR proposes an approach where:

  • the Agent is aware of the two APIs, completion() and responses()
  • but I've tried to keep its changes minimal, refactoring out low-level details of parsing LLM responses, and LLM tool calls
  • llm.py implements both (but we will probably change this)
    • the code is not fully clean atm, but generally I've tried to sort of keep it small and "symmetrical", so it's easy to refactor out
  • message.py, small new interface:
    • from_llm_chat_message
    • from_llm_responses_output
    • to_chat_dict
    • to_responses_dict
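The four methods above keep the two wire formats symmetrical. A minimal sketch of the shape of that interface (illustrative stand-ins only; the SDK's message.py handles far more than plain text):

```python
from dataclasses import dataclass


@dataclass
class Message:
    """Toy model of the symmetrical conversion interface described above."""

    role: str
    content: str

    @classmethod
    def from_llm_chat_message(cls, msg: dict) -> "Message":
        # Chat Completions shape: {"role": ..., "content": ...}
        return cls(role=msg["role"], content=msg.get("content") or "")

    @classmethod
    def from_llm_responses_output(cls, output: list[dict]) -> "Message":
        # Responses API returns a list of output items; join assistant text parts.
        text = "".join(
            part.get("text", "")
            for item in output
            if item.get("type") == "message"
            for part in item.get("content", [])
        )
        return cls(role="assistant", content=text)

    def to_chat_dict(self) -> dict:
        return {"role": self.role, "content": self.content}

    def to_responses_dict(self) -> dict:
        return {
            "role": self.role,
            "content": [{"type": "input_text", "text": self.content}],
        }
```

The point of the split is that each API gets one parse method and one serialize method, so neither format leaks into the other's code path.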

Agent Server images for this PR

GHCR package: https://github.com/All-Hands-AI/agent-sdk/pkgs/container/agent-server

Variants & Base Images

Variant Base Image Docs / Tags
golang golang:1.21-bookworm Link
java eclipse-temurin:17-jdk Link
python nikolaik/python-nodejs:python3.12-nodejs22 Link

Pull (multi-arch manifest)

docker pull ghcr.io/all-hands-ai/agent-server:400c58d-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-400c58d-python \
  ghcr.io/all-hands-ai/agent-server:400c58d-python

All tags pushed for this build

ghcr.io/all-hands-ai/agent-server:400c58d-golang
ghcr.io/all-hands-ai/agent-server:v1.0.0_golang_tag_1.21-bookworm_golang
ghcr.io/all-hands-ai/agent-server:400c58d-java
ghcr.io/all-hands-ai/agent-server:v1.0.0_eclipse-temurin_tag_17-jdk_java
ghcr.io/all-hands-ai/agent-server:400c58d-python
ghcr.io/all-hands-ai/agent-server:v1.0.0_nikolaik_s_python-nodejs_tag_python3.12-nodejs22_python

The 400c58d tag is a multi-arch manifest (amd64/arm64); your client pulls the right arch automatically.
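To confirm which architectures sit behind the tag, the standard Docker CLI can inspect the manifest directly:

```shell
# Lists the platform entries (amd64/arm64) in the multi-arch manifest
docker manifest inspect ghcr.io/all-hands-ai/agent-server:400c58d-python
```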

@enyst
Collaborator Author

enyst commented Oct 2, 2025

@OpenHands Extract NonExecutableActionEvent from this PR, make a new branch from main, and apply it on it. Then make a PR.

Understand well what role it has, and understand that on main branch we don't have Responses path. Make sure that NonExecutableActionEvent fulfills its role correctly on Completions path. Add unit tests to that new PR.

@openhands-ai

openhands-ai bot commented Oct 2, 2025

I'm on it! enyst can track my progress at all-hands.dev

@enyst enyst marked this pull request as draft October 3, 2025 00:34
@enyst
Collaborator Author

enyst commented Oct 3, 2025

This works again! GPT-5-mini thinks for a while when deleting a file
(this also shows the other issue)
(screenshot omitted)

@OpenHands OpenHands deleted a comment from openhands-ai bot Oct 3, 2025
@enyst enyst marked this pull request as ready for review October 3, 2025 08:43
@enyst
Collaborator Author

enyst commented Oct 3, 2025

This works! View and send back reasoning on the stateless variant, and it cleared up some of the code. At a price (llm.py is still a bit unhappy, and telemetry.py will need another pass), but IMHO it's mostly easier to read and follow the execution path for one API or the other.

I kinda like that the Agent only chooses; all the rest of its code is now without ifs and without litellm/completions code.

message.py saw a heavier lift, @xingyaoww I'd love to know what you think. I admit I like to look at a random screen of code and know exactly what belongs where. e.g. tiny lost bits like _add_tool_call_keys were not obvious (once we have responses, are they executed on responses?), so this PR took them out. I still want to reintroduce the types for parameters, but I feel it's probably best done separately.

The agent is working on porting tests.

I think llm.py needs a heavier refactoring... I'm thinking of making a facade for normalize_kwargs and format_messages_for(), and taking them out; not sure it'd reduce redundancies, but it's more for readability and more reliance on types. 🤔

This PR doesn't do a conversion; no conversion the other way around either

Collaborator

@xingyaoww xingyaoww left a comment


Did a quick skim through! I actually think the structure LGTM and makes sense! (and agree that we might need to do a good refactor afterwards)

Happy to take another look when we merge that NonExecutableActionEvent and when tests are in and passed!

🙏

enyst and others added 2 commits October 4, 2025 01:42
Co-authored-by: openhands <openhands@all-hands.dev>
@OpenHands OpenHands deleted a comment from openhands-ai bot Oct 4, 2025
@OpenHands OpenHands deleted a comment from openhands-ai bot Oct 4, 2025
@enyst
Collaborator Author

enyst commented Oct 4, 2025

OpenHands-GPT-5 here. Concise summary of test coverage for the Responses API work:

  • Agent.step routing/gating
    • Override gate: uses_responses_api() True -> responses(); False -> completion(); assistant MessageEvent emitted
    • Model gate: gpt-5-mini-2025-08-07 -> responses(); gpt-4o-mini -> completion()
  • Responses API parsing & kwargs
    • from_llm_responses_output: assistant text join, function call normalization (fc_ prefix), reasoning summary mapping
    • to_responses_dict: fc_ prefix and arg stringification
    • _normalize_responses_kwargs: temperature=1.0 for responses; include adds reasoning.encrypted_content when enabled; store defaults False; max_output_tokens passthrough
    • End-to-end responses() (patched) verifies Telemetry records a usage entry
  • Message & tool coverage
    • from_llm_chat_message edge cases; _coerce_content/content_to_str; system to_responses_value
    • ToolBase.to_responses_tool: security_risk gating tests
  • Status
    • Pre-commit clean (ruff, pyright); full pytest green: 1507 passed, 35 warnings
  • Follow-ups (optional)
    • Telemetry tests for cached_tokens and reasoning_tokens mapping; audit remaining openai vs litellm imports

@neubig
Contributor

neubig commented Oct 4, 2025

Thanks for this! I'll try to take a look sometime in these two days!

enyst and others added 2 commits October 4, 2025 16:40
…ol calls

- Remove NonExecutableActionEvent usage and exports
- Represent non-exec tool calls as ActionEvent(action=None)
- Keep ActionEvent.visualize minimal for action=None; preserve __str__
- Update View.filter_unmatched_tool_calls to pair by tool_call_id
- Agent emits ActionEvent(action=None) then AgentErrorEvent for missing tool/invalid args
- Update tests and visualizer accordingly

Co-authored-by: openhands <openhands@all-hands.dev>
@enyst
Collaborator Author

enyst commented Oct 4, 2025

> Did a quick skim through! I actually think the structure LGTM and makes sense! (and agree that we might need to do a good refactor afterwards)
>
> Happy to take another look when we merge that NonExecutableActionEvent and when tests are in and passed!
>
> 🙏

Merged NEA and tests!

Collaborator

@xingyaoww xingyaoww left a comment


I made some changes along the way but this PR LGTM! There are still some changes for simplification/cleanup, but we can leave them for future PRs.

One last issue I'm trying to figure out now: https://community.openai.com/t/schema-additionalproperties-must-be-false-when-strict-is-true/929996

# Keep the tasks short for demo purposes
^^^^^^^^^^^^^^^^^^

File "/Users/xingyaow/Projects/All-Hands-AI/openhands-v1-dev/agent-sdk.worktree/worktree2/openhands/sdk/conversation/impl/local_conversation.py", line 244, in run
self.agent.step(self._state, on_event=self._on_event)
File "/Users/xingyaow/Projects/All-Hands-AI/openhands-v1-dev/agent-sdk.worktree/worktree2/openhands/sdk/agent/agent.py", line 209, in step
raise e
File "/Users/xingyaow/Projects/All-Hands-AI/openhands-v1-dev/agent-sdk.worktree/worktree2/openhands/sdk/agent/agent.py", line 178, in step
llm_response = self.llm.responses(
^^^^^^^^^^^^^^^^^^^
File "/Users/xingyaow/Projects/All-Hands-AI/openhands-v1-dev/agent-sdk.worktree/worktree2/openhands/sdk/llm/llm.py", line 593, in responses
resp: ResponsesAPIResponse = _one_attempt()
^^^^^^^^^^^^^^
File "/Users/xingyaow/Projects/All-Hands-AI/openhands-v1-dev/agent-sdk.worktree/worktree2/.venv/lib/python3.12/site-packages/tenacity/__init__.py", line 338, in wrapped_f
return copy(f, *args, **kw)
^^^^^^^^^^^^^^^^^^^^
File "/Users/xingyaow/Projects/All-Hands-AI/openhands-v1-dev/agent-sdk.worktree/worktree2/.venv/lib/python3.12/site-packages/tenacity/__init__.py", line 477, in __call__
do = self.iter(retry_state=retry_state)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/xingyaow/Projects/All-Hands-AI/openhands-v1-dev/agent-sdk.worktree/worktree2/.venv/lib/python3.12/site-packages/tenacity/__init__.py", line 378, in iter
result = action(retry_state)
^^^^^^^^^^^^^^^^^^^
File "/Users/xingyaow/Projects/All-Hands-AI/openhands-v1-dev/agent-sdk.worktree/worktree2/.venv/lib/python3.12/site-packages/tenacity/__init__.py", line 400, in <lambda>
self._add_action_func(lambda rs: rs.outcome.result())
^^^^^^^^^^^^^^^^^^^
File "/opt/homebrew/Cellar/python@3.12/3.12.11/Frameworks/Python.framework/Versions/3.12/lib/python3.12/concurrent/futures/_base.py", line 449, in result
return self.__get_result()
^^^^^^^^^^^^^^^^^^^
File "/opt/homebrew/Cellar/python@3.12/3.12.11/Frameworks/Python.framework/Versions/3.12/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
raise self._exception
File "/Users/xingyaow/Projects/All-Hands-AI/openhands-v1-dev/agent-sdk.worktree/worktree2/.venv/lib/python3.12/site-packages/tenacity/__init__.py", line 480, in __call__
result = fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
File "/Users/xingyaow/Projects/All-Hands-AI/openhands-v1-dev/agent-sdk.worktree/worktree2/openhands/sdk/llm/llm.py", line 569, in _one_attempt
ret = litellm_responses(
^^^^^^^^^^^^^^^^^^
File "/Users/xingyaow/Projects/All-Hands-AI/openhands-v1-dev/agent-sdk.worktree/worktree2/.venv/lib/python3.12/site-packages/litellm/utils.py", line 1343, in wrapper
raise e
File "/Users/xingyaow/Projects/All-Hands-AI/openhands-v1-dev/agent-sdk.worktree/worktree2/.venv/lib/python3.12/site-packages/litellm/utils.py", line 1218, in wrapper
result = original_function(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/xingyaow/Projects/All-Hands-AI/openhands-v1-dev/agent-sdk.worktree/worktree2/.venv/lib/python3.12/site-packages/litellm/responses/main.py", line 655, in responses
raise litellm.exception_type(
^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/xingyaow/Projects/All-Hands-AI/openhands-v1-dev/agent-sdk.worktree/worktree2/.venv/lib/python3.12/site-packages/litellm/litellm_core_utils/exception_mapping_utils.py", line 2301, in exception_type
raise e
File "/Users/xingyaow/Projects/All-Hands-AI/openhands-v1-dev/agent-sdk.worktree/worktree2/.venv/lib/python3.12/site-packages/litellm/litellm_core_utils/exception_mapping_utils.py", line 391, in exception_type
raise BadRequestError(
litellm.exceptions.BadRequestError: litellm.BadRequestError: OpenAIException - {"error":{"message":"litellm.BadRequestError: OpenAIException - {\n "error": {\n "message": "Invalid schema for function 'execute_bash': In context=(), 'additionalProperties' is required to be supplied and to be false.",\n "type": "invalid_request_error",\n "param": "tools[0].parameters",\n "code": "invalid_function_parameters"\n }\n}. Received Model Group=gpt-5-mini\nAvailable Model Group Fallbacks=None","type":null,"param":null,"code":"400"}}
/Users/xingyaow/Projects/All-Hands-AI/openhands-v1-dev/agent-sdk.worktree/worktree2/.venv/lib/python3.12/site-packages/litellm/llms/custom_httpx/async_client_cleanup.py:66: DeprecationWarning: There is no current event loop
loop = asyncio.get_event_loop()
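The BadRequestError at the bottom is OpenAI's strict function-calling rule: when a tool is sent with "strict": true, every object node in its parameters schema must carry "additionalProperties": false. A hypothetical recursive fixer (not part of the SDK; `close_schema` is an illustrative helper name) would walk the schema and close every object:

```python
def close_schema(node):
    """Recursively set additionalProperties: false on every object node,
    which OpenAI requires for strict-mode function tools."""
    if isinstance(node, dict):
        if node.get("type") == "object":
            node.setdefault("additionalProperties", False)
        for value in node.values():
            close_schema(value)
    elif isinstance(node, list):
        for item in node:
            close_schema(item)
    return node
```

Running this over each tool's parameters before the request would satisfy the `'additionalProperties' is required to be supplied and to be false` check for `execute_bash`.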


return message_dict

# Backward-compatible alias
Collaborator


Let's clean this up in future PRs? We can just remove this?

)

@classmethod
def from_litellm_message(cls, message: LiteLLMMessage) -> "Message":
Collaborator


Let's remove this too in follow up PR

Collaborator


@OpenHands create a github issue that we should remove all definition and usage of to_llm_dict and from_litellm_message in favor of the new API.


I'm on it! xingyaoww can track my progress at all-hands.dev


I've successfully created a GitHub issue to track the removal of deprecated API methods in favor of the new responses API.

Summary

GitHub Issue Created: #639 - Remove deprecated to_llm_dict() and from_litellm_message() in favor of new responses API

The issue documents:

  1. The new API introduced in PR Add native responses #622:

    • from_llm_chat_message() - Parse Chat Completions messages
    • from_llm_responses_output() - Parse Responses API output
    • to_chat_dict() - Serialize to Chat Completions format
    • to_responses_dict() - Serialize to Responses API format
  2. Deprecated methods to remove:

    • from_litellm_message() - Backward-compatible alias
    • to_llm_dict() - Old serialization method
  3. Scope of work: Identified 5 files across production code, tests, and fixtures that need updating

  4. Migration tasks: Checklist of specific actions needed to complete the migration

The issue has been labeled with refactoring and technical-debt tags for easy filtering and prioritization.


@enyst
Collaborator Author

enyst commented Oct 4, 2025

@xingyaoww If it helps, actually we had it unspecified most of the time
Trying now 👀

@enyst
Collaborator Author

enyst commented Oct 4, 2025

@xingyaoww Works for me with strict=False!

I think that’s fine?
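For context, strict mode is an opt-in flag on the function-tool payload, so the workaround is simply to leave it off. A sketch of the tool shape (field names follow OpenAI's function-tool format for the Responses API; the schema itself is a toy example):

```python
# With "strict": False, OpenAI no longer requires every object node to
# declare "additionalProperties": false, which sidesteps the error above.
tool = {
    "type": "function",
    "name": "execute_bash",
    "strict": False,
    "parameters": {
        "type": "object",
        "properties": {"command": {"type": "string"}},
        "required": ["command"],
    },
}
```

The trade-off is that non-strict tools lose the guarantee that model-produced arguments exactly match the schema.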

@xingyaoww
Collaborator

xingyaoww commented Oct 4, 2025

@enyst yeah, that's fine, I was trying to get it to work: 4661859

But it is VERY tricky to get the MCP schema to work :( In the meantime I discovered another bug where the "kind" field of the Action input schema somehow shows up in "properties" 😓 (unrelated to this PR)

Collaborator

@xingyaoww xingyaoww left a comment


Confirmed it works locally for me! Let's get this in!

@xingyaoww
Collaborator

@OpenHands create another github issue that Tool.to_mcp_schema() outputs an MCP schema whose "properties" contains "kind", which is NOT supposed to be in the tool schema we send to the LLM.

We should consider keeping kind as a Pydantic field, and just removing it in the to_mcp_schema() generation in openhands/sdk/tool/schema.py. Also, we can have Schema directly inherit from DiscriminatedUnionMixin instead of BaseModel, so that it makes sense to exclude the kind field in the same class.

@openhands-ai

openhands-ai bot commented Oct 4, 2025

I'm on it! xingyaoww can track my progress at all-hands.dev

@openhands-ai

openhands-ai bot commented Oct 4, 2025

Summary

I've successfully created a GitHub issue for the problem you described:

Issue #640: Tool.to_mcp_schema() includes 'kind' field in properties which should not be sent to LLMs

What the issue documents:

  1. Problem: The to_mcp_schema() method currently outputs schemas that include a kind field in the properties, which is an internal Pydantic discriminator field that should NOT be included in tool schemas sent to LLMs.

  2. Root Cause:

    • Action inherits from both Schema and DiscriminatedUnionMixin
    • DiscriminatedUnionMixin defines kind as a Pydantic field
    • to_mcp_schema() uses model_json_schema() which includes all fields
    • The _process_schema_node() function doesn't filter out kind
  3. Proposed Solutions:

    • Approach 1: Modify Schema.to_mcp_schema() in openhands/sdk/tool/schema.py to explicitly exclude the kind field from the output
    • Approach 2: Refactor Schema to directly inherit from DiscriminatedUnionMixin instead of BaseModel, making it clearer that all schemas have the discriminator and should exclude it
  4. Verification: I confirmed the issue exists by testing it - the MCP schema currently shows dict_keys(['kind', 'command']) when it should only show dict_keys(['command']).

The issue has been labeled with bug and tool tags for proper categorization.
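Approach 1 amounts to post-processing the generated JSON schema before it is sent as a tool definition. A small sketch of that (the `mcp_schema` dict mimics what `model_json_schema()` might emit for an Action with a discriminator; the helper name and shape are illustrative, not the SDK's code):

```python
def strip_kind(schema: dict) -> dict:
    """Remove the internal 'kind' discriminator from a generated tool schema."""
    schema.get("properties", {}).pop("kind", None)
    if "kind" in schema.get("required", []):
        schema["required"].remove("kind")
    return schema


# Roughly what to_mcp_schema() currently produces for an Action subclass:
mcp_schema = {
    "type": "object",
    "properties": {
        "kind": {"const": "ExecuteBashAction"},  # Pydantic discriminator field
        "command": {"type": "string"},
    },
    "required": ["kind", "command"],
}
```

After `strip_kind(mcp_schema)`, the properties contain only `command`, matching the expected `dict_keys(['command'])` from the verification step above.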


@xingyaoww xingyaoww merged commit 1a715ef into main Oct 4, 2025
14 checks passed
@xingyaoww xingyaoww deleted the response branch October 4, 2025 21:35
@simonrosenberg
Collaborator

@enyst it seems that gpt5-mini is crashing because of this fix, because GPT-5 does NOT natively support the Responses API.

@enyst
Collaborator Author

enyst commented Oct 8, 2025

I suspect LiteLLM has some wonky criteria where it doesn't recognize "litellm_proxy/openai/gpt-5" as supporting Responses, but it does recognize "openai/gpt-5". So it reroutes the first.

We can look more into it tomorrow! I'm pretty sure GPT-5 does natively support the Responses API. The call does go to OpenAI on the /responses endpoint, and OpenAI documents support, e.g.
https://platform.openai.com/docs/guides/migrate-to-responses#responses-benefits

@neubig neubig mentioned this pull request Oct 16, 2025


5 participants