fix(llm): support Responses streaming via on_token #1761
xingyaoww merged 6 commits into feat/openai-subscription-auth from
Conversation
Enable streaming for LLM.responses() when requested (stream or LLM.stream). When LiteLLM returns a Responses streaming iterator, drain it, forward best-effort text deltas to on_token (as ModelResponseStream chunks), and return the completed ResponsesAPIResponse.
Avoid getattr() by relying on LiteLLM's ResponsesAPIStreamingResponse event types (e.g., OutputTextDeltaEvent) and ResponseCompletedEvent.
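For reference, a minimal sketch of that drain loop (not the PR's actual code: the import paths, the event field names delta / response, and the helper name drain_responses_stream are assumptions based on the OpenAI Responses event shapes; the typed event classes are the ones named above):

```python
from typing import Callable

# Import paths are assumptions; the class names are the ones the PR refers to.
from litellm.types.llms.openai import (
    OutputTextDeltaEvent,
    ResponseCompletedEvent,
    ResponsesAPIResponse,
)
from litellm.types.utils import Delta, ModelResponseStream, StreamingChoices


def drain_responses_stream(
    stream_iter,
    on_token: Callable[[ModelResponseStream], None],
) -> ResponsesAPIResponse:
    """Drain a LiteLLM Responses streaming iterator, forwarding text deltas.

    Each OutputTextDeltaEvent is re-wrapped as a ModelResponseStream chunk so
    existing on_token consumers keep working; the ResponseCompletedEvent
    carries the final ResponsesAPIResponse that the caller returns.
    """
    completed: ResponsesAPIResponse | None = None
    for event in stream_iter:
        if isinstance(event, OutputTextDeltaEvent):
            on_token(
                ModelResponseStream(
                    choices=[StreamingChoices(delta=Delta(content=event.delta))]
                )
            )
        elif isinstance(event, ResponseCompletedEvent):
            completed = event.response
    if completed is None:
        raise RuntimeError("Responses stream ended without a completed event")
    return completed
```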
all-hands-bot
left a comment
The streaming implementation looks solid overall! I found three areas worth discussing: a redundant condition check, handling of non-streaming responses when streaming is requested, and error handling for the callback. Details in inline comments below.
🟡 Suggestion: The on_token callback is invoked without error handling. If the callback raises an exception, the entire streaming response would fail and the completed response would be lost.
Consider wrapping the on_token call in a try-except block to make the system more robust:
```python
try:
    on_token(
        ModelResponseStream(
            choices=[
                StreamingChoices(
                    delta=Delta(content=delta)
                )
            ]
        )
    )
except Exception as e:
    # Log the error but don't fail the entire request
    logger.warning(f"on_token callback failed: {e}")
```
This way, a faulty callback won't prevent the agent from receiving the completed response.
Good point. For now I’m intentionally not swallowing on_token callback exceptions so behavior matches the existing Chat Completions streaming path (where callback exceptions currently propagate).
If we want “best-effort streaming” (callback failures don’t abort the request), I’d prefer to implement that consistently for both LLM.completion() and LLM.responses() (and log a warning / rate-limit), rather than making Responses special-cased.
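If we do go that route later, a shared helper roughly like this (hypothetical, not part of this PR) could be reused by both paths so the behavior stays consistent:

```python
import logging
from typing import Callable

from litellm.types.utils import ModelResponseStream

logger = logging.getLogger(__name__)


def call_on_token_safely(
    on_token: Callable[[ModelResponseStream], None],
    chunk: ModelResponseStream,
) -> None:
    """Invoke the user callback best-effort: log and continue on failure."""
    try:
        on_token(chunk)
    except Exception:
        logger.warning("on_token callback raised; continuing the stream", exc_info=True)
```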
Avoid per-call imports in the Responses streaming path; rely on LiteLLM's typed event classes.
In subscription mode, Responses stream=True can be forced by options even when the caller didn’t request streaming, so keep gating on user_enable_streaming but avoid redundant checks inside the loop.
If streaming was requested but LiteLLM returns a non-streaming ResponsesAPIResponse, emit a warning so behavior is explicit.
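Putting those three notes together, the gating could look roughly like this (a sketch only: call_responses and user_enable_streaming are illustrative names, the import path is assumed, and it reuses the hypothetical drain_responses_stream helper from the earlier sketch):

```python
import logging

import litellm
from litellm.types.llms.openai import ResponsesAPIResponse  # import path assumed

logger = logging.getLogger(__name__)


def call_responses(user_enable_streaming, on_token, **request_kwargs):
    """Subscription mode may force stream=True in request_kwargs even when the
    caller did not ask for streaming, so delta forwarding is keyed off
    user_enable_streaming rather than the wire-level stream flag."""
    raw = litellm.responses(**request_kwargs)

    # Streaming was requested but LiteLLM handed back a plain response:
    # warn so the behavior is explicit, then return it unchanged.
    if isinstance(raw, ResponsesAPIResponse):
        if user_enable_streaming:
            logger.warning(
                "stream=True was requested but LiteLLM returned a "
                "non-streaming ResponsesAPIResponse"
            )
        return raw

    # Streaming iterator: drain it, forwarding deltas only if the user opted in.
    callback = on_token if user_enable_streaming else (lambda _chunk: None)
    return drain_responses_stream(raw, callback)
```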
Enable LLM stream mode and wire a simple token callback so users can see incremental output when using ChatGPT subscription login.
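The token callback for the example can be as small as a print loop over the chunk deltas (sketch only; how the callback is actually registered with the LLM in the example is not shown here):

```python
from litellm.types.utils import ModelResponseStream


def print_token(chunk: ModelResponseStream) -> None:
    """Print streamed text deltas as they arrive, without newlines."""
    for choice in chunk.choices:
        if choice.delta is not None and choice.delta.content:
            print(choice.delta.content, end="", flush=True)
```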
Looks like there are a few issues preventing this PR from being merged!
If you'd like me to help, just leave a comment. Feel free to include any additional details that might help me get this PR into a better state.
Merged commit fef7d02 into feat/openai-subscription-auth

Follow-up to #1682.
Allow LLM.responses() to run with stream=true when requested.
Forward text deltas to on_token and return the completed ResponsesAPIResponse so agent execution can continue.
Tested locally by running examples/01_standalone_sdk/34_subscription_login.py in subscription mode.