Enforce batch atomicity for condenser#775

Merged
ryanhoangt merged 8 commits into main from ht/fix-extended-thinking-failed-with-condenser
Oct 20, 2025
Conversation

ryanhoangt (Collaborator) commented Oct 17, 2025:

Fix #738


Integration tests: 7/7 (100%)

[10/17/25 13:48:27] INFO     Success rate: 100.00% (7/7)             run_infer.py:278
[10/17/25 13:48:27] INFO     Evaluation Results:                     run_infer.py:279
[10/17/25 13:48:27] INFO     t04_git_staging: ✓ - Successfully       run_infer.py:283
                             committed changes with message: 'Add                    
                             hello world Python script'                              
[10/17/25 13:48:27] INFO     t07_interactive_commands: ✓ -           run_infer.py:283
                             Interactive Python script setup                         
                             completed. Agent should execute the                     
                             script with inputs 'John' and '25' and                  
                             find the secret number: 707                             
[10/17/25 13:48:27] INFO     t05_simple_browsing: ✓ - Agent          run_infer.py:283
                             successfully found the answer! Matched                  
                             pattern: (?i)openhands is all you need.                 
                             Response contained the expected content                 
                             about OpenHands.                                        
[10/17/25 13:48:27] INFO     t03_jupyter_write_file: ✓ -             run_infer.py:283
                             Successfully created file with content:                 
                             hello world                                             
[10/17/25 13:48:27] INFO     t01_fix_simple_typo: ✓ - Successfully   run_infer.py:283
                             fixed all typos                                         
[10/17/25 13:48:27] INFO     t02_add_bash_hello: ✓ - Successfully    run_infer.py:283
                             created and executed script: hello                      
[10/17/25 13:48:27] INFO     t06_github_pr_browsing: ✓ - Agent's     run_infer.py:283
                             final answer contains information about                 
                             the PR content                                          
[10/17/25 13:48:27] INFO     Total cost: $0.27                       run_infer.py:284

Agent Server images for this PR

GHCR package: https://github.com/All-Hands-AI/agent-sdk/pkgs/container/agent-server

Variants & Base Images

Variant  Base Image
golang   golang:1.21-bookworm
java     eclipse-temurin:17-jdk
python   nikolaik/python-nodejs:python3.12-nodejs22

Pull (multi-arch manifest)

docker pull ghcr.io/all-hands-ai/agent-server:14e6b9a-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-14e6b9a-python \
  ghcr.io/all-hands-ai/agent-server:14e6b9a-python

All tags pushed for this build

ghcr.io/all-hands-ai/agent-server:14e6b9a-golang
ghcr.io/all-hands-ai/agent-server:v1.0.0a2_golang_tag_1.21-bookworm_binary
ghcr.io/all-hands-ai/agent-server:14e6b9a-java
ghcr.io/all-hands-ai/agent-server:v1.0.0a2_eclipse-temurin_tag_17-jdk_binary
ghcr.io/all-hands-ai/agent-server:14e6b9a-python
ghcr.io/all-hands-ai/agent-server:v1.0.0a2_nikolaik_s_python-nodejs_tag_python3.12-nodejs22_binary

The 14e6b9a tag is a multi-arch manifest (amd64/arm64); your client pulls the right arch automatically.

github-actions bot commented Oct 17, 2025:

Coverage

Coverage Report

File                                          Stmts  Miss  Cover  Missing
openhands-sdk/openhands/sdk/context/view.py     114    79    30%  41, 46, 51–52, 57–58, 63–67, 83–87, 89, 102–108, 110, 112, 114, 116–117, 123, 134–135, 137, 148–152, 159–161, 165–166, 175–178, 180, 187–192, 194–196, 201, 203, 211–212, 216–221, 223–224, 226–227, 231–237, 239
TOTAL                                          7987  3423    57%

ryanhoangt requested a review from xingyaoww October 17, 2025 13:48
enyst (Collaborator) previously requested changes Oct 17, 2025:
I have a concern here, I think that both Claude and GPT models may crash if we repeat the same reasoning block... but I could be wrong. Do we have a log with parallel tool calls on the PR branch?

ryanhoangt (Author), quoting enyst:
Do we have a log with parallel tool calls on the PR branch?

Yep, it's in the associated issue.

@enyst enyst dismissed their stale review October 17, 2025 14:19

Dismissed, I misunderstood the behavior

ryanhoangt requested a review from enyst October 20, 2025 10:24
ryanhoangt requested a review from csmith49 October 20, 2025 10:43
OpenHands deleted a comment from openhands-ai bot Oct 20, 2025
Review thread on a diff hunk (docstring excerpt, truncated):

…batch are forgotten.

This prevents partial batches from being sent to the LLM, which can cause
API errors when thinking blocks are separated from their tool calls.
enyst (Collaborator) commented Oct 20, 2025:

Just a thought: I think the action events may be in the order we received the tool calls from the LLM? If so, maybe we could check in a simpler way: whether the event(s) just before forgotten_event_ids have the same response_id as the first of those forgotten.

Really just a thought, it's all good with checking this way too
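The adjacency check suggested above could be sketched roughly as follows. This is a hypothetical illustration, not the SDK's implementation: the Event type, field names, and function signature are assumptions inferred from the discussion.

```python
from collections import namedtuple

# Hypothetical minimal event type; the real SDK event classes differ.
Event = namedtuple("Event", ["id", "response_id"])


def extend_forgotten_backwards(events, forgotten_event_ids):
    """Walk backwards from the first forgotten event and also forget any
    immediately preceding events that share its response_id, i.e. that
    came from the same LLM tool-call batch."""
    forgotten = set(forgotten_event_ids)
    # Index of the first forgotten event, in received order.
    first_idx = next(
        (i for i, ev in enumerate(events) if ev.id in forgotten), None
    )
    if first_idx is None:
        return forgotten
    batch_id = events[first_idx].response_id
    i = first_idx - 1
    # Preceding events with the same response_id belong to the same batch.
    while i >= 0 and events[i].response_id == batch_id:
        forgotten.add(events[i].id)
        i -= 1
    return forgotten
```

Note this sketch only extends the forgotten set backwards from the first forgotten event, matching the scope of the suggestion; the grouping approach actually merged is more general.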

# Enforce batch atomicity: if any event in a multi-action batch is forgotten,
# forget all events in that batch to prevent partial batches with thinking
# blocks separated from their tool calls
forgotten_event_ids = View._enforce_batch_atomicity(events, forgotten_event_ids)
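A minimal sketch of what a helper like _enforce_batch_atomicity might do, assuming actions from one LLM response can be grouped by a shared response_id; the Event type and signature here are illustrative assumptions, not the SDK's actual code.

```python
from collections import namedtuple

# Hypothetical minimal event type; the real SDK event classes differ.
Event = namedtuple("Event", ["id", "response_id"])


def enforce_batch_atomicity(events, forgotten_event_ids):
    """If any event in a multi-action batch is forgotten, forget the whole
    batch, so thinking blocks stay attached to their sibling tool calls."""
    forgotten = set(forgotten_event_ids)
    # Group action events by the LLM response that produced them.
    batches = {}
    for ev in events:
        batches.setdefault(ev.response_id, []).append(ev)
    for batch in batches.values():
        ids = {ev.id for ev in batch}
        if ids & forgotten:  # partial overlap: forget the entire batch
            forgotten |= ids
    return forgotten
```

The key property is all-or-nothing: a batch is either fully kept or fully forgotten, so a condensed history never sends the LLM a thinking block whose tool calls were dropped.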
A collaborator commented:

I think it might make sense to call this after we call filter_unmatched_tool_calls? Not sure if it's even possible for that function to break up a batch but maybe better safe than sorry.

ryanhoangt (Author) replied:

I think it makes more sense to do this event removal before filter_unmatched_tool_calls, since _enforce_batch_atomicity only removes actions, not observations. Putting filter_unmatched_tool_calls at the end can then help remove any left-over observations that don't have a corresponding action.

ryanhoangt (Author):

Maybe I'll go with this for now, we can reconsider this later if we run into issues!

A collaborator replied:

Yeah, makes sense to me.

csmith49 (Collaborator) left a review:

Modulo a few minor suggestions, I think this is good to go!

ryanhoangt changed the title from "Duplicate thinking_blocks when splitting from Message into actions" to "Enforce batch atomicity for condenser" Oct 20, 2025
ryanhoangt merged commit 3c4ce52 into main Oct 20, 2025
16 checks passed
ryanhoangt deleted the ht/fix-extended-thinking-failed-with-condenser branch October 20, 2025 21:08
vivekvjnk pushed a commit to vivekvjnk/agent-sdk that referenced this pull request Nov 17, 2025
Co-authored-by: Xingyao Wang <xingyao@all-hands.dev>


Development

Successfully merging this pull request may close these issues.

If an assistant message contains any thinking blocks, the first block must be thinking or redacted_thinking

4 participants