Improve performance of LLM summarizing condenser #6597

Merged

Conversation

@csmith49 (Collaborator) commented Feb 3, 2025

End-user friendly description of the problem this fixes or functionality that this introduces

  • Include this change in the Release Notes. If checked, you must provide an end-user friendly description for your change below

Give a summary of what the PR does, explaining any non-trivial design decisions

Updates the LLM-based summarizing condenser to improve performance.

  • Updates the prompt, fine-tuned with help from Claude.
  • LLMSummarizingCondenser now relies on the RollingCondenser to control when summarization happens (see the sketch after this list).
  • Expanded unit tests to stress more complex configuration and usage.
  • The LLMSummarizingCondenser is now the default condenser (still off by default!), rather than the more aggressive AmortizedForgettingCondenser, which has the same forgetting behavior but performs no summarization.
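
To make the second bullet concrete, here is a minimal sketch of the rolling-condensation pattern; the class name, method names, and thresholds are hypothetical stand-ins, not the actual RollingCondenser/LLMSummarizingCondenser API.

# Hypothetical sketch of the rolling-condensation pattern; names, thresholds,
# and the summarize callback are illustrative, not the OpenHands implementation.
from typing import Callable, List

class RollingSummarizerSketch:
    """Keeps the earliest events and, once the history grows past max_size,
    folds the middle of the window into a single running summary."""

    def __init__(self, max_size: int = 100, keep_first: int = 1):
        self.max_size = max_size
        self.keep_first = keep_first
        self.summary = ''

    def condense(self, events: List[str],
                 summarize: Callable[[str, List[str]], str]) -> List[str]:
        # The rolling base decides *when* to act; below the threshold the
        # history passes through untouched.
        if len(events) <= self.max_size:
            return events

        head = events[:self.keep_first]            # always keep the initial context
        tail = events[-(self.max_size // 2):]      # keep the most recent half-window
        forgotten = events[self.keep_first:-(self.max_size // 2)]

        # The LLM-specific part only decides *how* to condense: it folds the
        # forgotten events into the running summary.
        self.summary = summarize(self.summary, forgotten)
        return head + ['[Condensed summary] ' + self.summary] + tail

The point of the split is that the rolling base class owns the trigger condition, while the LLM-specific condenser only supplies the summarization step.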

To test performance, I ran this condenser against the no-condensation baseline (both on OH v0.21.0) on the whole of SWE-bench Verified with a maximum of 100 iterations. The condenser resolved 200 instances, compared to the baseline's 203, and cost $40 more to run due to the lower prompt cache utilization. However, the average response latency is a consistent 8 seconds compared to the 12 seconds (at iteration 30) and 16 seconds (at iteration 100) of the baseline.


Link of any specific issues this addresses

# Create a new summary event with the condensed content
summary_event = AgentCondensationObservation(summary_response)
for forgotten_event in forgotten_events:
    prompt += str(forgotten_event) + '\n\n'
csmith49 (Collaborator, author):

This might be an argument for standardizing the Event -> Message conversion and breaking it out of the agent control-flow.
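
For illustration only, a standalone helper along those lines might look like the sketch below; Event and Message here are stand-in dataclasses, not the actual OpenHands classes, and the exact mapping is an assumption.

# Illustrative stand-ins, not the real OpenHands Event/Message types.
from dataclasses import dataclass

@dataclass
class Event:
    content: str

@dataclass
class Message:
    role: str
    content: str

def event_to_message(event: Event, role: str = 'user') -> Message:
    # One shared place for the Event -> Message mapping, so agents and
    # condensers render events identically instead of each formatting
    # events on their own.
    return Message(role=role, content=event.content)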

@csmith49 csmith49 requested a review from enyst February 3, 2025 20:29
csmith49 and others added 2 commits February 3, 2025 13:44
Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>
TESTS: test_format() passed
CHANGES: str(val) replaces f"{val:.16G}"
DEPS: None modified
INTENT: Fix float precision overflow"""
enyst (Collaborator):

Interesting that Sonnet is doing reasonably well with this prompt. I'm going to test it on Deepseek 😅

(not necessary for this PR, of course, just a thought)

csmith49 (Collaborator, author):

I'm surprised this is what it ended up being; it's definitely not how I would default to prompting!

I started off with a very simple prompt, then passed it and a trajectory or two to Claude and asked what it would change. It took two or three iterations before it met the baseline, but that got me from 0% to 40%. I have to imagine you'd arrive at a very different result if you repeated the process with DeepSeek.

# The condenser should make exactly one LLM call, and its single message
# should mention both of the condensed events.
mock_llm.completion.assert_called_once()
call_args = mock_llm.completion.call_args[1]
assert 'messages' in call_args
assert len(call_args['messages']) == 1
assert 'Event 1' in call_args['messages'][0]['content']
assert 'Event 2' in call_args['messages'][0]['content']
enyst (Collaborator):

Maybe we can still test for some message inclusion? Not a big deal; I'm just thinking that tests around condensation aren't easy, so it's worth keeping whatever checks we can in the unit tests.

@enyst (Collaborator) left a comment:

Thanks for this! I absolutely love that we're doing this incrementally, and we can always play with that prompt later too.

@csmith49 csmith49 enabled auto-merge (squash) February 5, 2025 03:22
@csmith49 csmith49 merged commit e47aaba into All-Hands-AI:main Feb 5, 2025
13 checks passed
adityasoni9998 pushed a commit to adityasoni9998/OpenHands that referenced this pull request Feb 7, 2025
Co-authored-by: Calvin Smith <calvin@all-hands.dev>
Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>