fix: Manual compaction does not update context window. #6682
Conversation
Pull request overview
Fixes manual compaction so that session token/context metrics are updated alongside the conversation history, and adds tests intended to validate compaction behavior across manual, auto-threshold, and context-limit-recovery triggers.
Changes:
- Update the `/compact` command to apply the compaction `ProviderUsage` to session metrics (input/output/total + accumulated).
- Add a mock provider and new integration-style tests asserting conversation visibility and token accounting after compaction paths.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| crates/goose/src/agents/execute_commands.rs | Updates manual compaction to record usage into session metrics after replacing the conversation. |
| crates/goose/tests/agent.rs | Adds new compaction-focused tests and a mock provider to validate conversation state + token metrics. |
crates/goose/tests/agent.rs
Outdated
```rust
// Note: The current session state reflects the compaction operation,
// as the agent records compaction metrics before retrying
let final_input = updated_session.input_tokens.unwrap();
let final_output = updated_session.output_tokens;
let final_total = updated_session.total_tokens.unwrap();

// After compaction during recovery, the session shows the compaction tokens
// Input: system (6000) + long_tool_call messages (~15,400) + new message (100) = ~21,500
// Output: 200 (compaction summary)
// Total: ~21,700
assert!(
    final_input >= 21000 && final_input <= 22000,
    "Final input should reflect compaction input (~21,500). Got: {}",
    final_input
);

assert_eq!(
    final_output,
    Some(200),
    "Final output should be compaction output (200). Got: {:?}",
    final_output
);

assert_eq!(
    final_total,
    final_input + final_output.unwrap(),
    "Final total should equal input + output"
);
```
`test_context_limit_recovery_compaction` asserts that the final session metrics reflect the compaction call (e.g., `output_tokens == Some(200)`), but the agent loop retries after recovery and calls `update_session_metrics(..., is_compaction_usage=false)` for the successful reply, so the final session state should reflect the reply usage (e.g., output tokens ~100) rather than the compaction usage.
crates/goose/tests/agent.rs
Outdated
```rust
// Setup session with many messages to have substantial context
// 20 exchanges = 40 messages * 100 tokens = ~4000 tokens in conversation
let mut messages = vec![];
for i in 0..20 {
    messages.push(Message::user().with_text(format!("User message {}", i)));
    messages.push(Message::assistant().with_text(format!("Assistant response {}", i)));
}

let session =
    setup_test_session(&agent, &temp_dir, "auto-compact-test", messages).await?;

// Capture initial context size before triggering reply
// Should be: system (6000) + 40 messages (4000) = ~10000 tokens
let initial_session = agent
    .config
    .session_manager
    .get_session(&session.id, true)
    .await?;
let initial_input_tokens = initial_session.input_tokens.unwrap_or(0);

// Setup mock provider (no context limit enforcement)
let provider = Arc::new(MockCompactionProvider::new());
agent.update_provider(provider, &session.id).await?;
```
`test_auto_compaction_during_reply` is unlikely to ever exercise auto-compaction because `setup_test_session` sets `session.total_tokens` to `Some(1000)`, and `check_if_compaction_needed` uses that metadata (instead of estimating from the 40 messages) against a large default context limit (128k for an unknown model), so `needs_auto_compact` will stay false even with lots of messages.
crates/goose/tests/agent.rs
Outdated
```rust
// For compaction: system prompt length is a good proxy for conversation size
// Base: 6000 (system) + conversation content embedded in prompt
6000 + (system_prompt.len() as i32 / 4).max(400)
```
`MockCompactionProvider::calculate_input_tokens` derives compaction input tokens from `system_prompt.len()`, which makes these tests brittle to unrelated edits in the compaction.md template or prompt formatting (token assertions/ranges may fail even if compaction logic is correct); consider making the mock's compaction token accounting independent of the rendered prompt text (e.g., based on number/size of messages or a fixed value).
Suggested change:
```diff
-// For compaction: system prompt length is a good proxy for conversation size
-// Base: 6000 (system) + conversation content embedded in prompt
-6000 + (system_prompt.len() as i32 / 4).max(400)
+// For compaction: use a fixed heuristic value so tests are stable
+// Approximate previous minimum of 6000 (system) + 400 (conversation content)
+6400
```
```rust
schedule_id: session.schedule_id,
max_turns: None,
retry_config: None,
};
```
uhm, that doesn't seem great. change `update_session_metrics` to just take a `session_id` or get your hands on the actual `SessionConfig` (I think we can do the first thing)
Done, although this `schedule_id` leaked out? We had it in all places, so just continuing to pass it along.
crates/goose/tests/agent.rs
Outdated
| } | ||
|
|
||
| #[cfg(test)] | ||
| mod compaction_tests { |
this is a lot of lines; should this live in its own test module, maybe? also, the setup of the three tests seems similar; can we unify?
Moved it. There's a lot of shared code in the setup of the provider and session, plus validations on post-compaction session state. Maybe we could squeeze out a little more with the messages, but I think the individual tests tell a clear story of the context before/after compaction, and I'm seeding the one that triggers out-of-context with extra messages.
```rust
if compaction_occurred {
    // Verify that current input context decreased after compaction
    let tokens_after = input_tokens_after_compaction
        .expect("Should have captured tokens after compaction");

    // Before compaction: system (6000) + 40 messages (4000) = 10,000 tokens
    // After compaction: only the summary (200 tokens) - this becomes the new input
    assert!(
        tokens_after < initial_input_tokens,
        "Input tokens should decrease after compaction. Before: {}, After: {}",
        initial_input_tokens,
        tokens_after
    );

    // After compaction, input should be exactly the summary: 200 tokens
    assert_eq!(
        tokens_after, 200,
        "Input tokens after compaction should be exactly 200 (summary). Got: {}",
        tokens_after
    );
```
`test_auto_compaction_during_reply` doesn't actually exercise the auto-compaction path: `setup_test_session` hard-codes `session.total_tokens` to 1000 and `MockCompactionProvider` uses the default model context limit (128k), so `check_if_compaction_needed` will never request compaction and the `compaction_occurred` branch is effectively dead; make this deterministic by setting the mock model's `context_limit` (e.g., via `ModelConfig::with_context_limit`) and initializing `session.total_tokens` high enough (or `None` to force estimation) so the threshold is exceeded.
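The arithmetic behind this comment can be sketched with a simplified threshold check. The function below is illustrative only — the real `check_if_compaction_needed` lives in goose's agent code and its signature differs; the 0.8 threshold is an assumed value for the sketch.

```rust
// Illustrative threshold check: compaction is requested when recorded session
// totals exceed threshold * context_limit. The real implementation would fall
// back to estimating from messages when no totals are recorded.
fn check_if_compaction_needed(total_tokens: Option<i32>, context_limit: i32, threshold: f64) -> bool {
    match total_tokens {
        Some(total) => (total as f64) > (context_limit as f64) * threshold,
        None => false, // real code would estimate from the conversation here
    }
}

fn main() {
    // As the test stands: total_tokens = Some(1000) vs the 128k default limit,
    // so auto-compaction never fires and the branch is dead.
    assert!(!check_if_compaction_needed(Some(1_000), 128_000, 0.8));
    // With an explicit small context limit on the mock model, the same session
    // exceeds the threshold (10,000 > 0.8 * 12,000 = 9,600) and compaction fires.
    assert!(check_if_compaction_needed(Some(10_000), 12_000, 0.8));
    println!("threshold check behaves as described");
}
```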
```rust
// as the agent records compaction metrics before retrying
let final_input = updated_session.input_tokens.unwrap();
let final_output = updated_session.output_tokens;
let final_total = updated_session.total_tokens.unwrap();

// After compaction during recovery, the session shows the compaction tokens
// Input: system (6000) + long_tool_call messages (~15,400) + new message (100) = ~21,500
// Output: 200 (compaction summary)
// Total: ~21,700
assert!(
    (21000..=22000).contains(&final_input),
    "Final input should reflect compaction input (~21,500). Got: {}",
    final_input
);

assert_eq!(
    final_output,
    Some(200),
    "Final output should be compaction output (200). Got: {:?}",
```
In `test_context_limit_recovery_compaction`, the assertions on `updated_session.input_tokens`/`output_tokens` after the stream finishes assume the session metrics still reflect the compaction call, but agent.rs continues the loop after `HistoryReplaced` and then calls `update_session_metrics(..., is_compaction_usage=false)` for the retry response, which overwrites the current token window; assert the post-compaction state using the `input_tokens_after_compaction` snapshot and expect the final session metrics to match the successful retry instead.
Suggested change:
```diff
-// as the agent records compaction metrics before retrying
-let final_input = updated_session.input_tokens.unwrap();
-let final_output = updated_session.output_tokens;
-let final_total = updated_session.total_tokens.unwrap();
-// After compaction during recovery, the session shows the compaction tokens
-// Input: system (6000) + long_tool_call messages (~15,400) + new message (100) = ~21,500
-// Output: 200 (compaction summary)
-// Total: ~21,700
-assert!(
-    (21000..=22000).contains(&final_input),
-    "Final input should reflect compaction input (~21,500). Got: {}",
-    final_input
-);
-assert_eq!(
-    final_output,
-    Some(200),
-    "Final output should be compaction output (200). Got: {:?}",
+// as the agent records compaction metrics before retrying.
+// After compaction during recovery, the snapshot should show the compaction tokens:
+// Input: system (6000) + long_tool_call messages (~15,400) + new message (100) = ~21,500
+assert!(
+    (21000..=22000).contains(&input_tokens_after_compaction),
+    "Compaction input should be ~21,500. Got: {}",
+    input_tokens_after_compaction
+);
+// After the retry, the session window should reflect the successful reply, not the compaction call.
+let final_input = updated_session.input_tokens.unwrap();
+let final_output = updated_session.output_tokens;
+let final_total = updated_session.total_tokens.unwrap();
+// Reply: ~6,300 input + 100 output = 6,400
+assert!(
+    (6000..=6500).contains(&final_input),
+    "Final input should reflect reply input (~6,300). Got: {}",
+    final_input
+);
+assert_eq!(
+    final_output,
+    Some(100),
+    "Final output should be reply output (100). Got: {:?}",
```
Summary
Simple fix, but this adds some fairly extensive tests with assertions about conversation state + token counts after all three compaction trigger types.