
Conversation

@sestinj
Contributor

@sestinj sestinj commented Dec 1, 2025

Summary by cubic

Improved token accounting by including the system message in compaction and context validation, with a safety buffer, to prevent context overflows—especially after tool calls. Also streamlined auto-compaction with helper utilities and updated APIs for clearer options.

  • New Features

    • Compaction and validation now include system message tokens and a 100-token safety buffer.
    • Added helpers for pre-API compaction, post-tool overflow validation, and normal auto-compaction.
    • Tokenizer now counts multimodal text, tool function arguments, and tool outputs, and avoids double-counting when toolCallStates are present.
  • Migration

    • compactChatHistory now takes an options object: { callbacks?, abortController?, systemMessageTokens? }.
    • Callers should move onStreamContent/onStreamComplete/onError under options.callbacks and pass abortController via options.abortController.
    • processStreamingResponse now requires a systemMessage string.
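
A minimal before/after sketch of the call-site change (callback names taken from the migration notes above; surrounding variables assumed in scope):

// Before (sketch): positional callbacks and abort controller
await compactChatHistory(history, model, llmApi, callbacks, abortController);

// After (sketch): a single options object; systemMessageTokens is optional
await compactChatHistory(history, model, llmApi, {
  callbacks: { onStreamContent, onStreamComplete, onError },
  abortController,
  systemMessageTokens,
});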

Written for commit 14dd72e. Summary will update automatically on new commits.

@sestinj sestinj requested a review from a team as a code owner December 1, 2025 22:25
@sestinj sestinj removed the request for review from a team December 1, 2025 22:25
@sestinj sestinj requested a review from RomneyDa December 1, 2025 22:25
@continue
Contributor

continue bot commented Dec 1, 2025

Keep this PR in a mergeable state →

Learn more

All Green is an AI agent that automatically:

✅ Addresses code review comments

✅ Fixes failing CI checks

✅ Resolves merge conflicts


@dosubot dosubot bot added the size:XL This PR changes 500-999 lines, ignoring generated files. label Dec 1, 2025
@github-actions

github-actions bot commented Dec 1, 2025

⚠️ PR Title Format

Your PR title doesn't follow the conventional commit format, but this won't block your PR from being merged. We recommend using this format for better project organization.

Expected Format:

<type>[optional scope]: <description>

Examples:

  • feat: add changelog generation support
  • fix: resolve login redirect issue
  • docs: update README with new instructions
  • chore: update dependencies

Valid Types:

feat, fix, docs, style, refactor, perf, test, build, ci, chore, revert

This helps with:

  • 📝 Automatic changelog generation
  • 🚀 Automated semantic versioning
  • 📊 Better project history tracking

This is a non-blocking warning - your PR can still be merged without fixing this.

@github-actions

github-actions bot commented Dec 1, 2025

✅ Review Complete

This is a solid PR that addresses an important issue with context length accounting. The changes systematically improve token accounting by including system messages and adding a safety buffer. Here's my review:

✅ Strengths

  1. Proper root cause fix: Including system message tokens in validation prevents context overflows that were likely causing real issues
  2. Safety buffer approach: The 100-token buffer is pragmatic for handling tokenization estimation errors
  3. Good refactoring: The new helper functions (handlePreApiCompaction, handlePostToolValidation, handleNormalAutoCompaction) improve code organization and make the flow clearer
  4. Comprehensive token counting: The enhanced countChatHistoryItemTokens now properly handles multimodal content, tool calls, and tool outputs

⚠️ Issues Found

1. Potential double-counting in tool call validation (streamChatResponse.compactionHelpers.ts:90-97)

const postToolSystemItem: ChatHistoryItem = {
  message: {
    role: "system",
    content: systemMessage,
  },
  contextItems: [],
};
const postToolValidation = validateContextLength(
  [postToolSystemItem, ...chatHistory],
  model,
  SAFETY_BUFFER,
);

Issue: The system message is added again here for validation, even though chatHistory may already include tool results and the system message was already counted via systemMessageTokens during compaction. Creating a temporary system item just for validation is correct, but the same pattern is repeated in multiple places - consider extracting it to a helper (see the sketch below).
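
A hypothetical helper (name and placement are illustrative, not code from this PR):

function buildSystemHistoryItem(systemMessage: string): ChatHistoryItem {
  return {
    message: { role: "system", content: systemMessage },
    contextItems: [],
  };
}

// The validation above then becomes:
const postToolValidation = validateContextLength(
  [buildSystemHistoryItem(systemMessage), ...chatHistory],
  model,
  SAFETY_BUFFER,
);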

2. Missing test for multimodal token counting (util/tokenizer.ts:29-46)

The new countContentTokens function handles multimodal content but there are no tests verifying:

  • Multimodal arrays are counted correctly
  • Image token estimation (1024 tokens) is reasonable
  • Empty arrays return 0

Suggestion: Add tests in tokenizer.test.ts or create a new test file.
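
For example, a minimal vitest sketch (the multimodal part shapes and the exact countContentTokens signature are assumed, not taken from the PR):

import { describe, expect, it } from "vitest";

import { countContentTokens } from "./tokenizer.js";

describe("countContentTokens", () => {
  it("returns 0 for empty arrays", () => {
    expect(countContentTokens([])).toBe(0);
  });

  it("counts text parts of multimodal content", () => {
    expect(
      countContentTokens([{ type: "text", text: "hello world" }]),
    ).toBeGreaterThan(0);
  });

  it("estimates each image at 1024 tokens", () => {
    expect(
      countContentTokens([{ type: "imageUrl", imageUrl: { url: "data:..." } }]),
    ).toBe(1024);
  });
});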

3. Tool output token counting could be unbounded (util/tokenizer.ts:68-82)

function countToolOutputTokens(
  output: Array<{ content?: string; name?: string }> | undefined,
): number {
  if (!output) {
    return 0;
  }

  let tokenCount = 0;
  for (const item of output) {
    if (item.content) {
      tokenCount += encode(item.content).length;
    }
    if (item.name) {
      tokenCount += encode(item.name).length;
    }
  }

  return tokenCount;
}

Issue: Tool outputs can be extremely large (file contents, search results, etc.). While this is being counted now (which is good), there's no upper limit. If a tool returns 50k tokens of output, this could easily overflow context even after compaction.

Suggestion: Consider truncating tool outputs or warning when they exceed a threshold (e.g., 5000 tokens).
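
A sketch of the warning variant (the threshold constant, helper name, and logger wiring are hypothetical):

const TOOL_OUTPUT_WARN_THRESHOLD = 5000;

function warnOnLargeToolOutput(content: string): void {
  const tokens = encode(content).length;
  if (tokens > TOOL_OUTPUT_WARN_THRESHOLD) {
    // Flag oversized outputs so callers can truncate or summarize them
    logger.warn("Tool output exceeds token threshold", {
      tokens,
      threshold: TOOL_OUTPUT_WARN_THRESHOLD,
    });
  }
}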

4. Inconsistent error handling in handlePostToolValidation (streamChatResponse.compactionHelpers.ts:147-152)

} else {
  // Compaction failed, cannot continue
  logger.error("Failed to compact history after tool execution overflow");
  throw new Error(
    "Context limit exceeded and compaction failed. Unable to continue.",
  );
}

Issue: This throws an error, but the earlier validation failure at line 137 also throws. The error messages are slightly different but the conditions are similar. Consider consolidating error handling.

5. System message fetched in autoCompaction when provided (streamChatResponse.autoCompaction.ts:152-160)

const systemMessage =
  providedSystemMessage ??
  (async () => {
    const { services } = await import("../services/index.js");
    return services.systemMessage.getSystemMessage(
      services.toolPermissions.getState().currentMode,
    );
  })();

Issue: When providedSystemMessage is undefined, this creates an async IIFE that returns a Promise. The code then checks typeof systemMessage === "string" which will be false, forcing it to await. This works but is convoluted.

Suggestion: Simplify:

const { services } = await import("../services/index.js");
const resolvedSystemMessage =
  providedSystemMessage ??
  (await services.systemMessage.getSystemMessage(
    services.toolPermissions.getState().currentMode,
  ));

6. Breaking change not clearly documented (compaction.ts:22-28)

The PR description mentions migration, but the breaking change is significant:

  • Old: compactChatHistory(history, model, llmApi, callbacks, abortController)
  • New: compactChatHistory(history, model, llmApi, { callbacks, abortController, systemMessageTokens })

Issue: All call sites need updating. While tests are updated, check if there are any external consumers (other extensions, documentation examples, etc.).

7. Magic number: SAFETY_BUFFER = 100 (compaction.ts:66, streamChatResponse.ts:168)

Issue: The 100-token safety buffer is defined in multiple places. If it needs to change, you'll need to update multiple files.

Suggestion: Extract to a constant in a shared location:

// util/tokenizer.ts
export const SAFETY_BUFFER_TOKENS = 100;

8. Test mock incomplete (streamChatResponse.autoCompaction.test.ts:41-42)

vi.mock("../util/tokenizer.js", () => ({
  shouldAutoCompact: vi.fn(),
  getAutoCompactMessage: vi.fn(),
  countChatHistoryItemTokens: vi.fn(() => 100), // Mock return value
}));

Issue: The mock doesn't include validateContextLength, countChatHistoryTokens, and other exports from tokenizer.js that might be used in the code under test. This could cause runtime errors in tests.
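
Suggestion: spread the real module and override only what the test controls, so the other exports keep working. A sketch (assuming this Vitest version supports the async importOriginal factory):

vi.mock("../util/tokenizer.js", async (importOriginal) => ({
  ...(await importOriginal<typeof import("../util/tokenizer.js")>()),
  shouldAutoCompact: vi.fn(),
  getAutoCompactMessage: vi.fn(),
  countChatHistoryItemTokens: vi.fn(() => 100),
}));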

🔍 Minor Suggestions

  1. Line 428: // eslint-disable-next-line max-params - Consider using an options object to reduce parameter count instead of disabling the lint rule

  2. Documentation: Add JSDoc comments explaining the safety buffer and system message accounting to validateContextLength (a sketch follows this list)

  3. Logging: The post-tool overflow compaction logs success but not the before/after token counts - this would be useful for debugging
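
On item 2, a JSDoc sketch (parameter names and return shape inferred from the call sites quoted in this review, not from the source):

/**
 * Checks that `history` (including any prepended system message item) fits
 * within the model's context window, reserving `safetyBuffer` tokens to
 * absorb tokenizer estimation error.
 */
export function validateContextLength(
  history: ChatHistoryItem[],
  model: ModelConfig,
  safetyBuffer: number,
): { isValid: boolean; inputTokens: number; contextLimit: number; error?: string } {
  // ... implementation unchanged ...
}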

Summary

The core changes are sound and address a real problem. The main concerns are:

  1. Potential unbounded tool output sizes
  2. Code duplication (SAFETY_BUFFER constant, system message item creation)
  3. Missing tests for new token counting logic
  4. Ensure all call sites are updated for the breaking API change

Overall: Approve with minor changes suggested. The functionality improvements outweigh the minor issues, which can be addressed in follow-up commits.


@continue
Contributor

continue bot commented Dec 1, 2025

Reviewed the PR changes. No documentation updates needed.

Reasoning:

  • Changes are internal API refactoring (compactChatHistory signature, token accounting improvements)
  • No user-facing configuration or behavior changes
  • The function is not part of a public/documented API (only used internally within CLI)
  • User-visible auto-compaction behavior remains unchanged

The improvements to token accounting and system message handling are implementation details that enhance correctness without affecting how users interact with the CLI.


@continue
Contributor

continue bot commented Dec 1, 2025

🤖 All Green agent started: View agent

continue bot and others added 3 commits December 1, 2025 22:34
Add required systemMessage parameter to all processStreamingResponse calls
in the test file to match the updated interface signature.

Co-authored-by: nate <nate@continue.dev>
Reorder imports to comply with ESLint import/order rules:
- Move vitest imports after core imports
- Add empty line between import groups
- Move services import before local imports

Co-authored-by: nate <nate@continue.dev>
ESLint import/order rule requires a blank line between different
import groups (parent vs sibling).

Co-authored-by: nate <nate@continue.dev>
Contributor

@cubic-dev-ai cubic-dev-ai bot left a comment


4 issues found across 10 files

Prompt for AI agents (all 4 issues)

Check if these issues are valid — if so, understand the root cause of each and fix them.


<file name="extensions/cli/src/stream/streamChatResponse.compactionHelpers.ts">

<violation number="1" location="extensions/cli/src/stream/streamChatResponse.compactionHelpers.ts:109">
P1: The post-tool “force compaction” path never runs when overflow is caused solely by the large system message, because handleAutoCompaction short-circuits before compaction whenever `shouldAutoCompact` is false for the raw history. This makes the new overflow handling throw even though compaction could have resolved it.</violation>

<violation number="2" location="extensions/cli/src/stream/streamChatResponse.compactionHelpers.ts:119">
P1: Successful post-tool compaction still throws whenever `services.chatHistory` is unavailable, because the code conflates “no service” with “compaction failed” instead of falling back to the returned compacted history.</violation>
</file>

<file name="extensions/cli/src/stream/streamChatResponse.ts">

<violation number="1" location="extensions/cli/src/stream/streamChatResponse.ts:470">
P2: `handlePostToolValidation` throws whenever the ChatHistory service is unavailable, so the new call causes forced post-tool compaction to fail in headless contexts that previously worked.</violation>
</file>

<file name="extensions/cli/src/compaction.ts">

<violation number="1" location="extensions/cli/src/compaction.ts:68">
P2: `systemMessageTokens` is already counted inside `historyForCompaction`, so subtracting it again artificially shrinks the available token budget and causes unnecessary pruning of user history.</violation>
</file>


        systemMessage,
      });

    if (wasCompacted && chatHistorySvc) {
Contributor

@cubic-dev-ai cubic-dev-ai bot Dec 1, 2025


P1: Successful post-tool compaction still throws whenever services.chatHistory is unavailable, because the code conflates “no service” with “compaction failed” instead of falling back to the returned compacted history.

(extensions/cli/src/stream/streamChatResponse.compactionHelpers.ts, line 119)

✅ Addressed in 14dd72e

    });

    // Force compaction (compaction now accounts for system message during pruning)
    const { wasCompacted, chatHistory: compactedHistory } =
Contributor

@cubic-dev-ai cubic-dev-ai bot Dec 1, 2025


P1: The post-tool “force compaction” path never runs when overflow is caused solely by the large system message, because handleAutoCompaction short-circuits before compaction whenever shouldAutoCompact is false for the raw history. This makes the new overflow handling throw even though compaction could have resolved it.

(extensions/cli/src/stream/streamChatResponse.compactionHelpers.ts, line 109)

      }
    }
    // After tool execution, validate that we haven't exceeded context limit
    chatHistory = await handlePostToolValidation(toolCalls, chatHistory, {
Contributor

@cubic-dev-ai cubic-dev-ai bot Dec 1, 2025


P2: handlePostToolValidation throws whenever the ChatHistory service is unavailable, so the new call causes forced post-tool compaction to fail in headless contexts that previously worked.

(extensions/cli/src/stream/streamChatResponse.ts, line 470)

  // Account for system message AND safety buffer
  const SAFETY_BUFFER = 100;
  const availableForInput =
    contextLimit - reservedForOutput - systemMessageTokens - SAFETY_BUFFER;
Contributor

@cubic-dev-ai cubic-dev-ai bot Dec 1, 2025


P2: systemMessageTokens is already counted inside historyForCompaction, so subtracting it again artificially shrinks the available token budget and causes unnecessary pruning of user history.

(extensions/cli/src/compaction.ts, line 68)

✅ Addressed in 14dd72e

Contributor

@cubic-dev-ai cubic-dev-ai bot left a comment


4 issues found across 11 files

Prompt for AI agents (all 4 issues)

Check if these issues are valid — if so, understand the root cause of each and fix them.


<file name="extensions/cli/src/stream/streamChatResponse.systemMessage.test.ts">

<violation number="1" location="extensions/cli/src/stream/streamChatResponse.systemMessage.test.ts:197">
P2: This test asserts that `processStreamingResponse` rejects even though the total tokens exactly equal the context limit with the safety buffer, so validation should succeed. Change the test to expect a successful response (and mock `chatCompletionStream`) instead of `.rejects.toThrow()`.</violation>
</file>

<file name="extensions/cli/src/stream/streamChatResponse.compactionHelpers.ts">

<violation number="1" location="extensions/cli/src/stream/streamChatResponse.compactionHelpers.ts:119">
P1: Compaction after tool calls immediately throws whenever the chat history service is unavailable, even if compaction succeeded; use the returned compactedHistory as a fallback instead of treating it as a failure.</violation>
</file>

<file name="extensions/cli/src/compaction.ts">

<violation number="1" location="extensions/cli/src/compaction.ts:68">
P2: System message tokens are already counted inside `historyForCompaction`, so subtracting `systemMessageTokens` again shrinks the available input budget by an entire system prompt and causes compaction to over‑prune. Remove the extra subtraction so only the safety buffer is reserved.</violation>
</file>

<file name="extensions/cli/src/util/tokenizer.ts">

<violation number="1" location="extensions/cli/src/util/tokenizer.ts:80">
P2: `toolState.output` names are never sent to the model, so counting `encode(item.name)` overestimates input tokens and causes unnecessary compaction/validation failures.</violation>
</file>


        abortController,
        systemMessage,
      }),
    ).rejects.toThrow(); // Will fail because we can't prune enough
Contributor

@cubic-dev-ai cubic-dev-ai bot Dec 1, 2025


P2: This test asserts that processStreamingResponse rejects even though the total tokens exactly equal the context limit with the safety buffer, so validation should succeed. Change the test to expect a successful response (and mock chatCompletionStream) instead of .rejects.toThrow().

(extensions/cli/src/stream/streamChatResponse.systemMessage.test.ts, line 197)

Comment on lines 119 to 150
    if (wasCompacted && chatHistorySvc) {
      chatHistorySvc.setHistory(compactedHistory);
      chatHistory = chatHistorySvc.getHistory();

      // Verify compaction brought us under the limit
      const postCompactionValidation = validateContextLength(
        [postToolSystemItem, ...compactedHistory],
        model,
        SAFETY_BUFFER,
      );

      if (!postCompactionValidation.isValid) {
        logger.error(
          "Compaction failed to bring context under limit, stopping execution",
          {
            inputTokens: postCompactionValidation.inputTokens,
            contextLimit: postCompactionValidation.contextLimit,
          },
        );
        throw new Error(
          `Context limit exceeded even after compaction: ${postCompactionValidation.error}`,
        );
      }

      logger.info("Successfully compacted after tool overflow", {
        inputTokens: postCompactionValidation.inputTokens,
        contextLimit: postCompactionValidation.contextLimit,
      });
    } else {
      // Compaction failed, cannot continue
      logger.error("Failed to compact history after tool execution overflow");
      throw new Error(
Contributor

@cubic-dev-ai cubic-dev-ai bot Dec 1, 2025


P1: Compaction after tool calls immediately throws whenever the chat history service is unavailable, even if compaction succeeded; use the returned compactedHistory as a fallback instead of treating it as a failure.

(extensions/cli/src/stream/streamChatResponse.compactionHelpers.ts, line 119)
Suggested change:
    if (wasCompacted) {
      if (chatHistorySvc) {
        chatHistorySvc.setHistory(compactedHistory);
        chatHistory = chatHistorySvc.getHistory();
      } else {
        chatHistory = [...compactedHistory];
      }

      // Verify compaction brought us under the limit
      const postCompactionValidation = validateContextLength(
        [postToolSystemItem, ...chatHistory],
        model,
        SAFETY_BUFFER,
      );

      if (!postCompactionValidation.isValid) {
        logger.error(
          "Compaction failed to bring context under limit, stopping execution",
          {
            inputTokens: postCompactionValidation.inputTokens,
            contextLimit: postCompactionValidation.contextLimit,
          },
        );
        throw new Error(
          `Context limit exceeded even after compaction: ${postCompactionValidation.error}`,
        );
      }

      logger.info("Successfully compacted after tool overflow", {
        inputTokens: postCompactionValidation.inputTokens,
        contextLimit: postCompactionValidation.contextLimit,
      });
    } else {
      // Compaction failed, cannot continue
      logger.error("Failed to compact history after tool execution overflow");
      throw new Error(
        "Context limit exceeded and compaction failed. Unable to continue.",
      );
    }

✅ Addressed in 14dd72e

  // Account for system message AND safety buffer
  const SAFETY_BUFFER = 100;
  const availableForInput =
    contextLimit - reservedForOutput - systemMessageTokens - SAFETY_BUFFER;
Contributor

@cubic-dev-ai cubic-dev-ai bot Dec 1, 2025


P2: System message tokens are already counted inside historyForCompaction, so subtracting systemMessageTokens again shrinks the available input budget by an entire system prompt and causes compaction to over‑prune. Remove the extra subtraction so only the safety buffer is reserved.

(extensions/cli/src/compaction.ts, line 68)
Suggested change:
  const availableForInput =
    contextLimit - reservedForOutput - SAFETY_BUFFER;

✅ Addressed in 14dd72e

    if (item.content) {
      tokenCount += encode(item.content).length;
    }
    if (item.name) {
Contributor

@cubic-dev-ai cubic-dev-ai bot Dec 1, 2025


P2: toolState.output names are never sent to the model, so counting encode(item.name) overestimates input tokens and causes unnecessary compaction/validation failures.

(extensions/cli/src/util/tokenizer.ts, line 80)

✅ Addressed in 14dd72e

Contributor

@cubic-dev-ai cubic-dev-ai bot left a comment


3 issues found across 11 files

Prompt for AI agents (all 3 issues)

Check if these issues are valid — if so, understand the root cause of each and fix them.


<file name="extensions/cli/src/stream/streamChatResponse.compactionHelpers.ts">

<violation number="1" location="extensions/cli/src/stream/streamChatResponse.compactionHelpers.ts:119">
P1: Forced post-tool compaction is treated as a failure whenever `services.chatHistory` is unavailable, causing this path to throw even though compaction succeeded and a valid in-memory history exists. Provide a fallback that updates `chatHistory` locally when the service cannot be used instead of throwing.</violation>
</file>

<file name="extensions/cli/src/compaction.ts">

<violation number="1" location="extensions/cli/src/compaction.ts:67">
P2: System message tokens are removed twice, causing over-aggressive compaction and loss of useful history.</violation>
</file>

<file name="extensions/cli/src/stream/streamChatResponse.ts">

<violation number="1" location="extensions/cli/src/stream/streamChatResponse.ts:480">
P1: `handleNormalAutoCompaction` drops the compacted history whenever the chat-history service is unavailable, so the 80% safety compaction no longer runs in those environments.</violation>
</file>


Comment on lines 119 to 128
    if (wasCompacted && chatHistorySvc) {
      chatHistorySvc.setHistory(compactedHistory);
      chatHistory = chatHistorySvc.getHistory();

      // Verify compaction brought us under the limit
      const postCompactionValidation = validateContextLength(
        [postToolSystemItem, ...compactedHistory],
        model,
        SAFETY_BUFFER,
      );
Contributor

@cubic-dev-ai cubic-dev-ai bot Dec 1, 2025


P1: Forced post-tool compaction is treated as a failure whenever services.chatHistory is unavailable, causing this path to throw even though compaction succeeded and a valid in-memory history exists. Provide a fallback that updates chatHistory locally when the service cannot be used instead of throwing.

(extensions/cli/src/stream/streamChatResponse.compactionHelpers.ts, line 119)
Suggested change:
    if (wasCompacted) {
      if (
        typeof chatHistorySvc?.isReady === "function" &&
        chatHistorySvc.isReady()
      ) {
        chatHistorySvc.setHistory(compactedHistory);
        chatHistory = chatHistorySvc.getHistory();
      } else {
        chatHistory = [...compactedHistory];
      }

      // Verify compaction brought us under the limit
      const postCompactionValidation = validateContextLength(
        [postToolSystemItem, ...chatHistory],
        model,
        SAFETY_BUFFER,
      );

✅ Addressed in 14dd72e

Comment on lines 67 to 68
  const availableForInput =
    contextLimit - reservedForOutput - systemMessageTokens - SAFETY_BUFFER;
Contributor

@cubic-dev-ai cubic-dev-ai bot Dec 1, 2025


P2: System message tokens are removed twice, causing over-aggressive compaction and loss of useful history.

(extensions/cli/src/compaction.ts, line 67)
Suggested change:
  const hasSystemMessage = historyToUse.some(
    (item) => item.message.role === "system",
  );
  const reservedForSystem = hasSystemMessage ? 0 : systemMessageTokens;
  const availableForInput =
    contextLimit - reservedForOutput - reservedForSystem - SAFETY_BUFFER;

✅ Addressed in 14dd72e

    });

    // Normal auto-compaction check at 80% threshold
    chatHistory = await handleNormalAutoCompaction(
Contributor

@cubic-dev-ai cubic-dev-ai bot Dec 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1: handleNormalAutoCompaction drops the compacted history whenever the chat-history service is unavailable, so the 80% safety compaction no longer runs in those environments.

(extensions/cli/src/stream/streamChatResponse.ts, line 480)

Contributor

@cubic-dev-ai cubic-dev-ai bot left a comment


1 issue found across 11 files

Prompt for AI agents (all 1 issues)

Check if these issues are valid — if so, understand the root cause of each and fix them.


<file name="extensions/cli/src/stream/streamChatResponse.compactionHelpers.ts">

<violation number="1" location="extensions/cli/src/stream/streamChatResponse.compactionHelpers.ts:119">
P2: Post-tool compaction now throws whenever chatHistorySvc isn’t available even though handleAutoCompaction already returned compactedHistory; this makes tool-overflow recovery fail in headless/early cases where the service isn’t ready. Treat the service as optional: update it when available, fall back to the locally compacted history, and only throw when compaction itself failed.</violation>
</file>


Comment on lines 119 to 153 (the same post-tool compaction block quoted in full above)
Contributor

@cubic-dev-ai cubic-dev-ai bot Dec 1, 2025


P2: Post-tool compaction now throws whenever chatHistorySvc isn’t available even though handleAutoCompaction already returned compactedHistory; this makes tool-overflow recovery fail in headless/early cases where the service isn’t ready. Treat the service as optional: update it when available, fall back to the locally compacted history, and only throw when compaction itself failed.

(extensions/cli/src/stream/streamChatResponse.compactionHelpers.ts, line 119)
Suggested change: same fallback as suggested above (update the service when available, otherwise use the returned compactedHistory, and only throw when compaction itself failed).

✅ Addressed in 14dd72e

Address code review feedback:
1. Add fallback to use compacted history directly when ChatHistoryService
   is unavailable (fixes P1 issue in handlePostToolValidation and
   handleNormalAutoCompaction)
2. Prevent double-counting of system message tokens by checking if
   system message is already in history before subtracting tokens
3. Remove tool output name from token count since it's not sent to model

This ensures compaction works correctly in headless contexts and provides
more accurate token estimation.

Co-authored-by: nate <nate@continue.dev>
@continue
Contributor

continue bot commented Dec 1, 2025

Thanks @cubic-dev-ai for the detailed review! I've addressed the issues:

  1. P1: Fallback when ChatHistoryService unavailable - Fixed in both handlePostToolValidation and handleNormalAutoCompaction to use the compacted history directly when the service is unavailable, ensuring headless contexts work correctly.

  2. P2: System message double-counting - Added a check to detect if the system message is already in the history before subtracting its token count, preventing over-aggressive compaction.

  3. P2: Tool output name tokens - Removed item.name from tool output token counting since it's only used for internal tracking and not sent to the model.

All fixes are in commit 14dd72e.

@sestinj sestinj merged commit b91c7a6 into main Dec 1, 2025
55 of 58 checks passed
@sestinj sestinj deleted the nate/system-message-token-accounting branch December 1, 2025 23:16
@github-project-automation github-project-automation bot moved this from Todo to Done in Issues and PRs Dec 1, 2025
@github-actions github-actions bot locked and limited conversation to collaborators Dec 1, 2025
@sestinj
Contributor Author

sestinj commented Dec 2, 2025

🎉 This PR is included in version 1.33.1 🎉

The release is available on:

Your semantic-release bot 📦🚀

@sestinj
Contributor Author

sestinj commented Dec 2, 2025

🎉 This PR is included in version 1.8.0 🎉

The release is available on:

Your semantic-release bot 📦🚀

@sestinj
Contributor Author

sestinj commented Dec 2, 2025

🎉 This PR is included in version 1.37.0 🎉

The release is available on:

Your semantic-release bot 📦🚀

@sestinj
Contributor Author

sestinj commented Dec 4, 2025

🎉 This PR is included in version 1.7.0 🎉

The release is available on:

Your semantic-release bot 📦🚀
