
Conversation

@sestinj
Contributor

@sestinj sestinj commented Dec 1, 2025

Summary by cubic

Improved token accounting by including the system message in compaction and context validation, with a safety buffer, to prevent context overflows—especially after tool calls. Also streamlined auto-compaction with helper utilities and updated APIs for clearer options.

  • New Features

    • Compaction and validation now include system message tokens and a 100-token safety buffer.
    • Added helpers for pre-API compaction, post-tool overflow validation, and normal auto-compaction.
    • Tokenizer now counts multimodal text, tool function arguments, and tool outputs, and avoids double-counting when toolCallStates are present.
  • Migration

    • compactChatHistory now takes an options object: { callbacks?, abortController?, systemMessageTokens? }.
    • Callers should move onStreamContent/onStreamComplete/onError under options.callbacks and pass abortController via options.abortController.
    • processStreamingResponse now requires a systemMessage string.
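
A minimal before/after sketch of the call-site change (callback names taken from the migration notes above; surrounding variables assumed in scope):

// Before (sketch): positional callbacks and abort controller
await compactChatHistory(history, model, llmApi, callbacks, abortController);

// After (sketch): a single options object; systemMessageTokens is optional
await compactChatHistory(history, model, llmApi, {
  callbacks: { onStreamContent, onStreamComplete, onError },
  abortController,
  systemMessageTokens,
});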

Written for commit 14dd72e. Summary will update automatically on new commits.

@sestinj sestinj requested a review from a team as a code owner December 1, 2025 22:25
@sestinj sestinj removed the request for review from a team December 1, 2025 22:25
@sestinj sestinj requested a review from RomneyDa December 1, 2025 22:25
@continue
Contributor

continue bot commented Dec 1, 2025

Keep this PR in a mergeable state →

Learn more

All Green is an AI agent that automatically:

✅ Addresses code review comments

✅ Fixes failing CI checks

✅ Resolves merge conflicts


@dosubot dosubot bot added the size:XL This PR changes 500-999 lines, ignoring generated files. label Dec 1, 2025
@github-actions

github-actions bot commented Dec 1, 2025

⚠️ PR Title Format

Your PR title doesn't follow the conventional commit format, but this won't block your PR from being merged. We recommend using this format for better project organization.

Expected Format:

<type>[optional scope]: <description>

Examples:

  • feat: add changelog generation support
  • fix: resolve login redirect issue
  • docs: update README with new instructions
  • chore: update dependencies

Valid Types:

feat, fix, docs, style, refactor, perf, test, build, ci, chore, revert

This helps with:

  • 📝 Automatic changelog generation
  • 🚀 Automated semantic versioning
  • 📊 Better project history tracking

This is a non-blocking warning - your PR can still be merged without fixing this.

@github-actions

github-actions bot commented Dec 1, 2025

✅ Review Complete

This is a solid PR that addresses an important issue with context length accounting. The changes systematically improve token accounting by including system messages and adding a safety buffer. Here's my review:

✅ Strengths

  1. Proper root cause fix: Including system message tokens in validation prevents context overflows that were likely causing real issues
  2. Safety buffer approach: The 100-token buffer is pragmatic for handling tokenization estimation errors
  3. Good refactoring: The new helper functions (handlePreApiCompaction, handlePostToolValidation, handleNormalAutoCompaction) improve code organization and make the flow clearer
  4. Comprehensive token counting: The enhanced countChatHistoryItemTokens now properly handles multimodal content, tool calls, and tool outputs

⚠️ Issues Found

1. Potential double-counting in tool call validation (streamChatResponse.compactionHelpers.ts:90-97)

const postToolSystemItem: ChatHistoryItem = {
  message: {
    role: "system",
    content: systemMessage,
  },
  contextItems: [],
};
const postToolValidation = validateContextLength(
  [postToolSystemItem, ...chatHistory],
  model,
  SAFETY_BUFFER,
);

Issue: The system message is added again here for validation, even though chatHistory may already include tool results and the system message was already counted via systemMessageTokens during compaction. Creating a temporary system item just for validation is correct, but the same pattern is repeated in multiple places - consider extracting it to a helper (see the sketch below).
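
A hypothetical helper (name and placement are illustrative, not code from this PR):

function buildSystemHistoryItem(systemMessage: string): ChatHistoryItem {
  return {
    message: { role: "system", content: systemMessage },
    contextItems: [],
  };
}

// The validation above then becomes:
const postToolValidation = validateContextLength(
  [buildSystemHistoryItem(systemMessage), ...chatHistory],
  model,
  SAFETY_BUFFER,
);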

2. Missing test for multimodal token counting (util/tokenizer.ts:29-46)

The new countContentTokens function handles multimodal content but there are no tests verifying:

  • Multimodal arrays are counted correctly
  • Image token estimation (1024 tokens) is reasonable
  • Empty arrays return 0

Suggestion: Add tests in tokenizer.test.ts or create a new test file.
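
For example, a minimal vitest sketch (the multimodal part shapes and the exact countContentTokens signature are assumed, not taken from the PR):

import { describe, expect, it } from "vitest";

import { countContentTokens } from "./tokenizer.js";

describe("countContentTokens", () => {
  it("returns 0 for empty arrays", () => {
    expect(countContentTokens([])).toBe(0);
  });

  it("counts text parts of multimodal content", () => {
    expect(
      countContentTokens([{ type: "text", text: "hello world" }]),
    ).toBeGreaterThan(0);
  });

  it("estimates each image at 1024 tokens", () => {
    expect(
      countContentTokens([{ type: "imageUrl", imageUrl: { url: "data:..." } }]),
    ).toBe(1024);
  });
});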

3. Tool output token counting could be unbounded (util/tokenizer.ts:68-82)

function countToolOutputTokens(
  output: Array<{ content?: string; name?: string }> | undefined,
): number {
  if (!output) {
    return 0;
  }

  let tokenCount = 0;
  for (const item of output) {
    if (item.content) {
      tokenCount += encode(item.content).length;
    }
    if (item.name) {
      tokenCount += encode(item.name).length;
    }
  }

  return tokenCount;
}

Issue: Tool outputs can be extremely large (file contents, search results, etc.). While this is being counted now (which is good), there's no upper limit. If a tool returns 50k tokens of output, this could easily overflow context even after compaction.

Suggestion: Consider truncating tool outputs or warning when they exceed a threshold (e.g., 5000 tokens).
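
A sketch of the warning variant (the threshold constant, helper name, and logger wiring are hypothetical):

const TOOL_OUTPUT_WARN_THRESHOLD = 5000;

function warnOnLargeToolOutput(content: string): void {
  const tokens = encode(content).length;
  if (tokens > TOOL_OUTPUT_WARN_THRESHOLD) {
    // Flag oversized outputs so callers can truncate or summarize them
    logger.warn("Tool output exceeds token threshold", {
      tokens,
      threshold: TOOL_OUTPUT_WARN_THRESHOLD,
    });
  }
}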

4. Inconsistent error handling in handlePostToolValidation (streamChatResponse.compactionHelpers.ts:147-152)

} else {
  // Compaction failed, cannot continue
  logger.error("Failed to compact history after tool execution overflow");
  throw new Error(
    "Context limit exceeded and compaction failed. Unable to continue.",
  );
}

Issue: This throws an error, but the earlier validation failure at line 137 also throws. The error messages are slightly different but the conditions are similar. Consider consolidating error handling.

5. System message fetched in autoCompaction when provided (streamChatResponse.autoCompaction.ts:152-160)

const systemMessage =
  providedSystemMessage ??
  (async () => {
    const { services } = await import("../services/index.js");
    return services.systemMessage.getSystemMessage(
      services.toolPermissions.getState().currentMode,
    );
  })();

Issue: When providedSystemMessage is undefined, this creates an async IIFE that returns a Promise. The code then checks typeof systemMessage === "string" which will be false, forcing it to await. This works but is convoluted.

Suggestion: Simplify:

const { services } = await import("../services/index.js");
const resolvedSystemMessage =
  providedSystemMessage ??
  (await services.systemMessage.getSystemMessage(
    services.toolPermissions.getState().currentMode,
  ));

6. Breaking change not clearly documented (compaction.ts:22-28)

The PR description mentions migration, but the breaking change is significant:

  • Old: compactChatHistory(history, model, llmApi, callbacks, abortController)
  • New: compactChatHistory(history, model, llmApi, { callbacks, abortController, systemMessageTokens })

Issue: All call sites need updating. While tests are updated, check if there are any external consumers (other extensions, documentation examples, etc.).

7. Magic number: SAFETY_BUFFER = 100 (compaction.ts:66, streamChatResponse.ts:168)

Issue: The 100-token safety buffer is defined in multiple places. If it needs to change, you'll need to update multiple files.

Suggestion: Extract to a constant in a shared location:

// util/tokenizer.ts
export const SAFETY_BUFFER_TOKENS = 100;

8. Test mock incomplete (streamChatResponse.autoCompaction.test.ts:41-42)

vi.mock("../util/tokenizer.js", () => ({
  shouldAutoCompact: vi.fn(),
  getAutoCompactMessage: vi.fn(),
  countChatHistoryItemTokens: vi.fn(() => 100), // Mock return value
}));

Issue: The mock doesn't include validateContextLength, countChatHistoryTokens, and other exports from tokenizer.js that might be used in the code under test. This could cause runtime errors in tests.
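
Suggestion: spread the real module and override only what the test controls, so the other exports keep working. A sketch (assuming this Vitest version supports the async importOriginal factory):

vi.mock("../util/tokenizer.js", async (importOriginal) => ({
  ...(await importOriginal<typeof import("../util/tokenizer.js")>()),
  shouldAutoCompact: vi.fn(),
  getAutoCompactMessage: vi.fn(),
  countChatHistoryItemTokens: vi.fn(() => 100),
}));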

🔍 Minor Suggestions

  1. Line 428: // eslint-disable-next-line max-params - Consider using an options object to reduce parameter count instead of disabling the lint rule

  2. Documentation: Add JSDoc comments explaining the safety buffer and system message accounting to validateContextLength (a sketch follows this list)

  3. Logging: The post-tool overflow compaction logs success but not the before/after token counts - this would be useful for debugging
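
On item 2, a JSDoc sketch (parameter names and return shape inferred from the call sites quoted in this review, not from the source):

/**
 * Checks that `history` (including any prepended system message item) fits
 * within the model's context window, reserving `safetyBuffer` tokens to
 * absorb tokenizer estimation error.
 */
export function validateContextLength(
  history: ChatHistoryItem[],
  model: ModelConfig,
  safetyBuffer: number,
): { isValid: boolean; inputTokens: number; contextLimit: number; error?: string } {
  // ... implementation unchanged ...
}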

Summary

The core changes are sound and address a real problem. The main concerns are:

  1. Potential unbounded tool output sizes
  2. Code duplication (SAFETY_BUFFER constant, system message item creation)
  3. Missing tests for new token counting logic
  4. Ensure all call sites are updated for the breaking API change

Overall: Approve with minor changes suggested. The functionality improvements outweigh the minor issues, which can be addressed in follow-up commits.


@continue
Contributor

continue bot commented Dec 1, 2025

Reviewed the PR changes. No documentation updates needed.

Reasoning:

  • Changes are internal API refactoring (compactChatHistory signature, token accounting improvements)
  • No user-facing configuration or behavior changes
  • The function is not part of a public/documented API (only used internally within CLI)
  • User-visible auto-compaction behavior remains unchanged

The improvements to token accounting and system message handling are implementation details that enhance correctness without affecting how users interact with the CLI.


@continue
Contributor

continue bot commented Dec 1, 2025

🤖 All Green agent started: View agent

continue bot and others added 3 commits December 1, 2025 22:34
Add required systemMessage parameter to all processStreamingResponse calls
in the test file to match the updated interface signature.

Co-authored-by: nate <nate@continue.dev>
Reorder imports to comply with ESLint import/order rules:
- Move vitest imports after core imports
- Add empty line between import groups
- Move services import before local imports

Co-authored-by: nate <nate@continue.dev>
ESLint import/order rule requires a blank line between different
import groups (parent vs sibling).

Co-authored-by: nate <nate@continue.dev>
Contributor

@cubic-dev-ai cubic-dev-ai bot left a comment


4 issues found across 10 files

Prompt for AI agents (all 4 issues)

Check if these issues are valid — if so, understand the root cause of each and fix them.


<file name="extensions/cli/src/stream/streamChatResponse.compactionHelpers.ts">

<violation number="1" location="extensions/cli/src/stream/streamChatResponse.compactionHelpers.ts:109">
P1: The post-tool “force compaction” path never runs when overflow is caused solely by the large system message, because handleAutoCompaction short-circuits before compaction whenever `shouldAutoCompact` is false for the raw history. This makes the new overflow handling throw even though compaction could have resolved it.</violation>

<violation number="2" location="extensions/cli/src/stream/streamChatResponse.compactionHelpers.ts:119">
P1: Successful post-tool compaction still throws whenever `services.chatHistory` is unavailable, because the code conflates “no service” with “compaction failed” instead of falling back to the returned compacted history.</violation>
</file>

<file name="extensions/cli/src/stream/streamChatResponse.ts">

<violation number="1" location="extensions/cli/src/stream/streamChatResponse.ts:470">
P2: `handlePostToolValidation` throws whenever the ChatHistory service is unavailable, so the new call causes forced post-tool compaction to fail in headless contexts that previously worked.</violation>
</file>

<file name="extensions/cli/src/compaction.ts">

<violation number="1" location="extensions/cli/src/compaction.ts:68">
P2: `systemMessageTokens` is already counted inside `historyForCompaction`, so subtracting it again artificially shrinks the available token budget and causes unnecessary pruning of user history.</violation>
</file>


        systemMessage,
      });

    if (wasCompacted && chatHistorySvc) {
Contributor

@cubic-dev-ai cubic-dev-ai bot Dec 1, 2025


P1: Successful post-tool compaction still throws whenever services.chatHistory is unavailable, because the code conflates “no service” with “compaction failed” instead of falling back to the returned compacted history.

(extensions/cli/src/stream/streamChatResponse.compactionHelpers.ts, line 119)

✅ Addressed in 14dd72e

    });

    // Force compaction (compaction now accounts for system message during pruning)
    const { wasCompacted, chatHistory: compactedHistory } =
Contributor

@cubic-dev-ai cubic-dev-ai bot Dec 1, 2025


P1: The post-tool “force compaction” path never runs when overflow is caused solely by the large system message, because handleAutoCompaction short-circuits before compaction whenever shouldAutoCompact is false for the raw history. This makes the new overflow handling throw even though compaction could have resolved it.

(extensions/cli/src/stream/streamChatResponse.compactionHelpers.ts, line 109)

      }
    }
    // After tool execution, validate that we haven't exceeded context limit
    chatHistory = await handlePostToolValidation(toolCalls, chatHistory, {
Contributor

@cubic-dev-ai cubic-dev-ai bot Dec 1, 2025


P2: handlePostToolValidation throws whenever the ChatHistory service is unavailable, so the new call causes forced post-tool compaction to fail in headless contexts that previously worked.

(extensions/cli/src/stream/streamChatResponse.ts, line 470)

  // Account for system message AND safety buffer
  const SAFETY_BUFFER = 100;
  const availableForInput =
    contextLimit - reservedForOutput - systemMessageTokens - SAFETY_BUFFER;
Contributor

@cubic-dev-ai cubic-dev-ai bot Dec 1, 2025


P2: systemMessageTokens is already counted inside historyForCompaction, so subtracting it again artificially shrinks the available token budget and causes unnecessary pruning of user history.

(extensions/cli/src/compaction.ts, line 68)

✅ Addressed in 14dd72e

Contributor

@cubic-dev-ai cubic-dev-ai bot left a comment


4 issues found across 11 files

Prompt for AI agents (all 4 issues)

Check if these issues are valid — if so, understand the root cause of each and fix them.


<file name="extensions/cli/src/stream/streamChatResponse.systemMessage.test.ts">

<violation number="1" location="extensions/cli/src/stream/streamChatResponse.systemMessage.test.ts:197">
P2: This test asserts that `processStreamingResponse` rejects even though the total tokens exactly equal the context limit with the safety buffer, so validation should succeed. Change the test to expect a successful response (and mock `chatCompletionStream`) instead of `.rejects.toThrow()`.</violation>
</file>

<file name="extensions/cli/src/stream/streamChatResponse.compactionHelpers.ts">

<violation number="1" location="extensions/cli/src/stream/streamChatResponse.compactionHelpers.ts:119">
P1: Compaction after tool calls immediately throws whenever the chat history service is unavailable, even if compaction succeeded; use the returned compactedHistory as a fallback instead of treating it as a failure.</violation>
</file>

<file name="extensions/cli/src/compaction.ts">

<violation number="1" location="extensions/cli/src/compaction.ts:68">
P2: System message tokens are already counted inside `historyForCompaction`, so subtracting `systemMessageTokens` again shrinks the available input budget by an entire system prompt and causes compaction to over‑prune. Remove the extra subtraction so only the safety buffer is reserved.</violation>
</file>

<file name="extensions/cli/src/util/tokenizer.ts">

<violation number="1" location="extensions/cli/src/util/tokenizer.ts:80">
P2: `toolState.output` names are never sent to the model, so counting `encode(item.name)` overestimates input tokens and causes unnecessary compaction/validation failures.</violation>
</file>


        abortController,
        systemMessage,
      }),
    ).rejects.toThrow(); // Will fail because we can't prune enough
Contributor

@cubic-dev-ai cubic-dev-ai bot Dec 1, 2025


P2: This test asserts that processStreamingResponse rejects even though the total tokens exactly equal the context limit with the safety buffer, so validation should succeed. Change the test to expect a successful response (and mock chatCompletionStream) instead of .rejects.toThrow().

(extensions/cli/src/stream/streamChatResponse.systemMessage.test.ts, line 197)

Comment on lines 119 to 150
    if (wasCompacted && chatHistorySvc) {
      chatHistorySvc.setHistory(compactedHistory);
      chatHistory = chatHistorySvc.getHistory();

      // Verify compaction brought us under the limit
      const postCompactionValidation = validateContextLength(
        [postToolSystemItem, ...compactedHistory],
        model,
        SAFETY_BUFFER,
      );

      if (!postCompactionValidation.isValid) {
        logger.error(
          "Compaction failed to bring context under limit, stopping execution",
          {
            inputTokens: postCompactionValidation.inputTokens,
            contextLimit: postCompactionValidation.contextLimit,
          },
        );
        throw new Error(
          `Context limit exceeded even after compaction: ${postCompactionValidation.error}`,
        );
      }

      logger.info("Successfully compacted after tool overflow", {
        inputTokens: postCompactionValidation.inputTokens,
        contextLimit: postCompactionValidation.contextLimit,
      });
    } else {
      // Compaction failed, cannot continue
      logger.error("Failed to compact history after tool execution overflow");
      throw new Error(
Contributor

@cubic-dev-ai cubic-dev-ai bot Dec 1, 2025


P1: Compaction after tool calls immediately throws whenever the chat history service is unavailable, even if compaction succeeded; use the returned compactedHistory as a fallback instead of treating it as a failure.

(extensions/cli/src/stream/streamChatResponse.compactionHelpers.ts, line 119)
Suggested change:
    if (wasCompacted) {
      if (chatHistorySvc) {
        chatHistorySvc.setHistory(compactedHistory);
        chatHistory = chatHistorySvc.getHistory();
      } else {
        chatHistory = [...compactedHistory];
      }

      // Verify compaction brought us under the limit
      const postCompactionValidation = validateContextLength(
        [postToolSystemItem, ...chatHistory],
        model,
        SAFETY_BUFFER,
      );

      if (!postCompactionValidation.isValid) {
        logger.error(
          "Compaction failed to bring context under limit, stopping execution",
          {
            inputTokens: postCompactionValidation.inputTokens,
            contextLimit: postCompactionValidation.contextLimit,
          },
        );
        throw new Error(
          `Context limit exceeded even after compaction: ${postCompactionValidation.error}`,
        );
      }

      logger.info("Successfully compacted after tool overflow", {
        inputTokens: postCompactionValidation.inputTokens,
        contextLimit: postCompactionValidation.contextLimit,
      });
    } else {
      // Compaction failed, cannot continue
      logger.error("Failed to compact history after tool execution overflow");
      throw new Error(
        "Context limit exceeded and compaction failed. Unable to continue.",
      );
    }

✅ Addressed in 14dd72e

  // Account for system message AND safety buffer
  const SAFETY_BUFFER = 100;
  const availableForInput =
    contextLimit - reservedForOutput - systemMessageTokens - SAFETY_BUFFER;
Contributor

@cubic-dev-ai cubic-dev-ai bot Dec 1, 2025


P2: System message tokens are already counted inside historyForCompaction, so subtracting systemMessageTokens again shrinks the available input budget by an entire system prompt and causes compaction to over‑prune. Remove the extra subtraction so only the safety buffer is reserved.

(extensions/cli/src/compaction.ts, line 68)
Suggested change:
  const availableForInput =
    contextLimit - reservedForOutput - SAFETY_BUFFER;

✅ Addressed in 14dd72e

    if (item.content) {
      tokenCount += encode(item.content).length;
    }
    if (item.name) {
Contributor

@cubic-dev-ai cubic-dev-ai bot Dec 1, 2025


P2: toolState.output names are never sent to the model, so counting encode(item.name) overestimates input tokens and causes unnecessary compaction/validation failures.

(extensions/cli/src/util/tokenizer.ts, line 80)

✅ Addressed in 14dd72e

Contributor

@cubic-dev-ai cubic-dev-ai bot left a comment


3 issues found across 11 files

Prompt for AI agents (all 3 issues)

Check if these issues are valid — if so, understand the root cause of each and fix them.


<file name="extensions/cli/src/stream/streamChatResponse.compactionHelpers.ts">

<violation number="1" location="extensions/cli/src/stream/streamChatResponse.compactionHelpers.ts:119">
P1: Forced post-tool compaction is treated as a failure whenever `services.chatHistory` is unavailable, causing this path to throw even though compaction succeeded and a valid in-memory history exists. Provide a fallback that updates `chatHistory` locally when the service cannot be used instead of throwing.</violation>
</file>

<file name="extensions/cli/src/compaction.ts">

<violation number="1" location="extensions/cli/src/compaction.ts:67">
P2: System message tokens are removed twice, causing over-aggressive compaction and loss of useful history.</violation>
</file>

<file name="extensions/cli/src/stream/streamChatResponse.ts">

<violation number="1" location="extensions/cli/src/stream/streamChatResponse.ts:480">
P1: `handleNormalAutoCompaction` drops the compacted history whenever the chat-history service is unavailable, so the 80% safety compaction no longer runs in those environments.</violation>
</file>


Comment on lines 119 to 128
    if (wasCompacted && chatHistorySvc) {
      chatHistorySvc.setHistory(compactedHistory);
      chatHistory = chatHistorySvc.getHistory();

      // Verify compaction brought us under the limit
      const postCompactionValidation = validateContextLength(
        [postToolSystemItem, ...compactedHistory],
        model,
        SAFETY_BUFFER,
      );
Contributor

@cubic-dev-ai cubic-dev-ai bot Dec 1, 2025


P1: Forced post-tool compaction is treated as a failure whenever services.chatHistory is unavailable, causing this path to throw even though compaction succeeded and a valid in-memory history exists. Provide a fallback that updates chatHistory locally when the service cannot be used instead of throwing.

(extensions/cli/src/stream/streamChatResponse.compactionHelpers.ts, line 119)
Suggested change:
    if (wasCompacted) {
      if (
        typeof chatHistorySvc?.isReady === "function" &&
        chatHistorySvc.isReady()
      ) {
        chatHistorySvc.setHistory(compactedHistory);
        chatHistory = chatHistorySvc.getHistory();
      } else {
        chatHistory = [...compactedHistory];
      }

      // Verify compaction brought us under the limit
      const postCompactionValidation = validateContextLength(
        [postToolSystemItem, ...chatHistory],
        model,
        SAFETY_BUFFER,
      );

✅ Addressed in 14dd72e

Comment on lines 67 to 68
  const availableForInput =
    contextLimit - reservedForOutput - systemMessageTokens - SAFETY_BUFFER;
Contributor

@cubic-dev-ai cubic-dev-ai bot Dec 1, 2025


P2: System message tokens are removed twice, causing over-aggressive compaction and loss of useful history.

(extensions/cli/src/compaction.ts, line 67)
Suggested change:
  const hasSystemMessage = historyToUse.some(
    (item) => item.message.role === "system",
  );
  const reservedForSystem = hasSystemMessage ? 0 : systemMessageTokens;
  const availableForInput =
    contextLimit - reservedForOutput - reservedForSystem - SAFETY_BUFFER;

✅ Addressed in 14dd72e

    });

    // Normal auto-compaction check at 80% threshold
    chatHistory = await handleNormalAutoCompaction(
Contributor

@cubic-dev-ai cubic-dev-ai bot Dec 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1: handleNormalAutoCompaction drops the compacted history whenever the chat-history service is unavailable, so the 80% safety compaction no longer runs in those environments.

(extensions/cli/src/stream/streamChatResponse.ts, line 480)

Contributor

@cubic-dev-ai cubic-dev-ai bot left a comment


1 issue found across 11 files

Prompt for AI agents (all 1 issues)

Check if these issues are valid — if so, understand the root cause of each and fix them.


<file name="extensions/cli/src/stream/streamChatResponse.compactionHelpers.ts">

<violation number="1" location="extensions/cli/src/stream/streamChatResponse.compactionHelpers.ts:119">
P2: Post-tool compaction now throws whenever chatHistorySvc isn’t available even though handleAutoCompaction already returned compactedHistory; this makes tool-overflow recovery fail in headless/early cases where the service isn’t ready. Treat the service as optional: update it when available, fall back to the locally compacted history, and only throw when compaction itself failed.</violation>
</file>


Comment on lines 119 to 153 (the same post-tool compaction block quoted in full above)
Contributor

@cubic-dev-ai cubic-dev-ai bot Dec 1, 2025


P2: Post-tool compaction now throws whenever chatHistorySvc isn’t available even though handleAutoCompaction already returned compactedHistory; this makes tool-overflow recovery fail in headless/early cases where the service isn’t ready. Treat the service as optional: update it when available, fall back to the locally compacted history, and only throw when compaction itself failed.

(extensions/cli/src/stream/streamChatResponse.compactionHelpers.ts, line 119)
Suggested change: same fallback as suggested above (update the service when available, otherwise use the returned compactedHistory, and only throw when compaction itself failed).

✅ Addressed in 14dd72e

Address code review feedback:
1. Add fallback to use compacted history directly when ChatHistoryService
   is unavailable (fixes P1 issue in handlePostToolValidation and
   handleNormalAutoCompaction)
2. Prevent double-counting of system message tokens by checking if
   system message is already in history before subtracting tokens
3. Remove tool output name from token count since it's not sent to model

This ensures compaction works correctly in headless contexts and provides
more accurate token estimation.

Co-authored-by: nate <nate@continue.dev>
@continue
Contributor

continue bot commented Dec 1, 2025

Thanks @cubic-dev-ai for the detailed review! I've addressed the issues:

  1. P1: Fallback when ChatHistoryService unavailable - Fixed in both handlePostToolValidation and handleNormalAutoCompaction to use the compacted history directly when the service is unavailable, ensuring headless contexts work correctly.

  2. P2: System message double-counting - Added a check to detect if the system message is already in the history before subtracting its token count, preventing over-aggressive compaction.

  3. P2: Tool output name tokens - Removed item.name from tool output token counting since it's only used for internal tracking and not sent to the model.

All fixes are in commit 14dd72e.

@sestinj sestinj merged commit b91c7a6 into main Dec 1, 2025
55 of 58 checks passed
@sestinj sestinj deleted the nate/system-message-token-accounting branch December 1, 2025 23:16
@github-project-automation github-project-automation bot moved this from Todo to Done in Issues and PRs Dec 1, 2025
@github-actions github-actions bot locked and limited conversation to collaborators Dec 1, 2025
@sestinj
Contributor Author

sestinj commented Dec 2, 2025

🎉 This PR is included in version 1.33.1 🎉

The release is available on:

Your semantic-release bot 📦🚀

@sestinj
Contributor Author

sestinj commented Dec 2, 2025

🎉 This PR is included in version 1.8.0 🎉

The release is available on:

Your semantic-release bot 📦🚀

@sestinj
Contributor Author

sestinj commented Dec 2, 2025

🎉 This PR is included in version 1.37.0 🎉

The release is available on:

Your semantic-release bot 📦🚀

@sestinj
Contributor Author

sestinj commented Dec 4, 2025

🎉 This PR is included in version 1.7.0 🎉

The release is available on:

Your semantic-release bot 📦🚀
