fix(session): optimize system reminder to reduce token usage #11136
+98
−15
What does this PR do?
This PR fixes a token amplification issue caused by the watchdog reminder logic.
Previously, the session loop would wrap every queued user message in <system-reminder> tags and inline the user’s full text on every iteration when step > 1. As a result, the same user content was repeatedly duplicated in the model input, causing input tokens to grow with both the number of loop iterations and the length of queued messages (effectively O(steps × message_length)). This led to rapid context window exhaustion and unnecessarily high inference costs.
Changes in this PR
Refactored reminder handling
Removed the logic that mutates user message parts. User messages remain intact and are no longer rewritten/expanded during processing.
Optimized prompt injection
Replaced “copy user text into a reminder” with a concise system reminder that references the existence/count of queued messages (e.g., “There are X new user messages waiting…”), without duplicating user content.
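A minimal sketch of the idea, not the code from this PR: the reminder is built from the count of queued messages only, so its size no longer depends on what the user typed. The helper name buildQueuedReminder and the exact wording of the reminder are assumptions made for illustration.

```typescript
// Hypothetical helper (name and wording are assumptions, not the PR's code).
// It takes only the number of queued messages, never their text, so the
// reminder stays the same size no matter how long the queued messages are.
function buildQueuedReminder(queuedCount: number): string | undefined {
  // Nothing queued: no reminder is injected at all.
  if (queuedCount <= 0) return undefined

  return [
    "<system-reminder>",
    `There are ${queuedCount} new user messages waiting.`,
    "</system-reminder>",
  ].join("\n")
}
```

Because the user message parts are left untouched, the model still sees each queued message exactly once, in its original position in the history.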
Added throttling to prevent reminder spam
Implemented exponential backoff for reminder injection (starting at 15s, doubling up to 5 minutes), so long-running tasks are not repeatedly interrupted by identical reminders.
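The schedule described here (15s, doubling, capped at 5 minutes) can be sketched as follows. The function and constant names are hypothetical and the PR's actual implementation may differ; only the three numbers come from the description above.

```typescript
// Values taken from the PR description; the names are assumptions.
const INITIAL_DELAY_MS = 15_000 // first wait: 15 seconds
const MAX_DELAY_MS = 5 * 60_000 // cap: 5 minutes

// Delay to wait before the next reminder, given how many reminders have
// already been sent for the current queued message.
function reminderDelayMs(remindersSent: number): number {
  const attempts = Math.max(0, Math.floor(remindersSent))
  // Clamp the exponent so very long sessions cannot overflow to Infinity.
  const uncapped = INITIAL_DELAY_MS * 2 ** Math.min(attempts, 30)
  return Math.min(uncapped, MAX_DELAY_MS)
}
```

Under these assumptions the successive delays are 15 s, 30 s, 60 s, 120 s, 240 s, and then 300 s for every later reminder.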
Added reminder deduplication state
Introduced queuedReminder state to track the latest queued user message ID and avoid reinjecting reminders for the same queued message on every loop cycle.
How did you verify your code works?
Token usage verification
Simulated multi-turn conversations where the user sends multiple messages while the agent is busy. Confirmed that the input token count stays stable across repeated loop iterations rather than growing with each one.
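To make the difference concrete, here is a simplified model of the two strategies, not the test harness used for this PR. It uses character counts as a rough stand-in for tokens, and it assumes the old behavior added one extra copy of every queued message per iteration after the first. Both function names are hypothetical.

```typescript
// Simplified model: characters approximate tokens. All names are assumptions.

// Old behavior (as modeled here): every iteration after the first re-inlines
// the full text of every queued message, so overhead scales with
// steps x message_length.
function inlinedOverhead(queued: string[], steps: number): number {
  const perStep = queued.reduce((sum, text) => sum + text.length, 0)
  return Math.max(0, steps - 1) * perStep
}

// New behavior (as modeled here): each injected reminder mentions only the
// count, so overhead depends on how many reminders fire, not on message size.
function conciseOverhead(queued: string[], remindersInjected: number): number {
  const reminder = `There are ${queued.length} new user messages waiting.`
  return Math.max(0, remindersInjected) * reminder.length
}
```

In this model, a single 1,000-character queued message over 11 iterations adds 10,000 duplicated characters under the old scheme, while the concise reminder adds a few dozen characters each time it is injected.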
Behavioral testing
Verified the agent still detects queued user messages and prioritizes responding to them, but now respects the backoff window instead of emitting reminders every iteration.
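The behavior being verified can be pictured as a small gate that combines the deduplication state with the backoff window. This is a sketch under assumptions rather than the PR's implementation: the state shape, the function name shouldInjectReminder, and the choice to reset the backoff when a new queued message arrives are all hypothetical.

```typescript
// Hypothetical state shape; only the idea of tracking the latest queued
// message ID comes from the PR description.
interface QueuedReminderState {
  lastMessageID?: string // latest queued message already reminded about
  remindersSent: number // reminders sent for that message
  nextAllowedAt: number // timestamp (ms) before which we stay quiet
}

const FIRST_DELAY_MS = 15_000
const DELAY_CAP_MS = 300_000

function shouldInjectReminder(
  state: QueuedReminderState,
  latestQueuedID: string | undefined,
  now: number,
): boolean {
  // No queued user message: nothing to remind about.
  if (latestQueuedID === undefined) return false

  // A new queued message: remind immediately and restart the backoff
  // (resetting here is an assumption of this sketch).
  if (latestQueuedID !== state.lastMessageID) {
    state.lastMessageID = latestQueuedID
    state.remindersSent = 1
    state.nextAllowedAt = now + FIRST_DELAY_MS
    return true
  }

  // Same message as before: stay quiet until the backoff window has elapsed.
  if (now < state.nextAllowedAt) return false

  const delay = Math.min(FIRST_DELAY_MS * 2 ** Math.min(state.remindersSent, 30), DELAY_CAP_MS)
  state.remindersSent += 1
  state.nextAllowedAt = now + delay
  return true
}
```

Calling this once per loop iteration yields a reminder on the first iteration that sees a queued message and then only occasionally afterwards, instead of on every iteration.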
Regression testing
Exercised normal chat flow, compaction, and tool execution paths to confirm no behavioral regressions and that tasks still complete successfully.
Build check
Ran bun run build locally to ensure the changes compile cleanly with no type errors.
Issue
Fixes #11142