Conversation

@harshav167

Summary

Adds an opt-in preemptive compaction feature that proactively manages context before it exceeds model limits. When enabled, it monitors token usage after each message and triggers compaction when usage exceeds a configurable threshold (default: 85%).
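
Concretely, the trigger condition reduces to three gates evaluated after each message. A minimal sketch in TypeScript, with illustrative names rather than the PR's actual identifiers (defaults match the config shown below):

// Illustrative trigger check; identifier names are hypothetical.
interface TriggerState {
    lastCompactionAt: number; // epoch ms of the previous compaction
}

function shouldCompact(
    usedTokens: number,
    contextLimit: number,
    state: TriggerState,
    cfg = { threshold: 0.85, cooldownMs: 60_000, minTokens: 50_000 },
): boolean {
    if (usedTokens < cfg.minTokens) return false;                           // too small to bother
    if (Date.now() - state.lastCompactionAt < cfg.cooldownMs) return false; // cooldown gate
    return usedTokens / contextLimit > cfg.threshold;                       // e.g. 85% of the window
}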

Multi-Phase Compaction Flow

  1. DCP Strategies — Runs all enabled DCP strategies (deduplication, supersede writes, purge errors)
  2. Tool Truncation — If still over threshold, truncates large tool outputs (preserving recent messages)
  3. Summarization — If still over threshold, triggers OpenCode's built-in summarization (the orchestration is sketched after this list)
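
In TypeScript terms, the orchestration is a loop that runs each phase in order and stops as soon as usage drops back under the threshold. A minimal sketch, assuming the phases are passed in cheapest-first (all names here are hypothetical):

type Phase = (sessionId: string) => Promise<void>;

// Run phases in order (DCP strategies, tool truncation, summarization) and
// stop as soon as the session is back under the threshold.
async function runCompaction(
    sessionId: string,
    phases: Phase[],
    stillOverThreshold: () => Promise<boolean>,
): Promise<void> {
    for (const phase of phases) {
        await phase(sessionId);
        if (!(await stillOverThreshold())) return;
    }
}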

Features

  • Model-aware context limits (Claude, GPT-4, Gemini, o1/o3, etc.; inference sketched after this list)
  • Configurable threshold, cooldown, and minimum tokens
  • Protection of recent messages from truncation
  • Cooldown to prevent rapid re-compaction
  • Event hook integration via message.updated
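
The model-aware limits presumably come down to a prefix lookup in model-limits.ts. A sketch under that assumption, with illustrative limit values (the PR's table is the source of truth):

// Illustrative prefix table; model-limits.ts in the PR carries the real values.
const CONTEXT_LIMITS: Record<string, number> = {
    "claude-": 200_000,
    "gpt-4o": 128_000,
    "o1": 200_000,
    "gemini-1.5": 1_000_000,
};

const DEFAULT_CONTEXT_LIMIT = 128_000; // fallback for unrecognized models

function inferContextLimit(modelId: string): number {
    for (const [prefix, limit] of Object.entries(CONTEXT_LIMITS)) {
        if (modelId.startsWith(prefix)) return limit;
    }
    return DEFAULT_CONTEXT_LIMIT;
}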

Configuration

Disabled by default (opt-in):

{
    "preemptiveCompaction": {
        "enabled": false,           // opt-in
        "threshold": 0.85,          // 85% context usage triggers compaction
        "cooldownMs": 60000,        // 1 minute between compactions
        "minTokens": 50000,         // minimum tokens before compaction can trigger
        "truncation": {
            "enabled": true,
            "protectedMessages": 3  // protect last 3 messages from truncation
        }
    }
}
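
The matching interface added to lib/config.ts presumably mirrors this JSON one-to-one. A sketch along those lines (the PR's exact declarations may differ):

// Sketch of the config shape mirroring the preemptiveCompaction schema entry.
interface TruncationConfig {
    enabled: boolean;
    protectedMessages: number; // most-recent messages exempt from truncation
}

interface PreemptiveCompactionConfig {
    enabled: boolean;   // opt-in; defaults to false
    threshold: number;  // fraction of the context window, e.g. 0.85
    cooldownMs: number; // minimum gap between compactions
    minTokens: number;  // floor below which compaction never triggers
    truncation: TruncationConfig;
}

const DEFAULT_PREEMPTIVE_COMPACTION: PreemptiveCompactionConfig = {
    enabled: false,
    threshold: 0.85,
    cooldownMs: 60_000,
    minTokens: 50_000,
    truncation: { enabled: true, protectedMessages: 3 },
};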

Files Changed

  • lib/preemptive/ - New preemptive compaction module
    • types.ts - Type definitions
    • constants.ts - Constants and model context limits
    • storage.ts - Truncation utilities
    • model-limits.ts - Context limit inference
    • index.ts - Main handler
  • lib/config.ts - Added PreemptiveCompaction config interface and defaults
  • dcp.schema.json - Added preemptiveCompaction schema
  • index.ts - Registered event hook
  • README.md - Documentation

Testing

  • Build passes: bun run build
  • No TypeScript errors

@Tarquinen
Collaborator

Hey, I don't think I can accept this for a few reasons:

  1. Compaction is complicated and outside the scope of this project. It's possible that once this plugin matures, compaction won't be necessary at all. There's probably a lot that can be done to improve the default compaction, and if you're interested in that, I'd encourage you to build it as a separate plugin.
  2. The way you're detecting model context sizes in this PR is very fragile and will require lots of maintenance. I don't think hardcoding values the way you've done is a good idea; you should at least use models.dev the way opencode does (a minimal lookup is sketched below). This implementation also doesn't account for custom models defined in the opencode config.
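
For reference, a models.dev-backed lookup could replace the hardcoded table along these lines. This is only a sketch and assumes models.dev's api.json keeps its current provider → models → limit.context shape; custom models from the opencode config would still need their own path:

// Sketch of a models.dev-backed limit lookup; assumes the api.json shape
// (provider -> models -> limit.context). Custom opencode models need a separate path.
type ModelsDevIndex = Record<
    string,
    { models: Record<string, { limit?: { context?: number } }> }
>;

let modelsDevCache: ModelsDevIndex | undefined;

async function contextLimitFor(providerId: string, modelId: string): Promise<number | undefined> {
    modelsDevCache ??= (await (await fetch("https://models.dev/api.json")).json()) as ModelsDevIndex;
    return modelsDevCache[providerId]?.models[modelId]?.limit?.context;
}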

Let me know what you think
