Motivation
Tool outputs (shell commands, file reads, test results) are the largest token consumers in the agent context window. The current approach is reactive: truncation and pruning happen only after output exceeds thresholds. Command-aware semantic filtering before insertion into context can achieve 60-90% token savings.
Current State
Zeph has four compression tiers (Tier 0 through Tier 3):
- Tier 0: Overflow file when output > 30K chars
- Tier 1: Head+tail truncation (15K each) or LLM summarization
- Tier 2: Inline pruning of stale tool outputs (keep last 4)
- Tier 3: Budget-triggered compaction at 80% context usage
Problem: All tiers operate on raw text without understanding command semantics. A `cargo test` run with 200 passing tests and 1 failure still sends ~29K chars of passing-test output before truncation kicks in.
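To make the limitation concrete, here is a minimal sketch of Tier 1 style head+tail truncation. The function name `truncate_head_tail` and the omission marker are hypothetical, not taken from the Zeph codebase; the sketch only shows that this kind of truncation is blind to content and will discard a failure message that happens to sit in the middle of the output.

```rust
/// Hypothetical sketch of head+tail truncation (not Zeph's actual code).
/// Keeps the first and last `keep` characters and drops everything between,
/// regardless of what the dropped text contains.
pub fn truncate_head_tail(output: &str, keep: usize) -> String {
    let total = output.chars().count();
    // Short outputs pass through unchanged.
    if total <= keep * 2 {
        return output.to_string();
    }
    let head: String = output.chars().take(keep).collect();
    let tail: String = output.chars().skip(total - keep).collect();
    let omitted = total - keep * 2;
    format!("{head}\n[... {omitted} chars omitted ...]\n{tail}")
}
```

With `keep` set to 15,000, this would correspond to the "15K each" behavior described above, assuming the limit is counted in characters.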
Approach
Add a command-aware filtering layer in ShellExecutor that runs between command execution and output return. Each filter understands the output format of specific commands and extracts only the information relevant to the agent.
Architecture
```text
execute_bash() → raw output
    ↓
sanitize_output() → strip ANSI, progress bars
    ↓
OutputFilterRegistry::apply(command, output, exit_code) → filtered output
    ↓
ToolOutput { summary: filtered }
```
OutputFilter is a trait with a registry of command-specific implementations. Filters are composable and configurable via config.toml.
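A minimal sketch of what the trait and registry could look like. Only the names `OutputFilter`, `OutputFilterRegistry`, and the `apply(command, output, exit_code)` argument list come from the text above; the `matches`/`filter` methods, the `register` method, the first-match policy, and the toy `BlankLineFilter` are assumptions made for illustration.

```rust
/// Assumed shape of the filter trait: one implementation per command family.
pub trait OutputFilter {
    /// Returns true if this filter understands the given command line.
    fn matches(&self, command: &str) -> bool;
    /// Returns the filtered output for a matching command.
    fn filter(&self, command: &str, output: &str, exit_code: i32) -> String;
}

/// Registry holding the command-specific filters.
#[derive(Default)]
pub struct OutputFilterRegistry {
    filters: Vec<Box<dyn OutputFilter>>,
}

impl OutputFilterRegistry {
    /// Hypothetical registration method.
    pub fn register(&mut self, filter: Box<dyn OutputFilter>) {
        self.filters.push(filter);
    }

    /// Applies the first matching filter (an assumed policy); output from
    /// commands with no matching filter passes through unchanged.
    pub fn apply(&self, command: &str, output: &str, exit_code: i32) -> String {
        self.filters
            .iter()
            .find(|f| f.matches(command))
            .map(|f| f.filter(command, output, exit_code))
            .unwrap_or_else(|| output.to_string())
    }
}

/// Toy filter for illustration only: drops blank lines from `ls` output.
pub struct BlankLineFilter;

impl OutputFilter for BlankLineFilter {
    fn matches(&self, command: &str) -> bool {
        command.split_whitespace().next() == Some("ls")
    }

    fn filter(&self, _command: &str, output: &str, _exit_code: i32) -> String {
        output
            .lines()
            .filter(|line| !line.trim().is_empty())
            .collect::<Vec<_>>()
            .join("\n")
    }
}
```

Passing unmatched commands through untouched keeps the layer safe to enable incrementally: a command without a filter behaves exactly as it does today.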
Issues
| # | Issue | Priority | Size | Savings |
|---|---|---|---|---|
| #427 | Command-aware output filter framework | P0 | L | — (infra) |
| #428 | Test output filter (cargo test/nextest) | P0 | M | 94-99% |
| #429 | Git output compression | P1 | M | 80-99% |
| #430 | Clippy/lint output grouping | P1 | M | 70-90% |
| #431 | Directory listing compression | P2 | S | 50-70% |
| #432 | Log deduplication with pattern counting | P2 | M | 70-85% |
| #433 | ANSI/progress stripping | P1 | S | 5-95% |
| #434 | Token savings tracking and metrics | P2 | S | — (observability) |
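As an illustration of where the largest savings in the table come from, the test filter (#428) could be sketched roughly as follows. The function name `filter_cargo_test` and the exact policy are assumptions: on success keep only the `test result:` summary lines, and on failure drop the per-test `... ok` lines while keeping everything else, so no failure detail is lost.

```rust
/// Hypothetical sketch of a `cargo test` output filter (not the #428 design).
pub fn filter_cargo_test(output: &str, exit_code: i32) -> String {
    let kept: Vec<&str> = output
        .lines()
        .filter(|line| {
            let line = line.trim_end();
            if exit_code == 0 {
                // Everything passed: the summary line is enough.
                line.starts_with("test result:")
            } else {
                // Something failed: drop only the per-test "ok" lines and
                // keep failures, panic messages, and the summary.
                !(line.starts_with("test ") && line.ends_with("... ok"))
            }
        })
        .collect();
    kept.join("\n")
}
```

A real filter would also need to handle nextest's output format and interleaved compiler warnings, which this sketch ignores.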
Implementation Order
- Phase 1 (P0): #433 ANSI stripping → #427 Framework → #428 Test filter
- Phase 2 (P1): #429 Git → #430 Clippy
- Phase 3 (P2): #431 Dir listing → #432 Log dedup → #434 Metrics
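Phase 1 starts with ANSI and progress-bar stripping (#433), since every later filter benefits from clean input. A dependency-free sketch is shown below. `sanitize_output` is the name used in the architecture diagram; the helpers `strip_ansi` and `collapse_carriage_returns` are hypothetical, and only CSI sequences (`ESC [ ... final byte`) are handled here, not other escape types.

```rust
/// Removes CSI escape sequences such as colors and cursor movement.
pub fn strip_ansi(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    let mut chars = input.chars().peekable();
    while let Some(c) = chars.next() {
        if c == '\u{1b}' && chars.peek() == Some(&'[') {
            chars.next(); // consume '['
            // Skip parameter bytes until the final byte (0x40..=0x7E).
            for n in chars.by_ref() {
                if ('\u{40}'..='\u{7e}').contains(&n) {
                    break;
                }
            }
        } else {
            out.push(c);
        }
    }
    out
}

/// Progress bars typically redraw a line with '\r'; keep only the text
/// after the last carriage return on each line.
pub fn collapse_carriage_returns(input: &str) -> String {
    input
        .lines()
        .map(|line| line.rsplit('\r').next().unwrap_or(line))
        .collect::<Vec<_>>()
        .join("\n")
}

/// Sketch of the sanitization step: strip escapes, then collapse redraws.
pub fn sanitize_output(raw: &str) -> String {
    collapse_carriage_returns(&strip_ansi(raw))
}
```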
Non-goals (v1)
- File content filtering (aggressive/minimal modes) — separate concern
- JSON schema extraction — low priority for agent workflow
Success Criteria
- 50%+ average token reduction on tool outputs in typical agent sessions
- Zero information loss for error/failure cases
- Configurable: filters can be disabled per-command
- All filters have unit tests with real-world output samples
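The first criterion needs a way to measure reduction. One rough sketch, assuming character count is an acceptable proxy for tokens (a real implementation for #434 would presumably use the model's tokenizer), with the hypothetical name `savings_percent`:

```rust
/// Hypothetical helper: percentage of characters removed by filtering.
/// Character count stands in for token count here, which is an approximation.
pub fn savings_percent(raw: &str, filtered: &str) -> f64 {
    let raw_len = raw.chars().count();
    if raw_len == 0 {
        return 0.0;
    }
    let filtered_len = filtered.chars().count();
    // saturating_sub guards against a filter that makes output longer.
    let saved = raw_len.saturating_sub(filtered_len);
    100.0 * saved as f64 / raw_len as f64
}
```

Averaging this value over the tool calls in a session gives a number that can be compared against the 50% target.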