Closed
4 of 4 issues completed
Labels: enhancement, epic
Description
Problem
Usage analysis shows catastrophic token waste: ~1.9M input tokens vs ~9.8K output tokens (roughly a 200:1 ratio), with zero cache hits across all requests.
Root Causes
- No prompt caching -- `ClaudeProvider` sends system prompt + skills as plain text every request. Anthropic prompt caching API is not used.
- Tool loop amplification -- each iteration of `process_response_native_tools` resends the full message history. With 10 iterations, that is 10x the system prompt + history.
- LLM-based summarization uses primary model -- `summarize_tool_output` and `compact_context` make separate Claude API calls.
- Bloated system prompt -- `rebuild_system_prompt` injects skills + catalog + environment + tool catalog + MCP prompt + project configs + repo map.
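To make the first root cause concrete, here is a minimal sketch of what a structured `system` field could look like once the prompt is split into blocks and the last stable block is marked with `cache_control`. The `SystemBlock` type and both helper functions are hypothetical and are not taken from `ClaudeProvider`. JSON is built by hand only to keep the sketch dependency-free, and the block shape should be checked against the current Anthropic documentation before use.

```rust
/// Hypothetical representation of one entry in the `system` array of a
/// Messages API request. Not the actual type used in `zeph-llm`.
struct SystemBlock {
    text: String,
    /// When true, this block is marked as a cache breakpoint.
    cache: bool,
}

/// Minimal JSON string escaping (quotes, backslashes, control characters).
fn escape_json(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Serialize the blocks into the JSON array sent as the `system` field.
/// Blocks with `cache == true` get `"cache_control": {"type": "ephemeral"}`.
fn system_json(blocks: &[SystemBlock]) -> String {
    let parts: Vec<String> = blocks
        .iter()
        .map(|b| {
            let cache = if b.cache {
                r#","cache_control":{"type":"ephemeral"}"#
            } else {
                ""
            };
            format!(r#"{{"type":"text","text":"{}"{}}}"#, escape_json(&b.text), cache)
        })
        .collect();
    format!("[{}]", parts.join(","))
}
```

Putting the rarely changing parts (base prompt, skills, tool catalog) first and the volatile parts (environment, repo map) after the marked block is one plausible ordering, since a cached prefix is only reused when it is byte-identical between requests.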
Estimated Impact
| Optimization | Token Reduction | Effort |
|---|---|---|
| Prompt caching | 80-90% | Medium |
| Local model for summarization | Eliminates extra API calls | Low |
| Aggressive context pruning | 30-50% of history | Low |
| Usage metrics | Observability | Low |
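The interaction between tool loop amplification and prompt caching can be sketched as a back-of-the-envelope calculation. The function names, the linear history growth model, and the example numbers are all assumptions made for illustration. The cache write and read multipliers are passed in as parameters rather than hard-coded, because they depend on the provider's current pricing.

```rust
/// Input tokens sent across a tool loop without caching: every iteration
/// resends the static prefix plus the history accumulated so far.
/// Assumes the history grows by `history_per_iter` tokens per iteration.
fn uncached_input_tokens(prefix: u64, history_per_iter: u64, iterations: u64) -> u64 {
    (1..=iterations).map(|i| prefix + history_per_iter * i).sum()
}

/// Cost-equivalent input tokens when only the static prefix is cached.
/// The first iteration pays `write_mult` on the prefix, later iterations pay
/// `read_mult`. History is assumed to stay uncached in this sketch.
fn cached_input_cost(
    prefix: u64,
    history_per_iter: u64,
    iterations: u64,
    write_mult: f64,
    read_mult: f64,
) -> f64 {
    if iterations == 0 {
        return 0.0;
    }
    let prefix = prefix as f64;
    let prefix_cost = prefix * write_mult + prefix * read_mult * (iterations - 1) as f64;
    let history: u64 = (1..=iterations).map(|i| history_per_iter * i).sum();
    prefix_cost + history as f64
}

/// Fractional reduction in cost-equivalent input tokens, in the range 0.0..=1.0.
fn reduction(uncached: u64, cached: f64) -> f64 {
    if uncached == 0 {
        return 0.0;
    }
    1.0 - cached / uncached as f64
}
```

With a hypothetical 20,000-token prefix, 1,000 tokens of history growth per iteration, 10 iterations, and assumed multipliers of 1.25 (write) and 0.1 (read), the uncached total is 255,000 tokens and the cached cost equivalent is about 98,000. The saving grows as the prefix dominates the request, and grows further if the history prefix is cached as well.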
Phases
- #337 M21-P1: Anthropic prompt caching for ClaudeProvider -- Phase 1: Prompt caching (structured system blocks + anthropic-beta header)
- #338 M21-P2: Local model for tool output summarization -- Phase 2: Local model for tool output summarization
- #339 M21-P3: Aggressive context pruning in tool loops -- Phase 3: Aggressive context pruning in tool loops
- #340 M21-P4: Cache usage metrics tracking -- Phase 4: Cache usage metrics tracking
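As an illustration of what Phase 3 could involve, the sketch below blanks out older tool results in the message history while keeping the most recent ones intact. The `Message` enum, the `PLACEHOLDER` text, and `prune_tool_results` are hypothetical names and do not reflect the actual types in `crates/zeph-core/src/agent/context.rs`.

```rust
/// Hypothetical, simplified message history entry.
#[derive(Debug, Clone, PartialEq)]
enum Message {
    User(String),
    Assistant(String),
    ToolResult(String),
}

/// Text left in place of a pruned tool result.
const PLACEHOLDER: &str = "[tool output pruned]";

/// Replace the body of every tool result except the `keep_recent` most
/// recent ones with a short placeholder. Returns how many were pruned.
/// Messages are never removed, so ordering and pairing are preserved.
fn prune_tool_results(history: &mut [Message], keep_recent: usize) -> usize {
    let mut seen = 0;
    let mut pruned = 0;
    // Walk from newest to oldest so the most recent results are kept.
    for msg in history.iter_mut().rev() {
        if let Message::ToolResult(body) = msg {
            seen += 1;
            if seen > keep_recent && body.as_str() != PLACEHOLDER {
                *body = PLACEHOLDER.to_string();
                pruned += 1;
            }
        }
    }
    pruned
}
```

Replacing the body rather than dropping the message is a deliberate choice in this sketch: APIs with native tool use generally expect every tool call to keep a matching result, so removing entries outright could invalidate the history.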
Architecture
See `.local/plan/m21-token-optimization.md`
Key Files
- `crates/zeph-llm/src/claude.rs`
- `crates/zeph-core/src/agent/streaming.rs`
- `crates/zeph-core/src/agent/context.rs`
- `crates/zeph-llm/src/provider.rs`