M21: Token optimization -- prompt caching, local summarization, context pruning by bug-ops · Pull Request #341 · bug-ops/zeph

bug-ops · 2026-02-15T22:02:32Z

Summary

Add Anthropic prompt caching with structured system content blocks and the anthropic-beta header (issue #337, "M21-P1: Anthropic prompt caching for ClaudeProvider")
Add a configurable summary provider for routing tool output summarization through a local model (issue #338, "M21-P2: Local model for tool output summarization")
Add aggressive inline pruning of stale tool outputs during tool loop iterations (issue #339, "M21-P3: Aggressive context pruning in tool loops")
Propagate cache usage metrics (creation/read tokens) from API responses to MetricsSnapshot (issue #340, "M21-P4: Cache usage metrics tracking")

Estimated input token reduction: 80-90% from prompt caching alone.

Test plan

cargo nextest: 1408 tests passed
cargo clippy: zero warnings
Security audit: PASS (no new attack vectors)
Code review: APPROVE
Manual verification of cache hit rates via API usage dashboard

Closes #336, closes #337, closes #338, closes #339, closes #340

Split system prompt into structured content blocks with cache_control markers. Add anthropic-beta: prompt-caching-2024-07-31 header to all requests. Parse API usage response for cache hit tracking. Closes #337
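A minimal sketch of the block-splitting idea described above, not the code from this PR. The type and function names (`SystemBlock`, `CacheControl`, `build_system_blocks`, `request_headers`) are hypothetical; only the `cache_control` marker of type `ephemeral` and the `anthropic-beta: prompt-caching-2024-07-31` header come from the commit message and the Anthropic API. The sketch assumes the system prompt divides into a stable prefix (worth caching) and a volatile suffix (not cached).

```rust
/// Cache marker attached to a system block. On the wire this is
/// `"cache_control": {"type": "ephemeral"}`.
#[derive(Debug, Clone, PartialEq)]
pub struct CacheControl {
    pub kind: &'static str,
}

/// One entry of the structured `system` array sent to the API.
#[derive(Debug, Clone, PartialEq)]
pub struct SystemBlock {
    pub text: String,
    pub cache_control: Option<CacheControl>,
}

/// Hypothetical helper: the stable part of the prompt gets a cache marker,
/// the volatile part (anything that changes per request) does not.
pub fn build_system_blocks(stable: &str, volatile: &str) -> Vec<SystemBlock> {
    let mut blocks = Vec::new();
    if !stable.is_empty() {
        blocks.push(SystemBlock {
            text: stable.to_string(),
            cache_control: Some(CacheControl { kind: "ephemeral" }),
        });
    }
    if !volatile.is_empty() {
        blocks.push(SystemBlock {
            text: volatile.to_string(),
            cache_control: None,
        });
    }
    blocks
}

/// Extra header named in the commit message, added to every request.
pub fn request_headers() -> Vec<(&'static str, &'static str)> {
    vec![("anthropic-beta", "prompt-caching-2024-07-31")]
}
```

Putting the marker on the last stable block matters because the cache covers the prompt prefix up to and including the marked block, so volatile text placed before it would invalidate the cache on every request.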

Route tool output summarization and context compaction through a configurable local model (summary_model config option). Add inline pruning of stale tool outputs in the tool loop to reduce context growth per iteration. Closes #338, closes #339
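The two mechanisms in this commit can be sketched roughly as follows. Everything here is illustrative: `Summarizer`, `NamedModel`, `pick_summarizer`, `Message`, `PRUNED_PLACEHOLDER`, and the `keep_recent` parameter are assumed names, and the real pruning policy in the PR may differ (for example, it may use age or size thresholds instead of a simple count).

```rust
/// Hypothetical stand-in for a provider that can summarize text.
pub trait Summarizer {
    fn name(&self) -> &str;
}

/// Trivial concrete provider used only for illustration.
pub struct NamedModel(pub &'static str);

impl Summarizer for NamedModel {
    fn name(&self) -> &str {
        self.0
    }
}

/// Use the configured summary model when present, else the primary provider.
pub fn pick_summarizer<'a>(
    primary: &'a dyn Summarizer,
    summary: Option<&'a dyn Summarizer>,
) -> &'a dyn Summarizer {
    summary.unwrap_or(primary)
}

#[derive(Debug, Clone, PartialEq)]
pub enum Message {
    User(String),
    Assistant(String),
    ToolOutput(String),
}

pub const PRUNED_PLACEHOLDER: &str = "[tool output pruned]";

/// Replace all but the `keep_recent` newest tool outputs with a short
/// placeholder. Returns how many outputs were pruned by this call.
pub fn prune_stale_tool_outputs(history: &mut [Message], keep_recent: usize) -> usize {
    let mut seen = 0;
    let mut pruned = 0;
    // Walk newest to oldest so the most recent outputs are kept intact.
    for msg in history.iter_mut().rev() {
        if let Message::ToolOutput(body) = msg {
            seen += 1;
            if seen > keep_recent && body.as_str() != PRUNED_PLACEHOLDER {
                *body = PRUNED_PLACEHOLDER.to_string();
                pruned += 1;
            }
        }
    }
    pruned
}
```

Calling `prune_stale_tool_outputs` once per tool loop iteration keeps the context from growing with every tool call, and the placeholder check makes repeated calls idempotent.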

Add last_cache_usage() to LlmProvider trait for reading cache hit statistics. ClaudeProvider stores cache_creation and cache_read token counts from each response. Agent records these into MetricsSnapshot after each LLM call. Closes #340
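One way the pieces named in this commit could fit together is sketched below. Only `last_cache_usage()`, `LlmProvider`, `ClaudeProvider`, and `MetricsSnapshot` are names from the PR; the `CacheUsage` struct, the field names, the `record_usage` and `record` methods, and the choice to accumulate totals are assumptions made for illustration.

```rust
/// Cache token counts from one API response (hypothetical shape). The
/// Anthropic usage object reports these as `cache_creation_input_tokens`
/// and `cache_read_input_tokens`.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub struct CacheUsage {
    pub creation_tokens: u64,
    pub read_tokens: u64,
}

pub trait LlmProvider {
    /// Providers without prompt caching keep the default and report nothing.
    fn last_cache_usage(&self) -> Option<CacheUsage> {
        None
    }
}

#[derive(Default)]
pub struct ClaudeProvider {
    last_cache: Option<CacheUsage>,
}

impl ClaudeProvider {
    /// Assumed hook, called after the usage section of a response is parsed.
    pub fn record_usage(&mut self, cache_creation: u64, cache_read: u64) {
        self.last_cache = Some(CacheUsage {
            creation_tokens: cache_creation,
            read_tokens: cache_read,
        });
    }
}

impl LlmProvider for ClaudeProvider {
    fn last_cache_usage(&self) -> Option<CacheUsage> {
        self.last_cache
    }
}

#[derive(Debug, Default, PartialEq, Eq)]
pub struct MetricsSnapshot {
    pub cache_creation_tokens: u64,
    pub cache_read_tokens: u64,
}

impl MetricsSnapshot {
    /// Assumed behavior: the agent adds each call's counts to running totals.
    pub fn record(&mut self, provider: &dyn LlmProvider) {
        if let Some(usage) = provider.last_cache_usage() {
            self.cache_creation_tokens += usage.creation_tokens;
            self.cache_read_tokens += usage.read_tokens;
        }
    }
}
```

In a typical session the first call shows up as creation tokens (the cache write) and later calls show up as read tokens (cache hits), so a read total that stays at zero suggests the cached prefix is changing between requests.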

codecov-commenter · 2026-02-15T22:07:19Z

Codecov Report

❌ Patch coverage is 70.29412% with 101 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
crates/zeph-llm/src/claude.rs	72.54%	42 Missing ⚠️
src/main.rs	0.00%	19 Missing ⚠️
crates/zeph-tui/src/widgets/resources.rs	0.00%	11 Missing ⚠️
crates/zeph-core/src/agent/mod.rs	50.00%	8 Missing ⚠️
crates/zeph-llm/src/orchestrator/router.rs	0.00%	7 Missing ⚠️
crates/zeph-llm/src/orchestrator/mod.rs	0.00%	5 Missing ⚠️
crates/zeph-core/src/agent/context.rs	96.55%	4 Missing ⚠️
crates/zeph-llm/src/any.rs	0.00%	3 Missing ⚠️
crates/zeph-core/src/agent/streaming.rs	66.66%	2 Missing ⚠️

@@            Coverage Diff             @@
##             main     #341      +/-   ##
==========================================
- Coverage   79.35%   79.23%   -0.13%     
==========================================
  Files          99       99              
  Lines       25076    25406     +330     
==========================================
+ Hits        19900    20131     +231     
- Misses       5176     5275      +99

Files with missing lines	Coverage Δ
crates/zeph-core/src/config/types.rs	`96.86% <100.00%> (+0.01%)`	⬆️
crates/zeph-core/src/metrics.rs	`100.00% <ø> (ø)`
crates/zeph-llm/src/provider.rs	`90.58% <100.00%> (+0.06%)`	⬆️
crates/zeph-core/src/agent/streaming.rs	`50.83% <66.66%> (+0.11%)`	⬆️
crates/zeph-llm/src/any.rs	`90.21% <0.00%> (-1.00%)`	⬇️
crates/zeph-core/src/agent/context.rs	`85.47% <96.55%> (+1.15%)`	⬆️
crates/zeph-llm/src/orchestrator/mod.rs	`90.99% <0.00%> (-0.90%)`	⬇️
crates/zeph-llm/src/orchestrator/router.rs	`54.46% <0.00%> (-3.64%)`	⬇️
crates/zeph-core/src/agent/mod.rs	`81.98% <50.00%> (-0.39%)`	⬇️
crates/zeph-tui/src/widgets/resources.rs	`0.00% <0.00%> (ø)`
... and 2 more

bug-ops added 4 commits February 15, 2026 22:50

feat: add Anthropic prompt caching for ClaudeProvider

dc15e95

docs: update CHANGELOG for M21 token optimization

202fdfa

github-actions bot added documentation, llm, rust, core, config, size/L labels Feb 15, 2026

bug-ops added 3 commits February 15, 2026 23:18

feat: display cache metrics in TUI resources panel

fef3cf9

Show cache write/read token counts conditionally when the active provider reports non-zero cache usage.
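The conditional display could look roughly like this. The function name `cache_rows` and the label text are hypothetical, and the sketch assumes the rows appear as soon as either counter is non-zero; the actual widget code in `resources.rs` is not shown in this PR description.

```rust
/// Build the extra rows for the resources panel. Returns no rows when the
/// active provider has reported no cache activity, so the panel stays
/// compact for providers without prompt caching.
pub fn cache_rows(cache_write: u64, cache_read: u64) -> Vec<String> {
    if cache_write == 0 && cache_read == 0 {
        return Vec::new();
    }
    vec![
        format!("Cache write: {cache_write}"),
        format!("Cache read: {cache_read}"),
    ]
}
```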

fix: guard tool_use/tool_result blocks by message role in API serialization

d0f69d3

Ensure tool_use blocks are only emitted in assistant messages and tool_result blocks only in user messages. Misplaced parts are downgraded to text blocks to prevent Anthropic API 400 errors.
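The role guard can be sketched as a pure function over content blocks. `Role`, `Block`, `guard_blocks`, and the exact text used for downgraded blocks are all assumptions; the PR states only that misplaced parts become text blocks.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Role {
    User,
    Assistant,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Block {
    Text(String),
    ToolUse { id: String, name: String },
    ToolResult { tool_use_id: String, content: String },
}

/// Keep tool_use only in assistant messages and tool_result only in user
/// messages. Anything misplaced is downgraded to a plain text block so the
/// request stays valid instead of being rejected with a 400.
pub fn guard_blocks(role: Role, blocks: Vec<Block>) -> Vec<Block> {
    blocks
        .into_iter()
        .map(|block| match (role, block) {
            (Role::User, Block::ToolUse { id, name }) => {
                Block::Text(format!("[tool_use {name} ({id})]"))
            }
            (Role::Assistant, Block::ToolResult { tool_use_id, content }) => {
                Block::Text(format!("[tool_result {tool_use_id}] {content}"))
            }
            // Correctly placed blocks and plain text pass through unchanged.
            (_, other) => other,
        })
        .collect()
}
```

Downgrading rather than dropping keeps the information visible to the model, which is useful when history has been compacted or restored in a way that moved a block into a message with the wrong role.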

fix: delegate last_cache_usage through orchestrator and SubProvider

788f150

Orchestrator was missing last_cache_usage() delegation, causing cache metrics to never reach MetricsSnapshot when using provider=orchestrator.
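The failure mode described here is easy to reproduce with a trait that has a default method: a wrapper that does not override it silently reports nothing. The sketch below shows the delegation under assumed names (`CacheUsage`, `FixedProvider`, the `last_used` index, `mark_used`); the real orchestrator's routing and SubProvider types are not shown in this PR description.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct CacheUsage {
    pub creation_tokens: u64,
    pub read_tokens: u64,
}

pub trait LlmProvider {
    /// Default: no cache statistics. A wrapper that forgets to override
    /// this hides the statistics of the provider it wraps.
    fn last_cache_usage(&self) -> Option<CacheUsage> {
        None
    }
}

/// Illustrative provider that always reports a fixed value.
pub struct FixedProvider(pub Option<CacheUsage>);

impl LlmProvider for FixedProvider {
    fn last_cache_usage(&self) -> Option<CacheUsage> {
        self.0
    }
}

pub struct Orchestrator {
    providers: Vec<Box<dyn LlmProvider>>,
    last_used: Option<usize>,
}

impl Orchestrator {
    pub fn new(providers: Vec<Box<dyn LlmProvider>>) -> Self {
        Self { providers, last_used: None }
    }

    /// Assumed hook: remember which sub-provider served the latest call.
    pub fn mark_used(&mut self, index: usize) {
        if index < self.providers.len() {
            self.last_used = Some(index);
        }
    }
}

impl LlmProvider for Orchestrator {
    /// Delegate to the sub-provider that handled the most recent call.
    fn last_cache_usage(&self) -> Option<CacheUsage> {
        let index = self.last_used?;
        self.providers.get(index)?.last_cache_usage()
    }
}
```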

github-actions bot added size/XL and removed size/L labels Feb 15, 2026

bug-ops merged commit 9f821f1 into main Feb 15, 2026
18 checks passed

bug-ops deleted the feat/m21/token-optimization branch February 15, 2026 23:10

bug-ops merged 7 commits into main from feat/m21/token-optimization