Parent: #336
Summary
Enable Anthropic prompt caching by sending the system prompt as structured content blocks with cache_control markers. Expected to reduce full-price (uncached) input tokens by 80-90% on requests that hit the cache.
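One way to check the 80-90% expectation is to compute the cached share of the prompt from the `usage` object in each response. The sketch below assumes the three input counters the API reports (`input_tokens`, `cache_creation_input_tokens`, `cache_read_input_tokens`) are disjoint and sum to the full prompt size. The `Usage` struct and its method names are illustrative, not existing zeph-llm types.

```rust
/// Input token counters from the `usage` object of a Claude API response.
/// Hypothetical struct for illustration; not an existing zeph-llm type.
#[derive(Debug, Clone, Copy, Default)]
pub struct Usage {
    /// Tokens billed at the normal input rate (not read from or written to the cache).
    pub input_tokens: u64,
    /// Tokens written to the cache by this request.
    pub cache_creation_input_tokens: u64,
    /// Tokens served from the cache by this request.
    pub cache_read_input_tokens: u64,
}

impl Usage {
    /// Total prompt size, assuming the three counters are disjoint.
    pub fn total_input(&self) -> u64 {
        self.input_tokens + self.cache_creation_input_tokens + self.cache_read_input_tokens
    }

    /// Fraction of the prompt served from cache, or `None` for an empty prompt.
    pub fn cache_read_ratio(&self) -> Option<f64> {
        let total = self.total_input();
        if total == 0 {
            None
        } else {
            Some(self.cache_read_input_tokens as f64 / total as f64)
        }
    }
}
```

Under these assumptions, a request with 1,000 uncached tokens and 9,000 cache-read tokens has a ratio of 0.9, the upper end of the expected range.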
Requirements
- Add the anthropic-beta: prompt-caching-2024-07-31 header to all Claude API requests
- Convert the system field from Option<&str> to Option<Vec<SystemContentBlock>> in request bodies
- Split system prompt into cacheable blocks:
  - Block 1 (cached): base prompt + active skills
  - Block 2 (cached): tool catalog + environment context
  - Block 3 (not cached): project configs, repo map, MCP prompt
- Inject section markers in rebuild_system_prompt for splitting
- Parse usage.cache_read_input_tokens from responses
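The marker-based split described above could look roughly like the following. This is a minimal sketch, not the planned implementation: the marker strings, the fields of `SystemContentBlock`, and the fallback behavior are all assumptions, since the issue does not specify them. A block with `cached: true` is meant to be serialized with `"cache_control": {"type": "ephemeral"}`; serialization itself is left out.

```rust
/// Hypothetical marker strings; the issue does not define the actual format.
pub const TOOLS_MARKER: &str = "<!-- cache:tools -->";
pub const VOLATILE_MARKER: &str = "<!-- cache:volatile -->";

/// One system content block. The field layout is an assumption for this sketch.
#[derive(Debug, Clone, PartialEq)]
pub struct SystemContentBlock {
    pub text: String,
    /// When true, the serialized block would carry a `cache_control` marker.
    pub cached: bool,
}

/// Split a system prompt into up to three blocks at the two markers.
///
/// Expected layout: base prompt + skills, TOOLS_MARKER, tool catalog +
/// environment, VOLATILE_MARKER, project configs / repo map / MCP prompt.
/// If either marker is missing, the whole prompt is returned as a single
/// uncached block (an assumed fallback).
pub fn split_system_prompt(prompt: &str) -> Vec<SystemContentBlock> {
    let sections = prompt.split_once(TOOLS_MARKER).and_then(|(base, rest)| {
        rest.split_once(VOLATILE_MARKER)
            .map(|(tools, volatile)| vec![(base, true), (tools, true), (volatile, false)])
    });

    match sections {
        Some(sections) => sections
            .into_iter()
            .map(|(text, cached)| SystemContentBlock {
                text: text.trim().to_string(),
                cached,
            })
            // An empty section produces no block.
            .filter(|block| !block.text.is_empty())
            .collect(),
        None => vec![SystemContentBlock {
            text: prompt.trim().to_string(),
            cached: false,
        }],
    }
}
```

Keeping the volatile content (project configs, repo map, MCP prompt) in the last block means changes to it leave the two cached prefixes unchanged, which is what lets later requests reuse them.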
Acceptance Criteria
Files
crates/zeph-llm/src/claude.rs
crates/zeph-core/src/agent/context.rs