Status: Closed
Labels: enhancement (New feature or request)
Description
Summary
Implement prompt caching metrics reporting for the OpenAI provider, mirroring the existing Claude implementation. OpenAI caches prompts automatically for requests >= 1024 tokens and reports cached token counts in the usage response. We need to parse and surface this data.
Background
The Claude provider already tracks cache usage via:
- a `last_cache: Mutex<Option<(u64, u64)>>` field
- an `ApiUsage` struct with `cache_creation_input_tokens` / `cache_read_input_tokens`
- a `last_cache_usage()` trait method on `LlmProvider`
- `record_cache_usage()` in the Agent, which aggregates into TUI metrics
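The pieces above can be condensed into a minimal stand-alone sketch (field and method names follow the issue; the real `LlmProvider` trait and `ApiUsage` struct are simplified away):

```rust
use std::sync::Mutex;

// Minimal sketch of the Claude-style cache tracking pattern.
// `Provider` stands in for the real provider struct; only the
// cache bookkeeping is shown.
struct Provider {
    // (cache_creation_input_tokens, cache_read_input_tokens)
    last_cache: Mutex<Option<(u64, u64)>>,
}

impl Provider {
    // Called while parsing an API response.
    fn record(&self, creation: u64, read: u64) {
        *self.last_cache.lock().unwrap() = Some((creation, read));
    }

    // Analogue of the `last_cache_usage()` trait method: report the
    // most recently recorded usage, if any.
    fn last_cache_usage(&self) -> Option<(u64, u64)> {
        *self.last_cache.lock().unwrap()
    }
}
```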
OpenAI returns cached token counts in usage.prompt_tokens_details.cached_tokens. No API changes are needed to enable caching -- it is automatic. We only need to parse the response.
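Given that JSON shape, the parsing step might look like this sketch. Struct names are illustrative; in the real crate they would carry serde's `Deserialize` derive with field names matching the JSON keys, omitted here to keep the sketch dependency-free:

```rust
// Illustrative mirror of the OpenAI usage payload.
struct PromptTokensDetails {
    cached_tokens: u64,
}

#[allow(dead_code)]
struct Usage {
    prompt_tokens: u64,
    prompt_tokens_details: Option<PromptTokensDetails>,
}

// Map a parsed usage block to the (creation, read) pair the metrics
// path expects. OpenAI reports only read-side cache hits, so the
// creation count is always 0.
fn cache_usage(usage: &Usage) -> (u64, u64) {
    let cached = usage
        .prompt_tokens_details
        .as_ref()
        .map_or(0, |d| d.cached_tokens);
    (0, cached)
}
```

Defaulting to 0 when `prompt_tokens_details` is absent matters because OpenAI omits cached counts for prompts under the 1024-token threshold.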
Architecture plan
See .local/plan/openai-prompt-caching.md
Tasks
- #1: Add usage deserialization structs and `last_cache` field to `OpenAiProvider`
- #2: Parse cache usage from chat and tool responses
- #3: Implement `last_cache_usage()` for the OpenAI provider
- #4: Add unit tests for usage parsing
Files to modify
crates/zeph-llm/src/openai.rs
Acceptance criteria
- `OpenAiProvider::last_cache_usage()` returns `(0, cached_tokens)` after API calls
- Cache usage is logged via `tracing::debug!` (matching the Claude pattern)
- TUI metrics display OpenAI cached tokens via the existing `record_cache_usage()` path
- All existing tests pass; new unit tests for usage deserialization added
- `cargo clippy --workspace -- -D warnings` passes
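The aggregation side of these criteria can be sketched as follows; `CacheMetrics` and its fields are assumptions for illustration only, since the issue just says the existing `record_cache_usage()` path is reused:

```rust
// Hypothetical stand-in for the Agent-side metrics accumulator: fold
// per-call (creation, read) pairs into running totals for the TUI.
#[derive(Default, Debug, PartialEq)]
struct CacheMetrics {
    creation_tokens: u64,
    read_tokens: u64,
}

impl CacheMetrics {
    // A provider that made no calls (or hit no cache path) yields None
    // and leaves the totals untouched.
    fn record_cache_usage(&mut self, usage: Option<(u64, u64)>) {
        if let Some((creation, read)) = usage {
            self.creation_tokens += creation;
            self.read_tokens += read;
        }
    }
}
```

Because the OpenAI provider always reports `(0, cached_tokens)`, only the read-side total advances for it, while the same path still handles Claude's creation-side counts.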