
OpenAI prompt caching: parse and report cached token usage #348

@bug-ops

Description

Summary

Implement prompt caching metrics reporting for the OpenAI provider, mirroring the existing Claude implementation. OpenAI caches prompts automatically for requests of 1024 tokens or more and reports cached token counts in the usage response. We need to parse and surface this data.

Background

The Claude provider already tracks cache usage via:

  • last_cache: Mutex<Option<(u64, u64)>> field
  • ApiUsage struct with cache_creation_input_tokens / cache_read_input_tokens
  • last_cache_usage() trait method on LlmProvider
  • record_cache_usage() in Agent aggregates into TUI metrics
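
The existing pattern can be sketched as follows. The names last_cache, ApiUsage, and last_cache_usage() come from the list above; the struct layout, method bodies, and the standalone Provider type are illustrative, not the crate's actual code:

```rust
use std::sync::Mutex;

// Cache counters as reported in an API usage block (names from this issue).
struct ApiUsage {
    cache_creation_input_tokens: u64,
    cache_read_input_tokens: u64,
}

struct Provider {
    // `(creation, read)` token counts from the most recent API call.
    last_cache: Mutex<Option<(u64, u64)>>,
}

impl Provider {
    // Called after each response to stash the latest counters.
    fn record(&self, usage: &ApiUsage) {
        *self.last_cache.lock().unwrap() = Some((
            usage.cache_creation_input_tokens,
            usage.cache_read_input_tokens,
        ));
    }

    // Mirrors the `last_cache_usage()` trait method on `LlmProvider`.
    fn last_cache_usage(&self) -> Option<(u64, u64)> {
        *self.last_cache.lock().unwrap()
    }
}
```

The OpenAI provider would follow the same shape, with the first tuple element pinned to 0.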

OpenAI returns cached token counts in usage.prompt_tokens_details.cached_tokens. Caching is automatic, so no request-side changes are needed to enable it; we only have to parse the response.

Architecture plan

See .local/plan/openai-prompt-caching.md

Tasks

Files to modify

  • crates/zeph-llm/src/openai.rs

Acceptance criteria

  • OpenAiProvider::last_cache_usage() returns (0, cached_tokens) after API calls
  • Cache usage is logged via tracing::debug! (matching Claude pattern)
  • TUI metrics display OpenAI cached tokens via existing record_cache_usage() path
  • All existing tests pass, new unit tests for usage deserialization added
  • cargo clippy --workspace -- -D warnings passes

Metadata

Assignees

No one assigned

Labels

enhancement (New feature or request)

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
