
feat: add cache_control support for Claude models in OpenAI provider#3334

Closed
HikaruEgashira wants to merge 1 commit into block:main from HikaruEgashira:cache_control_openai

Conversation

Contributor

@HikaruEgashira HikaruEgashira commented Jul 10, 2025

Adds conditional cache_control support to the OpenAI provider when the model name contains "claude", enabling prompt caching for LiteLLM and other OpenAI-compatible services.

Fixes #3333
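
Roughly, the change injects an Anthropic-style cache_control marker into the request body whenever the model name contains "claude". A minimal sketch of the idea, assuming a serde_json payload in the usual chat/completions shape (the helper name and exact placement here are illustrative, not the actual diff):

use serde_json::{json, Value};

// Illustrative helper (not the actual patch): mark the system message with
// Anthropic-style cache_control so LiteLLM and similar proxies can enable
// prompt caching for Claude models.
fn add_cache_control(payload: &mut Value) {
    let Some(messages) = payload.get_mut("messages").and_then(Value::as_array_mut) else {
        return;
    };
    for message in messages.iter_mut() {
        if message.get("role").and_then(Value::as_str) == Some("system") {
            // LiteLLM expects content as a list of blocks when cache_control is set.
            if let Some(text) = message.get("content").and_then(Value::as_str).map(String::from) {
                message["content"] = json!([{
                    "type": "text",
                    "text": text,
                    "cache_control": { "type": "ephemeral" }
                }]);
            }
        }
    }
}

The guard for when to apply it is the model-name check quoted later in the thread: self.model.model_name.to_lowercase().contains("claude").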

@HikaruEgashira HikaruEgashira force-pushed the cache_control_openai branch 3 times, most recently from af5f819 to a513be5 Compare July 10, 2025 03:59
@michaelneale
Collaborator

@HikaruEgashira thanks for this - can you run cargo fmt / check clippy etc? Also, will this work specifically with openrouter? What provider are you testing with that uses the openai provider in this case?

@michaelneale michaelneale self-assigned this Jul 10, 2025
@michaelneale michaelneale added the p1 (Priority 1 - High, supports roadmap) and waiting labels Jul 10, 2025
Signed-off-by: HikaruEgashira <account@egahika.dev>
@michaelneale
Collaborator

/// Update the request when using an anthropic model.
/// For anthropic models, we can enable prompt caching to save cost. Since openrouter is the OpenAI-compatible
/// endpoint, we need to modify the OpenAI request to include the anthropic cache control field.

Does openrouter not do this, given it is offering the openai-compatible api? That seems really odd if it doesn't - why would they offer an openai-like api otherwise?

@michaelneale
Collaborator

any chance we can confirm before/after caching with openai to know it does need this header? (still seems odd to me)

Contributor Author

HikaruEgashira commented Jul 10, 2025

I tested this with LiteLLM. This field does not work with pure OpenAI, but most OpenAI-compatible services will honor it. Here is the usage from the system log.

before (1.0.29)

\"model\": \"anthropic.claude-3-5-haiku-20241022-v1:0\",\n  \"object\": \"chat.completion\",\n  \"system_fingerprint\": null,\n  \"usage\": {\n    \"cache_creation_input_tokens\": 0,\n    \"cache_read_input_tokens\": 0,\n    \"completion_tokens\": 554,\n    \"completion_tokens_details\": null,\n    \"prompt_tokens\": 5902,\n    \"prompt_tokens_details\": {\n      \"audio_tokens\": null,\n      \"cached_tokens\": 0\n    },\n    \"total_tokens\": 6456\n  }\n}","input_tokens":"5902","output_tokens":"554","total_tokens":"6456"},"target":"goose::providers::utils","span":{"name":"complete"},"spans":[{"name":"complete"}]}

after

\"model\": \"anthropic.claude-3-5-haiku-20241022-v1:0\",\n  \"object\": \"chat.completion\",\n  \"system_fingerprint\": null,\n  \"usage\": {\n    \"cache_creation_input_tokens\": 0,\n    \"cache_read_input_tokens\": 7593,\n    \"completion_tokens\": 68,\n    \"completion_tokens_details\": null,\n    \"prompt_tokens\": 8025,\n    \"prompt_tokens_details\": {\n      \"audio_tokens\": null,\n      \"cached_tokens\": 7593\n    },\n    \"total_tokens\": 8093\n  }\n}","input_tokens":"8025","output_tokens":"68","total_tokens":"8093"},"target":"goose::providers::utils","span":{"name":"complete"},"spans":[{"name":"complete"}]}

https://docs.litellm.ai/docs/completion/prompt_caching
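
For anyone else wanting to reproduce the comparison, a small sketch of reading those counters out of the response body (field names exactly as they appear in the logs above; the function itself is just for illustration):

use serde_json::Value;

// Illustrative only: pull the cache counters out of a chat.completion
// response so a before/after comparison like the one above is easy to read.
fn report_cache_usage(response: &Value) {
    let usage = &response["usage"];
    let creation = usage["cache_creation_input_tokens"].as_u64().unwrap_or(0);
    let read = usage["cache_read_input_tokens"].as_u64().unwrap_or(0);
    let prompt = usage["prompt_tokens"].as_u64().unwrap_or(0);
    println!("prompt_tokens={prompt} cache_creation={creation} cache_read={read}");
}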

create_request(&self.model, system, messages, tools, &ImageFormat::OpenAi)?;

// Add cache_control for claude models (LiteLLM and other OpenAI-compatible services)
if self.model.model_name.to_lowercase().contains("claude") {
Contributor

Shouldn't we add a method to the Provider trait for this instead?
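
If it helps picture that, one possible shape (hypothetical names, not the actual goose Provider trait):

// Hypothetical sketch only: a default method that providers override, instead
// of string-matching on the model name inside the OpenAI provider itself.
trait PromptCachingHint {
    fn wants_cache_control(&self) -> bool {
        false
    }
}

A provider that fronts Claude models would override this to return true, and the shared request builder would check it before injecting cache_control.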

@michaelneale
Collaborator

I think if liteLLM offers an openai api, it should include things like a caching abstraction on it - that should be part of its job.

Collaborator

@michaelneale michaelneale left a comment

I think we should have a liteLLM provider specifically - we shouldn't be changing the openai provider for specific middleware routers that are lacking features (but if we have a liteLLM provider, that would be ideal - and we can leave the openai one as it is).

This could be done by cloning the openai provider and adding in liteLLM-specific code (including things like this), which is a better experience as well, as there will likely be other liteLLM-specific things - liteLLM is important enough, I think, to justify this.
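
A rough outline of that direction, assuming the new provider starts as a copy of the openai one (names below are illustrative, not existing goose code):

// Illustrative skeleton: a dedicated LiteLLM provider that keeps
// LiteLLM-specific behaviour (like cache_control for claude models)
// out of the shared OpenAI provider.
struct LiteLlmProvider {
    model_name: String,
}

impl LiteLlmProvider {
    fn needs_cache_control(&self) -> bool {
        // Model names routed through LiteLLM look like
        // "anthropic.claude-3-5-haiku-20241022-v1:0" in the logs above.
        self.model_name.to_lowercase().contains("claude")
    }
}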

@michaelneale michaelneale added the status: backlog label and removed the p1 (Priority 1 - High, supports roadmap) label Jul 11, 2025
@HikaruEgashira
Contributor Author

OK! I'll add a new provider. Thanks for reviewing.
