
feat: enable prompt cache for anthropic #631

Merged · 3 commits into v1.0 · Jan 16, 2025

Conversation

yingjiehe-xyz
Collaborator

@yingjiehe-xyz yingjiehe-xyz commented Jan 16, 2025

Enable prompt cache for Anthropic following https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching#how-prompt-caching-works.

Generally, we add `"cache_control": {"type": "ephemeral"}` to the tool, system, and message sections. A cache hit can be verified from the `usage` field of the response, e.g.:

usage: {
  cache_creation_input_tokens: 1479,
  cache_read_input_tokens: 0,
  input_tokens: 4,
  output_tokens: 78
}

Currently, "ephemeral" is the only supported cache type; it has a 5-minute lifetime.
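The placement of the markers can be sketched roughly like this (a hypothetical Python sketch, not the actual Rust change in anthropic.rs; it assumes the request body is a dict whose system and message contents already use list-of-content-blocks form):

```python
def add_cache_control(payload: dict) -> dict:
    """Attach ephemeral cache breakpoints to the tools, system, and
    messages sections of a request payload (sketch only)."""
    marker = {"cache_control": {"type": "ephemeral"}}
    if payload.get("tools"):
        payload["tools"][-1].update(marker)      # cache all tool definitions
    if payload.get("system"):
        payload["system"][-1].update(marker)     # cache the system prompt
    if payload.get("messages"):
        content = payload["messages"][-1]["content"]
        if isinstance(content, list) and content:
            content[-1].update(marker)           # cache the conversation prefix
    return payload

payload = add_cache_control({
    "system": [{"type": "text", "text": "You are goose, a developer agent."}],
    "tools": [{"name": "shell", "input_schema": {"type": "object"}}],
    "messages": [{"role": "user",
                  "content": [{"type": "text", "text": "hello"}]}],
})
```

Marking the last block in each section caches everything up to and including that block, which is why one marker per section is enough.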

Cost savings: from the pricing,

  1. Cache write tokens are 25% more expensive than base input tokens
  2. Cache read tokens are 90% cheaper than base input tokens

Assume a conversation of N turns, where S denotes the system prompt length and M is the average number of new tokens (user input + output) per turn.
Before cache, the estimated cost is around:

  1. S + (S + M) + (S + M * 2) + ... + (S + M * (N - 1)) = S * N + N * (N - 1) / 2 * M

After cache, the estimated cost is around:

  2. (S + M * (N - 1)) * 1.25 + (S * N + N * (N - 1) / 2 * M) * 0.1

(every token is written to the cache once at the 25% premium, and the accumulated prefix is then re-read at 10% of the base price).

To compare results 1 and 2, we compare (S + M * (N - 1)) * 1.25 against (S * N + N * (N - 1) / 2 * M) * 0.9: if the first is greater, caching costs more, and vice versa. Normally our S is greater than 1000 and M is large as well, so (S + M * (N - 1)) * 1.25 should be much smaller, which means caching reduces our cost.
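The back-of-the-envelope comparison above can be checked with a short script (figures are hypothetical; this mirrors the two estimates, not Anthropic's exact billing):

```python
def cost_before(S, M, N):
    # Total base-priced input tokens over N turns:
    # S + (S + M) + ... + (S + M * (N - 1))
    return S * N + N * (N - 1) // 2 * M

def cost_after(S, M, N):
    # Every token is written to the cache once, at a 25% premium...
    writes = (S + M * (N - 1)) * 1.25
    # ...and cached prefixes are re-read at 10% of the base price.
    reads = cost_before(S, M, N) * 0.1
    return writes + reads

# Hypothetical figures: a 1000-token system prompt, 500 new tokens per
# turn, 10 turns (prices normalized so one base input token costs 1).
S, M, N = 1000, 500, 10
print(cost_before(S, M, N))  # 32500
print(cost_after(S, M, N))   # 10125.0 (about 69% cheaper)
```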
Some savings estimates from the Anthropic post: [screenshot: Anthropic prompt-caching savings table]

Tested with `just run-ui` and response verification: [screenshot of the usage response]

@yingjiehe-xyz yingjiehe-xyz changed the title Yingjiehe/cache feat: enable prompt cache for anthropic Jan 16, 2025
@yingjiehe-xyz yingjiehe-xyz requested a review from baxen January 16, 2025 19:06

Desktop App for this PR

The following build is available for testing:

The app is signed and notarized for macOS. After downloading, unzip the file and drag the Goose.app to your Applications folder.

This link is provided by nightly.link and will work even if you're not logged into GitHub.

Collaborator

@ahau-square ahau-square left a comment


Can you do a quick back of the envelope estimation of cost and savings for turning on prompt caching based on the Anthropic pricing?

crates/goose/src/providers/anthropic.rs (comment resolved)
@yingjiehe-xyz
Collaborator Author

> Can you do a quick back of the envelope estimation of cost and savings for turning on prompt caching based on the Anthropic pricing?

Sure, added a screenshot from the Anthropic post, along with my estimates above.

@ahau-square
Collaborator

> Can you do a quick back of the envelope estimation of cost and savings for turning on prompt caching based on the Anthropic pricing?
>
> Sure, added a screenshot from the Anthropic post, along with my estimates.

Awesome, looks like we can expect some big savings here!

Collaborator

@michaelneale michaelneale left a comment


this is really cool @yingjiehe-xyz and I think people will appreciate this a lot.

I wonder if similar exists for openrouter (as people like to use anthropic that way) but yes! very nice!

@yingjiehe-xyz
Collaborator Author

> this is really cool @yingjiehe-xyz and I think people will appreciate this a lot.
>
> I wonder if similar exists for openrouter (as people like to use anthropic that way) but yes! very nice!

Yes, it is available in OpenRouter: https://openrouter.ai/docs/prompt-caching. I am planning that as the next step.

@yingjiehe-xyz yingjiehe-xyz merged commit 3454855 into v1.0 Jan 16, 2025
6 checks passed
@yingjiehe-xyz yingjiehe-xyz deleted the yingjiehe/cache branch January 16, 2025 23:29
michaelneale added a commit that referenced this pull request Jan 17, 2025
* v1.0:
  ci: remove bundle.py, and reference to it (#632)
  feat: enable prompt cache for anthropic  (#631)
  feat: memory server (#601)