
AWS Prompt caching #43

Open
LeonRuggiero opened this issue Dec 13, 2024 · 0 comments
Assignees
Labels
enhancement New feature or request

Comments

@LeonRuggiero
Contributor

https://aws.amazon.com/bedrock/prompt-caching/

AWS prompt caching works by caching "prefixes": large, contiguous chunks of the prompt, starting from the beginning.

This can enable:

  • Cost reduction
  • Latency reduction

This is particularly useful for making multiple requests with long inputs, such as:

  • Repeated generation from documents, where the output window is too small to produce everything in a single generation
  • Chat-like functionality, which inherently issues multiple requests that each include the previous chat history
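For the chat use case, the accumulated history is the repeated prefix, so a cache checkpoint can be re-placed after the latest turns on each request. A minimal sketch, assuming the Bedrock Converse message shape where a `cachePoint` content block marks everything before it as cacheable; the helper name is hypothetical:

```python
# Hedged sketch of re-marking a cache checkpoint in a chat loop.
# Message/content field names follow the Bedrock Converse API shape;
# build_messages is a hypothetical helper, not part of any SDK.
def build_messages(history, new_user_text):
    """Return Converse-style messages with a cache point after the history."""
    messages = [dict(m) for m in history]  # prior turns, repeated verbatim
    if messages:
        # Append a cachePoint block to the last historical turn so the
        # whole stable history prefix is eligible for caching.
        last = dict(messages[-1])
        last["content"] = list(last["content"]) + [
            {"cachePoint": {"type": "default"}}
        ]
        messages[-1] = last
    messages.append({"role": "user", "content": [{"text": new_user_text}]})
    return messages


history = [
    {"role": "user", "content": [{"text": "Hello"}]},
    {"role": "assistant", "content": [{"text": "Hi! How can I help?"}]},
]
msgs = build_messages(history, "Summarize our chat so far.")
```

On the next turn the same history (plus the new exchange) is resent, so a request that repeats the cached prefix byte-for-byte can be served from the cache.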

Prompt caching is supported only by a limited set of models, and its behavior differs between them.
https://docs.aws.amazon.com/bedrock/latest/userguide/prompt-caching.html
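In the Bedrock Converse API, a cache checkpoint is expressed as a `cachePoint` content block placed after the stable prefix. A minimal sketch of such a request payload; the model ID and document text are placeholders, and actually sending it requires `boto3` credentials, e.g. `boto3.client("bedrock-runtime").converse(**request)`:

```python
# Sketch of a Bedrock Converse request that marks a long, stable
# system prefix as cacheable. The model ID is a placeholder for a
# caching-capable model; LONG_DOCUMENT stands in for a large document
# that is reused across many requests.
LONG_DOCUMENT = "... full document text ..."

request = {
    "modelId": "anthropic.claude-3-5-sonnet-20241022-v2:0",  # placeholder
    "system": [
        {"text": "Answer questions using only the document below."},
        {"text": LONG_DOCUMENT},
        # Everything before this marker is eligible for prefix caching;
        # later requests repeating the same prefix can hit the cache.
        {"cachePoint": {"type": "default"}},
    ],
    "messages": [
        {"role": "user", "content": [{"text": "Summarize the key points."}]}
    ],
}
```

Only the trailing user message changes between requests, so the expensive document prefix is processed once and reused, which is where the cost and latency savings come from.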

@LeonRuggiero LeonRuggiero added the enhancement New feature or request label Dec 13, 2024
@LeonRuggiero LeonRuggiero self-assigned this Dec 13, 2024