Enable bedrock prompt cache #6710

Merged
jh-block merged 9 commits into block:main from fbalicchia:add-prompt-bedrock-cache
Feb 23, 2026
@fbalicchia
Contributor

Summary

  • Implemented prompt caching for Anthropic Claude models on AWS Bedrock to reduce costs.
  • Introduced a cache point placement strategy that complies with AWS Bedrock's limit of four cache points per request.
  • Added the BEDROCK_ENABLE_CACHING configuration parameter.
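
As a rough illustration of how a flag like BEDROCK_ENABLE_CACHING could gate the feature, here is a minimal sketch. It is not the code from this PR: the helper name `caching_enabled`, the accepted "off" values, and the substring check for Claude models are all assumptions made for the example. The only facts taken from the PR are the parameter name and that caching is enabled by default for Claude models.

```rust
/// Hypothetical helper (not from the PR): decide whether prompt caching applies.
/// `raw` is the value of BEDROCK_ENABLE_CACHING, or None if it is unset.
fn caching_enabled(raw: Option<&str>, model_name: &str) -> bool {
    // Assumption: an unset flag means "enabled", matching the stated default.
    // Assumption: these four spellings are treated as "disabled".
    let flag_on = match raw.map(|v| v.trim().to_ascii_lowercase()) {
        None => true,
        Some(v) => !matches!(v.as_str(), "false" | "0" | "no" | "off"),
    };
    // Assumption: Claude models are recognized by a substring of the model name.
    flag_on && model_name.to_ascii_lowercase().contains("claude")
}
```

A caller might pass `std::env::var("BEDROCK_ENABLE_CACHING").ok().as_deref()` as the first argument; how the project actually reads its configuration is not shown in this conversation.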

Testing

  • Verify caching is enabled for Claude models by default
  • Test that cache points are correctly placed in system prompts
  • Test that cache points are correctly placed in user/assistant messages
  • Verify cache point limit (4 max) is respected

@fbalicchia
Contributor Author

@jh-block and @blackgirlbytes, I’ve created a fresh PR. Thanks!

@jh-block
Collaborator

Thanks! The implementation looks good. I have a couple of questions:

  1. It wasn't totally clear to me how tool messages are being handled, since they can't be cache points. Do we explicitly filter them out somewhere? Could we verify that with test(s)?
  2. Did the strategy of "most recent 3" come from any analysis? A comment explaining the choice would be good. I (perhaps naïvely) would have thought caching the earliest messages would have better ROI. (This is just for discussion, I think it's perfectly fine to merge the implementation and then iterate on the cache strategy if necessary.)

@fbalicchia
Contributor Author

Thanks for your feedback.

For question 1:

Current state: cache points are added at the message level (after all content blocks), regardless of whether the message contains a ToolRequest or a ToolResponse.
No tests currently verify that this behavior is correct. A comment at bedrock.rs:178 mentions that “tools don’t support cache points,” but its intent and correctness are unclear.
I plan to fix this and add appropriate tests.
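
One possible shape for such a fix, sketched under assumptions: the `Content` and `Message` types below are simplified stand-ins invented for this example (the real types in the codebase are not shown in this thread), and the rule "a message is eligible only if it has no tool content" is one interpretation of the review question, not necessarily what was merged.

```rust
/// Simplified stand-in for message content (hypothetical, for illustration only).
#[derive(Debug, Clone, PartialEq)]
enum Content {
    Text(String),
    ToolRequest(String),
    ToolResponse(String),
}

/// Simplified stand-in for a conversation message (hypothetical).
struct Message {
    content: Vec<Content>,
}

/// A message may carry a cache point only if it is non-empty and
/// contains no tool request or tool response blocks.
fn is_cache_eligible(message: &Message) -> bool {
    !message.content.is_empty()
        && message
            .content
            .iter()
            .all(|c| matches!(c, Content::Text(_)))
}

/// Indices of the messages that could receive a cache point.
fn cache_eligible_indices(messages: &[Message]) -> Vec<usize> {
    messages
        .iter()
        .enumerate()
        .filter(|(_, m)| is_cache_eligible(m))
        .map(|(i, _)| i)
        .collect()
}
```

Keeping the eligibility check in its own function makes it straightforward to test directly, which addresses the request for tests on tool message handling.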

For question 2:
AWS Bedrock enforces a strict limit of 4 cache checkpoints per request. The current implementation allocates them as 1 for the system message + 3 for user/assistant messages, for a total of 4.
See https://docs.aws.amazon.com/bedrock/latest/userguide/prompt-caching.html

This means only 3 messages can be cached in addition to the system message.
At the moment, the implementation caches the most recent 3 messages, so every time a new message is added, the “last 3” window shifts. This is not very efficient.

A better approach might be to cache the earliest messages instead of the most recent ones, since early messages tend to remain stable, while recent messages change on every turn and frequently invalidate the cache.

Let me know what you think.
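
To make the two strategies discussed above concrete, here is a small sketch of the budget arithmetic (4 cache points, 1 reserved for the system prompt, 3 left for messages). The names `Strategy` and `select_cache_points` are hypothetical and exist only for this illustration; the real allocation code in bedrock.rs is not reproduced in this thread and may differ.

```rust
/// Limit of cache checkpoints per request, as cited in the comment above.
const MAX_CACHE_POINTS: usize = 4;
/// One checkpoint is reserved for the system prompt.
const SYSTEM_CACHE_POINTS: usize = 1;

/// Hypothetical strategy selector, for illustration only.
enum Strategy {
    Earliest,
    MostRecent,
}

/// Returns the indices of the messages that would receive a cache point.
fn select_cache_points(message_count: usize, strategy: Strategy) -> Vec<usize> {
    // Remaining budget after the system prompt: 4 - 1 = 3.
    let budget = MAX_CACHE_POINTS - SYSTEM_CACHE_POINTS;
    // Never select more messages than exist.
    let take = budget.min(message_count);
    match strategy {
        Strategy::Earliest => (0..take).collect(),
        Strategy::MostRecent => (message_count - take..message_count).collect(),
    }
}
```

With 5 messages, `Earliest` selects indices 0, 1, 2 and `MostRecent` selects 2, 3, 4. Adding a sixth message leaves the `Earliest` selection unchanged, while `MostRecent` moves to 3, 4, 5, which is the shifting window described in the comment.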

fbalicchia force-pushed the add-prompt-bedrock-cache branch 2 times, most recently from ba59c47 to b0a9321, on January 28, 2026 20:30
@fbalicchia
Contributor Author

@jh-block and @blackgirlbytes, I’ve applied the requested changes.

fbalicchia force-pushed the add-prompt-bedrock-cache branch from b0a9321 to c2561e2 on January 29, 2026 11:34
@fbalicchia
Contributor Author

Hi @michaelneale, when you have a moment, could you kindly review this PR? Thank you.

@fbalicchia
Contributor Author

Hi @jh-block,
Is there any chance this could be reviewed?
Thank you!

@jh-block
Collaborator

Thanks for the changes. I have a couple of comments:

  • In bedrock.rs we're now caching the earliest 3 messages, which is good (I think we can improve further, but this is fine for now), but some tests still seem to test the "last 3 messages" strategy: test_cache_point_allocation_without_tools, test_cache_point_allocation_with_tools, test_cache_point_limit_respected_with_few_messages, test_cache_points_with_few_messages. Looking at those tests, they don't seem to test anything, since they duplicate the logic inside the test rather than calling the actual code. There are other tests like this, e.g. test_max_four_cache_points_respected just tests that 1 + 3 == 4. These tests can be removed or, if appropriate, updated to exercise the production code.
  • to_bedrock_message() in formats/bedrock.rs is now dead code, as it's only used in a test: test_to_bedrock_message_without_caching. If we never use Bedrock without caching, this test and to_bedrock_message() can be removed.

Signed-off-by: fbalicchia <fbalicchia@cuebiq.com>
fbalicchia force-pushed the add-prompt-bedrock-cache branch from c787ead to 47dc354 on February 21, 2026 20:30
fbalicchia force-pushed the add-prompt-bedrock-cache branch from 8db3c7c to f7e8ca5 on February 23, 2026 10:22
@jh-block
Collaborator

Hey @fbalicchia, let me know when this is ready for review. I have a few small fixes, but I can just push them up if that's OK with you, so we can get this merged sooner.

@fbalicchia
Contributor Author

@jh-block go ahead, thanks.

The method is on &self and can access self.model.model_name directly.
All call sites were passing &self.model.model_name anyway.
Keep only the 'why' rationale for cache strategy, remove restated code.
…ith_caching

The caller already filters to visible messages, so the inner
filtering was redundant.
Merge test_to_bedrock_message_with_caching, _multiple_content, and
_preserves_order into a single test. Remove test_message_conversion_with_cache_points
from bedrock.rs as it duplicated format-level tests. Fix fast_model field name.
@jh-block jh-block added this pull request to the merge queue Feb 23, 2026
Merged via the queue into block:main with commit b581446 Feb 23, 2026
20 checks passed