Enable bedrock prompt cache by fbalicchia · Pull Request #6710 · block/goose

fbalicchia · 2026-01-26T09:52:47Z

Summary

Implemented prompt caching for Anthropic Claude models on AWS Bedrock to reduce costs.
Introduced an intelligent cache point placement strategy that complies with AWS Bedrock’s four cache point limitation.
Added the BEDROCK_ENABLE_CACHING configuration parameter
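As a rough illustration of how the new parameter could gate caching, here is a minimal sketch. The function name `caching_enabled`, the accepted "off" spellings, and the substring check for Claude models are all assumptions made for this example; the PR does not show how the value is parsed or how Claude models are detected. The only behavior taken from the thread is that caching is on by default for Claude models.

```rust
/// Hypothetical helper, not the PR's actual code.
/// `config_value` stands in for the raw BEDROCK_ENABLE_CACHING setting, if present.
fn caching_enabled(model_name: &str, config_value: Option<&str>) -> bool {
    // Assumption: an unset parameter means caching stays enabled (the default).
    let enabled = match config_value {
        Some(v) => !matches!(
            v.trim().to_ascii_lowercase().as_str(),
            "false" | "0" | "no" | "off"
        ),
        None => true,
    };
    // Assumption: Claude models are recognized by a substring of the model ID.
    enabled && model_name.to_ascii_lowercase().contains("claude")
}
```

Under these assumptions, a Claude model ID with no setting returns `true`, the same ID with the setting at `false` returns `false`, and a non-Claude model ID returns `false` regardless of the setting.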

Testing

Verify caching is enabled for Claude models by default
Test that cache points are correctly placed in system prompts
Test that cache points are correctly placed in user/assistant messages
Verify cache point limit (4 max) is respected

fbalicchia · 2026-01-26T10:00:29Z

@jh-block and @blackgirlbytes, I’ve created a fresh PR. Thanks!

jh-block · 2026-01-26T12:52:19Z

Thanks! The implementation looks good. I have a couple of questions:

It wasn't totally clear to me how tool messages are being handled, since they can't be cache points. Do we explicitly filter them out somewhere? Could we verify that with test(s)?
Did the strategy of "most recent 3" come from any analysis? A comment explaining the choice would be good. I (perhaps naïvely) would have thought caching the earliest messages would have better ROI. (This is just for discussion, I think it's perfectly fine to merge the implementation and then iterate on the cache strategy if necessary.)

fbalicchia · 2026-01-26T14:14:57Z

Thanks for your feedback.

For question 1:

Current state: Cache points are added at the message level (after all content blocks), regardless of whether the message contains a ToolRequest or a ToolResponse.
There are currently no tests verifying that this behavior is correct. A comment in bedrock.rs:178 mentions that “tools don’t support cache points,” but the intent and correctness of this are unclear.
I plan to fix this and add appropriate tests.
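One possible shape for such a fix, sketched under the assumption that messages carrying tool content are simply skipped when choosing cache points. The `Content` and `Message` types below are simplified stand-ins invented for this example, not goose's real message types, and the thread does not confirm that the final fix works this way.

```rust
/// Simplified stand-in for message content kinds (hypothetical).
enum Content {
    Text,
    ToolRequest,
    ToolResponse,
}

/// Simplified stand-in for a conversation message (hypothetical).
struct Message {
    content: Vec<Content>,
}

/// A message is eligible for a cache point only if it has no tool content.
fn is_cache_eligible(message: &Message) -> bool {
    !message
        .content
        .iter()
        .any(|c| matches!(c, Content::ToolRequest | Content::ToolResponse))
}

/// Indices of the first `budget` eligible messages, in conversation order.
fn eligible_indices(messages: &[Message], budget: usize) -> Vec<usize> {
    messages
        .iter()
        .enumerate()
        .filter(|(_, m)| is_cache_eligible(m))
        .map(|(i, _)| i)
        .take(budget)
        .collect()
}
```

A test in this style would build a conversation mixing text and tool messages and assert that no returned index points at a message containing a tool request or response.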

For question 2:
AWS Bedrock enforces a strict limit of 4 cache checkpoints per request. The current implementation correctly allocates them as 1 system message + 3 user/assistant messages, for a total of 4.
https://docs.aws.amazon.com/bedrock/latest/userguide/prompt-caching.html

This means you can only cache 3 messages in addition to the system message.
At the moment, the system caches the most recent 3 messages, so every time a new message is added, the “last 3” shift. This is not very efficient.

A better approach might be to cache the earliest messages instead of the most recent ones, since early messages tend to remain stable, while recent messages change on every turn and frequently invalidate the cache.
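The difference between the two strategies can be sketched as index selection over the conversation. This is only an illustration: the constant and function names are made up here, and the code covers which messages receive a cache point, not how Bedrock evaluates cache hits.

```rust
// Limit of 4 per request as stated above; one is reserved for the system message.
const MAX_CACHE_POINTS: usize = 4;
const SYSTEM_CACHE_POINTS: usize = 1;
const MESSAGE_BUDGET: usize = MAX_CACHE_POINTS - SYSTEM_CACHE_POINTS;

/// "Most recent 3": the selected indices move every time a message is appended.
fn most_recent_indices(message_count: usize) -> Vec<usize> {
    (message_count.saturating_sub(MESSAGE_BUDGET)..message_count).collect()
}

/// "Earliest 3": the selected indices stay fixed once three messages exist.
fn earliest_indices(message_count: usize) -> Vec<usize> {
    (0..message_count.min(MESSAGE_BUDGET)).collect()
}
```

With five messages the most-recent strategy selects indices 2, 3, 4, and after one more message it selects 3, 4, 5. The earliest strategy selects 0, 1, 2 in both cases.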

Let me know what you think.

fbalicchia · 2026-01-28T20:32:31Z

@jh-block and @blackgirlbytes, I’ve applied the requested changes

fbalicchia · 2026-01-29T12:18:23Z

Hi @michaelneale, when you have a moment, could you kindly review this PR? Thank you.

fbalicchia · 2026-02-16T09:22:41Z

Hi @jh-block,
Is there any chance this could be reviewed?
Thank you!

jh-block · 2026-02-18T09:40:08Z

Thanks for the changes. I have a couple comments:

In bedrock.rs we're now caching the earliest 3 messages which is good (I think we can improve further, but this is fine for now), but some tests still seem to test the "last 3 messages" strategy: test_cache_point_allocation_without_tools, test_cache_point_allocation_with_tools, test_cache_point_limit_respected_with_few_messages, test_cache_points_with_few_messages. Actually, looking at those tests, they don't seem to be actually testing anything since they duplicate the logic inside the test rather than calling the actual code. There are other tests like this, e.g. test_max_four_cache_points_respected is just testing that 1 + 3 == 4. These tests can just be removed, or if appropriate, updated to actually test the production code.
to_bedrock_message() in formats/bedrock.rs is now dead code, as it's only used in a test: test_to_bedrock_message_without_caching. If we're never using bedrock without caching, this test and to_bedrock_message() can be removed.
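To make the first point concrete, the sketch below contrasts a test that only restates arithmetic with one that calls the code under test. `total_cache_points` is a hypothetical stand-in for the production allocation logic, assuming one cache point for the system prompt plus at most three for messages; the real function names and signatures in bedrock.rs are not shown in this thread.

```rust
/// Hypothetical stand-in for the production allocation logic.
/// Assumes 1 cache point for the system prompt and at most 3 for messages.
fn total_cache_points(has_system_prompt: bool, message_count: usize) -> usize {
    const MESSAGE_BUDGET: usize = 3;
    usize::from(has_system_prompt) + message_count.min(MESSAGE_BUDGET)
}

#[cfg(test)]
mod tests {
    use super::*;

    // Tautological: passes no matter what the production code does.
    #[test]
    fn restates_arithmetic() {
        assert_eq!(1 + 3, 4);
    }

    // Calls the function, so a regression in the allocation fails the test.
    #[test]
    fn limit_respected_via_production_code() {
        assert_eq!(total_cache_points(true, 10), 4);
        assert_eq!(total_cache_points(true, 2), 3);
    }
}
```

The second test is the kind that would catch a change breaking the four-point limit; the first would keep passing even if the allocation code were deleted.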


jh-block · 2026-02-23T10:24:03Z

Hey @fbalicchia, let me know when this is ready for review. I have a few small fixes, but I can just push them up if that's OK with you, so we can get this merged sooner.

fbalicchia · 2026-02-23T12:10:40Z

@jh-block go ahead, thanks.


fbalicchia mentioned this pull request Jan 26, 2026

Enable bedrock prompt cache #6463

Closed

fbalicchia force-pushed the add-prompt-bedrock-cache branch 2 times, most recently from ba59c47 to b0a9321 Compare January 28, 2026 20:30

fbalicchia force-pushed the add-prompt-bedrock-cache branch from b0a9321 to c2561e2 Compare January 29, 2026 11:34

DOsinga assigned jh-block Jan 29, 2026

fbalicchia force-pushed the add-prompt-bedrock-cache branch from c2561e2 to c787ead Compare January 29, 2026 21:03

fbalicchia added 3 commits February 21, 2026 21:28

apply change

42d7761

Signed-off-by: fbalicchia <fbalicchia@cuebiq.com>

fix commment and apply changes

dfbf382

Signed-off-by: fbalicchia <fbalicchia@cuebiq.com>

apply change

47dc354

Signed-off-by: fbalicchia <fbalicchia@cuebiq.com>

fbalicchia force-pushed the add-prompt-bedrock-cache branch from c787ead to 47dc354 Compare February 21, 2026 20:30

fix lint

f7e8ca5

Signed-off-by: fbalicchia <fbalicchia@cuebiq.com>

fbalicchia force-pushed the add-prompt-bedrock-cache branch from 8db3c7c to f7e8ca5 Compare February 23, 2026 10:22

jh-block added 5 commits February 23, 2026 13:13

Remove redundant model_name parameter from should_enable_caching

38b3d0e

The method is on &self and can access self.model.model_name directly. All call sites were passing &self.model.model_name anyway.
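The refactor described in this commit can be pictured roughly as follows. The struct names `ModelConfig` and `BedrockProvider` and the body of the check are placeholders assumed for illustration; only `should_enable_caching` and the `self.model.model_name` access come from the commit message.

```rust
/// Placeholder types, assumed for this sketch.
struct ModelConfig {
    model_name: String,
}

struct BedrockProvider {
    model: ModelConfig,
}

impl BedrockProvider {
    // Before (redundant parameter):
    //     fn should_enable_caching(&self, model_name: &str) -> bool
    // with every caller passing `&self.model.model_name`.
    //
    // After: the method reads the field it already owns.
    fn should_enable_caching(&self) -> bool {
        // Placeholder condition, assumed for illustration only.
        self.model.model_name.contains("claude")
    }
}
```

Call sites then shrink from `self.should_enable_caching(&self.model.model_name)` to `self.should_enable_caching()`.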

Trim verbose comments to follow project guidelines

b850c4c

Keep only the 'why' rationale for cache strategy, remove restated code.

Fix description string indentation alignment

e13d22b

Remove redundant agent_visible_content() call in to_bedrock_message_with_caching

32daf2b

The caller already filters to visible messages, so the inner filtering was redundant.

Consolidate duplicate caching tests

3c8341e

Merge test_to_bedrock_message_with_caching, _multiple_content, and _preserves_order into a single test. Remove test_message_conversion_with_cache_points from bedrock.rs as it duplicated format-level tests. Fix fast_model field name.

jh-block approved these changes Feb 23, 2026


jh-block added this pull request to the merge queue Feb 23, 2026

Merged via the queue into block:main with commit b581446 Feb 23, 2026
20 checks passed

jh-block merged 9 commits into block:main from fbalicchia:add-prompt-bedrock-cache