Context caching at root level is not applied until first agent LLM call; sub-agents don’t reuse parent cache and receive full context #4570

@darkmars27

Description

Product / Area

Google ADK → Agents/Sub-agents → Context caching propagation

Summary

We enabled context caching at the root level and pass that shared context down to agents and sub-agents. However, the cache is not attached to the context until the first LLM request, which is made by the parent agent. As a result, when a sub-agent is invoked it does not reuse the parent’s cache; instead it receives the entire context as an uncached request payload.

This causes:

  • Duplicate token usage
  • Higher latency
  • Loss of expected caching benefits for sub-agent invocations

Current Behavior (Actual)

  • Root context caching is configured/enabled.
  • Cache is not created/attached until the parent agent makes its first LLM call.
  • When a sub-agent is called, it does not receive a reference to the existing cache from the parent agent.
  • The full context is sent to the sub-agent LLM request (no cache hit / no cache reference).

Expected Behavior
When context caching is enabled at the root level:

  • The cache should be available for both agents and sub-agents, and
  • Sub-agents should either:
  1. inherit / reuse the parent agent’s cache reference, or
  2. have the cache created/attached before any LLM call so it can be reused across the whole agent tree.

Steps to Reproduce

  • Enable context caching at the root level and configure agents + sub-agents to use that shared context.
  • Invoke a flow where:
  1. Parent agent initializes context, and
  2. Parent agent calls a sub-agent (either before or after parent’s first LLM request).
  • Observe sub-agent LLM request payload and caching metadata.

Observed Result

  • Sub-agent request contains the entire root context rather than a cache reference / cache key.
  • Cache appears to only be created/registered after the parent agent’s first LLM call, not at initialization or context construction time.

Impact

  • Token costs increase because the full context is resent to sub-agents.
  • Latency increases for each sub-agent call.
  • Caching at root level does not provide expected benefit for agent hierarchies.

Proposed Fix Options
Option 1: Pre-attach cache before first LLM request

Ensure root context caching creates/attaches the cache reference during context initialization, before any agent makes an LLM call.
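
A minimal sketch of this option, reusing the toy model from above. The create_cache() helper is a hypothetical stand-in for a real cache-creation call; nothing here is ADK API:

```python
from dataclasses import dataclass, field

def create_cache(text: str) -> str:
    # Hypothetical stand-in for a real caches.create() call.
    return f"cachedContents/{abs(hash(text)) % 10_000}"

@dataclass
class Context:
    text: str
    cache_id: str = ""

    def __post_init__(self) -> None:
        # Option 1: mint the cache eagerly, at context construction time,
        # before any agent issues an LLM call.
        self.cache_id = create_cache(self.text)

@dataclass
class Agent:
    name: str
    sent: list[str] = field(default_factory=list)

    def llm_call(self, ctx: Context) -> None:
        self.sent.append(f"ref:{ctx.cache_id}")  # every call reuses the handle

root = Context("<shared context>")
parent, sub = Agent("parent"), Agent("sub")
parent.llm_call(root)
sub.llm_call(root)
assert parent.sent == sub.sent  # both send only the cache reference
```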

Option 2: Propagate existing parent cache to sub-agents

When parent agent has an active cache reference, automatically pass that cache reference to all sub-agent LLM calls so sub-agents can reuse it.

Option 3: Explicit / Manual caching API + use cache_id in GenerateContentConfig

Provide an explicit caching mechanism so developers can:

  • Create/seed a cache entry manually (before any agent makes an LLM call), returning a cache_id (or equivalent handle).
  • Attach that cache_id explicitly when making LLM calls, from parent agents and sub-agents alike, via GenerateContentConfig (the google-genai SDK already exposes a cached_content field for this purpose).
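
The google-genai SDK already has this shape of API (client.caches.create plus a cached_content field on GenerateContentConfig); the missing piece this issue asks for is ADK-level wiring on top of it. A hedged sketch of the underlying calls, with placeholder model name and TTL (the SDK import is deferred so the sketch can be read without it installed):

```python
def is_cache_handle(name: str) -> bool:
    # Gemini API cached-content resources are named "cachedContents/<id>".
    return name.startswith("cachedContents/")

def ask_with_cache(shared_context: str, question: str) -> str:
    from google import genai
    from google.genai import types

    client = genai.Client()  # needs API key / Vertex credentials configured

    # 1) Seed the cache once, before any agent or sub-agent calls the model.
    #    (Model name and TTL are placeholders; caches also have a minimum
    #    token size, so the context must be large enough to qualify.)
    cache = client.caches.create(
        model="gemini-2.0-flash-001",
        config=types.CreateCachedContentConfig(
            contents=[shared_context],
            ttl="3600s",
        ),
    )
    assert is_cache_handle(cache.name)

    # 2) Every agent/sub-agent request attaches the handle instead of
    #    resending the full context.
    response = client.models.generate_content(
        model="gemini-2.0-flash-001",
        contents=question,
        config=types.GenerateContentConfig(cached_content=cache.name),
    )
    return response.text
```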

Acceptance Criteria

  • With root context caching enabled, a sub-agent LLM call should reuse the same cache (cache reference/key) created for the root/parent context.
  • Sub-agent requests should not include the entire root context if it is already cached.
  • Verified by logs/metadata showing:
  1. Cache created once (or deterministically reused),
  2. Cache hits for sub-agent calls,
  3. Reduced request payload size/tokens for sub-agent invocations.
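
The three checks above could be automated over per-call log records. A sketch of such a verifier; the record fields (agent, cache_created, cache_hit, prompt_tokens) are illustrative, not actual ADK log schema:

```python
def verify(records: list[dict]) -> None:
    # 1) Cache created exactly once across the whole agent tree.
    created = [r for r in records if r.get("cache_created")]
    assert len(created) == 1, "cache should be created exactly once"
    for r in records:
        if r["agent"] != "root" and not r.get("cache_created"):
            # 2) Sub-agent calls must be cache hits.
            assert r["cache_hit"], f"sub-agent call missed the cache: {r}"
            # 3) Cached calls should ship far fewer prompt tokens.
            assert r["prompt_tokens"] < created[0]["prompt_tokens"], \
                "cached call should send fewer tokens than the seeding call"

verify([
    {"agent": "root", "cache_created": True,  "cache_hit": False, "prompt_tokens": 120_000},
    {"agent": "sub",  "cache_created": False, "cache_hit": True,  "prompt_tokens": 300},
])
```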

Metadata

Labels: core [Component] (this issue relates to the core interface and implementation)