Context caching at root level is not applied until first agent LLM call; sub-agents don’t reuse parent cache and receive full context #4570

@darkmars27

Description

Product / Area

Google ADK → Agents/Sub-agents → Context caching propagation

Summary

We enabled context caching at the root level and pass that shared context down to agents and sub-agents. However, the cache is not attached to the context until the first LLM request, which is made by the parent agent. As a result, when a sub-agent is invoked it does not reuse the parent’s cache; instead it receives the entire context as an uncached request payload.

This causes:

  • Duplicate token usage
  • Higher latency
  • Loss of expected caching benefits for sub-agent invocations

Current Behavior (Actual)

  • Root context caching is configured/enabled.
  • Cache is not created/attached until the parent agent makes its first LLM call.
  • When a sub-agent is called, it does not receive a reference to the existing cache from the parent agent.
  • The full context is sent to the sub-agent LLM request (no cache hit / no cache reference).

Expected Behavior
When context caching is enabled at the root level:

  • The cache should be available for both agents and sub-agents, and
  • Sub-agents should either:
  1. inherit / reuse the parent agent’s cache reference, or
  2. have the cache created/attached before any LLM call so it can be reused across the whole agent tree.

Steps to Reproduce

  • Enable context caching at the root level and configure agents + sub-agents to use that shared context.
  • Invoke a flow where:
  1. Parent agent initializes context, and
  2. Parent agent calls a sub-agent (either before or after parent’s first LLM request).
  • Observe sub-agent LLM request payload and caching metadata.

Observed Result

  • Sub-agent request contains the entire root context rather than a cache reference / cache key.
  • Cache appears to only be created/registered after the parent agent’s first LLM call, not at initialization or context construction time.

Impact

  • Token costs increase because the full context is resent to sub-agents.
  • Latency increases for each sub-agent call.
  • Caching at root level does not provide expected benefit for agent hierarchies.

Proposed Fix Options
Option 1: Pre-attach cache before first LLM request

Ensure root context caching creates/attaches the cache reference during context initialization, before any agent makes an LLM call.
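
A minimal sketch of this option, reusing the toy model from above. The create_cache() helper is a hypothetical stand-in for a real cache-creation call; nothing here is ADK API:

```python
from dataclasses import dataclass, field

def create_cache(text: str) -> str:
    # Hypothetical stand-in for a real caches.create() call.
    return f"cachedContents/{abs(hash(text)) % 10_000}"

@dataclass
class Context:
    text: str
    cache_id: str = ""

    def __post_init__(self) -> None:
        # Option 1: mint the cache eagerly, at context construction time,
        # before any agent issues an LLM call.
        self.cache_id = create_cache(self.text)

@dataclass
class Agent:
    name: str
    sent: list[str] = field(default_factory=list)

    def llm_call(self, ctx: Context) -> None:
        self.sent.append(f"ref:{ctx.cache_id}")  # every call reuses the handle

root = Context("<shared context>")
parent, sub = Agent("parent"), Agent("sub")
parent.llm_call(root)
sub.llm_call(root)
assert parent.sent == sub.sent  # both send only the cache reference
```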

Option 2: Propagate existing parent cache to sub-agents

When parent agent has an active cache reference, automatically pass that cache reference to all sub-agent LLM calls so sub-agents can reuse it.

Option 3: Explicit / Manual caching API + use cache_id in GenerateContentConfig

Provide an explicit caching mechanism so developers can:

  • Create/seed a cache entry manually (before any agent makes an LLM call), returning a cache_id (or equivalent handle).
  • Attach that cache_id explicitly when making LLM calls, from parent agents and sub-agents alike, via GenerateContentConfig (the google-genai SDK already exposes a cached_content field for this purpose).
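
The google-genai SDK already has this shape of API (client.caches.create plus a cached_content field on GenerateContentConfig); the missing piece this issue asks for is ADK-level wiring on top of it. A hedged sketch of the underlying calls, with placeholder model name and TTL (the SDK import is deferred so the sketch can be read without it installed):

```python
def is_cache_handle(name: str) -> bool:
    # Gemini API cached-content resources are named "cachedContents/<id>".
    return name.startswith("cachedContents/")

def ask_with_cache(shared_context: str, question: str) -> str:
    from google import genai
    from google.genai import types

    client = genai.Client()  # needs API key / Vertex credentials configured

    # 1) Seed the cache once, before any agent or sub-agent calls the model.
    #    (Model name and TTL are placeholders; caches also have a minimum
    #    token size, so the context must be large enough to qualify.)
    cache = client.caches.create(
        model="gemini-2.0-flash-001",
        config=types.CreateCachedContentConfig(
            contents=[shared_context],
            ttl="3600s",
        ),
    )
    assert is_cache_handle(cache.name)

    # 2) Every agent/sub-agent request attaches the handle instead of
    #    resending the full context.
    response = client.models.generate_content(
        model="gemini-2.0-flash-001",
        contents=question,
        config=types.GenerateContentConfig(cached_content=cache.name),
    )
    return response.text
```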

Acceptance Criteria

  • With root context caching enabled, a sub-agent LLM call should reuse the same cache (cache reference/key) created for the root/parent context.
  • Sub-agent requests should not include the entire root context if it is already cached.
  • Verified by logs/metadata showing:
  1. Cache created once (or deterministically reused),
  2. Cache hits for sub-agent calls,
  3. Reduced request payload size/tokens for sub-agent invocations.
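
The three checks above could be automated over per-call log records. A sketch of such a verifier; the record fields (agent, cache_created, cache_hit, prompt_tokens) are illustrative, not actual ADK log schema:

```python
def verify(records: list[dict]) -> None:
    # 1) Cache created exactly once across the whole agent tree.
    created = [r for r in records if r.get("cache_created")]
    assert len(created) == 1, "cache should be created exactly once"
    for r in records:
        if r["agent"] != "root" and not r.get("cache_created"):
            # 2) Sub-agent calls must be cache hits.
            assert r["cache_hit"], f"sub-agent call missed the cache: {r}"
            # 3) Cached calls should ship far fewer prompt tokens.
            assert r["prompt_tokens"] < created[0]["prompt_tokens"], \
                "cached call should send fewer tokens than the seeding call"

verify([
    {"agent": "root", "cache_created": True,  "cache_hit": False, "prompt_tokens": 120_000},
    {"agent": "sub",  "cache_created": False, "cache_hit": True,  "prompt_tokens": 300},
])
```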

Metadata

Labels: core [Component] (this issue relates to the core interface and implementation)