Description
Product / Area
Google ADK → Agents/Sub-agents → Context caching propagation
Summary
We enable context caching at the root level and pass that shared context to agents and sub-agents. However, the cache is not attached to the context until the first LLM request occurs, and that request is made by the parent agent. As a result, when a sub-agent is invoked, it does not reuse the parent's cache; its LLM request instead carries the entire context as an uncached payload.
This causes:
- Duplicate token usage
- Higher latency
- Loss of expected caching benefits for sub-agent invocations
Current Behavior (Actual)
- Root context caching is configured/enabled.
- Cache is not created/attached until the parent agent makes its first LLM call.
- When a sub-agent is called, it does not receive a reference to the existing cache from the parent agent.
- The full context is sent in the sub-agent's LLM request (no cache hit / no cache reference).
Expected Behavior
When context caching is enabled at root level:
- The cache should be available for both agents and sub-agents, and
- Sub-agents should either:
- inherit / reuse the parent agent’s cache reference, or
- have the cache created/attached before any LLM call so it can be reused across the whole agent tree.
Steps to Reproduce
- Enable context caching at the root level and configure agents + sub-agents to use that shared context.
- Invoke a flow where:
- Parent agent initializes context, and
- Parent agent calls a sub-agent (either before or after parent’s first LLM request).
- Observe the sub-agent LLM request payload and caching metadata (a minimal repro sketch follows this list).
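For reference, a minimal repro sketch along these lines. The agent tree and runner wiring are standard ADK; the way root-level caching is enabled is left as a comment because the exact config names vary by ADK version, so treat that part as an assumption:

```python
# Minimal repro sketch. Agent/InMemoryRunner wiring is standard ADK; how the
# root-level context cache is enabled is left as a comment because the exact
# config names depend on the ADK version in use.
import asyncio

from google.adk.agents import Agent
from google.adk.runners import InMemoryRunner
from google.genai import types

LARGE_SHARED_CONTEXT = "<large shared root context>" * 1000  # meant to be cached once

sub_agent = Agent(
    name="sub_agent",
    model="gemini-2.0-flash",
    instruction=LARGE_SHARED_CONTEXT + "\nHandle the delegated task.",
)

root_agent = Agent(
    name="root_agent",
    model="gemini-2.0-flash",
    instruction=LARGE_SHARED_CONTEXT + "\nDelegate detail work to sub_agent.",
    sub_agents=[sub_agent],
)

# Root-level context caching is enabled here in our setup (via the root/app
# cache configuration); wiring omitted since field names differ across releases.

async def main():
    runner = InMemoryRunner(agent=root_agent, app_name="cache_repro")
    session = await runner.session_service.create_session(
        app_name="cache_repro", user_id="u1"
    )
    async for event in runner.run_async(
        user_id="u1",
        session_id=session.id,
        new_message=types.Content(role="user", parts=[types.Part(text="go")]),
    ):
        # Compare per-call usage between root_agent and sub_agent events.
        if getattr(event, "usage_metadata", None):
            print(event.author, event.usage_metadata)

asyncio.run(main())
```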
Observed Result
- Sub-agent request contains the entire root context rather than a cache reference / cache key.
- The cache appears to be created/registered only after the parent agent's first LLM call, not at initialization or context-construction time.
Impact
- Token costs increase because the full context is resent to sub-agents.
- Latency increases for each sub-agent call.
- Caching at root level does not provide expected benefit for agent hierarchies.
Proposed Fix Options
Option 1: Pre-attach cache before first LLM request
Ensure root context caching creates/attaches the cache reference during context initialization, before any agent makes an LLM call.
Option 2: Propagate existing parent cache to sub-agents
When the parent agent has an active cache reference, automatically pass that reference to all sub-agent LLM calls so sub-agents can reuse it (sketched below).
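To make the intent concrete, here is a hypothetical sketch of the propagation using ADK's existing before_model_callback hook. The callback hook and GenerateContentConfig.cached_content are real; the parent_cache_name state key is an assumption standing in for wherever ADK would record the parent's cache handle:

```python
# Hypothetical illustration of Option 2, not current ADK behavior.
# before_model_callback and GenerateContentConfig.cached_content exist today;
# the "parent_cache_name" state key is an assumed place where the parent's
# cache handle would be recorded after its first LLM call.
from google.adk.agents import Agent
from google.adk.agents.callback_context import CallbackContext
from google.adk.models.llm_request import LlmRequest


def reuse_parent_cache(callback_context: CallbackContext, llm_request: LlmRequest):
    cache_name = callback_context.state.get("parent_cache_name")
    if cache_name and llm_request.config is not None:
        # Point the sub-agent's request at the existing cache instead of
        # resending the full root context.
        llm_request.config.cached_content = cache_name
    return None  # continue with the (now cache-aware) request


sub_agent = Agent(
    name="sub_agent",
    model="gemini-2.0-flash",
    instruction="Handle the delegated task.",
    before_model_callback=reuse_parent_cache,
)
```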
Option 3: Explicit / Manual caching API + use cache_id in GenerateContentConfig
Provide an explicit caching mechanism so developers can:
- Create/seed a cache entry manually (before any agent makes an LLM call), returning a cache_id (or equivalent handle).
- Attach that cache_id explicitly when making LLM calls, from both parent agents and sub-agents, via GenerateContentConfig (e.g., a cache_id / context_cache_id field). See the SDK-level sketch below.
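For comparison, the developer-facing workflow already exists in the underlying google-genai SDK (explicit caches plus GenerateContentConfig.cached_content); the ask is for ADK to expose an equivalent handle that both parent and sub-agent calls can attach. A sketch against that SDK, assuming a model that supports explicit caching:

```python
# Sketch of the explicit-caching workflow using the google-genai SDK that ADK
# builds on. client.caches.create and cached_content are existing SDK features;
# the proposal is for ADK to accept the same handle on parent and sub-agent calls.
from google import genai
from google.genai import types

client = genai.Client()

# Seed the cache once, before any agent makes an LLM call.
# (Explicit caches require a minimum token count, so the real shared context
# must be large enough; the placeholder string here is just illustrative.)
cache = client.caches.create(
    model="gemini-2.0-flash-001",
    config=types.CreateCachedContentConfig(
        display_name="shared-root-context",
        system_instruction="<the large root context shared by the agent tree>",
        ttl="3600s",
    ),
)

# Any call, parent or sub-agent, can then reference the cache explicitly.
response = client.models.generate_content(
    model="gemini-2.0-flash-001",
    contents="Delegated sub-agent task",
    config=types.GenerateContentConfig(cached_content=cache.name),
)
print(response.usage_metadata)
```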
Acceptance Criteria
- With root context caching enabled, a sub-agent LLM call should reuse the same cache (cache reference/key) created for the root/parent context.
- Sub-agent requests should not include the entire root context if it is already cached.
- Verified by logs/metadata (see the verification sketch after this list) showing:
- Cache created once (or deterministically reused),
- Cache hits for sub-agent calls,
- Reduced request payload size/tokens for sub-agent invocations.
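One possible way to check these criteria from ADK events, assuming per-call usage metadata is surfaced on events (cached_content_token_count and prompt_token_count are the standard Gemini usage fields):

```python
# Verification sketch: compare cached vs. total prompt tokens per LLM call.
# cached_content_token_count / prompt_token_count are standard Gemini usage
# fields; reading them off ADK events assumes usage_metadata is populated.
def report_cache_usage(events):
    for event in events:
        usage = getattr(event, "usage_metadata", None)
        if usage is None:
            continue
        cached = usage.cached_content_token_count or 0
        prompt = usage.prompt_token_count or 0
        print(f"{event.author}: prompt_tokens={prompt} cached_tokens={cached}")
        # Expected once fixed: sub_agent calls report cached > 0 and a prompt
        # size far below a full resend of the root context.
```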