Properly handle Gemini thought summaries and export them via OpenTelemetry in Google LLM plugin #4190

@giovaborgogno

Bug Description

Gemini’s API provides thought summaries when include_thoughts=true is enabled. These appear in content.parts with the part.thought flag set and must be handled separately from normal answer text. https://ai.google.dev/gemini-api/docs/thinking#summaries
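
For reference, the linked docs separate the two kinds of parts roughly like this (a standalone google-genai sketch, not the plugin code; the model name and prompt are just examples):

from google import genai
from google.genai import types

client = genai.Client()
response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="What is the sum of the first 50 prime numbers?",
    config=types.GenerateContentConfig(
        thinking_config=types.ThinkingConfig(include_thoughts=True)
    ),
)

for part in response.candidates[0].content.parts:
    if not part.text:
        continue
    if part.thought:
        print("Thought summary:", part.text)  # internal reasoning, not for end users
    else:
        print("Answer:", part.text)  # user-facing answer text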

Right now, the Google LLM plugin in agents/livekit-plugins/livekit-plugins-google/livekit/plugins/google/llm.py ignores part.thought entirely. The parsing code only looks at part.function_call and part.text:

def _parse_part(self, id: str, part: types.Part) -> llm.ChatChunk | None:
    if part.function_call:
        chat_chunk = llm.ChatChunk(
            id=id,
            delta=llm.ChoiceDelta(
                role="assistant",
                tool_calls=[
                    llm.FunctionToolCall(
                        arguments=json.dumps(part.function_call.args),
                        name=part.function_call.name,
                        call_id=part.function_call.id or utils.shortuuid("function_call_"),
                    )
                ],
                content=part.text,
            ),
        )
        return chat_chunk

    return llm.ChatChunk(
        id=id,
        delta=llm.ChoiceDelta(content=part.text, role="assistant"),
    )

There’s no check for part.thought, so thought summaries are treated as regular assistant output.

Problems

1. Bug: Thoughts are spoken by TTS
   When include_thoughts=True, thought summaries are merged into the same content used for user-facing responses. The TTS layer receives them and reads the agent’s internal reasoning out loud, which is not what Gemini’s “thinking” feature is meant for.

2. Missing observability of thoughts in OTEL
   LiveKit already uses OpenTelemetry, but the Gemini thought summaries are not surfaced there at all. There is no way to inspect the model’s internal reasoning in traces/logs while keeping it hidden from the end user and TTS.

Expected Behavior

Separation of thoughts vs. answer
Parts with part.thought == True should not be included in the assistant’s user-visible content.
Thought parts should never be forwarded to TTS or any channel that is meant for end-user output.

Observability via existing OpenTelemetry
Thought summaries should be attached to the existing OTEL spans/traces for Gemini calls, so the reasoning stays inspectable without ever reaching the end user.
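
A minimal sketch of what that could look like with the standard OpenTelemetry API (the event and attribute names here are made up for illustration, not an existing LiveKit convention):

from opentelemetry import trace

def record_thought_summary(thought_text: str) -> None:
    # Attach the thought summary to whatever span is current for the
    # Gemini call, so it shows up in traces/logs but never in assistant
    # output. "gemini.thought_summary" is a hypothetical event name.
    span = trace.get_current_span()
    span.add_event(
        "gemini.thought_summary",
        attributes={"gemini.thought_summary": thought_text},
    )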

Reproduction Steps

For example:

from google.genai import types

from livekit.agents import AgentSession
from livekit.plugins import elevenlabs, google

session = AgentSession(
    stt="assemblyai/universal-streaming-multilingual",
    llm=google.LLM(
        model="gemini-2.5-flash-preview-09-2025",
        temperature=0.8,
        thinking_config=types.ThinkingConfig(
            include_thoughts=True,
            thinking_budget=1500,
        ),
    ),
    tts=elevenlabs.TTS(
        voice_id=elevenlabs_voice_id,  # user-supplied ElevenLabs settings
        model=elevenlabs_model,
        language=tts_language,
    ),
)

Operating System

macOS Tahoe

Models Used

AssemblyAI (STT), Google plugin (LLM), ElevenLabs (TTS)

Package Versions

# Core LiveKit dependencies
livekit>=1.0.13
livekit-agents[images,elevenlabs]>=1.3.6
livekit-api>=1.0.5
livekit-protocol>=1.0.6

# LiveKit plugins
livekit-plugins-google>=1.3.6

# Google Gemini API
google-generativeai==0.8.3

# Telemetry (Langfuse/OpenTelemetry/Judgment Labs)
opentelemetry-api>=1.39.0
opentelemetry-sdk>=1.39.0
opentelemetry-exporter-otlp-proto-http>=1.39.0
judgeval>=0.1.0  # Judgment Labs tracing (OpenTelemetry compatible)

Session/Room/Call IDs

No response

Proposed Solution
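
One possible approach, as an untested sketch against the current _parse_part (record_thought_summary is the hypothetical helper sketched under Expected Behavior): return None for thought parts so they never become user-visible content, and hand the text to telemetry instead.

def _parse_part(self, id: str, part: types.Part) -> llm.ChatChunk | None:
    if part.thought:
        # Thought summary: keep it out of assistant content (and thus
        # out of TTS); record it on the current OTEL span instead.
        if part.text:
            record_thought_summary(part.text)  # hypothetical helper, see above
        return None

    if part.function_call:
        return llm.ChatChunk(
            id=id,
            delta=llm.ChoiceDelta(
                role="assistant",
                tool_calls=[
                    llm.FunctionToolCall(
                        arguments=json.dumps(part.function_call.args),
                        name=part.function_call.name,
                        call_id=part.function_call.id or utils.shortuuid("function_call_"),
                    )
                ],
                content=part.text,
            ),
        )

    return llm.ChatChunk(
        id=id,
        delta=llm.ChoiceDelta(content=part.text, role="assistant"),
    )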

Additional Context

No response

Screenshots and Recordings

No response
