[Vertex AI] Spurious Server VAD events cause unavoidable tool cancellation #4441

@vndee

Bug Description

When using google.realtime.RealtimeModel with vertexai=True, the Vertex AI endpoint appears to have a very sensitive Server VAD configuration. It frequently sends InputSpeechStarted events (signaled via interrupted=True in the server content) immediately after a tool finishes execution or even during silence.

The current implementation of AgentActivity._on_input_speech_started in livekit.agents.voice.agent_activity blindly calls self.interrupt(). This forcibly cancels all running tasks, including tool execution.

Crucially, AgentActivity explicitly forbids setting allow_interruptions=False when server_turn_detection is enabled:

# livekit/agents/voice/agent_activity.py
if (
    isinstance(self.llm, llm.RealtimeModel)
    and self.llm.capabilities.turn_detection
    and not self.allow_interruptions
):
    raise ValueError(...)

This means there is currently no way for a developer to "shield" critical agent tasks (like tool execution) from being cancelled by these false-positive VAD events from Vertex AI.

Expected Behavior

The library should allow developers to control whether Server VAD events force an interruption.
Specifically, AgentActivity should respect self.allow_interruptions (or SpeechHandle.allow_interruptions) even for Server VAD events.

If allow_interruptions is False, the agent should log and then ignore the server interruption signal instead of cancelling tasks. This would let developers temporarily disable interruptions during sensitive operations:

# Desired User Code
agent.allow_interruptions = False
result = await tool.execute() # Safe from spurious Vertex interruptions
agent.allow_interruptions = True
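
One caveat with the snippet above: if the tool raises, the flag is never set back to True. A minimal sketch of a safer pattern follows. `interruptions_disabled` is a hypothetical helper, not part of livekit-agents, and the `agent` object is a plain stand-in for illustration. It only helps if the library honors allow_interruptions for Server VAD events, which is what this issue requests.

```python
from contextlib import contextmanager
from types import SimpleNamespace


@contextmanager
def interruptions_disabled(agent):
    """Temporarily set agent.allow_interruptions to False (hypothetical helper)."""
    previous = agent.allow_interruptions
    agent.allow_interruptions = False
    try:
        yield agent
    finally:
        # Restore the previous value even if the guarded block raises.
        agent.allow_interruptions = previous


# Stand-in object for illustration only; a real agent would come from the library.
agent = SimpleNamespace(allow_interruptions=True)

with interruptions_disabled(agent):
    inside = agent.allow_interruptions  # flag value while the guarded block runs

after = agent.allow_interruptions  # flag value once the block has exited
```

With this shape, the sensitive call (for example the tool execution) goes inside the `with` block, and the previous setting is restored on both normal exit and error.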

Reproduction Steps

  1. Initialize RealtimeModel with vertexai=True (and default AutomaticActivityDetection).
  2. Create a tool that takes a few seconds to run (e.g., await asyncio.sleep(2)).
  3. Ask the agent to run the tool.
  4. Observe that AgentActivity logs input_speech_started almost immediately after the tool starts (despite silence), followed by cancellation of the running tool.
  5. The tool never completes or its output is ignored.
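
The failure mode in steps 2 to 5 can be illustrated without any LiveKit or Vertex AI dependency. The following is a stdlib-only simulation, not the library's code: `slow_tool` and `simulate` are hypothetical names, the delays are shortened, and the `cancel()` call stands in for what an unconditional interrupt does to a running tool task.

```python
import asyncio


async def slow_tool() -> str:
    # Stand-in for a tool that takes a while to run (step 2), shortened here.
    await asyncio.sleep(0.2)
    return "tool result"


async def simulate() -> str:
    tool_task = asyncio.create_task(slow_tool())
    # Simulated spurious "speech started" signal shortly after the tool starts.
    await asyncio.sleep(0.05)
    tool_task.cancel()  # assumption: models the effect of an unconditional interrupt
    try:
        return await tool_task
    except asyncio.CancelledError:
        # The tool never produced a result.
        return "cancelled"


outcome = asyncio.run(simulate())
print(outcome)
```

Because the cancellation arrives before the sleep finishes, the tool's return value is never observed, which matches step 5.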

Operating System

macOS

Models Used

gemini-live-2.5-flash-native-audio (Vertex AI)

Package Versions

livekit==1.0.23
livekit-agents==1.3.10
livekit-api==1.1.0
livekit-plugins-google==1.3.10

Session/Room/Call IDs

N/A (Reproducible in local dev)

Proposed Solution

  1. Remove or relax the ValueError check in AgentActivity.__init__ to allow allow_interruptions=False with Server VAD.
  2. Update _on_input_speech_started to check self.allow_interruptions:
def _on_input_speech_started(self, ev: llm.InputSpeechStartedEvent) -> None:
    if self.vad is None:
        self._session._update_user_state("speaking")

    # PROPOSED FIX:
    if not self.allow_interruptions:
        logger.debug("Server detected speech, but interruptions are disabled. Ignoring.")
        return

    try:
        self.interrupt()
    except RuntimeError: ...
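
To make the intended behavior of the guard concrete, here is a self-contained sketch using a stand-in class. `ActivitySketch` and its `interrupt_calls` counter are hypothetical and exist only for illustration; the real AgentActivity has more state and a different constructor.

```python
import logging

logger = logging.getLogger("vad_guard_sketch")


class ActivitySketch:
    """Hypothetical stand-in for AgentActivity; only the guard logic is modeled."""

    def __init__(self, allow_interruptions: bool) -> None:
        self.allow_interruptions = allow_interruptions
        self.interrupt_calls = 0  # counts how often interrupt() was reached

    def interrupt(self) -> None:
        self.interrupt_calls += 1

    def on_input_speech_started(self) -> None:
        # Proposed guard: skip the interrupt when interruptions are disabled.
        if not self.allow_interruptions:
            logger.debug(
                "Server detected speech, but interruptions are disabled. Ignoring."
            )
            return
        self.interrupt()


guarded = ActivitySketch(allow_interruptions=False)
guarded.on_input_speech_started()  # server VAD event is logged and ignored

unguarded = ActivitySketch(allow_interruptions=True)
unguarded.on_input_speech_started()  # server VAD event triggers an interrupt
```

The default path is unchanged: with allow_interruptions left at True, a server speech event still interrupts as it does today.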

Additional Context

This issue seems specific to Vertex AI's "Gemini Live" endpoints, which have "twitchier" VAD than the standard Google AI Studio endpoints. Without this fix, vertexai=True is unstable for tool-using voice agents.

Screenshots and Recordings

N/A

Labels

bug
