Description
Bug Description
When using google.realtime.RealtimeModel with vertexai=True, the Vertex AI endpoint appears to have a very sensitive Server VAD configuration. It frequently sends InputSpeechStarted events (signaled via interrupted=True in the server content) immediately after a tool finishes execution or even during silence.
The current implementation of AgentActivity._on_input_speech_started in livekit.agents.voice.agent_activity blindly calls self.interrupt(). This forcibly cancels all running tasks, including tool execution.
Crucially, AgentActivity explicitly forbids setting allow_interruptions=False when server_turn_detection is enabled:
# livekit/agents/voice/agent_activity.py
if (
    isinstance(self.llm, llm.RealtimeModel)
    and self.llm.capabilities.turn_detection
    and not self.allow_interruptions
):
    raise ValueError(...)

This means there is currently no way for a developer to "shield" critical agent tasks (like tool execution) from being cancelled by these false-positive VAD events from Vertex AI.
Expected Behavior
The library should allow developers to control whether Server VAD events force an interruption.
Specifically, AgentActivity should respect self.allow_interruptions (or SpeechHandle.allow_interruptions) even for Server VAD events.
If allow_interruptions is False, the agent should log/ignore the server interruption signal instead of cancelling tasks. This would allow developers to temporarily disable interruptions during sensitive operations:
# Desired User Code
agent.allow_interruptions = False
result = await tool.execute()  # Safe from spurious Vertex interruptions
agent.allow_interruptions = True

Reproduction Steps
- Initialize RealtimeModel with vertexai=True (and default AutomaticActivityDetection).
- Create a tool that takes a few seconds to run (e.g., await asyncio.sleep(2)).
- Ask the agent to run the tool.
- Observe that AgentActivity logs input_speech_started almost immediately after tool start (despite silence), followed by cancelling running tool.
- The tool never completes or its output is ignored.
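The cancellation described in these steps can be imitated without any LiveKit or Vertex AI dependency. The following is only a minimal sketch using plain asyncio: slow_tool and simulate_spurious_interrupt are hypothetical stand-ins rather than library code, and the short delay is an assumption that imitates an InputSpeechStarted event arriving shortly after the tool starts.

```python
import asyncio


async def slow_tool() -> str:
    """Hypothetical stand-in for a tool that needs a few seconds to finish."""
    await asyncio.sleep(2)
    return "tool result"


async def simulate_spurious_interrupt(delay: float = 0.1) -> str:
    """Start the tool, then cancel it the way an unconditional interrupt would."""
    task = asyncio.create_task(slow_tool())
    # Assumption: the false-positive speech event arrives shortly after tool start.
    await asyncio.sleep(delay)
    task.cancel()
    try:
        return await task
    except asyncio.CancelledError:
        # The tool never got the chance to return its result.
        return "cancelled before completion"


if __name__ == "__main__":
    print(asyncio.run(simulate_spurious_interrupt()))
```

In this simulation the tool's return value is never produced, which mirrors the last step above: the tool does not complete.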
Operating System
macOS
Models Used
gemini-live-2.5-flash-native-audio (Vertex AI)
Package Versions
livekit==1.0.23
livekit-agents==1.3.10
livekit-api==1.1.0
livekit-plugins-google==1.3.10
Session/Room/Call IDs
N/A (Reproducible in local dev)
Proposed Solution
- Remove or relax the ValueError check in AgentActivity.__init__ to allow allow_interruptions=False with Server VAD.
- Update _on_input_speech_started to check self.allow_interruptions:
def _on_input_speech_started(self, ev: llm.InputSpeechStartedEvent) -> None:
    if self.vad is None:
        self._session._update_user_state("speaking")
    # PROPOSED FIX:
    if not self.allow_interruptions:
        logger.debug("Server detected speech, but interruptions are disabled. Ignoring.")
        return
    try:
        self.interrupt()
    except RuntimeError: ...

Additional Context
This issue seems specific to Vertex AI's "Gemini Live" endpoints, which have "twitchier" VAD than the standard Google AI Studio endpoints. Without this fix, vertexai=True is unstable for tool-using voice agents.
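To illustrate the behavior the proposed guard is meant to produce, here is a self-contained sketch. ActivitySketch is a hypothetical, heavily simplified stand-in for AgentActivity, and interruptions_disabled is a hypothetical helper that does not exist in the library; the only part taken from this issue is the allow_interruptions check inside the speech-started handler.

```python
import logging
from contextlib import contextmanager

logger = logging.getLogger("vad-guard-sketch")


class ActivitySketch:
    """Hypothetical, simplified stand-in for AgentActivity (not the real class)."""

    def __init__(self) -> None:
        self.allow_interruptions = True
        self.interrupt_count = 0

    def interrupt(self) -> None:
        # The real method cancels running speech and tool tasks; here we only count.
        self.interrupt_count += 1

    def on_input_speech_started(self) -> None:
        # Proposed guard: ignore server VAD events while interruptions are disabled.
        if not self.allow_interruptions:
            logger.debug("Server detected speech, but interruptions are disabled. Ignoring.")
            return
        self.interrupt()

    @contextmanager
    def interruptions_disabled(self):
        """Hypothetical helper: shield a block, then restore the previous setting."""
        previous = self.allow_interruptions
        self.allow_interruptions = False
        try:
            yield
        finally:
            self.allow_interruptions = previous


activity = ActivitySketch()
with activity.interruptions_disabled():
    activity.on_input_speech_started()  # ignored while shielded
activity.on_input_speech_started()  # interrupts as usual
print(activity.interrupt_count)
```

Wrapping the toggle in a context manager is one possible design choice: the previous setting is restored even if the shielded operation raises, which the manual set-False/set-True pattern would not guarantee.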
Screenshots and Recordings
N/A