Description
Bug Description
This function is triggered when I receive an RPC. Inside it, I check whether there’s any ongoing speech, and if there is, I wait for it to finish playing. The "done waiting for playout" only gets logged when the user says something after the agent’s current turn has already completed.
At that point, say() appears to get queued, but no audio is ever played. If I send another user message, I get an error saying that the current say() cannot be interrupted because allow_interruptions = False, which suggests that say() is running as expected.
A few more confusing behaviors:
- I never hear the audio from say()
- Any log statements after say() do not execute until I close the session
- The user transcript only appears in the logs after the session is closed, with a very large transcription_delay
So my questions are:
- Why am I not getting any audio output from say()?
- Why does execution after say() appear to be blocked until the session is closed?
- What is causing the massive transcription delay, and how can I fix it?
async def _handle_session_ended(self, data: RpcInvocationData):
    handle = self.session.current_speech
    if handle:
        await handle.wait_for_playout()
        logger.info("done waiting for playout")
    await self.session.say(SESSION_END_MESSAGE, allow_interruptions=False)
    logger.info("Session end message said")
We’re seeing unreliable behavior with session.say() when it is triggered after waiting for current speech playout. The issue only occurs when user transcripts are delayed. If the transcript arrives on time, audio playback works as expected. We cannot determine what causes the transcription to be delayed so that it only arrives once the session end button is clicked and the session ends.
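To narrow down which await actually stalls, each call could be wrapped in a timeout guard. The following is only a diagnostic sketch, not a fix: `await_with_timeout` and `FakeHandle` are hypothetical names introduced here for illustration, and `FakeHandle` merely stands in for the real speech handle so the snippet runs without a LiveKit session.

```python
import asyncio
import logging

logger = logging.getLogger("say-debug")


async def await_with_timeout(awaitable, label: str, timeout: float = 10.0) -> bool:
    """Await `awaitable`; return False and log a warning if it exceeds `timeout` seconds."""
    try:
        await asyncio.wait_for(awaitable, timeout=timeout)
        return True
    except asyncio.TimeoutError:
        logger.warning("%s did not complete within %.2fs", label, timeout)
        return False


class FakeHandle:
    """Hypothetical stand-in for a speech handle (NOT the livekit class)."""

    def __init__(self, delay: float) -> None:
        self._delay = delay

    async def wait_for_playout(self) -> None:
        # Simulate playout that takes `delay` seconds to finish.
        await asyncio.sleep(self._delay)


async def demo() -> tuple[bool, bool]:
    # A playout that finishes in time, and one that stalls past the timeout.
    fast = await await_with_timeout(
        FakeHandle(0.01).wait_for_playout(), "fast playout", timeout=0.5
    )
    slow = await await_with_timeout(
        FakeHandle(5.0).wait_for_playout(), "stalled playout", timeout=0.05
    )
    return fast, slow


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

In the handler above, the same wrapper could go around `handle.wait_for_playout()` so that a stall shows up as a warning with a label instead of silently blocking until the session closes.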
Expected Behavior
session.say() should always produce audible output once called.
Audio playback should not depend on user transcript timing.
Delayed or late transcripts should not:
- Block audio playout
- Stall the session
- Delay execution of code after say()
User transcripts should arrive in logs in a timely manner, independent of session lifecycle.
Reproduction Steps
This does not reproduce reliably, so multiple attempts may be needed.
Start the avatar session (liveavatar), then start the session.
Trigger the RPC that runs the following logic:
1. Detect an active speech
2. Call wait_for_playout() on the current speech handle
3. Call session.say() with allow_interruptions=False
Observe the behavior:
wait_for_playout() does not resolve until the user speaks again
session.say() is invoked but no audio is heard
Code after say() does not execute
Send another user message:
The system reports that say() cannot be interrupted (indicating it is considered active)
Close the session:
The delayed user transcript appears in logs with a large transcription_delay
Any log after say() appears in logs
Operating System
Windows 11
Models Used
STT: Azure STT, LLM: GPT-4o, TTS: minimax speech-2.6-hd
Package Versions
"livekit-agents[azure,elevenlabs,minimax,openai,silero]~=1.3.10",
"livekit-plugins-liveavatar>=1.3.10",
Session/Room/Call IDs
RM_7xxFDKPHb3XP
RM_aMPRKkxqReo9
RM_Lqr8LQpXJ5sa
Proposed Solution
Additional Context
17:23:42.826 INFO IntakeAgent done waiting for playout {"encounterId": "123456789001"}
17:23:42.837 INFO livekit.agents VoiceAgent.say invoked with add_to_chat_ctx=True
{"encounterId": "123456789001"}
17:23:42.845 INFO livekit.agents SpeechHandle created: speech_aad605786e1e
{"encounterId": "123456789001"}
17:23:42.853 INFO livekit.agents Speech task scheduled for handle: speech_aad605786e1e
{"encounterId": "123456789001"}
17:23:42.858 INFO livekit.agents SpeechHandle enqueued: speech_aad605786e1e (queue size=1)
{"encounterId": "123456789001"}
17:23:55.363 INFO livekit.agents closing agent session due to participant disconnect (disable via
`RoomInputOptions.close_on_disconnect=False`)
{"room": "room_1769689162407", "participant": "user", "reason":
"CLIENT_INITIATED", "encounterId": "123456789001"}
17:23:55.393 DEBUG livekit.agents stream closed
{"participant": "user", "source": "SOURCE_MICROPHONE", "encounterId":
"123456789001"}
17:23:55.406 INFO IntakeAgent Session end message said {"encounterId": "123456789001"}
17:23:55.416 DEBUG livekit.agents input stream detached
{"participant": "user", "source": "SOURCE_UNKNOWN", "accepted_sources":
["SOURCE_MICROPHONE"], "encounterId": "123456789001"}
17:23:55.525 INFO IntakeAgent User stopped speaking {"encounterId": "123456789001"}
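For reference, the length of the stall can be read off the timestamps above. The snippet below is a small sketch that computes the gap between the "SpeechHandle enqueued" line and the "Session end message said" line, and between the session close and that same line. `gap_seconds` is a hypothetical helper, and it assumes both timestamps fall on the same day.

```python
from datetime import datetime

TIME_FORMAT = "%H:%M:%S.%f"


def gap_seconds(start: str, end: str) -> float:
    """Return the number of seconds between two HH:MM:SS.mmm log timestamps."""
    delta = datetime.strptime(end, TIME_FORMAT) - datetime.strptime(start, TIME_FORMAT)
    return delta.total_seconds()


# Timestamps copied from the log excerpt above.
enqueued = "17:23:42.858"        # SpeechHandle enqueued
session_closing = "17:23:55.363"  # closing agent session due to participant disconnect
message_said = "17:23:55.406"    # Session end message said

stall = gap_seconds(enqueued, message_said)
after_close = gap_seconds(session_closing, message_said)

print(f"say() blocked for {stall:.3f}s")
print(f"log after say() appeared {after_close:.3f}s after the session began closing")
```

So the code after say() was held for roughly 12.5 seconds and only resumed a few tens of milliseconds after the session started closing.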
Screenshots and Recordings
No response