session.say() produces no audio after session.current_speech.wait_for_playout() (works unreliably) #4650

@issrids

Bug Description

This function is triggered when I receive an RPC. Inside it, I check whether there’s any ongoing speech, and if there is, I wait for it to finish playing. The "done waiting for playout" only gets logged when the user says something after the agent’s current turn has already completed.

At that point, say() appears to get queued, but no audio is ever played. If I send another user message, I get an error saying that the current say() cannot be interrupted because allow_interruptions = False, which suggests that say() is running as expected.

A few more confusing behaviors:

  1. I never hear the audio from say()
  2. Any log statements after say() do not execute until I close the session
  3. The user transcript only appears in the logs after the session is closed, with a very large transcription_delay

So my questions are:

  1. Why am I not getting any audio output from say()?
  2. Why does execution after say() appear to be blocked until the session is closed?
  3. What is causing the massive transcription delay, and how can I fix it?

async def _handle_session_ended(self, data: RpcInvocationData):
    # If the agent is mid-utterance, let it finish before saying goodbye.
    handle = self.session.current_speech

    if handle:
        await handle.wait_for_playout()

    logger.info("done waiting for playout")

    # Speak the closing message; the user must not be able to interrupt it.
    await self.session.say(SESSION_END_MESSAGE, allow_interruptions=False)
    logger.info("Session end message said")

We’re seeing unreliable behavior with session.say() when it is triggered after waiting for the current speech to play out. The issue only occurs when user transcripts are delayed; if the transcript arrives on time, audio playback works as expected. We cannot determine what causes the transcription to be delayed, or why the transcript is only received once the session end button is clicked and the session ends.
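
One way to narrow down where the handler stalls is to bound each await with a timeout and log which step did not finish. The following is a minimal diagnostic sketch, not a fix. `bounded`, `FakeHandle`, and the timeout values are hypothetical names introduced here; `FakeHandle` only stands in for the object returned by `session.current_speech` so that the snippet runs without LiveKit installed.

```python
import asyncio
import logging

logger = logging.getLogger("diagnostic")


async def bounded(awaitable, label: str, timeout: float) -> bool:
    """Await `awaitable` for at most `timeout` seconds.

    Returns True if it finished in time, False if it timed out.
    On timeout, asyncio.wait_for cancels the wrapped awaitable.
    """
    try:
        await asyncio.wait_for(awaitable, timeout=timeout)
        return True
    except asyncio.TimeoutError:
        logger.warning("%s did not finish within %.2fs", label, timeout)
        return False


class FakeHandle:
    """Hypothetical stand-in for a speech handle (not the LiveKit class)."""

    def __init__(self, playout_seconds: float) -> None:
        self._playout_seconds = playout_seconds

    async def wait_for_playout(self) -> None:
        # Simulate audio that takes `playout_seconds` to finish playing.
        await asyncio.sleep(self._playout_seconds)


async def demo() -> tuple[bool, bool]:
    # A playout that completes well inside the timeout.
    fast = await bounded(FakeHandle(0.01).wait_for_playout(), "fast playout", 0.5)
    # A playout that outlives the timeout, as in the reported hang.
    stuck = await bounded(FakeHandle(5.0).wait_for_playout(), "stuck playout", 0.05)
    return fast, stuck


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

In the handler from this report, the same wrapper could be placed around `handle.wait_for_playout()` and around the awaited `session.say(...)`, assuming cancellation on timeout is acceptable there. Whichever label appears in the warning points at the step that is blocked.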

Expected Behavior

session.say() should always produce audible output once called.
Audio playback should not depend on user transcript timing.

Delayed or late transcripts should not:

  1. Block audio playout
  2. Stall the session
  3. Delay execution of code after say()

User transcripts should arrive in logs in a timely manner, independent of session lifecycle.

Reproduction Steps

The issue reproduces unreliably, so multiple attempts may be needed.

Start the avatar session (liveavatar), then start the session.

Trigger the RPC that runs the following logic:
1. Detect an active speech
2. Call wait_for_playout() on the current speech handle
3. Call session.say() with allow_interruptions=False

Observe the behavior:
1. wait_for_playout() does not resolve until the user speaks again
2. session.say() is invoked but no audio is heard
3. Code after say() does not execute

Send another user message:
1. The system reports that say() cannot be interrupted (indicating it is considered active)

Close the session:
1. The delayed user transcript appears in logs with a large transcription_delay
2. Any log statement placed after say() finally appears in the logs
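
Because the handler awaits both the playout and say(), the RPC call itself stays open for as long as either one is blocked. A possible mitigation, sketched below under stated assumptions, is to hand the wait-then-say sequence to a background task so the handler returns immediately. This does not explain or fix the missing audio; it only keeps the RPC response from depending on it. `FarewellRunner`, `farewell`, and the `"accepted"` return value are hypothetical, and the sleeping coroutine stands in for the real `wait_for_playout()` and `say()` calls.

```python
import asyncio
from typing import Any, Callable, Coroutine, Optional


class FarewellRunner:
    """Runs a long farewell coroutine in the background (hypothetical helper)."""

    def __init__(self, farewell: Callable[[], Coroutine[Any, Any, None]]) -> None:
        self._farewell = farewell
        # Keep a strong reference so the task is not garbage collected mid-flight.
        self._task: Optional[asyncio.Task] = None

    async def handle_rpc(self) -> str:
        # Start the work unless a previous run is still in progress,
        # then return right away instead of awaiting the result.
        if self._task is None or self._task.done():
            self._task = asyncio.create_task(self._farewell())
        return "accepted"

    async def wait_closed(self) -> None:
        # Lets shutdown code (or a test) wait for the background work.
        if self._task is not None:
            await self._task


async def demo_background() -> list[str]:
    events: list[str] = []

    async def farewell() -> None:
        await asyncio.sleep(0.05)  # stands in for wait_for_playout() + say()
        events.append("farewell done")

    runner = FarewellRunner(farewell)
    events.append(await runner.handle_rpc())  # returns before the farewell ends
    await runner.wait_closed()
    return events


if __name__ == "__main__":
    print(asyncio.run(demo_background()))
```

The trade-off is that errors raised inside the background task no longer propagate to the RPC caller, so they would need to be logged or surfaced separately.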

Operating System

Windows 11

Models Used

STT: Azure STT, LLM: GPT-4o, TTS: minimax speech-2.6-hd

Package Versions

"livekit-agents[azure,elevenlabs,minimax,openai,silero]~=1.3.10",
"livekit-plugins-liveavatar>=1.3.10",

Session/Room/Call IDs

RM_7xxFDKPHb3XP
RM_aMPRKkxqReo9
RM_Lqr8LQpXJ5sa

Proposed Solution

Additional Context

    17:23:42.826 INFO   IntakeAgent        done waiting for playout {"encounterId": "123456789001"}
    17:23:42.837 INFO   livekit.agents     VoiceAgent.say invoked with add_to_chat_ctx=True {"encounterId": "123456789001"}
    17:23:42.845 INFO   livekit.agents     SpeechHandle created: speech_aad605786e1e {"encounterId": "123456789001"}
    17:23:42.853 INFO   livekit.agents     Speech task scheduled for handle: speech_aad605786e1e {"encounterId": "123456789001"}
    17:23:42.858 INFO   livekit.agents     SpeechHandle enqueued: speech_aad605786e1e (queue size=1) {"encounterId": "123456789001"}
    17:23:55.363 INFO   livekit.agents     closing agent session due to participant disconnect (disable via `RoomInputOptions.close_on_disconnect=False`) {"room": "room_1769689162407", "participant": "user", "reason": "CLIENT_INITIATED", "encounterId": "123456789001"}
    17:23:55.393 DEBUG  livekit.agents     stream closed {"participant": "user", "source": "SOURCE_MICROPHONE", "encounterId": "123456789001"}
    17:23:55.406 INFO   IntakeAgent        Session end message said {"encounterId": "123456789001"}
    17:23:55.416 DEBUG  livekit.agents     input stream detached {"participant": "user", "source": "SOURCE_UNKNOWN", "accepted_sources": ["SOURCE_MICROPHONE"], "encounterId": "123456789001"}
    17:23:55.525 INFO   IntakeAgent        User stopped speaking {"encounterId": "123456789001"}
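
To put numbers on the stall, the gaps between the relevant log lines can be computed directly. The short script below uses only the timestamps from the log excerpt above; the dictionary keys and the `seconds_between` helper are names chosen here for illustration.

```python
from datetime import datetime

# Timestamps copied from the log excerpt above.
EVENTS = {
    "done_waiting": "17:23:42.826",
    "speech_enqueued": "17:23:42.858",
    "session_closing": "17:23:55.363",
    "say_returned": "17:23:55.406",
}


def seconds_between(start: str, end: str) -> float:
    """Return the elapsed seconds between two same-day HH:MM:SS.mmm stamps."""
    fmt = "%H:%M:%S.%f"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return round(delta.total_seconds(), 3)


# Time from the speech being enqueued until the await on say() returned.
enqueued_to_returned = seconds_between(
    EVENTS["speech_enqueued"], EVENTS["say_returned"]
)
# Time from the session starting to close until the await on say() returned.
closing_to_returned = seconds_between(
    EVENTS["session_closing"], EVENTS["say_returned"]
)

print(f"enqueued -> say() returned: {enqueued_to_returned}s")
print(f"session closing -> say() returned: {closing_to_returned}s")
```

In this run, say() stayed pending for about 12.5 seconds after being enqueued and returned roughly 43 milliseconds after the session began closing, which matches the observation that code after say() only runs once the session ends.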

Screenshots and Recordings

No response

Labels

bug (Something isn't working)
