fix: commit user turn with STT and realtime #4663
davidzhao
left a comment
is this true when manual turn mode is used?
I see that if it's not server vad, we have to manually create them: https://platform.openai.com/docs/api-reference/realtime-client-events/input_audio_buffer/commit
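For reference, a minimal sketch of what the manual flow looks like at the wire level, assuming server-side turn detection is disabled on the session. The client commits the buffered audio with `input_audio_buffer.commit` and, only if it wants a reply, follows up with `response.create`. The helper names below are hypothetical and only build the JSON payloads; they are not part of livekit-agents.

```python
import json


def commit_audio_event() -> str:
    """Build the client event that commits the input audio buffer."""
    return json.dumps({"type": "input_audio_buffer.commit"})


def create_response_event() -> str:
    """Build the client event that asks the model to generate a response."""
    return json.dumps({"type": "response.create"})


def manual_turn_events(request_response: bool = True) -> list[str]:
    """Events a client would send to end a user turn without server VAD.

    Committing audio and requesting a response are separate events, so the
    flag makes the second step an explicit choice.
    """
    events = [commit_audio_event()]
    if request_response:
        events.append(create_response_event())
    return events
```

Keeping the two events separate is what lets a caller commit audio in the middle of a long turn without asking the model to answer yet.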
I think I found the issue: when used with STT, the session actually committed the transcripts to the model, which triggered a response. Without STT, it does not respond, as expected.
Got it, so the bug is: when STT is used with manual turn detection mode and a realtime model, committing the turn triggers a response when it should not.
Yep. But it is also kinda ambiguous what users want if they have an STT configured: do they want it only for the transcripts, or as actual text input to the model?
In our use case we want STT with manual turn taking for two reasons: as a way to know when it's safe to commit audio before a user turn has ended (a user turn can be very long, so we cannot wait until end of turn to commit), and for its improved post-call transcripts compared with OpenAI server-side transcription.
So I think ideally we'd have STT transcripts available in the local chat context (so that after a connection error we can restore the conversation on a fresh session), but the remote OpenAI realtime chat context need not even be aware of them, as it can function perfectly fine without a transcript of user utterances. STT also gives the user a neat signal that the system is working during a long user turn, by letting them view their own transcription as they talk.
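A toy sketch of the behaviour described above, under loud assumptions: `FakeRealtimeSession` and `LocalTurnRecorder` are hypothetical stand-ins invented for illustration, not livekit-agents classes and not this PR's implementation. It only shows the separation being asked for: audio is committed to the remote session, while STT transcripts land in local history and are never forwarded as text.

```python
from dataclasses import dataclass, field


@dataclass
class FakeRealtimeSession:
    """Hypothetical stand-in for a remote realtime session; records calls."""

    committed_audio: int = 0
    text_items: list = field(default_factory=list)

    def commit_audio(self) -> None:
        self.committed_audio += 1

    def add_user_text(self, text: str) -> None:
        # In this toy model, pushing user text is what would provoke a reply.
        self.text_items.append(text)


@dataclass
class LocalTurnRecorder:
    """Hypothetical helper: keeps STT transcripts local to the client."""

    remote: FakeRealtimeSession
    local_history: list = field(default_factory=list)
    pending: list = field(default_factory=list)

    def on_final_transcript(self, text: str) -> None:
        # Buffer final STT segments until the turn is committed.
        self.pending.append(text)

    def commit_user_turn(self) -> None:
        # Commit audio remotely; store the transcript only in local history.
        self.remote.commit_audio()
        if self.pending:
            self.local_history.append(" ".join(self.pending))
            self.pending.clear()
```

With this split, a fresh session could be seeded from `local_history` after a connection error, while the remote context only ever saw audio.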
Force-pushed from 6099024 to a3a206e
commit c46013d
Author: Long Chen <longch1024@gmail.com>
Date: Tue Feb 3 20:02:57 2026 +0800

    add exclude_config_update to ChatContext copy (livekit#4700)

commit 7849a8c
Author: Chenghao Mou <chenghao.mou@livekit.io>
Date: Tue Feb 3 09:51:07 2026 +0000

    fix: commit user turn with STT and realtime (livekit#4663)

commit edfa391
Author: Chenghao Mou <chenghao.mou@livekit.io>
Date: Tue Feb 3 09:48:36 2026 +0000

    add STT usage for google (livekit#4599)

commit 34d0d62
Author: Long Chen <longch1024@gmail.com>
Date: Tue Feb 3 15:53:42 2026 +0800

    fix gemini live tool execution interrupted by generation_complete event (livekit#4699)

commit 1725929
Author: Long Chen <longch1024@gmail.com>
Date: Tue Feb 3 11:08:27 2026 +0800

    prevent tool cancellation when AgentTask is called inside it (livekit#4586)
Turns out using STT with a realtime model will trigger a response when committing a turn. I didn't notice the interruption when testing so wrongly assumed it was doing just fine.
Example to reproduce:
The model will respond after 20 seconds
cc @bml1g12
Summary by CodeRabbit
Removed the `commit_user_turn` method from the Realtime Session interface. This method was previously unused and non-functional across implementations and has been eliminated to simplify the API surface and reduce unnecessary complexity. Applications that reference this method require updating for compatibility.