fix: Agent Example and TURN detection #70
Conversation
- Implement automatic LLM triggering in _on_transcript() for both modes:
  * Without turn detection: triggers immediately on transcript completion
  * With turn detection: accumulates transcripts and waits for TurnEndedEvent
- Add _pending_user_transcripts dict to track multi-chunk transcripts per user
- Implement turn detection LLM response in _on_turn_event()
- Add TTS interruption when user starts speaking (barge-in)
- Fix FAL turn detection event emission logic
- Fix double TTS triggering in OpenAI LLM plugin (was emitting LLMResponseCompletedEvent twice)
- Add FAL turn detection to simple agent example
- Update example dependencies to use vision-agents naming

Known limitation: LLM response generation is not yet cancelled when the user interrupts. Only TTS audio playback stops; the LLM continues generating in the background.
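A minimal sketch of the two triggering modes described above. The event classes, attribute names, and `simple_response` wiring are simplified stand-ins for illustration, not the actual `agents.py` implementation:

```python
# Sketch only: illustrates "respond immediately" vs. "accumulate until turn end".
# TranscriptEvent / TurnEndedEvent / simple_response are assumed stand-ins here.
from dataclasses import dataclass


@dataclass
class TranscriptEvent:
    user_id: str
    text: str


@dataclass
class TurnEndedEvent:
    speaker_id: str


class AgentSketch:
    def __init__(self, llm, turn_detection=None, realtime_mode=False):
        self.llm = llm
        self.turn_detection = turn_detection
        self.realtime_mode = realtime_mode
        # Per-speaker accumulation of finished transcript chunks.
        self._pending_user_transcripts: dict[str, str] = {}

    async def _on_transcript(self, event: TranscriptEvent) -> None:
        if self.realtime_mode:
            return  # realtime LLMs handle STT, turns, and responses themselves
        existing = self._pending_user_transcripts.get(event.user_id, "")
        self._pending_user_transcripts[event.user_id] = f"{existing} {event.text}".strip()
        if self.turn_detection is None:
            # Without turn detection: respond as soon as a final transcript lands.
            text = self._pending_user_transcripts.pop(event.user_id)
            await self.llm.simple_response(text, None)
        # With turn detection: keep accumulating until TurnEndedEvent arrives.

    async def _on_turn_event(self, event: TurnEndedEvent) -> None:
        if self.realtime_mode:
            return
        text = self._pending_user_transcripts.get(event.speaker_id, "").strip()
        if text and self.llm:
            await self.llm.simple_response(text, None)
        self._pending_user_transcripts[event.speaker_id] = ""
```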
Walkthrough

Implements per-speaker transcript accumulation and turn-based LLM triggering, TTS interruption on non-agent speech, realtime-mode short-circuiting, and expanded partial-transcript handling. Fal turn detection now emits explicit TURN_ENDED/TURN_STARTED on speaker switches. The OpenAI plugin avoids duplicate completion events for streaming responses. The example and its dependencies are updated to vision-agents and fal-client.
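The walkthrough mentions that Fal turn detection now emits explicit events on speaker switches. A rough sketch of that emission rule, assuming a hypothetical emit callback and on_speech/on_silence hooks (the real `fal_turn_detection.py` may structure this differently):

```python
# Sketch of the speaker-switch rule: when audio arrives from a new speaker while
# a previous speaker's turn is still open, end the old turn before starting the
# new one. The emit callback and method names are assumptions for illustration.
from typing import Optional


class SpeakerSwitchSketch:
    def __init__(self, emit_turn_event):
        self._emit = emit_turn_event          # callback: (event_type, speaker_id) -> None
        self._active_speaker: Optional[str] = None

    def on_speech(self, speaker_id: str) -> None:
        if self._active_speaker is None:
            self._emit("TURN_STARTED", speaker_id)
        elif self._active_speaker != speaker_id:
            # Explicitly close the previous speaker's turn on a speaker switch.
            self._emit("TURN_ENDED", self._active_speaker)
            self._emit("TURN_STARTED", speaker_id)
        self._active_speaker = speaker_id

    def on_silence(self) -> None:
        if self._active_speaker is not None:
            self._emit("TURN_ENDED", self._active_speaker)
            self._active_speaker = None
```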
Sequence Diagram(s)

```mermaid
sequenceDiagram
autonumber
actor User
participant Mic as FalTurnDetection
participant Agent
participant LLM
participant TTS
User->>Mic: audio
Mic->>Agent: TURN_STARTED(user_id)
Note over Agent,TTS: If TTS playing and speaker != agent → interrupt TTS
par Speech streaming
Mic-->>Agent: PARTIAL_TRANSCRIPT / TRANSCRIPT chunks
Agent->>Agent: Accumulate per-speaker transcripts
end
Mic->>Agent: TURN_ENDED(user_id)
alt realtime_mode == true
Agent->>Agent: Short-circuit (no LLM trigger)
else turn_detection enabled
Agent->>Agent: Fetch accumulated transcript for user
alt transcript non-empty
Agent->>LLM: simple_response(text, participant)
alt streaming response
LLM-->>Agent: stream events/tokens
Agent-->>TTS: stream speak (optional)
Note right of Agent: Completion emitted by streaming path
else non-streaming response
LLM-->>Agent: final response
Agent-->>Agent: emit LLMResponseCompletedEvent
Agent-->>TTS: speak response (optional)
end
Agent->>Agent: clear pending transcript for user
else
Agent->>Agent: no-op
end
end
```
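The diagram note "If TTS playing and speaker != agent → interrupt TTS" corresponds to the barge-in behavior added in this PR. A minimal sketch of that check, assuming a hypothetical `stop_audio()` method on the TTS object (the real plugin API may use a different name):

```python
# Sketch of the barge-in check on TURN_STARTED. stop_audio() is an assumed
# method name for interrupting playback; it is not confirmed by this PR.
async def on_turn_started(agent, event) -> None:
    if agent.realtime_mode:
        return  # realtime LLMs manage their own interruption
    # Only interrupt when someone other than the agent starts speaking.
    if event.speaker_id != agent.agent_user.id and agent.tts is not None:
        await agent.tts.stop_audio()
        # Known limitation from the PR description: only TTS playback stops;
        # any in-flight LLM generation keeps running in the background.
```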
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
Pre-merge checks: ❌ failed checks (2 warnings), ✅ passed checks (1 passed)
Optimize: Check realtime mode early in _on_transcript
- Add early return if in realtime mode to skip LLM triggering logic
- In realtime mode, the LLM handles STT, turn detection, and responses itself
- Removes redundant check in else branch
- Improves code clarity and efficiency

Optimize: Check realtime mode early in _on_turn_event TurnEndedEvent
- Add early return for realtime mode after logging the event
- Skips unnecessary transcript fetching and participant metadata extraction
- Removes redundant realtime_mode check later in the flow
- Consistent with _on_transcript optimization

Move realtime mode check to top of _on_turn_event
- Realtime LLMs handle their own turn detection and interruption
- Skip all turn event processing in realtime mode (not just LLM triggering)
- Removes duplicate realtime check in TurnEndedEvent branch
- Cleaner and more efficient
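A before/after sketch of the refactor these commits describe, using simplified stand-in attributes; the point is only that the realtime check moves to a single early return at the top of the handler:

```python
class TurnEventRefactorSketch:
    """Illustrative only; not the actual Agent class from agents.py."""

    # Before: the realtime check sat inside the TurnEnded branch, so transcript
    # lookup and metadata extraction still ran for realtime LLMs.
    async def _on_turn_ended_before(self, event):
        transcript = self._pending_user_transcripts.get(event.speaker_id, "")
        if not self.realtime_mode and transcript.strip():
            await self.simple_response(transcript, None)

    # After: one early return skips all turn-event processing in realtime mode,
    # since realtime LLMs handle their own turn detection and interruption.
    async def _on_turn_ended_after(self, event):
        if self.realtime_mode:
            return
        transcript = self._pending_user_transcripts.get(event.speaker_id, "")
        if transcript.strip():
            await self.simple_response(transcript, None)
```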
Actionable comments posted: 1
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
examples/01_simple_agent_example/uv.lock is excluded by !**/*.lock
📒 Files selected for processing (5)
- agents-core/vision_agents/core/agents/agents.py (3 hunks)
- agents-core/vision_agents/core/turn_detection/fal_turn_detection.py (1 hunks)
- examples/01_simple_agent_example/pyproject.toml (1 hunks)
- examples/01_simple_agent_example/simple_agent_example.py (2 hunks)
- plugins/openai/vision_agents/plugins/openai/openai_llm.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (.cursor/rules/python.mdc)
**/*.py: Never adjust sys.path (e.g., sys.path.append/insert/assignment)
Docstrings must follow the Google style guide
Files:
- plugins/openai/vision_agents/plugins/openai/openai_llm.py
- agents-core/vision_agents/core/agents/agents.py
- agents-core/vision_agents/core/turn_detection/fal_turn_detection.py
- examples/01_simple_agent_example/simple_agent_example.py
🧬 Code graph analysis (3)
agents-core/vision_agents/core/agents/agents.py (2)
- agents-core/vision_agents/core/turn_detection/events.py (1): TurnEndedEvent (29-44)
- plugins/openai/vision_agents/plugins/openai/openai_llm.py (1): simple_response (67-91)
agents-core/vision_agents/core/turn_detection/fal_turn_detection.py (1)
- agents-core/vision_agents/core/turn_detection/turn_detection.py (2): _emit_turn_event (99-126), TurnEvent (12-16)
examples/01_simple_agent_example/simple_agent_example.py (1)
- agents-core/vision_agents/core/turn_detection/fal_turn_detection.py (1): FalTurnDetection (31-377)
🪛 GitHub Actions: CI (unit)
agents-core/vision_agents/core/agents/agents.py
[error] 807-807: F541: f-string without any placeholders. Remove extraneous f prefix. Found 1 error; 1 fixable with the --fix option.
Fix: Remove f-string prefix from log with no placeholders
- Fixed lint error F541 on line 797
- Changed f-string to regular string since no interpolation needed
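For context, Ruff's F541 flags an f-string that contains no placeholders; the fix is simply dropping the `f` prefix. A generic illustration (not the exact log line from agents.py):

```python
import logging

logger = logging.getLogger(__name__)

# Flagged by Ruff F541 (f-string without any placeholders):
#     logger.info(f"Triggering LLM response after turn ended")
# Fixed by removing the extraneous f prefix:
logger.info("Triggering LLM response after turn ended")
```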
Actionable comments posted: 1
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
agents-core/vision_agents/core/agents/agents.py (3 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (.cursor/rules/python.mdc)
**/*.py: Never adjust sys.path (e.g., sys.path.append/insert/assignment)
Docstrings must follow the Google style guide
Files:
agents-core/vision_agents/core/agents/agents.py
🧬 Code graph analysis (1)
agents-core/vision_agents/core/agents/agents.py (3)
- agents-core/vision_agents/core/turn_detection/events.py (2): TurnStartedEvent (10-25), TurnEndedEvent (29-44)
- agents-core/vision_agents/core/edge/types.py (1): duration (73-109)
- plugins/openai/vision_agents/plugins/openai/openai_llm.py (1): simple_response (67-91)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: unit / Ruff & mypy
- GitHub Check: unit / Test "not integration"
- GitHub Check: unit / Ruff & mypy
- GitHub Check: unit / Test "not integration"
agents-core/vision_agents/core/agents/agents.py (reviewed snippet):

```python
# This is the signal that the user has finished speaking and expects a response
if event.speaker_id and event.speaker_id != self.agent_user.id:
    # Get the accumulated transcript for this speaker
    transcript = self._pending_user_transcripts.get(event.speaker_id, "")

    if transcript and transcript.strip():
        self.logger.info(f"🤖 Triggering LLM response after turn ended for {event.speaker_id}")

        # Create participant object if we have metadata
        participant = None
        if hasattr(event, 'custom') and event.custom:
            # Try to extract participant info from custom metadata
            participant = event.custom.get('participant')

        # Trigger LLM response with the complete transcript
        if self.llm:
            await self.simple_response(transcript, participant)

        # Clear the pending transcript for this speaker
        self._pending_user_transcripts[event.speaker_id] = ""
```
Don't assume transcripts arrive before TurnEnded.
Turn detection emits TurnEndedEvent as soon as silence is detected, but STT final transcripts can land a beat later (we call turn_detection.process_audio before stt.process_audio). When that happens, Line 701 sees an empty transcript, we skip simple_response, and nothing ever re-triggers once the transcript finally shows up in _on_transcript. Result: user turns are silently dropped whenever STT lags behind turn detection—major functional regression.
Please track “turn ended but transcript pending” state. One way:
```diff
@@
-        self._pending_user_transcripts: Dict[str, str] = {}
+        self._pending_user_transcripts: Dict[str, str] = {}
+        self._pending_turn_completions: set[str] = set()
@@ def _on_turn_event(...):
-        if event.speaker_id and event.speaker_id != self.agent_user.id:
+        if event.speaker_id and event.speaker_id != self.agent_user.id:
             transcript = self._pending_user_transcripts.get(event.speaker_id, "")
-            if transcript and transcript.strip():
+            if transcript and transcript.strip():
                 ...
                 self._pending_user_transcripts[event.speaker_id] = ""
+                self._pending_turn_completions.discard(event.speaker_id)
+            else:
+                self._pending_turn_completions.add(event.speaker_id)
@@ def _on_transcript(...):
-        if user_id not in self._pending_user_transcripts:
+        if user_id not in self._pending_user_transcripts:
             ...
         else:
             ...
+            if user_id in getattr(self, "_pending_turn_completions", set()):
+                participant = getattr(event, "user_metadata", None)
+                await self.simple_response(self._pending_user_transcripts[user_id], participant)
+                self._pending_user_transcripts[user_id] = ""
+                self._pending_turn_completions.discard(user_id)
```

Any equivalent solution that ensures a late-arriving transcript still fires the LLM response works.
Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In agents-core/vision_agents/core/agents/agents.py around lines 699–718, the TurnEnded handling assumes the final STT transcript is already available and skips triggering the LLM when the transcript arrives late. Fix by recording that a turn ended for this speaker when TurnEnded is received (e.g., add speaker_id to a pending_turns set or mark a flag alongside the empty transcript) and then, in the transcript arrival path (_on_transcript or wherever transcripts are written to _pending_user_transcripts), check for that pending-turn-ended marker and, if present, call simple_response(transcript, participant) and clear both the pending marker and the stored transcript. Ensure you still call simple_response immediately when TurnEnded sees a non-empty transcript, and avoid double-calling by clearing the marker after handling.
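A self-contained sketch of one way to implement this suggestion, so a transcript that arrives after TurnEnded still triggers a response. Method names and participant handling are simplified stand-ins (agent-speaker filtering, logging, and metadata extraction are omitted), not the actual agents.py code:

```python
# Sketch of the "turn ended but transcript pending" handling suggested above.
# The _pending_* attribute names follow the review diff; everything else is
# simplified for illustration.
class TurnCompletionSketch:
    def __init__(self, llm):
        self.llm = llm
        self._pending_user_transcripts: dict[str, str] = {}
        self._pending_turn_completions: set[str] = set()

    async def on_turn_ended(self, speaker_id: str) -> None:
        transcript = self._pending_user_transcripts.get(speaker_id, "").strip()
        if transcript:
            await self.llm.simple_response(transcript, None)
            self._pending_user_transcripts[speaker_id] = ""
            self._pending_turn_completions.discard(speaker_id)
        else:
            # STT has not delivered the final transcript yet; remember the turn.
            self._pending_turn_completions.add(speaker_id)

    async def on_transcript(self, user_id: str, text: str) -> None:
        existing = self._pending_user_transcripts.get(user_id, "")
        self._pending_user_transcripts[user_id] = f"{existing} {text}".strip()
        if user_id in self._pending_turn_completions:
            # The turn already ended; fire the response now that text arrived.
            await self.llm.simple_response(self._pending_user_transcripts[user_id], None)
            self._pending_user_transcripts[user_id] = ""
            self._pending_turn_completions.discard(user_id)
```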
Commit history:

```text
4757845 (Merge: 8d9a9e2 c834231)  Thierry Schellenbach <thierry@getstream.io>  Wed Oct 8 10:29:01 2025 +0200
    Merge branch 'main' of github.com:GetStream/agents

8d9a9e2  Thierry Schellenbach <thierry@getstream.io>  Wed Oct 8 10:28:54 2025 +0200
    move fal smart detection to plugin

c834231 (Merge: b6deb4d facedf2)  maxkahan <max.kahan@getstream.io>  Wed Oct 8 10:17:22 2025 +0200
    Merge pull request #73 from GetStream/fix/shared_forwarder
    fix: video feed mismatch and VideoForwarder resource leaks

b6deb4d  Neevash Ramdial (Nash) <mail@neevash.dev>  Wed Oct 8 09:38:51 2025 +0200
    Add CI secrets (#72)
    * Add in secrets for daily integration
    * Rename to realtime instead of realtime 2
    * Add events.wait to xAI test

73ddc8e  Thierry Schellenbach <thierry@getstream.io>  Tue Oct 7 17:27:40 2025 +0200
    pyproject cleanup

facedf2  Deven Joshi <deven9852@gmail.com>  Tue Oct 7 17:26:11 2025 +0200
    fix: critical video feed mismatch and VideoForwarder resource leaks

    CRITICAL FIXES:
    1. Video Feed Mismatch (LLM getting wrong video)
       - When YOLO/video processors are used, LLM was receiving empty processed track
       - Root cause: shared_forwarder was created from RAW track but LLM was given processed track
       - Fix: Create separate forwarders for raw and processed video tracks
       - Now LLM correctly receives YOLO-annotated frames when using pose detection
    2. VideoForwarder Resource Leaks
       - Consumer tasks were never removed from _tasks set (memory leak)
       - Fix: Add task.add_done_callback(self._task_done) to clean up tasks
       - Producer exceptions were silently swallowed
       - Fix: Log and re-raise exceptions for proper error handling
    3. Race Condition in VideoForwarder.stop()
       - Used list() snapshot for cancellation but original set for gather()
       - Fix: Use tasks_snapshot consistently throughout stop()
    4. Multiple start() Protection
       - No guard against calling start() multiple times
       - Fix: Add _started flag and early return with warning
    5. Missing VideoForwarder Cleanup in Agent
       - Forwarders were created but never stopped on agent.close()
       - Fix: Track all forwarders and stop them in close() method

    These fixes prevent resource leaks, ensure correct video routing, and improve error visibility for production debugging.

fbc1759  Thierry Schellenbach <thierry@getstream.io>  Tue Oct 7 17:19:45 2025 +0200
    wip on pyproject files

3739605  Thierry Schellenbach <thierry@getstream.io>  Tue Oct 7 15:55:19 2025 +0200
    pypi environment

6144265 (Merge: 231efc8 9b5db80)  Thierry Schellenbach <thierry@getstream.io>  Tue Oct 7 15:17:09 2025 +0200
    cleanup

231efc8  Thierry Schellenbach <thierry@getstream.io>  Tue Oct 7 15:12:31 2025 +0200
    remove duplicate publish tracks

9b5db80 (Merge: 2d08f1d 4f60ab2)  Deven Joshi <deven9852@gmail.com>  Tue Oct 7 14:40:31 2025 +0200
    Merge pull request #71 from GetStream/fix/agents-tracks
    fix: remove duplicate track publishing code

2d08f1d  Thierry Schellenbach <thierry@getstream.io>  Tue Oct 7 14:30:01 2025 +0200
    fix openai realtime test

4f60ab2  Deven Joshi <deven9852@gmail.com>  Tue Oct 7 14:25:01 2025 +0200
    fix: remove duplicate track publishing code and initialize error counters
    - Remove duplicate track publishing and audio/video listening code in join() method
    - Initialize timeout_errors and consecutive_errors before video processing loop
    - Increment timeout_errors in TimeoutError exception handler
    - Fixes potential crash when error counters are referenced but not initialized

ca562de (Merge: 4b8f686 b121bc6)  Thierry Schellenbach <thierry@getstream.io>  Tue Oct 7 14:24:02 2025 +0200
    Merge branch 'main' of github.com:GetStream/agents

4b8f686  Thierry Schellenbach <thierry@getstream.io>  Tue Oct 7 14:23:54 2025 +0200
    nicer tests for openai realtime

b121bc6 (Merge: 4a178e9 1bd131b)  Yarik <43354956+yarikdevcom@users.noreply.github.com>  Tue Oct 7 14:22:56 2025 +0200
    Merge pull request #69 from GetStream/yarikrudenok/ai-176-migrate-branding-to-vision-agents
    Refactor project structure to replace 'stream_agents' with 'vision_ag…

1bd131b  Yarik <yarik.rudenok@getstream.io>  Tue Oct 7 14:16:49 2025 +0200
    feat: [AI-176] Rename to vision

4a178e9 (Merge: a940bd3 2eacdfb)  maxkahan <max.kahan@getstream.io>  Tue Oct 7 11:50:28 2025 +0100
    Merge pull request #70 from GetStream/fix/agent-example
    fix: Agent Example and TURN detection

2eacdfb  Deven Joshi <deven9852@gmail.com>  Tue Oct 7 12:42:58 2025 +0200
    Fix: Remove f-string prefix from log with no placeholders
    - Fixed lint error F541 on line 797
    - Changed f-string to regular string since no interpolation needed

66deea5  Deven Joshi <deven9852@gmail.com>  Tue Oct 7 12:41:33 2025 +0200
    Move realtime mode check to top of _on_turn_event
    - Realtime LLMs handle their own turn detection and interruption
    - Skip all turn event processing in realtime mode (not just LLM triggering)
    - Removes duplicate realtime check in TurnEndedEvent branch
    - Cleaner and more efficient

8c01c31  Deven Joshi <deven9852@gmail.com>  Tue Oct 7 12:20:42 2025 +0200
    Optimize: Check realtime mode early in _on_turn_event TurnEndedEvent
    - Add early return for realtime mode after logging the event
    - Skips unnecessary transcript fetching and participant metadata extraction
    - Removes redundant realtime_mode check later in the flow
    - Consistent with _on_transcript optimization

f4fa0a5  Deven Joshi <deven9852@gmail.com>  Tue Oct 7 12:18:09 2025 +0200
    Optimize: Check realtime mode early in _on_transcript
    - Add early return if in realtime mode to skip LLM triggering logic
    - In realtime mode, the LLM handles STT, turn detection, and responses itself
    - Removes redundant check in else branch
    - Improves code clarity and efficiency

12b1638  Deven Joshi <deven9852@gmail.com>  Tue Oct 7 10:48:07 2025 +0200
    Fix agent LLM triggering and turn detection
    - Implement automatic LLM triggering in _on_transcript() for both modes:
      * Without turn detection: triggers immediately on transcript completion
      * With turn detection: accumulates transcripts and waits for TurnEndedEvent
    - Add _pending_user_transcripts dict to track multi-chunk transcripts per user
    - Implement turn detection LLM response in _on_turn_event()
    - Add TTS interruption when user starts speaking (barge-in)
    - Fix FAL turn detection event emission logic
    - Fix double TTS triggering in OpenAI LLM plugin (was emitting LLMResponseCompletedEvent twice)
    - Add FAL turn detection to simple agent example
    - Update example dependencies to use vision-agents naming

    Known limitation: LLM response generation is not yet cancelled when user interrupts.
    Only TTS audio playback stops, but LLM continues generating in background.
```