
Conversation

d3xvn (Contributor) commented Oct 7, 2025

  • Implement automatic LLM triggering in _on_transcript() for both modes:
    • Without turn detection: triggers immediately on transcript completion
    • With turn detection: accumulates transcripts and waits for TurnEndedEvent
  • Add _pending_user_transcripts dict to track multi-chunk transcripts per user
  • Implement turn detection LLM response in _on_turn_event()
  • Add TTS interruption when user starts speaking (barge-in)
  • Fix FAL turn detection event emission logic
  • Fix double TTS triggering in OpenAI LLM plugin (was emitting LLMResponseCompletedEvent twice)
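The accumulation scheme described in these bullets can be sketched in a few lines. This is a minimal, self-contained illustration of the idea, not the actual `agents.py` implementation; the class and method names here are hypothetical stand-ins for the real `_pending_user_transcripts` handling:

```python
# Minimal sketch of per-speaker transcript accumulation (names hypothetical,
# modeled on the _pending_user_transcripts dict described above).
from typing import Dict


class TranscriptAccumulator:
    """Accumulates transcript chunks per speaker until a turn ends."""

    def __init__(self) -> None:
        self._pending_user_transcripts: Dict[str, str] = {}

    def on_transcript(self, user_id: str, text: str) -> None:
        # Append each chunk; multi-chunk utterances build up per speaker.
        existing = self._pending_user_transcripts.get(user_id, "")
        self._pending_user_transcripts[user_id] = (existing + " " + text).strip()

    def on_turn_ended(self, user_id: str) -> str:
        # Return the complete utterance and clear the pending buffer.
        transcript = self._pending_user_transcripts.get(user_id, "")
        self._pending_user_transcripts[user_id] = ""
        return transcript


acc = TranscriptAccumulator()
acc.on_transcript("user-1", "hello")
acc.on_transcript("user-1", "how are you")
result = acc.on_turn_ended("user-1")
print(result)  # hello how are you
```

With turn detection enabled, the LLM is only triggered once `on_turn_ended` drains the buffer; without it, each completed transcript triggers immediately.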

Summary by CodeRabbit

  • New Features
    • Turn-based handling with per-speaker transcript accumulation and optional immediate responses.
    • Automatic interruption of TTS when another participant begins speaking.
  • Bug Fixes
    • Avoid duplicate "response completed" events for streaming LLM outputs.
  • Documentation
    • Example updated to demonstrate enabling turn detection.
  • Chores
    • Renamed dependencies from stream-agents to vision-agents plugins; added fal-client and updated package sources.

- Add FAL turn detection to simple agent example
- Update example dependencies to use vision-agents naming

Known limitation: LLM response generation is not yet cancelled when user interrupts.
Only TTS audio playback stops, but LLM continues generating in background.
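One way to address this limitation is to keep a handle to the in-flight LLM task and cancel it on barge-in. The sketch below is a hedged illustration using plain asyncio, not the project's actual interruption path; all names here are hypothetical:

```python
# Hedged sketch: cancel LLM generation (not just TTS playback) on barge-in.
import asyncio
from typing import Optional


class Responder:
    def __init__(self) -> None:
        self._llm_task: Optional[asyncio.Task] = None

    async def _generate(self, text: str) -> str:
        await asyncio.sleep(10)  # stands in for slow LLM generation
        return f"response to {text!r}"

    def start_response(self, text: str) -> None:
        self._llm_task = asyncio.create_task(self._generate(text))

    def interrupt(self) -> None:
        # Cancel the generation task itself, so tokens stop being produced.
        if self._llm_task is not None and not self._llm_task.done():
            self._llm_task.cancel()


async def main() -> bool:
    r = Responder()
    r.start_response("long question")
    await asyncio.sleep(0)  # let the generation task start
    r.interrupt()           # user barges in
    try:
        await r._llm_task
    except asyncio.CancelledError:
        return True
    return False


cancelled = asyncio.run(main())
print(cancelled)  # True
```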
coderabbitai bot commented Oct 7, 2025

Walkthrough

Implements per-speaker transcript accumulation and turn-based LLM triggering, TTS interruption on non-agent speech, realtime-mode short-circuiting, and expanded partial-transcript handling. Fal turn detection now emits explicit TURN_ENDED/TURN_STARTED on speaker switches. OpenAI plugin avoids duplicate completion events for streaming. Example and deps updated to vision-agents and fal-client.

Changes

Cohort / File(s) Summary
Agent turn-based handling
agents-core/vision_agents/core/agents/agents.py
Adds self._pending_user_transcripts for per-speaker accumulation; handles TurnStartedEvent / TurnEndedEvent to interrupt TTS and trigger LLM responses when appropriate; respects realtime_mode to short-circuit LLM triggering; accumulates transcripts in _on_transcript and extends _on_partial_transcript behavior for streaming user messages.
Turn detection event semantics
agents-core/vision_agents/core/turn_detection/fal_turn_detection.py
Reworks _process_turn_prediction to emit TURN_ENDED more deterministically (including on speaker switches), emit previous-speaker TURN_ENDED before new TURN_STARTED, and update/clear current_speaker around events.
OpenAI LLM completion events
plugins/openai/vision_agents/plugins/openai/openai_llm.py
Emits LLMResponseCompletedEvent only for non-streaming OpenAIResponse paths to avoid duplicate completion events when streaming responses are handled elsewhere.
Example app enablement
examples/01_simple_agent_example/simple_agent_example.py
Imports FalTurnDetection and passes an instance to the agent via the new turn_detection parameter to enable turn-detection during the example run.
Example project dependencies
examples/01_simple_agent_example/pyproject.toml
Replaces stream-agents* dependencies with vision-agents*, adds fal-client>=0.5.3, updates [tool.uv.sources] to vision-agents plugin sources, and removes krisp-audio sources/export mapping.
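The fal turn-detection rework summarized in the table above (emit the previous speaker's TURN_ENDED before the new speaker's TURN_STARTED) can be sketched as a small state machine. The event names follow the walkthrough; the tracker class itself is a hypothetical stand-in for `_process_turn_prediction`:

```python
# Sketch of the speaker-switch emission semantics described above.
from typing import List, Optional, Tuple


class TurnTracker:
    def __init__(self) -> None:
        self.current_speaker: Optional[str] = None
        self.events: List[Tuple[str, str]] = []

    def _emit(self, kind: str, speaker: str) -> None:
        self.events.append((kind, speaker))

    def on_speech(self, speaker: str) -> None:
        if self.current_speaker == speaker:
            return  # same turn continues, nothing to emit
        if self.current_speaker is not None:
            # Speaker switch: end the previous turn before starting the new one.
            self._emit("TURN_ENDED", self.current_speaker)
        self._emit("TURN_STARTED", speaker)
        self.current_speaker = speaker

    def on_silence(self) -> None:
        if self.current_speaker is not None:
            self._emit("TURN_ENDED", self.current_speaker)
            self.current_speaker = None


t = TurnTracker()
t.on_speech("alice")
t.on_speech("bob")   # alice's TURN_ENDED precedes bob's TURN_STARTED
t.on_silence()
print(t.events)
```

Emitting TURN_ENDED deterministically on switches is what lets the agent flush the previous speaker's accumulated transcript at the right moment.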

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor User
  participant Mic as FalTurnDetection
  participant Agent
  participant LLM
  participant TTS

  User->>Mic: audio
  Mic->>Agent: TURN_STARTED(user_id)
  Note over Agent,TTS: If TTS playing and speaker != agent → interrupt TTS

  par Speech streaming
    Mic-->>Agent: PARTIAL_TRANSCRIPT / TRANSCRIPT chunks
    Agent->>Agent: Accumulate per-speaker transcripts
  end

  Mic->>Agent: TURN_ENDED(user_id)
  alt realtime_mode == true
    Agent->>Agent: Short-circuit (no LLM trigger)
  else turn_detection enabled
    Agent->>Agent: Fetch accumulated transcript for user
    alt transcript non-empty
      Agent->>LLM: simple_response(text, participant)
      alt streaming response
        LLM-->>Agent: stream events/tokens
        Agent-->>TTS: stream speak (optional)
        Note right of Agent: Completion emitted by streaming path
      else non-streaming response
        LLM-->>Agent: final response
        Agent-->>Agent: emit LLMResponseCompletedEvent
        Agent-->>TTS: speak response (optional)
      end
      Agent->>Agent: clear pending transcript for user
    else
      Agent->>Agent: no-op
    end
  end
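The streaming/non-streaming branch in the diagram is also where the duplicate-completion bug lived: both paths were emitting `LLMResponseCompletedEvent`, so TTS fired twice. A minimal sketch of the guard (with a fake LLM class standing in for the OpenAI plugin) looks like this:

```python
# Sketch of the dedupe guard: exactly one completion event per response,
# emitted by whichever path actually handled it (names hypothetical).
from dataclasses import dataclass
from typing import List


@dataclass
class LLMResponseCompletedEvent:
    text: str


class FakeLLM:
    def __init__(self) -> None:
        self.emitted: List[LLMResponseCompletedEvent] = []

    def _emit_completed(self, text: str) -> None:
        self.emitted.append(LLMResponseCompletedEvent(text))

    def respond(self, text: str, streaming: bool) -> None:
        if streaming:
            # Streaming path: completion is emitted once, at stream end.
            for _chunk in [text]:
                pass
            self._emit_completed(text)
        else:
            # Non-streaming path: emit here, and only here.
            self._emit_completed(text)


llm = FakeLLM()
llm.respond("hi", streaming=True)
llm.respond("hi", streaming=False)
print(len(llm.emitted))  # 2 total, one per response — no duplicates
```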

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • GetStream/agents#66 — Modifies Agent turn-detection integration and fal turn event handling; overlaps with changes to agents.py and fal_turn_detection.py.
  • GetStream/agents#68 — Renames/exports turn-detection base and relates to the new turn_detection parameter and turn-detector usage in examples.

Poem

I listen—glass mouth, waiting to split—
and catalog the small departures of air.
Each voice a cold room where I stitch an ending,
clamp the sudden syllable, press it into a lamp.
The machine learns the shape of goodbye and says it back.

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings)
Check name Status Explanation Resolution
Title Check ⚠️ Warning The title “fix: Agent Example and TURN detection” is too generic and splits focus between an example and turn detection without capturing the primary change of introducing turn-based LLM triggering, transcript accumulation, TTS interruption, and related bug fixes. Consider renaming the PR to clearly summarize the main feature and fixes, for example “Add turn-based LLM triggering with transcript accumulation and fix turn detection logic” to provide a concise, accurate overview of the scope of changes.
Docstring Coverage ⚠️ Warning Docstring coverage is 40.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (1 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

d3xvn added 2 commits October 7, 2025 12:18
- Add early return if in realtime mode to skip LLM triggering logic
- In realtime mode, the LLM handles STT, turn detection, and responses itself
- Removes redundant check in else branch
- Improves code clarity and efficiency
- Add early return for realtime mode after logging the event
- Skips unnecessary transcript fetching and participant metadata extraction
- Removes redundant realtime_mode check later in the flow
- Consistent with _on_transcript optimization
@d3xvn d3xvn marked this pull request as ready for review October 7, 2025 10:38
- Realtime LLMs handle their own turn detection and interruption
- Skip all turn event processing in realtime mode (not just LLM triggering)
- Removes duplicate realtime check in TurnEndedEvent branch
- Cleaner and more efficient
coderabbitai bot left a comment

Actionable comments posted: 1

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 589f6da and 8c01c31.

⛔ Files ignored due to path filters (1)
  • examples/01_simple_agent_example/uv.lock is excluded by !**/*.lock
📒 Files selected for processing (5)
  • agents-core/vision_agents/core/agents/agents.py (3 hunks)
  • agents-core/vision_agents/core/turn_detection/fal_turn_detection.py (1 hunks)
  • examples/01_simple_agent_example/pyproject.toml (1 hunks)
  • examples/01_simple_agent_example/simple_agent_example.py (2 hunks)
  • plugins/openai/vision_agents/plugins/openai/openai_llm.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (.cursor/rules/python.mdc)

**/*.py: Never adjust sys.path (e.g., sys.path.append/insert/assignment)
Docstrings must follow the Google style guide

Files:

  • plugins/openai/vision_agents/plugins/openai/openai_llm.py
  • agents-core/vision_agents/core/agents/agents.py
  • agents-core/vision_agents/core/turn_detection/fal_turn_detection.py
  • examples/01_simple_agent_example/simple_agent_example.py
🧬 Code graph analysis (3)
agents-core/vision_agents/core/agents/agents.py (2)
agents-core/vision_agents/core/turn_detection/events.py (1)
  • TurnEndedEvent (29-44)
plugins/openai/vision_agents/plugins/openai/openai_llm.py (1)
  • simple_response (67-91)
agents-core/vision_agents/core/turn_detection/fal_turn_detection.py (1)
agents-core/vision_agents/core/turn_detection/turn_detection.py (2)
  • _emit_turn_event (99-126)
  • TurnEvent (12-16)
examples/01_simple_agent_example/simple_agent_example.py (1)
agents-core/vision_agents/core/turn_detection/fal_turn_detection.py (1)
  • FalTurnDetection (31-377)
🪛 GitHub Actions: CI (unit)
agents-core/vision_agents/core/agents/agents.py

[error] 807-807: F541: f-string without any placeholders. Remove extraneous f prefix. Found 1 error; 1 fixable with the --fix option.

- Fixed lint error F541 on line 797
- Changed f-string to regular string since no interpolation needed
coderabbitai bot left a comment

Actionable comments posted: 1

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8c01c31 and 2eacdfb.

📒 Files selected for processing (1)
  • agents-core/vision_agents/core/agents/agents.py (3 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (.cursor/rules/python.mdc)

**/*.py: Never adjust sys.path (e.g., sys.path.append/insert/assignment)
Docstrings must follow the Google style guide

Files:

  • agents-core/vision_agents/core/agents/agents.py
🧬 Code graph analysis (1)
agents-core/vision_agents/core/agents/agents.py (3)
agents-core/vision_agents/core/turn_detection/events.py (2)
  • TurnStartedEvent (10-25)
  • TurnEndedEvent (29-44)
agents-core/vision_agents/core/edge/types.py (1)
  • duration (73-109)
plugins/openai/vision_agents/plugins/openai/openai_llm.py (1)
  • simple_response (67-91)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: unit / Ruff & mypy
  • GitHub Check: unit / Test "not integration"
  • GitHub Check: unit / Ruff & mypy
  • GitHub Check: unit / Test "not integration"

Comment on lines +699 to +718
# This is the signal that the user has finished speaking and expects a response
if event.speaker_id and event.speaker_id != self.agent_user.id:
# Get the accumulated transcript for this speaker
transcript = self._pending_user_transcripts.get(event.speaker_id, "")

if transcript and transcript.strip():
self.logger.info(f"🤖 Triggering LLM response after turn ended for {event.speaker_id}")

# Create participant object if we have metadata
participant = None
if hasattr(event, 'custom') and event.custom:
# Try to extract participant info from custom metadata
participant = event.custom.get('participant')

# Trigger LLM response with the complete transcript
if self.llm:
await self.simple_response(transcript, participant)

# Clear the pending transcript for this speaker
self._pending_user_transcripts[event.speaker_id] = ""

⚠️ Potential issue | 🟠 Major

Don't assume transcripts arrive before TurnEnded.

Turn detection emits TurnEndedEvent as soon as silence is detected, but STT final transcripts can land a beat later (we call turn_detection.process_audio before stt.process_audio). When that happens, Line 701 sees an empty transcript, we skip simple_response, and nothing ever re-triggers once the transcript finally shows up in _on_transcript. Result: user turns are silently dropped whenever STT lags behind turn detection—major functional regression.

Please track “turn ended but transcript pending” state. One way:

@@
-        self._pending_user_transcripts: Dict[str, str] = {}
+        self._pending_user_transcripts: Dict[str, str] = {}
+        self._pending_turn_completions: set[str] = set()
@@ def _on_turn_event(...):
-            if event.speaker_id and event.speaker_id != self.agent_user.id:
+            if event.speaker_id and event.speaker_id != self.agent_user.id:
                 transcript = self._pending_user_transcripts.get(event.speaker_id, "")
 
-                if transcript and transcript.strip():
+                if transcript and transcript.strip():
                     ...
                     self._pending_user_transcripts[event.speaker_id] = ""
+                    self._pending_turn_completions.discard(event.speaker_id)
+                else:
+                    self._pending_turn_completions.add(event.speaker_id)
@@ def _on_transcript(...):
-            if user_id not in self._pending_user_transcripts:
+            if user_id not in self._pending_user_transcripts:
                 ...
             else:
                 ...
 
+            if user_id in getattr(self, "_pending_turn_completions", set()):
+                participant = getattr(event, "user_metadata", None)
+                await self.simple_response(self._pending_user_transcripts[user_id], participant)
+                self._pending_user_transcripts[user_id] = ""
+                self._pending_turn_completions.discard(user_id)

Any equivalent solution that ensures a late-arriving transcript still fires the LLM response works.

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In agents-core/vision_agents/core/agents/agents.py around lines 699–718, the
TurnEnded handling assumes the final STT transcript is already available and
skips triggering the LLM when the transcript arrives late; fix by recording that
a turn ended for this speaker when TurnEnded is received (e.g., add speaker_id
to a pending_turns set or mark a flag alongside the empty transcript) and then,
in the transcript arrival path (_on_transcript or wherever transcripts are
written to _pending_user_transcripts), check for that pending-turn-ended marker
and if present call simple_response(transcript, participant) and clear both the
pending marker and the stored transcript; ensure you still call simple_response
immediately when TurnEnded sees a non-empty transcript and avoid double-calling
by clearing the marker after handling.
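A self-contained demonstration of the suggested fix, following the `_pending_turn_completions` approach from the review diff (this is an illustrative model, not the real `Agent` class):

```python
# Sketch of the race fix: if TURN_ENDED arrives before the final transcript,
# remember the speaker in a pending set and respond when the transcript lands.
from typing import Dict, List, Set


class Agent:
    def __init__(self) -> None:
        self._pending_user_transcripts: Dict[str, str] = {}
        self._pending_turn_completions: Set[str] = set()
        self.responses: List[str] = []

    def _respond(self, transcript: str) -> None:
        self.responses.append(transcript)

    def on_turn_ended(self, speaker_id: str) -> None:
        transcript = self._pending_user_transcripts.get(speaker_id, "")
        if transcript.strip():
            self._respond(transcript)
            self._pending_user_transcripts[speaker_id] = ""
            self._pending_turn_completions.discard(speaker_id)
        else:
            # STT lagged behind turn detection: mark the turn as awaiting text.
            self._pending_turn_completions.add(speaker_id)

    def on_transcript(self, speaker_id: str, text: str) -> None:
        prev = self._pending_user_transcripts.get(speaker_id, "")
        self._pending_user_transcripts[speaker_id] = (prev + " " + text).strip()
        if speaker_id in self._pending_turn_completions:
            # Late transcript for a turn that already ended: respond now.
            self._respond(self._pending_user_transcripts[speaker_id])
            self._pending_user_transcripts[speaker_id] = ""
            self._pending_turn_completions.discard(speaker_id)


agent = Agent()
agent.on_turn_ended("user-1")           # transcript not here yet
agent.on_transcript("user-1", "hello")  # late arrival still triggers a response
print(agent.responses)  # ['hello']
```

Clearing the marker in both paths is what prevents the double-call the review prompt warns about.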

@maxkahan maxkahan merged commit 4a178e9 into main Oct 7, 2025
2 of 5 checks passed
@maxkahan maxkahan deleted the fix/agent-example branch October 7, 2025 10:50
Nash0x7E2 added a commit that referenced this pull request Oct 8, 2025
commit 4757845
Merge: 8d9a9e2 c834231
Author: Thierry Schellenbach <thierry@getstream.io>
Date:   Wed Oct 8 10:29:01 2025 +0200

    Merge branch 'main' of github.com:GetStream/agents

commit 8d9a9e2
Author: Thierry Schellenbach <thierry@getstream.io>
Date:   Wed Oct 8 10:28:54 2025 +0200

    move fal smart detection to plugin

commit c834231
Merge: b6deb4d facedf2
Author: maxkahan <max.kahan@getstream.io>
Date:   Wed Oct 8 10:17:22 2025 +0200

    Merge pull request #73 from GetStream/fix/shared_forwarder

    fix: video feed mismatch and VideoForwarder resource leaks

commit b6deb4d
Author: Neevash Ramdial (Nash) <mail@neevash.dev>
Date:   Wed Oct 8 09:38:51 2025 +0200

    Add CI secrets  (#72)

    * Add in secrets for daily integration

    * Rename to realtime instead of realtime 2

    * Add events.wait to xAI test

commit 73ddc8e
Author: Thierry Schellenbach <thierry@getstream.io>
Date:   Tue Oct 7 17:27:40 2025 +0200

    pyproject cleanup

commit facedf2
Author: Deven Joshi <deven9852@gmail.com>
Date:   Tue Oct 7 17:26:11 2025 +0200

    fix: critical video feed mismatch and VideoForwarder resource leaks

    CRITICAL FIXES:

    1. Video Feed Mismatch (LLM getting wrong video)
       - When YOLO/video processors are used, LLM was receiving empty processed track
       - Root cause: shared_forwarder was created from RAW track but LLM was given processed track
       - Fix: Create separate forwarders for raw and processed video tracks
       - Now LLM correctly receives YOLO-annotated frames when using pose detection

    2. VideoForwarder Resource Leaks
       - Consumer tasks were never removed from _tasks set (memory leak)
       - Fix: Add task.add_done_callback(self._task_done) to clean up tasks
       - Producer exceptions were silently swallowed
       - Fix: Log and re-raise exceptions for proper error handling

    3. Race Condition in VideoForwarder.stop()
       - Used list() snapshot for cancellation but original set for gather()
       - Fix: Use tasks_snapshot consistently throughout stop()

    4. Multiple start() Protection
       - No guard against calling start() multiple times
       - Fix: Add _started flag and early return with warning

    5. Missing VideoForwarder Cleanup in Agent
       - Forwarders were created but never stopped on agent.close()
       - Fix: Track all forwarders and stop them in close() method

    These fixes prevent resource leaks, ensure correct video routing, and improve
    error visibility for production debugging.
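The task-leak fix in item 2 of this commit (register a done callback so finished consumer tasks remove themselves from the tracking set) is a general asyncio pattern. A minimal stand-in, not the real VideoForwarder:

```python
# Sketch of the consumer-task cleanup: without the done callback,
# completed tasks accumulate in _tasks forever (a memory leak).
import asyncio


class Forwarder:
    def __init__(self) -> None:
        self._tasks: set = set()

    def _task_done(self, task: asyncio.Task) -> None:
        # Remove the finished task from the tracking set.
        self._tasks.discard(task)

    def add_consumer(self, coro) -> None:
        task = asyncio.create_task(coro)
        task.add_done_callback(self._task_done)
        self._tasks.add(task)


async def main() -> int:
    f = Forwarder()
    for _ in range(5):
        f.add_consumer(asyncio.sleep(0))
    await asyncio.sleep(0.01)  # let all consumers finish
    return len(f._tasks)


remaining = asyncio.run(main())
print(remaining)  # 0 — finished tasks cleaned themselves up
```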

commit fbc1759
Author: Thierry Schellenbach <thierry@getstream.io>
Date:   Tue Oct 7 17:19:45 2025 +0200

    wip on pyproject files

commit 3739605
Author: Thierry Schellenbach <thierry@getstream.io>
Date:   Tue Oct 7 15:55:19 2025 +0200

    pypi environment

commit 6144265
Merge: 231efc8 9b5db80
Author: Thierry Schellenbach <thierry@getstream.io>
Date:   Tue Oct 7 15:17:09 2025 +0200

    cleanup

commit 231efc8
Author: Thierry Schellenbach <thierry@getstream.io>
Date:   Tue Oct 7 15:12:31 2025 +0200

    remove duplicate publish tracks

commit 9b5db80
Merge: 2d08f1d 4f60ab2
Author: Deven Joshi <deven9852@gmail.com>
Date:   Tue Oct 7 14:40:31 2025 +0200

    Merge pull request #71 from GetStream/fix/agents-tracks

    fix: remove duplicate track publishing code

commit 2d08f1d
Author: Thierry Schellenbach <thierry@getstream.io>
Date:   Tue Oct 7 14:30:01 2025 +0200

    fix openai realtime test

commit 4f60ab2
Author: Deven Joshi <deven9852@gmail.com>
Date:   Tue Oct 7 14:25:01 2025 +0200

    fix: remove duplicate track publishing code and initialize error counters

    - Remove duplicate track publishing and audio/video listening code in join() method
    - Initialize timeout_errors and consecutive_errors before video processing loop
    - Increment timeout_errors in TimeoutError exception handler
    - Fixes potential crash when error counters are referenced but not initialized

commit ca562de
Merge: 4b8f686 b121bc6
Author: Thierry Schellenbach <thierry@getstream.io>
Date:   Tue Oct 7 14:24:02 2025 +0200

    Merge branch 'main' of github.com:GetStream/agents

commit 4b8f686
Author: Thierry Schellenbach <thierry@getstream.io>
Date:   Tue Oct 7 14:23:54 2025 +0200

    nicer tests for openai realtime

commit b121bc6
Merge: 4a178e9 1bd131b
Author: Yarik <43354956+yarikdevcom@users.noreply.github.com>
Date:   Tue Oct 7 14:22:56 2025 +0200

    Merge pull request #69 from GetStream/yarikrudenok/ai-176-migrate-branding-to-vision-agents

    Refactor project structure to replace 'stream_agents' with 'vision_ag…

commit 1bd131b
Author: Yarik <yarik.rudenok@getstream.io>
Date:   Tue Oct 7 14:16:49 2025 +0200

    feat: [AI-176] Rename to vision

commit 4a178e9
Merge: a940bd3 2eacdfb
Author: maxkahan <max.kahan@getstream.io>
Date:   Tue Oct 7 11:50:28 2025 +0100

    Merge pull request #70 from GetStream/fix/agent-example

    fix: Agent Example and TURN detection

commit 2eacdfb
Author: Deven Joshi <deven9852@gmail.com>
Date:   Tue Oct 7 12:42:58 2025 +0200

    Fix: Remove f-string prefix from log with no placeholders

    - Fixed lint error F541 on line 797
    - Changed f-string to regular string since no interpolation needed

commit 66deea5
Author: Deven Joshi <deven9852@gmail.com>
Date:   Tue Oct 7 12:41:33 2025 +0200

    Move realtime mode check to top of _on_turn_event

    - Realtime LLMs handle their own turn detection and interruption
    - Skip all turn event processing in realtime mode (not just LLM triggering)
    - Removes duplicate realtime check in TurnEndedEvent branch
    - Cleaner and more efficient

commit 8c01c31
Author: Deven Joshi <deven9852@gmail.com>
Date:   Tue Oct 7 12:20:42 2025 +0200

    Optimize: Check realtime mode early in _on_turn_event TurnEndedEvent

    - Add early return for realtime mode after logging the event
    - Skips unnecessary transcript fetching and participant metadata extraction
    - Removes redundant realtime_mode check later in the flow
    - Consistent with _on_transcript optimization

commit f4fa0a5
Author: Deven Joshi <deven9852@gmail.com>
Date:   Tue Oct 7 12:18:09 2025 +0200

    Optimize: Check realtime mode early in _on_transcript

    - Add early return if in realtime mode to skip LLM triggering logic
    - In realtime mode, the LLM handles STT, turn detection, and responses itself
    - Removes redundant check in else branch
    - Improves code clarity and efficiency

commit 12b1638
Author: Deven Joshi <deven9852@gmail.com>
Date:   Tue Oct 7 10:48:07 2025 +0200

    Fix agent LLM triggering and turn detection

    - Implement automatic LLM triggering in _on_transcript() for both modes:
      * Without turn detection: triggers immediately on transcript completion
      * With turn detection: accumulates transcripts and waits for TurnEndedEvent
    - Add _pending_user_transcripts dict to track multi-chunk transcripts per user
    - Implement turn detection LLM response in _on_turn_event()
    - Add TTS interruption when user starts speaking (barge-in)
    - Fix FAL turn detection event emission logic
    - Fix double TTS triggering in OpenAI LLM plugin (was emitting LLMResponseCompletedEvent twice)
    - Add FAL turn detection to simple agent example
    - Update example dependencies to use vision-agents naming

    Known limitation: LLM response generation is not yet cancelled when user interrupts.
    Only TTS audio playback stops, but LLM continues generating in background.