
@tschellenbach tschellenbach commented Nov 11, 2025

  • Scribe STT

Summary by CodeRabbit

Release Notes

  • New Features

    • ElevenLabs Speech-to-Text (STT) integration with Scribe v2 real-time transcription
    • AudioQueue component for buffering audio with configurable duration and sample reading
  • Documentation

    • New ElevenLabs TTS/STT integration example with setup and customization guide
    • Assistant persona guidelines for voice AI interaction styles
  • Other

    • ElevenLabs dependency updated to version >=2.22.1
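
The AudioQueue behaviour described above (buffering with a configurable duration, reading by sample count or by duration) can be sketched roughly as follows. This is a minimal illustration only: `SimpleAudioQueue` and all of its method names are hypothetical and are not the API of the AudioQueue component added in this PR.

```python
from collections import deque


class SimpleAudioQueue:
    """Hypothetical bounded buffer of 16-bit mono PCM samples.

    Not the real AudioQueue; names and behaviour are assumptions.
    """

    def __init__(self, sample_rate=16000, max_duration_ms=1000):
        self.sample_rate = sample_rate
        # Once the buffer holds max_duration_ms of audio, the oldest
        # samples are dropped as new ones arrive.
        self._samples = deque(maxlen=sample_rate * max_duration_ms // 1000)

    def put(self, samples):
        """Append an iterable of integer samples to the buffer."""
        self._samples.extend(samples)

    def get_samples(self, count):
        """Remove and return up to `count` samples from the front."""
        n = min(count, len(self._samples))
        return [self._samples.popleft() for _ in range(n)]

    def get_duration(self, duration_ms):
        """Remove and return up to `duration_ms` worth of audio."""
        return self.get_samples(self.sample_rate * duration_ms // 1000)

    def duration_ms(self):
        """Return how many milliseconds of audio are currently buffered."""
        return len(self._samples) * 1000 / self.sample_rate
```

Reading by duration is just reading by sample count after converting milliseconds to samples, which is why `get_duration` delegates to `get_samples`.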


coderabbitai bot commented Nov 11, 2025

Caution

Review failed

The pull request is closed.

Walkthrough

This pull request introduces ElevenLabs Scribe v2 real-time STT (Speech-to-Text) integration for Vision Agents. It adds a new STT plugin implementation with WebSocket-based audio streaming, a comprehensive test suite, example code demonstrating integration with GetStream and Gemini LLM, updated project metadata, and a participant fixture for testing.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Test fixtures and configuration: `conftest.py` | Added participant fixture returning `Participant({}, user_id="test-user")` with import from `vision_agents.core.edge.types` |
| ElevenLabs STT implementation: `plugins/elevenlabs/vision_agents/plugins/elevenlabs/stt.py` | New STT class extending base STT interface with WebSocket-based transcription, audio buffering (16 kHz mono), real-time event emission (partial/final transcripts), error handling, and exponential backoff reconnection logic |
| ElevenLabs plugin exports: `plugins/elevenlabs/vision_agents/plugins/elevenlabs/__init__.py` | Added STT import and updated `__all__` to export both TTS and STT |
| ElevenLabs plugin configuration: `plugins/elevenlabs/pyproject.toml` | Updated description to include STT, expanded keywords, and bumped elevenlabs dependency from >=2.5.0 to >=2.22.1 |
| STT integration tests: `plugins/elevenlabs/tests/test_elevenlabs_stt.py` | New test suite validating STT transcription across 16/48 kHz audio, resampling, participant metadata, chunked streaming, partial/final transcripts, and multiple audio segments |
| Example package: `plugins/elevenlabs/example/pyproject.toml` | New project configuration with python-dotenv and vision-agents plugin dependencies; local editable sources for in-repo components |
| Example implementation: `plugins/elevenlabs/example/elevenlabs_example.py` | New async functions: `create_agent()` configuring Agent with GetStream edge, ElevenLabs TTS/STT, Gemini LLM, and turn detection; `join_call()` handling call lifecycle and agent responses |
| Example documentation: `plugins/elevenlabs/example/README.md`, `plugins/elevenlabs/example/assistant.md`, `plugins/elevenlabs/example/__init__.py` | Added comprehensive README with setup/configuration/troubleshooting, assistant persona guidelines, and package initializer |
| Documentation updates: `docs/ai/instructions/ai-utils.md` | Added AudioQueue documentation describing audio buffering and duration/sample-based reading |
| Minor formatting: `tests/test_audio_queue.py` | Added blank lines for formatting consistency |
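
The test suite summary mentions chunked streaming at 16 and 48 kHz. As an illustration of what splitting audio into fixed-duration chunks involves, here is a small sketch. `chunk_pcm` is a hypothetical helper and is not taken from the PR's tests; it assumes mono PCM with a fixed sample width.

```python
def chunk_pcm(pcm, sample_rate, chunk_ms=20, sample_width=2):
    """Yield consecutive chunks of mono PCM, each covering chunk_ms of audio.

    Hypothetical helper for illustration. The final chunk may be shorter
    if the input length is not a multiple of the chunk size.
    """
    # Bytes per chunk = samples per chunk * bytes per sample.
    chunk_bytes = sample_rate * chunk_ms // 1000 * sample_width
    if chunk_bytes <= 0:
        raise ValueError("chunk_ms is too small for this sample rate")
    for start in range(0, len(pcm), chunk_bytes):
        yield pcm[start:start + chunk_bytes]


# One second of 16-bit silence at 16 kHz, split into 20 ms chunks.
one_second = bytes(16000 * 2)
chunks = list(chunk_pcm(one_second, sample_rate=16000))
```

A streaming test would typically feed such chunks one at a time, with a short delay between them, to approximate real-time input.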

Sequence Diagram

sequenceDiagram
    actor User
    participant Vision_Agent
    participant STT as ElevenLabs STT
    participant WebSocket as ElevenLabs WebSocket
    participant AudioQueue as Audio Queue
    participant Transcript_Emitter as Event Emitter
    
    User->>Vision_Agent: send PCM audio
    Vision_Agent->>STT: process_audio(pcm_data, participant)
    STT->>STT: resample to 16kHz mono if needed
    STT->>AudioQueue: enqueue audio
    
    par Continuous Processing
        STT->>STT: _send_audio_loop()
        STT->>AudioQueue: dequeue batch
        STT->>STT: base64 encode
        STT->>WebSocket: send audio bytes
    and WebSocket Listening
        WebSocket-->>STT: on_partial_transcript event
        STT->>Transcript_Emitter: emit partial transcript
        Transcript_Emitter-->>Vision_Agent: transcript event
        
        WebSocket-->>STT: on_committed_transcript event
        STT->>Transcript_Emitter: emit final transcript
        Transcript_Emitter-->>Vision_Agent: transcript event
    end
    
    alt Connection Error
        WebSocket-->>STT: on_error event
        STT->>STT: _attempt_reconnect (exponential backoff)
        STT->>WebSocket: re-establish connection
    end
    
    STT-->>Vision_Agent: transcripts + metadata (confidence, language)
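
The "resample to 16kHz mono" and "base64 encode" steps in the diagram can be sketched with the standard library alone. This is a deliberately naive illustration and not the plugin's code: `to_16k_mono` and `encode_chunk` are hypothetical names, the downsampling simply averages neighbouring samples (no proper anti-aliasing filter), only integer multiples of 16 kHz are supported, and the actual message format sent to ElevenLabs is not shown in this PR summary.

```python
import array
import base64


def to_16k_mono(pcm, sample_rate, channels):
    """Naively convert interleaved 16-bit PCM to 16 kHz mono by averaging.

    Assumes the PCM byte order matches the machine's native order
    (little-endian on most platforms).
    """
    if sample_rate % 16000 != 0:
        raise ValueError("this sketch only handles integer multiples of 16 kHz")
    samples = array.array("h")
    samples.frombytes(pcm)
    # Each output sample averages `group` interleaved input samples:
    # all channels of (sample_rate // 16000) consecutive frames.
    group = channels * (sample_rate // 16000)
    out = array.array(
        "h",
        (
            sum(samples[i:i + group]) // group
            for i in range(0, len(samples) - group + 1, group)
        ),
    )
    return out.tobytes()


def encode_chunk(pcm_16k_mono):
    """Base64-encode raw PCM bytes as an ASCII string."""
    return base64.b64encode(pcm_16k_mono).decode("ascii")
```

A production implementation would normally use a dedicated resampler with low-pass filtering; the averaging here only shows where the conversion sits in the pipeline.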

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Key areas requiring careful attention:

  • WebSocket connection lifecycle (the parts of stt.py that handle connection setup, reconnection with exponential backoff, and error recovery): verify proper state management and resource cleanup
  • Audio streaming and buffering logic (_send_audio_loop() method): validate audio batching, encoding, and timing against real-time constraints
  • Event handling and participant context (_on_partial_transcript(), _on_committed_transcript()): ensure transcript events are correctly associated with participant metadata, and check error handling when the participant is missing
  • Resampling and audio format conversion: verify that 16 kHz mono conversion is applied correctly across test cases (16 kHz native, 48 kHz resampled, chunked streams)
  • Test coverage and integration: confirm all eight test cases properly simulate real-time streaming with appropriate delays and assertion patterns
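
For reviewers checking the reconnection path, the general shape of exponential backoff is sketched below. This is an assumption-laden illustration, not the `_attempt_reconnect` implementation from stt.py: the function name, its parameters, the default delays, and the choice of `ConnectionError` as the retryable exception are all hypothetical.

```python
import asyncio


async def reconnect_with_backoff(connect, max_attempts=5, base_delay=1.0,
                                 max_delay=30.0, sleep=asyncio.sleep):
    """Await `connect()` until it succeeds, doubling the wait after each failure.

    `connect` is any zero-argument coroutine function (hypothetical here).
    The last failure is re-raised once max_attempts is exhausted.
    """
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return await connect()
        except ConnectionError:
            if attempt == max_attempts:
                raise
            await sleep(delay)
            # Double the delay for the next attempt, capped at max_delay.
            delay = min(delay * 2, max_delay)
```

Passing `sleep` as a parameter keeps the retry logic testable without real waiting. Points worth checking in a real implementation include whether the delay resets after a successful connection and whether pending tasks are cancelled on shutdown.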

Possibly related PRs

  • [AI-201] Fish speech to text #121: Its parallel changes to conftest.py (STT test fixtures) and related plugin exports touch the same test infrastructure as this PR.
  • WIP - Vogent + New Smart TURN + Audio utils usage #128: Its migration of event fields to Participant and its STT/audio utilities relate directly to this PR's use of Participant in STT event emission and test setup.
  • Cleanup stt #122: The ElevenLabs STT plugin implements the STT API and follows the event cleanup patterns established by that PR's interface changes.

Suggested labels

plugin-elevenlabs, tests, examples

Suggested reviewers

  • Nash0x7E2

Poem

A socket speaks in fragments, partial truths—
Base64 whispers bleeding through the wire,
The bot resamples grief to sixteen kilohertz,
Reconnects when silence breaks the pact.
What transcript haunts the threshold
Between intent and utterance, between the speak and the heard?

📜 Recent review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between ddc1433 and dd51cc5.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (12)
  • conftest.py (2 hunks)
  • docs/ai/instructions/ai-utils.md (1 hunks)
  • plugins/elevenlabs/example/README.md (1 hunks)
  • plugins/elevenlabs/example/__init__.py (1 hunks)
  • plugins/elevenlabs/example/assistant.md (1 hunks)
  • plugins/elevenlabs/example/elevenlabs_example.py (1 hunks)
  • plugins/elevenlabs/example/pyproject.toml (1 hunks)
  • plugins/elevenlabs/pyproject.toml (1 hunks)
  • plugins/elevenlabs/tests/test_elevenlabs_stt.py (1 hunks)
  • plugins/elevenlabs/vision_agents/plugins/elevenlabs/__init__.py (1 hunks)
  • plugins/elevenlabs/vision_agents/plugins/elevenlabs/stt.py (1 hunks)
  • tests/test_audio_queue.py (1 hunks)


@tschellenbach tschellenbach marked this pull request as ready for review November 11, 2025 22:29
@tschellenbach tschellenbach merged commit 59758c5 into main Nov 11, 2025
3 of 4 checks passed
@tschellenbach tschellenbach deleted the scribe2 branch November 11, 2025 22:29