[AI-271] ElevenLabs Scribe2 #170

tschellenbach · 2025-11-11T22:19:47Z

Scribe STT

Summary by CodeRabbit

Release Notes

New Features
- ElevenLabs Speech-to-Text (STT) integration with Scribe v2 real-time transcription
- AudioQueue component for buffering audio, with reads by configurable duration or by sample count
Documentation
- New ElevenLabs TTS/STT integration example with a setup and customization guide
- Assistant persona guidelines for voice AI interaction styles
Other
- ElevenLabs dependency updated to version >=2.22.1
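The AudioQueue entry above describes buffering audio and reading it back either by duration or by sample count. The stdlib sketch below illustrates that idea under assumptions: the class `SimpleAudioQueue`, its method names, the drop-oldest overflow policy, and the mono int16 sample format are all hypothetical, not the actual `vision_agents` `AudioQueue` API.

```python
from collections import deque


class SimpleAudioQueue:
    """Hypothetical FIFO buffer of mono int16 samples with a duration cap.

    Illustrative only; this is not the vision_agents AudioQueue API.
    """

    def __init__(self, sample_rate: int, max_duration_ms: int) -> None:
        self.sample_rate = sample_rate
        # Maximum number of samples kept; the oldest samples are dropped beyond this.
        self.max_samples = sample_rate * max_duration_ms // 1000
        self._samples = deque()

    def __len__(self) -> int:
        return len(self._samples)

    def put(self, samples: list[int]) -> None:
        self._samples.extend(samples)
        # Drop the oldest audio if the buffer exceeds its configured duration.
        while len(self._samples) > self.max_samples:
            self._samples.popleft()

    def get_samples(self, n: int) -> list[int]:
        # Return up to n samples from the front of the queue.
        n = min(n, len(self._samples))
        return [self._samples.popleft() for _ in range(n)]

    def get_duration(self, duration_ms: int) -> list[int]:
        # Convert a duration into a sample count at this queue's rate.
        return self.get_samples(self.sample_rate * duration_ms // 1000)
```

For example, at 16 kHz a 20 ms read returns 320 samples (16000 × 0.020). Converting durations to sample counts in one place keeps callers from hard-coding sample rates.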

coderabbitai · 2025-11-11T22:19:53Z

Caution

Review failed

The pull request is closed.

Walkthrough

This pull request introduces ElevenLabs Scribe v2 real-time STT (Speech-to-Text) integration for Vision Agents. It adds a new STT plugin implementation with WebSocket-based audio streaming, a comprehensive test suite, example code demonstrating integration with GetStream and Gemini LLM, updated project metadata, and a participant fixture for testing.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Test fixtures and configuration: `conftest.py` | Added a `participant` fixture returning `Participant({}, user_id="test-user")`, imported from `vision_agents.core.edge.types` |
| ElevenLabs STT implementation: `plugins/elevenlabs/vision_agents/plugins/elevenlabs/stt.py` | New STT class extending the base STT interface with WebSocket-based transcription, audio buffering (16 kHz mono), real-time event emission (partial/final transcripts), error handling, and exponential backoff reconnection logic |
| ElevenLabs plugin exports: `plugins/elevenlabs/vision_agents/plugins/elevenlabs/__init__.py` | Added the STT import and updated `__all__` to export both `TTS` and `STT` |
| ElevenLabs plugin configuration: `plugins/elevenlabs/pyproject.toml` | Updated the description to include STT, expanded keywords, and bumped the elevenlabs dependency from >=2.5.0 to >=2.22.1 |
| STT integration tests: `plugins/elevenlabs/tests/test_elevenlabs_stt.py` | New test suite validating STT transcription across 16/48 kHz audio, resampling, participant metadata, chunked streaming, partial/final transcripts, and multiple audio segments |
| Example package: `plugins/elevenlabs/example/pyproject.toml` | New project configuration with python-dotenv and vision-agents plugin dependencies; local editable sources for in-repo components |
| Example implementation: `plugins/elevenlabs/example/elevenlabs_example.py` | New async functions: `create_agent()` configuring an Agent with GetStream edge, ElevenLabs TTS/STT, Gemini LLM, and turn detection; `join_call()` handling the call lifecycle and agent responses |
| Example documentation: `plugins/elevenlabs/example/README.md`, `plugins/elevenlabs/example/assistant.md`, `plugins/elevenlabs/example/__init__.py` | Added a comprehensive README with setup/configuration/troubleshooting, assistant persona guidelines, and a package initializer |
| Documentation updates: `docs/ai/instructions/ai-utils.md` | Added AudioQueue documentation describing audio buffering and duration/sample-based reading |
| Minor formatting: `tests/test_audio_queue.py` | Added blank lines for formatting consistency |
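The STT row mentions buffering 16 kHz mono audio, and the tests cover 48 kHz input that has to be resampled. The sketch below shows one way to do that conversion, assuming interleaved int16 PCM in native byte order (little-endian on common platforms). It uses plain channel averaging plus linear interpolation rather than whatever resampler the plugin actually uses, and the function name `to_16k_mono` is hypothetical.

```python
import array


def to_16k_mono(pcm: bytes, sample_rate: int, channels: int) -> bytes:
    """Hypothetical helper: interleaved int16 PCM -> 16 kHz mono int16 PCM.

    Uses channel averaging and linear interpolation; no anti-alias filter.
    """
    samples = array.array("h")
    samples.frombytes(pcm)
    # Downmix: average each frame's channels into one mono sample.
    mono = [
        sum(samples[i:i + channels]) // channels
        for i in range(0, len(samples) - channels + 1, channels)
    ]
    target_rate = 16000
    if sample_rate == target_rate or not mono:
        return array.array("h", mono).tobytes()
    out_len = len(mono) * target_rate // sample_rate
    step = sample_rate / target_rate
    out = array.array("h")
    for j in range(out_len):
        pos = j * step
        i = int(pos)
        frac = pos - i
        nxt = mono[min(i + 1, len(mono) - 1)]
        # Linear interpolation between neighbouring input samples.
        out.append(int(round(mono[i] * (1 - frac) + nxt * frac)))
    return out.tobytes()
```

Averaging and interpolating keeps the sketch dependency-free. At 48 kHz input it simply keeps every third mono sample, which can alias high frequencies, so production code would typically use a filtered resampler.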

Sequence Diagram

```mermaid
sequenceDiagram
    actor User
    participant Vision_Agent
    participant STT as ElevenLabs STT
    participant WebSocket as ElevenLabs WebSocket
    participant AudioQueue as Audio Queue
    participant Transcript_Emitter as Event Emitter

    User->>Vision_Agent: send PCM audio
    Vision_Agent->>STT: process_audio(pcm_data, participant)
    STT->>STT: resample to 16kHz mono if needed
    STT->>AudioQueue: enqueue audio

    par Continuous Processing
        STT->>STT: _send_audio_loop()
        STT->>AudioQueue: dequeue batch
        STT->>STT: base64 encode
        STT->>WebSocket: send audio bytes
    and WebSocket Listening
        WebSocket-->>STT: on_partial_transcript event
        STT->>Transcript_Emitter: emit partial transcript
        Transcript_Emitter-->>Vision_Agent: transcript event

        WebSocket-->>STT: on_committed_transcript event
        STT->>Transcript_Emitter: emit final transcript
        Transcript_Emitter-->>Vision_Agent: transcript event
    end

    alt Connection Error
        WebSocket-->>STT: on_error event
        STT->>STT: _attempt_reconnect (exponential backoff)
        STT->>WebSocket: re-establish connection
    end

    STT-->>Vision_Agent: transcripts + metadata (confidence, language)
```
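The `par` block in the diagram runs a send loop and a listener concurrently. The asyncio sketch below mirrors that shape under stated assumptions: `FakeSocket`, the JSON message fields (`type`, `audio`, `text`), the `None` stop sentinel, and the loop function names are hypothetical stand-ins, not the ElevenLabs SDK, its wire format, or the plugin's actual code.

```python
import asyncio
import base64
import json


class FakeSocket:
    """Hypothetical stand-in for the ElevenLabs WebSocket connection."""

    def __init__(self) -> None:
        self.sent: list = []
        self.incoming = asyncio.Queue()

    async def send(self, message: str) -> None:
        self.sent.append(json.loads(message))

    async def recv(self) -> dict:
        return await self.incoming.get()


async def send_audio_loop(audio: asyncio.Queue, ws: FakeSocket) -> None:
    # Dequeue PCM chunks, base64-encode them, and send them as JSON text.
    while True:
        chunk = await audio.get()
        if chunk is None:  # sentinel: stop sending
            return
        payload = {"type": "audio", "audio": base64.b64encode(chunk).decode("ascii")}
        await ws.send(json.dumps(payload))


async def listen_loop(ws: FakeSocket, participant: str, events: list) -> None:
    # Turn transcript messages into events tagged with the participant.
    # Anything other than a committed transcript is treated as partial here.
    while True:
        msg = await ws.recv()
        if msg["type"] == "close":
            return
        final = msg["type"] == "committed_transcript"
        events.append({"participant": participant, "text": msg["text"], "final": final})


async def demo() -> tuple:
    ws, audio, events = FakeSocket(), asyncio.Queue(), []
    for chunk in (b"\x00\x01", b"\x02\x03", None):
        audio.put_nowait(chunk)
    for msg in (
        {"type": "partial_transcript", "text": "hel"},
        {"type": "committed_transcript", "text": "hello"},
        {"type": "close"},
    ):
        ws.incoming.put_nowait(msg)
    # Run both loops concurrently, like the `par` block in the diagram.
    await asyncio.gather(send_audio_loop(audio, ws), listen_loop(ws, "test-user", events))
    return ws, events
```

Running `asyncio.run(demo())` sends two base64-encoded chunks and yields one partial and one final event, both tagged with `"test-user"`, echoing the `participant` fixture described in the changes table.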

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Key areas requiring careful attention:

WebSocket connection lifecycle (the parts of `stt.py` that handle connection setup, reconnection with exponential backoff, and error recovery): verify proper state management and resource cleanup
Audio streaming and buffering logic (`_send_audio_loop()`): validate audio batching, encoding, and timing against real-time constraints
Event handling and participant context (`_on_partial_transcript()`, `_on_committed_transcript()`): ensure transcript events are correctly associated with participant metadata, and check the error handling when the participant is missing
Resampling and audio format conversion: verify that 16 kHz mono conversion is applied correctly across the test cases (native 16 kHz, resampled 48 kHz, chunked streams)
Test coverage and integration: confirm that all eight test cases simulate real-time streaming with appropriate delays and assertion patterns
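The reconnection logic flagged above uses exponential backoff. A minimal sketch of that pattern follows, assuming a caller-supplied async `connect` callable; the names `reconnect_with_backoff` and `backoff_delays` and the parameters `base_delay`, `max_delay`, and `max_attempts` are hypothetical, not the plugin's actual `_attempt_reconnect` signature. It treats `OSError` as retryable; the real plugin's error types may differ.

```python
import asyncio
from typing import Awaitable, Callable, Optional, TypeVar

T = TypeVar("T")


def backoff_delays(base_delay: float, max_delay: float, max_attempts: int) -> list[float]:
    # Delay after failed attempt k is base_delay * 2**k, capped at max_delay.
    return [min(base_delay * 2 ** k, max_delay) for k in range(max_attempts)]


async def reconnect_with_backoff(
    connect: Callable[[], Awaitable[T]],
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    max_attempts: int = 5,
) -> T:
    delays = backoff_delays(base_delay, max_delay, max_attempts)
    last_error: Optional[Exception] = None
    for attempt in range(max_attempts):
        try:
            return await connect()
        except OSError as exc:  # this sketch treats network errors as retryable
            last_error = exc
            # Wait before the next attempt, but not after the final one.
            if attempt < max_attempts - 1:
                await asyncio.sleep(delays[attempt])
    raise ConnectionError("giving up after retries") from last_error
```

With the defaults, the waits between attempts are 0.5, 1, 2, and 4 seconds; capping at `max_delay` keeps the interval from growing without bound during a long outage.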

Possibly related PRs

[AI-201] Fish speech to text #121: parallel modifications to `conftest.py` (STT test fixtures) and related plugin exports align with the same test-infrastructure changes.
WIP - Vogent + New Smart TURN + Audio utils usage #128: its widespread migration of event fields to `Participant`, plus its STT/audio utilities, aligns directly with this PR's use of `Participant` in STT event emission and test setup.
Cleanup stt #122: the ElevenLabs STT implementation follows the STT API and event-cleanup patterns established by that PR's interface changes.

Suggested labels

plugin-elevenlabs, tests, examples

Suggested reviewers

Nash0x7E2

Poem

A socket speaks in fragments, partial truths—
Base64 whispers bleeding through the wire,
The bot resamples grief to sixteen kilohertz,
Reconnects when silence breaks the pact.
What transcript haunts the threshold
Between intent and utterance, between the speak and the heard?


📜 Recent review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

Linear integration is disabled by default for public repositories


📥 Commits

Reviewing files that changed from the base of the PR and between ddc1433 and dd51cc5.

⛔ Files ignored due to path filters (1)

uv.lock is excluded by !**/*.lock

📒 Files selected for processing (12)

conftest.py (2 hunks)
docs/ai/instructions/ai-utils.md (1 hunks)
plugins/elevenlabs/example/README.md (1 hunks)
plugins/elevenlabs/example/__init__.py (1 hunks)
plugins/elevenlabs/example/assistant.md (1 hunks)
plugins/elevenlabs/example/elevenlabs_example.py (1 hunks)
plugins/elevenlabs/example/pyproject.toml (1 hunks)
plugins/elevenlabs/pyproject.toml (1 hunks)
plugins/elevenlabs/tests/test_elevenlabs_stt.py (1 hunks)
plugins/elevenlabs/vision_agents/plugins/elevenlabs/__init__.py (1 hunks)
plugins/elevenlabs/vision_agents/plugins/elevenlabs/stt.py (1 hunks)
tests/test_audio_queue.py (1 hunks)


tschellenbach added 3 commits November 11, 2025 15:10:

- scribe2 (984a97b)
- scribe2 (3f8eaf1)
- cleanup (81a2749)

github-actions bot added the dependencies, plugins, config, and docs labels Nov 11, 2025

- add an elevenlabs example (dd51cc5)

github-actions bot added the project-info label Nov 11, 2025

tschellenbach marked this pull request as ready for review November 11, 2025 22:29

tschellenbach merged commit 59758c5 into main Nov 11, 2025
3 of 4 checks passed

tschellenbach deleted the scribe2 branch November 11, 2025 22:29

coderabbitai bot mentioned this pull request Nov 12, 2025 in: Fix audio processor signature to use PcmData instead of bytes #173 (Merged)
