audio: real-time streaming STT support #527

@bug-ops

Parent: #520
Depends on: #522, #523

Context

For real-time voice interaction, support streaming audio input with incremental transcription. This is a stretch goal for v1.

Options

  1. OpenAI Realtime API — WebSocket-based, supports audio streaming with function calling
  2. Local streaming — Whisper with chunked audio (VAD + sliding window)
  3. Deepgram/AssemblyAI — third-party streaming STT APIs
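For option 2, the "chunked audio + sliding window" idea could be sketched roughly as below. This is only an illustration: `sliding_windows` is a hypothetical helper, not part of any existing crate or of this project, and the window and hop sizes are left to the caller. Overlapping windows mean that a word cut at one chunk boundary appears whole in the next chunk.

```rust
/// Hypothetical helper (not an existing API): splits a sample buffer into
/// overlapping windows of `window` samples, advancing by `hop` samples.
pub fn sliding_windows(samples: &[f32], window: usize, hop: usize) -> Vec<&[f32]> {
    assert!(window > 0 && hop > 0 && hop <= window, "need 0 < hop <= window");
    let mut out = Vec::new();
    let mut start = 0;
    // Emit every full-size window.
    while start + window <= samples.len() {
        out.push(&samples[start..start + window]);
        start += hop;
    }
    // End of the region already covered by the last full window (0 if none).
    let covered_end = if out.is_empty() { 0 } else { start - hop + window };
    // Keep a shorter trailing window so the end of the audio is not dropped.
    if covered_end < samples.len() {
        out.push(&samples[start..]);
    }
    out
}
```

Each window would then be handed to the local model in turn. How overlapping transcripts are merged is a separate problem that this sketch does not address.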

Design

`SpeechToText` streaming extension

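use futures::Stream; // assumes the `futures` crate; `SpeechToText` and `SttError` are defined elsewhere

/// Streaming extension of `SpeechToText`. Return-position `impl Trait` in traits needs Rust 1.75+.
pub trait StreamingStt: SpeechToText {
    fn transcribe_stream(
        &self,
        audio_stream: impl Stream<Item = Vec<u8>> + Send, // chunks of raw audio bytes
    ) -> impl Stream<Item = Result<PartialTranscript, SttError>> + Send;
}

/// One incremental transcription result.
pub struct PartialTranscript {
    /// Transcribed text for this update.
    pub text: String,
    /// Whether this result is final rather than a partial that may still change.
    pub is_final: bool,
}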
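To illustrate how a consumer might use `is_final`, here is a minimal sketch of a display buffer. It assumes that each non-final update replaces the previous partial text for the current segment, and that a final update closes the segment. The issue does not specify this, and backends differ. `TranscriptView` is a hypothetical name, and the struct is redeclared so the example stands alone.

```rust
/// Mirrors the `PartialTranscript` struct from the design above.
#[derive(Debug, Clone, PartialEq)]
pub struct PartialTranscript {
    pub text: String,
    pub is_final: bool,
}

/// Hypothetical display buffer: finalized segments accumulate, while the
/// latest non-final segment is held separately and overwritten on each update.
#[derive(Debug, Default)]
pub struct TranscriptView {
    committed: String,
    pending: String,
}

impl TranscriptView {
    /// Applies one update from the transcript stream.
    pub fn apply(&mut self, update: PartialTranscript) {
        if update.is_final {
            // Separate finalized segments with a single space.
            if !self.committed.is_empty() && !update.text.is_empty() {
                self.committed.push(' ');
            }
            self.committed.push_str(&update.text);
            self.pending.clear();
        } else {
            // Assumption: a partial supersedes the previous partial.
            self.pending = update.text;
        }
    }

    /// Text to show right now: finalized text followed by the current partial.
    pub fn render(&self) -> String {
        match (self.committed.is_empty(), self.pending.is_empty()) {
            (_, true) => self.committed.clone(),
            (true, false) => self.pending.clone(),
            (false, false) => format!("{} {}", self.committed, self.pending),
        }
    }
}
```

A UI could call `apply` for every `Ok(PartialTranscript)` item and redraw from `render()`. If a backend sends deltas rather than replacements, the non-final branch would need to append instead.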

Integration points

  • TUI: microphone input via `cpal` crate + VAD (voice activity detection)
  • Channels: platform-specific streaming (if supported)
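As a rough idea of what the VAD step could look like before audio is forwarded to a backend, here is a simple energy-threshold detector. It is a deliberately simplistic stand-in: `EnergyVad`, its threshold, and its hangover count are all hypothetical, and a real implementation would more likely use a trained VAD. It operates on `f32` frames such as an audio input callback might deliver.

```rust
/// Hypothetical energy-based voice activity detector (illustration only).
pub struct EnergyVad {
    threshold: f32,       // RMS level at or above which a frame counts as speech
    hangover_frames: u32, // frames to keep forwarding after energy drops
    remaining: u32,       // hangover frames left in the current tail
}

impl EnergyVad {
    pub fn new(threshold: f32, hangover_frames: u32) -> Self {
        Self { threshold, hangover_frames, remaining: 0 }
    }

    /// Root-mean-square amplitude of one frame of samples.
    fn rms(frame: &[f32]) -> f32 {
        if frame.is_empty() {
            return 0.0;
        }
        (frame.iter().map(|s| s * s).sum::<f32>() / frame.len() as f32).sqrt()
    }

    /// Returns true if the frame should be forwarded to the STT backend.
    pub fn is_speech(&mut self, frame: &[f32]) -> bool {
        if Self::rms(frame) >= self.threshold {
            // Loud frame: forward it and re-arm the hangover counter.
            self.remaining = self.hangover_frames;
            true
        } else if self.remaining > 0 {
            // Quiet frame shortly after speech: keep forwarding so that
            // word endings and short pauses are not clipped.
            self.remaining -= 1;
            true
        } else {
            false
        }
    }
}
```

The hangover counter is the main design choice here: without it, brief dips in energy inside a word would split the utterance into fragments.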

Acceptance criteria

  • `StreamingStt` trait defined
  • At least one streaming backend implemented
  • TUI microphone input works (feature-gated)
  • Partial transcripts displayed in real-time

Labels: enhancement (New feature or request)
