Description
Context
Need a pluggable transcription abstraction so different STT backends (OpenAI Whisper API, local Whisper, future providers) can be used interchangeably.
Design
New module: `crates/zeph-llm/src/stt.rs` (or a separate crate `zeph-stt`)
```rust
use std::future::Future;

// Requires Rust 1.75+ (return-position impl Trait in traits).
pub trait SpeechToText: Send + Sync {
    fn transcribe(
        &self,
        audio: &[u8],
        mime_type: &str,
    ) -> impl Future<Output = Result<Transcript, SttError>> + Send;
}

pub struct Transcript {
    pub text: String,
    pub language: Option<String>,
    pub duration_secs: Option<f32>,
}

#[derive(Debug)]
pub enum SttError {
    UnsupportedFormat(String),
    FileTooLarge { size: usize, max: usize },
    TranscriptionFailed(String),
    NetworkError(String),
}
```
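Because the trait returns `impl Future`, it is not object-safe, so call sites stay generic over the backend. A minimal sketch of what that looks like (`handle_voice_message` is a hypothetical consumer, not part of this design):

```rust
// Hypothetical call site: generic over any SpeechToText backend.
async fn handle_voice_message<S: SpeechToText>(
    stt: &S,
    audio: &[u8],
) -> Result<String, SttError> {
    let transcript = stt.transcribe(audio, "audio/ogg").await?;
    Ok(transcript.text)
}
```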
OpenAI Whisper backend
```rust
pub struct WhisperApi {
    client: reqwest::Client,
    api_key: String,
    model: String, // "whisper-1"
}
```
- Uses `POST /v1/audio/transcriptions` with a multipart form (see the sketch after this list)
- 25 MB file size limit
- Supports: flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, webm
- Language detection by default, or an explicit `language` parameter
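A minimal sketch of the upload path, assuming reqwest's `multipart` and `json` features plus `serde_json`. The MIME-to-extension mapping and the hard-coded endpoint are simplifications; a real implementation would take a configurable base URL so tests can redirect it:

```rust
use reqwest::multipart::{Form, Part};

const MAX_UPLOAD_BYTES: usize = 25 * 1024 * 1024; // the API's 25 MB cap
const SUPPORTED_EXTENSIONS: &[&str] =
    &["flac", "mp3", "mp4", "mpeg", "mpga", "m4a", "ogg", "wav", "webm"];

impl SpeechToText for WhisperApi {
    async fn transcribe(&self, audio: &[u8], mime_type: &str) -> Result<Transcript, SttError> {
        // Check the size locally before paying for the upload.
        if audio.len() > MAX_UPLOAD_BYTES {
            return Err(SttError::FileTooLarge { size: audio.len(), max: MAX_UPLOAD_BYTES });
        }
        // Naive MIME-to-extension mapping; e.g. "audio/ogg" -> "ogg".
        let ext = mime_type
            .rsplit('/')
            .next()
            .filter(|e| SUPPORTED_EXTENSIONS.contains(e))
            .ok_or_else(|| SttError::UnsupportedFormat(mime_type.to_string()))?;

        let part = Part::bytes(audio.to_vec())
            .file_name(format!("audio.{ext}"))
            .mime_str(mime_type)
            .map_err(|e| SttError::UnsupportedFormat(e.to_string()))?;
        let form = Form::new()
            .part("file", part)
            .text("model", self.model.clone());

        let resp = self
            .client
            .post("https://api.openai.com/v1/audio/transcriptions")
            .bearer_auth(&self.api_key)
            .multipart(form)
            .send()
            .await
            .map_err(|e| SttError::NetworkError(e.to_string()))?;

        if !resp.status().is_success() {
            return Err(SttError::TranscriptionFailed(resp.status().to_string()));
        }
        // The default response format is JSON with a "text" field.
        let body: serde_json::Value = resp
            .json()
            .await
            .map_err(|e| SttError::TranscriptionFailed(e.to_string()))?;
        Ok(Transcript {
            text: body["text"].as_str().unwrap_or_default().to_string(),
            language: None,
            duration_secs: None,
        })
    }
}
```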
Config
```toml
[audio]
enabled = true
backend = "whisper-api" # or "whisper-local", "none"
language = "auto"       # or "en", "ru", etc.

[audio.whisper_api]
model = "whisper-1"
```
Acceptance criteria
- `SpeechToText` trait defined
- `WhisperApi` implementation with multipart upload
- Supported format validation
- File size check before upload
- Config section for audio settings
- Unit tests with mock HTTP responses
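For the last item, a hedged sketch of a test against a local mock server, assuming a `wiremock` dev-dependency (not confirmed in the repo) and a hypothetical `WhisperApi::with_base_url` constructor that redirects the client away from the real API:

```rust
use wiremock::matchers::{method, path};
use wiremock::{Mock, MockServer, ResponseTemplate};

#[tokio::test]
async fn transcribes_audio_via_mock_server() {
    let server = MockServer::start().await;
    Mock::given(method("POST"))
        .and(path("/v1/audio/transcriptions"))
        .respond_with(
            ResponseTemplate::new(200).set_body_json(serde_json::json!({ "text": "hello" })),
        )
        .mount(&server)
        .await;

    // `with_base_url` is a hypothetical constructor added for testability.
    let stt = WhisperApi::with_base_url(server.uri(), "test-key", "whisper-1");
    let transcript = stt.transcribe(b"fake-audio-bytes", "audio/ogg").await.unwrap();
    assert_eq!(transcript.text, "hello");
}
```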