Closed
Labels: enhancement (New feature or request)
Description
Context
For offline/air-gapped environments and zero API cost, provide local speech-to-text by running Whisper through candle. Feature-gate it behind `candle` (or a new `whisper-local` flag).
Design
Implementation
Leverage the candle-transformers Whisper example as a reference:
- Download model weights from Hugging Face (whisper-tiny, whisper-base, whisper-small)
- Audio decoding: use `symphonia` crate for format handling (ogg, mp3, wav, flac)
- Mel spectrogram computation + whisper inference via candle
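Whisper expects 16 kHz mono f32 PCM in [-1.0, 1.0], so whatever symphonia decodes has to be converted before the mel stage. A minimal sketch of that conversion (downmix and sample-format conversion only; resampling to 16 kHz is omitted, and the function name is illustrative, not part of any crate):

```rust
/// Convert interleaved i16 PCM (as typically decoded by symphonia) into
/// the mono f32 samples in [-1.0, 1.0] that Whisper's mel stage expects.
/// Assumes the input is already at 16 kHz; resampling is out of scope here.
fn to_mono_f32(interleaved: &[i16], channels: usize) -> Vec<f32> {
    interleaved
        .chunks(channels)
        .map(|frame| {
            // Average the channels of one frame, scaled to [-1.0, 1.0].
            let sum: f32 = frame.iter().map(|&s| s as f32 / 32768.0).sum();
            sum / channels as f32
        })
        .collect()
}
```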
Model options
| Model | Params | VRAM | Quality |
|---|---|---|---|
| whisper-tiny | 39 M | ~1 GB | Good for short commands |
| whisper-base | 74 M | ~1 GB | Better accuracy |
| whisper-small | 244 M | ~2 GB | Best quality, still fast |
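Resolving the configured model name to a Hugging Face repo id for the hf-hub download could look like the following sketch (the helper and its error type are illustrative; the repo ids follow the real `openai/whisper-*` naming):

```rust
/// Map the `model` config value to the Hugging Face repo id that hf-hub
/// would download from. Hypothetical helper, not part of any crate.
fn model_repo_id(model: &str) -> Result<String, String> {
    match model {
        "whisper-tiny" | "tiny" => Ok("openai/whisper-tiny".to_string()),
        "whisper-base" | "base" => Ok("openai/whisper-base".to_string()),
        "whisper-small" | "small" => Ok("openai/whisper-small".to_string()),
        other => Err(format!("unsupported whisper model: {other}")),
    }
}
```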
Config
```toml
[audio]
backend = "whisper-local"

[audio.whisper_local]
model = "whisper-base"  # tiny, base, small
device = "auto"         # cpu, metal, cuda
```
Feature gate
```toml
[features]
whisper-local = ["dep:symphonia", "dep:candle-core", "dep:candle-nn", "dep:candle-transformers", "dep:hf-hub"]
```
Acceptance criteria
- Implements `SpeechToText` trait
- Auto-downloads model on first use via hf-hub
- Supports Metal (macOS) and CUDA (Linux) acceleration
- CPU fallback works
- Feature-gated, does not affect default build
- Transcription of 10s audio completes in <2s on Metal
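The Metal/CUDA/CPU-fallback criteria above could be sketched as a device-resolution step. This returns backend names only for illustration; a real implementation would construct a `candle_core::Device`, and the function is hypothetical:

```rust
/// Resolve the `device = "auto"` config value to a backend name,
/// preferring Metal on macOS, then CUDA when compiled in, then CPU.
/// Illustrative sketch only; not the crate's actual API.
fn resolve_device(cfg: &str) -> &'static str {
    match cfg {
        "cpu" => "cpu",
        "metal" => "metal",
        "cuda" => "cuda",
        // "auto" (or anything unrecognized) picks the best available.
        _ => {
            if cfg!(target_os = "macos") {
                "metal"
            } else if cfg!(feature = "cuda") {
                "cuda"
            } else {
                "cpu"
            }
        }
    }
}
```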