Careless whisper #6877

Merged
DOsinga merged 21 commits into main from careless-whisper on Feb 3, 2026

Conversation

@DOsinga
Collaborator

@DOsinga commented Feb 1, 2026

Summary

Trans-Atlantic plane wifi FTW.

This is based on an earlier restructuring of how we record audio, but besides that it implements local Whisper support with model download & recommendation.

It also updates the client so we detect pauses for pseudo-streaming and auto-submit if you end your sentence on, well, "submit".

Bonus: also adds Groq as a provider (faster than OpenAI).

@DOsinga DOsinga changed the base branch from main to clean-up-dictation February 1, 2026 10:35
@DOsinga DOsinga marked this pull request as ready for review February 1, 2026 20:13
Contributor

Copilot AI left a comment

Pull request overview

This PR implements local Whisper transcription for voice dictation, adding offline speech-to-text capabilities using Candle ML framework with GPU acceleration support (CUDA/Metal).

Changes:

  • Adds local Whisper model support with voice activity detection (VAD) for automatic speech segmentation
  • Implements model download management with progress tracking and cancellation
  • Introduces auto-submit feature for voice commands ending with "submit"

Reviewed changes

Copilot reviewed 18 out of 22 changed files in this pull request and generated 7 comments.

Summary per file:

  • ui/desktop/src/utils/analytics.ts: Adds 'auto_submit' action to voice dictation analytics tracking
  • ui/desktop/src/hooks/useAudioRecorder.ts: Replaces MediaRecorder with AudioContext/AudioWorklet for real-time VAD and automatic transcription
  • ui/desktop/src/components/settings/dictation/LocalModelManager.tsx: New component for managing local Whisper model downloads and selection
  • ui/desktop/src/components/settings/dictation/DictationSettings.tsx: Integrates LocalModelManager and updates provider labels
  • ui/desktop/src/components/ChatInput.tsx: Adds auto-submit detection, improved status display, and local provider error handling
  • ui/desktop/src/api/types.gen.ts: Adds API types for local provider, model management, and download progress
  • ui/desktop/src/api/sdk.gen.ts: Generates SDK methods for new dictation model endpoints
  • ui/desktop/src/api/index.ts: Exports new API types and methods
  • ui/desktop/openapi.json: Defines OpenAPI schema for model management endpoints
  • crates/goose/src/lib.rs: Exposes new dictation module
  • crates/goose/src/dictation/whisper.rs: Implements Whisper transcription with GGUF model loading and audio processing
  • crates/goose/src/dictation/providers.rs: Unified provider interface with local transcription support
  • crates/goose/src/dictation/mod.rs: Module definition for dictation subsystem
  • crates/goose/src/dictation/download_manager.rs: Manages async model downloads with progress tracking
  • crates/goose/Cargo.toml: Adds Candle ML dependencies with Metal/CUDA feature flags
  • crates/goose-server/src/routes/dictation.rs: Implements REST endpoints for model management and adds Groq provider
  • crates/goose-server/src/openapi.rs: Updates OpenAPI documentation with new schemas
  • crates/goose-server/Cargo.toml: Adds dependencies for model management
Comments suppressed due to low confidence (1)

ui/desktop/src/hooks/useAudioRecorder.ts:1

  • The regex removes content within parentheses, but it's unclear why. Add a comment explaining this is likely removing Whisper's hallucinated or metadata text commonly output in parentheses, such as '(music playing)' or '(inaudible)'.
import { useState, useRef, useCallback, useEffect } from 'react';

Comment on lines +16 to +18
// Import the worklet module - Vite will handle this correctly
const WORKLET_URL = new URL('../audio-capture-worklet.js', import.meta.url).href;

Copilot AI Feb 1, 2026

The worklet file 'audio-capture-worklet.js' is referenced but not included in the PR. This will cause a runtime error when attempting to load the AudioWorklet module. The file needs to be created or the path needs to be updated to point to an existing worklet implementation.

Suggested change
// Import the worklet module - Vite will handle this correctly
const WORKLET_URL = new URL('../audio-capture-worklet.js', import.meta.url).href;
// Inline AudioWorklet module to avoid relying on an external file.
const WORKLET_CODE = `
class AudioCaptureProcessor extends AudioWorkletProcessor {
  process(inputs) {
    const input = inputs[0];
    if (input && input[0]) {
      // Send the first channel's samples to the main thread.
      this.port.postMessage(input[0]);
    }
    return true;
  }
}
registerProcessor('audio-capture-processor', AudioCaptureProcessor);
`;
const WORKLET_URL = URL.createObjectURL(
  new Blob([WORKLET_CODE], { type: 'application/javascript' }),
);


Ok(resampled)
}

Copilot AI Feb 1, 2026

The function save_wav_16khz is defined but never used in the file. Consider removing it if it's not needed, or add a usage comment if it's intended for future use or debugging purposes.

Suggested change
// Helper function kept for debugging and manual inspection of 16kHz audio.
// Not used in normal operation; retained for future troubleshooting.

suppress_tokens: SUPPRESS_TOKENS.to_vec(),
max_target_positions: 448,
},
_ => panic!("Unknown model: {}", self.id),
Copilot AI Feb 1, 2026

Using panic! for an unknown model ID creates a poor user experience. Since this is a runtime configuration issue, it should return a Result with a descriptive error instead of panicking. Consider using anyhow::bail! to maintain the function's error handling pattern.
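For illustration, a minimal sketch of the bail!-based approach; the function name and the concrete values below are placeholders, not the PR's actual Config construction:

use anyhow::{bail, Result};

// Illustrative only: resolve a per-model value, returning an error for unknown
// ids instead of panicking. Ids and numbers are placeholders.
fn num_mel_bins_for(model_id: &str) -> Result<usize> {
    match model_id {
        "tiny" | "base" | "small" | "medium" => Ok(80),
        _ => bail!("Unknown Whisper model id: {}", model_id),
    }
}

fn main() -> Result<()> {
    assert_eq!(num_mel_bins_for("tiny")?, 80);
    assert!(num_mel_bins_for("bogus").is_err());
    Ok(())
}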

if (!filteredText) {
return;
}

Copilot AI Feb 1, 2026

The regex pattern for auto-submit detection is complex and should be documented. Add a comment explaining what patterns it matches (e.g., 'submit', 'submit.', 'submit!', etc.) and why trailing punctuation and whitespace are included.

Suggested change
// Detect spoken "submit" commands at the end of the transcribed text.
// - \bsubmit : match the word "submit" at a word boundary (case-insensitive because of /i)
// - [.,!?;'"\s]*$ : allow any trailing punctuation or whitespace up to the end of the string
// This lets users say phrases like "submit", "submit.", "submit!", or "submit " to auto-send
// the message, while excluding the "submit" keyword and its trailing punctuation/whitespace
// from the final message content when we later strip this pattern out.

let mut response = client.get(url).send().await?;

if !response.status().is_success() {
anyhow::bail!("Failed to download: HTTP {}", response.status());
Copilot AI Feb 1, 2026

The error message only includes the HTTP status code without the response body. For better debugging, include the response text if available, similar to the error handling pattern in providers.rs lines 216-227.

Suggested change
anyhow::bail!("Failed to download: HTTP {}", response.status());
let status = response.status();
let body_text = match response.text().await {
    Ok(text) => text,
    Err(e) => format!("<failed to read response body: {e}>"),
};
anyhow::bail!("Failed to download: HTTP {status}, body: {body_text}");

};

const deleteModel = async (modelId: string) => {
if (!confirm('Delete this model? You can re-download it later.')) return;
Copilot AI Feb 1, 2026

Using the native confirm dialog is not ideal for a modern UI. Consider using a custom confirmation modal or dialog component that matches the application's design system and provides better UX with clear action buttons.

use std::sync::Arc;
use std::time::Duration;
use utoipa::ToSchema;

Copilot AI Feb 1, 2026

The maximum audio size was increased from 25MB to 50MB, but the OpenAPI documentation at line 130 still states 'max 25MB'. Update the documentation to reflect the new 50MB limit or add a comment explaining the discrepancy.

Suggested change
/// Maximum allowed audio size in bytes for dictation uploads (currently 50MB).
/// NOTE: This value was increased from 25MB to 50MB; ensure the OpenAPI documentation
/// reflects this 50MB limit and is kept in sync with this constant.

Copilot AI review requested due to automatic review settings February 1, 2026 21:34
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 20 out of 24 changed files in this pull request and generated 14 comments.

const RMS_THRESHOLD = 0.015;

// Import the worklet module - Vite will handle this correctly
const WORKLET_URL = new URL('../audio-capture-worklet.js', import.meta.url).href;
Copilot AI Feb 1, 2026

The audio worklet file path resolution using import.meta.url may fail in production builds if Vite doesn't properly handle the worklet module. Verify that the Vite configuration includes proper worklet handling, or consider inlining the worklet code as a blob URL for more reliable runtime behavior.

// Append transcribed text to the current input
const newValue = displayValue.trim() ? `${displayValue.trim()} ${text}` : text;

let filteredText = text.replace(/\([^)]*\)/g, '').trim();
Copilot AI Feb 1, 2026

The filter regex /\([^)]*\)/g on line 267 removes all text within parentheses from transcriptions. This could unintentionally remove legitimate content if the transcription service includes parenthetical information. Document why this filtering is necessary or make it more specific to the expected format.

Suggested change
let filteredText = text.replace(/\([^)]*\)/g, '').trim();
// Some dictation providers wrap non-verbal markers (e.g. pauses or background noise)
// in parentheses; we strip only these known markers to avoid removing legitimate
// parenthetical content from the user's message.
const NON_VERBAL_MARKER_REGEX = /\((?:pause|background noise|inaudible|unintelligible)\)/gi;
let filteredText = text.replace(NON_VERBAL_MARKER_REGEX, '').trim();

const resp = await getDictationConfig();
setIsEnabled(!!resp.data?.[pref]?.configured);
setProvider(pref);
} catch {
Copilot AI Feb 1, 2026

Empty catch block silently swallows errors when loading models or selected model configuration. This makes it impossible to debug configuration issues. Consider logging the error before setting defaults.

Suggested change
} catch {
} catch (error) {
onErrorRef.current(errorMessage(error));

onBlur={() => setIsFocused(false)}
ref={textAreaRef}
rows={1}
readOnly={isRecording}
Copilot AI Feb 1, 2026

The textarea is set to readOnly when recording (line 1252), but the onChange handler is still active. This could lead to confusion if paste events or programmatic changes bypass the readonly attribute. Consider also disabling the onChange handler or checking isRecording in the handler.
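One possible shape of that guard, sketched here as a standalone component with hypothetical names rather than the actual ChatInput code:

import React, { useState } from 'react';

// Hypothetical sketch, not the PR's ChatInput: mirror the readOnly attribute in
// the change handler so paste or synthetic change events are also ignored while
// a recording is in progress.
export function DictationTextArea({ isRecording }: { isRecording: boolean }) {
  const [value, setValue] = useState('');

  const handleChange = (e: React.ChangeEvent<HTMLTextAreaElement>) => {
    if (isRecording) return; // ignore edits that bypass the readOnly attribute
    setValue(e.target.value);
  };

  return <textarea value={value} onChange={handleChange} readOnly={isRecording} rows={1} />;
}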

const MAX_RECORDING_DURATION_SECONDS = 10 * 60;
const SAMPLE_RATE = 16000;
const SILENCE_MS = 1200;
const MIN_SPEECH_MS = 300;
Copilot AI Feb 1, 2026

The RMS_THRESHOLD constant (0.015) is a magic number without explanation. This threshold determines speech detection sensitivity, which is critical for VAD functionality. Add a comment explaining how this value was determined and what audio amplitude range it corresponds to.

Suggested change
const MIN_SPEECH_MS = 300;
const MIN_SPEECH_MS = 300;
// RMS is computed over Web Audio Float32 samples in the [-1, 1] range; 0.015 (~1.5% of full-scale)
// was chosen empirically as the quietest level that reliably distinguishes normal speech from
// typical background noise for 16 kHz mono mic input without clipping early speech onsets.

suppress_tokens: SUPPRESS_TOKENS.to_vec(),
max_target_positions: 448,
},
_ => panic!("Unknown model: {}", self.id),
Copilot AI Feb 1, 2026

The panic at line 143 violates Rust error handling best practices. Instead of panicking in library code, return an error using anyhow::bail! to maintain graceful error handling.

Suggested change
_ => panic!("Unknown model: {}", self.id),
// Fallback to the "tiny" configuration for unknown model IDs to avoid panicking.
_ => Config {
    num_mel_bins: 80,
    max_source_positions: 1500,
    d_model: 384,
    encoder_attention_heads: 6,
    encoder_layers: 4,
    decoder_attention_heads: 6,
    decoder_layers: 4,
    vocab_size: 51865,
    suppress_tokens: SUPPRESS_TOKENS.to_vec(),
    max_target_positions: 448,
},

Comment on lines 601 to 605
.map_err(|e| {
e
})
.context("Failed to create audio decoder - please ensure browser sends WAV format audio")?;

Copilot AI Feb 1, 2026

The variable 'e' is intentionally silenced on line 602, but there's no logging or error handling. If the decoder creation fails, users won't know why, making debugging difficult. Consider adding a tracing::error! or including error context in the returned error.

Suggested change
.map_err(|e| {
e
})
.context("Failed to create audio decoder - please ensure browser sends WAV format audio")?;
.context("Failed to create audio decoder - please ensure browser sends WAV format audio")?;

};

const deleteModel = async (modelId: string) => {
if (!confirm('Delete this model? You can re-download it later.')) return;
Copilot AI Feb 1, 2026

Using window.confirm for a delete confirmation in the UI creates a blocking, non-customizable browser dialog. Consider using a proper modal component that matches the application's design system and provides better UX.

if (hasDownloadedNonRecommended && !showAllModels) {
setShowAllModels(true);
}
}, [models]);
Copilot AI Feb 1, 2026

The useEffect at lines 41-51 has an incomplete dependency array. It reads 'models' and 'showAllModels' but only includes 'models' in the dependencies. This could lead to stale closures. Add 'showAllModels' to the dependency array or restructure the logic.

Suggested change
}, [models]);
}, [models, showAllModels]);

return;
}

const shouldAutoSubmit = /\bsubmit[.,!?;'"\s]*$/i.test(filteredText);
Copilot AI Feb 1, 2026

The regex pattern /\bsubmit[.,!?;'"\s]*$/i on line 275 only matches "submit" at the end of the text. If a user says "please submit this", it won't trigger auto-submit because "this" comes after "submit". Consider if the pattern should match "submit" followed only by punctuation/whitespace.

Douwe Osinga added 2 commits February 1, 2026 23:34
Copilot AI review requested due to automatic review settings February 1, 2026 22:53
repository.workspace = true
description.workspace = true

[features]
Collaborator

will this mean it only works on macos?

Collaborator Author

I think it means we compile with this for all platforms, but it will only trigger on macOS. We should test of course!

Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 21 out of 25 changed files in this pull request and generated 3 comments.

Comment on lines 92 to 116
const pollDownloadProgress = (modelId: string) => {
  const interval = setInterval(async () => {
    try {
      const response = await getDownloadProgress({ path: { model_id: modelId } });
      if (response.data) {
        const progress = response.data;
        setDownloads((prev) => new Map(prev).set(modelId, progress));

        if (progress.status === 'completed') {
          clearInterval(interval);
          await loadModels(); // Refresh model list
          // Backend auto-selects, but also update frontend state
          await loadSelectedModel();
        } else if (progress.status === 'failed') {
          clearInterval(interval);
          await loadModels();
        }
      } else {
        clearInterval(interval);
      }
    } catch (error) {
      clearInterval(interval);
    }
  }, 500);
};
Copilot AI Feb 1, 2026

The polling interval is never cleaned up properly. If the component unmounts while polling is active, the interval will continue running indefinitely, causing a memory leak. Store the interval ID in a ref and clean it up in a useEffect cleanup function.
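A minimal sketch of the ref-plus-cleanup pattern the comment suggests; names are illustrative and the poll callback is assumed to be passed in, so this is not the PR's LocalModelManager code:

import { useEffect, useRef } from 'react';

// Hypothetical sketch: keep the interval id in a ref and clear it on unmount so
// polling never outlives the component.
export function useDownloadPolling(poll: () => Promise<void>, intervalMs = 500) {
  const intervalRef = useRef<ReturnType<typeof setInterval> | null>(null);

  const stop = () => {
    if (intervalRef.current) {
      clearInterval(intervalRef.current);
      intervalRef.current = null;
    }
  };

  const start = () => {
    if (intervalRef.current) return; // avoid stacking intervals
    intervalRef.current = setInterval(() => {
      void poll();
    }, intervalMs);
  };

  // Cleanup on unmount.
  useEffect(() => stop, []);

  return { start, stop };
}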

Comment on lines +62 to +74
pub fn cancel_download(&self, model_id: &str) -> Result<()> {
    let mut downloads = self
        .downloads
        .lock()
        .map_err(|_| anyhow::anyhow!("Failed to acquire lock"))?;

    if let Some(progress) = downloads.get_mut(model_id) {
        progress.status = DownloadStatus::Cancelled;
        Ok(())
    } else {
        anyhow::bail!("Download not found")
    }
}
Copilot AI Feb 1, 2026

The cancel_download method only sets a flag in the DownloadProgress struct, but the download task doesn't check for cancellation signals properly. The task checks the status inside the downloads HashMap on each chunk, but there's no mechanism to interrupt the ongoing HTTP request. Consider using a cancellation token (e.g., tokio_util::sync::CancellationToken) to properly abort the download.
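A rough sketch of the token-based approach, with hypothetical types and names and assuming tokio-util, reqwest, and bytes are available as dependencies; this is not the PR's DownloadManager:

use std::{collections::HashMap, sync::Mutex};
use tokio_util::sync::CancellationToken;

// Hypothetical sketch: one CancellationToken per active download, so cancelling
// actually interrupts the transfer loop instead of only flipping a status flag.
#[derive(Default)]
struct Downloads {
    tokens: Mutex<HashMap<String, CancellationToken>>,
}

impl Downloads {
    fn register(&self, model_id: &str) -> CancellationToken {
        let token = CancellationToken::new();
        self.tokens
            .lock()
            .unwrap()
            .insert(model_id.to_string(), token.clone());
        token
    }

    fn cancel(&self, model_id: &str) {
        if let Some(token) = self.tokens.lock().unwrap().get(model_id) {
            token.cancel();
        }
    }
}

// In the download loop, race each chunk read against the token.
async fn next_chunk(
    response: &mut reqwest::Response,
    token: &CancellationToken,
) -> anyhow::Result<Option<bytes::Bytes>> {
    tokio::select! {
        _ = token.cancelled() => anyhow::bail!("download cancelled"),
        chunk = response.chunk() => Ok(chunk?),
    }
}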

Comment on lines 212 to 214
} else {
usage_ratio > threshold
};
Copilot AI Feb 1, 2026

The removed debug logging provided valuable information about token usage and compaction thresholds. While reducing logging is generally good, this specific logging was useful for debugging context management issues. Consider whether this information is truly unnecessary or if it should be kept at a higher log level.

Copilot generated this review using guidance from repository custom instructions.
Collaborator

@michaelneale left a comment

seems ok - was able to download, it worked with my accent (but not sure if i was doing it wrong as it didn't submit for me - but that was ok).

Probably some valid nits from Copilot but nice (assuming I tried the right thing). I wonder if we could make it more obvious we recommend the local one. The word "Offline" at first made me think it was busted!

(could also have the microphone icon back on main chat which takes you to the config as the config seems so complex now, or ... maybe rich I should say!)

Collaborator

@jh-block left a comment

Looks good! I think we can do a little better with the platform-specific candle features. If we move the candle-core and candle-nn deps to be target-specific dependencies, we can automatically set the features on the appropriate targets without needing our own cuda/metal features:

untested:

[dependencies]
# ...
candle-core = { version = "0.8" }
candle-nn = { version = "0.8" }

[target.'cfg(target_os = "macos")'.dependencies]
candle-core = { version = "0.8", features = ["metal"] }
candle-nn = { version = "0.8", features = ["metal"] }

[target.'cfg(target_os = "linux")'.dependencies]
candle-core = { version = "0.8", features = ["cuda"] }
candle-nn = { version = "0.8", features = ["cuda"] }

(and remove the features on our crate as no longer needed)

# Conflicts:
#	crates/goose-server/src/openapi.rs
#	crates/goose-server/src/routes/dictation.rs
#	ui/desktop/openapi.json
#	ui/desktop/src/api/index.ts
#	ui/desktop/src/api/sdk.gen.ts
#	ui/desktop/src/api/types.gen.ts
#	ui/desktop/src/components/ChatInput.tsx
#	ui/desktop/src/components/settings/dictation/DictationSettings.tsx
#	ui/desktop/src/hooks/useAudioRecorder.ts
Copilot AI review requested due to automatic review settings February 2, 2026 17:09
@DOsinga
Collaborator Author

DOsinga commented Feb 2, 2026

actually @DOsinga that feels true for metal, but cuda you could have on any platform, no?

Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 21 out of 25 changed files in this pull request and generated 9 comments.

Comment on lines +273 to +276
const shouldAutoSubmit = /\bsubmit[.,!?;'"\s]*$/i.test(filteredText);

const cleanedText = shouldAutoSubmit
? filteredText.replace(/\bsubmit[.,!?;'"\s]*$/i, '').trim()
Copilot AI Feb 2, 2026

The regex pattern allows "submit" to appear anywhere within a word (e.g., "resubmit", "submitted"). The word boundary \b only applies at the start. Consider using \bsubmit\b to match only the complete word "submit".

Suggested change
const shouldAutoSubmit = /\bsubmit[.,!?;'"\s]*$/i.test(filteredText);
const cleanedText = shouldAutoSubmit
? filteredText.replace(/\bsubmit[.,!?;'"\s]*$/i, '').trim()
const shouldAutoSubmit = /\bsubmit\b[.,!?;'"\s]*$/i.test(filteredText);
const cleanedText = shouldAutoSubmit
? filteredText.replace(/\bsubmit\b[.,!?;'"\s]*$/i, '').trim()

Comment on lines +119 to +141
tokio::spawn(async move {
    match Self::download_file(&url, &destination, &downloads, &model_id_clone).await {
        Ok(_) => {
            if let Ok(mut downloads) = downloads.lock() {
                if let Some(progress) = downloads.get_mut(&model_id_clone) {
                    progress.status = DownloadStatus::Completed;
                    progress.progress_percent = 100.0;
                }
            }

            let _ = crate::config::Config::global()
                .set_param(LOCAL_WHISPER_MODEL_CONFIG_KEY, model_id_clone.clone());
        }
        Err(e) => {
            if let Ok(mut downloads) = downloads.lock() {
                if let Some(progress) = downloads.get_mut(&model_id_clone) {
                    progress.status = DownloadStatus::Failed;
                    progress.error = Some(e.to_string());
                }
            }
        }
    }
});
Copilot AI Feb 2, 2026

The download is spawned in a background task (tokio::spawn) but errors during download only update the internal state. Users have no notification if a download fails - they need to poll the progress endpoint to discover this. Consider adding a notification mechanism or event system to alert users of failures.

Comment on lines +187 to +191
if should_cancel {
    // Clean up partial download
    let _ = tokio::fs::remove_file(destination).await;
    return Ok(());
}
Copilot AI Feb 2, 2026

The partial download file is only cleaned up if the cancellation check succeeds (line 189). If the task panics or is aborted before reaching this check, or if there's an error after the check but before completion, the partial file may be left on disk. Consider using RAII patterns or ensuring cleanup happens in all error paths.

@@ -212,17 +212,6 @@ pub async fn check_if_compaction_needed(
} else {
usage_ratio > threshold
};
Copilot AI Feb 2, 2026

The comment says the logging has been removed to reduce noise ("the codebase needs less logging, not more"), but this was debug-level logging that could be useful for troubleshooting token usage and context compaction issues. Consider keeping this at trace level instead of removing it entirely.

Suggested change
};
};
tracing::trace!(
    current_tokens = current_tokens,
    token_source = token_source,
    context_limit = context_limit,
    usage_ratio = usage_ratio,
    threshold = threshold,
    needs_compaction = needs_compaction,
    "context compaction decision"
);

Comment on lines +129 to +130
let _ = crate::config::Config::global()
.set_param(LOCAL_WHISPER_MODEL_CONFIG_KEY, model_id_clone.clone());
Copilot AI Feb 2, 2026

Using .set_param() could silently fail here. Consider adding error handling or logging if the auto-selection fails after a successful download.

Suggested change
let _ = crate::config::Config::global()
    .set_param(LOCAL_WHISPER_MODEL_CONFIG_KEY, model_id_clone.clone());
if let Err(e) = crate::config::Config::global()
    .set_param(LOCAL_WHISPER_MODEL_CONFIG_KEY, model_id_clone.clone())
{
    eprintln!(
        "Failed to set local whisper model config after download for model {}: {}",
        model_id_clone, e
    );
}

Comment on lines +266 to +273
if let Some(tokenizer_json) = bundled_tokenizer {
    if let Some(parent) = tokenizer_path.parent() {
        std::fs::create_dir_all(parent)?;
    }
    std::fs::write(&tokenizer_path, tokenizer_json)?;
    return Tokenizer::from_file(tokenizer_path)
        .map_err(|e| anyhow::anyhow!("Failed to load tokenizer: {}", e));
}
Copilot AI Feb 2, 2026

The tokenizer is saved to disk (line 270) but there's no error handling if the write fails. The subsequent read from file could then fail unexpectedly. Consider handling write errors explicitly.

onBlur={() => setIsFocused(false)}
ref={textAreaRef}
rows={1}
readOnly={isRecording}
Copilot AI Feb 2, 2026

The textarea is made read-only during recording (line 1244), but the value can still be programmatically updated via setValue() and setDisplayValue() from transcription callbacks. If a transcription arrives while still recording, the user won't be able to edit the inserted text until they stop recording. Consider whether read-only should only prevent manual typing but not programmatic updates, or if the UI should allow editing during recording.

Suggested change
readOnly={isRecording}

Comment on lines +267 to +268
let filteredText = text.replace(/\([^)]*\)/g, '').trim();

Copilot AI Feb 2, 2026

The parentheses removal regex replace(/\([^)]*\)/g, '') will remove any content in parentheses, which could inadvertently remove legitimate content if the transcription service includes parenthesized text. Consider being more specific about what patterns to filter, or document why all parenthesized content should be removed.

Suggested change
let filteredText = text.replace(/\([^)]*\)/g, '').trim();
let filteredText = text.trim();
// Remove leading/trailing parenthesized annotations (e.g. "(laughter)")
// that some dictation providers include, but keep inline parentheses.
filteredText = filteredText.replace(/^\([^)]*\)\s*|\s*\([^)]*\)$/g, '').trim();

}

// Transcribe the audio
let (_, transcriber) = transcriber_lock.as_mut().unwrap();
Copilot AI Feb 2, 2026

The mutex is unwrapped with .unwrap() which will panic if poisoned. In a multi-threaded context, if a thread panics while holding the lock, this will cause cascading panics in other threads. Consider using .lock().ok()? and propagating the error gracefully instead.

Suggested change
let (_, transcriber) = transcriber_lock.as_mut().unwrap();
let (_, transcriber) = transcriber_lock
    .as_mut()
    .ok_or_else(|| anyhow::anyhow!("Transcriber not initialized"))?;

Copilot AI review requested due to automatic review settings February 2, 2026 19:10
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 21 out of 25 changed files in this pull request and generated 12 comments.

Comment on lines +288 to +290
setTimeout(() => {
performSubmit(newValue);
}, 100);
Copilot AI Feb 2, 2026

The cleanup timeout is hardcoded to 100ms. If the user clicks the microphone button again within 100ms while transcription is processing, this could cause a race condition where performSubmit is called with stale data, or the UI state becomes inconsistent.

Suggested change
setTimeout(() => {
performSubmit(newValue);
}, 100);
performSubmit(newValue);

Comment on lines 40 to 51
// Determine if we should show all models by default (if non-recommended models are downloaded)
useEffect(() => {
  if (models.length === 0) return;

  const hasDownloadedNonRecommended = models.some(
    (model) => model.downloaded && !model.recommended
  );

  if (hasDownloadedNonRecommended && !showAllModels) {
    setShowAllModels(true);
  }
}, [models]);
Copilot AI Feb 2, 2026

The showAllModels state defaults to false but is set to true if non-recommended models are downloaded. However, this effect runs every time models array changes. If a user manually collapses the list after it auto-expands, it will auto-expand again on the next models refresh. Consider using a separate "hasAutoExpanded" flag or checking if showAllModels was user-set to avoid overriding user preference.
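A small sketch of the one-shot auto-expand idea; the hook and the ModelInfo shape are hypothetical, not the component's actual types:

import { useEffect, useRef, useState } from 'react';

interface ModelInfo {
  downloaded: boolean;
  recommended: boolean;
}

// Hypothetical sketch: auto-expand at most once so a manual collapse is not
// overridden the next time the models list refreshes.
export function useShowAllModels(models: ModelInfo[]) {
  const [showAllModels, setShowAllModels] = useState(false);
  const hasAutoExpanded = useRef(false);

  useEffect(() => {
    if (hasAutoExpanded.current || models.length === 0) return;
    if (models.some((m) => m.downloaded && !m.recommended)) {
      setShowAllModels(true);
      hasAutoExpanded.current = true;
    }
  }, [models]);

  return [showAllModels, setShowAllModels] as const;
}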

Comment on lines 92 to 116
const pollDownloadProgress = (modelId: string) => {
  const interval = setInterval(async () => {
    try {
      const response = await getDownloadProgress({ path: { model_id: modelId } });
      if (response.data) {
        const progress = response.data;
        setDownloads((prev) => new Map(prev).set(modelId, progress));

        if (progress.status === 'completed') {
          clearInterval(interval);
          await loadModels(); // Refresh model list
          // Backend auto-selects, but also update frontend state
          await loadSelectedModel();
        } else if (progress.status === 'failed') {
          clearInterval(interval);
          await loadModels();
        }
      } else {
        clearInterval(interval);
      }
    } catch (error) {
      clearInterval(interval);
    }
  }, 500);
};
Copilot AI Feb 2, 2026

The pollDownloadProgress function sets up an interval but never clears it in case of exception during error handling. If getDownloadProgress throws an exception at line 95, the catch block at line 113 clears the interval, but if the exception occurs elsewhere (e.g., loadModels or loadSelectedModel), the interval keeps running. Consider wrapping the entire try block content or using finally.

const mimeType = supportedTypes.find((type) => MediaRecorder.isTypeSupported(type)) || '';
const ctx = new AudioContext({ sampleRate: SAMPLE_RATE });
audioContextRef.current = ctx;

Copilot AI Feb 2, 2026

The RMS_THRESHOLD constant is defined with a specific comment explaining it's empirically determined for 16kHz mono input, but there's no validation that the audio context is actually using 16kHz. If the browser or system provides audio at a different sample rate, the threshold may be inaccurate, leading to incorrect speech detection. Consider validating that ctx.sampleRate matches SAMPLE_RATE or adjusting the threshold dynamically.

Suggested change
// Validate that the actual sample rate matches our assumptions.
if (ctx.sampleRate !== SAMPLE_RATE) {
  await ctx.close();
  audioContextRef.current = null;
  streamRef.current?.getTracks().forEach((t) => t.stop());
  streamRef.current = null;
  onError(
    `Browser audio sample rate ${ctx.sampleRate} Hz is not supported; expected ${SAMPLE_RATE} Hz for dictation.`,
  );
  return;
}

Comment on lines 121 to 129
# Platform-specific GPU acceleration for Whisper
[target.'cfg(target_os = "macos")'.dependencies]
candle-core = { version = "0.8.4", features = ["metal"] }
candle-nn = { version = "0.8.4", features = ["metal"] }

[target.'cfg(not(target_os = "macos"))'.dependencies]
candle-core = { version = "0.8.4", features = ["cuda"] }
candle-nn = { version = "0.8.4", features = ["cuda"] }

Copilot AI Feb 2, 2026

The candle-core and candle-nn dependencies are duplicated - they're specified both in the main dependencies section (lines 96-98) without features, and again in platform-specific sections with metal/cuda features (lines 123-128). This creates a conflict where the platform-specific features won't be applied because the main dependencies take precedence. Remove the entries from the main dependencies section and keep only the platform-specific ones.

Suggested change
# Platform-specific GPU acceleration for Whisper
[target.'cfg(target_os = "macos")'.dependencies]
candle-core = { version = "0.8.4", features = ["metal"] }
candle-nn = { version = "0.8.4", features = ["metal"] }
[target.'cfg(not(target_os = "macos"))'.dependencies]
candle-core = { version = "0.8.4", features = ["cuda"] }
candle-nn = { version = "0.8.4", features = ["cuda"] }

Ok(text)
})
.await
.context("Transcription task failed")?
Copilot AI Feb 2, 2026

The LOCAL_TRANSCRIBER is a global mutable static with a Mutex. If transcription fails (e.g., due to corrupted model file or OOM), the transcriber remains in the mutex and subsequent requests will keep failing until the server restarts. Consider adding error recovery logic that clears the cached transcriber on failure so the next request can attempt to reload it.

Suggested change
.context("Transcription task failed")?
.context("Transcription task failed")
.and_then(|result| {
    if result.is_err() {
        if let Ok(mut guard) = LOCAL_TRANSCRIBER.lock() {
            *guard = None;
        }
    }
    result
})

Comment on lines +275 to +277
const cleanedText = shouldAutoSubmit
? filteredText.replace(/\bsubmit[.,!?;'"\s]*$/i, '').trim()
: filteredText;
Copilot AI Feb 2, 2026

The regex removes the word "submit" and surrounding punctuation, but it doesn't handle multiple consecutive spaces that could result from the removal. For example, "please do this submit" would become "please do this " with double spaces. Consider trimming or normalizing whitespace after removal.

Suggested change
const cleanedText = shouldAutoSubmit
? filteredText.replace(/\bsubmit[.,!?;'"\s]*$/i, '').trim()
: filteredText;
const cleanedTextRaw = shouldAutoSubmit
? filteredText.replace(/\bsubmit[.,!?;'"\s]*$/i, '').trim()
: filteredText;
const cleanedText = cleanedTextRaw.replace(/\s+/g, ' ').trim();

// Initialize logging
tracing_subscriber::fmt::init();

let audio_path = "/tmp/whisper_audio_16k.wav";
Copilot AI Feb 2, 2026

The example hardcodes the audio file path to "/tmp/whisper_audio_16k.wav" which won't exist for most users trying to run this example. Consider either: 1) Using std::env::args() to accept a file path argument, 2) Generating a simple test audio file, or 3) Adding a comment explaining where to get a test file. This makes the example more useful for testing.
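For example, the path could be taken from the command line with the hardcoded value as a fallback; this is a sketch only, and the rest of the example binary is assumed unchanged:

use std::env;

fn main() {
    // Accept an optional WAV path argument, falling back to the original default.
    let audio_path = env::args()
        .nth(1)
        .unwrap_or_else(|| "/tmp/whisper_audio_16k.wav".to_string());
    println!("transcribing {audio_path}");
    // ... load the file and run transcription as in the original example ...
}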

Comment on lines +1 to +3
pub mod download_manager;
pub mod providers;
pub mod whisper;
Copilot AI Feb 2, 2026

The entire dictation module lacks test coverage. Given that the codebase has comprehensive testing (with tests in most other modules), this new dictation functionality should have tests for: 1) Model configuration and loading, 2) Download manager functionality, 3) Provider transcription flows, 4) Error handling paths. This is especially important for the complex audio processing and model management code.

@taniacryptid
Contributor

Random commentary: Living for this reference, actually stoked for local whisper support!!

Copilot AI review requested due to automatic review settings February 2, 2026 19:32
Contributor

Copilot AI left a comment

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings February 3, 2026 00:32
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 21 out of 25 changed files in this pull request and generated 15 comments.

Comment on lines +267 to +276
let filteredText = text.replace(/\([^)]*\)/g, '').trim();

if (!filteredText) {
  return;
}

const shouldAutoSubmit = /\bsubmit[.,!?;'"\s]*$/i.test(filteredText);

const cleanedText = shouldAutoSubmit
  ? filteredText.replace(/\bsubmit[.,!?;'"\s]*$/i, '').trim()
Copilot AI Feb 3, 2026

The regex patterns /\([^)]*\)/g and /\bsubmit[.,!?;'"\s]*$/i could be compiled once outside the callback to avoid recompiling on every transcription. Move these to module-level constants for better performance.

Comment on lines +183 to +188
const stopRecording = useCallback(() => {
  if (isSpeakingRef.current && samplesRef.current.length > 0) {
    flushRef.current();
  }
  isSpeakingRef.current = false;
  silenceStartRef.current = 0;
Copilot AI Feb 3, 2026

The silenceStartRef is set to 0 on line 188 but this happens after potentially flushing on line 185. If flush() triggers a transcription that takes time, there's a race condition where isSpeakingRef could be set back to true before the flush completes. Consider the order of operations here.

PROVIDERS
    .iter()
    .find(|def| def.provider == provider)
    .unwrap() // Safe because all enum variants are in PROVIDERS
Copilot AI Feb 3, 2026

Using .unwrap() on line 88 is unsafe. If all enum variants aren't covered in PROVIDERS (though they should be), this will panic. Consider using .expect() with a descriptive message, or better yet, return a Result to handle this gracefully.

Suggested change
.unwrap() // Safe because all enum variants are in PROVIDERS
.expect("DictationProvider missing from PROVIDERS; ensure all enum variants are listed")

Comment on lines +125 to +127
let mut transcriber_lock = LOCAL_TRANSCRIBER
    .lock()
    .map_err(|e| anyhow::anyhow!("Failed to lock transcriber: {}", e))?;
Copilot AI Feb 3, 2026

The transcriber mutex could be poisoned if a panic occurs during transcription (line 149). The current error handling with .map_err() converts the poison error to anyhow but doesn't attempt recovery. Consider explicitly handling PoisonError to potentially clear the poisoned transcriber and retry initialization.

Suggested change
let mut transcriber_lock = LOCAL_TRANSCRIBER
    .lock()
    .map_err(|e| anyhow::anyhow!("Failed to lock transcriber: {}", e))?;
let mut transcriber_lock = match LOCAL_TRANSCRIBER.lock() {
    Ok(guard) => guard,
    Err(poisoned) => {
        // Recover from a poisoned mutex by clearing the cached transcriber
        tracing::warn!("Transcriber mutex was poisoned; resetting cached transcriber");
        let mut guard = poisoned.into_inner();
        *guard = None;
        guard
    }
};

Comment on lines +85 to +86
pub fn local_path(&self) -> PathBuf {
    let filename = self.url.rsplit('/').next().unwrap_or("");
Copilot AI Feb 3, 2026

The unwrap on line 86 will panic if the URL doesn't contain a '/'. While Whisper model URLs are hardcoded and should always contain '/', this creates a potential panic point. Consider using .unwrap_or("") or .unwrap_or_else(|| self.id) as a safer fallback.

Comment on lines +267 to +273
if let Some(tokenizer_json) = bundled_tokenizer {
    if let Some(parent) = tokenizer_path.parent() {
        std::fs::create_dir_all(parent)?;
    }
    std::fs::write(&tokenizer_path, tokenizer_json)?;
    return Tokenizer::from_file(tokenizer_path)
        .map_err(|e| anyhow::anyhow!("Failed to load tokenizer: {}", e));
Copilot AI Feb 3, 2026

The tokenizer file is written to disk every time it doesn't exist (lines 267-271), but if multiple requests come in simultaneously, this could lead to a race condition where multiple threads try to write the same file. Consider using std::fs::OpenOptions with create_new(true) to atomically check-and-create, or add a lock around this operation.
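A minimal sketch of the atomic check-and-create variant; the function name is hypothetical and error handling is simplified:

use std::fs::OpenOptions;
use std::io::{ErrorKind, Write};
use std::path::Path;

// Hypothetical sketch: create_new(true) fails with AlreadyExists if another
// thread or process wrote the tokenizer first, in which case we reuse that file.
fn write_tokenizer_once(path: &Path, tokenizer_json: &[u8]) -> std::io::Result<()> {
    if let Some(parent) = path.parent() {
        std::fs::create_dir_all(parent)?;
    }
    match OpenOptions::new().write(true).create_new(true).open(path) {
        Ok(mut file) => file.write_all(tokenizer_json),
        Err(e) if e.kind() == ErrorKind::AlreadyExists => Ok(()),
        Err(e) => Err(e),
    }
}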

Comment on lines +57 to +82
const MODELS: &[WhisperModel] = &[
    WhisperModel {
        id: "tiny",
        size_mb: 40,
        url: "https://huggingface.co/oxide-lab/whisper-tiny-GGUF/resolve/main/model-tiny-q80.gguf",
        description: "Fastest, ~2-3x realtime on CPU (5-10x with GPU)",
    },
    WhisperModel {
        id: "base",
        size_mb: 78,
        url: "https://huggingface.co/oxide-lab/whisper-base-GGUF/resolve/main/whisper-base-q8_0.gguf",
        description: "Good balance, ~1.5-2x realtime on CPU (4-8x with GPU)",
    },
    WhisperModel {
        id: "small",
        size_mb: 247,
        url: "https://huggingface.co/oxide-lab/whisper-small-GGUF/resolve/main/whisper-small-q8_0.gguf",
        description: "High accuracy, ~0.8-1x realtime on CPU (3-5x with GPU)",
    },
    WhisperModel {
        id: "medium",
        size_mb: 777,
        url: "https://huggingface.co/oxide-lab/whisper-medium-GGUF/resolve/main/whisper-medium-q8_0.gguf",
        description: "Highest accuracy, ~0.5x realtime on CPU (2-4x with GPU)",
    },
];
Copilot AI Feb 3, 2026

The hardcoded HuggingFace URLs use oxide-lab as the organization. These models should be verified to exist and be maintained. If these are temporary or test models, consider adding a comment indicating the source and whether they're production-ready.

};

const deleteModel = async (modelId: string) => {
if (!window.confirm('Delete this model? You can re-download it later.')) return;
Copilot AI Feb 3, 2026

The LocalModelManager component uses window.confirm() for delete confirmation on line 134. This is not ideal for a React application and doesn't match the UI patterns used elsewhere. Consider using a proper confirmation modal component for consistency with the rest of the application.


use crate::config::paths::Paths;

pub const LOCAL_WHISPER_MODEL_CONFIG_KEY: &str = "LOCAL_WHISPER_MODEL";
Copilot AI Feb 3, 2026

The constant LOCAL_WHISPER_MODEL_CONFIG_KEY is defined in both whisper.rs (line 11) and used in download_manager.rs (line 1) via import. However, in providers.rs it's also used (line 2). This creates tight coupling. Consider whether this should be moved to a shared config module to avoid circular dependencies and improve maintainability.

let context_limit = provider.get_model_config().context_limit();

let (current_tokens, token_source) = match session.total_tokens {
let (current_tokens, _token_source) = match session.total_tokens {
Copilot AI Feb 3, 2026

The debug logging was removed that showed token usage and compaction decisions. While the guideline says "the codebase needs less logging, not more", this particular logging was useful for debugging context window management issues. The variables _token_source (line 191) is now unused. Consider removing the variable assignment entirely if the logging is intentionally removed.

@DOsinga DOsinga added this pull request to the merge queue Feb 3, 2026
Merged via the queue into main with commit 1373d9c Feb 3, 2026
24 checks passed
@DOsinga DOsinga deleted the careless-whisper branch February 3, 2026 12:01
stebbins pushed a commit to stebbins/goose that referenced this pull request Feb 4, 2026
Co-authored-by: Douwe Osinga <douwe@squareup.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Signed-off-by: Harrison <hcstebbins@gmail.com>
kuccello pushed a commit to kuccello/goose that referenced this pull request Feb 7, 2026
Co-authored-by: Douwe Osinga <douwe@squareup.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Tyler-Hardin pushed a commit to Tyler-Hardin/goose that referenced this pull request Feb 11, 2026
Co-authored-by: Douwe Osinga <douwe@squareup.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Tyler-Hardin pushed a commit to Tyler-Hardin/goose that referenced this pull request Feb 11, 2026
Co-authored-by: Douwe Osinga <douwe@squareup.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>