feat(stt): improve language handling per adapter#2058
Conversation
- Argmax: use only first language (single language mode) - Deepgram: model-aware multi vs detect-restricted (nova-2/3 support multi) Co-Authored-By: yujonglee <yujonglee.dev@gmail.com>
🤖 Devin AI EngineerI'll be helping with this pull request! Here's what you should know: ✅ I will automatically:
Note: I can only respond to comments from users who have write access to this repository. ⚙️ Control Options:
|
✅ Deploy Preview for hyprnote ready!
To edit notification comments on pull requests, go to your Netlify project configuration. |
📝 WalkthroughWalkthroughNormalize ListenParams.languages per-model in argmax; move language-query construction into the Deepgram adapter with model-aware multi-language logic; remove a large frontend language-support map and delete former helper query builders from the client library. No public API changes. Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes
Possibly related PRs
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Comment |
✅ Deploy Preview for hyprnote-storybook ready!
To edit notification comments on pull requests, go to your Netlify project configuration. |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
owhisper/owhisper-client/src/adapter/argmax.rs (1)
14-22: Add documentation explaining the single-language constraint.The
adapt_paramsmethod silently discards all but the first language without explanation or warning. Consider:
- Adding a doc comment explaining why ArgmaxAdapter supports only one language
- Optionally logging a warning when multiple languages are provided but truncated
Apply this diff to add documentation:
+ /// Adapts ListenParams to conform to ArgmaxAdapter's single-language constraint. + /// If multiple languages are provided, only the first is retained. fn adapt_params(params: &ListenParams) -> ListenParams { let mut adapted = params.clone(); if let Some(first) = adapted.languages.first().cloned() { adapted.languages = vec![first]; } adapted }
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
owhisper/owhisper-client/src/adapter/argmax.rs(1 hunks)owhisper/owhisper-client/src/lib.rs(2 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
owhisper/owhisper-client/src/lib.rs (2)
crates/transcribe-whisper-local/src/service/streaming.rs (1)
params(97-101)crates/whisper-local/src/model/mod.rs (1)
language(26-28)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Redirect rules - hyprnote
- GitHub Check: Header rules - hyprnote
- GitHub Check: Pages changed - hyprnote
- GitHub Check: Devin
- GitHub Check: fmt
🔇 Additional comments (2)
owhisper/owhisper-client/src/adapter/argmax.rs (1)
29-31: LGTM!The delegation pattern correctly applies parameter adaptation before passing to the inner adapter.
owhisper/owhisper-client/src/lib.rs (1)
144-157: Verify language query behavior across adapters.The multi-language logic correctly branches on
supports_multi:
- Models with
nova-2/nova-3prefix: useslanguage=multiwith all language codes- Other models: uses
detect_language=truewith all language codesFor
ArgmaxAdapter, theadapt_paramsnormalization filters parameters to the first language before delegating to DeepgramAdapter. DeepgramAdapter then callsappend_language_querywith the normalized single-language parameters, ensuringparams.languages.len() == 1. Consequently, the multi-language branch (lines 144-157) never executes for ArgmaxAdapter—only the single-language path (lines 137-142) is reached.
…GUAGE_SUPPORT - Deepgram: nova-2 multi only for en+es, nova-3 multi for 10 langs, else detect_language with hints - Argmax: parakeet v2 always en, parakeet v3 uses first supported lang or falls back to en - Move language constants to deepgram.rs where they belong - Remove unused LANGUAGE_SUPPORT map from frontend shared.tsx Co-Authored-By: yujonglee <yujonglee.dev@gmail.com>
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
owhisper/owhisper-client/src/adapter/argmax.rs (1)
72-82: Inconsistent parameter handling between streaming and batch transcription.
build_ws_urlappliesadapt_paramsto normalize languages, buttranscribe_filepasses params directly to the inner adapter without adaptation. This means:
- Streaming: Languages normalized to single language for Parakeet models
- Batch: Original multi-language params passed through unchanged
If this is intentional (e.g., batch uses different backend), please document it. Otherwise, apply adaptation for consistency:
fn transcribe_file<'a, P: AsRef<Path> + Send + 'a>( &'a self, client: &'a reqwest::Client, api_base: &'a str, api_key: &'a str, params: &'a ListenParams, file_path: P, ) -> BatchFuture<'a> { + let adapted = Self::adapt_params(params); self.inner - .transcribe_file(client, api_base, api_key, params, file_path) + .transcribe_file(client, api_base, api_key, &adapted, file_path) }Note: This would require changing the lifetime constraints since
adaptedis locally owned.owhisper/owhisper-client/src/adapter/deepgram.rs (1)
315-326: Critical: Hardcoded API key in source code.The API key on line 317 should not be committed to version control. Even for test purposes, this poses security and cost risks if the key is valid.
Replace with environment variable:
let client = ListenClient::builder() .api_base("https://api.deepgram.com/v1") - .api_key("71557216ffdd13bff22702be5017e4852c052b7c") + .api_key(std::env::var("DEEPGRAM_API_KEY").expect("DEEPGRAM_API_KEY not set")) .params(owhisper_interface::ListenParams {Additionally, consider adding this test to an
#[ignore]attribute since it requires external API access:#[tokio::test] #[ignore] // Requires DEEPGRAM_API_KEY environment variable async fn test_client() {
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
apps/desktop/src/components/settings/ai/stt/shared.tsx(0 hunks)owhisper/owhisper-client/src/adapter/argmax.rs(1 hunks)owhisper/owhisper-client/src/adapter/deepgram.rs(1 hunks)owhisper/owhisper-client/src/lib.rs(0 hunks)
💤 Files with no reviewable changes (2)
- apps/desktop/src/components/settings/ai/stt/shared.tsx
- owhisper/owhisper-client/src/lib.rs
🧰 Additional context used
🧬 Code graph analysis (1)
owhisper/owhisper-client/src/adapter/argmax.rs (4)
owhisper/owhisper-client/src/lib.rs (2)
params(46-49)api_base(36-39)owhisper/owhisper-client/src/adapter/deepgram.rs (2)
supports_native_multichannel(166-168)build_ws_url(170-212)owhisper/owhisper-client/src/adapter/mod.rs (2)
supports_native_multichannel(24-24)build_ws_url(26-26)owhisper/owhisper-client/src/adapter/soniox.rs (2)
supports_native_multichannel(327-329)build_ws_url(331-348)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
- GitHub Check: Redirect rules - hyprnote
- GitHub Check: Header rules - hyprnote
- GitHub Check: desktop_ci (linux, depot-ubuntu-24.04-8)
- GitHub Check: fmt
- GitHub Check: desktop_ci (linux, depot-ubuntu-22.04-8)
- GitHub Check: desktop_ci (macos, depot-macos-14)
- GitHub Check: Pages changed - hyprnote
- GitHub Check: Devin
🔇 Additional comments (5)
owhisper/owhisper-client/src/adapter/argmax.rs (3)
9-12: LGTM!The language constant is well-defined with standard ISO 639-1 codes for Parakeet v3 supported languages.
19-44: LGTM!The
adapt_paramsimplementation correctly normalizes languages to single-language mode:
- Parakeet v2: English only
- Parakeet v3: First matching supported language
- Other models: First provided language
The fallback to English ensures graceful handling when no matching language is found.
51-54: LGTM!The
build_ws_urlcorrectly applies parameter adaptation before delegating to the inner adapter.owhisper/owhisper-client/src/adapter/deepgram.rs (2)
18-34: LGTM!The
can_use_multilogic correctly gates multi-language mode:
- Requires at least 2 languages
- Requires nova-2 or nova-3 model prefix
- Validates all languages are in the model's supported set
36-68: Well-structured language query logic.The implementation correctly handles all cases per PR objectives:
- 0 languages → auto-detection
- 1 language → explicit language code
- 2+ languages →
language=multifor supported nova models, otherwise detection with hintsMinor: The
if letguard on line 47 is technically redundant after matchinglen() == 1, but it's harmless and defensive.
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (2)
owhisper/owhisper-client/src/adapter/deepgram.rs (2)
14-15: Document intentional NOVA‑2 multi‑language restriction
can_use_multigateslanguage=multibased onNOVA2_MULTI_LANGS/NOVA3_MULTI_LANGS, withNOVA2_MULTI_LANGSlimited to["en", "es"]. This matches the PR description (“Nova‑2:language=multionly for English+Spanish combinations”) but is much stricter than Deepgram’s documented overall language support for Nova‑2.To avoid future confusion when someone compares this to provider docs, I’d recommend:
- Adding a brief comment next to
NOVA2_MULTI_LANGSexplaining that the list is intentionally restricted (e.g., for MVP or product constraints), and- Optionally adding a TODO pointing to the place in the codebase where broader Nova‑2 language support would be wired in later.
This is mostly about maintainability and expectations, not behavior.
Also applies to: 200-216, 236-268
317-335: Remove hardcoded Deepgram API key from testThe
test_clientintegration test still embeds a literal Deepgram API key in source:.api_key("71557216ffdd13bff22702be5017e4852c052b7c")This is a serious security risk if it’s (or was ever) a real key: it can be scraped from the repo and abused, and it also encourages a pattern of committing secrets in code.
Please:
- Remove the literal key and load it from configuration (env var, test fixture, or CI secret), and
- Rotate/revoke the exposed key in Deepgram if it was ever valid.
🧹 Nitpick comments (1)
owhisper/owhisper-client/src/adapter/soniox.rs (1)
93-119: DeduplicateContextdefinition between HTTP and WS pathsThe
Contextstruct and thecontext-building logic are duplicated in bothcreate_transcriptionandinitial_message. Behavior is fine, but this duplication makes it easy for the two payload shapes to drift if Soniox changes their context schema.Consider lifting
Contextto a private module‑level struct and reusing the same helper to buildOption<Context>fromparams.keywordsfor both the HTTP and WebSocket requests.Also applies to: 389-433
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
apps/desktop/src/components/settings/general/index.tsx(2 hunks)apps/desktop/src/components/settings/general/spoken-languages.tsx(6 hunks)owhisper/owhisper-client/src/adapter/deepgram.rs(2 hunks)owhisper/owhisper-client/src/adapter/soniox.rs(5 hunks)owhisper/owhisper-client/src/lib.rs(0 hunks)
💤 Files with no reviewable changes (1)
- owhisper/owhisper-client/src/lib.rs
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{ts,tsx}
📄 CodeRabbit inference engine (AGENTS.md)
**/*.{ts,tsx}: Avoid creating a bunch of types/interfaces if they are not shared. Especially for function props, just inline them instead.
Never do manual state management for form/mutation. Use useForm (from tanstack-form) and useQuery/useMutation (from tanstack-query) instead for 99% of cases. Avoid patterns like setError.
If there are many classNames with conditional logic, usecn(import from@hypr/utils). It is similar toclsx. Always pass an array and split by logical grouping.
Usemotion/reactinstead offramer-motion.
Files:
apps/desktop/src/components/settings/general/spoken-languages.tsxapps/desktop/src/components/settings/general/index.tsx
🧬 Code graph analysis (3)
apps/desktop/src/components/settings/general/spoken-languages.tsx (1)
extensions/shared/types/hypr-extension.d.ts (2)
Badge(133-133)Button(159-159)
apps/desktop/src/components/settings/general/index.tsx (1)
apps/desktop/src/components/settings/general/spoken-languages.tsx (1)
SpokenLanguagesView(22-190)
owhisper/owhisper-client/src/adapter/deepgram.rs (2)
owhisper/owhisper-client/src/lib.rs (1)
params(43-46)crates/whisper-local/src/model/mod.rs (1)
language(26-28)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
- GitHub Check: Redirect rules - hyprnote
- GitHub Check: Header rules - hyprnote
- GitHub Check: Pages changed - hyprnote
- GitHub Check: desktop_ci (linux, depot-ubuntu-22.04-8)
- GitHub Check: desktop_ci (linux, depot-ubuntu-24.04-8)
- GitHub Check: desktop_ci (macos, depot-macos-14)
- GitHub Check: fmt
🔇 Additional comments (2)
apps/desktop/src/components/settings/general/index.tsx (1)
12-12: LGTM! Clean integration following established patterns.The SpokenLanguagesView integration mirrors the MainLanguageView pattern above it and correctly wires the form field with proper value/onChange handlers. The form setup properly includes spoken_languages in config fetching (line 21), default values (line 49), and JSON serialization (lines 28-30).
Also applies to: 138-146
apps/desktop/src/components/settings/general/spoken-languages.tsx (1)
17-20: LGTM! Solid refactoring to code-based language handling.The refactoring consistently shifts from name-based to code-based language handling throughout:
- Helper function (lines 17-20): Provides defensive fallback when looking up language names
- Filtering (lines 36-41): Correctly filters by code and searches by name
- Selection handlers (lines 69-70, 162): Emit language codes instead of names
- Rendering (lines 97-116, 154-178): Uses codes as keys and displays names via the helper
The component preserves all accessibility features (ARIA attributes, keyboard navigation) and follows React best practices. The shift to code-based handling aligns with the broader PR goal of improving language management across adapters.
Also applies to: 36-41, 69-70, 97-116, 154-178
Summary
Implements simplified "best effort" language handling for STT adapters as discussed. Each adapter now handles languages according to its provider's semantics:
language=multifor English+Spanish combinationslanguage=multifor 10 supported languages (en, es, fr, de, hi, ru, pt, ja, it, nl)detect_language=truewith language hintsFrontend cleanup: Removed the unused
LANGUAGE_SUPPORTmap (~240 lines) and related type definitions fromshared.tsx. This was dead code that duplicated language info now handled by the Rust backend.Code organization: Moved Deepgram-specific language constants (
NOVA2_MULTI_LANGS,NOVA3_MULTI_LANGS) andappend_language_queryfunction fromlib.rstodeepgram.rswhere they belong.Review & Testing Checklist for Human
contains("parakeet")+contains("v2")for Parakeet v2,starts_with("nova-3")/starts_with("nova-2")for Deepgramlanguage=multiis sentlanguage=multi, and with en+fr to confirm fallback todetect_language=trueLANGUAGE_SUPPORT(grep shows it was only defined, never imported elsewhere)Recommended test plan: Run the desktop app with different STT configurations and check the WebSocket URLs being constructed (can add temporary logging or check network traffic).
Notes
ArgmaxAdapter::transcribe_filestill delegates directly to the inner adapter without adapting params - this may need attention if batch transcription with Argmax is used.Link to Devin run: https://app.devin.ai/sessions/5ac816cc0deb401c99c5d1d935de7671
Requested by: yujonglee (@yujonglee)