Skip to content

feat(stt): improve language handling per adapter#2058

Merged
yujonglee merged 4 commits intomainfrom
devin/1764647712-stt-language-handling
Dec 2, 2025
Merged

feat(stt): improve language handling per adapter#2058
yujonglee merged 4 commits intomainfrom
devin/1764647712-stt-language-handling

Conversation

@yujonglee
Copy link
Contributor

@yujonglee yujonglee commented Dec 2, 2025

Summary

Implements simplified "best effort" language handling for STT adapters as discussed. Each adapter now handles languages according to its provider's semantics:

  • Argmax: Model-aware language selection:
    • Parakeet v2: Always uses English (model is English-only)
    • Parakeet v3: Uses first supported language from the 25-language list, or falls back to English
    • Other models: Uses first provided language or defaults to English
  • Deepgram: Model-aware multi-language support with stricter validation:
    • Nova-2: Only uses language=multi for English+Spanish combinations
    • Nova-3: Uses language=multi for 10 supported languages (en, es, fr, de, hi, ru, pt, ja, it, nl)
    • Other cases: Falls back to detect_language=true with language hints
  • Soniox: No changes needed (already uses language hints correctly)

Frontend cleanup: Removed the unused LANGUAGE_SUPPORT map (~240 lines) and related type definitions from shared.tsx. This was dead code that duplicated language info now handled by the Rust backend.

Code organization: Moved Deepgram-specific language constants (NOVA2_MULTI_LANGS, NOVA3_MULTI_LANGS) and append_language_query function from lib.rs to deepgram.rs where they belong.

Review & Testing Checklist for Human

  • Verify model string matching covers your use cases: contains("parakeet") + contains("v2") for Parakeet v2, starts_with("nova-3") / starts_with("nova-2") for Deepgram
  • Test Deepgram nova-3 with multiple supported languages (e.g., en+es+fr) to confirm language=multi is sent
  • Test Deepgram nova-2 with en+es to confirm language=multi, and with en+fr to confirm fallback to detect_language=true
  • Test Argmax with parakeet-v2 model to confirm it always uses English regardless of input
  • Confirm nothing in the frontend was actually using LANGUAGE_SUPPORT (grep shows it was only defined, never imported elsewhere)

Recommended test plan: Run the desktop app with different STT configurations and check the WebSocket URLs being constructed (can add temporary logging or check network traffic).

Notes

  • ArgmaxAdapter::transcribe_file still delegates directly to the inner adapter without adapting params - this may need attention if batch transcription with Argmax is used.
  • Language constants are hardcoded in the adapters. If providers add new multi-language support, these lists will need manual updates.

Link to Devin run: https://app.devin.ai/sessions/5ac816cc0deb401c99c5d1d935de7671
Requested by: yujonglee (@yujonglee)

- Argmax: use only first language (single language mode)
- Deepgram: model-aware multi vs detect-restricted (nova-2/3 support multi)

Co-Authored-By: yujonglee <yujonglee.dev@gmail.com>
@devin-ai-integration
Copy link
Contributor

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR that start with 'DevinAI' or '@devin'.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

⚙️ Control Options:

  • Disable automatic comment and CI monitoring

@netlify
Copy link

netlify bot commented Dec 2, 2025

Deploy Preview for hyprnote ready!

Name Link
🔨 Latest commit a376f71
🔍 Latest deploy log https://app.netlify.com/projects/hyprnote/deploys/692e6ce5d4d1a000083d8b5b
😎 Deploy Preview https://deploy-preview-2058--hyprnote.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify project configuration.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Dec 2, 2025

📝 Walkthrough

Walkthrough

Normalize ListenParams.languages per-model in argmax; move language-query construction into the Deepgram adapter with model-aware multi-language logic; remove a large frontend language-support map and delete former helper query builders from the client library. No public API changes.

Changes

Cohort / File(s) Summary
Argmax adapter parameter normalization
owhisper/owhisper-client/src/adapter/argmax.rs
Add private ArgmaxAdapter::adapt_params to normalize ListenParams.languages to a single language (parakeet v2/v3 logic). build_ws_url now adapts params before delegating to Deepgram adapter.
Deepgram adapter language & keyword query logic
owhisper/owhisper-client/src/adapter/deepgram.rs
Add model-specific language lists (NOVA2_MULTI_LANGS, NOVA3_MULTI_LANGS), can_use_multi helper, and pub(crate) append_language_query to construct language query params for 0/1/2+ languages with model-aware branching; add internal append_keyword_query handling and update imports.
Client library helpers removed
owhisper/owhisper-client/src/lib.rs
Remove previously exported helper functions that built language and keyword URL query parameters (append_language_query, append_keyword_query) and their imports.
Soniox: context/keywords serialization
owhisper/owhisper-client/src/adapter/soniox.rs
Add local Context struct and optional context fields; populate context from ListenParams.keywords, skipping serialization when empty.
Frontend: language settings refactor
apps/desktop/src/components/settings/general/spoken-languages.tsx
Switch UI logic to operate on language codes (add getLanguageName(code)), store codes in state, and render names via the helper.
Frontend: remove large language-support map
apps/desktop/src/components/settings/ai/stt/shared.tsx
Delete ProviderModels and LanguageSupportMap types and the LANGUAGE_SUPPORT constant (per-provider language mapping removed).
Frontend: integrate spoken languages view
apps/desktop/src/components/settings/general/index.tsx
Import and render SpokenLanguagesView within Language & Vocabulary section, wired to spoken_languages form field.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

  • Inspect NOVA2_MULTI_LANGS / NOVA3_MULTI_LANGS contents for accuracy and completeness.
  • Verify can_use_multi logic correctly matches model identifiers and handles prefixes/variants.
  • Confirm ArgmaxAdapter::adapt_params preserves required language info and does not remove needed languages for other adapters.
  • Review interplay between Deepgram append_language_query and append_keyword_query to ensure query names/format match each model's expectations.
  • Check Soniox context serialization to ensure empty keywords are omitted and non-empty keywords are serialized as intended.

Possibly related PRs

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title 'feat(stt): improve language handling per adapter' directly and clearly summarizes the main change—implementing adapter-specific language handling for STT providers.
Description check ✅ Passed The description is comprehensive and directly related to the changeset, detailing the language handling improvements for each adapter, frontend cleanup, code organization changes, and testing guidance.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch devin/1764647712-stt-language-handling

Comment @coderabbitai help to get the list of available commands and usage tips.

@netlify
Copy link

netlify bot commented Dec 2, 2025

Deploy Preview for hyprnote-storybook ready!

Name Link
🔨 Latest commit a376f71
🔍 Latest deploy log https://app.netlify.com/projects/hyprnote-storybook/deploys/692e6ce5d3d88d000897b3e6
😎 Deploy Preview https://deploy-preview-2058--hyprnote-storybook.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify project configuration.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
owhisper/owhisper-client/src/adapter/argmax.rs (1)

14-22: Add documentation explaining the single-language constraint.

The adapt_params method silently discards all but the first language without explanation or warning. Consider:

  1. Adding a doc comment explaining why ArgmaxAdapter supports only one language
  2. Optionally logging a warning when multiple languages are provided but truncated

Apply this diff to add documentation:

+    /// Adapts ListenParams to conform to ArgmaxAdapter's single-language constraint.
+    /// If multiple languages are provided, only the first is retained.
     fn adapt_params(params: &ListenParams) -> ListenParams {
         let mut adapted = params.clone();
         if let Some(first) = adapted.languages.first().cloned() {
             adapted.languages = vec![first];
         }
         adapted
     }
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 71bf5f8 and 44bd64e.

📒 Files selected for processing (2)
  • owhisper/owhisper-client/src/adapter/argmax.rs (1 hunks)
  • owhisper/owhisper-client/src/lib.rs (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
owhisper/owhisper-client/src/lib.rs (2)
crates/transcribe-whisper-local/src/service/streaming.rs (1)
  • params (97-101)
crates/whisper-local/src/model/mod.rs (1)
  • language (26-28)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Redirect rules - hyprnote
  • GitHub Check: Header rules - hyprnote
  • GitHub Check: Pages changed - hyprnote
  • GitHub Check: Devin
  • GitHub Check: fmt
🔇 Additional comments (2)
owhisper/owhisper-client/src/adapter/argmax.rs (1)

29-31: LGTM!

The delegation pattern correctly applies parameter adaptation before passing to the inner adapter.

owhisper/owhisper-client/src/lib.rs (1)

144-157: Verify language query behavior across adapters.

The multi-language logic correctly branches on supports_multi:

  • Models with nova-2/nova-3 prefix: uses language=multi with all language codes
  • Other models: uses detect_language=true with all language codes

For ArgmaxAdapter, the adapt_params normalization filters parameters to the first language before delegating to DeepgramAdapter. DeepgramAdapter then calls append_language_query with the normalized single-language parameters, ensuring params.languages.len() == 1. Consequently, the multi-language branch (lines 144-157) never executes for ArgmaxAdapter—only the single-language path (lines 137-142) is reached.

…GUAGE_SUPPORT

- Deepgram: nova-2 multi only for en+es, nova-3 multi for 10 langs, else detect_language with hints
- Argmax: parakeet v2 always en, parakeet v3 uses first supported lang or falls back to en
- Move language constants to deepgram.rs where they belong
- Remove unused LANGUAGE_SUPPORT map from frontend shared.tsx

Co-Authored-By: yujonglee <yujonglee.dev@gmail.com>
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
owhisper/owhisper-client/src/adapter/argmax.rs (1)

72-82: Inconsistent parameter handling between streaming and batch transcription.

build_ws_url applies adapt_params to normalize languages, but transcribe_file passes params directly to the inner adapter without adaptation. This means:

  • Streaming: Languages normalized to single language for Parakeet models
  • Batch: Original multi-language params passed through unchanged

If this is intentional (e.g., batch uses different backend), please document it. Otherwise, apply adaptation for consistency:

 fn transcribe_file<'a, P: AsRef<Path> + Send + 'a>(
     &'a self,
     client: &'a reqwest::Client,
     api_base: &'a str,
     api_key: &'a str,
     params: &'a ListenParams,
     file_path: P,
 ) -> BatchFuture<'a> {
+    let adapted = Self::adapt_params(params);
     self.inner
-        .transcribe_file(client, api_base, api_key, params, file_path)
+        .transcribe_file(client, api_base, api_key, &adapted, file_path)
 }

Note: This would require changing the lifetime constraints since adapted is locally owned.

owhisper/owhisper-client/src/adapter/deepgram.rs (1)

315-326: Critical: Hardcoded API key in source code.

The API key on line 317 should not be committed to version control. Even for test purposes, this poses security and cost risks if the key is valid.

Replace with environment variable:

         let client = ListenClient::builder()
             .api_base("https://api.deepgram.com/v1")
-            .api_key("71557216ffdd13bff22702be5017e4852c052b7c")
+            .api_key(std::env::var("DEEPGRAM_API_KEY").expect("DEEPGRAM_API_KEY not set"))
             .params(owhisper_interface::ListenParams {

Additionally, consider adding this test to an #[ignore] attribute since it requires external API access:

#[tokio::test]
#[ignore] // Requires DEEPGRAM_API_KEY environment variable
async fn test_client() {
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 44bd64e and fedb7d6.

📒 Files selected for processing (4)
  • apps/desktop/src/components/settings/ai/stt/shared.tsx (0 hunks)
  • owhisper/owhisper-client/src/adapter/argmax.rs (1 hunks)
  • owhisper/owhisper-client/src/adapter/deepgram.rs (1 hunks)
  • owhisper/owhisper-client/src/lib.rs (0 hunks)
💤 Files with no reviewable changes (2)
  • apps/desktop/src/components/settings/ai/stt/shared.tsx
  • owhisper/owhisper-client/src/lib.rs
🧰 Additional context used
🧬 Code graph analysis (1)
owhisper/owhisper-client/src/adapter/argmax.rs (4)
owhisper/owhisper-client/src/lib.rs (2)
  • params (46-49)
  • api_base (36-39)
owhisper/owhisper-client/src/adapter/deepgram.rs (2)
  • supports_native_multichannel (166-168)
  • build_ws_url (170-212)
owhisper/owhisper-client/src/adapter/mod.rs (2)
  • supports_native_multichannel (24-24)
  • build_ws_url (26-26)
owhisper/owhisper-client/src/adapter/soniox.rs (2)
  • supports_native_multichannel (327-329)
  • build_ws_url (331-348)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
  • GitHub Check: Redirect rules - hyprnote
  • GitHub Check: Header rules - hyprnote
  • GitHub Check: desktop_ci (linux, depot-ubuntu-24.04-8)
  • GitHub Check: fmt
  • GitHub Check: desktop_ci (linux, depot-ubuntu-22.04-8)
  • GitHub Check: desktop_ci (macos, depot-macos-14)
  • GitHub Check: Pages changed - hyprnote
  • GitHub Check: Devin
🔇 Additional comments (5)
owhisper/owhisper-client/src/adapter/argmax.rs (3)

9-12: LGTM!

The language constant is well-defined with standard ISO 639-1 codes for Parakeet v3 supported languages.


19-44: LGTM!

The adapt_params implementation correctly normalizes languages to single-language mode:

  • Parakeet v2: English only
  • Parakeet v3: First matching supported language
  • Other models: First provided language

The fallback to English ensures graceful handling when no matching language is found.


51-54: LGTM!

The build_ws_url correctly applies parameter adaptation before delegating to the inner adapter.

owhisper/owhisper-client/src/adapter/deepgram.rs (2)

18-34: LGTM!

The can_use_multi logic correctly gates multi-language mode:

  1. Requires at least 2 languages
  2. Requires nova-2 or nova-3 model prefix
  3. Validates all languages are in the model's supported set

36-68: Well-structured language query logic.

The implementation correctly handles all cases per PR objectives:

  • 0 languages → auto-detection
  • 1 language → explicit language code
  • 2+ languages → language=multi for supported nova models, otherwise detection with hints

Minor: The if let guard on line 47 is technically redundant after matching len() == 1, but it's harmless and defensive.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (2)
owhisper/owhisper-client/src/adapter/deepgram.rs (2)

14-15: Document intentional NOVA‑2 multi‑language restriction

can_use_multi gates language=multi based on NOVA2_MULTI_LANGS / NOVA3_MULTI_LANGS, with NOVA2_MULTI_LANGS limited to ["en", "es"]. This matches the PR description (“Nova‑2: language=multi only for English+Spanish combinations”) but is much stricter than Deepgram’s documented overall language support for Nova‑2.

To avoid future confusion when someone compares this to provider docs, I’d recommend:

  • Adding a brief comment next to NOVA2_MULTI_LANGS explaining that the list is intentionally restricted (e.g., for MVP or product constraints), and
  • Optionally adding a TODO pointing to the place in the codebase where broader Nova‑2 language support would be wired in later.

This is mostly about maintainability and expectations, not behavior.

Also applies to: 200-216, 236-268


317-335: Remove hardcoded Deepgram API key from test

The test_client integration test still embeds a literal Deepgram API key in source:

.api_key("71557216ffdd13bff22702be5017e4852c052b7c")

This is a serious security risk if it’s (or was ever) a real key: it can be scraped from the repo and abused, and it also encourages a pattern of committing secrets in code.

Please:

  • Remove the literal key and load it from configuration (env var, test fixture, or CI secret), and
  • Rotate/revoke the exposed key in Deepgram if it was ever valid.
🧹 Nitpick comments (1)
owhisper/owhisper-client/src/adapter/soniox.rs (1)

93-119: Deduplicate Context definition between HTTP and WS paths

The Context struct and the context-building logic are duplicated in both create_transcription and initial_message. Behavior is fine, but this duplication makes it easy for the two payload shapes to drift if Soniox changes their context schema.

Consider lifting Context to a private module‑level struct and reusing the same helper to build Option<Context> from params.keywords for both the HTTP and WebSocket requests.

Also applies to: 389-433

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between fedb7d6 and a376f71.

📒 Files selected for processing (5)
  • apps/desktop/src/components/settings/general/index.tsx (2 hunks)
  • apps/desktop/src/components/settings/general/spoken-languages.tsx (6 hunks)
  • owhisper/owhisper-client/src/adapter/deepgram.rs (2 hunks)
  • owhisper/owhisper-client/src/adapter/soniox.rs (5 hunks)
  • owhisper/owhisper-client/src/lib.rs (0 hunks)
💤 Files with no reviewable changes (1)
  • owhisper/owhisper-client/src/lib.rs
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{ts,tsx}

📄 CodeRabbit inference engine (AGENTS.md)

**/*.{ts,tsx}: Avoid creating a bunch of types/interfaces if they are not shared. Especially for function props, just inline them instead.
Never do manual state management for form/mutation. Use useForm (from tanstack-form) and useQuery/useMutation (from tanstack-query) instead for 99% of cases. Avoid patterns like setError.
If there are many classNames with conditional logic, use cn (import from @hypr/utils). It is similar to clsx. Always pass an array and split by logical grouping.
Use motion/react instead of framer-motion.

Files:

  • apps/desktop/src/components/settings/general/spoken-languages.tsx
  • apps/desktop/src/components/settings/general/index.tsx
🧬 Code graph analysis (3)
apps/desktop/src/components/settings/general/spoken-languages.tsx (1)
extensions/shared/types/hypr-extension.d.ts (2)
  • Badge (133-133)
  • Button (159-159)
apps/desktop/src/components/settings/general/index.tsx (1)
apps/desktop/src/components/settings/general/spoken-languages.tsx (1)
  • SpokenLanguagesView (22-190)
owhisper/owhisper-client/src/adapter/deepgram.rs (2)
owhisper/owhisper-client/src/lib.rs (1)
  • params (43-46)
crates/whisper-local/src/model/mod.rs (1)
  • language (26-28)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
  • GitHub Check: Redirect rules - hyprnote
  • GitHub Check: Header rules - hyprnote
  • GitHub Check: Pages changed - hyprnote
  • GitHub Check: desktop_ci (linux, depot-ubuntu-22.04-8)
  • GitHub Check: desktop_ci (linux, depot-ubuntu-24.04-8)
  • GitHub Check: desktop_ci (macos, depot-macos-14)
  • GitHub Check: fmt
🔇 Additional comments (2)
apps/desktop/src/components/settings/general/index.tsx (1)

12-12: LGTM! Clean integration following established patterns.

The SpokenLanguagesView integration mirrors the MainLanguageView pattern above it and correctly wires the form field with proper value/onChange handlers. The form setup properly includes spoken_languages in config fetching (line 21), default values (line 49), and JSON serialization (lines 28-30).

Also applies to: 138-146

apps/desktop/src/components/settings/general/spoken-languages.tsx (1)

17-20: LGTM! Solid refactoring to code-based language handling.

The refactoring consistently shifts from name-based to code-based language handling throughout:

  • Helper function (lines 17-20): Provides defensive fallback when looking up language names
  • Filtering (lines 36-41): Correctly filters by code and searches by name
  • Selection handlers (lines 69-70, 162): Emit language codes instead of names
  • Rendering (lines 97-116, 154-178): Uses codes as keys and displays names via the helper

The component preserves all accessibility features (ARIA attributes, keyboard navigation) and follows React best practices. The shift to code-based handling aligns with the broader PR goal of improving language management across adapters.

Also applies to: 36-41, 69-70, 97-116, 154-178

@yujonglee yujonglee merged commit 55f8ffd into main Dec 2, 2025
16 checks passed
@yujonglee yujonglee deleted the devin/1764647712-stt-language-handling branch December 2, 2025 05:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant