
feat: add Soniox STT provider to settings#2060

Merged
yujonglee merged 6 commits into main from devin/1764653410-add-soniox-stt on Dec 2, 2025


Conversation

@yujonglee
Contributor

@yujonglee yujonglee commented Dec 2, 2025

Summary

Adds Soniox as a new STT provider option in the desktop app settings. This includes the Soniox logo asset and provider configuration following the existing pattern for Deepgram and other providers.

Changes:

  • Added soniox.jpeg logo to apps/desktop/public/assets/
  • Added Soniox provider entry in shared.tsx with stt-v3 model
  • Added displayModelId mapping to show "Soniox v3" in the UI
  • Added Soniox context description in configure.tsx
  • Updated Soniox adapters to map stt-v3 to the appropriate API model names:
    • live.rs: maps stt-v3 → stt-rt-v3 for realtime transcription
    • batch.rs: maps stt-v3 → stt-async-v3 for batch transcription

Review & Testing Checklist for Human

  • Verify stt-rt-v3 and stt-async-v3 are valid Soniox model names per Soniox docs
  • Verify the Soniox logo displays correctly in the provider selector (size-5 rounded styling)
  • Test end-to-end: configure Soniox API key, select "Soniox v3" model, and verify realtime transcription works
  • If batch transcription is used in the app, verify that flow also works with Soniox

Notes

This PR is based on the refactor-adapters branch (PR #2059) as requested.

The model mapping approach was chosen because Soniox uses different model names for realtime (stt-rt-v3) vs batch (stt-async-v3), unlike Deepgram which uses the same model for both. The UI exposes a single stt-v3 option and the adapters handle the translation.
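A minimal sketch of that translation, for illustration only. The three model names come from this PR; the Endpoint enum, the soniox_model helper, and passing unknown identifiers through unchanged are assumptions, not the adapters' actual code:

```rust
// Hypothetical sketch: only "stt-v3", "stt-rt-v3" and "stt-async-v3" come from the PR.
enum Endpoint {
    Realtime, // corresponds to live.rs
    Batch,    // corresponds to batch.rs
}

// Translate the single UI-facing model id into the endpoint-specific Soniox name.
fn soniox_model(ui_model: &str, endpoint: Endpoint) -> &str {
    match (ui_model, endpoint) {
        ("stt-v3", Endpoint::Realtime) => "stt-rt-v3",
        ("stt-v3", Endpoint::Batch) => "stt-async-v3",
        // Assumption: any other id is forwarded to the API as-is.
        (other, _) => other,
    }
}

fn main() {
    println!("{}", soniox_model("stt-v3", Endpoint::Realtime)); // stt-rt-v3
    println!("{}", soniox_model("stt-v3", Endpoint::Batch)); // stt-async-v3
}
```

Because the match is on the endpoint as well as the id, the UI never needs to know that Soniox splits realtime and batch into separate models.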


Link to Devin run: https://app.devin.ai/sessions/f857a3f230434654b027a7ab2b183b85
Requested by: yujonglee (@yujonglee)

yujonglee and others added 3 commits December 2, 2025 14:18
Co-Authored-By: yujonglee <yujonglee.dev@gmail.com>
@devin-ai-integration
Contributor

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR that start with 'DevinAI' or '@devin'.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

⚙️ Control Options:

  • Disable automatic comment and CI monitoring

@coderabbitai
Contributor

coderabbitai bot commented Dec 2, 2025

📝 Walkthrough


Adds Soniox speech-to-text provider support through desktop UI configuration and updates the RealtimeSttAdapter trait signature across all implementations to return Vec<StreamResponse> instead of Option<StreamResponse>, enabling multi-response emission per streamed input.

Changes

| Cohort | File(s) | Summary |
|---|---|---|
| Desktop UI: Soniox Provider Configuration | apps/desktop/src/components/settings/ai/stt/configure.tsx, apps/desktop/src/components/settings/ai/stt/shared.tsx | Added conditional branch for Soniox provider in ProviderContext messaging and introduced Soniox provider object with id, displayName, icon, baseUrl, and model mappings in PROVIDERS array. Updated displayModelId to handle "stt-v3" model identifier. |
| Adapter Trait Signature Change | owhisper/owhisper-client/src/adapter/mod.rs | Changed RealtimeSttAdapter trait method parse_response return type from Option<StreamResponse> to Vec<StreamResponse>. |
| Deepgram & Argmax Adapter Implementations | owhisper/owhisper-client/src/adapter/deepgram/live.rs, owhisper/owhisper-client/src/adapter/argmax/live.rs | Updated parse_response implementations to return Vec, converting JSON parsing results using .into_iter().collect() instead of .ok() for error handling. |
| Soniox Adapter: Batch Mode | owhisper/owhisper-client/src/adapter/soniox/batch.rs | Modified model selection logic to default to "stt-v3", mapping both "stt-v3" and "stt-async-preview" to the "stt-async-v3" backend model. |
| Soniox Adapter: Live Streaming | owhisper/owhisper-client/src/adapter/soniox/live.rs | Implemented parse_response returning Vec, added build_response helper method, restructured token/response processing for final and non-final tokens, changed multichannel support to false, updated model mapping ("stt-v3" → "stt-rt-v3"), and added Debug derives to Token, SpeakerId, and SonioxMessage structs. |
| Stream Response Processing | owhisper/owhisper-client/src/live.rs | Replaced per-item asynchronous mapping with flat_map + iterator pattern, converting each adapter parse_response result into a stream via futures_util::stream::iter to handle multiple responses per input across all listen paths. |
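To make the Option → Vec trait change concrete, here is a simplified sketch. StreamResponse and RealtimeSttAdapter below are hypothetical stand-ins with only two fields and one method, and SplitAdapter is an invented toy adapter; the point is only that one raw message can now produce zero, one, or several responses:

```rust
// Hypothetical stand-ins; the real types in owhisper-client carry much more.
#[derive(Debug, Clone, PartialEq)]
struct StreamResponse {
    transcript: String,
    is_final: bool,
}

trait RealtimeSttAdapter {
    // Before this PR: fn parse_response(&self, raw: &str) -> Option<StreamResponse>;
    fn parse_response(&self, raw: &str) -> Vec<StreamResponse>;
}

// Toy adapter: a raw message "confirmed|interim" carries both a final and a
// non-final segment, which an Option return type could not express.
struct SplitAdapter;

impl RealtimeSttAdapter for SplitAdapter {
    fn parse_response(&self, raw: &str) -> Vec<StreamResponse> {
        let (confirmed, interim) = raw.split_once('|').unwrap_or((raw, ""));
        let mut out = Vec::new();
        if !confirmed.is_empty() {
            out.push(StreamResponse { transcript: confirmed.to_string(), is_final: true });
        }
        if !interim.is_empty() {
            out.push(StreamResponse { transcript: interim.to_string(), is_final: false });
        }
        // An empty Vec now plays the role that None used to.
        out
    }
}

fn main() {
    let adapter = SplitAdapter;
    let responses = adapter.parse_response("hello|wor");
    assert_eq!(responses.len(), 2);
    assert!(adapter.parse_response("").is_empty());
    println!("{:?}", responses);
}
```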

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

  • soniox/live.rs: Review response construction logic, token aggregation, and build_response helper implementation for correctness
  • owhisper-client/src/live.rs: Verify flat_map + iter stream flattening pattern consistency across all listen paths (mic/speaker/dual streams)
  • Adapter implementations: Confirm all parse_response implementations correctly handle empty/single-element vectors and maintain error semantics
  • Desktop UI integration: Validate Soniox provider configuration matches backend adapter expectations

Possibly related PRs

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 5.56% which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title 'feat: add Soniox STT provider to settings' clearly and concisely describes the main change: adding Soniox as a new STT provider option in the settings. |
| Description check | ✅ Passed | The description is well-detailed and directly related to the changeset, providing context about the Soniox provider addition, model mappings, testing checklist, and rationale for design decisions. |

devin-ai-integration bot and others added 2 commits December 2, 2025 05:40
Co-Authored-By: yujonglee <yujonglee.dev@gmail.com>
Base automatically changed from refactor-adapters to main December 2, 2025 06:55
@netlify

netlify bot commented Dec 2, 2025

Deploy Preview for hyprnote-storybook ready!

| Name | Link |
|---|---|
| 🔨 Latest commit | 4253bf3 |
| 🔍 Latest deploy log | https://app.netlify.com/projects/hyprnote-storybook/deploys/692e8deae80ead0008f0b1d4 |
| 😎 Deploy Preview | https://deploy-preview-2060--hyprnote-storybook.netlify.app |

To edit notification comments on pull requests, go to your Netlify project configuration.

@netlify

netlify bot commented Dec 2, 2025

Deploy Preview for hyprnote ready!

| Name | Link |
|---|---|
| 🔨 Latest commit | 4253bf3 |
| 🔍 Latest deploy log | https://app.netlify.com/projects/hyprnote/deploys/692e8dea4b984700089e595f |
| 😎 Deploy Preview | https://deploy-preview-2060--hyprnote.netlify.app |


Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
owhisper/owhisper-client/src/adapter/deepgram/live.rs (1)

77-78: Consider logging JSON parse failures for debugging.

The current implementation silently returns an empty Vec on parse errors. For consistency with the Soniox adapter (which logs parse failures), consider adding a warning log:

```diff
     fn parse_response(&self, raw: &str) -> Vec<StreamResponse> {
-        serde_json::from_str(raw).into_iter().collect()
+        match serde_json::from_str(raw) {
+            Ok(response) => vec![response],
+            Err(e) => {
+                tracing::warn!(error = ?e, raw = raw, "deepgram_json_parse_failed");
+                vec![]
+            }
+        }
     }
```
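Why the one-liner is silent: Result implements IntoIterator, yielding its Ok value once and nothing for Err, so .into_iter().collect() turns a parse failure into an empty Vec without any trace. The sketch below uses str::parse as a dependency-free stand-in for serde_json::from_str (both return a Result) and eprintln! in place of tracing::warn!; the function names are hypothetical:

```rust
// Mirrors the current one-liner: Ok(v) becomes vec![v], Err(_) becomes vec![] silently.
fn parse_silent(raw: &str) -> Vec<i32> {
    raw.parse::<i32>().into_iter().collect()
}

// Mirrors the suggested change: same result, but the failure is reported first.
// eprintln! stands in for tracing::warn! to keep the sketch free of extra crates.
fn parse_logged(raw: &str) -> Vec<i32> {
    match raw.parse::<i32>() {
        Ok(value) => vec![value],
        Err(e) => {
            eprintln!("parse failed: {e} (raw = {raw:?})");
            Vec::new()
        }
    }
}

fn main() {
    assert_eq!(parse_silent("42"), vec![42]);
    assert!(parse_silent("not json").is_empty());
    assert_eq!(parse_logged("42"), vec![42]);
    assert!(parse_logged("not json").is_empty()); // also writes a warning to stderr
}
```

Both versions return the same values, so the suggestion changes observability only, not behavior.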
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2d7101e and 4253bf3.

⛔ Files ignored due to path filters (1)
  • apps/desktop/public/assets/soniox.jpeg is excluded by !**/*.jpeg
📒 Files selected for processing (8)
  • apps/desktop/src/components/settings/ai/stt/configure.tsx (1 hunks)
  • apps/desktop/src/components/settings/ai/stt/shared.tsx (2 hunks)
  • owhisper/owhisper-client/src/adapter/argmax/live.rs (1 hunks)
  • owhisper/owhisper-client/src/adapter/deepgram/live.rs (1 hunks)
  • owhisper/owhisper-client/src/adapter/mod.rs (1 hunks)
  • owhisper/owhisper-client/src/adapter/soniox/batch.rs (1 hunks)
  • owhisper/owhisper-client/src/adapter/soniox/live.rs (7 hunks)
  • owhisper/owhisper-client/src/live.rs (3 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{ts,tsx}

📄 CodeRabbit inference engine (AGENTS.md)

**/*.{ts,tsx}: Avoid creating a bunch of types/interfaces if they are not shared. Especially for function props, just inline them instead.
Never do manual state management for form/mutation. Use useForm (from tanstack-form) and useQuery/useMutation (from tanstack-query) instead for 99% of cases. Avoid patterns like setError.
If there are many classNames with conditional logic, use cn (import from @hypr/utils). It is similar to clsx. Always pass an array and split by logical grouping.
Use motion/react instead of framer-motion.

Files:

  • apps/desktop/src/components/settings/ai/stt/shared.tsx
  • apps/desktop/src/components/settings/ai/stt/configure.tsx
🧬 Code graph analysis (4)
owhisper/owhisper-client/src/adapter/argmax/live.rs (3)
  • owhisper/owhisper-client/src/adapter/deepgram/live.rs (1)
    • parse_response (77-79)
  • owhisper/owhisper-client/src/adapter/mod.rs (1)
    • parse_response (43-43)
  • owhisper/owhisper-client/src/adapter/soniox/live.rs (1)
    • parse_response (81-135)
owhisper/owhisper-client/src/adapter/soniox/batch.rs (1)
  • owhisper/owhisper-client/src/lib.rs (1)
    • params (45-48)
owhisper/owhisper-client/src/adapter/mod.rs (3)
  • owhisper/owhisper-client/src/adapter/argmax/live.rs (1)
    • parse_response (30-32)
  • owhisper/owhisper-client/src/adapter/deepgram/live.rs (1)
    • parse_response (77-79)
  • owhisper/owhisper-client/src/adapter/soniox/live.rs (1)
    • parse_response (81-135)
owhisper/owhisper-client/src/adapter/deepgram/live.rs (2)
  • owhisper/owhisper-client/src/adapter/argmax/live.rs (1)
    • parse_response (30-32)
  • owhisper/owhisper-client/src/adapter/mod.rs (1)
    • parse_response (43-43)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
  • GitHub Check: Redirect rules - hyprnote
  • GitHub Check: Header rules - hyprnote
  • GitHub Check: Pages changed - hyprnote
  • GitHub Check: desktop_ci (linux, depot-ubuntu-22.04-8)
  • GitHub Check: desktop_ci (macos, depot-macos-14)
  • GitHub Check: desktop_ci (linux, depot-ubuntu-24.04-8)
  • GitHub Check: fmt
🔇 Additional comments (14)
apps/desktop/src/components/settings/ai/stt/shared.tsx (2)

31-33: LGTM!

The displayModelId mapping for "stt-v3" to "Soniox v3" follows the existing pattern and is correctly placed before the generic prefix-based checks.


101-111: LGTM!

The Soniox provider entry is well-structured and follows the existing provider pattern. All required fields are properly defined.

apps/desktop/src/components/settings/ai/stt/configure.tsx (1)

464-468: LGTM!

The Soniox context message follows the existing pattern and integrates cleanly into the conditional chain. The message is concise and provides a helpful link.

owhisper/owhisper-client/src/adapter/argmax/live.rs (1)

30-32: LGTM!

The signature update correctly aligns with the trait change, and the delegation to inner.parse_response works seamlessly with the new return type.

owhisper/owhisper-client/src/adapter/mod.rs (1)

43-43: LGTM!

The trait signature change from Option<StreamResponse> to Vec<StreamResponse> is a well-reasoned design decision that enables adapters to emit multiple responses per input message—necessary for Soniox's separate final/non-final token handling.

owhisper/owhisper-client/src/live.rs (3)

202-209: LGTM!

The flat_map + iter pattern correctly expands the Vec<StreamResponse> into individual stream items while preserving error propagation.
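The real code flattens async streams with futures_util (flat_map plus stream::iter); the sketch below is a synchronous analogue using std iterators, so that the flattening and error pass-through can be checked in isolation. The types and the comma-splitting parse_response are hypothetical, and forwarding a transport error as a single Err item is an assumption about what "preserving error propagation" means here:

```rust
// Hypothetical simplified types for illustration.
#[derive(Debug, PartialEq)]
struct StreamResponse(String);

#[derive(Debug, PartialEq)]
struct WsError(String);

// Hypothetical parser: one raw message may yield zero, one, or several responses.
fn parse_response(raw: &str) -> Vec<StreamResponse> {
    raw.split(',')
        .filter(|s| !s.is_empty())
        .map(|s| StreamResponse(s.to_string()))
        .collect()
}

fn flatten_responses(messages: Vec<Result<String, WsError>>) -> Vec<Result<StreamResponse, WsError>> {
    messages
        .into_iter()
        .flat_map(|msg| {
            let items: Vec<Result<StreamResponse, WsError>> = match msg {
                // Every parsed response becomes its own Ok item.
                Ok(raw) => parse_response(&raw).into_iter().map(Ok).collect(),
                // A transport error is forwarded once, unchanged.
                Err(e) => vec![Err(e)],
            };
            items
        })
        .collect()
}

fn main() {
    let input = vec![
        Ok("a,b".to_string()),
        Err(WsError("closed".to_string())),
        Ok(String::new()), // parses to nothing, so it contributes no items
    ];
    let out = flatten_responses(input);
    assert_eq!(
        out,
        vec![
            Ok(StreamResponse("a".into())),
            Ok(StreamResponse("b".into())),
            Err(WsError("closed".into())),
        ]
    );
}
```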


244-251: LGTM!

Consistent application of the flat_map pattern for the native multichannel path.


284-306: LGTM!

The split path correctly applies the same flat_map pattern to both mic and speaker streams, maintaining consistency across all stream processing paths.

owhisper/owhisper-client/src/adapter/soniox/live.rs (5)

81-134: LGTM!

The parse_response implementation is well-structured:

  • Properly handles error messages and parse failures with logging
  • Correctly identifies speech finalization via <fin>/<end> tokens
  • Separates final and non-final tokens into distinct responses, enabling the UI to show interim results separately from confirmed transcriptions
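A rough sketch of the final/non-final split described in these bullets. The Token and Response shapes, the is_marker and join helpers, and the choice to set speech_final only on the final response are all assumptions for illustration; real Soniox tokens also carry timing, speaker, and confidence data:

```rust
// Hypothetical, simplified token and response shapes.
#[derive(Debug, Clone)]
struct Token {
    text: String,
    is_final: bool,
}

#[derive(Debug, PartialEq)]
struct Response {
    transcript: String,
    is_final: bool,
    speech_final: bool,
}

// Assumption: "<fin>" / "<end>" arrive as ordinary tokens and contribute no text.
fn is_marker(t: &Token) -> bool {
    t.text == "<fin>" || t.text == "<end>"
}

// Assumption: tokens carry their own leading spaces, so plain concatenation works.
fn join(group: &[&Token]) -> String {
    group.iter().map(|t| t.text.as_str()).collect()
}

fn split_tokens(tokens: &[Token]) -> Vec<Response> {
    let speech_final = tokens.iter().any(is_marker);
    // Drop marker tokens, then separate confirmed text from interim text.
    let (finals, interims): (Vec<&Token>, Vec<&Token>) =
        tokens.iter().filter(|t| !is_marker(t)).partition(|t| t.is_final);

    let mut out = Vec::new();
    if !finals.is_empty() {
        out.push(Response { transcript: join(&finals), is_final: true, speech_final });
    }
    if !interims.is_empty() {
        out.push(Response { transcript: join(&interims), is_final: false, speech_final: false });
    }
    out
}

fn main() {
    let tok = |text: &str, is_final: bool| Token { text: text.to_string(), is_final };
    let tokens = vec![tok("Hello", true), tok(" world", true), tok(" how", false), tok("<fin>", true)];
    let out = split_tokens(&tokens);
    assert_eq!(out.len(), 2);
    assert_eq!(out[0], Response { transcript: "Hello world".into(), is_final: true, speech_final: true });
    assert_eq!(out[1].transcript, " how");
}
```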

209-248: LGTM!

The build_response helper is well-implemented:

  • Correctly handles whitespace-only tokens (adds to transcript but skips word creation)
  • Properly converts milliseconds to seconds for timing
  • Calculates duration from first/last token timestamps
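The three behaviors listed for build_response can be sketched as follows. Field names such as start_ms and end_ms, and the Word and Response shapes, are hypothetical; only the behaviors (whitespace tokens kept in the transcript but not emitted as words, milliseconds converted to seconds, duration from the first and last token) come from the review:

```rust
// Hypothetical shapes for illustration.
#[derive(Debug, Clone)]
struct Token {
    text: String,
    start_ms: u64,
    end_ms: u64,
}

#[derive(Debug, PartialEq)]
struct Word {
    word: String,
    start: f64,
    end: f64,
}

#[derive(Debug, PartialEq)]
struct Response {
    transcript: String,
    words: Vec<Word>,
    start: f64,
    duration: f64,
}

fn ms_to_s(ms: u64) -> f64 {
    ms as f64 / 1000.0
}

fn build_response(tokens: &[Token]) -> Response {
    let mut transcript = String::new();
    let mut words = Vec::new();
    for t in tokens {
        // Every token contributes to the transcript, including bare whitespace.
        transcript.push_str(&t.text);
        // Whitespace-only tokens do not become words.
        if t.text.trim().is_empty() {
            continue;
        }
        words.push(Word { word: t.text.trim().to_string(), start: ms_to_s(t.start_ms), end: ms_to_s(t.end_ms) });
    }
    // Duration spans from the first token's start to the last token's end.
    let start = tokens.first().map_or(0.0, |t| ms_to_s(t.start_ms));
    let end = tokens.last().map_or(0.0, |t| ms_to_s(t.end_ms));
    Response { transcript, words, start, duration: end - start }
}

fn main() {
    let t = |text: &str, s: u64, e: u64| Token { text: text.to_string(), start_ms: s, end_ms: e };
    let r = build_response(&[t("Hi", 0, 500), t(" ", 500, 600), t("there", 600, 1100)]);
    assert_eq!(r.transcript, "Hi there");
    assert_eq!(r.words.len(), 2);
    assert!((r.duration - 1.1).abs() < 1e-9);
}
```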

272-325: LGTM!

Good addition of integration test coverage. The test demonstrates the expected usage pattern with build_dual() and validates the stream processing pipeline.


259-268: The hardcoded channel_index: [0, 1] is overwritten downstream by the split path.

Since supports_native_multichannel() returns false for Soniox, responses flow through the split path in owhisper/owhisper-client/src/live.rs where set_channel_index() is called: set_channel_index(0, 2) for microphone and set_channel_index(1, 2) for speaker. The hardcoded [0, 1] value is never actually used.
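A small sketch of why the hardcoded value never survives. It assumes, hypothetically, that channel_index holds [index, total] and that set_channel_index simply overwrites it; the struct here is a two-field stand-in, not the real StreamResponse:

```rust
// Hypothetical stand-in for the real response type.
#[derive(Debug, Clone, PartialEq)]
struct StreamResponse {
    transcript: String,
    // Assumption: [channel index, total channels].
    channel_index: Vec<i32>,
}

impl StreamResponse {
    // Hypothetical mirror of the split path's set_channel_index(index, total).
    fn set_channel_index(&mut self, index: i32, total: i32) {
        self.channel_index = vec![index, total];
    }
}

fn main() {
    // What the Soniox adapter emits, with the hardcoded [0, 1].
    let from_adapter = StreamResponse { transcript: "hi".into(), channel_index: vec![0, 1] };

    let mut mic = from_adapter.clone();
    mic.set_channel_index(0, 2);
    let mut speaker = from_adapter;
    speaker.set_channel_index(1, 2);

    // The adapter's value is replaced on both paths.
    assert_eq!(mic.channel_index, vec![0, 2]);
    assert_eq!(speaker.channel_index, vec![1, 2]);
}
```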


44-49: LGTM!

The model mapping correctly translates UI model identifiers to Soniox realtime API model names (stt-rt-v3), consistent with the batch adapter's approach which uses stt-async-v3. Both adapters normalize the same input identifiers ("stt-v3" and preview variants) to their respective endpoint-specific model names.

owhisper/owhisper-client/src/adapter/soniox/batch.rs (1)

90-95: LGTM!

The model mapping logic correctly translates UI model identifiers to Soniox API-specific batch model names. stt-async-v3 is the valid Soniox asynchronous transcription model for batch processing. The backward compatibility for "stt-async-preview" is a nice touch.
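A minimal sketch of the batch selection as the walkthrough describes it: default to "stt-v3" when no model is given, and map both "stt-v3" and the legacy "stt-async-preview" to "stt-async-v3". The helper name is hypothetical, and forwarding any other name unchanged is an assumption:

```rust
// Hypothetical helper mirroring the described batch.rs behavior.
fn soniox_batch_model<'a>(requested: Option<&'a str>) -> &'a str {
    match requested.unwrap_or("stt-v3") {
        // Current UI id and the older preview id both resolve to the v3 async model.
        "stt-v3" | "stt-async-preview" => "stt-async-v3",
        // Assumption: other names pass through unchanged.
        other => other,
    }
}

fn main() {
    assert_eq!(soniox_batch_model(None), "stt-async-v3");
    assert_eq!(soniox_batch_model(Some("stt-v3")), "stt-async-v3");
    assert_eq!(soniox_batch_model(Some("stt-async-preview")), "stt-async-v3");
}
```

Keeping the old preview id in the same match arm means previously saved settings keep working after the rename.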

@yujonglee yujonglee merged commit 65aea13 into main Dec 2, 2025
15 checks passed
@yujonglee yujonglee deleted the devin/1764653410-add-soniox-stt branch December 2, 2025 07:03