feat: add Soniox STT provider to settings #2060
Conversation
📝 Walkthrough

Adds Soniox speech-to-text provider support through desktop UI configuration and updates the `RealtimeSttAdapter` trait signature across all implementations to return `Vec<StreamResponse>` instead of `Option<StreamResponse>`.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
Pre-merge checks and finishing touches: ❌ Failed checks (1 warning), ✅ Passed checks (2 passed)
✅ Deploy Preview for hyprnote-storybook ready!
✅ Deploy Preview for hyprnote ready!
Actionable comments posted: 0
🧹 Nitpick comments (1)
owhisper/owhisper-client/src/adapter/deepgram/live.rs (1)
77-78: Consider logging JSON parse failures for debugging.

The current implementation silently returns an empty `Vec` on parse errors. For consistency with the Soniox adapter (which logs parse failures), consider adding a warning log:

```diff
 fn parse_response(&self, raw: &str) -> Vec<StreamResponse> {
-    serde_json::from_str(raw).into_iter().collect()
+    match serde_json::from_str(raw) {
+        Ok(response) => vec![response],
+        Err(e) => {
+            tracing::warn!(error = ?e, raw = raw, "deepgram_json_parse_failed");
+            vec![]
+        }
+    }
 }
```
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
`apps/desktop/public/assets/soniox.jpeg` is excluded by `!**/*.jpeg`
📒 Files selected for processing (8)
- `apps/desktop/src/components/settings/ai/stt/configure.tsx` (1 hunks)
- `apps/desktop/src/components/settings/ai/stt/shared.tsx` (2 hunks)
- `owhisper/owhisper-client/src/adapter/argmax/live.rs` (1 hunks)
- `owhisper/owhisper-client/src/adapter/deepgram/live.rs` (1 hunks)
- `owhisper/owhisper-client/src/adapter/mod.rs` (1 hunks)
- `owhisper/owhisper-client/src/adapter/soniox/batch.rs` (1 hunks)
- `owhisper/owhisper-client/src/adapter/soniox/live.rs` (7 hunks)
- `owhisper/owhisper-client/src/live.rs` (3 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{ts,tsx}
📄 CodeRabbit inference engine (AGENTS.md)
**/*.{ts,tsx}: Avoid creating a bunch of types/interfaces if they are not shared. Especially for function props, just inline them instead.
Never do manual state management for form/mutation. Use useForm (from tanstack-form) and useQuery/useMutation (from tanstack-query) instead for 99% of cases. Avoid patterns like setError.
If there are many classNames with conditional logic, use `cn` (import from `@hypr/utils`). It is similar to `clsx`. Always pass an array and split by logical grouping.
Use `motion/react` instead of `framer-motion`.
Files:
- `apps/desktop/src/components/settings/ai/stt/shared.tsx`
- `apps/desktop/src/components/settings/ai/stt/configure.tsx`
🧬 Code graph analysis (4)
owhisper/owhisper-client/src/adapter/argmax/live.rs (3)
- owhisper/owhisper-client/src/adapter/deepgram/live.rs (1): `parse_response` (77-79)
- owhisper/owhisper-client/src/adapter/mod.rs (1): `parse_response` (43-43)
- owhisper/owhisper-client/src/adapter/soniox/live.rs (1): `parse_response` (81-135)
owhisper/owhisper-client/src/adapter/soniox/batch.rs (1)
- owhisper/owhisper-client/src/lib.rs (1): `params` (45-48)
owhisper/owhisper-client/src/adapter/mod.rs (3)
- owhisper/owhisper-client/src/adapter/argmax/live.rs (1): `parse_response` (30-32)
- owhisper/owhisper-client/src/adapter/deepgram/live.rs (1): `parse_response` (77-79)
- owhisper/owhisper-client/src/adapter/soniox/live.rs (1): `parse_response` (81-135)
owhisper/owhisper-client/src/adapter/deepgram/live.rs (2)
- owhisper/owhisper-client/src/adapter/argmax/live.rs (1): `parse_response` (30-32)
- owhisper/owhisper-client/src/adapter/mod.rs (1): `parse_response` (43-43)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
- GitHub Check: Redirect rules - hyprnote
- GitHub Check: Header rules - hyprnote
- GitHub Check: Pages changed - hyprnote
- GitHub Check: desktop_ci (linux, depot-ubuntu-22.04-8)
- GitHub Check: desktop_ci (macos, depot-macos-14)
- GitHub Check: desktop_ci (linux, depot-ubuntu-24.04-8)
- GitHub Check: fmt
🔇 Additional comments (14)
apps/desktop/src/components/settings/ai/stt/shared.tsx (2)
31-33: LGTM! The `displayModelId` mapping for `"stt-v3"` to `"Soniox v3"` follows the existing pattern and is correctly placed before the generic prefix-based checks.
101-111: LGTM! The Soniox provider entry is well-structured and follows the existing provider pattern. All required fields are properly defined.
apps/desktop/src/components/settings/ai/stt/configure.tsx (1)
464-468: LGTM! The Soniox context message follows the existing pattern and integrates cleanly into the conditional chain. The message is concise and provides a helpful link.
owhisper/owhisper-client/src/adapter/argmax/live.rs (1)
30-32: LGTM! The signature update correctly aligns with the trait change, and the delegation to `inner.parse_response` works seamlessly with the new return type.

owhisper/owhisper-client/src/adapter/mod.rs (1)
43-43: LGTM! The trait signature change from `Option<StreamResponse>` to `Vec<StreamResponse>` is a well-reasoned design decision that enables adapters to emit multiple responses per input message, which is necessary for Soniox's separate final/non-final token handling.

owhisper/owhisper-client/src/live.rs (3)
202-209: LGTM! The `flat_map` + `iter` pattern correctly expands the `Vec<StreamResponse>` into individual stream items while preserving error propagation.
244-251: LGTM! Consistent application of the `flat_map` pattern for the native multichannel path.
284-306: LGTM! The split path correctly applies the same `flat_map` pattern to both mic and speaker streams, maintaining consistency across all stream processing paths.
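For readers less familiar with the refactor, here is a minimal sketch of the pattern these comments describe. The types and the adapter wiring are simplified stand-ins, not the actual `live.rs` code:

```rust
// Simplified stand-in types for illustration only.
struct StreamResponse {
    transcript: String,
}

trait RealtimeSttAdapter {
    // New trait signature: zero, one, or many responses per raw message.
    fn parse_response(&self, raw: &str) -> Vec<StreamResponse>;
}

// The stream layer expands each Vec<StreamResponse> into individual items
// with flat_map, while errors pass through unchanged.
fn expand_responses(
    raw_messages: Vec<Result<String, std::io::Error>>,
    adapter: &impl RealtimeSttAdapter,
) -> Vec<Result<StreamResponse, std::io::Error>> {
    raw_messages
        .into_iter()
        .flat_map(|msg| match msg {
            Ok(raw) => adapter
                .parse_response(&raw)
                .into_iter()
                .map(Ok)
                .collect::<Vec<_>>(),
            Err(e) => vec![Err(e)],
        })
        .collect()
}
```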
owhisper/owhisper-client/src/adapter/soniox/live.rs (5)
81-134: LGTM! The `parse_response` implementation is well-structured:
- Properly handles error messages and parse failures with logging
- Correctly identifies speech finalization via `<fin>`/`<end>` tokens
- Separates final and non-final tokens into distinct responses, enabling the UI to show interim results separately from confirmed transcriptions
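As a rough illustration of that final/non-final split, here is a hedged sketch; the token shape and field names are assumptions for readability, not the actual Soniox wire format:

```rust
// Hypothetical token shape for illustration only.
struct Token {
    text: String,
    is_final: bool,
}

// Split one batch of tokens into confirmed (final) and interim (non-final)
// groups so they can become separate responses for the UI. "<fin>"/"<end>"
// markers signal that the current utterance is finished.
fn split_tokens(tokens: &[Token]) -> (Vec<&Token>, Vec<&Token>, bool) {
    let mut finals = Vec::new();
    let mut interims = Vec::new();
    let mut speech_final = false;

    for token in tokens {
        if token.text == "<fin>" || token.text == "<end>" {
            speech_final = true;
            continue; // marker tokens are not part of the transcript
        }
        if token.is_final {
            finals.push(token);
        } else {
            interims.push(token);
        }
    }
    (finals, interims, speech_final)
}
```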
209-248: LGTM! The `build_response` helper is well-implemented:
- Correctly handles whitespace-only tokens (adds to transcript but skips word creation)
- Properly converts milliseconds to seconds for timing
- Calculates duration from first/last token timestamps
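A simplified sketch of that timing and word-assembly logic, assuming hypothetical token and word shapes (`start_ms`, `end_ms`, and friends are illustrative names, not the real fields):

```rust
// Hypothetical shapes, for illustration only.
struct Token {
    text: String,
    start_ms: u64,
    end_ms: u64,
}

struct Word {
    word: String,
    start: f64, // seconds
    end: f64,   // seconds
}

// Build the transcript and word list from a run of tokens: whitespace-only
// tokens extend the transcript but produce no Word, millisecond timestamps
// are converted to seconds, and the duration spans from the first token's
// start to the last token's end.
fn build_words(tokens: &[Token]) -> (String, Vec<Word>, f64) {
    let mut transcript = String::new();
    let mut words = Vec::new();

    for token in tokens {
        transcript.push_str(&token.text);
        if token.text.trim().is_empty() {
            continue; // keep spacing in the transcript, but no word entry
        }
        words.push(Word {
            word: token.text.trim().to_string(),
            start: token.start_ms as f64 / 1000.0,
            end: token.end_ms as f64 / 1000.0,
        });
    }

    let duration = match (tokens.first(), tokens.last()) {
        (Some(first), Some(last)) => (last.end_ms - first.start_ms) as f64 / 1000.0,
        _ => 0.0,
    };

    (transcript, words, duration)
}
```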
272-325: LGTM! Good addition of integration test coverage. The test demonstrates the expected usage pattern with `build_dual()` and validates the stream processing pipeline.
259-268: The hardcoded `channel_index: [0, 1]` is overwritten downstream by the split path.

Since `supports_native_multichannel()` returns `false` for Soniox, responses flow through the split path in `owhisper/owhisper-client/src/live.rs`, where `set_channel_index()` is called: `set_channel_index(0, 2)` for microphone and `set_channel_index(1, 2)` for speaker. The hardcoded `[0, 1]` value is never actually used.
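A minimal sketch of why that hardcoded value is moot; the method names mirror the ones referenced above, but the surrounding stream code is heavily simplified:

```rust
// Simplified sketch: in the split (non-native-multichannel) path, each
// response is restamped with its stream's channel before being emitted,
// so whatever channel_index the adapter set initially is overwritten.
struct StreamResponse {
    channel_index: Vec<u8>, // [channel, total_channels]
}

impl StreamResponse {
    fn set_channel_index(&mut self, index: u8, total: u8) {
        self.channel_index = vec![index, total];
    }
}

fn tag_split_streams(mic: &mut [StreamResponse], speaker: &mut [StreamResponse]) {
    for response in mic.iter_mut() {
        response.set_channel_index(0, 2); // microphone channel
    }
    for response in speaker.iter_mut() {
        response.set_channel_index(1, 2); // speaker channel
    }
}
```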
44-49: LGTM! The model mapping correctly translates UI model identifiers to Soniox realtime API model names (`stt-rt-v3`), consistent with the batch adapter's approach, which uses `stt-async-v3`. Both adapters normalize the same input identifiers (`"stt-v3"` and preview variants) to their respective endpoint-specific model names.

owhisper/owhisper-client/src/adapter/soniox/batch.rs (1)
90-95: LGTM! The model mapping logic correctly translates UI model identifiers to Soniox API-specific batch model names. `stt-async-v3` is the valid Soniox asynchronous transcription model for batch processing. The backward compatibility for `"stt-async-preview"` is a nice touch.
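To make the mapping concrete, a hedged sketch of the translation both adapters perform; the exact match arms in `live.rs`/`batch.rs` may differ, and the realtime arm here only shows the `stt-v3` case:

```rust
// Illustrative only: the UI exposes a single "stt-v3" identifier and each
// adapter translates it to the endpoint-specific Soniox model name.
// (Per the review, older preview identifiers such as "stt-async-preview"
// are normalized the same way; other arms are elided here.)
fn realtime_model(ui_model: &str) -> &str {
    match ui_model {
        "stt-v3" => "stt-rt-v3", // realtime endpoint
        other => other,          // pass through anything else unchanged
    }
}

fn batch_model(ui_model: &str) -> &str {
    match ui_model {
        "stt-v3" | "stt-async-preview" => "stt-async-v3", // async/batch endpoint
        other => other,
    }
}
```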
Summary
Adds Soniox as a new STT provider option in the desktop app settings. This includes the Soniox logo asset and provider configuration following the existing pattern for Deepgram and other providers.
Changes:
- `soniox.jpeg` logo added to `apps/desktop/public/assets/`
- Soniox provider entry in `shared.tsx` with the `stt-v3` model
- `displayModelId` mapping to show "Soniox v3" in the UI
- Soniox context message in `configure.tsx`
- Mapping of `stt-v3` to the appropriate API model names:
  - `live.rs`: maps `stt-v3` → `stt-rt-v3` for realtime transcription
  - `batch.rs`: maps `stt-v3` → `stt-async-v3` for batch transcription

Review & Testing Checklist for Human
- `stt-rt-v3` and `stt-async-v3` are valid Soniox model names per Soniox docs

Notes
This PR is based on the `refactor-adapters` branch (PR #2059), as requested.

The model mapping approach was chosen because Soniox uses different model names for realtime (`stt-rt-v3`) vs. batch (`stt-async-v3`), unlike Deepgram, which uses the same model for both. The UI exposes a single `stt-v3` option and the adapters handle the translation.

Link to Devin run: https://app.devin.ai/sessions/f857a3f230434654b027a7ab2b183b85
Requested by: yujonglee (@yujonglee)