refactor(owhisper-client): add adapter pattern for STT providers#2051
refactor(owhisper-client): add adapter pattern for STT providers#2051
Conversation
- Add SttAdapter trait to abstract provider-specific behavior - Implement DeepgramAdapter with all existing Deepgram-specific logic - Make ListenClient and ListenClientDual generic over adapter type - Use default type parameter (DeepgramAdapter) for backward compatibility - Add adapter() method to access the underlying adapter - Prepare architecture for future Soniox adapter implementation Co-Authored-By: yujonglee <yujonglee.dev@gmail.com>
🤖 Devin AI EngineerI'll be helping with this pull request! Here's what you should know: ✅ I will automatically:
Note: I can only respond to comments from users who have write access to this repository. ⚙️ Control Options:
|
✅ Deploy Preview for hyprnote ready!
To edit notification comments on pull requests, go to your Netlify project configuration. |
📝 WalkthroughWalkthroughIntroduces an adapter pattern for the owhisper-client's STT functionality by defining a generic Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~50 minutes
Possibly related PRs
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Comment |
✅ Deploy Preview for hyprnote-storybook ready!
To edit notification comments on pull requests, go to your Netlify project configuration. |
There was a problem hiding this comment.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
owhisper/owhisper-client/src/live.rs (1)
63-91: Adapter methods not used inWebSocketIOimplementations.
ListenClientIO<A>hasPhantomData<A>but doesn't use the adapter'sencode_audio,encode_control, ordecode_responsemethods. The encoding/decoding logic into_messageandfrom_message(lines 76-90) duplicates what's inDeepgramAdapter.This undermines the adapter pattern since different adapters may require different message formats. Consider passing the adapter instance to the IO wrapper or restructuring so the adapter methods are actually invoked.
One approach would be to store a reference or clone of the adapter in the IO struct:
-pub struct ListenClientIO<A: SttAdapter> { - _marker: PhantomData<A>, +pub struct ListenClientIO<A: SttAdapter> { + adapter: A, }Then use
self.adapter.encode_audio(data)etc. in the trait implementation. However, sinceWebSocketIOis a trait with associated functions (not methods taking&self), this may require refactoring the trait design.
🧹 Nitpick comments (4)
owhisper/owhisper-client/src/adapter/deepgram.rs (2)
23-34: Consider returningResultinstead of panicking on invalidapi_base.The
.expect()on line 24 will panic ifapi_baseis not a valid URL. Since this is user-provided input, consider returning aResultor validating earlier in the builder.If changing the trait signature is not feasible, at least document that
api_basemust be a valid URL, or validate it in the builder'sapi_base()method before it reaches here.
184-194: Model name detection is fragile.The
contains("nova-3")check works for current models but could break if Deepgram introduces new model naming conventions (e.g.,nova-3.1,nova-3-turbo). Consider documenting this behavior or making it configurable.This is acceptable for now but worth noting for future maintenance.
owhisper/owhisper-client/src/lib.rs (2)
93-108: Consider avoiding repeated cloning ofparams.The
params.clone().unwrap_or_default()on lines 95 and 101 clones the params struct on every call. Ifbuild_urlis called multiple times (e.g., for retries), this could be inefficient. Consider storing a resolvedListenParamsor usingCowto avoid repeated allocations.This is minor since these methods are typically called once per client build.
184-198: Consider simplifying empty match arms.The nested
matchwith empty arms (_ => {}) on lines 193-196 can be simplified. If onlyTranscriptResponsematters, consider usingif letinstead.- match result { - Ok(response) => match response { - owhisper_interface::stream::StreamResponse::TranscriptResponse { - channel, - .. - } => { - println!("{:?}", channel.alternatives.first().unwrap().transcript); - } - _ => {} - }, - _ => {} - } + if let Ok(owhisper_interface::stream::StreamResponse::TranscriptResponse { + channel, + .. + }) = result + { + println!("{:?}", channel.alternatives.first().unwrap().transcript); + }
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
owhisper/owhisper-client/src/adapter/deepgram.rs(1 hunks)owhisper/owhisper-client/src/adapter/mod.rs(1 hunks)owhisper/owhisper-client/src/lib.rs(2 hunks)owhisper/owhisper-client/src/live.rs(7 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Redirect rules - hyprnote
- GitHub Check: Header rules - hyprnote
- GitHub Check: Pages changed - hyprnote
- GitHub Check: Devin
- GitHub Check: fmt
🔇 Additional comments (8)
owhisper/owhisper-client/src/adapter/deepgram.rs (3)
122-130: LGTM with minor note.The URL-to-URI conversion is safe in practice since a valid
url::Urlwill produce a valid URI string. The implementation correctly handles optional API key authentication.
136-145: LGTM!The encoding uses
unwrap()which is acceptable sinceControlMessageis a well-defined enum that should always serialize successfully. Thedecode_responsecorrectly returnsNonefor non-text messages, as documented in the trait.
48-83: LGTM!The streaming URL builder correctly sets up all required Deepgram parameters including language detection, model selection, and audio format configuration. The separation of
build_urlfor streaming andbuild_batch_urlfor batch processing is well-designed.owhisper/owhisper-client/src/adapter/mod.rs (1)
11-47: Well-designed adapter trait.The
SttAdaptertrait is well-documented and provides a clean abstraction for different STT providers. The trait bounds (Clone + Send + Sync + 'static) are appropriate for async WebSocket contexts. The method signatures cover all necessary aspects: URL construction, authentication, message encoding/decoding, and keep-alive configuration.owhisper/owhisper-client/src/lib.rs (1)
34-50: LGTM!The generic builder with
DeepgramAdapteras the default type parameter provides a clean API for common use cases while allowing customization. TheDefaultimplementation is correctly constrained to onlyDeepgramAdapter.owhisper/owhisper-client/src/live.rs (3)
189-200: LGTM!The
websocket_client_with_keep_alivefunction correctly delegates keep-alive configuration to the adapter while providing a sensible default fallback. This allows different STT providers to customize their keep-alive behavior.
36-57: LGTM!The
interleave_audiofunction correctly handles 16-bit little-endian audio interleaving for dual-channel processing, including proper handling of mismatched input lengths by padding with silence (zeros).
16-34: LGTM!The
ListenClient<A>andListenClientDual<A>structs are well-designed with appropriate default type parameters. Storing the adapter instance enables adapter-aware behavior throughout the client lifecycle.
refactor(owhisper-client): add adapter pattern for STT providers
Summary
Refactors
owhisper-clientto use an adapter pattern that abstracts provider-specific STT logic. This prepares the codebase for supporting multiple STT providers (starting with Soniox) while maintaining backward compatibility with the existing Deepgram-like interface.Key changes:
SttAdaptertrait defining the interface for STT providers (URL building, authentication, message encoding/decoding, keep-alive)DeepgramAdapterimplementation extracting all Deepgram-specific logicListenClient,ListenClientDual, andListenClientBuilderare now generic over adapter type withDeepgramAdapteras defaultListenClientIO<A>andListenClientDualIO<A>wrapper types for WebSocketIO traitThe public API remains backward compatible - existing code using
ListenClient::builder()continues to work unchanged.Review & Testing Checklist for Human
DeepgramAdapter::build_urlagainst the original implementation - subtle differences in query parameter ordering or values could break transcriptionWebSocketIOdirectly onListenClientto using wrapper types (ListenClientIO<A>) is a structural change worth verifyingRecommended test plan:
ONBOARDING=0 pnpm -F desktop tauri devNotes
ListenClientIOandListenClientDualIOstructs currently don't use the adapter for encoding/decoding (still hardcoded JSON) - this is intentional for now as all providers use the same format, but may need revisiting for SonioxLink to Devin run: https://app.devin.ai/sessions/59fa87b9825244959a599f450a15a050
Requested by: yujonglee (yujonglee.dev@gmail.com) (@yujonglee)