refactor(owhisper-client): add adapter pattern for STT providers #2051

Closed
yujonglee wants to merge 1 commit into main from devin/1764584079-stt-adapter-architecture

@yujonglee yujonglee commented Dec 1, 2025

refactor(owhisper-client): add adapter pattern for STT providers

Summary

Refactors owhisper-client to use an adapter pattern that abstracts provider-specific STT logic. This prepares the codebase for supporting multiple STT providers (starting with Soniox) while maintaining backward compatibility with the existing Deepgram-like interface.

Key changes:

  • New SttAdapter trait defining the interface for STT providers (URL building, authentication, message encoding/decoding, keep-alive)
  • DeepgramAdapter implementation extracting all Deepgram-specific logic
  • ListenClient, ListenClientDual, and ListenClientBuilder are now generic over adapter type with DeepgramAdapter as default
  • New ListenClientIO<A> and ListenClientDualIO<A> wrapper types for WebSocketIO trait

The public API remains backward compatible - existing code using ListenClient::builder() continues to work unchanged.
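
A minimal sketch of how a default type parameter can keep `ListenClient::builder()` call sites unchanged. The trait is reduced to one method, and `with_adapter`, the `api_base`/`model` setters, the `url()` accessor, and the `/listen?model=` URL shape are illustrative assumptions, not the PR's actual signatures.

```rust
/// Simplified stand-in for the PR's `SttAdapter` trait (one method only).
pub trait SttAdapter: Clone + Send + Sync + 'static {
    /// Build the streaming URL for an API base and a model name.
    fn build_url(&self, api_base: &str, model: &str) -> String;
}

#[derive(Clone, Default)]
pub struct DeepgramAdapter;

impl SttAdapter for DeepgramAdapter {
    fn build_url(&self, api_base: &str, model: &str) -> String {
        // Hypothetical URL shape; the real adapter sets many more parameters.
        format!("{}/listen?model={}", api_base.trim_end_matches('/'), model)
    }
}

/// `A = DeepgramAdapter` keeps the bare type name usable in existing code.
pub struct ListenClientBuilder<A: SttAdapter = DeepgramAdapter> {
    adapter: A,
    api_base: String,
    model: String,
}

pub struct ListenClient<A: SttAdapter = DeepgramAdapter> {
    adapter: A,
    url: String,
}

impl ListenClient<DeepgramAdapter> {
    /// Defined only for the default adapter, so `ListenClient::builder()`
    /// resolves without any type annotation at the call site.
    pub fn builder() -> ListenClientBuilder<DeepgramAdapter> {
        ListenClientBuilder::with_adapter(DeepgramAdapter)
    }
}

impl<A: SttAdapter> ListenClientBuilder<A> {
    /// Hypothetical entry point for non-default adapters.
    pub fn with_adapter(adapter: A) -> Self {
        Self { adapter, api_base: String::new(), model: String::new() }
    }

    pub fn api_base(mut self, value: impl Into<String>) -> Self {
        self.api_base = value.into();
        self
    }

    pub fn model(mut self, value: impl Into<String>) -> Self {
        self.model = value.into();
        self
    }

    /// URL construction is delegated to the adapter.
    pub fn build(self) -> ListenClient<A> {
        let url = self.adapter.build_url(&self.api_base, &self.model);
        ListenClient { adapter: self.adapter, url }
    }
}

impl<A: SttAdapter> ListenClient<A> {
    pub fn adapter(&self) -> &A {
        &self.adapter
    }

    pub fn url(&self) -> &str {
        &self.url
    }
}
```

With this shape, `ListenClient::builder().api_base(...).model(...).build()` compiles exactly as before, while a future adapter would go through `ListenClientBuilder::with_adapter(...)`.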

Review & Testing Checklist for Human

  • Verify URL building equivalence: Compare the URLs generated by DeepgramAdapter::build_url against the original implementation - subtle differences in query parameter ordering or values could break transcription
  • Test with actual STT service: Run the desktop app and verify real-time transcription still works end-to-end (this was not tested locally due to missing binary)
  • Review WebSocketIO wrapper pattern: The change from implementing WebSocketIO directly on ListenClient to using wrapper types (ListenClientIO<A>) is a structural change worth verifying
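
For the URL equivalence item, one possible check is sketched below: it compares two URL strings while ignoring query parameter order. The helper names are hypothetical, and the comparison works on raw strings only, so it does not normalize percent-encoding or fragments.

```rust
/// Split a URL into its base and a sorted list of raw `key=value` pairs.
fn split_url(url: &str) -> (&str, Vec<&str>) {
    let (base, query) = url.split_once('?').unwrap_or((url, ""));
    let mut pairs: Vec<&str> = query.split('&').filter(|p| !p.is_empty()).collect();
    pairs.sort_unstable();
    (base, pairs)
}

/// True when both URLs share the same base and the same multiset of
/// query pairs, regardless of the order in which the pairs appear.
fn same_url_ignoring_query_order(a: &str, b: &str) -> bool {
    split_url(a) == split_url(b)
}
```

A plain string comparison of old and new URLs remains the strictest test. The order-insensitive form is only useful for telling a harmless reordering apart from a changed or missing parameter; if the server is sensitive to the order of repeated keys, this helper would hide that difference.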

Recommended test plan:

  1. Run the desktop app with ONBOARDING=0 pnpm -F desktop tauri dev
  2. Start a recording session and verify transcription works
  3. Test both single-channel and dual-channel (mic + speaker) modes

Notes

  • The ListenClientIO and ListenClientDualIO structs currently don't use the adapter for encoding/decoding (still hardcoded JSON) - this is intentional for now as all providers use the same format, but may need revisiting for Soniox
  • The e2e test failure in CI is unrelated (missing desktop binary)

Link to Devin run: https://app.devin.ai/sessions/59fa87b9825244959a599f450a15a050
Requested by: yujonglee (yujonglee.dev@gmail.com) (@yujonglee)

- Add SttAdapter trait to abstract provider-specific behavior
- Implement DeepgramAdapter with all existing Deepgram-specific logic
- Make ListenClient and ListenClientDual generic over adapter type
- Use default type parameter (DeepgramAdapter) for backward compatibility
- Add adapter() method to access the underlying adapter
- Prepare architecture for future Soniox adapter implementation

Co-Authored-By: yujonglee <yujonglee.dev@gmail.com>
@devin-ai-integration
Contributor

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR that start with 'DevinAI' or '@devin'.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

netlify bot commented Dec 1, 2025

Deploy Preview for hyprnote ready!

Name Link
🔨 Latest commit 29709de
🔍 Latest deploy log https://app.netlify.com/projects/hyprnote/deploys/692d6e1a5808080008d39c1e
😎 Deploy Preview https://deploy-preview-2051--hyprnote.netlify.app

coderabbitai bot commented Dec 1, 2025

Walkthrough

Introduces an adapter pattern for the owhisper-client's STT functionality by defining a generic SttAdapter trait and implementing a DeepgramAdapter. Refactors ListenClientBuilder, ListenClient, and ListenClientDual to be generic over the adapter type with DeepgramAdapter as the default, delegating URL construction and request creation to the adapter.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| New Adapter Infrastructure (owhisper/owhisper-client/src/adapter/mod.rs) | Introduces public SttAdapter trait defining provider-specific WebSocket pipeline methods including build_url(), build_batch_url(), build_request(), encode_audio(), encode_control(), decode_response(), and keep_alive_config(). Re-exports DeepgramAdapter via new deepgram module. |
| Deepgram Adapter Implementation (owhisper/owhisper-client/src/adapter/deepgram.rs) | Implements SttAdapter for DeepgramAdapter with URL construction for per-call and batch calls, WebSocket scheme selection (ws/wss), language and keyword query parameter handling, audio/control message encoding, response decoding, and periodic keep-alive behavior. |
| Generic Builder and Client Core (owhisper/owhisper-client/src/lib.rs) | Refactors ListenClientBuilder<A> to be generic over SttAdapter with default DeepgramAdapter. Adds public adapter accessors and new build methods (build_with_channels(), build_batch(), build_single(), build_dual()). Delegates URL construction and request creation to the adapter. Re-exports SttAdapter and DeepgramAdapter. |
| Generic Live Clients and IO Wrappers (owhisper/owhisper-client/src/live.rs) | Makes ListenClient<A> and ListenClientDual<A> generic over SttAdapter with default adapter. Introduces public wrapper types ListenClientIO<A> and ListenClientDualIO<A> for adapter-driven encoding/decoding. Adds adapter accessors, updates from_realtime_audio() to use IO wrappers, and adapts WebSocket connection creation via websocket_client_with_keep_alive() to leverage adapter keep-alive configuration. |

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

  • Key focus areas:
    • Verify SttAdapter trait correctly abstracts provider-agnostic WebSocket pipeline operations and that trait bounds (Clone + Send + Sync + 'static) are appropriate
    • Confirm DeepgramAdapter correctly extracts and encapsulates existing Deepgram-specific logic (URL construction, language/keyword query handling, keep-alive behavior)
    • Validate generic parameter propagation through ListenClientBuilder<A>, ListenClient<A>, ListenClientDual<A>, and their IO wrappers maintains backward compatibility with DeepgramAdapter defaults
    • Review builder method signatures and initialization flows for consistency and correctness
    • Examine adapter-driven URL construction and request building to ensure equivalent behavior to previous monomorphic implementation
    • Check websocket_client_with_keep_alive() adapter integration and fallback to default keep-alive when adapter provides none

Possibly related PRs

  • Implement binary diarization #1015: Introduces ListenClientDual and build variants (build_single, build_dual) with dual-channel IO wrappers that this PR now makes generic over the adapter type.
  • Add language detect constraint #1212: Modifies multi-language handling and propagation in ListenParams; this PR moves language/keyword query assembly into the SttAdapter trait implementation.
  • Explicit sample_rate in owhisper client #1651: Updates ListenParams with explicit sample_rate and refactors URL query construction; this PR now abstracts that URL construction into adapter-driven build_url() and build_batch_url() methods.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 52.78% which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title 'refactor(owhisper-client): add adapter pattern for STT providers' accurately summarizes the main change - introducing an adapter pattern for STT providers in the owhisper-client. |
| Description check | ✅ Passed | The description provides comprehensive context about the adapter pattern refactor, explains the key changes, maintains backward compatibility, and includes testing recommendations related to the changeset. |

netlify bot commented Dec 1, 2025

Deploy Preview for hyprnote-storybook ready!

Name Link
🔨 Latest commit 29709de
🔍 Latest deploy log https://app.netlify.com/projects/hyprnote-storybook/deploys/692d6e1acc76e80008548e86
😎 Deploy Preview https://deploy-preview-2051--hyprnote-storybook.netlify.app

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
owhisper/owhisper-client/src/live.rs (1)

63-91: Adapter methods not used in WebSocketIO implementations.

ListenClientIO<A> has PhantomData<A> but doesn't use the adapter's encode_audio, encode_control, or decode_response methods. The encoding/decoding logic in to_message and from_message (lines 76-90) duplicates what's in DeepgramAdapter.

This undermines the adapter pattern since different adapters may require different message formats. Consider passing the adapter instance to the IO wrapper or restructuring so the adapter methods are actually invoked.

One approach would be to store a reference or clone of the adapter in the IO struct:

-pub struct ListenClientIO<A: SttAdapter> {
-    _marker: PhantomData<A>,
+pub struct ListenClientIO<A: SttAdapter> {
+    adapter: A,
 }

Then use self.adapter.encode_audio(data) etc. in the trait implementation. However, since WebSocketIO is a trait with associated functions (not methods taking &self), this may require refactoring the trait design.
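
An alternative that avoids changing WebSocketIO is sketched below, assuming the codec methods can themselves be associated functions (no `&self`), in which case `PhantomData<A>` is enough to delegate. `Message`, `SttCodec`, and `DeepgramCodec` are hypothetical stand-ins, and the decode step here simply returns the text payload where the real adapter would parse JSON.

```rust
use std::marker::PhantomData;

/// Hypothetical minimal message type standing in for the WebSocket message enum.
#[derive(Debug, PartialEq)]
pub enum Message {
    Binary(Vec<u8>),
    Text(String),
}

/// Stand-in for `WebSocketIO`: associated functions only, no `&self`.
pub trait WebSocketIO {
    type Input;
    type Output;
    fn to_message(input: Self::Input) -> Message;
    fn from_message(msg: Message) -> Option<Self::Output>;
}

/// Adapter variant whose codec methods are associated functions as well,
/// so they can be called through the type parameter alone.
pub trait SttCodec: Send + Sync + 'static {
    fn encode_audio(audio: Vec<u8>) -> Message;
    fn decode_response(msg: Message) -> Option<String>;
}

pub struct DeepgramCodec;

impl SttCodec for DeepgramCodec {
    fn encode_audio(audio: Vec<u8>) -> Message {
        Message::Binary(audio)
    }

    fn decode_response(msg: Message) -> Option<String> {
        match msg {
            // Placeholder: the real adapter would deserialize a response here.
            Message::Text(text) => Some(text),
            // Non-text frames carry no transcript.
            Message::Binary(_) => None,
        }
    }
}

/// The wrapper still stores no adapter instance.
pub struct ListenClientIO<A: SttCodec> {
    _marker: PhantomData<A>,
}

impl<A: SttCodec> WebSocketIO for ListenClientIO<A> {
    type Input = Vec<u8>;
    type Output = String;

    fn to_message(input: Vec<u8>) -> Message {
        A::encode_audio(input)
    }

    fn from_message(msg: Message) -> Option<String> {
        A::decode_response(msg)
    }
}
```

The trade-off is that a codec written this way cannot depend on per-instance adapter state; if a provider needs such state for encoding, storing the adapter in the IO struct and changing the trait is the remaining option.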

🧹 Nitpick comments (4)
owhisper/owhisper-client/src/adapter/deepgram.rs (2)

23-34: Consider returning Result instead of panicking on invalid api_base.

The .expect() on line 24 will panic if api_base is not a valid URL. Since this is user-provided input, consider returning a Result or validating earlier in the builder.

If changing the trait signature is not feasible, at least document that api_base must be a valid URL, or validate it in the builder's api_base() method before it reaches here.
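
A rough sketch of validating early in the builder, under the assumption that a fallible check may run before the value reaches build_url. `ApiBaseError` and `validate_api_base` are hypothetical names, and the check is deliberately shallow and std-only; in the crate itself, `url::Url::parse` would be the stricter choice.

```rust
use std::fmt;

/// Hypothetical error type for a rejected `api_base`.
#[derive(Debug, PartialEq)]
pub enum ApiBaseError {
    Empty,
    MissingScheme,
    UnsupportedScheme(String),
    MissingHost,
}

impl fmt::Display for ApiBaseError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ApiBaseError::Empty => write!(f, "api_base is empty"),
            ApiBaseError::MissingScheme => write!(f, "api_base has no scheme"),
            ApiBaseError::UnsupportedScheme(s) => write!(f, "unsupported scheme: {s}"),
            ApiBaseError::MissingHost => write!(f, "api_base has no host"),
        }
    }
}

impl std::error::Error for ApiBaseError {}

/// Shallow structural check: a known scheme followed by a non-empty host.
/// Returns the trimmed input on success instead of panicking on failure.
pub fn validate_api_base(api_base: &str) -> Result<&str, ApiBaseError> {
    let trimmed = api_base.trim();
    if trimmed.is_empty() {
        return Err(ApiBaseError::Empty);
    }
    let (scheme, rest) = trimmed
        .split_once("://")
        .ok_or(ApiBaseError::MissingScheme)?;
    match scheme {
        "http" | "https" | "ws" | "wss" => {}
        other => return Err(ApiBaseError::UnsupportedScheme(other.to_string())),
    }
    // The host is everything up to the first path, query, or fragment delimiter.
    let host = rest
        .split(|c| c == '/' || c == '?' || c == '#')
        .next()
        .unwrap_or("");
    if host.is_empty() {
        return Err(ApiBaseError::MissingHost);
    }
    Ok(trimmed)
}
```

Calling something like this from the builder's api_base() setter would surface the error to the caller and leave the `.expect()` in the adapter unreachable for user input.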


184-194: Model name detection is fragile.

The contains("nova-3") check works for current models but could break if Deepgram introduces new model naming conventions (e.g., nova-3.1, nova-3-turbo). Consider documenting this behavior or making it configurable.

This is acceptable for now but worth noting for future maintenance.
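
If the check is ever tightened, one option is a prefix match with an explicit suffix rule instead of a substring search. The helper below is hypothetical and assumes model identifiers begin with the family name; it is not what the PR implements.

```rust
/// Hypothetical helper: true when `model` is the family itself or the
/// family followed by a `-` or `.` suffix (for example a variant name).
pub fn is_model_in_family(model: &str, family: &str) -> bool {
    match model.strip_prefix(family) {
        Some(rest) => rest.is_empty() || rest.starts_with('-') || rest.starts_with('.'),
        None => false,
    }
}
```

Unlike `contains("nova-3")`, this rejects a name such as `nova-30`, while still accepting `nova-3`, `nova-3.1`, and `nova-3-turbo`. Whether those variants should share the same behavior is a product decision this sketch does not settle.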

owhisper/owhisper-client/src/lib.rs (2)

93-108: Consider avoiding repeated cloning of params.

The params.clone().unwrap_or_default() on lines 95 and 101 clones the params struct on every call. If build_url is called multiple times (e.g., for retries), this could be inefficient. Consider storing a resolved ListenParams or using Cow to avoid repeated allocations.

This is minor since these methods are typically called once per client build.
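
The Cow suggestion could look roughly like the following. `ListenParams` here is a reduced stand-in with made-up fields, and `Builder` and `resolved_params` are hypothetical names.

```rust
use std::borrow::Cow;

/// Reduced stand-in for the real `ListenParams`; the fields are illustrative.
#[derive(Clone, Debug, Default, PartialEq)]
pub struct ListenParams {
    pub model: Option<String>,
    pub languages: Vec<String>,
}

pub struct Builder {
    pub params: Option<ListenParams>,
}

impl Builder {
    /// Borrow the stored params when present; build a default value
    /// only when none were set. No clone happens on the `Some` path.
    pub fn resolved_params(&self) -> Cow<'_, ListenParams> {
        match &self.params {
            Some(params) => Cow::Borrowed(params),
            None => Cow::Owned(ListenParams::default()),
        }
    }
}
```

Callers that only read the params can use the returned `Cow` through deref, so build_url and build_batch_url would not need their own clones.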


184-198: Consider simplifying empty match arms.

The nested match with empty arms (_ => {}) on lines 193-196 can be simplified. If only TranscriptResponse matters, consider using if let instead.

-            match result {
-                Ok(response) => match response {
-                    owhisper_interface::stream::StreamResponse::TranscriptResponse {
-                        channel,
-                        ..
-                    } => {
-                        println!("{:?}", channel.alternatives.first().unwrap().transcript);
-                    }
-                    _ => {}
-                },
-                _ => {}
-            }
+            if let Ok(owhisper_interface::stream::StreamResponse::TranscriptResponse {
+                channel,
+                ..
+            }) = result
+            {
+                println!("{:?}", channel.alternatives.first().unwrap().transcript);
+            }
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b42afb8 and 29709de.

📒 Files selected for processing (4)
  • owhisper/owhisper-client/src/adapter/deepgram.rs (1 hunks)
  • owhisper/owhisper-client/src/adapter/mod.rs (1 hunks)
  • owhisper/owhisper-client/src/lib.rs (2 hunks)
  • owhisper/owhisper-client/src/live.rs (7 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Redirect rules - hyprnote
  • GitHub Check: Header rules - hyprnote
  • GitHub Check: Pages changed - hyprnote
  • GitHub Check: Devin
  • GitHub Check: fmt
🔇 Additional comments (8)
owhisper/owhisper-client/src/adapter/deepgram.rs (3)

122-130: LGTM with minor note.

The URL-to-URI conversion is safe in practice since a valid url::Url will produce a valid URI string. The implementation correctly handles optional API key authentication.


136-145: LGTM!

The encoding uses unwrap() which is acceptable since ControlMessage is a well-defined enum that should always serialize successfully. The decode_response correctly returns None for non-text messages, as documented in the trait.


48-83: LGTM!

The streaming URL builder correctly sets up all required Deepgram parameters including language detection, model selection, and audio format configuration. The separation of build_url for streaming and build_batch_url for batch processing is well-designed.

owhisper/owhisper-client/src/adapter/mod.rs (1)

11-47: Well-designed adapter trait.

The SttAdapter trait is well-documented and provides a clean abstraction for different STT providers. The trait bounds (Clone + Send + Sync + 'static) are appropriate for async WebSocket contexts. The method signatures cover all necessary aspects: URL construction, authentication, message encoding/decoding, and keep-alive configuration.

owhisper/owhisper-client/src/lib.rs (1)

34-50: LGTM!

The generic builder with DeepgramAdapter as the default type parameter provides a clean API for common use cases while allowing customization. The Default implementation is correctly constrained to only DeepgramAdapter.

owhisper/owhisper-client/src/live.rs (3)

189-200: LGTM!

The websocket_client_with_keep_alive function correctly delegates keep-alive configuration to the adapter while providing a sensible default fallback. This allows different STT providers to customize their keep-alive behavior.
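
The fallback described here can be sketched as below. `KeepAliveConfig`, its fields, the five-second default, and the placeholder message are assumptions for illustration; only the "adapter value, else default" shape comes from the review.

```rust
use std::time::Duration;

/// Hypothetical keep-alive settings: how often to send, and what to send.
#[derive(Clone, Debug, PartialEq)]
pub struct KeepAliveConfig {
    pub interval: Duration,
    pub message: String,
}

/// Reduced adapter trait: `None` means "no provider-specific preference".
pub trait SttAdapter {
    fn keep_alive_config(&self) -> Option<KeepAliveConfig> {
        None
    }
}

/// Assumed client-wide default used when the adapter returns `None`.
pub fn default_keep_alive() -> KeepAliveConfig {
    KeepAliveConfig {
        interval: Duration::from_secs(5),
        message: "keep-alive-placeholder".to_string(),
    }
}

/// Prefer the adapter's configuration, otherwise fall back to the default.
pub fn resolve_keep_alive<A: SttAdapter>(adapter: &A) -> KeepAliveConfig {
    adapter.keep_alive_config().unwrap_or_else(default_keep_alive)
}

/// Adapter that relies on the default.
pub struct NoPreferenceAdapter;
impl SttAdapter for NoPreferenceAdapter {}

/// Adapter that overrides the interval and message.
pub struct CustomAdapter;
impl SttAdapter for CustomAdapter {
    fn keep_alive_config(&self) -> Option<KeepAliveConfig> {
        Some(KeepAliveConfig {
            interval: Duration::from_secs(10),
            message: "custom-ping".to_string(),
        })
    }
}
```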


36-57: LGTM!

The interleave_audio function correctly handles 16-bit little-endian audio interleaving for dual-channel processing, including proper handling of mismatched input lengths by padding with silence (zeros).
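
For reference, the behavior described above can be illustrated with a small standalone version. The byte-slice signature is an assumption and may differ from the function in live.rs.

```rust
/// Interleave two mono 16-bit little-endian PCM buffers into one
/// two-channel buffer (mic sample first, then speaker sample).
/// The shorter input is padded with silence (zero samples).
/// A trailing odd byte in either input is ignored.
pub fn interleave_audio(mic: &[u8], speaker: &[u8]) -> Vec<u8> {
    let to_samples = |bytes: &[u8]| -> Vec<i16> {
        bytes
            .chunks_exact(2)
            .map(|pair| i16::from_le_bytes([pair[0], pair[1]]))
            .collect()
    };
    let mic_samples = to_samples(mic);
    let speaker_samples = to_samples(speaker);

    let frames = mic_samples.len().max(speaker_samples.len());
    let mut out = Vec::with_capacity(frames * 4);
    for i in 0..frames {
        // Missing samples on the shorter side become zeros.
        let m = mic_samples.get(i).copied().unwrap_or(0);
        let s = speaker_samples.get(i).copied().unwrap_or(0);
        out.extend_from_slice(&m.to_le_bytes());
        out.extend_from_slice(&s.to_le_bytes());
    }
    out
}
```

With a two-sample mic buffer and a one-sample speaker buffer, the second frame carries the mic sample followed by two zero bytes for the missing speaker sample.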


16-34: LGTM!

The ListenClient<A> and ListenClientDual<A> structs are well-designed with appropriate default type parameters. Storing the adapter instance enables adapter-aware behavior throughout the client lifecycle.

@yujonglee yujonglee closed this Dec 1, 2025
@ComputelessComputer ComputelessComputer deleted the devin/1764584079-stt-adapter-architecture branch December 14, 2025 15:19