Skip to content

Refactor provider specific split connection through adaptor#2052

Merged
yujonglee merged 3 commits intomainfrom
refactor-websocket-split
Dec 2, 2025
Merged

Refactor provider specific split connection through adaptor#2052
yujonglee merged 3 commits intomainfrom
refactor-websocket-split

Conversation

@yujonglee
Copy link
Contributor

No description provided.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Dec 1, 2025

📝 Walkthrough

Walkthrough

Introduces a trait-based STT adapter system (SttAdapter) with multiple adapter implementations (Argmax, Deepgram, Soniox), makes ListenClient/BatchClient generic over adapters, threads an optional initial WebSocket Message through client/from_audio, refactors dual-path streaming (native vs split) and finalization into a FinalizeHandle-based flow.

Changes

Cohort / File(s) Summary
WebSocket client API
crates/ws/src/client.rs
Consolidated tokio_tungstenite imports; added Utf8Bytes to public re-exports; WebSocketClient::from_audio now accepts initial_message: Option<Message> and sends it before audio if present; simplified control command handling.
Adapter trait and modules
owhisper/owhisper-client/src/adapter/mod.rs
Added SttAdapter trait, BatchFuture alias, new adapter modules and re-exports (argmax, deepgram, soniox, owhisper); trait includes hooks for WS URL, auth header, keep-alive/initial/finalize messages, response parsing, and batch transcribe.
Argmax adapter
owhisper/owhisper-client/src/adapter/argmax.rs
Added ArgmaxAdapter (Clone, Default) implementing SttAdapter; delegates to internal DeepgramAdapter for most behavior; supports_native_multichannel() returns false; includes an integration-style test.
Deepgram adapter
owhisper/owhisper-client/src/adapter/deepgram.rs
Added DeepgramAdapter (Clone, Default) implementing SttAdapter; builds WS and batch URLs, auth header, keep-alive/finalize/initial messages, streaming parse, and file-transcribe path with audio decode helper and tests.
Soniox adapter
owhisper/owhisper-client/src/adapter/soniox.rs
Added SonioxAdapter implementing SttAdapter with WS scaffolding, initial auth payload, keep-alive/finalize, streaming parse, file upload/transcription polling, and result mapping to BatchResponse.
Client library genericization
owhisper/owhisper-client/src/lib.rs
Made ListenClientBuilder, ListenClient, ListenClientDual and builder methods generic over A: SttAdapter (default DeepgramAdapter); added .adapter() to switch adapter type; reworked build paths to use adapter-provided URLs/messages.
Live/dual streaming refactor
owhisper/owhisper-client/src/live.rs
Made ListenClient/ListenClientDual generic over adapters; added ListenClientIO/ListenClientDualIO; introduced DualHandle (Native
Batch client update
owhisper/owhisper-client/src/batch.rs
Made BatchClient generic BatchClient<A: SttAdapter>; delegated file transcription to adapter transcribe_file and added constructor using ListenParams.
Client crate deps
owhisper/owhisper-client/Cargo.toml
Added tokio-stream and serde; expanded reqwest features to include multipart; dev-deps updated.
Listener actor adapterization
plugins/listener/src/actors/listener.rs
Replaced concrete adapter usage with generic A: SttAdapter; added spawn_rx_task_single_with_adapter and spawn_rx_task_dual_with_adapter; process_stream generalized to accept H: FinalizeHandle; finalization-count-driven shutdown; adapter selection by base URL.
Listener2 batch provider
plugins/listener2/src/ext.rs
Added BatchProvider::Soniox variant and a Soniox-based batch transcription path using ListenClient with SonioxAdapter.
Tests updated for new signature
crates/ws/tests/client_tests.rs
Updated calls to WebSocketClient::from_audio to include the new initial_message argument (passed as None in tests).
Docs and examples
apps/web/content/docs/developers/13.languages.mdx
New documentation page describing language support across providers.
Adapter tests/examples
owhisper/owhisper-client/src/adapter/owhisper.rs
Added live integration-style tests exercising owhisper and Deepgram using the new client flows.

Sequence Diagram(s)

sequenceDiagram
    participant Builder as ListenClientBuilder
    participant Adapter as SttAdapter (A)
    participant Client as ListenClientDual<A>
    participant WSClient as WebSocketClient
    participant Runtime as Runtime (tasks/streams)
    participant Listener as Listener actor

    Builder->>Adapter: determine adapter (adapter::<A>() or default)
    Builder->>Client: build_dual() with A::default(), request, initial_message
    Client->>Adapter: supports_native_multichannel()?
    alt native supported
        Client->>WSClient: websocket_client_with_keep_alive(request, adapter)
        WSClient->>Runtime: spawn sender/receiver tasks (send initial_message if present)
        Runtime->>Listener: stream responses (single WS)
    else split required
        Client->>WSClient: build two WS requests (mic, spk) via adapter.build_ws_url()
        WSClient->>Runtime: spawn two sender/receiver tasks (each may send initial_message)
        Runtime->>Client: merge_streams_with_channel_remap()
        Runtime->>Listener: stream merged responses
    end
    Listener->>Client: finalize via FinalizeHandle::finalize()
    Client->>WSClient: send finalize_message() and await expected_finalize_count()
    WSClient->>Runtime: emit final responses, then close
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

  • Areas needing attention:
    • Generic trait bounds propagation across ListenClientBuilder / BatchClient / listener actor call sites.
    • Correct construction and sending of adapter-provided initial_message and keep-alive messages in sender tasks.
    • Dual-path logic: stream merging, channel remapping, and forward_dual_to_single correctness.
    • FinalizeHandle implementations and expected_finalize_count semantics in listener shutdown.
    • Adapter-specific file-transcribe implementations for audio decoding, HTTP payloads, and polling (Deepgram, Soniox).

Possibly related PRs

  • Wrap STT server with actor #1495 — Overlaps on listener actor spawn/process_stream and shutdown/finalize flow; likely touches the same finalize/shutdown logic.
  • audio-pipeline-fixes #1764 — Refactors single/dual RX-task spawning and dual audio handling; closely related to adapter-based spawn_*_with_adapter changes.
  • Implement binary diarization #1015 — Modifies ListenClient/WebSocket IO surface and dual-audio support; overlaps with new generic ListenClientDual and from_audio signature changes.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning, 1 inconclusive)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 10.59% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
Description check ❓ Inconclusive No pull request description was provided by the author. Add a pull request description explaining the motivation, key changes, and impact of this refactoring for better context during code review.
✅ Passed checks (1 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and specifically describes the main refactoring effort: introducing provider-specific adapters for WebSocket split connections.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch refactor-websocket-split

Comment @coderabbitai help to get the list of available commands and usage tips.

@netlify
Copy link

netlify bot commented Dec 1, 2025

Deploy Preview for hyprnote-storybook ready!

Name Link
🔨 Latest commit b980f80
🔍 Latest deploy log https://app.netlify.com/projects/hyprnote-storybook/deploys/692e450e6368780008a4bb27
😎 Deploy Preview https://deploy-preview-2052--hyprnote-storybook.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify project configuration.

@netlify
Copy link

netlify bot commented Dec 1, 2025

Deploy Preview for hyprnote ready!

Name Link
🔨 Latest commit b980f80
🔍 Latest deploy log https://app.netlify.com/projects/hyprnote/deploys/692e450e51be92000864bed4
😎 Deploy Preview https://deploy-preview-2052--hyprnote.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify project configuration.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
owhisper/owhisper-client/src/adapter/mod.rs (1)

7-9: LGTM!

The SttAdapter trait is well-designed with appropriate bounds for async/concurrent usage. The single method cleanly captures the capability difference between providers.

Consider using an associated constant instead of a method if the value is always compile-time known per adapter type:

 pub trait SttAdapter: Clone + Default + Send + Sync + 'static {
-    fn supports_native_multichannel(&self) -> bool;
+    const SUPPORTS_NATIVE_MULTICHANNEL: bool;
 }

This would enable compile-time optimization and eliminate the method call overhead, though the current design offers more flexibility if runtime configuration becomes necessary.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b507af6 and 17e893a.

📒 Files selected for processing (8)
  • crates/ws/src/client.rs (1 hunks)
  • owhisper/owhisper-client/Cargo.toml (1 hunks)
  • owhisper/owhisper-client/src/adapter/argmax.rs (1 hunks)
  • owhisper/owhisper-client/src/adapter/deepgram.rs (1 hunks)
  • owhisper/owhisper-client/src/adapter/mod.rs (1 hunks)
  • owhisper/owhisper-client/src/lib.rs (3 hunks)
  • owhisper/owhisper-client/src/live.rs (5 hunks)
  • plugins/listener/src/actors/listener.rs (7 hunks)
🧰 Additional context used
🧬 Code graph analysis (4)
owhisper/owhisper-client/src/adapter/argmax.rs (2)
owhisper/owhisper-client/src/adapter/deepgram.rs (1)
  • supports_native_multichannel (7-9)
owhisper/owhisper-client/src/adapter/mod.rs (1)
  • supports_native_multichannel (8-8)
owhisper/owhisper-client/src/adapter/deepgram.rs (2)
owhisper/owhisper-client/src/adapter/argmax.rs (1)
  • supports_native_multichannel (7-9)
owhisper/owhisper-client/src/adapter/mod.rs (1)
  • supports_native_multichannel (8-8)
plugins/listener/src/actors/listener.rs (2)
owhisper/owhisper-client/src/live.rs (3)
  • expected_finalize_count (38-38)
  • expected_finalize_count (46-48)
  • expected_finalize_count (65-70)
owhisper/owhisper-interface/src/stream.rs (2)
  • apply_offset (150-169)
  • set_extra (171-175)
owhisper/owhisper-client/src/live.rs (2)
crates/ws/src/client.rs (1)
  • finalize_with_text (29-33)
owhisper/owhisper-interface/src/stream.rs (1)
  • text (141-148)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: Redirect rules - hyprnote
  • GitHub Check: Redirect rules - hyprnote-storybook
  • GitHub Check: Header rules - hyprnote
  • GitHub Check: Header rules - hyprnote-storybook
  • GitHub Check: Pages changed - hyprnote
  • GitHub Check: Pages changed - hyprnote-storybook
  • GitHub Check: fmt
  • GitHub Check: desktop_ci (linux, depot-ubuntu-24.04-8)
  • GitHub Check: desktop_ci (macos, depot-macos-14)
  • GitHub Check: desktop_ci (linux, depot-ubuntu-22.04-8)
🔇 Additional comments (18)
owhisper/owhisper-client/Cargo.toml (1)

17-17: LGTM!

The tokio-stream dependency addition is appropriate for the new streaming interfaces and adapter architecture introduced in this PR.

crates/ws/src/client.rs (1)

8-10: LGTM!

The import consolidation and public re-export of Utf8Bytes are appropriate. Exposing Utf8Bytes is necessary since it's used in the public finalize_with_text method signature.

owhisper/owhisper-client/src/adapter/argmax.rs (1)

1-10: LGTM!

Clean implementation of ArgmaxAdapter. The supports_native_multichannel() -> false correctly indicates that the Argmax provider requires split mic/speaker handling rather than native multichannel support.

owhisper/owhisper-client/src/adapter/deepgram.rs (1)

1-10: LGTM!

Clean implementation of DeepgramAdapter. The supports_native_multichannel() -> true correctly indicates that Deepgram can handle dual-channel audio natively.

owhisper/owhisper-client/src/lib.rs (3)

17-33: LGTM!

The generic ListenClientBuilder<A: SttAdapter> with PhantomData<A> marker is correctly designed. The Default implementation properly targets the non-parameterized type (defaulting to DeepgramAdapter), maintaining backward compatibility.


51-58: LGTM!

The adapter<B>() method provides a clean way to switch adapter types while preserving builder configuration. This is a well-implemented type-changing builder pattern.


198-207: LGTM!

The build_dual() method correctly leverages the adapter's capability to determine channel configuration. The logic properly routes multichannel-capable providers (Deepgram) to use 2 channels while falling back to 1 channel for providers like Argmax that require split handling.

plugins/listener/src/actors/listener.rs (5)

7-7: LGTM!

Clean import of the new adapter types and traits. The import aligns with the new generic adapter architecture.


189-195: LGTM!

The URL parsing with fallback to string matching is pragmatic. The fallback could technically have false positives (e.g., localhost in a path), but this is a low-risk edge case for adapter selection.


212-219: LGTM!

The adapter selection based on URL locality is a clean pattern that allows different STT providers to use their optimal connection strategy (native multichannel vs split).


307-364: LGTM!

The generic adapter pattern is well implemented. The function structure correctly mirrors spawn_rx_task_single while enabling adapter-specific behavior through the type parameter.


366-461: LGTM!

The generic FinalizeHandle pattern elegantly handles both single and split connection finalization. The counting mechanism (finalize_count >= expected_count) correctly adapts to the handle type—waiting for 1 finalize response for native handles and 2 for split handles.

owhisper/owhisper-client/src/live.rs (6)

23-34: LGTM!

The ListenClientDual<A: SttAdapter> design cleanly encapsulates adapter-specific behavior, and DualHandle properly models the two connection strategies with appropriate data for each variant.


36-71: LGTM!

The FinalizeHandle trait with impl Future return type is a clean async trait pattern. The DualHandle implementation correctly uses tokio::join! to finalize both handles concurrently for split connections.


96-156: LGTM!

The IO types correctly implement WebSocketIO with proper message translation. ListenClientDualIO appropriately interleaves mic/speaker audio for native multichannel support.


180-232: LGTM!

The branching based on supports_native_multichannel() elegantly routes to the appropriate connection strategy. The split path correctly orchestrates dual websockets with forwarding and stream merging.

One minor observation: the spawned forwarder task's JoinHandle is discarded (line 220). This is acceptable since the task naturally terminates when input streams close, and errors in the forwarder would manifest as closed channels downstream.


234-251: Intentional design choice noted for try_send on audio.

Using try_send for audio (lines 242-243) means audio frames may be dropped if channels fill up. This is typically the right trade-off for real-time audio—prefer dropping frames over introducing latency. Control messages use the blocking send().await to ensure delivery, which is appropriate.


253-276: LGTM!

The stream merging correctly assigns channel indices (0 for mic, 1 for speaker) and uses futures_util::stream::select for fair interleaving of responses from both connections.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
owhisper/owhisper-client/Cargo.toml (1)

4-4: Fix invalid Rust edition.

Rust edition "2024" does not exist. Valid editions are 2015, 2018, and 2021.

Apply this diff to correct the edition:

-edition = "2024"
+edition = "2021"
owhisper/owhisper-client/src/live.rs (1)

93-114: Silent truncation if audio buffers have odd byte lengths.

chunks_exact(2) discards any trailing byte if the input length is odd. While 16-bit PCM audio should always have even-length buffers, consider adding a debug assertion or logging to catch malformed input during development.

 fn interleave_audio(mic: &[u8], speaker: &[u8]) -> Vec<u8> {
+    debug_assert!(mic.len() % 2 == 0, "mic buffer has odd length");
+    debug_assert!(speaker.len() % 2 == 0, "speaker buffer has odd length");
+
     let mic_samples: Vec<i16> = mic
🧹 Nitpick comments (10)
owhisper/owhisper-client/src/adapter/owhisper.rs (2)

9-34: Integration test lacks assertions and requires external service.

This test connects to a hardcoded local endpoint (ws://127.0.0.1:52693) and only prints results without any assertions. Consider:

  1. Adding #[ignore] attribute to prevent CI failures when the service isn't running
  2. Adding at least one assertion to validate the stream produces expected output
 #[tokio::test]
+#[ignore] // Requires local owhisper service
 async fn test_owhisper_with_owhisper() {

36-67: Test has placeholder API key and no assertions.

The Deepgram test at Line 46 uses "TODO" as the API key, which will cause authentication failures. Similar to the other test, it lacks assertions.

 #[tokio::test]
+#[ignore] // Requires local Deepgram-compatible service
 async fn test_owhisper_with_deepgram() {
owhisper/owhisper-client/src/adapter/argmax.rs (1)

62-92: Integration test requires external service.

Similar to other adapter tests, this test connects to a hardcoded local endpoint and lacks assertions. Consider adding #[ignore] to prevent CI failures.

 #[tokio::test]
+#[ignore] // Requires local Argmax service
 async fn test_client() {
owhisper/owhisper-client/src/adapter/deepgram.rs (1)

243-288: Test requires external Deepgram service and lacks #[ignore].

This integration test will fail in CI without network access to Deepgram's API. Consider marking it as ignored.

 #[tokio::test]
+#[ignore] // Requires Deepgram API access
 async fn test_client() {
plugins/listener2/src/ext.rs (2)

118-175: Consider extracting common batch transcription logic.

The Deepgram (Lines 118-145) and Soniox (Lines 147-175) branches are nearly identical, differing only in the adapter type and log messages. This duplication could be reduced with a helper function or macro.

Example approach using a generic helper:

async fn run_batch_transcription<A: owhisper_client::SttAdapter + Default>(
    app: &tauri::AppHandle<R>,
    session_id: &str,
    base_url: String,
    api_key: String,
    listen_params: owhisper_interface::ListenParams,
    file_path: &str,
) -> Result<(), crate::Error> {
    BatchEvent::BatchStarted { session_id: session_id.to_string() }
        .emit(app)
        .map_err(|_| crate::Error::BatchStartFailed("failed to emit event".to_string()))?;

    let client = owhisper_client::ListenClient::builder()
        .adapter::<A>()
        .api_base(base_url)
        .api_key(api_key)
        .params(listen_params)
        .build_batch();

    let response = client.transcribe_file(file_path).await?;
    
    BatchEvent::BatchResponse { session_id: session_id.to_string(), response }
        .emit(app)
        .map_err(|_| crate::Error::BatchStartFailed("failed to emit event".to_string()))?;

    Ok(())
}

127-131: Deepgram path doesn't explicitly specify adapter.

Unlike the Soniox path which explicitly calls .adapter::<owhisper_client::SonioxAdapter>(), the Deepgram path relies on the default adapter. For consistency and clarity, consider making the adapter explicit.

-                let client = owhisper_client::ListenClient::builder()
+                let client = owhisper_client::ListenClient::builder()
+                    .adapter::<owhisper_client::DeepgramAdapter>()
                     .api_base(params.base_url.clone())
owhisper/owhisper-client/src/lib.rs (1)

68-82: Consider handling malformed URLs gracefully.

Line 71 uses .parse().unwrap() which will panic if the adapter returns a malformed URL. While adapters are expected to return valid URLs, a defensive approach would improve robustness.

     fn build_request(&self, adapter: &A, channels: u8) -> hypr_ws::client::ClientRequestBuilder {
         let params = self.get_params();
         let url = adapter.build_ws_url(self.get_api_base(), &params, channels);
-        let uri = url.to_string().parse().unwrap();
+        let uri = url
+            .to_string()
+            .parse()
+            .expect("adapter returned invalid URL");
owhisper/owhisper-client/src/adapter/soniox.rs (2)

136-202: Consider adding retry logic for transient polling failures.

The polling loop will fail immediately on any HTTP error. For long-running batch jobs, transient network issues could cause unnecessary failures. Consider adding limited retries for recoverable errors (e.g., 5xx status codes).


326-348: Panic on invalid api_base URL.

Line 332 uses .expect("invalid api_base") which will panic if the user provides a malformed URL. Since this is user-provided input, consider returning a Result or using a fallback.

     fn build_ws_url(&self, api_base: &str, _params: &ListenParams, _channels: u8) -> url::Url {
-        let mut url: url::Url = api_base.parse().expect("invalid api_base");
+        let mut url: url::Url = match api_base.parse() {
+            Ok(u) => u,
+            Err(_) => {
+                tracing::error!(api_base = api_base, "invalid api_base, falling back to default");
+                DEFAULT_API_BASE.parse().unwrap()
+            }
+        };
owhisper/owhisper-client/src/live.rs (1)

325-342: Consider logging control message send failures.

While silently ignoring audio try_send errors is appropriate for backpressure handling, control message send failures (lines 337-338) could indicate a closed channel or other issue worth investigating during debugging.

         MixedMessage::Control(ctrl) => {
-            let _ = mic_tx.send(MixedMessage::Control(ctrl.clone())).await;
-            let _ = spk_tx.send(MixedMessage::Control(ctrl)).await;
+            if mic_tx.send(MixedMessage::Control(ctrl.clone())).await.is_err() {
+                tracing::debug!("mic channel closed, stopping forward task");
+                break;
+            }
+            if spk_tx.send(MixedMessage::Control(ctrl)).await.is_err() {
+                tracing::debug!("spk channel closed, stopping forward task");
+                break;
+            }
         }
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 17e893a and b980f80.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (14)
  • apps/web/content/docs/developers/13.languages.mdx (1 hunks)
  • crates/ws/src/client.rs (4 hunks)
  • crates/ws/tests/client_tests.rs (4 hunks)
  • owhisper/owhisper-client/Cargo.toml (1 hunks)
  • owhisper/owhisper-client/src/adapter/argmax.rs (1 hunks)
  • owhisper/owhisper-client/src/adapter/deepgram.rs (1 hunks)
  • owhisper/owhisper-client/src/adapter/mod.rs (1 hunks)
  • owhisper/owhisper-client/src/adapter/owhisper.rs (1 hunks)
  • owhisper/owhisper-client/src/adapter/soniox.rs (1 hunks)
  • owhisper/owhisper-client/src/batch.rs (1 hunks)
  • owhisper/owhisper-client/src/lib.rs (2 hunks)
  • owhisper/owhisper-client/src/live.rs (4 hunks)
  • plugins/listener/src/actors/listener.rs (9 hunks)
  • plugins/listener2/src/ext.rs (3 hunks)
🧰 Additional context used
🧬 Code graph analysis (10)
owhisper/owhisper-client/src/batch.rs (4)
owhisper/owhisper-client/src/adapter/deepgram.rs (1)
  • transcribe_file (181-193)
owhisper/owhisper-client/src/adapter/mod.rs (1)
  • transcribe_file (45-52)
owhisper/owhisper-client/src/adapter/soniox.rs (1)
  • transcribe_file (540-552)
owhisper/owhisper-interface/src/lib.rs (1)
  • default (148-157)
plugins/listener2/src/ext.rs (1)
plugins/listener/src/actors/listener.rs (2)
  • owhisper_client (284-285)
  • owhisper_client (343-344)
owhisper/owhisper-client/src/adapter/owhisper.rs (3)
owhisper/owhisper-client/src/live.rs (3)
  • tokio (267-267)
  • tokio (268-268)
  • builder (179-181)
owhisper/owhisper-client/src/batch.rs (1)
  • new (21-29)
owhisper/owhisper-client/src/lib.rs (1)
  • default (25-32)
plugins/listener/src/actors/listener.rs (2)
owhisper/owhisper-client/src/live.rs (8)
  • tokio (267-267)
  • tokio (268-268)
  • finalize (49-49)
  • finalize (54-58)
  • finalize (66-83)
  • expected_finalize_count (50-50)
  • expected_finalize_count (60-62)
  • expected_finalize_count (85-90)
owhisper/owhisper-interface/src/stream.rs (2)
  • apply_offset (150-169)
  • set_extra (171-175)
owhisper/owhisper-client/src/lib.rs (4)
owhisper/owhisper-client/src/adapter/argmax.rs (1)
  • ListenClientBuilder (75-83)
owhisper/owhisper-interface/src/lib.rs (1)
  • default (148-157)
owhisper/owhisper-client/src/batch.rs (1)
  • new (21-29)
owhisper/owhisper-client/src/adapter/mod.rs (1)
  • initial_message (34-41)
owhisper/owhisper-client/src/adapter/argmax.rs (3)
owhisper/owhisper-client/src/adapter/deepgram.rs (7)
  • supports_native_multichannel (109-111)
  • build_ws_url (113-155)
  • build_auth_header (157-159)
  • keep_alive_message (161-167)
  • finalize_message (169-175)
  • parse_response (177-179)
  • transcribe_file (181-193)
owhisper/owhisper-client/src/adapter/mod.rs (7)
  • supports_native_multichannel (24-24)
  • build_ws_url (26-26)
  • build_auth_header (28-28)
  • keep_alive_message (30-30)
  • finalize_message (32-32)
  • parse_response (43-43)
  • transcribe_file (45-52)
owhisper/owhisper-client/src/batch.rs (1)
  • transcribe_file (31-45)
crates/ws/src/client.rs (2)
crates/ws/tests/client_tests.rs (4)
  • client (101-101)
  • client (118-118)
  • client (172-172)
  • client (219-219)
owhisper/owhisper-client/src/adapter/soniox.rs (1)
  • initial_message (358-402)
owhisper/owhisper-client/src/live.rs (2)
crates/ws/src/client.rs (2)
  • tokio (96-96)
  • new (52-57)
owhisper/owhisper-interface/src/stream.rs (1)
  • text (141-148)
owhisper/owhisper-client/src/adapter/mod.rs (3)
owhisper/owhisper-client/src/adapter/argmax.rs (7)
  • supports_native_multichannel (15-17)
  • build_ws_url (19-21)
  • build_auth_header (23-25)
  • keep_alive_message (27-29)
  • finalize_message (31-33)
  • parse_response (35-37)
  • transcribe_file (39-49)
owhisper/owhisper-client/src/adapter/deepgram.rs (7)
  • supports_native_multichannel (109-111)
  • build_ws_url (113-155)
  • build_auth_header (157-159)
  • keep_alive_message (161-167)
  • finalize_message (169-175)
  • parse_response (177-179)
  • transcribe_file (181-193)
owhisper/owhisper-client/src/adapter/soniox.rs (8)
  • supports_native_multichannel (327-329)
  • build_ws_url (331-348)
  • build_auth_header (350-352)
  • keep_alive_message (354-356)
  • finalize_message (536-538)
  • initial_message (358-402)
  • parse_response (404-534)
  • transcribe_file (540-552)
owhisper/owhisper-client/src/adapter/soniox.rs (3)
owhisper/owhisper-client/src/batch.rs (1)
  • transcribe_file (31-45)
owhisper/owhisper-interface/src/batch.rs (2)
  • alternatives (69-73)
  • channel (85-89)
owhisper/owhisper-client/src/adapter/mod.rs (7)
  • supports_native_multichannel (24-24)
  • build_ws_url (26-26)
  • build_auth_header (28-28)
  • keep_alive_message (30-30)
  • initial_message (34-41)
  • parse_response (43-43)
  • transcribe_file (45-52)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
  • GitHub Check: Redirect rules - hyprnote
  • GitHub Check: Header rules - hyprnote
  • GitHub Check: Pages changed - hyprnote
  • GitHub Check: ci
  • GitHub Check: fmt
  • GitHub Check: desktop_ci (linux, depot-ubuntu-24.04-8)
  • GitHub Check: desktop_ci (linux, depot-ubuntu-22.04-8)
  • GitHub Check: desktop_ci (macos, depot-macos-14)
🔇 Additional comments (34)
owhisper/owhisper-client/Cargo.toml (2)

15-15: Dependency expansion for multipart support is appropriate.

The addition of the "multipart" feature to reqwest aligns with the adapter-based architecture for file uploads and streaming interactions across the multiple SttAdapter implementations.


17-17: Serde and tokio-stream additions support the adapter pattern.

The inclusion of serde and tokio-stream are necessary for:

  • serde: Serialization/deserialization of request/response payloads across adapter implementations
  • tokio-stream: Streaming operations in the adapter-based flow

These align with the refactored architecture.

Also applies to: 20-20

owhisper/owhisper-client/src/adapter/argmax.rs (1)

9-50: Clean delegation pattern for ArgmaxAdapter.

The adapter correctly wraps DeepgramAdapter and delegates all methods while overriding supports_native_multichannel to return false. This is a well-structured decorator pattern.

owhisper/owhisper-client/src/adapter/deepgram.rs (1)

196-233: Audio decoding logic is well-structured.

The decode_audio_to_linear16 function correctly:

  • Uses spawn_blocking for CPU-bound audio processing
  • Handles mono/stereo conversion properly
  • Validates non-empty samples
  • Returns appropriate errors
crates/ws/tests/client_tests.rs (4)

101-101: API signature update looks correct.

The from_audio call now includes None for the new initial_message parameter, aligning with the updated API that supports sending an optional initial WebSocket message before streaming.


118-118: Consistent API update in finalize test.


172-172: Consistent API update in keep-alive test.


219-219: Consistent API update in retry test.

plugins/listener2/src/ext.rs (1)

11-15: New Soniox provider variant added.

The enum now includes Soniox as a batch provider option alongside Deepgram and Am.

crates/ws/src/client.rs (3)

8-10: LGTM - Import consolidation and new re-export.

The import path is simplified and Utf8Bytes is now publicly re-exported, which is needed for callers using finalize_with_text.


68-106: LGTM - Initial message support with proper error handling.

The new initial_message parameter correctly enables adapters (e.g., Soniox) to send configuration before streaming audio. The early return on failure prevents streaming to a broken connection and properly propagates the error via error_tx.


140-148: LGTM - Simplified finalize handling.

The refactored pattern match is cleaner and correctly updates last_outbound_at when a finalize message is sent.

owhisper/owhisper-client/src/batch.rs (1)

11-45: LGTM - Clean generic adapter abstraction.

The BatchClient<A> properly delegates transcription to the adapter while retaining ownership of the shared reqwest::Client and configuration. Using PhantomData<A> is the correct pattern here since the adapter is stateless.

owhisper/owhisper-client/src/lib.rs (3)

17-33: LGTM - Generic builder with Default implementation.

The Default implementation for ListenClientBuilder (defaulting to DeepgramAdapter) enables ergonomic construction via ListenClient::builder().


51-58: LGTM - Adapter type switching.

The adapter::<B>() method cleanly transfers builder state while changing the adapter type, enabling fluent API usage like .adapter::<SonioxAdapter>().


97-101: Silent fallback to empty API key may cause confusing auth failures.

self.api_key.unwrap_or_default() silently uses an empty string if the key is missing. For Soniox (which requires an API key in the initial message), this will result in a runtime warning but no clear error at build time. Consider whether this should fail explicitly or at least warn.

plugins/listener/src/actors/listener.rs (5)

197-206: LGTM - Soniox URL detection helper.

The is_soniox_base_url function follows the same pattern as is_local_stt_base_url and correctly handles both parsed and unparsed URL formats.


219-238: URL-based adapter selection is pragmatic but may need extension.

The if-else chain for selecting adapters based on URL patterns works but will need updates as more providers are added. Consider whether a more extensible pattern (e.g., adapter registry or configuration-driven selection) would be beneficial in the future.


268-324: LGTM - Generic single-channel spawn task.

The function correctly uses the adapter type parameter to build the appropriate client and delegates to process_stream with the adapter's handle type.


327-384: LGTM - Generic dual-channel spawn task.

Mirrors the single-channel implementation correctly, using build_dual() to construct the appropriate client.


386-397: LGTM - Finalize-driven termination logic.

The process_stream function correctly:

  1. Calls handle.finalize().await on shutdown.
  2. Uses expected_finalize_count() to determine how many finalize responses to wait for.
  3. Counts from_finalize responses and breaks when the expected count is reached.

This properly handles both native multichannel (1 finalize) and split connections (2 finalizes).

Also applies to: 400-452

owhisper/owhisper-client/src/adapter/mod.rs (2)

1-8: LGTM - Module structure and re-exports.

The adapter modules are correctly organized with public re-exports for argmax, deepgram, and soniox. The owhisper module is kept private as an internal helper.


21-53: LGTM - Well-designed adapter trait.

The SttAdapter trait comprehensively captures the adapter contract:

  • Clone + Default + Send + Sync + 'static bounds ensure adapters work in async/concurrent contexts.
  • The default initial_message returning None is appropriate since most adapters don't need it.
  • The BatchFuture type alias keeps the trait method signature clean.
owhisper/owhisper-client/src/adapter/soniox.rs (4)

16-21: LGTM - Constants and struct definition.

The polling configuration (2-second interval, 300 attempts = 10 minutes max) is reasonable for batch transcription jobs.


40-84: LGTM - File upload implementation.

The upload flow correctly:

  1. Reads the file asynchronously.
  2. Constructs a multipart form with the file content.
  3. Handles non-success status codes with descriptive errors.

358-402: LGTM - Initial message with embedded API key.

Soniox uses message-based authentication rather than headers, so embedding the API key in the initial JSON config is correct. The warning on missing API key is appropriate.


404-534: LGTM - Response parsing with good edge-case handling.

The parser correctly:

  1. Handles <fin> and <end> special tokens.
  2. Constructs transcripts with proper word spacing (accounting for punctuation).
  3. Maps channel/speaker info from Soniox format to internal structures.

The TODO on line 531-532 is valid—channel_index is hardcoded to [channel_index, 1], where the second value (total channels) is always 1. Other adapters in the codebase (transcribe-whisper-local, transcribe-moonshine) dynamically determine total channels. This should be addressed if Soniox responses contain multichannel information.

owhisper/owhisper-client/src/live.rs (7)

1-16: LGTM!

Imports are well-organized and type aliases clearly express the input semantics for single and dual channel audio streams.


17-46: LGTM!

The generic adapter pattern with a default type parameter preserves backward compatibility. The handle types cleanly encapsulate the finalization requirements for single vs dual (native/split) streaming modes.


48-91: LGTM!

The FinalizeHandle trait provides a clean abstraction for finalization across different streaming modes. Using tokio::join! for concurrent finalization in the Split variant is efficient, and expected_finalize_count() enables callers to properly coordinate completion.


116-176: LGTM!

The WebSocketIO implementations correctly handle the transformation between internal data types and WebSocket messages. The dual IO appropriately uses interleave_audio for native multichannel support.


184-218: LGTM!

The from_realtime_audio implementation correctly wires the WebSocket client with keep-alive support and maps raw responses through the adapter. The filter_map pattern appropriately handles parse filtering while preserving transport errors.


262-322: LGTM!

The split streaming implementation correctly handles providers that don't support native multichannel. Using try_send for audio allows graceful backpressure handling (dropping frames under load), while send().await for control messages ensures reliable delivery of critical messages like finalize.


369-387: LGTM!

The helper functions provide clean abstractions. websocket_client_with_keep_alive correctly applies adapter-specific keep-alive settings, and extract_finalize_text safely handles non-text finalize messages with a sensible default fallback.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant