Refactor provider specific split connection through adaptor#2052
Refactor provider specific split connection through adaptor#2052
Conversation
📝 WalkthroughWalkthroughIntroduces a trait-based STT adapter system (SttAdapter) with multiple adapter implementations (Argmax, Deepgram, Soniox), makes ListenClient/BatchClient generic over adapters, threads an optional initial WebSocket Message through client/from_audio, refactors dual-path streaming (native vs split) and finalization into a FinalizeHandle-based flow. Changes
Sequence Diagram(s)sequenceDiagram
participant Builder as ListenClientBuilder
participant Adapter as SttAdapter (A)
participant Client as ListenClientDual<A>
participant WSClient as WebSocketClient
participant Runtime as Runtime (tasks/streams)
participant Listener as Listener actor
Builder->>Adapter: determine adapter (adapter::<A>() or default)
Builder->>Client: build_dual() with A::default(), request, initial_message
Client->>Adapter: supports_native_multichannel()?
alt native supported
Client->>WSClient: websocket_client_with_keep_alive(request, adapter)
WSClient->>Runtime: spawn sender/receiver tasks (send initial_message if present)
Runtime->>Listener: stream responses (single WS)
else split required
Client->>WSClient: build two WS requests (mic, spk) via adapter.build_ws_url()
WSClient->>Runtime: spawn two sender/receiver tasks (each may send initial_message)
Runtime->>Client: merge_streams_with_channel_remap()
Runtime->>Listener: stream merged responses
end
Listener->>Client: finalize via FinalizeHandle::finalize()
Client->>WSClient: send finalize_message() and await expected_finalize_count()
WSClient->>Runtime: emit final responses, then close
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes
Possibly related PRs
Pre-merge checks and finishing touches❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (1 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Comment |
✅ Deploy Preview for hyprnote-storybook ready!
To edit notification comments on pull requests, go to your Netlify project configuration. |
✅ Deploy Preview for hyprnote ready!
To edit notification comments on pull requests, go to your Netlify project configuration. |
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (1)
owhisper/owhisper-client/src/adapter/mod.rs (1)
7-9: LGTM!The
SttAdaptertrait is well-designed with appropriate bounds for async/concurrent usage. The single method cleanly captures the capability difference between providers.Consider using an associated constant instead of a method if the value is always compile-time known per adapter type:
pub trait SttAdapter: Clone + Default + Send + Sync + 'static { - fn supports_native_multichannel(&self) -> bool; + const SUPPORTS_NATIVE_MULTICHANNEL: bool; }This would enable compile-time optimization and eliminate the method call overhead, though the current design offers more flexibility if runtime configuration becomes necessary.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
crates/ws/src/client.rs(1 hunks)owhisper/owhisper-client/Cargo.toml(1 hunks)owhisper/owhisper-client/src/adapter/argmax.rs(1 hunks)owhisper/owhisper-client/src/adapter/deepgram.rs(1 hunks)owhisper/owhisper-client/src/adapter/mod.rs(1 hunks)owhisper/owhisper-client/src/lib.rs(3 hunks)owhisper/owhisper-client/src/live.rs(5 hunks)plugins/listener/src/actors/listener.rs(7 hunks)
🧰 Additional context used
🧬 Code graph analysis (4)
owhisper/owhisper-client/src/adapter/argmax.rs (2)
owhisper/owhisper-client/src/adapter/deepgram.rs (1)
supports_native_multichannel(7-9)owhisper/owhisper-client/src/adapter/mod.rs (1)
supports_native_multichannel(8-8)
owhisper/owhisper-client/src/adapter/deepgram.rs (2)
owhisper/owhisper-client/src/adapter/argmax.rs (1)
supports_native_multichannel(7-9)owhisper/owhisper-client/src/adapter/mod.rs (1)
supports_native_multichannel(8-8)
plugins/listener/src/actors/listener.rs (2)
owhisper/owhisper-client/src/live.rs (3)
expected_finalize_count(38-38)expected_finalize_count(46-48)expected_finalize_count(65-70)owhisper/owhisper-interface/src/stream.rs (2)
apply_offset(150-169)set_extra(171-175)
owhisper/owhisper-client/src/live.rs (2)
crates/ws/src/client.rs (1)
finalize_with_text(29-33)owhisper/owhisper-interface/src/stream.rs (1)
text(141-148)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
- GitHub Check: Redirect rules - hyprnote
- GitHub Check: Redirect rules - hyprnote-storybook
- GitHub Check: Header rules - hyprnote
- GitHub Check: Header rules - hyprnote-storybook
- GitHub Check: Pages changed - hyprnote
- GitHub Check: Pages changed - hyprnote-storybook
- GitHub Check: fmt
- GitHub Check: desktop_ci (linux, depot-ubuntu-24.04-8)
- GitHub Check: desktop_ci (macos, depot-macos-14)
- GitHub Check: desktop_ci (linux, depot-ubuntu-22.04-8)
🔇 Additional comments (18)
owhisper/owhisper-client/Cargo.toml (1)
17-17: LGTM!The
tokio-streamdependency addition is appropriate for the new streaming interfaces and adapter architecture introduced in this PR.crates/ws/src/client.rs (1)
8-10: LGTM!The import consolidation and public re-export of
Utf8Bytesare appropriate. ExposingUtf8Bytesis necessary since it's used in the publicfinalize_with_textmethod signature.owhisper/owhisper-client/src/adapter/argmax.rs (1)
1-10: LGTM!Clean implementation of
ArgmaxAdapter. Thesupports_native_multichannel() -> falsecorrectly indicates that the Argmax provider requires split mic/speaker handling rather than native multichannel support.owhisper/owhisper-client/src/adapter/deepgram.rs (1)
1-10: LGTM!Clean implementation of
DeepgramAdapter. Thesupports_native_multichannel() -> truecorrectly indicates that Deepgram can handle dual-channel audio natively.owhisper/owhisper-client/src/lib.rs (3)
17-33: LGTM!The generic
ListenClientBuilder<A: SttAdapter>withPhantomData<A>marker is correctly designed. TheDefaultimplementation properly targets the non-parameterized type (defaulting toDeepgramAdapter), maintaining backward compatibility.
51-58: LGTM!The
adapter<B>()method provides a clean way to switch adapter types while preserving builder configuration. This is a well-implemented type-changing builder pattern.
198-207: LGTM!The
build_dual()method correctly leverages the adapter's capability to determine channel configuration. The logic properly routes multichannel-capable providers (Deepgram) to use 2 channels while falling back to 1 channel for providers like Argmax that require split handling.plugins/listener/src/actors/listener.rs (5)
7-7: LGTM!Clean import of the new adapter types and traits. The import aligns with the new generic adapter architecture.
189-195: LGTM!The URL parsing with fallback to string matching is pragmatic. The fallback could technically have false positives (e.g.,
localhostin a path), but this is a low-risk edge case for adapter selection.
212-219: LGTM!The adapter selection based on URL locality is a clean pattern that allows different STT providers to use their optimal connection strategy (native multichannel vs split).
307-364: LGTM!The generic adapter pattern is well implemented. The function structure correctly mirrors
spawn_rx_task_singlewhile enabling adapter-specific behavior through the type parameter.
366-461: LGTM!The generic
FinalizeHandlepattern elegantly handles both single and split connection finalization. The counting mechanism (finalize_count >= expected_count) correctly adapts to the handle type—waiting for 1 finalize response for native handles and 2 for split handles.owhisper/owhisper-client/src/live.rs (6)
23-34: LGTM!The
ListenClientDual<A: SttAdapter>design cleanly encapsulates adapter-specific behavior, andDualHandleproperly models the two connection strategies with appropriate data for each variant.
36-71: LGTM!The
FinalizeHandletrait withimpl Futurereturn type is a clean async trait pattern. TheDualHandleimplementation correctly usestokio::join!to finalize both handles concurrently for split connections.
96-156: LGTM!The IO types correctly implement
WebSocketIOwith proper message translation.ListenClientDualIOappropriately interleaves mic/speaker audio for native multichannel support.
180-232: LGTM!The branching based on
supports_native_multichannel()elegantly routes to the appropriate connection strategy. The split path correctly orchestrates dual websockets with forwarding and stream merging.One minor observation: the spawned forwarder task's
JoinHandleis discarded (line 220). This is acceptable since the task naturally terminates when input streams close, and errors in the forwarder would manifest as closed channels downstream.
234-251: Intentional design choice noted fortry_sendon audio.Using
try_sendfor audio (lines 242-243) means audio frames may be dropped if channels fill up. This is typically the right trade-off for real-time audio—prefer dropping frames over introducing latency. Control messages use the blockingsend().awaitto ensure delivery, which is appropriate.
253-276: LGTM!The stream merging correctly assigns channel indices (0 for mic, 1 for speaker) and uses
futures_util::stream::selectfor fair interleaving of responses from both connections.
There was a problem hiding this comment.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
owhisper/owhisper-client/Cargo.toml (1)
4-4: Fix invalid Rust edition.Rust edition "2024" does not exist. Valid editions are 2015, 2018, and 2021.
Apply this diff to correct the edition:
-edition = "2024" +edition = "2021"owhisper/owhisper-client/src/live.rs (1)
93-114: Silent truncation if audio buffers have odd byte lengths.
chunks_exact(2)discards any trailing byte if the input length is odd. While 16-bit PCM audio should always have even-length buffers, consider adding a debug assertion or logging to catch malformed input during development.fn interleave_audio(mic: &[u8], speaker: &[u8]) -> Vec<u8> { + debug_assert!(mic.len() % 2 == 0, "mic buffer has odd length"); + debug_assert!(speaker.len() % 2 == 0, "speaker buffer has odd length"); + let mic_samples: Vec<i16> = mic
🧹 Nitpick comments (10)
owhisper/owhisper-client/src/adapter/owhisper.rs (2)
9-34: Integration test lacks assertions and requires external service.This test connects to a hardcoded local endpoint (
ws://127.0.0.1:52693) and only prints results without any assertions. Consider:
- Adding
#[ignore]attribute to prevent CI failures when the service isn't running- Adding at least one assertion to validate the stream produces expected output
#[tokio::test] +#[ignore] // Requires local owhisper service async fn test_owhisper_with_owhisper() {
36-67: Test has placeholder API key and no assertions.The Deepgram test at Line 46 uses
"TODO"as the API key, which will cause authentication failures. Similar to the other test, it lacks assertions.#[tokio::test] +#[ignore] // Requires local Deepgram-compatible service async fn test_owhisper_with_deepgram() {owhisper/owhisper-client/src/adapter/argmax.rs (1)
62-92: Integration test requires external service.Similar to other adapter tests, this test connects to a hardcoded local endpoint and lacks assertions. Consider adding
#[ignore]to prevent CI failures.#[tokio::test] +#[ignore] // Requires local Argmax service async fn test_client() {owhisper/owhisper-client/src/adapter/deepgram.rs (1)
243-288: Test requires external Deepgram service and lacks#[ignore].This integration test will fail in CI without network access to Deepgram's API. Consider marking it as ignored.
#[tokio::test] +#[ignore] // Requires Deepgram API access async fn test_client() {plugins/listener2/src/ext.rs (2)
118-175: Consider extracting common batch transcription logic.The Deepgram (Lines 118-145) and Soniox (Lines 147-175) branches are nearly identical, differing only in the adapter type and log messages. This duplication could be reduced with a helper function or macro.
Example approach using a generic helper:
async fn run_batch_transcription<A: owhisper_client::SttAdapter + Default>( app: &tauri::AppHandle<R>, session_id: &str, base_url: String, api_key: String, listen_params: owhisper_interface::ListenParams, file_path: &str, ) -> Result<(), crate::Error> { BatchEvent::BatchStarted { session_id: session_id.to_string() } .emit(app) .map_err(|_| crate::Error::BatchStartFailed("failed to emit event".to_string()))?; let client = owhisper_client::ListenClient::builder() .adapter::<A>() .api_base(base_url) .api_key(api_key) .params(listen_params) .build_batch(); let response = client.transcribe_file(file_path).await?; BatchEvent::BatchResponse { session_id: session_id.to_string(), response } .emit(app) .map_err(|_| crate::Error::BatchStartFailed("failed to emit event".to_string()))?; Ok(()) }
127-131: Deepgram path doesn't explicitly specify adapter.Unlike the Soniox path which explicitly calls
.adapter::<owhisper_client::SonioxAdapter>(), the Deepgram path relies on the default adapter. For consistency and clarity, consider making the adapter explicit.- let client = owhisper_client::ListenClient::builder() + let client = owhisper_client::ListenClient::builder() + .adapter::<owhisper_client::DeepgramAdapter>() .api_base(params.base_url.clone())owhisper/owhisper-client/src/lib.rs (1)
68-82: Consider handling malformed URLs gracefully.Line 71 uses
.parse().unwrap()which will panic if the adapter returns a malformed URL. While adapters are expected to return valid URLs, a defensive approach would improve robustness.fn build_request(&self, adapter: &A, channels: u8) -> hypr_ws::client::ClientRequestBuilder { let params = self.get_params(); let url = adapter.build_ws_url(self.get_api_base(), ¶ms, channels); - let uri = url.to_string().parse().unwrap(); + let uri = url + .to_string() + .parse() + .expect("adapter returned invalid URL");owhisper/owhisper-client/src/adapter/soniox.rs (2)
136-202: Consider adding retry logic for transient polling failures.The polling loop will fail immediately on any HTTP error. For long-running batch jobs, transient network issues could cause unnecessary failures. Consider adding limited retries for recoverable errors (e.g., 5xx status codes).
326-348: Panic on invalid api_base URL.Line 332 uses
.expect("invalid api_base")which will panic if the user provides a malformed URL. Since this is user-provided input, consider returning aResultor using a fallback.fn build_ws_url(&self, api_base: &str, _params: &ListenParams, _channels: u8) -> url::Url { - let mut url: url::Url = api_base.parse().expect("invalid api_base"); + let mut url: url::Url = match api_base.parse() { + Ok(u) => u, + Err(_) => { + tracing::error!(api_base = api_base, "invalid api_base, falling back to default"); + DEFAULT_API_BASE.parse().unwrap() + } + };owhisper/owhisper-client/src/live.rs (1)
325-342: Consider logging control message send failures.While silently ignoring audio
try_senderrors is appropriate for backpressure handling, control message send failures (lines 337-338) could indicate a closed channel or other issue worth investigating during debugging.MixedMessage::Control(ctrl) => { - let _ = mic_tx.send(MixedMessage::Control(ctrl.clone())).await; - let _ = spk_tx.send(MixedMessage::Control(ctrl)).await; + if mic_tx.send(MixedMessage::Control(ctrl.clone())).await.is_err() { + tracing::debug!("mic channel closed, stopping forward task"); + break; + } + if spk_tx.send(MixedMessage::Control(ctrl)).await.is_err() { + tracing::debug!("spk channel closed, stopping forward task"); + break; + } }
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
Cargo.lockis excluded by!**/*.lock
📒 Files selected for processing (14)
apps/web/content/docs/developers/13.languages.mdx(1 hunks)crates/ws/src/client.rs(4 hunks)crates/ws/tests/client_tests.rs(4 hunks)owhisper/owhisper-client/Cargo.toml(1 hunks)owhisper/owhisper-client/src/adapter/argmax.rs(1 hunks)owhisper/owhisper-client/src/adapter/deepgram.rs(1 hunks)owhisper/owhisper-client/src/adapter/mod.rs(1 hunks)owhisper/owhisper-client/src/adapter/owhisper.rs(1 hunks)owhisper/owhisper-client/src/adapter/soniox.rs(1 hunks)owhisper/owhisper-client/src/batch.rs(1 hunks)owhisper/owhisper-client/src/lib.rs(2 hunks)owhisper/owhisper-client/src/live.rs(4 hunks)plugins/listener/src/actors/listener.rs(9 hunks)plugins/listener2/src/ext.rs(3 hunks)
🧰 Additional context used
🧬 Code graph analysis (10)
owhisper/owhisper-client/src/batch.rs (4)
owhisper/owhisper-client/src/adapter/deepgram.rs (1)
transcribe_file(181-193)owhisper/owhisper-client/src/adapter/mod.rs (1)
transcribe_file(45-52)owhisper/owhisper-client/src/adapter/soniox.rs (1)
transcribe_file(540-552)owhisper/owhisper-interface/src/lib.rs (1)
default(148-157)
plugins/listener2/src/ext.rs (1)
plugins/listener/src/actors/listener.rs (2)
owhisper_client(284-285)owhisper_client(343-344)
owhisper/owhisper-client/src/adapter/owhisper.rs (3)
owhisper/owhisper-client/src/live.rs (3)
tokio(267-267)tokio(268-268)builder(179-181)owhisper/owhisper-client/src/batch.rs (1)
new(21-29)owhisper/owhisper-client/src/lib.rs (1)
default(25-32)
plugins/listener/src/actors/listener.rs (2)
owhisper/owhisper-client/src/live.rs (8)
tokio(267-267)tokio(268-268)finalize(49-49)finalize(54-58)finalize(66-83)expected_finalize_count(50-50)expected_finalize_count(60-62)expected_finalize_count(85-90)owhisper/owhisper-interface/src/stream.rs (2)
apply_offset(150-169)set_extra(171-175)
owhisper/owhisper-client/src/lib.rs (4)
owhisper/owhisper-client/src/adapter/argmax.rs (1)
ListenClientBuilder(75-83)owhisper/owhisper-interface/src/lib.rs (1)
default(148-157)owhisper/owhisper-client/src/batch.rs (1)
new(21-29)owhisper/owhisper-client/src/adapter/mod.rs (1)
initial_message(34-41)
owhisper/owhisper-client/src/adapter/argmax.rs (3)
owhisper/owhisper-client/src/adapter/deepgram.rs (7)
supports_native_multichannel(109-111)build_ws_url(113-155)build_auth_header(157-159)keep_alive_message(161-167)finalize_message(169-175)parse_response(177-179)transcribe_file(181-193)owhisper/owhisper-client/src/adapter/mod.rs (7)
supports_native_multichannel(24-24)build_ws_url(26-26)build_auth_header(28-28)keep_alive_message(30-30)finalize_message(32-32)parse_response(43-43)transcribe_file(45-52)owhisper/owhisper-client/src/batch.rs (1)
transcribe_file(31-45)
crates/ws/src/client.rs (2)
crates/ws/tests/client_tests.rs (4)
client(101-101)client(118-118)client(172-172)client(219-219)owhisper/owhisper-client/src/adapter/soniox.rs (1)
initial_message(358-402)
owhisper/owhisper-client/src/live.rs (2)
crates/ws/src/client.rs (2)
tokio(96-96)new(52-57)owhisper/owhisper-interface/src/stream.rs (1)
text(141-148)
owhisper/owhisper-client/src/adapter/mod.rs (3)
owhisper/owhisper-client/src/adapter/argmax.rs (7)
supports_native_multichannel(15-17)build_ws_url(19-21)build_auth_header(23-25)keep_alive_message(27-29)finalize_message(31-33)parse_response(35-37)transcribe_file(39-49)owhisper/owhisper-client/src/adapter/deepgram.rs (7)
supports_native_multichannel(109-111)build_ws_url(113-155)build_auth_header(157-159)keep_alive_message(161-167)finalize_message(169-175)parse_response(177-179)transcribe_file(181-193)owhisper/owhisper-client/src/adapter/soniox.rs (8)
supports_native_multichannel(327-329)build_ws_url(331-348)build_auth_header(350-352)keep_alive_message(354-356)finalize_message(536-538)initial_message(358-402)parse_response(404-534)transcribe_file(540-552)
owhisper/owhisper-client/src/adapter/soniox.rs (3)
owhisper/owhisper-client/src/batch.rs (1)
transcribe_file(31-45)owhisper/owhisper-interface/src/batch.rs (2)
alternatives(69-73)channel(85-89)owhisper/owhisper-client/src/adapter/mod.rs (7)
supports_native_multichannel(24-24)build_ws_url(26-26)build_auth_header(28-28)keep_alive_message(30-30)initial_message(34-41)parse_response(43-43)transcribe_file(45-52)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
- GitHub Check: Redirect rules - hyprnote
- GitHub Check: Header rules - hyprnote
- GitHub Check: Pages changed - hyprnote
- GitHub Check: ci
- GitHub Check: fmt
- GitHub Check: desktop_ci (linux, depot-ubuntu-24.04-8)
- GitHub Check: desktop_ci (linux, depot-ubuntu-22.04-8)
- GitHub Check: desktop_ci (macos, depot-macos-14)
🔇 Additional comments (34)
owhisper/owhisper-client/Cargo.toml (2)
15-15: Dependency expansion for multipart support is appropriate.The addition of the "multipart" feature to reqwest aligns with the adapter-based architecture for file uploads and streaming interactions across the multiple SttAdapter implementations.
17-17: Serde and tokio-stream additions support the adapter pattern.The inclusion of serde and tokio-stream are necessary for:
- serde: Serialization/deserialization of request/response payloads across adapter implementations
- tokio-stream: Streaming operations in the adapter-based flow
These align with the refactored architecture.
Also applies to: 20-20
owhisper/owhisper-client/src/adapter/argmax.rs (1)
9-50: Clean delegation pattern for ArgmaxAdapter.The adapter correctly wraps
DeepgramAdapterand delegates all methods while overridingsupports_native_multichannelto returnfalse. This is a well-structured decorator pattern.owhisper/owhisper-client/src/adapter/deepgram.rs (1)
196-233: Audio decoding logic is well-structured.The
decode_audio_to_linear16function correctly:
- Uses
spawn_blockingfor CPU-bound audio processing- Handles mono/stereo conversion properly
- Validates non-empty samples
- Returns appropriate errors
crates/ws/tests/client_tests.rs (4)
101-101: API signature update looks correct.The
from_audiocall now includesNonefor the newinitial_messageparameter, aligning with the updated API that supports sending an optional initial WebSocket message before streaming.
118-118: Consistent API update in finalize test.
172-172: Consistent API update in keep-alive test.
219-219: Consistent API update in retry test.plugins/listener2/src/ext.rs (1)
11-15: New Soniox provider variant added.The enum now includes
Sonioxas a batch provider option alongsideDeepgramandAm.crates/ws/src/client.rs (3)
8-10: LGTM - Import consolidation and new re-export.The import path is simplified and
Utf8Bytesis now publicly re-exported, which is needed for callers usingfinalize_with_text.
68-106: LGTM - Initial message support with proper error handling.The new
initial_messageparameter correctly enables adapters (e.g., Soniox) to send configuration before streaming audio. The early return on failure prevents streaming to a broken connection and properly propagates the error viaerror_tx.
140-148: LGTM - Simplified finalize handling.The refactored pattern match is cleaner and correctly updates
last_outbound_atwhen a finalize message is sent.owhisper/owhisper-client/src/batch.rs (1)
11-45: LGTM - Clean generic adapter abstraction.The
BatchClient<A>properly delegates transcription to the adapter while retaining ownership of the sharedreqwest::Clientand configuration. UsingPhantomData<A>is the correct pattern here since the adapter is stateless.owhisper/owhisper-client/src/lib.rs (3)
17-33: LGTM - Generic builder with Default implementation.The
Defaultimplementation forListenClientBuilder(defaulting toDeepgramAdapter) enables ergonomic construction viaListenClient::builder().
51-58: LGTM - Adapter type switching.The
adapter::<B>()method cleanly transfers builder state while changing the adapter type, enabling fluent API usage like.adapter::<SonioxAdapter>().
97-101: Silent fallback to empty API key may cause confusing auth failures.
self.api_key.unwrap_or_default()silently uses an empty string if the key is missing. For Soniox (which requires an API key in the initial message), this will result in a runtime warning but no clear error at build time. Consider whether this should fail explicitly or at least warn.plugins/listener/src/actors/listener.rs (5)
197-206: LGTM - Soniox URL detection helper.The
is_soniox_base_urlfunction follows the same pattern asis_local_stt_base_urland correctly handles both parsed and unparsed URL formats.
219-238: URL-based adapter selection is pragmatic but may need extension.The if-else chain for selecting adapters based on URL patterns works but will need updates as more providers are added. Consider whether a more extensible pattern (e.g., adapter registry or configuration-driven selection) would be beneficial in the future.
268-324: LGTM - Generic single-channel spawn task.The function correctly uses the adapter type parameter to build the appropriate client and delegates to
process_streamwith the adapter's handle type.
327-384: LGTM - Generic dual-channel spawn task.Mirrors the single-channel implementation correctly, using
build_dual()to construct the appropriate client.
386-397: LGTM - Finalize-driven termination logic.The
process_streamfunction correctly:
- Calls
handle.finalize().awaiton shutdown.- Uses
expected_finalize_count()to determine how many finalize responses to wait for.- Counts
from_finalizeresponses and breaks when the expected count is reached.This properly handles both native multichannel (1 finalize) and split connections (2 finalizes).
Also applies to: 400-452
owhisper/owhisper-client/src/adapter/mod.rs (2)
1-8: LGTM - Module structure and re-exports.The adapter modules are correctly organized with public re-exports for
argmax,deepgram, andsoniox. Theowhispermodule is kept private as an internal helper.
21-53: LGTM - Well-designed adapter trait.The
SttAdaptertrait comprehensively captures the adapter contract:
Clone + Default + Send + Sync + 'staticbounds ensure adapters work in async/concurrent contexts.- The default
initial_messagereturningNoneis appropriate since most adapters don't need it.- The
BatchFuturetype alias keeps the trait method signature clean.owhisper/owhisper-client/src/adapter/soniox.rs (4)
16-21: LGTM - Constants and struct definition.The polling configuration (2-second interval, 300 attempts = 10 minutes max) is reasonable for batch transcription jobs.
40-84: LGTM - File upload implementation.The upload flow correctly:
- Reads the file asynchronously.
- Constructs a multipart form with the file content.
- Handles non-success status codes with descriptive errors.
358-402: LGTM - Initial message with embedded API key.Soniox uses message-based authentication rather than headers, so embedding the API key in the initial JSON config is correct. The warning on missing API key is appropriate.
404-534: LGTM - Response parsing with good edge-case handling.The parser correctly:
- Handles
<fin>and<end>special tokens.- Constructs transcripts with proper word spacing (accounting for punctuation).
- Maps channel/speaker info from Soniox format to internal structures.
The TODO on line 531-532 is valid—
channel_indexis hardcoded to[channel_index, 1], where the second value (total channels) is always 1. Other adapters in the codebase (transcribe-whisper-local, transcribe-moonshine) dynamically determine total channels. This should be addressed if Soniox responses contain multichannel information.owhisper/owhisper-client/src/live.rs (7)
1-16: LGTM!Imports are well-organized and type aliases clearly express the input semantics for single and dual channel audio streams.
17-46: LGTM!The generic adapter pattern with a default type parameter preserves backward compatibility. The handle types cleanly encapsulate the finalization requirements for single vs dual (native/split) streaming modes.
48-91: LGTM!The
FinalizeHandletrait provides a clean abstraction for finalization across different streaming modes. Usingtokio::join!for concurrent finalization in theSplitvariant is efficient, andexpected_finalize_count()enables callers to properly coordinate completion.
116-176: LGTM!The
WebSocketIOimplementations correctly handle the transformation between internal data types and WebSocket messages. The dual IO appropriately usesinterleave_audiofor native multichannel support.
184-218: LGTM!The
from_realtime_audioimplementation correctly wires the WebSocket client with keep-alive support and maps raw responses through the adapter. Thefilter_mappattern appropriately handles parse filtering while preserving transport errors.
262-322: LGTM!The split streaming implementation correctly handles providers that don't support native multichannel. Using
try_sendfor audio allows graceful backpressure handling (dropping frames under load), whilesend().awaitfor control messages ensures reliable delivery of critical messages like finalize.
369-387: LGTM!The helper functions provide clean abstractions.
websocket_client_with_keep_alivecorrectly applies adapter-specific keep-alive settings, andextract_finalize_textsafely handles non-text finalize messages with a sensible default fallback.
No description provided.