Conversation
📝 WalkthroughWalkthroughThis change introduces support for dual audio (microphone and speaker) input streams throughout the listening and transcription pipeline. New enum variants, struct fields, and methods are added to represent and handle dual audio. The WebSocket and client infrastructure is refactored to support both single and dual audio modes, including new builder methods and client types. Metadata handling is also updated for consistency. Changes
Sequence Diagram(s)sequenceDiagram
participant Client
participant ListenerInterface
participant Listener
participant WSUtils
participant LocalSTT
Client->>ListenerInterface: Send ListenParams (audio_mode: Single/Dual)
Client->>Listener: Open WebSocket (audio_mode param)
Client->>Listener: Stream ListenInputChunk::SingleAudio or DualAudio
Listener->>WSUtils: Convert input chunk(s) to f32 samples
alt SingleAudio
WSUtils->>LocalSTT: Forward single audio samples
else DualAudio
WSUtils->>LocalSTT: Mix mic/speaker samples, forward mixed samples
end
LocalSTT->>Listener: Stream ListenOutputChunk (with meta)
Listener->>Client: Send ListenOutputChunk (with meta)
Assessment against linked issues
Assessment against linked issues: Out-of-scope changes
✨ Finishing Touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
Documentation and Community
|
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (2)
.vscode/settings.json (1)
16-16: Consider the impact of disabling macro-error diagnostics.While this change may suppress noise from the structural changes, disabling diagnostics can hide real macro-related issues. Ensure this is truly necessary and consider re-enabling once the structural changes stabilize.
crates/whisper-local/src/stream.rs (1)
191-193: Consider the performance impact of cloning metadata for each segment.The current implementation clones the metadata for every segment in the batch. If metadata objects are large or segments are numerous, this could impact performance.
Consider using
Rc<Option<serde_json::Value>>or similar reference-counted approach if performance becomes an issue:- for segment in &mut segments { - segment.meta = meta.clone(); - } + let shared_meta = meta.map(|m| std::rc::Rc::new(m)); + for segment in &mut segments { + segment.meta = shared_meta.as_ref().map(|rc| (**rc).clone()); + }However, the current approach is simpler and likely acceptable for most use cases.
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
Cargo.lockis excluded by!**/*.lock
📒 Files selected for processing (12)
.vscode/settings.json(1 hunks)Cargo.toml(1 hunks)crates/stt/src/realtime/clova.rs(1 hunks)crates/stt/src/realtime/deepgram.rs(1 hunks)crates/stt/src/realtime/whisper.rs(1 hunks)crates/whisper-local/src/model.rs(2 hunks)crates/whisper-local/src/stream.rs(4 hunks)plugins/listener-interface/Cargo.toml(1 hunks)plugins/listener-interface/src/lib.rs(2 hunks)plugins/listener/Cargo.toml(1 hunks)plugins/listener/src/fsm.rs(1 hunks)plugins/local-stt/src/server.rs(3 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
`**/*.{js,ts,tsx,rs}`: 1. No error handling. 2. No unused imports, variables, or functions. 3. For comments, keep it minimal. It should be about "Why", not "What".
**/*.{js,ts,tsx,rs}: 1. No error handling.
2. No unused imports, variables, or functions.
3. For comments, keep it minimal. It should be about "Why", not "What".
⚙️ Source: CodeRabbit Configuration File
List of files the instruction was applied to:
crates/stt/src/realtime/clova.rsplugins/listener/src/fsm.rscrates/stt/src/realtime/whisper.rscrates/stt/src/realtime/deepgram.rsplugins/local-stt/src/server.rscrates/whisper-local/src/model.rsplugins/listener-interface/src/lib.rscrates/whisper-local/src/stream.rs
🧬 Code Graph Analysis (3)
crates/whisper-local/src/model.rs (1)
crates/whisper-local/src/stream.rs (2)
meta(22-22)meta(36-38)
plugins/listener-interface/src/lib.rs (3)
crates/whisper-local/src/model.rs (1)
meta(266-268)crates/whisper-local/src/stream.rs (2)
meta(22-22)meta(36-38)plugins/listener/src/client.rs (1)
serde_json(87-87)
crates/whisper-local/src/stream.rs (1)
crates/whisper-local/src/model.rs (1)
meta(266-268)
⏰ Context from checks skipped due to timeout of 90000ms (3)
- GitHub Check: ci (macos, macos-latest)
- GitHub Check: ci (windows, windows-latest)
- GitHub Check: ci
🔇 Additional comments (18)
Cargo.toml (1)
191-191: LGTM - New voice activity detection dependency added.The addition of
voice_activity_detectoraligns with the PR's binarydiarization implementation. The dependency follows the workspace pattern correctly.Please verify that version 0.2.0 is the latest stable version:
What is the latest version of the voice_activity_detector Rust crate?crates/whisper-local/src/model.rs (2)
242-242: LGTM - Field renamed for consistency.The rename from
metadatatometaaligns with the broader refactoring across the codebase.
266-268: LGTM - Method updated to return owned value.The method rename and change to return
Option<serde_json::Value>(owned) instead ofOption<&serde_json::Value>(reference) is consistent with the trait definition incrates/whisper-local/src/stream.rs. Theclone()correctly creates an owned copy.Note: This is a breaking change for any external consumers of this method.
plugins/listener/Cargo.toml (1)
57-59: LGTM - Voice activity detector dependency added.The workspace dependency addition is correctly implemented and aligns with the new functionality being introduced.
plugins/listener-interface/Cargo.toml (1)
16-16: LGTM - Added serde_json feature for metadata support.Adding the
"serde_json"feature tospectais necessary to support serialization of the newmetafield of typeOption<serde_json::Value>.crates/stt/src/realtime/clova.rs (1)
39-39: LGTM: Proper default initialization pattern.The addition of
..Default::default()ensures all fields ofListenOutputChunkare properly initialized, which is especially important with the newmetafield. This aligns with similar changes across other STT implementations.crates/stt/src/realtime/whisper.rs (1)
33-33: LGTM: Consistent default initialization.The addition of
..Default::default()maintains consistency with other STT implementations and ensures proper field initialization for theListenOutputChunkstruct.crates/stt/src/realtime/deepgram.rs (1)
85-88: LGTM: Consistent default initialization pattern.The restructured
ListenOutputChunkconstruction using..Default::default()maintains consistency with other STT implementations and ensures all fields are properly initialized.plugins/local-stt/src/server.rs (2)
155-155: LGTM: Consistent default initialization.Using
..Default::default()forSimpleAudioChunkconstruction follows the same good pattern applied throughout the codebase.
170-182: LGTM: Proper metadata extraction and propagation.The metadata is correctly extracted from the chunk and properly included in the
ListenOutputChunk, completing the metadata handling pipeline from transcription to output.plugins/listener-interface/src/lib.rs (3)
19-19: LGTM: Default derivation enables easier struct initialization.The addition of
Defaulttrait derivation allows for convenient initialization using..Default::default()pattern, which aligns with the metadata handling improvements across the codebase.
40-40: LGTM: Consistent Default derivation for output structure.Adding
DefaulttoListenOutputChunkenables the same initialization pattern and supports the new optionalmetafield.
42-42: LGTM: Optional metadata field maintains backward compatibility.The new
metafield is appropriately optional, ensuring existing code continues to work while enabling metadata propagation through the transcription pipeline.crates/whisper-local/src/stream.rs (5)
22-22: LGTM: Consistent naming and ownership model for metadata.The method rename from
metadata()tometa()and the change to return ownedOption<serde_json::Value>instead of borrowed values creates a cleaner, more consistent API that aligns with the pattern used incrates/whisper-local/src/model.rs.
28-28: LGTM: Field rename maintains consistency.The field rename from
metadatatometaaligns with the trait method rename and creates consistent naming throughout the codebase.
36-38: LGTM: Proper implementation of owned metadata return.The implementation correctly returns a cloned value to match the new trait signature. The cloning is necessary to convert from the stored
Option<serde_json::Value>to the owned return type.
156-156: LGTM: Updated method call reflects API changes.The change from
chunk.metadata()tochunk.meta()correctly uses the renamed method from theAudioChunktrait.
180-180: LGTM: Parameter rename maintains consistency.The parameter rename from
metadatatometaaligns with the broader naming convention changes throughout the codebase.
There was a problem hiding this comment.
Actionable comments posted: 5
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
Cargo.lockis excluded by!**/*.lock
📒 Files selected for processing (11)
apps/app/server/src/native/listen/realtime.rs(1 hunks)crates/audio-utils/src/lib.rs(1 hunks)crates/whisper-cloud/src/client.rs(1 hunks)crates/ws-utils/Cargo.toml(1 hunks)crates/ws-utils/src/lib.rs(2 hunks)crates/ws/src/client.rs(2 hunks)plugins/listener-interface/Cargo.toml(1 hunks)plugins/listener-interface/src/lib.rs(4 hunks)plugins/listener/src/client.rs(7 hunks)plugins/listener/src/fsm.rs(2 hunks)plugins/local-stt/src/server.rs(4 hunks)
✅ Files skipped from review due to trivial changes (1)
- crates/ws-utils/Cargo.toml
🚧 Files skipped from review as they are similar to previous changes (3)
- plugins/listener-interface/Cargo.toml
- plugins/listener/src/fsm.rs
- plugins/local-stt/src/server.rs
🧰 Additional context used
📓 Path-based instructions (1)
`**/*.{js,ts,tsx,rs}`: 1. No error handling. 2. No unused imports, variables, or functions. 3. For comments, keep it minimal. It should be about "Why", not "What".
**/*.{js,ts,tsx,rs}: 1. No error handling.
2. No unused imports, variables, or functions.
3. For comments, keep it minimal. It should be about "Why", not "What".
⚙️ Source: CodeRabbit Configuration File
List of files the instruction was applied to:
apps/app/server/src/native/listen/realtime.rscrates/ws-utils/src/lib.rscrates/audio-utils/src/lib.rscrates/whisper-cloud/src/client.rscrates/ws/src/client.rsplugins/listener/src/client.rsplugins/listener-interface/src/lib.rs
🧬 Code Graph Analysis (2)
crates/ws-utils/src/lib.rs (1)
crates/audio-utils/src/lib.rs (1)
bytes_to_f32_samples(46-53)
crates/ws/src/client.rs (2)
crates/whisper-cloud/src/client.rs (1)
to_input(94-96)plugins/listener/src/client.rs (2)
to_input(119-123)to_input(147-152)
⏰ Context from checks skipped due to timeout of 90000ms (3)
- GitHub Check: ci (macos, macos-latest)
- GitHub Check: ci (windows, windows-latest)
- GitHub Check: ci
🔇 Additional comments (11)
crates/whisper-cloud/src/client.rs (1)
90-96: LGTM! Clean trait implementation update.The addition of the associated type
Dataand corresponding method signature update properly implements the generalizedWebSocketIOtrait while maintaining the same functionality.crates/ws-utils/src/lib.rs (1)
37-37: LGTM! Good use of the utility function.Replacing the manual conversion with
bytes_to_f32_samplesimproves code reuse and maintainability.crates/ws/src/client.rs (2)
9-17: LGTM! Excellent trait generalization.The addition of the associated type
Datacreates a flexible, type-safe interface that can handle different input data types while maintaining the same WebSocket communication pattern.
28-31: LGTM! Consistent method signature update.The
from_audiomethod correctly uses the new genericT::Datatype, enabling support for both single audio (bytes::Bytes) and dual audio (tuple of byte buffers) inputs.plugins/listener/src/client.rs (3)
32-48: LGTM: Clear method naming and correct audio mode handling.The renaming to
build_singleand explicit setting ofAudioMode::Singleprovides good clarity for the dual audio feature.
137-164: LGTM: Correct dual audio WebSocketIO implementation.The tuple data type and conversion to
DualAudiovariant properly handles dual audio streams.
181-191: LGTM: Correct dual stream handling.The zip operation properly synchronizes mic and speaker streams for dual audio processing.
plugins/listener-interface/src/lib.rs (4)
19-19: LGTM: Appropriate Default derive for data structure.Adding
Defaultto theWordstruct enables easier instantiation and testing.
40-42: LGTM: Good extensibility with optional metadata field.The optional
metafield provides flexibility for additional metadata while maintaining backward compatibility.
55-61: LGTM: Correct dual audio variant definition.The
DualAudiovariant properly separates mic and speaker channels with appropriate binary serialization.
73-94: LGTM: Well-designed AudioMode enum with sensible defaults.The
AudioModeenum properly supports both single and dual audio modes withSingleas a backward-compatible default.
Resolves #1013