deepgram compatibility for whisper-client#1281
Conversation
|
Note Other AI code review bot(s) detectedCodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review. 📝 WalkthroughWalkthroughThis update introduces a major architectural refactor and interface evolution for the transcription and streaming components. The codebase now uses a generic Changes
Sequence Diagram(s)sequenceDiagram
participant Client
participant Server
participant BaseTranscribeService
participant Handler
participant TranscriptionAPI
Client->>Server: HTTP/WebSocket upgrade request
Server->>BaseTranscribeService: call(request)
BaseTranscribeService->>Handler: handle_socket(socket, params)
Handler->>TranscriptionAPI: Stream audio data
TranscriptionAPI-->>Handler: Streaming transcription results
Handler-->>Client: Send StreamResponse messages over WebSocket
Estimated code review effort🎯 5 (Critical) | ⏱️ ~90+ minutes Possibly related PRs
Note ⚡️ Unit Test Generation is now available in beta!Learn more here, or try it out under "Finishing Touches" below. ✨ Finishing Touches
🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
Documentation and Community
|
There was a problem hiding this comment.
Actionable comments posted: 3
🧹 Nitpick comments (1)
crates/transcribe-deepgram/src/lib.rs (1)
59-61: Incomplete implementation:handle_socketmethod is empty.The WebSocket handler doesn't process any messages. Is this intentional or a work in progress?
Would you like me to help implement the WebSocket message handling logic?
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
Cargo.lockis excluded by!**/*.lock
📒 Files selected for processing (10)
crates/transcribe-deepgram/Cargo.toml(1 hunks)crates/transcribe-deepgram/src/lib.rs(2 hunks)crates/ws-utils/src/lib.rs(5 hunks)owhisper/owhisper-client/Cargo.toml(1 hunks)owhisper/owhisper-client/src/lib.rs(5 hunks)owhisper/owhisper-config/src/lib.rs(1 hunks)owhisper/owhisper-interface/src/lib.rs(1 hunks)owhisper/owhisper-server/src/server.rs(1 hunks)owhisper/schema.json(2 hunks)plugins/local-stt/src/server.rs(1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{js,ts,tsx,rs}
⚙️ CodeRabbit Configuration File
**/*.{js,ts,tsx,rs}: 1. No error handling.
2. No unused imports, variables, or functions.
3. For comments, keep it minimal. It should be about "Why", not "What".
Files:
owhisper/owhisper-server/src/server.rsplugins/local-stt/src/server.rsowhisper/owhisper-interface/src/lib.rsowhisper/owhisper-config/src/lib.rscrates/ws-utils/src/lib.rscrates/transcribe-deepgram/src/lib.rsowhisper/owhisper-client/src/lib.rs
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: cubic · AI code reviewer
- GitHub Check: ci
- GitHub Check: ci (macos, macos-latest)
- GitHub Check: ci (windows, windows-latest)
🔇 Additional comments (13)
owhisper/owhisper-client/Cargo.toml (1)
21-22: LGTM! Rodio dependency addition looks appropriate.The addition of
rodioas a dev-dependency aligns with its usage for audio decoding in asynchronous test cases as mentioned in the AI summary.owhisper/owhisper-config/src/lib.rs (1)
54-54: LGTM! Making base_url optional adds configuration flexibility.This change allows users to omit the
base_urlfield while maintaining functionality through a default value, improving the user experience.owhisper/owhisper-interface/src/lib.rs (1)
58-58: LGTM! Data structure consolidation improves consistency.Consolidating separate mic and speaker fields into a single interleaved
datafield standardizes dual audio representation across the codebase, as mentioned in the comprehensive changes described in the AI summary.owhisper/owhisper-server/src/server.rs (1)
129-129: LGTM! Route path standardization improves consistency.The route change from "/v1/stt/realtime" to "/v1/listen" aligns with the coordinated endpoint naming updates across the client and server components.
plugins/local-stt/src/server.rs (1)
91-91: LGTM! Route simplification aligns with endpoint standardization.The route change from "/api/desktop/listen/realtime" to "/listen" is consistent with the coordinated endpoint naming updates across the project, simplifying the API surface.
crates/transcribe-deepgram/Cargo.toml (1)
29-31: LGTM!The addition of workspace dev dependencies for testing is appropriate.
owhisper/schema.json (2)
50-55: Good schema organization improvement.Moving the
typeproperty to the first position across all model configurations improves readability and consistency.Also applies to: 77-82, 108-113
92-96: LGTM! Makingbase_urloptional is a good design choice.Changing
base_urlfrom required to optional allows for default values, which aligns with the implementation incrates/transcribe-deepgram/src/lib.rs.crates/ws-utils/src/lib.rs (2)
130-138: LGTM! Correct de-interleaving implementation.The logic properly splits interleaved audio data by taking alternating samples for mic and speaker channels.
94-94: LGTM!Making the constructor public enables external usage as needed.
crates/transcribe-deepgram/src/lib.rs (1)
43-47: Add error handling for URL parsing.The
unwrap()call could panic if the base URL is invalid.- config - .base_url - .unwrap_or("https://api.deepgram.com/v1".to_string()) - .parse::<url::Url>() - .unwrap(), + config + .base_url + .unwrap_or("https://api.deepgram.com/v1".to_string()) + .parse::<url::Url>() + .map_err(|e| Error::from(e))?,Likely an incorrect or invalid review comment.
owhisper/owhisper-client/src/lib.rs (2)
40-47: LGTM! Dynamic path construction is more flexible.The implementation correctly handles trailing slashes and appends the "listen" endpoint.
181-195: LGTM! Correct audio interleaving implementation.The implementation properly interleaves mic and speaker audio at the byte level, maintaining 16-bit sample integrity.
There was a problem hiding this comment.
cubic analysis
5 issues found across 10 files • Review in cubic
React with 👍 or 👎 to teach cubic. You can also tag @cubic-dev-ai to give feedback, ask questions, or re-run the review.
There was a problem hiding this comment.
Actionable comments posted: 1
🔭 Outside diff range comments (1)
owhisper/owhisper-server/src/commands/run.rs (1)
27-95: Remove large blocks of commented codeThis file contains extensive commented-out code that should be removed to improve maintainability. If this code is needed for reference, it should be preserved in version control history.
Consider removing all the commented code blocks and keeping only the active implementation.
♻️ Duplicate comments (8)
owhisper/owhisper-client/src/lib.rs (2)
40-47: Path construction may duplicate "listen"If the supplied base path already ends with "listen", it will be duplicated resulting in an incorrect endpoint like "/listen/listen".
Add a check to prevent duplication:
{ let mut path = url.path().to_string(); if !path.ends_with('/') { path.push('/'); } - path.push_str("listen"); + if !path.ends_with("listen") { + path.push_str("listen"); + } url.set_path(&path); }
81-84: Authorization header inconsistency across codebaseThe authorization header prefix was changed from "Bearer" to "Token", but the server and other parts of the codebase still expect "Bearer". This will break authentication.
Either revert this change back to "Bearer" or update all related server and client code to consistently use "Token" as outlined in the previous review comment.
crates/transcribe-deepgram/src/lib.rs (6)
41-46: URL parsing can panic - handle the error insteadThe
unwrap()call on URL parsing can cause a panic if the base_url is malformed. This should return an error instead.Apply this diff:
- let base_url = config - .base_url - .unwrap_or("https://api.deepgram.com/v1".to_string()) - .parse::<url::Url>() - .unwrap(); + let base_url = config + .base_url + .unwrap_or("https://api.deepgram.com/v1".to_string()) + .parse::<url::Url>() + .map_err(|e| Error::Configuration(format!("Invalid base URL: {}", e)))?;
211-216: Response builder unwrap can panicThe
unwrap()call on Response::builder() can panic if the builder fails. Handle the error properly.Apply this diff:
Ok(Response::builder() .status(StatusCode::OK) .header("Content-Type", "application/json") .body(Body::from(json_response)) - .unwrap()) + .map_err(|_| StatusCode::INTERNAL_SERVER_ERROR)?)
249-253: Response builder unwrap can panic in error handlingApply this diff to handle the error:
Err(_) => Ok(Response::builder() .status(StatusCode::BAD_REQUEST) .body(Body::from("Invalid WebSocket upgrade request")) - .unwrap()), + .unwrap_or_else(|_| Response::new(Body::from("Internal Server Error")))),
258-262: Response builder unwrap can panicApply this diff:
return Ok(Response::builder() .status(StatusCode::BAD_REQUEST) .body(Body::from("Failed to read request body")) - .unwrap()) + .unwrap_or_else(|_| Response::new(Body::from("Internal Server Error"))))
267-271: Response builder unwrap can panicApply this diff:
Err(status) => Ok(Response::builder() .status(status) .body(Body::from("Transcription failed")) - .unwrap()), + .unwrap_or_else(|_| Response::new(Body::from("Internal Server Error")))),
273-277: Response builder unwrap can panicApply this diff:
Ok(Response::builder() .status(StatusCode::METHOD_NOT_ALLOWED) .body(Body::from("Only WebSocket and POST requests are supported")) - .unwrap()) + .unwrap_or_else(|_| Response::new(Body::from("Internal Server Error"))))
🧹 Nitpick comments (1)
crates/transcribe-deepgram/src/lib.rs (1)
218-218: Use tracing instead of eprintln!Use the tracing crate for consistent error logging across the application.
- eprintln!("Deepgram transcription error: {:?}", e); + tracing::error!("Deepgram transcription error: {:?}", e);
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
Cargo.lockis excluded by!**/*.lock
📒 Files selected for processing (13)
crates/transcribe-deepgram/Cargo.toml(1 hunks)crates/transcribe-deepgram/src/lib.rs(4 hunks)owhisper/owhisper-client/src/lib.rs(5 hunks)owhisper/owhisper-config/Cargo.toml(1 hunks)owhisper/owhisper-config/src/error.rs(1 hunks)owhisper/owhisper-config/src/lib.rs(3 hunks)owhisper/owhisper-interface/src/lib.rs(2 hunks)owhisper/owhisper-server/Cargo.toml(1 hunks)owhisper/owhisper-server/src/commands/pull.rs(1 hunks)owhisper/owhisper-server/src/commands/run.rs(1 hunks)owhisper/owhisper-server/src/commands/serve.rs(1 hunks)owhisper/owhisper-server/src/server.rs(5 hunks)owhisper/schema.json(1 hunks)
✅ Files skipped from review due to trivial changes (4)
- owhisper/owhisper-config/Cargo.toml
- owhisper/owhisper-server/Cargo.toml
- owhisper/schema.json
- owhisper/owhisper-server/src/commands/pull.rs
🚧 Files skipped from review as they are similar to previous changes (3)
- crates/transcribe-deepgram/Cargo.toml
- owhisper/owhisper-interface/src/lib.rs
- owhisper/owhisper-config/src/lib.rs
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{js,ts,tsx,rs}
⚙️ CodeRabbit Configuration File
**/*.{js,ts,tsx,rs}: 1. No error handling.
2. No unused imports, variables, or functions.
3. For comments, keep it minimal. It should be about "Why", not "What".
Files:
owhisper/owhisper-server/src/commands/run.rsowhisper/owhisper-config/src/error.rsowhisper/owhisper-server/src/commands/serve.rsowhisper/owhisper-server/src/server.rsowhisper/owhisper-client/src/lib.rscrates/transcribe-deepgram/src/lib.rs
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: ci (macos, macos-latest)
- GitHub Check: ci (windows, windows-latest)
- GitHub Check: ci
🔇 Additional comments (11)
owhisper/owhisper-config/src/error.rs (1)
1-5: Clean error handling implementationGood use of
thiserrorfor error derivation and transparent error forwarding fromconfig::ConfigError.owhisper/owhisper-server/src/commands/run.rs (2)
12-13: Good addition of optional config parameterMaking the config parameter optional provides flexibility for users to run with default configuration.
17-19: Proper error handling and shutdown signal integrationGood use of the new
Config::newAPI with error propagation and integration of shutdown signal for graceful termination.owhisper/owhisper-server/src/commands/serve.rs (1)
1-22: Consistent implementation with run commandThe changes properly integrate optional configuration and graceful shutdown, maintaining consistency with the
runcommand implementation.owhisper/owhisper-server/src/server.rs (4)
85-89: Server now binds only to localhostThe server binding was changed from
UNSPECIFIED(0.0.0.0) toLOCALHOST(127.0.0.1), making it accessible only from the local machine. Ensure this is intentional as it prevents remote connections.
108-109: Good refactoring to use query parametersThe change from path parameters to query parameters provides better flexibility and aligns with Deepgram's API patterns. The model selection logic with fallback to the first available service is well implemented.
Also applies to: 144-155
98-103: Improved server execution patternDirectly awaiting the server instead of spawning a background task is cleaner and provides better error propagation.
215-350: Comprehensive test coverageGood addition of integration tests for WhisperCpp, Deepgram, and AWS transcription services. The tests properly exercise the server with real audio data.
owhisper/owhisper-client/src/lib.rs (3)
74-89: Good refactoring with build_request methodExtracting the common request building logic reduces code duplication between
build_singleandbuild_dual.
179-194: Correct audio interleaving implementationThe implementation properly interleaves mic and speaker audio samples at the sample level (16-bit samples), which aligns with the unified dual audio data handling.
59-60: Model parameter properly added to query stringGood addition of the model parameter with a sensible default empty string fallback.
There was a problem hiding this comment.
Actionable comments posted: 0
♻️ Duplicate comments (3)
crates/ws-utils/src/lib.rs (1)
76-79: Fix: Mixing audio samples with itself produces no effect.The current implementation mixes the interleaved samples with itself, which is meaningless. For dual audio, the samples are already interleaved (mic, speaker, mic, speaker...). You should either:
- De-interleave first then mix the separate channels, or
- Return the interleaved samples as-is without mixing
AudioProcessResult::DualSamples(samples) => { - let mixed = mix_audio_channels(&samples, &samples); - Some((mixed, receiver)) + // Samples are already interleaved, no mixing needed + Some((samples, receiver)) }owhisper/owhisper-client/src/lib.rs (2)
40-47: Fix URL construction to prevent path duplication.The current implementation unconditionally appends "listen" to the path, which could result in duplication if the base URL already ends with "listen" (e.g., "https://api.example.com/listen" becomes "https://api.example.com/listen/listen").
{ let mut path = url.path().to_string(); - if !path.ends_with('/') { - path.push('/'); - } - path.push_str("listen"); + if !path.ends_with("/listen") { + if !path.ends_with('/') { + path.push('/'); + } + path.push_str("listen"); + } url.set_path(&path); }
76-91: Ensure consistent "Token" prefix across all Authorization headers.You've updated owhisper-client to use "Token" prefix but this creates inconsistency with server-side code that still expects "Bearer". This will break authentication.
The server in
owhisper-server/src/server.rs(lines 12, 198) still usesTypedHeader<Authorization<Bearer>>which expects "Bearer" prefix. Either:
- Update the server to handle "Token" prefix, or
- Revert this change back to "Bearer" for consistency
- .with_header("Authorization", format!("Token {}", key)), + .with_header("Authorization", format!("Bearer {}", key)),
🧹 Nitpick comments (1)
owhisper/owhisper-server/src/server.rs (1)
224-359: Review test environment variable handling.The tests expect specific environment variables (DEEPGRAM_API_KEY, AWS_REGION, etc.) but don't handle missing variables gracefully. Consider adding conditional test execution or more descriptive panics.
+#[tokio::test] +#[ignore = "requires DEEPGRAM_API_KEY environment variable"] async fn test_deepgram() { + if std::env::var("DEEPGRAM_API_KEY").is_err() { + eprintln!("Skipping test_deepgram: DEEPGRAM_API_KEY not set"); + return; + } let addr = start().await;
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
Cargo.lockis excluded by!**/*.lock
📒 Files selected for processing (16)
Cargo.toml(1 hunks)admin/server/tsconfig.json(1 hunks)apps/desktop/src-tauri/capabilities/default.json(1 hunks)apps/desktop/src-tauri/capabilities/macos.json(1 hunks)apps/desktop/src-tauri/tauri.conf.json(1 hunks)crates/data/src/english_3/raw.json(1 hunks)crates/data/src/english_4/raw.json(1 hunks)crates/data/src/english_5/raw.json(1 hunks)crates/data/src/english_7/raw.json(1 hunks)crates/transcribe-deepgram/Cargo.toml(2 hunks)crates/ws-utils/src/lib.rs(5 hunks)dprint.json(3 hunks)owhisper/owhisper-client/src/lib.rs(5 hunks)owhisper/owhisper-interface/Cargo.toml(1 hunks)owhisper/owhisper-server/src/server.rs(5 hunks)owhisper/schema.json(3 hunks)
✅ Files skipped from review due to trivial changes (12)
- Cargo.toml
- admin/server/tsconfig.json
- apps/desktop/src-tauri/tauri.conf.json
- crates/data/src/english_4/raw.json
- crates/data/src/english_7/raw.json
- crates/data/src/english_5/raw.json
- apps/desktop/src-tauri/capabilities/macos.json
- owhisper/owhisper-interface/Cargo.toml
- apps/desktop/src-tauri/capabilities/default.json
- crates/data/src/english_3/raw.json
- dprint.json
- owhisper/schema.json
🚧 Files skipped from review as they are similar to previous changes (1)
- crates/transcribe-deepgram/Cargo.toml
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{js,ts,tsx,rs}
⚙️ CodeRabbit Configuration File
**/*.{js,ts,tsx,rs}: 1. No error handling.
2. No unused imports, variables, or functions.
3. For comments, keep it minimal. It should be about "Why", not "What".
Files:
crates/ws-utils/src/lib.rsowhisper/owhisper-server/src/server.rsowhisper/owhisper-client/src/lib.rs
🔇 Additional comments (13)
crates/ws-utils/src/lib.rs (5)
10-10: LGTM! Enum variant simplified correctly.The consolidation from separate
micandspeakervectors to a single interleavedVec<f32>aligns with the PR objective of unifying dual audio input.
17-23: LGTM! Binary message handling improved.The direct conversion of binary data to samples is more efficient than the previous approach and correctly handles the new unified audio format.
32-34: LGTM! DualAudio parsing updated correctly.The parsing now correctly handles the single interleaved data vector instead of separate mic and speaker vectors.
101-101: LGTM! Constructor visibility improved.Making the constructor public enables external usage as needed for the architecture changes.
137-144: LGTM! Dual audio splitting logic is correct.The implementation correctly de-interleaves the samples by chunking pairs and unzipping them into separate mic and speaker vectors. This properly handles the new interleaved format.
owhisper/owhisper-server/src/server.rs (5)
6-6: LGTM! Import updated for Query extraction.The import change from
PathtoQuerycorrectly supports the new parameter extraction approach.
82-104: LGTM! Server execution logic improved.The changes eliminate the background task spawn, use proper logging, and provide better error handling with consistent return types.
108-109: LGTM! Routes now support Query parameter extraction.The addition of both
/listenand/v1/listenroutes with Query parameter extraction correctly addresses the previous issue where Path extraction required dynamic segments.
144-155: LGTM! Model selection logic is robust.The parameter extraction from query string with fallback to the first available model provides good flexibility for both explicit model selection and default behavior.
165-189: LGTM! Error handling enhanced with specific messages.The structured error responses with specific error codes like "aws_server_error", "deepgram_server_error" will help with debugging and client-side error handling.
owhisper/owhisper-client/src/lib.rs (3)
55-58: LGTM! Deepgram-compatible query parameters added.The query parameters (model, sample_rate, encoding) align with Deepgram's WebSocket API requirements as documented in the referenced URL.
110-128: LGTM! WebSocketIO updated for binary data handling.The change from JSON-encoded messages to direct binary data transmission is more efficient and aligns with the Deepgram WebSocket protocol.
179-193: LGTM! Audio interleaving logic is correct.The implementation properly interleaves mic and speaker samples at the 2-byte level, creating the expected format for dual audio processing.
There was a problem hiding this comment.
Actionable comments posted: 9
🔭 Outside diff range comments (2)
crates/db-script/src/lib.rs (2)
26-86: Critical: No error handling throughout the migration functionThe
runfunction ignores all errors, which could lead to silent data corruption or incomplete migrations. This is especially concerning for a database migration script.Key issues:
- Line 32: Database query errors are silently ignored
- Lines 39, 45, 49: Row parsing errors skip records without logging
- Line 54: JSON deserialization failures default to empty vec
- Lines 66-77: Update errors continue without notification
- Line 70: Serialization errors could corrupt data
Consider implementing proper error handling:
-pub async fn run(conn: &libsql::Connection) { +pub async fn run(conn: &libsql::Connection) -> Result<(), Box<dyn std::error::Error>> { let mut rows = match conn .query("SELECT id, conversations FROM sessions", ()) .await { Ok(rows) => rows, - Err(_) => return, + Err(e) => return Err(format!("Failed to query sessions: {}", e).into()), };Also add logging for skipped records and consider transactional updates to ensure data consistency.
54-54: Potential data loss from silent deserialization failuresUsing
unwrap_or_default()here means any malformed JSON data will be silently converted to an empty vector, potentially losing conversation data.-let conversations: Vec<owhisper_interface::ConversationChunk> = - serde_json::from_str(&conversations_str).unwrap_or_default(); +let conversations: Vec<owhisper_interface::ConversationChunk> = match serde_json::from_str(&conversations_str) { + Ok(convs) => convs, + Err(e) => { + tracing::error!("Failed to parse conversations for session {}: {}", id, e); + continue; + } +};
♻️ Duplicate comments (3)
crates/transcribe-deepgram/src/handler.rs (1)
28-39: URL parsing can panic with unwrap()The unwrap() call on URL parsing can cause a panic if the base URL is malformed.
Propagate the error instead:
- let base_url = config - .base_url - .unwrap_or("https://api.deepgram.com/v1".to_string()) - .parse::<url::Url>() - .unwrap(); + let base_url = config + .base_url + .unwrap_or("https://api.deepgram.com/v1".to_string()) + .parse::<url::Url>() + .map_err(|e| crate::Error::from(format!("Invalid base URL: {}", e)))?;owhisper/owhisper-client/src/lib.rs (2)
40-47: Path construction may duplicate "listen" segmentIf the base URL already ends with "listen", this code will create an incorrect path like "/listen/listen".
Apply this fix to prevent duplication:
{ let mut path = url.path().to_string(); if !path.ends_with('/') { path.push('/'); } - path.push_str("listen"); + if !path.ends_with("listen") { + path.push_str("listen"); + } url.set_path(&path); }
82-86: Authorization header prefix inconsistencyThe change from "Bearer" to "Token" creates inconsistency with other parts of the codebase that still expect "Bearer".
This breaking change requires coordinated updates across multiple files as detailed in the previous review. Either:
- Update all related files to use "Token" consistently, or
- Revert this change back to "Bearer"
The previous review lists all affected locations that need updating.
🧹 Nitpick comments (7)
crates/transcribe-whisper-local/src/service/streaming.rs (2)
223-286: Performance optimization opportunity in word processingThe current implementation splits the text multiple times which is inefficient.
Cache the split result to avoid repeated processing:
-let words = text - .split_whitespace() - .filter(|w| !w.is_empty()) - .enumerate() - .map(|(i, w)| { - let word_duration = duration as f64 / text.split_whitespace().count() as f64; +let text_words: Vec<&str> = text.split_whitespace().filter(|w| !w.is_empty()).collect(); +let word_count = text_words.len() as f64; +let word_duration = if word_count > 0.0 { duration as f64 / word_count } else { 0.0 }; + +let words = text_words + .into_iter() + .enumerate() + .map(|(i, w)| { let word_start = start as f64 + (i as f64 * word_duration);
245-246: Consider more accurate word timing distributionThe current implementation assumes all words have equal duration, which may not reflect actual speech patterns. Short words like "a", "the" typically take less time than longer words.
For more accurate timing, consider:
- Using word length as a weight for duration distribution
- Implementing a phoneme-based timing model
- Using the actual timing data from the whisper model if available
crates/transcribe-gcp/src/handler.rs (1)
101-107: Document unimplemented functionalityThe batch transcription returns NOT_IMPLEMENTED without any explanation.
Add a descriptive error message:
async fn handle_batch_transcription( self, _audio_data: Bytes, ) -> Result<Response<Body>, StatusCode> { - // GCP Speech-to-Text batch transcription not implemented yet + // TODO: Implement GCP Speech-to-Text batch transcription + tracing::warn!("GCP batch transcription not implemented"); Err(StatusCode::NOT_IMPLEMENTED) }crates/transcribe-deepgram/src/handler.rs (1)
119-119: Use consistent loggingUse tracing instead of eprintln! for consistency.
- eprintln!("Deepgram transcription error: {:?}", e); + tracing::error!("Deepgram transcription error: {:?}", e);crates/transcribe-whisper-local/src/handler.rs (1)
45-57: Consider adding error contextErrors are converted to generic status codes without preserving context.
Log the actual error for debugging:
// Save audio data to file - if let Err(_) = tokio::fs::write(&temp_file, &audio_data).await { + if let Err(e) = tokio::fs::write(&temp_file, &audio_data).await { + tracing::error!("Failed to write temp file: {}", e); return Err(StatusCode::INTERNAL_SERVER_ERROR); }owhisper/owhisper-client/src/lib.rs (1)
177-191: Audio interleaving implementation looks correctThe implementation properly interleaves audio at the 2-byte (16-bit) sample level. Note that
chunks_exactwill drop any remaining bytes if the chunks aren't exactly divisible by 2, and thezipwill stop when the shorter stream ends, potentially losing samples from the longer stream.Consider handling remainder bytes and length mismatches if sample-perfect accuracy is required:
let mic_samples = mic_chunk.chunks_exact(2); let speaker_samples = speaker_chunk.chunks_exact(2); + +// Handle any remainder bytes +let mic_remainder = mic_samples.remainder(); +let speaker_remainder = speaker_samples.remainder(); for (mic_sample, speaker_sample) in mic_samples.zip(speaker_samples) { interleaved.extend_from_slice(mic_sample); interleaved.extend_from_slice(speaker_sample); } + +// Log if there are unprocessed samples +if !mic_remainder.is_empty() || !speaker_remainder.is_empty() { + tracing::warn!("Dropped {} mic bytes and {} speaker bytes", + mic_remainder.len(), speaker_remainder.len()); +}crates/transcribe-aws/src/handler.rs (1)
82-139: Consider error handling for completion messageThe transcription logic is well-implemented. However, the completion message send on line 137 ignores potential errors.
// Send completion message -let _ = result_tx.send(WsMessage::Complete).await; +if let Err(e) = result_tx.send(WsMessage::Complete).await { + info!("Failed to send completion message: {}", e); +}
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
Cargo.lockis excluded by!**/*.lock
📒 Files selected for processing (39)
Cargo.toml(3 hunks)crates/audio-utils/src/lib.rs(1 hunks)crates/data/build.rs(2 hunks)crates/db-script/src/lib.rs(2 hunks)crates/db-user/src/init.rs(2 hunks)crates/db-user/src/sessions_ops.rs(2 hunks)crates/db-user/src/sessions_types.rs(1 hunks)crates/stt/src/realtime/clova.rs(2 hunks)crates/stt/src/realtime/deepgram.rs(2 hunks)crates/stt/src/realtime/mod.rs(2 hunks)crates/stt/src/realtime/whisper.rs(2 hunks)crates/template/src/filters.rs(2 hunks)crates/transcribe-aws/Cargo.toml(1 hunks)crates/transcribe-aws/src/handler.rs(1 hunks)crates/transcribe-aws/src/lib.rs(1 hunks)crates/transcribe-deepgram/Cargo.toml(2 hunks)crates/transcribe-deepgram/src/handler.rs(1 hunks)crates/transcribe-deepgram/src/lib.rs(1 hunks)crates/transcribe-gcp/Cargo.toml(1 hunks)crates/transcribe-gcp/src/handler.rs(1 hunks)crates/transcribe-gcp/src/lib.rs(1 hunks)crates/transcribe-interface/Cargo.toml(1 hunks)crates/transcribe-interface/src/lib.rs(1 hunks)crates/transcribe-whisper-local/Cargo.toml(1 hunks)crates/transcribe-whisper-local/src/handler.rs(1 hunks)crates/transcribe-whisper-local/src/service/recorded.rs(2 hunks)crates/transcribe-whisper-local/src/service/streaming.rs(4 hunks)crates/ws-utils/src/lib.rs(3 hunks)owhisper/owhisper-client/src/lib.rs(7 hunks)owhisper/owhisper-interface/src/lib.rs(6 hunks)owhisper/owhisper-interface/src/stream.rs(1 hunks)owhisper/owhisper-server/src/misc.rs(1 hunks)owhisper/schema.json(4 hunks)plugins/db/src/commands/sessions.rs(2 hunks)plugins/listener/src/events.rs(1 hunks)plugins/listener/src/fsm.rs(2 hunks)plugins/local-stt/Cargo.toml(0 hunks)plugins/local-stt/benches/simple.rs(0 hunks)plugins/local-stt/src/events.rs(1 hunks)
💤 Files with no reviewable changes (2)
- plugins/local-stt/Cargo.toml
- plugins/local-stt/benches/simple.rs
✅ Files skipped from review due to trivial changes (14)
- crates/transcribe-whisper-local/Cargo.toml
- crates/template/src/filters.rs
- crates/stt/src/realtime/whisper.rs
- owhisper/owhisper-server/src/misc.rs
- crates/data/build.rs
- crates/stt/src/realtime/clova.rs
- plugins/listener/src/events.rs
- crates/transcribe-gcp/Cargo.toml
- plugins/local-stt/src/events.rs
- crates/db-user/src/sessions_ops.rs
- crates/audio-utils/src/lib.rs
- owhisper/schema.json
- crates/transcribe-whisper-local/src/service/recorded.rs
- crates/transcribe-aws/Cargo.toml
🚧 Files skipped from review as they are similar to previous changes (4)
- crates/transcribe-deepgram/Cargo.toml
- Cargo.toml
- crates/ws-utils/src/lib.rs
- owhisper/owhisper-interface/src/lib.rs
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{js,ts,tsx,rs}
⚙️ CodeRabbit Configuration File
**/*.{js,ts,tsx,rs}: 1. No error handling.
2. No unused imports, variables, or functions.
3. For comments, keep it minimal. It should be about "Why", not "What".
Files:
crates/stt/src/realtime/mod.rscrates/stt/src/realtime/deepgram.rscrates/transcribe-interface/src/lib.rscrates/db-user/src/sessions_types.rscrates/db-user/src/init.rscrates/db-script/src/lib.rscrates/transcribe-gcp/src/handler.rsplugins/db/src/commands/sessions.rscrates/transcribe-deepgram/src/handler.rscrates/transcribe-whisper-local/src/service/streaming.rscrates/transcribe-whisper-local/src/handler.rscrates/transcribe-deepgram/src/lib.rsplugins/listener/src/fsm.rscrates/transcribe-aws/src/lib.rsowhisper/owhisper-interface/src/stream.rscrates/transcribe-aws/src/handler.rsowhisper/owhisper-client/src/lib.rscrates/transcribe-gcp/src/lib.rs
🔇 Additional comments (24)
crates/stt/src/realtime/mod.rs (1)
206-206: LGTM! Consistent type migration.The changes from
Vec<owhisper_interface::Word>toVec<owhisper_interface::Word2>in both test functions are consistent with the broader interface migration across the codebase.Also applies to: 235-235
crates/db-user/src/sessions_types.rs (1)
18-18: LGTM! Database schema alignment with interface migration.The
wordsfield type update fromVec<owhisper_interface::Word>toVec<owhisper_interface::Word2>correctly aligns the database session model with the updated interface.crates/stt/src/realtime/deepgram.rs (1)
11-11: LGTM! Consistent Deepgram integration with Word2 migration.The import update and vector type changes from
WordtoWord2are correctly implemented. The field mappings in the constructor remain consistent, ensuring proper transcription data handling.Also applies to: 66-83
crates/transcribe-interface/Cargo.toml (1)
7-16: LGTM! Dependencies align with WebSocket transcription architecture.The dependency overhaul appropriately supports the shift to async WebSocket-based transcription services. The addition of
axumwith WebSocket features,tokioruntime, andowhisper-interfaceintegration supports the PR objectives for Deepgram compatibility.crates/db-user/src/init.rs (1)
657-661: LGTM! Test data seeding aligned with Word2 migration.The deserialization type updates from
Vec<owhisper_interface::Word>toVec<owhisper_interface::Word2>in both instances correctly maintain test data compatibility with the updated interface.Also applies to: 679-683
plugins/db/src/commands/sessions.rs (1)
40-53: LGTM! Type migration looks correctThe return type changes from
WordtoWord2are properly implemented for bothget_words_onboardingandget_wordsfunctions. Error handling is maintained appropriately.Also applies to: 58-72
crates/transcribe-gcp/src/lib.rs (1)
1-38: Clean refactoring to use BaseTranscribeServiceThe refactoring successfully simplifies the service implementation by delegating to a generic base service. The separation of concerns between service setup and handler logic improves maintainability.
crates/transcribe-interface/src/lib.rs (1)
21-27: Well-designed trait abstractionThe trait design is solid with appropriate bounds and clear separation of concerns between WebSocket and batch transcription handling.
crates/transcribe-deepgram/src/lib.rs (1)
1-39: Clean refactoring to use BaseTranscribeServiceThe refactoring successfully simplifies the service implementation by delegating to the generic BaseTranscribeService. This addresses the architectural concerns and removes duplicate WebSocket handling code.
crates/transcribe-whisper-local/src/handler.rs (3)
194-280: Well-structured helper functionsThe helper functions for segment conversion and VAD processing are well-implemented with good separation of concerns.
166-166: Handle JSON serialization errorsThe unwrap() could panic if serialization fails.
let response = segment_to_stream_response(segment); - let message = serde_json::to_string(&response).unwrap(); + let message = match serde_json::to_string(&response) { + Ok(msg) => msg, + Err(e) => { + tracing::error!("Failed to serialize response: {}", e); + continue; + } + };Likely an incorrect or invalid review comment.
76-80: Avoid unwrap() in response buildingThe unwrap() could panic if response building fails.
Handle the error properly:
Ok(Response::builder() .status(StatusCode::OK) .header("Content-Type", "application/json") .body(Body::from(response_body)) - .unwrap()) + .map_err(|_| StatusCode::INTERNAL_SERVER_ERROR)?)Likely an incorrect or invalid review comment.
owhisper/owhisper-interface/src/stream.rs (1)
87-119: Well-designed compatibility test!Good approach to ensure type compatibility with Deepgram SDK by serializing their types and deserializing into yours.
owhisper/owhisper-client/src/lib.rs (5)
7-7: LGTM!Import change aligns with the refactoring to use
StreamResponsetype.
55-58: LGTM!Query parameters properly configured for Deepgram API compatibility with appropriate defaults.
76-101: Good refactoring!Extracting the common request building logic into a private method eliminates duplication and improves maintainability.
110-128: LGTM!WebSocketIO implementation properly simplified to use raw bytes for audio data and binary messages for better efficiency.
206-214: LGTM!Test properly updated for Deepgram API compatibility with appropriate endpoint and model configuration.
crates/transcribe-aws/src/lib.rs (2)
4-6: LGTM!Good modularization with handler separation and appropriate public re-exports to maintain API compatibility.
15-39: Excellent refactoring!The wrapper pattern with
BaseTranscribeService<Handler>provides a clean, consistent architecture across all transcription services. The delegation is properly implemented for alltower::Servicemethods.crates/transcribe-aws/src/handler.rs (4)
26-45: LGTM!Well-structured configuration with sensible defaults for AWS Transcribe streaming.
62-79: LGTM!Proper AWS client initialization with flexible region configuration and appropriate fallback.
146-185: LGTM!Well-structured WebSocket handling with proper task management, error handling, and cleanup.
187-194: LGTM!Appropriate handling of unsupported batch transcription with correct HTTP status code.
| Ok(Response::builder() | ||
| .status(StatusCode::OK) | ||
| .header("Content-Type", "application/json") | ||
| .body(Body::from(json_response)) | ||
| .unwrap()) |
There was a problem hiding this comment.
🛠️ Refactor suggestion
Avoid unwrap() on Response builder
The unwrap() could panic if response building fails.
Handle the potential error:
Ok(Response::builder()
.status(StatusCode::OK)
.header("Content-Type", "application/json")
.body(Body::from(json_response))
- .unwrap())
+ .map_err(|_| StatusCode::INTERNAL_SERVER_ERROR)?)🤖 Prompt for AI Agents
In crates/transcribe-deepgram/src/handler.rs around lines 112 to 116, the
Response builder uses unwrap() which can cause a panic if building the response
fails. Replace unwrap() with proper error handling by matching on the Result
returned from body() or using a method like expect() with a clear error message,
or propagate the error upwards to handle it gracefully without panicking.
| Err(_) => Ok(Response::builder() | ||
| .status(StatusCode::BAD_REQUEST) | ||
| .body(Body::from("invalid_websocket_upgrade_request")) | ||
| .unwrap()), |
There was a problem hiding this comment.
Avoid using unwrap() in production code
The unwrap() call could panic if Response::builder() fails. While this is unlikely, it's better to handle the error gracefully.
Consider propagating the error:
- Err(_) => Ok(Response::builder()
- .status(StatusCode::BAD_REQUEST)
- .body(Body::from("invalid_websocket_upgrade_request"))
- .unwrap()),
+ Err(_) => Response::builder()
+ .status(StatusCode::BAD_REQUEST)
+ .body(Body::from("invalid_websocket_upgrade_request"))
+ .map_err(|_| std::convert::Infallible),Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In crates/transcribe-interface/src/lib.rs around lines 71 to 74, the use of
unwrap() on Response::builder().body() can cause a panic if building the
response fails. Replace unwrap() by properly handling the Result returned by
body(), either by propagating the error using the ? operator or by matching on
the Result and returning an appropriate error response to avoid panics in
production code.
| // TODO | ||
| while let Some(result) = listen_stream.next().await { | ||
| let _meta = result.meta.clone(); | ||
|
|
||
| // We don't have to do this, and inefficient. But this is what works at the moment. | ||
| if let owhisper_interface::StreamResponse::TranscriptResponse { | ||
| channel, .. | ||
| } = result | ||
| { | ||
| let updated_words = update_session(&app, &session.id, result.words) | ||
| .await | ||
| .unwrap(); | ||
| let words = channel.alternatives.first().unwrap().words.clone(); | ||
| let words2: Vec<owhisper_interface::Word2> = | ||
| words.into_iter().map(|w| w.into()).collect(); | ||
|
|
||
| // We don't have to do this, and inefficient. But this is what works at the moment. | ||
| { | ||
| let updated_words = | ||
| update_session(&app, &session.id, words2).await.unwrap(); | ||
|
|
||
| SessionEvent::Words { | ||
| words: updated_words, | ||
| SessionEvent::Words { | ||
| words: updated_words, | ||
| } | ||
| .emit(&app) | ||
| } | ||
| .emit(&app) | ||
| .unwrap(); | ||
| } |
There was a problem hiding this comment.
Potential panic and inefficient code structure
Several issues in this transcription handling code:
- Line 488:
unwrap()onfirst()will panic if alternatives is empty - The nested block structure (lines 493-502) seems unnecessarily complex
- The TODO comment acknowledges inefficiency
Consider this safer and cleaner approach:
-// TODO
while let Some(result) = listen_stream.next().await {
if let owhisper_interface::StreamResponse::TranscriptResponse {
channel, ..
} = result
{
- let words = channel.alternatives.first().unwrap().words.clone();
+ let Some(first_alt) = channel.alternatives.first() else {
+ tracing::warn!("Received transcript response with no alternatives");
+ continue;
+ };
+ let words = first_alt.words.clone();
let words2: Vec<owhisper_interface::Word2> =
words.into_iter().map(|w| w.into()).collect();
- // We don't have to do this, and inefficient. But this is what works at the moment.
- {
- let updated_words =
- update_session(&app, &session.id, words2).await.unwrap();
-
- SessionEvent::Words {
- words: updated_words,
- }
- .emit(&app)
- }
- .unwrap();
+ match update_session(&app, &session.id, words2).await {
+ Ok(updated_words) => {
+ if let Err(e) = SessionEvent::Words { words: updated_words }.emit(&app) {
+ tracing::error!("Failed to emit words event: {}", e);
+ }
+ }
+ Err(e) => {
+ tracing::error!("Failed to update session: {}", e);
+ }
+ }
}
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| // TODO | |
| while let Some(result) = listen_stream.next().await { | |
| let _meta = result.meta.clone(); | |
| // We don't have to do this, and inefficient. But this is what works at the moment. | |
| if let owhisper_interface::StreamResponse::TranscriptResponse { | |
| channel, .. | |
| } = result | |
| { | |
| let updated_words = update_session(&app, &session.id, result.words) | |
| .await | |
| .unwrap(); | |
| let words = channel.alternatives.first().unwrap().words.clone(); | |
| let words2: Vec<owhisper_interface::Word2> = | |
| words.into_iter().map(|w| w.into()).collect(); | |
| // We don't have to do this, and inefficient. But this is what works at the moment. | |
| { | |
| let updated_words = | |
| update_session(&app, &session.id, words2).await.unwrap(); | |
| SessionEvent::Words { | |
| words: updated_words, | |
| SessionEvent::Words { | |
| words: updated_words, | |
| } | |
| .emit(&app) | |
| } | |
| .emit(&app) | |
| .unwrap(); | |
| } | |
| while let Some(result) = listen_stream.next().await { | |
| if let owhisper_interface::StreamResponse::TranscriptResponse { | |
| channel, .. | |
| } = result | |
| { | |
| let Some(first_alt) = channel.alternatives.first() else { | |
| tracing::warn!("Received transcript response with no alternatives"); | |
| continue; | |
| }; | |
| let words = first_alt.words.clone(); | |
| let words2: Vec<owhisper_interface::Word2> = | |
| words.into_iter().map(|w| w.into()).collect(); | |
| match update_session(&app, &session.id, words2).await { | |
| Ok(updated_words) => { | |
| if let Err(e) = | |
| SessionEvent::Words { words: updated_words }.emit(&app) | |
| { | |
| tracing::error!("Failed to emit words event: {}", e); | |
| } | |
| } | |
| Err(e) => { | |
| tracing::error!("Failed to update session: {}", e); | |
| } | |
| } | |
| } | |
| } |
🤖 Prompt for AI Agents
In plugins/listener/src/fsm.rs around lines 482 to 503, avoid panicking by
replacing unwrap() on alternatives.first() with a safe check to handle empty
alternatives gracefully. Simplify the nested block structure by removing
unnecessary braces and directly awaiting update_session and emitting the event.
Refactor the code to improve clarity and maintainability while preserving
functionality.
There was a problem hiding this comment.
Actionable comments posted: 0
♻️ Duplicate comments (2)
owhisper/owhisper-client/src/lib.rs (2)
86-86: Coordinate authorization header prefix change across codebase.The change from "Bearer" to "Token" creates inconsistency with other parts of the codebase that still expect "Bearer". This breaking change requires updating servers, middleware, and all clients to use "Token" consistently.
40-47: Fix URL construction to prevent path duplication.The code unconditionally appends "listen" to the path, which will cause duplication if the base URL already ends with "listen" (e.g., "https://api.example.com/listen" becomes "https://api.example.com/listen/listen").
{ let mut path = url.path().to_string(); if !path.ends_with('/') { path.push('/'); } - path.push_str("listen"); + if !path.ends_with("listen") && !path.ends_with("listen/") { + path.push_str("listen"); + } url.set_path(&path); }
🧹 Nitpick comments (1)
owhisper/owhisper-client/src/lib.rs (1)
55-58: Consider making audio parameters configurable.The hardcoded sample rate "16000" and encoding "linear16" may not match actual audio streams. While this supports Deepgram compatibility, consider making these configurable through
ListenParams.
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
crates/transcribe-whisper-local/src/service/streaming.rs(5 hunks)crates/ws-utils/src/lib.rs(4 hunks)owhisper/owhisper-client/src/lib.rs(7 hunks)owhisper/schema.json(2 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- owhisper/schema.json
- crates/transcribe-whisper-local/src/service/streaming.rs
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{js,ts,tsx,rs}
⚙️ CodeRabbit Configuration File
**/*.{js,ts,tsx,rs}: 1. No error handling.
2. No unused imports, variables, or functions.
3. For comments, keep it minimal. It should be about "Why", not "What".
Files:
owhisper/owhisper-client/src/lib.rscrates/ws-utils/src/lib.rs
🔇 Additional comments (9)
crates/ws-utils/src/lib.rs (4)
27-31: LGTM! Simplified WebSocket message handling.The direct binary message processing is more efficient than the previous JSON-based approach and correctly handles different message types.
50-50: LGTM! Constructor made public for external usage.This change enables the new split functions to return ChannelAudioSource instances.
72-122: LGTM! Well-implemented interleaved stereo processing.The function correctly processes interleaved i16 stereo data, properly normalizes to f32, and handles WebSocket message types appropriately. The use of unbounded channels is suitable for real-time audio streaming.
124-155: LGTM! Efficient dual audio source splitting.The function elegantly handles f32 stereo pairs using
chunks_exact(2)andunzip()for clean separation into mic and speaker channels.owhisper/owhisper-client/src/lib.rs (5)
94-95: LGTM! Good refactoring to eliminate code duplication.Extracting the common request building logic improves maintainability and follows DRY principles.
Also applies to: 99-100
110-120: LGTM! Simplified to binary message handling.The move from JSON chunks to raw bytes is more efficient for audio streaming, and using
StreamResponseprovides better structured output.
135-146: LGTM! Consistent binary message handling.The dual client now handles binary data consistently with the single client implementation.
177-198: LGTM! Proper byte-level audio interleaving.The rewritten interleaving logic correctly handles sample-level interleaving and ensures complete i16 samples, which is much more robust than the previous chunk-level approach.
213-222: LGTM! Test updates for Deepgram compatibility.The test configuration correctly uses Deepgram's API endpoint and specifies a concrete model for consistent testing.
Summary by cubic
Added Deepgram compatibility to the whisper-client and updated the audio streaming protocol for better integration.