Skip to content

deepgram compatibility for whisper-client#1281

Closed
yujonglee wants to merge 18 commits intomainfrom
owhisper-deepgram-compat
Closed

deepgram compatibility for whisper-client#1281
yujonglee wants to merge 18 commits intomainfrom
owhisper-deepgram-compat

Conversation

@yujonglee
Copy link
Contributor

@yujonglee yujonglee commented Aug 3, 2025

Summary by cubic

Added Deepgram compatibility to the whisper-client and updated the audio streaming protocol for better integration.

  • New Features
    • Added support for Deepgram's WebSocket API in the client and server.
    • Unified dual audio input into a single interleaved data field.
    • Updated API paths and config to match Deepgram requirements.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Aug 3, 2025

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

📝 Walkthrough

Walkthrough

This update introduces a major architectural refactor and interface evolution for the transcription and streaming components. The codebase now uses a generic BaseTranscribeService abstraction with handler modules for AWS, Deepgram, and GCP transcription services, replacing manual WebSocket and streaming logic. The streaming response format is standardized with a new StreamResponse type. Many APIs and data models are updated to use Word2 instead of Word, and configuration handling is improved with error propagation. Numerous formatting and dependency updates are also included.

Changes

Cohort / File(s) Change Summary
Transcription Service Refactor (AWS, Deepgram, GCP)
crates/transcribe-aws/src/lib.rs, crates/transcribe-aws/src/handler.rs, crates/transcribe-deepgram/src/lib.rs, crates/transcribe-deepgram/src/handler.rs, crates/transcribe-gcp/src/lib.rs, crates/transcribe-gcp/src/handler.rs, crates/transcribe-interface/src/lib.rs
Refactored each transcription service to use a generic BaseTranscribeService<Handler> abstraction, moving all WebSocket and streaming logic to handler modules implementing a common trait. Service structs are now thin wrappers delegating to these handlers.
Streaming Response Model & Word2 Migration
owhisper/owhisper-interface/src/lib.rs, owhisper/owhisper-interface/src/stream.rs, crates/stt/src/realtime/whisper.rs, crates/stt/src/realtime/deepgram.rs, crates/stt/src/realtime/clova.rs, crates/stt/src/realtime/mod.rs, crates/db-script/src/lib.rs, crates/db-user/src/init.rs, crates/db-user/src/sessions_ops.rs, crates/db-user/src/sessions_types.rs, plugins/db/src/commands/sessions.rs, plugins/listener/src/events.rs, plugins/listener/src/fsm.rs, plugins/local-stt/src/events.rs, crates/template/src/filters.rs
Introduced a new StreamResponse streaming API model and migrated all word representations from Word to Word2 across the codebase, updating data structures, APIs, and deserialization logic.
Client & Interface Updates
owhisper/owhisper-client/src/lib.rs, owhisper/owhisper-interface/Cargo.toml
Refactored client-side code to use the new streaming response types and updated the construction of WebSocket requests and audio chunk handling to match the new API.
Configuration Handling Improvements
owhisper/owhisper-config/src/lib.rs, owhisper/owhisper-config/src/error.rs, owhisper/owhisper-config/Cargo.toml
Improved configuration loading with error propagation, added a public error type, made config paths optional, and updated model config fields for flexibility.
Server and Command Routing Changes
owhisper/owhisper-server/src/server.rs, owhisper/owhisper-server/src/commands/run.rs, owhisper/owhisper-server/src/commands/serve.rs, owhisper/owhisper-server/src/commands/pull.rs, plugins/local-stt/src/server.rs
Updated server and CLI command logic to accommodate new config handling, changed API endpoints, improved error handling, and added comprehensive integration tests for various transcription services.
Audio Processing Refactor
crates/audio-utils/src/lib.rs, crates/ws-utils/src/lib.rs
Added a new mix_audio_channels function, refactored audio source splitting for stereo and dual-channel audio, and simplified WebSocket audio stream parsing.
Dependency and Metadata Updates
Cargo.toml, crates/transcribe-aws/Cargo.toml, crates/transcribe-deepgram/Cargo.toml, crates/transcribe-gcp/Cargo.toml, crates/transcribe-gcp/Cargo.toml, crates/transcribe-whisper-local/Cargo.toml, owhisper/owhisper-server/Cargo.toml, owhisper/owhisper-client/Cargo.toml, crates/transcribe-interface/Cargo.toml
Added, updated, or reorganized dependencies to support new abstractions, streaming, and async logic.
Formatting-Only JSON and Config Changes
crates/data/src/english_3/raw.json, crates/data/src/english_4/raw.json, crates/data/src/english_5/raw.json, crates/data/src/english_7/raw.json, owhisper/schema.json, apps/desktop/src-tauri/capabilities/default.json, apps/desktop/src-tauri/capabilities/macos.json, apps/desktop/src-tauri/tauri.conf.json, dprint.json, admin/server/tsconfig.json
Reformatted arrays and properties in JSON/config files for consistency and readability; no semantic changes.
Miscellaneous and Test Cleanup
crates/data/build.rs, plugins/local-stt/benches/simple.rs, owhisper/owhisper-server/src/misc.rs
Updated imports to use new types, removed or reformatted test/benchmark files, and made minor formatting changes.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant Server
    participant BaseTranscribeService
    participant Handler
    participant TranscriptionAPI

    Client->>Server: HTTP/WebSocket upgrade request
    Server->>BaseTranscribeService: call(request)
    BaseTranscribeService->>Handler: handle_socket(socket, params)
    Handler->>TranscriptionAPI: Stream audio data
    TranscriptionAPI-->>Handler: Streaming transcription results
    Handler-->>Client: Send StreamResponse messages over WebSocket
Loading

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~90+ minutes

Possibly related PRs

  • Extract local whisper inference as tower service #1268: Refactors local whisper inference to use a tower service abstraction, similar to this PR's refactor of transcription services to use handler-based tower services, removing manual WebSocket and streaming logic from the service struct.

Note

⚡️ Unit Test Generation is now available in beta!

Learn more here, or try it out under "Finishing Touches" below.

✨ Finishing Touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch owhisper-deepgram-compat

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

‼️ IMPORTANT
Auto-reply has been disabled for this repository in the CodeRabbit settings. The CodeRabbit bot will not respond to your replies unless it is explicitly tagged.

  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai explain this code block.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and explain its main purpose.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
  • @coderabbitai generate unit tests to generate unit tests for this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (1)
crates/transcribe-deepgram/src/lib.rs (1)

59-61: Incomplete implementation: handle_socket method is empty.

The WebSocket handler doesn't process any messages. Is this intentional or a work in progress?

Would you like me to help implement the WebSocket message handling logic?

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b295ca9 and 4883941.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (10)
  • crates/transcribe-deepgram/Cargo.toml (1 hunks)
  • crates/transcribe-deepgram/src/lib.rs (2 hunks)
  • crates/ws-utils/src/lib.rs (5 hunks)
  • owhisper/owhisper-client/Cargo.toml (1 hunks)
  • owhisper/owhisper-client/src/lib.rs (5 hunks)
  • owhisper/owhisper-config/src/lib.rs (1 hunks)
  • owhisper/owhisper-interface/src/lib.rs (1 hunks)
  • owhisper/owhisper-server/src/server.rs (1 hunks)
  • owhisper/schema.json (2 hunks)
  • plugins/local-stt/src/server.rs (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{js,ts,tsx,rs}

⚙️ CodeRabbit Configuration File

**/*.{js,ts,tsx,rs}: 1. No error handling.
2. No unused imports, variables, or functions.
3. For comments, keep it minimal. It should be about "Why", not "What".

Files:

  • owhisper/owhisper-server/src/server.rs
  • plugins/local-stt/src/server.rs
  • owhisper/owhisper-interface/src/lib.rs
  • owhisper/owhisper-config/src/lib.rs
  • crates/ws-utils/src/lib.rs
  • crates/transcribe-deepgram/src/lib.rs
  • owhisper/owhisper-client/src/lib.rs
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: cubic · AI code reviewer
  • GitHub Check: ci
  • GitHub Check: ci (macos, macos-latest)
  • GitHub Check: ci (windows, windows-latest)
🔇 Additional comments (13)
owhisper/owhisper-client/Cargo.toml (1)

21-22: LGTM! Rodio dependency addition looks appropriate.

The addition of rodio as a dev-dependency aligns with its usage for audio decoding in asynchronous test cases as mentioned in the AI summary.

owhisper/owhisper-config/src/lib.rs (1)

54-54: LGTM! Making base_url optional adds configuration flexibility.

This change allows users to omit the base_url field while maintaining functionality through a default value, improving the user experience.

owhisper/owhisper-interface/src/lib.rs (1)

58-58: LGTM! Data structure consolidation improves consistency.

Consolidating separate mic and speaker fields into a single interleaved data field standardizes dual audio representation across the codebase, as mentioned in the comprehensive changes described in the AI summary.

owhisper/owhisper-server/src/server.rs (1)

129-129: LGTM! Route path standardization improves consistency.

The route change from "/v1/stt/realtime" to "/v1/listen" aligns with the coordinated endpoint naming updates across the client and server components.

plugins/local-stt/src/server.rs (1)

91-91: LGTM! Route simplification aligns with endpoint standardization.

The route change from "/api/desktop/listen/realtime" to "/listen" is consistent with the coordinated endpoint naming updates across the project, simplifying the API surface.

crates/transcribe-deepgram/Cargo.toml (1)

29-31: LGTM!

The addition of workspace dev dependencies for testing is appropriate.

owhisper/schema.json (2)

50-55: Good schema organization improvement.

Moving the type property to the first position across all model configurations improves readability and consistency.

Also applies to: 77-82, 108-113


92-96: LGTM! Making base_url optional is a good design choice.

Changing base_url from required to optional allows for default values, which aligns with the implementation in crates/transcribe-deepgram/src/lib.rs.

crates/ws-utils/src/lib.rs (2)

130-138: LGTM! Correct de-interleaving implementation.

The logic properly splits interleaved audio data by taking alternating samples for mic and speaker channels.


94-94: LGTM!

Making the constructor public enables external usage as needed.

crates/transcribe-deepgram/src/lib.rs (1)

43-47: Add error handling for URL parsing.

The unwrap() call could panic if the base URL is invalid.

-        config
-            .base_url
-            .unwrap_or("https://api.deepgram.com/v1".to_string())
-            .parse::<url::Url>()
-            .unwrap(),
+        config
+            .base_url
+            .unwrap_or("https://api.deepgram.com/v1".to_string())
+            .parse::<url::Url>()
+            .map_err(|e| Error::from(e))?,

Likely an incorrect or invalid review comment.

owhisper/owhisper-client/src/lib.rs (2)

40-47: LGTM! Dynamic path construction is more flexible.

The implementation correctly handles trailing slashes and appends the "listen" endpoint.


181-195: LGTM! Correct audio interleaving implementation.

The implementation properly interleaves mic and speaker audio at the byte level, maintaining 16-bit sample integrity.

Copy link

@cubic-dev-ai cubic-dev-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cubic analysis

5 issues found across 10 files • Review in cubic

React with 👍 or 👎 to teach cubic. You can also tag @cubic-dev-ai to give feedback, ask questions, or re-run the review.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🔭 Outside diff range comments (1)
owhisper/owhisper-server/src/commands/run.rs (1)

27-95: Remove large blocks of commented code

This file contains extensive commented-out code that should be removed to improve maintainability. If this code is needed for reference, it should be preserved in version control history.

Consider removing all the commented code blocks and keeping only the active implementation.

♻️ Duplicate comments (8)
owhisper/owhisper-client/src/lib.rs (2)

40-47: Path construction may duplicate "listen"

If the supplied base path already ends with "listen", it will be duplicated resulting in an incorrect endpoint like "/listen/listen".

Add a check to prevent duplication:

 {
     let mut path = url.path().to_string();
     if !path.ends_with('/') {
         path.push('/');
     }
-    path.push_str("listen");
+    if !path.ends_with("listen") {
+        path.push_str("listen");
+    }
     url.set_path(&path);
 }

81-84: Authorization header inconsistency across codebase

The authorization header prefix was changed from "Bearer" to "Token", but the server and other parts of the codebase still expect "Bearer". This will break authentication.

Either revert this change back to "Bearer" or update all related server and client code to consistently use "Token" as outlined in the previous review comment.

crates/transcribe-deepgram/src/lib.rs (6)

41-46: URL parsing can panic - handle the error instead

The unwrap() call on URL parsing can cause a panic if the base_url is malformed. This should return an error instead.

Apply this diff:

-        let base_url = config
-            .base_url
-            .unwrap_or("https://api.deepgram.com/v1".to_string())
-            .parse::<url::Url>()
-            .unwrap();
+        let base_url = config
+            .base_url
+            .unwrap_or("https://api.deepgram.com/v1".to_string())
+            .parse::<url::Url>()
+            .map_err(|e| Error::Configuration(format!("Invalid base URL: {}", e)))?;

211-216: Response builder unwrap can panic

The unwrap() call on Response::builder() can panic if the builder fails. Handle the error properly.

Apply this diff:

                Ok(Response::builder()
                    .status(StatusCode::OK)
                    .header("Content-Type", "application/json")
                    .body(Body::from(json_response))
-                    .unwrap())
+                    .map_err(|_| StatusCode::INTERNAL_SERVER_ERROR)?)

249-253: Response builder unwrap can panic in error handling

Apply this diff to handle the error:

                    Err(_) => Ok(Response::builder()
                        .status(StatusCode::BAD_REQUEST)
                        .body(Body::from("Invalid WebSocket upgrade request"))
-                        .unwrap()),
+                        .unwrap_or_else(|_| Response::new(Body::from("Internal Server Error")))),

258-262: Response builder unwrap can panic

Apply this diff:

                        return Ok(Response::builder()
                            .status(StatusCode::BAD_REQUEST)
                            .body(Body::from("Failed to read request body"))
-                            .unwrap())
+                            .unwrap_or_else(|_| Response::new(Body::from("Internal Server Error"))))

267-271: Response builder unwrap can panic

Apply this diff:

                    Err(status) => Ok(Response::builder()
                        .status(status)
                        .body(Body::from("Transcription failed"))
-                        .unwrap()),
+                        .unwrap_or_else(|_| Response::new(Body::from("Internal Server Error")))),

273-277: Response builder unwrap can panic

Apply this diff:

                Ok(Response::builder()
                    .status(StatusCode::METHOD_NOT_ALLOWED)
                    .body(Body::from("Only WebSocket and POST requests are supported"))
-                    .unwrap())
+                    .unwrap_or_else(|_| Response::new(Body::from("Internal Server Error"))))
🧹 Nitpick comments (1)
crates/transcribe-deepgram/src/lib.rs (1)

218-218: Use tracing instead of eprintln!

Use the tracing crate for consistent error logging across the application.

-                eprintln!("Deepgram transcription error: {:?}", e);
+                tracing::error!("Deepgram transcription error: {:?}", e);
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4883941 and b3a9a4b.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (13)
  • crates/transcribe-deepgram/Cargo.toml (1 hunks)
  • crates/transcribe-deepgram/src/lib.rs (4 hunks)
  • owhisper/owhisper-client/src/lib.rs (5 hunks)
  • owhisper/owhisper-config/Cargo.toml (1 hunks)
  • owhisper/owhisper-config/src/error.rs (1 hunks)
  • owhisper/owhisper-config/src/lib.rs (3 hunks)
  • owhisper/owhisper-interface/src/lib.rs (2 hunks)
  • owhisper/owhisper-server/Cargo.toml (1 hunks)
  • owhisper/owhisper-server/src/commands/pull.rs (1 hunks)
  • owhisper/owhisper-server/src/commands/run.rs (1 hunks)
  • owhisper/owhisper-server/src/commands/serve.rs (1 hunks)
  • owhisper/owhisper-server/src/server.rs (5 hunks)
  • owhisper/schema.json (1 hunks)
✅ Files skipped from review due to trivial changes (4)
  • owhisper/owhisper-config/Cargo.toml
  • owhisper/owhisper-server/Cargo.toml
  • owhisper/schema.json
  • owhisper/owhisper-server/src/commands/pull.rs
🚧 Files skipped from review as they are similar to previous changes (3)
  • crates/transcribe-deepgram/Cargo.toml
  • owhisper/owhisper-interface/src/lib.rs
  • owhisper/owhisper-config/src/lib.rs
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{js,ts,tsx,rs}

⚙️ CodeRabbit Configuration File

**/*.{js,ts,tsx,rs}: 1. No error handling.
2. No unused imports, variables, or functions.
3. For comments, keep it minimal. It should be about "Why", not "What".

Files:

  • owhisper/owhisper-server/src/commands/run.rs
  • owhisper/owhisper-config/src/error.rs
  • owhisper/owhisper-server/src/commands/serve.rs
  • owhisper/owhisper-server/src/server.rs
  • owhisper/owhisper-client/src/lib.rs
  • crates/transcribe-deepgram/src/lib.rs
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: ci (macos, macos-latest)
  • GitHub Check: ci (windows, windows-latest)
  • GitHub Check: ci
🔇 Additional comments (11)
owhisper/owhisper-config/src/error.rs (1)

1-5: Clean error handling implementation

Good use of thiserror for error derivation and transparent error forwarding from config::ConfigError.

owhisper/owhisper-server/src/commands/run.rs (2)

12-13: Good addition of optional config parameter

Making the config parameter optional provides flexibility for users to run with default configuration.


17-19: Proper error handling and shutdown signal integration

Good use of the new Config::new API with error propagation and integration of shutdown signal for graceful termination.

owhisper/owhisper-server/src/commands/serve.rs (1)

1-22: Consistent implementation with run command

The changes properly integrate optional configuration and graceful shutdown, maintaining consistency with the run command implementation.

owhisper/owhisper-server/src/server.rs (4)

85-89: Server now binds only to localhost

The server binding was changed from UNSPECIFIED (0.0.0.0) to LOCALHOST (127.0.0.1), making it accessible only from the local machine. Ensure this is intentional as it prevents remote connections.


108-109: Good refactoring to use query parameters

The change from path parameters to query parameters provides better flexibility and aligns with Deepgram's API patterns. The model selection logic with fallback to the first available service is well implemented.

Also applies to: 144-155


98-103: Improved server execution pattern

Directly awaiting the server instead of spawning a background task is cleaner and provides better error propagation.


215-350: Comprehensive test coverage

Good addition of integration tests for WhisperCpp, Deepgram, and AWS transcription services. The tests properly exercise the server with real audio data.

owhisper/owhisper-client/src/lib.rs (3)

74-89: Good refactoring with build_request method

Extracting the common request building logic reduces code duplication between build_single and build_dual.


179-194: Correct audio interleaving implementation

The implementation properly interleaves mic and speaker audio samples at the sample level (16-bit samples), which aligns with the unified dual audio data handling.


59-60: Model parameter properly added to query string

Good addition of the model parameter with a sensible default empty string fallback.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (3)
crates/ws-utils/src/lib.rs (1)

76-79: Fix: Mixing audio samples with itself produces no effect.

The current implementation mixes the interleaved samples with itself, which is meaningless. For dual audio, the samples are already interleaved (mic, speaker, mic, speaker...). You should either:

  1. De-interleave first then mix the separate channels, or
  2. Return the interleaved samples as-is without mixing
 AudioProcessResult::DualSamples(samples) => {
-    let mixed = mix_audio_channels(&samples, &samples);
-    Some((mixed, receiver))
+    // Samples are already interleaved, no mixing needed
+    Some((samples, receiver))
 }
owhisper/owhisper-client/src/lib.rs (2)

40-47: Fix URL construction to prevent path duplication.

The current implementation unconditionally appends "listen" to the path, which could result in duplication if the base URL already ends with "listen" (e.g., "https://api.example.com/listen" becomes "https://api.example.com/listen/listen").

 {
     let mut path = url.path().to_string();
-    if !path.ends_with('/') {
-        path.push('/');
-    }
-    path.push_str("listen");
+    if !path.ends_with("/listen") {
+        if !path.ends_with('/') {
+            path.push('/');
+        }
+        path.push_str("listen");
+    }
     url.set_path(&path);
 }

76-91: Ensure consistent "Token" prefix across all Authorization headers.

You've updated owhisper-client to use "Token" prefix but this creates inconsistency with server-side code that still expects "Bearer". This will break authentication.

The server in owhisper-server/src/server.rs (lines 12, 198) still uses TypedHeader<Authorization<Bearer>> which expects "Bearer" prefix. Either:

  1. Update the server to handle "Token" prefix, or
  2. Revert this change back to "Bearer" for consistency
-            .with_header("Authorization", format!("Token {}", key)),
+            .with_header("Authorization", format!("Bearer {}", key)),
🧹 Nitpick comments (1)
owhisper/owhisper-server/src/server.rs (1)

224-359: Review test environment variable handling.

The tests expect specific environment variables (DEEPGRAM_API_KEY, AWS_REGION, etc.) but don't handle missing variables gracefully. Consider adding conditional test execution or more descriptive panics.

+#[tokio::test]
+#[ignore = "requires DEEPGRAM_API_KEY environment variable"]
 async fn test_deepgram() {
+    if std::env::var("DEEPGRAM_API_KEY").is_err() {
+        eprintln!("Skipping test_deepgram: DEEPGRAM_API_KEY not set");
+        return;
+    }
     let addr = start().await;
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b3a9a4b and b9f0fd9.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (16)
  • Cargo.toml (1 hunks)
  • admin/server/tsconfig.json (1 hunks)
  • apps/desktop/src-tauri/capabilities/default.json (1 hunks)
  • apps/desktop/src-tauri/capabilities/macos.json (1 hunks)
  • apps/desktop/src-tauri/tauri.conf.json (1 hunks)
  • crates/data/src/english_3/raw.json (1 hunks)
  • crates/data/src/english_4/raw.json (1 hunks)
  • crates/data/src/english_5/raw.json (1 hunks)
  • crates/data/src/english_7/raw.json (1 hunks)
  • crates/transcribe-deepgram/Cargo.toml (2 hunks)
  • crates/ws-utils/src/lib.rs (5 hunks)
  • dprint.json (3 hunks)
  • owhisper/owhisper-client/src/lib.rs (5 hunks)
  • owhisper/owhisper-interface/Cargo.toml (1 hunks)
  • owhisper/owhisper-server/src/server.rs (5 hunks)
  • owhisper/schema.json (3 hunks)
✅ Files skipped from review due to trivial changes (12)
  • Cargo.toml
  • admin/server/tsconfig.json
  • apps/desktop/src-tauri/tauri.conf.json
  • crates/data/src/english_4/raw.json
  • crates/data/src/english_7/raw.json
  • crates/data/src/english_5/raw.json
  • apps/desktop/src-tauri/capabilities/macos.json
  • owhisper/owhisper-interface/Cargo.toml
  • apps/desktop/src-tauri/capabilities/default.json
  • crates/data/src/english_3/raw.json
  • dprint.json
  • owhisper/schema.json
🚧 Files skipped from review as they are similar to previous changes (1)
  • crates/transcribe-deepgram/Cargo.toml
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{js,ts,tsx,rs}

⚙️ CodeRabbit Configuration File

**/*.{js,ts,tsx,rs}: 1. No error handling.
2. No unused imports, variables, or functions.
3. For comments, keep it minimal. It should be about "Why", not "What".

Files:

  • crates/ws-utils/src/lib.rs
  • owhisper/owhisper-server/src/server.rs
  • owhisper/owhisper-client/src/lib.rs
🔇 Additional comments (13)
crates/ws-utils/src/lib.rs (5)

10-10: LGTM! Enum variant simplified correctly.

The consolidation from separate mic and speaker vectors to a single interleaved Vec<f32> aligns with the PR objective of unifying dual audio input.


17-23: LGTM! Binary message handling improved.

The direct conversion of binary data to samples is more efficient than the previous approach and correctly handles the new unified audio format.


32-34: LGTM! DualAudio parsing updated correctly.

The parsing now correctly handles the single interleaved data vector instead of separate mic and speaker vectors.


101-101: LGTM! Constructor visibility improved.

Making the constructor public enables external usage as needed for the architecture changes.


137-144: LGTM! Dual audio splitting logic is correct.

The implementation correctly de-interleaves the samples by chunking pairs and unzipping them into separate mic and speaker vectors. This properly handles the new interleaved format.

owhisper/owhisper-server/src/server.rs (5)

6-6: LGTM! Import updated for Query extraction.

The import change from Path to Query correctly supports the new parameter extraction approach.


82-104: LGTM! Server execution logic improved.

The changes eliminate the background task spawn, use proper logging, and provide better error handling with consistent return types.


108-109: LGTM! Routes now support Query parameter extraction.

The addition of both /listen and /v1/listen routes with Query parameter extraction correctly addresses the previous issue where Path extraction required dynamic segments.


144-155: LGTM! Model selection logic is robust.

The parameter extraction from query string with fallback to the first available model provides good flexibility for both explicit model selection and default behavior.


165-189: LGTM! Error handling enhanced with specific messages.

The structured error responses with specific error codes like "aws_server_error", "deepgram_server_error" will help with debugging and client-side error handling.

owhisper/owhisper-client/src/lib.rs (3)

55-58: LGTM! Deepgram-compatible query parameters added.

The query parameters (model, sample_rate, encoding) align with Deepgram's WebSocket API requirements as documented in the referenced URL.


110-128: LGTM! WebSocketIO updated for binary data handling.

The change from JSON-encoded messages to direct binary data transmission is more efficient and aligns with the Deepgram WebSocket protocol.


179-193: LGTM! Audio interleaving logic is correct.

The implementation properly interleaves mic and speaker samples at the 2-byte level, creating the expected format for dual audio processing.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 9

🔭 Outside diff range comments (2)
crates/db-script/src/lib.rs (2)

26-86: Critical: No error handling throughout the migration function

The run function ignores all errors, which could lead to silent data corruption or incomplete migrations. This is especially concerning for a database migration script.

Key issues:

  • Line 32: Database query errors are silently ignored
  • Lines 39, 45, 49: Row parsing errors skip records without logging
  • Line 54: JSON deserialization failures default to empty vec
  • Lines 66-77: Update errors continue without notification
  • Line 70: Serialization errors could corrupt data

Consider implementing proper error handling:

-pub async fn run(conn: &libsql::Connection) {
+pub async fn run(conn: &libsql::Connection) -> Result<(), Box<dyn std::error::Error>> {
     let mut rows = match conn
         .query("SELECT id, conversations FROM sessions", ())
         .await
     {
         Ok(rows) => rows,
-        Err(_) => return,
+        Err(e) => return Err(format!("Failed to query sessions: {}", e).into()),
     };

Also add logging for skipped records and consider transactional updates to ensure data consistency.


54-54: Potential data loss from silent deserialization failures

Using unwrap_or_default() here means any malformed JSON data will be silently converted to an empty vector, potentially losing conversation data.

-let conversations: Vec<owhisper_interface::ConversationChunk> =
-    serde_json::from_str(&conversations_str).unwrap_or_default();
+let conversations: Vec<owhisper_interface::ConversationChunk> = match serde_json::from_str(&conversations_str) {
+    Ok(convs) => convs,
+    Err(e) => {
+        tracing::error!("Failed to parse conversations for session {}: {}", id, e);
+        continue;
+    }
+};
♻️ Duplicate comments (3)
crates/transcribe-deepgram/src/handler.rs (1)

28-39: URL parsing can panic with unwrap()

The unwrap() call on URL parsing can cause a panic if the base URL is malformed.

Propagate the error instead:

-        let base_url = config
-            .base_url
-            .unwrap_or("https://api.deepgram.com/v1".to_string())
-            .parse::<url::Url>()
-            .unwrap();
+        let base_url = config
+            .base_url
+            .unwrap_or("https://api.deepgram.com/v1".to_string())
+            .parse::<url::Url>()
+            .map_err(|e| crate::Error::from(format!("Invalid base URL: {}", e)))?;
owhisper/owhisper-client/src/lib.rs (2)

40-47: Path construction may duplicate "listen" segment

If the base URL already ends with "listen", this code will create an incorrect path like "/listen/listen".

Apply this fix to prevent duplication:

 {
     let mut path = url.path().to_string();
     if !path.ends_with('/') {
         path.push('/');
     }
-    path.push_str("listen");
+    if !path.ends_with("listen") {
+        path.push_str("listen");
+    }
     url.set_path(&path);
 }

82-86: Authorization header prefix inconsistency

The change from "Bearer" to "Token" creates inconsistency with other parts of the codebase that still expect "Bearer".

This breaking change requires coordinated updates across multiple files as detailed in the previous review. Either:

  1. Update all related files to use "Token" consistently, or
  2. Revert this change back to "Bearer"

The previous review lists all affected locations that need updating.

🧹 Nitpick comments (7)
crates/transcribe-whisper-local/src/service/streaming.rs (2)

223-286: Performance optimization opportunity in word processing

The current implementation splits the text multiple times which is inefficient.

Cache the split result to avoid repeated processing:

-let words = text
-    .split_whitespace()
-    .filter(|w| !w.is_empty())
-    .enumerate()
-    .map(|(i, w)| {
-        let word_duration = duration as f64 / text.split_whitespace().count() as f64;
+let text_words: Vec<&str> = text.split_whitespace().filter(|w| !w.is_empty()).collect();
+let word_count = text_words.len() as f64;
+let word_duration = if word_count > 0.0 { duration as f64 / word_count } else { 0.0 };
+
+let words = text_words
+    .into_iter()
+    .enumerate()
+    .map(|(i, w)| {
         let word_start = start as f64 + (i as f64 * word_duration);

245-246: Consider more accurate word timing distribution

The current implementation assumes all words have equal duration, which may not reflect actual speech patterns. Short words like "a", "the" typically take less time than longer words.

For more accurate timing, consider:

  1. Using word length as a weight for duration distribution
  2. Implementing a phoneme-based timing model
  3. Using the actual timing data from the whisper model if available
crates/transcribe-gcp/src/handler.rs (1)

101-107: Document unimplemented functionality

The batch transcription returns NOT_IMPLEMENTED without any explanation.

Add a descriptive error message:

     async fn handle_batch_transcription(
         self,
         _audio_data: Bytes,
     ) -> Result<Response<Body>, StatusCode> {
-        // GCP Speech-to-Text batch transcription not implemented yet
+        // TODO: Implement GCP Speech-to-Text batch transcription
+        tracing::warn!("GCP batch transcription not implemented");
         Err(StatusCode::NOT_IMPLEMENTED)
     }
crates/transcribe-deepgram/src/handler.rs (1)

119-119: Use consistent logging

Use tracing instead of eprintln! for consistency.

-                eprintln!("Deepgram transcription error: {:?}", e);
+                tracing::error!("Deepgram transcription error: {:?}", e);
crates/transcribe-whisper-local/src/handler.rs (1)

45-57: Consider adding error context

Errors are converted to generic status codes without preserving context.

Log the actual error for debugging:

         // Save audio data to file
-        if let Err(_) = tokio::fs::write(&temp_file, &audio_data).await {
+        if let Err(e) = tokio::fs::write(&temp_file, &audio_data).await {
+            tracing::error!("Failed to write temp file: {}", e);
             return Err(StatusCode::INTERNAL_SERVER_ERROR);
         }
owhisper/owhisper-client/src/lib.rs (1)

177-191: Audio interleaving implementation looks correct

The implementation properly interleaves audio at the 2-byte (16-bit) sample level. Note that chunks_exact will drop any remaining bytes if the chunks aren't exactly divisible by 2, and the zip will stop when the shorter stream ends, potentially losing samples from the longer stream.

Consider handling remainder bytes and length mismatches if sample-perfect accuracy is required:

 let mic_samples = mic_chunk.chunks_exact(2);
 let speaker_samples = speaker_chunk.chunks_exact(2);
+
+// Handle any remainder bytes
+let mic_remainder = mic_samples.remainder();
+let speaker_remainder = speaker_samples.remainder();

 for (mic_sample, speaker_sample) in mic_samples.zip(speaker_samples) {
     interleaved.extend_from_slice(mic_sample);
     interleaved.extend_from_slice(speaker_sample);
 }
+
+// Log if there are unprocessed samples
+if !mic_remainder.is_empty() || !speaker_remainder.is_empty() {
+    tracing::warn!("Dropped {} mic bytes and {} speaker bytes", 
+                   mic_remainder.len(), speaker_remainder.len());
+}
crates/transcribe-aws/src/handler.rs (1)

82-139: Consider error handling for completion message

The transcription logic is well-implemented. However, the completion message send on line 137 ignores potential errors.

 // Send completion message
-let _ = result_tx.send(WsMessage::Complete).await;
+if let Err(e) = result_tx.send(WsMessage::Complete).await {
+    info!("Failed to send completion message: {}", e);
+}
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b9f0fd9 and 30d82be.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (39)
  • Cargo.toml (3 hunks)
  • crates/audio-utils/src/lib.rs (1 hunks)
  • crates/data/build.rs (2 hunks)
  • crates/db-script/src/lib.rs (2 hunks)
  • crates/db-user/src/init.rs (2 hunks)
  • crates/db-user/src/sessions_ops.rs (2 hunks)
  • crates/db-user/src/sessions_types.rs (1 hunks)
  • crates/stt/src/realtime/clova.rs (2 hunks)
  • crates/stt/src/realtime/deepgram.rs (2 hunks)
  • crates/stt/src/realtime/mod.rs (2 hunks)
  • crates/stt/src/realtime/whisper.rs (2 hunks)
  • crates/template/src/filters.rs (2 hunks)
  • crates/transcribe-aws/Cargo.toml (1 hunks)
  • crates/transcribe-aws/src/handler.rs (1 hunks)
  • crates/transcribe-aws/src/lib.rs (1 hunks)
  • crates/transcribe-deepgram/Cargo.toml (2 hunks)
  • crates/transcribe-deepgram/src/handler.rs (1 hunks)
  • crates/transcribe-deepgram/src/lib.rs (1 hunks)
  • crates/transcribe-gcp/Cargo.toml (1 hunks)
  • crates/transcribe-gcp/src/handler.rs (1 hunks)
  • crates/transcribe-gcp/src/lib.rs (1 hunks)
  • crates/transcribe-interface/Cargo.toml (1 hunks)
  • crates/transcribe-interface/src/lib.rs (1 hunks)
  • crates/transcribe-whisper-local/Cargo.toml (1 hunks)
  • crates/transcribe-whisper-local/src/handler.rs (1 hunks)
  • crates/transcribe-whisper-local/src/service/recorded.rs (2 hunks)
  • crates/transcribe-whisper-local/src/service/streaming.rs (4 hunks)
  • crates/ws-utils/src/lib.rs (3 hunks)
  • owhisper/owhisper-client/src/lib.rs (7 hunks)
  • owhisper/owhisper-interface/src/lib.rs (6 hunks)
  • owhisper/owhisper-interface/src/stream.rs (1 hunks)
  • owhisper/owhisper-server/src/misc.rs (1 hunks)
  • owhisper/schema.json (4 hunks)
  • plugins/db/src/commands/sessions.rs (2 hunks)
  • plugins/listener/src/events.rs (1 hunks)
  • plugins/listener/src/fsm.rs (2 hunks)
  • plugins/local-stt/Cargo.toml (0 hunks)
  • plugins/local-stt/benches/simple.rs (0 hunks)
  • plugins/local-stt/src/events.rs (1 hunks)
💤 Files with no reviewable changes (2)
  • plugins/local-stt/Cargo.toml
  • plugins/local-stt/benches/simple.rs
✅ Files skipped from review due to trivial changes (14)
  • crates/transcribe-whisper-local/Cargo.toml
  • crates/template/src/filters.rs
  • crates/stt/src/realtime/whisper.rs
  • owhisper/owhisper-server/src/misc.rs
  • crates/data/build.rs
  • crates/stt/src/realtime/clova.rs
  • plugins/listener/src/events.rs
  • crates/transcribe-gcp/Cargo.toml
  • plugins/local-stt/src/events.rs
  • crates/db-user/src/sessions_ops.rs
  • crates/audio-utils/src/lib.rs
  • owhisper/schema.json
  • crates/transcribe-whisper-local/src/service/recorded.rs
  • crates/transcribe-aws/Cargo.toml
🚧 Files skipped from review as they are similar to previous changes (4)
  • crates/transcribe-deepgram/Cargo.toml
  • Cargo.toml
  • crates/ws-utils/src/lib.rs
  • owhisper/owhisper-interface/src/lib.rs
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{js,ts,tsx,rs}

⚙️ CodeRabbit Configuration File

**/*.{js,ts,tsx,rs}: 1. No error handling.
2. No unused imports, variables, or functions.
3. For comments, keep it minimal. It should be about "Why", not "What".

Files:

  • crates/stt/src/realtime/mod.rs
  • crates/stt/src/realtime/deepgram.rs
  • crates/transcribe-interface/src/lib.rs
  • crates/db-user/src/sessions_types.rs
  • crates/db-user/src/init.rs
  • crates/db-script/src/lib.rs
  • crates/transcribe-gcp/src/handler.rs
  • plugins/db/src/commands/sessions.rs
  • crates/transcribe-deepgram/src/handler.rs
  • crates/transcribe-whisper-local/src/service/streaming.rs
  • crates/transcribe-whisper-local/src/handler.rs
  • crates/transcribe-deepgram/src/lib.rs
  • plugins/listener/src/fsm.rs
  • crates/transcribe-aws/src/lib.rs
  • owhisper/owhisper-interface/src/stream.rs
  • crates/transcribe-aws/src/handler.rs
  • owhisper/owhisper-client/src/lib.rs
  • crates/transcribe-gcp/src/lib.rs
🔇 Additional comments (24)
crates/stt/src/realtime/mod.rs (1)

206-206: LGTM! Consistent type migration.

The changes from Vec<owhisper_interface::Word> to Vec<owhisper_interface::Word2> in both test functions are consistent with the broader interface migration across the codebase.

Also applies to: 235-235

crates/db-user/src/sessions_types.rs (1)

18-18: LGTM! Database schema alignment with interface migration.

The words field type update from Vec<owhisper_interface::Word> to Vec<owhisper_interface::Word2> correctly aligns the database session model with the updated interface.

crates/stt/src/realtime/deepgram.rs (1)

11-11: LGTM! Consistent Deepgram integration with Word2 migration.

The import update and vector type changes from Word to Word2 are correctly implemented. The field mappings in the constructor remain consistent, ensuring proper transcription data handling.

Also applies to: 66-83

crates/transcribe-interface/Cargo.toml (1)

7-16: LGTM! Dependencies align with WebSocket transcription architecture.

The dependency overhaul appropriately supports the shift to async WebSocket-based transcription services. The addition of axum with WebSocket features, tokio runtime, and owhisper-interface integration supports the PR objectives for Deepgram compatibility.

crates/db-user/src/init.rs (1)

657-661: LGTM! Test data seeding aligned with Word2 migration.

The deserialization type updates from Vec<owhisper_interface::Word> to Vec<owhisper_interface::Word2> in both instances correctly maintain test data compatibility with the updated interface.

Also applies to: 679-683

plugins/db/src/commands/sessions.rs (1)

40-53: LGTM! Type migration looks correct

The return type changes from Word to Word2 are properly implemented for both get_words_onboarding and get_words functions. Error handling is maintained appropriately.

Also applies to: 58-72

crates/transcribe-gcp/src/lib.rs (1)

1-38: Clean refactoring to use BaseTranscribeService

The refactoring successfully simplifies the service implementation by delegating to a generic base service. The separation of concerns between service setup and handler logic improves maintainability.

crates/transcribe-interface/src/lib.rs (1)

21-27: Well-designed trait abstraction

The trait design is solid with appropriate bounds and clear separation of concerns between WebSocket and batch transcription handling.

crates/transcribe-deepgram/src/lib.rs (1)

1-39: Clean refactoring to use BaseTranscribeService

The refactoring successfully simplifies the service implementation by delegating to the generic BaseTranscribeService. This addresses the architectural concerns and removes duplicate WebSocket handling code.

crates/transcribe-whisper-local/src/handler.rs (3)

194-280: Well-structured helper functions

The helper functions for segment conversion and VAD processing are well-implemented with good separation of concerns.


166-166: Handle JSON serialization errors

The unwrap() could panic if serialization fails.

         let response = segment_to_stream_response(segment);
-        let message = serde_json::to_string(&response).unwrap();
+        let message = match serde_json::to_string(&response) {
+            Ok(msg) => msg,
+            Err(e) => {
+                tracing::error!("Failed to serialize response: {}", e);
+                continue;
+            }
+        };

Likely an incorrect or invalid review comment.


76-80: Avoid unwrap() in response building

The unwrap() could panic if response building fails.

Handle the error properly:

         Ok(Response::builder()
             .status(StatusCode::OK)
             .header("Content-Type", "application/json")
             .body(Body::from(response_body))
-            .unwrap())
+            .map_err(|_| StatusCode::INTERNAL_SERVER_ERROR)?)

Likely an incorrect or invalid review comment.

owhisper/owhisper-interface/src/stream.rs (1)

87-119: Well-designed compatibility test!

Good approach to ensure type compatibility with Deepgram SDK by serializing their types and deserializing into yours.

owhisper/owhisper-client/src/lib.rs (5)

7-7: LGTM!

Import change aligns with the refactoring to use StreamResponse type.


55-58: LGTM!

Query parameters properly configured for Deepgram API compatibility with appropriate defaults.


76-101: Good refactoring!

Extracting the common request building logic into a private method eliminates duplication and improves maintainability.


110-128: LGTM!

WebSocketIO implementation properly simplified to use raw bytes for audio data and binary messages for better efficiency.


206-214: LGTM!

Test properly updated for Deepgram API compatibility with appropriate endpoint and model configuration.

crates/transcribe-aws/src/lib.rs (2)

4-6: LGTM!

Good modularization with handler separation and appropriate public re-exports to maintain API compatibility.


15-39: Excellent refactoring!

The wrapper pattern with BaseTranscribeService<Handler> provides a clean, consistent architecture across all transcription services. The delegation is properly implemented for all tower::Service methods.

crates/transcribe-aws/src/handler.rs (4)

26-45: LGTM!

Well-structured configuration with sensible defaults for AWS Transcribe streaming.


62-79: LGTM!

Proper AWS client initialization with flexible region configuration and appropriate fallback.


146-185: LGTM!

Well-structured WebSocket handling with proper task management, error handling, and cleanup.


187-194: LGTM!

Appropriate handling of unsupported batch transcription with correct HTTP status code.

Comment on lines +112 to +116
Ok(Response::builder()
.status(StatusCode::OK)
.header("Content-Type", "application/json")
.body(Body::from(json_response))
.unwrap())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Avoid unwrap() on Response builder

The unwrap() could panic if response building fails.

Handle the potential error:

                 Ok(Response::builder()
                     .status(StatusCode::OK)
                     .header("Content-Type", "application/json")
                     .body(Body::from(json_response))
-                    .unwrap())
+                    .map_err(|_| StatusCode::INTERNAL_SERVER_ERROR)?)
🤖 Prompt for AI Agents
In crates/transcribe-deepgram/src/handler.rs around lines 112 to 116, the
Response builder uses unwrap() which can cause a panic if building the response
fails. Replace unwrap() with proper error handling by matching on the Result
returned from body() or using a method like expect() with a clear error message,
or propagate the error upwards to handle it gracefully without panicking.

Comment on lines +71 to +74
Err(_) => Ok(Response::builder()
.status(StatusCode::BAD_REQUEST)
.body(Body::from("invalid_websocket_upgrade_request"))
.unwrap()),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Avoid using unwrap() in production code

The unwrap() call could panic if Response::builder() fails. While this is unlikely, it's better to handle the error gracefully.

Consider propagating the error:

-                    Err(_) => Ok(Response::builder()
-                        .status(StatusCode::BAD_REQUEST)
-                        .body(Body::from("invalid_websocket_upgrade_request"))
-                        .unwrap()),
+                    Err(_) => Response::builder()
+                        .status(StatusCode::BAD_REQUEST)
+                        .body(Body::from("invalid_websocket_upgrade_request"))
+                        .map_err(|_| std::convert::Infallible),

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In crates/transcribe-interface/src/lib.rs around lines 71 to 74, the use of
unwrap() on Response::builder().body() can cause a panic if building the
response fails. Replace unwrap() by properly handling the Result returned by
body(), either by propagating the error using the ? operator or by matching on
the Result and returning an appropriate error response to avoid panics in
production code.

Comment on lines +482 to 503
// TODO
while let Some(result) = listen_stream.next().await {
let _meta = result.meta.clone();

// We don't have to do this, and inefficient. But this is what works at the moment.
if let owhisper_interface::StreamResponse::TranscriptResponse {
channel, ..
} = result
{
let updated_words = update_session(&app, &session.id, result.words)
.await
.unwrap();
let words = channel.alternatives.first().unwrap().words.clone();
let words2: Vec<owhisper_interface::Word2> =
words.into_iter().map(|w| w.into()).collect();

// We don't have to do this, and inefficient. But this is what works at the moment.
{
let updated_words =
update_session(&app, &session.id, words2).await.unwrap();

SessionEvent::Words {
words: updated_words,
SessionEvent::Words {
words: updated_words,
}
.emit(&app)
}
.emit(&app)
.unwrap();
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Potential panic and inefficient code structure

Several issues in this transcription handling code:

  1. Line 488: unwrap() on first() will panic if alternatives is empty
  2. The nested block structure (lines 493-502) seems unnecessarily complex
  3. The TODO comment acknowledges inefficiency

Consider this safer and cleaner approach:

-// TODO
 while let Some(result) = listen_stream.next().await {
     if let owhisper_interface::StreamResponse::TranscriptResponse {
         channel, ..
     } = result
     {
-        let words = channel.alternatives.first().unwrap().words.clone();
+        let Some(first_alt) = channel.alternatives.first() else {
+            tracing::warn!("Received transcript response with no alternatives");
+            continue;
+        };
+        let words = first_alt.words.clone();
         let words2: Vec<owhisper_interface::Word2> =
             words.into_iter().map(|w| w.into()).collect();

-        // We don't have to do this, and inefficient. But this is what works at the moment.
-        {
-            let updated_words =
-                update_session(&app, &session.id, words2).await.unwrap();
-
-            SessionEvent::Words {
-                words: updated_words,
-            }
-            .emit(&app)
-        }
-        .unwrap();
+        match update_session(&app, &session.id, words2).await {
+            Ok(updated_words) => {
+                if let Err(e) = SessionEvent::Words { words: updated_words }.emit(&app) {
+                    tracing::error!("Failed to emit words event: {}", e);
+                }
+            }
+            Err(e) => {
+                tracing::error!("Failed to update session: {}", e);
+            }
+        }
     }
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
// TODO
while let Some(result) = listen_stream.next().await {
let _meta = result.meta.clone();
// We don't have to do this, and inefficient. But this is what works at the moment.
if let owhisper_interface::StreamResponse::TranscriptResponse {
channel, ..
} = result
{
let updated_words = update_session(&app, &session.id, result.words)
.await
.unwrap();
let words = channel.alternatives.first().unwrap().words.clone();
let words2: Vec<owhisper_interface::Word2> =
words.into_iter().map(|w| w.into()).collect();
// We don't have to do this, and inefficient. But this is what works at the moment.
{
let updated_words =
update_session(&app, &session.id, words2).await.unwrap();
SessionEvent::Words {
words: updated_words,
SessionEvent::Words {
words: updated_words,
}
.emit(&app)
}
.emit(&app)
.unwrap();
}
while let Some(result) = listen_stream.next().await {
if let owhisper_interface::StreamResponse::TranscriptResponse {
channel, ..
} = result
{
let Some(first_alt) = channel.alternatives.first() else {
tracing::warn!("Received transcript response with no alternatives");
continue;
};
let words = first_alt.words.clone();
let words2: Vec<owhisper_interface::Word2> =
words.into_iter().map(|w| w.into()).collect();
match update_session(&app, &session.id, words2).await {
Ok(updated_words) => {
if let Err(e) =
SessionEvent::Words { words: updated_words }.emit(&app)
{
tracing::error!("Failed to emit words event: {}", e);
}
}
Err(e) => {
tracing::error!("Failed to update session: {}", e);
}
}
}
}
🤖 Prompt for AI Agents
In plugins/listener/src/fsm.rs around lines 482 to 503, avoid panicking by
replacing unwrap() on alternatives.first() with a safe check to handle empty
alternatives gracefully. Simplify the nested block structure by removing
unnecessary braces and directly awaiting update_session and emitting the event.
Refactor the code to improve clarity and maintainability while preserving
functionality.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (2)
owhisper/owhisper-client/src/lib.rs (2)

86-86: Coordinate authorization header prefix change across codebase.

The change from "Bearer" to "Token" creates inconsistency with other parts of the codebase that still expect "Bearer". This breaking change requires updating servers, middleware, and all clients to use "Token" consistently.


40-47: Fix URL construction to prevent path duplication.

The code unconditionally appends "listen" to the path, which will cause duplication if the base URL already ends with "listen" (e.g., "https://api.example.com/listen" becomes "https://api.example.com/listen/listen").

 {
     let mut path = url.path().to_string();
     if !path.ends_with('/') {
         path.push('/');
     }
-    path.push_str("listen");
+    if !path.ends_with("listen") && !path.ends_with("listen/") {
+        path.push_str("listen");
+    }
     url.set_path(&path);
 }
🧹 Nitpick comments (1)
owhisper/owhisper-client/src/lib.rs (1)

55-58: Consider making audio parameters configurable.

The hardcoded sample rate "16000" and encoding "linear16" may not match actual audio streams. While this supports Deepgram compatibility, consider making these configurable through ListenParams.

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f27b995 and 705ffbb.

📒 Files selected for processing (4)
  • crates/transcribe-whisper-local/src/service/streaming.rs (5 hunks)
  • crates/ws-utils/src/lib.rs (4 hunks)
  • owhisper/owhisper-client/src/lib.rs (7 hunks)
  • owhisper/schema.json (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • owhisper/schema.json
  • crates/transcribe-whisper-local/src/service/streaming.rs
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{js,ts,tsx,rs}

⚙️ CodeRabbit Configuration File

**/*.{js,ts,tsx,rs}: 1. No error handling.
2. No unused imports, variables, or functions.
3. For comments, keep it minimal. It should be about "Why", not "What".

Files:

  • owhisper/owhisper-client/src/lib.rs
  • crates/ws-utils/src/lib.rs
🔇 Additional comments (9)
crates/ws-utils/src/lib.rs (4)

27-31: LGTM! Simplified WebSocket message handling.

The direct binary message processing is more efficient than the previous JSON-based approach and correctly handles different message types.


50-50: LGTM! Constructor made public for external usage.

This change enables the new split functions to return ChannelAudioSource instances.


72-122: LGTM! Well-implemented interleaved stereo processing.

The function correctly processes interleaved i16 stereo data, properly normalizes to f32, and handles WebSocket message types appropriately. The use of unbounded channels is suitable for real-time audio streaming.


124-155: LGTM! Efficient dual audio source splitting.

The function elegantly handles f32 stereo pairs using chunks_exact(2) and unzip() for clean separation into mic and speaker channels.

owhisper/owhisper-client/src/lib.rs (5)

94-95: LGTM! Good refactoring to eliminate code duplication.

Extracting the common request building logic improves maintainability and follows DRY principles.

Also applies to: 99-100


110-120: LGTM! Simplified to binary message handling.

The move from JSON chunks to raw bytes is more efficient for audio streaming, and using StreamResponse provides better structured output.


135-146: LGTM! Consistent binary message handling.

The dual client now handles binary data consistently with the single client implementation.


177-198: LGTM! Proper byte-level audio interleaving.

The rewritten interleaving logic correctly handles sample-level interleaving and ensures complete i16 samples, which is much more robust than the previous chunk-level approach.


213-222: LGTM! Test updates for Deepgram compatibility.

The test configuration correctly uses Deepgram's API endpoint and specifies a concrete model for consistent testing.

@yujonglee yujonglee closed this Aug 5, 2025
@yujonglee yujonglee deleted the owhisper-deepgram-compat branch August 5, 2025 00:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant