Extract local whisper inference as tower service#1268

Merged
yujonglee merged 4 commits into main from local-whisper-service
Aug 1, 2025

@yujonglee
Contributor

No description provided.

@coderabbitai
Contributor

coderabbitai bot commented Aug 1, 2025

Caution

Review failed

The pull request is closed.

📝 Walkthrough

This change introduces a new hypr-transcribe-whisper-local crate providing streaming and recorded transcription services, and refactors the local speech-to-text (STT) plugin to use this crate. The update removes the previous process_recorded command and all associated code, permissions, and tests from the local-stt plugin, consolidating streaming transcription logic within the new crate and simplifying the server implementation.

Changes

| Cohort / File(s) | Change Summary |
|---|---|
| Add hypr-transcribe-whisper-local crate: crates/transcribe-whisper-local/* | Introduces a new crate with streaming and recorded transcription services, connection management, error handling, and feature flags for hardware backends. Exports a streaming service via WebSocket and a function for processing recorded audio. |
| Refactor local-stt plugin to use new crate: plugins/local-stt/Cargo.toml, plugins/local-stt/src/server.rs, plugins/local-stt/src/ext.rs, plugins/local-stt/src/lib.rs, plugins/local-stt/src/manager.rs, plugins/local-stt/src/commands.rs, plugins/local-stt/build.rs, plugins/local-stt/js/bindings.gen.ts | Removes all code, commands, and permissions related to the previous recorded transcription implementation. Switches dependencies and feature flags to the new crate. Simplifies the server to delegate streaming to the new service. Cleans up tests, command exports, and connection management. |
| Remove recorded transcription permissions: plugins/local-stt/permissions/autogenerated/commands/process_recorded.toml, plugins/local-stt/permissions/autogenerated/reference.md, plugins/local-stt/permissions/default.toml, plugins/local-stt/permissions/schemas/schema.json | Deletes permission files and schema entries for the removed process_recorded command. Updates documentation and schema to reflect the removal. |
| Add transcribe-interface crate: crates/transcribe-interface/Cargo.toml, crates/transcribe-interface/src/lib.rs | Adds a new, empty interface crate with serialization and error handling dependencies. |
| AWS transcribe error handling improvements: crates/transcribe-aws/Cargo.toml, crates/transcribe-aws/src/error.rs, crates/transcribe-aws/src/lib.rs | Updates error handling in the AWS transcribe crate to use more specific error types and transparent conversions. Adjusts function signatures to use the new error type. |
| Workspace dependency updates: Cargo.toml | Adds the new hypr-transcribe-whisper-local crate as a workspace dependency. |

Sequence Diagram(s)

sequenceDiagram
    participant Client (WebSocket)
    participant WhisperStreamingService (hypr-transcribe-whisper-local)
    participant WhisperModel
    participant VAD/Chunker

    Client->>WhisperStreamingService: Connect & send audio stream
    WhisperStreamingService->>VAD/Chunker: Chunk audio (VAD)
    VAD/Chunker-->>WhisperStreamingService: Audio chunks
    WhisperStreamingService->>WhisperModel: Transcribe audio chunks
    WhisperModel-->>WhisperStreamingService: Transcription results
    WhisperStreamingService->>Client: Stream JSON transcription results
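
The flow in the diagram (chunk the audio, transcribe each chunk, stream one JSON result per chunk) can be sketched roughly as follows. This is only an illustration under stated assumptions: the `Transcriber` trait, the energy gate standing in for VAD, the constants, and the `{"text": ...}` message shape are all hypothetical and are not taken from hypr-transcribe-whisper-local.

```rust
// Hypothetical stand-in for the Whisper model; not the crate's real API.
pub trait Transcriber {
    fn transcribe(&self, chunk: &[f32]) -> String;
}

// Assumed values for illustration only.
const FRAME_LEN: usize = 4;
const THRESHOLD: f32 = 0.1;

// Naive energy gate used here in place of a real VAD: consecutive frames whose
// mean absolute amplitude exceeds `threshold` are merged into one chunk.
pub fn chunk_by_energy(samples: &[f32], frame_len: usize, threshold: f32) -> Vec<Vec<f32>> {
    let mut chunks = Vec::new();
    let mut current: Vec<f32> = Vec::new();
    for frame in samples.chunks(frame_len.max(1)) {
        let energy = frame.iter().map(|s| s.abs()).sum::<f32>() / frame.len() as f32;
        if energy > threshold {
            current.extend_from_slice(frame);
        } else if !current.is_empty() {
            // A quiet frame closes the chunk collected so far.
            chunks.push(std::mem::take(&mut current));
        }
    }
    if !current.is_empty() {
        chunks.push(current);
    }
    chunks
}

// Produces one JSON line per voiced chunk. The message shape is hypothetical,
// and the text is not JSON-escaped in this sketch.
pub fn stream_results<T: Transcriber>(model: &T, samples: &[f32]) -> Vec<String> {
    chunk_by_energy(samples, FRAME_LEN, THRESHOLD)
        .iter()
        .map(|chunk| format!("{{\"text\":\"{}\"}}", model.transcribe(chunk)))
        .collect()
}
```

In the real service the chunks would arrive over a WebSocket and results would be sent back as they become available; the sketch collects them into a `Vec` only to keep the example synchronous.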

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~40 minutes

Possibly related PRs

  • Audio file upload groundwork #1078: Adds a processRecorded command and UI for audio file upload/processing, which is now removed or replaced by this PR, making them directly related but with opposing changes to the same feature area.
  • Add language detect constraint #1212: Modifies language handling in the transcription service by changing single language fields to multiple languages and updates the Whisper model builder to accept multiple languages with detection logic, related to the transcription service refactoring in this PR.

📜 Recent review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6848797 and c58ab7c.

📒 Files selected for processing (1)
  • crates/transcribe-aws/src/error.rs (1 hunks)

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 4

♻️ Duplicate comments (1)
crates/transcribe-interface/Cargo.toml (1)

6-9: Dependencies declared but not used in empty library.

The dependencies are reasonable for a transcription interface but are currently unused since src/lib.rs is empty. This relates to the unused dependencies issue in the library file.

🧹 Nitpick comments (3)
crates/transcribe-interface/src/lib.rs (1)

1-2: Consider removing unused dependencies or adding placeholder content.

The library file is empty but the Cargo.toml declares dependencies on serde, serde_json, and thiserror that are not used. This violates the coding guideline about unused dependencies.

If this is intentionally a placeholder, consider adding a comment explaining the future purpose or adding minimal placeholder types:

+// Placeholder for shared transcription interfaces
+// TODO: Define common traits and types for transcription services

Alternatively, remove the unused dependencies from Cargo.toml until they're needed.

crates/transcribe-whisper-local/src/service/recorded.rs (1)

55-56: Remove unnecessary clone and address TODO.

The word.clone() is unnecessary since word is not used after this point. Also, the TODO comment should be addressed or removed.

-            // TODO
-            words.push(word.clone());
+            words.push(word);
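
The suggestion above holds only when the loop owns each `word`. A minimal sketch of that case, using a hypothetical `Word` type and `collect_words` function rather than the actual code in recorded.rs:

```rust
// Hypothetical stand-in for the word type used in recorded.rs.
#[derive(Debug, Clone, PartialEq)]
pub struct Word {
    pub text: String,
    pub start_ms: u64,
    pub end_ms: u64,
}

// Iterating by value yields owned `Word`s, so each one can be moved
// into the output vector without cloning.
pub fn collect_words(segment: Vec<Word>) -> Vec<Word> {
    let mut words = Vec::with_capacity(segment.len());
    for word in segment {
        words.push(word); // move, not clone
    }
    words
}
```

If the loop instead iterates by reference (for example over `&segment`), `word` is a borrow and the clone, or a restructuring of the loop, would still be needed.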

crates/transcribe-aws/src/lib.rs (1)

1-1: Consider removing or expanding the draft comment.

The comment "AWS draft" provides minimal context. Either remove it or expand it to explain what aspects are still in draft state.

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9a0d6f5 and 6848797.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (25)
  • Cargo.toml (1 hunks)
  • crates/transcribe-aws/Cargo.toml (1 hunks)
  • crates/transcribe-aws/src/error.rs (1 hunks)
  • crates/transcribe-aws/src/lib.rs (4 hunks)
  • crates/transcribe-interface/Cargo.toml (1 hunks)
  • crates/transcribe-interface/src/lib.rs (1 hunks)
  • crates/transcribe-whisper-local/Cargo.toml (1 hunks)
  • crates/transcribe-whisper-local/src/error.rs (1 hunks)
  • crates/transcribe-whisper-local/src/lib.rs (1 hunks)
  • crates/transcribe-whisper-local/src/manager.rs (1 hunks)
  • crates/transcribe-whisper-local/src/service/mod.rs (1 hunks)
  • crates/transcribe-whisper-local/src/service/recorded.rs (1 hunks)
  • crates/transcribe-whisper-local/src/service/streaming.rs (1 hunks)
  • plugins/local-stt/Cargo.toml (3 hunks)
  • plugins/local-stt/build.rs (0 hunks)
  • plugins/local-stt/js/bindings.gen.ts (0 hunks)
  • plugins/local-stt/permissions/autogenerated/commands/process_recorded.toml (0 hunks)
  • plugins/local-stt/permissions/autogenerated/reference.md (0 hunks)
  • plugins/local-stt/permissions/default.toml (0 hunks)
  • plugins/local-stt/permissions/schemas/schema.json (1 hunks)
  • plugins/local-stt/src/commands.rs (0 hunks)
  • plugins/local-stt/src/ext.rs (1 hunks)
  • plugins/local-stt/src/lib.rs (1 hunks)
  • plugins/local-stt/src/manager.rs (0 hunks)
  • plugins/local-stt/src/server.rs (4 hunks)
💤 Files with no reviewable changes (7)
  • plugins/local-stt/permissions/default.toml
  • plugins/local-stt/build.rs
  • plugins/local-stt/permissions/autogenerated/reference.md
  • plugins/local-stt/src/commands.rs
  • plugins/local-stt/permissions/autogenerated/commands/process_recorded.toml
  • plugins/local-stt/js/bindings.gen.ts
  • plugins/local-stt/src/manager.rs
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{js,ts,tsx,rs}

⚙️ CodeRabbit Configuration File

**/*.{js,ts,tsx,rs}:
1. No error handling.
2. No unused imports, variables, or functions.
3. For comments, keep it minimal. It should be about "Why", not "What".

Files:

  • crates/transcribe-interface/src/lib.rs
  • crates/transcribe-whisper-local/src/service/mod.rs
  • plugins/local-stt/src/lib.rs
  • crates/transcribe-whisper-local/src/error.rs
  • crates/transcribe-whisper-local/src/manager.rs
  • crates/transcribe-aws/src/lib.rs
  • crates/transcribe-whisper-local/src/lib.rs
  • crates/transcribe-whisper-local/src/service/recorded.rs
  • plugins/local-stt/src/ext.rs
  • crates/transcribe-aws/src/error.rs
  • plugins/local-stt/src/server.rs
  • crates/transcribe-whisper-local/src/service/streaming.rs
🪛 GitHub Actions: .github/workflows/fmt.yaml
crates/transcribe-aws/src/error.rs

[error] 5-5: dprint formatting error: left behind trailing whitespace at line 5. rustfmt failed to format due to this error.

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: ci
  • GitHub Check: ci (windows, windows-latest)
  • GitHub Check: ci (macos, macos-latest)
🔇 Additional comments (17)
crates/transcribe-aws/Cargo.toml (1)

24-25: LGTM! AWS Smithy dependencies added for error handling.

The addition of aws-smithy-runtime-api and aws-smithy-types dependencies supports the error handling refactoring mentioned in the summary. The specific versions ensure reproducible builds.

Cargo.toml (1)

66-66: LGTM! Workspace dependency correctly added.

The new hypr-transcribe-whisper-local workspace dependency properly integrates the new crate following the established pattern of other workspace dependencies.

crates/transcribe-whisper-local/src/lib.rs (1)

1-6: LGTM! Clean module facade following Rust conventions.

The selective re-export of error and service modules while keeping manager internal demonstrates good API design and encapsulation.

crates/transcribe-whisper-local/src/service/mod.rs (1)

1-5: LGTM! Standard service module structure.

The use of wildcard re-exports appropriately exposes the streaming and recorded service APIs.

plugins/local-stt/permissions/schemas/schema.json (1)

454-457: Permission schema correctly updated to reflect command removal.

The default permission descriptions properly exclude the removed allow-process-recorded permission, maintaining consistency with the architectural changes.

plugins/local-stt/src/lib.rs (2)

32-45: Command list correctly updated.

The removal of process_recorded command from the Specta builder aligns with the architectural refactor to extract transcription functionality.


9-9: Visibility pattern is intentional and correct.

The server module is kept private (mod server) but its public items are re-exported via pub use server::*, allowing external code to access only the intended symbols without exposing the module’s internal path. No changes needed.
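
As a minimal sketch of this visibility pattern (the `health` function here is a hypothetical placeholder, not the plugin's actual handler):

```rust
// `server` is private: outside code cannot name the `server::...` path.
mod server {
    // Public item, reachable from outside only through the re-export below.
    pub fn health() -> &'static str {
        "ok"
    }

    // Private helper: stays hidden even with the glob re-export.
    #[allow(dead_code)]
    fn internal_detail() {}
}

// Glob re-export: every `pub` item of `server` becomes part of the parent's API.
pub use server::*;
```

Callers then write `health()` from the parent module's namespace, and the module layout can change later without breaking them.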

crates/transcribe-whisper-local/src/service/recorded.rs (1)

3-61: Well-structured audio processing pipeline.

The function follows a clear workflow: decode → resample → segment → transcribe → format results. The integration of rodio, hypr libraries, and Whisper/pyannote models is appropriate for recorded audio processing.

crates/transcribe-whisper-local/src/manager.rs (1)

19-19: Handle potential mutex poisoning.

Using unwrap() on mutex lock can panic if the mutex is poisoned. Consider handling this error case gracefully.

-        let mut slot = self.inner.lock().unwrap();
+        let mut slot = match self.inner.lock() {
+            Ok(guard) => guard,
+            Err(poisoned) => poisoned.into_inner(),
+        };

Likely an incorrect or invalid review comment.
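
For reference, the recovery pattern proposed in the diff above can be exercised with the standard library alone. The helper names below are hypothetical, and the sketch makes no claim about whether recovery is appropriate for the connection manager, since the data behind a poisoned lock may be in an inconsistent state.

```rust
use std::sync::{Arc, Mutex, MutexGuard};
use std::thread;

// Locks the mutex and recovers the guard even if a previous holder panicked.
pub fn lock_ignoring_poison<T>(m: &Mutex<T>) -> MutexGuard<'_, T> {
    match m.lock() {
        Ok(guard) => guard,
        Err(poisoned) => poisoned.into_inner(),
    }
}

// Builds a poisoned mutex by panicking in another thread while the lock is held.
pub fn poisoned_counter() -> Arc<Mutex<u32>> {
    let shared = Arc::new(Mutex::new(7));
    let clone = Arc::clone(&shared);
    let _ = thread::spawn(move || {
        let _guard = clone.lock().unwrap();
        panic!("simulated failure while holding the lock");
    })
    .join();
    shared
}
```

After `poisoned_counter()`, a plain `lock().unwrap()` would panic, while `lock_ignoring_poison` still returns a usable guard.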

plugins/local-stt/src/ext.rs (1)

117-117: LGTM!

The change from default() to builder() follows a clearer builder pattern convention.

crates/transcribe-aws/src/error.rs (1)

3-21: Excellent error handling refactor!

The change from generic string-based errors to specific SDK error wrappers with transparent error propagation is a significant improvement. This provides better error context and type safety.
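
To illustrate what transparent propagation means, here is a std-only sketch that hand-writes roughly what thiserror's `#[error(transparent)]` with `#[from]` generates. The `Error` enum, its variants, and `parse_port` are hypothetical and are not the types in crates/transcribe-aws/src/error.rs.

```rust
use std::error::Error as StdError;
use std::fmt;

// Hypothetical wrapper enum: each variant holds a specific underlying error.
#[derive(Debug)]
pub enum Error {
    Io(std::io::Error),
    Parse(std::num::ParseIntError),
}

// "Transparent" display: forward to the wrapped error without adding a message.
impl fmt::Display for Error {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Error::Io(e) => fmt::Display::fmt(e, f),
            Error::Parse(e) => fmt::Display::fmt(e, f),
        }
    }
}

// "Transparent" source: forward to the wrapped error's own source.
impl StdError for Error {
    fn source(&self) -> Option<&(dyn StdError + 'static)> {
        match self {
            Error::Io(e) => e.source(),
            Error::Parse(e) => e.source(),
        }
    }
}

impl From<std::io::Error> for Error {
    fn from(e: std::io::Error) -> Self {
        Error::Io(e)
    }
}

impl From<std::num::ParseIntError> for Error {
    fn from(e: std::num::ParseIntError) -> Self {
        Error::Parse(e)
    }
}

// The `?` operator converts the specific error into `Error` via `From`.
pub fn parse_port(raw: &str) -> Result<u16, Error> {
    let port: u16 = raw.trim().parse()?;
    Ok(port)
}
```

Callers can match on the variant to recover the concrete error type, which a string-based error does not allow.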

crates/transcribe-whisper-local/Cargo.toml (1)

1-38: Well-structured crate manifest!

The Cargo.toml is properly configured with:

  • Clear feature flags for various hardware acceleration backends
  • Consistent use of workspace dependencies
  • Appropriate dependency selection for transcription functionality
plugins/local-stt/Cargo.toml (1)

16-24: LGTM! Clean dependency consolidation.

The refactoring successfully consolidates multiple crates (hypr-whisper-local, hypr-pyannote-local) into a single hypr-transcribe-whisper-local crate, improving modularity. The removal of the "multipart" feature from axum aligns with the elimination of the process_recorded functionality.

Also applies to: 48-48, 65-65

plugins/local-stt/src/server.rs (2)

98-100: LGTM! Clean server implementation and appropriate test coverage.

The simplified health endpoint and the focused test for the health check are well-implemented. The server structure is now more maintainable with the extraction of transcription logic to the dedicated service.

Also applies to: 102-130


80-86: Add error handling for service initialization.

The WhisperStreamingService::builder().build() call could potentially fail if the model path is invalid or inaccessible. Consider handling potential initialization errors.

Consider wrapping the service creation in error handling:

-fn make_service_router(state: ServerState) -> Router {
+fn make_service_router(state: ServerState) -> Result<Router, crate::Error> {
     let model_path = state.model_cache_dir.join(state.model_type.file_name());
 
     let whisper_service = hypr_transcribe_whisper_local::WhisperStreamingService::builder()
         .model_path(model_path)
         .build();
 
-    Router::new()
+    Ok(Router::new()
         .route("/health", get(health))
         .route_service("/api/desktop/listen/realtime", whisper_service)
         .layer(
             CorsLayer::new()
                 .allow_origin(cors::Any)
                 .allow_methods(cors::Any)
                 .allow_headers(cors::Any),
-        )
+        ))
 }

And update the caller:

-    let router = make_service_router(state);
+    let router = make_service_router(state)?;

Likely an incorrect or invalid review comment.

crates/transcribe-whisper-local/src/service/streaming.rs (2)

59-102: Excellent service architecture and implementation.

The streaming service is well-designed with:

  • Proper use of the tower Service trait
  • Clean separation between single and dual channel handling
  • Good connection management with cancellation support
  • Appropriate error logging in the VAD stream processor

The WebSocket upgrade handling and async stream processing demonstrate good understanding of async Rust patterns.

Also applies to: 104-194, 196-283


49-56: Replace unwrap() calls with proper error handling.

The builder's build() method uses unwrap() which will panic if required fields are not set. Consider returning a Result or using a type-state pattern to ensure compile-time safety.

-    pub fn build(self) -> WhisperStreamingService {
-        WhisperStreamingService {
-            model_path: self.model_path.unwrap(),
-            connection_manager: self
-                .connection_manager
-                .unwrap_or_else(ConnectionManager::default),
-        }
-    }
+    pub fn build(self) -> Result<WhisperStreamingService, &'static str> {
+        let model_path = self.model_path.ok_or("model_path is required")?;
+        Ok(WhisperStreamingService {
+            model_path,
+            connection_manager: self
+                .connection_manager
+                .unwrap_or_else(ConnectionManager::default),
+        })
+    }

Likely an incorrect or invalid review comment.
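
The comment above also mentions a type-state pattern as the compile-time alternative to returning a Result. A minimal sketch of that idea follows; all names (`StreamingService`, `StreamingServiceBuilder`, `NoPath`, `HasPath`, `max_connections`) are hypothetical and do not reflect the crate's actual builder.

```rust
use std::path::PathBuf;

// Builder states: the path is either absent or present.
pub struct NoPath;
pub struct HasPath(PathBuf);

#[derive(Debug, PartialEq)]
pub struct StreamingService {
    pub model_path: PathBuf,
    pub max_connections: usize,
}

pub struct StreamingServiceBuilder<P> {
    path: P,
    max_connections: usize,
}

impl StreamingServiceBuilder<NoPath> {
    pub fn new() -> Self {
        StreamingServiceBuilder { path: NoPath, max_connections: 1 }
    }

    // Supplying the path moves the builder into the `HasPath` state.
    pub fn model_path(self, path: impl Into<PathBuf>) -> StreamingServiceBuilder<HasPath> {
        StreamingServiceBuilder {
            path: HasPath(path.into()),
            max_connections: self.max_connections,
        }
    }
}

impl<P> StreamingServiceBuilder<P> {
    // Optional setting, available in either state.
    pub fn max_connections(mut self, n: usize) -> Self {
        self.max_connections = n;
        self
    }
}

impl StreamingServiceBuilder<HasPath> {
    // `build` exists only once the path is set, so it needs no unwrap.
    pub fn build(self) -> StreamingService {
        StreamingService {
            model_path: self.path.0,
            max_connections: self.max_connections,
        }
    }
}
```

With this shape, calling `build()` before `model_path(...)` is a compile error rather than a runtime panic, at the cost of a more involved builder type.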

@yujonglee yujonglee merged commit 48a206f into main Aug 1, 2025
5 of 6 checks passed
@yujonglee yujonglee deleted the local-whisper-service branch August 1, 2025 16:07