
Refactor with continuous VAD stream + AI bump #1358

Merged
yujonglee merged 6 commits into main from continuous-vad on Aug 17, 2025

Conversation

yujonglee (Contributor) commented on Aug 17, 2025

Summary by cubic

Refactored audio processing to a new VAD crate with a continuous stream that emits audio and speech events, and updated transcription services to use a simpler speech_chunks API. Also bumped llama and whisper dependencies and adapted code to the new whisper API.

  • New Features

    • Added hypr-vad with ContinuousVadStream that outputs AudioSamples, SpeechStart, and SpeechEnd events without dropping audio.
    • Introduced VadExt::speech_chunks(redemption_time) to return finalized speech chunks; replaces the old vad_chunks API.
    • Removed the old chunker crate and updated transcribe-moonshine, transcribe-whisper-local, and listener to use hypr_vad and speech_chunks.
    • Added tests to verify no audio loss in continuous VAD and robust speech chunking.
  • Refactors

    • Ported whisper-local to whisper-rs 0.15.0: use get_segment, timestamps now f64 seconds, confidence set to 1.0 (upstream API change).
    • Updated llama-cpp-2/-sys to branch update-llama-cpp-2025-08-16 (bindgen 0.72.0). Switched whisper-rs to crates.io.
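The event and chunk shapes described in the summary can be sketched as follows. This is a minimal illustration using the names from the PR summary (VadStreamItem, AudioChunk, speech_chunks); the exact field layout in hypr-vad is an assumption here, not the crate's definition.

```rust
// Sketch of the hypr-vad event shapes described above (field names assumed).
#[derive(Debug, PartialEq)]
pub enum VadStreamItem {
    AudioSamples(Vec<f32>),
    SpeechStart { timestamp_ms: u64 },
    SpeechEnd { samples: Vec<f32>, start_timestamp_ms: u64, end_timestamp_ms: u64 },
}

#[derive(Debug, PartialEq)]
pub struct AudioChunk {
    pub samples: Vec<f32>,
    pub start_timestamp_ms: u64,
    pub end_timestamp_ms: u64,
}

// speech_chunks keeps only finalized speech: a SpeechEnd event becomes an
// AudioChunk, and all other items are filtered out of the resulting stream.
pub fn to_chunk(item: VadStreamItem) -> Option<AudioChunk> {
    match item {
        VadStreamItem::SpeechEnd { samples, start_timestamp_ms, end_timestamp_ms } => {
            Some(AudioChunk { samples, start_timestamp_ms, end_timestamp_ms })
        }
        _ => None,
    }
}
```

This mirrors why downstream transcription services only need the simpler speech_chunks API: they consume finalized chunks and never see the intermediate AudioSamples or SpeechStart events.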

coderabbitai bot (Contributor) commented on Aug 17, 2025

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough

Replaced the hypr-chunker crate with a new hypr-vad crate across the workspace, removing the old chunker implementation and exports and adding a continuous VAD stream and speech-chunking API. Updated downstream services, some crate manifests, and Whisper/LLM dependency pins.

Changes

Cohort / File(s) / Summary
Workspace deps switch (chunker → vad)
Cargo.toml, crates/transcribe-moonshine/Cargo.toml, crates/transcribe-whisper-local/Cargo.toml, plugins/listener/Cargo.toml
Removed hypr-chunker entries and added hypr-vad in workspace and crate dependencies.
Remove chunker crate API
crates/chunker/src/chunker.rs, crates/chunker/src/lib.rs
Deleted the chunker implementation (ChunkStream, VadChunkStream, VadExt, AudioChunk) and removed crate root re-exports; chunker crate public surface removed.
Introduce hypr-vad API
crates/vad/src/continuous.rs, crates/vad/src/lib.rs, crates/vad/Cargo.toml
Added ContinuousVadStream, VadStreamItem, AudioChunk, VadExt extension, error module and tests; renamed package to vad and added rodio as a dev-dependency.
Streaming services migrate to hypr-vad
crates/transcribe-moonshine/src/service/streaming.rs, crates/transcribe-whisper-local/src/service/streaming.rs
Switched imports and stream item bounds to hypr_vad and replaced vad_chunks() calls with speech_chunks(); updated generic stream bounds to hypr_vad::AudioChunk.
LLM dep branch bump
crates/llama/Cargo.toml
Updated branch for llama-cpp-2 and llama-cpp-sys-2 to update-llama-cpp-2025-08-16.
Whisper dependency source change
crates/whisper-local/Cargo.toml
Replaced git-sourced whisper-rs with crates.io version = "0.15.0" (same features).
Whisper-local API adjustments
crates/whisper-local/src/model.rs
Migrated to WhisperTokenId, removed token_eot, changed Segment timestamps to f64, removed per-segment confidence calculation, and updated related callbacks/accessors.

Sequence Diagram(s)

sequenceDiagram
  participant Mic as AsyncSource (mic/speaker)
  participant VAD as ContinuousVadStream (hypr-vad)
  participant Svc as Streaming Service
  participant ASR as Transcriber

  Mic->>VAD: stream audio samples
  VAD-->>Svc: VadStreamItem::AudioSamples
  VAD-->>Svc: VadStreamItem::SpeechStart
  VAD-->>Svc: VadStreamItem::SpeechEnd(samples, timestamps)
  Svc->>ASR: AudioChunk(samples, start, end)
  ASR-->>Svc: transcription segments
sequenceDiagram
  participant Source as AsyncSource
  participant Ext as VadExt::speech_chunks()
  participant Consumer as process_vad_stream

  Source->>Ext: build VadConfig (redemption_time)
  Ext->>Ext: run ContinuousVadStream -> detect SpeechEnd -> emit AudioChunk
  Ext-->>Consumer: Stream<Result<AudioChunk, Error>>
  Consumer-->>Downstream: forward SimpleAudioChunk

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • Support Windows audio #1056 — related to changes in the vad crate's API/exports that overlap with adding hypr-vad.
  • Tweak accelerators #993 — modifies the same crates/llama/Cargo.toml dependency branch updates.
  • Filter silence #1200 — touches the VAD/chunker surface and types (ChunkStream/VadChunkStream, VadExt, AudioChunk), closely related to this replacement.



cubic-dev-ai bot left a comment

cubic analysis

1 issue found across 17 files

coderabbitai bot (Contributor) left a comment

Actionable comments posted: 2

🧹 Nitpick comments (9)
crates/whisper-local/src/model.rs (2)

158-160: Consider tracking confidence calculation as technical debt

The hardcoded confidence value of 1.0 with the comment about API changes suggests this is a temporary workaround. Consider creating a tracking issue for properly implementing confidence scoring once the new whisper-rs API is better understood.

Would you like me to create an issue to track the re-implementation of confidence calculation once the whisper-rs v1.7.6+ API is better documented?


322-322: Consider removing or conditionalizing debug output

The println! statement appears to be debug output that should be removed or wrapped in a debug flag before merging to production.

         let segments = whisper.transcribe(&audio).unwrap();
-        println!("segments: {:#?}", segments);
+        #[cfg(debug_assertions)]
+        println!("segments: {:#?}", segments);
         assert!(segments.len() > 0);
crates/vad/src/lib.rs (3)

15-45: Equality test for continuous VAD audio passthrough — OK, but heavy

Exact f32 equality is acceptable here since the VAD path does not transform samples; however, this test decodes and buffers the entire asset twice, which is heavy for CI.

Consider reducing test asset size or marking as ignore for quick CI cycles and enabling in a nightly/extended test run.


47-69: Replace magic number 16.0 with explicit sample rate

Avoid magic constants; derive time from an explicit sample rate for clarity and maintainability.

Apply:

-        let how_many_sec = (all_audio_from_vad.len() as f64 / 16.0) / 1000.0;
+        const SAMPLE_RATE_HZ: f64 = 16_000.0;
+        let how_many_sec = all_audio_from_vad.len() as f64 / SAMPLE_RATE_HZ;

70-80: Avoid writing artifacts during tests

Writing ./test.wav in tests pollutes the workspace and can cause conflicts in parallel runs. Gate it behind an env var or write to a temp dir.

Apply:

-        let wav = hound::WavSpec {
-            channels: 1,
-            sample_rate: 16000,
-            bits_per_sample: 32,
-            sample_format: hound::SampleFormat::Float,
-        };
-        let mut writer = hound::WavWriter::create("./test.wav", wav).unwrap();
-        for sample in all_audio_from_vad {
-            writer.write_sample(sample).unwrap();
-        }
+        if std::env::var("WRITE_VAD_TEST_WAV").is_ok() {
+            let wav = hound::WavSpec {
+                channels: 1,
+                sample_rate: 16000,
+                bits_per_sample: 32,
+                sample_format: hound::SampleFormat::Float,
+            };
+            let mut writer = hound::WavWriter::create("./test.wav", wav).unwrap();
+            for sample in all_audio_from_vad {
+                writer.write_sample(sample).unwrap();
+            }
+        }
crates/transcribe-whisper-local/src/service/streaming.rs (3)

157-160: Rename var to reflect new semantics (speech chunks, not VAD frames).

Minor readability nit: the source now yields speech chunks; the variable name can follow suit.

Apply this diff:

-    let vad_chunks = audio_source.speech_chunks(redemption_time);
+    let speech_chunks = audio_source.speech_chunks(redemption_time);
 
-    let chunked = hypr_whisper_local::AudioChunkStream(process_vad_stream(vad_chunks, "mixed"));
+    let chunked = hypr_whisper_local::AudioChunkStream(process_vad_stream(speech_chunks, "mixed"));

176-178: Consistent naming: mic_speech_chunks.

Align naming with the new API for clarity.

-        let mic_vad_chunks = mic_source.speech_chunks(redemption_time);
-        hypr_whisper_local::AudioChunkStream(process_vad_stream(mic_vad_chunks, "mic"))
+        let mic_speech_chunks = mic_source.speech_chunks(redemption_time);
+        hypr_whisper_local::AudioChunkStream(process_vad_stream(mic_speech_chunks, "mic"))

181-183: Consistent naming: speaker_speech_chunks.

Same nit for the speaker side.

-        let speaker_vad_chunks = speaker_source.speech_chunks(redemption_time);
-        hypr_whisper_local::AudioChunkStream(process_vad_stream(speaker_vad_chunks, "speaker"))
+        let speaker_speech_chunks = speaker_source.speech_chunks(redemption_time);
+        hypr_whisper_local::AudioChunkStream(process_vad_stream(speaker_speech_chunks, "speaker"))
crates/vad/src/continuous.rs (1)

57-69: Consider caching the source stream to avoid rebuilding per poll.

Recreating as_stream() and pinning it on every poll_next can add overhead. If as_stream() isn’t a trivial proxy, store it once in the struct and poll that directly.

If you choose to refactor, add a source_stream field initialized in new() and use it in poll_next instead of re-deriving each time.

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 285784d and bda8896.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (14)
  • Cargo.toml (1 hunks)
  • crates/chunker/src/chunker.rs (0 hunks)
  • crates/chunker/src/lib.rs (0 hunks)
  • crates/llama/Cargo.toml (1 hunks)
  • crates/transcribe-moonshine/Cargo.toml (1 hunks)
  • crates/transcribe-moonshine/src/service/streaming.rs (4 hunks)
  • crates/transcribe-whisper-local/Cargo.toml (1 hunks)
  • crates/transcribe-whisper-local/src/service/streaming.rs (4 hunks)
  • crates/vad/Cargo.toml (1 hunks)
  • crates/vad/src/continuous.rs (1 hunks)
  • crates/vad/src/lib.rs (1 hunks)
  • crates/whisper-local/Cargo.toml (1 hunks)
  • crates/whisper-local/src/model.rs (8 hunks)
  • plugins/listener/Cargo.toml (1 hunks)
💤 Files with no reviewable changes (2)
  • crates/chunker/src/lib.rs
  • crates/chunker/src/chunker.rs
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{js,ts,tsx,rs}

⚙️ CodeRabbit Configuration File

**/*.{js,ts,tsx,rs}: 1. No error handling.
2. No unused imports, variables, or functions.
3. For comments, keep it minimal. It should be about "Why", not "What".

Files:

  • crates/transcribe-whisper-local/src/service/streaming.rs
  • crates/vad/src/lib.rs
  • crates/transcribe-moonshine/src/service/streaming.rs
  • crates/whisper-local/src/model.rs
  • crates/vad/src/continuous.rs
🧬 Code Graph Analysis (4)
crates/transcribe-whisper-local/src/service/streaming.rs (1)
crates/transcribe-moonshine/src/service/streaming.rs (1)
  • process_vad_stream (220-302)
crates/vad/src/lib.rs (1)
crates/vad/src/continuous.rs (1)
  • new (39-54)
crates/transcribe-moonshine/src/service/streaming.rs (1)
crates/transcribe-whisper-local/src/service/streaming.rs (1)
  • process_vad_stream (274-303)
crates/whisper-local/src/model.rs (1)
crates/transcribe-whisper-local/src/service/streaming.rs (1)
  • params (96-100)
🔇 Additional comments (20)
crates/llama/Cargo.toml (1)

19-20: Verify the branch name

The branch name update-llama-cpp-2025-08-16 encodes a date one day before this PR's merge date (August 17, 2025), so it is most likely intentional; please confirm it points at the intended llama.cpp snapshot.

crates/whisper-local/Cargo.toml (1)

32-32: LGTM! Migration to crates.io version improves stability

Good move switching from a Git revision to the published crates.io version 0.15.0. This improves dependency stability and makes builds more reproducible.

crates/whisper-local/src/model.rs (3)

8-8: LGTM! Consistent type migration to WhisperTokenId

The migration from WhisperToken to WhisperTokenId is consistently applied throughout the file, which aligns with the whisper-rs 0.15.0 API changes.


134-146: LGTM! Proper segment API migration

The updated segment retrieval logic correctly handles the new API:

  • Uses full_n_segments() for count
  • Properly handles optional segments with match
  • Correctly converts timestamps to seconds with /100.0 division
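The /100.0 division works because whisper.cpp reports segment boundaries in centiseconds (units of 10 ms). A minimal sketch of the conversion the review is approving; the helper name here is illustrative, not from the PR:

```rust
// whisper.cpp segment timestamps are in centiseconds (1/100 s);
// dividing by 100.0 yields f64 seconds, matching the new Segment fields.
fn centiseconds_to_secs(t: i64) -> f64 {
    t as f64 / 100.0
}
```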

268-269: LGTM! Consistent f64 precision for timestamps

The migration from f32 to f64 for timestamp fields improves precision, which is beneficial for accurate time calculations, especially for longer audio segments.

Also applies to: 283-292

crates/transcribe-whisper-local/Cargo.toml (1)

27-27: LGTM! Consistent VAD migration

The replacement of hypr-chunker with hypr-vad aligns with the workspace-wide migration to the new VAD streaming architecture.

crates/vad/Cargo.toml (2)

2-2: LGTM! Clear package rename from chunker to vad

The package rename from "chunker" to "vad" better reflects the module's purpose and aligns with the workspace-wide VAD migration.


9-9: LGTM! Appropriate dev dependency for audio testing

Adding rodio as a dev dependency is appropriate for testing VAD functionality with audio streams.

crates/transcribe-moonshine/Cargo.toml (1)

18-18: No Remaining hypr-chunker References – Dependency Swap Approved

I’ve searched the entire codebase (code, TOML manifests, lockfile) and found no lingering chunker or hypr-chunker references. The switch to hypr-vad = { workspace = true } is consistent with the PR objectives and other crates.

plugins/listener/Cargo.toml (1)

30-30: Approve dependency update to hypr-vad
Updated plugins/listener/Cargo.toml (line 30) to use the workspace-wide hypr-vad per the VAD migration. Ran a recursive grep for any hypr-chunker or chunker:: occurrences and found none.

Cargo.toml (1)

74-74: Confirm complete migration of VAD crate — LGTM

All automated checks passed:

  • No leftover references to hypr-chunker.
  • Verified usage of VadExt::with_vad and speech_chunks in dependent crates.
  • No duplicate vad entries in Cargo.toml.

The addition of

hypr-vad = { path = "crates/vad", package = "vad" }

to workspace dependencies is correct. Approving the change.

crates/transcribe-moonshine/src/service/streaming.rs (5)

191-193: Mic stream migrated to speech_chunks — consistent

The mic path mirrors the single-channel change; looks good.


196-198: Speaker stream migrated to speech_chunks — consistent

The speaker path mirrors the mic path; looks good.


226-226: Stream bound now returns hypr_vad::AudioChunk — correct

The new bound matches the new chunk type; downstream consumption of chunk.samples remains valid.


21-21: Verify .speech_chunks Usage on WebSocketAudioSource

The optional sanity check confirms:

  • .speech_chunks(...) is called on WebSocketAudioSource in crates/transcribe-moonshine/src/service/streaming.rs (lines 174, 191, 196).
  • The WebSocketAudioSource struct is declared in crates/ws-utils/src/lib.rs (lines 81–84), but there is no direct impl Source for WebSocketAudioSource in that crate.

Before approving, please ensure that WebSocketAudioSource implements the Source trait from hypr_vad, either by

  • adding impl hypr_vad::Source for WebSocketAudioSource { … } in crates/ws-utils/src/lib.rs, or
  • confirming that a blanket implementation (impl<T: Source> VadExt for T) in hypr_vad covers this type via another path.

Once WebSocketAudioSource: Source is established, the .speech_chunks(...) extension method will resolve correctly.


174-175: speech_chunks migration: redemption window clamp verified

The redemption window in crates/transcribe-moonshine/src/service/streaming.rs is still clamped to 300–1200 ms (default 500 ms) and passed unchanged into speech_chunks. The calls for audio_source, mic_source, and speaker_source now all use VAD-driven chunking with the correct redemption window:

  • Lines 151–155:
    let redemption_time = Duration::from_millis(std::cmp::min(
        std::cmp::max(params.redemption_time_ms.unwrap_or(500), 300),
        1200,
    ));
  • Line 174 (and similarly at 191, 196):
    let vad_chunks = audio_source.speech_chunks(redemption_time);
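The nested min/max clamp quoted above can be expressed more directly with `clamp`. A sketch under the same bounds (default 500 ms, clamped to 300-1200 ms); `redemption_ms` is a hypothetical helper name, not code from the PR:

```rust
// Equivalent of the std::cmp::min(std::cmp::max(..)) clamp quoted above.
fn redemption_ms(param: Option<u64>) -> u64 {
    param.unwrap_or(500).clamp(300, 1200)
}
```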
crates/vad/src/lib.rs (1)

1-6: Re-exports establish a clean crate root

pub use continuous::*; and pub use error::*; provide a concise public surface. No unused items evident.

crates/transcribe-whisper-local/src/service/streaming.rs (2)

20-20: Switch to hypr_vad import looks correct.

The VadExt trait is used below via speech_chunks(...) and compiles cleanly with the new crate path.


279-281: Stream bound updated to hypr_vad::AudioChunk — LGTM.

The generic bound now matches speech_chunks’s item type; downstream mapping remains compatible.

crates/vad/src/continuous.rs (1)

38-55: Constructor sets sample rate and 30ms framing — LGTM.

Syncing config.sample_rate with the source and establishing a 30ms frame size fits Silero’s expectations.

coderabbitai bot (Contributor) left a comment

Actionable comments posted: 0

♻️ Duplicate comments (2)
crates/vad/src/continuous.rs (2)

184-199: Good: chunk emission uses SpeechEnd-provided samples (prevents leading-edge truncation).

Emitting chunks from VadTransition::SpeechEnd samples fixes the “start-of-speech cut-off” issue noted in earlier reviews and aligns with continuous VAD semantics.


161-168: Avoid panic in library code: make with_vad fallible and handle it in speech_chunks.

Unwrapping here will crash callers if VAD construction fails. Propagate the error and emit a one-shot error stream from speech_chunks instead.

Apply this diff:

@@
 pub trait VadExt: AsyncSource + Sized {
-    fn with_vad(self, config: VadConfig) -> ContinuousVadStream<Self>
+    fn with_vad(self, config: VadConfig) -> Result<ContinuousVadStream<Self>, crate::Error>
     where
         Self: Unpin,
     {
-        ContinuousVadStream::new(self, config).unwrap()
+        ContinuousVadStream::new(self, config)
     }
@@
-    fn speech_chunks(
+    fn speech_chunks(
         self,
         redemption_time: Duration,
-    ) -> impl Stream<Item = Result<AudioChunk, crate::Error>>
+    ) -> Pin<Box<dyn Stream<Item = Result<AudioChunk, crate::Error>>>>
     where
         Self: Unpin + 'static,
     {
         let config = VadConfig {
             redemption_time,
             pre_speech_pad: redemption_time,
             post_speech_pad: Duration::from_millis(0),
             min_speech_time: Duration::from_millis(50),
             ..Default::default()
         };
 
-        self.with_vad(config).filter_map(|item| {
-            future::ready(match item {
-                Ok(VadStreamItem::SpeechEnd {
-                    samples,
-                    start_timestamp_ms,
-                    end_timestamp_ms,
-                }) => Some(Ok(AudioChunk {
-                    samples,
-                    start_timestamp_ms,
-                    end_timestamp_ms,
-                })),
-                Ok(_) => None,
-                Err(e) => Some(Err(e)),
-            })
-        })
+        match self.with_vad(config) {
+            Ok(vad) => Box::pin(vad.filter_map(|item| {
+                future::ready(match item {
+                    Ok(VadStreamItem::SpeechEnd {
+                        samples,
+                        start_timestamp_ms,
+                        end_timestamp_ms,
+                    }) => Some(Ok(AudioChunk {
+                        samples,
+                        start_timestamp_ms,
+                        end_timestamp_ms,
+                    })),
+                    Ok(_) => None,
+                    Err(e) => Some(Err(e)),
+                })
+            })),
+            Err(e) => Box::pin(futures_util::stream::once(future::ready(Err(e)))),
+        }
     }
 }

Note: This changes the extension trait’s API; ensure downstream call sites are updated accordingly.

Also applies to: 169-199

🧹 Nitpick comments (2)
crates/vad/src/continuous.rs (2)

44-47: Avoid float rounding for chunk size; use integer math.

Floating-point conversion can introduce off-by-one sample drift across long streams. Compute 30ms frames with integer arithmetic.

-        let chunk_duration = Duration::from_millis(30);
-        let chunk_samples = (chunk_duration.as_secs_f64() * config.sample_rate as f64) as usize;
+        const CHUNK_MS: u64 = 30;
+        // 30ms worth of samples at the configured sample rate
+        let chunk_samples = ((config.sample_rate as u64) * CHUNK_MS / 1000) as usize;

84-116: Duplicate chunk-processing logic; consider DRYing with a helper.

The “process -> enqueue AudioSamples + transitions -> return first item” sequence appears twice. Extract to a private helper to reduce branching and keep behavior consistent.

Example shape:

impl<S: AsyncSource + Unpin> ContinuousVadStream<S> {
    fn process_and_enqueue(&mut self, chunk: Vec<f32>) -> Result<Option<VadStreamItem>, crate::Error> {
        match self.vad_session.process(&chunk) {
            Ok(transitions) => {
                self.pending_items.push_back(VadStreamItem::AudioSamples(chunk));
                for t in transitions {
                    match t {
                        VadTransition::SpeechStart { timestamp_ms } =>
                            self.pending_items.push_back(VadStreamItem::SpeechStart { timestamp_ms }),
                        VadTransition::SpeechEnd { start_timestamp_ms, end_timestamp_ms, samples } =>
                            self.pending_items.push_back(VadStreamItem::SpeechEnd { start_timestamp_ms, end_timestamp_ms, samples }),
                    }
                }
                Ok(self.pending_items.pop_front())
            }
            Err(e) => Err(crate::Error::VadProcessingFailed(e.to_string())),
        }
    }
}

Then use it in both sites to return Poll::Ready(Some(Ok(item))) or Err accordingly.

Also applies to: 126-157
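The integer-math suggestion above avoids float rounding entirely, since a millisecond frame length times an integer sample rate divides evenly in the common cases. A self-contained sketch of the suggested computation (function name illustrative):

```rust
// Integer version of the 30ms frame-size computation suggested above.
// At Silero's 16 kHz rate, a 30 ms frame is exactly 480 samples.
fn chunk_samples(sample_rate_hz: u32, frame_ms: u64) -> usize {
    ((sample_rate_hz as u64) * frame_ms / 1000) as usize
}
```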

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro


📥 Commits

Reviewing files that changed from the base of the PR and between bda8896 and 89d8f28.

📒 Files selected for processing (1)
  • crates/vad/src/continuous.rs (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{js,ts,tsx,rs}

⚙️ CodeRabbit Configuration File

**/*.{js,ts,tsx,rs}: 1. No error handling.
2. No unused imports, variables, or functions.
3. For comments, keep it minimal. It should be about "Why", not "What".

Files:

  • crates/vad/src/continuous.rs
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: ci (windows, windows-latest)
  • GitHub Check: ci (macos, macos-latest)
🔇 Additional comments (3)
crates/vad/src/continuous.rs (3)

69-71: Confirm AsyncSource::as_stream semantics; recreating the stream every poll.

poll_next constructs a fresh pinned stream view on each call. This is fine if as_stream returns a lightweight borrowed view into S and doesn’t maintain internal waker/state. If it constructs a stateful stream, this can regress wakeups or waste work.

  • Please confirm as_stream returns a borrow-based, stateless view into the underlying source. If not, consider storing a pinned stream inside ContinuousVadStream to avoid recreations.

169-199: End-of-stream speech finalization: verify last chunk isn’t lost if stream ends mid-speech.

If the input ends without enough trailing silence to trigger a VadTransition::SpeechEnd, the final speech segment may never be emitted by speech_chunks. Ensure silero-rs finalizes on EOF, or add a flush strategy (e.g., feeding synthetic silence or a library-provided finalize).

  • Please confirm via tests that a speech segment that ends at EOS still yields a SpeechEnd event and chunk.
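One flush strategy mentioned above is feeding synthetic silence at end-of-stream so a trailing speech segment can cross the redemption window and be finalized. A minimal sketch of the padding step, assuming zero-valued f32 samples count as silence for the VAD; the helper is hypothetical, not from the PR:

```rust
// Append `pad_ms` of synthetic silence so a speech segment that ends exactly
// at EOF can still trigger a SpeechEnd transition downstream.
fn pad_with_silence(mut samples: Vec<f32>, sample_rate_hz: u32, pad_ms: u64) -> Vec<f32> {
    let pad = ((sample_rate_hz as u64) * pad_ms / 1000) as usize;
    let new_len = samples.len() + pad;
    samples.resize(new_len, 0.0);
    samples
}
```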

12-23: API shape looks solid.

Public stream items and chunk struct provide the right granularity for downstream services and testing.

Also applies to: 25-30

@yujonglee yujonglee merged commit 2ec859c into main Aug 17, 2025
6 checks passed
@yujonglee yujonglee deleted the continuous-vad branch August 17, 2025 17:05
coderabbitai bot mentioned this pull request on Sep 1, 2025