
feat(owhisper-client): add response parsing utilities#2111

Merged
yujonglee merged 1 commit into main from
devin/1764837210-owhisper-parsing-utilities
Dec 4, 2025

Conversation

@yujonglee
Contributor

@yujonglee yujonglee commented Dec 4, 2025

feat(owhisper-client): add response parsing utilities

Summary

Adds a new parsing.rs module to the owhisper-client crate with shared utilities for response parsing, then refactors three STT adapters to use these utilities.

New utilities in parsing.rs:

  • parse_speaker_id(value: &str) -> Option<usize> - Parse speaker ID from various formats
  • ms_to_secs(ms: u64) -> f64 - Convert milliseconds to seconds
  • ms_to_secs_opt(ms: Option<u64>) -> f64 - Convert optional milliseconds to seconds (None yields 0.0)
  • HasTimeSpan trait with start_time()/end_time() methods, implemented for Word
  • calculate_time_span<T: HasTimeSpan>(words: &[T]) -> (f64, f64) - Calculate start time and duration
  • WordBuilder - Fluent API for constructing Word structs
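
As a rough illustration of the two time-conversion helpers listed above, here is a minimal sketch. The signatures come from this description; the function bodies are assumptions and may differ from what parsing.rs actually contains. The None-to-0.0 mapping follows the behavior noted in the review further down.

```rust
/// Convert a millisecond timestamp to seconds.
/// Body is an assumption; only the signature is taken from the PR description.
pub fn ms_to_secs(ms: u64) -> f64 {
    ms as f64 / 1000.0
}

/// Convert an optional millisecond timestamp to seconds.
/// A missing timestamp (None) is mapped to 0.0.
pub fn ms_to_secs_opt(ms: Option<u64>) -> f64 {
    ms.map(ms_to_secs).unwrap_or(0.0)
}
```

Under these assumptions, ms_to_secs(1500) gives 1.5, and ms_to_secs_opt(None) gives 0.0 rather than signalling the missing value to the caller.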

Refactored adapters:

  • soniox/live.rs - Uses ms_to_secs_opt and WordBuilder
  • assemblyai/live.rs - Uses ms_to_secs, calculate_time_span, and WordBuilder
  • fireworks/live.rs - Uses WordBuilder

Review & Testing Checklist for Human

  • Verify WordBuilder defaults match original behavior: punctuated_word is set to Some(word.clone()) and confidence defaults to 1.0
  • Note: parse_speaker_id is currently unused and generates a compiler warning, because the soniox adapter has its own SpeakerId enum. Decide whether to remove it or to refactor soniox to use it.
  • calculate_time_span is only used in assemblyai - soniox/fireworks still have inline time span logic due to different source types (tokens vs words)

Test plan: Run cargo test -p owhisper-client to verify all unit tests pass. Integration tests are ignored (require API keys).

Notes

- Add parsing.rs with utility functions and traits:
  - parse_speaker_id: Parse speaker ID from various formats
  - ms_to_secs: Convert milliseconds to seconds
  - ms_to_secs_opt: Convert optional milliseconds to seconds
  - HasTimeSpan trait with start_time/end_time methods
  - calculate_time_span: Calculate start time and duration from word list
  - WordBuilder: Fluent API for constructing Word structs

- Refactor adapters to use new utilities:
  - soniox/live.rs: Use ms_to_secs_opt and WordBuilder
  - assemblyai/live.rs: Use ms_to_secs, calculate_time_span, WordBuilder
  - fireworks/live.rs: Use WordBuilder

This eliminates repetitive Word construction and timestamp conversion code.

Co-Authored-By: yujonglee <yujonglee.dev@gmail.com>
@devin-ai-integration
Contributor

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR that start with 'DevinAI' or '@devin'.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

@netlify

netlify bot commented Dec 4, 2025

Deploy Preview for hyprnote ready!

Name Link
🔨 Latest commit 90c22cd
🔍 Latest deploy log https://app.netlify.com/projects/hyprnote/deploys/6931479171e6c300083d062a
😎 Deploy Preview https://deploy-preview-2111--hyprnote.netlify.app

@netlify

netlify bot commented Dec 4, 2025

Deploy Preview for hyprnote-storybook ready!

Name Link
🔨 Latest commit 90c22cd
🔍 Latest deploy log https://app.netlify.com/projects/hyprnote-storybook/deploys/69314791186e960008047d12
😎 Deploy Preview https://deploy-preview-2111--hyprnote-storybook.netlify.app

@coderabbitai
Contributor

coderabbitai bot commented Dec 4, 2025

Walkthrough

Introduces a new parsing module with utilities for word timing and construction (WordBuilder, time converters), then refactors multiple live adapters (AssemblyAI, Fireworks, Soniox) to use WordBuilder instead of direct Word struct construction with standardized time conversion.

Changes

Cohort: New parsing module
File(s): owhisper/owhisper-client/src/adapter/parsing.rs, owhisper/owhisper-client/src/adapter/mod.rs
Summary: Adds parsing module with WordBuilder fluent builder for Word construction, HasTimeSpan trait for accessing word timing, time conversion utilities (ms_to_secs, ms_to_secs_opt), speaker ID parsing, and time-span calculation. Re-exports module in adapter's public API.

Cohort: Live adapter refactoring
File(s): owhisper/owhisper-client/src/adapter/assemblyai/live.rs, owhisper/owhisper-client/src/adapter/fireworks/live.rs, owhisper/owhisper-client/src/adapter/soniox/live.rs
Summary: Replaces direct Word struct construction with WordBuilder-based construction across all three adapters. Updates imports to use WordBuilder and time conversion utilities from the parsing module. Computes start/end times via ms_to_secs (AssemblyAI) and ms_to_secs_opt (Soniox), and replaces manual span calculation with calculate_time_span() in the AssemblyAI adapter.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Areas requiring attention:

  • WordBuilder implementation correctness and field initialization logic
  • Each adapter's specific application of WordBuilder to ensure field mapping aligns with original construction
  • Time conversion edge cases (e.g., ms_to_secs_opt handling of None values)
  • Correctness of calculate_time_span() usage in the AssemblyAI adapter (Soniox and Fireworks keep inline time span logic)

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Docstring Coverage: ⚠️ Warning. Docstring coverage is 36.00%, which is below the required threshold of 80.00%. Resolution: run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Title check: ✅ Passed. The title clearly summarizes the main change: adding response parsing utilities to the owhisper-client crate, which matches the raw summary showing a new parsing.rs module with utilities and refactored adapters.
Description check: ✅ Passed. The description provides comprehensive details about the new parsing utilities, refactored adapters, and testing checklist, which directly relates to the changeset's addition of the parsing.rs module and adapter refactoring.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (3)
owhisper/owhisper-client/src/adapter/soniox/live.rs (1)

2-8: WordBuilder usage is solid; consider clamping negative durations

The move to WordBuilder with ms_to_secs_opt for token start/end and speaker IDs preserves prior behavior and centralizes word defaults, which is good.

One small improvement: because ms_to_secs_opt returns 0.0 when a timestamp is None, (end_secs - start_secs) can go negative if end_ms is missing or less than start_ms. If the downstream code assumes non‑negative durations, consider clamping:

-        let (start, duration) = if let (Some(first), Some(last)) = (tokens.first(), tokens.last()) {
-            let start_secs = ms_to_secs_opt(first.start_ms);
-            let end_secs = ms_to_secs_opt(last.end_ms);
-            (start_secs, end_secs - start_secs)
+        let (start, duration) = if let (Some(first), Some(last)) = (tokens.first(), tokens.last()) {
+            let start_secs = ms_to_secs_opt(first.start_ms);
+            let end_secs = ms_to_secs_opt(last.end_ms);
+            let duration = (end_secs - start_secs).max(0.0);
+            (start_secs, duration)

Also applies to: 233-245, 247-250
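
To make the negative-duration case concrete, here is a small self-contained sketch of the clamped variant. The Token struct and the token_span function are hypothetical stand-ins introduced for illustration (the real Soniox token type is not shown in this PR), and the empty-list fallback of (0.0, 0.0) is an assumption.

```rust
/// Hypothetical stand-in for a Soniox token; only the timestamp fields matter here.
pub struct Token {
    pub start_ms: Option<u64>,
    pub end_ms: Option<u64>,
}

/// Assumed behavior, per the review comment: a missing timestamp maps to 0.0.
pub fn ms_to_secs_opt(ms: Option<u64>) -> f64 {
    ms.map(|v| v as f64 / 1000.0).unwrap_or(0.0)
}

/// Returns (start, duration) in seconds, with duration clamped to be non-negative.
pub fn token_span(tokens: &[Token]) -> (f64, f64) {
    match (tokens.first(), tokens.last()) {
        (Some(first), Some(last)) => {
            let start_secs = ms_to_secs_opt(first.start_ms);
            let end_secs = ms_to_secs_opt(last.end_ms);
            // Without .max(0.0), a missing end_ms (treated as 0.0) would
            // produce a negative duration whenever start_secs > 0.0.
            (start_secs, (end_secs - start_secs).max(0.0))
        }
        // Assumption: an empty token list yields a zero-length span at 0.0.
        _ => (0.0, 0.0),
    }
}
```

With a single token whose start_ms is Some(2000) and whose end_ms is None, the unclamped expression would evaluate to -2.0, while this sketch reports a duration of 0.0.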

owhisper/owhisper-client/src/adapter/parsing.rs (2)

22-46: HasTimeSpan + calculate_time_span abstraction is clean; consider guarding duration

The HasTimeSpan trait and its Word implementation give a nice generic hook for computing (start, duration) from sequences, and calculate_time_span removes duplicated first/last logic across adapters.

If there’s any chance callers pass unsorted items or data where end_time < start_time, you may want to defensively clamp the duration to non‑negative:

-        (Some(first), Some(last)) => {
-            let start = first.start_time();
-            let end = last.end_time();
-            (start, end - start)
-        }
+        (Some(first), Some(last)) => {
+            let start = first.start_time();
+            let end = last.end_time();
+            let duration = (end - start).max(0.0);
+            (start, duration)
+        }
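
Putting the trait and the guarded function together, a minimal sketch might look like the following. The trait method names and the calculate_time_span signature come from the PR description; the Span struct is a hypothetical stand-in for Word, and the (0.0, 0.0) result for an empty slice is an assumption about the fallback arm.

```rust
/// Anything that exposes a start and end time in seconds.
pub trait HasTimeSpan {
    fn start_time(&self) -> f64;
    fn end_time(&self) -> f64;
}

/// Hypothetical stand-in for Word, carrying only the timing fields.
pub struct Span {
    pub start: f64,
    pub end: f64,
}

impl HasTimeSpan for Span {
    fn start_time(&self) -> f64 {
        self.start
    }
    fn end_time(&self) -> f64 {
        self.end
    }
}

/// Returns (start, duration): start of the first item, and the distance to
/// the end of the last item, clamped so it never goes negative.
pub fn calculate_time_span<T: HasTimeSpan>(words: &[T]) -> (f64, f64) {
    match (words.first(), words.last()) {
        (Some(first), Some(last)) => {
            let start = first.start_time();
            let end = last.end_time();
            (start, (end - start).max(0.0))
        }
        // Assumption: an empty slice yields a zero-length span at 0.0.
        _ => (0.0, 0.0),
    }
}
```

Note that the function only inspects the first and last items, so the clamp guards against out-of-order input but does not sort it.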

110-187: Tests give good coverage; note potential speaker parsing reuse

The tests exercise numeric/prefixed speaker IDs, time conversion helpers, empty/single/multiple calculate_time_span cases, and full WordBuilder construction, which gives solid confidence in this module.

Given you also have similar string→speaker parsing logic in SpeakerId::as_i32 in soniox/live.rs, you might later consider reusing parse_speaker_id there (with a cast) to avoid duplicated rules for extracting numeric IDs from strings.

📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between 6e8e8b8 and 90c22cd.

📒 Files selected for processing (5)
  • owhisper/owhisper-client/src/adapter/assemblyai/live.rs (2 hunks)
  • owhisper/owhisper-client/src/adapter/fireworks/live.rs (3 hunks)
  • owhisper/owhisper-client/src/adapter/mod.rs (1 hunks)
  • owhisper/owhisper-client/src/adapter/parsing.rs (1 hunks)
  • owhisper/owhisper-client/src/adapter/soniox/live.rs (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (3)
owhisper/owhisper-client/src/adapter/assemblyai/live.rs (1)
owhisper/owhisper-client/src/adapter/parsing.rs (4)
  • calculate_time_span (37-46)
  • ms_to_secs (14-16)
  • new (59-70)
  • start (72-75)
owhisper/owhisper-client/src/adapter/fireworks/live.rs (1)
owhisper/owhisper-client/src/adapter/parsing.rs (1)
  • new (59-70)
owhisper/owhisper-client/src/adapter/soniox/live.rs (1)
owhisper/owhisper-client/src/adapter/parsing.rs (4)
  • ms_to_secs_opt (18-20)
  • speaker (87-90)
  • new (59-70)
  • start (72-75)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
  • GitHub Check: Redirect rules - hyprnote
  • GitHub Check: Redirect rules - hyprnote-storybook
  • GitHub Check: Header rules - hyprnote-storybook
  • GitHub Check: Header rules - hyprnote
  • GitHub Check: Pages changed - hyprnote-storybook
  • GitHub Check: Pages changed - hyprnote
  • GitHub Check: fmt
  • GitHub Check: Devin
🔇 Additional comments (5)
owhisper/owhisper-client/src/adapter/mod.rs (1)

7-7: Exposing parsing module is consistent with new utilities

Making parsing public under adapter matches how other modules import crate::adapter::parsing and cleanly exposes the new helpers.

owhisper/owhisper-client/src/adapter/fireworks/live.rs (1)

2-8: WordBuilder integration preserves Fireworks semantics

Switching to WordBuilder for per-word construction keeps the previous field mapping (word text, start/end, probability→confidence, language) while centralizing defaults (confidence and punctuated_word). Both segment and text branches remain functionally equivalent, with cleaner construction.

Also applies to: 79-88, 124-135

owhisper/owhisper-client/src/adapter/assemblyai/live.rs (1)

2-8: WordBuilder + calculate_time_span simplification looks correct

Using WordBuilder with ms_to_secs for start/end and delegating start/duration computation to calculate_time_span(&words) keeps the existing timing behavior while reducing duplication and handling the empty-words case gracefully.

Also applies to: 217-231

owhisper/owhisper-client/src/adapter/parsing.rs (2)

3-20: Speaker ID parsing and ms→sec helpers are straightforward and well-tested

parse_speaker_id correctly handles pure numeric IDs and simple prefixed forms (e.g. SPEAKER_1), and the ms_to_secs / ms_to_secs_opt helpers provide a clear, reusable place for time conversion, with unit tests covering the expected cases.
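
For readers who want a feel for the parsing rule described here, the following is one possible sketch. Only the signature parse_speaker_id(value: &str) -> Option<usize> comes from the PR description; the body, including the "trailing digits" rule for prefixed forms, is an assumption and may not match the actual implementation.

```rust
/// Parse a speaker ID from a pure numeric string ("2") or a simple
/// prefixed form ("SPEAKER_1"). Returns None when no ID can be extracted.
/// The exact rules are assumed for illustration.
pub fn parse_speaker_id(value: &str) -> Option<usize> {
    let trimmed = value.trim();

    // Case 1: the whole string is a number.
    if let Ok(id) = trimmed.parse::<usize>() {
        return Some(id);
    }

    // Case 2: take the trailing run of ASCII digits, e.g. "1" in "SPEAKER_1".
    let prefix_len = trimmed
        .trim_end_matches(|c: char| c.is_ascii_digit())
        .len();
    trimmed[prefix_len..].parse::<usize>().ok()
}
```

Under this rule, "SPEAKER_1" yields Some(1), "2" yields Some(2), and a string with no trailing digits yields None.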


48-108: WordBuilder design aligns with Word and centralizes sensible defaults

WordBuilder mirrors the Word fields, sets reasonable defaults (start/end = 0.0, confidence = 1.0, punctuated_word = Some(word.clone())), and provides a concise fluent API used by the adapters. The build method cleanly transfers all builder fields into a Word, matching expectations.
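
The builder described above could be sketched roughly as follows. The defaults (start/end = 0.0, confidence = 1.0, punctuated_word = Some(word.clone())) and the new, start, and speaker methods are taken from this review; the Word struct shown here is a hypothetical stand-in, and the field types (for example speaker as Option<i32>) and the remaining setter names are assumptions.

```rust
/// Hypothetical stand-in for the real Word type; field types are assumed.
#[derive(Debug, Clone, PartialEq)]
pub struct Word {
    pub word: String,
    pub start: f64,
    pub end: f64,
    pub confidence: f64,
    pub speaker: Option<i32>,
    pub punctuated_word: Option<String>,
    pub language: Option<String>,
}

/// Fluent builder that centralizes the defaults for Word.
pub struct WordBuilder {
    inner: Word,
}

impl WordBuilder {
    pub fn new(word: impl Into<String>) -> Self {
        let word = word.into();
        Self {
            inner: Word {
                // Default: the punctuated form is the word itself.
                punctuated_word: Some(word.clone()),
                word,
                start: 0.0,
                end: 0.0,
                confidence: 1.0,
                speaker: None,
                language: None,
            },
        }
    }

    pub fn start(mut self, start: f64) -> Self {
        self.inner.start = start;
        self
    }

    pub fn end(mut self, end: f64) -> Self {
        self.inner.end = end;
        self
    }

    pub fn confidence(mut self, confidence: f64) -> Self {
        self.inner.confidence = confidence;
        self
    }

    pub fn speaker(mut self, speaker: Option<i32>) -> Self {
        self.inner.speaker = speaker;
        self
    }

    pub fn language(mut self, language: Option<String>) -> Self {
        self.inner.language = language;
        self
    }

    /// Consume the builder and return the finished Word.
    pub fn build(self) -> Word {
        self.inner
    }
}
```

An adapter would then write something like WordBuilder::new("hello").start(1.0).end(1.5).build(), and any field it does not set keeps the centralized default.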

@yujonglee yujonglee merged commit 1f513e0 into main Dec 4, 2025
12 of 13 checks passed
@yujonglee yujonglee deleted the devin/1764837210-owhisper-parsing-utilities branch December 4, 2025 08:55