feat(owhisper): add OpenAI batch transcription adapter (#2125)
Conversation
Implement BatchSttAdapter for OpenAI's speech-to-text API:

- Add openai/mod.rs with OpenAIAdapter struct
- Add openai/batch.rs with transcribe_file implementation
- Support verbose_json response format with word-level timestamps
- Convert OpenAI response format to BatchResponse
- Include ignored test for manual verification with API key

Co-Authored-By: yujonglee <yujonglee.dev@gmail.com>
📝 Walkthrough

Adds a new OpenAI adapter to the owhisper client: a public `OpenAIAdapter` exposed from the adapter module, with a `transcribe_file` implementation that reads the audio file, posts it to OpenAI's `/audio/transcriptions` endpoint as a multipart request, and converts the verbose JSON response into a `BatchResponse`.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    actor Caller
    participant OpenAIAdapter
    participant FileSystem
    participant OpenAIAPI as "OpenAI API"
    participant Converter as "Response Converter"

    Caller->>OpenAIAdapter: transcribe_file(params, file_path, api_key, client, api_base)
    OpenAIAdapter->>FileSystem: async read file bytes & infer filename
    FileSystem-->>OpenAIAdapter: file bytes + filename
    OpenAIAdapter->>OpenAIAdapter: build multipart Form (file, model, response_format, timestamps, language?)
    OpenAIAdapter->>OpenAIAPI: POST /audio/transcriptions (multipart + Bearer)
    alt 2xx Success
        OpenAIAPI-->>OpenAIAdapter: verbose JSON
        OpenAIAdapter->>Converter: deserialize OpenAIVerboseResponse
        Converter-->>OpenAIAdapter: BatchResponse (words, timings, alternatives, metadata)
        OpenAIAdapter-->>Caller: BatchResponse
    else non-2xx
        OpenAIAPI-->>OpenAIAdapter: status + body
        OpenAIAdapter-->>Caller: Error::UnexpectedStatus(status, body)
    end
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Actionable comments posted: 4
🧹 Nitpick comments (1)
owhisper/owhisper-client/src/adapter/openai/batch.rs (1)
123-134: Consider using the `mime_guess` crate for more comprehensive MIME type detection. The current implementation covers common audio formats adequately, but a dedicated crate would handle edge cases and additional formats automatically.
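If that suggestion were adopted, a minimal sketch might look like the following (assuming `mime_guess = "2"` in Cargo.toml; the function name mirrors the helper under review):

```rust
use std::path::Path;

// Let mime_guess map the extension, falling back to
// application/octet-stream for anything it doesn't recognize.
fn mime_type_from_extension(path: &Path) -> String {
    mime_guess::from_path(path)
        .first_or_octet_stream()
        .essence_str()
        .to_string()
}
```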
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- owhisper/owhisper-client/src/adapter/mod.rs (2 hunks)
- owhisper/owhisper-client/src/adapter/openai/batch.rs (1 hunks)
- owhisper/owhisper-client/src/adapter/openai/mod.rs (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Redirect rules - hyprnote
- GitHub Check: Header rules - hyprnote
- GitHub Check: Pages changed - hyprnote
- GitHub Check: Devin
- GitHub Check: fmt
🔇 Additional comments (9)
owhisper/owhisper-client/src/adapter/mod.rs (1)
7-7: LGTM! OpenAI adapter properly integrated. The module declaration and re-export follow the established pattern used by other adapters in this file.
Also applies to: 20-20
owhisper/owhisper-client/src/adapter/openai/mod.rs (1)
1-4: LGTM! Clean adapter entry point. The structure properly exposes the OpenAI adapter with appropriate derives and module organization.
owhisper/owhisper-client/src/adapter/openai/batch.rs (7)
1-14: LGTM! Clean imports and sensible defaults. The imports cover all necessary dependencies, and the default values match OpenAI's API requirements.
15-27: LGTM! Trait implementation follows standard pattern. The delegation to `do_transcribe_file` with proper future boxing is appropriate.
29-59: LGTM! Response models properly structured. The data structures appropriately model OpenAI's verbose_json response format with proper serde attributes.
104-120: LGTM! Request handling is robust. The Bearer token authentication, multipart form submission, and error handling for non-2xx responses are all properly implemented.
165-185: LGTM! Response structure assembly is clean. The conversion to `Alternatives`, `Channel`, and `BatchResponse` with metadata properly packages the transcription results.
187-220: LGTM! Test structure is appropriate for integration testing. The test is properly marked with `#[ignore]` for external API calls, validates response structure, and uses environment variables for credentials.
87-95: No issues found. The code correctly implements OpenAI's audio transcription API parameters:

- `timestamp_granularities[]` with value `"word"` is the correct array notation for multipart forms
- `lang.iso639().code()` returns ISO 639-1 two-letter codes (e.g., "en", "es", "zh") as required by OpenAI's language parameter

Both formats match OpenAI's API expectations.
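For illustration, a request using these parameters might be built as in the sketch below; the helper signature and exact field set are assumptions beyond what the comment above confirms (requires reqwest's `multipart` feature):

```rust
use reqwest::multipart::{Form, Part};

// Sketch: build and send the multipart transcription request.
async fn send_transcription(
    client: &reqwest::Client,
    api_base: &str,
    api_key: &str,
    bytes: Vec<u8>,
    file_name: String,
    mime: &str,
) -> Result<reqwest::Response, reqwest::Error> {
    let file = Part::bytes(bytes).file_name(file_name).mime_str(mime)?;
    let form = Form::new()
        .part("file", file)
        .text("model", "whisper-1")
        .text("response_format", "verbose_json")
        // array notation for multipart fields, as noted above
        .text("timestamp_granularities[]", "word");

    client
        .post(format!("{}/audio/transcriptions", api_base))
        .bearer_auth(api_key)
        .multipart(form)
        .send()
        .await
}
```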
- Fix fallback filename to use actual file extension instead of hardcoded .wav
- Strip punctuation for word field, keep original for punctuated_word
- Return empty words Vec when word-level timestamps unavailable (instead of multi-word segment entries)
- Keep confidence at 1.0 consistent with other adapters (Soniox, Fireworks, AssemblyAI)

Co-Authored-By: yujonglee <yujonglee.dev@gmail.com>
Actionable comments posted: 0
♻️ Duplicate comments (1)
owhisper/owhisper-client/src/adapter/openai/batch.rs (1)
161-161: Hardcoded confidence of 1.0 remains unaddressed. As flagged in previous reviews, OpenAI doesn't provide confidence scores. A value of `1.0` suggests certainty that isn't actually present. Consider using a documented sentinel value or adding a comment.

Also applies to: 173-173
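A minimal illustration of the "add a comment" option, purely as a sketch:

```rust
// Document the sentinel where it is set rather than leaving a bare literal.
fn word_confidence() -> f64 {
    // OpenAI's transcription API reports no confidence scores; 1.0 is a
    // sentinel kept for consistency with the Soniox/Fireworks/AssemblyAI adapters.
    1.0
}
```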
🧹 Nitpick comments (3)
owhisper/owhisper-client/src/adapter/openai/batch.rs (3)
128-139: Consider adding support for `mpeg` and `mpga` extensions. OpenAI's API also supports `mpeg` and `mpga` file extensions (both `audio/mpeg`). While the fallback to `application/octet-stream` should work, explicit mapping would be more robust.

```diff
 Some("mp3") => "audio/mpeg",
+Some("mpeg") => "audio/mpeg",
+Some("mpga") => "audio/mpeg",
 Some("mp4") => "audio/mp4",
```
141-144: Minor: `strip_punctuation` only handles ASCII punctuation. For non-English transcriptions, OpenAI may return Unicode punctuation characters (e.g., 「, 」, 。). The current ASCII-only approach should work for most cases but may leave punctuation attached to words in some languages.
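A Unicode-aware alternative could trim any non-alphanumeric character instead of matching ASCII punctuation. This sketch uses only the standard library; the helper name is hypothetical, and the punctuation-only-token handling is assumed to match the behavior described elsewhere in this review:

```rust
fn strip_punctuation_unicode(token: &str) -> &str {
    // `char::is_alphanumeric` covers Unicode letters and digits, so CJK
    // punctuation such as 「 」 。 is trimmed along with ASCII punctuation.
    let stripped = token.trim_matches(|c: char| !c.is_alphanumeric());
    if stripped.is_empty() {
        token // keep punctuation-only tokens intact, e.g. "..."
    } else {
        stripped
    }
}
```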
193-226: Consider adding unit tests for helper functions. The integration test is appropriately ignored for CI, but the helper functions (`mime_type_from_extension`, `strip_punctuation`, `convert_response`) would benefit from unit tests that don't require an API key.

Example test cases:

```rust
#[test]
fn test_strip_punctuation() {
    assert_eq!(strip_punctuation("hello,"), "hello");
    assert_eq!(strip_punctuation("..."), "..."); // all punctuation preserved
    assert_eq!(strip_punctuation(""), "");
}

#[test]
fn test_mime_type_from_extension() {
    assert_eq!(mime_type_from_extension(Path::new("audio.mp3")), "audio/mpeg");
    assert_eq!(mime_type_from_extension(Path::new("audio.unknown")), "application/octet-stream");
}
```
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
owhisper/owhisper-client/src/adapter/openai/batch.rs (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
owhisper/owhisper-client/src/adapter/openai/batch.rs (2)
owhisper/owhisper-client/src/adapter/mod.rs (1)
- transcribe_file (72-79)

owhisper/owhisper-interface/src/batch.rs (2)
- alternatives (69-73)
- channel (85-89)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Redirect rules - hyprnote
- GitHub Check: Header rules - hyprnote
- GitHub Check: Pages changed - hyprnote
- GitHub Check: fmt
- GitHub Check: Devin
🔇 Additional comments (6)
owhisper/owhisper-client/src/adapter/openai/batch.rs (6)
15-27: LGTM! The trait implementation correctly adapts the path and delegates to the async `do_transcribe_file` function with proper lifetime handling.
68-88: LGTM! File name fallback now correctly uses the file extension, and the MIME type determination is appropriate for common audio formats.
146-169: Good improvements to word processing logic. The implementation now correctly:

- Strips punctuation for `word` while preserving it in `punctuated_word`
- Handles punctuation-only tokens gracefully (lines 154-158)
- Returns an empty vector instead of misusing segments as words (see the sketch after this list)
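A minimal sketch of that per-token conversion, assuming a Deepgram-style `Word` struct (the field names here are assumptions, not the crate's confirmed API) and the `OpenAIWord` and `strip_punctuation` items discussed in this review:

```rust
fn convert_words(openai_words: &[OpenAIWord]) -> Vec<Word> {
    openai_words
        .iter()
        .map(|w| {
            let stripped = strip_punctuation(&w.word);
            Word {
                // punctuation removed for `word`, original token preserved
                word: stripped.to_string(),
                punctuated_word: Some(w.word.clone()),
                start: w.start,
                end: w.end,
                confidence: 1.0, // sentinel: OpenAI reports no confidence
                speaker: None,   // OpenAI reports no diarization
            }
        })
        .collect()
}
```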
171-191: LGTM! Response structure correctly maps OpenAI's output to the internal `BatchResponse` format with appropriate metadata.
98-100: Language code format is correct. The code uses `codes_iso_639::part_1::LanguageCode`, which is the ISO 639-1 standard providing 2-letter language codes (e.g., "en", "es", "fr"). This matches OpenAI's API requirements exactly. The schema validation regex confirms the 2-letter format. No changes needed.
29-59: The struct definitions are correct and align with OpenAI's audio transcription API documentation. No changes needed.

- `OpenAIWord` fields (`word: String`, `start: f64`, `end: f64`) match the documented word object structure
- `OpenAISegment` fields correctly include `seek: i32` as the frame offset and `start`/`end: f64` for timestamps in seconds
- `OpenAIVerboseResponse` structure properly reflects the verbose_json response format
- The `#[allow(dead_code)]` attributes and `#[serde(default)]` decorators are appropriate (the shapes named here are reconstructed in the sketch below)
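A reconstruction from the fields named above; serde attributes beyond those mentioned are assumptions:

```rust
use serde::Deserialize;

#[derive(Debug, Deserialize)]
struct OpenAIWord {
    word: String,
    start: f64, // seconds
    end: f64,   // seconds
}

#[derive(Debug, Deserialize)]
#[allow(dead_code)]
struct OpenAISegment {
    seek: i32, // frame offset into the audio
    start: f64,
    end: f64,
    text: String,
}

#[derive(Debug, Deserialize)]
struct OpenAIVerboseResponse {
    text: String,
    #[serde(default)]
    language: Option<String>,
    #[serde(default)]
    duration: Option<f64>,
    #[serde(default)]
    words: Option<Vec<OpenAIWord>>,
    #[serde(default)]
    segments: Option<Vec<OpenAISegment>>,
}
```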
Co-Authored-By: yujonglee <yujonglee.dev@gmail.com>
feat(owhisper): add OpenAI batch transcription adapter
Summary
Implements `BatchSttAdapter` for OpenAI's speech-to-text API (Whisper), enabling batch transcription of audio files via the `/v1/audio/transcriptions` endpoint.

Key implementation details:
- Uses the `verbose_json` response format with word-level timestamps
- Converts OpenAI's response into the internal `BatchResponse` structure
- Returns an empty `words` Vec when word-level timestamps aren't available (rather than faking per-word data from segments)
- Strips punctuation for the `word` field, preserving the original in `punctuated_word`

Updates since last revision
Addressed review feedback:

- Fallback filename now uses the actual file extension instead of a hardcoded `.wav`
- The `word` field now has punctuation stripped; `punctuated_word` contains the original token
- Returns an empty `words` Vec when word-level data is unavailable
- Kept `confidence: 1.0` consistent with other adapters (Soniox, Fireworks, AssemblyAI use `unwrap_or(1.0)`)

Review & Testing Checklist for Human
- Verify the `timestamp_granularities[]` parameter format is correct for OpenAI's API (line 96)
- Confirm an empty `words` Vec is acceptable for downstream consumers when OpenAI doesn't return word-level timestamps
- `strip_punctuation` only handles ASCII punctuation - may miss Unicode punctuation marks
- `confidence` is hardcoded to 1.0 and `speaker` is always `None` since OpenAI doesn't provide these

Recommended test plan: Run the ignored test with a valid OpenAI API key:
Notes