feat(owhisper): add OpenAI batch transcription adapter (#2125)
Conversation
Implement BatchSttAdapter for OpenAI's speech-to-text API:

- Add openai/mod.rs with OpenAIAdapter struct
- Add openai/batch.rs with transcribe_file implementation
- Support verbose_json response format with word-level timestamps
- Convert OpenAI response format to BatchResponse
- Include ignored test for manual verification with API key

Co-Authored-By: yujonglee <yujonglee.dev@gmail.com>
📝 Walkthrough

Adds a new OpenAI adapter to the owhisper client: a public `OpenAIAdapter` exposed from the adapter module, with a `transcribe_file` implementation that reads the audio file, posts it to OpenAI's `/audio/transcriptions` endpoint as a multipart request, and converts the verbose JSON response into a `BatchResponse`.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    actor Caller
    participant OpenAIAdapter
    participant FileSystem
    participant OpenAIAPI as "OpenAI API"
    participant Converter as "Response Converter"

    Caller->>OpenAIAdapter: transcribe_file(params, file_path, api_key, client, api_base)
    OpenAIAdapter->>FileSystem: async read file bytes & infer filename
    FileSystem-->>OpenAIAdapter: file bytes + filename
    OpenAIAdapter->>OpenAIAdapter: build multipart Form (file, model, response_format, timestamps, language?)
    OpenAIAdapter->>OpenAIAPI: POST /audio/transcriptions (multipart + Bearer)
    alt 2xx Success
        OpenAIAPI-->>OpenAIAdapter: verbose JSON
        OpenAIAdapter->>Converter: deserialize OpenAIVerboseResponse
        Converter-->>OpenAIAdapter: BatchResponse (words, timings, alternatives, metadata)
        OpenAIAdapter-->>Caller: BatchResponse
    else non-2xx
        OpenAIAPI-->>OpenAIAdapter: status + body
        OpenAIAdapter-->>Caller: Error::UnexpectedStatus(status, body)
    end
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Actionable comments posted: 4
🧹 Nitpick comments (1)
owhisper/owhisper-client/src/adapter/openai/batch.rs (1)
123-134: Consider using the `mime_guess` crate for more comprehensive MIME type detection. The current implementation covers common audio formats adequately, but a dedicated crate would handle edge cases and additional formats automatically.
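If that suggestion were adopted, a minimal sketch might look like the following (assuming `mime_guess = "2"` in Cargo.toml; the function name mirrors the helper under review):

```rust
use std::path::Path;

// Let mime_guess map the extension, falling back to
// application/octet-stream for anything it doesn't recognize.
fn mime_type_from_extension(path: &Path) -> String {
    mime_guess::from_path(path)
        .first_or_octet_stream()
        .essence_str()
        .to_string()
}
```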
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- owhisper/owhisper-client/src/adapter/mod.rs (2 hunks)
- owhisper/owhisper-client/src/adapter/openai/batch.rs (1 hunks)
- owhisper/owhisper-client/src/adapter/openai/mod.rs (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Redirect rules - hyprnote
- GitHub Check: Header rules - hyprnote
- GitHub Check: Pages changed - hyprnote
- GitHub Check: Devin
- GitHub Check: fmt
🔇 Additional comments (9)
owhisper/owhisper-client/src/adapter/mod.rs (1)
7-7: LGTM! OpenAI adapter properly integrated. The module declaration and re-export follow the established pattern used by other adapters in this file.
Also applies to: 20-20
owhisper/owhisper-client/src/adapter/openai/mod.rs (1)
1-4: LGTM! Clean adapter entry point. The structure properly exposes the OpenAI adapter with appropriate derives and module organization.
owhisper/owhisper-client/src/adapter/openai/batch.rs (7)
1-14: LGTM! Clean imports and sensible defaults. The imports cover all necessary dependencies, and the default values match OpenAI's API requirements.
15-27: LGTM! Trait implementation follows standard pattern. The delegation to `do_transcribe_file` with proper future boxing is appropriate.
29-59: LGTM! Response models properly structured. The data structures appropriately model OpenAI's verbose_json response format with proper serde attributes.
104-120: LGTM! Request handling is robust. The Bearer token authentication, multipart form submission, and error handling for non-2xx responses are all properly implemented.
165-185: LGTM! Response structure assembly is clean. The conversion to `Alternatives`, `Channel`, and `BatchResponse` with metadata properly packages the transcription results.
187-220: LGTM! Test structure is appropriate for integration testing. The test is properly marked with `#[ignore]` for external API calls, validates response structure, and uses environment variables for credentials.
87-95: No issues found. The code correctly implements OpenAI's audio transcription API parameters:

- `timestamp_granularities[]` with value `"word"` is the correct array notation for multipart forms
- `lang.iso639().code()` returns ISO 639-1 two-letter codes (e.g., "en", "es", "zh") as required by OpenAI's language parameter

Both formats match OpenAI's API expectations.
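For illustration, a request using these parameters might be built as in the sketch below; the helper signature and exact field set are assumptions beyond what the comment above confirms (requires reqwest's `multipart` feature):

```rust
use reqwest::multipart::{Form, Part};

// Sketch: build and send the multipart transcription request.
async fn send_transcription(
    client: &reqwest::Client,
    api_base: &str,
    api_key: &str,
    bytes: Vec<u8>,
    file_name: String,
    mime: &str,
) -> Result<reqwest::Response, reqwest::Error> {
    let file = Part::bytes(bytes).file_name(file_name).mime_str(mime)?;
    let form = Form::new()
        .part("file", file)
        .text("model", "whisper-1")
        .text("response_format", "verbose_json")
        // array notation for multipart fields, as noted above
        .text("timestamp_granularities[]", "word");

    client
        .post(format!("{}/audio/transcriptions", api_base))
        .bearer_auth(api_key)
        .multipart(form)
        .send()
        .await
}
```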
- Fix fallback filename to use actual file extension instead of hardcoded .wav
- Strip punctuation for word field, keep original for punctuated_word
- Return empty words Vec when word-level timestamps unavailable (instead of multi-word segment entries)
- Keep confidence at 1.0 consistent with other adapters (Soniox, Fireworks, AssemblyAI)

Co-Authored-By: yujonglee <yujonglee.dev@gmail.com>
Actionable comments posted: 0
♻️ Duplicate comments (1)
owhisper/owhisper-client/src/adapter/openai/batch.rs (1)
161-161: Hardcoded confidence of 1.0 remains unaddressed. As flagged in previous reviews, OpenAI doesn't provide confidence scores. A value of `1.0` suggests certainty that isn't actually present. Consider using a documented sentinel value or adding a comment.

Also applies to: 173-173
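A minimal illustration of the "add a comment" option, purely as a sketch:

```rust
// Document the sentinel where it is set rather than leaving a bare literal.
fn word_confidence() -> f64 {
    // OpenAI's transcription API reports no confidence scores; 1.0 is a
    // sentinel kept for consistency with the Soniox/Fireworks/AssemblyAI adapters.
    1.0
}
```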
🧹 Nitpick comments (3)
owhisper/owhisper-client/src/adapter/openai/batch.rs (3)
128-139: Consider adding support for `mpeg` and `mpga` extensions. OpenAI's API also supports `mpeg` and `mpga` file extensions (both `audio/mpeg`). While the fallback to `application/octet-stream` should work, explicit mapping would be more robust.

```diff
 Some("mp3") => "audio/mpeg",
+Some("mpeg") => "audio/mpeg",
+Some("mpga") => "audio/mpeg",
 Some("mp4") => "audio/mp4",
```
141-144: Minor: `strip_punctuation` only handles ASCII punctuation. For non-English transcriptions, OpenAI may return Unicode punctuation characters (e.g., 「, 」, 。). The current ASCII-only approach should work for most cases but may leave punctuation attached to words in some languages.
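A Unicode-aware alternative could trim any non-alphanumeric character instead of matching ASCII punctuation. This sketch uses only the standard library; the helper name is hypothetical, and the punctuation-only-token handling is assumed to match the behavior described elsewhere in this review:

```rust
fn strip_punctuation_unicode(token: &str) -> &str {
    // `char::is_alphanumeric` covers Unicode letters and digits, so CJK
    // punctuation such as 「 」 。 is trimmed along with ASCII punctuation.
    let stripped = token.trim_matches(|c: char| !c.is_alphanumeric());
    if stripped.is_empty() {
        token // keep punctuation-only tokens intact, e.g. "..."
    } else {
        stripped
    }
}
```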
193-226: Consider adding unit tests for helper functions. The integration test is appropriately ignored for CI, but the helper functions (`mime_type_from_extension`, `strip_punctuation`, `convert_response`) would benefit from unit tests that don't require an API key.

Example test cases:

```rust
#[test]
fn test_strip_punctuation() {
    assert_eq!(strip_punctuation("hello,"), "hello");
    assert_eq!(strip_punctuation("..."), "..."); // all punctuation preserved
    assert_eq!(strip_punctuation(""), "");
}

#[test]
fn test_mime_type_from_extension() {
    assert_eq!(mime_type_from_extension(Path::new("audio.mp3")), "audio/mpeg");
    assert_eq!(mime_type_from_extension(Path::new("audio.unknown")), "application/octet-stream");
}
```
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
owhisper/owhisper-client/src/adapter/openai/batch.rs (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
owhisper/owhisper-client/src/adapter/openai/batch.rs (2)
owhisper/owhisper-client/src/adapter/mod.rs (1)
- transcribe_file (72-79)

owhisper/owhisper-interface/src/batch.rs (2)
- alternatives (69-73)
- channel (85-89)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Redirect rules - hyprnote
- GitHub Check: Header rules - hyprnote
- GitHub Check: Pages changed - hyprnote
- GitHub Check: fmt
- GitHub Check: Devin
🔇 Additional comments (6)
owhisper/owhisper-client/src/adapter/openai/batch.rs (6)
15-27: LGTM! The trait implementation correctly adapts the path and delegates to the async `do_transcribe_file` function with proper lifetime handling.
68-88: LGTM! File name fallback now correctly uses the file extension, and the MIME type determination is appropriate for common audio formats.
146-169: Good improvements to word processing logic. The implementation now correctly:

- Strips punctuation for `word` while preserving it in `punctuated_word`
- Handles punctuation-only tokens gracefully (lines 154-158)
- Returns an empty vector instead of misusing segments as words (see the sketch after this list)
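A minimal sketch of that per-token conversion, assuming a Deepgram-style `Word` struct (the field names here are assumptions, not the crate's confirmed API) and the `OpenAIWord` and `strip_punctuation` items discussed in this review:

```rust
fn convert_words(openai_words: &[OpenAIWord]) -> Vec<Word> {
    openai_words
        .iter()
        .map(|w| {
            let stripped = strip_punctuation(&w.word);
            Word {
                // punctuation removed for `word`, original token preserved
                word: stripped.to_string(),
                punctuated_word: Some(w.word.clone()),
                start: w.start,
                end: w.end,
                confidence: 1.0, // sentinel: OpenAI reports no confidence
                speaker: None,   // OpenAI reports no diarization
            }
        })
        .collect()
}
```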
171-191: LGTM! Response structure correctly maps OpenAI's output to the internal `BatchResponse` format with appropriate metadata.
98-100: Language code format is correct. The code uses `codes_iso_639::part_1::LanguageCode`, which is the ISO 639-1 standard providing 2-letter language codes (e.g., "en", "es", "fr"). This matches OpenAI's API requirements exactly. The schema validation regex confirms the 2-letter format. No changes needed.
29-59: The struct definitions are correct and align with OpenAI's audio transcription API documentation. No changes needed.

- `OpenAIWord` fields (`word: String`, `start: f64`, `end: f64`) match the documented word object structure
- `OpenAISegment` fields correctly include `seek: i32` as the frame offset and `start`/`end: f64` for timestamps in seconds
- `OpenAIVerboseResponse` structure properly reflects the verbose_json response format
- The `#[allow(dead_code)]` attributes and `#[serde(default)]` decorators are appropriate (the shapes named here are reconstructed in the sketch below)
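A reconstruction from the fields named above; serde attributes beyond those mentioned are assumptions:

```rust
use serde::Deserialize;

#[derive(Debug, Deserialize)]
struct OpenAIWord {
    word: String,
    start: f64, // seconds
    end: f64,   // seconds
}

#[derive(Debug, Deserialize)]
#[allow(dead_code)]
struct OpenAISegment {
    seek: i32, // frame offset into the audio
    start: f64,
    end: f64,
    text: String,
}

#[derive(Debug, Deserialize)]
struct OpenAIVerboseResponse {
    text: String,
    #[serde(default)]
    language: Option<String>,
    #[serde(default)]
    duration: Option<f64>,
    #[serde(default)]
    words: Option<Vec<OpenAIWord>>,
    #[serde(default)]
    segments: Option<Vec<OpenAISegment>>,
}
```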
Co-Authored-By: yujonglee <yujonglee.dev@gmail.com>
feat(owhisper): add OpenAI batch transcription adapter
Summary
Implements `BatchSttAdapter` for OpenAI's speech-to-text API (Whisper), enabling batch transcription of audio files via the `/v1/audio/transcriptions` endpoint.

Key implementation details:
- Uses the `verbose_json` response format with word-level timestamps
- Converts OpenAI's response into the internal `BatchResponse` structure
- Returns an empty `words` Vec when word-level timestamps aren't available (rather than faking per-word data from segments)
- Strips punctuation for the `word` field, preserving the original in `punctuated_word`

Updates since last revision
Addressed review feedback:

- Fallback filename now uses the actual file extension instead of a hardcoded `.wav`
- The `word` field now has punctuation stripped; `punctuated_word` contains the original token
- Returns an empty `words` Vec when word-level data is unavailable
- Kept `confidence: 1.0` consistent with other adapters (Soniox, Fireworks, AssemblyAI use `unwrap_or(1.0)`)

Review & Testing Checklist for Human
- Verify the `timestamp_granularities[]` parameter format is correct for OpenAI's API (line 96)
- Confirm an empty `words` Vec is acceptable for downstream consumers when OpenAI doesn't return word-level timestamps
- `strip_punctuation` only handles ASCII punctuation - may miss Unicode punctuation marks
- `confidence` is hardcoded to 1.0 and `speaker` is always `None` since OpenAI doesn't provide these

Recommended test plan: Run the ignored test with a valid OpenAI API key:
Notes