Support longer audio files reducing memory usage with chunking #2256

Open: wants to merge 1 commit into main

@ggarber ggarber commented Jul 2, 2024

Description

The current implementation of Whisper loads the entire audio file into memory during transcription. This requires a large amount of memory and makes out-of-memory failures likely for very long audio files (many hours).

This change modifies audio loading so that the file, when decoded by ffmpeg, is read in chunks of at most two hours each. The chunks are processed sequentially, which significantly reduces memory usage.
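
For readers who want a concrete picture of the idea, here is a minimal sketch of chunked decoding. It is not the code in this PR: the names `chunk_size_bytes`, `read_chunks`, and `iter_pcm_chunks` are hypothetical, and the sketch assumes ffmpeg is on the PATH and that audio is decoded to 16 kHz mono 16-bit PCM written to stdout. Under those assumptions, a two-hour chunk is 7200 × 16000 × 2 = 230,400,000 bytes of raw PCM, so only about 230 MB is held at a time regardless of the file length.

```python
import subprocess
from typing import BinaryIO, Iterator

SAMPLE_RATE = 16000    # assumed target rate: 16 kHz mono
BYTES_PER_SAMPLE = 2   # s16le PCM uses 2 bytes per sample


def chunk_size_bytes(chunk_seconds: int) -> int:
    """Number of raw PCM bytes needed to hold chunk_seconds of audio."""
    return chunk_seconds * SAMPLE_RATE * BYTES_PER_SAMPLE


def read_chunks(stream: BinaryIO, chunk_bytes: int) -> Iterator[bytes]:
    """Yield consecutive blocks of up to chunk_bytes from a binary stream."""
    while True:
        buf = bytearray()
        # A pipe may return short reads, so keep reading until the chunk is full.
        while len(buf) < chunk_bytes:
            data = stream.read(chunk_bytes - len(buf))
            if not data:
                break
            buf += data
        if not buf:
            return
        yield bytes(buf)
        if len(buf) < chunk_bytes:
            return  # short final chunk: end of stream


def iter_pcm_chunks(path: str, chunk_seconds: int = 2 * 60 * 60) -> Iterator[bytes]:
    """Decode path with ffmpeg and yield raw PCM in chunks (hypothetical helper)."""
    cmd = [
        "ffmpeg", "-nostdin", "-i", path,
        "-f", "s16le", "-ac", "1", "-acodec", "pcm_s16le",
        "-ar", str(SAMPLE_RATE), "-",
    ]
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.DEVNULL)
    try:
        yield from read_chunks(proc.stdout, chunk_size_bytes(chunk_seconds))
    finally:
        proc.stdout.close()
        proc.wait()
```

Each yielded chunk could then be converted to a float array and transcribed before the next one is read. A real implementation would also have to decide how to handle speech that straddles a chunk boundary, which this sketch ignores.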

Test Plan

Unit Tests: Verified with existing unit tests to ensure no regression.
Long Audio Files: Tested with audio files up to 16 hours in duration to confirm stability and efficiency.

@kentslaney kentslaney commented Jul 10, 2024

I have a real-time transcription repo that would also solve this problem by buffering the read operations, but I'm still in the process of turning it into a PR. Hopefully by the end of the week or so.
