Fixes OOM Errors - too high RAM usage by VAD #1198

Merged (4 commits) on Dec 12, 2024


Purfview (Contributor) commented Dec 10, 2024

Fixes #1193 and #1169.
The VAD implementation consumes a huge amount of memory (the original Silero implementation doesn't have this problem).

This PR should fix the OOM problem.
An alternative solution could be removing lru_cache.

MahmoudAshraf97 (Collaborator) commented

Thanks. For reference: microsoft/onnxruntime#11627

BTW, there is still room for OOM on systems with less RAM, caused by this:

batched_audio = batched_audio.reshape(-1, num_samples + context_size_samples)
encoder_output = self.encoder_session.run(None, {"input": batched_audio})[0]
encoder_output = encoder_output.reshape(batch_size, -1, 128)

The input here has shape (num_segments, 576), and num_segments depends on the audio length, so it is unbounded. We should replace this with a for loop over num_segments using a large batch size (10k, for example, or the lowest number at which we start to lose speed). This helps with very long audio.
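
A rough sketch of that loop, not the code in this PR: `run_encoder_in_chunks`, `fake_encoder`, and the `chunk_size` default are hypothetical names and values chosen for illustration. `fake_encoder` only stands in for the `self.encoder_session.run(...)` call, and its `(rows, 128)` output shape is an assumption. The point is that the encoder never sees more than `chunk_size` rows at a time, however long the audio is.

```python
import numpy as np


def run_encoder_in_chunks(run_encoder, batched_audio, chunk_size=10_000):
    """Call `run_encoder` on slices of at most `chunk_size` rows and
    concatenate the per-slice outputs along the first axis."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    outputs = []
    for start in range(0, batched_audio.shape[0], chunk_size):
        # Slicing a NumPy array returns a view, so no input data is copied here.
        chunk = batched_audio[start:start + chunk_size]
        outputs.append(run_encoder(chunk))
    if not outputs:
        # No segments at all: let the encoder decide what an empty result is.
        return run_encoder(batched_audio)
    return np.concatenate(outputs, axis=0)


def fake_encoder(chunk):
    # Hypothetical stand-in for:
    #   self.encoder_session.run(None, {"input": chunk})[0]
    # Assumed output shape: (rows, 128).
    row_means = chunk.mean(axis=1, keepdims=True)
    return np.repeat(row_means, 128, axis=1).astype(np.float32)


if __name__ == "__main__":
    audio = np.zeros((25, 576), dtype=np.float32)  # (num_segments, 576)
    out = run_encoder_in_chunks(fake_encoder, audio, chunk_size=10)
    print(out.shape)
```

The outputs are still accumulated for all segments, so this only bounds the size of each encoder call, not the size of the final result.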

@MahmoudAshraf97 MahmoudAshraf97 merged commit 1b24f28 into SYSTRAN:master Dec 12, 2024
3 checks passed
Successfully merging this pull request may close these issues.

OOM when using VAD