Batch processing #12

Closed
desh2608 opened this issue Oct 22, 2022 · 0 comments
@desh2608 (Owner)
Currently, each cut (segment) is processed one at a time. This is fine for long segments, which usually occupy the entire GPU memory, but it may be wasteful for shorter segments. The conventional method (e.g., in ASR) is mini-batch processing: pad several segments to the same length and process them together. However, we have to be careful doing that here for two reasons:

  1. The CACGMM implementations currently expect 3-dimensional input (channels, time, frequency), and adding a batch dimension would require modifying a lot of the internal implementation (which is written efficiently using einops).
  2. The CACGMM inference step computes sums over the whole time duration, so padding would require masking so that padded frames do not contribute to the accumulated statistics (a minimal sketch of such masking follows below).
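
For concreteness, here is a minimal sketch of the kind of masking reason 2 would require; the shapes and names (`batch`, `lengths`) are illustrative assumptions, not the actual CACGMM code:

```python
import numpy as np

# Hypothetical padded mini-batch: (batch, channels, time, frequency)
B, D, T, F = 2, 8, 100, 257
batch = np.random.randn(B, D, T, F)
lengths = np.array([100, 60])  # true segment lengths before padding

# Boolean mask over the time axis: True for valid (non-padded) frames
mask = np.arange(T)[None, :] < lengths[:, None]  # (B, T)

# Every sum over time in the inference step would need this mask so
# that padded frames do not contribute to the accumulated statistics.
masked_sum = np.einsum("bdtf,bt->bdf", batch, mask.astype(batch.dtype))
```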

For these reasons, it may be better to use a different kind of "batching". Instead of combining segments in parallel, we can combine them sequentially, but only if they are from the same recording and have the same target speaker. This ensures that we do not introduce a permutation problem in the mask estimation.
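
A rough sketch of this grouping, assuming Lhotse-style cuts with a `recording_id` attribute and the target speaker available as `supervisions[0].speaker` (these names are assumptions, not necessarily the actual API):

```python
from collections import defaultdict

def group_cuts_sequentially(cuts):
    """Group cuts by (recording, target speaker) so they can be
    concatenated into one sequence without a permutation problem."""
    groups = defaultdict(list)
    for cut in cuts:
        # Illustrative attribute names; the real cut objects may differ.
        key = (cut.recording_id, cut.supervisions[0].speaker)
        groups[key].append(cut)
    # Process each group in temporal order within the recording.
    for key in groups:
        groups[key].sort(key=lambda c: c.start)
    return groups
```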

If we combine in this way, we can even remove the individual contexts from each segment and instead add a single context around the combined "super-segment", which would further save compute and memory.
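
As a sketch of the potential saving, assuming a fixed per-side context duration and cuts with `start`/`end` attributes (again, illustrative names):

```python
def super_segment_context(group, context=15.0):
    """Place one context window around the combined super-segment
    instead of one around every individual segment."""
    left = max(0.0, group[0].start - context)
    right = group[-1].end + context
    # With N segments, per-segment contexts cost 2 * N * context seconds
    # of extra audio; the super-segment needs only 2 * context seconds.
    return left, right
```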
