Replies: 1 comment
Some more simulation methods (and papers):
Several recent tasks, such as multi-talker ASR, continuous speech separation, and speaker diarization, are commonly addressed by training a neural network on simulated multi-speaker mixtures. A number of recent papers examine the best strategies for simulating such mixtures (e.g. by modeling turn-taking using HMMs, using statistics from real conversational data, and so on). Some of these methods from different groups are listed below:
It may be useful to implement one or more of these simulation methods in Lhotse, since such simulation constitutes an important part of the data pipeline for these tasks. I am envisioning a `MixtureGenerator` class that can be extended for the different techniques. It should have a `fit()` function to learn statistics from a provided `CutSet`, and a `generate()` function to generate the desired number of mixtures in the form of `CutSet`s. This can either be done in eager mode or lazily at the time of data loading.

I don't have time to implement this right away, but I might take this up at some point in the future.
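To make the proposed interface concrete, here is a rough sketch of what a `MixtureGenerator` base class with `fit()` and `generate()` could look like. This is not an existing Lhotse API. To keep it self-contained, it uses a hypothetical `Segment` dataclass as a stand-in for a cut/supervision and a plain list of segments as a stand-in for a mixture, rather than real `CutSet` objects. The `GapStatsGenerator` subclass is also made up for illustration: it learns only the mean and standard deviation of the gaps between consecutive turns (negative gaps mean overlap) and alternates between two speakers, which is far simpler than the methods from the papers.

```python
from __future__ import annotations

import random
from abc import ABC, abstractmethod
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import Dict, Iterable, Iterator, List


@dataclass(frozen=True)
class Segment:
    """Hypothetical stand-in for one speaker turn (not a Lhotse type)."""
    recording_id: str
    speaker: str
    start: float
    duration: float

    @property
    def end(self) -> float:
        return self.start + self.duration


Mixture = List[Segment]  # stand-in for a simulated multi-speaker mixture


class MixtureGenerator(ABC):
    """Assumed base interface: fit() learns statistics, generate() samples."""

    @abstractmethod
    def fit(self, segments: Iterable[Segment]) -> "MixtureGenerator":
        ...

    @abstractmethod
    def _sample(self, rng: random.Random) -> Mixture:
        ...

    def generate(self, num_mixtures: int, seed: int = 0) -> Iterator[Mixture]:
        # Lazy by default: mixtures are produced one at a time on iteration.
        # Wrap the call in list(...) to get eager behavior.
        rng = random.Random(seed)
        for _ in range(num_mixtures):
            yield self._sample(rng)


class GapStatsGenerator(MixtureGenerator):
    """Toy technique: Gaussian gaps between turns, durations drawn from data."""

    def __init__(self, turns_per_mixture: int = 4) -> None:
        self.turns_per_mixture = turns_per_mixture
        self.gap_mean = 0.0
        self.gap_std = 0.0
        self.durations: List[float] = []

    def fit(self, segments: Iterable[Segment]) -> "GapStatsGenerator":
        by_recording: Dict[str, List[Segment]] = {}
        for seg in segments:
            by_recording.setdefault(seg.recording_id, []).append(seg)
        gaps: List[float] = []
        self.durations = []
        for segs in by_recording.values():
            segs.sort(key=lambda s: s.start)
            self.durations.extend(s.duration for s in segs)
            # Gap = next turn's start minus this turn's end (negative = overlap).
            gaps.extend(b.start - a.end for a, b in zip(segs, segs[1:]))
        if not gaps:
            raise ValueError("need at least two turns in some recording")
        self.gap_mean = mean(gaps)
        self.gap_std = pstdev(gaps)
        return self

    def _sample(self, rng: random.Random) -> Mixture:
        if not self.durations:
            raise RuntimeError("call fit() before generate()")
        mixture: Mixture = []
        start = 0.0
        for i in range(self.turns_per_mixture):
            duration = rng.choice(self.durations)
            # Simplification: strictly alternate between two speakers.
            mixture.append(Segment("simulated", f"spk{i % 2}", start, duration))
            gap = rng.gauss(self.gap_mean, self.gap_std)
            # Never let the next turn start before the current one does.
            start = max(start, start + duration + gap)
        return mixture
```

Usage would be along the lines of `gen = GapStatsGenerator().fit(train_segments)` followed by `for mixture in gen.generate(1000): ...` for lazy generation, or `list(gen.generate(1000))` for eager generation. Other techniques (HMM-based turn-taking, statistics from real conversations) would be further subclasses that override `fit()` and `_sample()`.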