Methods for simulating multi-speaker mixtures #823

desh2608 (Collaborator) · 2022-09-26T18:28:04Z

Several tasks, such as multi-talker ASR, continuous speech separation, and speaker diarization, are commonly solved by training a neural network on simulated multi-speaker mixtures. Several recent papers investigate the best strategies for simulating such mixtures (e.g., modeling turn-taking with HMMs, or using statistics from real conversational data). Some of these methods, from different groups, are listed below:

https://github.com/fgnt/mms_msg (Paderborn)
https://github.com/BUTSpeechFIT/EEND_dataprep (BUT) --> implemented in [workflow] Multi-talker meeting simulation #929
https://github.com/jackdeadman/turn-taking (Sheffield)

It may be useful to implement one or more of these simulation methods in Lhotse, since simulation is an important part of the data pipeline for these tasks. I am envisioning a MixtureGenerator class that can be extended for the different techniques. It should have a fit() method to learn statistics from a provided CutSet, and a generate() method to produce the desired number of mixtures as CutSets. Generation could be done either eagerly or lazily at data-loading time.
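To make the proposed interface concrete, here is a minimal sketch of what a MixtureGenerator base class and one toy subclass might look like. Everything in it is an assumption for illustration: it does not use Lhotse at all, `Utterance` is a hypothetical stand-in for a cut, lists of utterances stand in for CutSets, and `MeanGapMixtureGenerator` is a made-up technique (it only learns the mean gap between consecutive utterances) rather than any of the methods linked above.

```python
import random
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Iterator, List, Sequence


@dataclass
class Utterance:
    """Hypothetical stand-in for a Lhotse cut: one speaker's segment."""
    speaker: str
    start: float     # seconds on the recording/mixture timeline
    duration: float  # seconds


# A "mixture" here is a list of utterances placed on a shared timeline.
Mixture = List[Utterance]


class MixtureGenerator(ABC):
    """Sketch of the proposed interface (not an existing Lhotse class)."""

    @abstractmethod
    def fit(self, recordings: Sequence[Sequence[Utterance]]) -> "MixtureGenerator":
        """Learn simulation statistics from real recordings."""

    @abstractmethod
    def generate(self, utterances: Sequence[Utterance], num_mixtures: int) -> Iterator[Mixture]:
        """Lazily yield `num_mixtures` simulated mixtures."""


class MeanGapMixtureGenerator(MixtureGenerator):
    """Toy technique: space utterances by the mean gap observed in fit()."""

    def __init__(self, seed: int = 0):
        self.rng = random.Random(seed)
        self.mean_gap = 0.0  # negative values mean overlap on average

    def fit(self, recordings):
        gaps = []
        for rec in recordings:
            ordered = sorted(rec, key=lambda u: u.start)
            for prev, cur in zip(ordered, ordered[1:]):
                # Gap between the end of one utterance and the start of the next.
                gaps.append(cur.start - (prev.start + prev.duration))
        self.mean_gap = sum(gaps) / len(gaps) if gaps else 0.0
        return self

    def generate(self, utterances, num_mixtures, utts_per_mixture=3):
        pool = list(utterances)
        for _ in range(num_mixtures):
            chosen = self.rng.sample(pool, utts_per_mixture)
            offset, mixture = 0.0, []
            for utt in chosen:
                mixture.append(Utterance(utt.speaker, offset, utt.duration))
                # Next utterance starts after this one plus the learned gap.
                offset = max(0.0, offset + utt.duration + self.mean_gap)
            yield mixture
```

Because `generate()` is a Python generator in this sketch, the lazy mode comes for free by iterating over it during data loading, and the eager mode is just `list(gen.generate(pool, num_mixtures=1000))`. A real implementation would presumably take and return CutSets instead of plain lists.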

I don't have time to implement this right away, but I might take this up at some point in the future.

desh2608 (Collaborator, Author) · 2022-12-16T14:11:03Z

Some more simulation methods (and papers):

EEND (original from Hitachi): https://arxiv.org/abs/1909.06247 --> implemented in [workflow] Multi-talker meeting simulation #929
EEND conversational (new from Hitachi): http://arxiv.org/abs/2204.11232
