Speaker Diarization goes haywire due to small segments of audio #9523

AatikaNazneen · 2024-06-24T09:45:42Z

Describe the bug

I have a long audio recording of around 3 hours that spans multiple speakers. When the full recording is passed in, speaker diarization labels everything as a single speaker. When I break the audio into parts and pass each part separately, some parts get speakers assigned correctly, but the rest show the same bug. I identified some 1-minute chunks that cause the model to behave this way when they are included in the audio. I'm seeking possible explanations or solutions for this behavior, since I believe the model should be resilient to this.

I think this might be related to heavy speaker overlap combined with a large number of speakers, which may exceed NeMo's limit of 20 max speakers.
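For anyone trying to narrow this down the same way, here is a minimal sketch of how the 1-minute windows could be enumerated before cutting the audio and running each part separately. The helper name `chunk_windows` and its parameters are hypothetical, and it only computes offsets in seconds; the actual slicing and the diarization call are left out because they depend on your setup.

```python
import math


def chunk_windows(total_sec, chunk_sec=60.0):
    """Return (start, end) offsets in seconds that cover [0, total_sec].

    Hypothetical helper for illustration: the last window is shortened
    so that it never runs past the end of the recording.
    """
    total_sec = float(total_sec)
    chunk_sec = float(chunk_sec)
    if total_sec < 0 or chunk_sec <= 0:
        raise ValueError("total_sec must be >= 0 and chunk_sec must be > 0")

    n_chunks = math.ceil(total_sec / chunk_sec)
    windows = []
    for i in range(n_chunks):
        # Multiply instead of accumulating to avoid floating-point drift.
        start = i * chunk_sec
        end = min(start + chunk_sec, total_sec)
        windows.append((start, end))
    return windows


if __name__ == "__main__":
    # A 3-hour recording split into 1-minute windows.
    windows = chunk_windows(3 * 3600, 60)
    print(len(windows), windows[0], windows[-1])
```

Keeping the offsets around makes it easier to report exactly which windows trigger the single-speaker output.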

Steps/Code to reproduce bug

Test Speaker Diarization on the audio

Expected behavior

A clear and concise description of what you expected to happen.

Environment overview (please complete the following information)

Environment location: AWS
Method of NeMo install: pip install

Environment details

AWS Linux 2
PyTorch version: 2.3.1
Python version: 3.10

Additional context

GPU model

tango4j · 2024-08-15T00:15:08Z

Please note that clustering-based speaker diarization is a type of self-supervised machine learning system, not rule-based software. Thus, speaker diarization can generate incorrect results, and such behavior should be regarded as a limitation in accuracy, not a bug.

In particular, there is no guarantee that the speaker diarization system will generate the same speaker assignments for shorter clips truncated from the original audio as it does for the original audio.

Closing since there is no clear way to avoid this case.
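Because cluster labels are arbitrary, comparing a chunk's output against the full-audio output only makes sense up to a relabeling of speakers. The sketch below is one way to check that, assuming both runs have already been converted to per-frame label lists of equal length (that conversion is not shown). The function name `best_label_agreement` is hypothetical and is not part of NeMo; it brute-forces the mapping, so it is only practical for a handful of speakers.

```python
from itertools import permutations


def best_label_agreement(ref, hyp):
    """Fraction of frames that agree under the best one-to-one relabeling.

    ref, hyp: equal-length lists of per-frame speaker labels (strings).
    Brute force over mappings, so intended for small speaker counts only.
    """
    if len(ref) != len(hyp):
        raise ValueError("ref and hyp must have the same number of frames")
    if not ref:
        return 1.0

    ref_labels = sorted(set(ref))
    hyp_labels = sorted(set(hyp))
    # Pad with None so extra hyp labels can map to "no reference speaker".
    n = max(len(ref_labels), len(hyp_labels))
    targets = ref_labels + [None] * (n - len(ref_labels))

    best_hits = 0
    for perm in permutations(targets, len(hyp_labels)):
        mapping = dict(zip(hyp_labels, perm))
        hits = sum(1 for r, h in zip(ref, hyp) if mapping[h] == r)
        best_hits = max(best_hits, hits)
    return best_hits / len(ref)


if __name__ == "__main__":
    full_run = ["A", "A", "B", "B"]
    # Same segmentation, different label names: full agreement.
    print(best_label_agreement(full_run, ["spk1", "spk1", "spk0", "spk0"]))
    # Everything collapsed into one speaker: only half the frames can match.
    print(best_label_agreement(full_run, ["spk0", "spk0", "spk0", "spk0"]))
```

A low score after relabeling points to a real difference in clustering between the two runs, while a high score means only the label names changed.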

AatikaNazneen added the bug label Jun 24, 2024

elliottnv assigned nithinraok Jul 3, 2024

nithinraok assigned tango4j Aug 1, 2024

tango4j closed this as completed Aug 15, 2024
