Long recordings with an unknown number of speakers #62

NormanTUD · 2021-08-05T14:09:18Z

Hi, this is a great project that I have been waiting on for quite some time, and it works exceptionally well. So thanks for that, first.

But I want to achieve something quite complex I guess.

I have long audio recordings in which several people speak, and I do not know in advance how many there are.

My problems:

I do not have a big GPU or much RAM, so I need to split the recordings into chunks. But then I lose information. Say 3 people speak. From 00:00:00 to 00:02:00 (the chunk size that works on my computer) only the first 2 speak (let's call them speaker 1 and speaker 2). In the next chunk, only speaker 2 and speaker 3 speak, but they would again be labeled just "Speaker 1" and "Speaker 2", since I cannot find a way to carry information about previous utterances of different speakers over to the new run. Is there any way to do that?

I do not know how many speakers there are or when exactly they spoke. I don't care about giving them names; I only want a list like "speaker 1 spoke from ... to ... and from ... to ..., speaker 2 from ... to ..., ..., and speaker n from ... to ...".

Is this somehow realizable with this tool? I'm by no means an expert on how to tinker with the source code properly to achieve that, but it would be of immense help to have that ability, since Resemblyzer is the only diarization tool I've come across that really works as expected.
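
One possible direction for the cross-chunk bookkeeping described above, offered as a minimal sketch rather than anything this thread or the library confirms: keep a running centroid per speaker and match each new segment embedding against the known centroids by cosine similarity, opening a new speaker when nothing is similar enough. Everything named here is hypothetical (`SpeakerRegistry`, `merge_intervals`, `fake_embedding`, the 0.75 threshold), and the embeddings are synthetic stand-ins. Real ones would come from a speaker encoder, one per already-segmented stretch of speech, and the threshold would need tuning on real audio.

```python
import numpy as np


class SpeakerRegistry:
    """Keeps one running centroid per speaker across chunks (hypothetical helper)."""

    def __init__(self, threshold=0.75):
        self.threshold = threshold  # assumed cosine-similarity cutoff, needs tuning
        self.centroids = []         # one unit-length vector per known speaker
        self.counts = []            # how many embeddings were merged into each centroid

    def assign(self, embedding):
        """Return a global speaker index for one segment embedding."""
        e = np.asarray(embedding, dtype=float)
        e = e / np.linalg.norm(e)
        if self.centroids:
            sims = np.array([c @ e for c in self.centroids])
            best = int(np.argmax(sims))
            if sims[best] >= self.threshold:
                # Known speaker: fold the new embedding into the centroid, renormalize.
                n = self.counts[best]
                merged = self.centroids[best] * n + e
                self.centroids[best] = merged / np.linalg.norm(merged)
                self.counts[best] = n + 1
                return best
        # Nothing similar enough: register a new speaker.
        self.centroids.append(e)
        self.counts.append(1)
        return len(self.centroids) - 1


def merge_intervals(labelled):
    """Group (start, end, speaker) segments per speaker, joining back-to-back spans."""
    out = {}
    for start, end, spk in labelled:
        spans = out.setdefault(spk, [])
        if spans and abs(spans[-1][1] - start) < 1e-9:
            spans[-1] = (spans[-1][0], end)
        else:
            spans.append((start, end))
    return out


rng = np.random.default_rng(0)
voices = np.eye(8)[:3]  # three synthetic "voices", stand-ins for real embeddings


def fake_embedding(speaker):
    # Synthetic data only: a fixed direction per speaker plus a little noise.
    return voices[speaker] + 0.02 * rng.standard_normal(8)


# (start_s, end_s, true speaker) for two 2-minute chunks, processed one at a time
chunks = [
    [(0.0, 60.0, 0), (60.0, 120.0, 1)],      # chunk 1: first and second speaker
    [(120.0, 180.0, 1), (180.0, 240.0, 2)],  # chunk 2: second and third speaker
]

registry = SpeakerRegistry()
labelled = []
for chunk in chunks:
    for start, end, true_spk in chunk:
        labelled.append((start, end, registry.assign(fake_embedding(true_spk))))

timeline = merge_intervals(labelled)
for spk, spans in timeline.items():
    print(f"speaker {spk + 1}: {spans}")
```

Only the registry (a handful of small vectors) has to stay in memory between chunks, so the audio itself can still be processed in pieces. The speaker count is never fixed in advance; it grows whenever a segment matches no existing centroid.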

milind-soni · 2021-08-29T08:05:49Z

Hey! Did you find any solution to this problem?

NormanTUD · 2021-11-06T12:02:48Z

No, I have not yet found a solution. Sorry.
