Hi, this is a great project I've been waiting on for quite some time, and it works exceptionally well. So first of all, thanks for that.
Now I want to achieve something that is, I guess, quite complex.
I have long audio recordings in which several people speak, and I do not know in advance how many there are.
My problems:
I do not have a big GPU or much RAM, so I need to split the recordings into chunks, but then I lose information. Say 3 people speak in total: from 00:00:00 to 00:02:00 (the chunk size that works on my computer) only the first two speak (let's call them speaker 1 and speaker 2), and in the next chunk only speaker 2 and speaker 3 speak. Both chunks then end up labelled "Speaker 1" and "Speaker 2", because I cannot find a way to carry information about speakers from previous chunks over to the next run. Is there any way to do that? (I sketch the kind of approach I'm imagining at the end of this post.)
I do not know how many speakers there are or when exactly they spoke. I don't care about giving them names; I only want a list like "speaker 1 spoke from ... to ... and from ... to ... and so on, speaker 2 from ... to ... and so on, ..., speaker n from ... to ... and so on".
Is this somehow realizable with this tool? I'm by no means an expert at tinkering with the source code to achieve that myself, but having this ability would be of immense help, since Resemblyzer is the only diarization tool I've come across that really works as expected.
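To make it concrete, here is roughly the approach I have in mind, as a rough sketch rather than working code: embed each chunk separately (so memory stays bounded), keep all partial embeddings together with absolute timestamps, and only afterwards cluster everything in one pass so the speaker labels stay consistent across chunks. The file name, chunk length, rate, and distance threshold are placeholders/guesses, and the clustering step uses scikit-learn (1.2+, where the parameter is called `metric`), not anything built into Resemblyzer:

```python
import numpy as np
from resemblyzer import VoiceEncoder, preprocess_wav, sampling_rate
from sklearn.cluster import AgglomerativeClustering

# Guessed settings: a 2-minute chunk fits in my RAM, 2 partial embeddings per second
CHUNK_SECONDS = 120
RATE = 2

wav = preprocess_wav("long_recording.wav")  # placeholder file name
encoder = VoiceEncoder("cpu")

all_embeds = []
all_times = []

# Embed the recording chunk by chunk so only one chunk is processed at a time,
# but keep the partial embeddings (with absolute timestamps) from every chunk.
chunk_len = CHUNK_SECONDS * sampling_rate
for start in range(0, len(wav), chunk_len):
    chunk = wav[start:start + chunk_len]
    if len(chunk) < 2 * sampling_rate:
        continue  # skip a trailing chunk too short to embed
    _, partial_embeds, wav_splits = encoder.embed_utterance(
        chunk, return_partials=True, rate=RATE)
    all_embeds.append(partial_embeds)
    # wav_splits are slices relative to the chunk; shift them to absolute time
    all_times.extend((start + s.start) / sampling_rate for s in wav_splits)

all_embeds = np.concatenate(all_embeds)

# Cluster ALL partial embeddings at once, so the same voice gets the same label
# no matter which chunk it appeared in. The distance threshold decides when two
# voices count as different speakers; 0.5 is only a guess and needs tuning.
clustering = AgglomerativeClustering(n_clusters=None, distance_threshold=0.5,
                                     metric="cosine", linkage="average")
labels = clustering.fit_predict(all_embeds)

for t, label in zip(all_times, labels):
    print(f"speaker {label + 1} around {t:7.1f} s")
```

If something like this is reasonable, the remaining step would just be merging consecutive windows with the same label into "speaker n spoke from ... to ..." ranges. One caveat I'm aware of: preprocess_wav trims long silences, so the timestamps above would refer to the preprocessed signal rather than the original file.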