Long recordings with an unknown number of speakers #62

Open
NormanTUD opened this issue Aug 5, 2021 · 2 comments

Comments

@NormanTUD

Hi, this is a great project that I've been waiting on for quite some time, and it works exceptionally well. So thanks for that, first.

But I want to achieve something quite complex, I guess.

I have long audio recordings in which several people speak, and I do not know in advance how many there are.

My problems:

  • I do not have a big GPU or much RAM, so I need to split the recordings into chunks. But then I lose information. Say 3 people speak in total. From 00:00:00 to 00:02:00 (the chunk size that works on my computer) only the first two speak (let's call them speaker 1 and speaker 2). In the next chunk, only speaker 2 and speaker 3 speak, but they would again be labeled just "Speaker 1" and "Speaker 2", since I cannot find a way to carry information about the previous utterances of different speakers over to the new run. Is there any way to do that?
  • I do not know how many speakers there are or when exactly they spoke. I don't care about giving them names; I only want a list like "speaker 1 spoke from ... to ... and from ... to ..., speaker 2 from ... to ..., ..., and speaker n from ... to ...".

Is this somehow realizable with this tool? I'm by no means an expert on how to tinker with the source code properly to achieve that, but it would be of immense help to have that ability, since Resemblyzer is the only diarization tool I've come across that really works as expected.
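
To make the two points above concrete, here is a minimal sketch of the bookkeeping I have in mind. It is not based on Resemblyzer's API: it assumes that some encoder has already produced one embedding vector per speech segment, and it uses toy NumPy vectors in their place. `SpeakerRegistry`, `diarize_chunks`, and the `0.75` threshold are hypothetical names and values chosen for illustration only. The idea is to keep one running centroid per speaker across chunks, and to open a new speaker whenever a segment is not similar enough to any known centroid, so the number of speakers does not have to be known in advance.

```python
import numpy as np


class SpeakerRegistry:
    """Keeps one running centroid per speaker across chunks (hypothetical helper)."""

    def __init__(self, threshold=0.75):
        # Assumed cosine-similarity cutoff; it would need tuning on real data.
        self.threshold = threshold
        self.sums = []  # per speaker: sum of unit-length embeddings seen so far

    def assign(self, embedding):
        """Return the 0-based speaker index for this embedding."""
        e = np.asarray(embedding, dtype=float)
        e = e / np.linalg.norm(e)
        if self.sums:
            centroids = np.stack([s / np.linalg.norm(s) for s in self.sums])
            sims = centroids @ e  # cosine similarity to every known speaker
            best = int(np.argmax(sims))
            if sims[best] >= self.threshold:
                self.sums[best] += e  # refine the matched speaker's centroid
                return best
        # No known speaker is close enough: register a new one.
        self.sums.append(e.copy())
        return len(self.sums) - 1


def diarize_chunks(chunks, registry):
    """chunks: iterable of lists of (start, end, embedding), absolute times in seconds."""
    timeline = {}
    for chunk in chunks:
        for start, end, emb in chunk:
            spk = registry.assign(emb)
            spans = timeline.setdefault(spk, [])
            if spans and abs(spans[-1][1] - start) < 1e-6:
                # Same speaker continues across a segment or chunk boundary.
                spans[-1] = (spans[-1][0], end)
            else:
                spans.append((start, end))
    return timeline


# Toy stand-ins for real speaker embeddings (assumption: one vector per segment).
a, b, c = [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]
chunk1 = [(0.0, 60.0, a), (60.0, 120.0, b)]     # speakers 1 and 2
chunk2 = [(120.0, 180.0, b), (180.0, 240.0, c)]  # speakers 2 and 3

registry = SpeakerRegistry()
timeline = diarize_chunks([chunk1, chunk2], registry)
for spk, spans in sorted(timeline.items()):
    print(f"speaker {spk + 1}:", ", ".join(f"{s:.0f}s to {e:.0f}s" for s, e in spans))
```

Because the registry object outlives each chunk, only one chunk has to be in memory at a time. Whether a single fixed threshold separates real voices reliably is an open question here; this is only a sketch of the desired behavior, not a tested solution.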

@milind-soni


Hey! did you find any solution to this problem?

@NormanTUD
Author

Hey! did you find any solution to this problem?

No, I have not yet found a solution. Sorry.
