Right ASR Wrong Speaker diarization #303
Comments
There are known problems with diarization on long audio files. The current workaround is to split the audio into shorter segments.
How long should each segment generally be?
Segments of 1 hour are tested to work well, so just split the audio every hour.
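As a rough illustration of the hourly split suggested above, here is a minimal Python sketch. It only computes the segment boundaries and builds the matching ffmpeg command lines; it does not run them. The helper names (`segment_bounds`, `ffmpeg_split_commands`), the output file pattern `chunk_000.wav`, and the choice of 16 kHz mono WAV output are assumptions made for this example and are not part of this project.

```python
from typing import List, Tuple


def segment_bounds(total_seconds: float,
                   chunk_seconds: float = 3600.0) -> List[Tuple[float, float]]:
    """Return (start, duration) pairs that cover the whole recording.

    Hypothetical helper: the last segment is shorter when the total
    length is not a multiple of chunk_seconds.
    """
    if total_seconds < 0 or chunk_seconds <= 0:
        raise ValueError("total_seconds must be >= 0 and chunk_seconds > 0")
    bounds = []
    start = 0.0
    while start < total_seconds:
        duration = min(chunk_seconds, total_seconds - start)
        bounds.append((start, duration))
        start += chunk_seconds
    return bounds


def ffmpeg_split_commands(src: str, total_seconds: float,
                          chunk_seconds: float = 3600.0) -> List[List[str]]:
    """Build one ffmpeg argument list per segment (nothing is executed)."""
    commands = []
    for index, (start, duration) in enumerate(
            segment_bounds(total_seconds, chunk_seconds)):
        out_name = f"chunk_{index:03d}.wav"  # assumed naming pattern
        commands.append([
            "ffmpeg", "-y",
            "-ss", str(start),     # seek to the segment start
            "-t", str(duration),   # read only this segment's duration
            "-i", src,
            "-vn",                 # drop any video stream
            "-ac", "1",            # mono (assumed preference)
            "-ar", "16000",        # 16 kHz (assumed preference)
            out_name,
        ])
    return commands


if __name__ == "__main__":
    # Example: a 2.5 hour recording yields three segments.
    for cmd in ffmpeg_split_commands("input.mp4", 2.5 * 3600):
        print(" ".join(cmd))
```

Each command can be passed to `subprocess.run(cmd, check=True)` and the resulting files diarized one at a time. Note that speaker labels produced for separate segments are independent of each other, so "Speaker 0" in one segment is not necessarily the same person as "Speaker 0" in the next.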
As I mentioned above, my test video is only 10 minutes long. The automatic speech recognition (ASR) output is relatively accurate, but the speaker identification (speaker ID) is not.
Hello, I used the script to perform speaker diarization on a segment of audio. The resulting timings and text are relatively accurate. However, the audio contains five speakers, and this method detected only one, labeling all content as Speaker 0. Are there any parameters I could adjust to improve this result? This is the command I ran:
python3 diarize.py -a input.mp4 --whisper-model faster-whisper-large-v3