Speaker diarization using stereo channels #585
Replies: 4 comments 6 replies
-
I use srt (linked below) to merge subtitle files in different languages into one. It doesn't work perfectly for my use case, but you could run Whisper separately on each audio channel, then merge the left- and right-channel subtitle files using https://github.com/cdown/srt/blob/develop/srt_tools/srt-mux

Update: you can split the left and right audio channels using ffmpeg, which Whisper requires anyway.
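The split-then-transcribe flow above can be sketched by building the two CLI commands in Python. This is a minimal sketch, assuming the `ffmpeg` and openai-whisper `whisper` CLIs are on `PATH`; the file names (`call.wav`, `left.wav`, `right.wav`) are placeholders:

```python
import subprocess


def split_stereo_cmd(stereo_path: str, left_path: str, right_path: str) -> list:
    """Build an ffmpeg command that splits a stereo file into two mono files."""
    return [
        "ffmpeg", "-y", "-i", stereo_path,
        # channelsplit keeps each channel on the original timeline,
        # so per-channel subtitle timestamps stay aligned.
        "-filter_complex", "channelsplit=channel_layout=stereo[L][R]",
        "-map", "[L]", left_path,
        "-map", "[R]", right_path,
    ]


def whisper_cmd(audio_path: str) -> list:
    """Build a whisper CLI command that writes an .srt file for the audio."""
    return ["whisper", audio_path, "--output_format", "srt"]


def run_pipeline(stereo_path: str) -> None:
    """Split and transcribe each channel (requires ffmpeg/whisper installed)."""
    subprocess.run(split_stereo_cmd(stereo_path, "left.wav", "right.wav"), check=True)
    for mono in ("left.wav", "right.wav"):
        subprocess.run(whisper_cmd(mono), check=True)
```

After this you'd have `left.srt` and `right.srt` to feed into srt-mux (or any merge of your own).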
-
Are you still happy with the steps from your Edit #2? I'm working on the same scenario and your logic seems solid; I'll see how far I get with that strategy!
-
Hi @erickalfaro, Announcement: #1537 |
-
Hello, |
-
I recently realized that the audio files I am working with have two channels: the left channel for speaker 1 and the right channel for speaker 2.
If I could parse out the timestamps of each channel, then I would have a surefire way of transcribing the audio of each speaker.
Does anyone know how I can identify the audio timestamps for each channel? (I'm also working with thousands of audio files, some over 1 hour long.)
Edit #1:
I've found that the pydub library has a split_to_mono function that splits out each channel as a separate piece of audio. It does not return the timestamps of each channel; it simply returns the entirety of each channel as a separate audio variable.
I wonder if there is a way to keep the timestamps intact for each channel?
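For what it's worth, the timestamps shouldn't need any special handling: each mono channel spans exactly the same timeline as the stereo original, so timestamps produced when transcribing either channel already line up with the source file. A pydub-free sketch of the same split using only the standard library `wave` module (assumes 16-bit PCM stereo; the file names are placeholders):

```python
import wave


def split_to_mono_wav(stereo_path: str, left_path: str, right_path: str) -> None:
    """Split a 16-bit PCM stereo WAV into two mono WAVs on the same timeline."""
    with wave.open(stereo_path, "rb") as src:
        assert src.getnchannels() == 2 and src.getsampwidth() == 2
        framerate = src.getframerate()
        raw = src.readframes(src.getnframes())

    # Frames are interleaved: [L0, R0, L1, R1, ...], 2 bytes per sample.
    left, right = bytearray(), bytearray()
    for i in range(0, len(raw), 4):
        left += raw[i:i + 2]
        right += raw[i + 2:i + 4]

    for path, data in ((left_path, left), (right_path, right)):
        with wave.open(path, "wb") as dst:
            dst.setnchannels(1)
            dst.setsampwidth(2)
            dst.setframerate(framerate)
            dst.writeframes(bytes(data))
```

Because both mono files have the same frame count and frame rate as the original, second `N` of `left.wav` is second `N` of the stereo file.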
Edit #2:
I've found a series of steps that may work for me:
Steps 1-3 on a four-hour audio file completed in under 20 seconds for me.
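The final merge, which srt-mux handles in the comment above, can also be done by hand: tag each channel's captions with a speaker label and interleave them by start time. A minimal sketch, assuming the captions have already been parsed into (start_seconds, end_seconds, text) tuples; the speaker names are placeholders:

```python
def merge_channels(left, right, left_name="Speaker 1", right_name="Speaker 2"):
    """Interleave two per-channel caption lists into one speaker-labelled
    transcript, ordered by start time. Each caption is (start_s, end_s, text)."""
    tagged = [(s, e, f"{left_name}: {t}") for s, e, t in left]
    tagged += [(s, e, f"{right_name}: {t}") for s, e, t in right]
    return sorted(tagged, key=lambda cap: (cap[0], cap[1]))
```

Since each channel was transcribed on the original timeline, sorting by start time is enough to reconstruct the turn-taking between the two speakers.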