Speaker diarization using stereo channels #585
Replies: 4 comments 6 replies
-
I use srt (linked below) to merge subtitle files in different languages into one. It doesn't work perfectly for my use case, but you could run Whisper separately on each audio channel, then merge the left- and right-channel subtitle files using https://github.com/cdown/srt/blob/develop/srt_tools/srt-mux

Update: you can split the left and right audio channels using ffmpeg, which Whisper requires anyway.
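The split-then-transcribe flow above can be sketched by building the two CLI commands in Python. This is a minimal sketch, assuming the `ffmpeg` and openai-whisper `whisper` CLIs are on `PATH`; the file names (`call.wav`, `left.wav`, `right.wav`) are placeholders:

```python
import subprocess


def split_stereo_cmd(stereo_path: str, left_path: str, right_path: str) -> list:
    """Build an ffmpeg command that splits a stereo file into two mono files."""
    return [
        "ffmpeg", "-y", "-i", stereo_path,
        # channelsplit keeps each channel on the original timeline,
        # so per-channel subtitle timestamps stay aligned.
        "-filter_complex", "channelsplit=channel_layout=stereo[L][R]",
        "-map", "[L]", left_path,
        "-map", "[R]", right_path,
    ]


def whisper_cmd(audio_path: str) -> list:
    """Build a whisper CLI command that writes an .srt file for the audio."""
    return ["whisper", audio_path, "--output_format", "srt"]


def run_pipeline(stereo_path: str) -> None:
    """Split and transcribe each channel (requires ffmpeg/whisper installed)."""
    subprocess.run(split_stereo_cmd(stereo_path, "left.wav", "right.wav"), check=True)
    for mono in ("left.wav", "right.wav"):
        subprocess.run(whisper_cmd(mono), check=True)
```

After this you'd have `left.srt` and `right.srt` to feed into srt-mux (or any merge of your own).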
-
Are you still happy with the steps from your Edit #2? I'm working on the same scenario and your logic seems solid; I'll see how far I get with that strategy!
-
Hi @erickalfaro, Announcement: #1537 |
-
Hello, |
-
I recently realized that the audio files I am working with have two channels: the left channel for speaker 1 and the right channel for speaker 2.
If I could parse out the timestamps of each channel, then I would have a surefire way of transcribing the audio of each speaker.
Does anyone know how I can identify the audio timestamps for each channel? (I'm also working with thousands of audio files, some over 1 hour long.)
Edit #1:
I've found that the pydub library has a split_to_mono function that splits out each channel as a separate piece of audio. It does not return the timestamps of each channel; it simply returns the entirety of each channel as a separate audio variable.
I wonder if there is a way to keep the timestamps intact for each channel?
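For what it's worth, the timestamps shouldn't need any special handling: each mono channel spans exactly the same timeline as the stereo original, so timestamps produced when transcribing either channel already line up with the source file. A pydub-free sketch of the same split using only the standard library `wave` module (assumes 16-bit PCM stereo; the file names are placeholders):

```python
import wave


def split_to_mono_wav(stereo_path: str, left_path: str, right_path: str) -> None:
    """Split a 16-bit PCM stereo WAV into two mono WAVs on the same timeline."""
    with wave.open(stereo_path, "rb") as src:
        assert src.getnchannels() == 2 and src.getsampwidth() == 2
        framerate = src.getframerate()
        raw = src.readframes(src.getnframes())

    # Frames are interleaved: [L0, R0, L1, R1, ...], 2 bytes per sample.
    left, right = bytearray(), bytearray()
    for i in range(0, len(raw), 4):
        left += raw[i:i + 2]
        right += raw[i + 2:i + 4]

    for path, data in ((left_path, left), (right_path, right)):
        with wave.open(path, "wb") as dst:
            dst.setnchannels(1)
            dst.setsampwidth(2)
            dst.setframerate(framerate)
            dst.writeframes(bytes(data))
```

Because both mono files have the same frame count and frame rate as the original, second `N` of `left.wav` is second `N` of the stereo file.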
Edit #2:
I've found a series of steps that may work for me:
Steps 1-3 on a four-hour audio file completed in under 20 seconds for me.
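The final merge, which srt-mux handles in the comment above, can also be done by hand: tag each channel's captions with a speaker label and interleave them by start time. A minimal sketch, assuming the captions have already been parsed into (start_seconds, end_seconds, text) tuples; the speaker names are placeholders:

```python
def merge_channels(left, right, left_name="Speaker 1", right_name="Speaker 2"):
    """Interleave two per-channel caption lists into one speaker-labelled
    transcript, ordered by start time. Each caption is (start_s, end_s, text)."""
    tagged = [(s, e, f"{left_name}: {t}") for s, e, t in left]
    tagged += [(s, e, f"{right_name}: {t}") for s, e, t in right]
    return sorted(tagged, key=lambda cap: (cap[0], cap[1]))
```

Since each channel was transcribed on the original timeline, sorting by start time is enough to reconstruct the turn-taking between the two speakers.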