I'm using the web client on my laptop in my office, running in the background, with Deepgram as the transcription provider.
My cofounder and I are routinely identified as "Bob" in the summaries, which is annoying. We're the only two people in the office, and at a minimum I want our two voices identified.
When I use diarization in Deepgram myself, it identifies people as Speaker 0, Speaker 1, etc., so I know it has the necessary features built in and can do this.
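For reference, this is the kind of behavior I mean; a minimal sketch of a diarization request against Deepgram's prerecorded API (the file name and API key are placeholders):

```python
# Minimal sketch of requesting diarization from Deepgram's prerecorded API;
# the audio file name and API key are placeholders.
import requests

DEEPGRAM_API_KEY = "your-api-key"

with open("office_recording.wav", "rb") as audio_file:
    response = requests.post(
        "https://api.deepgram.com/v1/listen",
        params={"diarize": "true", "utterances": "true", "punctuate": "true"},
        headers={
            "Authorization": f"Token {DEEPGRAM_API_KEY}",
            "Content-Type": "audio/wav",
        },
        data=audio_file,
    )

# With diarize + utterances enabled, each utterance carries a numeric speaker id.
for utterance in response.json()["results"]["utterances"]:
    print(f"Speaker {utterance['speaker']}: {utterance['transcript']}")
```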
Original conversation:
NinjaA — Today at 1:39 PM
alright. Let me know how to upload voice embeddings so it recognizes me and a couple other key speakers
or even just me
etown — Today at 1:40 PM
You can provide an audio file and then set the voice sample configuration
NinjaA — Today at 4:50 PM
I do not see instructions for setting this speaker_verification_audio path except in test code and inside a file called async_whisper_transcription_server. Not sure if the latter is used when the transcription service provider is Deepgram. If you give me high-level feedback, adding this functionality to the docs/code could be my first contribution. I can also open a GitHub issue, let me know.
etown — Today at 4:52 PM
That would be amazing! It was not integrated with everything, but it should be.
Right now you can only specify one sample, and verification only happens during the final transcription, and only if you’re using Whisper.
At a high level:
We have two types of abstract STT services: streaming and async
Streaming is done in real time and is mainly for the upcoming assistant/agent stuff
But when a conversation ends, it goes to async transcription
Both of these services can be configured to use different providers, such as Whisper or Deepgram
Here is where verification occurs:
https://github.com/OwlAIProject/Owl/blob/main/owl/services/stt/asynchronous/async_whisper/async_whisper_transcription_server.py#L103
What it does is take each utterance, compute its embedding, and compare it against the sample; if the similarity reaches a threshold, it overrides the generic speaker name from diarization
It uses SpeechBrain, but could use any voice embedding model
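For illustration, a minimal sketch of that embedding-and-threshold check using SpeechBrain's pretrained speaker-recognition model; the threshold value and file names are assumptions, not what Owl actually uses:

```python
# Sketch of the embedding-and-threshold check described above, using SpeechBrain's
# pretrained ECAPA-TDNN speaker model. The threshold and file names are illustrative.
# (Newer SpeechBrain versions expose this class under speechbrain.inference.speaker
# instead of speechbrain.pretrained.)
from speechbrain.pretrained import SpeakerRecognition

verifier = SpeakerRecognition.from_hparams(
    source="speechbrain/spkrec-ecapa-voxceleb",
    savedir="pretrained_models/spkrec-ecapa-voxceleb",
)

# score is a cosine-similarity-style value; prediction is True if it clears
# the model's built-in decision threshold.
score, prediction = verifier.verify_files("my_voice_sample.wav", "utterance_042.wav")

SIMILARITY_THRESHOLD = 0.25  # illustrative value
if score.item() >= SIMILARITY_THRESHOLD:
    speaker_name = "NinjaA"      # override the generic label from diarization
else:
    speaker_name = "Speaker 1"   # keep whatever diarization produced
print(speaker_name, round(score.item(), 3))
```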
I think ideally we would have a separate service for speaker identification
It would probably take the audio file, the transcript, and a list of known speakers (name, embedding); it would then do the similarity comparison and return the transcript with the updated speaker names
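A rough sketch of what that service's interface might look like; none of these names (`identify_speakers`, `KnownSpeaker`, etc.) exist in Owl yet, this is just one possible shape:

```python
# Hypothetical interface for the proposed speaker-identification service.
# None of these names exist in Owl yet; this is just one possible shape.
from dataclasses import dataclass
from typing import List

import numpy as np


@dataclass
class KnownSpeaker:
    name: str
    embedding: np.ndarray  # precomputed voice embedding for this person


@dataclass
class Utterance:
    speaker: str  # generic label from diarization, e.g. "Speaker 0"
    start: float  # seconds into the audio file
    end: float
    text: str


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def compute_embedding(audio_path: str, start: float, end: float) -> np.ndarray:
    """Placeholder: slice [start, end] out of the audio and run it through
    whatever embedding model the service is configured with (SpeechBrain,
    pyannote, etc.)."""
    raise NotImplementedError


def identify_speakers(
    audio_path: str,
    transcript: List[Utterance],
    known_speakers: List[KnownSpeaker],
    threshold: float = 0.25,  # illustrative value
) -> List[Utterance]:
    """Relabel generic diarization speakers with known names where possible."""
    for utterance in transcript:
        embedding = compute_embedding(audio_path, utterance.start, utterance.end)
        best = max(known_speakers, key=lambda s: cosine_similarity(embedding, s.embedding))
        if cosine_similarity(embedding, best.embedding) >= threshold:
            utterance.speaker = best.name
    return transcript
```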
etown — Today at 4:59 PM
This way it could work for any provider (Whisper/Deepgram/etc.) and for both streaming and async transcription
It should not be hard to move it, but even a step in that direction would be amazing. Created stt; we can discuss more there
https://github.com/pyannote/pyannote-audio/blob/develop/pyannote/audio/pipelines/speaker_verification.py
There are also a lot of other embedding models besides SpeechBrain; it would be cool to test more of them for accuracy
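As one example, a rough sketch of swapping in an embedding backend through pyannote.audio's `PretrainedSpeakerEmbedding` wrapper (from the file linked above); it assumes a recent pyannote.audio version, and the model name, file paths, and segment times are placeholders:

```python
# Rough sketch of comparing embeddings via pyannote.audio's
# PretrainedSpeakerEmbedding wrapper. Model name, file paths, and
# segment times are placeholders.
import torch
from scipy.spatial.distance import cdist
from pyannote.audio import Audio
from pyannote.core import Segment
from pyannote.audio.pipelines.speaker_verification import PretrainedSpeakerEmbedding

model = PretrainedSpeakerEmbedding(
    "speechbrain/spkrec-ecapa-voxceleb",  # could also try "pyannote/embedding", etc.
    device=torch.device("cpu"),
)

audio = Audio(sample_rate=16000, mono="downmix")

# Embed a known voice sample and one utterance from a conversation.
sample_waveform, _ = audio.crop("my_voice_sample.wav", Segment(0.0, 5.0))
utterance_waveform, _ = audio.crop("conversation.wav", Segment(12.3, 15.8))

sample_embedding = model(sample_waveform[None])      # shape: (1, dimension)
utterance_embedding = model(utterance_waveform[None])

# Cosine distance: lower means more likely the same speaker.
distance = cdist(sample_embedding, utterance_embedding, metric="cosine")[0, 0]
print(f"cosine distance: {distance:.3f}")
```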