
whatdisay

"What'd I Say?!"

A Python utility to generate a diarized transcript from an audio file, leveraging OpenAI's Whisper model for transcription and Deepgram for diarization.

The project leverages the following libraries/APIs:

  • OpenAI's Whisper model: Used for speech recognition.
  • Deepgram: The default solution for speaker diarization. You'll need to create an account and get an API key if you want to use this library's speaker diarization capabilities. Deepgram also provides transcription functionality, but it's not as good as Whisper's, so this library only uses Deepgram's diarization and relies on Whisper to generate the transcriptions (see the sketch after this list). (Deepgram does have a beta option to set its model to "whisper" for transcription, but at this time that mode does not support diarization.)
  • Pyannote: While Deepgram is the default, the library also supports Pyannote for speaker diarization. This option is best if you would like to use Pyannote's tooling for annotating your own dataset to improve the accuracy of speaker diarization.
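
The overall pipeline looks roughly like this: Whisper produces timestamped text segments, the diarization backend produces speaker turns, and the two are merged by time overlap. Below is a minimal sketch of that merge step, not this package's actual internals; the diarization turns are hard-coded stand-ins for real Deepgram output.

import whisper

def overlap(a_start, a_end, b_start, b_end):
    # Length of the intersection of two time intervals, in seconds.
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))

def assign_speakers(segments, turns):
    # Label each Whisper segment with the speaker whose turn overlaps it most.
    labeled = []
    for seg in segments:
        best = max(turns, key=lambda t: overlap(seg["start"], seg["end"], t[0], t[1]))
        labeled.append((best[2], seg["text"].strip()))
    return labeled

result = whisper.load_model("large").transcribe("audio_filename.wav")
turns = [(0.0, 4.2, "SPEAKER_1"), (4.2, 9.8, "SPEAKER_2")]  # hypothetical diarization output
for speaker, text in assign_speakers(result["segments"], turns):
    print(f"{speaker}: {text}")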

Setup

The following command will pull and install the latest commit from this repository, along with its Python dependencies:

pip install git+https://github.com/samjhecht/whatdisay.git

To update the package to the latest version of this repository, please run:

pip install --upgrade --no-deps --force-reinstall git+https://github.com/samjhecht/whatdisay.git

CLI Usage

First, run the --configure command to configure the library. If you don't plan to use the speaker diarization features, you can simply leave the config properties blank, but you'll still be required to create a config.yaml file the first time you run the CLI. You'll be prompted to update it later if you attempt to use functionality that requires a property that was not set up front.

whatdisay --configure

The following command will take an audio file and generate a transcription using OpenAI Whisper:

whatdisay --transcript audio_filename.wav

To generate a diarized transcript:

whatdisay --transcript audio_filename.wav --diarize

Currently, only WAV files are supported as audio inputs.
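
If your audio is in another format, you'll need to convert it to WAV first. A minimal sketch using pydub (not a dependency of this package; it requires ffmpeg to be installed):

from pydub import AudioSegment

# Convert any ffmpeg-readable file (mp3, m4a, ...) to WAV for whatdisay.
AudioSegment.from_file("meeting.m4a").export("meeting.wav", format="wav")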

By default, it will use Whisper's large model and Deepgram's "Enhanced" tier meeting model. If you would like to use other available models, you can change either one via your config.yaml file. Documentation on available models can be found here for Deepgram and here for Whisper.
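
For reference, here is roughly what those defaults map to in code. This is a sketch assuming the pre-v3 ("v2") Deepgram Python SDK, not this package's actual internals; the option names ("diarize", "tier", "model") come from Deepgram's prerecorded-audio API.

import whisper
from deepgram import Deepgram

model = whisper.load_model("medium")  # any Whisper size: tiny, base, small, medium, large

async def diarize(path, api_key):
    dg = Deepgram(api_key)
    with open(path, "rb") as audio:
        source = {"buffer": audio, "mimetype": "audio/wav"}
        # "enhanced" tier + "meeting" model mirror this package's defaults.
        return await dg.transcription.prerecorded(
            source, {"diarize": True, "tier": "enhanced", "model": "meeting"}
        )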

TODOs:

  • Add support for customizing the transcription output directory.
  • Add support for input audio file types other than WAV.
  • Make transcription and diarization faster for long files by using asyncio for the Whisper transcription step.
  • Add a tool that assists with cleanup after diarization is complete, letting the user replace the values 'SPEAKER_1', 'SPEAKER_2', etc. with human names.
  • Potentially add an option to parallelize Whisper transcription when running --transcript without --diarize, by splitting a big file into chunks and running multiple async Whisper tasks (see the sketch below).
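
A rough sketch of that last chunk-and-parallelize idea. Whisper inference is compute-bound, so asyncio is paired with a process pool here; the chunk files are assumed to already exist (e.g. produced by splitting the input WAV with pydub).

import asyncio
from concurrent.futures import ProcessPoolExecutor
import whisper

def transcribe_chunk(path):
    # Each worker process loads its own model instance.
    return whisper.load_model("base").transcribe(path)["text"]

async def transcribe_chunks(chunk_paths):
    loop = asyncio.get_running_loop()
    with ProcessPoolExecutor() as pool:
        texts = await asyncio.gather(
            *(loop.run_in_executor(pool, transcribe_chunk, p) for p in chunk_paths)
        )
    return " ".join(texts)

# e.g. print(asyncio.run(transcribe_chunks(["part1.wav", "part2.wav"])))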
