
A Xircuits component library for transcribing audio into text with speaker diarization. This library provides components for:
- Loading and processing audio files
- Performing speaker diarization (identifying who spoke when)
- Transcribing speech to text
- Combining diarization and transcription results
- Saving formatted transcripts
This component library requires:
- Access to the Hugging Face Hub models:
  - You need to accept the terms of use for the pyannote models at:
  - A Hugging Face access token for the diarization models
- Sufficient disk space for the downloaded models (approximately 1-2 GB)
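The pyannote diarization models are gated, so the access token has to be available when the workflow runs. The following is a minimal sketch that assumes the token is supplied through the `HF_TOKEN` environment variable, a convention the `huggingface_hub` client also reads. The helper name `get_hf_token` is hypothetical and not part of this library.

```python
import os


def get_hf_token(env_var: str = "HF_TOKEN") -> str:
    """Return the Hugging Face access token from the environment.

    Hypothetical helper: it fails early with a clear message instead of
    letting the gated pyannote model download fail later with an
    authorization error.
    """
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(
            f"{env_var} is not set. Create an access token in your Hugging Face "
            "account settings and accept the pyannote model terms first."
        )
    return token
```

With pyannote.audio 3.x, a token obtained this way is typically passed to `Pipeline.from_pretrained` through its `use_auth_token` argument. How this library's diarization component consumes the token may differ.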
To use this component library, ensure you have Xircuits installed, then simply run:
xircuits install https://github.com/xpressai/xai-transcribe
Alternatively, you can manually copy or clone the repository (or add it as a Git submodule) into your working Xircuits project directory, then install the required packages using:
pip install -r requirements.txt
The library provides components for a complete audio transcription pipeline:
- TranscribeLoadAudioFile: Load an audio file or use a sample dataset
- TranscribeSpeakerDiarization: Identify different speakers in the audio
- TranscribeSpeechTranscription: Transcribe the audio to text with timestamps
- TranscribeCombineDiarizationAndTranscription: Combine speaker information with the transcription
- TranscribeSaveTranscriptToFile: Save the formatted transcript to a file
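The combine step has to reconcile two independent timelines: speaker turns from diarization and timestamped text from transcription. One common approach, shown here as a sketch and not necessarily what TranscribeCombineDiarizationAndTranscription does internally, is to give each transcribed segment the speaker whose turns overlap it for the longest total time. The dataclasses and function names below are hypothetical, and all times are in seconds.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SpeakerTurn:
    """Diarization output: a speaker label active from start to end."""
    start: float
    end: float
    speaker: str


@dataclass
class TextSegment:
    """Transcription output: recognized text with its time span."""
    start: float
    end: float
    text: str


@dataclass
class LabeledSegment:
    """A text segment annotated with the speaker assigned to it."""
    start: float
    end: float
    speaker: str
    text: str


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length of the intersection of two time intervals (0.0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def combine(turns: List[SpeakerTurn], segments: List[TextSegment],
            unknown: str = "UNKNOWN") -> List[LabeledSegment]:
    """Label each text segment with the speaker that overlaps it the most."""
    labeled = []
    for seg in segments:
        # Sum overlap per speaker, since one speaker may have several
        # separate turns that intersect the same text segment.
        totals: Dict[str, float] = {}
        for turn in turns:
            ov = overlap(seg.start, seg.end, turn.start, turn.end)
            if ov > 0:
                totals[turn.speaker] = totals.get(turn.speaker, 0.0) + ov
        # Fall back to `unknown` when no turn overlaps (e.g. speech the
        # diarizer missed).
        speaker = max(totals, key=totals.get) if totals else unknown
        labeled.append(LabeledSegment(seg.start, seg.end, speaker, seg.text))
    return labeled
```

For example, with turns `SPEAKER_00` on 0-4 s and `SPEAKER_01` on 4-9 s, a segment spanning 3-8 s overlaps the first speaker for 1 s and the second for 4 s, so it is labeled `SPEAKER_01`. Summing per speaker rather than picking the single longest turn avoids mislabeling a segment when one speaker's turn is split by a brief interjection.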
Create a new Xircuits workflow and add the components in sequence:
- Start with TranscribeLoadAudioFile and provide a path to your audio file
- Connect it to TranscribeSpeakerDiarization (set use_auth_token to True if using Hugging Face models)
- Add TranscribeSpeechTranscription (defaults to the Whisper base model)
- Connect both to TranscribeCombineDiarizationAndTranscription
- Finally, connect to TranscribeSaveTranscriptToFile to save the results
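The final step turns speaker-labeled segments into a readable file. The exact output format of TranscribeSaveTranscriptToFile is not described here, so the following is a hypothetical sketch: it assumes each combined segment is a dict with `start`, `end` (seconds), `speaker` and `text` keys, merges consecutive segments from the same speaker, and writes one timestamped line per speaker turn. The function names are illustrative only.

```python
from pathlib import Path
from typing import Dict, List, Union


def format_timestamp(seconds: float) -> str:
    """Render a time in seconds as HH:MM:SS.mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def format_transcript(segments: List[Dict]) -> str:
    """One line per speaker turn; consecutive same-speaker segments are merged."""
    merged: List[Dict] = []
    for seg in segments:
        if merged and merged[-1]["speaker"] == seg["speaker"]:
            # Extend the current turn instead of starting a new line.
            merged[-1]["end"] = seg["end"]
            merged[-1]["text"] += " " + seg["text"].strip()
        else:
            # Copy so the caller's dicts are not mutated by later merges.
            merged.append({**seg, "text": seg["text"].strip()})
    return "\n".join(
        f"[{format_timestamp(m['start'])} - {format_timestamp(m['end'])}] "
        f"{m['speaker']}: {m['text']}"
        for m in merged
    )


def save_transcript(segments: List[Dict], path: Union[str, Path]) -> Path:
    """Write the formatted transcript as UTF-8 text, creating parent folders."""
    out = Path(path)
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(format_transcript(segments) + "\n", encoding="utf-8")
    return out
```

Merging consecutive segments keeps the file readable when the transcriber splits one speaker's utterance into several short segments. Drop the merge branch if you need segment-level timestamps instead.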
A GitHub Action for testing your workflow runs is provided. Simply add the paths of your workflows here.