This repository contains a simple speech-to-text script based on OpenAI Whisper models that runs locally.

To work with the repository, simply clone it to your local machine:

```shell
git clone "https://github.com/andreabellome/speech_to_text"
```

To clone into a specific target directory (which should be empty):

```shell
git clone "https://github.com/andreabellome/speech_to_text" /path/to/your/target/directory
```

where `/path/to/your/target/directory` should be replaced with the desired local target directory.
The requirements are listed below:

- The toolbox requires Python version 3.10 or above.
- The tool requires ffmpeg, which can be installed with Chocolatey from an administrator shell:

  ```shell
  choco install ffmpeg
  ```

- Run the following in a terminal to install all the libraries listed in `requirements.txt`:

  ```shell
  pip install -r requirements.txt
  ```

- To run the medium and large Whisper models, at least 2 GB of free disk space is needed. Please check the Whisper documentation.
- (Optional) To run the model on the local GPU, if available, you need to install CUDA.
- (Optional) To apply ollama models to the transcribed text (e.g., to summarize it or ask questions about it), one can also install ollama with llama3. To integrate ollama, run the following before starting the script:

  ```shell
  ollama serve
  ```

  Please check the `audioTranscriber` class to see how ollama can be integrated.
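The actual wiring lives in the `audioTranscriber` class; purely as an illustration, a minimal sketch of talking to a running `ollama serve` instance over its default local HTTP endpoint (port 11434) might look like this. The helper names `build_summary_request` and `summarize` are hypothetical, not part of the repository:

```python
import json
from urllib import request

# Default endpoint exposed by a local `ollama serve` instance
OLLAMA_URL = "http://localhost:11434/api/generate"

def build_summary_request(transcript: str) -> dict:
    """Build the JSON payload asking the llama3 model to summarize a transcript."""
    return {
        "model": "llama3",
        "prompt": f"Summarize the following transcript:\n\n{transcript}",
        "stream": False,  # ask for a single JSON response instead of a stream
    }

def summarize(transcript: str) -> str:
    """POST the request to the local ollama server and return the generated text."""
    payload = json.dumps(build_summary_request(transcript)).encode("utf-8")
    req = request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

`summarize` only works while `ollama serve` is running and the llama3 model has been pulled; otherwise the request fails with a connection error.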
Please use the script judiciously, as it can require significant computational effort.

The work is released under the CC BY-NC-SA 4.0 license, that is, an Attribution-NonCommercial-ShareAlike license. The specifics can be found in the LICENSE file.
Only invited developers can contribute to the repository.
A Python class `AudioTranscriber` is defined in `audioTranscriber.py` that transcribes audio files and saves the output to .txt or .docx files (these are created automatically if not present). Please check the `audioTranscriber` class, which should be self-explanatory.
A main file is used with the following lines:

```python
# Import required libraries
from audioTranscriber import AudioTranscriber

# Initialize the transcriber
transcriber = AudioTranscriber(model_name="large-v2")
```

The `model_name` is important; please check the Whisper documentation. Other options include `small` and `medium`.
Then, one can start the transcription of an audio file and save it to a .txt file:

```python
# Transcribe the large audio file
result = transcriber.transcribe_large_file("audio1.m4a")

# Save the transcription to a text file
filename1 = 'transcription.txt'
transcriber.save_to_txt(result, filename1)
```
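For context, the saving step boils down to writing the `text` field of the Whisper-style result dict to disk. A minimal standalone sketch, independent of the class (the function name `save_transcription_txt` is illustrative):

```python
from pathlib import Path

def save_transcription_txt(result: dict, filename: str) -> Path:
    """Write the transcription text to a .txt file, creating parent folders if missing."""
    path = Path(filename)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(result["text"], encoding="utf-8")
    return path
```

Creating missing parent folders mirrors the class's behavior of automatically creating output files that do not yet exist.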
Be careful: the code can require significant computational effort and time, especially when not running on a GPU.
Happy coding!