andreabellome/speech_to_text


Speech-to-text using Artificial Intelligence

This repository contains a simple speech-to-text script using OpenAI whisper models that can run locally.

Installation

To work with the repository, one can simply clone it to the local machine:

git clone "https://github.com/andreabellome/speech_to_text"

To clone into a specific target directory (which should be empty):

git clone "https://github.com/andreabellome/speech_to_text" /path/to/your/target/directory

where /path/to/your/target/directory should be replaced with the desired local target directory.

The requirements are listed below:

  • The toolbox requires Python version 3.10 or above.

  • The tool requires ffmpeg, which can be installed via Chocolatey from an administrator shell with choco install ffmpeg.

  • One should run pip install -r requirements.txt in a terminal to install all the required libraries specified in the file requirements.txt.

  • To run the medium and large whisper models, one should have at least 2 GB of free disk space. Please check the documentation.

  • (Optional) To run the model on the local GPU, if available, you need to install CUDA.

  • (Optional) To integrate ollama models for further operations on the text (e.g., summarization, asking for information, ...), one can also install ollama llama3. To integrate ollama, run ollama serve before starting the script. Please check audioTranscriber to see how ollama can be integrated.

Please use the script judiciously, as it can require significant computational effort.
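As a quick sanity check before running the script, the environment requirements above (Python version and ffmpeg availability) can be verified with a few lines of standard-library Python. This helper is not part of the repository, just an illustrative sketch:

```python
import shutil
import sys

def check_requirements() -> dict:
    """Report whether the basic environment requirements are met."""
    return {
        # The toolbox requires Python 3.10 or above
        "python_ok": sys.version_info >= (3, 10),
        # ffmpeg must be discoverable on the PATH
        "ffmpeg_ok": shutil.which("ffmpeg") is not None,
    }

status = check_requirements()
for name, ok in status.items():
    print(f"{name}: {'OK' if ok else 'MISSING'}")
```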

License

The work is licensed under CC BY-NC-SA 4.0, an Attribution-NonCommercial-ShareAlike license. The specifics can be found in the LICENSE file.

Only invited developers can contribute to the repository.

Usage

A Python class audioTranscriber is defined that transcribes audio files and saves the output to .txt or .docx files (created automatically if not present). Please check the audioTranscriber class, which should be self-explanatory.

A main file is used with the following lines:

# Import required libraries
from audioTranscriber import AudioTranscriber

# Initialize the transcriber
transcriber = AudioTranscriber(model_name="large-v2")

The model_name parameter is important; please check the whisper documentation. Other options include small and medium.
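The whisper model family trades accuracy against resource usage. The approximate parameter counts below are taken from the openai/whisper README, and the helper function is purely hypothetical, only meant to illustrate the trade-off:

```python
# Approximate parameter counts (in millions) per whisper model,
# as listed in the openai/whisper README.
WHISPER_MODEL_SIZES_M = {
    "tiny": 39,
    "base": 74,
    "small": 244,
    "medium": 769,
    "large": 1550,
}

def pick_model(max_params_m: int) -> str:
    """Pick the largest whisper model within a parameter budget (hypothetical helper)."""
    candidates = [m for m, p in WHISPER_MODEL_SIZES_M.items() if p <= max_params_m]
    # Dicts preserve insertion order, so the last candidate is the largest that fits
    return candidates[-1] if candidates else "tiny"

print(pick_model(800))  # "medium" is the largest model within an 800M-parameter budget
```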

Then, one can start the transcription of an audio file and save it to a .txt:

# Transcribe the large audio file
result = transcriber.transcribe_large_file("audio1.m4a")

# Save the transcription to a text file
filename1 = 'transcription.txt'
transcriber.save_to_txt(result, filename1)
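The save step amounts to writing the text field of the whisper result dictionary to disk. The actual implementation lives in the audioTranscriber class; a minimal standard-library equivalent might look like:

```python
from pathlib import Path

def save_to_txt(result: dict, filename: str) -> Path:
    """Write the transcribed text to a .txt file, creating parent folders if needed."""
    path = Path(filename)
    path.parent.mkdir(parents=True, exist_ok=True)
    # whisper returns the full transcription under the "text" key
    path.write_text(result["text"], encoding="utf-8")
    return path

# Hypothetical result dict mimicking whisper's output format
saved = save_to_txt({"text": "hello world"}, "transcription.txt")
print(saved.read_text(encoding="utf-8"))
```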

Be careful, as the code can require significant computational effort and time, especially when not running on a GPU.

Happy coding!
