This small Python app uses OpenAI's Whisper model to transcribe audio files to text. Before transcription, it converts the audio to mono and resamples it to 16 kHz, then returns the transcribed text. The app accepts various audio formats.
├── README.md
├── assets
│ ├── harvard.wav
│ └── jackhammer.wav
└── audio-to-text.py
Note: A .env file is used to store the OpenAI API key and is not included in this repository for security reasons.
- Python 3.7+
- OpenAI Python package
- pydub
- FFmpeg (required by pydub for audio processing)
- Clone the repository:
git clone https://github.com/ivansing/audio-to-text-app.git
cd audio-to-text-app
- Install dependencies:
pip install openai pydub python-dotenv
- Install FFmpeg:
  - On macOS (using Homebrew):
    brew install ffmpeg
  - On Ubuntu:
    sudo apt install ffmpeg
  - On Windows: Download and install from ffmpeg.org
- Set up your OpenAI API key:
  - Create a .env file at the root of the project and add your API key:
    OPENAI_API_KEY=your-api-key-here
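Before running the script, it can help to confirm that FFmpeg and the API key are actually visible to Python. The following is a minimal sketch, not part of this repository: the function names `ffmpeg_available`, `read_api_key`, and `preflight` are hypothetical, and it assumes the key is stored under `OPENAI_API_KEY` as described above.

```python
import os
import shutil


def ffmpeg_available():
    """Return True if an ffmpeg executable is on PATH (pydub relies on it)."""
    return shutil.which("ffmpeg") is not None


def read_api_key(env=None):
    """Return the OpenAI API key from the given mapping (default: os.environ)."""
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError("OPENAI_API_KEY is not set; add it to your .env file")
    return key


def preflight():
    """Load .env, then report whether FFmpeg and the API key were found."""
    # Imported here so the helpers above work without python-dotenv installed.
    from dotenv import load_dotenv

    load_dotenv()  # copies entries from .env into os.environ (existing variables are kept)
    print("FFmpeg found:", ffmpeg_available())
    print("API key found:", bool(read_api_key()))
```

Calling `preflight()` from the project root prints one line per check and raises a `RuntimeError` if the key is missing.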
- Add your audio file to the assets directory or use the provided sample WAV files (e.g., jackhammer.wav).
- Run the audio-to-text.py script to convert and transcribe your audio file:
python3 audio-to-text.py
- The transcription will be printed to the console.
- Convert Audio: The script first converts the input audio file to mono and resamples it to 16kHz using pydub.
- Transcription: It then sends the processed audio to OpenAI's Whisper model for transcription.
- Output: The transcribed text is printed.
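The three steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the contents of audio-to-text.py: the function names and the `_16k_mono.wav` suffix are hypothetical, and the transcription call assumes the `openai` package's 1.x client interface with the hosted `whisper-1` model.

```python
from pathlib import Path


def converted_path(src):
    """Return the path for the converted copy, next to the source file."""
    src = Path(src)
    return src.with_name(src.stem + "_16k_mono.wav")  # hypothetical naming scheme


def convert_audio(src):
    """Convert an FFmpeg-readable file to 16 kHz mono WAV using pydub."""
    # Imported here so converted_path() works even without pydub installed.
    from pydub import AudioSegment

    audio = AudioSegment.from_file(str(src))
    audio = audio.set_channels(1).set_frame_rate(16000)
    dst = converted_path(src)
    audio.export(str(dst), format="wav")
    return dst


def transcribe(path, model="whisper-1"):
    """Send the file to OpenAI's transcription endpoint and return the text."""
    from openai import OpenAI  # assumes openai>=1.0

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    with open(path, "rb") as audio_file:
        result = client.audio.transcriptions.create(model=model, file=audio_file)
    return result.text
```

With the API key loaded into the environment, `print(transcribe(convert_audio("assets/harvard.wav")))` would run the conversion and print the transcription.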
This project is licensed under the MIT License.