Whisper Realtime Transcription GUI

A modern, real-time speech recognition application built with OpenAI's Whisper and PySide6. It provides a native-looking interface for transcribing audio in real time, with support for multiple languages.

Features

🎙 Real-time audio transcription using OpenAI's Whisper
🌈 Beautiful, modern UI with animated audio visualizer
🚀 GPU acceleration support (Apple Silicon/CUDA)
🌍 Multi-language support (English, French, Vietnamese)
📊 Live audio waveform visualization with dynamic effects
💫 Smooth animations and transitions
🎯 Multiple Whisper model options (tiny, base, small, medium, large)
⚡️ Optimized streaming for better real-time performance
🎨 Enhanced visual feedback with glowing effects

Components

1. Real-time Transcription (`whisper_gui.py`)

Modern GUI application for real-time speech recognition
Dynamic waveform visualization with:
- Smooth wave transitions
- Responsive amplitude changes
- Glowing effects during recording
Support for multiple languages and models
GPU acceleration for better performance
Optimized audio streaming with 0.3s chunks
Automatic model initialization
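
To make the 0.3s chunk size concrete, here is a minimal sketch of how a chunk length in seconds maps to a number of samples, and how a buffer could be split accordingly. It assumes a 16 kHz sample rate (the rate Whisper models work with); the rate actually used by `whisper_gui.py` is not stated here, and the names `chunk_size` and `split_into_chunks` are hypothetical, not taken from the project.

```python
SAMPLE_RATE = 16_000   # assumption: 16 kHz mono, the rate Whisper models work with
CHUNK_SECONDS = 0.3    # chunk length mentioned in the component description


def chunk_size(sample_rate: int = SAMPLE_RATE, seconds: float = CHUNK_SECONDS) -> int:
    """Return the number of samples in one chunk."""
    return round(sample_rate * seconds)


def split_into_chunks(samples, size):
    """Yield consecutive slices of `samples`, each at most `size` items long."""
    for start in range(0, len(samples), size):
        yield samples[start:start + size]
```

Under these assumptions one chunk holds 4800 samples. Shorter chunks lower latency but give the model less context per call, so the value is a trade-off rather than a fixed rule.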

2. File Transcription (`file-to-text.py`)

Convert audio/video files to text
Supports multiple file formats:
- Audio: mp3, wav, m4a, etc.
- Video: mp4, mkv, avi, etc.
Batch processing capability
Output formats:
- Plain text (.txt)
- Microsoft Word (.docx)
- Timestamp support
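
As an illustration of timestamped output, the sketch below turns Whisper-style segments (dictionaries with `start`, `end` and `text` keys, as found in the `segments` list returned by Whisper's `transcribe`) into plain-text lines. The helper names and the exact line layout are assumptions for illustration; `file-to-text.py` may format its output differently.

```python
def format_timestamp(seconds: float) -> str:
    """Format a time offset in seconds as HH:MM:SS.mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def segments_to_lines(segments):
    """Build one '[start -> end] text' line per segment (hypothetical layout)."""
    return [
        f"[{format_timestamp(s['start'])} -> {format_timestamp(s['end'])}] {s['text'].strip()}"
        for s in segments
    ]
```

The resulting lines can be joined with newlines and written to a .txt file as they are.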

Requirements

Python 3.11+
macOS (tested on Apple Silicon)
GPU recommended for better performance

Installation

For macOS Users

Download the latest .dmg file from the Releases page
Open the downloaded .dmg file
Drag the application to your Applications folder
Double-click to run the application

For Developers

Clone the repository:

git clone https://github.com/phongthanhbuiit/whisper-realtime-gui.git
cd whisper-realtime-gui

Create and activate a virtual environment:

python -m venv venv
source venv/bin/activate  # On macOS/Linux

Install the required packages:

pip install -r requirements.txt

Release Information

Version: 1.0.0
Release Date: 2023-02-20
Changes:
- Initial release with real-time transcription and file transcription features

Usage

Real-time Transcription

Activate the virtual environment if not already activated:

source venv/bin/activate  # On macOS/Linux

Run the GUI application:

python whisper_gui.py

Select your preferred model and language from the dropdown menus
Click "Start Recording" to begin transcription
Speak into your microphone
Watch the beautiful waveform animation and real-time transcription

Model Selection

Choose from different Whisper models based on your needs:

tiny: Fastest, lowest accuracy (good for testing)
base: Good balance of speed and accuracy
small: Better accuracy, still reasonable speed
medium: High accuracy, slower processing
large: Best accuracy, requires more resources

Language Support

Currently supports:

English
Vietnamese
French

The language can be changed at any time while transcription is running.
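
For developers, a minimal sketch of how the dropdown names could map to the language codes Whisper accepts. The `LANGUAGE_CODES` table and `language_code` helper are hypothetical; the application may store this mapping differently.

```python
# Display name -> language code understood by Whisper (hypothetical table).
LANGUAGE_CODES = {
    "English": "en",
    "Vietnamese": "vi",
    "French": "fr",
}


def language_code(display_name: str) -> str:
    """Return the Whisper language code for a supported display name."""
    try:
        return LANGUAGE_CODES[display_name]
    except KeyError:
        raise ValueError(f"Unsupported language: {display_name!r}") from None
```

The returned code is the kind of value passed as the `language` argument to Whisper's `transcribe`, for example `language="vi"`.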

Performance Tips

Model Selection:
- Start with tiny or base model for testing
- Use small for general use
- Use medium or large only if you need the highest accuracy
GPU Acceleration:
- The app automatically uses GPU if available
- Recommended for medium and large models
Audio Input:
- Speak clearly and at a moderate pace
- Keep microphone at a consistent distance
- Avoid background noise for better accuracy
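
To check which backend is available on your machine, a sketch along these lines can help. It assumes PyTorch as the backend (Whisper is built on it) and prefers CUDA, then Apple Silicon's MPS, then CPU; the function name `pick_device` is hypothetical and the app's own selection logic may differ.

```python
def pick_device() -> str:
    """Return "cuda", "mps" or "cpu" depending on what PyTorch reports."""
    try:
        import torch  # imported lazily so the check also works without PyTorch
    except ImportError:
        return "cpu"
    if torch.cuda.is_available():
        return "cuda"
    # torch.backends.mps only exists in newer PyTorch builds, so look it up defensively.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"
```

If this returns "cpu" on a machine with a GPU, the installed PyTorch build likely lacks GPU support, which is a common cause of slow transcription with the larger models.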

Troubleshooting

If you encounter issues:

Audio Not Detected:
- Check your microphone permissions
- Verify input device in system settings
Slow Performance:
- Try a smaller model
- Ensure GPU acceleration is working
- Check CPU/Memory usage
Transcription Issues:
- Try changing the language setting
- Speak more clearly
- Adjust your microphone position
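
When audio is not detected, it can help to list the input devices that sounddevice sees. The sketch below filters device entries by their `max_input_channels` field, which is part of the dictionaries returned by `sounddevice.query_devices()`. The helper names are hypothetical and not part of the project.

```python
def input_devices(devices):
    """Return (index, name) pairs for devices that can record audio."""
    return [
        (index, device["name"])
        for index, device in enumerate(devices)
        if device.get("max_input_channels", 0) > 0
    ]


def list_system_inputs():
    """Query the real audio devices; requires the sounddevice package."""
    import sounddevice as sd  # imported lazily so input_devices works without it
    return input_devices(sd.query_devices())
```

If `list_system_inputs()` returns an empty list, the microphone is probably blocked by system permissions or not connected.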

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

OpenAI Whisper for the amazing speech recognition model
PySide6 for the modern GUI framework
sounddevice for real-time audio processing

Author

Thompson Bui (@phongthanhbuiit)
Blog: LinkedIn
Twitter: @windsora

Support

If you found this project helpful, please give it a ⭐️!
