A modern, real-time speech recognition application built with OpenAI's Whisper and PySide6. It provides a native-looking interface for transcribing audio in real time, with support for multiple languages.
- 🎙 Real-time audio transcription using OpenAI's Whisper
- 🌈 Beautiful, modern UI with animated audio visualizer
- 🚀 GPU acceleration support (Apple Silicon/CUDA)
- 🌍 Multi-language support (English, French, Vietnamese)
- 📊 Live audio waveform visualization with dynamic effects
- 💫 Smooth animations and transitions
- 🎯 Multiple Whisper model options (tiny, base, small, medium, large)
- ⚡️ Optimized streaming for better real-time performance
- 🎨 Enhanced visual feedback with glowing effects
- Modern GUI application for real-time speech recognition
- Dynamic waveform visualization with:
  - Smooth wave transitions
  - Responsive amplitude changes
  - Glowing effects during recording
- Support for multiple languages and models
- GPU acceleration for better performance
- Optimized audio streaming with 0.3s chunks
- Automatic model initialization
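The 0.3 s chunking mentioned above can be illustrated with a minimal sketch. It assumes 16 kHz mono audio (the sample rate Whisper works with) and uses a hypothetical helper, `split_into_chunks`; the application's actual streaming code may differ.

```python
from typing import Iterator, Sequence

SAMPLE_RATE = 16_000   # assumption: 16 kHz mono audio
CHUNK_SECONDS = 0.3    # chunk length taken from the feature list above


def split_into_chunks(
    samples: Sequence[float],
    sample_rate: int = SAMPLE_RATE,
    chunk_seconds: float = CHUNK_SECONDS,
) -> Iterator[Sequence[float]]:
    """Yield consecutive slices of `samples`, each at most `chunk_seconds` long."""
    chunk_size = int(round(sample_rate * chunk_seconds))  # 4800 samples at 16 kHz
    if chunk_size <= 0:
        raise ValueError("chunk size must be positive")
    for start in range(0, len(samples), chunk_size):
        # The final slice may be shorter than chunk_size.
        yield samples[start:start + chunk_size]
```

With these defaults, one second of audio yields three full chunks and one shorter trailing chunk.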
- Convert audio/video files to text
- Supports multiple file formats:
  - Audio: mp3, wav, m4a, etc.
  - Video: mp4, mkv, avi, etc.
- Batch processing capability
- Output formats:
  - Plain text (.txt)
  - Microsoft Word (.docx)
- Timestamp support
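As an illustration of timestamped plain-text output, the sketch below formats segments shaped like the entries of the `segments` list returned by Whisper's `transcribe()` (each with `start`, `end`, and `text`). The helper names and the exact line layout are assumptions, not the application's actual output format.

```python
from typing import Iterable, Mapping


def format_timestamp(seconds: float) -> str:
    """Render a time offset in seconds as HH:MM:SS.mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def segments_to_text(segments: Iterable[Mapping]) -> str:
    """Join segments into one line each: "[start --> end] text"."""
    lines = []
    for seg in segments:
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        lines.append(f"[{start} --> {end}] {seg['text'].strip()}")
    return "\n".join(lines)
```

The resulting string could then be written to a `.txt` file, one segment per line.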
- Python 3.11+
- macOS (tested on Apple Silicon)
- GPU recommended for better performance
- Download the latest .dmg file from the Releases page
- Open the downloaded .dmg file
- Drag the application to your Applications folder
- Double-click to run the application
- Clone the repository:
git clone https://github.com/phongthanhbuiit/whisper-realtime-gui.git
cd whisper-realtime-gui
- Create and activate a virtual environment:
python -m venv venv
source venv/bin/activate # On macOS/Linux
- Install the required packages:
pip install -r requirements.txt
- Version: 1.0.0
- Release Date: 2023-02-20
- Changes:
  - Initial release with real-time transcription and file transcription features
- Activate the virtual environment if not already activated:
source venv/bin/activate # On macOS/Linux
- Run the GUI application:
python whisper_gui.py
- Select your preferred model and language from the dropdown menus
- Click "Start Recording" to begin transcription
- Speak into your microphone
- Watch the beautiful waveform animation and real-time transcription
Choose from different Whisper models based on your needs:
- tiny: Fastest, lowest accuracy (good for testing)
- base: Good balance of speed and accuracy
- small: Better accuracy, still reasonable speed
- medium: High accuracy, slower processing
- large: Best accuracy, requires more resources
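For readers who want to try these models from a script, here is a minimal sketch using the `openai-whisper` package's `whisper.load_model()`. The wrapper `load_whisper_model` and the `VALID_MODELS` tuple are hypothetical names introduced for illustration; the GUI may load models differently.

```python
VALID_MODELS = ("tiny", "base", "small", "medium", "large")


def load_whisper_model(name: str = "base"):
    """Validate the model name, then load it with openai-whisper."""
    if name not in VALID_MODELS:
        raise ValueError(
            f"unknown model {name!r}; expected one of {', '.join(VALID_MODELS)}"
        )
    # Imported lazily so the name check works even without whisper installed.
    import whisper

    # Downloads the weights on first use, then loads them from the local cache.
    return whisper.load_model(name)
```

Validating the name first avoids starting a download for a mistyped model.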
Currently supports:
- English
- Vietnamese
- French
The language can be changed in real-time during transcription.
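One way to picture language switching: Whisper's `transcribe()` accepts a `language` argument per call, so the language can be chosen anew for each audio chunk. The mapping and the helper `transcribe_kwargs` below are a hypothetical sketch, not the application's code.

```python
# Language codes understood by Whisper for the three supported languages.
LANGUAGE_CODES = {
    "English": "en",
    "Vietnamese": "vi",
    "French": "fr",
}


def transcribe_kwargs(language_name: str) -> dict:
    """Build keyword arguments for a Whisper transcribe() call."""
    try:
        code = LANGUAGE_CODES[language_name]
    except KeyError:
        raise ValueError(f"unsupported language: {language_name!r}") from None
    return {"language": code}
```

A call such as `model.transcribe(audio, **transcribe_kwargs("French"))` would then use whichever language is currently selected.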
- Model Selection:
  - Start with the tiny or base model for testing
  - Use small for general use
  - Use medium or large only if you need the highest accuracy
- GPU Acceleration:
  - The app automatically uses the GPU if available
  - Recommended for medium and large models
- Audio Input:
  - Speak clearly and at a moderate pace
  - Keep the microphone at a consistent distance
  - Avoid background noise for better accuracy
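The automatic GPU selection mentioned in the tips can be sketched with PyTorch's availability checks. This is an assumption about how such detection is commonly done, with a hypothetical helper `pick_device`; the application's own logic may differ.

```python
def pick_device() -> str:
    """Return "cuda", "mps", or "cpu" depending on what PyTorch reports."""
    try:
        import torch
    except ImportError:
        # Without PyTorch there is no GPU backend to query.
        return "cpu"

    if torch.cuda.is_available():
        return "cuda"

    # The MPS backend (Apple Silicon) exists only in newer PyTorch builds.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"

    return "cpu"
```

The returned string can be passed as the `device` argument of `whisper.load_model()`. Whether a given model runs well on a particular backend depends on the installed PyTorch version.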
If you encounter issues:
- Audio Not Detected:
  - Check your microphone permissions
  - Verify the input device in system settings
- Slow Performance:
  - Try a smaller model
  - Ensure GPU acceleration is working
  - Check CPU/memory usage
- Transcription Issues:
  - Try changing the language setting
  - Speak more clearly
  - Adjust your microphone position
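When audio is not detected, it can help to check which input devices Python can see. The sketch below relies on `sounddevice.query_devices()`, which returns device entries with `name` and `max_input_channels` fields; the helper names are hypothetical.

```python
from typing import Iterable, List, Mapping


def input_device_names(devices: Iterable[Mapping]) -> List[str]:
    """Keep only devices that expose at least one input channel."""
    return [d["name"] for d in devices if d.get("max_input_channels", 0) > 0]


def list_microphones() -> List[str]:
    """Query the system's audio devices and return the usable inputs."""
    # Imported lazily: sounddevice needs the PortAudio library at import time.
    import sounddevice as sd

    return input_device_names(sd.query_devices())
```

If `list_microphones()` returns an empty list, the problem is likely permissions or system settings rather than the application itself.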
Contributions are welcome! Please feel free to submit a Pull Request.
This project is licensed under the MIT License - see the LICENSE file for details.
- OpenAI Whisper for the amazing speech recognition model
- PySide6 for the modern GUI framework
- sounddevice for real-time audio processing
If you found this project helpful, please give it a ⭐️!