A Docker-powered service that transcribes audio files using OpenAI's Whisper model. This service is optimized for handling audio files of any size and runs locally on your machine using GPU acceleration (if available).
- Easy setup with Docker
- No file size limits with optimized memory handling
- Supports multiple audio formats (.mp3, .wav, .m4a, .ogg, .flac)
- GPU acceleration with CUDA 12.1 support
- Concurrent processing with job management
- Real-time job status tracking
- Automatic memory cleanup and optimization
- Secure file handling with non-root user execution
- RESTful API with comprehensive endpoints
- Convenient command-line utilities
You'll need:
- Docker and Docker Compose installed on your machine
- NVIDIA GPU with CUDA 12.1 support (optional, but recommended for better performance)
- NVIDIA Container Toolkit (if using GPU)
- FFmpeg (installed automatically in container)
- Clone this repository:
git clone https://github.com/your-repo/whisper-service
cd whisper-service
- Start the service:
docker compose up --build
- Open http://localhost:8000 in your browser. You'll see a simple interface listing all available endpoints.
- Visit http://localhost:8000/docs for interactive API documentation.
The service provides several endpoints for managing transcription jobs:
curl -X POST "http://localhost:8000/transcribe/" \
-F "file=@path/to/your/audio.mp3"
Response:
{
"job_id": "job_1234567890_abcd",
"status": "queued",
"message": "Transcription job queued successfully",
"file_info": {
"name": "audio.mp3",
"size": 1048576
}
}
curl "http://localhost:8000/status/job_1234567890_abcd"
curl "http://localhost:8000/jobs"
curl -X DELETE "http://localhost:8000/jobs/job_1234567890_abcd"
curl "http://localhost:8000/health"
A dedicated command-line client is available at whisper-client, providing a convenient interface for transcribing files, managing jobs, and tracking progress.
The service includes two utility scripts for processing audio files:
Direct audio file processing script:
python3 process_audio.py input.mp3
This will create a JSON output file with the full transcription results.
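To transcribe a whole directory of recordings, the script can simply be invoked in a shell loop (the file pattern and directory name here are examples):

```bash
for f in recordings/*.mp3; do
  python3 process_audio.py "$f"
done
```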
Utility for extracting plain text from transcription JSON files:
python3 process_whisper.py --dir /path/to/transcripts
This will process all JSON transcription files in the directory and create corresponding .txt files with just the transcribed text.
The service includes several performance optimizations configured in the Dockerfile:
# GPU Memory Optimization
ENV PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:512
# Tokenizer Performance
ENV TOKENIZERS_PARALLELISM=true
These environment variables can be adjusted in the Dockerfile to optimize performance for your specific use case.
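As an illustration only: a memory-constrained GPU might warrant smaller allocator splits, and tokenizer parallelism can be disabled if it produces warnings under multiple workers. These values are examples, not recommended defaults:

```dockerfile
# Example tweaks (adjust or remove to suit your hardware)
ENV PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128
ENV TOKENIZERS_PARALLELISM=false
```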
The service is configured in docker-compose.yml with optimized settings for handling large audio files:
services:
  whisper-api:
    # ... other settings ...
    shm_size: '8gb'     # Shared memory size for large file processing
    ulimits:
      memlock: -1       # Unlimited locked-in-memory address space
      stack: 67108864   # Stack size limit
    command: >
      uvicorn main:app
      --host 0.0.0.0
      --port 8000
      --timeout-keep-alive 300
      --workers 1
      --log-level info
      --reload
These settings can be adjusted based on your system resources and requirements:
- shm_size: Increase for better performance with large files
- workers: Increase for better concurrent request handling (if CPU allows)
- timeout-keep-alive: Adjust based on expected transcription durations
- --reload: Remove in production for better performance (an example override is sketched below)
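As a rough sketch of a production-leaning configuration (the values are placeholders to validate against your own hardware and workload):

```yaml
services:
  whisper-api:
    # ... other settings ...
    shm_size: '16gb'   # extra headroom for very large files
    # --reload dropped and workers raised for production use
    command: >
      uvicorn main:app
      --host 0.0.0.0
      --port 8000
      --timeout-keep-alive 600
      --workers 2
      --log-level warning
```

Whether multiple workers actually help depends on how job state is shared between processes, so treat the worker count as something to test rather than a drop-in improvement.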
The service runs as a non-root user for enhanced security:
- Dedicated 'whisper' user created in container
- All processes run with limited permissions
- Upload and temp directories created world-writable (777), which the service requires so the unprivileged user can write to them (the Dockerfile pattern is sketched below)
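The corresponding Dockerfile pattern looks roughly like the following; the directory paths are illustrative and may not match this repository exactly:

```dockerfile
# Create an unprivileged user and the writable working directories
RUN useradd --create-home whisper \
    && mkdir -p /app/uploads /app/temp \
    && chmod 777 /app/uploads /app/temp

# Run everything that follows as the unprivileged user
USER whisper
```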
The service automatically detects and uses your NVIDIA GPU if available. GPU support is configured in docker-compose.yml:
environment:
  - NVIDIA_VISIBLE_DEVICES=all
  - NVIDIA_DRIVER_CAPABILITIES=all
deploy:
  resources:
    reservations:
      devices:
        - driver: nvidia
          count: 1
          capabilities: [gpu]
To disable GPU support, simply remove these sections from the docker-compose.yml file.
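To confirm that the container actually sees the GPU, run a quick check inside it; this assumes PyTorch is available in the image (implied by the CUDA and PYTORCH_CUDA_ALLOC_CONF settings above):

```bash
docker compose exec whisper-api python3 -c "import torch; print(torch.cuda.is_available())"
```

If this prints False while nvidia-smi works on the host, the NVIDIA Container Toolkit setup is the usual culprit (see the troubleshooting notes below).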
Transcription results use the following JSON format:
{
  "text": "Complete transcribed text...",
  "segments": [
    {
      "start": 0.0,
      "end": 2.5,
      "text": "Segment text..."
    }
  ]
}
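Because the result is plain JSON, post-processing is straightforward. A minimal sketch that prints each segment with timestamps (the field names match the format above; the input filename is an example):

```python
import json

# Load a result file produced by the service or by process_audio.py
with open("audio.json") as f:
    result = json.load(f)

for seg in result["segments"]:
    print(f"[{seg['start']:7.2f}s - {seg['end']:7.2f}s] {seg['text'].strip()}")

print("\nFull text:\n" + result["text"])
```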
The /health endpoint returns a summary of the service configuration and current load:
{
  "status": "healthy",
  "model": "whisper-base",
  "supported_formats": [".mp3", ".wav", ".m4a", ".ogg", ".flac"],
  "max_file_size": "unlimited",
  "gpu_available": true,
  "active_jobs": 1,
  "max_concurrent_jobs": 3
}
Common problems and suggested fixes:
- "Error: GPU not available"
  - Check CUDA 12.1 compatibility with your GPU
  - Verify the NVIDIA Container Toolkit is installed
  - Try running nvidia-smi to confirm the GPU is detected
- "Error: Job queue full"
  - Wait for current jobs to complete
  - Monitor active jobs using the /jobs endpoint
  - Consider adjusting the number of workers if system resources allow
- Memory Issues with Large Files
  - Increase shm_size in docker-compose.yml
  - Adjust PYTORCH_CUDA_ALLOC_CONF in the Dockerfile
  - Monitor container resources with docker stats
- Service Performance
  - Remove the --reload flag in production
  - Adjust the number of workers based on CPU cores
  - Consider GPU acceleration for faster processing
  - Tune TOKENIZERS_PARALLELISM based on workload
- Permission Issues
  - Ensure upload/temp directories have correct permissions
  - Verify Docker user mapping if using a custom UID/GID
  - Check file ownership in the container (see the commands below)
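For permission problems, a quick look inside the running container usually tells you what is wrong; the directory paths below are examples and may differ in this image:

```bash
docker compose exec whisper-api id
docker compose exec whisper-api ls -la /app/uploads /app/temp
```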
Key files in the repository:
- main.py: FastAPI application with job management and API endpoints
- process_audio.py: Direct audio transcription utility
- process_whisper.py: JSON transcript to text converter
- Dockerfile: Container image definition with CUDA support and optimizations
- docker-compose.yml: Service orchestration and resource configuration
We welcome contributions! Please feel free to submit issues and pull requests.
- Fork the repository
- Create your feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
This project is licensed under the MIT License. See the LICENSE file for details.
If you need help:
- Check the FAQs (if available)
- Open an issue
- Read OpenAI's Whisper documentation
Built with ❤️ using OpenAI Whisper and FastAPI