Feel free to join my Discord Server to discuss this model!
A foundational model for voice audio that can be fine-tuned effectively on a single GPU for text-to-speech, speech enhancement, and diarization. It is based on the original Speechflow paper: Generative Pre-training for Speech with Flow Matching.
- Supervoice Enhance - cleanup of background noise
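The pre-training recipe follows flow matching as described in the Speechflow paper. For orientation, here is a minimal NumPy sketch of the generic optimal-transport conditional flow-matching loss. It is not Supervoice's actual training code. The names `ot_path`, `ot_target`, `flow_matching_loss` and `SIGMA_MIN`, and the `vector_field` callable, are hypothetical. The paper's masking of the conditioning audio is also omitted.

```python
import numpy as np

SIGMA_MIN = 1e-5  # assumed small noise floor for the OT conditional path (hypothetical value)

def ot_path(x0, x1, t, sigma_min=SIGMA_MIN):
    """Interpolate from noise x0 (t=0) toward data x1 (t=1) along the OT path."""
    return (1.0 - (1.0 - sigma_min) * t) * x0 + t * x1

def ot_target(x0, x1, sigma_min=SIGMA_MIN):
    """Time derivative of ot_path: the velocity the model should regress (constant in t)."""
    return x1 - (1.0 - sigma_min) * x0

def flow_matching_loss(vector_field, x1, rng, sigma_min=SIGMA_MIN):
    """Monte Carlo estimate of the conditional flow-matching loss.

    x1: data batch, shape (batch, frames, features), e.g. spectrogram frames.
    vector_field: hypothetical model callable (x_t, t) -> predicted velocity.
    """
    x0 = rng.standard_normal(x1.shape)          # Gaussian noise sample
    t = rng.uniform(size=(x1.shape[0], 1, 1))   # one time step per batch element
    xt = ot_path(x0, x1, t, sigma_min)
    target = ot_target(x0, x1, sigma_min)
    pred = vector_field(xt, t)
    return float(np.mean((pred - target) ** 2))  # mean squared error on the velocity
```

During fine-tuning or inference, the learned vector field would be integrated with an ODE solver from noise at t=0 to audio features at t=1. Here the regression target is constant in t because the OT path is linear in t.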
License: MIT