What's Changed
v0.5.0 by @R3gm in #45
- Added Overlap Reduction option
- OpenAI API key integration for transcription, translation, and TTS
- More output types: subtitles by speaker, separate audio sound, and video only with subtitles
- Access to better-performing versions of Whisper for transcribing speech, found on the Hugging Face Whisper page. Copy the repository ID and paste it into the 'Whisper ASR model' section in 'Advanced Settings'; e.g., kotoba-tech/kotoba-whisper-v1.1 for Japanese transcription
- Support for ASS subtitles and batch processing with subtitles
- Vocal enhancement before transcription
- Added CPU mode with app_rvc.py --cpu_mode
- TTS now supports up to 12 speakers
- OpenVoiceV2 integration for voice imitation
- PDF to videobook (displays images from the PDF)
- GUI translations in Persian and Afrikaans
- New language support:
  - Complete support: Estonian, Macedonian, Malay, Swahili, Afrikaans, Bosnian, Latin, Myanmar Burmese, Norwegian, Traditional Chinese, Assamese, Basque, Hausa, Haitian Creole, Armenian, Lao, Malagasy, Mongolian, Maltese, Punjabi, Pashto, Slovenian, Shona, Somali, Tajik, Turkmen, Tatar, Uzbek, and Yoruba
  - Non-transcription: Aymara, Bambara, Cebuano, Chichewa, Divehi, Dogri, Ewe, Guarani, Iloko, Kinyarwanda, Krio, Kurdish, Kirghiz, Ganda, Maithili, Oriya, Oromo, Quechua, Samoan, Tigrinya, Tsonga, Akan, and Uighur
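
A note on the 'Whisper ASR model' field mentioned above: it expects a repository ID in owner/name form, and it is easy to copy the full page URL by mistake. The sketch below shows one way to recover the ID from either form. It is not part of this project; the function name extract_repo_id is hypothetical, and it assumes a model page URL whose first two path segments are the owner and the model name. It does not check the host or whether the model exists.

```python
from urllib.parse import urlparse


def extract_repo_id(text: str) -> str:
    """Return 'owner/name' from a repository ID or a model page URL.

    Hypothetical helper for illustration only; not part of the app.
    """
    text = text.strip()
    if text.startswith(("http://", "https://")):
        # Assumption: the URL path looks like /owner/name[/anything/else].
        segments = [s for s in urlparse(text).path.split("/") if s][:2]
    else:
        # A bare ID must be exactly two non-empty parts separated by "/".
        segments = text.split("/")
    if len(segments) != 2 or not all(segments):
        raise ValueError(f"not an owner/name repository ID: {text!r}")
    return "/".join(segments)
```

For example, passing either kotoba-tech/kotoba-whisper-v1.1 or the URL of that model's page returns the same ID, ready to paste into 'Advanced Settings'.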
Full Changelog: 0.4.0...0.5.0