Hey, I am trying to build a model with your version of Tacotron using my own US English data.
While I am collecting and formatting the data, I am wondering how much of it is needed to start getting good results for the target voice. Does anyone know of any empirical experiments, so that I can set a target?
I have around 3/4h right now.
Thanks for the hard work on the repo.
Amazon has published some papers on how much data is needed for TTS. I'd say you can try fine-tuning one of the released models with your own data. That would be the easiest way to go. You can also train from scratch, but I'd guess the data you have is not sufficient for that. In general, my personal estimate is around 15 hours for something reasonable.
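If it helps to track progress toward that number while collecting, here is a minimal sketch that sums the duration of the recordings in a folder. It assumes the clips are uncompressed PCM `.wav` files; the function name `total_hours`, the folder name `wavs`, and the `TARGET_HOURS` constant are made up for illustration and are not part of the repo.

```python
import wave
from pathlib import Path

# Rough personal estimate from this thread, not a hard requirement.
TARGET_HOURS = 15.0


def total_hours(wav_dir):
    """Return the summed duration, in hours, of all .wav files under wav_dir."""
    seconds = 0.0
    for path in sorted(Path(wav_dir).rglob("*.wav")):
        # The stdlib wave module reads uncompressed PCM WAV headers.
        with wave.open(str(path), "rb") as wf:
            seconds += wf.getnframes() / wf.getframerate()
    return seconds / 3600.0


def hours_remaining(wav_dir, target=TARGET_HOURS):
    """Return how many hours are still missing to reach the target (never negative)."""
    return max(target - total_hours(wav_dir), 0.0)
```

Calling `total_hours("wavs")` on the dataset folder gives the amount collected so far, and `hours_remaining("wavs")` gives the gap to the estimate. This counts raw clip length, so leading and trailing silence is included in the total.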