So I heard Tacotron 2 needs very little data, around 100-300 sentences, for good-sounding speech. However, it handles tempo badly. I've seen that WaveNet can be adapted for music, and I wondered whether the model can be conditioned to do TTS with rhythm. Even if it is possible (hopefully), I have heard it requires large amounts of data, in the tens of GB. Can WaveNet be trained with only 1-2 GB, or at most 4 GB, and still give good results? And if it can, how does one prepare a dataset (that is, how do I condition it)? Do I chop the audio, splitting it into each line the rapper spoke, or do I give it the full acapella? Do I use one WAV file or multiple? And what audio format, number of channels, and sample rate should I use? Sorry, I am extremely new. Any help would be appreciated. Thanks.
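For a sense of scale, here is a rough sketch of how those sizes translate into hours of audio. It assumes uncompressed 16-bit mono PCM at 22,050 Hz, which is only my guess at a typical TTS setup and not something I know this model requires; `pcm_hours` is a made-up helper name.

```python
def pcm_hours(size_bytes, sample_rate=22050, channels=1, bytes_per_sample=2):
    """Approximate hours of uncompressed PCM audio that fit in size_bytes.

    The defaults (22,050 Hz, mono, 16-bit) are assumptions, and the small
    WAV header is ignored.
    """
    bytes_per_second = sample_rate * channels * bytes_per_sample
    return size_bytes / bytes_per_second / 3600


GB = 10**9  # decimal gigabyte

for size_gb in (1, 2, 4, 10):
    print(f"{size_gb} GB is roughly {pcm_hours(size_gb * GB):.1f} hours of audio")
```

With stereo 44.1 kHz files the same number of bytes would hold a quarter of that duration, so the answer depends a lot on the format question above.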
Flavius Valerius Constantinus, The Last Roman Emperor