Right now we are using a subset of Libri-Light, a very large (60k hours) dataset of audiobooks read by thousands of speakers. It is pretty good, but there is a lot of (probably more expressive and emotional) speech available in YouTube videos. For the final training run it would be great to have more varied data to improve the quality of the model.
I think we need native speakers to ensure high-quality material and build the best global open-source TTS system.
I am thinking of setting up a common format and some docs to help people prepare, validate and upload multilingual speech data to Hugging Face so it can be included in WhisperSpeech base model training.
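To make the "common format" idea concrete, here is a minimal sketch of what a per-utterance record and a validation pass could look like. Nothing here is an agreed WhisperSpeech format: the `Utterance` fields, the `validate` helper, the accepted extensions and the 30-second limit are all assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import List


# Hypothetical record layout; every field name here is an assumption,
# not an existing WhisperSpeech or Hugging Face schema.
@dataclass
class Utterance:
    audio_path: str          # relative path to the audio file
    text: str                # transcript of the utterance
    language: str            # lowercase ISO 639 code, e.g. "en" or "pl"
    duration_s: float        # clip length in seconds
    speaker_id: str = "unknown"


def validate(u: Utterance, max_duration_s: float = 30.0) -> List[str]:
    """Return a list of human-readable problems; empty means the record passes."""
    problems = []
    if not u.audio_path.lower().endswith((".wav", ".flac", ".mp3", ".opus")):
        problems.append("unsupported audio extension")
    if not u.text.strip():
        problems.append("empty transcript")
    if not (len(u.language) in (2, 3) and u.language.isalpha() and u.language.islower()):
        problems.append("language should be a lowercase ISO 639 code")
    if not 0 < u.duration_s <= max_duration_s:
        problems.append("duration out of range")
    return problems


if __name__ == "__main__":
    sample = Utterance("clips/0001.flac", "Dzień dobry", "pl", 2.4)
    print(validate(sample))
```

Returning a list of problems rather than raising on the first one lets a contributor see everything wrong with a record in a single pass before uploading.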
jpc changed the title from "6. Gather more data" to "6. Gather more multi-lingual data" on Jan 22, 2024