
Training 2-3 models, suggestions? #1157

Open
prakharpbuf opened this issue Jan 18, 2023 · 1 comment

Comments


prakharpbuf commented Jan 18, 2023

Hi,
Great work with Real Time Voice Cloning!

I already have some experience training the models. I fine-tuned the model on one of the speakers from LibriSpeech dev-clean and got a noticeable improvement in output quality.
Now I'm going to train two or three models:
1. Everything from scratch using the LibriTTS dataset.
I know blue-fish (now @ghost) and @mbdash tried to train a model on LibriTTS in #449, but the output did not improve. They were still trying when they moved the discussion to a Slack channel, so I don't know what the end result was. If anyone knows what happened after they trained the new encoder and everything, and can share the result (or even better, the model), it would be much appreciated!
After this model is trained, I might also fine-tune it for my voice (same as point 2 in this post).
2. Fine-tune the pretrained model on 1 hour of my own voice.
blue-fish noted in #437 that fine-tuning on your own voice with 0.2 hr of data for a few thousand steps improves the output quality for your voice. But I wonder what would happen if, instead of 0.2 hr, I used a whole hour (maybe more) and trained for more than just a few thousand steps, maybe on the order of tens of thousands. (See the sketch after this list for what I have in mind.)
3. Maybe also take the pretrained model and train it for an additional 100-200k steps using more data from Mozilla Common Voice or something else.
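
To make point 2 concrete, here is a minimal, self-contained PyTorch sketch of the fine-tuning idea: resume from a checkpoint, drop the learning rate, and run for tens of thousands of steps. `ToyModel`, the `"model_state"` checkpoint key, and the random data are stand-ins, not this repo's actual classes or file format; a real run would load the Tacotron synthesizer and your preprocessed mel/text pairs instead.

```python
# Minimal fine-tuning sketch. ToyModel and the random data are stand-ins
# for the repo's synthesizer and a preprocessed single-speaker dataset.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

class ToyModel(nn.Module):  # stand-in for the synthesizer
    def __init__(self):
        super().__init__()
        self.net = nn.Linear(80, 80)

    def forward(self, x):
        return self.net(x)

model = ToyModel()
# Resuming from a checkpoint; the "model_state" key is an assumption,
# check what the repo's own save/load code actually uses:
# ckpt = torch.load("pretrained.pt", map_location="cpu")
# model.load_state_dict(ckpt["model_state"])

# A lower learning rate than the from-scratch schedule is the usual
# precaution against catastrophic forgetting on ~1 hour of data.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-5)
criterion = nn.MSELoss()

# Fake single-speaker "dataset": 256 frames of 80-dim mels.
data = TensorDataset(torch.randn(256, 80), torch.randn(256, 80))
loader = DataLoader(data, batch_size=16, shuffle=True)

model.train()
step, max_steps = 0, 20_000  # "tens of thousands" of steps
while step < max_steps:
    for x, y in loader:
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
        optimizer.step()
        step += 1
        if step >= max_steps:
            break
```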

Do you think this will be useful?

In #126, @sberryman trained all three models from scratch, but he was not happy with the synthesizer and vocoder he trained. I'm not sure what it means because I don't have any experience with AI, but he says the synthesizer's attention did not align well? He did say the encoder was pretty good, though. He uploaded his models and the link still works, so maybe I can try something with the models he trained, since he has a better encoder.
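
For context on "align": the synthesizer's attention maps each decoder (mel) frame to encoder (text) positions, and during healthy training this forms a clean diagonal when plotted; a blurry or flat plot is the failure being described. A quick way to eyeball it (the alignment matrix below is synthetic for illustration; in practice you'd pull it from the model's attention weights):

```python
# Plotting an attention alignment matrix. A clean diagonal means the
# synthesizer is reading the text left-to-right as it generates audio;
# a blurry or flat plot is the "did not align" failure mode.
import numpy as np
import matplotlib.pyplot as plt

# Synthetic alignment for illustration, shape (decoder_steps, encoder_steps).
T_dec, T_enc = 200, 60
attn = np.exp(-((np.arange(T_dec)[:, None] / T_dec
                 - np.arange(T_enc)[None, :] / T_enc) ** 2) / 0.001)
attn /= attn.sum(axis=1, keepdims=True)  # normalize per decoder step

plt.imshow(attn.T, aspect="auto", origin="lower")
plt.xlabel("Decoder timestep (mel frames)")
plt.ylabel("Encoder timestep (input characters)")
plt.title("Attention alignment (diagonal = good)")
plt.colorbar()
plt.savefig("alignment.png")
```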

I don't have very fast storage (I'll be using external hard drives).
I have NVIDIA Quadro P2000 GPUs.
But to train each model, I'll use separate PCs with the same specs so they all train in parallel.

Any suggestions on playing around with hparams, different ideas for training, or anything else?
All suggestions are welcome and appreciated.

Also, if you have any ideas on training, let me know: what to train (fine-tune one of the pretrained models or train from scratch), what dataset to use, what hparams, and how much to train, and I'll do it. I have plenty of time.

Thanks!


oops408 commented Mar 15, 2023

Try treating the model architecture (e.g. location-based vs. content-based attention) and the loss function as hparams and see if those help the fine-tuning. I'm trying out SGD optimization to see if that improves the results. Oh yeah, maybe pitch shifting would be interesting as well...
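
A minimal sketch of the pitch-shifting idea using librosa (the wav path and shift amounts are placeholders, not anything from this repo):

```python
# Pitch-shift augmentation sketch with librosa; the input path is a placeholder.
import librosa
import soundfile as sf

wav, sr = librosa.load("speaker_utt.wav", sr=16000)
for n_steps in (-2, -1, 1, 2):  # shift by one or two semitones each way
    shifted = librosa.effects.pitch_shift(wav, sr=sr, n_steps=n_steps)
    sf.write(f"speaker_utt_shift{n_steps:+d}.wav", shifted, sr)

# The SGD experiment is a one-line optimizer swap in PyTorch, e.g.:
#   optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
```

One caveat: pitch is part of speaker identity, so shifted copies may pull the encoder embedding away from the actual voice.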
