
Improving repeatability of voice cloning #384

Closed
ghost opened this issue Jun 27, 2020 · 3 comments

Comments

ghost commented Jun 27, 2020

How can I make the voice cloning results repeatable? For a given model with same input.wav + text to synthesize, is there a way to ensure that I get the same output every time?

I see this code in synthesizer/train.py:

# Start by setting a seed for repeatability
tf.compat.v1.set_random_seed(hparams.tacotron_random_seed)

I tried inserting this in demo_cli.py and still get different results each time, even if the program is restarted between cloning attempts.

ghost commented Jul 8, 2020

After reviewing code from @plummet555, I realize that we should also set random.seed() for the Python built-in RNG and np.random.seed() for the NumPy RNG. I will try this later and see if repeatability improves.
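A minimal sketch of seeding all the RNGs mentioned so far, collected into one helper. The framework-specific calls are shown as comments since they belong inside the synthesizer/vocoder code paths, and the seed value 1234 is an arbitrary placeholder, not the project's actual hparams value:

```python
import random
import numpy as np

def seed_rngs(seed=1234):
    """Seed every RNG involved so a run can be reproduced (seed value is arbitrary)."""
    random.seed(seed)      # Python built-in RNG
    np.random.seed(seed)   # NumPy RNG
    # The framework seeds would also be set here, e.g.:
    # tf.compat.v1.set_random_seed(seed)  # TensorFlow (synthesizer)
    # torch.manual_seed(seed)             # PyTorch (vocoder)

# Re-seeding yields identical draws, which is the behavior we want per attempt:
seed_rngs()
a = (random.random(), float(np.random.rand()))
seed_rngs()
b = (random.random(), float(np.random.rand()))
assert a == b
```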

ghost commented Jul 19, 2020

With the changes on the 384_repeatable_voice_cloning branch of my fork, I get fully repeatable voice cloning output using demo_toolbox.py. Tested on a computer without a GPU.

Test case

You should find that the exported .wav files are identical for subsequent synthesize and vocode attempts within a session, and even across new toolbox sessions.

Required changes

  1. Reload synthesizer and vocoder models every time they are used
  2. Set torch.manual_seed() every time the vocoder is used
  3. Set tensorflow seed every time synthesizer is used
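The three required changes above can be sketched together. This is only an illustration of the pattern, not the fork's actual code: load_models() and the mel/waveform generation are stand-ins, and SEED is an arbitrary placeholder:

```python
import random
import numpy as np

SEED = 1234  # placeholder fixed seed

def load_models():
    """Stand-in for change 1: reload the synthesizer and vocoder on every
    use, so no internal state carries over between attempts."""
    return object(), object()

def synthesize_once(text):
    # Changes 2 and 3: re-seed every RNG immediately before each use.
    random.seed(SEED)
    np.random.seed(SEED)
    # torch.manual_seed(SEED)             # before the vocoder runs (change 2)
    # tf.compat.v1.set_random_seed(SEED)  # before the synthesizer runs (change 3)
    synth, voc = load_models()            # change 1: fresh models each time
    # Stand-in for the actual mel-spectrogram + waveform generation:
    return np.random.rand(16)

# Two attempts with the same input now produce identical output:
wav1 = synthesize_once("hello")
wav2 = synthesize_once("hello")
assert np.array_equal(wav1, wav2)
```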

Other changes

I reverted these changes and the results are still repeatable on my platform. Listing them here in case it helps with troubleshooting in the future.

  1. Force tensorflow to use a single-threaded session
  2. Use kernel initializer for tf.nn.rnn_cell.GRUCell() and tf.layers.dense()
    • This might affect repeatability of training, but without these, inference is still repeatable.
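For reference, the single-threaded session from change 1 above is a common recipe for deterministic TF execution (see the Stack Overflow answers below). A sketch of the session config, assuming the TF 1.x compat API the repo already uses:

```python
import tensorflow as tf

# Restrict TensorFlow to a single thread for both intra- and inter-op
# parallelism, removing nondeterminism from thread scheduling.
config = tf.compat.v1.ConfigProto(
    intra_op_parallelism_threads=1,
    inter_op_parallelism_threads=1,
)
sess = tf.compat.v1.Session(config=config)
```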

References

(2) https://pytorch.org/docs/stable/notes/randomness.html
(3, 4) https://stackoverflow.com/a/52897216
(5) https://stackoverflow.com/a/51558159

ghost commented Jul 22, 2020

Closing now that #432 is merged.

ghost closed this as completed Jul 22, 2020