Update XTTS cloning #3207

erogol · 2023-11-13T12:06:01Z

Optionally chunk the input audio and average the computed latents. This prevents long silences, especially with references that contain many silent segments.
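A minimal plain-Python sketch of the idea: trim the reference to `length` seconds, split it into `chunk_length`-second chunks, encode each chunk, and average the latents element-wise. `encode` is a hypothetical stand-in for the model's latent encoder, and latents are assumed to be fixed-size lists of floats. This illustrates the technique and is not the XTTS code itself.

```python
from typing import Callable, List, Sequence

Latent = List[float]


def chunked_mean_latent(
    audio: Sequence[float],
    sr: int,
    encode: Callable[[Sequence[float]], Latent],  # hypothetical per-chunk encoder
    length: int = 30,
    chunk_length: int = 6,
) -> Latent:
    """Average the latents of fixed-length chunks of one reference clip."""
    # Keep at most `length` seconds of the reference.
    audio = audio[: sr * length]
    step = sr * chunk_length
    if step <= 0 or len(audio) == 0:
        raise ValueError("need a positive chunk_length and non-empty audio")
    # Encode each chunk independently; the last chunk may be shorter.
    latents = [encode(audio[i : i + step]) for i in range(0, len(audio), step)]
    # Element-wise mean across chunks, so a mostly silent chunk only
    # contributes 1/len(latents) of the final latent.
    return [sum(dim) / len(latents) for dim in zip(*latents)]


def mean_encoder(chunk: Sequence[float]) -> Latent:
    # Toy encoder for the demo: a one-dimensional "latent".
    return [sum(chunk) / len(chunk)]


print(chunked_mean_latent([1.0] * 6 + [3.0] * 6, sr=2, encode=mean_encoder, chunk_length=3))  # [2.0]
```

Averaging is what dilutes the silent parts: without chunking, a single latent computed over a silence-heavy reference can be dominated by that silence.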

Edresson · 2023-11-13T12:19:24Z

TTS/tts/models/xtts.py

@@ -255,39 +255,57 @@ def device(self):
        return next(self.parameters()).device

    @torch.inference_mode()
-    def get_gpt_cond_latents(self, audio, sr, length: int = 3):
+    def get_gpt_cond_latents(self, audio, sr, length: int = 30, chunk_length: int = 6):


I think the default value here and in the config should be the same.

I think it would be better this way: users may call this function on its own and get very different results. Making both defaults equal avoids issues like that.

Config as in the code, or the released model's config?
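One way to keep the method and the config from drifting apart, sketched under assumptions: the method's keyword defaults are `None` and fall back to config values. `CloningDefaults`, `ToyModel`, `resolve_cond_lengths`, and the field names are hypothetical; only the values 30 and 6 come from the diff above.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class CloningDefaults:
    # Hypothetical config fields; values mirror the new defaults in the diff.
    gpt_cond_len: int = 30
    gpt_cond_chunk_len: int = 6


class ToyModel:
    def __init__(self, config: Optional[CloningDefaults] = None):
        self.config = config or CloningDefaults()

    def resolve_cond_lengths(
        self, length: Optional[int] = None, chunk_length: Optional[int] = None
    ) -> Tuple[int, int]:
        # `None` means "use the config value", so a direct call and a
        # config-driven call agree unless the caller overrides explicitly.
        if length is None:
            length = self.config.gpt_cond_len
        if chunk_length is None:
            chunk_length = self.config.gpt_cond_chunk_len
        return length, chunk_length


print(ToyModel().resolve_cond_lengths())           # (30, 6)
print(ToyModel().resolve_cond_lengths(length=12))  # (12, 6)
```

With a single source of truth, changing the default only requires editing the config, and a user calling the method directly still gets the same behavior as the inference pipeline.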

Edresson · 2023-11-13T12:21:53Z

All looks good to me.

WeberJulian · 2023-11-13T12:32:30Z

TTS/TTS/tts/models/xtts.py

Line 376 in a16360a

audio = audio.mean(0, keepdim=True)

This can be removed, since it's already done in load_audio.
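Why the second `mean` is redundant: averaging across channels is idempotent, so once the loader has produced a mono signal, averaging over the channel axis again changes nothing. A plain-Python sketch, assuming audio is a list of channels (each a list of samples); `downmix` is a hypothetical stand-in for the `audio.mean(0, keepdim=True)` downmix that, per the comment, `load_audio` already performs.

```python
from typing import List

Channels = List[List[float]]


def downmix(audio: Channels) -> Channels:
    """Average across channels, keeping a channel axis of size 1 (like keepdim=True)."""
    n = len(audio)
    return [[sum(samples) / n for samples in zip(*audio)]]


stereo = [[0.0, 1.0, 2.0], [2.0, 3.0, 4.0]]
mono = downmix(stereo)
print(mono)                   # [[1.0, 2.0, 3.0]]
print(downmix(mono) == mono)  # True: a second downmix is a no-op
```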

TTS/TTS/tts/models/xtts.py

Line 372 in a16360a

# load the audio in 24khz to avoid issued with multiple sr references

The comment should say 22kHz.
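The rate in that comment matters because, assuming chunk sizes are computed in samples at the loading rate, every `sr * seconds` count depends on it: 6 s is 132,300 samples at 22,050 Hz but 144,000 samples at 24,000 Hz. The toy linear-interpolation resampler below illustrates bringing references with different sample rates to one rate. It is a hypothetical stand-in for a proper band-limited resampler (for example `torchaudio.functional.resample`) and makes no claim about what XTTS uses internally.

```python
from typing import List, Sequence


def linear_resample(x: Sequence[float], orig_sr: int, target_sr: int) -> List[float]:
    """Toy linear-interpolation resampler (no anti-aliasing filter)."""
    if orig_sr == target_sr:
        return list(x)
    n_out = round(len(x) * target_sr / orig_sr)
    step = orig_sr / target_sr  # input samples advanced per output sample
    out = []
    for i in range(n_out):
        pos = i * step
        lo = int(pos)
        hi = min(lo + 1, len(x) - 1)  # clamp at the last input sample
        frac = pos - lo
        out.append(x[lo] * (1.0 - frac) + x[hi] * frac)
    return out


# One second at 24 kHz becomes one second at 22.05 kHz.
print(len(linear_resample([0.0] * 24000, 24000, 22050)))  # 22050
print(6 * 22050, 6 * 24000)                               # 132300 144000
```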

Apart from that, LGTM.

ukemamaster · 2023-11-13T15:16:11Z

@erogol With this PR, will we be able to use up to 30 s of speaker_wav for voice cloning, instead of 6 s as before?

erogol · 2023-11-16T10:22:00Z

You can use any length and any number of samples. Just don't go wild. Samples should be consistent in style, pitch, etc.
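A sketch of one way several reference clips could feed the chunk-and-average idea, under stated assumptions: every clip is already at the same sample rate, `encode` is a hypothetical per-chunk encoder, and chunks from all clips are pooled into a single mean. This is an illustration only, not a description of how XTTS combines multiple `speaker_wav` files.

```python
from typing import Callable, List, Sequence

Latent = List[float]


def pooled_reference_latent(
    clips: Sequence[Sequence[float]],
    sr: int,
    encode: Callable[[Sequence[float]], Latent],  # hypothetical per-chunk encoder
    chunk_length: int = 6,
) -> Latent:
    step = sr * chunk_length
    if step <= 0:
        raise ValueError("chunk_length and sr must be positive")
    # Chunk every clip and encode all chunks into one shared pool.
    latents = [
        encode(clip[i : i + step])
        for clip in clips
        for i in range(0, len(clip), step)
    ]
    if not latents:
        raise ValueError("no audio in the given clips")
    # Pooling chunks across clips weights each clip by its duration.
    return [sum(dim) / len(latents) for dim in zip(*latents)]


def mean_encoder(chunk: Sequence[float]) -> Latent:
    return [sum(chunk) / len(chunk)]


print(pooled_reference_latent([[1.0] * 4, [4.0] * 2], sr=1, encode=mean_encoder, chunk_length=2))  # [2.0]
```

Pooling over chunks weights longer clips more heavily; averaging one latent per clip would weight clips equally instead. Either way, clips that differ in style or pitch get blended into one latent, which is plausibly why the references should be consistent.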

erogol added 2 commits November 13, 2023 13:00

Implement chunking gpt_cond

a16360a

Make style

b2682d3

erogol requested review from Edresson and WeberJulian November 13, 2023 12:06

Edresson approved these changes Nov 13, 2023

Fixup

92fa988

erogol merged commit f32a465 into dev Nov 13, 2023
52 checks passed
