
[WIP] Add VITS-2 #1510

Closed
wants to merge 9 commits

Conversation

ezerhouni
Collaborator

Add VITS-2 Recipe for LJSpeech.
For the moment, I am facing a bug tracked here: #1508

I will add the license etc. later on, once the PR works.

@csukuangfj
Collaborator

Does torch.min(inputs, dim=None) work for you?

@ezerhouni
Collaborator Author

ezerhouni commented Feb 20, 2024

For now, I am getting

  File "/vits2/egs/ljspeech/TTS/vits2/transform.py", line 118, in rational_quadratic_spline
    if torch.min(inputs, dim=None) < left or torch.max(inputs) > right:
RuntimeError: Please look up dimensions by name, got: name = None.

Debugging as we speak

Edit: The previous crash happened at epoch 918. I think it might come from the transformer layer I am adding

@csukuangfj
Collaborator

What is the output of

print(type(inputs))

@ezerhouni
Collaborator Author

What is the output of

print(type(inputs))

<class 'torch.Tensor'>

@csukuangfj
Collaborator

As a last attempt, could you replace

torch.min(inputs, dim=None)

with

torch.min(inputs.reshape(-1), dim=0)

I just searched for the new error, and found
pytorch/pytorch#70925

Not sure if inputs is a named tensor.
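
For reference, a minimal standalone sketch (not part of the PR) of the behavior being discussed: torch.min without a dim reduces over all elements, torch.min(..., dim=0) returns a (values, indices) namedtuple, and dim=None is routed to the named-tensor overload on the PyTorch versions affected by pytorch/pytorch#70925.

    import torch

    x = torch.randn(3, 4)

    # Full reduction: returns a 0-dim tensor with the global minimum.
    print(torch.min(x))

    # Reduction over an explicit dim: returns a (values, indices) namedtuple,
    # which is why the workaround indexes [0] to get the values.
    values, indices = torch.min(x.reshape(-1), dim=0)
    print(values, indices)

    # On the PyTorch versions affected by pytorch/pytorch#70925, dim=None is
    # dispatched to the named-tensor overload and raises:
    #   RuntimeError: Please look up dimensions by name, got: name = None.
    # torch.min(x, dim=None)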

@ezerhouni
Collaborator Author

@csukuangfj Thanks for your help, I have replaced it with:

    if torch.min(inputs.reshape(-1), dim=0)[0] < left or torch.max(inputs.reshape(-1), dim=0)[0] > right:
        raise ValueError("Input to a transform is not within its domain")

and it seems to run. It takes ~24h on my machine to train a model for 1000 epochs. I will let you know once the training is done.
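
As a sanity check, a small self-contained sketch of the rewritten guard (the check_domain helper and the left/right values here are made up for illustration; the real check lives in transform.py):

    import torch

    def check_domain(inputs: torch.Tensor, left: float, right: float) -> None:
        # Same guard as above, using the reshape(-1)/dim=0 workaround.
        if (
            torch.min(inputs.reshape(-1), dim=0)[0] < left
            or torch.max(inputs.reshape(-1), dim=0)[0] > right
        ):
            raise ValueError("Input to a transform is not within its domain")

    check_domain(torch.tensor([[0.1, 0.5], [0.9, 0.3]]), left=0.0, right=1.0)  # within domain

    try:
        check_domain(torch.tensor([[1.5]]), left=0.0, right=1.0)  # outside domain
    except ValueError as e:
        print(e)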

xa, xb = x.split(x.size(1) // 2, dim=1)

x_trans_mask = make_pad_mask(torch.sum(x_mask, dim=[1, 2]).type(torch.int64))
xa_ = self.pre_transformer(
@ezerhouni
Collaborator Author

@csukuangfj I will try to debug more, but I think the issue might come from there.
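
For context, a standalone sketch of what the masking line in the hunk above computes; the make_pad_mask below is a stand-in approximating icefall's helper (True at padded positions), and the x_mask values are illustrative only:

    import torch

    def make_pad_mask(lengths: torch.Tensor) -> torch.Tensor:
        # Stand-in approximating icefall's make_pad_mask: True marks padded frames.
        max_len = int(lengths.max())
        return torch.arange(max_len, device=lengths.device).unsqueeze(0) >= lengths.unsqueeze(1)

    # x_mask: (batch, 1, time) with 1.0 at valid frames and 0.0 at padding.
    x_mask = torch.tensor([[[1.0, 1.0, 1.0, 0.0]],
                           [[1.0, 1.0, 0.0, 0.0]]])

    # Summing over dims (1, 2) recovers the per-utterance lengths (3 and 2 here),
    # which are then converted back into a boolean padding mask for the transformer.
    lengths = torch.sum(x_mask, dim=[1, 2]).type(torch.int64)
    x_trans_mask = make_pad_mask(lengths)
    print(lengths)        # tensor([3, 2])
    print(x_trans_mask)   # True where frames are padding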

@ezerhouni
Collaborator Author

@csukuangfj I tried the fix that you proposed and the new tokenization; both throw the same error.

@csukuangfj
Collaborator

Could you check that inputs is not empty in

    if torch.min(inputs.reshape(-1), dim=0)[0] < left or torch.max(inputs.reshape(-1), dim=0)[0] > right:

?

If it is empty, could you find out why it is empty?
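
One way to add that check is a small guard just before the domain check; the assert_not_empty helper below is hypothetical and only sketches the idea:

    import torch

    def assert_not_empty(inputs: torch.Tensor, name: str = "inputs") -> None:
        # Hypothetical helper to drop in just before the domain check in
        # rational_quadratic_spline (not part of the PR): an empty tensor would
        # make the torch.min(..., dim=0) call fail on its own.
        if inputs.numel() == 0:
            raise ValueError(f"Empty tensor passed as {name}: shape={tuple(inputs.shape)}")

    assert_not_empty(torch.randn(4, 8))       # passes silently
    # assert_not_empty(torch.empty(0, 8))     # would raise ValueError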

@ezerhouni
Collaborator Author

@csukuangfj The training works fine. The quality is not optimal at the moment, so I am playing a bit with the parameters and will let you know.

@ezerhouni
Collaborator Author

@csukuangfj I am still having issues training a model. I will close this PR and re-open one once I have a more stable branch.

ezerhouni closed this on Mar 4, 2024