
Implementation of VITS-2 #1508

Closed
ezerhouni opened this issue Feb 19, 2024 · 15 comments

Comments

@ezerhouni
Collaborator

Hello, I am trying to implement VITS2, but I am getting the following error:

  File "/vits2/egs/ljspeech/TTS/vits2/transform.py", line 38, in piecewise_rational_quadratic_transform
    outputs, logabsdet = spline_fn(
  File "/vits2/egs/ljspeech/TTS/vits2/transform.py", line 85, in unconstrained_rational_quadratic_spline
    ) = rational_quadratic_spline(
  File "/vits2/egs/ljspeech/TTS/vits2/transform.py", line 118, in rational_quadratic_spline
    if torch.min(inputs) < left or torch.max(inputs) > right:
RuntimeError: min(): Expected reduction dim to be specified for input.numel() == 0. Specify the reduction dim with the 'dim' argument.

Do you have an idea where it might come from? I know that without the code it is difficult to tell; I will open a PR with the implementation later this week. Thank you.

@csukuangfj
Collaborator

Can you try torch.min(inputs, dim=None)?

The error says you need to specify the dim argument for torch.min(), though your code looks correct to me.

@nshmyrev
Contributor

Same issue: coqui-ai/TTS#2555

It comes from a bad data file that does not align properly.

@csukuangfj
Collaborator

@ezerhouni

I suggest that you use
https://github.com/rhasspy/piper-phonemize
to convert text to tokens.

Otherwise, it may be difficult, if not impossible, to deploy the trained model with C++.

You can find pre-built wheels for Linux and Windows at
https://github.com/csukuangfj/piper-phonemize/releases/tag/2023.12.5

[Screenshot 2024-02-20 at 09:46:21]

@yaozengwei

Do you have any code to share for using piper-phonemize to convert text to tokens?

@ezerhouni
Collaborator Author

@csukuangfj Let me try torch.min(inputs, dim=None)
I am trying the LJSpeech recipe for the moment with VITS-2

@csukuangfj
Collaborator

> I am trying the LJSpeech recipe for the moment with VITS-2

Ok, but we are switching to piper-phonemize for converting text to tokens.

Hope that @yaozengwei can push the new tokenizer soon.

@yaozengwei
Collaborator

yaozengwei commented Feb 20, 2024

> > I am trying the LJSpeech recipe for the moment with VITS-2
>
> Ok, but we are switching to piper-phonemize for converting text to tokens.
>
> Hope that @yaozengwei can push the new tokenizer soon.

I just uploaded the code here #1511.

@ezerhouni
Collaborator Author

@csukuangfj Now I am getting:

  File "/vits2/egs/ljspeech/TTS/vits2/duration_predictor.py", line 191, in forward
    z = flow(z, x_mask, g=x, inverse=inverse)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/vits2/egs/ljspeech/TTS/vits2/flow.py", line 297, in forward
    xb, logdet_abs = piecewise_rational_quadratic_transform(
  File "/vits2/egs/ljspeech/TTS/vits2/transform.py", line 38, in piecewise_rational_quadratic_transform
    outputs, logabsdet = spline_fn(
  File "/vits2/egs/ljspeech/TTS/vits2/transform.py", line 85, in unconstrained_rational_quadratic_spline
    ) = rational_quadratic_spline(
  File "/vits2/egs/ljspeech/TTS/vits2/transform.py", line 175, in rational_quadratic_spline
    assert (discriminant >= 0).all()
AssertionError

I will try with the new tokenizer to see if it fixes the issue.
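For context, a sketch of why this assert can trip even when the math is sound: the discriminant b² - 4ac in the spline inversion is non-negative analytically, so a failure usually means NaN/Inf crept in upstream (e.g. a diverging loss), because (nan >= 0) evaluates to False. check_discriminant below is a hypothetical diagnostic, not code from the recipe:

```python
import torch

def check_discriminant(discriminant: torch.Tensor) -> None:
    # Separate "non-finite values leaked in" from "inputs left the spline
    # domain", since a bare assert conflates the two.
    bad = ~torch.isfinite(discriminant)
    if bad.any():
        raise RuntimeError(
            f"{int(bad.sum())} non-finite discriminant values; "
            "check for NaN/Inf upstream (e.g. a diverging loss)"
        )
    if (discriminant < 0).any():
        raise RuntimeError("negative discriminant: inputs outside spline domain")

# Demonstrates why the original assert fails on NaN: (nan >= 0) is False.
d = torch.tensor([1.0, float("nan")])
assert not bool((d >= 0).all())
```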

@csukuangfj
Collaborator

@yaozengwei
Could you have a look at the above error?

@yaozengwei
Collaborator

> Hello, I am trying to implement VITS2 but I am getting the following error :
>
>       File "/vits2/egs/ljspeech/TTS/vits2/transform.py", line 38, in piecewise_rational_quadratic_transform
>         outputs, logabsdet = spline_fn(
>       File "/vits2/egs/ljspeech/TTS/vits2/transform.py", line 85, in unconstrained_rational_quadratic_spline
>         ) = rational_quadratic_spline(
>       File "/vits2/egs/ljspeech/TTS/vits2/transform.py", line 118, in rational_quadratic_spline
>         if torch.min(inputs) < left or torch.max(inputs) > right:
>     RuntimeError: min(): Expected reduction dim to be specified for input.numel() == 0. Specify the reduction dim with the 'dim' argument.
>
> Do you have an idea where it might come from ? I know that without code it is difficult to know, I will do a PR of the implementation later this week. Thank you

It seems the input tensor to torch.min is empty.

@csukuangfj
Collaborator

>>> import torch
>>> a = torch.empty((0,))
>>> torch.min(a)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RuntimeError: min(): Expected reduction dim to be specified for input.numel() == 0. Specify the reduction dim with the 'dim' argument.

An empty tensor will indeed throw the same error.
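A defensive guard ahead of the range check would surface this case with an actionable message instead of the opaque RuntimeError (a sketch only; safe_range_check is a hypothetical wrapper, not part of the recipe):

```python
import torch

def safe_range_check(inputs: torch.Tensor, left: float, right: float) -> None:
    # torch.min()/torch.max() on a 0-element tensor raise
    # "Expected reduction dim to be specified", so bail out first with a
    # message pointing at the likely cause (an upstream mask/alignment bug).
    if inputs.numel() == 0:
        raise ValueError(
            f"rational_quadratic_spline received an empty tensor "
            f"(shape={tuple(inputs.shape)}); check the upstream mask/alignment."
        )
    if torch.min(inputs) < left or torch.max(inputs) > right:
        raise ValueError("Input to a transform is not within its domain")

# An empty tensor now produces an actionable error message:
try:
    safe_range_check(torch.empty((0,)), -1.0, 1.0)
except ValueError as e:
    print(e)
```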

@ezerhouni
Collaborator Author

@csukuangfj I might have some good news, but it needs a bit more testing. I will let you know next week.

@ezerhouni
Collaborator Author

Unrelated to VITS-2 (please tell me if you would prefer that I open a separate issue): in the VITS recipes, the input spectrogram is computed with Wav2Spec while the loss is computed with Wav2LogFilterBank. Is that on purpose?

@JinZr
Collaborator

JinZr commented Mar 18, 2024

Hmm, I think we didn't choose this setup on purpose. @yaozengwei, am I right?

@yaozengwei
Collaborator

> Unrelated to VITS-2 (please tell me if you prefer that I open a proper issue), it seems that for the VITS recipes, you are using spectrogram which is using Wav2Spec while the loss is computed using Wav2LogFilterBank is on purpose ?

We just follow the VITS paper (https://arxiv.org/pdf/2106.06103.pdf), which uses the linear spectrogram as input to the posterior encoder (Sec. 2.1.3 and Fig. 1) and mel-scale spectrograms to compute the reconstruction loss (Sec. 2.1.2).
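A rough sketch of those two feature paths (the FFT size, hop size, and mel bin count below are illustrative, not the recipe's exact config, and the mel filterbank is stood in by a random matrix rather than a real mel matrix):

```python
import torch

# The posterior encoder consumes the linear-frequency magnitude spectrogram;
# the reconstruction loss compares mel-scale spectrograms of the real and
# generated audio.
wav = torch.randn(16000)          # 1 second of fake audio at 16 kHz
n_fft, hop = 1024, 256            # illustrative values
window = torch.hann_window(n_fft)
stft = torch.stft(wav, n_fft=n_fft, hop_length=hop, window=window,
                  return_complex=True)
linear_spec = stft.abs()          # (n_fft // 2 + 1, frames) -> posterior encoder

# For the loss, a mel filterbank (e.g. 80 bins) is applied to linear_spec and
# the result is log-compressed; a random matrix stands in for the mel matrix.
mel_fbank = torch.rand(80, n_fft // 2 + 1)
log_mel = torch.log(mel_fbank @ linear_spec + 1e-6)  # (80, frames) -> loss

print(linear_spec.shape, log_mel.shape)
```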

@ezerhouni
Collaborator Author

@yaozengwei Yes, my bad, I misunderstood part of the code.

@JinZr JinZr closed this as completed Mar 18, 2024