[Bug] RuntimeError: stft requires the return_complex parameter be given for real inputs #2449

Closed
thoraxe opened this issue Mar 22, 2023 · 10 comments · Fixed by eginhard/coqui-tts#20
Labels
bug (Something isn't working), wontfix (This will not be worked on but feel free to help.)

Comments

@thoraxe

thoraxe commented Mar 22, 2023

Describe the bug

When trying to fine-tune YourTTS with an LJSpeech formatted dataset after computing the encodings, this error appears.

To Reproduce

Run the your_tts recipe:
https://gist.github.com/thoraxe/d75d47990b5a07b2201ddeaebdb71362

Expected behavior

Training should occur.

Logs

Traceback (most recent call last):
  File "/home/thoraxe/.pyenv/versions/tts-310/lib/python3.10/site-packages/trainer/trainer.py", line 1591, in fit
    self._fit()
  File "/home/thoraxe/.pyenv/versions/tts-310/lib/python3.10/site-packages/trainer/trainer.py", line 1544, in _fit
    self.train_epoch()
  File "/home/thoraxe/.pyenv/versions/tts-310/lib/python3.10/site-packages/trainer/trainer.py", line 1309, in train_epoch
    _, _ = self.train_step(batch, batch_num_steps, cur_step, loader_start_time)
  File "/home/thoraxe/.pyenv/versions/tts-310/lib/python3.10/site-packages/trainer/trainer.py", line 1126, in train_step
    batch = self.format_batch(batch)
  File "/home/thoraxe/.pyenv/versions/tts-310/lib/python3.10/site-packages/trainer/trainer.py", line 926, in format_batch
    batch = self.model.format_batch_on_device(batch)
  File "/home/thoraxe/.pyenv/versions/tts-310/lib/python3.10/site-packages/TTS/tts/models/vits.py", line 1503, in format_batch_on_device
    batch["spec"] = wav_to_spec(wav, ac.fft_size, ac.hop_length, ac.win_length, center=False)
  File "/home/thoraxe/.pyenv/versions/tts-310/lib/python3.10/site-packages/TTS/tts/models/vits.py", line 123, in wav_to_spec
    spec = torch.stft(
  File "/home/thoraxe/.pyenv/versions/tts-310/lib/python3.10/site-packages/torch/functional.py", line 641, in stft
    return _VF.stft(input, n_fft, hop_length, win_length, window,  # type: ignore[attr-defined]
RuntimeError: stft requires the return_complex parameter be given for real inputs, and will further require that return_complex=True in a future PyTorch release.

Environment

{
    "CUDA": {
        "GPU": [
            "NVIDIA GeForce RTX 3080"
        ],
        "available": true,
        "version": "11.8"
    },
    "Packages": {
        "PyTorch_debug": false,
        "PyTorch_version": "2.0.0+cu118",
        "TTS": "0.12.0",
        "numpy": "1.22.4"
    },
    "System": {
        "OS": "Linux",
        "architecture": [
            "64bit",
            "ELF"
        ],
        "processor": "x86_64",
        "python": "3.10.9",
        "version": "#1 SMP Fri Jan 27 02:56:13 UTC 2023"
    }
}

Additional context

No response

thoraxe added the bug label Mar 22, 2023
@erogol
Member

erogol commented Mar 23, 2023

Try the dev branch; this should be fixed there.

@pivolan

pivolan commented Mar 23, 2023

One bad (hacky) way to fix this is to hardcode a patch into PyTorch in your local environment. Open the file:

nano /opt/conda/lib/python3.10/site-packages/torch/functional.py

Then, at line 641, insert this before the return statement:

    if not return_complex:
        return torch.view_as_real(_VF.stft(input, n_fft, hop_length, win_length, window,  # type: ignore[attr-defined]
                                           normalized, onesided, return_complex=True))
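
For anyone who would rather not edit files inside site-packages, the same conversion can be done at the call site. The following is a minimal sketch, not the actual Coqui fix: `stft_real_view` is a hypothetical helper name, and the waveform and window sizes are made-up values for illustration. It passes `return_complex=True` and then uses `torch.view_as_real` to recover the older real-valued layout with a trailing (real, imag) axis.

```python
import torch


def stft_real_view(wav, n_fft, hop_length, win_length, window, center=False):
    """Hypothetical helper: STFT returned in the legacy real layout (..., freq, frames, 2)."""
    spec = torch.stft(
        wav,
        n_fft,
        hop_length=hop_length,
        win_length=win_length,
        window=window,
        center=center,
        return_complex=True,  # must be given explicitly for real inputs
    )
    # Split the complex tensor into a trailing (real, imag) axis
    return torch.view_as_real(spec)


# Made-up example input: a batch with one random waveform of 4096 samples
wav = torch.randn(1, 4096)
window = torch.hann_window(1024)

spec = stft_real_view(wav, 1024, 256, 1024, window)

# Code written for the old output can keep reducing over the last axis
magnitude = torch.sqrt(spec.pow(2).sum(-1) + 1e-6)
```

Because the returned tensor has the same layout the old real-valued output had, downstream code that indexes the last axis should not need to change.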


@leenajenniferedwin

Hi,

I ran into the same problem when trying to run NVidiaQuartzNetMic.ipynb. I tried to fix it by inserting this statement at line 641 of my /usr/local/lib/python3.9/dist-packages/torch/functional.py, but I still get the same error. Could you please help me?

/usr/local/lib/python3.9/dist-packages/torch/functional.py in stft(input, n_fft, hop_length, win_length, window, center, pad_mode, normalized, onesided, return_complex)
639 input = F.pad(input.view(extended_shape), [pad, pad], pad_mode)
640 input = input.view(input.shape[-signal_dim:])
--> 641 if not return_complex:
642 return torch.view_as_real(_VF.stft(input, n_fft, hop_length, win_length, window, # type: ignore[attr-defined]
643 normalized, onesided, return_complex=True))

RuntimeError: stft requires the return_complex parameter be given for real inputs, and will further require that return_complex=True in a future PyTorch release.

@offside609

I have the same issue with the VITS model, but for me it shows up as a warning rather than an error. It still leads to the losses becoming NaN later on, and training is interrupted.
#2555

@stale

stale bot commented May 31, 2023

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. You might also look at our discussion channels.

@QinHsiu

QinHsiu commented Sep 18, 2023

I have the same problem, can anyone help me fix this problem?

@pivolan

pivolan commented Sep 18, 2023

I have the same problem, can anyone help me fix this problem?

When I tried it some time ago, this problem was already solved in the latest version of Coqui TTS. If you still see it and are using an old version of Coqui TTS, try my method of changing the sources of the torch library.

@Jasmijn888

One bad (hacky) way to fix this is to hardcode a patch into PyTorch in your local environment. Open the file:

nano /opt/conda/lib/python3.10/site-packages/torch/functional.py

Then, at line 641, insert this before the return statement:

    if not return_complex:
        return torch.view_as_real(_VF.stft(input, n_fft, hop_length, win_length, window,  # type: ignore[attr-defined]
                                           normalized, onesided, return_complex=True))


It works for me. Thanks!

@eginhard
Contributor

eginhard commented May 4, 2024

This has also been fixed in our fork, available via pip install coqui-tts
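
Before switching packages it can help to confirm what is actually installed. A minimal sketch using only the standard library; `installed_version` is a hypothetical helper name, and the distribution names checked are the PyPI names `TTS` (the original package), `coqui-tts` (the fork) and `torch`.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Report which of the relevant distributions are present in this environment
for name in ("TTS", "coqui-tts", "torch"):
    print(name, installed_version(name) or "not installed")
```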

@Snailgoo

Snailgoo commented Jul 8, 2024

One bad (hacky) way to fix this is to hardcode a patch into PyTorch in your local environment. Open the file:

nano /opt/conda/lib/python3.10/site-packages/torch/functional.py

Then, at line 641, insert this before the return statement:

    if not return_complex:
        return torch.view_as_real(_VF.stft(input, n_fft, hop_length, win_length, window,  # type: ignore[attr-defined]
                                           normalized, onesided, return_complex=True))


Traceback (most recent call last):
  File "/D2Former/evaluation.py", line 102, in <module>
    evaluation(args.model_path, noisy_dir, clean_dir, args.save_tracks, args.save_dir, out_file)
  File "/D2Former/evaluation.py", line 72, in evaluation
    est_audio, length = enhance_one_track(model, noisy_path, saved_dir, 16000*8, n_fft, n_fft//4, save_tracks)
  File "/py310_audio/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/D2Former/evaluation.py", line 42, in enhance_one_track
    est_audio = torch.istft(est_spec_uncompress, n_fft, hop, window=torch.hamming_window(n_fft).cuda(),
RuntimeError: istft requires a complex-valued input tensor matching the output from stft with return_complex=True.
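
The `istft` error above is the mirror image of the `stft` one: `torch.istft` only accepts a complex tensor. Below is a minimal sketch of one way to handle it, assuming the estimated spectrogram is a real tensor whose last axis holds (real, imag). That layout is an assumption about the D2Former output, not something confirmed in this thread, and the signal and sizes here are random test values.

```python
import torch

# Made-up sizes and signal, for illustration only
n_fft, hop = 512, 128
window = torch.hamming_window(n_fft)
wav = torch.randn(1, 16000)

spec = torch.stft(wav, n_fft, hop, window=window, return_complex=True)

# Stand-in for a model output stored as a real tensor with a trailing (real, imag) axis
est_real = torch.view_as_real(spec)

# istft needs a complex tensor, so convert back before inverting
est_complex = torch.view_as_complex(est_real.contiguous())
est_audio = torch.istft(est_complex, n_fft, hop, window=window, length=wav.shape[-1])
```

`torch.view_as_complex` requires the last dimension to have size 2, so a tensor with real and imaginary parts stored on a different axis would need to be permuted first.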
