Problems using OpenVoice with cuda and >5s source audio #234

eginhard · 2024-12-23T13:14:17Z

Discussed in #232

Originally posted by CiobanuPaul December 23, 2024
I have just upgraded coqui-tts to 0.25.1 to be able to use the OpenVoice voice converter.
The first issue is that an exception is raised if I use "cuda"; it only works on "cpu".
The second issue is that the voice conversion output always contains only 5 seconds of content; the rest is white noise (when the source wav is longer than 5 seconds).

I am using Python 3.10.
This is part of the exception message for the first issue:

 File "/home/catalin/Documents/virtual_envs/venv/lib/python3.10/site-packages/TTS/vc/models/openvoice.py", line 288, in extract_se
    y = torch.FloatTensor(audio_ref)
TypeError: expected TensorOptions(dtype=float, device=cpu, layout=Strided, requires_grad=false (default), pinned_memory=false (default), memory_format=(nullopt)) (got TensorOptions(dtype=float, device=cuda:0, layout=Strided, requires_grad=false (default), pinned_memory=false (default), memory_format=(nullopt)))
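
For context on the traceback above: `torch.FloatTensor` is the legacy CPU tensor type, so calling it on a tensor that already lives on `cuda:0` fails with this kind of `TypeError`. Below is a minimal sketch of a device-agnostic conversion, not the actual fix applied in coqui-tts. The helper name `to_float_tensor` and the dummy `audio_ref` are assumptions made for illustration only.

```python
import torch


def to_float_tensor(audio, device="cpu"):
    """Hypothetical helper: return `audio` as a float32 tensor on `device`.

    Unlike torch.FloatTensor(audio), torch.as_tensor accepts input that
    already lives on a CUDA device as well as plain Python sequences.
    """
    return torch.as_tensor(audio, dtype=torch.float32, device=device)


# Use the GPU when one is present, otherwise fall back to the CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"

# Dummy stand-in for the reference audio (assumption: a 1-D sample tensor).
audio_ref = torch.zeros(16000, device=device)

y = to_float_tensor(audio_ref, device)
print(y.dtype, y.device)
```

Until the conversion is fixed in the library, running the voice converter on "cpu" avoids the exception, as noted above.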

eginhard self-assigned this Dec 23, 2024

eginhard added the bug label Dec 23, 2024
