[Bug] Exception while using "--speaker_wav" #1440

lokeshhctm · 2022-03-24T08:45:55Z

🐛 Description

(base) root@ip-192-168-0-200:/

/root/miniconda3/bin/tts --text "Awesome, Pretty Good" --model_name "tts_models/en/vctk/vits" --out_path "chunk11_encoded.wav" --speaker_wav "chunk10.wav"

tts_models/en/vctk/vits is already downloaded.
Using model: vits
Setting up Audio Processor...
| > sample_rate:22050
| > resample:False
| > num_mels:80
| > log_func:np.log10
| > min_level_db:-100
| > frame_shift_ms:None
| > frame_length_ms:None
| > ref_level_db:20
| > fft_size:1024
| > power:1.5
| > preemphasis:0.0
| > griffin_lim_iters:60
| > signal_norm:True
| > symmetric_norm:True
| > mel_fmin:0
| > mel_fmax:None
| > pitch_fmin:0.0
| > pitch_fmax:640.0
| > spec_gain:20.0
| > stft_pad_mode:reflect
| > max_norm:4.0
| > clip_norm:True
| > do_trim_silence:True
| > trim_db:45
| > do_sound_norm:False
| > do_amp_to_db_linear:False
| > do_amp_to_db_mel:True
| > do_rms_norm:False
| > db_level:None
| > stats_path:None
| > base:10
| > hop_length:256
| > win_length:1024
initialization of speaker-embedding layers.
Using Griffin-Lim as no vocoder model defined
Text: Awesome, Pretty Good
Text splitted to sentences.
['Awesome, Pretty Good']
Traceback (most recent call last):
  File "/root/miniconda3/bin/tts", line 8, in <module>
    sys.exit(main())
  File "/root/miniconda3/lib/python3.9/site-packages/TTS/bin/synthesize.py", line 287, in main
    wav = synthesizer.tts(args.text, args.speaker_idx, args.language_idx, args.speaker_wav)
  File "/root/miniconda3/lib/python3.9/site-packages/TTS/utils/synthesizer.py", line 245, in tts
    speaker_embedding = self.tts_model.speaker_manager.compute_d_vector_from_clip(speaker_wav)
  File "/root/miniconda3/lib/python3.9/site-packages/TTS/tts/utils/speakers.py", line 287, in compute_d_vector_from_clip
    d_vector = _compute(wf)
  File "/root/miniconda3/lib/python3.9/site-packages/TTS/tts/utils/speakers.py", line 270, in _compute
    waveform = self.speaker_encoder_ap.load_wav(wav_file, sr=self.speaker_encoder_ap.sample_rate)
AttributeError: 'NoneType' object has no attribute 'load_wav'

Expected behavior

Environment

🐸TTS Version (e.g., 1.3.0):
PyTorch Version (e.g., 1.8)
Python version:
OS (e.g., Linux):
CUDA/cuDNN version:
GPU models and configuration:
How you installed PyTorch (conda, pip, source):
Any other relevant information:

Additional context


WeberJulian · 2022-03-24T15:28:52Z

Hey, that's not a bug. The model tts_models/en/vctk/vits doesn't use an external speaker embedding, so you can only use the speakers it was trained on. You can list those speakers with tts --model_name "tts_models/en/vctk/vits" --list_speaker_idx.

To clone someone's voice with --speaker_wav, you can use YourTTS: tts_models/multilingual/multi-dataset/your_tts
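For anyone landing here, a rough sketch of the two invocations, written as a small Python helper that only assembles the argument lists. The flags are the ones that appear in this thread or in the traceback (`args.speaker_idx`, `args.language_idx`, `args.speaker_wav`). The speaker name "p225" and the language "en" are assumptions for illustration, so check `--list_speaker_idx` for the names your model really exposes. `build_tts_command` is a hypothetical helper, not part of TTS.

```python
import shlex


def build_tts_command(text, model_name, out_path,
                      speaker_idx=None, speaker_wav=None, language_idx=None):
    """Assemble a `tts` CLI invocation as an argument list (hypothetical helper)."""
    cmd = ["tts", "--text", text, "--model_name", model_name, "--out_path", out_path]
    if speaker_idx is not None:
        cmd += ["--speaker_idx", speaker_idx]
    if speaker_wav is not None:
        cmd += ["--speaker_wav", speaker_wav]
    if language_idx is not None:
        cmd += ["--language_idx", language_idx]
    return cmd


# Option 1: VCTK VITS with one of its trained speakers.
# "p225" is an assumed speaker name, used here only as an example.
vctk_cmd = build_tts_command(
    "Awesome, Pretty Good",
    "tts_models/en/vctk/vits",
    "chunk11_encoded.wav",
    speaker_idx="p225",
)

# Option 2: YourTTS with a reference clip for voice cloning.
# language_idx="en" is an assumption; the multilingual model may need a language.
clone_cmd = build_tts_command(
    "Awesome, Pretty Good",
    "tts_models/multilingual/multi-dataset/your_tts",
    "chunk11_encoded.wav",
    speaker_wav="chunk10.wav",
    language_idx="en",
)

# Print shell-safe versions of both commands for copy and paste.
print(shlex.join(vctk_cmd))
print(shlex.join(clone_cmd))
```

Either list can be passed to `subprocess.run(cmd, check=True)` on a machine where `tts` is installed.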

WeberJulian · 2022-03-30T12:32:51Z

If you have more questions about this, feel free to reopen the issue, or ask them on our Gitter.

jreus · 2022-05-08T17:47:40Z

Heya @WeberJulian -- maybe a more informative error message would be useful here, since this isn't really an error? Otherwise it looks like a bug.
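To make the suggestion concrete, here is a minimal sketch of what such a check could look like. It does not use the real TTS classes: `StubSpeakerManager`, `VoiceCloningNotSupportedError` and `check_speaker_wav_supported` are hypothetical names, and the only detail taken from the traceback is that `speaker_encoder_ap` is `None` for this model.

```python
class VoiceCloningNotSupportedError(ValueError):
    """Hypothetical error type for models without a speaker encoder."""


class StubSpeakerManager:
    """Stand-in for the real speaker manager; mirrors only `speaker_encoder_ap`."""

    def __init__(self, speaker_encoder_ap=None):
        self.speaker_encoder_ap = speaker_encoder_ap


def check_speaker_wav_supported(speaker_manager, speaker_wav, model_name):
    """Raise a readable error instead of an AttributeError on None."""
    if speaker_wav is None:
        return  # nothing to validate
    if speaker_manager is None or speaker_manager.speaker_encoder_ap is None:
        raise VoiceCloningNotSupportedError(
            f"Model '{model_name}' has no speaker encoder, so --speaker_wav "
            "cannot be used with it. Pick one of its built-in speakers with "
            "--speaker_idx, or use a model that supports voice cloning."
        )


# Reproduce the situation from this issue with the stand-in manager.
try:
    check_speaker_wav_supported(
        StubSpeakerManager(), "chunk10.wav", "tts_models/en/vctk/vits"
    )
except VoiceCloningNotSupportedError as err:
    message = str(err)
    print(message)
```

The point is only that the check runs before any embedding is computed, so the user sees which flag is the problem.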

Fixes coqui-ai#1440. Passing a `speaker_wav` argument to regular Vits models failed because they don't support voice cloning. Now that argument is simply ignored.

* Revert "fix for issue 3067"

  This reverts commit 041b4b6. Fixes #3143. The original issue (#3067) was people trying to use tts.tts_with_vc_to_file() with XTTS and was "fixed" in #3109. But XTTS has integrated VC and you can just do tts.tts_to_file(..., speaker_wav="..."), there is no point in passing it through FreeVC afterwards. So, reverting this commit because it breaks tts.tts_with_vc_to_file() for any model that doesn't have integrated VC, i.e. all models this method is meant for.

* fix: support multi-speaker models in tts_with_vc/tts_with_vc_to_file

* fix: only compute spk embeddings for models that support it

  Fixes #1440. Passing a `speaker_wav` argument to regular Vits models failed because they don't support voice cloning. Now that argument is simply ignored.
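The behavior described in that last fix (ignore `speaker_wav` when the model cannot use it) can be sketched roughly as below. This is not the actual patch: `StubSpeakerManager` and `resolve_speaker_embedding` are hypothetical, and only the names `speaker_encoder_ap` and `compute_d_vector_from_clip` come from the traceback in this issue.

```python
class StubSpeakerManager:
    """Stand-in for the real speaker manager (assumption, not the TTS class)."""

    def __init__(self, has_encoder):
        # The real attribute is None for models without a speaker encoder.
        self.speaker_encoder_ap = object() if has_encoder else None

    def compute_d_vector_from_clip(self, speaker_wav):
        # Dummy embedding; the real method would run the speaker encoder.
        return [0.0, 0.0, 0.0]


def resolve_speaker_embedding(speaker_manager, speaker_wav):
    """Compute an embedding only when the model has a speaker encoder.

    Returns None when there is no reference clip or no encoder, which
    amounts to silently ignoring `speaker_wav` for unsupported models.
    """
    encoder_ap = getattr(speaker_manager, "speaker_encoder_ap", None)
    if speaker_wav is None or encoder_ap is None:
        return None
    return speaker_manager.compute_d_vector_from_clip(speaker_wav)


# Regular VITS-like model: the clip is ignored instead of raising.
print(resolve_speaker_embedding(StubSpeakerManager(has_encoder=False), "chunk10.wav"))

# Model with a speaker encoder: the embedding is computed.
print(resolve_speaker_embedding(StubSpeakerManager(has_encoder=True), "chunk10.wav"))
```

Silently ignoring the argument avoids the crash, though it trades away the explicit feedback that an error message would give.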

lokeshhctm added the "bug" label Mar 24, 2022

WeberJulian self-assigned this Mar 24, 2022

WeberJulian closed this as completed Mar 30, 2022

TheLocalLab mentioned this issue Nov 20, 2023

[Bug] AttributeError: 'NoneType' object has no attribute 'load_wav' when using tts_with_vc_to_file #3143

Closed

eginhard mentioned this issue Nov 20, 2023

Fix tts_with_vc #3275

Merged
