[Bug] Exception while using "--speaker_wav" #1440

Closed
lokeshhctm opened this issue Mar 24, 2022 · 3 comments · Fixed by #3275

@lokeshhctm

🐛 Description

(base) root@ip-192-168-0-200:/

/root/miniconda3/bin/tts --text "Awesome, Pretty Good" --model_name "tts_models/en/vctk/vits" --out_path "chunk11_encoded.wav" --speaker_wav "chunk10.wav"

tts_models/en/vctk/vits is already downloaded.
Using model: vits
Setting up Audio Processor...
| > sample_rate:22050
| > resample:False
| > num_mels:80
| > log_func:np.log10
| > min_level_db:-100
| > frame_shift_ms:None
| > frame_length_ms:None
| > ref_level_db:20
| > fft_size:1024
| > power:1.5
| > preemphasis:0.0
| > griffin_lim_iters:60
| > signal_norm:True
| > symmetric_norm:True
| > mel_fmin:0
| > mel_fmax:None
| > pitch_fmin:0.0
| > pitch_fmax:640.0
| > spec_gain:20.0
| > stft_pad_mode:reflect
| > max_norm:4.0
| > clip_norm:True
| > do_trim_silence:True
| > trim_db:45
| > do_sound_norm:False
| > do_amp_to_db_linear:False
| > do_amp_to_db_mel:True
| > do_rms_norm:False
| > db_level:None
| > stats_path:None
| > base:10
| > hop_length:256
| > win_length:1024
initialization of speaker-embedding layers.
Using Griffin-Lim as no vocoder model defined
Text: Awesome, Pretty Good
Text splitted to sentences.
['Awesome, Pretty Good']
Traceback (most recent call last):
  File "/root/miniconda3/bin/tts", line 8, in <module>
    sys.exit(main())
  File "/root/miniconda3/lib/python3.9/site-packages/TTS/bin/synthesize.py", line 287, in main
    wav = synthesizer.tts(args.text, args.speaker_idx, args.language_idx, args.speaker_wav)
  File "/root/miniconda3/lib/python3.9/site-packages/TTS/utils/synthesizer.py", line 245, in tts
    speaker_embedding = self.tts_model.speaker_manager.compute_d_vector_from_clip(speaker_wav)
  File "/root/miniconda3/lib/python3.9/site-packages/TTS/tts/utils/speakers.py", line 287, in compute_d_vector_from_clip
    d_vector = _compute(wf)
  File "/root/miniconda3/lib/python3.9/site-packages/TTS/tts/utils/speakers.py", line 270, in _compute
    waveform = self.speaker_encoder_ap.load_wav(wav_file, sr=self.speaker_encoder_ap.sample_rate)
AttributeError: 'NoneType' object has no attribute 'load_wav'

Expected behavior

Environment

  • 🐸TTS Version (e.g., 1.3.0):
  • PyTorch Version (e.g., 1.8)
  • Python version:
  • OS (e.g., Linux):
  • CUDA/cuDNN version:
  • GPU models and configuration:
  • How you installed PyTorch (conda, pip, source):
  • Any other relevant information:

Additional context

@lokeshhctm lokeshhctm added the bug Something isn't working label Mar 24, 2022
@WeberJulian
Contributor

WeberJulian commented Mar 24, 2022

Hey, that's not a bug. The model tts_models/en/vctk/vits doesn't use an external speaker embedding, so you can only use the speakers it was trained on. You can list those speakers with tts --model_name "tts_models/en/vctk/vits" --list_speaker_idx.

To clone someone's voice with --speaker_wav, you can use YourTTS: tts_models/multilingual/multi-dataset/your_tts
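
As a rough sketch of that switch, the command from the report can be rebuilt with the YourTTS model name. The helper `build_tts_command` is hypothetical and only assembles the argument list; it does not run anything. The `--language_idx` flag is included because the traceback shows the CLI reading `args.language_idx`, but the value "en" is an assumption about how YourTTS names its languages.

```python
import shlex


def build_tts_command(text, model_name, out_path, speaker_wav=None, language_idx=None):
    """Assemble a `tts` CLI invocation as an argument list (hypothetical helper)."""
    cmd = ["tts", "--text", text, "--model_name", model_name, "--out_path", out_path]
    if speaker_wav is not None:
        # Reference clip for voice cloning; only meaningful for models that support it.
        cmd += ["--speaker_wav", speaker_wav]
    if language_idx is not None:
        cmd += ["--language_idx", language_idx]
    return cmd


cmd = build_tts_command(
    text="Awesome, Pretty Good",
    model_name="tts_models/multilingual/multi-dataset/your_tts",
    out_path="chunk11_encoded.wav",
    speaker_wav="chunk10.wav",
    language_idx="en",  # assumption: language ID used by YourTTS for English
)

# Print a shell-safe version of the command for copy and paste.
print(shlex.join(cmd))
```

Building the list first and quoting it with `shlex.join` keeps texts with spaces or commas intact when the command is pasted into a shell.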

@WeberJulian WeberJulian self-assigned this Mar 24, 2022
@WeberJulian
Contributor

If you have more questions about this, feel free to reopen the issue, or ask them on our Gitter.

@jreus
Contributor

jreus commented May 8, 2022

Heya @WeberJulian -- maybe a more informative error message would be useful here? Since this isn't really a bug, the current traceback makes it look like one.
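
One way such a message could look is sketched below. This is not code from the TTS library: `SpeakerManagerStub`, its `encoder_ap` attribute, and `check_speaker_wav_supported` are hypothetical names that only mimic the situation in the traceback, where the speaker encoder's audio processor is `None`.

```python
class SpeakerManagerStub:
    """Hypothetical stand-in for a speaker manager; not the real TTS class."""

    def __init__(self, encoder_ap=None):
        # None mimics a model that ships without a speaker encoder.
        self.encoder_ap = encoder_ap


def check_speaker_wav_supported(manager, model_name, speaker_wav):
    """Raise a readable error instead of failing later with an AttributeError."""
    if speaker_wav is None:
        return
    if manager.encoder_ap is None:
        raise ValueError(
            f"Model '{model_name}' has no speaker encoder, so --speaker_wav "
            "cannot be used for voice cloning. Pick one of its trained "
            "speakers instead, or switch to a model that supports cloning."
        )


manager = SpeakerManagerStub(encoder_ap=None)
try:
    check_speaker_wav_supported(manager, "tts_models/en/vctk/vits", "chunk10.wav")
except ValueError as err:
    message = str(err)
    print(message)
```

Checking up front turns the `'NoneType' object has no attribute 'load_wav'` failure into a message that names the flag and the model.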

eginhard added a commit to idiap/coqui-ai-TTS that referenced this issue Nov 20, 2023
Fixes coqui-ai#1440. Passing a `speaker_wav` argument to regular Vits models failed
because they don't support voice cloning. Now that argument is simply ignored.
erogol pushed a commit that referenced this issue Nov 24, 2023
* Revert "fix for issue 3067"

This reverts commit 041b4b6.

Fixes #3143. The original issue (#3067) was people trying to use
tts.tts_with_vc_to_file() with XTTS and was "fixed" in #3109. But XTTS has
integrated VC and you can just do tts.tts_to_file(..., speaker_wav="..."), there
is no point in passing it through FreeVC afterwards. So, reverting this commit
because it breaks tts.tts_with_vc_to_file() for any model that doesn't have
integrated VC, i.e. all models this method is meant for.

* fix: support multi-speaker models in tts_with_vc/tts_with_vc_to_file

* fix: only compute spk embeddings for models that support it

Fixes #1440. Passing a `speaker_wav` argument to regular Vits models failed
because they don't support voice cloning. Now that argument is simply ignored.
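
The behavior described in this commit message, ignoring `speaker_wav` when the model cannot compute embeddings, could be sketched roughly as follows. This is an illustration under assumptions, not the actual patch: `resolve_speaker_embedding`, `fake_encoder`, and the idea of passing the encoder in as a callable are all hypothetical.

```python
from typing import Callable, Optional, Sequence

# Hypothetical type: a function that maps a wav path to an embedding vector.
Encoder = Callable[[str], Sequence[float]]


def resolve_speaker_embedding(
    encoder: Optional[Encoder], speaker_wav: Optional[str]
) -> Optional[Sequence[float]]:
    """Return an embedding only when a clip is given and the model has an encoder."""
    if speaker_wav is None or encoder is None:
        # No clip, or a model without a speaker encoder: skip the embedding step.
        return None
    return encoder(speaker_wav)


def fake_encoder(path: str) -> Sequence[float]:
    """Placeholder encoder used only for this example."""
    return [0.0, 0.0, 0.0]


# Model without an encoder: the clip is ignored rather than causing a crash.
print(resolve_speaker_embedding(None, "chunk10.wav"))
# Model with an encoder: the clip is passed through.
print(resolve_speaker_embedding(fake_encoder, "chunk10.wav"))
```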