Vocoder Preprocessing Failure #833
OK, I did a bit more tracing. Based on the above error, in synthesizer\synthesize.py, line 69, I changed the line from:
to:
and ran vocoder_preprocess.py using the command line:
This cleared the collate_synthesizer error, but the preprocess still failed to run. Here is the output I received:
Here are the relevant lines from synthesize.py:
Not sure where to go with this one... I am using a GPU, CUDA 11.1, and num_workers=0 (because of the Win10 pickle error).
Per an earlier comment by blue-fish, line 87 should read: _, mels_out, _, _ = model(texts, mels, embeds)
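For readers tracing this fix: the model's forward pass returns four values, and only the second (the mel output) is needed here. A minimal stand-alone sketch of the unpacking pattern, using a hypothetical stub in place of the real Tacotron model:

```python
# StubModel is purely illustrative; it stands in for the repo's Tacotron
# model, whose forward pass returns four values.
class StubModel:
    def __call__(self, texts, mels, embeds):
        # (mel_pred, mel_postnet_out, attention, stop_tokens) -- illustrative names
        return "m1", "m2_out", "attn", "stop"

model = StubModel()
# Keep only the second return value, discarding the rest:
_, mels_out, _, _ = model(["text"], ["mel"], ["embed"])
print(mels_out)  # m2_out
```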
Regarding the latest problem, please see #729 (comment). If you don't mind, please submit a pull request containing the modifications needed to make the vocoder preprocess code work.
Thanks @netman789 and @blue-fish. I will try the #729 solution and test. If everything runs properly, I will then submit pull requests to change train.py (in both synthesizer and vocoder) to fix the pickle errors on Win10, a pull request to fix synthesize.py for the print(hparams_debug_string()) and collate_synthesizer issues, and a pull request adding the #729 fix as well.
TC2, if the vocoder_preprocess runs successfully now, I would be interested to know. I have reached an impasse with a different problem. I am running a slightly different dataset and am getting this error:

```
initializing synthesizer/synthesize
{'allow_clipping_in_normalization': True,
Loading weights at synthesizer\saved_models\pretrained\pretrained.pt
```
@netman789, my run (see below) does not say that... mine goes straight to the arguments. I don't know why yours would print "initializing synthesizer/synthesize". I just ran vocoder_preprocess.py after inserting the #729 solution in synthesize.py. It ran... up to 38% complete, then I got a CUDA out-of-memory halt. Here is the code:
Then I tried again with the --cpu argument, but it still ran on the GPU...
It seems the command-line option is not successfully forcing CPU use. Try changing this line to:
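The usual cause of this symptom is that the device is chosen without consulting the flag. A hedged sketch of the pattern (the function and flag wiring here are illustrative, not the repo's actual code):

```python
import argparse

# Hypothetical sketch: pick the device string only after checking the
# --cpu flag, so the command-line option is actually honored.
def select_device(force_cpu: bool, cuda_available: bool) -> str:
    if force_cpu or not cuda_available:
        return "cpu"
    return "cuda"

parser = argparse.ArgumentParser()
parser.add_argument("--cpu", action="store_true")
args = parser.parse_args(["--cpu"])
# Even with a GPU present, --cpu wins:
print(select_device(args.cpu, cuda_available=True))  # cpu
```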
For a fixed model size, the only way I know of to get around OOM is to cut the sample size.
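If cutting the sample size by hand is tedious, the split can be scripted. A small illustrative helper (not part of the repo) that divides a sample list into roughly equal chunks for separate preprocessing passes:

```python
def split_into_chunks(samples, n_chunks):
    """Split a list into n_chunks roughly equal parts, preserving order."""
    k, rem = divmod(len(samples), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        # The first `rem` chunks absorb one extra element each.
        end = start + k + (1 if i < rem else 0)
        chunks.append(samples[start:end])
        start = end
    return chunks

batches = split_into_chunks(list(range(10)), 3)
print([len(b) for b in batches])  # [4, 3, 3]
```

Each chunk could then be fed to a separate preprocessing run and the outputs merged afterward, as described above.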
@blue-fish Thanks. I put in the fix you suggested and vocoder_preprocess.py worked properly on the CPU. I will put in the pull requests. Next I will try vocoder_train.py. @netman789 Thanks. Reducing the sample size (to 1/3 of the total samples) was my "Plan B": run the preprocessor 3 times (once for each batch of samples) and combine the output results manually.
I was able to train the vocoder on top of the pretrained WaveRNN vocoder. Took about 25 min.
1) Added the Win10 pickle fix per issue CorentinJ#669 and blue-fish/Real-Time-Voice-Cloning@89a9964 (lines 12 and 70).
2) Edited line 19 to delete "hparams" as an argument; hparams_debug_string() takes 0 arguments. Allows vocoder_preprocess.py to run properly on Win10/GPU (CUDA) systems.
3) Edited line 87 to fix the improper definition of mels_out per @blue-fish's recommendation in issue #729 and issue CorentinJ#833. Allows vocoder_preprocess.py to run properly on Win10/GPU (CUDA) systems.
@Tomcattwo changed line 47 per the recommendation by @blue-fish in issue CorentinJ#833; allows the --cpu argument to be properly recognized.
Pull request #838 submitted for all of the above fixes. This issue is ready to be closed.
Hello @blue-fish and all,
I am running the demo_toolbox on Win10, under Anaconda3 (run as administrator), env: VoiceClone, using an NVidia GeForce RTX 2070 Super on an EVGA 08G-P4-3172-KR card (8 GB GDDR6), Python 3.7, PyTorch for Win10/CUDA 11.1, with all other requirements met. The toolbox GUI (demo_toolbox.py) works fine on this setup.
My project is to use the toolbox to clone 15 voices from a computer simulation (to be able to add additional voice material (.wav files) in those voices back into the sim), one voice at a time, using the Single Voice method described in issue #437. I have been able to preprocess my datasets (see #832) and single-voice train them onto the LibriSpeech 295K pretrained synthesizer with good results.
During this experiment, I tried to conduct vocoder training on dataset V13M (see #832), as described in the README.TXT file from the zip file provided by @blue-fish in #437.
I used the command line:
python vocoder_preprocess.py datasets_root --model_dir synthesizer/saved_models/V13M_LS_pretrained
It could not find datasets_root\SV2TTS\vocoder\mels_gta, so I created datasets_root\SV2TTS\vocoder\mels_gta, copied all the mels from datasets_root\SV2TTS\synthesizer\mels into datasets_root\SV2TTS\vocoder\mels_gta, and ran it again.
I ran into the following issues:
While attempting to run vocoder_preprocess.py on the single-voice-trained synthesizer and dataset V13M, I ran into the Win10 "pickle" issue in ...\vocoder\train.py. This issue was identical to the pickle error I encountered when doing synthesizer training on the dataset. I solved it in exactly the same way, by recoding ...\vocoder\train.py to use the workaround provided here: blue-fish@89a9964. This corrected the pickle issue for vocoder_preprocess.py.
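For background on why this error is Windows-specific: DataLoader workers on Windows use the spawn start method, which must pickle everything handed to a worker process, and lambdas cannot be pickled. A self-contained demonstration (the helper name is illustrative):

```python
import pickle

# Probe whether an object survives pickling, as DataLoader worker
# processes on Windows require of their collate_fn.
def picklable(obj) -> bool:
    try:
        pickle.dumps(obj)
        return True
    except Exception:
        return False

print(picklable(len))                  # module-level callables pickle by reference: True
print(picklable(lambda batch: batch))  # lambdas do not: False
```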
Next I encountered an error in vocoder_preprocess.py: "hparams_debug_string() takes 0 positional arguments but 1 was given"
a) vocoder_preprocess.py imports hparams from synthesizer.hparams
b) synthesizer.hparams defines the hparams_debug_string() as "def hparams_debug_string():" in the second to last line
c) synthesize.py (which is where the error occurs) includes in line 17: "print(hparams_debug_string(hparams))"
By changing this line to "print(hparams_debug_string())", I was able to clear the error, but I think this may have then caused the next issue.
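For reference, a zero-argument hparams_debug_string() typically closes over a module-level hparams object, which is why passing hparams explicitly fails. A minimal sketch under that assumption (the values shown are illustrative, not the repo's full set):

```python
from pprint import pformat

# Illustrative stand-in for synthesizer.hparams; the real module holds
# the full hyperparameter set.
hparams = {"sample_rate": 16000, "n_fft": 800}

def hparams_debug_string():
    # Takes no arguments: it reads the module-level hparams directly.
    return "Hyperparameters:\n" + pformat(hparams)

print(hparams_debug_string())
```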
When I ran vocoder_preprocess.py again, I received the following:
`(VoiceClone) C:\Utilities\SV2TTS>python vocoder_preprocess.py datasets_root --model_dir synthesizer/saved_models/V13M_LS_pretrained
Arguments:
datasets_root: datasets_root
model_dir: synthesizer/saved_models/V13M_LS_pretrained
hparams:
no_trim: False
cpu: False
{'allow_clipping_in_normalization': True,
'clip_mels_length': True,
'fmax': 7600,
'fmin': 55,
'griffin_lim_iters': 60,
'hop_size': 200,
'max_abs_value': 4.0,
'max_mel_frames': 900,
'min_level_db': -100,
'n_fft': 800,
'num_mels': 80,
'power': 1.5,
'preemphasis': 0.97,
'preemphasize': True,
'ref_level_db': 20,
'rescale': True,
'rescaling_max': 0.9,
'sample_rate': 16000,
'signal_normalization': True,
'silence_min_duration_split': 0.4,
'speaker_embedding_size': 256,
'symmetric_mels': True,
'synthesis_batch_size': 16,
'trim_silence': True,
'tts_cleaner_names': ['english_cleaners'],
'tts_clip_grad_norm': 1.0,
'tts_decoder_dims': 128,
'tts_dropout': 0.5,
'tts_embed_dims': 512,
'tts_encoder_K': 5,
'tts_encoder_dims': 256,
'tts_eval_interval': 500,
'tts_eval_num_samples': 1,
'tts_lstm_dims': 1024,
'tts_num_highways': 4,
'tts_postnet_K': 5,
'tts_postnet_dims': 512,
'tts_schedule': [(2, 0.001, 20000, 12),
(2, 0.0005, 40000, 12),
(2, 0.0002, 80000, 12),
(2, 0.0001, 160000, 12),
(2, 3e-05, 320000, 12),
(2, 1e-05, 640000, 12)],
'tts_stop_threshold': -3.4,
'use_lws': False,
'utterance_min_duration': 1.6,
'win_size': 800}
Synthesizer using device: cuda
Trainable Parameters: 30.870M
Loading weights at synthesizer\saved_models\V13M_LS_pretrained\V13M_LS_pretrained.pt
Tacotron weights loaded from step 297000
Using inputs from:
datasets_root\SV2TTS\synthesizer\train.txt
datasets_root\SV2TTS\synthesizer\mels
datasets_root\SV2TTS\synthesizer\embeds
Found 325 samples
0%| | 0/21 [00:00<?, ?it/s]
Traceback (most recent call last):
File "vocoder_preprocess.py", line 58, in <module>
run_synthesis(args.in_dir, args.out_dir, args.model_dir, modified_hp)
File "C:\Utilities\SV2TTS\synthesizer\synthesize.py", line 78, in run_synthesis
for i, (texts, mels, embeds, idx) in tqdm(enumerate(data_loader), total=len(data_loader)):
File "C:\Users\Colt_.conda\envs\VoiceClone\lib\site-packages\tqdm\std.py", line 1185, in __iter__
for obj in iterable:
File "C:\Users\Colt_.conda\envs\VoiceClone\lib\site-packages\torch\utils\data\dataloader.py", line 521, in __next__
data = self._next_data()
File "C:\Users\Colt.conda\envs\VoiceClone\lib\site-packages\torch\utils\data\dataloader.py", line 561, in _next_data
data = self._dataset_fetcher.fetch(index) # may raise StopIteration
File "C:\Users\Colt.conda\envs\VoiceClone\lib\site-packages\torch\utils\data\_utils\fetch.py", line 47, in fetch
return self.collate_fn(data)
File "C:\Utilities\SV2TTS\synthesizer\synthesize.py", line 69, in <lambda>
collate_fn=lambda batch: collate_synthesizer(batch, r),
TypeError: collate_synthesizer() missing 1 required positional argument: 'hparams'`
At this point I could not trace the code back any further, but it looks like hparams is not being passed through to collate_synthesizer in synthesize.py.
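The traceback shows the lambda supplying only (batch, r) while collate_synthesizer also expects hparams. A stand-alone sketch of the fix pattern using functools.partial, with illustrative signatures modeled on the error message:

```python
from functools import partial

# Illustrative signature modeled on the TypeError: the collate function
# needs three arguments, but the lambda supplied only two.
def collate_synthesizer(batch, r, hparams):
    return (len(batch), r, hparams["sample_rate"])

hparams = {"sample_rate": 16000}
r = 2

# Failing form:  collate_fn=lambda batch: collate_synthesizer(batch, r)
# Working form: bind the extra arguments up front.
collate_fn = partial(collate_synthesizer, r=r, hparams=hparams)
print(collate_fn([1, 2, 3]))  # (3, 2, 16000)
```

As a side benefit, a partial of a module-level function is picklable (a lambda is not), which also matters for the Win10 DataLoader issue discussed in this thread.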
If you need any other information, I will try to provide it. Please let me know.
Regards,
Tomcattwo