Pickle error running synthesizer_train.py #669
Comments
Thanks for reporting this issue @woodrow73 . Please try and see if this is reproducible with a normal Python installation, instead of Anaconda. Reference the Windows install instructions if needed. There are several bugs which seem specific to conda (#644, #646) and we don't have enough developer support to squash them. |
Same issue reported here, but without Anaconda. #472 (comment) @rallandr Did you ever find a solution to the pickle problem when training on Windows? |
Confirming an identical error without a virtual environment. As a workaround I attempted to force CPU usage over GPU - short of reinstalling CUDA (changing file names & environment variables didn't do the trick - maybe it's due to the native PyTorch files), I tried manually changing the 10 places in the repository where torch.cuda.is_available() is called so that they return False, but .pyc files might make that approach moot; same error message either way. Gonna try running with your Ubuntu 20.04 instructions |
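For what it's worth, an alternative to editing every torch.cuda.is_available() call is to hide the GPU from PyTorch before torch is imported; a minimal sketch of that general approach (not specific to this repo):

```python
# Hide all CUDA devices before torch is imported, so torch.cuda.is_available()
# returns False everywhere in the process and code falls back to CPU.
import os
os.environ["CUDA_VISIBLE_DEVICES"] = ""

import torch

print(torch.cuda.is_available())   # False
device = torch.device("cpu")
```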
@woodrow73 As a workaround, can you try setting num_workers=0 here? Real-Time-Voice-Cloning/synthesizer/train.py, lines 146 to 151 at 9a35b3e (see the sketch below).
References: |
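For context, the referenced lines construct the synthesizer's training DataLoader. Setting num_workers=0 keeps batch assembly in the main process, so the collate function never has to be pickled; a rough, self-contained sketch of the idea (the toy dataset and collate stand-in are illustrative, not the repo's code):

```python
import torch
from torch.utils.data import DataLoader

def collate_stub(batch, pad_value=0):
    # Stand-in for the repo's collate_synthesizer(batch, r, hparams).
    return torch.stack(batch)

dataset = [torch.randn(4) for _ in range(32)]    # toy stand-in for SynthesizerDataset

# With num_workers=0 every batch is assembled in the main process, so the
# lambda collate_fn is never pickled -- the step that fails on Windows.
loader = DataLoader(dataset,
                    batch_size=8,
                    shuffle=True,
                    num_workers=0,               # the suggested workaround
                    collate_fn=lambda batch: collate_stub(batch, pad_value=0))

for batch in loader:
    print(batch.shape)                           # torch.Size([8, 4])
```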
Thanks for the workaround - just got around to trying it out & initially got a memory error:
So I edited synthesizer/hparams.py to decrease the batch size. synthesis_batch_size didn't seem to have an effect, so I left that at 16, and I changed the batch_size values in tts_schedule to 5 - which seems to be the highest value I can give it without running out of memory. I timed the training duration for the first 10 epochs (steps?), which took 5 minutes 3 seconds - here's what the console prints for me:
The program appears to be using between 20-50% of my Nvidia GeForce GTX 1060 6GB & 19-35% of my i5-6600k. Thanks for the help - this approach seems to result in training at the same speed as ori-pixel's CPU approach with an i5-4690k, if each step = an epoch. I can easily get 400 epochs overnight - I'm thrilled that it's working now; I think I'll still give Ubuntu a try. |
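For anyone following along, the edit described above lives in the progressive training schedule in synthesizer/hparams.py. A hedged sketch of a reduced-batch-size schedule, assuming the (r, lr, step, batch_size) tuple layout used by the repo's Tacotron schedule (the learning rates and step boundaries below are placeholders, not the defaults):

```python
# synthesizer/hparams.py (sketch) -- each tuple is (r, lr, step, batch_size).
# Lowering batch_size trades GPU memory for noisier gradients and slower
# wall-clock progress per step.
tts_schedule = [(2, 1e-3,  20_000, 5),
                (2, 5e-4,  40_000, 5),
                (2, 1e-4, 160_000, 5)]
```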
Thanks for the update. The training speed seems slow to me. Since neither your CPU nor GPU are at 100%, I think the bottleneck is the storage medium. Try moving your datasets_root/SV2TTS folder to an SSD. Is max_mel_frames still at 900? Your GPU should be capable of 1 step/sec and handle a batch size between 16 and 24 with reduction factor r=2. See if it is faster on Ubuntu. A quick read you may find helpful: https://docs.paperspace.com/machine-learning/wiki/epoch |
No trouble, thanks for the pointers. I hadn't considered the storage medium a variable, but it makes sense with so much reading & writing; however, it is already on an internal SSD (850 EVO). Yes, my max_mel_frames value is 900 - after preprocessing the data with
After preprocessing the data with
This has me curious what the cost on the output is as a result of altering both max_mel_frames & batch size - as the Epoch description in the link would suggest, a smaller batch size likely creates more noise as it is too small to properly represent all the data. I'm not ML savvy enough (yet) to understand what exactly the console output is communicating in the first part - should 1/10000 (2/2) be alarming? I'll try it with Ubuntu within the week, after I post my 3-17 second utterance extractor from a single Wav-File (already finished, just wanna clean & document it more). I'm also gonna rig something up to help automate the process of writing down the timestamp of when each word is said (nothing fancy, like playing the audio clip slowly, then pressing a button at the start of each word) - or maybe I'll give in and try a forced aligner like you showed here. |
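As an aside, the button-press timestamping rig described above only needs a timer and a loop; a rough sketch (playback is assumed to happen in an external player, slowed down):

```python
import time

# Press Enter at the start of each word while the (slowed) audio plays in an
# external player; the elapsed times can then be copied into an alignment file.
words = input("Transcript (space-separated words): ").split()
input("Start the audio, then press Enter to start the timer...")
t0 = time.perf_counter()
timestamps = []
for word in words:
    input(f"Press Enter at the start of '{word}' ")
    timestamps.append(round(time.perf_counter() - t0, 2))
print(list(zip(words, timestamps)))
```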
You posted output for these cases (corresponding to max_mel_frames = 900, 600 and 300 respectively):
Notice that lowering max_mel_frames shrinks the training set, since wavs that exceed the limit are left out. This diagram may be helpful for understanding the display:
|
Interesting - aside from being smaller, I imagine there may be additional noise issues as a result of abruptly cutting off words from the wav files. If I'm understanding this correctly, having max_mel_frames = 900 will still cut off wav files at 11.25 seconds? Well, this is definitely yet another motivation to unlock the potential of my hardware via an Ubuntu installation. Thanks for the info; I'll also post the errors / workarounds here since it's on topic for synthesizer training on Windows. |
Error running synthesizer_preprocess_audio.py (and an almost identical one when running synthesizer_preprocess_embeds.py):
Pastebin of the full error logs from both synthesizer_preprocess_audio.py and synthesizer_preprocess_embeds.py. Another error I encountered afterwards was a CUDA memory error, which I traced to synthesizer_preprocess_embeds.py, where the information needed for the workaround turned out to be spelled out in the script's --help output:
|
Wav files that are too long are not truncated. Instead, they are dropped from the training set entirely. This can be avoided by using alignment data to split the wavs. |
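To make "using alignment data to split the wavs" concrete, here is a rough sketch of cutting a long recording at word boundaries taken from an alignment file (librosa/soundfile and all names below are assumptions for illustration, not the repo's actual preprocessing code):

```python
import librosa
import soundfile as sf

def split_on_alignment(wav_path, word_end_times, max_seconds=11.25, sr=16000):
    """Cut a long recording into chunks no longer than max_seconds,
    splitting only at word boundaries taken from the alignment."""
    wav, _ = librosa.load(wav_path, sr=sr)
    chunks, start, last_cut = [], 0.0, 0.0
    for end in word_end_times:                    # e.g. [0.42, 0.87, 1.53, ...]
        if end - start > max_seconds:
            chunks.append(wav[int(start * sr):int(last_cut * sr)])
            start = last_cut
        last_cut = end
    chunks.append(wav[int(start * sr):int(last_cut * sr)])
    return chunks

# Example usage (file name made up):
# for i, chunk in enumerate(split_on_alignment("211-122425-0000.flac", times)):
#     sf.write(f"211-122425-0000_{i:02d}.wav", chunk, 16000)
```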
My workaround for this issue of lambda objects not being pickled on Windows was to install and use multiprocessing_on_dill instead of Python's multiprocessing module. This meant replacing two references to the native multiprocessing module in the data loading code. There should be a better way of doing this through the DataLoader itself, though. |
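For anyone trying the same swap, it is essentially an import alias plus a pip install of the multiprocessing_on_dill package; a minimal sketch, assuming the package behaves as a drop-in replacement for the standard module:

```python
# Drop-in swap: dill can serialize lambdas, which the standard pickle-based
# multiprocessing refuses to do under Windows' spawn start method.
import multiprocessing_on_dill as multiprocessing

if __name__ == "__main__":                           # required on Windows
    with multiprocessing.Pool(2) as pool:
        print(pool.map(lambda x: x * x, range(8)))   # works despite the lambda
```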
I may have spoken too soon. Using dill instead of pickle prevents the serialization error, but there doesn't seem to be any performance change with the value of num_workers. With my NVIDIA GeForce RTX 2080 Ti:
Not sure multiprocessing is working at all with my fix. |
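One way to tell whether DataLoader workers are actually running is to inspect torch.utils.data.get_worker_info() from inside a dataset; a small self-contained probe (all names made up for the example):

```python
import torch
from torch.utils.data import DataLoader, Dataset, get_worker_info

class ProbeDataset(Dataset):
    def __len__(self):
        return 8

    def __getitem__(self, idx):
        info = get_worker_info()            # None when loading in the main process
        worker = str(info.id) if info is not None else "main"
        return idx, worker

if __name__ == "__main__":
    loader = DataLoader(ProbeDataset(), batch_size=2, num_workers=2)
    for idxs, workers in loader:
        print(idxs.tolist(), list(workers))   # worker ids appear only if workers run
```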
Thanks for sharing your observations @arstropica . Not sure GPU acceleration is working properly there. I would expect 5-10x faster training speed with a 2080ti. |
@blue-fish Thanks for your observation. I am not sure how to address the CUDA performance issue. The GPU is definitely working, but only at a minimal rate. Benchmark tests show my SSD performance is within expected values. Perhaps it is due to my GPU driver being out of sync with the CUDA toolkit. Here is the output from nvidia-smi:
Not sure why performance is so bad. I have tried installing different toolkit versions: 10.0, 11.1 and 11.2 with the same result. Do I need to downgrade my driver to match the toolkit? |
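When chasing this kind of mismatch, it can help to check which CUDA and cuDNN versions the installed PyTorch wheel was built against, since pip/conda builds bundle their own CUDA runtime; a quick check:

```python
import torch

print(torch.__version__)                  # e.g. 1.7.1
print(torch.version.cuda)                 # CUDA runtime the wheel was built with
print(torch.backends.cudnn.version())     # bundled cuDNN
print(torch.cuda.get_device_name(0))      # confirms which GPU torch sees
```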
Going back to the original problem, I don't think it is widespread enough on Windows to change the default to num_workers=0. I am going to close this issue. @arstropica, you're invited to open an issue for the training speed problem. I do not have any ideas, but someone else might. |
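For reference, the compromise of keeping workers everywhere except Windows comes down to a one-line platform check; a sketch of the idea, not the wording of any particular commit:

```python
import platform

# Windows uses the spawn start method, which has to pickle the collate_fn
# (and chokes on lambdas); other platforms can keep worker subprocesses.
num_workers = 0 if platform.system() == "Windows" else 2
```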
@Tomcattwo edited synthesizer/train.py per issue CorentinJ#669 and blue-fish/Real-Time-Voice-Cloning@89a9964 to fix the Win10 pickle issue
@Tomcattwo edited vocoder/train.py per issue CorentinJ#669 and blue-fish/Real-Time-Voice-Cloning@89a9964 to fix Win10 pickle issue. This workaround will allow Windows users running on GPU to conduct vocoder_preprocessing and vocoder training.
1) Added the Win10 pickle fix per issue CorentinJ#669 and blue-fish/Real-Time-Voice-Cloning@89a9964 (lines 12 and 70). 2) Edited line 19 to delete "hparams" as an argument; hparams_debug_string() takes 0 arguments. 3) Edited line 87 to fix an improper definition of mels_out per @blue-fish's recommendation in issue #729 and issue CorentinJ#833. These changes allow vocoder_preprocess.py to run properly on Win10/GPU (CUDA) systems.
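The underlying fix referenced in these commits is to make the data loader's collate function picklable on Windows; one common way to do that is replacing a lambda with functools.partial over a module-level function, roughly like this (a sketch of the technique with made-up names, not the exact diff from 89a9964):

```python
import torch
from functools import partial
from torch.utils.data import DataLoader

def collate_stub(batch, hop_length):
    # Module-level stand-in for the repo's collate function.
    return torch.stack(batch)

if __name__ == "__main__":                # required on Windows when num_workers > 0
    dataset = [torch.randn(8) for _ in range(16)]

    # A lambda cannot be pickled by the spawn-based workers Windows uses,
    # but a functools.partial over a module-level function can.
    loader = DataLoader(dataset,
                        batch_size=4,
                        num_workers=2,
                        collate_fn=partial(collate_stub, hop_length=200))

    for mels in loader:
        print(mels.shape)                 # torch.Size([4, 8])
```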
I've read that Python multiprocessing doesn't work well on Windows 10 (and that this repo has better Linux support), so my plan B is to set up a Linux dual-boot for the sole purpose of training single-speaker models.
I have the latest version of this repo, with Visual Studio 2019, CUDA 11.0, the compatible cuDNN version, and webrtcvad. I've installed PyTorch 1.7.1 with CUDA 11.0 support and the latest NVIDIA drivers (and rebooted my system). torch.cuda.is_available() returns True, and I'm able to run demo_toolbox.py without errors.
I'm testing this on the logs-singlespeaker zip I found somewhere in this repo, and I made a simple script to reformat each line of 211-122425.alignment.txt into a new .txt file matched to the correct flac file. I cleared the SV2TTS/synthesizer folder to recreate the single-speaker training process, and had no issues generating the files in the audio folder, embeds folder, mels folder, and train.txt - with the commands
Here is the error from running synthesizer_train.py: