###########################################################################
## project-CURRENNT-scripts --------------------------------------------- #
## ---------------------------------------------------------------------- #
##                                                                         #
## Copyright (c) 2020 National Institute of Informatics                    #
##                                                                         #
## THE NATIONAL INSTITUTE OF INFORMATICS AND THE CONTRIBUTORS TO THIS      #
## WORK DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING    #
## ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS. IN NO EVENT      #
## SHALL THE NATIONAL INSTITUTE OF INFORMATICS NOR THE CONTRIBUTORS        #
## BE LIABLE FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY     #
## DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS,         #
## WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS          #
## ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE     #
## OF THIS SOFTWARE.                                                       #
###########################################################################
## Author:  Xin Wang                                                       #
## Date:    2016 - 2020                                                    #
## Contact: wangxin at nii.ac.jp                                           #
###########################################################################

This repository contains the scripts to use CURRENNT:

- waveform-modeling: scripts for waveform models
- acoustic-modeling: scripts for acoustic models

For the NSF waveform models, a PyTorch re-implementation is now available:
https://github.com/nii-yamagishilab/project-NN-Pytorch-scripts

----------------- 0. before using ----------------------
Please install CURRENNT and pyTools before using this repository:
https://github.com/nii-yamagishilab/project-CURRENNT-public

Please also install sox: http://sox.sourceforge.net

Please modify the environment variables in ./init.sh

Please read the instructions below to use this repository.
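For a quick sanity check of these prerequisites, here is a minimal Python
sketch (not part of this repository) that verifies the currennt and sox
binaries are on PATH and that pyTools is importable; it checks only the
tools named above.

    # Hypothetical sanity check for the prerequisites listed above
    # (currennt, sox, pyTools). Not part of this repository.
    import shutil
    import importlib

    # currennt and sox must be reachable on PATH (e.g., after source ./init.sh)
    for tool in ['currennt', 'sox']:
        path = shutil.which(tool)
        print('%-8s -> %s' % (tool, path if path else 'NOT FOUND'))

    # pyTools provides the ioTools.readwrite module used by the scripts
    try:
        importlib.import_module('ioTools.readwrite')
        print('pyTools  -> OK')
    except ImportError as err:
        print('pyTools  -> NOT FOUND (%s)' % err)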
----------------- 1. waveform models --------------------
./waveform-modeling
 |- DATA       Directory to store the data for model training (validation
               data will be selected automatically from these data)
 |- TESTDATA   Directory to store the data for test
 |- TESTDATA-for-pretrained
               Directory to store the test data for project-WaveNet-pretrained
               and project-NSF-pretrained
 |- SCRIPTS    Scripts of the training/generation processes
 |- project-NSF-pretrained
      Neural source-filter (NSF) model trained on SLT. This is a
      demonstration script to generate waveforms from a trained NSF model.
      The samples have been uploaded to
      https://nii-yamagishilab.github.io/samples-nsf/nsf-v1.html
      Note: for historical reasons, meanstd.bin for these pre-trained NSF
      models was not calculated using the scripts in this repository. The
      models in project-NSF-pretrained can only use
      project-NSF-pretrained/meanstd.bin. If you want to train models on
      your own corpus, please use the training scripts in project-NSF.
 |- project-NSF-v2-pretrained
      Simplified and harmonic-plus-noise (hn-sinc) NSF models trained on
      SLT. This is a demonstration script to generate waveforms from a
      trained NSF model. These new NSF models are explained here:
      https://nii-yamagishilab.github.io/samples-nsf/nsf-v2.html
      https://nii-yamagishilab.github.io/samples-nsf/nsf-v3.html
      If you want to train models on your own corpus, please use the
      training scripts in project-NSF.
 |- project-NSF-pretrained-v4-CMU-4speakers
      hn-sinc-NSF models using different types of source signals, including
      a cyclic-noise-based source. All models were trained on CMU-arctic
      using CLB, RMS, BDL, and SLT in a speaker-independent way. This is a
      demonstration script to generate waveforms from a trained NSF model.
      These new NSF models are explained here:
      https://nii-yamagishilab.github.io/samples-nsf/nsf-v4.html
      If you want to train models on your own corpus, please use the
      training scripts in project-NSF.
 |- project-NSF-pretrained-v4-VCTK
      hn-sinc-NSF and cyclic-noise-based hn-sinc-NSF models trained on VCTK
      (see the protocol in the downloaded TESTDATA-for-pretrained-vc-VCTK
      after running 00_gen_from_pretrained_models.sh). These new NSF models
      are explained here:
      https://nii-yamagishilab.github.io/samples-nsf/nsf-v4.html
 |- project-NSF
      Scripts to train NSF models on the CMU-arctic SLT voice.
      Note that project-NSF/MODELS contains the network.jsn files for the
      models in both project-NSF-pretrained/MODELS and
      project-NSF-v2-pretrained/MODELS:
      ./project-NSF/MODELS
       |- s-NSF: simplified NSF, used in
            project-NSF-v2-pretrained/MODELS/s-NSF
       |- h-NSF: harmonic-plus-noise NSF, used in
            project-NSF-v2-pretrained/MODELS/h-NSF
       |- h-sinc-NSF: h-NSF with trainable filters, used in
            project-NSF-v2-pretrained/MODELS/h-sinc-NSF
       |- cyclic-noise-h-sinc-NSF: h-sinc-NSF using a cyclic-noise source,
            used in project-NSF-pretrained-v4-CMU-4speakers/MODELS/05_beta2/
       |- NSF: baseline NSF, used in project-NSF-pretrained/MODELS/NSF
       |- NSF-L3, NSF-MSE, NSF-N2, NSF-S3: used in
            project-NSF-pretrained/MODELS/NSF-*
      You can find the other model definition files, also called
      network.jsn, in the pre-trained model directories.
 |- project-WaveNet-pretrained
      WaveNet trained on SLT. This is a demonstration script to generate
      waveforms from a trained WaveNet model. The samples have been
      uploaded to https://nii-yamagishilab.github.io/samples-nsf/nsf-v1.html
 |- project-WaveNet-pretrained-v4-CMU-4speakers
      WaveNet trained on SLT, BDL, CLB, and RMS. This is a demonstration
      script to generate waveforms from a trained WaveNet model. The
      samples have been uploaded to
      https://nii-yamagishilab.github.io/samples-nsf/nsf-v4.html
 |- project-WaveNet-pretrained-v4-VCTK
      WaveNet trained on VCTK (see the protocol in the downloaded
      TESTDATA-for-pretrained-vc-VCTK after running
      00_gen_from_pretrained_models.sh). The samples have been uploaded to
      https://nii-yamagishilab.github.io/samples-nsf/nsf-v4.html
 |- project-WaveNet
      Scripts to train WaveNet on the CMU-arctic SLT voice.

Usage:
1. Install CURRENNT and pyTools, which can be downloaded from
   https://github.com/nii-yamagishilab/project-CURRENNT-public

2. For a quick check:
   2.1 modify the paths in ./init.sh
   2.2 run the commands:
       $: source ./init.sh
       $: cd waveform-modeling/project-NSF-pretrained/
       $: bash 01_gen.sh
   Waveforms should be generated in
   ./waveform-modeling/project-NSF-pretrained/MODELS/NSF/output

3. For model training using the provided sample data:
       $: source ./init.sh
       $: cd waveform-modeling/project-NSF/
       $: bash 00_run.sh
   after which you will get a trained model in
   ./waveform-modeling/project-NSF/MODELS/h-sinc-NSF/trained_network.jsn
       $: bash 01_gen.sh
   after which you will get some waveforms in
   ./waveform-modeling/project-NSF/MODELS/h-sinc-NSF/output

   00_run.sh and 01_gen.sh are only for demonstration. Please do not expect
   good output, since the model is trained on fewer than 10 utterances. To
   train a good model, you may need to use the whole data of one speaker
   from the CMU-arctic corpus.

4. For training using your own data (or CMU-arctic data):
   4.1 Put waveforms and acoustic features in ./DATA, which stores the
       training data (validation data will be automatically selected from
       ./DATA); see the sketch after this list for one way to prepare the
       feature files
   4.2 Read and configure config.py in project-NSF or project-WaveNet
   4.3 Run 00_run.sh
   4.4 Put test data in ./TESTDATA, configure config.py, and run 01_gen.sh
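As mentioned in step 4.1, the sketch below shows one possible way to write
acoustic features into ./DATA in the required binary format (float32,
little-endian; see the Data IO notes below). The subdirectory names mfbsp/
and f0/, the file extensions, and the feature dimensions are assumptions
copied from the TESTDATA-for-pretrained example; the actual names and
dimensions are configured in config.py.

    # Minimal sketch (not part of the repository) for preparing training
    # features in ./DATA. Names, extensions, and dimensions below are
    # assumptions modeled on TESTDATA-for-pretrained; set the real ones
    # in config.py.
    import os
    import numpy as np
    from ioTools import readwrite

    utt = 'arctic_a0001'            # utterance basename (assumption)
    mel = np.random.randn(671, 80)  # stand-in for an 80-dim mel-spectrogram
    f0 = np.random.randn(671)       # stand-in for a 1-dim F0 trajectory

    os.makedirs('DATA/mfbsp', exist_ok=True)
    os.makedirs('DATA/f0', exist_ok=True)

    # write_raw_mat (pyTools) writes the float32 binary format used here
    readwrite.write_raw_mat(mel, 'DATA/mfbsp/%s.mfbsp' % utt)
    readwrite.write_raw_mat(f0, 'DATA/f0/%s.f0' % utt)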
----------------- 2. acoustic models --------------------
./acoustic-modeling
 |- DATA       Directory to store the data for model training
               (for demonstration)
 |- TESTDATA   Directory to store the data for test (for demonstration)
 |- project-DAR-continuous
      Project scripts to train a DAR model for continuous-valued output
      features. The demonstration trains a DAR model that does
      ppg, xvector, f0 -> Mel-spectrogram
 |- project-RNN
      Project scripts to train an RNN for continuous-valued output
      features. The demonstration trains an RNN that does
      linguistic_features, one-hot_speaker_vector -> MGC, LF0, UV, BAP
 |- SCRIPTS    General scripts of the training/generation processes

Usage: please read the README in each project-*** and follow the steps there.

---------------------- Notes -----------------------------------

-------- Data IO

1. All the feature files except waveforms should be saved as binary,
   float32, little-endian. You may check the data included in
   waveform-modeling/TESTDATA-for-pretrained/mfbsp/*:

   $: source ./init.sh
   $: python
   >>> from ioTools import readwrite
   >>> mel = readwrite.read_raw_mat('./waveform-modeling/TESTDATA-for-pretrained/mfbsp/arctic_a0001.mfbsp', 80)
   >>> mel.shape
   (671, 80)
   >>> mel[0]
   array([-0.5804557 ,  0.64407444,  0.95468473,  0.9076864 ,  0.7409921 ,
           0.5190042 ,  0.2543492 ,  0.07272982, -0.02690054, -0.08740515,
          -0.14761803, -0.24591875, -0.40614623, -0.49193513, -0.69396436,
          -0.6916913 , -0.67114395, -0.7134891 , -0.8428167 , -0.9381612 ,
          -0.9604182 , -1.020919  , -1.0435845 , -1.077764  , -1.1341914 ,
          -1.2459651 , -1.2637303 , -1.3517176 , -1.3730341 , -1.4908298 ,
          -1.5803422 , -1.6830492 , -1.495914  , -1.3974593 , -1.4675162 ,
          -1.5840678 , -1.6725454 , -1.584573  , -1.7740629 , -2.0816884 ,
          -1.8013108 , -1.7561418 , -2.1837492 , -2.3368633 , -2.0934896 ,
          -2.2621613 , -2.200103  , -2.3064234 , -1.9597349 , -2.2220097 ,
          -2.296505  , -2.0561726 , -2.4601874 , -1.997487  , -2.0367327 ,
          -2.2498186 , -2.8739617 , -2.859662  , -2.680149  , -3.1865113 ,
          -3.172631  , -2.8548572 , -3.0105433 , -2.7395592 , -2.7438028 ,
          -2.625576  , -2.8859007 , -2.8262408 , -2.6442852 , -2.847031  ,
          -2.9952297 , -2.9672284 , -2.6831682 , -3.2138064 , -3.3470006 ,
          -3.4002392 , -2.8704414 , -2.958755  , -3.2552214 , -3.5245233 ],
         dtype=float32)
   >>> mel[0,0]
   -0.5804557
   >>> f0 = readwrite.read_raw_mat('./waveform-modeling/TESTDATA-for-pretrained/f0/arctic_a0001.f0', 1)
   >>> f0.shape
   (671,)

   Here, the binary mel-spectrogram is a matrix with 671 frames and 80
   dimensions per frame. The F0 is a one-dimensional vector with 671
   frames. Notice that in physical memory, one datum in a two-dimensional
   data matrix (e.g., the mel-spectrogram) is accessed through
   DataArray[D * n + d], where D is the feature dimension, n is the frame
   index, and d is the dimension index within one frame.
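   To make this memory layout concrete, here is a small numpy-only sketch;
   the file name and D = 80 are taken from the example above, and
   np.fromfile with dtype '<f4' is equivalent to read_raw_mat for this
   format.

    # Row-major (frame-major) layout described above, checked with numpy.
    import numpy as np

    D = 80
    fname = './waveform-modeling/TESTDATA-for-pretrained/mfbsp/arctic_a0001.mfbsp'

    # '<f4' = float32, little-endian; equivalent to read_raw_mat(fname, 80)
    flat = np.fromfile(fname, dtype='<f4')
    mel = flat.reshape(-1, D)

    n, d = 3, 10                        # a frame index and a dimension index
    assert flat[D * n + d] == mel[n, d]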
2. You can use numpy.tofile to write the data into the binary, float32,
   little-endian format. You can also use the write_raw_mat function in
   pyTools, which is a wrapper around numpy.tofile:

   $: source ./init.sh
   $: python
   >>> from ioTools import readwrite
   >>> import numpy as np
   >>> data = np.random.randn(10, 2)
   >>> data
   array([[-1.0792218 , -2.00836936],
          [-0.84080859,  1.81592092],
          [ 0.48318553, -0.76937456],
          [-1.90552536, -0.68052287],
          [-0.72223174, -2.97435219],
          [ 0.19932163,  0.29391472],
          [ 0.36049151,  0.64871376],
          [ 0.73407896,  0.6574951 ],
          [ 0.5137015 , -0.67185778],
          [-0.62208806, -0.35707845]])
   # write data to './tmp_data.bin'
   >>> readwrite.write_raw_mat(data, 'tmp_data.bin')
   True
   # read data from './tmp_data.bin'
   >>> data_read = readwrite.read_raw_mat('tmp_data.bin', 2)
   # data_read should be the same as data (up to the numerical precision
   # lost in the conversion from float64 to float32)
   >>> data_read
   array([[-1.0792218 , -2.0083694 ],
          [-0.8408086 ,  1.815921  ],
          [ 0.48318553, -0.76937455],
          [-1.9055253 , -0.68052286],
          [-0.72223175, -2.9743521 ],
          [ 0.19932163,  0.2939147 ],
          [ 0.3604915 ,  0.64871377],
          [ 0.73407894,  0.6574951 ],
          [ 0.5137015 , -0.6718578 ],
          [-0.6220881 , -0.35707846]], dtype=float32)
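   If you call numpy.tofile directly, note that it dumps the array's own
   dtype and byte order, so cast explicitly to '<f4' (float32,
   little-endian) first. A minimal sketch, with tmp_data.bin as an example
   file name:

    # Writing the same format with plain numpy instead of write_raw_mat.
    # numpy.tofile dumps the array's raw memory, so cast to '<f4'
    # (float32, little-endian) explicitly to be safe on any machine.
    import numpy as np

    data = np.random.randn(10, 2)
    data.astype('<f4').tofile('tmp_data.bin')   # example file name

    # round-trip check
    data_read = np.fromfile('tmp_data.bin', dtype='<f4').reshape(-1, 2)
    assert np.allclose(data, data_read, atol=1e-6)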
-------- Useful CURRENNT commands

1. Continue training using epoch***.autosave
   $: currennt --continue epoch***.autosave

   If autosave = true is configured in train_config.cfg, CURRENNT will save
   the trained model after every training epoch. The name of the saved
   model will be epoch***.autosave. If the training is terminated for any
   reason, you can simply use the above command to resume training from the
   latest training epoch. No additional argument is needed, because
   epoch***.autosave saves all the arguments, weights, intermediate
   gradients, and so on.

2. Convert ***.autosave to ***.jsn
   $: currennt --print_weight_to epoch***.jsn --print_weight_opt 2 --cuda off --network epoch***.autosave

   ***.autosave is the trained model after *** epochs. It can be used as a
   trained network to generate output. However, ***.autosave also stores
   training arguments, gradients, etc., which makes it very large. ***.jsn
   is the trained model without the data unnecessary for generation. To
   save space, use the above command to convert ***.autosave to ***.jsn.
   Note that network.jsn usually denotes the initial network without any
   trained weights.

3. Plot the network topology
   $: currennt --network_graph network.gv --network network.jsn --cuda off
   $: dot -Tpdf -o network.pdf network.gv

   Step 1 converts network.jsn to network.gv, a file in the dot language.
   Step 2 uses "dot" to produce a picture from network.gv. You can check
   ./acoustic-modeling/project-DAR-continuous/MODELS/DAR_001/network.pdf

4. For debugging, you can use
   $: gdb --args currennt ***
   where *** is the argument string to CURRENNT. To compile a version of
   currennt for debugging, please check the README in
   https://github.com/nii-yamagishilab/project-CURRENNT-public/tree/master/CURRENNT_codes

---------- Error messages

1. Memory error when using CURRENNT:

   'thrust::system::system_error'
   what(): device free failed: an illegal memory access was encountered
   Aborted (core dumped)

   or

   Could not create layer: __copy::trivial_device_copy H->D: failed: invalid argument

   The two errors above indicate insufficient GPU memory. Either reduce the
   layer size or the utterance length, or ask Xin Wang.

2. Script configuration error:

   resolution not found in --resolutions
   FAILED: Resolution error

   This means that upsampling_rate in waveform-modeling/*/config.py is not
   correctly configured. Note that this upsampling_rate denotes the
   up-sampling of acoustic features (from frame level to waveform sampling
   level); its value should be equal to
   frame_shift * sampling_rate_of_waveform. See config.py for an example.
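   For example, with a 5 ms frame shift and 16 kHz waveforms (illustrative
   figures only; take the real values from your corpus and config.py):

    # upsampling_rate = frame_shift (in seconds) * waveform sampling rate
    frame_shift = 0.005       # 5 ms frame shift of the acoustic features
    sampling_rate = 16000     # 16 kHz waveform
    upsampling_rate = int(frame_shift * sampling_rate)
    print(upsampling_rate)    # -> 80 waveform samples per feature frame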