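Use tensorflow-gpu 1.14.0 and install the other libraries **as listed in requirements.txt**.
This walkthrough uses speaker D8 from the open-source speech dataset thchs30 as an example and fine-tunes the Tacotron model pretrained on the master branch.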
git clone https://github.com/lturing/tacotronv2_wavernn_chinese.git
cd tacotronv2_wavernn_chinese
git checkout remotes/origin/adaptive # switch to the adaptive branch
python tacotron_synthesize.py --text '现在是凌晨零点二十七分,帮您订好上午八点的闹钟。'
# The synthesized wav, attention alignment plots, etc. are written to ./tacotron_inference_output
dataset = 'D8',
feat_out_dir = 'training_data',
tacotron_input = 'D8_train.txt',
tacotron_fine_tuning = True,
pretrained_model_checkpoint_path = 'logs-Tacotron-2/taco_pretrained/tacotron_model.ckpt-206500',
pretrained_tacotron_input = 'biaobei_train.txt',
tacotron_initial_learning_rate = 1e-3, #starting learning rate
fmin = 55, #Set this to 55 if your speaker is male! if female, 95 should help taking off noise. (To test depending on dataset. Pitch info: male~[65, 260], female~[100, 525])
fmax = 7600, #To be increased/reduced depending on data.
#M-AILABS (and other datasets) trim params (these parameters are usually correct for any data, but definitely must be tuned for specific speakers)
trim_silence = True, #Whether to clip silence in Audio (at beginning and end of audio only, not the middle)
trim_fft_size = 2048, #Trimming window size
trim_hop_size = 512, #Trimming hop length
trim_top_db = 22, #Trimming db difference from reference db (smaller==harder trim.)
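The fields above that change from speaker to speaker can be collected in one place. The following is a minimal sketch, not code from this repository: `finetune_overrides` is a hypothetical helper, and the `<speaker>_train.txt` naming is an assumption inferred from `D8_train.txt` and `biaobei_train.txt`. The `fmin` values follow the comment in the hparams (55 for male, 95 for female).

```python
def finetune_overrides(speaker, is_male, pretrained_ckpt):
    """Return the hparams fields that this walkthrough edits for fine-tuning.

    Hypothetical helper: the repository itself expects these values to be
    edited by hand in its hparams file.
    """
    return {
        "dataset": speaker,
        # Assumed naming pattern, inferred from 'D8_train.txt'.
        "tacotron_input": "{}_train.txt".format(speaker),
        "tacotron_fine_tuning": True,
        "pretrained_model_checkpoint_path": pretrained_ckpt,
        # 55 for male speakers, 95 for female speakers (per the hparams comment).
        "fmin": 55 if is_male else 95,
    }


overrides = finetune_overrides(
    "D8",
    is_male=True,
    pretrained_ckpt="logs-Tacotron-2/taco_pretrained/tacotron_model.ckpt-206500",
)
```

The resulting dictionary is only a checklist of values to copy into the hparams; it does not modify the training configuration by itself.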
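tacotron_fine_tuning: set to True.
fmin: depends on the speaker's gender; use 55 for a male voice and 95 for a female voice.
trim_top_db: controls how the silence at the beginning and end of each clip is trimmed; tune it for your dataset, since it affects the results.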
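To see why `trim_top_db` matters, it can help to look at what a dB-threshold trim does. The sketch below is a simplified, standard-library-only illustration and not the repository's or librosa's implementation: `trim_silence` is a hypothetical function, and its `frame_length` and `hop_length` arguments are assumed to play the roles of `trim_fft_size` and `trim_hop_size`. It keeps everything between the first and last frame whose RMS level lies within `top_db` decibels of the loudest frame.

```python
import math


def trim_silence(samples, top_db=22, frame_length=2048, hop_length=512):
    """Trim leading/trailing frames quieter than `top_db` dB below the loudest frame."""
    if not samples:
        return []
    # RMS energy of each analysis frame.
    rms = []
    last_start = max(len(samples) - frame_length, 0)
    for start in range(0, last_start + 1, hop_length):
        frame = samples[start:start + frame_length]
        rms.append(math.sqrt(sum(x * x for x in frame) / len(frame)))
    peak = max(rms)
    if peak == 0.0:
        return []  # the whole clip is digital silence
    # A frame counts as non-silent if 20*log10(rms/peak) > -top_db.
    threshold = peak * 10 ** (-top_db / 20.0)
    loud = [i for i, level in enumerate(rms) if level > threshold]
    begin = loud[0] * hop_length
    end = min(len(samples), loud[-1] * hop_length + frame_length)
    return samples[begin:end]


# Toy clip: silence, a constant-amplitude tone, then silence again.
signal = [0.0] * 4096 + [0.5, -0.5] * 4096 + [0.0] * 4096
trimmed = trim_silence(signal, top_db=22)
```

A smaller `top_db` raises the threshold, so more of the quiet edges are cut; if the value is too small, soft word onsets and endings may be removed along with the silence.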
unzip D8.zip # extract the raw D8 data (wav files and their transcripts)
python tacotron_preprocess.py
python tacotron_train.py
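TacotronV2 generates the mel spectrogram, and the Griffin-Lim algorithm reconstructs the audio from it. Either edit the text in the script tacotron_synthesize.py or pass the sentence with --text: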
python tacotron_synthesize.py
python tacotron_synthesize.py --text '国内知名的视频弹幕网站,这里有最及时的动漫新番。'
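To synthesize several sentences in a row, the command above can be wrapped in a small driver. This is a minimal sketch under stated assumptions: `synthesis_command` and `synthesize_all` are hypothetical helpers, the script is assumed to be run from the repository root, and only the `--text` flag shown above is used.

```python
import subprocess
import sys


def synthesis_command(text):
    """Build the argument list for one tacotron_synthesize.py call."""
    return [sys.executable, "tacotron_synthesize.py", "--text", text]


def synthesize_all(texts, dry_run=True):
    """Return the commands for all sentences; run them unless dry_run is set."""
    commands = [synthesis_command(text) for text in texts]
    if not dry_run:
        for command in commands:
            # check=True stops at the first failing synthesis call.
            subprocess.run(command, check=True)
    return commands


commands = synthesize_all([
    "现在是凌晨零点二十七分,帮您订好上午八点的闹钟。",
    "国内知名的视频弹幕网站,这里有最及时的动漫新番。",
])
```

Passing the sentence as a separate list element avoids shell quoting problems with punctuation in the text. Call `synthesize_all(texts, dry_run=False)` to actually run the synthesis.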