Transformer-Based End-to-End Feed-Forward Neural Chinese Speech Synthesis

Text entered on the client is sent to the server, which synthesizes speech and returns the audio; client and server exchange data over a socket connection. A demo is available here: https://www.youtube.com/watch?v=kqExk3rd_wQ
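A minimal sketch of what the client side of such a setup could look like (the host, port, and message framing are assumptions for illustration; the actual protocol used in the demo is not included in this repository):

import socket

HOST, PORT = '127.0.0.1', 5000  # hypothetical server address and port

def request_synthesis(text: str) -> bytes:
    """Send UTF-8 text to the TTS server and collect the returned audio bytes."""
    with socket.create_connection((HOST, PORT)) as sock:
        sock.sendall(text.encode('utf-8'))
        sock.shutdown(socket.SHUT_WR)  # signal that the request is complete
        chunks = []
        while True:
            chunk = sock.recv(4096)
            if not chunk:  # server closed the connection: response is complete
                break
            chunks.append(chunk)
    return b''.join(chunks)

audio_bytes = request_synthesis('你好，世界')
with open('reply.wav', 'wb') as f:
    f.write(audio_bytes)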

A known limitation is polyphonic characters: the data uses the International Phonetic Alphabet, so a character with multiple pronunciations can currently only be checked and corrected manually at synthesis time.

★ Because part of the MelGAN vocoder code is missing, speech can only be synthesized with the Griffin-Lim algorithm. The code is provided for reference only.

The MelGAN vocoder was trained on the BZNSYP corpus.

Installation

Make sure you have:

  • Python >= 3.6

Install espeak-ng as the phonemizer backend:

sudo apt-get install espeak-ng

Then install the rest with pip:

pip install -r requirements.txt

Custom dataset

Prepare a folder containing your metadata and wav files:

|- dataset_folder/
|   |- metadata.csv
|   |- wavs/
|       |- file1.wav
|       |- ...

If metadata.csv has the format wav_file_name|transcription, you can use the bznsyp preprocessor in data/metadata_readers.py; otherwise, add your own reader to the same file (see the sketch after the checklist below).
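For example, a compatible metadata.csv could look like this (file names and transcriptions are hypothetical; check the bznsyp reader for whether the .wav extension is expected in the name):

file1|今天天氣很好。
file2|歡迎使用語音合成系統。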

Make sure that:

  • the metadata reader function name is the same as the data_name field in training_config.yaml.
  • the metadata file (which can have any name) is specified under metadata_path in training_config.yaml.
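A minimal sketch of a custom reader for data/metadata_readers.py, assuming it follows the same contract as the bznsyp reader (a function, named to match data_name, that parses the metadata file into a dict mapping file names to transcriptions; verify the exact signature against the existing reader):

def my_dataset(metadata_path: str, column_sep: str = '|') -> dict:
    # Hypothetical reader: the function name must match the data_name
    # field in training_config.yaml.
    text_dict = {}
    with open(metadata_path, 'r', encoding='utf-8') as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            file_name, transcription = line.split(column_sep, 1)
            text_dict[file_name] = transcription
    return text_dict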

Training

Change the --config argument based on the configuration of your choice.

Train Aligner Model

Create training dataset

python create_training_data.py --config config/training_config.yaml

This will populate the training data directory (default transformer_tts_data.bznsyp).

Training

python train_aligner.py --config config/training_config.yaml

Train TTS Model

Compute alignment dataset

Use the aligner model to create the durations dataset

python extract_durations.py --config config/training_config.yaml

This will add the durations.<session name> folder as well as the char-wise pitch folders to the training data directory.

Training

python train_tts.py --config config/training_config.yaml

Training & Model configuration

  • Training and model settings can be configured in training_config.yaml

Resume or restart training

  • To resume training, simply use the same configuration files.
  • To restart training, delete the weights and/or the logs from the logs folder using the training flags --reset_dir (deletes both), or --reset_logs / --reset_weights individually (see the example after this list).
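For example, to restart TTS training from scratch, deleting both weights and logs:

python train_tts.py --config config/training_config.yaml --reset_dir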

Monitor training

tensorboard --logdir /logs/directory/

Prediction

In a Python script:

from data.audio import Audio
from model.models import ForwardTransformer
from utils.training_config_manager import TrainingConfigManager

# Feed-forward model: load directly from a weights directory
FF_model = ForwardTransformer.load_model('/path/to/weights/')
FF_out = FF_model.predict('Please, say something.')

# Autoregressive model: load through the training configuration
# (the config path mirrors the --config argument used above)
config_loader = TrainingConfigManager('config/training_config.yaml')
AR_model = config_loader.load_model()
AR_out = AR_model.predict('Please, say something.')

# Build the audio processor from the model configuration
audio = Audio.from_config(FF_model.config)

# Convert spectrograms to wav (with Griffin-Lim)
FF_wav = audio.reconstruct_waveform(FF_out['mel'].numpy().T)
AR_wav = audio.reconstruct_waveform(AR_out['mel'].numpy().T)
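To listen to the result, write the waveform to disk. A minimal sketch, assuming the soundfile package and a 22050 Hz sampling rate (check the sampling_rate field in training_config.yaml for the actual value):

import soundfile as sf

# Hypothetical save step; the sampling rate must match the training config.
sf.write('ff_output.wav', FF_wav, samplerate=22050)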
