ONNX export for RadTTS #5880

Merged: 74 commits, Feb 10, 2023

Commits (74):
f4ebdb2
Megatron positional encoding alibi fix (#5808) (#5863)
github-actions[bot] Jan 26, 2023
c8b9efb
Fix segmenting for pcla inference (#5849)
jubick1337 Jan 26, 2023
e4c1085
indentation fix (#5861) (#5862)
github-actions[bot] Jan 26, 2023
f5ea3b4
add ambernet to readme (#5872) (#5873)
github-actions[bot] Jan 27, 2023
99de963
Fix wrong label mapping in batch_inference for label_model (#5767) (#…
github-actions[bot] Jan 27, 2023
2b2064d
WAR for https://github.com/pytorch/pytorch/pull/91526
borisfom Jan 28, 2023
6fdd9f3
Fix memory allocation of NeMo Multi-speaker Data Simulator (#5864)
stevehuang52 Jan 27, 2023
19d30f9
RETRO model finetuning (#5800)
yidong72 Jan 28, 2023
255d804
[TTS] GAN-based spectrogram enhancer (#5565)
racoiaws Jan 30, 2023
0ffad21
Optimizing distributed Adam when running with one work queue (#5560)
timmoon10 Jan 30, 2023
3b8b6a5
fix(readme): fix typo (#5883)
jqueguiner Jan 31, 2023
5eafd2f
TTS inference with Heteronym classification model, hc model inference…
ekmb Jan 31, 2023
df6b8af
take out retro doc (#5885) (#5886)
github-actions[bot] Jan 31, 2023
3c02a0e
Add option to disable distributed parameters in distributed Adam opti…
timmoon10 Jan 31, 2023
0f82b5b
[ASR] Separate Audio-to-Text (BPE, Char) dataset construction (#5774)
artbataev Jan 31, 2023
1492174
transformer duration added and IPA config files added
MikyasDesta Jan 18, 2023
68aacb8
inference issue for pace resolved
MikyasDesta Jan 18, 2023
53a4c9c
Latest ONNX develpoments
borisfom Feb 2, 2023
41b9679
Remove MCD_DTW tarball (#5889)
redoctopus Jan 31, 2023
672cbcc
Block large files from being merged into NeMo main (#5898)
SeanNaren Feb 1, 2023
7bf8d9b
Reduce memory usage in getMultiScaleCosAffinityMatrix function (#5876)
gabitza-tech Feb 1, 2023
93e3f3b
set max_steps for lr decay through config (#5780)
anmolgupt Feb 1, 2023
58ee179
Fix transducer and question answering tutorial bugs bugs (#5809) (#5810)
github-actions[bot] Feb 1, 2023
328ef63
update apex install instructions (#5901) (#5902)
github-actions[bot] Feb 1, 2023
feabe97
Hybrid ASR-TTS models (#5659)
artbataev Feb 1, 2023
251e117
Set providers for ORT inference session (#5903)
athitten Feb 1, 2023
033f036
[ASR] Configurable metrics for audio-to-audio + removed experimental …
anteju Feb 1, 2023
f694dbc
Correct doc for RNNT transcribe() function (#5904)
titu1994 Feb 1, 2023
2e3a04c
Add segmentation export to Audacity label file (#5857)
Ca-ressemble-a-du-fake Feb 2, 2023
fa29629
Cross-Lingual objectives (XLM) and multilingual (many-many) support f…
MaximumEntropy Feb 2, 2023
990b2c8
ONNX export working
borisfom Feb 3, 2023
85c2954
Fixing unit test
borisfom Feb 4, 2023
d99ec3d
Update isort to the latest version (#5895)
artbataev Feb 3, 2023
6d49a7b
Pin isort version (#5914)
artbataev Feb 3, 2023
fe6d14f
Moved eval notebook data to aws (#5911)
redoctopus Feb 3, 2023
c99a261
FilterbankFeaturesTA to match FilterbankFeatures (#5913)
msis Feb 3, 2023
dc03d17
fixed missing long_description_content_type (#5909)
XuesongYang Feb 3, 2023
53636d7
added TPMLP for T5-based models (#5840) (#5841)
github-actions[bot] Feb 3, 2023
0eca70a
Fixing 0-size issue and ONNX BS>1 trace
borisfom Feb 4, 2023
971d5be
Fixing code scan alert
borisfom Feb 6, 2023
5268ff0
update container (#5917)
ericharper Feb 4, 2023
7d1ff97
remove conda pynini install (#5921)
ekmb Feb 4, 2023
755131a
Merge release main (#5916)
ericharper Feb 6, 2023
5937f93
Dynamic freezing in Nemo (#5879)
trias702 Feb 6, 2023
01654a2
Fix Windows bug with save_restore_connector (#5919)
trias702 Feb 6, 2023
30cd3b6
add new lannguages to doc (#5939)
yzhang123 Feb 6, 2023
0ba361c
Workarounds for ONNX export with autocast
borisfom Feb 7, 2023
d8bd89a
fix val loss computation in megatron (#5871)
anmolgupt Feb 6, 2023
44637ab
Restoring sigmas
borisfom Feb 7, 2023
d307171
Add core classes and functions for online clustering diarizer part 2 …
tango4j Feb 7, 2023
14a6eb8
Distributed Adam optimizer overlaps param all-gather with forward com…
timmoon10 Feb 7, 2023
b2a8add
[TTS][ZH] added new NGC model cards with polyphone disambiguation. (#…
XuesongYang Feb 7, 2023
62c8415
Moved truncation of context higher up
borisfom Feb 7, 2023
1314234
[TN] bugfix file handler is not closed. (#5955)
XuesongYang Feb 7, 2023
750e2c9
Added unit test for regulate_len. Unscripted sort_tensor for TRT
borisfom Feb 8, 2023
add5f33
Fixed slice
borisfom Feb 8, 2023
8e25ef4
[TTS] deprecate AudioToCharWithPriorAndPitchDataset. (#5959)
XuesongYang Feb 8, 2023
65080af
bugfix: file handlers are not closed. (#5956)
XuesongYang Feb 8, 2023
2c009ca
[TTS][G2P] deprecate add_symbols (#5961)
XuesongYang Feb 8, 2023
cbfbd7b
fix broken link (#5968)
ericharper Feb 8, 2023
f361861
Fix hybridasr bug (#5950) (#5957)
github-actions[bot] Feb 8, 2023
9339a3f
Added list_available_models (#5967)
treacker Feb 9, 2023
3e342e0
Move settings to `pyproject.toml`. Remove deprecated `pytest-runner` …
artbataev Feb 9, 2023
aaa4851
Fix torchaudio installation (#5850)
artbataev Feb 9, 2023
181bea9
Update fastpitch.py (#5969)
blisc Feb 9, 2023
e801915
Review comments
borisfom Feb 9, 2023
1bf8b56
per-micro-batch input loader (#5635)
erhoo82 Feb 9, 2023
368f57e
update container in readme (#5981)
fayejf Feb 10, 2023
8a879c4
Support Alignment Extraction for all RNNT Beam decoding methods (#5925)
titu1994 Feb 10, 2023
c319875
Add AWS SageMaker ASR Examples (#5638)
SeanNaren Feb 10, 2023
ebfad90
Update PUBLICATIONS.md (#5963)
titu1994 Feb 10, 2023
ce6f6af
[G2P] fixed typos and broken import library. (#5978) (#5979)
github-actions[bot] Feb 10, 2023
b939fb4
[G2P] added backward compatibility for english tokenizer and fixed un…
github-actions[bot] Feb 10, 2023
864ab57
Merge branch 'main' into lstm-onnx
blisc Feb 10, 2023
275 changes: 275 additions & 0 deletions examples/tts/conf/rad-tts_dec_ipa.yaml
@@ -0,0 +1,275 @@
name: RadTTS
sample_rate: 22050

train_dataset: ???
validation_datasets: ???
ckpt_path: None
export_dir: ???
sup_data_path: ???
sup_data_types: ["log_mel", "align_prior_matrix", "pitch", "voiced_mask", "p_voiced", "energy"]

# these frame-wise values depend on pitch_fmin and pitch_fmax, you can get values
# by running `scripts/dataset_processing/tts/extract_sup_data.py`
pitch_mean: ??? # e.g. 212.35873413085938 for LJSpeech
pitch_std: ??? # e.g. 68.52806091308594 for LJSpeech

# default values from librosa.pyin
pitch_fmin: 65.40639132514966
pitch_fmax: 2093.004522404789

# default values for sample_rate=22050
n_mels: 80
n_window_size: 1024
n_window_stride: 256
n_fft: 1024
lowfreq: 0
highfreq: 8000
window: "hann"

phoneme_dict_path: "scripts/tts_dataset_files/ipa_cmudict-0.7b_nv22.10.txt"
heteronyms_path: "scripts/tts_dataset_files/heteronyms-052722"
whitelist_path: "nemo_text_processing/text_normalization/en/data/whitelist/lj_speech.tsv"
mapping_file_path: ""

model:
  target: nemo.collections.tts.models.RadTTSModel
  bin_loss_start_ratio: 0.2
  bin_loss_warmup_epochs: 100

  symbols_embedding_dim: 384
  n_mel_channels: ${n_mels}

  pitch_mean: ${pitch_mean}
  pitch_std: ${pitch_std}

  text_normalizer:
    _target_: nemo_text_processing.text_normalization.normalize.Normalizer
    lang: en
    input_case: cased
    whitelist: ${whitelist_path}

  text_normalizer_call_kwargs:
    verbose: false
    punct_pre_process: true
    punct_post_process: true

  text_tokenizer:
    _target_: nemo.collections.common.tokenizers.text_to_speech.tts_tokenizers.IPATokenizer
    punct: true
    apostrophe: true
    pad_with_space: true
    g2p:
      _target_: nemo_text_processing.g2p.modules.IPAG2P
      phoneme_dict: ${phoneme_dict_path}
      heteronyms: ${heteronyms_path}
      phoneme_probability: 0.5
      # Relies on the heteronyms list for anything that needs to be disambiguated
      ignore_ambiguous_words: true
      use_chars: true
      use_stresses: true

  train_ds:
    dataset:
      _target_: "nemo.collections.tts.torch.data.TTSDataset"
      manifest_filepath: ${train_dataset}
      sample_rate: ${sample_rate}
      sup_data_path: ${sup_data_path}
      sup_data_types: ${sup_data_types}
      n_fft: ${n_fft}
      win_length: ${n_window_size}
      hop_length: ${n_window_stride}
      window: ${window}
      n_mels: ${n_mels}
      lowfreq: ${lowfreq}
      highfreq: ${highfreq}
      max_duration: null
      min_duration: 0.1
      ignore_file: null
      trim: False
      pitch_fmin: ${pitch_fmin}
      pitch_fmax: ${pitch_fmax}

      text_tokenizer:
        _target_: "nemo.collections.common.tokenizers.text_to_speech.tts_tokenizers.EnglishPhonemesTokenizer"
        punct: True
        stresses: True
        chars: True
        space: ' '
        silence: null
        apostrophe: True
        sep: '|'
        add_blank_at: null
        pad_with_space: True
        g2p:
          _target_: "nemo_text_processing.g2p.modules.EnglishG2p"
          phoneme_dict: ${phoneme_dict_path}
          heteronyms: ${heteronyms_path}
          phoneme_probability: 0.5
    dataloader_params:
      drop_last: false
      shuffle: true
      batch_size: 8
      num_workers: 8
      pin_memory: false

  validation_ds:
    dataset:
      _target_: "nemo.collections.tts.torch.data.TTSDataset"
      manifest_filepath: ${validation_datasets}
      sample_rate: ${sample_rate}
      sup_data_path: ${sup_data_path}
      sup_data_types: ${sup_data_types}
      n_fft: ${n_fft}
      win_length: ${n_window_size}
      hop_length: ${n_window_stride}
      window: ${window}
      n_mels: ${n_mels}
      lowfreq: ${lowfreq}
      highfreq: ${highfreq}
      max_duration: null
      min_duration: 0.1
      ignore_file: null
      trim: False
      pitch_fmin: ${pitch_fmin}
      pitch_fmax: ${pitch_fmax}

      text_tokenizer:
        _target_: "nemo.collections.common.tokenizers.text_to_speech.tts_tokenizers.EnglishPhonemesTokenizer"
        punct: True
        stresses: True
        chars: True
        space: ' '
        silence: null
        apostrophe: True
        sep: '|'
        add_blank_at: null
        pad_with_space: True
        g2p:
          _target_: "nemo_text_processing.g2p.modules.EnglishG2p"
          phoneme_dict: ${phoneme_dict_path}
          heteronyms: ${heteronyms_path}
          phoneme_probability: 0.5
    dataloader_params:
      drop_last: false
      shuffle: false
      batch_size: 8
      num_workers: 8
      pin_memory: false

  optim:
    name: RAdam
    lr: 0.0001
    betas: [0.9, 0.98]
    weight_decay: 0.000001

    sched:
      name: exp_decay
      warmup_steps: 40000
      last_epoch: -1
      d_model: 1 # Disable scaling based on model dim

  trainerConfig:
    sigma: 1
    iters_per_checkpoint: 3000
    seed: null
    ignore_layers: []
    finetune_layers: []
    include_layers: []
    with_tensorboard: true
    dur_loss_weight: 1
    ctc_loss_weight: 1
    mask_unvoiced_f0: false
    log_step: 1
    binarization_start_iter: 6000
    kl_loss_start_iter: 18000
    loss_weights:
      ctc_loss_weight: 0.1
      dur_loss_weight: 1.0
      f0_loss_weight: 1.0
      energy_loss_weight: 1.0
      vpred_loss_weight: 1.0
    unfreeze_modules: "all"

  load_from_checkpoint: False
  init_from_ptl_ckpt: ${ckpt_path}

  modelConfig:
    _target_: "nemo.collections.tts.modules.radtts.RadTTSModule"
    n_speakers: 1
    n_speaker_dim: 16
    n_text: 384 #185
    n_text_dim: 512
    n_flows: 8
    n_conv_layers_per_step: 4
    n_mel_channels: 80
    n_hidden: 1024
    mel_encoder_n_hidden: 512
    dummy_speaker_embedding: false
    n_early_size: 2
    n_early_every: 2
    n_group_size: 2
    affine_model: wavenet
    include_modules: "decatnvpred"
    scaling_fn: tanh
    matrix_decomposition: LUS
    learn_alignments: true
    use_context_lstm: true
    context_lstm_norm: spectral
    context_lstm_w_f0_and_energy: true
    text_encoder_lstm_norm: spectral
    n_f0_dims: 1
    n_energy_avg_dims: 1
    use_first_order_features: false
    unvoiced_bias_activation: "relu"
    decoder_use_partial_padding: false
    decoder_use_unvoiced_bias: true
    ap_pred_log_f0: true
    ap_use_unvoiced_bias: true
    ap_use_voiced_embeddings: true
    dur_model_config: null
    f0_model_config: null
    energy_model_config: null
    v_model_config:
      name: dap
      hparams:
        n_speaker_dim: 16
        take_log_of_input: false
        bottleneck_hparams:
          in_dim: 512
          reduction_factor: 16
          norm: weightnorm
          non_linearity: relu
        arch_hparams:
          out_dim: 1
          n_layers: 2
          n_channels: 256
          kernel_size: 3
          p_dropout: 0.5

trainer:
  devices: 8
  precision: 16
  max_epochs: 1000
  num_nodes: 1
  accelerator: gpu
  strategy: ddp
  accumulate_grad_batches: 1
  enable_checkpointing: False
  logger: False
  gradient_clip_val: 1
  log_every_n_steps: 100
  check_val_every_n_epoch: 5

exp_manager:
  exp_dir: ${export_dir}
  name: ${name}
  create_tensorboard_logger: True
  create_checkpoint_callback: True
  checkpoint_callback_params:
    monitor: val/loss_ctc
    mode: min
    filepath: ${export_dir}
    filename: model_checkpoint
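The config above leans heavily on `${...}` interpolation (e.g. `n_mel_channels: ${n_mels}`), which OmegaConf/Hydra resolves against the top-level keys at load time. As a minimal stdlib-only sketch of that idea — `resolve` is a hypothetical helper, not the real OmegaConf code, and it only handles flat top-level references:

```python
import re

_PATTERN = re.compile(r"\$\{(\w+)\}")

def resolve(config: dict) -> dict:
    """Resolve ${key} references against top-level keys (hypothetical helper)."""
    def substitute(value):
        if isinstance(value, dict):
            return {k: substitute(v) for k, v in value.items()}
        if isinstance(value, str):
            m = _PATTERN.fullmatch(value)
            if m:  # whole-string reference: keep the referenced value's type
                return substitute(config[m.group(1)])
            # embedded reference inside a longer string: stringify it
            return _PATTERN.sub(lambda m: str(config[m.group(1)]), value)
        return value
    return substitute(config)

cfg = {"n_mels": 80, "model": {"n_mel_channels": "${n_mels}"}}
print(resolve(cfg)["model"]["n_mel_channels"])  # → 80
```

Note the whole-string case preserves the referenced value's type (an int stays an int), which is why `${n_mels}` can feed numeric parameters throughout the config.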