forked from espnet/espnet
Commit
Merge branch 'espnet:master' into master
Showing 60 changed files with 1,387 additions and 27 deletions.
@@ -0,0 +1,36 @@
# RESULTS
## Environments
- date: `Mon Mar 21 16:06:03 UTC 2022`
- python version: `3.9.7 (default, Sep 16 2021, 13:09:58) [GCC 7.5.0]`
- espnet version: `espnet 0.10.7a1`
- pytorch version: `pytorch 1.11.0+cu102`
- Git hash: `91325a1e58ca0b13494b94bf79b186b095fe0b58`
- Commit date: `Mon Mar 21 00:40:52 2022 +0000`

## asr_train_asr_conformer_xlsr_raw_bpe150_sp

This recipe is for the Marathi language and is trained on the [OpenSLR Marathi](https://www.openslr.org/64/) multi-speaker speech data set.

The following results are obtained by using an XLSR frontend.

Train ASR Config: [conf/tuning/train_asr_conformer_xlsr.yaml](conf/tuning/train_asr_conformer_xlsr.yaml)

Trained Model: [espnet/marathi_openslr64](https://huggingface.co/espnet/marathi_openslr64)

### WER

|dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
|---|---|---|---|---|---|---|---|---|
|decode_asr_batch_size1_asr_model_valid.acc.ave/marathi_test|299|3625|72.9|22.5|4.7|1.7|28.9|88.6|

### CER

|dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
|---|---|---|---|---|---|---|---|---|
|decode_asr_batch_size1_asr_model_valid.acc.ave/marathi_test|299|20557|91.4|3.1|5.5|1.9|10.5|88.6|

### TER

|dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
|---|---|---|---|---|---|---|---|---|
|decode_asr_batch_size1_asr_model_valid.acc.ave/marathi_test|299|13562|86.5|6.3|7.1|1.4|14.9|88.6|
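In the WER, CER and TER tables above, Corr, Sub, Del and Ins are percentages of the reference units (words, characters, and presumably the BPE tokens implied by the `bpe150` tag), and Err = Sub + Del + Ins; for the WER row, 22.5 + 4.7 + 1.7 = 28.9. The sketch below shows how such counts fall out of a minimum-edit-distance alignment with unit costs. It is an illustration only: ESPnet's recipe scores with sclite, and `error_counts` / `error_rate` are hypothetical helper names.

```python
from typing import List, Sequence, Tuple

# (cost, correct, substitutions, deletions, insertions)
Cell = Tuple[int, int, int, int, int]


def _best(*cells: Cell) -> Cell:
    # Lowest edit cost wins; on ties prefer the alignment with more correct units.
    return min(cells, key=lambda c: (c[0], -c[1]))


def error_counts(ref: Sequence[str], hyp: Sequence[str]) -> Tuple[int, int, int, int]:
    """Return (correct, substitutions, deletions, insertions) for one utterance."""
    n, m = len(ref), len(hyp)
    dp: List[List[Cell]] = [[(0, 0, 0, 0, 0)] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):  # empty hypothesis: every reference unit is deleted
        dp[i][0] = (i, 0, 0, i, 0)
    for j in range(1, m + 1):  # empty reference: every hypothesis unit is inserted
        dp[0][j] = (j, 0, 0, 0, j)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            c, corr, sub, dele, ins = dp[i - 1][j - 1]
            if ref[i - 1] == hyp[j - 1]:
                diag = (c, corr + 1, sub, dele, ins)
            else:
                diag = (c + 1, corr, sub + 1, dele, ins)
            c, corr, sub, dele, ins = dp[i - 1][j]
            up = (c + 1, corr, sub, dele + 1, ins)  # reference unit deleted
            c, corr, sub, dele, ins = dp[i][j - 1]
            left = (c + 1, corr, sub, dele, ins + 1)  # extra hypothesis unit
            dp[i][j] = _best(diag, up, left)
    _, corr, sub, dele, ins = dp[n][m]
    return corr, sub, dele, ins


def error_rate(refs: List[Sequence[str]], hyps: List[Sequence[str]]) -> float:
    """Corpus-level Err in percent: (Sub + Del + Ins) / number of reference units."""
    sub = dele = ins = total = 0
    for ref, hyp in zip(refs, hyps):
        _, s, d, i = error_counts(ref, hyp)
        sub, dele, ins = sub + s, dele + d, ins + i
        total += len(ref)
    return 100.0 * (sub + dele + ins) / total


# WER compares whitespace-split words; a CER variant would compare character lists.
print(error_rate([["a", "b", "c"]], [["a", "x", "c"]]))
```

Because the denominator is the reference length, insertions can push Err above 100%, which is why Err is not simply 100 - Corr.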
@@ -0,0 +1 @@
../../TEMPLATE/asr1/asr.sh
@@ -0,0 +1 @@
../../TEMPLATE/asr1/cmd.sh
@@ -0,0 +1 @@
tuning/decode_transformer.yaml
@@ -0,0 +1,2 @@
--sample-frequency=16000
--num-mel-bins=80
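The two options above are Kaldi feature-extraction flags. They do not set the analysis window, so the sketch below assumes Kaldi's defaults of a 25 ms frame length, a 10 ms frame shift and `--snip-edges=true`; under those assumptions one second of 16 kHz audio becomes 98 frames of 80 filterbank values. `num_frames` is a hypothetical helper, not a Kaldi or ESPnet function.

```python
def num_frames(num_samples: int,
               sample_frequency: int = 16000,
               frame_length_ms: float = 25.0,
               frame_shift_ms: float = 10.0) -> int:
    """Frame count for a Kaldi-style filterbank with snip-edges=true (assumed defaults)."""
    window = int(sample_frequency * frame_length_ms / 1000)  # 400 samples at 16 kHz
    shift = int(sample_frequency * frame_shift_ms / 1000)    # 160 samples at 16 kHz
    if num_samples < window:
        return 0  # snip-edges drops any frame that does not fit completely
    return 1 + (num_samples - window) // shift


NUM_MEL_BINS = 80  # from --num-mel-bins=80
frames = num_frames(16000)  # one second of audio
print((frames, NUM_MEL_BINS))  # → (98, 80)
```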
@@ -0,0 +1,11 @@
# Default configuration
command qsub -V -v PATH -S /bin/bash
option name=* -N $0
option mem=* -l mem=$0
option mem=0 # Do not add anything to qsub_opts
option num_threads=* -l ncpus=$0
option num_threads=1 # Do not add anything to qsub_opts
option num_nodes=* -l nodes=$0:ppn=1
default gpu=0
option gpu=0
option gpu=* -l ngpus=$0
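The PBS configuration above follows the Kaldi `queue.pl` config format, which the SGE and Slurm configurations that follow also use: `command` sets the submit command, `option key=*` substitutes the requested value for `$0`, `option key=value` maps one specific value (an empty rule adds nothing), and `default` supplies a value when none is requested. The sketch below is a simplified, hypothetical re-implementation of that expansion (`build_command` is not part of Kaldi or ESPnet), meant only to show how a request such as a 4G memory limit plus one GPU becomes `qsub` flags; the real scripts handle more cases.

```python
from typing import Dict, Tuple

PBS_CONF = """\
# Default configuration
command qsub -V -v PATH -S /bin/bash
option name=* -N $0
option mem=* -l mem=$0
option mem=0 # Do not add anything to qsub_opts
option num_threads=* -l ncpus=$0
option num_threads=1 # Do not add anything to qsub_opts
option num_nodes=* -l nodes=$0:ppn=1
default gpu=0
option gpu=0
option gpu=* -l ngpus=$0
"""


def build_command(config: str, requested: Dict[str, str]) -> str:
    """Hypothetical, simplified expansion of queue.pl-style option rules."""
    base = ""
    defaults: Dict[str, str] = {}
    exact: Dict[Tuple[str, str], str] = {}  # (key, value) -> flags
    wildcard: Dict[str, str] = {}           # key -> flags template containing $0
    for raw in config.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments (assumes no '#' in flags)
        if not line:
            continue
        kind, rest = line.split(None, 1)
        if kind == "command":
            base = rest
            continue
        spec, _, flags = rest.partition(" ")
        key, value = spec.split("=", 1)
        if kind == "default":
            defaults[key] = value
        elif value == "*":
            wildcard[key] = flags.strip()
        else:
            exact[(key, value)] = flags.strip()

    parts = [base]
    for key, value in {**defaults, **requested}.items():
        if (key, value) in exact:  # in this sketch, specific rules beat "*"
            flags = exact[(key, value)]
        elif key in wildcard:
            flags = wildcard[key].replace("$0", value)
        else:
            raise ValueError(f"no rule for option {key}={value}")
        if flags:
            parts.append(flags)
    return " ".join(parts)


print(build_command(PBS_CONF, {"mem": "4G", "gpu": "1"}))
# → qsub -V -v PATH -S /bin/bash -l ngpus=1 -l mem=4G
```

Because `str.replace` substitutes every `$0`, the same sketch also expands the Slurm rule `option gpu=* -p gpu --gres=gpu:$0 -c $0`, which requests one CPU per GPU.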
@@ -0,0 +1 @@
--sample-frequency=16000
@@ -0,0 +1,12 @@
# Default configuration
command qsub -v PATH -cwd -S /bin/bash -j y -l arch=*64*
option name=* -N $0
option mem=* -l mem_free=$0,ram_free=$0
option mem=0 # Do not add anything to qsub_opts
option num_threads=* -pe smp $0
option num_threads=1 # Do not add anything to qsub_opts
option max_jobs_run=* -tc $0
option num_nodes=* -pe mpi $0 # You must set this PE as allocation_rule=1
default gpu=0
option gpu=0
option gpu=* -l gpu=$0 -q g.q
@@ -0,0 +1,14 @@
# Default configuration
command sbatch --export=PATH
option name=* --job-name $0
option time=* --time $0
option mem=* --mem-per-cpu $0
option mem=0
option num_threads=* --cpus-per-task $0
option num_threads=1 --cpus-per-task 1
option num_nodes=* --nodes $0
default gpu=0
option gpu=0 -p cpu
option gpu=* -p gpu --gres=gpu:$0 -c $0 # Recommend allocating more CPU than, or equal to the number of GPU
# note: the --max-jobs-run option is supported as a special case
# by slurm.pl and you don't have to handle it in the config file.
@@ -0,0 +1 @@
tuning/train_asr_conformer.yaml
@@ -0,0 +1,14 @@
lm_conf:
    nlayers: 2
    unit: 650
optim: sgd # or adam
batch_type: folded
batch_size: 64 # batch size in LM training
max_epoch: 20 # if the data size is large, we can reduce this
patience: 3

best_model_criterion:
-   - valid
    - loss
    - min
keep_nbest_models: 1
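`best_model_criterion` lists (phase, metric, min|max) triples and `keep_nbest_models` says how many checkpoints to retain; for this LM that means keeping the single epoch with the lowest validation loss. The sketch below illustrates only that selection rule on a toy per-epoch history. `select_nbest` and the numbers are hypothetical and do not reproduce ESPnet's trainer code or any real training run.

```python
from typing import Dict, List

# History[epoch][phase][metric] -> value, e.g. History[3]["valid"]["loss"]
History = Dict[int, Dict[str, Dict[str, float]]]


def select_nbest(history: History, phase: str, metric: str, mode: str, n: int) -> List[int]:
    """Return up to n best epochs under one (phase, metric, mode) criterion."""
    if mode not in ("min", "max"):
        raise ValueError("mode must be 'min' or 'max'")
    ranked = sorted(history, key=lambda ep: history[ep][phase][metric], reverse=(mode == "max"))
    return ranked[:n]


# Toy values for illustration only.
toy: History = {
    1: {"valid": {"loss": 3.0, "acc": 0.50}},
    2: {"valid": {"loss": 2.5, "acc": 0.60}},
    3: {"valid": {"loss": 2.7, "acc": 0.65}},
}
print(select_nbest(toy, "valid", "loss", "min", 1))  # LM criterion → [2]
print(select_nbest(toy, "valid", "acc", "max", 5))   # ASR criterion, keep 5 → [3, 2, 1]
```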
@@ -0,0 +1,6 @@
lm_weight: 0.3
beam_size: 20
penalty: 0.0
maxlenratio: 0.0
minlenratio: 0.0
ctc_weight: 0.6
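During beam search, ESPnet2 ranks hypotheses by a weighted sum of scorer outputs. To our understanding, the attention decoder gets weight `1 - ctc_weight`, the CTC prefix scorer gets `ctc_weight`, the language model gets `lm_weight`, and `penalty` scales a length bonus of one per emitted token. The sketch below only evaluates that weighted sum for finished hypotheses under those assumptions; `joint_score` is a hypothetical helper and the log-probabilities are made up.

```python
from typing import Dict


def joint_score(log_probs: Dict[str, float], num_tokens: int,
                ctc_weight: float = 0.6, lm_weight: float = 0.3,
                penalty: float = 0.0) -> float:
    """Weighted sum of scorer outputs, assuming ESPnet2-style weighting."""
    weights = {
        "decoder": 1.0 - ctc_weight,  # attention decoder
        "ctc": ctc_weight,            # CTC prefix score
        "lm": lm_weight,              # external language model
        "length_bonus": penalty,      # +1 per token, scaled by penalty
    }
    scores = dict(log_probs, length_bonus=float(num_tokens))
    return sum(w * scores[name] for name, w in weights.items())


# Made-up log-probabilities for two competing five-token hypotheses.
hyp_a = {"decoder": -2.0, "ctc": -4.0, "lm": -3.0}
hyp_b = {"decoder": -1.5, "ctc": -5.0, "lm": -2.0}
best = max(("a", hyp_a), ("b", hyp_b), key=lambda h: joint_score(h[1], num_tokens=5))
print(best[0])  # → a  (score -4.1 versus -4.2)
```

The second decoding configuration that follows uses `ctc_weight: 0.5`, which would be passed as `joint_score(..., ctc_weight=0.5)`.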
@@ -0,0 +1,7 @@
batch_size: 16
beam_size: 20
penalty: 0.0
maxlenratio: 0.0
minlenratio: 0.0
ctc_weight: 0.5
lm_weight: 0.3
egs2/mr_openslr64/asr1/conf/tuning/train_asr_conformer.yaml (67 changes: 67 additions & 0 deletions)
@@ -0,0 +1,67 @@
batch_type: numel
batch_bins: 10000
accum_grad: 3
max_epoch: 60
patience: none
init: xavier_uniform
best_model_criterion:
-   - valid
    - acc
    - max
keep_nbest_models: 5

encoder: conformer
encoder_conf:
    output_size: 512
    attention_heads: 4
    linear_units: 1024
    num_blocks: 3
    dropout_rate: 0.3
    positional_dropout_rate: 0.3
    attention_dropout_rate: 0.3
    input_layer: conv2d
    normalize_before: true
    macaron_style: false
    pos_enc_layer_type: "rel_pos"
    selfattention_layer_type: "rel_selfattn"
    activation_type: "swish"
    use_cnn_module: true
    cnn_module_kernel: 17

decoder: transformer
decoder_conf:
    attention_heads: 4
    linear_units: 1024
    num_blocks: 3
    dropout_rate: 0.3
    positional_dropout_rate: 0.3
    self_attention_dropout_rate: 0.3
    src_attention_dropout_rate: 0.3

model_conf:
    ctc_weight: 0.3
    lsm_weight: 0.1
    length_normalized_loss: false

optim: adam
optim_conf:
    lr: 0.0005
scheduler: warmuplr
scheduler_conf:
    warmup_steps: 20000

specaug: specaug
specaug_conf:
    apply_time_warp: true
    time_warp_window: 5
    time_warp_mode: bicubic
    apply_freq_mask: true
    freq_mask_width_range:
    - 0
    - 30
    num_freq_mask: 2
    apply_time_mask: true
    time_mask_width_range:
    - 0
    - 40
    num_time_mask: 2
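With `scheduler: warmuplr`, the learning rate ramps up linearly for `warmup_steps` updates and then decays with the inverse square root of the step, peaking at the configured `lr` exactly at step `warmup_steps`. The sketch below assumes the Noam-style formula `lr * warmup_steps**0.5 * min(step**-0.5, step * warmup_steps**-1.5)` with steps counted from 1, which is how we understand ESPnet2's `WarmupLR`; `warmup_lr` is a hypothetical stand-in, not the scheduler class itself.

```python
def warmup_lr(step: int, base_lr: float = 0.0005, warmup_steps: int = 20000) -> float:
    """Noam-style warmup schedule (assumed to match ESPnet2's WarmupLR)."""
    if step < 1:
        raise ValueError("step counts optimizer updates starting from 1")
    return base_lr * warmup_steps ** 0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)


for s in (1000, 10000, 20000, 80000):
    print(s, f"{warmup_lr(s):.6f}")
```

With these settings the rate is 0.0005 at step 20000 and half of that at both step 10000 (still warming up) and step 80000 (decaying); the XLSR configuration uses the same schedule.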
egs2/mr_openslr64/asr1/conf/tuning/train_asr_conformer_xlsr.yaml (88 changes: 88 additions & 0 deletions)
@@ -0,0 +1,88 @@
batch_type: numel
batch_bins: 10000
accum_grad: 3
max_epoch: 60
patience: none
init: xavier_uniform
best_model_criterion:
-   - valid
    - acc
    - max
keep_nbest_models: 5

freeze_param: [
"frontend.upstream"
]

frontend_conf:
    n_fft: 512
    hop_length: 256

frontend: s3prl
frontend_conf:
    frontend_conf:
        upstream: wav2vec2_xlsr # Note: If the upstream is changed, please change the input_size in the preencoder.
    download_dir: ./hub
    multilayer_feature: True

preencoder: linear
preencoder_conf:
    input_size: 1024 # Note: If the upstream is changed, please change this value accordingly.
    output_size: 80

encoder: conformer
encoder_conf:
    output_size: 512
    attention_heads: 4
    linear_units: 1024
    num_blocks: 3
    dropout_rate: 0.3
    positional_dropout_rate: 0.3
    attention_dropout_rate: 0.3
    input_layer: conv2d
    normalize_before: true
    macaron_style: false
    pos_enc_layer_type: "rel_pos"
    selfattention_layer_type: "rel_selfattn"
    activation_type: "swish"
    use_cnn_module: true
    cnn_module_kernel: 17

decoder: transformer
decoder_conf:
    attention_heads: 4
    linear_units: 1024
    num_blocks: 3
    dropout_rate: 0.3
    positional_dropout_rate: 0.3
    self_attention_dropout_rate: 0.3
    src_attention_dropout_rate: 0.3

model_conf:
    ctc_weight: 0.3
    lsm_weight: 0.1
    length_normalized_loss: false
    extract_feats_in_collect_stats: false # Note: "False" means during collect stats (stage 10), generating dummy stats files rather than extract_feats by forward frontend.

optim: adam
optim_conf:
    lr: 0.0005
scheduler: warmuplr
scheduler_conf:
    warmup_steps: 20000

specaug: specaug
specaug_conf:
    apply_time_warp: true
    time_warp_window: 5
    time_warp_mode: bicubic
    apply_freq_mask: true
    freq_mask_width_range:
    - 0
    - 30
    num_freq_mask: 2
    apply_time_mask: true
    time_mask_width_range:
    - 0
    - 40
    num_time_mask: 2
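With `multilayer_feature: True`, the S3PRL frontend combines the hidden states of all XLSR layers into one 1024-dimensional vector per frame, and the `linear` preencoder then projects 1024 to 80 dimensions for the Conformer. Because `freeze_param` freezes `frontend.upstream`, presumably only the layer-combination weights and the modules after them are updated. The sketch below shows both operations on tiny lists, assuming the combination is a plain softmax-weighted sum of layers (the real S3PRL featurizer may also normalise each layer first); every function name here is hypothetical.

```python
import math
from typing import List

Frame = List[float]


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_layer_sum(layers: List[List[Frame]], layer_logits: List[float]) -> List[Frame]:
    """layers[l][t] is the hidden vector of upstream layer l at frame t."""
    w = softmax(layer_logits)
    num_frames, dim = len(layers[0]), len(layers[0][0])
    return [
        [sum(w[l] * layers[l][t][d] for l in range(len(layers))) for d in range(dim)]
        for t in range(num_frames)
    ]


def linear(frames: List[Frame], weight: List[List[float]], bias: List[float]) -> List[Frame]:
    """y = W x + b per frame; W is out_dim x in_dim (80 x 1024 for the preencoder)."""
    return [[sum(wi * xi for wi, xi in zip(row, x)) + b for row, b in zip(weight, bias)]
            for x in frames]


# Two toy "layers", one frame, 2-dim features; equal logits give equal weights.
feats = weighted_layer_sum([[[1.0, 0.0]], [[0.0, 1.0]]], [0.0, 0.0])
print(feats)                               # → [[0.5, 0.5]]
print(linear(feats, [[1.0, 1.0]], [0.0]))  # → [[1.0]]
```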
@@ -0,0 +1,35 @@
# network architecture
# encoder related
encoder: vgg_rnn
encoder_conf:
    rnn_type: lstm # encoder architecture type
    bidirectional: True
    use_projection: True
    num_layers: 4
    hidden_size: 1024
    output_size: 1024

# decoder related
decoder: rnn
decoder_conf:
    num_layers: 2
    hidden_size: 1024
    sampling_probability: 0
    att_conf:
        atype: location
        adim: 1024
        aconv_chans: 10
        aconv_filts: 100

# hybrid CTC/attention
model_conf:
    ctc_weight: 0.5

# minibatch related
batch_size: 30

# optimization related
optim: adadelta
max_epoch: 15
patience: 3
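The `# hybrid CTC/attention` block sets `ctc_weight: 0.5`. In ESPnet's hybrid training the two branch losses are interpolated as `ctc_weight * loss_ctc + (1 - ctc_weight) * loss_att`, so 0.5 weights them equally, while the Conformer configurations above use 0.3. A minimal sketch with made-up loss values; `hybrid_loss` is a hypothetical name.

```python
def hybrid_loss(loss_ctc: float, loss_att: float, ctc_weight: float = 0.5) -> float:
    """Interpolate CTC and attention losses as in hybrid CTC/attention training."""
    if not 0.0 <= ctc_weight <= 1.0:
        raise ValueError("ctc_weight must lie in [0, 1]")
    return ctc_weight * loss_ctc + (1.0 - ctc_weight) * loss_att


print(hybrid_loss(40.0, 20.0))                  # RNN config (0.5) → 30.0
print(hybrid_loss(40.0, 20.0, ctc_weight=0.3))  # Conformer configs (0.3), about 26
```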