Merge pull request espnet#4173 from rubenjohn1999/master
Add ml_openslr63 ASR recipe
Showing 27 changed files with 631 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,99 @@ | ||
<!-- Generated by scripts/utils/show_asr_result.sh --> | ||
# RESULTS | ||
## Environments | ||
- date: `Sat Mar 19 20:34:49 UTC 2022` | ||
- python version: `3.9.10 | packaged by conda-forge | (main, Feb 1 2022, 21:24:11) [GCC 9.4.0]` | ||
- espnet version: `espnet 0.10.7a1` | ||
- pytorch version: `pytorch 1.10.1` | ||
- Git hash: `d2410457152872f63c51ee76ed746a6ea3153f09` | ||
- Commit date: `Sat Mar 19 09:04:54 2022 +0000` | ||
- Pretrained Model | ||
- Hugging Face Hub: | ||
https://huggingface.co/espnet/ml_openslr63 | ||
|
||
## asr_train_asr_conformer_s3prlfrontend_hubert_fused_raw_ml_bpe150_sp | ||
### WER | ||
|
||
|dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err| | ||
|---|---|---|---|---|---|---|---|---| | ||
|decode_asr_lm_lm_train_lm_ml_bpe150_valid.loss.ave_asr_model_valid.acc.ave/dev_ml|369|2345|75.2|21.8|3.0|2.4|27.2|71.5| | ||
|decode_asr_lm_lm_train_lm_ml_bpe150_valid.loss.ave_asr_model_valid.acc.ave/test_ml|1062|6136|67.0|28.7|4.3|2.6|35.6|71.8| | ||
|
||
### CER | ||
|
||
|dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err| | ||
|---|---|---|---|---|---|---|---|---| | ||
|decode_asr_lm_lm_train_lm_ml_bpe150_valid.loss.ave_asr_model_valid.acc.ave/dev_ml|369|21321|96.1|2.2|1.7|0.9|4.7|71.5| | ||
|decode_asr_lm_lm_train_lm_ml_bpe150_valid.loss.ave_asr_model_valid.acc.ave/test_ml|1062|57065|93.5|3.2|3.3|1.3|7.7|71.8| | ||
|
||
### TER | ||
|
||
|dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err| | ||
|---|---|---|---|---|---|---|---|---| | ||
|decode_asr_lm_lm_train_lm_ml_bpe150_valid.loss.ave_asr_model_valid.acc.ave/dev_ml|369|13402|93.5|4.4|2.1|0.9|7.4|71.3| | ||
|decode_asr_lm_lm_train_lm_ml_bpe150_valid.loss.ave_asr_model_valid.acc.ave/test_ml|1062|35911|89.9|6.3|3.8|1.3|11.4|70.4| | ||
|
||
<!-- Generated by scripts/utils/show_asr_result.sh --> | ||
# RESULTS | ||
## Environments | ||
- date: `Sat Mar 19 07:22:48 UTC 2022` | ||
- python version: `3.9.10 | packaged by conda-forge | (main, Feb 1 2022, 21:24:11) [GCC 9.4.0]` | ||
- espnet version: `espnet 0.10.7a1` | ||
- pytorch version: `pytorch 1.10.1` | ||
- Git hash: `813ee348e36db8a6f8d0d717be8767f938b2e62b` | ||
- Commit date: `Fri Mar 18 11:12:20 2022 -0400` | ||
|
||
## asr_train_asr_conformer_s3prlfrontend_hubert_raw_ml_bpe150_sp | ||
### WER | ||
|
||
|dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err| | ||
|---|---|---|---|---|---|---|---|---| | ||
|decode_asr_lm_lm_train_lm_ml_bpe150_valid.loss.ave_asr_model_valid.acc.ave/dev_ml|369|2345|71.4|24.4|4.2|2.5|31.1|72.6| | ||
|decode_asr_lm_lm_train_lm_ml_bpe150_valid.loss.ave_asr_model_valid.acc.ave/test_ml|1062|6136|61.8|32.1|6.1|2.0|40.3|73.5| | ||
|
||
### CER | ||
|
||
|dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err| | ||
|---|---|---|---|---|---|---|---|---| | ||
|decode_asr_lm_lm_train_lm_ml_bpe150_valid.loss.ave_asr_model_valid.acc.ave/dev_ml|369|21321|94.5|2.3|3.3|1.0|6.5|72.6| | ||
|decode_asr_lm_lm_train_lm_ml_bpe150_valid.loss.ave_asr_model_valid.acc.ave/test_ml|1062|57065|90.9|3.4|5.8|1.1|10.3|73.5| | ||
|
||
### TER | ||
|
||
|dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err| | ||
|---|---|---|---|---|---|---|---|---| | ||
|decode_asr_lm_lm_train_lm_ml_bpe150_valid.loss.ave_asr_model_valid.acc.ave/dev_ml|369|13402|91.3|4.5|4.1|0.9|9.6|72.6| | ||
|decode_asr_lm_lm_train_lm_ml_bpe150_valid.loss.ave_asr_model_valid.acc.ave/test_ml|1062|35911|86.7|6.6|6.7|0.9|14.1|72.1| | ||
|
||
<!-- Generated by scripts/utils/show_asr_result.sh --> | ||
# RESULTS | ||
## Environments | ||
- date: `Fri Mar 18 17:25:39 UTC 2022` | ||
- python version: `3.9.10 | packaged by conda-forge | (main, Feb 1 2022, 21:24:11) [GCC 9.4.0]` | ||
- espnet version: `espnet 0.10.7a1` | ||
- pytorch version: `pytorch 1.10.1` | ||
- Git hash: `9cb00370db63ced70ee39e1a2ba3137311842d44` | ||
- Commit date: `Fri Mar 18 10:47:05 2022 -0400` | ||
|
||
## asr_train_asr_conformer5_raw_ml_bpe150_sp | ||
### WER | ||
|
||
|dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err| | ||
|---|---|---|---|---|---|---|---|---| | ||
|decode_asr_lm_lm_train_lm_ml_bpe150_valid.loss.ave_asr_model_valid.acc.ave/dev_ml|369|2345|71.0|25.5|3.5|2.4|31.4|73.2| | ||
|decode_asr_lm_lm_train_lm_ml_bpe150_valid.loss.ave_asr_model_valid.acc.ave/test_ml|1062|6136|63.0|32.1|4.9|2.2|39.2|73.2| | ||
|
||
### CER | ||
|
||
|dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err| | ||
|---|---|---|---|---|---|---|---|---| | ||
|decode_asr_lm_lm_train_lm_ml_bpe150_valid.loss.ave_asr_model_valid.acc.ave/dev_ml|369|21321|94.3|3.3|2.4|1.3|7.0|73.2| | ||
|decode_asr_lm_lm_train_lm_ml_bpe150_valid.loss.ave_asr_model_valid.acc.ave/test_ml|1062|57065|91.1|4.8|4.0|1.5|10.4|73.2| | ||
|
||
### TER | ||
|
||
|dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err| | ||
|---|---|---|---|---|---|---|---|---| | ||
|decode_asr_lm_lm_train_lm_ml_bpe150_valid.loss.ave_asr_model_valid.acc.ave/dev_ml|369|13402|90.7|6.2|3.1|1.4|10.6|72.9| | ||
|decode_asr_lm_lm_train_lm_ml_bpe150_valid.loss.ave_asr_model_valid.acc.ave/test_ml|1062|35911|86.7|8.6|4.6|1.6|14.8|71.8| | ||
|
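Reviewer note: in the sclite-style tables above, `Err` is the sum of `Sub`, `Del`, and `Ins`, and `Corr` is what remains after substitutions and deletions (both up to rounding in the last digit). The sketch below checks this on the WER rows copied from the three result blocks and compares the systems. The dictionary layout, the short model labels, and the function names are illustrative assumptions, not part of `show_asr_result.sh`.

```python
# (Sub, Del, Ins) percentages from the WER tables in RESULTS.md.
# The short keys are hypothetical labels for the three experiment directories.
WER_ROWS = {
    "conformer_hubert_fused": {"dev_ml": (21.8, 3.0, 2.4), "test_ml": (28.7, 4.3, 2.6)},
    "conformer_hubert": {"dev_ml": (24.4, 4.2, 2.5), "test_ml": (32.1, 6.1, 2.0)},
    "conformer5": {"dev_ml": (25.5, 3.5, 2.4), "test_ml": (32.1, 4.9, 2.2)},
}


def error_rate(sub, dele, ins):
    """Err = Sub + Del + Ins, all in percent of reference words."""
    return sub + dele + ins


def correct_rate(sub, dele):
    """Corr = 100 - Sub - Del (insertions do not reduce Corr)."""
    return 100.0 - sub - dele


def relative_reduction(baseline, improved):
    """Relative error reduction in percent, e.g. between two WER values."""
    return 100.0 * (baseline - improved) / baseline


if __name__ == "__main__":
    for model, sets in WER_ROWS.items():
        for name, (sub, dele, ins) in sets.items():
            print(f"{model:24s} {name:8s} WER={error_rate(sub, dele, ins):.1f}")
```

Comparing `test_ml` word error rates this way makes the gap between the fused HuBERT frontend and the plain conformer explicit as a relative figure rather than an absolute difference.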
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
../../TEMPLATE/asr1/asr.sh |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
../../TEMPLATE/asr1/cmd.sh |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
tuning/decode_transformer.yaml |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,2 @@ | ||
--sample-frequency=16000 | ||
--num-mel-bins=80 |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,11 @@ | ||
# Default configuration | ||
command qsub -V -v PATH -S /bin/bash | ||
option name=* -N $0 | ||
option mem=* -l mem=$0 | ||
option mem=0 # Do not add anything to qsub_opts | ||
option num_threads=* -l ncpus=$0 | ||
option num_threads=1 # Do not add anything to qsub_opts | ||
option num_nodes=* -l nodes=$0:ppn=1 | ||
default gpu=0 | ||
option gpu=0 | ||
option gpu=* -l ngpus=$0 |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
--sample-frequency=16000 |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,12 @@ | ||
# Default configuration | ||
command qsub -v PATH -cwd -S /bin/bash -j y -l arch=*64* | ||
option name=* -N $0 | ||
option mem=* -l mem_free=$0,ram_free=$0 | ||
option mem=0 # Do not add anything to qsub_opts | ||
option num_threads=* -pe smp $0 | ||
option num_threads=1 # Do not add anything to qsub_opts | ||
option max_jobs_run=* -tc $0 | ||
option num_nodes=* -pe mpi $0 # You must set this PE as allocation_rule=1 | ||
default gpu=0 | ||
option gpu=0 | ||
option gpu=* -l gpu=$0 -q g.q |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,14 @@ | ||
# Default configuration | ||
command sbatch --export=PATH | ||
option name=* --job-name $0 | ||
option time=* --time $0 | ||
option mem=* --mem-per-cpu $0 | ||
option mem=0 | ||
option num_threads=* --cpus-per-task $0 | ||
option num_threads=1 --cpus-per-task 1 | ||
option num_nodes=* --nodes $0 | ||
default gpu=0 | ||
option gpu=0 -p cpu | ||
option gpu=* -p gpu --gres=gpu:$0 -c $0 # Recommend allocating more CPU than, or equal to the number of GPU | ||
# note: the --max-jobs-run option is supported as a special case | ||
# by slurm.pl and you don't have to handle it in the config file. |
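Reviewer note: the three queue configs (PBS, SGE, Slurm) share one pattern. A `command` line gives the base submission command, `default` lines set fallback values, and each `option key=value flags` line maps a requested setting to scheduler flags, with `*` matching any value and `$0` standing for the requested value. The sketch below is a simplified model of that substitution, assuming an exact `key=value` entry takes priority over the `key=*` wildcard. It is not the actual `slurm.pl`/`queue.pl` logic, and `parse_queue_conf` and `build_command` are hypothetical helper names.

```python
SLURM_CONF = """\
# Default configuration
command sbatch --export=PATH
option name=* --job-name $0
option time=* --time $0
option mem=* --mem-per-cpu $0
option mem=0
option num_threads=* --cpus-per-task $0
option num_threads=1 --cpus-per-task 1
option num_nodes=* --nodes $0
default gpu=0
option gpu=0 -p cpu
option gpu=* -p gpu --gres=gpu:$0 -c $0 # comment is ignored
"""


def parse_queue_conf(text):
    """Split a queue config into the base command, defaults, and option rules."""
    command, defaults, options = None, {}, {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if not line:
            continue
        kind, _, rest = line.partition(" ")
        rest = rest.strip()
        if kind == "command":
            command = rest
        elif kind == "default":
            key, _, value = rest.partition("=")
            defaults[key] = value
        elif kind == "option":
            spec, _, flags = rest.partition(" ")
            key, _, value = spec.partition("=")
            options[(key, value)] = flags.strip()
    return command, defaults, options


def build_command(text, **requested):
    """Resolve requested settings (e.g. gpu=1) into a submission command line."""
    command, defaults, options = parse_queue_conf(text)
    settings = {**defaults, **{k: str(v) for k, v in requested.items()}}
    parts = [command]
    for key, value in settings.items():
        if (key, value) in options:  # exact match wins
            flags = options[(key, value)]
        elif (key, "*") in options:  # wildcard: substitute $0
            flags = options[(key, "*")].replace("$0", value)
        else:
            raise KeyError(f"no rule for {key}={value}")
        if flags:  # an empty rule adds nothing
            parts.append(flags)
    return " ".join(parts)


if __name__ == "__main__":
    print(build_command(SLURM_CONF))
    print(build_command(SLURM_CONF, gpu=1, num_threads=4))
```

Under this model, `option mem=0` with no flags is what lets a job request "no memory limit" without emitting `--mem-per-cpu 0`.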
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
./tuning/train_asr_conformer.yaml |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,14 @@ | ||
lm_conf: | ||
nlayers: 2 | ||
unit: 650 | ||
optim: sgd # or adam | ||
batch_type: folded | ||
batch_size: 64 # batch size in LM training | ||
max_epoch: 30 # if the data size is large, we can reduce this | ||
patience: 3 | ||
|
||
best_model_criterion: | ||
- - valid | ||
- loss | ||
- min | ||
keep_nbest_models: 1 |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,7 @@ | ||
batch_size: 1 | ||
beam_size: 10 | ||
penalty: 0.0 | ||
maxlenratio: 0.0 | ||
minlenratio: 0.0 | ||
ctc_weight: 0.5 | ||
lm_weight: 0.3 |
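Reviewer note: the decoding config combines three scorers during beam search. As far as I understand ESPnet2's joint CTC/attention decoding, the attention decoder is weighted by `1 - ctc_weight`, CTC by `ctc_weight`, the language model by `lm_weight`, and `penalty` acts as a per-token length bonus. The sketch below shows that weighting for a single hypothesis; `joint_score` and its argument names are hypothetical, and the log-probabilities in the usage example are made up for illustration.

```python
def joint_score(att_logp, ctc_logp, lm_logp, length,
                ctc_weight=0.5, lm_weight=0.3, penalty=0.0):
    """Combine per-hypothesis log-probabilities with the weights from the decode config.

    att_logp, ctc_logp, lm_logp: log-probabilities of the same hypothesis
    under the attention decoder, CTC, and the LM.
    length: number of tokens in the hypothesis (used by the length bonus).
    """
    return (
        (1.0 - ctc_weight) * att_logp
        + ctc_weight * ctc_logp
        + lm_weight * lm_logp
        + penalty * length
    )


if __name__ == "__main__":
    # Illustrative values only, not taken from a real decoding run.
    print(joint_score(att_logp=-2.0, ctc_logp=-4.0, lm_logp=-10.0, length=5))
```

With `ctc_weight: 0.5` the attention and CTC scores contribute equally, and with `penalty: 0.0` the hypothesis length has no direct effect on the score.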
78 changes: 78 additions & 0 deletions
egs2/ml_openslr63/asr1/conf/tuning/train_asr_conformer.yaml
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,78 @@ | ||
# network architecture | ||
|
||
# frontend related | ||
frontend: default | ||
frontend_conf: | ||
n_fft: 512 | ||
win_length: 400 | ||
hop_length: 160 | ||
|
||
# encoder related | ||
encoder: conformer | ||
encoder_conf: | ||
input_layer: conv2d | ||
num_blocks: 12 | ||
linear_units: 2048 | ||
dropout_rate: 0.1 | ||
output_size: 256 | ||
attention_heads: 4 | ||
attention_dropout_rate: 0.0 | ||
pos_enc_layer_type: rel_pos | ||
selfattention_layer_type: rel_selfattn | ||
activation_type: swish | ||
macaron_style: true | ||
use_cnn_module: true | ||
cnn_module_kernel: 15 | ||
|
||
|
||
# decoder related | ||
decoder: transformer | ||
decoder_conf: | ||
input_layer: embed | ||
num_blocks: 6 | ||
linear_units: 2048 | ||
dropout_rate: 0.1 | ||
|
||
# hybrid CTC/attention | ||
model_conf: | ||
ctc_weight: 0.3 | ||
lsm_weight: 0.1 | ||
length_normalized_loss: false | ||
|
||
# optimization related | ||
optim: adam | ||
accum_grad: 1 | ||
grad_clip: 3 | ||
max_epoch: 50 | ||
optim_conf: | ||
lr: 4.0 | ||
scheduler: noamlr | ||
scheduler_conf: | ||
model_size: 256 | ||
warmup_steps: 25000 | ||
|
||
# minibatch related | ||
batch_type: numel | ||
batch_bins: 2000000 | ||
|
||
best_model_criterion: | ||
- - valid | ||
- acc | ||
- max | ||
keep_nbest_models: 10 | ||
|
||
specaug: specaug | ||
specaug_conf: | ||
apply_time_warp: true | ||
time_warp_window: 5 | ||
time_warp_mode: bicubic | ||
apply_freq_mask: true | ||
freq_mask_width_range: | ||
- 0 | ||
- 30 | ||
num_freq_mask: 2 | ||
apply_time_mask: true | ||
time_mask_width_range: | ||
- 0 | ||
- 40 | ||
num_time_mask: 2 |
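Reviewer note: `lr: 4.0` looks alarming for Adam, but with `scheduler: noamlr` it is a scale factor rather than the learning rate itself. A minimal sketch of the Noam schedule, assuming the usual form `lr * model_size^-0.5 * min(step^-0.5, step * warmup_steps^-1.5)` and the values from this config; `noam_lr` is a hypothetical standalone function, not ESPnet's scheduler class.

```python
def noam_lr(step, base_lr=4.0, model_size=256, warmup_steps=25000):
    """Effective learning rate at a given optimizer step (step >= 1)."""
    if step < 1:
        raise ValueError("step must be >= 1")
    # Linear warmup until warmup_steps, inverse-square-root decay afterwards.
    return base_lr * model_size ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)


if __name__ == "__main__":
    for step in (1, 1000, 25000, 100000):
        print(step, f"{noam_lr(step):.6f}")
```

Under this formula the rate peaks at step 25000 at roughly 1.6e-3 and then decays, so the effective learning rate stays in a conventional range for Adam throughout training.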
87 changes: 87 additions & 0 deletions
egs2/ml_openslr63/asr1/conf/tuning/train_asr_conformer_s3prlfrontend_hubert.yaml
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,87 @@ | ||
# network architecture | ||
|
||
freeze_param: [ | ||
"frontend.upstream" | ||
] | ||
|
||
# frontend related | ||
frontend: s3prl | ||
frontend_conf: | ||
    frontend_conf: | ||
        upstream: hubert_large_ll60k # Note: If the upstream is changed, please change the input_size in the preencoder. | ||
    download_dir: ./hub | ||
    multilayer_feature: True | ||
|
||
preencoder: linear | ||
preencoder_conf: | ||
input_size: 1024 # Note: If the upstream is changed, please change this value accordingly. | ||
output_size: 80 | ||
|
||
# encoder related | ||
encoder: conformer | ||
encoder_conf: | ||
input_layer: conv2d | ||
num_blocks: 12 | ||
linear_units: 2048 | ||
dropout_rate: 0.1 | ||
output_size: 256 | ||
attention_heads: 4 | ||
attention_dropout_rate: 0.0 | ||
pos_enc_layer_type: rel_pos | ||
selfattention_layer_type: rel_selfattn | ||
activation_type: swish | ||
macaron_style: true | ||
use_cnn_module: true | ||
cnn_module_kernel: 15 | ||
|
||
|
||
# decoder related | ||
decoder: transformer | ||
decoder_conf: | ||
input_layer: embed | ||
num_blocks: 6 | ||
linear_units: 2048 | ||
dropout_rate: 0.1 | ||
|
||
# hybrid CTC/attention | ||
model_conf: | ||
ctc_weight: 0.3 | ||
lsm_weight: 0.1 | ||
length_normalized_loss: false | ||
|
||
# optimization related | ||
optim: adam | ||
accum_grad: 1 | ||
grad_clip: 3 | ||
max_epoch: 50 | ||
optim_conf: | ||
lr: 4.0 | ||
scheduler: noamlr | ||
scheduler_conf: | ||
warmup_steps: 25000 | ||
|
||
# minibatch related | ||
batch_type: numel | ||
batch_bins: 2000000 | ||
|
||
best_model_criterion: | ||
- - valid | ||
- acc | ||
- max | ||
keep_nbest_models: 10 | ||
|
||
specaug: specaug | ||
specaug_conf: | ||
apply_time_warp: true | ||
time_warp_window: 5 | ||
time_warp_mode: bicubic | ||
apply_freq_mask: true | ||
freq_mask_width_range: | ||
- 0 | ||
- 30 | ||
num_freq_mask: 2 | ||
apply_time_mask: true | ||
time_mask_width_range: | ||
- 0 | ||
- 40 | ||
num_time_mask: 2 |
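Reviewer note: `freeze_param: ["frontend.upstream"]` keeps the HuBERT upstream fixed while the rest of the model trains. The sketch below illustrates the selection rule, assuming entries are matched against parameter names as dotted prefixes (the entry itself, or anything under `entry.`). The parameter names in the usage example are invented for illustration, and `frozen_names` is a hypothetical helper, not ESPnet code.

```python
def frozen_names(param_names, freeze_prefixes):
    """Return the parameter names selected by a freeze_param-style prefix list."""
    frozen = []
    for name in param_names:
        for prefix in freeze_prefixes:
            # Match the prefix itself or any dotted child of it,
            # so "frontend.upstream" does not match "frontend.upstream_extra".
            if name == prefix or name.startswith(prefix + "."):
                frozen.append(name)
                break
    return frozen


if __name__ == "__main__":
    # Hypothetical parameter names, for illustration only.
    names = [
        "frontend.upstream.model.encoder.layers.0.fc1.weight",
        "frontend.featurizer.weights",
        "preencoder.linear_out.weight",
        "encoder.encoders.0.self_attn.linear_q.weight",
    ]
    print(frozen_names(names, ["frontend.upstream"]))
```

In a PyTorch training loop the selected parameters would then have `requires_grad` set to `False`, so only the layer weighting, preencoder, encoder, and decoder receive gradient updates.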