Commit: Update reverb result
kamo-naoyuki committed Jan 15, 2021
1 parent 03659ca commit 75eb357
Showing 10 changed files with 519 additions and 4 deletions.
90 changes: 90 additions & 0 deletions egs2/reverb/asr1/README.md
@@ -1,5 +1,95 @@
<!-- Generated by scripts/utils/show_asr_result.sh -->
# RESULTS
## Transformer ASR + Transformer LM + SpeedPerturbation + SpecAug + applying RIR and noise data on the fly
### Environments
- date: `Fri Jan 15 10:04:32 JST 2021`
- python version: `3.7.3 (default, Mar 27 2019, 22:11:17) [GCC 7.3.0]`
- espnet version: `espnet 0.9.5`
- pytorch version: `pytorch 1.5.1`
- Git hash: `1bcf69d5d8c724cded6e5f9abef68e8000fd4b57`
- Commit date: `Mon Jan 4 13:47:44 2021 +0900`

### Config
- ASR: [conf/tuning/train_asr_transformer4.yaml](conf/tuning/train_asr_transformer4.yaml)
- LM: [conf/tuning/train_lm_transformer.yaml](conf/tuning/train_lm_transformer.yaml)
- Decode: [conf/tuning/decode.yaml](conf/tuning/decode.yaml)
- Pretrained model: [https://zenodo.org/record/4441309/files/asr_train_asr_transformer2_raw_en_char_rir_scpdatareverb_rir_singlewav.scp_noise_db_range12_17_noise_scpdatareverb_noise_singlewav.scp_speech_volume_normalize1.0_num_workers2_rir_apply_prob0.999_noise_apply_prob1._sp_valid.acc.ave.zip?download=1](https://zenodo.org/record/4441309/files/asr_train_asr_transformer2_raw_en_char_rir_scpdatareverb_rir_singlewav.scp_noise_db_range12_17_noise_scpdatareverb_noise_singlewav.scp_speech_volume_normalize1.0_num_workers2_rir_apply_prob0.999_noise_apply_prob1._sp_valid.acc.ave.zip?download=1)

### No frontend
#### WER
|dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
|---|---|---|---|---|---|---|---|---|
|decode_lm_lm_train_lm_transformer_en_char_valid.loss.ave_asr_model_valid.acc.ave/dt_real_1ch_far|89|1463|93.2|5.3|1.4|1.2|7.9|49.4|
|decode_lm_lm_train_lm_transformer_en_char_valid.loss.ave_asr_model_valid.acc.ave/dt_real_1ch_near|90|1603|94.8|3.9|1.2|0.6|5.8|47.8|
|decode_lm_lm_train_lm_transformer_en_char_valid.loss.ave_asr_model_valid.acc.ave/dt_simu_1ch_far|742|12169|95.5|3.6|0.9|0.3|4.8|38.5|
|decode_lm_lm_train_lm_transformer_en_char_valid.loss.ave_asr_model_valid.acc.ave/dt_simu_1ch_near|742|12169|96.9|2.5|0.6|0.2|3.3|29.9|
|decode_lm_lm_train_lm_transformer_en_char_valid.loss.ave_asr_model_valid.acc.ave/et_real_1ch_far|186|2962|94.4|4.5|1.1|0.7|6.3|41.9|
|decode_lm_lm_train_lm_transformer_en_char_valid.loss.ave_asr_model_valid.acc.ave/et_real_1ch_near|186|3131|94.8|4.2|1.0|0.8|6.0|45.7|
|decode_lm_lm_train_lm_transformer_en_char_valid.loss.ave_asr_model_valid.acc.ave/et_simu_1ch_far|1088|17986|95.7|3.5|0.8|0.4|4.7|39.3|
|decode_lm_lm_train_lm_transformer_en_char_valid.loss.ave_asr_model_valid.acc.ave/et_simu_1ch_near|1088|17986|96.6|2.9|0.6|0.3|3.7|34.3|

#### CER
|dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
|---|---|---|---|---|---|---|---|---|
|decode_lm_lm_train_lm_transformer_en_char_valid.loss.ave_asr_model_valid.acc.ave/dt_real_1ch_far|89|8845|96.9|1.6|1.5|1.1|4.2|49.4|
|decode_lm_lm_train_lm_transformer_en_char_valid.loss.ave_asr_model_valid.acc.ave/dt_real_1ch_near|90|9336|97.8|1.1|1.1|0.9|3.1|47.8|
|decode_lm_lm_train_lm_transformer_en_char_valid.loss.ave_asr_model_valid.acc.ave/dt_simu_1ch_far|742|71524|98.1|1.0|0.9|0.4|2.3|38.9|
|decode_lm_lm_train_lm_transformer_en_char_valid.loss.ave_asr_model_valid.acc.ave/dt_simu_1ch_near|742|71524|98.8|0.6|0.6|0.3|1.5|30.3|
|decode_lm_lm_train_lm_transformer_en_char_valid.loss.ave_asr_model_valid.acc.ave/et_real_1ch_far|186|17261|97.5|1.3|1.2|0.9|3.4|41.9|
|decode_lm_lm_train_lm_transformer_en_char_valid.loss.ave_asr_model_valid.acc.ave/et_real_1ch_near|186|18433|97.9|1.1|1.0|0.9|3.0|45.7|
|decode_lm_lm_train_lm_transformer_en_char_valid.loss.ave_asr_model_valid.acc.ave/et_simu_1ch_far|1088|105480|98.3|0.9|0.9|0.4|2.2|40.1|
|decode_lm_lm_train_lm_transformer_en_char_valid.loss.ave_asr_model_valid.acc.ave/et_simu_1ch_near|1088|105480|98.7|0.7|0.7|0.3|1.7|35.1|
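In the tables above the error columns are related: Err = Sub + Del + Ins (all percentages of the reference word/character count in the Wrd column), and S.Err is the fraction of sentences containing at least one error. A minimal sketch of how the percentage columns are derived from alignment counts (function name and signature are illustrative, not the scoring script's actual API):

```python
def error_rates(n_ref, n_sub, n_del, n_ins):
    """Compute scoring-table percentages from edit-alignment counts.

    n_ref: total reference tokens (the 'Wrd' column)
    n_sub, n_del, n_ins: substitution / deletion / insertion counts
    Returns (Corr, Sub, Del, Ins, Err) as percentages.
    """
    sub = 100.0 * n_sub / n_ref
    dele = 100.0 * n_del / n_ref
    ins = 100.0 * n_ins / n_ref
    corr = 100.0 - sub - dele  # correct = reference tokens not substituted/deleted
    err = sub + dele + ins     # Err column = Sub + Del + Ins
    return corr, sub, dele, ins, err
```

For example, with 1000 reference words and 53 substitutions, 14 deletions, and 12 insertions, this yields Corr 93.3 / Err 7.9, matching the column conventions above.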

### 1ch WPE
#### WER
|dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
|---|---|---|---|---|---|---|---|---|
|decode_lm_lm_train_lm_transformer_en_char_valid.loss.ave_asr_model_valid.acc.ave/dt_real_1ch_wpe_far|89|1463|93.3|5.5|1.2|1.2|7.9|48.3|
|decode_lm_lm_train_lm_transformer_en_char_valid.loss.ave_asr_model_valid.acc.ave/dt_real_1ch_wpe_near|90|1603|95.6|3.4|0.9|0.7|5.1|44.4|
|decode_lm_lm_train_lm_transformer_en_char_valid.loss.ave_asr_model_valid.acc.ave/dt_simu_1ch_wpe_far|742|12169|95.7|3.5|0.8|0.3|4.6|37.9|
|decode_lm_lm_train_lm_transformer_en_char_valid.loss.ave_asr_model_valid.acc.ave/dt_simu_1ch_wpe_near|742|12169|96.9|2.6|0.6|0.2|3.4|30.3|
|decode_lm_lm_train_lm_transformer_en_char_valid.loss.ave_asr_model_valid.acc.ave/et_real_1ch_wpe_far|186|2962|94.9|4.1|1.1|0.6|5.8|39.2|
|decode_lm_lm_train_lm_transformer_en_char_valid.loss.ave_asr_model_valid.acc.ave/et_real_1ch_wpe_near|186|3131|95.3|3.9|0.8|0.7|5.5|43.0|
|decode_lm_lm_train_lm_transformer_en_char_valid.loss.ave_asr_model_valid.acc.ave/et_simu_1ch_wpe_far|1088|17986|95.8|3.5|0.8|0.3|4.6|39.1|
|decode_lm_lm_train_lm_transformer_en_char_valid.loss.ave_asr_model_valid.acc.ave/et_simu_1ch_wpe_near|1088|17986|96.6|2.8|0.6|0.3|3.7|34.8|

#### CER
|dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
|---|---|---|---|---|---|---|---|---|
|decode_lm_lm_train_lm_transformer_en_char_valid.loss.ave_asr_model_valid.acc.ave/dt_real_1ch_wpe_far|89|8845|97.1|1.5|1.4|1.1|4.0|48.3|
|decode_lm_lm_train_lm_transformer_en_char_valid.loss.ave_asr_model_valid.acc.ave/dt_real_1ch_wpe_near|90|9336|98.1|0.9|0.9|0.8|2.7|44.4|
|decode_lm_lm_train_lm_transformer_en_char_valid.loss.ave_asr_model_valid.acc.ave/dt_simu_1ch_wpe_far|742|71524|98.2|1.0|0.9|0.4|2.2|38.3|
|decode_lm_lm_train_lm_transformer_en_char_valid.loss.ave_asr_model_valid.acc.ave/dt_simu_1ch_wpe_near|742|71524|98.8|0.6|0.6|0.3|1.5|30.7|
|decode_lm_lm_train_lm_transformer_en_char_valid.loss.ave_asr_model_valid.acc.ave/et_real_1ch_wpe_far|186|17261|97.8|1.2|1.0|0.9|3.1|39.2|
|decode_lm_lm_train_lm_transformer_en_char_valid.loss.ave_asr_model_valid.acc.ave/et_real_1ch_wpe_near|186|18433|98.0|1.1|0.9|0.9|2.8|43.0|
|decode_lm_lm_train_lm_transformer_en_char_valid.loss.ave_asr_model_valid.acc.ave/et_simu_1ch_wpe_far|1088|105480|98.3|0.8|0.9|0.4|2.1|39.8|
|decode_lm_lm_train_lm_transformer_en_char_valid.loss.ave_asr_model_valid.acc.ave/et_simu_1ch_wpe_near|1088|105480|98.7|0.6|0.7|0.3|1.6|35.7|

### 8ch WPE+Beamformit
#### WER
|dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
|---|---|---|---|---|---|---|---|---|
|decode_lm_lm_train_lm_transformer_en_char_valid.loss.ave_asr_model_valid.acc.ave/dt_real_8ch_beamformit_far|89|1463|95.6|3.7|0.8|0.7|5.1|42.7|
|decode_lm_lm_train_lm_transformer_en_char_valid.loss.ave_asr_model_valid.acc.ave/dt_real_8ch_beamformit_near|90|1603|96.6|2.7|0.6|0.4|3.8|38.9|
|decode_lm_lm_train_lm_transformer_en_char_valid.loss.ave_asr_model_valid.acc.ave/dt_simu_8ch_beamformit_far|742|12169|96.8|2.6|0.6|0.3|3.5|30.5|
|decode_lm_lm_train_lm_transformer_en_char_valid.loss.ave_asr_model_valid.acc.ave/dt_simu_8ch_beamformit_near|742|12169|97.1|2.3|0.6|0.2|3.1|29.4|
|decode_lm_lm_train_lm_transformer_en_char_valid.loss.ave_asr_model_valid.acc.ave/et_real_8ch_beamformit_far|186|2962|96.2|3.3|0.5|0.6|4.4|34.4|
|decode_lm_lm_train_lm_transformer_en_char_valid.loss.ave_asr_model_valid.acc.ave/et_real_8ch_beamformit_near|186|3131|96.8|2.7|0.5|0.4|3.6|32.8|
|decode_lm_lm_train_lm_transformer_en_char_valid.loss.ave_asr_model_valid.acc.ave/et_simu_8ch_beamformit_far|1088|17986|96.7|2.8|0.5|0.4|3.7|33.8|
|decode_lm_lm_train_lm_transformer_en_char_valid.loss.ave_asr_model_valid.acc.ave/et_simu_8ch_beamformit_near|1088|17986|96.8|2.7|0.5|0.3|3.5|33.0|

#### CER
|dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
|---|---|---|---|---|---|---|---|---|
|decode_lm_lm_train_lm_transformer_en_char_valid.loss.ave_asr_model_valid.acc.ave/dt_real_8ch_beamformit_far|89|8845|98.2|0.9|0.9|0.5|2.4|42.7|
|decode_lm_lm_train_lm_transformer_en_char_valid.loss.ave_asr_model_valid.acc.ave/dt_real_8ch_beamformit_near|90|9336|98.6|0.7|0.7|0.5|1.9|38.9|
|decode_lm_lm_train_lm_transformer_en_char_valid.loss.ave_asr_model_valid.acc.ave/dt_simu_8ch_beamformit_far|742|71524|98.7|0.6|0.6|0.3|1.6|30.9|
|decode_lm_lm_train_lm_transformer_en_char_valid.loss.ave_asr_model_valid.acc.ave/dt_simu_8ch_beamformit_near|742|71524|98.9|0.6|0.6|0.2|1.4|29.8|
|decode_lm_lm_train_lm_transformer_en_char_valid.loss.ave_asr_model_valid.acc.ave/et_real_8ch_beamformit_far|186|17261|98.5|0.8|0.7|0.6|2.0|34.4|
|decode_lm_lm_train_lm_transformer_en_char_valid.loss.ave_asr_model_valid.acc.ave/et_real_8ch_beamformit_near|186|18433|98.7|0.7|0.6|0.5|1.8|32.8|
|decode_lm_lm_train_lm_transformer_en_char_valid.loss.ave_asr_model_valid.acc.ave/et_simu_8ch_beamformit_far|1088|105480|98.7|0.7|0.6|0.4|1.7|34.2|
|decode_lm_lm_train_lm_transformer_en_char_valid.loss.ave_asr_model_valid.acc.ave/et_simu_8ch_beamformit_near|1088|105480|98.8|0.6|0.6|0.3|1.6|33.6|
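Reading across the three frontend conditions, the real-recording far-field WER improves monotonically from no frontend to 1ch WPE to 8ch WPE+Beamformit. A small sketch quantifying that progression, using the et_real far-condition WERs copied from the tables above:

```python
# WER (%) on the et_real far-field condition, copied from the tables above:
# no frontend (1ch), 1ch WPE, and 8ch WPE+Beamformit.
wer = {
    "no_frontend": 6.3,
    "1ch_wpe": 5.8,
    "8ch_wpe_beamformit": 4.4,
}

def relative_wer_reduction(base, new):
    """Relative WER reduction (%) when moving from frontend `base` to `new`."""
    return 100.0 * (wer[base] - wer[new]) / wer[base]
```

Going from no frontend to the full 8ch WPE+Beamformit pipeline is roughly a 30% relative WER reduction on this condition.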

## Transformer + SpeedPerturbation + SpecAug
### Environments
- date: `Wed Nov 18 08:43:05 JST 2020`
68 changes: 68 additions & 0 deletions egs2/reverb/asr1/conf/tuning/train_asr_conformer.yaml
@@ -0,0 +1,68 @@
# This configuration requires 4 GPUs with 32GB memory
batch_type: numel
batch_bins: 6000000
accum_grad: 3
max_epoch: 50
patience: none
init: xavier_uniform
best_model_criterion:
-   - valid
    - acc
    - max
keep_nbest_models: 10

encoder: conformer
encoder_conf:
    output_size: 512
    attention_heads: 8
    linear_units: 2048
    num_blocks: 12
    dropout_rate: 0.1
    positional_dropout_rate: 0.1
    attention_dropout_rate: 0.1
    input_layer: conv2d
    normalize_before: true
    macaron_style: true
    pos_enc_layer_type: "rel_pos"
    selfattention_layer_type: "rel_selfattn"
    activation_type: "swish"
    use_cnn_module: true
    cnn_module_kernel: 15

decoder: transformer
decoder_conf:
    attention_heads: 8
    linear_units: 2048
    num_blocks: 6
    dropout_rate: 0.1
    positional_dropout_rate: 0.1
    self_attention_dropout_rate: 0.1
    src_attention_dropout_rate: 0.1

model_conf:
    ctc_weight: 0.3
    lsm_weight: 0.1
    length_normalized_loss: false

optim: adam
optim_conf:
    lr: 0.002
scheduler: warmuplr
scheduler_conf:
    warmup_steps: 25000

specaug: specaug
specaug_conf:
    apply_time_warp: true
    time_warp_window: 5
    time_warp_mode: bicubic
    apply_freq_mask: true
    freq_mask_width_range:
    - 0
    - 30
    num_freq_mask: 2
    apply_time_mask: true
    time_mask_width_range:
    - 0
    - 40
    num_time_mask: 2
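With `batch_type: numel`, `batch_bins` caps the total number of feature elements (frames × feature dim) packed into one mini-batch, and `accum_grad` accumulates that many mini-batches before each optimizer step. A rough sketch of the resulting per-update budget, assuming batches are close to fully packed (the function name is illustrative):

```python
def elements_per_optimizer_step(batch_bins, accum_grad):
    """Approximate upper bound on feature elements consumed per optimizer
    step when batch_type is 'numel': each mini-batch holds at most
    `batch_bins` elements, and gradients from `accum_grad` mini-batches
    are summed before stepping."""
    return batch_bins * accum_grad
```

For this config (`batch_bins: 6000000`, `accum_grad: 3`), each optimizer step sees up to 18M feature elements, which is why the comment calls for 4 GPUs with 32GB memory.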
74 changes: 74 additions & 0 deletions egs2/reverb/asr1/conf/tuning/train_asr_conformer2.yaml
@@ -0,0 +1,74 @@
# This configuration requires 4 GPUs with 32GB memory
batch_type: numel
batch_bins: 6000000
accum_grad: 3
max_epoch: 50
patience: none
init: xavier_uniform
best_model_criterion:
-   - valid
    - acc
    - max
keep_nbest_models: 10

encoder: conformer
encoder_conf:
    output_size: 512
    attention_heads: 8
    linear_units: 2048
    num_blocks: 12
    dropout_rate: 0.1
    positional_dropout_rate: 0.1
    attention_dropout_rate: 0.1
    input_layer: conv2d
    normalize_before: true
    macaron_style: true
    pos_enc_layer_type: "rel_pos"
    selfattention_layer_type: "rel_selfattn"
    activation_type: "swish"
    use_cnn_module: true
    cnn_module_kernel: 15

decoder: transformer
decoder_conf:
    attention_heads: 8
    linear_units: 2048
    num_blocks: 6
    dropout_rate: 0.1
    positional_dropout_rate: 0.1
    self_attention_dropout_rate: 0.1
    src_attention_dropout_rate: 0.1

model_conf:
    ctc_weight: 0.3
    lsm_weight: 0.1
    length_normalized_loss: false

optim: adam
optim_conf:
    lr: 0.002
scheduler: warmuplr
scheduler_conf:
    warmup_steps: 25000

specaug: specaug
specaug_conf:
    apply_time_warp: true
    time_warp_window: 5
    time_warp_mode: bicubic
    apply_freq_mask: true
    freq_mask_width_range:
    - 0
    - 30
    num_freq_mask: 2
    apply_time_mask: true
    time_mask_width_range:
    - 0
    - 40
    num_time_mask: 2

normalize: utterance_mvn
normalize_conf:
    norm_means: true
    norm_vars: false
    eps: 1.0e-20
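The `normalize: utterance_mvn` block standardizes each utterance by its own statistics over the time axis. A minimal numpy sketch of that behavior for these settings (`norm_means: true`, `norm_vars: false`) — an illustration of the idea, not ESPnet's actual implementation:

```python
import numpy as np

def utterance_mvn(feats, norm_means=True, norm_vars=False, eps=1.0e-20):
    """Per-utterance mean (and optionally variance) normalization.

    feats: (time, feat_dim) array for a single utterance.
    With norm_means=True, norm_vars=False each feature dimension is
    shifted to zero mean over the utterance but its scale is kept.
    """
    if norm_means:
        feats = feats - feats.mean(axis=0, keepdims=True)
    if norm_vars:
        std = feats.std(axis=0, keepdims=True)
        feats = feats / np.maximum(std, eps)  # eps guards against division by ~0
    return feats
```

With the defaults above, the normalized features have zero mean per dimension while variances are left untouched.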
2 changes: 1 addition & 1 deletion egs2/reverb/asr1/conf/tuning/train_asr_transformer.yaml
@@ -2,7 +2,7 @@
batch_type: numel
batch_bins: 16000000
accum_grad: 1
-max_epoch: 50
+max_epoch: 100
patience: none
# The initialization method for model parameters
init: xavier_uniform
70 changes: 70 additions & 0 deletions egs2/reverb/asr1/conf/tuning/train_asr_transformer2.yaml
@@ -0,0 +1,70 @@
# Trained with 4x GTX-1080Ti GPUs; training takes about 2 days.
batch_type: numel
batch_bins: 16000000
accum_grad: 1
max_epoch: 50
patience: none
# The initialization method for model parameters
init: xavier_uniform
best_model_criterion:
-   - valid
    - acc
    - max
keep_nbest_models: 10

encoder: transformer
encoder_conf:
    output_size: 256
    attention_heads: 4
    linear_units: 2048
    num_blocks: 12
    dropout_rate: 0.1
    positional_dropout_rate: 0.1
    attention_dropout_rate: 0.0
    input_layer: conv2d
    normalize_before: true

decoder: transformer
decoder_conf:
    attention_heads: 4
    linear_units: 2028
    num_blocks: 6
    dropout_rate: 0.1
    positional_dropout_rate: 0.1
    self_attention_dropout_rate: 0.0
    src_attention_dropout_rate: 0.0

model_conf:
    ctc_weight: 0.3
    lsm_weight: 0.1
    length_normalized_loss: false

optim: adam
optim_conf:
    lr: 0.005
scheduler: warmuplr
scheduler_conf:
    warmup_steps: 30000

specaug: specaug
specaug_conf:
    apply_time_warp: true
    time_warp_window: 5
    time_warp_mode: bicubic
    apply_freq_mask: true
    freq_mask_width_range:
    - 0
    - 30
    num_freq_mask: 2
    apply_time_mask: true
    time_mask_width_range:
    - 0
    - 40
    num_time_mask: 2

normalize: utterance_mvn
normalize_conf:
    norm_means: true
    norm_vars: false
    eps: 1.0e-20
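The `scheduler: warmuplr` entry pairs Adam with a Noam-style schedule: the learning rate ramps up linearly to roughly `optim_conf.lr` over `warmup_steps`, then decays with the inverse square root of the step. A sketch of that schedule as I understand ESPnet2's `warmuplr` (verify against the actual scheduler before relying on it):

```python
def warmup_lr(base_lr, warmup_steps, step):
    """Noam-style warmup/decay: linear ramp to ~base_lr at
    step == warmup_steps, then step**-0.5 decay afterwards."""
    return base_lr * warmup_steps ** 0.5 * min(
        step ** -0.5,               # decay branch (after warmup)
        step * warmup_steps ** -1.5  # linear warmup branch
    )
```

With this config (`lr: 0.005`, `warmup_steps: 30000`), the rate peaks at 0.005 exactly at step 30000 and is half that at step 15000.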
70 changes: 70 additions & 0 deletions egs2/reverb/asr1/conf/tuning/train_asr_transformer3.yaml
@@ -0,0 +1,70 @@
# Trained with 4x GTX-1080Ti GPUs; training takes about 2 days.
batch_type: numel
batch_bins: 16000000
accum_grad: 1
max_epoch: 50
patience: none
# The initialization method for model parameters
init: xavier_uniform
best_model_criterion:
-   - valid
    - acc
    - max
keep_nbest_models: 10

encoder: transformer
encoder_conf:
    output_size: 256
    attention_heads: 4
    linear_units: 2048
    num_blocks: 12
    dropout_rate: 0.1
    positional_dropout_rate: 0.1
    attention_dropout_rate: 0.0
    input_layer: conv2d
    normalize_before: true

decoder: transformer
decoder_conf:
    attention_heads: 4
    linear_units: 2028
    num_blocks: 6
    dropout_rate: 0.1
    positional_dropout_rate: 0.1
    self_attention_dropout_rate: 0.0
    src_attention_dropout_rate: 0.0

model_conf:
    ctc_weight: 0.3
    lsm_weight: 0.1
    length_normalized_loss: false

optim: adam
optim_conf:
    lr: 0.005
scheduler: warmuplr
scheduler_conf:
    warmup_steps: 30000

specaug: specaug
specaug_conf:
    apply_time_warp: true
    time_warp_window: 5
    time_warp_mode: bicubic
    apply_freq_mask: true
    freq_mask_width_range:
    - 0
    - 30
    num_freq_mask: 2
    apply_time_mask: true
    time_mask_width_range:
    - 0
    - 40
    num_time_mask: 2

normalize: utterance_mvn
normalize_conf:
    norm_means: true
    norm_vars: false
    eps: 1.0e-20
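The `specaug_conf` block in these configs applies time warping plus random frequency and time masks. A minimal numpy sketch of the frequency-masking step (`apply_freq_mask` with `freq_mask_width_range: [0, 30]`, `num_freq_mask: 2`) — an illustration of the technique, not ESPnet's actual implementation:

```python
import numpy as np

def freq_mask(spec, max_width=30, num_masks=2, rng=None):
    """Zero out `num_masks` random frequency bands, each of width drawn
    uniformly from [0, max_width), on a (time, freq) spectrogram.
    Illustrative sketch of SpecAugment frequency masking."""
    rng = rng or np.random.default_rng(0)
    spec = spec.copy()
    n_freq = spec.shape[1]
    for _ in range(num_masks):
        width = int(rng.integers(0, max_width))
        start = int(rng.integers(0, max(1, n_freq - width)))
        spec[:, start:start + width] = 0.0  # mask the chosen band
    return spec
```

Time masking is analogous with the roles of the axes swapped and `time_mask_width_range: [0, 40]`.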