pytorch large conformer with specaug + speed perturbation (8 GPUs) + Transformer LM (4 GPUs)

We used the same LM as in the previous report.

Model files (archived to model.tar.gz by $ pack_model.sh)
- model link: (pretrained model)
- training config file: conf/tuning/conformer/train_pytorch_conformer_large.yaml
- decoding config file: conf/decode.yaml
- cmvn file: data/train_sp/cmvn.ark
- e2e file: exp/train_960_pytorch_train_pytorch_conformer_transfer_specaug/results/model.val10.avg.best
- e2e JSON file: exp/train_960_pytorch_train_pytorch_conformer_transfer_specaug/results/model.json
- lm file: exp/train_rnnlm_transformer/rnnlm.model.best
- lm JSON file: exp/train_rnnlm_transformer/model.json
- dict file: data/lang_char
Results (paste them by yourself or obtained by $ pack_model.sh --results <results>)

exp/train_960_pytorch_train_pytorch_conformer_transfer_specaug/decode_dev_clean_model.val10.avg.best_decode_transformer/result.wrd.txt
|    SPKR            |    # Snt        # Wrd     |    Corr           Sub          Del           Ins           Err         S.Err     |
|    Sum/Avg         |    2703         54402     |    98.3           1.6          0.2           0.2           1.9          26.2     |
exp/train_960_pytorch_train_pytorch_conformer_transfer_specaug/decode_dev_other_model.val10.avg.best_decode_transformer/result.wrd.txt
|    SPKR            |    # Snt        # Wrd     |    Corr           Sub          Del           Ins           Err         S.Err     |
|    Sum/Avg         |    2864         50948     |    95.6           3.9          0.5           0.5           4.9          41.4     |
exp/train_960_pytorch_train_pytorch_conformer_transfer_specaug/decode_test_clean_model.val10.avg.best_decode_transformer/result.wrd.txt
|    SPKR            |    # Snt        # Wrd     |    Corr           Sub           Del           Ins           Err         S.Err     |
|    Sum/Avg         |    2620         52576     |    98.1           1.7           0.2           0.3           2.1          25.9     |
exp/train_960_pytorch_train_pytorch_conformer_transfer_specaug/decode_test_other_model.val10.avg.best_decode_transformer/result.wrd.txt
|    SPKR            |    # Snt        # Wrd     |    Corr           Sub           Del           Ins           Err         S.Err     |
|    Sum/Avg         |    2939         52343     |    95.6           3.9           0.5           0.5           4.9          44.0     |
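
The `Sum/Avg` rows in this file follow the sclite summary layout: sentence and word counts, then Corr, Sub, Del, Ins, Err and S.Err as percentages. As a minimal sketch (the helper names `parse_sum_avg` and `err_is_consistent` are hypothetical and not part of ESPnet or sclite), a row can be parsed and sanity-checked as below. The check assumes Err is the sum of Sub, Del and Ins, with a small tolerance because each column is rounded to one decimal independently.

```python
def parse_sum_avg(line):
    """Parse one sclite 'Sum/Avg' row into a dict of named columns."""
    # Drop the outer bars, then split the row into its three '|'-separated groups.
    groups = [g.split() for g in line.strip().strip("|").split("|")]
    label, counts, rates = groups
    n_snt, n_wrd = (int(x) for x in counts)
    corr, sub, dele, ins, err, s_err = (float(x) for x in rates)
    return {
        "label": label[0],
        "n_snt": n_snt, "n_wrd": n_wrd,
        "corr": corr, "sub": sub, "del": dele, "ins": ins,
        "err": err, "s_err": s_err,
    }


def err_is_consistent(row, tol=0.2001):
    """Check Err against Sub + Del + Ins, allowing for one-decimal rounding."""
    return abs(row["sub"] + row["del"] + row["ins"] - row["err"]) <= tol


# dev_clean row of the conformer + speed perturbation system above.
row = parse_sum_avg(
    "|    Sum/Avg         |    2703         54402     |    98.3           1.6"
    "          0.2           0.2           1.9          26.2     |"
)
print(row["err"], err_is_consistent(row))
```

The tolerance matters: for the dev_clean row, Sub + Del + Ins adds up to 2.0 from the rounded columns while Err is reported as 1.9.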

pytorch large conformer with specaug (8 GPUs) + Transformer LM (4 GPUs)

We used the same LM as in the previous report.

Environments
- python version: 3.8.3 (default) [GCC 7.3.0]
- espnet version: espnet 0.9.2
- chainer version: chainer 6.0.0
- pytorch version: pytorch 1.4.0
Model files (archived to model.tar.gz by $ pack_model.sh)
- model link: (pretrained model)
- training config file: conf/tuning/conformer/train_pytorch_conformer_large.yaml
- decoding config file: conf/decode.yaml
- cmvn file: data/train_960/cmvn.ark
- e2e file: exp/train_960_pytorch_train_pytorch_conformer_large_specaug/results/model.val5.avg.best
- e2e JSON file: exp/train_960_pytorch_train_pytorch_conformer_large_specaug/results/model.json
- lm file: exp/train_rnnlm_transformer/rnnlm.model.best
- lm JSON file: exp/train_rnnlm_transformer/model.json
- dict file: data/lang_char
Results (paste them by yourself or obtained by $ pack_model.sh --results <results>)

exp/train_960_pytorch_train_pytorch_conformer_large_specaug/decode_dev_clean_model.val5.avg.best_decode_transformer/result.wrd.txt
|    SPKR           |    # Snt        # Wrd    |    Corr           Sub          Del           Ins          Err         S.Err    |
|    Sum/Avg        |    2703         54402    |    98.2           1.6          0.2           0.2          2.0          26.3    |
exp/train_960_pytorch_train_pytorch_conformer_large_specaug/decode_dev_other_model.val5.avg.best_decode_transformer/result.wrd.txt
|    SPKR           |    # Snt        # Wrd    |    Corr           Sub          Del           Ins          Err         S.Err    |
|    Sum/Avg        |    2864         50948    |    95.6           3.9          0.5           0.5          4.9          40.8    |
exp/train_960_pytorch_train_pytorch_conformer_large_specaug/decode_test_clean_model.val5.avg.best_decode_transformer/result.wrd.txt
|    SPKR            |    # Snt       # Wrd     |    Corr          Sub           Del          Ins           Err        S.Err     |
|    Sum/Avg         |    2620        52576     |    98.1          1.7           0.2          0.3           2.2         26.6     |
exp/train_960_pytorch_train_pytorch_conformer_large_specaug/decode_test_other_model.val5.avg.best_decode_transformer/result.wrd.txt
|    SPKR            |    # Snt       # Wrd     |    Corr          Sub           Del          Ins           Err        S.Err     |
|    Sum/Avg         |    2939        52343     |    95.3          4.1           0.6          0.6           5.3         44.8     |

pytorch large conformer-transducer with specaug + speed perturbation (4 GPUs)

Environments
- python version: 3.8.3 (default) [GCC 7.3.0]
- espnet version: espnet 0.10.7a1
- chainer version: chainer 6.0.0
- pytorch version: pytorch 1.10.0
Model files (archived to model.tar.gz by $ pack_model.sh)
- model link: (pretrained model)
- training config file: conf/tuning/transducer/train_conformer-rnn_transducer.yaml
- decoding config file: conf/tuning/transducer/decode.yaml
- cmvn file: data/train_sp/cmvn.ark
- e2e file: exp/train_960_pytorch_transducer_train_conformer-rnn_transducer/results/model.last10.avg.best
- e2e JSON file: exp/train_960_pytorch_transducer_train_conformer-rnn_transducer/results/model.json
- dict file: data/lang_char
Results (paste them by yourself or obtained by $ pack_model.sh --results <results>)

exp/train_960_pytorch_transducer_train_conformer-rnn_transducer/decode_dev_clean_model.last10.avg.best/result.wrd.txt
|    SPKR           |    # Snt       # Wrd     |    Corr          Sub          Del          Ins           Err        S.Err    |
|    Sum/Avg        |    2703        54402     |    97.6          2.2          0.2          0.3           2.7         33.0    |
exp/train_960_pytorch_transducer_train_conformer-rnn_transducer/decode_dev_other_model.last10.avg.best/result.wrd.txt
|    SPKR           |    # Snt       # Wrd     |    Corr          Sub          Del          Ins           Err        S.Err    |
|    Sum/Avg        |    2864        50948     |    93.7          5.7          0.6          0.7           7.0         52.8    |
exp/train_960_pytorch_transducer_train_conformer-rnn_transducer/decode_test_clean_model.last10.avg.best/result.wrd.txt
|    SPKR           |    # Snt        # Wrd    |    Corr          Sub           Del          Ins          Err         S.Err    |
|    Sum/Avg        |    2620         52576    |    97.4          2.3           0.3          0.3          2.9          33.1    |
exp/train_960_pytorch_transducer_train_conformer-rnn_transducer/decode_test_other_model.last10.avg.best/result.wrd.txt
|    SPKR           |    # Snt        # Wrd    |    Corr          Sub           Del          Ins          Err         S.Err    |
|    Sum/Avg        |    2939         52343    |    93.7          5.6           0.7          0.8          7.1          55.1    |

Lightweight/Dynamic convolution results

result file	SPKR	# Snt	# Wrd	Corr	Sub	Del	Ins	Err	S.Err
exp/train_960_pytorch_train_pytorch_LC_specaug/decode_dev_clean_model.val5.avg.best_decode_lm/result.wrd.txt:	Sum/Avg	2703	54402	96.9	2.8	0.3	0.3	3.4	39.0
exp/train_960_pytorch_train_pytorch_SA-DC_specaug/decode_dev_other_model.val5.avg.best_decode_lm/result.wrd.txt:	Sum/Avg	2864	50948	92.7	6.5	0.8	0.9	8.2	55.9
exp/train_960_pytorch_train_pytorch_DC_specaug/decode_test_clean_model.val5.avg.best_decode_lm/result.wrd.txt:	Sum/Avg	2620	52576	96.9	2.9	0.3	0.4	3.5	37.9
exp/train_960_pytorch_train_pytorch_SA-DC2D_specaug/decode_test_other_model.val5.avg.best_decode_lm/result.wrd.txt:	Sum/Avg	2939	52343	92.5	6.7	0.8	1.0	8.5	60.2

pytorch large Transformer with specaug (4 GPUs) + Transformer LM (4 GPUs)

We used the same ASR model as in the previous report.

Environments
- date: Tue Feb 4 14:50:50 JST 2020
- python version: 3.7.3 (default, Mar 27 2019, 22:11:17) [GCC 7.3.0]
- espnet version: espnet 0.6.0
- chainer version: chainer 6.0.0
- pytorch version: pytorch 1.0.1.post2
- Git hash: 83799e69a0269450587a6857882c73bfb27551d5
- Commit date: Tue Feb 4 14:21:11 2020 +0900
Model files (archived to model.tar.gz by $ pack_model.sh)
- model link: https://drive.google.com/open?id=1RHYAhcnlKz08amATrf0ZOWFLzoQphtoc
- training config file: ./conf/train.yaml
- decoding config file: ./conf/decode.yaml
- cmvn file: ./data/train_960/cmvn.ark
- e2e file: ./librispeech.transformer.v1/exp/train_960_pytorch_train_pytorch_transformer.v1_aheads8_batch-bins15000000_specaug/results/model.val5.avg.best
- e2e JSON file: ./librispeech.transformer.v1/exp/train_960_pytorch_train_pytorch_transformer.v1_aheads8_batch-bins15000000_specaug/results/model.json
- lm file: ./exp/train_rnnlm_pytorch_lm_transformer_cosine_batchsize32_lr1e-4_layer16_unigram5000_ngpu4/rnnlm.model.best
- lm JSON file: ./exp/train_rnnlm_pytorch_lm_transformer_cosine_batchsize32_lr1e-4_layer16_unigram5000_ngpu4/model.json
- dict file: ./data/lang_char
Results (paste them by yourself or obtained by $ pack_model.sh --results <results>)

./exp/train_rnnlm_pytorch_lm_transformer_cosine_batchsize32_lr1e-4_layer16_unigram5000_ngpu4/decode_dev_clean_decode_ep43/result.wrd.txt
|    SPKR            |    # Snt        # Wrd     |    Corr            Sub           Del           Ins           Err         S.Err     |
|    Sum/Avg         |    2703         54402     |    98.1            1.7           0.2           0.2           2.1          26.9     |
./exp/train_rnnlm_pytorch_lm_transformer_cosine_batchsize32_lr1e-4_layer16_unigram5000_ngpu4/decode_dev_other_decode_ep43/result.wrd.txt
|    SPKR            |    # Snt        # Wrd     |    Corr            Sub           Del           Ins           Err         S.Err     |
|    Sum/Avg         |    2864         50948     |    95.3            4.2           0.5           0.6           5.3          43.8     |
./exp/train_rnnlm_pytorch_lm_transformer_cosine_batchsize32_lr1e-4_layer16_unigram5000_ngpu4/decode_test_clean_decode_ep43/result.wrd.txt
|    SPKR            |    # Snt         # Wrd     |    Corr           Sub           Del            Ins           Err         S.Err     |
|    Sum/Avg         |    2620          52576     |    97.8           1.9           0.2            0.3           2.5          28.3     |
./exp/train_rnnlm_pytorch_lm_transformer_cosine_batchsize32_lr1e-4_layer16_unigram5000_ngpu4/decode_test_other_decode_ep43/result.wrd.txt
|    SPKR            |    # Snt         # Wrd     |    Corr           Sub           Del            Ins           Err         S.Err     |
|    Sum/Avg         |    2939          52343     |    95.1           4.3           0.6            0.6           5.5          46.7     |

pytorch large Transformer with specaug (4 GPUs) + Large LSTM LM

Models

Model files (archived to train_960_pytorch_train_pytorch_transformer_large_ngpu4_specaug.tar.gz by $ pack_model.sh)
model link: https://drive.google.com/open?id=1BtQvAnsFvVi-dp_qsaFP7n4A_5cwnlR6
training config file: conf/tuning/train_pytorch_transformer_large_ngpu4.yaml
decoding config file: conf/tuning/decode_pytorch_transformer_large.yaml
cmvn file: data/train_960/cmvn.ark
e2e file: exp/train_960_pytorch_train_pytorch_transformer_large_ngpu4_specaug/results/model.val5.avg.best
e2e JSON file: exp/train_960_pytorch_train_pytorch_transformer_large_ngpu4_specaug/results/model.json
lm file: exp/irielm.ep11.last5.avg/rnnlm.model.best
lm JSON file: exp/irielm.ep11.last5.avg/model.json

Environments

date: Thu Jul 18 16:15:33 JST 2019
python version: 3.7.3 (default, Mar 27 2019, 22:11:17) [GCC 7.3.0]
espnet version: espnet 0.4.0
chainer version: chainer 6.0.0
pytorch version: pytorch 1.0.1.post2
Git hash: f9f40861423ba9a9c9f5a45bd4369dbdb9b3bbf9
Commit date: Thu Jul 18 15:40:51 2019 +0900

WER

dataset	Snt	Wrd	Corr	Sub	Del	Ins	Err	S.Err
decode_dev_clean_model.val5.avg.best_decode_pytorch_transformer_large_lm_large	2703	54402	98.0	1.8	0.2	0.2	2.2	27.9
decode_dev_other_model.val5.avg.best_decode_pytorch_transformer_large_lm_large	2864	50948	95.1	4.3	0.6	0.6	5.6	44.9
decode_test_clean_model.val5.avg.best_decode_pytorch_transformer_large_lm_large	2620	52576	97.7	2.0	0.3	0.3	2.6	29.9
decode_test_other_model.val5.avg.best_decode_pytorch_transformer_large_lm_large	2939	52343	95.0	4.4	0.6	0.6	5.7	47.7
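
To put the Err column of this table next to the preceding "Transformer LM (4 GPUs)" results, a relative WER reduction can be computed per evaluation set. The sketch below only copies the Err values from the two blocks; the function name `relative_reduction` is hypothetical, and the comparison says nothing about any other differences between the two setups.

```python
# Err (WER, %) copied from two result blocks in this file:
# this section (Large LSTM LM) and the preceding Transformer LM section.
lstm_lm = {"dev_clean": 2.2, "dev_other": 5.6, "test_clean": 2.6, "test_other": 5.7}
transformer_lm = {"dev_clean": 2.1, "dev_other": 5.3, "test_clean": 2.5, "test_other": 5.5}


def relative_reduction(baseline, new):
    """Relative reduction in percent: 100 * (baseline - new) / baseline."""
    return 100.0 * (baseline - new) / baseline


rel = {name: relative_reduction(lstm_lm[name], transformer_lm[name]) for name in lstm_lm}
for name, value in rel.items():
    print(f"{name}: {value:.1f}% relative WER reduction")
```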

pytorch Transformer (accum grad 8, single GPU)

Environments (obtained by $ get_sys_info.sh)
- date: Wed Jun 19 16:58:42 EDT 2019
- system information: Linux b14 4.9.0-6-amd64 #1 SMP Debian 4.9.82-1+deb9u3 (2018-03-02) x86_64 GNU/Linux
- python version: Python 3.7.3
- espnet version: espnet 0.3.1
- chainer version: chainer 6.0.0
- pytorch version: pytorch 1.0.1.post2
- Git hash: b32af59f229b54801a2cf7e4b8a48cadccd5fe5a
Model files (archived to model.v1.tar.gz by $ pack_model.sh)
- model link: https://drive.google.com/open?id=1bOaOEIZBveERti0x6mnBYiNsn6MSRd2E
- training config file: conf/tuning/train_pytorch_transformer_lr5.0_ag8.v2.yaml
- decoding config file: conf/tuning/decode_pytorch_transformer.yaml
- cmvn file: data/train_960/cmvn.ark
- e2e file: exp/train_960_pytorch_train_pytorch_transformer_lr5.0_ag8.v2/results/model.last10.avg.best
- e2e JSON file: exp/train_960_pytorch_train_pytorch_transformer_lr5.0_ag8.v2/results/model.json
- lm file: exp/train_rnnlm_pytorch_lm_unigram5000/rnnlm.model.best
- lm JSON file: exp/train_rnnlm_pytorch_lm_unigram5000/model.json
Results (paste them by yourself or obtained by $ pack_model.sh --results <results>)

exp/train_960_pytorch_train_pytorch_transformer_lr5.0_ag8.v2/decode_dev_clean_decode_pytorch_transformer_lm/result.wrd.txt
|    SPKR           |   # Snt       # Wrd    |   Corr          Sub         Del          Ins         Err        S.Err    |
|    Sum/Avg        |   2703        54402    |   96.7          2.9         0.3          0.4         3.7         38.5    |
exp/train_960_pytorch_train_pytorch_transformer_lr5.0_ag8.v2/decode_dev_other_decode_pytorch_transformer_lm/result.wrd.txt
|    SPKR           |   # Snt       # Wrd    |   Corr          Sub         Del          Ins         Err        S.Err    |
|    Sum/Avg        |   2864        50948    |   91.4          7.7         0.9          1.3         9.8         59.7    |
exp/train_960_pytorch_train_pytorch_transformer_lr5.0_ag8.v2/decode_test_clean_decode_pytorch_transformer_lm/result.wrd.txt
|    SPKR           |   # Snt       # Wrd    |    Corr         Sub          Del          Ins         Err        S.Err    |
|    Sum/Avg        |   2620        52576    |    96.5         3.1          0.4          0.5         4.0         38.3    |
exp/train_960_pytorch_train_pytorch_transformer_lr5.0_ag8.v2/decode_test_other_decode_pytorch_transformer_lm/result.wrd.txt
|    SPKR           |   # Snt       # Wrd    |    Corr         Sub          Del          Ins         Err        S.Err    |
|    Sum/Avg        |   2939        52343    |    91.3         7.8          0.9          1.3        10.0         62.8    |

pytorch Transformer without any hyper-parameter tuning

train_960_pytorch_transformer_conv2d_e12_unit2048_d6_unit2048_aheads4_dim256_mtlalpha0.3_noam_sampprob0.0_ngpu3_bs32_lr10.0_warmup25000_mli512_mlo150_epochs100_accum2_lennormFalse_lsmunigram0.1/

decode_dev_clean_beam20_emodel.last10.avg.best_p0.0_len0.0-0.0_ctcw0.5_rnnlm0.7_1layer_unit1024_sgd_bs1024/result.wrd.txt: 3.8
decode_dev_other_beam20_emodel.last10.avg.best_p0.0_len0.0-0.0_ctcw0.5_rnnlm0.7_1layer_unit1024_sgd_bs1024/result.wrd.txt: 9.9
decode_test_clean_beam20_emodel.last10.avg.best_p0.0_len0.0-0.0_ctcw0.5_rnnlm0.7_1layer_unit1024_sgd_bs1024/result.wrd.txt: 4.2
decode_test_other_beam20_emodel.last10.avg.best_p0.0_len0.0-0.0_ctcw0.5_rnnlm0.7_1layer_unit1024_sgd_bs1024/result.wrd.txt: 9.8

pytorch VGG-3BLSTM 1024 units, #BPE 5000, latest RNNLM training with tuned decoding (ctc_weight=0.5, lm_weight=0.7), dropout 0.2

train_960_pytorch_vggblstm_e5_subsample1_2_2_1_1_unit1024_proj1024_d2_unit1024_location_aconvc10_aconvf100_mtlalpha0.5_drop0.2_adadelta_sampprob0.0_bs20_mli800_mlo150

WER

decode_dev_clean_beam20_emodel.acc.best_p0.0_len0.0-0.0_ctcw0.5_rnnlm0.7_1layer_unit1024_sgd_bs1024: 4.0
decode_dev_other_beam20_emodel.acc.best_p0.0_len0.0-0.0_ctcw0.5_rnnlm0.7_1layer_unit1024_sgd_bs1024: 12.3
decode_test_clean_beam20_emodel.acc.best_p0.0_len0.0-0.0_ctcw0.5_rnnlm0.7_1layer_unit1024_sgd_bs1024: 4.0
decode_test_other_beam20_emodel.acc.best_p0.0_len0.0-0.0_ctcw0.5_rnnlm0.7_1layer_unit1024_sgd_bs1024: 12.7
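
The decode directory names in these older entries carry the decoding settings: the heading's ctc_weight=0.5 and lm_weight=0.7 show up as `ctcw0.5` and `rnnlm0.7`, and `beam20` is read here as the beam size (an assumption). A minimal sketch for pulling those fields out of a name follows; `parse_decode_dir` and the regular expression are hypothetical helpers, and the remaining name components (`p0.0`, `len0.0-0.0`, and so on) are deliberately left unparsed.

```python
import re

# Hypothetical pattern: evaluation set, beam size, CTC weight, optional LM weight.
DECODE_DIR = re.compile(
    r"decode_(?P<subset>[a-z]+_[a-z]+)"
    r"_beam(?P<beam>\d+)"
    r"_.*?_ctcw(?P<ctcw>\d+(?:\.\d+)?)"
    r"(?:_rnnlm(?P<lmw>\d+(?:\.\d+)?))?"
)


def parse_decode_dir(name):
    """Extract the decoding settings encoded in a decode directory name."""
    m = DECODE_DIR.match(name)
    if m is None:
        raise ValueError(f"unrecognized decode directory name: {name}")
    lmw = m.group("lmw")
    return {
        "subset": m.group("subset"),
        "beam": int(m.group("beam")),
        "ctc_weight": float(m.group("ctcw")),
        # Names without an rnnlm field yield None for the LM weight.
        "lm_weight": float(lmw) if lmw is not None else None,
    }


info = parse_decode_dir(
    "decode_dev_clean_beam20_emodel.acc.best_p0.0_len0.0-0.0"
    "_ctcw0.5_rnnlm0.7_1layer_unit1024_sgd_bs1024"
)
print(info)
```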

pytorch VGG-3BLSTM 1024 units, #BPE 5000, latest RNNLM training with tuned decoding (ctc_weight=0.5, lm_weight=0.7)

train_960_pytorch_vggblstm_e5_subsample1_2_2_1_1_unit1024_proj1024_d2_unit1024_location_aconvc10_aconvf100_mtlalpha0.5_adadelta_sampprob0.0_bs20_mli800_mlo150

WER

decode_dev_clean_beam20_emodel.acc.best_p0.0_len0.0-0.0_ctcw0.5_rnnlm0.7_1layer_unit1024_sgd_bs1024: 4.2
decode_dev_other_beam20_emodel.acc.best_p0.0_len0.0-0.0_ctcw0.5_rnnlm0.7_1layer_unit1024_sgd_bs1024: 12.5
decode_test_clean_beam20_emodel.acc.best_p0.0_len0.0-0.0_ctcw0.5_rnnlm0.7_1layer_unit1024_sgd_bs1024: 4.2
decode_test_other_beam20_emodel.acc.best_p0.0_len0.0-0.0_ctcw0.5_rnnlm0.7_1layer_unit1024_sgd_bs1024: 13.6

pytorch VGG-3BLSTM 1024 units, #BPE 5000, more layers, with tuned decoding (ctc_weight=0.5, lm_weight=0.5)

train_960_vggblstm_e5_subsample1_2_2_1_1_unit1024_proj1024_d2_unit1024_location1024_aconvc10_aconvf100_mtlalpha0.5_adadelta_bs24_mli800_mlo150_unigram5000

WER

decode_dev_clean_beam20_eacc.best_p0.0_len0.0-0.0_ctcw0.5_rnnlm0.5: 4.5
decode_dev_other_beam20_eacc.best_p0.0_len0.0-0.0_ctcw0.5_rnnlm0.5: 13.0
decode_test_clean_beam20_eacc.best_p0.0_len0.0-0.0_ctcw0.5_rnnlm0.5: 4.6
decode_test_other_beam20_eacc.best_p0.0_len0.0-0.0_ctcw0.5_rnnlm0.5: 13.7

pytorch VGG-3BLSTM 1024 units, #BPE 2000 (motivated by the RWTH setup, thanks to Albert Zeyer, Rohit Prabhavalkar, and Kazuki Irie for their comments)

train_960_vggblstm_e4_subsample1_2_2_1_1_unit1024_proj1024_d1_unit1024_location1024_aconvc10_aconvf100_mtlalpha0.5_adadelta_bs32_mli800_mlo150_unigram2000

WER

decode_dev_clean_beam20_eacc.best_p0.0_len0.0-0.0_ctcw0.3_rnnlm0.3: 5.0
decode_dev_other_beam20_eacc.best_p0.0_len0.0-0.0_ctcw0.3_rnnlm0.3: 14.3
decode_test_clean_beam20_eacc.best_p0.0_len0.0-0.0_ctcw0.3_rnnlm0.3: 5.0
decode_test_other_beam20_eacc.best_p0.0_len0.0-0.0_ctcw0.3_rnnlm0.3: 14.9

pytorch, BLSTMP 8 layers

CER (numbers in parentheses are the error rates for different lm_weight values)

decode_dev_clean_beam20_eacc.best_p0.0_len0.0-0.0_ctcw0.3/result.txt:| 2.9 (2.7 w/ 0.2, 2.7 w/ 0.3, 2.7 w/ 0.4)
decode_dev_other_beam20_eacc.best_p0.0_len0.0-0.0_ctcw0.3/result.txt:| 9.6 (9.2 w/ 0.2, 9.1 w/ 0.3, 9.0 w/ 0.4)
decode_test_clean_beam20_eacc.best_p0.0_len0.0-0.0_ctcw0.3/result.txt:| 2.7 (2.6 w/ 0.2, 2.6 w/ 0.3, 2.6 w/ 0.4)
decode_test_other_beam20_eacc.best_p0.0_len0.0-0.0_ctcw0.3/result.txt:| 9.9 (9.6 w/ 0.2, 9.4 w/ 0.3, 9.3 w/ 0.4)

WER

decode_dev_clean_beam20_eacc.best_p0.0_len0.0-0.0_ctcw0.3/result.wrd.txt:| 7.7 (7.2 w/ 0.2, 7.1 w/ 0.3, 7.2 w/ 0.4)
decode_dev_other_beam20_eacc.best_p0.0_len0.0-0.0_ctcw0.3/result.wrd.txt:| 21.1 (19.6 w/ 0.2, 19.2 w/ 0.3, 18.9 w/ 0.4)
decode_test_clean_beam20_eacc.best_p0.0_len0.0-0.0_ctcw0.3/result.wrd.txt:| 7.7 (7.2 w/ 0.2, 7.2 w/ 0.3, 7.1 w/ 0.4)
decode_test_other_beam20_eacc.best_p0.0_len0.0-0.0_ctcw0.3/result.wrd.txt:| 21.9 (20.5 w/ 0.2, 20.0 w/ 0.3, 19.7 w/ 0.4)
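
Each line above pairs one leading error rate with a small lm_weight sweep in parentheses. As a minimal sketch (the helpers `parse_sweep` and `best_lm_weight` are hypothetical), the sweep can be parsed and the lm_weight with the lowest error rate selected; on ties the smallest lm_weight is kept. Reading the leading number as the result without an LM is an assumption.

```python
import re

NUMBER = r"\d+(?:\.\d+)?"


def parse_sweep(line):
    """Split a result line into (name, leading error rate, [(lm_weight, error rate), ...])."""
    name, _, values = line.partition(":|")
    leading = float(re.match(rf"\s*({NUMBER})", values).group(1))
    sweep = [
        (float(weight), float(err))
        for err, weight in re.findall(rf"({NUMBER}) w/ ({NUMBER})", values)
    ]
    return name, leading, sweep


def best_lm_weight(line):
    """Return (lm_weight, error rate) with the lowest error rate in the sweep."""
    _, _, sweep = parse_sweep(line)
    # min() keeps the first minimum, i.e. the smallest lm_weight on ties.
    return min(sweep, key=lambda pair: pair[1])


wer_dev_other = (
    "decode_dev_other_beam20_eacc.best_p0.0_len0.0-0.0_ctcw0.3/result.wrd.txt:"
    "| 21.1 (19.6 w/ 0.2, 19.2 w/ 0.3, 18.9 w/ 0.4)"
)
print(best_lm_weight(wer_dev_other))
```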
