
DS2 accuracy for librispeech #214

Closed
xhzhao opened this issue Jan 11, 2018 · 5 comments

Comments

@xhzhao

xhzhao commented Jan 11, 2018

We are trying to reproduce the DS2 accuracy from the paper (WER = 5.33) on the LibriSpeech dataset (1000 hours). We trained 70 epochs on an 8x P100 GPU server with the default model and got WER = 22.033, CER = 7.316. I think this accuracy basically matches your pretrained model:

| Dataset | WER | CER |
| --- | --- | --- |
| Librispeech clean | 11.20 | 3.36 |
| Librispeech other | 31.31 | 12.29 |

But this accuracy still has a huge gap compared to the DS2 paper. My question is:

  • How can I get the DS2 paper accuracy on LibriSpeech?

This is my command line:
python train.py --train_manifest /lustre/dataset/deepspeech/libri_train_manifest.csv --val_manifest /lustre/dataset/deepspeech/libri_val_manifest.csv --cuda --num_workers 32 --batch_size 64

This is the accuracy log:
Validation Summary Epoch: [1] Average WER 47.749 Average CER 15.775
Validation Summary Epoch: [2] Average WER 36.541 Average CER 11.845
Validation Summary Epoch: [3] Average WER 31.529 Average CER 10.206
Validation Summary Epoch: [4] Average WER 29.300 Average CER 9.433
Validation Summary Epoch: [5] Average WER 26.966 Average CER 8.715
Validation Summary Epoch: [6] Average WER 25.105 Average CER 8.173
Validation Summary Epoch: [7] Average WER 24.228 Average CER 7.858
Validation Summary Epoch: [8] Average WER 23.306 Average CER 7.572
Validation Summary Epoch: [9] Average WER 22.873 Average CER 7.442
Validation Summary Epoch: [10] Average WER 22.682 Average CER 7.408
Validation Summary Epoch: [11] Average WER 22.196 Average CER 7.281
Validation Summary Epoch: [12] Average WER 22.237 Average CER 7.333
Validation Summary Epoch: [13] Average WER 22.084 Average CER 7.243
Validation Summary Epoch: [14] Average WER 22.137 Average CER 7.257
Validation Summary Epoch: [15] Average WER 22.188 Average CER 7.314
Validation Summary Epoch: [16] Average WER 22.042 Average CER 7.297
Validation Summary Epoch: [17] Average WER 22.237 Average CER 7.345
Validation Summary Epoch: [18] Average WER 22.225 Average CER 7.380
Validation Summary Epoch: [19] Average WER 22.265 Average CER 7.356
Validation Summary Epoch: [20] Average WER 22.280 Average CER 7.402
...
Validation Summary Epoch: [70] Average WER 22.033 Average CER 7.316

@alugupta

My impression was that the accuracy from the pretrained models is actually quite reasonable compared to the DeepSpeech 2 paper. If you look at Section 6 of the DS2 paper, it says that each model was trained on the full training set they had available, which is roughly 10x larger than the LibriSpeech dataset.

In Table 13, where they report 5.33 on LibriSpeech clean, I believe they are only evaluating on the test set (and the model is trained on all 12,000 hours of training data they had available). The model they use is also slightly different from the one here (Section 6.1): 11 layers, with 7 bidirectional RNN layers, 3 2D-conv layers, and an FC layer.

Looking at Table 10, if we take their 10% dataset result (LibriSpeech is 960 hours, which is close to 1,200 hours), then it's (somewhat) comparable to the pretrained model?

Am I thinking about this wrong?

Also, were you able to get a considerable speedup training on the 8x P100 multi-GPU system? (See issue 211.)

@ryanleary
Collaborator

You need to test with a language model to achieve that result.

@xhzhao
Author

xhzhao commented Jan 12, 2018

@alugupta I will give it a try on a single GPU, but it will be very slow to measure the per-epoch training time.

@ryanleary
Collaborator

You don't need to retrain. Your model is fine and matches expected performance. You must use the beam search decoder with a language model to get results competitive with the numbers listed in the paper. They're also using a language model to get those results.
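For reference, here is a minimal sketch of how decoding with a KenLM language model might look using the ctcdecode beam search decoder. The LM path and the alpha/beta/beam-width values below are placeholders, not tuned settings from this repo:

```python
import torch
from ctcdecode import CTCBeamDecoder  # third-party CTC beam search decoder

# Label set used by the default deepspeech.pytorch model ('_' is the CTC blank).
labels = list("_'ABCDEFGHIJKLMNOPQRSTUVWXYZ ")

# alpha weights the LM score, beta is the word-insertion bonus; values are illustrative.
decoder = CTCBeamDecoder(
    labels,
    model_path="librispeech_4gram.binary",  # hypothetical KenLM binary
    alpha=0.8,
    beta=1.0,
    beam_width=128,
    blank_id=labels.index("_"),
)

# probs: acoustic-model output, shape (batch, time, num_labels), already softmaxed.
probs = torch.rand(1, 100, len(labels)).softmax(dim=-1)
beam_results, beam_scores, timesteps, out_lens = decoder.decode(probs)

# Highest-scoring hypothesis for the first utterance.
best = beam_results[0, 0, : out_lens[0, 0]]
print("".join(labels[i] for i in best))
```

The validation WER in the log above presumably comes from greedy (argmax) decoding; swapping in the LM-rescored beam search at test time is what closes the gap to the paper's numbers, without retraining the acoustic model.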

@xhzhao
Author

xhzhao commented Jan 19, 2018

@ryanleary Can we train the language model together with the GRU+CTC model? If so, we could get better accuracy with the default model, and that would be cool.
