
DS2 accuracy for librispeech #214

Closed
xhzhao opened this issue Jan 11, 2018 · 5 comments

Comments

@xhzhao

xhzhao commented Jan 11, 2018

We are trying to reproduce the DS2 accuracy from the paper (WER = 5.33) on the LibriSpeech dataset (1000 hours). We trained 70 epochs on an 8x P100 GPU server with the default model and got WER = 22.033, CER = 7.316. I think this accuracy basically matches your pretrained model:

| Dataset | WER | CER |
| --- | --- | --- |
| Librispeech clean | 11.20 | 3.36 |
| Librispeech other | 31.31 | 12.29 |

But this accuracy still has a huge gap compared to the DS2 paper. My question is:

  • How can I get the DS2 paper accuracy on LibriSpeech?

This is my command line:
python train.py --train_manifest /lustre/dataset/deepspeech/libri_train_manifest.csv --val_manifest /lustre/dataset/deepspeech/libri_val_manifest.csv --cuda --num_workers 32 --batch_size 64

This is the accuracy log:
Validation Summary Epoch: [1] Average WER 47.749 Average CER 15.775
Validation Summary Epoch: [2] Average WER 36.541 Average CER 11.845
Validation Summary Epoch: [3] Average WER 31.529 Average CER 10.206
Validation Summary Epoch: [4] Average WER 29.300 Average CER 9.433
Validation Summary Epoch: [5] Average WER 26.966 Average CER 8.715
Validation Summary Epoch: [6] Average WER 25.105 Average CER 8.173
Validation Summary Epoch: [7] Average WER 24.228 Average CER 7.858
Validation Summary Epoch: [8] Average WER 23.306 Average CER 7.572
Validation Summary Epoch: [9] Average WER 22.873 Average CER 7.442
Validation Summary Epoch: [10] Average WER 22.682 Average CER 7.408
Validation Summary Epoch: [11] Average WER 22.196 Average CER 7.281
Validation Summary Epoch: [12] Average WER 22.237 Average CER 7.333
Validation Summary Epoch: [13] Average WER 22.084 Average CER 7.243
Validation Summary Epoch: [14] Average WER 22.137 Average CER 7.257
Validation Summary Epoch: [15] Average WER 22.188 Average CER 7.314
Validation Summary Epoch: [16] Average WER 22.042 Average CER 7.297
Validation Summary Epoch: [17] Average WER 22.237 Average CER 7.345
Validation Summary Epoch: [18] Average WER 22.225 Average CER 7.380
Validation Summary Epoch: [19] Average WER 22.265 Average CER 7.356
Validation Summary Epoch: [20] Average WER 22.280 Average CER 7.402
...
Validation Summary Epoch: [70] Average WER 22.033 Average CER 7.316

@alugupta

My impression was that the accuracy from the pretrained models is actually quite reasonable compared to the DeepSpeech 2 paper. If you look at Section 6 of the DS2 paper, it says that each model was trained on the full training set they had available, which is roughly 10x larger than the LibriSpeech dataset.

In Table 13, where they report 5.33 on LibriSpeech clean, I believe they are only evaluating on the test set (and the model is trained on all 12,000 hours of training data they had available). The model they use is also slightly different from the one here (Section 6.1): 11 layers, with 7 bidirectional RNN layers, 3 2D-conv layers, and an FC layer.

Looking at Table 10, if we take their 10% dataset result (LibriSpeech is 960 hours, which is close to 1,200 hours), then it's (somewhat) comparable to the pretrained model?

Am I thinking about this wrong?

Also, were you able to get a considerable speedup training on the 8x P100 multi-GPU system? (See issue 211.)

@ryanleary
Collaborator

You need to test with a language model to achieve that result.

@xhzhao
Author

xhzhao commented Jan 12, 2018

@alugupta I will give it a try on a single GPU, but it will be very slow to measure the per-epoch training time.

@ryanleary
Collaborator

You don't need to retrain. Your model is fine and matches expected performance. You must use the beam search decoder with a language model to get results competitive with the numbers listed in the paper. They're also using a language model to get those results.
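For reference, here is a minimal sketch of how decoding with a KenLM language model might look using the ctcdecode beam search decoder. The LM path and the alpha/beta/beam-width values below are placeholders, not tuned settings from this repo:

```python
import torch
from ctcdecode import CTCBeamDecoder  # third-party CTC beam search decoder

# Label set used by the default deepspeech.pytorch model ('_' is the CTC blank).
labels = list("_'ABCDEFGHIJKLMNOPQRSTUVWXYZ ")

# alpha weights the LM score, beta is the word-insertion bonus; values are illustrative.
decoder = CTCBeamDecoder(
    labels,
    model_path="librispeech_4gram.binary",  # hypothetical KenLM binary
    alpha=0.8,
    beta=1.0,
    beam_width=128,
    blank_id=labels.index("_"),
)

# probs: acoustic-model output, shape (batch, time, num_labels), already softmaxed.
probs = torch.rand(1, 100, len(labels)).softmax(dim=-1)
beam_results, beam_scores, timesteps, out_lens = decoder.decode(probs)

# Highest-scoring hypothesis for the first utterance.
best = beam_results[0, 0, : out_lens[0, 0]]
print("".join(labels[i] for i in best))
```

The validation WER in the log above presumably comes from greedy (argmax) decoding; swapping in the LM-rescored beam search at test time is what closes the gap to the paper's numbers, without retraining the acoustic model.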

@xhzhao
Author

xhzhao commented Jan 19, 2018

@ryanleary Can we train the language model together with the GRU+CTC model? If so, we could get better accuracy with the default model, and that would be cool.
