DS2 accuracy for librispeech #214
My impression was that the accuracy from the pretrained models is actually quite reasonable compared to the DeepSpeech 2 paper. If you look at Section 6 of the DS2 paper, it says that each model was trained on the full training set they had available, which is roughly 10x larger than the LibriSpeech dataset. In Table 13, where they report 5.33 on LibriSpeech clean, I believe they are evaluating only on the test set (the model is trained on all 12,000 hours of training data they had available). The model they use is also slightly different from the one here (Section 6.1): 11 layers with 7 bidirectional RNN layers, 3 2D-conv layers, and an FC layer. Looking at Table 10, if we look at the 10% dataset performance (LibriSpeech is 960 hours, which is close to that 1,200 hours), then it's somewhat comparable to the pretrained model? Am I thinking about this wrong? Also, were you able to train on the 8x P100 GPU system with considerable speedup from the multi-GPU setup (see issue 211)?
You need to test with a language model to achieve that result.
@alugupta I will give it a try on a single GPU, but it will be very slow to measure the per-epoch training time.
You don't need to retrain. Your model is fine and matches expected performance. You must use the beam search decoder with a language model to get results competitive with the numbers listed in the paper; they are also using a language model to get those results.
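For context, the validation WER reported during training comes from plain greedy CTC decoding, which the beam-search-plus-LM decoder replaces at test time. A minimal, illustrative sketch of greedy CTC decoding (not the repo's actual decoder, and the label set shown is an assumption):

```python
def ctc_greedy_decode(frame_argmax, blank=0,
                      labels="_abcdefghijklmnopqrstuvwxyz '"):
    """Greedy CTC decoding: take the per-frame argmax labels,
    collapse consecutive repeats, then drop CTC blanks."""
    out = []
    prev = None
    for idx in frame_argmax:
        if idx != prev and idx != blank:
            out.append(labels[idx])
        prev = idx
    return "".join(out)

# Per-frame argmax indices spelling "cat": blank, c, c, blank, a, t, t
print(ctc_greedy_decode([0, 3, 3, 0, 1, 20, 20]))  # → "cat"
```

A beam search decoder keeps many candidate prefixes instead of the single argmax path, which is where an external language model can rescore hypotheses.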
@ryanleary Can we train the language model together with the GRU+CTC model? If so, we could get better accuracy with the default model, and that would be cool.
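Typically the language model is not trained jointly with the acoustic model here; it is applied at decode time by combining the acoustic score with an LM score and a word-count bonus. A hedged sketch of that scoring rule (the `alpha`/`beta` values below are illustrative placeholders, not tuned settings from this repo):

```python
def fused_score(log_p_acoustic, log_p_lm, word_count,
                alpha=0.8, beta=1.0):
    """Shallow-fusion beam score: acoustic log-prob plus an
    LM log-prob weighted by alpha, plus a word-insertion bonus
    weighted by beta. Higher is better."""
    return log_p_acoustic + alpha * log_p_lm + beta * word_count

# Two hypothetical beam candidates with made-up scores:
print(fused_score(-10.0, -5.0, 2))   # candidate A
print(fused_score(-9.0, -20.0, 2))   # candidate B: better acoustics, worse LM
```

The beam search keeps the candidates with the highest fused score, so a fluent hypothesis can win even when its raw acoustic score is slightly lower.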
We tried to reproduce the DS2 accuracy from the paper (WER = 5.33) on the LibriSpeech dataset (~1,000 hours). We trained for 70 epochs on an 8x P100 GPU server with the default model and got WER = 22.033, CER = 7.316. I think this accuracy basically matches your pretrained model.
But this accuracy still has a huge gap against the DS2 paper. My question is:
This is my command line:

```
python train.py --train_manifest /lustre/dataset/deepspeech/libri_train_manifest.csv --val_manifest /lustre/dataset/deepspeech/libri_val_manifest.csv --cuda --num_workers 32 --batch_size 64
```
This is the accuracy log:
Validation Summary Epoch: [1] Average WER 47.749 Average CER 15.775
Validation Summary Epoch: [2] Average WER 36.541 Average CER 11.845
Validation Summary Epoch: [3] Average WER 31.529 Average CER 10.206
Validation Summary Epoch: [4] Average WER 29.300 Average CER 9.433
Validation Summary Epoch: [5] Average WER 26.966 Average CER 8.715
Validation Summary Epoch: [6] Average WER 25.105 Average CER 8.173
Validation Summary Epoch: [7] Average WER 24.228 Average CER 7.858
Validation Summary Epoch: [8] Average WER 23.306 Average CER 7.572
Validation Summary Epoch: [9] Average WER 22.873 Average CER 7.442
Validation Summary Epoch: [10] Average WER 22.682 Average CER 7.408
Validation Summary Epoch: [11] Average WER 22.196 Average CER 7.281
Validation Summary Epoch: [12] Average WER 22.237 Average CER 7.333
Validation Summary Epoch: [13] Average WER 22.084 Average CER 7.243
Validation Summary Epoch: [14] Average WER 22.137 Average CER 7.257
Validation Summary Epoch: [15] Average WER 22.188 Average CER 7.314
Validation Summary Epoch: [16] Average WER 22.042 Average CER 7.297
Validation Summary Epoch: [17] Average WER 22.237 Average CER 7.345
Validation Summary Epoch: [18] Average WER 22.225 Average CER 7.380
Validation Summary Epoch: [19] Average WER 22.265 Average CER 7.356
Validation Summary Epoch: [20] Average WER 22.280 Average CER 7.402
...
Validation Summary Epoch: [70] Average WER 22.033 Average CER 7.316
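For reference, the WER figures above are word-level edit distance divided by the number of reference words. A self-contained sketch of that computation (standard Levenshtein distance, not necessarily the exact implementation this repo uses):

```python
def wer(reference, hypothesis):
    """Word error rate: word-level Levenshtein distance between
    reference and hypothesis, divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,      # deletion
                          d[i][j - 1] + 1,      # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(ref)][len(hyp)] / len(ref)

# One inserted word against a 3-word reference: WER = 1/3
print(wer("the cat sat", "the cat sat on"))
```

Averaging this per-utterance ratio over the validation set gives the "Average WER" printed in the log.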