
swbd/chain : added blstm + fastlstm and blstm + tdnn + fastlstm scripts #1497

Merged 1 commit into kaldi-asr:master on Mar 16, 2017

Conversation

vijayaditya (Contributor)

No description provided.

@vijayaditya (Contributor, Author)

@danpovey Unlike in TDNN+LSTMs, replacing LSTM layers with fast LSTM layers has led to a degradation in BLSTM models.
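For context, the swap in question happens at the xconfig level inside the recipe. A minimal sketch of one BLSTM layer pair, assuming typical swbd chain settings (layer names and dimensions here are illustrative, not necessarily the committed values):

```
# one BLSTM "layer" is a forward + backward LSTM pair over the same input;
# the experiment replaces lstmp-layer with fast-lstmp-layer:
fast-lstmp-layer name=blstm1-forward input=lda cell-dim=1024 recurrent-projection-dim=256 non-recurrent-projection-dim=256 delay=-3
fast-lstmp-layer name=blstm1-backward input=lda cell-dim=1024 recurrent-projection-dim=256 non-recurrent-projection-dim=256 delay=3
# deeper layers take both directions as input:
#   input=Append(blstm1-forward, blstm1-backward)
```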

@danpovey (Contributor)

Thanks! Are these numbers better than the TDNN+fast-LSTM numbers?
I wonder whether soft links should be added (but this might depend on what the numbers look like; if the difference is not much, I might not want to emphasize these types of systems too much).
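Here "soft links" refers to the egs convention of pointing a stable top-level script at the current best tuning variant; a hypothetical example (the tuning script name is illustrative):

```
# hypothetical names, following the swbd chain recipe convention:
cd egs/swbd/s5c/local/chain
ln -s tuning/run_blstm_6j.sh run_blstm.sh
```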

@danpovey (Contributor)

Regarding the effect of fast-LSTM, when I looked at the numbers:

                        [normal LSTM impl]  [fast LSTM impl]
WER on train_dev(tg)         13.80               13.25
WER on train_dev(fg)         12.64               12.27
WER on eval2000(tg)          15.6                15.7
WER on eval2000(fg)          14.2                14.5

I saw an improvement, not a degradation. train_dev and eval2000 are about the same size so it's appropriate to average the numbers.
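Averaging the two sets for each language model makes the comparison concrete:

```
tg: normal (13.80 + 15.6) / 2 = 14.70    fast (13.25 + 15.7) / 2 = 14.48
fg: normal (12.64 + 14.2) / 2 = 13.42    fast (12.27 + 14.5) / 2 = 13.39
```

so the fast implementation comes out slightly ahead on both.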

@vijayaditya (Contributor, Author)

Updated the soft links and added a new model type called tdnn_blstm. As the next step, I will update this new model to follow tdnn_lstm_1c, but with bidirectional layers.
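Structurally, tdnn_blstm places spliced feed-forward layers below the recurrent pairs. A rough sketch of the layout, with illustrative splicing and dimensions rather than the committed values:

```
relu-renorm-layer name=tdnn1 dim=1024 input=Append(-2,-1,0,1,2)
relu-renorm-layer name=tdnn2 dim=1024 input=Append(-1,0,1)
relu-renorm-layer name=tdnn3 dim=1024 input=Append(-1,0,1)
fast-lstmp-layer name=blstm1-forward cell-dim=1024 recurrent-projection-dim=256 non-recurrent-projection-dim=256 delay=-3
fast-lstmp-layer name=blstm1-backward cell-dim=1024 recurrent-projection-dim=256 non-recurrent-projection-dim=256 delay=3
```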

@danpovey (Contributor)

ok, thanks-- merging.

danpovey merged commit 5c98096 into kaldi-asr:master on Mar 16, 2017
@vijayaditya (Contributor, Author)

BTW, these are still worse than BLSTMs. This experimentation was done to verify whether the gains seen in TDNN+LSTMs were due to the higher sampling rate at the lower layers (i.e., splicing of -1,0,1) rather than to actual modeling of right context by the TDNNs. Preliminary evidence suggests that the better results are not just due to higher sampling rates.

@freewym It might be better to also commit your BLSTM recipe with [-1,1] [-3,3] [-3,3] delays, to give context for this experiment.
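The point of that recipe is the per-layer recurrence delays; a sketch of the idea with illustrative layer names, assuming three BLSTM layers (a smaller delay at the bottom layer means the recurrence is evaluated at a higher frame rate):

```
# layer 1: delay +/-1, i.e., recurrence at every frame
fast-lstmp-layer name=blstm1-forward  ... delay=-1
fast-lstmp-layer name=blstm1-backward ... delay=1
# layers 2 and 3: the usual delay of +/-3
fast-lstmp-layer name=blstm2-forward  ... delay=-3
fast-lstmp-layer name=blstm2-backward ... delay=3
fast-lstmp-layer name=blstm3-forward  ... delay=-3
fast-lstmp-layer name=blstm3-backward ... delay=3
```

(the "..." stands for the cell and projection dimensions, elided here)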

@vijayaditya (Contributor, Author)

Sorry, I meant to say these results are worse than TDNN+LSTMs.

@danpovey (Contributor) commented Mar 16, 2017 via email

@vijayaditya (Contributor, Author) commented Mar 16, 2017 via email

@danpovey (Contributor) commented Mar 16, 2017 via email

@vijayaditya (Contributor, Author) commented Mar 16, 2017 via email

@freewym (Contributor) commented Mar 16, 2017

FYI, the results of my previous experiments comparing [-1,1] [-3,3] [-3,3] delays with [-3,3] [-3,3] [-3,3] delays using fastblstm, alongside tdnn+fastlstm, are:

System                  fastblstm_133   fastblstm   tdnn+fastlstm
WER on train_dev(tg)        13.19         13.59         13.17
WER on train_dev(fg)        12.24         12.45         12.28
WER on eval2000(tg)         15.0          15.8          15.5
WER on eval2000(fg)         13.5          14.3          14.1

where fastblstm_133 is better than tdnn+fastlstm.

@vijayaditya (Contributor, Author) commented Mar 16, 2017 via email

@freewym (Contributor) commented Mar 16, 2017

I didn't try the non-fast version of BLSTM_133.

@danpovey (Contributor) commented Mar 16, 2017 via email
