swbd/chain : added blstm + fastlstm and blstm + tdnn + fastlstm scripts #1497
Conversation
@danpovey Unlike in TDNN+LSTMs, replacing LSTM layers with fast-LSTMs has led to a degradation in BLSTM models.
Thanks! Are these numbers better than the TDNN+fast-LSTM numbers?
Regarding the effect of fast-LSTM, when I looked at the numbers, I saw an improvement, not a degradation. train_dev and eval2000 are about the same size, so it is appropriate to average the numbers.
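The averaging suggested above can be sketched as follows; as an illustration, the sketch uses the tg-graph WERs for blstm_6k_sp and tdnn_blstm_6l_sp quoted later in this thread (the helper name is mine, not from any recipe):

```python
# Average WER over test sets of roughly equal size, as suggested above.
def avg_wer(wers):
    """Unweighted mean of a list of WER percentages."""
    return sum(wers) / len(wers)

# train_dev(tg) and eval2000(tg) numbers quoted later in the thread
blstm_6k = avg_wer([13.25, 15.7])
tdnn_blstm_6l = avg_wer([12.95, 15.5])
print(blstm_6k, tdnn_blstm_6l)
```

This is only valid as an unweighted mean because the two sets are of comparable size; otherwise a duration-weighted average would be more appropriate.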
Updated the soft links and added a new model type called tdnn_blstm. As the next step, I will update this new model to follow tdnn_lstm_1c, but with bidirectional layers.
OK, thanks -- merging.
BTW, these are still worse than BLSTMs. This experimentation was done to verify whether the gains seen in TDNN+LSTMs were due to the higher sampling rates at the lower layers (i.e., splicing of -1,0,1) rather than to actual modeling of right context by the TDNNs. Preliminary evidence suggests that the better results are not just due to higher sampling rates. @freewym It might be better to also commit your BLSTM recipe with [-1,1] [-3,3] [-3,3] delays to give context for this experiment.
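One way to make the sampling-rate point concrete: the feed-forward (TDNN) part of the network has a fixed temporal context determined by the per-layer splice offsets, whereas a recurrent layer sees unbounded context through its recurrence. A minimal sketch of the feed-forward context computation (the offset lists below are illustrative, not taken from the actual recipes):

```python
# Total left/right temporal context of a stack of feed-forward layers,
# where each layer is described by the time offsets it splices together.
# Per-layer extremes simply accumulate through the stack; this does NOT
# apply to recurrent layers, whose context is unbounded.
def total_context(layer_offsets):
    left = sum(min(offs) for offs in layer_offsets)
    right = sum(max(offs) for offs in layer_offsets)
    return left, right

# Three layers each splicing -1,0,1: context grows by one frame per side per layer
print(total_context([[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]]))  # (-3, 3)
```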
Sorry, I meant to say these results are worse than TDNN+LSTMs.
Oh -- if the TDNN+BLSTMs are worse than BLSTMs, I might want to remove the soft link at some point. I don't want to give people the impression that it's necessary or desirable to run that system type if there is no real advantage in it. But this can be decided after further tuning.
TDNN+BLSTMs are better than BLSTMs but worse than TDNN+LSTMs. They have fewer TDNN layers than TDNN+LSTMs.

# tdnn_blstm_1a is the same as blstm_6k, but with initial tdnn layers
# local/chain/compare_wer_general.sh blstm_6l_sp blstm_6k_sp
# System                     blstm_6k_sp  tdnn_blstm_6l_sp
# WER on train_dev(tg)           13.25        12.95
# WER on train_dev(fg)           12.27        11.98
# WER on eval2000(tg)            15.7         15.5
# WER on eval2000(fg)            14.5         14.1
# Final train prob              -0.052       -0.041
# Final valid prob              -0.080       -0.072
# Final train prob (xent)       -0.743       -0.629
# Final valid prob (xent)       -0.8816      -0.8091
Maybe it would be better to alternate forward and backward LSTM layers
instead of combining them, e.g. tdnn layers + forward LSTM layer + tdnn
layers + backward LSTM layer, etc.
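The alternating stack suggested above could be written out as xconfig-style layer lines; a sketch that generates such a stack (the layer names, dimensions, and delays are hypothetical, not from any committed recipe):

```python
# Build an alternating tdnn / forward-LSTM / tdnn / backward-LSTM stack
# as xconfig-style lines. Names, dims, and delays are made up for
# illustration; a real recipe would tune all of these.
def alternating_stack(num_pairs, cell_dim=1024, tdnn_dim=1024):
    lines = []
    for i in range(1, num_pairs + 1):
        # negative delay = forward-in-time recurrence; positive = backward
        delay = -3 if i % 2 == 1 else 3
        direction = "forward" if delay < 0 else "backward"
        lines.append(f"relu-renorm-layer name=tdnn{i} dim={tdnn_dim}")
        lines.append(
            f"fast-lstmp-layer name=lstm{i}_{direction} cell-dim={cell_dim} delay={delay}"
        )
    return lines

for line in alternating_stack(4):
    print(line)
```

Compared with a BLSTM, each recurrent layer here runs in only one direction, so the later layers have to carry the opposite-direction context forward through the stack.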
OK, will try that out too.
--Vijay
FYI, the results of my previous experiments comparing [-1,1] [-3,3] [-3,3] delays with [-3,3] [-3,3] [-3,3] delays using fastblstm, along with tdnn+fastlstm, are:

System                  fastblstm_133  fastblstm  tdnn+fastlstm
WER on train_dev(tg)        13.19        13.59        13.17
WER on train_dev(fg)        12.24        12.45        12.28
WER on eval2000(tg)         15.0         15.8         15.5
WER on eval2000(fg)         13.5         14.3         14.1

where fastblstm_133 is better than tdnn+fastlstm.
Thanks. Was the increase in training time between fastBLSTM_333 and fastBLSTM_133 the same as between BLSTM_133 and BLSTM_333? IIRC you said BLSTM_133 resulted in a 40% increase in training time.
--Vijay
I didn't try the non-fast version of BLSTM_133.
Probably a slightly smaller dim (or maybe even half the dim) on the first, higher-frequency BLSTM layer would not hurt.
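A back-of-the-envelope look at what halving the first layer's dimensions would save, assuming the usual projected-LSTM weight layout (gate weights of size 4·cell·(input + rproj) plus a cell → (rproj + nproj) output projection; biases and peephole weights ignored, and all the dims below are made up for illustration):

```python
# Rough parameter count for one direction of a projected (B)LSTM layer,
# ignoring biases and peephole weights.
def lstmp_params(input_dim, cell_dim, rproj, nproj):
    gates = 4 * cell_dim * (input_dim + rproj)   # i, f, o, g gate weight matrices
    projection = cell_dim * (rproj + nproj)      # cell -> (recurrent + non-recurrent) projection
    return gates + projection

full = lstmp_params(input_dim=512, cell_dim=1024, rproj=256, nproj=256)
half = lstmp_params(input_dim=512, cell_dim=512, rproj=128, nproj=128)
print(full, half, round(half / full, 2))  # halving dims cuts this layer well below half size
```

Because the gate weights scale with cell_dim times the projection dims, halving both shrinks the layer to well under half its parameters, which is why a smaller first layer is relatively cheap to try.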
No description provided.