Layer normalization #1
It would be nice to support some form of layer normalization in the LSTM and GRU layers (example: https://github.com/pytorch/pytorch/blob/master/benchmarks/fastrnns/custom_lstms.py#L171).

Comments
Hmm, that's an interesting implementation. They're applying layer norm to the new cell state in addition to the input and recurrent gate pre-activations. The supplementary material in Ba et al. (pp. 13–14) only applies layer norm to the gate pre-activations in both of their LSTM variants. Do you know if there's any follow-up literature that explains the PyTorch variant?
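For reference, a minimal PyTorch sketch of the two placements being compared: layer norm on the input and recurrent gate pre-activations only, versus additionally normalizing the new cell state as the linked custom_lstms.py example does. The class name, gate ordering, and random initialization here are illustrative assumptions, not the linked code or this repository's API.

```python
import torch
import torch.nn as nn


class LayerNormLSTMCellSketch(nn.Module):
    """Illustrative LSTM cell; set norm_cell_state=True for the PyTorch-example variant."""

    def __init__(self, input_size, hidden_size, norm_cell_state=False):
        super().__init__()
        self.norm_cell_state = norm_cell_state
        self.weight_ih = nn.Parameter(torch.randn(4 * hidden_size, input_size))
        self.weight_hh = nn.Parameter(torch.randn(4 * hidden_size, hidden_size))
        self.bias = nn.Parameter(torch.zeros(4 * hidden_size))
        # Layer norm over the input and recurrent pre-activations (both variants).
        self.ln_ih = nn.LayerNorm(4 * hidden_size)
        self.ln_hh = nn.LayerNorm(4 * hidden_size)
        # Extra layer norm over the new cell state (the PyTorch-example variant only).
        self.ln_c = nn.LayerNorm(hidden_size)

    def forward(self, x, state):
        h, c = state
        gates = self.ln_ih(x @ self.weight_ih.t()) + self.ln_hh(h @ self.weight_hh.t()) + self.bias
        i, f, g, o = gates.chunk(4, dim=1)
        c_new = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
        if self.norm_cell_state:
            c_new = self.ln_c(c_new)  # the additional normalization being discussed
        h_new = torch.sigmoid(o) * torch.tanh(c_new)
        return h_new, c_new
```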
@sharvil I don't know of any. Personally, I think any variant of GRU/LSTM with LayerNorm would be a great addition.
This implementation is fairly straightforward. Little effort has been spent on performance optimization. Paper: https://arxiv.org/pdf/1607.06450.pdf Issue: #1
This change adds a new layer, layer_norm_lstm, that applies layer normalization to the input of an LSTM cell. In future changes, this implementation will apply layer normalization to the recurrent connection and the output as well. Issue: #1
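As a rough sketch of the placement this change describes (function and parameter names are mine, not the repository's kernel), layer norm is applied only to the input projection, while the recurrent connection and the output are left unnormalized for now:

```python
import torch
import torch.nn as nn


def lstm_step_input_ln(x_t, h_prev, c_prev, W_x, W_h, b, ln_x: nn.LayerNorm):
    """One LSTM step with layer norm on the input pre-activation only."""
    gates = ln_x(x_t @ W_x.t()) + h_prev @ W_h.t() + b  # LN only on the input term
    i, f, g, o = gates.chunk(4, dim=1)
    c_t = torch.sigmoid(f) * c_prev + torch.sigmoid(i) * torch.tanh(g)
    h_t = torch.sigmoid(o) * torch.tanh(c_t)            # recurrent/output not normalized (yet)
    return h_t, c_t
```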
This class, LayerNormLSTMCell, is parameter-compatible with LayerNormLSTM. It's implemented using the TF Python API so it can run on CPUs in addition to other accelerators. Another advantage is that LayerNormLSTMCell is an instance of RNNCell, which means it's not fused over time and can be used in e.g. autoregressive models. Note that LayerNormLSTMCell is not intended for training. In particular, the kernel / recurrent kernel / bias initializers are not customizable and the defaults are not very good. Issue: #1
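To illustrate why a per-step cell interface matters for autoregressive use, here's a hedged sketch using torch.nn.LSTMCell as a stand-in for any cell that is invoked one timestep at a time (the projection layer, sizes, and step count are arbitrary assumptions, and this is not the class described above):

```python
import torch
import torch.nn as nn

cell = nn.LSTMCell(input_size=16, hidden_size=16)
proj = nn.Linear(16, 16)           # hypothetical output-to-input projection

x_t = torch.zeros(1, 16)           # e.g. a start-of-sequence embedding
h_t = torch.zeros(1, 16)
c_t = torch.zeros(1, 16)
outputs = []
for _ in range(10):                # generate 10 steps autoregressively
    h_t, c_t = cell(x_t, (h_t, c_t))
    x_t = proj(h_t)                # next input depends on the previous output,
    outputs.append(x_t)            # which a kernel fused over time cannot express
```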
In particular, don't provide a bias term (beta) for the input and recurrent layer norms since there's already a bias term applied by the usual definition of an LSTM cell. Also, rename the layer norm scaling term from alpha to gamma to be consistent with the literature. Issue: #1
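Concretely, the parameterization described here can be sketched as follows (function and variable names are illustrative, not the library's API): each layer norm keeps only a scale gamma, and the single LSTM bias b is added after the two normalized terms.

```python
import torch
import torch.nn.functional as F


def scale_only_layer_norm(x, gamma, eps=1e-5):
    """Layer norm with a gamma scale but no beta bias."""
    return F.layer_norm(x, x.shape[-1:], weight=gamma, bias=None, eps=eps)


def gate_preactivations(x_t, h_prev, W_x, W_h, b, gamma_x, gamma_h):
    # gates = LN_gamma_x(W_x x_t) + LN_gamma_h(W_h h_prev) + b
    return (scale_only_layer_norm(x_t @ W_x.t(), gamma_x)
            + scale_only_layer_norm(h_prev @ W_h.t(), gamma_h)
            + b)
```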
Nice! Having GRU would also be great, but we can probably manage with LSTMs :)
Our LSTM implementation is much further along than the GRU one, so we started with LSTMs first. When we do the GRU updates, we'll keep LayerNorm in mind. Thanks for the feature request!