
Layer normalization #1

Closed
usamec opened this issue Feb 24, 2020 · 5 comments

Comments

@usamec

usamec commented Feb 24, 2020

It would be nice to support some form of layer normalization in the LSTM and GRU layers (for example: https://github.com/pytorch/pytorch/blob/master/benchmarks/fastrnns/custom_lstms.py#L171).

@sharvil
Contributor

sharvil commented Mar 2, 2020

Hmm that's an interesting implementation. They're applying layer norm to c_t in addition to h_t. The supplementary material in Ba et al. (pp. 13–14) only applies layer norm to h_t in both of their LSTM variants.

Do you know if there's any follow-up literature that explains the PyTorch variant?
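For reference, here's a paraphrased sketch of the variant in the linked custom_lstms.py (illustrative PyTorch only, not haste code; the class name, gate ordering, and initialization are placeholders): layer norm is applied to the input projection, the recurrent projection, and the newly computed cell state, so h_t is computed from a normalized c_t.

```python
import torch
import torch.nn as nn

class LayerNormLSTMCellSketch(nn.Module):
    """One LSTM step with layer norm on the input projection, the recurrent
    projection, and the new cell state (the variant discussed above, where
    c_t is normalized in addition to the gate pre-activations)."""

    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.weight_ih = nn.Parameter(torch.randn(4 * hidden_size, input_size))
        self.weight_hh = nn.Parameter(torch.randn(4 * hidden_size, hidden_size))
        self.ln_i = nn.LayerNorm(4 * hidden_size)  # layer norm on W_x x_t
        self.ln_h = nn.LayerNorm(4 * hidden_size)  # layer norm on W_h h_{t-1}
        self.ln_c = nn.LayerNorm(hidden_size)      # layer norm on the new cell state

    def forward(self, x, state):
        h, c = state
        gates = self.ln_i(x @ self.weight_ih.t()) + self.ln_h(h @ self.weight_hh.t())
        i, f, g, o = gates.chunk(4, dim=1)
        c_new = self.ln_c(torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g))
        h_new = torch.sigmoid(o) * torch.tanh(c_new)
        return h_new, (h_new, c_new)
```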

@usamec
Author

usamec commented Mar 3, 2020

@sharvil I do not know of any. I personally think that any variant of GRU/LSTM with LayerNorm would be a great addition.

sharvil added a commit that referenced this issue Mar 4, 2020
This implementation is fairly straightforward. Little effort has
been spent on performance optimization.

Paper: https://arxiv.org/pdf/1607.06450.pdf
Issue: #1
sharvil added a commit that referenced this issue Mar 4, 2020
This change adds a new layer, layer_norm_lstm, that applies
layer normalization to the input of an LSTM cell. In future changes,
this implementation will apply layer normalization to the recurrent
connection and the output as well.

Issue: #1
sharvil added a commit that referenced this issue Mar 4, 2020
sharvil added a commit that referenced this issue Mar 4, 2020
This class, LayerNormLSTMCell, is parameter-compatible with
LayerNormLSTM. It's implemented using the TF Python API so it can
run on CPUs in addition to other accelerators. Another advantage
is that LayerNormLSTMCell is an instance of RNNCell, which means
it's not fused over time and can be used in e.g. autoregressive
models.

Note that LayerNormLSTMCell is not intended for training. In
particular, the kernel / recurrent kernel / bias initializers are
not customizable and the defaults are not very good.

Issue: #1
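To illustrate the per-step interface that makes this useful: an RNNCell is invoked once per timestep rather than over a whole fused sequence. The sketch below uses a stock Keras LSTMCell as a stand-in (haste's exact TF import path isn't given in this thread); LayerNormLSTMCell would be driven the same way.

```python
import tensorflow as tf

# Stand-in cell to show the step-by-step RNNCell-style interface;
# haste's LayerNormLSTMCell is described above as implementing this
# interface, so it could be substituted here.
hidden_size, batch_size, input_size, steps = 64, 8, 32, 5
cell = tf.keras.layers.LSTMCell(hidden_size)

x = tf.zeros([batch_size, input_size])
state = [tf.zeros([batch_size, hidden_size]), tf.zeros([batch_size, hidden_size])]

# Because the cell isn't fused over time, every step is an explicit call.
# In an autoregressive model, the input at step t would be computed from
# the output at step t-1 instead of being a fixed tensor.
for _ in range(steps):
    output, state = cell(x, state)
```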
sharvil added a commit that referenced this issue Mar 4, 2020
In particular, don't provide a bias term (beta) for the input and
recurrent layer norms since there's already a bias term applied
by the usual definition of an LSTM cell.

Also, rename the layer norm scaling term from alpha to gamma to be
consistent with the literature.

Issue: #1
@sharvil
Contributor

sharvil commented Mar 4, 2020

Here's what the haste.LayerNormLSTM implementation looks like:
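(The equations were attached as an image; reconstructing them from eqs. 20–22 of the paper together with the differences listed below, they would read roughly as follows.)

```latex
% Sketch reconstruction, not the original attachment: gamma-only layer
% norms on the input and recurrent projections, a shared bias b, and a
% full layer norm (gamma and beta) on the cell state.
\begin{aligned}
\begin{pmatrix} f_t \\ i_t \\ o_t \\ g_t \end{pmatrix}
  &= \mathrm{LN}(W_h h_{t-1};\, \gamma_1) + \mathrm{LN}(W_x x_t;\, \gamma_2) + b \\
c_t &= \sigma(f_t) \odot c_{t-1} + \sigma(i_t) \odot \tanh(g_t) \\
h_t &= \sigma(o_t) \odot \tanh\bigl(\mathrm{LN}(c_t;\, \gamma_3, \beta_3)\bigr)
\end{aligned}
```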



This implementation is nearly identical to eqs. 20–22 of the layer norm paper. The differences are:

  1. we don't apply a bias term to layer norms on the input or recurrent connection; these parameters are unnecessary since there's already a bias term (... + b) applied by the LSTM
  2. we use γ instead of α to denote the gain parameter (notation change)
  3. we initialize γ to 1 and β to 0 instead of the other way around (seems like a typo in the paper)

I haven't gotten around to updating the docs yet, but haste.LSTM can just be replaced with haste.LayerNormLSTM. Zoneout, DropConnect, etc. are all supported in LayerNormLSTM as well.
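A usage sketch of the drop-in replacement (the module name and constructor argument names below are assumptions, not confirmed in this thread):

```python
import torch
import haste_pytorch as haste  # assumed module name for the PyTorch bindings

# Assumed argument names; the point is that LayerNormLSTM is constructed
# and called the same way haste.LSTM is.
# lstm = haste.LSTM(input_size=128, hidden_size=256, zoneout=0.1, dropout=0.05)
lstm = haste.LayerNormLSTM(input_size=128, hidden_size=256, zoneout=0.1, dropout=0.05)
lstm.cuda()  # haste kernels run on the GPU

x = torch.rand(250, 1, 128).cuda()  # [seq_len, batch, input_size]
y, state = lstm(x)
```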

@sharvil sharvil closed this as completed Mar 9, 2020
@usamec
Author

usamec commented Mar 9, 2020

Nice! Having GRU would also be great, but we can probably manage with LSTMs :)

@sharvil
Contributor

sharvil commented Mar 9, 2020

Our LSTM implementation is much further ahead than the GRU one, so we started with LSTMs first. When we do the GRU updates, we'll keep LayerNorm in mind. Thanks for the feature request!
