
Layer normalization #1

Closed
usamec opened this issue Feb 24, 2020 · 5 comments

Comments

@usamec

usamec commented Feb 24, 2020

It would be nice to support some form of layer normalization in the LSTM and GRU layers (for example: https://github.com/pytorch/pytorch/blob/master/benchmarks/fastrnns/custom_lstms.py#L171).

@sharvil
Contributor

sharvil commented Mar 2, 2020

Hmm that's an interesting implementation. They're applying layer norm to c_t in addition to h_t. The supplementary material in Ba et al. (pp. 13–14) only applies layer norm to h_t in both of their LSTM variants.

Do you know if there's any follow-up literature that explains the PyTorch variant?
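For reference, here's a paraphrased sketch of the variant in the linked custom_lstms.py (illustrative PyTorch only, not haste code; the class name, gate ordering, and initialization are placeholders): layer norm is applied to the input projection, the recurrent projection, and the newly computed cell state, so h_t is computed from a normalized c_t.

```python
import torch
import torch.nn as nn

class LayerNormLSTMCellSketch(nn.Module):
    """One LSTM step with layer norm on the input projection, the recurrent
    projection, and the new cell state (the variant discussed above, where
    c_t is normalized in addition to the gate pre-activations)."""

    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.weight_ih = nn.Parameter(torch.randn(4 * hidden_size, input_size))
        self.weight_hh = nn.Parameter(torch.randn(4 * hidden_size, hidden_size))
        self.ln_i = nn.LayerNorm(4 * hidden_size)  # layer norm on W_x x_t
        self.ln_h = nn.LayerNorm(4 * hidden_size)  # layer norm on W_h h_{t-1}
        self.ln_c = nn.LayerNorm(hidden_size)      # layer norm on the new cell state

    def forward(self, x, state):
        h, c = state
        gates = self.ln_i(x @ self.weight_ih.t()) + self.ln_h(h @ self.weight_hh.t())
        i, f, g, o = gates.chunk(4, dim=1)
        c_new = self.ln_c(torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g))
        h_new = torch.sigmoid(o) * torch.tanh(c_new)
        return h_new, (h_new, c_new)
```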

@usamec
Author

usamec commented Mar 3, 2020

@sharvil I do not know of any. I personally think that any variant of GRU/LSTM with LayerNorm would be a great addition.

sharvil added a commit that referenced this issue Mar 4, 2020
This implementation is fairly straightforward. Little effort has
been spent on performance optimization.

Paper: https://arxiv.org/pdf/1607.06450.pdf
Issue: #1
sharvil added a commit that referenced this issue Mar 4, 2020
This change adds a new layer, layer_norm_lstm, that applies
layer normalization to the input of an LSTM cell. In future changes,
this implementation will apply layer normalization to the recurrent
connection and the output as well.

Issue: #1
sharvil added a commit that referenced this issue Mar 4, 2020
sharvil added a commit that referenced this issue Mar 4, 2020
This class, LayerNormLSTMCell, is parameter-compatible with
LayerNormLSTM. It's implemented using the TF Python API so it can
run on CPUs in addition to other accelerators. Another advantage
is that LayerNormLSTMCell is an instance of RNNCell, which means
it's not fused over time and can be used in e.g. autoregressive
models.

Note that LayerNormLSTMCell is not intended for training. In
particular, the kernel / recurrent kernel / bias initializers are
not customizable and the defaults are not very good.

Issue: #1
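To illustrate the per-step interface that makes this useful: an RNNCell is invoked once per timestep rather than over a whole fused sequence. The sketch below uses a stock Keras LSTMCell as a stand-in (haste's exact TF import path isn't given in this thread); LayerNormLSTMCell would be driven the same way.

```python
import tensorflow as tf

# Stand-in cell to show the step-by-step RNNCell-style interface;
# haste's LayerNormLSTMCell is described above as implementing this
# interface, so it could be substituted here.
hidden_size, batch_size, input_size, steps = 64, 8, 32, 5
cell = tf.keras.layers.LSTMCell(hidden_size)

x = tf.zeros([batch_size, input_size])
state = [tf.zeros([batch_size, hidden_size]), tf.zeros([batch_size, hidden_size])]

# Because the cell isn't fused over time, every step is an explicit call.
# In an autoregressive model, the input at step t would be computed from
# the output at step t-1 instead of being a fixed tensor.
for _ in range(steps):
    output, state = cell(x, state)
```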
sharvil added a commit that referenced this issue Mar 4, 2020
In particular, don't provide a bias term (beta) for the input and
recurrent layer norms since there's already a bias term applied
by the usual definition of an LSTM cell.

Also, rename the layer norm scaling term from alpha to gamma to be
consistent with the literature.

Issue: #1
@sharvil
Contributor

sharvil commented Mar 4, 2020

Here's what the haste.LayerNormLSTM implementation looks like:
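(The equations were attached as an image; reconstructing them from eqs. 20–22 of the paper together with the differences listed below, they would read roughly as follows.)

```latex
% Sketch reconstruction, not the original attachment: gamma-only layer
% norms on the input and recurrent projections, a shared bias b, and a
% full layer norm (gamma and beta) on the cell state.
\begin{aligned}
\begin{pmatrix} f_t \\ i_t \\ o_t \\ g_t \end{pmatrix}
  &= \mathrm{LN}(W_h h_{t-1};\, \gamma_1) + \mathrm{LN}(W_x x_t;\, \gamma_2) + b \\
c_t &= \sigma(f_t) \odot c_{t-1} + \sigma(i_t) \odot \tanh(g_t) \\
h_t &= \sigma(o_t) \odot \tanh\bigl(\mathrm{LN}(c_t;\, \gamma_3, \beta_3)\bigr)
\end{aligned}
```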



This implementation is nearly identical to eqs. 20–22 of the layer norm paper. The differences are:

  1. we don't apply a bias term to layer norms on the input or recurrent connection; these parameters are unnecessary since there's already a bias term (... + b) applied by the LSTM
  2. we use γ instead of α to denote the gain parameter (notation change)
  3. we initialize γ to 1 and β to 0 instead of the other way around (seems like a typo in the paper)

I haven't gotten around to updating the docs yet, but haste.LSTM can just be replaced with haste.LayerNormLSTM. Zoneout, DropConnect, etc. are all supported in LayerNormLSTM as well.
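A usage sketch of the drop-in replacement (the module name and constructor argument names below are assumptions, not confirmed in this thread):

```python
import torch
import haste_pytorch as haste  # assumed module name for the PyTorch bindings

# Assumed argument names; the point is that LayerNormLSTM is constructed
# and called the same way haste.LSTM is.
# lstm = haste.LSTM(input_size=128, hidden_size=256, zoneout=0.1, dropout=0.05)
lstm = haste.LayerNormLSTM(input_size=128, hidden_size=256, zoneout=0.1, dropout=0.05)
lstm.cuda()  # haste kernels run on the GPU

x = torch.rand(250, 1, 128).cuda()  # [seq_len, batch, input_size]
y, state = lstm(x)
```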

@sharvil sharvil closed this as completed Mar 9, 2020
@usamec
Author

usamec commented Mar 9, 2020

Nice! Having GRU would also be great, but we can probably manage with LSTMs :)

@sharvil
Contributor

sharvil commented Mar 9, 2020

Our LSTM implementation is much further ahead than the GRU one, so we started with LSTMs first. When we do the GRU updates, we'll keep LayerNorm in mind. Thanks for the feature request!
