Layer Normalization #11
conceptofmind asked this question in Q&A · Answered by Sengxian
Hi all, do you use Layer Normalization with or without bias? From my understanding, PaLM used bias-less pre- and post-layer normalization. Additionally, do you apply DeepNorm to the feedforward, attention, and fused attention-feedforward projection layers, or just to the feedforward and attention layers? Thank you, Enrico
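For context on the first question: a bias-less layer norm keeps the learnable scale but drops the additive shift term. Here is a minimal PyTorch sketch of that variant (illustrative only, assuming a standard last-dimension LayerNorm; not PaLM's or this repo's code):

```python
import torch
import torch.nn as nn

class BiaslessLayerNorm(nn.Module):
    """LayerNorm with a learnable scale (gamma) but no bias (shift) term."""

    def __init__(self, dim: int, eps: float = 1e-5):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))  # scale only, no shift

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        mean = x.mean(dim=-1, keepdim=True)
        var = x.var(dim=-1, unbiased=False, keepdim=True)
        return self.weight * (x - mean) / torch.sqrt(var + self.eps)
```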
Answered by Sengxian on Aug 19, 2022:
Hello @conceptofmind, thank you for your attention! We apply DeepNorm to the ffn, v_proj, and out_proj layers.
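For reference, here is a minimal PyTorch sketch of that split: the DeepNorm initialization gain is applied only to ffn, v_proj, and out_proj, while q_proj and k_proj keep a standard init. The α/β formulas below assume the decoder-only values from the DeepNet paper (Wang et al., 2022) and are not confirmed by this thread; the class name and single-linear ffn are illustrative.

```python
import torch
import torch.nn as nn

class DeepNormBlockSketch(nn.Module):
    """Sketch of the DeepNorm layer split described in the answer.

    alpha/beta follow the decoder-only recipe from the DeepNet paper
    (Wang et al., 2022) -- an assumption, not taken from this thread.
    """

    def __init__(self, d_model: int, n_layers: int):
        super().__init__()
        self.alpha = (2 * n_layers) ** 0.25   # residual scaling factor
        beta = (8 * n_layers) ** -0.25        # initialization gain

        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)
        self.out_proj = nn.Linear(d_model, d_model)
        self.ffn = nn.Linear(d_model, d_model)
        self.norm = nn.LayerNorm(d_model)

        # Only ffn, v_proj, and out_proj receive the beta-scaled init;
        # q_proj and k_proj keep the default initialization.
        for proj in (self.ffn, self.v_proj, self.out_proj):
            nn.init.xavier_normal_(proj.weight, gain=beta)

    def residual(self, x: torch.Tensor, sublayer_out: torch.Tensor) -> torch.Tensor:
        # DeepNorm residual: LayerNorm(alpha * x + sublayer(x))
        return self.norm(self.alpha * x + sublayer_out)
```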