@Ezra521 As you said, the implementation of the formula in the normalization module differs from the formula in the paper. So I have implemented the AdaNorm algorithm as follows:
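import tensorflow as tf

def adanorm(inputs, epsilon=1e-8, scope="adanorm"):
    # Uses the TensorFlow 1.x API (tf.variable_scope, keep_dims).
    with tf.variable_scope(scope):
        # Mean and variance over the last (feature) axis.
        mean, variance = tf.nn.moments(inputs, [-1], keep_dims=True)
        k = 0.1  # written as a float so it does not become 0 under Python 2 integer division
        y = (inputs - mean) / tf.sqrt(variance + epsilon)
        # The paper treats the factor (1 - k * y) as a constant, so no gradient flows through it.
        term = tf.stop_gradient(k * y)
        outputs = (1 - term) * y  # z = C * (1 - k * y) * y, with C = 1
    return outputs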
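For a quick numeric cross-check, here is a minimal plain-Python sketch of the forward pass for a single feature vector. It assumes the paper's formula is z = C * (1 - k * y) * y with y = (x - mean) / std. The name `adanorm_reference` and the default `C = 1.0` are my own choices for illustration. It only computes forward values, so the gradient detachment of the (1 - k * y) factor is not modeled.

```python
import math


def adanorm_reference(x, k=0.1, C=1.0, epsilon=1e-8):
    """Forward-only AdaNorm for one feature vector (hypothetical helper).

    Assumes z = C * (1 - k * y) * y, where y is x normalized to zero
    mean and unit variance over its elements.
    """
    n = len(x)
    mean = sum(x) / n
    # Population variance, matching what tf.nn.moments computes.
    variance = sum((v - mean) ** 2 for v in x) / n
    std = math.sqrt(variance + epsilon)
    y = [(v - mean) / std for v in x]
    # Scale each normalized value by the input-dependent factor C * (1 - k * y).
    return [C * (1.0 - k * yi) * yi for yi in y]


if __name__ == "__main__":
    print(adanorm_reference([1.0, 2.0, 3.0]))
```

Comparing these values against the TensorFlow output for the same vector is an easy way to confirm that an implementation follows the formula rather than a variant of it.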
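I think the implementation of the formula in the normalization module differs somewhat from the formula in your paper. For example, your code has
Could you check this part of the code? It differs somewhat from this formula in the paper. The paper is "Understanding and Improving Layer Normalization".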