Problem with the normalization module formula #3

Open
Ezra521 opened this issue Jun 4, 2022 · 1 comment

Ezra521 commented Jun 4, 2022

I think the implementation of the formula in the normalization module differs somewhat from the formula in your paper. For example, your code has:

        elif self.mode == 'adanorm':
            mean = input.mean(-1, keepdim=True)
            std = input.std(-1, keepdim=True)
            input = input - mean
            mean = input.mean(-1, keepdim=True)
            graNorm = (1 / 10 * (input - mean) / (std + self.eps)).detach()
            input_norm = (input - input * graNorm) / (std + self.eps)
            return input_norm*self.adanorm_scale

Please check this block of code; it does not quite match the formula in the paper, "Understanding and Improving Layer Normalization".

[image: the formula from the paper]
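For anyone comparing the two, here is a minimal plain-Python sketch of the forward pass of the `adanorm` branch quoted above, next to the closed form `C * (1 - k*y) * y` with `y = (x - mean) / std`, which is how I read the AdaNorm transform in the paper. It ignores autograd (`.detach()` only affects gradients), treats `adanorm_scale` as `C`, and uses the sample standard deviation because `torch.Tensor.std` applies Bessel's correction by default. The function names and the sample vector are made up for illustration.

```python
import statistics

def adanorm_repo_forward(x, scale=1.0, k=0.1, eps=1e-8):
    """Forward pass of the quoted 'adanorm' branch for a single vector (no autograd)."""
    mean = statistics.fmean(x)
    std = statistics.stdev(x)             # sample std (n - 1), like torch's default
    centered = [v - mean for v in x]      # input = input - mean
    mean2 = statistics.fmean(centered)    # second mean; ~0 after centering
    gra_norm = [k * (c - mean2) / (std + eps) for c in centered]
    return [scale * (c - c * g) / (std + eps) for c, g in zip(centered, gra_norm)]

def adanorm_closed_form(x, scale=1.0, k=0.1, eps=1e-8):
    """scale * (1 - k*y) * y with y = (x - mean) / (std + eps)."""
    mean = statistics.fmean(x)
    std = statistics.stdev(x)
    y = [(v - mean) / (std + eps) for v in x]
    return [scale * (1 - k * yi) * yi for yi in y]

x = [0.5, -1.2, 3.0, 0.7, -0.4]           # hypothetical input vector
a = adanorm_repo_forward(x, scale=2.0)
b = adanorm_closed_form(x, scale=2.0)
max_gap = max(abs(p - q) for p, q in zip(a, b))
print(max_gap)                            # differs only by floating-point rounding
```

Under these assumptions the two agree up to rounding, because the second `mean` is approximately zero after centering. The remaining differences from the formula as written are the sample (rather than population) standard deviation and `eps` being added to `std`.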

Ezra521 changed the title from "normalization module" to "Problem with the normalization module formula" Jun 4, 2022

alibugra commented Sep 2, 2022

@Ezra521 As you said, the implementation of the formula in the normalization module differs from the formula in the paper, so I implemented the AdaNorm algorithm as follows:

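import tensorflow as tf  # TensorFlow 1.x API (tf.variable_scope, keep_dims)

def adanorm(inputs, epsilon=1e-8, scope="adanorm"):
    """Apply an AdaNorm-style transform over the last axis of `inputs`."""
    with tf.variable_scope(scope):
        # Mean and population variance over the last axis.
        mean, variance = tf.nn.moments(inputs, [-1], keep_dims=True)
        k = 1.0 / 10  # float division on Python 2 as well
        # Standardized input: y = (x - mean) / sqrt(var + eps).
        y = (inputs - mean) / tf.sqrt(variance + epsilon)
        term = k * y
        # Algebraically equal to inputs * (1 - k * y) * y.
        outputs = (inputs - inputs * term) * y

    return outputs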
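One thing worth checking in the snippet above: `(inputs - inputs * term) * y` expands to `inputs * (1 - k*y) * y`, which carries an extra factor of the raw input compared with `C * (1 - k*y) * y`, the form I understand the paper to define. It also lets gradients flow through the scaling term, whereas the PyTorch snippet earlier in this thread calls `.detach()` on it. The plain-Python sketch below compares the two forward expressions on a made-up vector; the function names and `c` are illustrative, not part of either implementation.

```python
import math

def normalize(x, eps=1e-8):
    """y = (x - mean) / sqrt(var + eps) with population variance, as tf.nn.moments computes it."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def posted_outputs(x, k=0.1):
    """Forward expression from the snippet above: (x - x * k*y) * y."""
    y = normalize(x)
    return [(xi - xi * k * yi) * yi for xi, yi in zip(x, y)]

def paper_style_outputs(x, k=0.1, c=1.0):
    """c * (1 - k*y) * y, where c plays the role of the scale hyperparameter."""
    y = normalize(x)
    return [c * (1 - k * yi) * yi for yi in y]

x = [0.5, -1.2, 3.0, 0.7, -0.4]        # hypothetical input vector
shifted = [v + 10.0 for v in x]        # same vector plus a constant offset

# With c = 1, the posted expression is the paper-style output times the raw input.
ratio_gap = max(abs(p - xi * q)
                for p, q, xi in zip(posted_outputs(x), paper_style_outputs(x), x))

# A constant offset leaves y unchanged, so only the posted expression moves.
paper_shift = max(abs(p - q)
                  for p, q in zip(paper_style_outputs(x), paper_style_outputs(shifted)))
posted_shift = max(abs(p - q)
                   for p, q in zip(posted_outputs(x), posted_outputs(shifted)))
print(ratio_gap, paper_shift, posted_shift)
```

If the intent is to follow the paper's form, dropping the extra `inputs` factor and wrapping `term` in `tf.stop_gradient` would bring the TensorFlow version in line with the PyTorch one.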
