
Penalty Term Frobenius Norm Squared #4

Open
Shuailong opened this issue Sep 7, 2017 · 6 comments

@Shuailong

Shuailong commented Sep 7, 2017

def Frobenius(mat):
    size = mat.size()
    if len(size) == 3:  # batched matrix
        ret = (torch.sum(torch.sum((mat ** 2), 1), 2).squeeze() + 1e-10) ** 0.5
        return torch.sum(ret) / size[0]
    else:
        raise Exception('matrix for computing Frobenius norm should be with 3 dims')

In the code above, the Frobenius norm of the matrix is computed as ret and averaged over the batch dimension. However, in the original paper the penalty term is the squared norm. Is this intended, or does it not matter much in practice? Thanks!
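For reference, this is roughly how I read the penalty in the paper, i.e. the squared Frobenius norm of $A A^T - I$ averaged over the batch (the function name and shape convention here are just my own guesses, for illustration):

import torch

def penalty_from_paper(A):
    # A is the (batch, r, n) annotation/attention matrix; the penalty is
    # ||A A^T - I||_F^2 averaged over the batch -- note: no sqrt.
    AAt = torch.bmm(A, A.transpose(1, 2))                      # (batch, r, r)
    I = torch.eye(A.size(1), device=A.device).expand_as(AAt)   # batched identity
    return torch.sum((AAt - I) ** 2) / A.size(0)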

@hantek

hantek commented Sep 7, 2017

The Frobenius norm involves a sqrt() operation, which is not necessary if we are only using it as a term to optimize. The difference is just a matter of speed, I think.

@andreasvc

Another issue I ran into with this code is that the inner sum reduces the tensor to 2 dimensions, so the outer sum is then taken over a dimension 2 that no longer exists. Either the dimensions should be reversed:

ret = (torch.sum(torch.sum((mat ** 2), 2), 1).squeeze() + 1e-10) ** 0.5

or perhaps keepdim=True should be passed to sum?

Also, are you saying the sqrt can be removed as an optimization?
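For what it's worth, a quick sanity check of both workarounds (on a reasonably recent PyTorch; the sizes below are arbitrary):

import torch

mat = torch.randn(4, 3, 5)  # (batch, rows, cols), arbitrary sizes

# workaround 1: sum over dim 2 first, then dim 1
ret_a = (torch.sum(torch.sum(mat ** 2, 2), 1) + 1e-10) ** 0.5

# workaround 2: keep the reduced dimension so dim 2 still exists for the outer sum
ret_b = (torch.sum(torch.sum(mat ** 2, 1, keepdim=True), 2).squeeze() + 1e-10) ** 0.5

assert torch.allclose(ret_a, ret_b)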

@hantek

hantek commented Sep 21, 2017

Sorry for the late reply.

You are right, the above code in the first post should raise a dimension mismatch error.

Yes. I think the sqrt could be removed to reduce computations, but I haven't compared that.

@jx00109

jx00109 commented Oct 27, 2017

In the appendix of your paper you mention a method called batched_dot; could you explain it in more detail?
When you compute the relation ($F_r$), you take an element-wise product of $F_h$ and $F_p$. Why not use $M_h$ and $M_p$ directly? Could you also give the shape of each tensor ($M_h$, $M_p$, $F_h$, $F_p$)?
Looking forward to your reply :)

@hantek

hantek commented Oct 30, 2017

The "batched_dot" is just the batched_dot() function in Theano.

$M_h$ and $M_p$ are of shape (u, r);
$F_h$ and $F_p$ are of shape (h, r), where h is the number of hidden states in the $W_{fp}$ matrix.

Please refer to this part if you want to look into implementation details: https://github.com/hantek/SelfAttentiveSentEmbed/blob/master/util_layers.py#L353-L356

As for why we do not directly multiply $M_h$ and $M_p$: we want the hidden state $F_r$ to represent the relation between the two given sentences. The "gated encoder" part is inspired by a model from vision, https://www.iro.umontreal.ca/~memisevr/pubs/pami_relational.pdf , and corresponds to the "factored gated autoencoder" in that paper. In short, $W_{fh}$ and $W_{fp}$ are the transformations needed for $F_r$ to depend only on the relative relation between the two embeddings.
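If it helps, here is a rough PyTorch rendering of those shapes, just to illustrate the factored gating (the actual implementation is in Theano, in the util_layers.py link above; the concrete sizes below are made up):

import torch

batch, u, r, h = 8, 300, 30, 150          # illustrative sizes only

M_h = torch.randn(batch, u, r)            # hypothesis embedding matrix
M_p = torch.randn(batch, u, r)            # premise embedding matrix
W_fh = torch.randn(h, u)                  # factor projections
W_fp = torch.randn(h, u)

F_h = torch.einsum('hu,bur->bhr', W_fh, M_h)   # (batch, h, r)
F_p = torch.einsum('hu,bur->bhr', W_fp, M_p)   # (batch, h, r)
F_r = F_h * F_p                                # element-wise gating, (batch, h, r)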

@manuelsh

manuelsh commented Nov 18, 2018

May I recommend:

def Frobenius(mat):
    assert len(mat.shape) == 3, 'matrix for computing Frobenius norm should be with 3 dims'
    return torch.sum(torch.sum(torch.sum(mat ** 2, 2), 1) ** 0.5) / mat.shape[0]
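or, keeping the small epsilon from the original snippet and using tuple dims (which needs a reasonably recent PyTorch):

import torch

def frobenius(mat, eps=1e-10):
    # Frobenius norm of each matrix in a (batch, n, m) tensor, averaged over the batch.
    assert mat.dim() == 3, 'matrix for computing Frobenius norm should be with 3 dims'
    return ((mat ** 2).sum(dim=(1, 2)) + eps).sqrt().mean()

On PyTorch >= 1.9, torch.linalg.matrix_norm(mat).mean() should give the same result, minus the epsilon.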
