
Penalty Term Frobenius Norm Squared #4

Open
Shuailong opened this issue Sep 7, 2017 · 6 comments

@Shuailong

Shuailong commented Sep 7, 2017

def Frobenius(mat):
    size = mat.size()
    if len(size) == 3:  # batched matrix
        ret = (torch.sum(torch.sum((mat ** 2), 1), 2).squeeze() + 1e-10) ** 0.5
        return torch.sum(ret) / size[0]
    else:
        raise Exception('matrix for computing Frobenius norm should be with 3 dims')

In the code above, the Frobenius norm of the matrix is computed as ret and averaged over the batch dimension. However, in the original paper the penalty term is the squared norm. Is this intended, or does it not matter much in practice? Thanks!
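For reference, this is roughly how I read the penalty in the paper, i.e. the squared Frobenius norm of $A A^T - I$ averaged over the batch (the function name and shape convention here are just my own guesses, for illustration):

import torch

def penalty_from_paper(A):
    # A is the (batch, r, n) annotation/attention matrix; the penalty is
    # ||A A^T - I||_F^2 averaged over the batch -- note: no sqrt.
    AAt = torch.bmm(A, A.transpose(1, 2))                      # (batch, r, r)
    I = torch.eye(A.size(1), device=A.device).expand_as(AAt)   # batched identity
    return torch.sum((AAt - I) ** 2) / A.size(0)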

@hantek

hantek commented Sep 7, 2017

The Frobenius norm involves a sqrt() operation, which is not necessary if we are only using it as a term to optimize. The difference is just a matter of speed, I think.

@andreasvc

Another issue I ran into with this code is that the inner sum reduces the tensor to 2 dimensions, so the outer sum is then taken over a dimension 2 that no longer exists. Either the dimensions should be reversed:

ret = (torch.sum(torch.sum((mat ** 2), 2), 1).squeeze() + 1e-10) ** 0.5

or perhaps keepdim=True should be passed to sum?

Also, are you saying the sqrt can be removed as an optimization?
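For what it's worth, a quick sanity check of both workarounds (on a reasonably recent PyTorch; the sizes below are arbitrary):

import torch

mat = torch.randn(4, 3, 5)  # (batch, rows, cols), arbitrary sizes

# workaround 1: sum over dim 2 first, then dim 1
ret_a = (torch.sum(torch.sum(mat ** 2, 2), 1) + 1e-10) ** 0.5

# workaround 2: keep the reduced dimension so dim 2 still exists for the outer sum
ret_b = (torch.sum(torch.sum(mat ** 2, 1, keepdim=True), 2).squeeze() + 1e-10) ** 0.5

assert torch.allclose(ret_a, ret_b)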

@hantek

hantek commented Sep 21, 2017

Sorry for the late reply.

You are right, the above code in the first post should raise a dimension mismatch error.

Yes. I think the sqrt could be removed to reduce computations, but I haven't compared that.

@jx00109

jx00109 commented Oct 27, 2017

In the appendix of your paper you mention a method called batched_dot; could you explain it in more detail?
When you compute the relation ($F_r$), you take an element-wise product of $F_h$ and $F_p$. Why not use $M_h$ and $M_p$ directly? Could you also give the shape of each tensor ($M_h$, $M_p$, $F_h$, $F_p$)?
Looking forward to your reply :)

@hantek

hantek commented Oct 30, 2017

The "batched_dot" is just the batched_dot() function in Theano.

$M_h$ and $M_p$ are of shape (u, r);
$F_h$ and $F_p$ are of shape (h, r), where h is the number of hidden states in the $W_{fp}$ matrix.

Please refer to this part if you want to look into implementation details: https://github.com/hantek/SelfAttentiveSentEmbed/blob/master/util_layers.py#L353-L356

As for why we do not directly multiply $M_h$ and $M_p$: we want the hidden state $F_r$ to represent the relation between the two given sentences. The "gated encoder" part is inspired by a model from vision, https://www.iro.umontreal.ca/~memisevr/pubs/pami_relational.pdf , and corresponds to the "factored gated autoencoder" in that paper. In short, $W_{fh}$ and $W_{fp}$ are the transformations needed for $F_r$ to depend only on the relative relation between the two embeddings.
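If it helps, here is a rough PyTorch rendering of those shapes, just to illustrate the factored gating (the actual implementation is in Theano, in the util_layers.py link above; the concrete sizes below are made up):

import torch

batch, u, r, h = 8, 300, 30, 150          # illustrative sizes only

M_h = torch.randn(batch, u, r)            # hypothesis embedding matrix
M_p = torch.randn(batch, u, r)            # premise embedding matrix
W_fh = torch.randn(h, u)                  # factor projections
W_fp = torch.randn(h, u)

F_h = torch.einsum('hu,bur->bhr', W_fh, M_h)   # (batch, h, r)
F_p = torch.einsum('hu,bur->bhr', W_fp, M_p)   # (batch, h, r)
F_r = F_h * F_p                                # element-wise gating, (batch, h, r)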

@manuelsh

manuelsh commented Nov 18, 2018

May I recommend:

def Frobenius(mat):
    assert len(mat.shape) == 3, 'matrix for computing Frobenius norm should be with 3 dims'
    return torch.sum(torch.sum(torch.sum(mat ** 2, 2), 1) ** 0.5) / mat.shape[0]
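or, keeping the small epsilon from the original snippet and using tuple dims (which needs a reasonably recent PyTorch):

import torch

def frobenius(mat, eps=1e-10):
    # Frobenius norm of each matrix in a (batch, n, m) tensor, averaged over the batch.
    assert mat.dim() == 3, 'matrix for computing Frobenius norm should be with 3 dims'
    return ((mat ** 2).sum(dim=(1, 2)) + eps).sqrt().mean()

On PyTorch >= 1.9, torch.linalg.matrix_norm(mat).mean() should give the same result, minus the epsilon.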
