
Loss rises during training. #1

Open
zhiyuanjiang opened this issue Jun 5, 2021 · 5 comments

Comments

@zhiyuanjiang

When I run your model, the loss and accuracy keep increasing, as shown in the picture below.
[screenshot: loss and accuracy curves rising during training]

The model uses gradient descent to optimize, yet the loss increases. I know that in the KL loss the Q distribution is pushed toward the P distribution, and both distributions change during training. So I want to know whether it is normal for the loss and accuracy to increase during training, and why.
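
(For context, a minimal sketch of the DEC-style clustering loss being discussed; variable names here are illustrative and not taken from the repo.)

    import torch
    import torch.nn.functional as F

    def soft_assignment(z, mu, alpha=1.0):
        # Student's t kernel: q[i, j] is the soft assignment of embedding i to cluster center j
        dist_sq = torch.sum((z.unsqueeze(1) - mu.unsqueeze(0)) ** 2, dim=2)
        q = (1.0 + dist_sq / alpha) ** (-(alpha + 1.0) / 2.0)
        return q / q.sum(dim=1, keepdim=True)

    def target_distribution(q):
        # Sharpen Q: square the assignments and normalize by cluster frequency
        weight = q ** 2 / q.sum(dim=0)
        return (weight.t() / weight.sum(dim=1)).t()

    # KL(P || Q); P is recomputed from Q during training, so both sides keep moving,
    # which is why the loss value is not guaranteed to decrease monotonically.
    # q = soft_assignment(z, mu)
    # p = target_distribution(q).detach()
    # kl_loss = F.kl_div(q.log(), p, reduction='batchmean')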

@Tiger101010
Owner

In my opinion, a potential reason is that the reconstruction loss has already converged in the pretrain stage; the joint learning stage then refines the embedding to benefit clustering, which may not be beneficial to the reconstruction objective.

As the authors mentioned in their paper, you can do some visualization to illustrate this process. Here are some embedding visualizations: the first is from the pretrain stage, the second from the joint learning stage.
[embedding visualization: pretrain stage]
[embedding visualization: joint learning stage]
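
(A minimal sketch of how such a visualization can be produced, assuming z is a NumPy array of node embeddings and the labels are predicted cluster assignments; this uses scikit-learn's t-SNE and is not code from the repo.)

    import matplotlib.pyplot as plt
    from sklearn.manifold import TSNE

    def plot_embeddings(z, labels, title):
        # Project the learned embeddings to 2-D and color each node by its cluster
        z_2d = TSNE(n_components=2, init='pca', random_state=0).fit_transform(z)
        plt.scatter(z_2d[:, 0], z_2d[:, 1], c=labels, s=5, cmap='tab10')
        plt.title(title)
        plt.show()

    # plot_embeddings(z_pretrain, y_pred_pretrain, 'pretrain stage')
    # plot_embeddings(z_joint, y_pred_joint, 'joint learning stage')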

@zhiyuanjiang
Author

Thanks for your answer, but the clustering loss also increases.

@zhiyuanjiang
Author

Hi, I still have a problem. In the code below:

        attn_for_self = torch.mm(h, self.a_self)  # (N, 1) score contribution from each node itself
        attn_for_neighs = torch.mm(h, self.a_neighs)  # (N, 1) score contribution from each node as a neighbor
        attn_dense = attn_for_self + torch.transpose(attn_for_neighs, 0, 1)  # neat: (N,1) + (1,N) broadcasts to the full (N,N) score matrix
        attn_dense = torch.mul(attn_dense, M)  # element-wise weighting by M
        attn_dense = self.leakyrelu(attn_dense)  # (N, N)

        zero_vec = -9e15 * torch.ones_like(adj)
        adj = torch.where(adj > 0, attn_dense, zero_vec)  # keep scores only where an edge exists
        attention = F.softmax(adj, dim=1)  # non-edges receive ~0 attention weight

According to the paper, we need to use the generalized neighbors of a node, so in the line adj = torch.where(adj > 0, attn_dense, zero_vec), why use adj instead of attn_dense as the condition? For example:

adj = torch.where(attn_dense > 0, attn_dense, zero_vec)

@Tiger101010
Owner

attn_dense contains scores for all node pairs, but we only need those that actually have edges. You can check the comments in this file of the repo for details.
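
(A toy illustration of that masking on a hypothetical 3-node graph; not code from the repo.)

    import torch
    import torch.nn.functional as F

    # Raw attention scores exist for every node pair ...
    attn_dense = torch.tensor([[0.5, 1.2, -0.3],
                               [0.1, 0.4,  2.0],
                               [1.5, -0.7, 0.9]])
    # ... but only pairs with an edge in adj should compete in the softmax.
    adj = torch.tensor([[1., 1., 0.],
                        [1., 1., 1.],
                        [0., 1., 1.]])

    zero_vec = -9e15 * torch.ones_like(adj)
    masked = torch.where(adj > 0, attn_dense, zero_vec)  # non-edges get a huge negative score
    attention = F.softmax(masked, dim=1)                 # so they receive ~0 attention weight
    print(attention)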

@zhiyuanjiang
Author

Thanks for your answer. The paper proposes to exploit high-order neighbors, but adj only contains one-hop neighbors, while attn_dense covers higher-hop pairs as well. Wouldn't attn_dense be more accurate?
