
Loss rises during training. #1

Open
zhiyuanjiang opened this issue Jun 5, 2021 · 5 comments

Comments

@zhiyuanjiang

When I run your model, the loss and accuracy keep increasing, as shown in the picture below.
[screenshot: loss and accuracy curves rising during training]

The model uses gradient descent to optimize, yet the loss increases. I know that in the KL loss the Q distribution is pushed toward the P distribution, and both distributions change during training. So I want to know whether it is normal for the loss and accuracy to increase during training, and why.
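
(For context, a minimal sketch of the DEC-style clustering loss being discussed; variable names here are illustrative and not taken from the repo.)

    import torch
    import torch.nn.functional as F

    def soft_assignment(z, mu, alpha=1.0):
        # Student's t kernel: q[i, j] is the soft assignment of embedding i to cluster center j
        dist_sq = torch.sum((z.unsqueeze(1) - mu.unsqueeze(0)) ** 2, dim=2)
        q = (1.0 + dist_sq / alpha) ** (-(alpha + 1.0) / 2.0)
        return q / q.sum(dim=1, keepdim=True)

    def target_distribution(q):
        # Sharpen Q: square the assignments and normalize by cluster frequency
        weight = q ** 2 / q.sum(dim=0)
        return (weight.t() / weight.sum(dim=1)).t()

    # KL(P || Q); P is recomputed from Q during training, so both sides keep moving,
    # which is why the loss value is not guaranteed to decrease monotonically.
    # q = soft_assignment(z, mu)
    # p = target_distribution(q).detach()
    # kl_loss = F.kl_div(q.log(), p, reduction='batchmean')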

@Tiger101010
Owner

In my opinion, a potential reason is that the reconstruction loss has already converged in the pretrain stage; the joint learning stage then refines the embedding to benefit clustering, which may not be beneficial to the reconstruction objective.

As the authors mentioned in their paper, you can do some visualization to illustrate this process. Here are some embedding visualizations: the first is from the pretrain stage, the second from the joint learning stage.
[embedding visualization: pretrain stage]
[embedding visualization: joint learning stage]
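
(A minimal sketch of how such a visualization can be produced, assuming z is a NumPy array of node embeddings and the labels are predicted cluster assignments; this uses scikit-learn's t-SNE and is not code from the repo.)

    import matplotlib.pyplot as plt
    from sklearn.manifold import TSNE

    def plot_embeddings(z, labels, title):
        # Project the learned embeddings to 2-D and color each node by its cluster
        z_2d = TSNE(n_components=2, init='pca', random_state=0).fit_transform(z)
        plt.scatter(z_2d[:, 0], z_2d[:, 1], c=labels, s=5, cmap='tab10')
        plt.title(title)
        plt.show()

    # plot_embeddings(z_pretrain, y_pred_pretrain, 'pretrain stage')
    # plot_embeddings(z_joint, y_pred_joint, 'joint learning stage')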

@zhiyuanjiang
Author

Thanks for your answer, but the clustering loss also increases.

@zhiyuanjiang
Author

Hi, I still have a problem. In the code below:

        attn_for_self = torch.mm(h, self.a_self)  # (N, 1) score contribution from each node itself
        attn_for_neighs = torch.mm(h, self.a_neighs)  # (N, 1) score contribution from each node as a neighbor
        attn_dense = attn_for_self + torch.transpose(attn_for_neighs, 0, 1)  # neat: (N,1) + (1,N) broadcasts to the full (N,N) score matrix
        attn_dense = torch.mul(attn_dense, M)  # element-wise weighting by M
        attn_dense = self.leakyrelu(attn_dense)  # (N, N)

        zero_vec = -9e15 * torch.ones_like(adj)
        adj = torch.where(adj > 0, attn_dense, zero_vec)  # keep scores only where an edge exists
        attention = F.softmax(adj, dim=1)  # non-edges receive ~0 attention weight

According to the paper, we need to use the generalized neighbors of a node, so in the line adj = torch.where(adj > 0, attn_dense, zero_vec), why use adj instead of attn_dense as the condition? For example:

adj = torch.where(attn_dense > 0, attn_dense, zero_vec)

@Tiger101010
Owner

attn_dense contains scores for all node pairs, but we only need those that actually have edges. You can check the comments in this file of the repo for details.
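
(A toy illustration of that masking on a hypothetical 3-node graph; not code from the repo.)

    import torch
    import torch.nn.functional as F

    # Raw attention scores exist for every node pair ...
    attn_dense = torch.tensor([[0.5, 1.2, -0.3],
                               [0.1, 0.4,  2.0],
                               [1.5, -0.7, 0.9]])
    # ... but only pairs with an edge in adj should compete in the softmax.
    adj = torch.tensor([[1., 1., 0.],
                        [1., 1., 1.],
                        [0., 1., 1.]])

    zero_vec = -9e15 * torch.ones_like(adj)
    masked = torch.where(adj > 0, attn_dense, zero_vec)  # non-edges get a huge negative score
    attention = F.softmax(masked, dim=1)                 # so they receive ~0 attention weight
    print(attention)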

@zhiyuanjiang
Author

Thanks for your answer. The paper proposes to exploit high-order neighbors, but adj only contains one-hop neighbors, while attn_dense covers higher-hop pairs as well. Wouldn't attn_dense be more accurate?
