Why does prior distribution have no encoder loss? #6
Comments
I just found out that someone asked the same question earlier.
Yeah, this is why this is such a good technique. Unlike a GAN, it's not a minimax optimization: gradients are propagated directly through the loss function to the encoder network, and the two are optimized jointly. The experimental setup is based on the idea that the mutual information between an image and a randomly selected image should be zero. This forces the encoder to learn a latent space where encodings that share mutual information are close in distance, while those that don't are farther apart.
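The joint optimization described above can be sketched roughly as follows (a minimal illustration, not the repo's actual code: the network shapes and the pos/neg surrogate loss are assumptions made for brevity):

```python
import torch
import torch.nn as nn

# Illustrative stand-ins (not the repo's actual classes) for the encoder
# and the mutual-information discriminator.
encoder = nn.Linear(8, 4)
discriminator = nn.Linear(12, 1)  # scores a concatenated (input, code) pair

# One optimizer over BOTH networks: no minimax game, a single objective.
opt = torch.optim.Adam(
    list(encoder.parameters()) + list(discriminator.parameters()), lr=1e-3
)

x = torch.randn(16, 8)              # a batch of "images"
x_mismatch = x[torch.randperm(16)]  # randomly paired images: MI should be ~0

# Positive pairs: an input with its own encoding.
# Negative pairs: an input with a random image's encoding.
pos = discriminator(torch.cat([x, encoder(x)], dim=1)).mean()
neg = discriminator(torch.cat([x, encoder(x_mismatch)], dim=1)).mean()

# Simplified surrogate for the MI bound: push positive-pair scores up and
# negative-pair scores down. Gradients flow through the discriminator into
# the encoder in one backward pass, and both are updated together.
loss = -(pos - neg)
opt.zero_grad()
loss.backward()
opt.step()
```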
Excuse me, I ran into a problem when using the mutual information loss: its value is negative at the beginning of training. Is this normal?
Hi @DuaneNielsen, have you had a chance to look at @HaopengZhang96's question? Is it correct that the prior distribution does not need an encoder loss?
Not normal. In my runs the mutual information loss is always positive.
I read the code for the original paper, and I think I am right: the encoder and discriminator losses should be separated, like in a GAN.
So, did you reimplement the code in this repo or use the official code? How did it work?
I followed DIM's approach and did some work on user behavior modeling. In my experiments, the local mutual information performs well for sequence modeling when the downstream task is classification. Actually, the prior loss is not important if you only care about downstream task performance; it plays a regularizing role to some extent. My paper is being submitted and I haven't sorted out the relevant code yet.
Looking forward to your work!
Just to put a pin in this one: I think the answer is quite clear from the paper. All three terms are added; there is no double backward pass in InfoMax. As for the loss becoming negative: this can happen because the PyTorch f-divergences can return "probabilities" greater than 1.0, due to the way f-divergences are calculated in practice. See the explanation of the formula in pytorch/pytorch#7637 for how log_prob can return a value greater than zero.
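The point about log_prob is easy to verify by hand: a continuous density can exceed 1, so its log can be positive. For example, a Normal distribution with a small standard deviation (σ = 0.1, an arbitrary choice for illustration):

```python
import math

sigma = 0.1
# Density of N(0, sigma^2) at its mean: 1 / (sigma * sqrt(2 * pi)).
peak_density = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
log_prob = math.log(peak_density)

print(peak_density)  # ≈ 3.989: a density, not a probability, so > 1 is fine
print(log_prob)      # ≈ 1.384: a positive "log probability"
```

So a loss built from log-densities can legitimately go negative without anything being broken.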
I have changed the encoder loss as you said, but at the beginning the encoder loss is -9, and then from epoch 40 to epoch 133 it stays around -34. Could you share how you changed it? What am I doing wrong?
encoder = Encoder().to(device)
Do you know how this code should be adjusted to reach the 0.7 reported in his paper?
Regarding the following code: "-(term_a + term_b)" is the loss of the discriminator, and "term_b" is the loss of the encoder (similar to the generator of a GAN). In the code you only call backward on the discriminator's loss (the prior-distribution part), and there is no backward pass for the part of the prior loss that belongs to the encoder. I think it could be the following process. Is my understanding wrong?
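A GAN-style split of the prior term, as proposed above, might look like the following sketch (the network shapes are assumptions; the uniform prior and the term_a/term_b structure follow the discussion, with the non-saturating form -log D(E(x)) used for the encoder step instead of minimizing term_b directly):

```python
import torch
import torch.nn as nn

# Illustrative networks: an encoder and a discriminator on the latent space.
encoder = nn.Linear(8, 4)
prior_d = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 1), nn.Sigmoid())

# Separate optimizers, as in a GAN.
opt_enc = torch.optim.Adam(encoder.parameters(), lr=1e-4)
opt_d = torch.optim.Adam(prior_d.parameters(), lr=1e-4)

x = torch.randn(16, 8)
y = encoder(x)
prior = torch.rand_like(y)  # target prior: uniform on [0, 1)
eps = 1e-6                  # numerical guard against log(0)

# Discriminator step: minimize -(term_a + term_b).
# y is detached so this step does not update the encoder.
term_a = torch.log(prior_d(prior) + eps).mean()
term_b = torch.log(1.0 - prior_d(y.detach()) + eps).mean()
d_loss = -(term_a + term_b)
opt_d.zero_grad()
d_loss.backward()
opt_d.step()

# Encoder step: fool the discriminator (generator-style, non-saturating form).
e_loss = -torch.log(prior_d(encoder(x)) + eps).mean()
opt_enc.zero_grad()
e_loss.backward()
opt_enc.step()
```

Note this is the two-optimizer variant being debated in the thread; the repo (and, per the comment above, the paper) instead adds all terms into one loss with a single backward pass.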