Why does the prior distribution have no encoder loss? #6

HaopengZhang96 · 2019-10-21T14:32:48Z

Consider the following code:

term_a = torch.log(self.prior_d(prior)).mean()
term_b = torch.log(1.0 - self.prior_d(y)).mean()
PRIOR = - (term_a + term_b) * self.gamma

Here "-(term_a + term_b)" is the discriminator's loss, and "term_b" is the encoder's loss (analogous to the generator's loss in a GAN).

In the code you only backpropagate the discriminator's loss for the prior term; the encoder's own loss for the prior distribution is never backpropagated.

loss.backward()  # loss = global + local + prior, where prior = -(term_a + term_b)
optim.step()
loss_optim.step()

I think the process should be the following:

term_a = torch.log(self.prior_d(prior)).mean()
term_b = torch.log(1.0 - self.prior_d(y.detach())).mean()  # y should be detached
PRIOR = - (term_a + term_b) * self.gamma
encoder_loss_for_p = term_b
.............

loss.backward()  # loss = global + local + prior, where prior = -(term_a + term_b)
optim.step()   # applies the gradient from global + local, but not from prior
loss_optim.step()

encoder_loss_for_p.backward()   # optimize the encoder adversarially
optim.step()

Is my understanding wrong?
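
One way to check where the gradient of `term_b` goes is to backpropagate it alone through a toy model. The sketch below is only an illustration: `encoder` and `prior_d` are hypothetical single-layer stand-ins, not this repository's networks, and the input is random. It compares the gradient norm that reaches the encoder when `y` is passed as-is with the norm when `y.detach()` is used.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-ins for the encoder and the prior discriminator.
encoder = nn.Linear(8, 4)
prior_d = nn.Sequential(nn.Linear(4, 1), nn.Sigmoid())

x = torch.randn(16, 8)


def encoder_grad_norm(detach_y):
    """Backpropagate term_b alone and return the gradient norm at the encoder weights."""
    encoder.zero_grad()
    prior_d.zero_grad()
    y = encoder(x)
    d_in = y.detach() if detach_y else y
    term_b = torch.log(1.0 - prior_d(d_in)).mean()
    term_b.backward()
    grad = encoder.weight.grad
    # Depending on the PyTorch version, a cleared gradient is None or all zeros.
    return 0.0 if grad is None else grad.norm().item()


joint_norm = encoder_grad_norm(detach_y=False)
detached_norm = encoder_grad_norm(detach_y=True)
print("encoder grad norm, y as-is:   ", joint_norm)
print("encoder grad norm, y detached:", detached_norm)
```

Under these assumptions, the detached variant leaves the encoder without any gradient from `term_b`, so a separate `encoder_loss_for_p.backward()` would only reach the encoder if that term were computed on a non-detached `y`.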


HaopengZhang96 · 2019-10-21T15:13:55Z

I just found out that someone asked the same question earlier.

DuaneNielsen · 2019-12-17T01:37:33Z

Yeah, this is why this is such a good technique. Unlike a GAN, it's not a minimax optimization.

Gradients are propagated directly through the loss function to the encoder network and they are optimized jointly.

The experimental setup is based on the idea that the mutual information between an image and a randomly selected image should be zero.

This forces the encoder to learn a latent space where the encodings that share mutual information are close in distance, but those that don't are farther away.
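
In the training loop posted later in this thread, the "randomly selected image" is taken from the same batch by rotating the feature maps one position (`M_prime`). A plain-Python sketch of that pairing, with strings standing in for tensors and `rotate_batch` as a hypothetical helper name:

```python
def rotate_batch(items):
    """Move the first element to the end, so index i now holds item i + 1."""
    return items[1:] + items[:1]


batch = ["img0", "img1", "img2", "img3"]
negatives = rotate_batch(batch)

# Each encoding is paired with the feature map of a different image in the batch.
pairs = list(zip(batch, negatives))
for anchor, other in pairs:
    print(anchor, "<->", other)
```

This only works as a source of mismatched pairs when the batch holds at least two distinct images.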

tianlili1 · 2020-07-28T02:45:36Z

Excuse me, I ran into a problem when using the mutual information loss: its value is negative at the beginning. Is this normal?

SchafferZhang · 2020-10-15T07:31:26Z

Hi, @DuaneNielsen, have you had a chance to look at @HaopengZhang96's question? Is it right that the prior distribution does not need an encoder loss?

HaopengZhang96 · 2020-10-15T07:36:35Z

> Excuse me, I ran into a problem when using the mutual information loss: its value is negative at the beginning. Is this normal?

That is not normal. When I use it, the mutual information is always positive.
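
For the softplus form that appears in the `forward` method posted later in this thread (`Ej = -softplus(-T(pos))`, `Em = softplus(T(neg))`, loss `Em - Ej`), the value is a sum of two softplus averages and so cannot go below zero. The sketch below checks this numerically in plain Python; the random scores are made up and `jsd_loss` is a hypothetical helper name. If a loss of this form goes negative, a different estimator or a flipped sign would be the first thing to look for.

```python
import math
import random


def softplus(z):
    """Numerically stable log(1 + exp(z)); always >= 0."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def jsd_loss(pos_scores, neg_scores):
    """Em - Ej with Ej = -mean(softplus(-T(pos))) and Em = mean(softplus(T(neg)))."""
    ej = -sum(softplus(-t) for t in pos_scores) / len(pos_scores)
    em = sum(softplus(t) for t in neg_scores) / len(neg_scores)
    return em - ej


random.seed(0)
losses = []
for _ in range(1000):
    pos = [random.uniform(-20.0, 20.0) for _ in range(8)]
    neg = [random.uniform(-20.0, 20.0) for _ in range(8)]
    losses.append(jsd_loss(pos, neg))

print("smallest loss over random scores:", min(losses))
```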

HaopengZhang96 · 2020-10-15T07:41:06Z

> Hi, @DuaneNielsen, have you had a chance to look at @HaopengZhang96's question? Is it right that the prior distribution does not need an encoder loss?

I read the code for the original paper, and I think I am right. The encoder and discriminator losses should be separated, as in a GAN.

SchafferZhang · 2020-10-15T07:46:17Z

> I read the code for the original paper, and I think I am right. The encoder and discriminator losses should be separated, as in a GAN.

So, did you reimplement the code in this repo, or did you use the official code? How well does it work?

HaopengZhang96 · 2020-10-15T07:57:02Z

> So, did you reimplement the code in this repo, or did you use the official code? How well does it work?

I followed the DIM approach and applied it to user behavior modeling. In my experiments, the local mutual information objective performs well on sequence modeling when the downstream task is classification. Actually, the prior loss is not important if you only care about downstream task performance; to some extent it acts as a kind of regularization.

My paper is under submission, and I haven't cleaned up the relevant code yet.

SchafferZhang · 2020-10-15T08:18:02Z

> My paper is under submission, and I haven't cleaned up the relevant code yet.

Looking forward to your work!

DuaneNielsen · 2020-10-15T17:49:25Z

Just to put a pin in this one. I think the answer is quite clear from the paper.

All three terms are added. There is no double backward pass in Infomax.

As for the loss becoming negative: this can happen because the PyTorch f-divergences can return "probabilities" greater than 1.0, which is due to the way f-divergences are calculated in practice. See the explanation of the formula in pytorch/pytorch#7637 for how log_prob can return a value greater than one.
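
A small plain-Python illustration of the general point, not of this repository's code: a continuous density is not a probability and can exceed 1.0, in which case its log is positive. The function `normal_log_density` is a hypothetical helper written out from the Gaussian formula.

```python
import math


def normal_log_density(x, mean, std):
    """Log of the Gaussian density N(x; mean, std**2)."""
    return (
        -0.5 * ((x - mean) / std) ** 2
        - math.log(std)
        - 0.5 * math.log(2.0 * math.pi)
    )


wide = normal_log_density(0.0, 0.0, 1.0)    # density below 1.0, log is negative
narrow = normal_log_density(0.0, 0.0, 0.1)  # density above 1.0, log is positive

print("std=1.0: log density", wide, "density", math.exp(wide))
print("std=0.1: log density", narrow, "density", math.exp(narrow))
```

A loss built from negated log-densities of this kind can therefore dip below zero without anything being numerically wrong.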

yuu-Wang · 2025-01-15T14:08:43Z

I changed the encoder loss as suggested above, but the encoder loss starts at about -9, and from epoch 40 to epoch 133 it is about -34. Could you tell me how you changed your code? What am I doing wrong? Here is my code:
def forward(self, y, M, M_prime):

    # see appendix 1A of https://arxiv.org/pdf/1808.06670.pdf

    y_exp = y.unsqueeze(-1).unsqueeze(-1)
    y_exp = y_exp.expand(-1, -1, 26, 26)

    y_M = torch.cat((M, y_exp), dim=1)
    y_M_prime = torch.cat((M_prime, y_exp), dim=1)

    Ej = -F.softplus(-self.local_d(y_M)).mean()
    Em = F.softplus(self.local_d(y_M_prime)).mean()
    LOCAL = (Em - Ej) * self.beta

    Ej = -F.softplus(-self.global_d(y, M)).mean()
    Em = F.softplus(self.global_d(y, M_prime)).mean()
    GLOBAL = (Em - Ej) * self.alpha

    prior = torch.rand_like(y)

    epsilon = 1e-8
    # term_a = torch.log(self.prior_d(prior)).mean()
    # term_b = torch.log(1.0 - self.prior_d(y)).mean()
    term_a = torch.log(self.prior_d(prior) + epsilon).mean()
    term_b = torch.log(1.0 - self.prior_d(y) + epsilon).mean()
    PRIOR = - (term_a + term_b) * self.gamma

    discriminator_loss = LOCAL + GLOBAL + PRIOR
    encoder_loss = torch.log(self.prior_d(y.detach()) ).mean()

    return discriminator_loss, encoder_loss

encoder = Encoder().to(device)
loss_fn = DeepInfoMaxLoss().to(device)
encoder_optim = Adam(encoder.parameters(), lr=1e-4)
discriminator_optim = Adam(loss_fn.parameters(), lr=1e-4)

epoch_restart = 0
root = Path('/root/桌面/code/wangxy/DIM1/models/encoder/run3')

# if epoch_restart is not None and root is not None:
#     enc_file = root / Path('encoder' + str(epoch_restart) + '.wgt')
#     loss_file = root / Path('loss' + str(epoch_restart) + '.wgt')
#     encoder.load_state_dict(torch.load(str(enc_file)))
#     loss_fn.load_state_dict(torch.load(str(loss_file)))


for epoch in range(epoch_restart + 1, 1001):
    batch = tqdm(cifar_10_train_l, total=len(cifar_10_train_dt) // batch_size)
    dis_loss = []
    enc_loss = []

    for x, target in batch:
        x = x.to(device)

        y, M = encoder(x)
        # rotate images to create pairs for comparison
        M_prime = torch.cat((M[1:], M[0].unsqueeze(0)), dim=0)

        discriminator_loss, encoder_loss = loss_fn(y, M, M_prime)
        dis_loss.append(discriminator_loss.item())
        enc_loss.append(encoder_loss.item())

        batch.set_description(str(epoch) + ' dis_Loss: ' + str(stats.mean(dis_loss[-20:])))
        batch.set_description(str(epoch) + ' enc_Loss: ' + str(stats.mean(enc_loss[-20:])))

        discriminator_optim.zero_grad()
        discriminator_loss.backward()
        discriminator_optim.step()

        encoder_optim.zero_grad()
        encoder_loss.backward()
        encoder_optim.step()
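
A note on the numbers reported above, offered as a reading of the posted code rather than a confirmed diagnosis. `encoder_loss` is the mean of `log(D(y))` with `D` a probability, so it is at most zero by construction. The plain-Python sketch below shows this and converts the reported values back into discriminator outputs; `log_sigmoid` is a hypothetical helper, and treating `prior_d` as ending in a sigmoid is an assumption.

```python
import math


def log_sigmoid(z):
    """Numerically stable log(sigmoid(z)); always <= 0."""
    return min(z, 0.0) - math.log1p(math.exp(-abs(z)))


logits = (-34.0, -9.0, 0.0, 5.0)
values = [log_sigmoid(z) for z in logits]
for z, v in zip(logits, values):
    print("logit", z, "log D", v)

# Geometric-mean discriminator output implied by the reported mean log values.
implied_d_at_minus_9 = math.exp(-9.0)
implied_d_at_minus_34 = math.exp(-34.0)
print("mean log D = -9  ->", implied_d_at_minus_9)
print("mean log D = -34 ->", implied_d_at_minus_34)
```

Two things in the posted loop are also worth checking. Because `encoder_loss` is computed on `y.detach()`, its gradient reaches only `prior_d`, not the encoder. And `encoder_optim.zero_grad()` runs after `discriminator_loss.backward()`, which clears the gradient the encoder received from `discriminator_loss` before `encoder_optim.step()` is called. If both hold, the encoder may not be updated at all while the discriminator keeps driving `D(y)` toward zero.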

yuu-Wang · 2025-01-16T02:48:10Z

> So, did you reimplement the code in this repo, or did you use the official code? How well does it work?

Do you know how this code should be adjusted to reach the 0.7 in his paper?
