Increasing loss #17

fradino · 2023-02-03T11:47:44Z

Hello,
I am trying to train the VAE by following the steps,

but the loss keeps increasing.

ZENGXH · 2023-02-03T20:01:20Z

Hi, this is expected since we 1) increase the KL loss weight from 1e-7 to 0.5 over the course of training, i.e., the magnitude of the KL loss term is increasing, and 2) initialize the VAE as an identity mapping, i.e., it has perfect reconstruction in the early iterations. As the KL weight increases, you will see the reconstruction loss getting higher as well. As a result, the loss curve will keep increasing until the KL weight reaches 0.5.
This is the loss curve from my experiment trained on car using the default hyperparameters:
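A minimal sketch of why the total can rise even when training is healthy. Only the end points 1e-7 and 0.5 come from the comment above; the linear schedule shape, the function names `kl_weight` and `weighted_loss`, and the loss values are assumptions made for illustration, not the repository's actual code.

```python
def kl_weight(step, total_steps, w_min=1e-7, w_max=0.5):
    """Anneal the KL weight from w_min to w_max.

    The linear shape is an assumption for illustration; only the end
    points (1e-7 and 0.5) come from the discussion.
    """
    frac = min(max(step / total_steps, 0.0), 1.0)  # clamp progress to [0, 1]
    return w_min + frac * (w_max - w_min)


def weighted_loss(recon, kl, step, total_steps):
    """Training objective: reconstruction term plus annealed KL term."""
    return recon + kl_weight(step, total_steps) * kl


if __name__ == "__main__":
    # Made-up, constant loss terms: even when neither term changes,
    # the weighted total rises because the KL weight rises.
    recon, kl = 0.01, 100.0
    for step in (0, 2500, 5000, 7500, 10000):
        print(step, weighted_loss(recon, kl, step, 10000))
```

With both terms held fixed, the printed total grows with the step count, which is the same qualitative behavior described above.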

fradino · 2023-02-04T15:40:12Z

Thank you! That's helpful. Could you also show me the loss curve for training the diffusion prior?

fradino · 2023-02-06T06:34:00Z

Also, the loss becomes NaN after step 3332.

ZENGXH · 2023-02-20T04:30:38Z

This is my epoch loss:

Are you using the default config? Could you share some visualizations (target, reconstruction, latent points) from the VAE training and some samples from the prior training?

fradino · 2023-02-20T04:38:59Z

I'm training the VAE with the default config, and I find that

x_0_pred becomes inf after training.
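One way to narrow this kind of problem down is to find the first step at which a logged quantity stops being finite. The sketch below is plain Python over a list of logged scalars; the helper name `first_non_finite` and the sample values are hypothetical and not part of the LION code base.

```python
import math


def first_non_finite(values):
    """Return the index of the first NaN/inf entry, or None if all are finite."""
    for i, v in enumerate(values):
        if not math.isfinite(v):
            return i
    return None


if __name__ == "__main__":
    # Hypothetical per-step values of a logged quantity such as the loss.
    logged = [0.5, 0.7, float("inf"), float("nan")]
    print(first_non_finite(logged))
```

Running the same check on several logged quantities (for example the loss and a summary of x_0_pred) may show which one blows up first.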

ZENGXH · 2023-02-20T04:45:19Z

Thanks for sharing. This looks weird; I haven't seen this before. Could you check whether halving the learning rate fixes the issue?

fradino · 2023-02-20T06:08:25Z

> Thanks for sharing. This looks weird; I haven't seen this before. Could you check whether halving the learning rate fixes the issue?

The only change I made was reducing the batch size from 32 to 16. I will try halving the learning rate. @ZENGXH

yuanzhen2020 · 2023-04-12T13:54:22Z

As you mentioned, the KL loss weight increases as training progresses, and the reconstruction loss increases with it. How should I evaluate the performance of the trained VAE, and is there an indicator I can monitor throughout training? Also, do you have any advice on how to tune the training parameters? @ZENGXH

ZENGXH · 2023-04-12T19:08:43Z

@yuanzhen2020 I usually look at the reconstructed point cloud and the latent points. A well-trained VAE needs to 1) have smooth latent points, i.e., the points should be close to a Gaussian distribution, and 2) maintain good reconstruction (checked through both visualization and the reconstruction EMD and CD metrics); we need to achieve a good trade-off between 1) and 2).
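For reference, a small pure-Python sketch of a Chamfer distance (CD) between a target and a reconstructed point cloud. The convention used here (squared Euclidean distance, averaged in each direction, then summed) is an assumption; the evaluation code in the repository may use a different normalization, so the numbers are not directly comparable to reported metrics.

```python
def chamfer_distance(a, b):
    """Symmetric Chamfer distance between two point sets (lists of xyz tuples).

    Assumed convention: squared Euclidean nearest-neighbor distance,
    averaged over each set, with the two directions summed.
    """
    def sq_dist(p, q):
        return sum((pi - qi) ** 2 for pi, qi in zip(p, q))

    def one_way(src, dst):
        # Mean distance from each point in src to its nearest point in dst.
        return sum(min(sq_dist(p, q) for q in dst) for p in src) / len(src)

    return one_way(a, b) + one_way(b, a)


if __name__ == "__main__":
    target = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    recon = [(0.0, 0.0, 0.0)]
    print(chamfer_distance(target, recon))
```

This brute-force version is quadratic in the number of points, so it is only meant for small sanity checks.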

In general VAE training, another thing that may be helpful is to track the un-weighted KL + reconstruction loss, i.e., the (negative) ELBO value. This value should decrease throughout training. I didn't track it because in LION the KL value is much larger than the reconstruction loss, so it would dominate the ELBO.
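A sketch of the quantity described here, assuming a diagonal Gaussian posterior and a standard normal prior, for which the KL term has the closed form 0.5 * sum(mu^2 + exp(logvar) - 1 - logvar). The function names and the treatment of the reconstruction loss as a negative log-likelihood (up to constants) are assumptions for illustration.

```python
import math


def gaussian_kl(mu, logvar):
    """KL( N(mu, diag(exp(logvar))) || N(0, I) ), summed over latent dimensions."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, logvar))


def negative_elbo(recon_loss, mu, logvar):
    """Un-weighted reconstruction loss plus KL; lower is better."""
    return recon_loss + gaussian_kl(mu, logvar)


if __name__ == "__main__":
    # Hypothetical values for a single sample with a 2-D latent.
    print(negative_elbo(0.25, mu=[1.0, 0.0], logvar=[0.0, 0.0]))
```

Unlike the training loss, this sum uses a KL weight of 1 at every step, so its trend is not affected by the annealing schedule.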

Eventually, we care about the sample quality, so the ultimate way to verify whether a VAE is good enough is to train the prior and compare the sample metrics on it (but this is expensive).

In terms of training parameters, it seems that tuning the dropout ratio and the model size can make some difference in performance.

ZENGXH mentioned this issue Feb 20, 2023

NAN Loss #22

Closed

ZENGXH mentioned this issue Apr 5, 2023

Multi GPU Training Problem #38

Closed

supriya-gdptl mentioned this issue Jun 14, 2023

NaN loss while training stage 1 VAE #47

Open

OswaldoBornemann mentioned this issue Aug 27, 2023

Why the reconstruction is so similar to the ground truth even in the early training stage in VAE? #52

Closed
