
Determine VAE model convergence #18

Open
Anonnoname opened this issue Feb 7, 2023 · 5 comments
Comments

@Anonnoname

Hello! I'd like to ask how I can determine if my VAE model has converged. Which metrics or loss should I look at? When I'm training on the car dataset, as the KL weights increase, the latent points become more noisy, leading to a decrease in reconstruction quality. Is it possible that if I keep training the model, the reconstruction quality will continue to get worse? If so, how can I know when to stop training?

I used the default config, with trainer.epochs set to 800.
[Screenshot: latent points and reconstruction at step 25480]

@Anonnoname
Author

Additionally, how can I determine whether the diffusion model has converged? I noticed that the loss stopped decreasing in the early epochs, but the overall quality of the samples has continued to improve over time.

@fradino

fradino commented Feb 10, 2023

Hello, have you ever encountered a situation where the loss becomes NaN when training the VAE?

@ZENGXH
Collaborator

ZENGXH commented Feb 20, 2023

@Anonnoname For the VAE training, it usually converges after the KL annealing stops. The criterion for a good VAE is that it achieves reasonably good reconstruction performance while the latent points look (slightly) smoother than the input points. In my experiment, the latent points look like this at iteration 144400:
[Screenshot: latent points at iteration 144400]

I feel like your reconstruction is worse than expected, and the latent points are over-smoothed. This is usually caused by a high KL loss weight. Are you using the default config?
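
In case it helps, here is a rough sketch of what I mean by waiting for the annealing to finish and the reconstruction to plateau. The names and values below are illustrative only, not the actual LION training code:

```python
# Illustrative sketch only -- not the actual LION training loop.
import torch

def kl_weight(step, anneal_steps=30000, max_weight=0.5):
    """Linear KL annealing: ramp the weight up, then hold it at max_weight."""
    return max_weight * min(step / anneal_steps, 1.0)

def chamfer_distance(a, b):
    """Symmetric Chamfer distance between point clouds a, b of shape (B, N, 3)."""
    d = torch.cdist(a, b)                                   # (B, N, N) pairwise distances
    return d.min(dim=2).values.mean() + d.min(dim=1).values.mean()

def recon_plateaued(history, window=10, tol=1e-3):
    """True once the held-out reconstruction metric stops improving,
    comparing the mean of the last `window` evals to the window before it."""
    if len(history) < 2 * window:
        return False
    recent = sum(history[-window:]) / window
    earlier = sum(history[-2 * window:-window]) / window
    return earlier - recent < tol
```

Chamfer distance is just one choice of reconstruction metric here; whatever metric your eval script already logs works the same way, as long as you only start reading it after the KL weight has reached its final value.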

@ZENGXH
Collaborator

ZENGXH commented Feb 20, 2023

For the diffusion model, the loss tends to have high variance, so it's hard to judge convergence from the loss alone. I usually 1) evaluate the checkpoint every 1000 epochs and decide from the evaluation metric, and 2) visualize the results. In my experience, LION usually converges at around 10k iterations.
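
As a concrete (hypothetical) example of deciding from the evaluation metric rather than the training loss: score the saved checkpoints offline and stop once the metric has stopped improving for a few evaluations in a row. The numbers below are made up for illustration:

```python
# Illustrative sketch only -- not the actual LION evaluation code.
def pick_converged_checkpoint(scores, patience=3, tol=0.01):
    """scores: list of (iteration, metric) pairs for a metric where lower is better,
    computed offline on generated samples. Returns the iteration at which the metric
    has stopped improving by more than `tol` for `patience` consecutive evaluations."""
    best = float("inf")
    stale = 0
    for it, metric in scores:
        if metric < best - tol:
            best, stale = metric, 0
        else:
            stale += 1
        if stale >= patience:
            return it
    return scores[-1][0] if scores else None

# Hypothetical metric values, evaluated every 1000 iterations:
scores = [(1000, 0.92), (2000, 0.81), (3000, 0.74), (4000, 0.73),
          (5000, 0.72), (6000, 0.72), (7000, 0.72), (8000, 0.72)]
print(pick_converged_checkpoint(scores))  # -> 8000
```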

@ZENGXH
Collaborator

ZENGXH commented Feb 20, 2023

@fradino for the NaN issue, could you start another issue and post your log & config so that I can help with that?
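
In the meantime, a generic PyTorch-style debugging sketch (not LION-specific; the loss names below are placeholders) that often helps locate where the NaN first appears:

```python
# Generic NaN-debugging sketch, not the actual LION code.
import torch

torch.autograd.set_detect_anomaly(True)  # report the op that produced NaN/Inf in backward

def check_finite(name, value):
    """Fail fast as soon as a loss term stops being finite."""
    if not torch.isfinite(value).all():
        raise RuntimeError(f"{name} became non-finite: {value}")

# Inside the training step (recon_loss / kl_loss / kl_weight are placeholders):
#   check_finite("recon_loss", recon_loss)
#   check_finite("kl_loss", kl_loss)
#   loss = recon_loss + kl_weight * kl_loss
#   loss.backward()
#   torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
```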
