
VAE Config Files #27

Open

JosephDiPalma opened this issue Dec 29, 2024 · 3 comments

@JosephDiPalma

I'm interested in fine-tuning a VAE and was wondering which config files you used for the final vq-f4 model in your paper.
I see the existing config files under the autoencoder directory, but they are missing some hyper-parameters, such as the number of training epochs.
Please advise.

@srikarym (Collaborator) commented Dec 30, 2024

The VAE used in our paper is a VQ-f4 VAE (VQGAN). The original LDM repo only supports KL-VAE training, so we used the taming-transformers codebase. We fine-tuned the VAE for 50k iterations on 3 GPUs, with a batch size of 12 per GPU.
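
For reference, here is a sketch of a taming-transformers-style config for a vq-f4 model. Apart from the batch size and step count stated above, the values are illustrative and follow the standard LDM vq-f4 layout, not necessarily our exact settings:

```yaml
# Sketch of a taming-transformers VQGAN config for a vq-f4 model.
# Values follow the standard LDM vq-f4 layout; adjust for your data.
model:
  base_learning_rate: 4.5e-6
  target: taming.models.vqgan.VQModel
  params:
    embed_dim: 3
    n_embed: 8192
    ddconfig:
      double_z: false
      z_channels: 3
      resolution: 256
      in_channels: 3
      out_ch: 3
      ch: 128
      ch_mult: [1, 2, 4]     # three stages -> downsampling factor f = 4
      num_res_blocks: 2
      attn_resolutions: []
      dropout: 0.0
data:
  target: main.DataModuleFromConfig
  params:
    batch_size: 12           # per GPU; 3 GPUs -> effective batch size 36
lightning:
  trainer:
    max_steps: 50000         # 50k fine-tuning iterations
```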

The Diffusers library now supports training a VQ-VAE: https://github.com/huggingface/diffusers/tree/main/examples/vqgan

You might want to track image quality metrics like PSNR / SSIM during training.
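
For example, a minimal sketch using scikit-image, assuming the images and reconstructions are HxWxC float arrays scaled to [0, 1]:

```python
# Minimal sketch: PSNR / SSIM between an image and its VAE reconstruction.
# Assumes HxWxC float arrays in [0, 1]; adapt to your evaluation loop.
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def reconstruction_metrics(original: np.ndarray, reconstruction: np.ndarray):
    psnr = peak_signal_noise_ratio(original, reconstruction, data_range=1.0)
    ssim = structural_similarity(
        original, reconstruction, channel_axis=-1, data_range=1.0
    )
    return psnr, ssim
```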

@JosephDiPalma (Author)

What about the discriminator hyper-parameters and the codebook weight for the LPIPS loss?
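
For context, I mean the fields under the loss config; as I understand the taming-transformers layout, it looks roughly like this (my guess at reasonable defaults, not your actual values):

```yaml
# My guess at the loss section (taming-transformers
# VQLPIPSWithDiscriminator); the values are placeholders.
lossconfig:
  target: taming.modules.losses.vqperceptual.VQLPIPSWithDiscriminator
  params:
    disc_start: 0            # iteration at which the discriminator kicks in
    disc_weight: 0.75        # weight on the adversarial term
    disc_in_channels: 3
    disc_conditional: false
    codebook_weight: 1.0     # weight on the codebook / commitment loss
    perceptual_weight: 1.0   # weight on the LPIPS term
```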

@JosephDiPalma (Author)

On second thought, would it be possible to share the config file you used for this phase?
I have managed to make the code run, but I'm not sure if my results are correct.
They look strange: the total loss is increasing, and I'm not sure whether one of my settings is wrong.

I appreciate all the help!
