KL-8 and Config File Settings #21

Open
20KMJ opened this issue Nov 29, 2024 · 1 comment
20KMJ commented Nov 29, 2024

I hope this message finds you well. I am currently working with the model described in your paper, and I have a question regarding the configuration provided in the accompanying code.

In the paper, you mention using KL-8 for the autoencoder, but I noticed that in the provided config file, the embed_dim for the AutoencoderKL is set to 4. Could you kindly clarify why this discrepancy exists? Is the model actually using KL-4, or is there a different reason for this configuration?

I would greatly appreciate your insights on this matter.

Thank you for your time and help!

@lin-tianyu
Owner

Hi,

I recommend reading the original Stable Diffusion paper (Section 3.1 and the appendix) and Section 3.2 of the SDSeg paper to understand more about KL-8.

In short, the "8" in KL-8 is the downsampling rate: after the autoencoder's compression, an image of resolution $256\times 256$ is downsampled by a factor of 8, resulting in a $32\times 32$ latent.

As for the embed_dim you mentioned, it is the number of channels of the latent representation. It has nothing to do with the downsampling rate of 8.
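
To make the distinction concrete, here is a minimal sketch of the shape arithmetic. The helper names `latent_shape` and `downsample_factor_from_ch_mult` are hypothetical and not part of the repository. The second helper additionally assumes a CompVis latent-diffusion style encoder, where every resolution level except the last halves the spatial size, so a `ch_mult` list with four entries would give a factor of 8; please check this against the actual config rather than taking it as given.

```python
from typing import Sequence, Tuple


def latent_shape(image_size: int, downsample_factor: int, embed_dim: int) -> Tuple[int, int, int]:
    """Return (channels, height, width) of the latent for a square input image.

    downsample_factor controls the spatial size (the "8" in KL-8);
    embed_dim controls only the number of latent channels.
    """
    if image_size % downsample_factor != 0:
        raise ValueError("image_size must be divisible by downsample_factor")
    side = image_size // downsample_factor
    return (embed_dim, side, side)


def downsample_factor_from_ch_mult(ch_mult: Sequence[int]) -> int:
    """Assumption: each level except the last halves the resolution."""
    return 2 ** (len(ch_mult) - 1)


# 256x256 image, downsampling rate 8, embed_dim 4
print(latent_shape(256, 8, 4))  # (4, 32, 32)
```

So with KL-8 and embed_dim 4, the latent has 4 channels at $32\times 32$ resolution; changing embed_dim would change only the first number.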

Hope this helps,
Tianyu
