I hope this message finds you well. I am currently working with the model described in your paper, and I have a question regarding the configuration provided in the accompanying code.
In the paper, you mention using KL-8 for the autoencoder, but I noticed that in the provided config file, the embed_dim for the AutoencoderKL is set to 4. Could you kindly clarify why this discrepancy exists? Is the model actually using KL-4, or is there a different reason for this configuration?
I would greatly appreciate your insights on this matter.
Thank you for your time and help!
I would recommend reading the original Stable Diffusion paper (Section 3.1 and the appendix) and Section 3.2 of the SDSeg paper to learn more about KL-8.
In short, the "8" in KL-8 refers to the downsampling factor: after the autoencoder's compression, an image at resolution $256\times 256$ is downsampled by a factor of 8, resulting in a $32\times 32$ latent.
As for the embed_dim you mentioned, it is the number of channels of the latent representation; it has nothing to do with the downsampling factor of 8.
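To make the distinction concrete, here is a minimal sketch (illustrative only, not code from the repository) that computes the latent shape for a KL-8 autoencoder with `embed_dim = 4`, using the values discussed above:

```python
# Sketch: relation between the downsampling factor f (the "8" in KL-8)
# and embed_dim (the number of latent channels). The values below follow
# the Stable Diffusion / SDSeg setting; variable names are illustrative.

image_size = 256      # input resolution (H = W = 256)
f = 8                 # downsampling factor -> "KL-8"
embed_dim = 4         # number of latent channels, set by embed_dim in the config

latent_size = image_size // f                    # 256 // 8 = 32
latent_shape = (embed_dim, latent_size, latent_size)

print(latent_shape)   # (4, 32, 32): a 4-channel latent at 32x32 resolution
```

So a config with embed_dim set to 4 is still a KL-8 autoencoder; the "4" describes the channel dimension of the latent, while the "8" describes how much the spatial resolution is reduced.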