I hope this message finds you well. I am currently working with the model described in your paper, and I have a question regarding the configuration provided in the accompanying code.
In the paper, you mention using KL-8 for the autoencoder, but I noticed that in the provided config file, the embed_dim for the AutoencoderKL is set to 4. Could you kindly clarify why this discrepancy exists? Is the model actually using KL-4, or is there a different reason for this configuration?
I would greatly appreciate your insights on this matter.
Thank you for your time and help!
I would recommend reading the original Stable Diffusion paper (Section 3.1 and the appendix) and Section 3.2 of the SDSeg paper to learn more about KL-8.
In short, the "8" in KL-8 refers to the downsampling factor: after the autoencoder's compression, an image at resolution $256\times 256$ is downsampled by a factor of 8, resulting in a $32\times 32$ latent.
As for the embed_dim you mentioned, it is the number of channels of the latent representation; it has nothing to do with the downsampling factor of 8.
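To make the distinction concrete, here is a minimal sketch (illustrative only, not code from the repository) that computes the latent shape for a KL-8 autoencoder with `embed_dim = 4`, using the values discussed above:

```python
# Sketch: relation between the downsampling factor f (the "8" in KL-8)
# and embed_dim (the number of latent channels). The values below follow
# the Stable Diffusion / SDSeg setting; variable names are illustrative.

image_size = 256      # input resolution (H = W = 256)
f = 8                 # downsampling factor -> "KL-8"
embed_dim = 4         # number of latent channels, set by embed_dim in the config

latent_size = image_size // f                    # 256 // 8 = 32
latent_shape = (embed_dim, latent_size, latent_size)

print(latent_shape)   # (4, 32, 32): a 4-channel latent at 32x32 resolution
```

So a config with embed_dim set to 4 is still a KL-8 autoencoder; the "4" describes the channel dimension of the latent, while the "8" describes how much the spatial resolution is reduced.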