How to align the DINO features with 512x512 image latents when the patch count differs #26
Comments
They upsample the images before feeding them to DINOv2. It's mentioned in their OpenReview revision file.
Can DINOv2 process images with higher resolution? Is it possible to upsample after extracting features?
Hi, I'm not familiar with DINO. I guess you can just resize the images before using DINO to extract features, as suggested in the code (line 50 in 80ee742).
Also, we may need to interpolate the positional embeddings, as suggested here (line 86 in 80ee742).
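To make the resize suggestion concrete, here is a rough sketch of the arithmetic for picking the DINOv2 input resolution so that both models produce the same token grid. The VAE downsampling factor of 8 and the diffusion patch size of 2 are assumptions, not something stated in this thread; the DINOv2 patch size of 14 follows from the 224x224 -> 16x16 figure in the question. `dino_input_size` is a hypothetical helper name.

```python
def dino_input_size(image_size, vae_factor=8, model_patch=2, dino_patch=14):
    """Return (tokens_per_side, dino_resolution) so both token grids match.

    vae_factor and model_patch are assumed values for a latent diffusion
    transformer; dino_patch=14 is the DINOv2 patch size (224 / 14 = 16).
    """
    latent_side = image_size // vae_factor          # e.g. 512 -> 64
    tokens_per_side = latent_side // model_patch    # e.g. 64 -> 32
    # DINOv2 emits one token per 14x14 patch, so feed it tokens * 14 pixels.
    return tokens_per_side, tokens_per_side * dino_patch


for size in (256, 512):
    side, res = dino_input_size(size)
    print(f"{size}x{size} image: {side}x{side} tokens, resize to {res}x{res} for DINOv2")
```

Under these assumptions a 256x256 image maps to the familiar 224x224 input, and a 512x512 image would need a 448x448 input, which is where interpolating the positional embeddings comes in.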
Hi there,
Great work. However, I am wondering: if the diffusion model is trained on 512x512 or even larger images, how do I align the projected features with the DINO features (224x224 -> 16x16 patches), given that the patch count is different?
So should I downsample the projected features to compute the projection loss?
Best regards.
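For reference, the downsampling option asked about here could look roughly like the sketch below: average-pooling a 32x32 grid of projected tokens down to 16x16 so it lines up with the DINO grid. This is only an illustration of that alternative, not the authors' method (the replies in this thread suggest resizing the DINO input instead). `avg_pool_tokens` is a hypothetical helper, and plain Python lists stand in for tensors.

```python
def avg_pool_tokens(tokens, side, factor):
    """Average-pool a row-major token grid.

    tokens: flat list of side*side feature vectors (lists of floats).
    Returns a flat list of (side // factor) ** 2 pooled feature vectors.
    """
    out_side = side // factor
    dim = len(tokens[0])
    pooled = []
    for r in range(out_side):
        for c in range(out_side):
            acc = [0.0] * dim
            # Sum the factor x factor block of tokens that maps to (r, c).
            for dr in range(factor):
                for dc in range(factor):
                    vec = tokens[(r * factor + dr) * side + (c * factor + dc)]
                    for k in range(dim):
                        acc[k] += vec[k]
            pooled.append([v / (factor * factor) for v in acc])
    return pooled


# Toy 32x32 grid of 1-dimensional features, pooled down to 16x16.
tokens = [[float(i)] for i in range(32 * 32)]
pooled = avg_pool_tokens(tokens, side=32, factor=2)
print(len(pooled))  # 256 tokens, i.e. a 16x16 grid
```

Pooling discards spatial detail from the diffusion features, whereas resizing the DINO input keeps the grids at full resolution; which works better is not settled in this thread.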