
How to align the DINO features with 512x512 image latents when the patch number is different #26

JacobKong opened this issue Nov 26, 2024 · 3 comments

@JacobKong commented Nov 26, 2024

Hi there,

Great work! However, I am wondering: if the diffusion model is trained at 512x512 or an even larger image size, how do you align the projected features with the DINO features (224x224 input -> 16x16 patches), since the patch numbers are different?

Should I downsample the projected features to compute the projection loss?

Best regards.
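To make the mismatch concrete, here is a quick token count, assuming the usual SiT/DiT setup of an 8x VAE downsample and a patch size of 2 on the latents (my assumptions, not stated in this thread):

```python
# Token counts at 512x512, assuming an 8x VAE and a patch size of 2 for the
# diffusion transformer (standard SiT/DiT values; assumption, not from this thread).
image_size = 512
latent_size = image_size // 8            # 64x64 latent
sit_tokens = (latent_size // 2) ** 2     # 32 * 32 = 1024 projected tokens

dino_tokens = (224 // 14) ** 2           # 16 * 16 = 256 DINOv2 patch tokens
print(sit_tokens, dino_tokens)           # 1024 vs. 256 -> the mismatch in question
```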

@dengchcs commented

They upsample the images before feeding them to DINOv2. It's mentioned in their OpenReview revision file.
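For illustration, the upsampling could look like this (the 448 target size, chosen as 14 * 32 so that a ViT-/14 backbone emits a 32x32 patch grid, is my inference from the snippets quoted below, not a confirmed detail of the paper):

```python
import torch
import torch.nn.functional as F

images = torch.randn(4, 3, 512, 512)  # dummy batch at the diffusion resolution
# 448 = 14 * 32, so DINOv2's /14 patching yields a 32x32 grid (1024 tokens),
# matching the diffusion transformer's token count at 512x512.
x_for_dino = F.interpolate(images, size=448, mode='bicubic')
```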

@PanXiebit commented

> They upsample the images before feeding them to DINOv2. It's mentioned in their OpenReview revision file.

Can DINOv2 process images at a higher resolution? And is it possible to upsample after extracting the features instead?

@dengchcs commented

> > They upsample the images before feeding them to DINOv2. It's mentioned in their OpenReview revision file.
>
> Can DINOv2 process images at a higher resolution? And is it possible to upsample after extracting the features instead?

Hi, I'm not familiar with DINO. I guess you can just resize the images before using DINO to extract features, as suggested in the code:

REPA/train.py, line 50 (commit 80ee742):

```python
x = torch.nn.functional.interpolate(x, 224, mode='bicubic')
```
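(For the 256x256 setup this resize to 224 makes the grids match directly: 224 / 14 gives 16x16 = 256 DINOv2 tokens, the same count as a 32x32 latent with patch size 2, under the assumptions in the token count above.)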

Also, we may need to interpolate the positional embeddings as suggested here:

REPA/utils.py, line 86 (commit 80ee742):

```python
encoder.pos_embed.data = timm.layers.pos_embed.resample_abs_pos_embed(
```
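That permalink is truncated; a minimal sketch of the full resampling step might look like this (the `dinov2_vitb14` checkpoint, the torch.hub loading, and the 32x32 target grid for 448x448 inputs are my assumptions, not the verbatim REPA code):

```python
import timm
import torch

# Assumed setup: DINOv2 ViT-B/14 via torch.hub (REPA may load it differently).
encoder = torch.hub.load('facebookresearch/dinov2', 'dinov2_vitb14')

# The pretrained grid is 16x16 (224 / 14); resample to 32x32 for 448x448
# inputs. DINOv2 keeps one class token in front of the patch tokens.
encoder.pos_embed.data = timm.layers.pos_embed.resample_abs_pos_embed(
    encoder.pos_embed.data,
    new_size=[32, 32],
    num_prefix_tokens=1,
)

# Sanity check: upsample a 512x512 batch to 448 and extract patch tokens.
x = torch.nn.functional.interpolate(
    torch.randn(1, 3, 512, 512), 448, mode='bicubic'
)
tokens = encoder.forward_features(x)['x_norm_patchtokens']
print(tokens.shape)  # torch.Size([1, 1024, 768]) -> 32*32 tokens, ViT-B width
```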
