
a bit curious, as claimed in the paper: In contrast, we use the pre-trained Stable Diffusion models, without additional training #29

Open
XiudingCai opened this issue Jul 31, 2023 · 1 comment

Comments

@XiudingCai

Thanks to the authors for their amazing work! But I'm a bit curious: the paper claims, "In contrast, we use the pre-trained Stable Diffusion models, without additional training," yet I noticed that the actual code still involves gradient optimization:

loss = 0.0
for name, module in self.unet.named_modules():
    module_name = type(module).__name__
    if module_name == "CrossAttention" and 'attn2' in name:
        curr = module.attn_probs  # size is (num_channels, s*s, 77)
        ref = d_ref_t2attn[t.item()][name].detach().to(device)
        loss += ((curr - ref) ** 2).sum((1, 2)).mean(0)
loss.backward(retain_graph=False)
opt.step()

@GaParmar
Collaborator

Hi @XiudingCai,

Thank you for your interest in the paper!
We indeed compute the gradients during inference for our cross-attention guidance.
These gradients are not used for updating the model parameters.
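For illustration, here is a minimal sketch of what that separation could look like in PyTorch. It assumes, as the snippet above suggests, that the cross-attention modules expose their maps via an attn_probs attribute and that the optimizer opt is built over the latent x_t rather than unet.parameters(); the function name cross_attention_guidance_step and the exact UNet call signature are hypothetical, not the repository's actual code.

import torch

def cross_attention_guidance_step(unet, x_t, t, text_emb, d_ref_t2attn, lr=0.1):
    # The pre-trained UNet stays frozen: its parameters receive no gradient updates.
    unet.requires_grad_(False)

    # The latent code is the only tensor being optimized.
    x_t = x_t.clone().detach().requires_grad_(True)
    opt = torch.optim.SGD([x_t], lr=lr)

    # Forward pass so the hooked cross-attention modules populate attn_probs.
    _ = unet(x_t, t, encoder_hidden_states=text_emb)

    # Penalize deviation of the current cross-attention maps from the reference maps.
    loss = 0.0
    for name, module in unet.named_modules():
        if type(module).__name__ == "CrossAttention" and "attn2" in name:
            curr = module.attn_probs  # (num_channels, s*s, 77)
            ref = d_ref_t2attn[t.item()][name].detach().to(x_t.device)
            loss = loss + ((curr - ref) ** 2).sum((1, 2)).mean(0)

    opt.zero_grad()
    loss.backward()
    opt.step()  # updates x_t only; the UNet weights are unchanged
    return x_t.detach()

The design point is that the attention-map loss backpropagates into the latent code while the Stable Diffusion weights remain frozen, which is consistent with the paper's claim of no additional training.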
