cross-attention maps are not robust #9

jinxixiang · 2023-04-25T12:42:56Z

Thank you for sharing such an interesting idea.

But I think one drawback of the demo (I could be wrong) is that the cross-attention maps are not robust, so the results can end up corrupted.

jinxixiang · 2023-04-26T03:00:04Z

songweige · 2023-04-26T18:57:41Z

Hi @jinxixiang, thank you for your interest and for trying out our demo!

You are right that the token maps are sometimes not very stable or accurate. We have been working on improving this and have made some progress. Here is an example that changes the hair color.

jinxixiang · 2023-04-27T00:54:22Z

@songweige Woo, this improved result is great! What did you change in the cross-attention map? It seems to be refined.

I think one workaround for this problem is to use Grounding SAM and split the denoising process into two stages. The first stage gets the region mask, and the second stage runs region-based diffusion. We are working on it.
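To make the second stage concrete, here is a minimal sketch of one way a region mask could be applied at each denoising step. It assumes the mask from stage one is already available (the segmenter call is not shown), and it uses plain masked blending of latents, which is an assumption rather than anything confirmed in this thread. The function name `region_based_step` and its arguments are hypothetical.

```python
import numpy as np

def region_based_step(base_latent, region_latent, mask):
    """Blend two latents with a region mask (hypothetical helper).

    base_latent, region_latent: arrays of shape (C, H, W).
    mask: array of shape (H, W) with values in [0, 1]; 1 marks the region.
    Returns region_latent inside the mask and base_latent outside it.
    """
    base_latent = np.asarray(base_latent, dtype=np.float64)
    region_latent = np.asarray(region_latent, dtype=np.float64)
    # Add a channel axis so the mask broadcasts over all channels.
    mask = np.asarray(mask, dtype=np.float64)[None, :, :]
    return mask * region_latent + (1.0 - mask) * base_latent
```

In a two-stage setup, a call like this would run once per denoising step in stage two, with the mask held fixed after stage one.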

songweige · 2023-04-27T02:34:53Z

We made a few changes. One major change was to use self-attention maps to compute the segmentation. Here are more examples. Hopefully we will update the demo by the end of this week.
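As a rough illustration of the idea (not the actual implementation, whose details are not given here), the sketch below clusters pixels by their self-attention rows and then names each cluster using cross-attention. The use of k-means, the farthest-point initialisation, and the names `segment_from_self_attention` and `label_segments` are all assumptions made for this example.

```python
import numpy as np

def segment_from_self_attention(self_attn, num_segments, num_iters=10):
    """Cluster pixels by their self-attention rows with plain k-means.

    self_attn: (N, N) array; row i is pixel i's attention over all N pixels.
    Returns an (N,) array of integer segment labels.
    """
    feats = np.asarray(self_attn, dtype=np.float64)

    def assign(centers):
        # Squared Euclidean distance from every pixel to every centre.
        d = ((feats[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        return d.argmin(axis=1)

    # Farthest-point initialisation keeps the sketch deterministic.
    centers = [feats[0]]
    for _ in range(1, num_segments):
        dists = np.min([((feats - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(feats[np.argmax(dists)])
    centers = np.stack(centers)

    for _ in range(num_iters):
        labels = assign(centers)
        for k in range(num_segments):
            if np.any(labels == k):
                centers[k] = feats[labels == k].mean(axis=0)
    return assign(centers)

def label_segments(labels, cross_attn):
    """Map each segment to the token with the highest mean cross-attention.

    cross_attn: (N, T) array of per-pixel attention over T prompt tokens.
    Returns a dict {segment label: token index}.
    """
    cross_attn = np.asarray(cross_attn, dtype=np.float64)
    return {
        int(k): int(cross_attn[labels == k].mean(axis=0).argmax())
        for k in np.unique(labels)
    }
```

The intent is that self-attention decides where the region boundaries are, while cross-attention is only used to decide which token each region belongs to, so noise in the cross-attention maps is averaged over a whole segment.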

Using SAM is a natural and nice idea. Please do let me know once you have some results or need any help from me.

songweige closed this as completed Aug 11, 2023
