cross-attention maps are not robust #9
Hi @jinxixiang, thank you for your interest and for trying out our demo! You are right that the token maps are sometimes unstable and inaccurate. We have been working on improving this and have made some progress. Here is an example that changes the color of the hair.
@songweige Wow, this improved result is great! What did you change in the cross-attention maps? They seem to be refined. I think one workaround for this problem is to use Grounded-SAM and split the denoising process into two stages: the first stage obtains the region mask, and the second stage performs region-based diffusion. We are working on it.
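A minimal sketch of the second stage of the two-stage idea, assuming the region mask from stage one is already available. It uses masked latent blending (as in blended latent diffusion), which is a swapped-in technique, not necessarily what either participant implemented. The names `blend`, `region_based_denoise`, `denoise_step` and `original_trajectory` are hypothetical, and plain nested lists stand in for latent tensors.

```python
from typing import Callable, List

Grid = List[List[float]]  # stand-in for a latent tensor (hypothetical)


def blend(edited: Grid, original: Grid, mask: Grid) -> Grid:
    """Keep `edited` where mask == 1 and `original` where mask == 0."""
    return [
        [m * e + (1.0 - m) * o for e, o, m in zip(erow, orow, mrow)]
        for erow, orow, mrow in zip(edited, original, mask)
    ]


def region_based_denoise(
    x: Grid,
    original_trajectory: List[Grid],
    mask: Grid,
    denoise_step: Callable[[Grid, int], Grid],
) -> Grid:
    """Stage 2: apply the (hypothetical) edited denoising step, then paste the
    unedited latent for the same timestep back outside the region mask."""
    for t, original in enumerate(original_trajectory):
        x = denoise_step(x, t)
        x = blend(x, original, mask)
    return x


if __name__ == "__main__":
    zeros = [[0.0, 0.0], [0.0, 0.0]]
    mask = [[1, 0], [0, 1]]  # assumed output of stage 1
    step = lambda x, t: [[v + 1.0 for v in row] for row in x]  # toy "denoiser"
    print(region_based_denoise(zeros, [zeros, zeros], mask, step))
    # prints [[2.0, 0.0], [0.0, 2.0]]
```

Only the masked cells accumulate the edit; everything outside the mask is reset to the original trajectory at every step, so an inaccurate attention map can no longer leak edits into the background once a good mask is supplied.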
Thank you for sharing such an interesting idea.
However, one drawback of the demo, though I could be wrong, is that the cross-attention maps are not robust, so the results can be corrupted.
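One common way to make a noisy cross-attention map usable as a region mask is to average several maps (for example across attention heads or timesteps), min-max normalize the result, and threshold it. The sketch below shows that idea under stated assumptions: the function names and the 0.5 threshold are hypothetical choices, not the demo's actual method.

```python
from typing import List

Grid = List[List[float]]  # one cross-attention map for a single token (assumed)


def average_maps(maps: List[Grid]) -> Grid:
    """Element-wise mean of several maps, which damps per-map noise."""
    n = len(maps)
    rows, cols = len(maps[0]), len(maps[0][0])
    return [[sum(m[i][j] for m in maps) / n for j in range(cols)] for i in range(rows)]


def normalize(grid: Grid) -> Grid:
    """Min-max normalize to [0, 1]; a constant map becomes all zeros."""
    lo = min(min(row) for row in grid)
    hi = max(max(row) for row in grid)
    span = hi - lo
    if span == 0:
        return [[0.0 for _ in row] for row in grid]
    return [[(v - lo) / span for v in row] for row in grid]


def attention_to_mask(maps: List[Grid], threshold: float = 0.5) -> List[List[int]]:
    """Average, normalize and threshold maps into a binary region mask."""
    avg = normalize(average_maps(maps))
    return [[1 if v >= threshold else 0 for v in row] for row in avg]


if __name__ == "__main__":
    maps = [
        [[0.9, 0.1], [0.2, 0.8]],
        [[0.7, 0.3], [0.4, 0.6]],
    ]
    print(attention_to_mask(maps))  # prints [[1, 0], [0, 1]]
```

Averaging trades spatial sharpness for stability; a segmentation model, as suggested in the thread, would replace this heuristic entirely.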