Difficulty matching paper results #26

Open
MatthewCWeston opened this issue May 4, 2023 · 0 comments

MatthewCWeston commented May 4, 2023

Hello there - this is a really cool paper. I've been trying to reproduce the paper's result of adding sunglasses to a cat, using both the officially released embeddings and each of the embedding generation methods in the repo. I was able to get a working setup with a custom embedding generation method I implemented myself, but nothing I tried got the existing pipeline to complete the task reliably.

My workflow:

  • Initialize editing pipeline with DDIM scheduler and standard SD1.4 weights
  • Map cat_sd14 to cat-wearing-sunglasses_sd14 (or use a generated pair of embeddings; the edit direction is the mean difference in either case)
  • Run with the default cross-attention guidance (I tried other values; they didn't improve the output)
  • Compare the reconstructed image with the edited image
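
For reference, the mean-difference step above can be sketched roughly as follows. This is a minimal illustration and not the repo's code: plain Python lists stand in for the embedding tensors so it runs without dependencies, and the function names (`mean_embedding`, `edit_direction`, `cosine_similarity`) and toy values are made up for the example. The cosine similarity helper is one assumed way to check how far two edit directions (e.g. a released one and a custom one) differ.

```python
from math import sqrt


def mean_embedding(embeddings):
    """Average a list of equal-length embedding vectors element-wise."""
    if not embeddings:
        raise ValueError("need at least one embedding")
    dim = len(embeddings[0])
    if any(len(e) != dim for e in embeddings):
        raise ValueError("all embeddings must have the same length")
    n = len(embeddings)
    return [sum(e[i] for e in embeddings) / n for i in range(dim)]


def edit_direction(source_embeddings, target_embeddings):
    """Mean of the target embeddings minus mean of the source embeddings."""
    src = mean_embedding(source_embeddings)
    tgt = mean_embedding(target_embeddings)
    return [t - s for s, t in zip(src, tgt)]


def cosine_similarity(a, b):
    """Cosine of the angle between two directions (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy stand-ins (hypothetical values) for caption embeddings of each concept.
cat = [[1.0, 0.0, 2.0], [3.0, 0.0, 2.0]]
cat_sunglasses = [[2.0, 1.0, 2.0], [4.0, 3.0, 2.0]]

direction = edit_direction(cat, cat_sunglasses)
print(direction)
```

With real embeddings, each caption embedding would first need to be flattened (or the same averaging applied per token position) before comparing two directions this way.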

Since I was able to generate embeddings for which the edit works, the setup can't be too far off.

My results with my custom ('special') embeddings. This shows that the workflow above can work, although my method for generating these embeddings is not what the paper describes:

(image attachment)

My results with the released embeddings (I get similar results when generating embeddings with each of the released caption generation methods):

(image attachment)

Did I miss part of the paper? I get similar results from the official demo and the Gradio app, which makes this especially tricky to diagnose. My best guess, given what I've seen, is that the difference comes from prompt engineering for the generated captions.
