Hello there - this is a really cool paper. I've been trying to reproduce the original paper's result of adding sunglasses to a cat, using both the officially released embeddings and each of the embedding generation methods in the repo. While I was able to get a working setup with a special embedding generation method I implemented myself, nothing I tried got the existing pipeline to perform this edit reliably.
My workflow (a rough code sketch follows the list):

1. Initialize the editing pipeline with a DDIM scheduler and standard SD 1.4 weights.
2. Compute the edit direction as the mean difference from `cat_sd14` to `cat-wearing-sunglasses_sd14` (or from a pair of embeddings I generated myself).
3. Run the edit with the default cross-attention guidance weight (I tried other values; none improved the output).
4. Compare the reconstructed and edited images to each other.
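For reference, here is roughly what I'm running. This is a minimal sketch, not my exact script: it assumes the `diffusers` port of the pipeline (`StableDiffusionPix2PixZeroPipeline` with its `source_embeds`/`target_embeds`/`cross_attention_guidance_amount` arguments), and the local `.pt` filenames are hypothetical stand-ins for the released embedding assets.

```python
import torch
from diffusers import DDIMScheduler, StableDiffusionPix2PixZeroPipeline

# Step 1: editing pipeline with a DDIM scheduler on standard SD 1.4 weights.
pipe = StableDiffusionPix2PixZeroPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
).to("cuda")
pipe.scheduler = DDIMScheduler.from_config(pipe.scheduler.config)

# Step 2: released source/target embeddings (filenames are placeholders).
source_embeds = torch.load("cat_sd14.pt").to("cuda", torch.float16)
target_embeds = torch.load("cat-wearing-sunglasses_sd14.pt").to("cuda", torch.float16)

# Steps 3-4: run the edit; the pipeline applies the mean-difference edit
# direction internally and steers it with cross-attention guidance.
edited = pipe(
    "a photo of a cat",
    source_embeds=source_embeds,
    target_embeds=target_embeds,
    num_inference_steps=50,
    cross_attention_guidance_amount=0.15,  # the diffusers example value; I also tried others
).images[0]
edited.save("cat_with_sunglasses.png")
```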
Since I was able to generate embeddings for which this works, my setup can't be too far off.
My results with the 'special' embeddings (proof that the workflow above can work, though my method for generating these embeddings is not the one the paper describes):
My results with the released embeddings (I see similar results with embeddings generated by each of the repo's caption generation methods):
Did I miss part of the paper? I get similar results from the official demo and the Gradio app, which makes this especially tricky to diagnose. My best guess, given what I've seen, is that it comes down to prompt engineering of the generated captions; a sketch of how I understand the direction construction follows.
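For what it's worth, here is how I understand the direction construction: embed a bank of generated captions per concept with SD 1.4's text encoder (CLIP ViT-L/14), mean-pool the token-level embeddings per concept, and take the difference of the means. A minimal sketch, assuming the caption banks already exist (the two lists below are placeholders, not the paper's actual captions):

```python
import torch
from transformers import CLIPTextModel, CLIPTokenizer

# SD 1.4 uses CLIP ViT-L/14 as its text encoder.
tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
text_encoder = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14").eval()

@torch.no_grad()
def mean_embedding(captions):
    """Mean of the token-level (77, 768) CLIP embeddings over a caption bank."""
    tokens = tokenizer(
        captions,
        padding="max_length",
        max_length=tokenizer.model_max_length,
        truncation=True,
        return_tensors="pt",
    )
    embeds = text_encoder(tokens.input_ids).last_hidden_state  # (N, 77, 768)
    return embeds.mean(dim=0)

# Placeholder caption banks; in practice these come from a caption generator,
# which is exactly where I suspect the prompt engineering matters.
source_captions = ["a photo of a cat", "a cat sitting on a sofa"]
target_captions = ["a photo of a cat wearing sunglasses", "a cat with sunglasses sitting on a sofa"]

edit_direction = mean_embedding(target_captions) - mean_embedding(source_captions)
```

If the released `.pt` files store the stacked per-caption embeddings rather than the pooled means, the subtraction above would instead happen after loading them.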