[1.1.202 Inpaint] Improvement: Everything Related to Adobe Firefly Generative Fill #1464
43 comments · 125 replies
-
Been playing around with this a bit. One thing I've noticed is that using another CN unit, supplying the same image with the reference preprocessor, seems to produce more natural-looking results (this could be down to taste, though). I've done this with several other images and the results hold up, but here's the torii example from the OP, using the same simple prompt and parameters. Also, that's a great point about Adobe's method likely involving cascading. Making sure SD has attention on the whole image at a sane resolution the model was trained on, and then upscaling to a more pleasing resolution with a low guided denoise, certainly seems akin to what they're doing. They have the luxury of GigaGAN at their disposal, which is probably more functional than our current best solution of using hires fix / a second img2img pass. Hopefully more people can make use of this. I would assume the average user is not well aware of why doing this in txt2img is advantageous, let alone that you can even do inpainting in the txt2img tab. The img2img tab could certainly use some improvements to better allow for this type of workflow. That being said, I think something like this is going to be much more powerful if maintainers of plugins for various apps utilizing the A1111 API manage to integrate it well -- just as Adobe did for Photoshop.
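For anyone wiring this up through the A1111 API, here is a rough sketch of what the two-unit setup described above might look like as the ControlNet `args` of a `/sdapi/v1/txt2img` request. The field names follow the ControlNet 1.1.2xx web API and may differ in other extension versions; file names and the model string are placeholders.

```python
import base64

def b64(path):
    # Read a file and return it base64-encoded, as the web API expects
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode()

controlnet_args = [
    {   # unit 0: inpaint_only with the hand-drawn mask
        "module": "inpaint_only",
        "model": "control_v11p_sd15_inpaint",  # use the exact name from your model dropdown
        "input_image": b64("torii.png"),
        "mask": b64("torii_mask.png"),
        "control_mode": 2,                     # "ControlNet is more important"
    },
    {   # unit 1: reference_only on the same image, no model required
        "module": "reference_only",
        "model": "None",
        "input_image": b64("torii.png"),
        "control_mode": 0,                     # "Balanced"
    },
]

payload = {
    "prompt": "",  # same simple prompt as in the UI run, or leave empty
    "alwayson_scripts": {"controlnet": {"args": controlnet_args}},
}
```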
-
This works very nicely for outpainting as well. You can take an image, add some transparency to the sides in a photo-editing app, and then mask the transparent area in ControlNet. This is using inpaint_only along with reference_only on the original image, as someone suggested above, to keep the same sort of style. The prompt was just "Japanese Castle". Here is a quick anime example as well:
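If you'd rather script the padding step than use a photo editor, a minimal Pillow sketch of the same idea might look like this (file names and padding width are placeholders); the derived mask marks exactly the transparent strip that should be outpainted.

```python
from PIL import Image

pad = 256  # transparent pixels to add on the left and right

img = Image.open("castle.png").convert("RGBA")
w, h = img.size

# Fully transparent canvas, with the original pasted in the middle
canvas = Image.new("RGBA", (w + 2 * pad, h), (0, 0, 0, 0))
canvas.paste(img, (pad, 0))
canvas.save("castle_padded.png")

# White where the canvas is transparent (the area to outpaint), black elsewhere
alpha = canvas.split()[-1]
mask = alpha.point(lambda a: 255 if a == 0 else 0).convert("L")
mask.save("castle_outpaint_mask.png")
```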
-
Is there an easy way to make inpainting work well with batch, either by reusing the mask or by providing a directory of masks? When I use the batch tab currently, it just ignores the mask I scribbled.
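One workaround is to drive the regular img2img inpaint over the web API and pair each image with a same-named mask yourself. A rough sketch, assuming the standard `/sdapi/v1/img2img` field names; directories and parameters are placeholders to adjust.

```python
import base64, pathlib, requests

URL = "http://127.0.0.1:7860/sdapi/v1/img2img"
images_dir = pathlib.Path("inputs")
masks_dir = pathlib.Path("masks")
out_dir = pathlib.Path("outputs")
out_dir.mkdir(exist_ok=True)

def b64(path):
    return base64.b64encode(path.read_bytes()).decode()

for img_path in sorted(images_dir.glob("*.png")):
    mask_path = masks_dir / img_path.name  # point every image at one file to reuse a single mask
    payload = {
        "init_images": [b64(img_path)],
        "mask": b64(mask_path),
        "denoising_strength": 0.75,
        "inpainting_fill": 1,        # 1 = "original"
        "inpaint_full_res": False,   # inpaint at whole-picture resolution
        "prompt": "",
        "steps": 20,
    }
    r = requests.post(URL, json=payload, timeout=600)
    r.raise_for_status()
    result_b64 = r.json()["images"][0]
    (out_dir / img_path.name).write_bytes(base64.b64decode(result_b64))
```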
-
Does it work in img2img with "Inpaint only masked"? I can't make it work.
-
OK, in img2img with "Only masked" I can't remove a person even with denoising at 1, but when I select "Whole picture" it works, though obviously the resolution is not the original... any remedy?
-
When I try to use the RealisticVision inpainting model with the txt2img ControlNet inpaint method, I keep getting an error saying that the size of tensor a needs to match the size of tensor b. This only happens when I use the inpainting model. When I use the regular RealisticVision model it works fine, except it's not as high quality as it would be with the inpainting model. Does anyone know why this is happening?
-
How do you use inpaint in txt2img? I've painted sections of an image black and used it as the input, but the output still remains black.
-
Is there anything we can learn from UnpromptedControl? The idea is quite similar to what you explain: it can fill in empty space or remove objects. I've tried it in A1111 and it is functional even without any prompt, though the quality is not great.
-
I've been using openOutpaint to do this for a long time. It's released under an MIT license. zero01101/openOutpaint#227 This is actually one of the things I talked about with the researchers at Adobe during the consultation interview about AI tools they had me do, lol.
-
@lllyasviel any advice on how to set the denoising strength? You kept 0.25 in your settings; is there a specific reason for it?
-
I have a collection of images that were generated with SD, but they have watermark- or logo-like text at the bottom. Is it possible to perform a batch process on the folder to find and remove this text, and then use this inpainting method to fill in the area seamlessly?
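If the text always sits in roughly the same spot, one approach is to generate a fixed bottom-strip mask per image and then batch-inpaint with those masks (for example with an API loop like the one sketched earlier in the thread). A hedged Pillow sketch, with the strip height as a guess you would tune per collection:

```python
import pathlib
from PIL import Image, ImageDraw

strip_frac = 0.08  # assume the watermark lives in the bottom 8% of each image

src = pathlib.Path("watermarked")
dst = pathlib.Path("masks")
dst.mkdir(exist_ok=True)

for p in src.glob("*.png"):
    w, h = Image.open(p).size
    mask = Image.new("L", (w, h), 0)  # black = keep
    draw = ImageDraw.Draw(mask)
    # White rectangle over the bottom strip = area to inpaint
    draw.rectangle([0, int(h * (1 - strip_frac)), w, h], fill=255)
    mask.save(dst / p.name)
```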
-
Can't "normal" inpainting models do this out of the box? One just has to use RunwayML's inpainting model and leave the prompt empty. |
-
Had a go with the new update and an empty prompt to compare with the PS Beta Generative Fill for painted content. For ControlNet I used the DeliberateV2 model, LMS sampler @ 50 steps, CFG 7-12.
-
1.1.209: Fixed a bug where images were not distorted in the preview but were distorted in the saved folder.
-
@lllyasviel I'm not sure yet. I feel like this retains the content I am inpainting much better; sometimes I am blown away by how close it gets. I do wish I could also get the detail and clarity of "inpaint only masked" at least half the time, though.
-
Suggestion, and knowing you guys I'll bet you already have something like this in the works... Clearly this new inpainting technique is a very powerful feature, so it would be amazing if it could have the other relevant inpainting settings implemented (adjustable mask blur, inpaint/outpaint selection, upload mask image, etc.). For instance, the Segment Anything extension is such a godsend for normal inpainting, since you can swiftly make complicated mask selections, expand the selection, and use it as an image mask. It would be super amazing to have this for ControlNet inpainting. From my testing it also seems like there is a fixed mask blur, so it takes some imagination to draw the ideal mask without cutting it too short.

My thought: it's time for ControlNet to have a dedicated tab for inpainting, which could be enabled or disabled in the Settings tab (enabled by default?). Then it could have a single toggle for which inpainting preprocessor to use. I mean, is there any use case for multiple inpainting ControlNets to be active? If not, then I think having one dedicated tab would be logical. This could also prune the inpainting preprocessor/model out of the other ControlNet tabs.
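As a stopgap for the fixed mask blur: if you prepare masks outside the UI anyway (for the img2img "Inpaint upload" tab or for API use), you can pre-feather them yourself with a Gaussian blur of whatever radius you like. A tiny Pillow sketch, with the file name and radius as placeholders:

```python
from PIL import Image, ImageFilter

mask = Image.open("mask.png").convert("L")
# Radius acts roughly like an adjustable mask blur
feathered = mask.filter(ImageFilter.GaussianBlur(radius=8))
feathered.save("mask_feathered.png")
```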
-
Is there a size limit on how far outpainting is possible before SD starts to hallucinate? 🤔
-
Can someone help me, please? I have no idea how to install "control_v11p_sd15_inpaint". Nothing seems to work and I can't find any help anywhere :/
-
It works great! But sometimes I see a lot of weird artifacts (glitches) on output images in unmasked areas, with different samplers and settings.
-
1.1.213: Fix weird artifacts (glitches) on output images in unmasked areas |
-
Update: Some users reported that they cannot get good enough results. The reason is that they are skipping steps described in this post and not using the correct resolution. The main point of this post is that using multiple generation passes produces better results than before. The options are:

If you use high-res fix (recommended), the correct setting is a base resolution of about 512, then a scale factor of 2.0 if your target resolution is 1024 (perhaps with R-ESRGAN and 0.25 denoising strength).

If you do not like high-res fix, the correct setting is to use a base resolution of about 512, inpaint, send the result to Extras and use another upscaler to reach your target resolution (a scripted version of this step is sketched below), then use third-party software to blend the mask and original image again, and send it back to img2img to inpaint again with a very low denoising strength. However, this method is complicated and not very flexible.

If you use a base resolution of 1024, the result is usually not very satisfying. This is because Stable Diffusion is trained on a lower resolution.
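For the second option, the "send result to extras" step can also be scripted over the web API. A rough sketch, assuming the standard `/sdapi/v1/extra-single-image` fields and an R-ESRGAN upscaler; file names are placeholders.

```python
import base64, requests

with open("inpainted_512.png", "rb") as f:
    img_b64 = base64.b64encode(f.read()).decode()

payload = {
    "image": img_b64,
    "resize_mode": 0,            # 0 = scale by factor
    "upscaling_resize": 2,       # 512-base -> roughly 1024
    "upscaler_1": "R-ESRGAN 4x+",
}
r = requests.post("http://127.0.0.1:7860/sdapi/v1/extra-single-image", json=payload)
r.raise_for_status()

with open("upscaled_1024.png", "wb") as f:
    f.write(base64.b64decode(r.json()["image"]))
```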
-
I have been playing with it for the past few days. I've changed my mind: this is superior to just inpainting.
-
Hi all, we put some comparisons to inpaint variation models here
-
Hello. I'm having some issues with ControlNet's inpainting. Masking an area often doesn't do anything the prompt told it to, but leaves behind an overlay of the mask on the output. What is the cause of this? I should also mention that this overlay appears when the generation is complete, not during the generation preview.
-
Thank you for this awesome update to ControlNet; the results I'm getting are great. Would it be possible to make it so that, when inpainting is selected in a ControlNet unit, any transparent part of the image gets masked? This behavior would be the same as in the regular inpainting tab. It's more difficult to reproduce results when there doesn't seem to be a way to recreate the same mask; if there is already a way to do so, please let me know. Customizable mask blur, like others have mentioned, might also be quite useful, but the under-the-hood settings do produce seamless results the majority of the time, so I'm not sure it would actually be an improvement.
-
When I use the inpaint_only preprocessor for image restoration, I found that it changes the pixels of my original image. Is this expected? Here are my options: Chinese Garden
-
Hi everyone, more progress here
-
The short story is that the ControlNet WebUI extension completed several Inpaint improvements/features in 1.1.202, making it possible to achieve inpaint effects similar to Adobe Firefly Generative Fill using only open-source models and code.
Adobe Firefly Generative Fill
This weekend someone told me that you do not really need an Adobe Subscription to use Firefly, and the popular Generative Fill can be used from their website (even without an Adobe account)!
After learning about this, I tested Firefly Generative Fill with some test images used during the development of ControlNet. The performance of that model is super impressive, and the technical architecture is more user-friendly than Stable Diffusion toolsets.
Overall, the behaviors of Adobe Firefly Generative Fill are:
For example, we test with this image (1280x1024; note that this is a real photo)
And I put this in Firefly Generative Fill
(Note that I do not input any prompts. The input text area is blank.)
And these are some random non-cherry-picked results (very impressive):
All results are very impressive, and this one is the best result I can get (in my personal opinion)
Is it possible for A1111?
Before ControlNet 1.1.202, the answer is no.
After ControlNet 1.1.202, the answer is somewhat yes.
We all know that most SD models are terrible when we do not input prompts. Making a user-friendly pipeline with prompt-free inpainting (like Firefly) in SD can be difficult.
For example, this is a simple test without prompts:
No prompt
Steps: 20, Sampler: Euler a, CFG scale: 7, Seed: 12345, Size: 1280x1024, Model hash: c0d1994c73, Model: realisticVisionV20_v20, Denoising strength: 1, ENSD: 31337, Mask blur: 4, Version: v1.3.0
(realisticVisionV20, non-cherry-picked, seed 12345)
We can see that it is clearly unusable.
One possible way is to use an inpaint variation model:
(realisticVisionV20_v20-inpainting (6482f11700), non-cherry-picked, seed 12345)
We can see that results are still unusable.
Another method is to use ControlNet 1.1 Inpaint:
(realisticVisionV20+control_v11p_sd15_inpaint, non-cherry-picked, seed 12345)
We can see that the difference is minimal.
One may argue that we do not really need to follow this rule: if we want a system that does not always ask users for prompts, we can secretly generate prompts and automatically feed them to the model without letting users know about it. It is even possible to use a pre-defined negative prompt.
But if you try it, you will soon realize that this is a terrible idea.
For example, the prompt generated by “Interrogate CLIP” for this image is
a red tori, a small, Douglas Robertson Bisset, japan, a digital rendering, cloisonnism
I do not know what “a red tori” is, but it seems related to a house. And I add a general negative prompt:
lowres, bad anatomy, bad hands, text, error, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality, normal quality, jpeg artifacts, signature, watermark, username, blurry
The results become
(realisticVisionV20_v20-inpainting (6482f11700), non-cherry-picked, seed 12345)
We can see that the results become kind of boring and always generate that house-like object.
Improvements in CN 1.1.202
Inspired by this, we completed some features in CN to achieve similar effects in SD.
Allow inpaint in txt2img. This is necessary because txt2img has high-res fix. It is very likely that, if you want to achieve high-quality inpaint similar to Firefly, you need a native multi-stage method. Now you can use the txt2img high-res fix as a cascaded inpaint pipeline. Note that the preprocessor “inpaint_only” does not change the unmasked area.
Allow image-based guidance in inpaint. We know that CN has a control mode that allows you to put ControlNet on the conditional side of the CFG scale; in this way, the image-based guidance can act like prompt-based guidance, since they both use the CFG scale. This facilitates prompt-free inpaint.
For example, my setting is:
No prompt (!). Just like Firefly, this is extremely challenging for an SD method.
(This also shows off ControlNet's image-content understanding capability.)
I prefer DDIM for real-photo inpaint, and using high-res fix is very important (make sure that your base diffusion resolution is around 512). We also recommend using a relatively low CFG scale (<5) with this method.
In ControlNet, make sure to select Control Mode “ControlNet is more important” to put CN on the conditional side of the CFG scale:
After you set up these options, your SD will become a system that behaves similarly to Firefly Generative Fill. You do not need to input any prompts and can just enjoy the high-quality inpaint (with any base model!).
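For reference, here is a hedged sketch of this whole setup as a txt2img API payload (field names follow the A1111 and ControlNet 1.1.2xx web APIs; the file names, exact model string, and upscaler choice are placeholders).

```python
import base64

def b64(path):
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode()

payload = {
    "prompt": "",
    "negative_prompt": "",
    "sampler_name": "DDIM",
    "steps": 20,
    "cfg_scale": 4,                      # keep the CFG scale relatively low (<5)
    "width": 640, "height": 512,         # base diffusion around 512 (matches 1280x1024 aspect)
    "enable_hr": True,                   # high-res fix acts as the cascaded second pass
    "hr_scale": 2.0,
    "hr_upscaler": "R-ESRGAN 4x+",
    "denoising_strength": 0.25,          # low denoise for the second pass
    "alwayson_scripts": {
        "controlnet": {
            "args": [{
                "module": "inpaint_only",
                "model": "control_v11p_sd15_inpaint",
                "input_image": b64("photo.png"),
                "mask": b64("photo_mask.png"),
                "control_mode": 2,        # "ControlNet is more important"
            }]
        }
    },
}
```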
Non-cherry-picked batch, seed 12345, No prompt:
Non-cherry-picked batch, seed 1593190232, No prompt:
Non-cherry-picked batch, seed 2467049182, No prompt:
(To reproduce, you need these model/vae:)
Ending
Note that these results are clearly not the same as Adobe Firefly's, but their behaviors are similar: they (probably) both use a cascaded inpaint pipeline, and they both use image content (not only the prompt) to guide the inpaint.
And of course, you can input prompts with this method. For example, I give a short prompt (and actually Firefly also supports very short prompts)
And the result is
Just enjoy the high-quality inpaint in 1.1.202!
Update (0603):
Some users reported that they cannot get good enough results. The reason is that they are skipping steps described in this post and not using the correct resolution. A key point of this post is that using multiple generation passes produces better results than before.
The options are:
If you use high-res fix (recommended), the correct setting is a base resolution of about 512, then a scale factor of 2.0 if your target resolution is 1024 (perhaps with R-ESRGAN and 0.25 denoising strength).
If you do not like high-res fix, the correct setting is to use a base resolution of about 512, inpaint, send the result to Extras and use another upscaler to reach your target resolution, then use third-party software to blend the mask and original image again (see the sketch after this list), and send it back to img2img to inpaint again with a very low denoising strength. However, this method is complicated and not very flexible.
If you use a base resolution of 1024, the result is usually not very satisfying (as discussed above). This is because Stable Diffusion is trained on a resolution of about 512.
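For the "blend the mask and original image again" step in the second option, any image tool works; as one possibility, here is a minimal Pillow sketch (file names are placeholders, and the blur radius is just a guess to soften the seam).

```python
from PIL import Image, ImageFilter

original = Image.open("original_1024.png").convert("RGB")
inpainted = Image.open("inpainted_upscaled_1024.png").convert("RGB").resize(original.size)
mask = Image.open("mask_1024.png").convert("L").resize(original.size)
mask = mask.filter(ImageFilter.GaussianBlur(radius=4))  # feather the edge a little

# White mask pixels take the inpainted image, black pixels keep the original
blended = Image.composite(inpainted, original, mask)
blended.save("blend_for_final_img2img.png")
```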