ControlNet Pointwise - The pointless ControlNet, a baseline for inpainting #561
geroldmeisinger
started this conversation in
Show and tell
Download from Huggingface!
Based on my musings here #318 (reply in thread)
I have trained a ControlNet (214244a32 drop=0.5 mp=fp16 lr=1e-5) for 1.25 epochs by using a pointwise function to convert RGB to grayscale... which effectively makes it a pointless ControlNet 🤣 I wanted to see how fast it converges on a simple linear transformation. To emphasize again: it doesn't colorize grayscale images, it desaturates color images... which you might as well do in an image editor. It's the most inefficient way to make grayscale images. But it lets us evaluate the model very easily and we can peer into the inner workings of ControlNet a bit. It's also a good baseline for inpainting, assuming 0% masking, and tells us which artefacts to expect in the unmasked area.
I chose drop=0.5 because I assumed the CN should pick up on the "ignore the prompt" task very fast, similar to the desaturation task. It also lets us compare the influence of prompts and keeps it comparable with inpainting. I don't think it would have converged faster without any prompts.
Some interesting findings:
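For anyone who wants to picture the setup, here is a minimal sketch of how one training sample for this kind of task could be built. It is not the actual training code: the Rec. 601 luma weights, the helper name make_training_pair, and the reading of drop=0.5 as "replace the prompt with an empty string half of the time" are all assumptions for illustration.

```python
import random
import numpy as np

# Assumption: Rec. 601 luma weights. The post does not say which weights were used.
LUMA_WEIGHTS = np.array([0.299, 0.587, 0.114], dtype=np.float32)


def desaturate(rgb: np.ndarray) -> np.ndarray:
    """Pointwise RGB -> grayscale for a uint8 image of shape (H, W, 3).

    "Pointwise" means every output pixel depends only on the input pixel
    at the same position, here through a fixed weighted sum of R, G and B.
    """
    gray = rgb.astype(np.float32) @ LUMA_WEIGHTS            # shape (H, W)
    gray = np.clip(np.rint(gray), 0, 255).astype(np.uint8)
    return np.repeat(gray[..., None], 3, axis=-1)            # back to (H, W, 3)


def make_training_pair(rgb, prompt, drop=0.5, rng=random):
    """Hypothetical helper: color image as conditioning, gray image as target.

    Assumption: drop is the probability of replacing the prompt with "".
    """
    if rng.random() < drop:
        prompt = ""
    return {"conditioning": rgb, "target": desaturate(rgb), "prompt": prompt}
```

Because the target is a deterministic function of the conditioning image, the same desaturate call also gives the ground truth for any evaluation image.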
I generated evaluation images for [lenna, dog2, house2, vermeer] with a good prompt, a default image prompt, and no prompt/guess mode at every 3200a32 samples. Here are some interesting results for lenna.
Download ALL evaluation images
Lenna - good prompt
0000 no controlnet
4800 sudden change (all checkpoints prior to this basically looked the same)
6000 sufficiently phenomenal convergence (SPC)
8400 last checkpoint
Lenna - image prompt only
0000 no controlnet
4700 last before sudden change (all checkpoints prior to this basically looked the same)
4800 sudden change
6000 (nothing special, just for comparison with "good prompt")
8400 last checkpoint
Lenna - no prompt
0000 no controlnet
4700 last before sudden change (all checkpoints prior to this basically looked the same)
4800 sudden change
6000 SPC from default ("artistic")
6700 freak checkpoint
8300 last checkpoint ("more realistic", I used 8300 here because 8400 is also an outlier)
I can make graphs too
Because we can generate a ground truth via image processing (simply convert the input image to grayscale), it is possible to calculate a distance metric. Warning: This is not possible with real ControlNets because we inherently don't have a ground truth!
SSIM
FFT low-frequencies
Naive pointwise distance on each channel (this is not robust against noise and pixel translations)
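The three metrics above can be approximated with plain NumPy. This is only a sketch under stated assumptions, not the code behind the graphs: the function names and the radius parameter are hypothetical, and global_ssim is a simplified single-window SSIM over the whole image rather than the usual sliding-window version.

```python
import numpy as np


def pointwise_distance(pred: np.ndarray, truth: np.ndarray) -> np.ndarray:
    """Mean absolute difference per channel for (H, W, C) images.

    Not robust against noise or pixel translations.
    """
    diff = np.abs(pred.astype(np.float64) - truth.astype(np.float64))
    return diff.mean(axis=(0, 1))


def fft_lowfreq_distance(pred: np.ndarray, truth: np.ndarray, radius: int = 8) -> float:
    """Mean spectral difference in a low-frequency window of 2D grayscale images.

    Assumption: both images are (H, W) with H and W larger than 2 * radius.
    """
    def low(img):
        spec = np.fft.fftshift(np.fft.fft2(img.astype(np.float64)))
        cy, cx = spec.shape[0] // 2, spec.shape[1] // 2
        return spec[cy - radius:cy + radius + 1, cx - radius:cx + radius + 1]

    return float(np.mean(np.abs(low(pred) - low(truth))))


def global_ssim(x: np.ndarray, y: np.ndarray, data_range: float = 255.0) -> float:
    """Simplified SSIM using global statistics instead of local windows."""
    x = x.astype(np.float64)
    y = y.astype(np.float64)
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return float(((2 * mx * my + c1) * (2 * cov + c2))
                 / ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2)))
```

Each function takes the generated image and the ground truth (the input image converted to grayscale) as arrays of the same shape. For real measurements, a windowed SSIM from an image library is probably the better choice.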
Why did I stop at 1.25 epochs? Because it was an overnight run and this is the epoch I happened to wake up to.
Let's see how many people actually use it so they can avoid classic image editing at all costs. 🤣