Replies: 3 comments
-
This line converts the pixel values of the image from 0-255 to 0-1 and converts the image to a tensor.
And this line converts the pixel values from 0-1 to the range -1 to +1. The same process is defined here: sd-scripts/library/train_util.py Line 115 in 2a23713 and applied here: sd-scripts/library/train_util.py Line 1135 in 2a23713
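As a minimal sketch of those two steps in plain numpy (illustrative only, not the actual code in train_util.py):

```python
import numpy as np

# Step 1: scale uint8 pixel values from [0, 255] to [0.0, 1.0]
# (this is what transforms.ToTensor() does, along with the HWC -> CHW conversion)
pixels = np.array([0, 128, 255], dtype=np.uint8)
scaled = pixels.astype(np.float32) / 255.0      # -> [0.0, 0.502, 1.0]

# Step 2: map [0.0, 1.0] to [-1.0, +1.0]
# (this is what transforms.Normalize([0.5], [0.5]) does: (x - 0.5) / 0.5)
normalized = (scaled - 0.5) / 0.5               # -> [-1.0, 0.004, 1.0]
print(normalized)
```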
-
Ah yeah, thanks. I tried changing the gamma of images to make them more balanced, but all it does to images with very bright backgrounds is turn the foreground into a silhouette. I think modern image training networks just can't handle very bright images for now, so the best thing to do is remove them from the training set. Or maybe I'll try that new 'mask' code that you're just about to add to the scripts, to mask away the bright backgrounds.
-
I've now been doing more training runs with the following two rules in place:
I'm seeing great results from this. My runs last much longer without overtraining, and the concepts are being picked up in more detail than before.
-
I noticed that a training run of mine was producing brighter and brighter sample images over time, and faces seemed to be losing detail too. Usually I would put that down to overtraining, but I felt it was too early in my training cycle for that.
So I looked at my input images. Many of them were photoshoot-style images, with solid white backgrounds. I deleted those ones, even though they made up half of my training set, leaving me with just 31 images, which is not particularly many. But when I restarted the training run with the same parameters as before, I actually got better results, and the high-glare look was mostly gone.
I suspect a lot of people conclude their model is overtrained when really it's just the average brightness that has drifted because of their input image set.
Looking into the topic of image brightness, I found that Huggingface's own Dreambooth training script automatically normalizes the pixel values of the input images as part of its preprocessing (from https://github.com/huggingface/diffusers/blob/main/examples/dreambooth/train_dreambooth.py).
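For context, the preprocessing there is built from torchvision transforms along these lines (a rough sketch; the exact resolution, crop choice, and arguments may differ between versions of that file):

```python
from torchvision import transforms

resolution = 512  # assumed training resolution for this sketch

# Roughly the per-image preprocessing used by the Dreambooth example script:
image_transforms = transforms.Compose([
    transforms.Resize(resolution, interpolation=transforms.InterpolationMode.BILINEAR),
    transforms.CenterCrop(resolution),
    transforms.ToTensor(),               # [0, 255] uint8 -> [0.0, 1.0] float
    transforms.Normalize([0.5], [0.5]),  # [0.0, 1.0]     -> [-1.0, +1.0]
])
```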
I don't think the Kohya scripts support this? I think it might help people train higher quality models if there was a simple command line option to perform this image operation while generating the latents. What do you think?
Edit: I'm testing out normalizing my input dataset with Python now. The images with bright white backgrounds end up looking very washed out, and there's a lot of color clamping against 0 and 1 happening. This may not be as simple and good an idea as it sounded above.
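Roughly the kind of linear normalization I mean (a simplified sketch; the function name and target value are just for illustration, not my actual script):

```python
import numpy as np
from PIL import Image

def linear_brightness_shift(path, target_mean=0.5):
    """Shift an image so its mean brightness lands on target_mean, then clip.

    Any pixels pushed outside [0, 1] get clamped: dark detail is crushed to
    black when the shift is negative, and highlights flatten to white when
    it is positive.
    """
    img = np.asarray(Image.open(path).convert("RGB"), dtype=np.float32) / 255.0
    shift = target_mean - img.mean()
    adjusted = np.clip(img + shift, 0.0, 1.0)   # clamping against 0 and 1 happens here
    return Image.fromarray((adjusted * 255).astype(np.uint8))
```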
I'm starting to think that normalizing the brightness the way Gimp does when you move the brightness centerpoint in its Levels window might make the images more suitable for the network without washing them out. It's more of a curve bend of the brightness than a linear scaling.
Edit: My training run has been going for a while now, and it has picked up the washed-out look. The clamping of the normalized pixels to 0 and 1 (0 and 255 in byte values) is also damaging the output: lots of black eyeshadow is appearing where it shouldn't, because the dark eyes in the input images had many pixels clamped to black. I'll probably try a Gimp-style brightness-centerpoint bend tomorrow and see if that works out better.
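For reference, a sketch of the centerpoint bend I have in mind, implemented as a power curve (the function and parameter names are just for illustration):

```python
import numpy as np
from PIL import Image

def levels_midpoint_adjust(path, current_midpoint=0.7):
    """Bend the brightness curve so current_midpoint maps to 0.5.

    A power curve keeps 0.0 at 0.0 and 1.0 at 1.0, so nothing gets clamped;
    midtones are simply pushed down (or up), similar in spirit to dragging
    the middle slider in Gimp's Levels window.
    """
    img = np.asarray(Image.open(path).convert("RGB"), dtype=np.float32) / 255.0
    exponent = np.log(0.5) / np.log(current_midpoint)  # chosen so current_midpoint -> 0.5
    bent = img ** exponent
    return Image.fromarray((bent * 255).astype(np.uint8))

# Example: pull an overly bright image's midpoint (around 0.7) back down to mid-grey.
# levels_midpoint_adjust("bright_photoshoot.png", current_midpoint=0.7).save("adjusted.png")
```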