
New feature: Export images #332

Open
wants to merge 18 commits into main

Conversation

StableLlama

This PR adds the ability to export the images to a new folder. During the export, the images can be resized to suit the target model size and cropped to fit the target buckets. After resizing, a slight sharpening is applied, as graphics specialists recommend.

Notable features:

  • Resizing is done with the highest-quality algorithm
  • The best target dimensions are searched thoroughly
  • Color space conversion can be applied, since source images may use different color spaces while the training tools usually know nothing about color spaces
  • A statistical preview is shown
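The bucket-fitting, resizing, and sharpening steps described above could look roughly like this. This is a hedged sketch using Pillow, not the PR's actual code; the bucket list is a hypothetical SDXL-style example, and the unsharp-mask parameters are illustrative.

```python
from PIL import Image, ImageFilter

# Hypothetical bucket list (width, height); the real PR searches target
# dimensions more thoroughly than this simple closest-ratio pick.
BUCKETS = [(1024, 1024), (1152, 896), (896, 1152), (1216, 832), (832, 1216)]

def export_image(img):
    src_ratio = img.width / img.height
    # Pick the bucket whose aspect ratio is closest to the source image.
    target_w, target_h = min(BUCKETS, key=lambda b: abs(b[0] / b[1] - src_ratio))
    target_ratio = target_w / target_h
    # Center-crop the source to the bucket's aspect ratio.
    if src_ratio > target_ratio:
        crop_w = round(img.height * target_ratio)
        left = (img.width - crop_w) // 2
        img = img.crop((left, 0, left + crop_w, img.height))
    else:
        crop_h = round(img.width / target_ratio)
        top = (img.height - crop_h) // 2
        img = img.crop((0, top, img.width, top + crop_h))
    # High-quality Lanczos resize, then the slight sharpen the PR describes.
    img = img.resize((target_w, target_h), Image.LANCZOS)
    return img.filter(ImageFilter.UnsharpMask(radius=1, percent=50, threshold=2))
```

A 2048×1536 source (ratio 4:3) would land in the hypothetical 1152×896 bucket after a center crop and downscale.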

Additional code changes:

  • The DEFAULT_SETTINGS mechanism was simplified to reduce the code needed, prevent inconsistency bugs, and improve readability
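The kind of simplification described might look like the following sketch. All names here are hypothetical illustrations, not TagGUI's actual settings code: the idea is a single defaults dict consulted by one accessor, so fallback values are never repeated at call sites.

```python
# Hypothetical keys for illustration; TagGUI's real settings differ.
DEFAULT_SETTINGS = {
    "export_resolution": 1024,
    "export_format": "jpg",
}

class Settings:
    def __init__(self, stored=None):
        self._stored = stored or {}

    def value(self, key):
        # Fall back to DEFAULT_SETTINGS in exactly one place, so a default
        # can never drift out of sync between different call sites.
        return self._stored.get(key, DEFAULT_SETTINGS[key])
```

With this shape, adding a new setting means touching only the defaults dict, which is the inconsistency-bug prevention the PR description alludes to.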

@jhc13
Owner

jhc13 commented Feb 7, 2025

I appreciate your work, but I think this might be outside the scope of the program. TagGUI is designed for creating and editing text captions for images without modifying the images themselves. I think the features you are implementing are better suited for a dedicated program for processing the images.

By the way, TagGUI already supports moving or copying images to a different directory, but these actions do not alter the images.

@StableLlama StableLlama marked this pull request as ready for review February 7, 2025 09:43
@StableLlama
Author

This PR is the base for the next two steps I want to include as well:

  1. support manual cropping
    • Give the user hints to hit exactly the relevant aspect ratios (due to the bucketing, this is not an obvious task)
    • Give the user a hint about what will additionally be cropped to fit a bucket, so the crop can be fine-adjusted
    • Possibly: allow multiple crops per image, e.g. a full-body crop and a face crop from one high-resolution source image.
  2. support masking
    • Have one or more positive (rectangular) masks
    • Have one or more negative (rectangular) masks
    • When exporting, create the real mask from the union/intersection of these masks and either store it in the alpha channel of the exported image or in a parallel directory, just as the usual training scripts expect
    • Likely: show exactly what the mask will and will not cover, as the quantization due to the VAE / latent space might surprise the unaware user
    • Probably: have (rectangular) "hints" that mark major features ("head", "hand"). These can then help to quickly create a mask or crop, e.g. to prevent cropping through a hand, which would produce bad training results
    • Likely: the data storage for these features will be simple enough that external tools can easily create them, allowing a quick semi-automatic workflow. E.g. a watermark detector tool could create a negative mask, and a hand and face detector tool could create hints. The user can then quickly create the crop in such a way that all this information is taken care of. (Of course this functionality could be added to taggui as well, if desired and if someone knowledgeable about YOLO models or similar writes the code)
    • Out of scope: a "real" mask editor where you can paint a mask.
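The mask-combination step in point 2 can be sketched without any imaging library. This is a hedged illustration (names hypothetical, not the planned implementation): the union of the positive rectangles minus the union of the negative rectangles, producing 0/255 values usable as an alpha channel.

```python
def build_mask(width, height, positives, negatives):
    """Return a row-major list of 0/255 values for a width x height image.

    positives/negatives are lists of rectangles (left, top, right, bottom),
    with right and bottom exclusive.
    """
    def covered(x, y, rects):
        return any(l <= x < r and t <= y < b for (l, t, r, b) in rects)

    return [
        255 if covered(x, y, positives) and not covered(x, y, negatives) else 0
        for y in range(height)
        for x in range(width)
    ]
```

The same row-major buffer could be handed to Pillow via `Image.frombytes` or written to a parallel mask directory, whichever the training script expects.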

All of this is the next step in the workflow of starting with a bunch of images and ending up with a ready set of training images. So far I know of no other tool that can do this, I know people besides me who are missing one, it is closely related to tagging the images, and taggui is the perfect base for this extended functionality.
So I think it is in the scope of taggui. Or, to be more precise: it is a valuable extension of the scope into an area where no tools are available.

And all of this is in the (highly appreciated!) spirit of not modifying the source images. That's exactly the reason for creating the export function: it allows you to reuse your valuable data set for new models as they become available:
You are using SD1.5 right now? Tag the images and export them at 512px.
Now you are switching to SDXL or Flux? The only thing to do is a new export at 1024px.
Then the next model comes that does UHD 4K? Still only one more export, this time at 4K. (You can already do this with the code from this PR, even though it doesn't know about 4K.)
Or a future model that does HDR and a wide color gamut, and your image collection already has such images? Then it's again only one export; just select the color space accordingly. (And all exports prior to this were still correct for you, as the color space was converted to sRGB, which is what current models expect. If you had given those models the HDR images by plain copying, the training results could have been disappointing.)
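The sRGB normalization mentioned here is commonly done with Pillow's ImageCms module. A hedged sketch, not the PR's code: if an image carries an embedded ICC profile, convert it to sRGB; otherwise assume it is already sRGB.

```python
import io
from PIL import Image, ImageCms

def to_srgb(img):
    # Current training pipelines generally assume sRGB input, so convert
    # anything with a different embedded ICC profile.
    icc = img.info.get("icc_profile")
    if not icc:
        return img  # no embedded profile; treat as already sRGB
    src_profile = ImageCms.ImageCmsProfile(io.BytesIO(icc))
    srgb_profile = ImageCms.createProfile("sRGB")
    return ImageCms.profileToProfile(img, src_profile, srgb_profile,
                                     outputMode="RGB")
```

Images without a profile pass through untouched, which matches the "plain copy was fine for sRGB sources" point above.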

@StableLlama
Author

A hint about HDR: the Open Model Initiative seems to be actively looking at supporting HDR images: https://github.com/Open-Model-Initiative/HDR_SDXL

Advantages are smaller file sizes when compression is acceptable (quality < 100), or lossless compression with quality = 100. The alpha channel is also supported and kept in the images.

Note 1: this only adds support to the export function; it is not general JPEG XL support for taggui. The fork https://github.com/yggdrasil75/taggui does exactly that.

Note 2: you might need `pip install pillow-jxl-plugin` beforehand to be able to export to JPEG XL.
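A hedged sketch of how the optional dependency can be handled (names hypothetical, not the PR's code): importing pillow-jxl-plugin registers a JXL codec with Pillow, so the export can degrade gracefully when the package is missing.

```python
from PIL import Image

try:
    import pillow_jxl  # noqa: F401  (the import registers the JXL plugin)
    JXL_AVAILABLE = True
except ImportError:
    JXL_AVAILABLE = False

def export_jxl(img, path, quality=100):
    # Per the note above, quality=100 means lossless; lower values compress.
    if not JXL_AVAILABLE:
        raise RuntimeError(
            "JPEG XL export needs: pip install pillow-jxl-plugin")
    img.save(path, quality=quality)
```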
@StableLlama
Author

StableLlama commented Feb 12, 2025

@yggdrasil75 I've seen that you are working on real JPEG XL support. Do you intend to make a pull request out of it soon, as it would work very well with my latest commit to this PR (-> #337 )?

@yggdrasil75
Contributor

I will, once I stop being lazy and actually have it working. Might be tomorrow, now that someone else mentioned it.

@yggdrasil75
Contributor

Made the JXL PR. By the way, the biggest benefit I would see with this would be adding segm/bbox support to auto-mask non-character portions, or to auto-crop to the character. Is that the eventual plan?

@StableLlama
Author

That's exactly where I'm heading.
But before the automation can come (which I actually have no experience with), the manual editing interface must work, as you'll need it for refinement anyway.
And once that is in place, you can easily add automation on top of it.

@StableLlama
Author

StableLlama commented Feb 15, 2025

Now I've also added a few more refinements, so I will not change this branch / PR anymore (unless someone finds bugs that need fixing, of course).
For the next step of the exporting (i.e. the crop editor) I'll create a new PR that requires this PR as a basis.

So please pull it, as well as #335, as both form the basis.
Also #337 would fit very well here.
