Train several resolutions at the same time #130
Comments
This is very interesting. I don't think it is technically difficult. One problem is that the batch size needs to be variable to make efficient use of memory, but I believe that will be possible. I will consider how to implement this feature in the near future.
Is it all right if I write in Japanese? I mainly train art styles. Ideally, I think it would be best to be able to sort a single dataset by resolution (e.g. 512, 768, 1024) and then build it into a single bucket. I don't really understand the need to create batches by size, so batching doesn't seem like something for me to discuss. I had been manually patching the code every time the repo changed, which is why I made this suggestion.
Excuse me. I have some questions related to style training, as there seems to be only character-training advice available online.
Training at 1024 takes a very, very long time. I think it is enough to remove only unique tags (category 4 of WD Tagger) such as character names. I adopt 0.3 as the threshold for WD Tagger. I hope this helps you.
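For concreteness, here is a minimal, hypothetical sketch of that tag-filtering step in Python. The `tag_scores` structure, the `CHARACTER_TAGS` set, and the function name are assumptions for illustration; they are not part of WD Tagger or this repo.

```python
# Hypothetical sketch: keep tags above the WD Tagger confidence
# threshold (0.3 here) and drop character-name tags, which WD Tagger
# reports as category 4. The data shapes below are assumed.
CHARACTER_TAGS = {"example_character_a", "example_character_b"}  # assumed set

def filter_tags(tag_scores, threshold=0.3):
    """tag_scores: dict mapping tag name -> confidence score."""
    kept = [
        tag
        for tag, score in sorted(tag_scores.items(), key=lambda kv: -kv[1])
        if score >= threshold and tag not in CHARACTER_TAGS
    ]
    # Comma-separated captions are a common format for these scripts.
    return ", ".join(kept)

print(filter_tags({"1girl": 0.98, "example_character_a": 0.95, "smile": 0.25}))
# -> "1girl"
```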
Thank you very much, your suggestion is really detailed. One thing I want to clarify is what "large resolution" means in practice.
It's output directly at high resolution. I used hiresfix before, but I didn't get satisfactory results from that process (a bit blurry), so I'm using high-resolution training and output.
I would like to confirm: if I select 512 as the training resolution and the training set contains images larger than that, will the training quality be affected? Also, I saw on a bulletin board somewhere that you can train without problems even if all the image sizes are different. Is this correct?
If the sizes are different, there is no problem with proceeding with training, but it does affect quality. If an image is larger than the training resolution, there is a slight degradation in quality, because it is resized with cv2.INTER_AREA during training. It is recommended to reduce it in advance with another graphics tool (e.g., Photoshop), or alternatively to modify the code to use a different downscaler (e.g., cv2.INTER_CUBIC). If an image is smaller than the training resolution, there is a significant degradation in quality. In that case, it is recommended to remove the image, or to enlarge it with an upscaler and then downscale it again. Unless you're very sensitive about the quality of your dataset, you don't have to go through this process.
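If you want to do this pre-shrinking yourself, here is a minimal sketch using OpenCV. The function name and the `max_side` parameter are illustrative choices, not the repo's own preprocessing code.

```python
import cv2

def downscale_to_max_side(path_in, path_out, max_side=512):
    """Shrink an image so its longer side is at most max_side.

    cv2.INTER_AREA is generally the best interpolation for shrinking;
    cv2.INTER_CUBIC tends to do better when enlarging.
    """
    img = cv2.imread(path_in)
    h, w = img.shape[:2]
    scale = max_side / max(h, w)
    if scale >= 1.0:
        return  # already small enough; upscaling is a separate decision
    new_w, new_h = max(1, round(w * scale)), max(1, round(h * scale))
    img = cv2.resize(img, (new_w, new_h), interpolation=cv2.INTER_AREA)
    cv2.imwrite(path_out, img)

downscale_to_max_side("raw/image.png", "resized/image.png", max_side=512)
```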
Thank you for your comment. One more thing I would like to confirm: if a model trained on 512 × 512 images is then partially fine-tuned at a higher training resolution (768, 1024), does that improve the resolution of the generated images?
I don't think it improves the resolution of the entire model.
- ``--bucket_reso_steps`` and ``--bucket_no_upscale`` options are added to the training scripts (fine tuning, DreamBooth, LoRA and Textual Inversion) and ``prepare_buckets_latents.py``.
- ``--bucket_reso_steps`` sets the step size for bucket resolutions in aspect ratio bucketing. The default is 64, same as before.
  - Any value greater than or equal to 1 can be specified; 64 is highly recommended, and a value divisible by 8 is recommended.
  - If a value less than 64 is specified, padding will occur inside the U-Net; the result is unknown.
  - If you specify a value that is not divisible by 8, it will be truncated to a multiple of 8 inside the VAE, because the latent size is 1/8 of the image size.
- If the ``--bucket_no_upscale`` option is specified, images smaller than the bucket size are processed without upscaling.
  - Internally, a bucket smaller than the image size is created (for example, if the image is 300x300 and ``bucket_reso_steps=64``, the bucket is 256x256), and the image is trimmed to fit; see the sketch after this list.
  - Implementation of [#130](kohya-ss/sd-scripts#130).
  - Images with an area larger than the maximum size specified by ``--resolution`` are downsampled to the maximum bucket size.
- The number of items in each batch is now limited to the number of actual (non-duplicated) images. Previously, a bucket containing only a few actual images could produce batches with the same image duplicated.
- ``--random_crop`` now also works with buckets enabled.
  - Instead of always cropping the center of the image, the crop is shifted left, right, up, and down, so the model is also trained on the edges of the image.
  - Implementation of discussion [#34](kohya-ss/sd-scripts#34).
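As a rough sketch (not the actual repo code), the ``--bucket_no_upscale`` behavior can be thought of as rounding each image dimension down to the bucket grid, so small images are trimmed instead of being upscaled into a larger bucket:

```python
def bucket_for_image(width, height, bucket_reso_steps=64):
    """Round each dimension down to a multiple of bucket_reso_steps,
    never below one step, so the image is trimmed to the bucket rather
    than upscaled to a larger one."""
    bucket_w = max(bucket_reso_steps, (width // bucket_reso_steps) * bucket_reso_steps)
    bucket_h = max(bucket_reso_steps, (height // bucket_reso_steps) * bucket_reso_steps)
    return bucket_w, bucket_h

print(bucket_for_image(300, 300))  # (256, 256), matching the example above
```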
It would be nice to train high-resolution images in tiles, except the training would need to understand which object it is rendering.
Thank you for your hard work.
Currently, the bucket system is coded to a single fixed resolution.
If possible, I would like to train images of different resolutions at the same time.
For example, 512 x 512, 576 x 1024, 1024 x 1024, 760 x 1280
Unfortunately, even if you turn off the enable_bucket option, images are still forcibly resized.
I want to further train a particular area with small images, but because of the forced resizing they are trained at large sizes, and as a result the overall training quality deteriorates.
I would like to be able to completely turn off bucketing and forced resizing.