support HyperTile optimization #13948
Finally someone decided to make HyperTile for Automatic1111! |
Hey, if anyone wants to test this on SDXL, I created an amateur port of hypertile.py that uses the SDXL depth layers. I spent the past few days tuning the numbers and tile sizes, and I think I've found the best settings: best performance, no artifacts, and only a slight change to the image for a given seed compared to running without it.

I also added the LDM key for the VAE. The original only had the diffusers key (A1111 only uses LDM/SGM, not diffusers), so this PR wasn't tiling the VAE at all. The VAE hasn't changed across SD versions, so this is relevant for both 1.5 and SDXL.

Big thank you to the creator of https://github.com/arenasys/stable-diffusion-webui-model-toolkit, as the components directory in that repo is the only place I could find any info about which layers 1.5 and XL use.

hypertile.py.txt (rename to hypertile.py in the modules directory). This only works for SDXL and not 1.5.

Without it I get 1.7 it/s; with it I get 1.82 it/s, with no loss of quality or determinism. Definitely worth it.

(Oh yeah, forgot to mention: I also commented out the line that prints every layer it hijacks to the console. SDXL has a LOT of layers.)
SD Base (1.4-1.5) SD XL (3 pass, batch count 6) |
Co-Authored-By: Kieran Hunt <kph@hotmail.ca>
db0f9a1
to
b29fc6d
Also note that increasing the max_depth further increases it/s, as it hijacks more layers. I'm not sure about 1.5, but on SDXL I've gotten the best results with max_depth 2 (max_depth 1 is about the same speed as max_depth 2, but with a reduction in quality). |
The tile size, depth, and other options will be added to the settings soon ™️ Also, I found vladmandic's older implementation, which says it is not compatible with ToMe and similar extensions, but I guess it can work if we apply the HyperTile hijack at the last moment; I confirmed this with ToMe ratio 0.3, etc. So if anyone finds a bug, please ping me.
I'm unable to get the newest version of the patch to work (on SDXL at least), and I'm unsure what's causing it.
I'm pretty sure I was sleepy while implementing this
@gel-crabs Thank you for the comment! You're right; I confirmed the issue came from typos introduced while refactoring. (I may have to refactor again...) A. The options were inverted, so enabling the option actually disabled it... Confirmed working for SD Base 1.5 now. SD XL - 768x768 depth 0
It works! With the newest commits and full max_depth, my it/s now goes from 1.7 to 1.88. Not bad at all! If I'm able to find any information about SDXL depth layers in diffusers, I will hook it up in case A1111 gets diffusers support in the future (plz, I need inpaint) |
I wanted this to work without changes to processing.py so I partially reworked the file into a built-in extension; additionally added an option to only apply unet hypertile to a hires fix pass. Still no infotext params - adding them is easy but I think before that reasonable defaults should be figured out - ones that give most speed improvement with least image difference. |
OK, I found it. How do we use it?
@FurkanGozukara Go to Settings - Hypertile options, enable the optimizations (and set swap size to a large value such as 12 for safety), and there you go.
Thank you. I am testing SDXL right now on an RTX 3060 (12 GB), and I don't see any difference in speed for 1024x1024 outputs when changing the options. What does each option do? Depth
Sorry but I don't quite understand how this works, is this for txt2img with hires fix or only img2img upscale? And as for the options, do I enable Enable Hypertile U-Net, Enable Hypertile U-Net for hires fix second pass and Enable Hypertile VAE? I have tried to use it with all the options enabled and one-by-one for txt2img, I do not see any real difference in speed or image quality....unless I am doing something wrong. There are some very minor changes in 1.5 but none that I can see in SDXL. |
I got very little speed improvement testing with an RTX 3090 Ti, going from 1280x1024 to 2176x1740. Without HyperTile: 1.13 s/it on the second pass. Nothing like the table @AUTOMATIC1111 provided above. Tested settings
@ArxFusion @FurkanGozukara Thus the options are separated into 'first pass', 'hires pass', and 'VAE stage', to be used for the corresponding bottlenecks. (In other words, if you only generate at 512x512, you usually don't need it.) The depth option is noticeable if you are creating gigantic images (well, it depends on your ratio...). Max tile size: larger is better (adjusted by ratio). Swap size: smaller is usually faster but can produce artifacts, so there is a trade-off between speed and aesthetic score.
Thank you, I tested like this. Shouldn't I see a big speed improvement in the hires fix pass?
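One way to picture the tile size and swap size trade-off is the sketch below. It is an assumption about how such options could interact, not the code in this PR: `candidate_tile_counts` and `pick_tile_count` are hypothetical names, `min_tile` is treated as the smallest allowed tile edge in latent units, and `swap_size` as the number of candidate splits the sampler may rotate between. Seeding a local RNG keeps the choice repeatable for a given seed.

```python
import random


def candidate_tile_counts(dim: int, min_tile: int, swap_size: int) -> list[int]:
    """Hypothetical helper: ways to split `dim` evenly into tiles >= min_tile.

    Returns tile counts ordered from most tiles (smallest tiles) to fewest,
    keeping only the first `swap_size` candidates.
    """
    min_tile = min(min_tile, dim)
    # Tile edge lengths that divide the dimension exactly.
    sizes = [s for s in range(min_tile, dim + 1) if dim % s == 0]
    return [dim // s for s in sizes[:swap_size]]


def pick_tile_count(dim: int, min_tile: int, swap_size: int, seed: int) -> int:
    """Pick one candidate with a local, seeded RNG so the choice is repeatable."""
    options = candidate_tile_counts(dim, min_tile, swap_size)
    rng = random.Random(seed)  # assumption: seed derived from e.g. the step index
    return rng.choice(options)


# A 128-wide latent with a minimum tile edge of 32:
print(candidate_tile_counts(128, 32, 3))  # [4, 2, 1]
print(candidate_tile_counts(128, 32, 1))  # [4]
```

Under this reading, a swap size of 1 always uses the finest split (fastest, most likely to show seams), while a larger swap size mixes in coarser splits, including no tiling at all.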
@FurkanGozukara Did you get hit with any memory problem while generating images? But as mentioned above, SD XL does not show that dramatic improvement compared to 1.5-type models, as expected. There are too many layers in SD XL, which can be the cause for this issue... (comfyUI shows same behavior - afaik it does not do anything for SD XL) |
I see. Well, I tested without hitting any VRAM limit; I have 24 GB of VRAM on an RTX 3090 Ti. For SD 1.5, where can we make use of this? When we raise the resolution it produces garbage, so in which cases is it useful?
@FurkanGozukara It can be used for 1024x1024, 1600x1600 - or be combined with kohya's hires fix too, or even extreme high resolution with low denoise strength. (which was the main purpose by original author) |
Can you show a screenshot of such SD 1.5 settings? I would like to test it here, for example generating a 1600x1600 image with SD 1.5.
I see, thanks. Yes, I also saw some real improvement on SD 1.5. TensorRT brings more improvement; will this work with TensorRT? @aria1th
I will combine them and test with TensorRT. If both work together, that is a huge speed improvement for SD 1.5.
@FurkanGozukara Yes, but note that TensorRT requires the code to be 'included' at compile time, so HyperTile has to be part of the model itself.... Still, I'd say it is barely possible.
ok i see. ty |
Thank you for updating the text with some info on the various options. I can see that for SDXL it's not really working, but on 1.5 there are some slight improvements. So far it's only really noticeable when running more than 30 steps, and it seems deterministic on your own hardware. I do notice that the generated images can deviate between settings, some for the better and some for the worse, depending on the options selected. I won't post my results, since I'm not sure how to even benchmark this and I feel everyone will have different experiences.
Since Hypertile is intended for large images usually, could an option be added so that it's only enabled for hiresfix pass and img2img? I tried doing it myself but I didn't understand enough of the code to guess where to change. |
@zcatharisis Yes, Enable Hypertile for Unet second pass will exclusively allow Hypertile to be used for hires.fix. Img2Img is, though, a first pass. |
Sorry, I didn't make myself clear; the option would enable Hypertile for the hiresfix pass and img2img passes only, while keeping it disabled in the regular txt2img first pass. Sometimes I flip-flop between upscaling by 3x in img2img, then generating a 576x768 image in txt2img, and back. It's a bit of a pain having to turn Hypertile on for img2img and off for txt2img (since, as you pointed out, it generates artifacts if the resolution isn't a multiple of 128). EDIT: Or maybe a cleaner implementation would be to detect the resolution of the image being generated and toggle Hypertile on or off accordingly? For example, it is disabled at 1024x1024 and below and enabled above that resolution?
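The resolution-based toggle suggested in the EDIT could look roughly like this. It is only a sketch of the idea, not something this PR implements: `should_enable_hypertile`, `min_side`, and `require_multiple_of` are hypothetical names, and the 1024 threshold is just the example value from the comment.

```python
from typing import Optional


def should_enable_hypertile(
    width: int,
    height: int,
    min_side: int = 1024,
    require_multiple_of: Optional[int] = None,
) -> bool:
    """Hypothetical per-image check: tile only when the image is large enough."""
    # At or below the threshold, leave tiling off.
    if max(width, height) <= min_side:
        return False
    # Optionally refuse sizes that are not an exact multiple of a given value,
    # since non-128-multiple resolutions are reported above to cause artifacts.
    if require_multiple_of is not None:
        return width % require_multiple_of == 0 and height % require_multiple_of == 0
    return True


print(should_enable_hypertile(576, 768))     # False: small txt2img first pass
print(should_enable_hypertile(1600, 1600))   # True: large image
print(should_enable_hypertile(1600, 1600, require_multiple_of=128))  # False
```

A check like this would run once per generation, so switching between txt2img at 576x768 and a 3x img2img upscale would not need a manual settings change.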
https://github.com/tfernd/HyperTile
Description
HyperTile is an optimization that adds yet another form of split attention.
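As a rough illustration of why splitting attention helps, the sketch below counts query-key pairs for self-attention over a latent feature map, with and without tiling. It assumes each tile attends only within itself and ignores everything else in the layer; `attention_pairs` is a hypothetical helper, and the 128x128 latent is what a 1024x1024 image would give with the usual 8x VAE downscale.

```python
def attention_pairs(latent_h: int, latent_w: int, tiles_h: int = 1, tiles_w: int = 1) -> int:
    """Count query-key pairs when self-attention runs separately on each tile."""
    if latent_h % tiles_h or latent_w % tiles_w:
        raise ValueError("tile counts must divide the latent size exactly")
    tokens_per_tile = (latent_h // tiles_h) * (latent_w // tiles_w)
    n_tiles = tiles_h * tiles_w
    # Each tile costs tokens^2 pairs; tiles do not attend to each other.
    return n_tiles * tokens_per_tile ** 2


full = attention_pairs(128, 128)         # one 16384-token attention
tiled = attention_pairs(128, 128, 2, 2)  # four 4096-token attentions
print(full // tiled)                     # 4
```

With N tokens split into T equal tiles the pair count drops from N^2 to N^2 / T, which is why the gain grows with resolution and is hard to see at 512x512.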
Currently, for testing with other extensions like ControlNet, it requires a modified repository.
Thus requirements.txt is not modified here; it will require
Screenshots/videos:
The test is done for 3 environments:
The test is done with the animefull-latest-pruned model, with prompt `1girl` and negative prompt `easynegative`.
The test is done on an RTX 4090 with an i7-8700 and DDR4-3200 RAM, with a batch count of 16 for each test.
The patch is done to make hypertile deterministic.
Here is the behavior with / without hypertile:
Without HyperTile - original image. The result should be reproducible without the patch.
With HyperTile - tested twice with the same seed. The result is slightly different from the original, but deterministic.
TODO: we need infotext for HyperTile enabled / disabled.
Checklist: