Stable Diffusion Meta AITemplate with >= 200% performance increase #1625
Comments
Looking into what is needed for this to work: we need to isolate all portions of the model used for sampling from ldm/taming and create torch-like AIT versions of them to be transpiled into C++. https://facebookincubator.github.io/AITemplate/tutorial/how_to_infer_pt.html A good example here is the port of the attention module: Then we run the |
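The swap described above (replacing each torch-side module the sampler touches with an AIT counterpart) can be sketched with a small registry. This is an illustrative sketch only; the class names are placeholders, not real ldm or AITemplate symbols.

```python
# Hypothetical sketch of the port pattern: each torch-side module gets a
# registered AIT-style counterpart, and a registry rewires the model
# before compilation. Names here are placeholders, not real APIs.

AIT_PORTS = {}

def ait_port(torch_name):
    """Register an AIT-style replacement for a named torch module."""
    def register(cls):
        AIT_PORTS[torch_name] = cls
        return cls
    return register

class TorchAttention:          # stand-in for ldm's attention block
    kind = "pytorch"

@ait_port("TorchAttention")
class AITAttention:            # stand-in for the AIT re-implementation
    kind = "ait"

def port_module(module):
    """Swap a module for its AIT counterpart, if one is registered."""
    replacement = AIT_PORTS.get(type(module).__name__)
    return replacement() if replacement else module

print(port_module(TorchAttention()).kind)  # -> ait
```

In the real port, the registered classes would be AITemplate re-implementations that get compiled to C++ per the tutorial linked above.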
I wonder if there will be the same challenges implementing this as there were with this other performance enhancement: In the case of that issue, if I understand correctly, the improved method could only run under Linux, and there wasn't a clear way to cross-compile for Windows, so collaborators were stuck waiting for changes to be made upstream. Any idea if this is going to be compatible with Windows? Another potential challenge is AIT's hardware requirements (Ampere etc.). Certainly if this could be implemented it could be behind a cmd opt, but would that mean there may need to be multiple versions of the core items you mentioned? |
You would need to duplicate the code in AIT syntax and add a flag to turn it on for the supported hardware, yes. It's very frustrating that there is no way to easily reuse the existing torch code. |
I don't think this repo currently uses diffusers, but stumbled upon this PR: Which has some comments talking about how it could potentially also make use of AITemplate in a future PR:
|
AITemplate + xformers combination just dropped:
|
A friend of mine was able to get his RTX 4090 inference speed from 25-28 it/s up to the 61-64 it/s range. He said it was a rather painstaking process to get AITemplate to work, with a lot of errors along the way that he had to solve. It would be very interesting if someone investigated this further and figured out a way to port it to the webui. Here's his specs: |
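A quick sanity check on those reported numbers shows they line up with the ">= 200% performance increase" in the issue title (i.e. at least 2x):

```python
# Speedup implied by the reported 25-28 it/s -> 61-64 it/s figures.
low = 61 / 28   # worst case: slowest AIT figure vs fastest baseline
high = 64 / 25  # best case
print(f"speedup: {low:.2f}x - {high:.2f}x")  # -> speedup: 2.18x - 2.56x
```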
Even 25-28 it/s is insane. I'm lucky to get 6-8 on my 4090. |
I am getting stable 11 it/s on my 3080..?
|
@YourFriendlyNeighborhoodMONKE Curious if your friend took notes along the way about how they managed to get things going, what issues they ran into, how they solved them, etc. It would be awesome knowledge to have shared here to make others' lives easier, and/or to bootstrap possibly getting it running for this repo! |
I got about 7.5-8.5 out of the box, which is actually really bad for a 4090, but yeah, afaik there's still no support for Lovelace in pytorch and other areas as well, so 3090s are probably beating those numbers out of the box. The easiest optimization you could do is the cuDNN update, which is just replacing some .dll files in /venv/lib/site-packages/torch/lib/ - you can find the files by searching "4090 cudnn" in a discussion thread here and also on r/StableDiffusion. xformers is fairly simple and straightforward too, as auto's webui already supports it out of the box without compiling; all you really need to do is put --xformers into webui.bat. I got a little under 20 it/s after those two, which isn't as high as some are able to get, but I'm happy enough and will just wait for better 4090 support and for things like AIT becoming available for easy Windows installation or included in the webui. Remember to back up everything before attempting! |
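Since the advice above ends with "back up everything before attempting", here is a minimal stdlib sketch of backing up the DLLs before swapping them. The `VENV_LIB` path is an assumption based on the directory mentioned in the comment; adjust it for your install.

```python
# Back up every .dll in the torch lib dir before replacing cuDNN files.
# VENV_LIB is an assumed default layout, not a guaranteed path.
import shutil
from pathlib import Path

VENV_LIB = Path("venv/lib/site-packages/torch/lib")  # assumed location

def backup_dlls(lib_dir: Path, backup_dir: Path) -> list:
    """Copy every .dll in lib_dir into backup_dir; return the copied names."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for dll in sorted(lib_dir.glob("*.dll")):
        shutil.copy2(dll, backup_dir / dll.name)
        copied.append(dll.name)
    return copied
```

Restoring is the same copy in reverse if the new DLLs misbehave.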
I understand, but I doubt he has notes, because the way I understood it, it took him a couple of days of struggle, and he seems pretty advanced in these areas as well. These kinds of things at this stage tend to hit quite varied errors, which are hardware/software-configuration specific too. I'll ask anyway! Btw. |
Another potential performance gain issue: |
A few semi-related issues about exploring using AITemplate with Dreambooth: |
|
Why improve only the NVIDIA cards? To kill AMD? |
Every prompt change needs a rebuild, and it costs above 2 minutes. |
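If rebuilds really are being retriggered that often, caching compiled engines by the parameters that actually force a recompile (resolution, batch size; prompt text alone should not) would amortize the cost. A hypothetical sketch, where the key fields and engine naming are assumptions for illustration:

```python
# Hypothetical cache keyed by the shapes that would force an AIT recompile;
# repeated calls with the same shapes reuse the compiled engine.
from functools import lru_cache

BUILDS = []  # records how often the expensive "compile" actually runs

@lru_cache(maxsize=None)
def get_engine(height: int, width: int, batch: int) -> str:
    BUILDS.append((height, width, batch))  # stand-in for the ~2 min build
    return f"engine-{height}x{width}x{batch}"

get_engine(512, 512, 1)   # first call: builds
get_engine(512, 512, 1)   # same shapes: cache hit, no rebuild
get_engine(768, 768, 1)   # new resolution: builds again
print(len(BUILDS))  # -> 2
```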
Hi guys! I am a newbie to stable diffusion webui. I don't know whether AITemplate is available on stable diffusion webui. Any plan to support it? |
AITemplate from Meta promises a 200% or more speedup in image generation.
Presently it is only available for the diffusers library: https://github.com/facebookincubator/AITemplate/tree/main/examples/05_stable_diffusion
PT = PyTorch, AIT = AITemplate implementation
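A PT-vs-AIT comparison like the one in the linked example boils down to measuring iterations per second for each backend. A minimal stdlib harness for that kind of measurement; the two step functions here are trivial placeholders, not real model calls:

```python
# Generic it/s measurement harness for backend comparisons.
# The lambdas below stand in for PT and AIT sampling steps.
import time

def iters_per_second(step, iters=1000):
    """Run `step` repeatedly and return the measured iterations per second."""
    start = time.perf_counter()
    for _ in range(iters):
        step()
    return iters / (time.perf_counter() - start)

pt_rate = iters_per_second(lambda: sum(range(200)))   # "PT" placeholder
ait_rate = iters_per_second(lambda: sum(range(100)))  # "AIT" placeholder
```

With real sampling steps plugged in, `ait_rate / pt_rate` gives the speedup factor the issue title refers to.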