stuff to save VRAM to less than 7GB #29
added the quantization button to the UI
added dynamic unloading of main model and VAE to save VRAM.
added Int8Quantized to save VRAM
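The thread does not show how the Int8Quantized mode is implemented, so the following is only a generic illustration of why 8-bit weights save memory: each weight is stored as one signed byte plus a shared scale, instead of two bytes (fp16/bf16) or four (fp32). It is a minimal sketch of symmetric absmax quantization in plain Python; the function names are hypothetical and are not taken from this PR.

```python
def quantize_int8(weights):
    """Symmetric per-tensor absmax quantization into the range [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    # One shared scale maps the largest magnitude onto 127.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize_int8(quantized, scale):
    """Recover approximate float weights from int8 values and the scale."""
    return [v * scale for v in quantized]


weights = [0.5, -1.0, 0.25, 0.0]
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)

# Rounding to the nearest step bounds the error by half a step per weight.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
```

The trade-off is a small rounding error per weight (at most half a quantization step here) in exchange for roughly halving weight storage relative to 16-bit formats.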
Oh, I have to fix something with the VAE encoding, but I will do that tomorrow.
@Manni1000 sir, there are thousands of us waiting for this to get merged...
I will fix the VAE bug today :)
made low VRAM button work. only starts a new pipeline if the value is changed
added code so the Quantizing button from the UI works
added code so that the int8 button in the UI works
OK, now everything works, but because of the changes in the main branch it's not compatible.
change to match original repository
I tested from your fork, OOM issue :(
Maybe first try generating an image without an input image, just as a test.
This was without an image. I tried the first example, which has only a prompt and no image, at 512 x 512.
Strange. If it's only text to image, it takes less than 7GB for me, and with an image it's slightly above 8. Maybe try the version ignoring the newest two commits; those were just changes to make a merge easier.
I am only using your repository, not merging any code. Could you guide me on how to ignore a commit, or give me the command to clone? (BTW, I'm not a developer.)
I can't get this to work either. I tried it on Kaggle, and it says: Loading safetensors OutOfMemoryError: CUDA out of memory. Tried to allocate 2.12 GiB. GPU 0 has a total capacity of 14.74 GiB of which 848.12 MiB is free. Process 2413 has 13.91 GiB memory in use. Of the allocated memory 11.64 GiB is allocated by PyTorch, and 2.15 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management
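The error message above suggests setting PYTORCH_CUDA_ALLOC_CONF. In a notebook environment such as Kaggle, one way to apply it is from Python, as sketched below. This assumes the cell runs before PyTorch has initialized its CUDA allocator (safest is before `import torch`), and it only addresses fragmentation; it will not help if the model genuinely needs more memory than the GPU has.

```python
import os

# Set the allocator option suggested by the error message. This has to happen
# before PyTorch initializes CUDA, so put it at the very top of the script or
# notebook, ahead of `import torch`.
# setdefault keeps any value the environment already provides.
os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", "expandable_segments:True")

alloc_conf = os.environ["PYTORCH_CUDA_ALLOC_CONF"]
```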
Install with Pinokio and replace these files. It works with 5-9 GB VRAM. Check "Low VRAM (8-bit Quantization)" after loading the app.
Thank you. Working fine.
Not for me though :-( I'm still trying to get this working on Kaggle. First I was getting "The error AttributeError: module 'torch.library' has no attribute 'register_fake' typically arises when using PyTorch, particularly in versions where this function is either not available or incorrectly referenced." (torch version 2.3.1). I thought I needed to upgrade, so I did, but now I get this error: "AttributeError: partially initialized module 'torchvision' has no attribute 'extension' (most likely due to a circular import)". Why is this so complicated to run? The Space on Hugging Face also gives an error 3 out of 4 tries... I'm finding it very difficult to evaluate this model, to be honest.
@Manni1000 please check the PR I sent to your fork. I guess this PR will work once you merge my PR to your fork: Manni1000#2
Also, I confirm it works (with my fix applied). It took 1 min 56 sec on a 4090, with 8GB VRAM usage. BTW, shouldn't we make the LOW VRAM option checked by default? I mean, this thing is practically unusable without the low VRAM option.
Does not work on Colab T4.
The @cocktailpeanut PR is working on Ubuntu 22.04:
Tick the new 'Low VRAM (8-bit quantization)' button. Basic t2i maxes out at 5.8GB VRAM. Done right quick!
RAM=???
It needs more than 12 GB of RAM... Can you convert it and store it? Can you upload just the conversion code? Can you upload the converted model? Any solution you can come up with, please. I want to use it with 12 GB of RAM, like on a Colab T4.
15.9GB system RAM. Possibly a scooch too much for Colab.
I added dynamic unloading and loading of the main model and the VAE to save VRAM.
I also added Int8Quantized to save VRAM.
I also added a button in the Gradio UI to select the low VRAM mode (Int8Quantized).
When this is active, the model runs with under 7GB VRAM. A lot more people will be able to use it like this.
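The PR's own unloading code is not shown in this thread. As a rough illustration of the idea of keeping only one component on the GPU at a time, here is a minimal sketch. It assumes objects with a `.to(device)` method (as `torch.nn.Module` has); `on_device` and `FakeModule` are hypothetical names, and the stand-in class exists only so the sketch runs without a GPU.

```python
from contextlib import contextmanager


@contextmanager
def on_device(module, device="cuda", offload_device="cpu", after_offload=None):
    """Keep `module` on `device` only for the duration of the `with` block."""
    module.to(device)
    try:
        yield module
    finally:
        # Always move the weights back, even if the body raised.
        module.to(offload_device)
        if after_offload is not None:
            after_offload()  # e.g. torch.cuda.empty_cache in a PyTorch setup


class FakeModule:
    """Stand-in for a torch.nn.Module so the sketch runs without a GPU."""

    def __init__(self, name):
        self.name = name
        self.device = "cpu"

    def to(self, device):
        self.device = device
        return self


main_model = FakeModule("main_model")
vae = FakeModule("vae")
log = []

with on_device(main_model):
    log.append((main_model.device, vae.device))  # main model resident, VAE offloaded
with on_device(vae):
    log.append((main_model.device, vae.device))  # VAE resident, main model offloaded
```

Peak VRAM is then bounded by the larger of the two components rather than their sum, at the cost of transfer time on every switch and of holding the offloaded weights in system RAM.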
Note: this is not completely done. The button does not work yet! Whether quantization is active is currently controlled by a variable in the pipeline.py file (Quantization = True). Maybe someone else can connect the button logic to the variable; I am currently not able to do it. But I published this so others can play around with it.
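Later commits in this PR report that the button was connected and that a new pipeline is only started when the value changes; that code is not shown here. One possible shape for such wiring is sketched below, with the checkbox value passed into the generate handler as a plain boolean. `PipelineHolder`, `fake_build`, and `generate` are hypothetical names, and the fake builder stands in for the real pipeline constructor.

```python
class PipelineHolder:
    """Caches one pipeline and rebuilds it only when the low-VRAM flag changes."""

    def __init__(self, build_pipeline):
        self._build = build_pipeline  # hypothetical factory: build_pipeline(quantize)
        self._quantize = None
        self._pipeline = None
        self.build_count = 0

    def get(self, quantize):
        if self._pipeline is None or quantize != self._quantize:
            self._pipeline = self._build(quantize)
            self._quantize = quantize
            self.build_count += 1
        return self._pipeline


def fake_build(quantize):
    # Stand-in for loading the real model with or without int8 quantization.
    return {"quantize": quantize}


holder = PipelineHolder(fake_build)


def generate(prompt, low_vram):
    """Handler a UI would call; `low_vram` is the checkbox state."""
    pipe = holder.get(bool(low_vram))
    return f"{prompt} (quantize={pipe['quantize']})"


first = generate("a cat", True)
second = generate("a dog", True)   # same flag: pipeline is reused
third = generate("a bird", False)  # flag changed: pipeline is rebuilt
```

In a Gradio app, a checkbox component listed among the inputs of the click handler would deliver its state as the `low_vram` argument, so no module-level variable in pipeline.py needs to be edited by hand.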