
VRAM optimisation via offloading #87

Open

wants to merge 1 commit into base: main

Conversation
@hortom hortom commented Feb 3, 2024

Warning

This is not an entirely stable solution; switching between optimisations won't work. Choose one level and keep it until you restart ComfyUI and you will be OK.

Description

With these modifications, I can run InstantID on my 4070 (12GB VRAM) many times with level_2 optimisation.

This VRAM optimisation has 2 levels:

Level 1
Select this if you have more than 12 GB of VRAM and still hit an out-of-memory error.
This is the faster solution; it uses `enable_model_cpu_offload()`.
VRAM usage during calculation: 85-88%
Speed: 1.25 it/s (on an RTX 4070)
Pros: relatively fast
Cons: after 5-8 runs, an OOM error is thrown with 12 GB VRAM, so it needs extra work to prevent the memory leak or to cache more cleverly. A more clever dev can figure this out, I am sure.

Level 2
Select this if you have 12 GB of VRAM or less.
This is the slower solution; it uses `enable_sequential_cpu_offload()`.
VRAM usage during calculation: 15-17%
Speed: 2.35 s/it (2-3x slower on an RTX 4070)
Pros: I can run this multiple times without any issues.
Cons: loading the models still needs VRAM, around 10 GB.

Please note that these are hack-ish solutions, as these functions shouldn't be called more than once on a pipe, as far as I know.
So I had to add extra checks to detect whether `enable_sequential_cpu_offload()` had already been called, or, where possible, to revert the effect of `enable_model_cpu_offload()`.
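The guard described above could be sketched roughly like this. This is a minimal illustration, not the actual PR code: the `apply_vram_optimisation` helper and the `_offload_level` marker attribute are hypothetical, while `enable_model_cpu_offload()` and `enable_sequential_cpu_offload()` are real diffusers pipeline methods.

```python
def apply_vram_optimisation(pipe, level):
    """Apply CPU offloading to a diffusers-style pipe at most once.

    `level` is assumed to be "level_1" or "level_2"; the `_offload_level`
    attribute is a hypothetical marker used to detect a previous call,
    since the offload hooks shouldn't be installed twice on one pipe.
    """
    applied = getattr(pipe, "_offload_level", None)
    if applied is not None:
        # Offloading was already enabled; keep the first choice untouched.
        return applied
    if level == "level_1":
        pipe.enable_model_cpu_offload()       # faster, higher VRAM usage
    elif level == "level_2":
        pipe.enable_sequential_cpu_offload()  # slower, minimal VRAM usage
    pipe._offload_level = level
    return level
```

A node would call this with the user-selected `vram_optimisation` value; repeated executions of the workflow then become no-ops instead of stacking offload hooks.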

What has been changed:

Added: `IPAttnProcessor2_0` class (attention_processor.py)
Added: `vram_optimisation` param to the generator node (InstantIDNode.py)
Added: 2 levels of VRAM optimisation to the pipeline (pipeline_stable_diffusion_xl_instantid.py)

@kesyn kesyn left a comment


After adding a LoRA loader node, generation took more than ten times longer and eventually ran out of memory. However, with this solution, everything now works fine.

time-river added a commit to time-river/ComfyUI-InstantID that referenced this pull request Mar 23, 2024
ref:
ZHO-ZHO-ZHO#87

Signed-off-by: Fu Lin <river@vvl.me>
2 participants