
VRAM optimisation via offloading #87

Open

wants to merge 1 commit into base: main

Conversation
@hortom hortom commented Feb 3, 2024

Warning

This is not an entirely stable solution; switching between optimisations won't work. Choose one level and keep it until you restart ComfyUI and you will be OK.

Description

With these modifications, I can run InstantID on my 4070 (12GB VRAM) many times with level_2 optimisation.

This VRAM optimisation has 2 levels:

Level 1
Select this if you have more than 12 GB of VRAM and still hit an out-of-memory error.
This is the faster solution; it uses `enable_model_cpu_offload()`.
VRAM usage during calculation: 85-88%
Speed: 1.25 it/s (on an RTX 4070)
Pros: relatively fast
Cons: after 5-8 runs, an OOM error is thrown with 12 GB VRAM, so it needs extra work to prevent the memory leak or to cache more cleverly. A more clever dev can figure this out, I am sure.

Level 2
Select this if you have 12 GB of VRAM or less.
This is the slower solution; it uses `enable_sequential_cpu_offload()`.
VRAM usage during calculation: 15-17%
Speed: 2.35 s/it (2-3x slower on an RTX 4070)
Pros: I can run this multiple times without any issues.
Cons: loading the models still needs VRAM, around 10 GB.

Please note that these are hack-ish solutions, as these functions shouldn't be called more than once on a pipe, as far as I know.
So I had to add extra checks to detect whether `enable_sequential_cpu_offload()` had already been called, or, where possible, to revert the effect of `enable_model_cpu_offload()`.
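The guard described above could be sketched roughly like this. This is a minimal illustration, not the actual PR code: the `apply_vram_optimisation` helper and the `_offload_level` marker attribute are hypothetical, while `enable_model_cpu_offload()` and `enable_sequential_cpu_offload()` are real diffusers pipeline methods.

```python
def apply_vram_optimisation(pipe, level):
    """Apply CPU offloading to a diffusers-style pipe at most once.

    `level` is assumed to be "level_1" or "level_2"; the `_offload_level`
    attribute is a hypothetical marker used to detect a previous call,
    since the offload hooks shouldn't be installed twice on one pipe.
    """
    applied = getattr(pipe, "_offload_level", None)
    if applied is not None:
        # Offloading was already enabled; keep the first choice untouched.
        return applied
    if level == "level_1":
        pipe.enable_model_cpu_offload()       # faster, higher VRAM usage
    elif level == "level_2":
        pipe.enable_sequential_cpu_offload()  # slower, minimal VRAM usage
    pipe._offload_level = level
    return level
```

A node would call this with the user-selected `vram_optimisation` value; repeated executions of the workflow then become no-ops instead of stacking offload hooks.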

What has been changed:

Added: `IPAttnProcessor2_0` class (attention_processor.py)
Added: `vram_optimisation` param to the generator node (InstantIDNode.py)
Added: 2 levels of VRAM optimisation to the pipeline (pipeline_stable_diffusion_xl_instantid.py)

@kesyn kesyn left a comment


After adding a LoRA loader node, generation took more than ten times longer and eventually ran out of memory. However, with this solution, everything now works fine.

time-river added a commit to time-river/ComfyUI-InstantID that referenced this pull request Mar 23, 2024
ref:
ZHO-ZHO-ZHO#87

Signed-off-by: Fu Lin <river@vvl.me>
2 participants