Warning
This is not an entirely stable solution: switching between optimisation levels at runtime won't work. Choose one level and keep it until you restart ComfyUI, and you will be fine.
Description
With these modifications, I can run InstantID on my RTX 4070 (12 GB VRAM) many times with the level_2 optimisation.
This VRAM optimisation has 2 levels:
Level 1
Select this with more than 12 GB of VRAM, if an out-of-memory error occurs.
This is the faster option. It uses
enable_model_cpu_offload()
VRAM usage during calculation: 85-88%
Speed: 1.25 it/s (on an RTX 4070)
Pros: relatively fast
Cons: after 5-8 runs an OOM error is thrown with 12 GB of VRAM, so it needs extra work to prevent memory leaks or to cache more cleverly. A more clever dev can figure this out, I am sure.
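One partial mitigation for the memory build-up between runs (not part of this PR, just a sketch; the helper name `release_vram` is mine) is to force garbage collection and empty the CUDA allocator cache after each generation:

```python
import gc

# torch is optional here so the sketch also runs on CPU-only machines.
try:
    import torch
    HAVE_TORCH = True
except ImportError:
    HAVE_TORCH = False


def release_vram():
    """Best-effort cleanup between generator runs.

    This does not fix a true leak (live references keep their VRAM),
    but it releases cached allocator blocks that inflate usage.
    """
    gc.collect()
    if HAVE_TORCH and torch.cuda.is_available():
        torch.cuda.empty_cache()


# Call it between runs, e.g.:
#   image = pipe(...)
#   release_vram()
```

This won't stop the OOM if something actually holds references across runs, but it delays it on borderline setups.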
Level 2
Select this with 12 GB of VRAM or less.
This is the slower option. It uses
enable_sequential_cpu_offload()
VRAM usage during calculation: 15-17%
Speed: 2.35 s/it (2-3x slower on RTX 4070)
Pros: I can run this multiple times without any issues.
Cons: it still needs VRAM to load the models; around 10 GB for this.
Please note that these are hack-ish solutions, as these functions shouldn't be called more than once on a pipe, as far as I know.
So I had to add extra checking to detect whether they were already called (enable_sequential_cpu_offload()) or, where possible, to revert their effect (enable_model_cpu_offload()).
What has been changed:
Added: IPAttnProcessor2_0 class (attention_processor.py)
Added: vram_optimisation param to the generator node (InstantIDNode.py)
Added: 2 levels of VRAM optimisation to the pipeline (pipeline_stable_diffusion_xl_instantid.py)