FLUX memory management improvements #6791

Merged (11 commits into main on Aug 29, 2024)

Conversation

@RyanJDick (Collaborator) commented Aug 28, 2024

Summary

This PR contains several improvements to memory management for FLUX workflows.

It is now possible to achieve better FLUX model caching performance, but this still requires users to manually configure their ram/vram settings. For example, a vram setting of 16.0 should allow all of the quantized FLUX models to be kept in memory on the GPU.
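As a rough illustration of the kind of manual configuration meant here, the cache budgets would typically be set in invokeai.yaml. Treat the snippet below as an assumption rather than the canonical format; exact key names and file layout can differ between InvokeAI versions.

```yaml
# invokeai.yaml (illustrative snippet only; key names and location may vary by version)
# Values are in GB.
ram: 16.0   # CPU-side model cache budget
vram: 16.0  # GPU-side model cache budget; ~16 GB should keep the quantized FLUX models resident
```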

Changes:

  • Check the size of a model on disk and free the requisite space in the model cache before loading it. (This behaviour existed previously, but was removed in https://github.com/invoke-ai/InvokeAI/pull/6072/files; the removal does not appear to have been intentional.) A minimal sketch of this pattern follows this list.
  • Removed the hack to free 24GB of space in the cache before loading the FLUX model.
  • Split the T5 embedding and CLIP embedding steps into separate functions so that the two models don't both have to be held in RAM at the same time (see the second sketch after this list).
  • Fix a bug in InvokeLinear8bitLt that was causing some tensors to be left on the GPU when the model was offloaded to the CPU. (This class is getting very messy due to the non-standard state_dict handling in bnb.nn.Linear8bitLt.)
  • Tidy up some dtype handling in FluxTextToImageInvocation to avoid situations where we hold references to two copies of the same tensor unnecessarily.
  • (minor) Misc cleanup of ModelCache: improve docs and remove unused vars.
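To make the first change above concrete, here is a minimal, self-contained sketch of the "measure the model on disk, then make room in the cache before loading" pattern. This is not the InvokeAI ModelCache API: ToyModelCache, size_on_disk, and the loader callable are hypothetical names used only for illustration.

```python
from pathlib import Path


def size_on_disk(model_path: Path) -> int:
    """Total size of a model file or directory, in bytes."""
    if model_path.is_file():
        return model_path.stat().st_size
    return sum(p.stat().st_size for p in model_path.rglob("*") if p.is_file())


class ToyModelCache:
    """Tiny cache that evicts entries until a new model of `needed` bytes fits."""

    def __init__(self, max_ram_bytes: int) -> None:
        self._max_ram_bytes = max_ram_bytes
        self._entries: dict[str, tuple[object, int]] = {}  # key -> (model, size in bytes)

    def _used_bytes(self) -> int:
        return sum(size for _, size in self._entries.values())

    def make_room(self, needed: int) -> None:
        # Evict in insertion order (a stand-in for a real LRU policy) until the
        # new model fits under the RAM budget.
        for key in list(self._entries):
            if self._used_bytes() + needed <= self._max_ram_bytes:
                break
            del self._entries[key]

    def load(self, key: str, model_path: Path, loader) -> object:
        if key in self._entries:
            return self._entries[key][0]
        needed = size_on_disk(model_path)
        self.make_room(needed)  # free space *before* reading the weights into RAM
        model = loader(model_path)
        self._entries[key] = (model, needed)
        return model
```

The T5/CLIP split described above amounts to loading the two text encoders one after the other and releasing the first before loading the second. A hedged sketch of that pattern follows; load_t5 and load_clip are hypothetical callables returning (tokenizer, encoder) pairs, and the Hugging Face-style attribute names are assumptions, not InvokeAI code.

```python
import gc

import torch


def encode_prompt_sequentially(prompt: str, load_t5, load_clip):
    """Encode `prompt` with T5 and CLIP without holding both models in RAM at once."""
    # Encode with T5, then drop every reference to it before touching CLIP.
    tokenizer, t5 = load_t5()
    with torch.no_grad():
        t5_ids = tokenizer(prompt, return_tensors="pt").input_ids
        t5_embeds = t5(t5_ids).last_hidden_state
    del t5, tokenizer
    gc.collect()
    torch.cuda.empty_cache()  # harmless no-op when CUDA is not in use

    # Only now load CLIP, so the two encoders are never resident together.
    tokenizer, clip = load_clip()
    with torch.no_grad():
        clip_ids = tokenizer(prompt, return_tensors="pt").input_ids
        pooled_embeds = clip(clip_ids).pooler_output
    del clip, tokenizer
    gc.collect()

    return t5_embeds, pooled_embeds
```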

Future:
We should revisit our default ram/vram configs. The current defaults are very conservative, and users could see major performance improvements from tuning these values.

QA Instructions

I tested the FLUX workflow with the following configurations and verified that the cache hit rates and memory usage matched the expected behaviour:

  • ram = 16 and vram = 16
  • ram = 16 and vram = 1
  • ram = 1 and vram = 1

Note that the changes in this PR are not isolated to FLUX. Since we now check the size of models on disk, we may see slight changes in model cache offload patterns for other models as well.

Checklist

  • The PR has a short but descriptive title, suitable for a changelog
  • Tests added / updated (if applicable)
  • Documentation added / updated (if applicable)

The github-actions bot added the python (PRs that change python files), invocations (PRs that change invocations), and backend (PRs that change backend files) labels on Aug 28, 2024.
@brandonrising (Collaborator) left a comment:

Lgtm

@RyanJDick force-pushed the ryan/flux-model-cache-improvements branch from 42992cc to 4e4b6c6 on August 29, 2024 at 19:08.
@RyanJDick merged commit 87261bd into main on Aug 29, 2024 (14 checks passed).
@RyanJDick deleted the ryan/flux-model-cache-improvements branch on August 29, 2024 at 19:17.