
Offload

Offload is a method of moving the model, or parts of it, between GPU memory (VRAM) and system memory (RAM) in order to reduce the model's memory footprint and allow it to run on GPUs with less VRAM.
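
To make the mechanism concrete, here is a minimal plain-PyTorch sketch (not SD.Next internals): a module's weights live in system RAM and visit VRAM only for the duration of a forward pass.

```python
import torch
import torch.nn as nn

device = torch.device("cuda")
layer = nn.Linear(4096, 4096)          # weights start in system RAM

x = torch.randn(1, 4096, device=device)

layer.to(device)                       # move weights to VRAM just before use
with torch.no_grad():
    y = layer(x)
layer.to("cpu")                        # move weights back to system RAM
torch.cuda.empty_cache()               # release the cached VRAM allocation
```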

Automatic offload

Tip

Automatic offload mode is selected in Settings -> Diffusers -> Model offload mode

Balanced

  • Recommended for compatible high-VRAM GPUs
  • Faster, but requires a compatible platform and sufficient VRAM
  • Balanced offload moves parts of the model depending on a user-specified threshold,
    letting you control how much VRAM is used (see the sketch after this section)
  • The default memory threshold is 75% of the available GPU memory
    Configure the threshold in Settings -> Diffusers -> Max GPU memory for balanced offload mode in GB

Limitations: Not compatible with Optimum.Quanto qint quantization
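
SD.Next implements balanced offload internally; a loosely analogous sketch using Hugging Face accelerate's device-map utilities is shown below. The 6GiB cap is a made-up stand-in for the 75% threshold, and the model ID is just an example.

```python
import torch
from diffusers import UNet2DConditionModel
from accelerate import infer_auto_device_map, dispatch_model

# Example component; SD.Next applies its own logic across the whole pipeline.
unet = UNet2DConditionModel.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    subfolder="unet",
    torch_dtype=torch.float16,
)

# Cap GPU 0 at ~75% of an 8 GB card; layers that do not fit stay in system RAM.
device_map = infer_auto_device_map(unet, max_memory={0: "6GiB", "cpu": "24GiB"})
unet = dispatch_model(unet, device_map=device_map)
```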

Sequential

  • Recommended for low VRAM GPUs
  • Much slower, but allows running large models such as FLUX even on GPUs with 6 GB VRAM
    (see the sketch after this section)

Limitations: Not compatible with Quanto qint or BitsAndBytes nf4 quantization
Note: Using --lowvram automatically enables sequential offload
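
Sequential offload maps to diffusers' sequential CPU offload; a minimal sketch in plain diffusers (model ID and prompt are placeholders):

```python
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-schnell", torch_dtype=torch.bfloat16
)
# Weights stream to the GPU one submodule at a time and return to RAM
# immediately after use: very low VRAM usage, but much slower.
pipe.enable_sequential_cpu_offload()
image = pipe("a photo of a cat", num_inference_steps=4).images[0]
```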

Model

  • Higher compatibility than either balanced or sequential offload, but smaller memory savings
    (see the sketch after this section)

Limitations: N/A
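
Model offload corresponds to diffusers' per-component CPU offload; a minimal sketch in plain diffusers:

```python
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
)
# Whole components (text encoders, UNet, VAE) move to the GPU only while
# in use and back to system RAM afterwards.
pipe.enable_model_cpu_offload()
image = pipe("a photo of a cat").images[0]
```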

Manual Offload

In addition to the automatic offload methods mentioned above, SD.Next includes manual offload options; these are less granular and are only supported for specific models. A rough equivalent of the first option is sketched after the list.

  • Move base model to CPU when using refiner
  • Move base model to CPU when using VAE
  • Move refiner model to CPU when not in use
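
In the UI these are simple checkboxes; as a rough idea of what "Move base model to CPU when using refiner" does, here is a hand-rolled base+refiner sketch in plain diffusers (not SD.Next's actual code):

```python
import torch
from diffusers import StableDiffusionXLPipeline, StableDiffusionXLImg2ImgPipeline

base = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")
refiner = StableDiffusionXLImg2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-refiner-1.0", torch_dtype=torch.float16
)

prompt = "a photo of a cat"
latents = base(prompt, denoising_end=0.8, output_type="latent").images

# Move the base model to CPU to free VRAM before the refiner runs
base.to("cpu")
torch.cuda.empty_cache()

refiner.to("cuda")
image = refiner(prompt, denoising_start=0.8, image=latents).images[0]
```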