
Not freeing RAM when changing between checkpoints #2180

Closed
FerrahWolfeh opened this issue Oct 10, 2022 · 18 comments
Labels
bug-report Report of a bug, yet to be confirmed

Comments

@FerrahWolfeh

Describe the bug
When you start the webui with checkpoint X, it fills system RAM to a certain amount. If you then change the checkpoint to Y in the webui, RAM usage increases as if both models were loaded. If you change back to checkpoint X, RAM usage remains unusually high and the system begins to swap violently as soon as you start generating images.

To Reproduce
Steps to reproduce the behavior:

  1. Start webui.sh with any model (e.g. Waifu-Diffusion 1.3) and measure system RAM once startup finishes
  2. Use the selector at the top of the page to change to another model (e.g. Stable-Diffusion 1.4)
  3. Change back to the first model and check RAM again.

Expected behavior
As soon as you switch checkpoints, the program should free most of the memory used by the currently loaded model and fill it with the newly selected model. When you switch back to the first checkpoint, the memory should again be freed, returning to roughly the same level the program had when it was freshly started.
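To make those levels easy to compare, here is a minimal measurement sketch (assuming psutil is installed; the PID is just a placeholder) for sampling the webui process's resident memory around each switch:

```python
# Minimal sketch, assuming psutil is installed: sample the webui process's
# resident set size so checkpoint switches can be compared numerically.
import psutil

def rss_gib(pid: int) -> float:
    """Resident memory of the given process, in GiB."""
    return psutil.Process(pid).memory_info().rss / 1024**3

# Replace 12345 with the webui PID; call this before the switch, after
# switching to model Y, and after switching back to model X.
print(f"webui RSS: {rss_gib(12345):.2f} GiB")
```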

Screenshots
Here are some screenshots of the memory usage of my system (notice the used and available columns)

[Screenshot] Base system usage (only firefox open with some active youtube tabs)

[Screenshot] Usage right after initialization

[Screenshot] Usage after switching to another checkpoint

[Screenshot] Usage after switching back to first model

Desktop (please complete the following information):

  • OS: Arch Linux 5.19.13
  • Browser: Firefox
  • Commit revision ce37fdd

Additional context
This is most visible on a system that doesn't have much RAM to begin with (16 GB in my case), and the effects are visible even without generating anything. It gets worse if you start switching checkpoints between generations.

@FerrahWolfeh FerrahWolfeh added the bug-report Report of a bug, yet to be confirmed label Oct 10, 2022
@bmaltais

bmaltais commented Oct 10, 2022

This probably explains why loading a new model usually crashes the webui after 7 or 8 model swaps on my system with 16 GB of RAM allocated to WSL2.

@CoffeeMomoPad

Experiencing the same thing on 16 GB of RAM; it never happened before until now.

@TechOtakupoi233

When loading a new ckpt, the program starts loading it but leaves the old one in VRAM and RAM. I have only 6 GB of VRAM, which can't hold two models at once. It would be nice if the program freed up VRAM and RAM BEFORE loading a new ckpt, roughly along the lines of the sketch below.
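As a hypothetical illustration only (placeholder names, not the webui's actual functions), the general pattern would be:

```python
# Hypothetical sketch of releasing the current checkpoint before loading the
# next one; load_checkpoint() is a placeholder, not a real webui function.
import gc
import torch

def swap_checkpoint(old_model, ckpt_path, device="cuda"):
    old_model.to("cpu")       # get the weights off the GPU first
    del old_model             # drop our reference to the old model
    gc.collect()              # only frees it if nothing else still references it
    torch.cuda.empty_cache()  # hand cached VRAM blocks back to the driver

    new_model = load_checkpoint(ckpt_path)  # placeholder loader
    return new_model.to(device)
```

This only helps if nothing else in the program keeps a reference to the old model, which is exactly what later comments in this thread suggest is going wrong.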

@nerdyrodent

Same here. Switching models uses more and more RAM. I've tried changing "Checkpoints to cache in RAM", but it appears to make no difference.

@anonymous721

It's causing me a lot of annoyance too. 15-20 minutes testing a couple different models, and I'm already over 20GB of system RAM used.

@RandomLegend

This is a serious issue for me now.
I have 16 GB of RAM and I never had any issues switching between models before.

I ran an old version of the webui perfectly fine, upgraded to the newest git because of new features, and now I cannot even swap a model once. It will just crash violently.

@GeorgiaM-honestly

Hello,

I'm trying to replicate this by using, as your screenshots show, 15 GiB of RAM. My setup differs: the bare-metal OS is Gentoo Linux, then I'm using a QEMU VM with Devuan (Debian without systemd), and within that, the webui (auto) running inside a Docker container. Hopefully that added complexity doesn't screw my testing up.

And yes, you are seeing that correctly: I don't have swap. I didn't bother, because this stuff lives on a host that has 128 GB of RAM and I can just dial in whatever I want to give to the VM.

The formatting here is getting completely hosed, I'm not sure what is going on, sorry about that.

After the initial start and before visiting the UI, which here loads the standard 1.5 model (v1-5-pruned-emaonly.ckpt | 81761151):

GiB:

              total        used        free      shared  buff/cache   available
Mem:             14           5           4           0           4           8
Swap:             0           0           0

After visiting the UI and switching the model to the standard v1.4 ( 7460a6fa ):

GiB:

              total        used        free      shared  buff/cache   available
Mem:             14           8           1           0           4           5
Swap:             0           0           0

After switching back to the standard 1.5 model ( v1-5-pruned-emaonly.ckpt | 81761151 ):

GiB:

              total        used        free      shared  buff/cache   available
Mem:             14           8           1           0           4           5
Swap:             0           0           0

As such, I am not able to replicate this. Please let me know if I missed something, or if you'd like me to try something else! You can also look into zram / compressed RAM on Linux; it is a handy and tunable set of options that begins to compress the oldest RAM contents (gently, and more heavily if resources continue to run out) with the goal of delaying when the very slow swap space is used.

@0xdevalias

0xdevalias commented Nov 9, 2022

The formatting here is getting completely hosed, I'm not sure what is going on, sorry about that.

@GeorgiaM-honestly Have you wrapped it in triple backticks to make it a code block? (```)


Random thought/musing, and I'm not sure if this actually relates to how things are done in the code at all, but is the model ckpt hash used for caching it anywhere (or was it at some point in the past)? I know there are some other issues here (can't remember the links off the top of my head; see link below) that were talking about different model ckpts that had the same hash even though they were different files. I'm wondering if switching back and forth between models with that 'hash clash' might somehow be causing this memory leak.
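For context on why a clash is even possible: if I remember right, the short hash is computed from only a small slice of the checkpoint file, roughly like the sketch below (paraphrased from memory, not the exact webui code):

```python
# Rough, from-memory paraphrase of the old short-hash scheme (not the exact
# webui code): only a 64 KiB slice of the file is hashed, so two different
# checkpoints can plausibly end up with the same 8-character hash.
import hashlib

def short_model_hash(filename: str) -> str:
    with open(filename, "rb") as f:
        f.seek(0x100000)         # skip the first 1 MiB
        chunk = f.read(0x10000)  # hash only the next 64 KiB
    return hashlib.sha256(chunk).hexdigest()[:8]
```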

Edit:

This one:


I ran an old version of the webui perfectly fine, upgraded to the newest git because of new features, and now I cannot even swap a model once. It will just crash violently.

@RandomLegend Does this happen when swapping between any models at all? Are you able to provide the model hashes for some of the models that cause it to crash? Do they happen to have the same hash, as per my theory above, by chance?


Also, this is a separate issue, but I saw it linked here, and wanted to backlink to it in case it's relevant:

And this one may also be related:

Changing to an inpainting model calls load_model() and creates a new model, but the previous model is not removed from memory; even calling gc.collect() does not remove the old model from memory.

So if you keep changing from inpainting to non-inpainting or vice versa, the leak keeps increasing.

Originally posted by @jn-jairo in #3449 (comment)

The fact that gc.collect() doesn't clear the old model is interesting however. This means that something is keeping a pointer to the old model alive and preventing it from being cleaned up.

Originally posted by @random-thoughtss in #3449 (comment)

Just to report the progress I made: it is indeed a reference problem. Some places keep a reference to the model, which prevents the garbage collector from freeing the memory.

I am checking it with ctypes.c_long.from_address(id(shared.sd_model)).value and there are multiple references.

I am eliminating the references, but there are still some left to find; it will take a while to find everything.

Originally posted by @jn-jairo in #3449 (comment)
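For anyone else digging into this, the two checks described in the quotes above can be combined into a small debugging snippet (standard ctypes and gc calls; shared.sd_model is the webui's global model reference mentioned above):

```python
# Debugging helpers for the reference problem described above: read CPython's
# internal refcount for the loaded model and list the objects still holding it.
import ctypes
import gc

def refcount(obj) -> int:
    # Same trick as quoted above: read ob_refcnt straight from the object header.
    return ctypes.c_long.from_address(id(obj)).value

def who_holds(obj):
    # Frames, dicts and lists that still reference obj show up here; anything
    # unexpected is a candidate for the leak.
    return gc.get_referrers(obj)

# Inside the webui process, for example:
#   refcount(shared.sd_model)
#   who_holds(shared.sd_model)
```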

@0xdevalias

0xdevalias commented Nov 9, 2022

Looking at the 'references timeline' on #3449 also pointed me to this PR by @jn-jairo that was merged ~9 days ago:

@GeorgiaM-honestly I wonder if that's why you can't replicate the issues here anymore?

@RandomLegend have you updated to a version of the code that has that fix merged, and if so, are you still seeing issues despite it?

@0xdevalias

@0xdevalias when I observed and reported this issue I was on the latest code, yes.

However, I just completely wiped the installation, including the venv and the repos, and reinstalled from scratch. That fixed it. I assume it was some incompatibility with old stuff lying around that wasn't cleared in recent commits.

Originally posted by @RandomLegend in #2264 (comment)

@tzwel

tzwel commented Dec 5, 2022

how to downgrade?

@clementine-of-whitewind

please fix

@Coderx7

Coderx7 commented Jul 30, 2023

I'm having the same issue on the latest commit. I never had this issue before and it just popped up out of nowhere!
I'm on Ubuntu 22.04 with 32 GB of RAM (and no swap), and:

Python revision: 3.9.7 (default, Sep 16 2021, 13:09:58) 
[GCC 7.5.0]
Dreambooth revision: 9f4d931a319056c537d24669cb950d146d1537b0
SD-WebUI revision: 68f336bd994bed5442ad95bad6b6ad5564a5409a

Checking Dreambooth requirements...
[+] bitsandbytes version 0.35.0 installed.
[+] diffusers version 0.10.2 installed.
[+] transformers version 4.25.1 installed.
[+] xformers version 0.0.16rc425 installed.
[+] torch version 1.13.1+cu117 installed.
[+] torchvision version 0.14.1+cu117 installed.

Side note: I did install google-perftools and then removed it, thinking it might have something to do with it. Nothing changed.

@catboxanon
Collaborator

catboxanon commented Aug 7, 2023

The dev branch and the upcoming 1.6.0 may have resolved this with the rework in b235022. I'm going to leave this open for the time being, but those who would like to test it earlier can switch to the dev branch to do so.

@Avsynthe

Hey all. I'm having this issue also. I'm using 1.6.0 and it never releases RAM. The more I generate, the higher it goes.

The server went down today and I couldn't figure out why the last snapshot of the system showed 99% of 64 GB of memory used. I realised SD is just compounding away. This happens no matter what model I use, with VAE models increasing it more quickly for obvious reasons. Switching models makes no difference; it just continues on.

I've had to limit SD to 20 GB of RAM, so it'll eventually crash when it hits that.

@Wynneve

Wynneve commented Oct 14, 2023

@Avsynthe Hello there! I've been having the same issue all day and it seems like I found a “solution”.
I've tried switching some settings in the webui, changing my CUDA toolkit version in the PATH, changing the CUDA version of PyTorch, updating to the “dev” branch of the webui, etc. Nothing worked.

Then I realized that I had updated PyTorch before this problem appeared, so I tried downgrading to PyTorch 2.0.1. And it worked! No more memory leak; now it properly offloads the weights from RAM to VRAM and vice versa each generation.

For your convenience, here is the command for installing that previous version of PyTorch:
pip3 install torch==2.0.1 torchvision --index-url https://download.pytorch.org/whl/cu118
As I remember, I deleted it before reinstalling, so if it refuses to downgrade, you can manually remove it before executing the command:
pip3 uninstall torch torchvision
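To confirm which build is actually active afterwards, a quick sanity check using standard PyTorch attributes:

```python
# Quick sanity check after the downgrade; both attributes are standard PyTorch.
import torch
print(torch.__version__)   # expect something like '2.0.1+cu118'
print(torch.version.cuda)  # CUDA version the installed wheel was built against
```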

Seems like it's more an issue with the new PyTorch itself, something related to moving tensors between devices.

If you aren't using Torch 2.1.0, well, my sincere apologies for not helping you :(

@DanielXu123

@Avsynthe Same thing on Linux; it added up to 100 GB of RAM. Are there any possible solutions?

@DanielXu123

@Wynneve I reinstalled torch, going from 2.1.0 to 2.0.1, but now it shows that my xformers cannot be activated correctly. Could you please check what your xformers version is?
