
[Bug]: Reusing model to load the next leads to different image with same seed (Lora functionality related?) #13516

Open
yoyoinneverland opened this issue Oct 5, 2023 · 33 comments
Labels
bug-report Report of a bug, yet to be confirmed

Comments

@yoyoinneverland

yoyoinneverland commented Oct 5, 2023

Is there an existing issue for this?

  • I have searched the existing issues and checked the recent builds/commits

What happened?

The same seed, the same venv, the same device, and the same driver generate two different images depending on which model is loaded first, because loaded models are reused to load the next one. This tends to happen only with the first model loaded: going from Model B to Model A to Model C and back to Model A gives the same result as going from B to A only. Although this behaviour reproduces about 95% of the time, there are some situations in which hopping between many models does have an effect, but it's hard to pinpoint when.

Steps to reproduce the problem

For this example, we'll use two different models, Model A and Model B.

  1. Launch A1111 with Model A
  2. Generate image [IMG A] (attachment: 01278-EMP)
  3. Switch to Model B
  4. Restart A1111
  5. Model B is now loaded first
  6. Switch to Model A
  7. Attempt to replicate [IMG A]
  8. [IMG B] is generated instead (attachment: 01283-EMP)

What should have happened?

Generation shouldn't differ. Regardless of which model is loaded first, the output for a given seed and settings should stay the same.

Sysinfo

sysinfo-2023-10-05-21-03.txt

What browsers do you use to access the UI?

Mozilla Firefox

Console logs

venv "D:\SD\stable-diffusion-webui\venv\Scripts\Python.exe"
Python 3.10.6 (tags/v3.10.6:9c7b4bd, Aug  1 2022, 21:53:49) [MSC v.1932 64 bit (AMD64)]
Version: v1.6.0
Commit hash: 5ef669de080814067961f28357256e8fe27544f4
Launching Web UI with arguments: --xformers --no-half-vae
Tag Autocomplete: Could not locate model-keyword extension, Lora trigger word completion will be limited to those added through the extra networks menu.
[-] ADetailer initialized. version: 23.9.3, num models: 9
2023-10-05 14:11:24,546 - ControlNet - INFO - ControlNet v1.1.410
ControlNet preprocessor location: D:\SD\stable-diffusion-webui\extensions\sd-webui-controlnet\annotator\downloads
2023-10-05 14:11:24,646 - ControlNet - INFO - ControlNet v1.1.410
Loading weights [2fcdee6e9c] from D:\SD\stable-diffusion-webui\models\Stable-diffusion\based66_v30.safetensors
Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
Creating model from config: D:\SD\stable-diffusion-webui\configs\v1-inference.yaml
Startup time: 10.3s (prepare environment: 2.1s, import torch: 2.8s, import gradio: 0.7s, setup paths: 0.6s, initialize shared: 0.2s, other imports: 0.5s, setup codeformer: 0.1s, list SD models: 0.2s, load scripts: 2.1s, create ui: 0.5s, gradio launch: 0.4s).
Loading VAE weights specified in settings: D:\SD\stable-diffusion-webui\models\VAE\klF8Anime2.safetensors
Applying attention optimization: xformers... done.
Model loaded in 4.7s (load weights from disk: 0.8s, create model: 1.0s, apply weights to model: 0.8s, apply dtype to VAE: 0.9s, load VAE: 0.2s, calculate empty prompt: 1.0s).
Reusing loaded model based66_v30.safetensors [2fcdee6e9c] to load AnythingV5Ink_v5PrtRE.safetensors [7f96a1a9ca]
Loading weights [7f96a1a9ca] from D:\SD\stable-diffusion-webui\models\Stable-diffusion\AnythingV5Ink_v5PrtRE.safetensors
Loading VAE weights specified in settings: D:\SD\stable-diffusion-webui\models\VAE\klF8Anime2.safetensors
Applying attention optimization: xformers... done.
Weights loaded in 1.8s (send model to cpu: 0.7s, load weights from disk: 0.2s, apply weights to model: 0.3s, load VAE: 0.1s, move model to device: 0.5s).
Loading VAE weights specified in settings: D:\SD\stable-diffusion-webui\models\VAE\vaeFtMse840000.safetensors
Applying attention optimization: xformers... done.
VAE weights loaded.
100%|██████████████████████████████████████████████████████████████████████████████████| 30/30 [00:03<00:00,  7.79it/s]
100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [00:02<00:00,  7.47it/s]
Loading VAE weights specified in settings: D:\SD\stable-diffusion-webui\models\VAE\klF8Anime2.safetensors00,  7.51it/s]
Applying attention optimization: xformers... done.
VAE weights loaded.
Total progress: 100%|██████████████████████████████████████████████████████████████████| 50/50 [00:12<00:00,  3.95it/s]
Total progress: 100%|██████████████████████████████████████████████████████████████████| 50/50 [00:12<00:00,  7.51it/s]

Additional information

This can be worked around by increasing the number of models that can be kept loaded at the same time, although that's not optimal, and only so many fit into RAM before you run out... so as long as you avoid reusing a loaded model to load another, you should be fine, but alas...
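
For reference, the knob in question is the checkpoint limit referred to later in this thread as shared.opts.sd_checkpoints_limit; it is exposed on the webui's Settings page and, on a default install, persisted to config.json. Treat the exact location and the values below as an assumption about your setup, not a prescription:

    {
        "sd_checkpoints_limit": 3,
        "sd_checkpoints_keep_in_cpu": true
    }

Raising the limit keeps more checkpoints resident in RAM, which is the workaround described above, at the cost of memory.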

(This was also replicated in the latest build of A1111)

@yoyoinneverland yoyoinneverland added the bug-report Report of a bug, yet to be confirmed label Oct 5, 2023
@yoyoinneverland
Author

yoyoinneverland commented Oct 5, 2023

Got rid of

    # in sd_models.py: the branch that recycles an already-loaded model object
    # as the base for the next checkpoint instead of loading it from scratch
    elif len(model_data.loaded_sd_models) > 0:
        sd_model = model_data.loaded_sd_models.pop()
        model_data.sd_model = sd_model

        # keep the sd_vae module's state in sync with the reused model object
        sd_vae.base_vae = getattr(sd_model, "base_vae", None)
        sd_vae.loaded_vae_file = getattr(sd_model, "loaded_vae_file", None)
        sd_vae.checkpoint_info = sd_model.sd_checkpoint_info

        print(f"Reusing loaded model {sd_model.sd_checkpoint_info.title} to load {checkpoint_info.title}")
        return sd_model

in sd_models.py, so it doesn't reuse the model anymore. The original problem no longer exists, but this introduces a new issue when trying to load a model that had already been loaded before within the same instance.
--Edit 1:
After more testing, it's best to set sd_checkpoints_limit to the highest value your RAM can handle and revert the aforementioned change, so you have some leeway until the problem comes back.
--Edit 2:
I've forced sd-webui not to take anything from the older model. This creates an error, because it still tries to bring some data over to the next model, but simply retrying will load the next model anew. While this is a nicer fix, I still don't know why the models are behaving like this.

For anyone willing to try it out, this is what I changed (around line 673 of sd_models.py; the condition used to check the limit of checkpoints allowed in RAM, and I skipped that by forcing it to compare against 0 instead):

    if len(model_data.loaded_sd_models) > 0:
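
For a rough before/after (the "before" line here is an assumption based on the limit check AG-w quotes later in this thread; the exact line at 673 may differ in your build):

    # before (assumed): take this path only when more models are loaded than
    # sd_checkpoints_limit allows, and only if a limit is set at all
    if len(model_data.loaded_sd_models) > shared.opts.sd_checkpoints_limit > 0:
        ...

    # after (Edit 2): ignore the limit and take this path whenever any model
    # is already loaded
    if len(model_data.loaded_sd_models) > 0:
        ...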

--Edit 3:
After some more time reading the code and following the error introduced by Edit 2, I've managed to effectively neuter sd-webui's attempts to move data over to the next model:

    # when enabled, this moves the current model's weights to system RAM (CPU)
    if shared.opts.sd_checkpoints_keep_in_cpu:
        send_model_to_cpu(sd_model)
        timer.record("send model to cpu")

By removing this block in sd_models.py, it no longer tries to move data around, which was leading to bad seeds and bad gens. I know this was coded for the sake of optimizing the loader, but it has brought me nothing but issues. I also reproduced the bug on a fresh installation of Win10 with an older NVIDIA driver.

I'll keep an eye out for errors, but I think this is it; the loader should no longer bother anyone, or at least not me.

@altoiddealer

I personally just encountered this issue now (I've been using A1111 every day), even though my daily git pull has said 'already up to date' for weeks.

Which is interesting, because this issue was only opened 16 hours ago.

@AndreyRGW
Contributor

> I personally just encountered this issue now (I've been using A1111 every day), even though my daily git pull has said 'already up to date' for weeks.
>
> Which is interesting, because this issue was only opened 16 hours ago.

I may have a similar bug too, #13473

@yoyoinneverland
Author

yoyoinneverland commented Oct 6, 2023

> I personally just encountered this issue now (I've been using A1111 every day), even though my daily git pull has said 'already up to date' for weeks.
>
> Which is interesting, because this issue was only opened 16 hours ago.

I believe this was introduced with #12227, around two months ago. People will only notice if they try to replicate generations; otherwise they never will. There are also models that don't affect the outcome that much, so chances are you used a model that didn't change your gens in a way you'd notice.

I believe Orangemix2 -> AnythingV5 had a smaller impact of around 10% on the generated image, while Based66 -> AnythingV5 can change the gen entirely. So I guess it mainly involves models that differ too much from each other. Still, the webui shouldn't do a soft merge of them.

By the way, despite the loader being optimized for going from one model to another, I've found that disabling the transfer of any data between the two actually speeds up model loading, to the point that I now wait about 2s compared to 5s, without any side effects (so far).

I fear, however, that this is plaguing recent model previews on Civitai, because people are unaware this is corrupting their gens and are therefore uploading imagery that won't be easily replicated unless the webui was launched with that model specifically.

@Enferlain

Enferlain commented Oct 6, 2023

For me it happens even when I don't change models; the only thing that happened is that I quit A1111 and restarted it (maybe in a new console/venv activation, not sure, because sometimes it doesn't change). But it can also change when I don't restart it at all, unless I'm going schizo.

> People will only notice if they try to replicate generations

Yeah, I always save good gens to run hires fix on later, and it's been really annoying not being able to replicate the same result.

@yoyoinneverland
Author

> For me it happens even when I don't change models; the only thing that happened is that I quit A1111 and restarted it (maybe in a new console/venv activation, not sure, because sometimes it doesn't change). But it can also change when I don't restart it at all, unless I'm going schizo.
>
> > People will only notice if they try to replicate generations
>
> Yeah, I always save good gens to run hires fix on later, and it's been really annoying not being able to replicate the same result.

Switching lora/lycoris weights used to corrupt them until I changed models. It happens less often nowadays, but it still happens sometimes. Maybe that's related to your issue. Either way, I've also seen these changes happen randomly.

@Enferlain

> Switching lora/lycoris weights used to corrupt them until I changed models. It happens less often nowadays, but it still happens sometimes. Maybe that's related to your issue. Either way, I've also seen these changes happen randomly.

Dunno about switching. I know that when I used xyz with the addnet extension, the models kept bleeding into each other when it switched between them, so that might be related, but that doesn't happen with the built-in extra networks, at least. I also don't switch around the lora in the prompt when this happens; the prompt stays identical, and the only thing that changes is A1111 being restarted.

@yoyoinneverland
Author

yoyoinneverland commented Oct 6, 2023

> Dunno about switching. I know that when I used xyz with the addnet extension, the models kept bleeding into each other when it switched between them, so that might be related, but that doesn't happen with the built-in extra networks, at least. I also don't switch around the lora in the prompt when this happens; the prompt stays identical, and the only thing that changes is A1111 being restarted.

How long does it take you to reproduce this?

@Enferlain

I'll give it a go in a bit; I'm not at my PC now. I'll try just restarting, restarting the venv, and also changing models, see if there is a difference in what happens between each of these, and edit this msg.

@yoyoinneverland
Author

yoyoinneverland commented Oct 7, 2023

An update on this: gens remain the same after editing out those bits of code. It seems good to go, although it's not the best fix. You can look at my fork and grab the sd_models.py if you want to try it yourself.

Edit:
False alarm about the last post; the model required extra files, which led to bad gens.

@thihamin

thihamin commented Oct 8, 2023

@yoyoinneverland are you saying that your commit is still a valid fix to get the same result when doing high res fix? master...yoyoinneverland:stable-diffusion-webui-nomod:master

@yoyoinneverland
Author

> @yoyoinneverland are you saying that your commit is still a valid fix to get the same result when doing high res fix? master...yoyoinneverland:stable-diffusion-webui-nomod:master

When using your average model fetched from Civitai, yeah.

@yoyoinneverland
Author

Update:
After more research, I've discovered something. It's not the models themselves that are leaking into each other, but rather the loras. Here's a basic picture depicting what I mean:
(attached image: loraexample)
As the picture shows, generations that do not use loras show no difference at all after swapping, whereas images that use loras do.
This needs more testing, but by emptying the model container we're also removing the loras attached to it, although sometimes loras can remain attached even so. Something is corrupting loras to some extent after swapping checkpoints.

If you're not using loras, this should not affect you.

@yoyoinneverland yoyoinneverland changed the title [Bug]: Reusing model to load the next leads to different image with same seed [Bug]: Reusing model to load the next leads to different image with same seed (Lora functionality related?) Oct 8, 2023
@AndreyRGW
Contributor

> Update:
> After more research, I've discovered something. It's not the models themselves that are leaking into each other, but rather the loras. Here's a basic picture depicting what I mean:
> (attached image: loraexample)
> As the picture shows, generations that do not use loras show no difference at all after swapping, whereas images that use loras do.
> This needs more testing, but by emptying the model container we're also removing the loras attached to it, although sometimes loras can remain attached even so. Something is corrupting loras to some extent after swapping checkpoints.
>
> If you're not using loras, this should not affect you.

So that's why I could generate images in which I didn't use a lora exactly the same as the first time....

@yoyoinneverland
Author

yoyoinneverland commented Oct 8, 2023

> So that's why I could generate images in which I didn't use a lora exactly the same as the first time....

That seems to be the case... so for now, anyone using loras and swapping models without a worry is going to face the harsh reality of corrupted gens when they try to reproduce a past gen.

A surefire way not to corrupt loras would be to set the checkpoint limit to something like 10. Otherwise you risk doing it again.

Now that I think about it, this has also happened when changing the weights of the then LoCon/lycoris networks that required an extension to use, within the same checkpoint. So it could be that this error was introduced when LoCon functionality was merged into A1111.

@Enferlain

Enferlain commented Oct 8, 2023

I noticed a few weeks ago while using the additional networks extension with loras for xyz between models, that all the images looked like the first model bled into the other 10 models next to it. You're saying that's related to what's happening here? By the way, I tested the modified sd_models.py, but I still wasn't able to replicate stuff I saved while doing xyzs from a few days ago.

I'll try to check stuff where I wasn't using a lora and see if those results changed

@yoyoinneverland
Author

yoyoinneverland commented Oct 8, 2023

> I noticed a few weeks ago while using the additional networks extension with loras for xyz between models, that all the images looked like the first model bled into the other 10 models next to it. You're saying that's related to what's happening here? By the way, I tested the modified sd_models.py, but I still wasn't able to replicate stuff I saved while doing xyzs from a few days ago.

Well, it seems to be triggered by using Lora/Lycoris/LoCon.
About replicating with the modified sd_models.py, I ran into this issue too, but most of the gens I made with the modified file still work; some don't, however. It still needs more research.

I'll be running more tests in the meantime.

-----EDIT 1

I had an idea and played with the weights of a gen I couldn't reproduce anymore. Lo and behold, decreasing the lora weights by 0.001 to 0.02 took me closer to the original gen. It does seem that the weights effectively increase with corrupted gens.

If any of you still have access to the loras and the prompt, do try decreasing the strength of some of the loras by 0.001 to 0.02.
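
As a concrete illustration with the webui's built-in extra-networks syntax (the lora name below is just a placeholder), this means nudging the number after the second colon down slightly:

    <lora:exampleLora:1>        original strength
    <lora:exampleLora:0.999>    strength lowered by 0.001, as suggested above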

@mariaWitch

mariaWitch commented Oct 9, 2023

> I noticed a few weeks ago while using the additional networks extension with loras for xyz between models, that all the images looked like the first model bled into the other 10 models next to it. You're saying that's related to what's happening here? By the way, I tested the modified sd_models.py, but I still wasn't able to replicate stuff I saved while doing xyzs from a few days ago.
>
> I'll try to check stuff where I wasn't using a lora and see if those results changed

So given the nature of the issue, any images you generated after switching models should be irreproducible on the fixed version, since in theory fixing the issue would cause those images to be different. You may want to try reproducing images that you made on v1.5.x instead, and that might be a better way to test.

@yoyoinneverland
Author

yoyoinneverland commented Oct 9, 2023

> So given the nature of the issue, any images you generated after switching models should be irreproducible on the fixed version, since in theory fixing the issue would cause those images to be different. You may want to try reproducing images that you made on v1.5.x instead, and that might be a better way to test.

Some gens can be reproduced to some extent if you still remember the order you swapped models in. It helps to check the outputs right before the gen you wish to reproduce and look at the date and time. It's not always 100% the same, but it does come close. So in this case it might be best to use the unedited file so you can trigger the bug on purpose.

@AG-w

AG-w commented Oct 11, 2023

I just ran into the same problem, so what is the actual solution or workaround to prevent it from happening again?

Can it be fixed by simply force-unloading the model, like this pull?
master...yoyoinneverland:stable-diffusion-webui-nomod:master

@yoyoinneverland
Author

yoyoinneverland commented Oct 11, 2023

> I just ran into the same problem, so what is the actual solution or workaround to prevent it from happening again?
>
> Can it be fixed by simply force-unloading the model, like this pull? master...yoyoinneverland:stable-diffusion-webui-nomod:master

Yes, but it can introduce problems with niche and obscure models. I haven't tried merging models with it, so I recommend using it only for loading and generating with SD 1.5 models. If needed, I'll look into the XL loader too.
An alternative is to increase the number of models kept in RAM, in the webui settings.

Both work.

I'd like to add that someone submitted a fix for an unrelated issue on the dev branch, so it might be best to use the alternative fix for the time being and then update to that when it's available.

@AG-w

AG-w commented Oct 13, 2023

If you want to actually fix it, do you need to fix the lora code or remove the "reusing model" function?
I'm not sure how it causes a problem if you unload the model and then load the new one fresh.

@yoyoinneverland
Author

> If you want to actually fix it, do you need to fix the lora code or remove the "reusing model" function? I'm not sure how it causes a problem if you unload the model and then load the new one fresh.

Well, you'd have to bring back the old way models were loaded two months ago. I think that would fix it for sure.

@Enferlain

I think the modified sd_models.py made it so that when I switch models, the previous one doesn't get unloaded from VRAM.

@AG-w

AG-w commented Oct 17, 2023

I'm not sure how that modified sd_models.py keeps the model from unloading.

My change is a little different from the one above: I replaced ">" with ">=", like this:

    if len(model_data.loaded_sd_models) >= shared.opts.sd_checkpoints_limit > 0:
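
Written out, that chained comparison checks both relations at once; a small sketch of what the condition means (count and limit are just local names for the sketch, not variables from sd_models.py):

    count = len(model_data.loaded_sd_models)
    limit = shared.opts.sd_checkpoints_limit
    # take this path as soon as the number of loaded models reaches the limit,
    # and only when a limit is actually configured (limit > 0)
    if count >= limit and limit > 0:
        ...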

@yoyoinneverland
Author

yoyoinneverland commented Oct 21, 2023 via email

@psykhosisis

> Well, it seems to be triggered by using Lora/Lycoris/LoCon.
> About replicating with the modified sd_models.py, I ran into this issue too, but most of the gens I made with the modified file still work; some don't, however. It still needs more research.
>
> I'll be running more tests in the meantime.
>
> -----EDIT 1
>
> I had an idea and played with the weights of a gen I couldn't reproduce anymore. Lo and behold, decreasing the lora weights by 0.001 to 0.02 took me closer to the original gen. It does seem that the weights effectively increase with corrupted gens.
>
> If any of you still have access to the loras and the prompt, do try decreasing the strength of some of the loras by 0.001 to 0.02.

Confirming with evidence: I was attempting to recreate this generation here to play with hires fix etc. on it.

(attached image: 2022103003274015252- capabilityXL_v10 -DPM++ 3M SDE Exponential-4 0-87-1368-768-999666338)

Identical parameters, lora weight, etc. produced the next image instead (the hands) :(

(attached image: 2022103003274015282- capabilityXL_v10 -DPM++ 3M SDE Exponential-4-87-1368-768-999666338)

However, dropping my lora weight from (1.0) to (0.999) rectifies it :)

(attached image: 2022103003274015283- capabilityXL_v10 -DPM++ 3M SDE Exponential-4-87-1368-768-999666338)

I have had wild fluctuations in image reproduction over the last few months; this micro-lowering of the lora weights did not help (nor did anything else).

The most recent NVIDIA driver update (537.58) seemed to eliminate this for me, until I noticed this tonight.

Hope this can shed light for one of you and help remove this toxic bug <3 :)

@yoyoinneverland
Author

Hello Psykhosisis. You mentioned the last driver eliminated this issue for you? I like your gen, by the way.

@psykhosisis

> Hello Psykhosisis. You mentioned the last driver eliminated this issue for you? I like your gen, by the way.

It appears to have reduced it significantly, at the very least; I am able to replicate results when I have needed to :)

@catboxanon
Collaborator

catboxanon commented Nov 3, 2023

If someone runs a git bisect without any extensions, using a known good commit, that will actually lead to a proper solution and not some code-patching guesswork.
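
For anyone who wants to try, a minimal sketch of that workflow (the known-good commit is a placeholder you would have to find yourself, and --disable-all-extensions is the flag recent webui builds use to skip extensions, assuming your build has it):

    git bisect start
    git bisect bad                        # the current commit reproduces the wrong image
    git bisect good <known-good-commit>   # placeholder: a commit where the image still matches
    # at each step git checks out a commit; relaunch the webui without extensions, e.g.
    #   python launch.py --disable-all-extensions
    # regenerate the test image with a fixed seed, then mark the commit:
    git bisect good    # or: git bisect bad
    # repeat until git reports the first bad commit, then clean up:
    git bisect reset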

@yoyoinneverland

This comment was marked as off-topic.

@Enferlain

Eh. I updated to dev since I read some things were supposed to be fixed, but I still get different gens going from yesterday into today.

@miguel234457

Still having this issue as of now; can someone look into this again?
