Flux.1 Dev, memory issue #4271
Comments
Having this issue as well. I updated yesterday, and now everything runs at half speed compared to before. I was using fp16 before, and even upon downgrading to fp8 it's still half speed. To be clear, my generation speed itself is still the same it/s, but the time spent loading the models is dramatically longer. My terminal has the same feedback as OP's |
I noticed about 50% performance degradation, with a lower-end GPU but more RAM. On a 1024 x 1024 img, I was at 30 s/it and now I'm at 45 s/it |
After some testing, I re-downloaded Portable v0.0.2 and went to this commit f123328, and I went down to ~6.2 s/it. So it's pretty safe to say it's a Python issue. |
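For anyone wanting to try the same rollback, here is a sketch of pinning a portable install to that commit. It assumes the portable zip ships ComfyUI as a git checkout and that the directory names match the console output quoted in this thread:

```shell
# Hypothetical rollback sketch: pin the bundled ComfyUI repo to commit f123328.
cd ComfyUI_windows_portable/ComfyUI   # path assumed from the portable layout
git fetch origin                      # make sure the old commit is present locally
git checkout f123328                  # detached HEAD at the older, faster commit
git log -1 --oneline                  # confirm which commit is now active
```

Note that running update/update_comfyui.bat afterwards would move the checkout back to master and undo the rollback.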
Can you update and test the latest commit to see if things are better? |
For me, there was no improvement. |
I was able to resolve my issue by uninstalling two old versions of python (didn't realize I had them) and reinstalling python & comfyui from scratch via zip extraction. It was immediately able to do fp8 at the previous speed, but still had serious bottlenecking on fp16. I then enabled the --highvram flag and it works as fast as before (however, I did not need this flag before) |
I use ComfyUI with ROCm.
With the latest version I have
On the commit mentioned here (f123328) I have
I tried an earlier commit, eca962c, which was faster for me and got
I see degradation of speed each time I update ComfyUI. |
E:\ComfyUI_windows_portable>.\python_embeded\python.exe -s ComfyUI\main.py --windows-standalone-build
Import times for custom nodes:
Starting server
To see the GUI go to: http://127.0.0.1:8188
This was a fresh reinstall, a fresh pull, including the one after the "fix" push. Gonna try bigbanje's solution next |
I have experienced the same problems (extreme slowdown and very frequent OOM crashes) after the recent update. It returned to normal after rolling back to commit 1aa9cf3 (20G VRAM). |
I've been updating every day, but I'm not sure when my performance degradation started. |
Bigbanje's solution did not work.
E:\ComfyUI_windows_portable>.\python_embeded\python.exe -s ComfyUI\main.py --windows-standalone-build --lowvram
Import times for custom nodes:
Starting server
To see the GUI go to: http://127.0.0.1:8188
///////////
Loading in on commit 1aa9cf3
When loading the graph, the following node types were not found:
Nodes that have failed to load will show as red on the graph.
///
I do not have any issue whatsoever with my CPU, like others do.
E:\ComfyUI_windows_portable>.\python_embeded\python.exe -s ComfyUI\main.py --windows-standalone-build --lowvram
Import times for custom nodes:
Starting server
To see the GUI go to: http://127.0.0.1:8188 |
Gonna try to load up v0.0.2 again, on f123328, and see if I can replicate my own solution, because I suspect Python is the issue, as it's on v2.4 |
Can you all post your full console output if you have not already? |
It's the full console, sadly |
I'd even screenshot it, due to how little it is |
I will disable --highvram and post my fp16 output, one moment |
fp16 without --highvram
fp16 with --highvram (only pasting from "Starting server" and below as the rest was identical)
I know I'm on a Low-RAM, High-VRAM machine, so --highvram fixing my issue isn't too surprising... But I didn't need to do that before. Not sure what the error about 1Torch is about, I seem to get that no matter what I do after an update a few days ago. |
Here is my full console output, after opening Comfy UI, and doing a few flux generations. I'm at 512 x 768 resolution, in fp8 versions, with ~30 s/it. console output
Same issue here. Was working great but something updated and now it takes 10x longer and goes into lowvram mode. Still trying to figure out what changed |
Can you test if the latest commit improves things? |
[SKIP] Downgrading pip package isn't allowed: transformers (cur=4.38.2)
Anyway, I think things got worse; see GPU usage |
No improvement for me.
Can you share your GPU graph while rendering? |
Massive improvement. I also nuked and redid ComfyUI from source instead of portable with the latest commits and it took 15 seconds vs 2.5 minutes for 10 steps. It still goes into lowvram mode when it shouldn't |
Are you sure you can use Flux at full speed with 8 GB of RAM? I think it's impossible |
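That skepticism checks out on paper. As a rough sanity check (assuming Flux.1 dev is on the order of 12B parameters, a figure not taken from this thread):

```python
# Back-of-envelope estimate for the Flux transformer weights alone.
# Assumption: ~12e9 parameters; activations, text encoders and the VAE add more.
PARAMS = 12e9

def weights_gb(bytes_per_param: float) -> float:
    """Approximate weight size in GB at a given precision."""
    return PARAMS * bytes_per_param / 1e9

print(f"fp16: ~{weights_gb(2):.0f} GB")  # ~24 GB
print(f"fp8:  ~{weights_gb(1):.0f} GB")  # ~12 GB
```

Either way the weights alone exceed 8 GB, so ComfyUI has to offload and stream them (lowvram mode), and throughput then depends on PCIe and system RAM speed rather than the GPU itself.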
I don't want to open a new issue for this but maybe I should - I just want @comfyanonymous to know this is far from an issue only affecting Flux models. My comfyui was working super fine but I decided to update today to try out some flux models (had been about a week since I upgraded) and now ComfyUI is all but unusable on my normal standard SDXL workflows. My iterations per second have dropped to about 1/5th what they were before the update, and whenever I run batches I now OOM and crash completely on any batch >10. I am used to also being able to do things that are not graphic intensive when using ComfyUI - like web browse, discord, etc. but now ComfyUI murders even my system FPS down to around 5. It doesn't appear to really be using my GPU but it's slowly eating up all my system ram until it OOMs (it's happy to try and eat 60GBs of system ram!). I don't have to use Flux for anything, thankfully, so I'm going to just downgrade back to a version that worked OK. I'm just a bit shocked there isn't more noise about this outside of Flux but I guess everyone is just trying Flux these days. 😀 FWIW I don't know where things went sideways for me but I tested and can confirm the tagged version in the 0.0.4 standalone version works without any issues (just the code I'm not messing with actual python libs, etc.) Edit: I don't think it matters at all but just in case
If you still have issues on the latest master I need: Your system specs. Your exact workflow. |
I'm running the python & libs from the 0.0.3 standalone install, but if there's any particular version info you need for any packages or anything else I haven't included, just let me know. The workflow is one I use to test prompts against a collection of checkpoints I have, which seems to cause a good chunk of the issues (obviously, since it doesn't OOM until around 10 in). |
Can you try without custom nodes to see if it's a core problem: --disable-all-custom-nodes |
Alright, sorry for the late reply but I tried to test things as extensively as I could here. tl;dr is that it appears to be a core problem, with no custom nodes running (using the flag you provided) I still run into this issue with massive slowdowns, OOMs, crashes, etc. I was able to troubleshoot and find what appears to be the smoking gun on the issue, which are the Lora Loaders. The normal Lora Loader using CLIP seems to be worse than the Model Only Lora Loader, but they both eventually lead to the same OOM errors, GPU crashing, etc. If there's any more data I can provide to help here just let me know. The console window usually dies with these crashes but if there's a log I can provide happy to do so if it would help at all. These tests were done on the v0.0.2-163-gb8ffb29 tag. (release 0.0.7) This issue isn't present on release 0.0.4. I had no intention of hijacking this thread, if you'd like me to open up a new ticket since this appears related to Loras and not the base checkpoint flows I'm happy to do so. |
Updated to b8ffb29 and disabled all custom nodes. I would normally expect this to run in about 19-25 seconds. On the bright side, it appears the VRAM releases the memory upon completion. Made a basic workflow, with no loras or anything extra. System info: Run info: |
RTX 4090, 128 GB RAM, Windows 10. Brand new install of ComfyUI portable, latest Flux. It took 4 minutes to generate the demo image (the anime girl with bunny ears) at 11.13 s/it using the dev model. Feels very slow. It seems to always load in low vram mode (that is not correct, right?). |
Thanks! You are my hero! I updated everything and it was almost a 10x slowdown. I uninstalled pytorch and then reinstalled 2.3.1.
pip3 uninstall torch
Before:
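Spelled out, the rollback described above might look like this. This is a sketch, not verified against this exact setup; the cu121 index URL and the companion package pins are assumptions based on the "2.3.1+cu121" version string quoted later in the thread:

```shell
# Remove the current torch build, then pin 2.3.1 from the CUDA 12.1 wheel index.
# torchvision 0.18.1 / torchaudio 2.3.1 are the releases paired with torch 2.3.1.
pip3 uninstall -y torch torchvision torchaudio
pip3 install torch==2.3.1 torchvision==0.18.1 torchaudio==2.3.1 \
    --index-url https://download.pytorch.org/whl/cu121
```

On a portable install, run this through the embedded interpreter instead (.\python_embeded\python.exe -m pip ...), or the system Python gets modified and ComfyUI sees no change.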
Thank you. I downloaded the new portable version. It does not seem any faster. Still runs in low vram mode |
I didn't use that portable installer, but that note clued me into torch being an issue. Then: |
Can you print the full log and tell me exactly which portable version you downloaded (there are 3 of them). |
I think pytorch is included with the portable version? log states pytorch version: 2.3.1+cu121 |
I've tried two portables so far, though they might actually have been the same: https://github.com/comfyanonymous/ComfyUI/releases/tag/v0.0.4 I just set weight type to fp8 and it's sooooo much faster (like x50, though now it's not fp16, right?) Will continue to test. Next time I restart Comfy I will upload the log |
I willingly admit that I don't know much about these things. But Forge has released an update so it's possible to run the nf4 model there, and I tried it for a bit. I seem to get the same or similar issues on Forge that many seem to get on Comfy. It acted as if it just continued to eat memory nonstop and never cleaned up. I offloaded to my system page, so together with my GPU I had 60-70 GB of memory. I could generate maybe 3-4 images, and then I got an error saying that cuda ran out of memory and had to restart my system. Then I could generate another 3-4 images, and then had to restart. That's how it went on. My thought is that maybe the issue is with BitsAndBytes? Forge pretty much copied that part from Comfy. And since the issues seem similar, it kind of makes sense. To me at least. |
If anyone has issues make sure you run the: update/update_comfyui.bat to update ComfyUI first. |
Updated Observations with ComfyUI (2527[b8ffb2]): After further testing, I've found that loading the flux model (either the 16-bit flux1_schnell UNET or speedFP8_e5m2 checkpoint) using the Load Diffusion Model node, and keeping the weight type at its default, resolves the issue. Speeds are similar for both models. However, switching the weight type for either of these models to any of the FP8 options results in a significant slowdown—processing time increases from 19 seconds to 54 seconds per 4 steps. Additionally, if I load the FP8 model (speedFP8_e5m2) in the Checkpoint Loader node, it also becomes 3x slower. I haven't yet tested a non-FP8 flux model since the only versions I've downloaded are FP8 and NF4, besides the UNETs. VRAM releases after generation. The latest CheckpointLoaderNF4 node also releases VRAM, but: Using NF4 node: Load Diffusion Model (default weights): Basic SDXL workflow seems to be functioning well. I’ll continue testing with a562c1 to see how it performs. |
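For what it's worth, the figures in that comment work out to roughly the 3x it mentions. A quick sanity check, assuming "per 4 steps" means 4 sampler iterations:

```python
# Convert the quoted timings to seconds per iteration and compare.
STEPS = 4

default_sit = 19 / STEPS  # default weight dtype: 4.75 s/it
fp8_sit = 54 / STEPS      # forced FP8 weight dtype: 13.5 s/it

slowdown = fp8_sit / default_sit
print(f"{default_sit} s/it -> {fp8_sit} s/it, {slowdown:.1f}x slower")
```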
No change on a562c1 for the results of the NF4 node or using FP8 weights. |
Did some misc stuff on and off, running backups and such. Still on f123328. I went with
And now I'm back to around 5.3s/it, in both lowvram & normalvram mode - it no longer forces me to use lowvram. I'll take the loss of 0.6s/it and be happy with it as is. |
Isn't it the same as running "Update ComfyUI" or "Update All" through the manager and restarting? |
FYI, the ComfyUI update in ComfyUI-Manager does not perform a torch update for safety reasons. |
Thanks @ltdrdata! -- FYI, with the recent update that disables cuda malloc by default, I've had to add |
Threadripper 1900
Was running flux and all last week. Was slow but running. I always get:
I tried rolling back ComfyUI git commits, but to no avail. Tried flags:
Could it be related to python updates (torch or other), or rocm updates (I rolled back to 2.3)?
+1 for adding a `summary of logs`:
`pip freeze` :
You can try my Docker Compose container and see how it works. I have posted a docker-compose recipe for getting ComfyUI easily up and running |
@hartmark Thanks. Installed docker and got it up and running. It works now that way. Too lazy to go back to Fedora to troubleshoot or convert it to podman ... |
Glad you got it working. |
I'm on this page because my Flux is suddenly taking 10+ minutes to render. If this matters: Flux was working fast in Pinokio-managed Comfy yesterday morning, faster than ever on my Windows RTX 4080 with 16 GB VRAM, all models! I was primarily using Dev.1, then I noticed my Nvidia driver needed the update from early August. I ran that update, and now my best Purz-inspired workflow will not run: the terminal reports low VRAM, and Flux takes minutes instead of seconds to produce an image. |
Are you able to share what it says on the terminal where it is running? Sometimes I see errors there and try to backtrack the changes I made. |
Expected Behavior
I expect no issues. I had installed comfyui anew a couple of days ago with no issues, ~4.6 seconds per iteration.
Actual Behavior
After updating, I'm now experiencing 20 seconds per iteration.
Steps to Reproduce
Install the newest version + update the python deps.
Debug Logs
Other
I have no idea why it has f'd up.