[Bug]: slow loading .safetensors when switching to a new model #11216
Comments
This is normal speed if you read the model from an HDD. |
It's on an SSD and .ckpt files load fast; only .safetensors files load extremely slowly. |
ckpt is similar to safetensors in speed, only related to storage location. |
You should test the read / write speed of the storage device. |
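One rough way to check the raw read speed suggested above is to time a sequential read of a large file on the same drive. This is a minimal sketch using only the Python standard library; the helper names `read_throughput_mb_s` and `make_sample_file` are made up for illustration, and the throwaway sample file stands in for a real checkpoint. Keep in mind that a second read of the same file is usually served from the OS page cache, so only the first run reflects the disk.

```python
import os
import tempfile
import time


def read_throughput_mb_s(path, chunk_size=16 * 1024 * 1024):
    """Read the file sequentially and return (bytes_read, MB per second)."""
    total = 0
    start = time.perf_counter()
    # buffering=0 avoids Python-level buffering; the OS page cache still applies.
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    elapsed = max(time.perf_counter() - start, 1e-9)
    return total, total / (1024 * 1024) / elapsed


def make_sample_file(size_bytes=8 * 1024 * 1024):
    """Create a throwaway file so the sketch runs without a real model file."""
    fd, path = tempfile.mkstemp(suffix=".bin")
    with os.fdopen(fd, "wb") as f:
        f.write(os.urandom(size_bytes))
    return path


if __name__ == "__main__":
    sample = make_sample_file()
    try:
        n, speed = read_throughput_mb_s(sample)
        print(f"read {n} bytes at {speed:.1f} MB/s")
    finally:
        os.remove(sample)
```

To test a real model, pass the path of the .safetensors or .ckpt file to `read_throughput_mb_s` instead of the sample file.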
I ran more tests. The issue only affects .safetensors, not .ckpt, no matter whether I use an HDD or an SSD (the SSD is of course a bit faster). Even on the HDD, .ckpt loads in just 26 seconds, not 300. @Narsil Maybe you have an idea what's going on? Could this be related to an NVIDIA driver issue? |
I don't know what it could be. The first load is fast, then subsequent loads are slow. This is odd indeed, since normally it should be the other way around (first time, could actually load from disk, while subsequent calls should load from RAM if the model still fits in cache). Scenarios I can imagine:
If you can try a few of the things I mention here, I could try looking into it, but my Windows knowledge is rather limited, and I have never reproduced anything of the sort. Another note: memory mapping is very efficient for local disks, but if you're actually running on a mounted network partition, the mmap variant might trigger many more reads, each of which is individually slower. (Scenario 2 will help solve it.) |
@Narsil FYI setting
So is that something you can fix in safetensors, or do we need an option in webui to allow an alternative loading method? FYI, my webui is running on Ubuntu 20.04 inside WSL2 on Windows 11 (so it's not directly on Windows, but it accesses Windows drivers). |
Unfortunately, this might be a WSL/Windows thing, not really a safetensors (or webui) thing. MMAP (the first load method) can be pretty much instant (like 100us) to load on CPU, while the second So on devices that support mmap, mmap is just always better (since it can skip giving memory to user space, and directly map kernel pages for instance). That being said, mmap can potentially issue a lot more reads if this property is not upheld (which definitely can be the case in WSL, since Linux/Windows most likely work differently). There is no way to know about that within reading the whole file is better in your case, because you're essentially issuing a single read which plays better when the overhead is bad. I think having a flag within webui to switch loading methods could be a thing (so users that benefit from the mmap speed can keep having it, and you can opt out to not suffer on your platform). |
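The difference between the two loading strategies described above can be sketched with the Python standard library alone. This is only an illustration of "map the file and let pages fault in on demand" versus "issue one big read"; it does not use safetensors itself, and the helper names (`load_with_mmap`, `load_with_read`, `timed`, `make_sample_file`) are made up for this example. Timings on a small temporary file say little about a multi-gigabyte model on WSL or a network share, so treat the numbers as a way to compare the two paths on your own storage.

```python
import mmap
import os
import tempfile
import time


def load_with_mmap(path):
    """Map the file read-only and copy its contents out of the mapping.

    Pages are brought in on demand, which can turn into many small reads
    on filesystems where that is expensive.
    """
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            return mm[:]


def load_with_read(path):
    """Read the whole file into memory with a single read() call."""
    with open(path, "rb") as f:
        return f.read()


def timed(fn, path):
    """Return (result, elapsed_seconds) for fn(path)."""
    start = time.perf_counter()
    data = fn(path)
    return data, time.perf_counter() - start


def make_sample_file(size_bytes=4 * 1024 * 1024):
    """Create a throwaway file standing in for a model file."""
    fd, path = tempfile.mkstemp(suffix=".bin")
    with os.fdopen(fd, "wb") as f:
        f.write(os.urandom(size_bytes))
    return path


if __name__ == "__main__":
    sample = make_sample_file()
    try:
        mapped, t_mmap = timed(load_with_mmap, sample)
        whole, t_read = timed(load_with_read, sample)
        print(f"mmap: {t_mmap:.4f}s, read: {t_read:.4f}s, equal: {mapped == whole}")
    finally:
        os.remove(sample)
```

Both functions return the same bytes; only the access pattern against the storage layer differs, which is the part that appears to matter in this thread.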
OK, thanks, no problem. I've created a PR and hope it gets accepted. @Sakura-Luna, can we please agree that this is a bug? I've added a PR for it: #11260 |
Adding |
Interesting, I also have 64GB |
@freecoderwaifu I've tried the |
These are my changes in the webui-user.bat set safetensors is a big speedup too. Average load speed for me: Without both It's still good you're addressing it with your PR, since I'm not sure what else |
This one shouldn't have any effect for version > |
Fixed in dev after merge of the PR (1419654) |
Still not fixed for me in v1.5.1 |
The issue seems to stem from WSL and memory mapping not playing along very well: Can you confirm? |
I initially thought it was a WSL issue, but it's a general Windows issue, as it also happens without WSL. |
Does item 2 from here #11216 (comment) help? If so, it's definitely a memory map issue, but what's really odd is that I'm never able to reproduce it (I'm using Windows in the cloud because I don't own any such machine anymore :( ) |
That's exactly what the added option does, for me it completely fixed the issue. Did you try the "Disable memmapping for loading .safetensors files" option in settings? |
I was having the same issue, but it would take somewhere in the 500-second range to load a .safetensors file. Changing the option in the WebUI Settings>System>Disable memmapping for loading ..... brought the time down to the 60-80s range for loading them. My files are on a standard HDD though, and I will likely move them to an SSD soon. |
Same issue here when loading a model from a network device. I use a 2.5G Ethernet adapter, and an iperf test shows the Ethernet speed is OK. Maybe the issue still exists. This is the loading time log without the lowram option. This is the loading time log with the lowram option. This is consistent with the network response. |
AFAICT (it's been a while since I looked at the Python source code, because a different UI was being slow), Python's mmap is just a blind implementation of POSIX mmap without things like "advice", etc., but calling into the Windows API MapViewOfFile stuff. Maybe more important is that Windows memory-mapped files support the same complicated set of ACLs that regular files do, which I could see causing problems. Windows would be able to do full optimizations on a file mapped for writing or only reading (not copy on write) and with security set to the current user, but Python opens them without an ACL (I think; again, it's been a minute) and in "copy on write" mode, so Windows has to assume any process can write into the memory, and it has to transparently make a copy of the whole mess to keep the originally loaded file intact. If something starts writing into the mapped file to patch it or whatever prior to it fully loading... who knows. If someone, or the OEM they bought the machine from, turned on memory de-duplication, it'd really start killing performance around then, but that should be off by default even on Server. Compression shouldn't kick in that fast. Usually Windows software is either written to do all of this correctly or at least handles it a bit better. Add the fact that "huge pages / large pages" aren't enabled by default on Windows (you'll need to google that; Microsoft explains how to enable it, since it's a user permission) and memory pages end up defaulting to a very small size, so there's a lot of activity going on creating page tables for the whole mess. |
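For readers unfamiliar with the "read-only" versus "copy on write" distinction mentioned above, here is a small sketch using Python's standard `mmap` module. It only demonstrates what the two access modes mean; the thread does not establish which mode the safetensors loader actually uses, and the helper names below are made up for illustration.

```python
import mmap
import os
import tempfile


def make_sample_file(content=b"original-bytes"):
    """Create a small throwaway file to map."""
    fd, path = tempfile.mkstemp(suffix=".bin")
    with os.fdopen(fd, "wb") as f:
        f.write(content)
    return path


def write_through_copy_mapping(path):
    """Modify a copy-on-write mapping and return the in-memory bytes.

    With ACCESS_COPY the change stays in memory; the file is not touched.
    """
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_COPY) as mm:
            mm[0:4] = b"XXXX"
            return mm[:]


def read_only_mapping_rejects_writes(path):
    """Return True if writing to an ACCESS_READ mapping raises TypeError."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            try:
                mm[0:4] = b"XXXX"
            except TypeError:
                return True
    return False


if __name__ == "__main__":
    sample = make_sample_file()
    try:
        in_memory = write_through_copy_mapping(sample)
        with open(sample, "rb") as f:
            on_disk = f.read()
        print("in memory:", in_memory)
        print("on disk:  ", on_disk)
        print("read-only mapping rejects writes:",
              read_only_mapping_rejects_writes(sample))
    finally:
        os.remove(sample)
```

A copy-on-write mapping has to be prepared for private modifications to any page, whereas a read-only mapping never needs a private copy.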
Hi guys, what is the setting for disabling memmapping, and how do I pass it to launch.py? I did not find any docs. |
It does what is discussed above, improving the loading speed of .safetensors |
Thanks for your quick reply. I am new to sd-webui; are there any examples of settings (what they are and how to pass them)? |
@yang-zhiying Hi Zhiying, sorry for describing my question poorly. I want to know how to do this when launching the API. |
Sorry for my wrong answer. You can edit the config.json. Change |
Thanks, are there any examples of config.json? And what other configuration (like default values) should be included in this config.json? |
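As a rough illustration of editing config.json from a script rather than through the Settings page, the sketch below sets a single boolean key in a JSON file. The exact option name is not given in this thread, so `example_option_key` is a placeholder and not a real webui setting; look up the actual key in your own config.json after toggling "Disable memmapping for loading .safetensors files" in the UI. The helper name `set_config_option` is also made up for this example.

```python
import json
import os
import tempfile


def set_config_option(config_path, key, value):
    """Load a JSON config (if present), set one key, and write it back."""
    config = {}
    if os.path.exists(config_path):
        with open(config_path, "r", encoding="utf-8") as f:
            config = json.load(f)
    config[key] = value
    with open(config_path, "w", encoding="utf-8") as f:
        json.dump(config, f, indent=4)
    return config


if __name__ == "__main__":
    # Work on a temporary file so the sketch never touches a real config.
    tmp_dir = tempfile.mkdtemp()
    path = os.path.join(tmp_dir, "config.json")
    # "example_option_key" is a placeholder, not the real setting name.
    print(set_config_option(path, "example_option_key", True))
    os.remove(path)
    os.rmdir(tmp_dir)
```

Keys that are already in the file are preserved; only the one passed in is added or overwritten.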
Is there an existing issue for this?
What happened?
Is this still a known issue? .safetensors files load very slowly (up to 300 seconds) if you switch the model in the model dropdown. Models don't load slowly if you restart the whole webui.
Steps to reproduce the problem
Switch to another .safetensors model in model dropdown
What should have happened?
Tested on 1.3.2 and also on 1.4.0-RC (Windows WSL2)
Commit where the problem happens
v1.3.2 and above
What Python version are you running on ?
Python 3.10.x
What platforms do you use to access the UI ?
Windows
What device are you running WebUI on?
Nvidia GPUs (RTX 20 above)
What browsers do you use to access the UI ?
Google Chrome
Command Line Arguments
List of extensions
No
Console logs
Additional information
No response