Memory usage regression compared to Windows heap allocator #1042

Open
Zoxc opened this issue Mar 21, 2025 · 5 comments

@Zoxc

Zoxc commented Mar 21, 2025

Testing out mimalloc (v2.2.2) in the Rust compiler shows some large regressions in physical memory use in some scenarios.

I don't observe these regressions in my Rust port and I suspect it is because I walk the entire list of abandoned segments instead of exiting early. I wonder if a flag to do the same in mimalloc could be added.
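To make the suspected difference concrete, here is a toy model of the two policies: visiting only the first few abandoned segments versus walking the whole list. This is only a sketch under stated assumptions. The `Segment` struct, `collect_abandoned`, and the `max_visits` limit are hypothetical names for illustration and do not reflect mimalloc's actual data structures or reclaim logic.

```rust
/// Toy stand-in for an abandoned segment (hypothetical; not mimalloc's real layout).
#[derive(Debug, Clone, PartialEq)]
struct Segment {
    used_pages: usize,
    committed_bytes: usize,
}

const MIB: usize = 1 << 20;

/// Visit abandoned segments in order and release those with no live pages.
/// `Some(n)` models exiting early after `n` segments; `None` models walking
/// the entire list. Returns the number of bytes released.
fn collect_abandoned(abandoned: &mut Vec<Segment>, max_visits: Option<usize>) -> usize {
    let limit = max_visits.unwrap_or(abandoned.len()).min(abandoned.len());
    let mut released = 0;
    let mut kept = Vec::with_capacity(abandoned.len());
    for (i, seg) in abandoned.drain(..).enumerate() {
        if i < limit && seg.used_pages == 0 {
            // Fully free and within the visit budget: give the memory back.
            released += seg.committed_bytes;
        } else {
            // Still in use, or never visited: stays on the abandoned list.
            kept.push(seg);
        }
    }
    *abandoned = kept;
    released
}

/// Five abandoned segments, three of which hold no live pages.
fn sample_list() -> Vec<Segment> {
    [3, 0, 1, 0, 0]
        .iter()
        .map(|&used_pages| Segment { used_pages, committed_bytes: 32 * MIB })
        .collect()
}

fn main() {
    let mut early = sample_list();
    let mut full = sample_list();
    let early_released = collect_abandoned(&mut early, Some(2));
    let full_released = collect_abandoned(&mut full, None);
    println!("early exit released {} MiB", early_released / MIB);
    println!("full walk released {} MiB", full_released / MIB);
}
```

In this model the early-exit policy leaves fully free segments near the end of the list committed, which is the kind of effect that would show up as higher physical memory use.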

@daanx
Collaborator

daanx commented Mar 21, 2025

Ah, that is not great -- I'll look into it. However, in the past months there has been a lot of development on mimalloc v3 (the dev3 branch), which is specifically meant to address such memory issues (and has an improved ownership model which might be better suited for a Rust port as well).

Would it be easy for you to try dev3 and see if it improves matters? (In dev3 the idea is to make abandonment cheap and increase sharing of pages between different threads -- I'm working on a writeup but haven't gotten to it yet.) Best, Daan

@Zoxc
Author

Zoxc commented Mar 21, 2025

It does appear that v3 fixes the memory regression. I also did a quick performance check of v2.2.2 (Before) against v3 (After), and v3 is slower. Is there a change regarding committed memory that's reducing performance? I see that committed memory is significantly reduced.

| Benchmark | Time (Before) | Time (After) | % | Physical Memory (Before) | Physical Memory (After) | % | Committed Memory (Before) | Committed Memory (After) | % |
|---|---|---|---|---|---|---|---|---|---|
| 🟣 clap:check | 1.2042s | 1.2288s | 💔 2.04% | 147.58 MiB | 147.71 MiB | 0.08% | 261.81 MiB | 216.67 MiB | 💚 -17.24% |
| 🟣 hyper:check | 0.2061s | 0.2060s | -0.02% | 80.86 MiB | 78.94 MiB | 💚 -2.38% | 195.31 MiB | 142.50 MiB | 💚 -27.04% |
| 🟣 regex:check | 0.6887s | 0.7010s | 💔 1.78% | 108.12 MiB | 108.02 MiB | -0.09% | 223.32 MiB | 169.17 MiB | 💚 -24.24% |
| 🟣 syn:check | 1.1358s | 1.1484s | 💔 1.10% | 141.14 MiB | 143.11 MiB | 💔 1.39% | 255.29 MiB | 209.21 MiB | 💚 -18.05% |
| Total | 3.2348s | 3.2842s | 💔 1.53% | 477.70 MiB | 477.78 MiB | 0.02% | 935.71 MiB | 737.56 MiB | 💚 -21.18% |
| Summary | 1.0000s | 1.0123s | 💔 1.23% | 1 byte | 1.00 bytes | -0.25% | 1 byte | 0.78 bytes | 💚 -21.64% |
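
For readers checking the numbers, the % columns in the table above are consistent with a plain relative change, (After - Before) / Before. The sketch below recomputes two cells from the clap:check row. The function name `percent_change` is made up for illustration and is not taken from rcb. Because the table shows rounded values, recomputing from them can differ from the tool's own figures in the last digit.

```rust
/// Relative change from `before` to `after`, in percent.
fn percent_change(before: f64, after: f64) -> f64 {
    (after - before) / before * 100.0
}

fn main() {
    // clap:check values copied from the table (v2.2.2 = Before, v3 = After).
    let cells = [
        ("time (s)", 1.2042, 1.2288),
        ("committed memory (MiB)", 261.81, 216.67),
    ];
    for &(label, before, after) in cells.iter() {
        println!("{}: {:+.2}%", label, percent_change(before, after));
    }
}
```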

@daanx
Collaborator

daanx commented Mar 21, 2025

Ha, that is good to see! In some of our services v3 reduces memory usage by a lot, but on many small benchmarks the difference is usually less pronounced. On my benchmarks v3 is about as fast as v2 -- maybe we can tune it better to eke out that last 1.23%; can you try with MIMALLOC_PURGE_DELAY=-1? (Is there a way for me to run the benchmarks like you did above and get such a nice report, if it is not too complex to set up?) Also, is this on Linux/x64? Finally, v2.2.2 should really be as fast as 2.1.7 -- I may have changed the abandoned list parameters and it would be good to fix this regardless of the new v3 version. Thanks again!
(ps. if you are up for it, maybe send me an email sometime and we could chat about your Rust port?)
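
As a rough illustration of the suggestion above, mimalloc reads options such as MIMALLOC_PURGE_DELAY from the environment of the process it runs in, so one way to try the setting is to set the variable on the child process being benchmarked. The sketch below only builds the command and does not spawn it. The helper names and the `rustc` program name are assumptions for illustration, and my understanding that -1 disables purging comes from the mimalloc README rather than from this thread.

```rust
use std::ffi::OsStr;
use std::process::Command;

/// Build a command that will run `program` with MIMALLOC_PURGE_DELAY set.
/// Assumption: the value is a delay in milliseconds, with -1 disabling purging.
fn with_purge_delay(program: &str, delay: i64) -> Command {
    let mut cmd = Command::new(program);
    cmd.env("MIMALLOC_PURGE_DELAY", delay.to_string());
    cmd
}

/// Read back the purge delay explicitly configured on a command, if any.
fn purge_delay_of(cmd: &Command) -> Option<String> {
    cmd.get_envs()
        .find(|(key, _)| *key == OsStr::new("MIMALLOC_PURGE_DELAY"))
        .and_then(|(_, value)| value)
        .map(|v| v.to_string_lossy().into_owned())
}

fn main() {
    // "rustc" is a placeholder for whatever binary links mimalloc.
    let cmd = with_purge_delay("rustc", -1);
    println!("MIMALLOC_PURGE_DELAY = {:?}", purge_delay_of(&cmd));
    // Calling cmd.status() here would actually launch the child process.
}
```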

@Zoxc
Author

Zoxc commented Mar 22, 2025

If you want to do a local build you'd need my mimalloc branch of rustc. You need to enable mimalloc with a bootstrap.toml file:

[rust]
codegen-units = 1
mimalloc = true
deny-warnings = false

The last commit in the branch points to a local checkout of https://github.com/purpleprotocol/mimalloc_rust. That contains a mimalloc submodule which will be used. You can then build the compiler with `python x.py build library`.

To benchmark the compiler I'm using https://github.com/Zoxc/rcb; see the readme on how to set it up. The run above is `./rcb bench --incr-none -n 40 master~win-mimalloc~9 master~win-mimalloc~8 --details none --check`. It was done on Windows 10 x64.

Note that mimalloc is only used for Rust allocations, not for C allocations. In non-check builds LLVM does a few of those.

@Zoxc
Author

Zoxc commented Mar 22, 2025

v3 with (After) and without (Before) MIMALLOC_PURGE_DELAY=-1:

| Benchmark | Time (Before) | Time (After) | % | Physical Memory (Before) | Physical Memory (After) | % | Committed Memory (Before) | Committed Memory (After) | % |
|---|---|---|---|---|---|---|---|---|---|
| 🟣 clap:check | 1.2034s | 1.2103s | 0.58% | 147.77 MiB | 151.25 MiB | 💔 2.36% | 216.66 MiB | 220.73 MiB | 💔 1.88% |
| 🟣 hyper:check | 0.1999s | 0.2004s | 0.29% | 78.94 MiB | 78.94 MiB | 0.00% | 142.50 MiB | 142.50 MiB | -0.00% |
| 🟣 regex:check | 0.6826s | 0.6806s | -0.30% | 108.03 MiB | 108.03 MiB | 0.00% | 169.17 MiB | 169.17 MiB | -0.00% |
| 🟣 syn:check | 1.1267s | 1.1251s | -0.14% | 142.63 MiB | 144.00 MiB | 0.96% | 208.61 MiB | 210.71 MiB | 💔 1.00% |
| Total | 3.2126s | 3.2164s | 0.12% | 477.37 MiB | 482.23 MiB | 💔 1.02% | 736.95 MiB | 743.11 MiB | 0.84% |
| Summary | 1.0000s | 1.0010s | 0.10% | 1 byte | 1.01 bytes | 0.83% | 1 byte | 1.01 bytes | 0.72% |

It doesn't seem to have much effect.
