Mimalloc test #47062

Draft
wants to merge 27 commits into
base: master

gbaraldi
Member

gbaraldi commented Oct 5, 2022

This is just a very rough draft of what changing the malloc used for the GC allocations could look like. I used the statically linked version of mimalloc because it seemed the easiest.

@kpamnany

@gbaraldi
Member Author

gbaraldi commented Oct 5, 2022

The GCBenchmarks suite doesn't seem to show much of a difference here, since most of its allocations go through our own allocator rather than malloc. There might be some bugs 😄

@JeffBezanson added the GC (Garbage collector) label Oct 5, 2022
Comment on lines +1 to +3
## jll artifact
MIMALLOC_JLL_NAME := mimalloc

Contributor

Suggested change
-## jll artifact
-MIMALLOC_JLL_NAME := mimalloc
+# -*- makefile -*-
+## jll artifact
+MIMALLOC_JLL_NAME := mimalloc

Member Author

Yeah, this will need a bit of love if we decide it's worth it.

@@ -0,0 +1,9 @@
# This file is a part of Julia. License is MIT: https://julialang.org/license

using Test, Libdl, mimalloc_jll
Contributor

Libdl is unused?

Member Author

I copied the libuv one, but I think it's wrong in both of them :)

@testset "mimalloc_jll" begin
ptr = ccall((:mi_malloc, mimalloc), Ptr{Cvoid}, (Int,), 4)
@test ptr != C_NULL
ccall((:mi_free, mimalloc), Cvoid, (Ptr{Cvoid},))
Contributor

Missing the argument?
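
For illustration, the fix this comment seems to be asking for is to pass the pointer as the trailing `ccall` argument. The sketch below uses libc's `malloc` and `free` as stand-ins so that it runs without `mimalloc_jll`; that substitution is an assumption made for the example, not what the PR's test does.

```julia
# Stand-ins: libc malloc(size_t) and free(void*) have the same call shape as
# mi_malloc and mi_free, so the ccall pattern is the same.
ptr = ccall(:malloc, Ptr{Cvoid}, (Csize_t,), 4)
@assert ptr != C_NULL

# One declared argument type (Ptr{Cvoid}) needs exactly one value after the tuple.
ccall(:free, Cvoid, (Ptr{Cvoid},), ptr)
```

With the JLL loaded, the analogous line would presumably be `ccall((:mi_free, mimalloc), Cvoid, (Ptr{Cvoid},), ptr)`.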

@oscardssmith
Member

oscardssmith commented Nov 7, 2022

I just tested this at @JeffBezanson's suggestion and I'm seeing a 50% improvement for some simple BigInt workflows:
Before:

julia> aux = 19283719823701928731092873019287310928731092378123;
julia> @btime b = $aux + $aux
  26.880 ns (2 allocations: 64 bytes)
38567439647403857462185746038574621857462184756246

After:

julia> aux = 19283719823701928731092873019287310928731092378123;
julia> @btime b = $aux + $aux
  16.627 ns (2 allocations: 64 bytes)

@gbaraldi
Member Author

gbaraldi commented Nov 7, 2022

I believe this is worth doing, though I need to fix Windows first. I also want to test mimalloc 1.x, since there are some comments that it might be faster, and to play with the options a bit.

@oscardssmith
Member

IMO we should get this merged once Windows works. 1.x might be worth testing, but 2.x is a clear improvement over what we have now, so if you get Windows working, I would vote to merge and improve from there.

@KristofferC
Member

I just tested this at @JeffBezanson's suggestion and I'm seeing a 50% improvement for some simple BigInt workflows:

Note that you are only testing one sample there, the very fastest one. For allocators that keep state among the different executions of the benchmarked function, that number might not tell the whole story.
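
To make the point above concrete, here is a minimal sketch that records every sample and reports more than the minimum. It uses only the standard library; `sample_times` and `double` are hypothetical helpers written for this example, not part of BenchmarkTools. Timing a single call of a roughly 20 ns operation is dominated by timer overhead, so the numbers are only illustrative.

```julia
using Statistics

# Hypothetical helper (not part of BenchmarkTools): call `f(x)` `n` times and
# return each call's wall-clock time in nanoseconds.
function sample_times(f, x, n)
    times = Vector{Float64}(undef, n)
    for i in 1:n
        t0 = time_ns()
        f(x)
        times[i] = time_ns() - t0
    end
    return times
end

aux = 19283719823701928731092873019287310928731092378123  # parsed as a BigInt
double(x) = x + x

sample_times(double, aux, 100)             # warm-up so compilation is not timed
times = sample_times(double, aux, 10_000)

# The minimum alone hides the tail; median, mean and max show the rest.
stats = (min = minimum(times), median = median(times),
         mean = mean(times), max = maximum(times))
println(stats)
```

A large gap between the median and the mean would suggest that occasional slow samples, for example GC pauses or allocator bookkeeping, account for much of the total time.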

@oscardssmith
Member

Good point.
Before:

julia> @benchmark b = $aux + $aux
BenchmarkTools.Trial: 10000 samples with 996 evaluations.
 Range (min … max):  26.336 ns … 101.416 μs  ┊ GC (min … max):  0.00% … 85.47%
 Time  (median):     29.807 ns               ┊ GC (median):     0.00%
 Time  (mean ± σ):   91.968 ns ±   1.814 μs  ┊ GC (mean ± σ):  41.83% ±  2.22%

  ▃▇█▇▆▇▇▆▅▄▃▄▅▅▅▄▃▂▂▁▁▁         ▁▁▁                           ▂
  █████████████████████████▇▇▇▇█████▇▇▇▇▇▇▇▆▆▇▇▆▆▄▆▆▅▄▅▃▄▃▄▃▄▄ █
  26.3 ns       Histogram: log(frequency) by time        62 ns <

 Memory estimate: 64 bytes, allocs estimate: 2.

After:

julia> @benchmark b = $aux + $aux
BenchmarkTools.Trial: 10000 samples with 998 evaluations.
 Range (min … max):  18.076 ns … 170.878 μs  ┊ GC (min … max):  0.00% … 75.65%
 Time  (median):     23.817 ns               ┊ GC (median):     0.00%
 Time  (mean ± σ):   52.637 ns ±   2.063 μs  ┊ GC (mean ± σ):  41.17% ±  1.07%

      ▁       ▃██▄        ▁▃▃▁                                  
  ▁▃▅██▅▃▂▂▂▂▆████▇▄▃▂▃▃▄▇████▆▅▅▄▄▄▄▃▃▂▂▃▃▃▃▃▃▃▃▂▃▃▂▂▂▂▂▁▁▁▁▁ ▃
  18.1 ns         Histogram: frequency by time         33.4 ns <

 Memory estimate: 64 bytes, allocs estimate: 2.

@gbaraldi
Member Author

gbaraldi commented Nov 7, 2022

I'm not sure why it's making such a difference here. I wouldn't expect this to go through malloc. And we aren't overriding malloc for everyone. Though if it goes through a GC counted malloc then it makes sense.

@gbaraldi
Member Author

gbaraldi commented Nov 7, 2022

Also, Windows seems to be overriding its malloc for everything, which is a bit surprising 🤔

@oscardssmith
Member

BigInts go through the GC-counted malloc because GMP will sometimes try to realloc or free them.
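
One way to see that BigInt arithmetic is visible to the GC's allocation counters is to measure it with `@allocated`. This is a minimal sketch; `bigint_add_bytes` is a hypothetical helper, and the measurement shows only that the bytes are counted, not which allocator served them.

```julia
# Return the bytes that the GC accounting attributes to one BigInt addition.
function bigint_add_bytes(x::BigInt)
    x + x                      # warm-up so compilation is not measured
    return @allocated x + x
end

aux = big"19283719823701928731092873019287310928731092378123"
println(bigint_add_bytes(aux))
```

A nonzero result is consistent with the "2 allocations" reported in the benchmarks earlier in the thread.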

@gbaraldi marked this pull request as ready for review on November 9, 2022, 15:27
@gbaraldi
Member Author

So Windows is still broken for some reason :(

@gbaraldi
Member Author

@nanosoldier runbenchmarks(!"scalar", vs=":master")

@nanosoldier
Collaborator

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here.

@BioTurboNick
Contributor

I tried this PR out on my use case (reading and processing data from TIFFs and saving that data).

With this PR:

  • Trial 1: 12.2 G held after execution, dropped to 7.8 G after GC.gc()/malloc_trim
  • Trial 2: 10.2 G to 8.1 G

Master:

  • Trial 1: 8.9 G held after execution, dropped to 3.8 G after GC.gc()/malloc_trim
  • Trial 2: 9.2 G to 3.7 G

Timing was comparable.
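
A measurement like the one above could be scripted roughly as follows. This is a sketch under assumptions: it is not known how the numbers in this comment were actually collected, `current_rss_bytes` and `collect_and_trim` are hypothetical helpers, the RSS reading relies on Linux's `/proc/self/statm`, and `malloc_trim` is a glibc extension that only affects glibc's own heap.

```julia
# Current resident set size in bytes, or `nothing` where /proc is unavailable.
function current_rss_bytes()
    Sys.islinux() || return nothing
    fields = split(read("/proc/self/statm", String))
    pagesize = Int(ccall(:getpagesize, Cint, ()))
    return parse(Int, fields[2]) * pagesize   # second field: resident pages
end

# Run a full collection, then ask glibc to return free heap pages to the OS.
function collect_and_trim()
    GC.gc()
    if Sys.islinux()
        try
            ccall(:malloc_trim, Cint, (Csize_t,), 0)
        catch
            # malloc_trim is a glibc extension; ignore its absence (e.g. musl).
        end
    end
    return nothing
end

before = current_rss_bytes()
collect_and_trim()
after = current_rss_bytes()
println((before = before, after = after))
```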

@gbaraldi
Member Author

@BioTurboNick Could you share the code, maybe in private, so that I could study what is going on?

@BioTurboNick
Contributor

@BioTurboNick Could you share the code, maybe in private, so that I could study what is going on?

Not sure how much I can share. Is there anything I could run on my end that would collect useful information for you?

@gbaraldi
Member Author

Basically, I wanted to see why it was holding on to the memory; in my experience it seems to be equal or better at releasing memory. What OS are you using, for reference?

@KristofferC
Member

KristofferC commented Nov 28, 2022

I would suggest merging this behind a feature flag (--alloc=mimalloc or just --mimalloc or something). I feel it would be a lot easier to get real-world data if this didn't require recompilation of Julia. Some people (or companies) might have pretty deep pipelines where it is difficult to change out the whole Julia build but easier to add a command line flag.
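
As a rough sketch of what such a switch involves, the example below parses the proposed flag and picks an allocator pair once at startup. Everything here is hypothetical: `--alloc=` is only the spelling floated in this comment, `parse_alloc_flag` and `allocator_for` are illustrative names, and the `:mimalloc` entry reuses libc as a placeholder so that the example runs without the mimalloc library.

```julia
# Hypothetical flag parser; `--alloc=` is only the spelling proposed above,
# not an option Julia is known to accept.
function parse_alloc_flag(args::Vector{String})
    prefix = "--alloc="
    choice = "default"
    for arg in args
        if startswith(arg, prefix)
            choice = arg[length(prefix)+1:end]
        end
    end
    choice in ("default", "mimalloc") ||
        throw(ArgumentError("unknown allocator: $choice"))
    return Symbol(choice)
end

# Pick a (malloc, free) pair once at startup. The :mimalloc entry reuses libc
# as a placeholder; a real build would point it at mi_malloc / mi_free.
function allocator_for(choice::Symbol)
    table = Dict(
        :default  => (malloc = Libc.malloc, free = Libc.free),
        :mimalloc => (malloc = Libc.malloc, free = Libc.free),  # placeholder
    )
    return table[choice]
end

alloc = allocator_for(parse_alloc_flag(["--alloc=mimalloc"]))
p = alloc.malloc(16)
alloc.free(p)
```

Resolving the choice once and calling through the selected pair keeps the decision out of the allocation hot path, which is the property a runtime flag would need.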

@BioTurboNick
Contributor

Basically, I wanted to see why it was holding on to the memory; in my experience it seems to be equal or better at releasing memory. What OS are you using, for reference?

AWS's flavor of Linux

@KristofferC
Member

@gbaraldi, this feels like it is pretty much good to go? Maybe add a NEWS entry?

@oscardssmith
Member

I've just done the rebase to fix a merge conflict. I don't think this needs a NEWS entry, as our allocator is an internal detail.

@gbaraldi
Member Author

gbaraldi commented Jan 5, 2023

I think this might need a NEWS entry, since it adds the option to switch and doesn't switch by default. We might want to have more widespread testing and then commit to a switch.

@KristofferC
Member

I don't think this needs a NEWS entry, as our allocator is an internal detail.

What's the point of doing this if no one knows about it?

@oscardssmith
Member

I'd missed the fact that this was off by default

@gbaraldi
Member Author

gbaraldi commented Jan 6, 2023

Windows has some issues with this PR and I don't know why :(

@BioTurboNick
Contributor

The analyzegc tester doesn't like the PR either, maybe it's the root cause?

@gbaraldi
Member Author

gbaraldi commented Jan 6, 2023

Analyzegc just needs me to add that the function pointers aren't safepoints. Windows has a bunch of things in the log.

@vtjnash
Member

vtjnash commented May 30, 2023

Needs rebase?

@PallHaraldsson
Contributor

PallHaraldsson commented Aug 25, 2023

What's the status of this? I do see:

USE_SYSTEM_MIMALLOC:=0

It's great that you're supporting mimalloc, and I suppose it will be bundled and exposed through an official CLI option:

--alloc[={default*|mimalloc}]

But what I had in mind is: why can't you replace the allocator with just any other one, e.g.:

export LD_PRELOAD=/path/to/libhoard.so
julia

I just found this one, and would have liked to test it too:
https://github.com/emeryberger/Hoard

I didn't scan all your changes. A lot of them seem to involve GMP and MPFR, which is either unrelated or there because those libraries need access to the same implementation of malloc and free as the rest of Julia. I'm a bit confused: why not just use the standard (overridable) libc memory allocation functions, or at least offer that as an option to bypass some of the Julia logic?

Labels
GC Garbage collector

9 participants