Performances issues #51

Closed
jvoisin opened this issue Nov 2, 2021 · 6 comments

Comments

Contributor

jvoisin commented Nov 2, 2021

isoalloc was disabled by default in mimalloc-bench, since "it takes too long on the shbench tests".

It would be nice to investigate why it's so slow, and to fix this behaviour.

Owner

struct commented Nov 4, 2021

It's not too surprising. There are certain workloads where isoalloc creates a ton of zones and then spends a lot of time in a critical section holding a lock while iterating each zone to find which one owns a specific chunk. There may even be a configuration of default zones or cache sizes that solves it.
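To make the cost described above concrete, here is a minimal sketch of an owner lookup that walks every zone while holding a single lock. It is not isoalloc's actual code: `zone_t`, `zones`, `zones_lock`, `add_zone`, and `find_owner` are hypothetical names, and a zone is assumed to be one contiguous address range.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical zone descriptor: one contiguous region of chunks. */
typedef struct {
    uintptr_t start; /* first byte of the zone's user region */
    size_t size;     /* length of the region in bytes */
} zone_t;

#define MAX_ZONES 8192 /* assumed upper bound, for illustration only */

static zone_t zones[MAX_ZONES];
static size_t zone_count;
static pthread_mutex_t zones_lock = PTHREAD_MUTEX_INITIALIZER;

/* Register a zone and return its index. */
static size_t add_zone(uintptr_t start, size_t size) {
    pthread_mutex_lock(&zones_lock);
    assert(zone_count < MAX_ZONES);
    size_t idx = zone_count++;
    zones[idx].start = start;
    zones[idx].size = size;
    pthread_mutex_unlock(&zones_lock);
    return idx;
}

/* Return the index of the zone that owns p, or -1 if none does.
 * The scan is O(zone_count) and the lock is held for all of it,
 * so every other thread calling alloc/free waits in the meantime. */
static long find_owner(uintptr_t p) {
    long found = -1;
    pthread_mutex_lock(&zones_lock);
    for (size_t i = 0; i < zone_count; i++) {
        if (p >= zones[i].start && p - zones[i].start < zones[i].size) {
            found = (long)i;
            break;
        }
    }
    pthread_mutex_unlock(&zones_lock);
    return found;
}
```

With thousands of zones and several threads freeing at once, the time spent inside `find_owner` is serialized, which matches the behaviour described in this comment.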

Owner

struct commented Nov 14, 2021

The sh8bench.c code seems to leak a lot of memory. Still investigating.

Owner

struct commented Nov 20, 2021

Output from the isoalloc heap profiler:

allocated=238005832
sampled=23987
backtrace_hash=0x4cb4,calls=23455
backtrace_hash=0xf3f4,calls=532
16,4,95218
32,1,23980
64,6,142537
128,8,166491
256,1,23961
512,1,23961
1024,1,23950
2048,7,119293
4096,1,23921
8192,6,110576
9168,6,109470
34568,11,227923

Owner

struct commented Nov 20, 2021

I've got the runtime of sh8bench down to ~48 seconds (from ~3 minutes) by configuring zone sizes. The larger the user zone sizes, the less zone iteration and locking must be done on alloc/free, and the faster it is. This is still way too slow, but it's unfortunately not a simple fix. The allocator is designed around the idea of zone isolation. If zones are too big, then we lose the security properties of isolation. This is motivating me to finally put more time into the profiler CLI to produce a configuration that slides between perf and security and is consumable at runtime, not just build time.

time LD_PRELOAD=/code/isoalloc/build/libisoalloc.so ./sh8bench


Total elapsed time for 4 threads: 48.00 (190.6090 CPU)

real	0m48.056s
user	3m10.009s
sys	0m0.772s
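The trade-off between zone size and zone iteration can be sketched with simple arithmetic: for a fixed number of live chunks, a larger zone holds more chunks, so fewer zones exist to walk. The helper below is hypothetical (`zones_needed` is not an isoalloc function) and the numbers in the usage note are illustrative, not measured.

```c
#include <assert.h>
#include <stddef.h>

/* Number of zones required to hold live_chunks chunks of chunk_size
 * bytes when every zone has zone_size bytes of user memory.
 * Assumes a zone only stores chunks of a single size. */
static size_t zones_needed(size_t live_chunks, size_t chunk_size,
                           size_t zone_size) {
    assert(chunk_size > 0 && zone_size >= chunk_size);
    size_t per_zone = zone_size / chunk_size;
    /* Round up: a partially filled zone still has to be created. */
    return (live_chunks + per_zone - 1) / per_zone;
}
```

For 100,000 live 128-byte chunks, `zones_needed` gives 13 zones at 1 MiB per zone, 4 zones at 4 MiB, and 2 zones at 8 MiB. Fewer zones means a shorter scan under the lock, but each zone then mixes more allocations together, which is the isolation cost mentioned above.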

Owner

struct commented Nov 26, 2021

The latest commit adds support for a lookup table, which significantly improves performance. However, it doesn't solve the core design issue around lock contention as you add threads. The sh8bench benchmark performs better with 4 threads but is still not great with 8. I have ideas for solving this, but I'm going to continue incrementally adding performance improvements whenever I can.
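The comment does not describe how the lookup table is laid out, so the following is only one possible shape, under loud assumptions: every zone is the same power-of-two size, zones are aligned within one reserved address range, and a table indexed by the upper address bits stores the owning zone's index. All names (`zone_table`, `table_init`, `table_set`, `table_lookup`) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ZONE_SHIFT 22      /* assumed: every zone spans 4 MiB */
#define TABLE_SLOTS 1024   /* assumed: table covers a 4 GiB range */
#define NO_ZONE UINT16_MAX /* marker for "no zone mapped here" */

static uintptr_t heap_base; /* assumed: start of the reserved range */
static uint16_t zone_table[TABLE_SLOTS];

/* Reset the table so that every slot reads as NO_ZONE. */
static void table_init(uintptr_t base) {
    heap_base = base;
    memset(zone_table, 0xff, sizeof zone_table);
}

/* Map an address to its table slot; returns 0 if out of range. */
static int table_slot(uintptr_t p, size_t *slot) {
    if (p < heap_base) return 0;
    uintptr_t s = (p - heap_base) >> ZONE_SHIFT;
    if (s >= TABLE_SLOTS) return 0;
    *slot = (size_t)s;
    return 1;
}

/* Record that the zone starting at zone_start has index zone_index. */
static void table_set(uintptr_t zone_start, uint16_t zone_index) {
    size_t slot;
    int ok = table_slot(zone_start, &slot);
    assert(ok);
    (void)ok; /* silence unused warning when NDEBUG is defined */
    zone_table[slot] = zone_index;
}

/* O(1) owner lookup: one shift and one array read, no zone scan. */
static uint16_t table_lookup(uintptr_t p) {
    size_t slot;
    return table_slot(p, &slot) ? zone_table[slot] : NO_ZONE;
}
```

A table like this removes the per-free scan, but if reads and writes of the table still happen under the same global lock, threads continue to contend on it, which is consistent with the 8-thread result reported here.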

Owner

struct commented Jan 2, 2022

Performance is going to be a long-standing issue that we iteratively make improvements on. Given recent changes, I am going to close this issue, but I'm happy to work on specific performance-related issues that get filed in the future!

struct closed this as completed Jan 2, 2022

2 participants