-
Notifications
You must be signed in to change notification settings - Fork 112
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve slow path performance for allocation #143
Conversation
Change remote to count down 0, so fast path does not need a constant. Use signed value so that branch does not depend on addition.
The fast path of remote_dealloc is sufficiently compact that it can be inlined.
Turn the internal structure into tail calls, to improve fast path. Should be no algorithmic changes.
Break lazy initialisation into two functions, so it is easier to codegen fast paths.
Make the backup path a bit faster. Only algorithmic change is to delay checking for first allocation. Otherwise, should be unchanged.
The fisrt operation a new thread takes is special. It results in allocating an allocator, and swinging it into the TLS. This makes this a very special path, that is rarely tested. This test generates a lot of threads to cover the first alloc and dealloc operations.
Large alloc stats aren't necessarily balanced on a thread, this changes to tracking individual pushs and pops, rather than the net effect (with an unsigned value).
Each allocator has a bump ptr for each size class. This is no longer slab local. Slabs that haven't been fully allocated no longer need to be in the DLL for this sizeclass.
This change reduces the branching in the case of finding a new free list. Using a non-empty cyclic list enables branch free add, and a single branch in remove to detect the empty case.
Use needs initialisation as makes more sense for other scenarios.
f098f39
to
8a8a2f6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code looks fine, comments need some small cleanups.
Any idea why the pipeline "Microsoft.snmalloc" is not firing? |
This reverts commit a857b92.
This PR addresses #66 |
A segfault during linking? |
@SchrodingerZhu this is in the self-host test. So we are LD_PRELOADing the snmalloc we just built, and it is crashing. This is one of the harder tests to debug, but also provides us a good piece of coverage of the code. This change is pretty big, so I wanted to go above the standard CI level, hence why it has been hanging around for a week or so. I was going to run a lot of benchmarks through it, but haven't had the time yet. With the changes to CI I guess we have hit an issue independently. |
The GlobalPlaceholder allocator is now a zero init block of memory. This removes various issues for when things are initialised. It is made read-only to we detect write to it on some platforms.
f2da4ff
to
65bb8c1
Compare
Make separate bump allocator ptr per thread and size class.
bump allocation.
Refactor code to
The general form of allocation is now
There are special cases interleaved into this
The first occurs between 1 and 2, and the second occurs between, 3 and 4.