Improve slow path performance for allocation (#143)
* Remote dealloc refactor.

* Improve remote dealloc

Change remote to count down to 0, so fast path does not need a constant.

Use signed value so that branch does not depend on addition.

* Inline remote_dealloc

The fast path of remote_dealloc is sufficiently compact that it can be
inlined.

* Improve fast path in Slab::alloc

Turn the internal structure into tail calls, to improve fast path.
Should be no algorithmic changes.

* Refactor initialisation to help fast path.

Break lazy initialisation into two functions, so it is easier to codegen
fast paths.

* Minor tidy to statically sized dealloc.

* Refactor semi-slow path for alloc

Make the backup path a bit faster.  Only algorithmic change is to delay
checking for first allocation. Otherwise, should be unchanged.

* Test initial operation of a thread

The first operation a new thread takes is special.  It results in
allocating an allocator and swinging it into the TLS.  This makes
it a very special path that is rarely tested.  This test generates
a lot of threads to cover the first alloc and dealloc operations.

* Correctly handle reusing get_noncachable

* Fix large alloc stats

Large alloc stats aren't necessarily balanced on a thread, so this changes
to tracking individual pushes and pops, rather than the net effect
(with an unsigned value).

* Fix TLS init on large alloc path

* Add Bump ptrs to allocator

Each allocator has a bump ptr for each size class.  This is no longer
slab local.

Slabs that haven't been fully allocated no longer need to be in the DLL
for this sizeclass.

* Change to a cyclic non-empty list

This change reduces the branching in the case of finding a new free
list. Using a non-empty cyclic list enables branch free add, and a
single branch in remove to detect the empty case.

* Update differences

* Rename first allocation

Use "needs initialisation", as it makes more sense for other scenarios.

* Use a ptrdiff to help with zero init.

* Make GlobalPlaceholder zero init

The GlobalPlaceholder allocator is now a zero init block of memory.
This removes various issues around when things are initialised. It is made
read-only so that we detect writes to it on some platforms.
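
The "count down to 0" and "signed value" notes in the message above can be read as follows: the fast path tests the stored budget before touching it, so the branch does not wait on the subtraction, and the budget is allowed to dip below zero, which is why it is signed. A minimal sketch under that reading; `RemoteBudget`, `BATCH_BYTES`, and both method names are hypothetical and are not taken from snmalloc:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical illustration only; not snmalloc's remote cache.
struct RemoteBudget
{
  // Illustrative batch size (an assumption, not a value from the commit).
  static constexpr std::int64_t BATCH_BYTES = 4096;

  // Signed: the last fast-path deallocation may push this below zero.
  std::int64_t remaining = BATCH_BYTES;
  std::size_t flushes = 0;

  // Fast path: compare the stored value against zero, then subtract.
  // No constant other than zero is needed in the comparison.
  bool try_fast_dealloc(std::size_t object_size)
  {
    assert(object_size > 0);
    if (remaining > 0)
    {
      remaining -= static_cast<std::int64_t>(object_size);
      return true;
    }
    return false; // caller falls back to the slow path
  }

  // Slow path: pretend to post the batch, then reset the budget.
  void slow_path_flush()
  {
    ++flushes;
    remaining = BATCH_BYTES;
  }
};
```

With an unsigned counter the subtraction could wrap around to a large positive value, so the check would have to be made on the result of the arithmetic instead.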
mjp41 authored Mar 31, 2020
1 parent ecef894 commit d900e29
Showing 20 changed files with 688 additions and 237 deletions.
5 changes: 4 additions & 1 deletion difference.md
Original file line number Diff line number Diff line change
@@ -33,7 +33,10 @@ This document outlines the changes that have diverged from
4. We now store a direct pointer to the next element in each slabs free list
rather than a relative offset into the slab. This enables list
calculation on the fast path.


5. There is a single bump-ptr per size class that is part of the
allocator structure. The per size class slab list now only contains slabs
with a free list, and not slabs that only have a bump ptr.

[2-4] are changes that are directly inspired by
[mimalloc](http://github.com/microsoft/mimalloc).
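
Item 5 can be pictured as an array of bump ranges owned by the allocator and indexed by size class. The sketch below is an assumption-laden illustration, not the snmalloc implementation: `BumpAllocator`, `Range`, `NUM_SIZECLASSES`, `refill`, and `alloc` are all hypothetical names.

```cpp
#include <cassert>
#include <cstddef>

namespace sketch
{
  // Hypothetical number of size classes, chosen only for the example.
  constexpr std::size_t NUM_SIZECLASSES = 4;

  struct BumpAllocator
  {
    // One bump range per size class, stored in the allocator itself
    // rather than in any individual slab.
    struct Range
    {
      char* cur = nullptr;
      char* end = nullptr;
    };
    Range bump[NUM_SIZECLASSES];

    // Hand a fresh region of memory to one size class.
    void refill(std::size_t sc, char* base, std::size_t bytes)
    {
      assert(sc < NUM_SIZECLASSES);
      bump[sc].cur = base;
      bump[sc].end = base + bytes;
    }

    // Carve the next object from the range, or return nullptr when the
    // range is missing or exhausted so the caller can take a slower path.
    void* alloc(std::size_t sc, std::size_t size)
    {
      assert(sc < NUM_SIZECLASSES);
      Range& r = bump[sc];
      if (r.cur == nullptr || static_cast<std::size_t>(r.end - r.cur) < size)
        return nullptr;
      void* p = r.cur;
      r.cur += size;
      return p;
    }
  };
} // namespace sketch
```

Because the bump state lives in the allocator, a slab that has never had anything freed back to it does not need to appear in the per size class list at all.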
20 changes: 20 additions & 0 deletions src/ds/address.h
@@ -24,6 +24,15 @@ namespace snmalloc
return reinterpret_cast<T*>(reinterpret_cast<char*>(base) + diff);
}

/**
* Perform pointer arithmetic and return the adjusted pointer.
*/
template<typename T>
inline T* pointer_offset_signed(T* base, ptrdiff_t diff)
{
return reinterpret_cast<T*>(reinterpret_cast<char*>(base) + diff);
}

/**
* Cast from a pointer type to an address.
*/
@@ -125,4 +134,15 @@ namespace snmalloc
return static_cast<size_t>(
static_cast<char*>(cursor) - static_cast<char*>(base));
}

/**
* Compute the difference in pointers in units of char. This can be used
* across allocations.
*/
inline ptrdiff_t pointer_diff_signed(void* base, void* cursor)
{
return static_cast<ptrdiff_t>(
static_cast<char*>(cursor) - static_cast<char*>(base));
}

} // namespace snmalloc
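
The existing `pointer_diff` casts its result to `size_t`, so it cannot express "the target lies before the base". The two signed helpers added above can. The sketch below copies their bodies into a hypothetical `sketch` namespace (so it compiles on its own) and round-trips a backwards offset; `Cell` is an illustrative type, not part of the commit.

```cpp
#include <cassert>
#include <cstddef>

namespace sketch
{
  // Same body as the helper added in this diff.
  template<typename T>
  inline T* pointer_offset_signed(T* base, std::ptrdiff_t diff)
  {
    return reinterpret_cast<T*>(reinterpret_cast<char*>(base) + diff);
  }

  // Same body as the helper added in this diff.
  inline std::ptrdiff_t pointer_diff_signed(void* base, void* cursor)
  {
    return static_cast<std::ptrdiff_t>(
      static_cast<char*>(cursor) - static_cast<char*>(base));
  }

  // Illustrative payload type for the round trip.
  struct Cell
  {
    int payload;
  };
} // namespace sketch
```

A negative difference fed back into `pointer_offset_signed` recovers the earlier address, which is what a relative "next" link needs when the successor sits at a lower address.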
4 changes: 2 additions & 2 deletions src/ds/bits.h
@@ -329,7 +329,7 @@ namespace snmalloc
*
* `std::min` is in `<algorithm>`, so pulls in a lot of unnecessary code
* We write our own to reduce the code that potentially needs reviewing.
- **/
+ */
template<typename T>
constexpr inline T min(T t1, T t2)
{
@@ -341,7 +341,7 @@ namespace snmalloc
*
* `std::max` is in `<algorithm>`, so pulls in a lot of unnecessary code
* We write our own to reduce the code that potentially needs reviewing.
- **/
+ */
template<typename T>
constexpr inline T max(T t1, T t2)
{
122 changes: 122 additions & 0 deletions src/ds/cdllist.h
@@ -0,0 +1,122 @@
#pragma once

#include "defines.h"

#include <cstdint>
#include <type_traits>

namespace snmalloc
{
/**
* Special class for cyclic doubly linked non-empty linked list
*
* This code assumes there is always one element in the list. The client
* must ensure there is a sentinel element.
*/
class CDLLNode
{
/**
* to_next is used to handle a zero initialised data structure.
* This means that `is_empty` works even when the constructor hasn't
* been run.
*/
ptrdiff_t to_next = 0;

// TODO: CHERI will need a real pointer too
// CDLLNode* next = nullptr;
CDLLNode* prev = nullptr;

void set_next(CDLLNode* c)
{
// TODO: CHERI will need a real pointer too
// next = c;
to_next = pointer_diff_signed(this, c);
}

public:
/**
* Single element cyclic list. This is the empty case.
*/
CDLLNode()
{
set_next(this);
prev = this;
}

SNMALLOC_FAST_PATH bool is_empty()
{
return to_next == 0;
}

/**
* Removes this element from the cyclic list it is part of.
*/
SNMALLOC_FAST_PATH void remove()
{
SNMALLOC_ASSERT(!is_empty());
debug_check();
get_next()->prev = prev;
prev->set_next(get_next());
// As this is no longer in the list, check invariant for
// neighbouring element.
get_next()->debug_check();

#ifndef NDEBUG
set_next(nullptr);
prev = nullptr;
#endif
}

SNMALLOC_FAST_PATH CDLLNode* get_next()
{
// TODO: CHERI will require a real pointer
// return next;
return pointer_offset_signed(this, to_next);
}

SNMALLOC_FAST_PATH CDLLNode* get_prev()
{
return prev;
}

SNMALLOC_FAST_PATH void insert_next(CDLLNode* item)
{
debug_check();
item->set_next(get_next());
get_next()->prev = item;
item->prev = this;
set_next(item);
debug_check();
}

SNMALLOC_FAST_PATH void insert_prev(CDLLNode* item)
{
debug_check();
item->prev = prev;
prev->set_next(item);
item->set_next(this);
prev = item;
debug_check();
}

/**
* Checks the list's invariant
* x->next->prev = x
* for all x in the list.
*/
void debug_check()
{
#ifndef NDEBUG
CDLLNode* item = get_next();
CDLLNode* p = this;

do
{
SNMALLOC_ASSERT(item->prev == p);
p = item;
item = item->get_next();
} while (item != this);
#endif
}
};
} // namespace snmalloc
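
The behaviour of the new list is easier to see in a stripped-down form. The sketch below mirrors the structure of `CDLLNode` but is a separate, hypothetical `sketch::Node`: it drops the debug checks, inlines the pointer helpers, and deletes copying (an addition of this sketch, since a relative offset would be wrong in a copied node). An offset of zero means "my successor is myself", which is why the empty test is a single comparison and why a zero-filled node already reads as empty.

```cpp
#include <cassert>
#include <cstddef>

namespace sketch
{
  class Node
  {
    // Byte offset from this node to its successor; 0 means "points at self".
    std::ptrdiff_t to_next = 0;
    Node* prev = nullptr;

    void set_next(Node* n)
    {
      to_next = reinterpret_cast<char*>(n) - reinterpret_cast<char*>(this);
    }

  public:
    // A single-element ring: this is the empty list.
    Node()
    {
      set_next(this);
      prev = this;
    }

    // Not in the original: copying would invalidate the relative offset.
    Node(const Node&) = delete;
    Node& operator=(const Node&) = delete;

    bool is_empty() const
    {
      return to_next == 0;
    }

    Node* get_next()
    {
      return reinterpret_cast<Node*>(reinterpret_cast<char*>(this) + to_next);
    }

    Node* get_prev()
    {
      return prev;
    }

    // Branch-free insertion directly after this node.
    void insert_next(Node* item)
    {
      item->set_next(get_next());
      get_next()->prev = item;
      item->prev = this;
      set_next(item);
    }

    // Unlink this node; its neighbours close the ring around it.
    void remove()
    {
      assert(!is_empty());
      get_next()->prev = prev;
      prev->set_next(get_next());
    }
  };
} // namespace sketch
```

Keeping a sentinel permanently in the ring means insertion never has to ask whether the list is empty; only the consumer that removes elements checks `is_empty()` on the sentinel.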
8 changes: 4 additions & 4 deletions src/ds/dllist.h
@@ -94,12 +94,12 @@ namespace snmalloc
return *this;
}

- bool is_empty()
+ SNMALLOC_FAST_PATH bool is_empty()
{
return head == Terminator();
}

- T* get_head()
+ SNMALLOC_FAST_PATH T* get_head()
{
return head;
}
@@ -109,7 +109,7 @@ namespace snmalloc
return tail;
}

- T* pop()
+ SNMALLOC_FAST_PATH T* pop()
{
T* item = head;

@@ -169,7 +169,7 @@ namespace snmalloc
#endif
}

- void remove(T* item)
+ SNMALLOC_FAST_PATH void remove(T* item)
{
#ifndef NDEBUG
debug_check_contains(item);
2 changes: 1 addition & 1 deletion src/ds/helpers.h
@@ -48,7 +48,7 @@ namespace snmalloc
*
* Wraps on read. This allows code to trust the value is in range, even when
* there is a memory corruption.
- **/
+ */
template<size_t length, typename T>
class Mod
{
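
The hunk above stops before the body of `Mod`, so its implementation is not visible here. As a rough illustration of what "wraps on read" could mean, the sketch below stores whatever value it is given and applies the range reduction only when the value is read back. It assumes a power-of-two `length` and uses a hypothetical class name, `WrapOnRead`; neither is confirmed by the diff.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

namespace sketch
{
  // Hypothetical stand-in; the real Mod class body is not shown in the diff.
  template<std::size_t length, typename T>
  class WrapOnRead
  {
    static_assert(
      length > 0 && (length & (length - 1)) == 0,
      "this sketch assumes a power-of-two length");

    T value = 0;

  public:
    // Writes are stored unchanged, even if they are out of range.
    WrapOnRead& operator=(T v)
    {
      value = v;
      return *this;
    }

    // Reads are always reduced into [0, length), so a corrupted stored
    // value can never produce an out-of-range index.
    operator T() const
    {
      return static_cast<T>(value & (length - 1));
    }
  };
} // namespace sketch
```

Doing the reduction on the read side means a consumer can trust the index without validating it, regardless of how the stored bits came to hold their value.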