-
Notifications
You must be signed in to change notification settings - Fork 112
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CHERI prep plumbing #267
CHERI prep plumbing #267
Conversation
f9213aa
to
9736b37
Compare
9736b37 is just some fixes for CI; let's see how it feels now. |
9736b37
to
4583532
Compare
CI fixes and a little more detangling of the wad of changes. |
4fff265
to
1b7494d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, this is a really nice refactor that very naturally adds Provenance to snmalloc. I think I have two concerns,
- Have any of the changes affected the codegen of fast paths without strict provenance. You are passing more variable, so this can affect the calling convention, and thus might not be optimised away even when not used.
- FreePtr is being put into the space of allocations, but this appears to be unbounded? This seems like it is very likely to be a place for exploit if we don't have full temporal safety,
* Allocslab, which reads the allocator field, which should | ||
* have common offset. | ||
*/ | ||
#ifdef __clang__ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why only __clang__
? Also, as this is all coming from Allocslab
? Should we really use a Allocslab::get
and then downcasts to Superslab
and Mediumslab
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An AllocSlab::get()
would certainly make sense and at this particular site, we don't need any downcasting. I'd still want to keep the assert, tho', because while I can't really imagine a C++ compiler deciding not to lay out class members for a single inheritance chain as concatenated structs, I'm sure someone will find some epsilon performance improvement by considering something else.
Non-clang objected that Superslab
and Mediumslab
were not "standard layout classes" thanks to their inheritance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, so you believe it is possible for upcast actually to change the offset, i.e.
Superslab* p;
Allocslab* a = reinterpret_cast<AllocSlab*>(p);
assert (reinterpret_cast<void*>(p) == reinterpret_cast<void*>(a));
the assertion could fail?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I... I am just a caveman, and C++ terrifies me, as the line goes, and reinterpret_cast
does so more than most things in this language. Having dug around quite a bit in C++20, though, I think that that particular assertion cannot fail:
reinterpret_cast<Allocslab*>(p)
is defined (7.6.1.9 clause 7) to bestatic_cast<Allocslab*>(static_cast<void*>(p))
.static_cast<void*>(p)
is defined (7.6.1.8 clause 4; 12.4.3.1; 7.3.11 clause 2) to have the same pointer value asp
.static_cast<Allocslab*>(...)
is also defined to leave the pointer value alone (7.6.1.8 clause 13) becauseSuperslab
andAllocslab
objects are not pointer interconvertable (6.8.2).reinterpret_cast<void*>(x)
isstatic_cast<void*>(static_cast<void*>(x))
and this leaves the pointer value unchanged by the above and the observation that the outer cast is satisfied by an empty implicit conversion sequence.
Unfortunately, I think the case at hand is subtly different: those conversions are defined to leave the pointer value unchanged but are not necessarily defined to result in sensible pointers. In particular, I think there is risk that, despite its type, a
is not actually left pointing at an Allocslab
, due to the trip through void*
induced by the reinterpret_cast
. Consider, instead:
Superslab *s;
Allocslab *sa = static_cast<Allocslab*>(sa);
Mediumslab *m = reinterpret_cast<Mediumslab*>(s); // same value as s, as above
Allocslab *ma = static_cast<Allocslab*>(ma);
assert (reinterpret_cast<void*>(sa) == reinterpret_cast<void*>(ma));
Here, the static_cast
s are governed by (I think, eventually) 7.3.11 clause 3, which lacks any note that the pointer value is unchanged, even if the base class is the first or sole base class. Pointer interconvertability would save us only if our classes were standard-layout, but they're not, because Baseslab
has mixed access control (as did Allocslab
prior to this PR; 11.2 clauses 3.3 and 3.4). Again, I haven't the foggiest why anyone would actually use this seemingly permitted freedom in the standard, but here we are.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm happy for you to leave the static_assert
in place. But I think guarding it for clang is negating its utility. It is going to probably be some other C++ compiler that causes the issue, I am pretty certain offsetof
is in the standard now, so this should be valid on any C++ compiler.
{ | ||
#ifdef SNMALLOC_PASS_THROUGH | ||
return external_alloc::malloc_usable_size(const_cast<void*>(p)); | ||
#else | ||
// This must be called on an external pointer. | ||
size_t size = ChunkMap::get(address_cast(p)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are there performance implications for this change? Does it generate the same asm code where everything is effectively static?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I, very belatedly, took some time to look at this in more detail. Clang's able, if both alloc_size
and external_pointer
are de-static
-ed, to generate almost the same code, differing just in register selection, for the perf/external_pointer
test. (Oddly, changing just one makes it perform worse, but since we're not doing that...)
src/mem/alloc.h
Outdated
@@ -405,13 +405,13 @@ namespace snmalloc | |||
} | |||
|
|||
template<Boundary location = Start> | |||
static void* external_pointer(void* p) | |||
void* external_pointer(void* p) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does de-staticing external_pointer
generate the same code?
* knowledge that it isn't actually headed out to the user. | ||
*/ | ||
template<typename T, typename U = T> | ||
SNMALLOC_FAST_PATH static FreePtr<T> unsafe_mk_freeptr(AuthPtr<U> p) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems dangerous? I guess all calls to the functions in this file need careful review. This will have massive bounds?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed. Ideally, this (or this kind of cast) would be used only in the non-StrictProvenance
case as just type-level trickery around a no-op.
I'll go sprinkle commentary at each use site about why I think it's justified there.
src/mem/alloc.h
Outdated
@@ -348,6 +351,7 @@ namespace snmalloc | |||
|
|||
uint8_t size = chunkmap().get(address_cast(p)); | |||
AuthPtr<void> p_auth = mk_authptr(p); | |||
FreePtr<void> p_free = unsafe_mk_freeptr<void>(p_auth); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this okay?
src/mem/alloc.h
Outdated
@@ -839,7 +861,9 @@ namespace snmalloc | |||
AuthPtr<void> prev_auth = mk_authptr(prev); | |||
Superslab* super = Superslab::get(prev_auth); | |||
Slab* slab = Metaslab::get_slab(prev_auth); | |||
small_dealloc_offseted_inner(super, slab, prev, i); | |||
FreePtr<FreeListEntry> prev_free = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, I am a bit worried by this, doesn't it have to have its bounds restricted?
src/mem/address_space.h
Outdated
@@ -195,7 +284,7 @@ namespace snmalloc | |||
size_t block_size; | |||
if constexpr (pal_supports<AlignedAllocation, PAL>) | |||
{ | |||
block_size = PAL::minimum_alloc_size; | |||
block_size = auth_map.alloc_size; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If size
is greater then auth_map.alloc_size
does this code work? It will not allocate enough memory to service the request, and it will also need to update multiple authmap entries?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whoop, yes, of course.
@@ -330,15 +339,14 @@ namespace snmalloc | |||
return; | |||
} | |||
dealloc_sized_slow(p_auth, p_free, size); | |||
#endif | |||
} | |||
|
|||
SNMALLOC_SLOW_PATH | |||
void | |||
dealloc_sized_slow(AuthPtr<void> p_auth, FreePtr<void> p_free, size_t size) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do these register pressure concerns affect the codegen without StrictProvenance? dealloc_sized_slow
will not be inlined, so the calling of them even in tail calls may affect the codegen for the fast path.
src/mem/address_space.h
Outdated
static_assert( | ||
!aal_supports<StrictProvenance, AAL> || | ||
(bits::next_pow2_const(alloc_size) == alloc_size), | ||
"Provenace root granule size must be a power of two"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment explaining why please.
} | ||
}; | ||
|
||
using AuthPagemap = std::conditional_t< |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add some comments for things like this - it's a complex expression that it's very easy to misread!
@@ -13,7 +99,7 @@ namespace snmalloc | |||
* It cannot unreserve memory, so this does not require the | |||
* usual complexity of a buddy allocator. | |||
*/ | |||
template<SNMALLOC_CONCEPT(ConceptPAL) PAL> | |||
template<SNMALLOC_CONCEPT(ConceptPAL) PAL, typename PrimAlloc> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please extend the doc comment to explain what PrimAlloc
is.
{ | ||
return get(address_cast(p)); | ||
return get(address_cast(p.unsafe_return_ptr)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this might be cleaner if ReturnPtr
defined an address_cast
member, then we're not requiring access to the raw pointer.
@@ -22,7 +22,7 @@ namespace snmalloc | |||
uint16_t free; | |||
uint8_t head; | |||
uint8_t sizeclass; | |||
uint16_t stack[SLAB_COUNT - 1]; | |||
FreePtr<void> stack[SLAB_COUNT - 1]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks as if it's a big increase in the metadata size for a medium slab (especially on CHERI, where each entry goes from 2 to 16 bytes). Is it necessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch, this would push the meta-data size over a page. I would be careful that there aren't corner cases that are bad here. The increase is probably not going to affect much in terms of performance or memory usage.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, good catch re: crossing a page boundary. I had converted these to FreePtr<void>
s so that we didn't have to ptrauth_bound
on the allocation path and so that, when the time comes, the version tags would be available implicitly... but
- the
CSetBounds
is probably not very expensive and - it's might be worth instead having
uint16_t stack[SLAB_COUNT-1]
anduint8_t stack_memver[SLAB_COUNT-1]
; this will necessitate further AAL methods for memory versioning, but that's probably fine.
If we do decide that CSetBounds
+CSetVersion
on the alloc path and CGetVersion
on free is collectively too pricey, I wonder if it might be worth having a pair of SLAB_COUNT/2-sized stacks, one of which lives on a free page. That'd keep everything small enough and shouldn't add much code complexity. But that's a discussion for another PR, really.
AIUI there's nothing that requires that the `allocator` field of an Allocslab be at a constant offset within two derived classes (specifically, Mediumslab and Superslab). Add a static_assert that they have equal offsets in an attempt to justify our almost-surely UB static dispatch to Superslab's get_allocator().
We're going to need to amplify the pointer and that's going to require access to our AddressSpaceManager, which we only get non-statically through our LargeAlloc.
Like alloc_size, this will require amplification internally
When post()-ing the RemoteCache to message queues, we push an entire bucket onto a remote allocator's incoming queue (specifically, the allocator owning the front Remote in the bucket we're moving). In order to do that, we need to exceed the bounds of the Remote allocation and reference its Allocslab header (to get the ->message_queue). On StrictProvenance architectures, this will require that we amplify the head Remote* and then engage in some pointer math. While Remotes contain the address of the message_queue as the allocator's identity, but this may not be a pointer, just an address, and may have undergone obfuscation anyway.
This lets us use Pagemaps without requiring dllists in scope
This will let us use Pagemaps further down the dependency stack (specifically, we're going to want a Pagemap inside the AddressSpaceManager) by letting us manually tie the knot rather than rely on GlobalVirtual and default_memory_provider() being defined by the time we want a Pagemap.
While here, pull out some constants to their own header. Eventually we'll want to match on AalFeatures in the AAL Concept.
We're going to use these to annotate pointers on StrictProvenance architectures as being either bounded for return as part of alloc or being used mostly for their authority rather than what they point at per se.
No functional change, just annotating that these are expected to take suitably authorized pointers. Use mk_authptr() everywhere; future commits will introduce amplification and change function signatures to take AuthPtr-s where appropriate.
Push mk_authptr()s up towards the surface calls. Sometimes it's easier to do the authority-needing operations sooner and pass the result down rather than pass the authority-bearing pointer itself.
Sign-post via some type-safety whether we are looking at the beginning of an allocation (FreePtr<void>) or a potentially cache-friendly-offset version (FreePtr<Remote>, FreePtr<FreeListEntry>).
These capture the primitive architectural operations we are going to use. At the moment, since all AALs are not StrictProvenance, the only implementations are stubs that just subvert the type system (but give us something to compile against, going forward).
* The AddressSpaceManager now requests address space in AAL-controlled granule sizes and maintains a (somewhat erroneously named) Pagemap sparse array / tree for these provenance roots. Nothing is stored on non-StrictProvenance architectures. * Plumb access to amplification roots up through the encapsulators of the AddressSpaceManager, i.e., the MemoryProviderStateMixin and LargeAlloc.
We're going to want to zero out Remote structures embedded in free allocations before returning them to the application. Guided by the type parameter of FreePtr<>, add a zero_remote() method that lets us safely coerce the carried pointer type while zeroing the Remote.
Push the unsafety into the the function rather than at its callers.
While here, narrow the ChunkMap interface to require a ReturnPtr for get(). While we can make a ReturnPtr from anything, take this as a hint that we probably should know the chunkmap state by the time we've progressed to using FreePtr/AuthPtr.
This avoids re-amplification, but it's on a very slow path and so may not be worth the register pressure.
This avoids amplification on the size==0 slow path of dealloc_sized_slow. As we are already carrying p_auth here, there's no additional register pressure, so this may be an acceptable change even if it isn't likely to matter much.
Replace mk_authptr()s with ptrauth_amplify, which reaches down to the AddressSpaceManager. On non-StrictProvenance architectures, this is really just an elaborate path to mk_authptr(), but on StrictProvenance architectures, this is the moment of truth from which we rederive our authority to access our in-slab metadata.
1b7494d
to
ef95d70
Compare
This is atop #265 and #266 and contains some other small changes and the next steps in preparation for CHERI / StrictProvenance tracking (with an eye towards MTE, too, but that's quite distant). It is not yet finished -- I have a pile more changes on my desk to see it through -- but I am curious for CI's, and others', opinions.
To begin, read
docs/StrictProvenance.md
for an overview and some rationale for this proposal.Then: are the aliases in
src/ds/ptrwrap.h
too ugly as they are? They depend pretty heavily on inlining and SROA to avoid an indirection, and so I am quite open to suggestions to replace them with something better.In terms of the AAL, there isn't much new, just exposing
CSetBounds
(ptrauth_bound
) andCSetAddress
(ptrauth_rebound
), basically. The AddressSpaceManager has grown a very coarse AuthMap andptrauth_amplify
method, which gets plumbed up toAlloc
s via theLargeAlloc
.The changes to
Alloc
are just the beginning but should give a good hint as to where things are going. I'll amend this with some more commits as I de-tangle the hideous mess I've made chasing types.