
CHERI prep plumbing #267

Closed · wants to merge 26 commits

Conversation

@nwf (Collaborator) commented Dec 16, 2020

This is atop #265 and #266 and contains some other small changes and the next steps in preparation for CHERI / StrictProvenance tracking (with an eye towards MTE, too, but that's quite distant). It is not yet finished -- I have a pile more changes on my desk to see it through -- but I am curious for CI's, and others', opinions.

To begin, read docs/StrictProvenance.md for an overview and some rationale for this proposal.

Then: are the aliases in src/ds/ptrwrap.h too ugly as they are? They depend pretty heavily on inlining and SROA to avoid an indirection, and so I am quite open to suggestions to replace them with something better.

In terms of the AAL, there isn't much new, just exposing CSetBounds (ptrauth_bound) and CSetAddress (ptrauth_rebound), basically. The AddressSpaceManager has grown a very coarse AuthMap and ptrauth_amplify method, which gets plumbed up to Allocs via the LargeAlloc.

The changes to Alloc are just the beginning but should give a good hint as to where things are going. I'll amend this with some more commits as I de-tangle the hideous mess I've made chasing types.

@nwf nwf requested review from mjp41 and davidchisnall December 16, 2020 15:06
@nwf nwf force-pushed the 202012-cheri-prep-plumbing branch from f9213aa to 9736b37 Compare December 16, 2020 17:22
@nwf (Collaborator, Author) commented Dec 16, 2020

9736b37 is just some fixes for CI; let's see how it feels now.

@nwf nwf force-pushed the 202012-cheri-prep-plumbing branch from 9736b37 to 4583532 Compare December 18, 2020 18:30
@nwf (Collaborator, Author) commented Dec 18, 2020

CI fixes and a little more detangling of the wad of changes.

@nwf nwf force-pushed the 202012-cheri-prep-plumbing branch 2 times, most recently from 4fff265 to 1b7494d Compare January 5, 2021 11:27
@mjp41 (Member) left a comment

Overall, this is a really nice refactor that very naturally adds Provenance to snmalloc. I think I have two concerns:

  • Have any of the changes affected the codegen of fast paths without strict provenance? You are passing more variables, which can affect the calling convention, and thus might not be optimised away even when not used.
  • FreePtr is being put into the space of allocations, but this appears to be unbounded? This seems very likely to be a target for exploitation if we don't have full temporal safety.

* Allocslab, which reads the allocator field, which should
* have common offset.
*/
#ifdef __clang__
Member:

Why only __clang__? Also, since this is all coming from Allocslab, should we really use an Allocslab::get and then downcast to Superslab and Mediumslab?

nwf (Collaborator, Author):

An Allocslab::get() would certainly make sense, and at this particular site we don't need any downcasting. I'd still want to keep the assert, though, because while I can't really imagine a C++ compiler deciding not to lay out class members for a single-inheritance chain as concatenated structs, I'm sure someone will find some epsilon performance improvement by considering something else.

Non-clang objected that Superslab and Mediumslab were not "standard layout classes" thanks to their inheritance.

Member:

Oh, so you believe it is possible for an upcast actually to change the offset, i.e. that in

   Superslab* p;
   Allocslab* a = reinterpret_cast<Allocslab*>(p);
   assert(reinterpret_cast<void*>(p) == reinterpret_cast<void*>(a));

the assertion could fail?

nwf (Collaborator, Author):
I... I am just a caveman, and C++ terrifies me, as the line goes, and reinterpret_cast does so more than most things in this language. Having dug around quite a bit in C++20, though, I think that that particular assertion cannot fail:

  • reinterpret_cast<Allocslab*>(p) is defined (7.6.1.9 clause 7) to be static_cast<Allocslab*>(static_cast<void*>(p)).
  • static_cast<void*>(p) is defined (7.6.1.8 clause 4; 12.4.3.1; 7.3.11 clause 2) to have the same pointer value as p.
  • static_cast<Allocslab*>(...) is also defined to leave the pointer value alone (7.6.1.8 clause 13) because Superslab and Allocslab objects are not pointer-interconvertible (6.8.2).
  • reinterpret_cast<void*>(x) is static_cast<void*>(static_cast<void*>(x)) and this leaves the pointer value unchanged by the above and the observation that the outer cast is satisfied by an empty implicit conversion sequence.

Unfortunately, I think the case at hand is subtly different: those conversions are defined to leave the pointer value unchanged but are not necessarily defined to result in sensible pointers. In particular, I think there is risk that, despite its type, a is not actually left pointing at an Allocslab, due to the trip through void* induced by the reinterpret_cast. Consider, instead:

Superslab *s;
Allocslab *sa = static_cast<Allocslab*>(s);
Mediumslab *m = reinterpret_cast<Mediumslab*>(s); // same value as s, as above
Allocslab *ma = static_cast<Allocslab*>(m);
assert(reinterpret_cast<void*>(sa) == reinterpret_cast<void*>(ma));

Here, the static_casts are governed by (I think, eventually) 7.3.11 clause 3, which lacks any note that the pointer value is unchanged, even if the base class is the first or sole base class. Pointer interconvertibility would save us only if our classes were standard-layout, but they're not, because Baseslab has mixed access control (as did Allocslab prior to this PR; 11.2 clauses 3.3 and 3.4). Again, I haven't the foggiest why anyone would actually use this seemingly permitted freedom in the standard, but here we are.

Member:

I'm happy for you to leave the static_assert in place, but I think guarding it for clang negates its utility: it is probably going to be some other C++ compiler that causes the issue. I am pretty certain offsetof is in the standard now, so this should be valid on any C++ compiler.

{
#ifdef SNMALLOC_PASS_THROUGH
return external_alloc::malloc_usable_size(const_cast<void*>(p));
#else
// This must be called on an external pointer.
size_t size = ChunkMap::get(address_cast(p));
Member:

Are there performance implications for this change? Does it generate the same asm code where everything is effectively static?

nwf (Collaborator, Author):
I, very belatedly, took some time to look at this in more detail. Clang's able, if both alloc_size and external_pointer are de-static-ed, to generate almost the same code, differing just in register selection, for the perf/external_pointer test. (Oddly, changing just one makes it perform worse, but since we're not doing that...)

src/mem/alloc.h Outdated
@@ -405,13 +405,13 @@ namespace snmalloc
}

template<Boundary location = Start>
static void* external_pointer(void* p)
void* external_pointer(void* p)
Member:

Does de-staticing external_pointer generate the same code?

* knowledge that it isn't actually headed out to the user.
*/
template<typename T, typename U = T>
SNMALLOC_FAST_PATH static FreePtr<T> unsafe_mk_freeptr(AuthPtr<U> p)
Member:

Seems dangerous? I guess all calls to the functions in this file need careful review. This will have massive bounds?

nwf (Collaborator, Author):
Indeed. Ideally, this (or this kind of cast) would be used only in the non-StrictProvenance case as just type-level trickery around a no-op.

I'll go sprinkle commentary at each use site about why I think it's justified there.

src/mem/alloc.h Outdated
@@ -348,6 +351,7 @@ namespace snmalloc

uint8_t size = chunkmap().get(address_cast(p));
AuthPtr<void> p_auth = mk_authptr(p);
FreePtr<void> p_free = unsafe_mk_freeptr<void>(p_auth);
Member:

Why is this okay?

src/mem/alloc.h Outdated
@@ -839,7 +861,9 @@ namespace snmalloc
AuthPtr<void> prev_auth = mk_authptr(prev);
Superslab* super = Superslab::get(prev_auth);
Slab* slab = Metaslab::get_slab(prev_auth);
small_dealloc_offseted_inner(super, slab, prev, i);
FreePtr<FreeListEntry> prev_free =
Member:

Again, I am a bit worried by this; doesn't it have to have its bounds restricted?

@@ -195,7 +284,7 @@ namespace snmalloc
size_t block_size;
if constexpr (pal_supports<AlignedAllocation, PAL>)
{
block_size = PAL::minimum_alloc_size;
block_size = auth_map.alloc_size;
Member:

If size is greater than auth_map.alloc_size, does this code work? It will not allocate enough memory to service the request, and it will also need to update multiple authmap entries.

nwf (Collaborator, Author):

Whoop, yes, of course.

@@ -330,15 +339,14 @@ namespace snmalloc
return;
}
dealloc_sized_slow(p_auth, p_free, size);
#endif
}

SNMALLOC_SLOW_PATH
void
dealloc_sized_slow(AuthPtr<void> p_auth, FreePtr<void> p_free, size_t size)
Member:

Do these register pressure concerns affect the codegen without StrictProvenance? dealloc_sized_slow will not be inlined, so calling it, even in tail calls, may affect the codegen for the fast path.

static_assert(
!aal_supports<StrictProvenance, AAL> ||
(bits::next_pow2_const(alloc_size) == alloc_size),
"Provenance root granule size must be a power of two");
Collaborator:

Comment explaining why, please.

}
};

using AuthPagemap = std::conditional_t<
Collaborator:

Please add some comments for things like this - it's a complex expression that's very easy to misread!

@@ -13,7 +99,7 @@ namespace snmalloc
* It cannot unreserve memory, so this does not require the
* usual complexity of a buddy allocator.
*/
template<SNMALLOC_CONCEPT(ConceptPAL) PAL>
template<SNMALLOC_CONCEPT(ConceptPAL) PAL, typename PrimAlloc>
Collaborator:

Please extend the doc comment to explain what PrimAlloc is.

{
return get(address_cast(p));
return get(address_cast(p.unsafe_return_ptr));
Collaborator:

I think this might be cleaner if ReturnPtr defined an address_cast member; then we're not requiring access to the raw pointer.

@@ -22,7 +22,7 @@ namespace snmalloc
uint16_t free;
uint8_t head;
uint8_t sizeclass;
uint16_t stack[SLAB_COUNT - 1];
FreePtr<void> stack[SLAB_COUNT - 1];
Collaborator:

This looks as if it's a big increase in the metadata size for a medium slab (especially on CHERI, where each entry goes from 2 to 16 bytes). Is it necessary?

Member:

Good catch; this would push the metadata size over a page. I would be careful that there aren't corner cases that are bad here. The increase is probably not going to affect much in terms of performance or memory usage.

nwf (Collaborator, Author):
Ah, good catch re: crossing a page boundary. I had converted these to FreePtr<void>s so that we didn't have to ptrauth_bound on the allocation path and so that, when the time comes, the version tags would be available implicitly... but

  • the CSetBounds is probably not very expensive and
  • it might be worth instead having uint16_t stack[SLAB_COUNT - 1] and uint8_t stack_memver[SLAB_COUNT - 1]; this will necessitate further AAL methods for memory versioning, but that's probably fine.

If we do decide that CSetBounds+CSetVersion on the alloc path and CGetVersion on free is collectively too pricey, I wonder if it might be worth having a pair of SLAB_COUNT/2-sized stacks, one of which lives on a free page. That'd keep everything small enough and shouldn't add much code complexity. But that's a discussion for another PR, really.

nwf-msr and others added 20 commits January 7, 2021 03:11
AIUI there's nothing that requires that the `allocator` field of an Allocslab
be at a constant offset within two derived classes (specifically, Mediumslab
and Superslab).  Add a static_assert that they have equal offsets in an attempt
to justify our almost-surely UB static dispatch to Superslab's get_allocator().
We're going to need to amplify the pointer and that's going to require access
to our AddressSpaceManager, which we only get non-statically through our
LargeAlloc.
Like alloc_size, this will require amplification internally
When post()-ing the RemoteCache to message queues, we push an entire bucket
onto a remote allocator's incoming queue (specifically, the allocator owning
the front Remote in the bucket we're moving).  In order to do that, we need to
exceed the bounds of the Remote allocation and reference its Allocslab header
(to get the ->message_queue).  On StrictProvenance architectures, this will
require that we amplify the head Remote* and then engage in some pointer math.

Remotes contain the address of the message_queue as the allocator's
identity, but this may not be a pointer, just an address, and may have
undergone obfuscation anyway.
This lets us use Pagemaps without requiring dllists in scope
This will let us use Pagemaps further down the dependency stack (specifically,
we're going to want a Pagemap inside the AddressSpaceManager) by letting us
manually tie the knot rather than rely on GlobalVirtual and
default_memory_provider() being defined by the time we want a Pagemap.
While here, pull out some constants to their own header.  Eventually we'll
want to match on AalFeatures in the AAL Concept.
We're going to use these to annotate pointers on StrictProvenance architectures
as being either bounded for return as part of alloc or being used mostly for
their authority rather than what they point at per se.
No functional change, just annotating that these are expected to take suitably
authorized pointers.  Use mk_authptr() everywhere; future commits will
introduce amplification and change function signatures to take AuthPtr-s where
appropriate.
Push mk_authptr()s up towards the surface calls.  Sometimes it's easier to do
the authority-needing operations sooner and pass the result down rather than
pass the authority-bearing pointer itself.
Sign-post via some type-safety whether we are looking at the beginning of an
allocation (FreePtr<void>) or a potentially cache-friendly-offset version
(FreePtr<Remote>, FreePtr<FreeListEntry>).
These capture the primitive architectural operations we are going to use.  At
the moment, since all AALs are not StrictProvenance, the only implementations
are stubs that just subvert the type system (but give us something to compile
against, going forward).
* The AddressSpaceManager now requests address space in AAL-controlled granule
sizes and maintains a (somewhat erroneously named) Pagemap sparse array / tree
for these provenance roots.  Nothing is stored on non-StrictProvenance
architectures.

* Plumb access to amplification roots up through the encapsulators of the
AddressSpaceManager, i.e., the MemoryProviderStateMixin and LargeAlloc.
We're going to want to zero out Remote structures embedded in free allocations
before returning them to the application.  Guided by the type parameter of
FreePtr<>, add a zero_remote() method that lets us safely coerce the carried
pointer type while zeroing the Remote.
Push the unsafety into the function itself rather than onto its callers.
While here, narrow the ChunkMap interface to require a ReturnPtr for get().
While we can make a ReturnPtr from anything, take this as a hint that we
probably should know the chunkmap state by the time we've progressed to using
FreePtr/AuthPtr.
This avoids re-amplification, but it's on a very slow path and so may not be
worth the register pressure.
This avoids amplification on the size==0 slow path of dealloc_sized_slow.  As
we are already carrying p_auth here, there's no additional register pressure,
so this may be an acceptable change even if it isn't likely to matter much.
Replace mk_authptr()s with ptrauth_amplify, which reaches down to the
AddressSpaceManager.  On non-StrictProvenance architectures, this is really
just an elaborate path to mk_authptr(), but on StrictProvenance architectures,
this is the moment of truth from which we rederive our authority to access our
in-slab metadata.
@nwf nwf force-pushed the 202012-cheri-prep-plumbing branch from 1b7494d to ef95d70 Compare January 7, 2021 03:11
@nwf nwf mentioned this pull request Mar 13, 2021
@nwf nwf closed this Mar 22, 2021
@nwf nwf deleted the 202012-cheri-prep-plumbing branch June 30, 2021 18:36