Support building for CHERI/Morello #962

Merged: 2 commits, Aug 17, 2021

@jrtc27 jrtc27 commented Aug 6, 2021

These two commits improve the portability of the C implementation so that it can be built successfully for CHERI, and thus for Arm's experimental Morello prototype. Please see the individual commits for the detailed rationale behind the changes.

jrtc27 added 2 commits August 6, 2021 23:49
The only integral types guaranteed by the C standard, if they exist, to
support having a pointer cast to them and back are (u)intptr_t. On most
architectures, size_t and uintptr_t are typedefs for the same underlying
type, so this code ends up working. However, on CHERI, and thus Arm's
experimental Morello prototype, C language pointers are implemented with
hardware capabilities, which are unforgeable pointers with bounds and
permissions. This means that, whilst size_t remains a plain 32/64-bit
integer, (u)intptr_t is represented with a capability. Casting to
size_t and back to a pointer causes the capability metadata to be lost
and the resulting capability to be invalid, meaning it will trap when
dereferenced. Instead, use uintptr_t, and provide fallback definitions
for old versions of MSVC, as is done for the other C99 integer types.
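
As a minimal sketch of the pattern this commit message describes (not the code from the commit itself), the following stores a pointer in an integer-typed slot using uintptr_t and casts it back. The helper names `ptr_to_slot`, `slot_to_ptr` and `roundtrip_preserves_value` are hypothetical and exist only for illustration. Replacing uintptr_t with size_t here would still appear to work on most architectures, but on CHERI the round trip would strip the capability metadata.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers: keep a pointer in an integer-typed slot.
 * uintptr_t is the type the C standard provides for this purpose;
 * on CHERI it is itself a capability, so the metadata survives. */
static uintptr_t ptr_to_slot(void *p)
{
    return (uintptr_t)p;
}

static void *slot_to_ptr(uintptr_t slot)
{
    return (void *)slot;
}

/* Round-trips the address of a local variable and reads through it. */
static int roundtrip_preserves_value(void)
{
    int value = 42;
    uintptr_t slot = ptr_to_slot(&value);
    int *back = (int *)slot_to_ptr(slot);

    assert(back == &value); /* guaranteed for void * via uintptr_t */
    return *back;
}
```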
…pointers

If (u)intptr_t exist, the only guarantee provided by the C standard is
that you can cast a pointer to one and back. No claims are made about
what that injective pointer-to-integer mapping is, nor the surjective
reverse integer-to-pointer mapping; for example, nowhere is it specified
that, given a char *p, p + 1 == (char *)((uintptr_t)p + 1), although no
sensible implementation would break that equality (provided p is a
valid, strictly in-bounds pointer).
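
The equality mentioned above can be checked on a given implementation with a sketch like the one below. The function names are hypothetical, and a passing check only says something about the implementation it ran on; it is not a guarantee from the C standard.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if stepping one byte as a pointer and stepping one as a
 * uintptr_t agree on this implementation. p must point strictly inside
 * an object so that p + 1 is still within (or one past) it. */
static int byte_step_agrees(char *p)
{
    char *via_pointer = p + 1;
    char *via_integer = (char *)((uintptr_t)p + 1);

    return via_pointer == via_integer;
}

/* Example check against a small local buffer. */
static void check_byte_step(void)
{
    char buf[4] = {0};

    assert(byte_step_agrees(&buf[1]));
}
```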

With real pointers, taking them out of bounds (other than the one
exception that is a one-past-the-end pointer) of their corresponding
allocation is UB. Whilst the result of casting uintptr_t arithmetic back
to a pointer is not specified, compilers already assume that such
arithmetic does not go out of bounds (or, put another way, that the
result always ends up pointing to the same allocation), and their
alias-analysis-based optimisations rely on this. CHERI, and
Arm's Morello, implement C language pointers with hardware capabilities,
which are unforgeable pointers with bounds and permissions. In order to
only double rather than quadruple the size of pointers, CHERI exploits
C's requirement that pointers not be taken out of bounds, and compresses
these bounds using a floating-point-like representation. The important
implication of this for most software is that if a capability, i.e. C
pointer in CHERI C, is taken too far out of bounds (where "too far" is
proportional to the size of the allocation) it can no longer be
represented and thus is invalidated, meaning it will trap on a later
dereference. This also extends to (u)intptr_t, which are likewise
implemented as capabilities, since if they were just plain integers then
casting a pointer to (u)intptr_t would lose the metadata and break the
ability to cast back to a pointer.

Whilst the composition of dividing a pointer and then multiplying it
again has a somewhat sensible interpretation of rounding it down (even
though technically no such guarantees are made by the spec), the
individual operations do not, and the division is highly likely to take
the pointer far out of bounds of its allocation and, on CHERI, result in
it being unrepresentable and thus invalidated. Instead, we can perform a
more standard bitwise AND with a mask to clear the low bits, exploiting
the fact that alignments are powers of two. Note that, technically, the
rounding-up variant of this does take the pointer slightly out of
bounds; there are other ways to write such code that avoid doing so,
but they are more cumbersome, and this code does not need to worry
about that.
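
The mask-based rounding described above can be sketched as follows. This is an illustrative version under stated assumptions, not the exact code from the commit: the names `align_down` and `align_up` are hypothetical, and `align` is assumed to be a non-zero power of two.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round p down to a multiple of align by clearing the low bits.
 * Unlike (p / align) * align, no intermediate value strays far from
 * the original address. align must be a non-zero power of two. */
static char *align_down(char *p, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (char *)((uintptr_t)p & ~(uintptr_t)(align - 1));
}

/* Round p up to a multiple of align. The intermediate sum can exceed
 * the original address by at most align - 1, which is the slight
 * out-of-bounds excursion the commit message mentions. */
static char *align_up(char *p, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (char *)(((uintptr_t)p + (align - 1)) & ~(uintptr_t)(align - 1));
}
```

The divide-then-multiply form would first produce an address roughly `align` times smaller than the original, which is what risks making a capability unrepresentable on CHERI; the mask form never moves more than `align - 1` bytes away.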

@codecov-commenter

Codecov Report

Merging #962 (823caa8) into c_master (5d30e42) will not change coverage.
The diff coverage is 66.66%.

@@            Coverage Diff            @@
##           c_master     #962   +/-   ##
=========================================
  Coverage     55.45%   55.45%           
=========================================
  Files             8        8           
  Lines          1044     1044           
=========================================
  Hits            579      579           
  Misses          465      465           

@redboltz
Contributor

Thank you for sending the PR.
The very detailed commit messages make the changes easy to understand.

I will port the PR to the C++ version.

@redboltz redboltz merged commit 2ebe884 into msgpack:c_master Aug 17, 2021
redboltz added a commit to redboltz/msgpack-c that referenced this pull request Aug 29, 2021
Improved alignment calculation logic.
redboltz added a commit to redboltz/msgpack-c that referenced this pull request Aug 29, 2021
Improved alignment calculation logic.
Fixed test for zone.
Now, the align parameter must be 2^n (n >= 0), e.g. 1, 2, 4, 8, 16, ...
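
To illustrate why a mask-based alignment calculation needs a power-of-two `align`, here is a small sketch on plain sizes. All function names are hypothetical and this is not the code from the referenced commits.

```c
#include <assert.h>
#include <stddef.h>

/* A value is a power of two iff it is non-zero and has one bit set. */
static int is_power_of_two(size_t align)
{
    return align != 0 && (align & (align - 1)) == 0;
}

/* Mask-based round-up; only correct when align is a power of two. */
static size_t round_up_mask(size_t n, size_t align)
{
    return (n + (align - 1)) & ~(align - 1);
}

/* Same calculation, but rejecting unsupported alignments up front. */
static size_t round_up_checked(size_t n, size_t align)
{
    assert(is_power_of_two(align));
    return round_up_mask(n, align);
}
```

With `align` set to 8, `round_up_mask(13, 8)` gives 16 as expected. With `align` set to 12 the mask no longer matches a multiple of the alignment: `round_up_mask(13, 12)` also gives 16, which is not a multiple of 12.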
@redboltz redboltz mentioned this pull request Aug 29, 2021
redboltz added a commit that referenced this pull request Aug 29, 2021
@redboltz redboltz added the C label Sep 1, 2021