Zero Page Optimization #2400

Conversation
How is this constant set? Is it a language item? Is it usable on stable? Is it set in the linker script? Does it default to just the null pointer? I'm OK with the principle behind this RFC, but for any embedded/bare-metal/kernel development this needs to be configurable on stable. Otherwise it will be impossible to write things like bootloaders or microcontroller firmware on stable Rust in some cases, since it is often necessary to use the lower bytes in (for example) 8- or 16-bit modes. Also, I think this should be independent of page size entirely. For mainstream OSes like Linux, the "null range" happens to be the first page because most MMUs cannot enforce finer-grained controls, but there is no fundamental reason why the compiler should be tied to the same constraint.
Before I say anything on whether Rust ought to do this, I want to register my confusion about the claim that the zero page is already effectively assumed to exist by today's Rust.
text/0000-zero-page-optimization.md (Outdated)

> Inside Rust std, we rely on the assumption that zero page exists:
>
> https://github.com/rust-lang/rust/blob/ca26ef321c44358404ef788d315c4557eb015fb2/src/liballoc/heap.rs#L238
I don't understand how this link supports the claim that std assumes a zero page. It links to ZST allocation, but ZST allocation can hand out whatever pointers it wants. It could return something as ridiculous as the address of `main`, if that is suitably aligned!

And besides, alignment can be much larger than the page size, so the linked line can create pointers not on the zero page.
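For concreteness, a small illustration of that last point, assuming `NonNull::dangling`'s current behaviour of using the type's alignment as the address (the type name here is made up for the example):

```rust
use std::ptr::NonNull;

// A type whose alignment is larger than a typical 4 KiB page.
#[repr(align(8192))]
struct BigAlign([u8; 8192]);

fn main() {
    // `NonNull::dangling` uses the alignment as the address, so the
    // "dangling" pointer for this type is 8192 -- outside the first page.
    assert_eq!(NonNull::<BigAlign>::dangling().as_ptr() as usize, 8192);
}
```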
text/0000-zero-page-optimization.md (Outdated)

> To make things worse, such usage is also seen outside std, on crates that compile
> on stable Rust:
>
> https://github.com/rust-lang-nursery/futures-rs/blob/856fde847d4062f5d2af5d85d6640028297a10f1/futures-util/src/lock.rs#L157-L169
This code, too, doesn't seem to me like it assumes that a zero page exists. It stores an address in an `AtomicUsize` and assumes `1` can't be such an address, but that assumption is true because of alignment (it's storing the address of a `Waker`, which contains a pointer[1]), not because of anything about the zero page.

[1] It is theoretically conceivable to have a platform where pointers are just one byte or where pointers can be unaligned, but Rust doesn't support any such targets, and even if it did, futures-util could simply add `#[repr(align(2))]` to `Waker`.
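A minimal sketch of the alignment-based sentinel pattern being described, not the actual futures-util `BiLock` code; the names and the `Box<u64>` stand-in for the waker are assumptions made for the example:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Sentinel value: it can never collide with the address of a 2-or-more-aligned
// allocation, which is the property the comment above relies on.
const NO_WAITER: usize = 1;

struct Slot {
    state: AtomicUsize,
}

impl Slot {
    fn new() -> Self {
        Slot { state: AtomicUsize::new(NO_WAITER) }
    }

    fn park(&self, waker: Box<u64>) {
        // `Box<u64>` is 8-aligned, so its address is never 1.
        let addr = Box::into_raw(waker) as usize;
        debug_assert_ne!(addr, NO_WAITER);
        self.state.store(addr, Ordering::Release);
    }

    fn take(&self) -> Option<Box<u64>> {
        match self.state.swap(NO_WAITER, Ordering::AcqRel) {
            NO_WAITER => None,
            addr => Some(unsafe { Box::from_raw(addr as *mut u64) }),
        }
    }
}
```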
It's true that we can use a few bits for embedding data when alignment is involved. However, we can't call the current `BiLock` code sound; as you mentioned, it assumes pointer alignment without any comment indicating that.
text/0000-zero-page-optimization.md (Outdated)

> always true. For instance, microcontrollers without MMU doesn't implement such
> guards at all, and `0` is a valid address where the entrypoint lies. See
> [Cortex-M4](https://developer.arm.com/docs/ddi0439/latest/programmers-model/system-address-map)'s
> design as one of such example.
It is true that address 0 is valid in Cortex-M and that you can't validly create a Rust reference `&T` to it, but it's not like arbitrary data can end up there by chance. That address is reserved for an early-boot detail that most applications don't deal with directly. In the `cortex-m-rt` crate there is not even a corresponding Rust item; it is dealt with entirely in the linker script.

So I don't think there is a problem here in practice.
@SimonSapin Are you suggesting that access to 0 should be strictly `unsafe`?
No, I’m only saying that the ARM Cortex case is not really relevant to the "Rust makes bad assumptions" argument. But then what do you mean by "access to 0"?
i.e. do I have to use `*mut _` or `*const _` if I want to access part of the "null range"?

Personally, I would like to see the possibility of this optimization (automatically hiding enum variants or small values in the low bits of a pointer), but we also need to make sure people don't rely on non-portable assumptions.
text/0000-zero-page-optimization.md (Outdated)

> targeted at people dealing with FFI or unsafe.
>
> The recently stabilized `NonNull` type will have more strict requirements:
> the pointer must be not in the null page, and it must be valid to dereference.
Can you clarify "valid to dereference"? Surely it is not meant that the pointer must point to valid data, as `dangling` is also stable...
Right, this part of the RFC sounds incorrect. `ptr::NonNull` is a pointer that is not null. It makes no guarantee beyond that.
Seems it needs to be changed to have a new type for this, then, as this conflicts with how `NonNull` currently works. The new type would have semantics similar to a reference, where it always points to valid data.
I understand this might be a consequence of terribly unfortunate timing, but I think it is wise to tread carefully given how recently `NonNull` was stabilized, and how it was clearly intended to be the de facto way to receive Rust's zero-discriminant optimizations. In particular, any of the following:

- introducing a replacement for `NonNull<T>`
- removing the promise that `Option<NonNull<T>>` is the same size as `NonNull<T>`
- deprecating `NonNull<T>`

so early after its stabilization will send a poor message about what it means for something to become stable in Rust.
On embedded systems where 0 is a valid address, 0 itself is usually some form of interrupt vector or entry point, unlikely to be a valid code or data pointer. But the same can’t be said for the entire first page – indeed, you can get tiny ARM microcontrollers with as little as 4KB of flash total, and it’s mapped at 0! So we’d have to make sure to turn this off for (even potentially) freestanding targets. On the other end, 64-bit macOS and iOS by default reserve a whole 4GB of memory starting at 0.
The code mentioned here is not strictly sound, but in practice there is no way to exploit such unsoundness. This RFC is just proposing a better way to present those enumerations; I'll update the wording.
The presence of `NonNull` and pointers does not imply that those addresses are invalid in Rust, but rather that the addresses used in conjunction with those constructs are reserved to have an alternative meaning. As long as deriving the meaning does not involve reading memory at the addresses, it is sound and entirely fine.

For an empty slice and ZST addresses, any address is correct, because those addresses will never be dereferenced and cannot possibly clash with anything else. The same is true for the heap "allocations" code. `NonZero` is slightly trickier, as you end up being unable to put your reserved-but-valid addresses into one while preserving its meaning, but that's also entirely fine, because on 4kB MCUs there are only a few ways to provide a view into e.g. the ISR table, and avoiding `NonNull` there should be a fairly easy task (but not that easy anymore when `NonNull` reserves the whole addressable space).

With that in mind, the cost-to-benefit ratio of additional reserved addresses seems way too high to me, but I'm also sympathetic to enabling optimisations in more places, so I'm ambivalent about this RFC at this time.
> can exploit this for ~12 bits of storage for secondary variants.
>
> [Inside Rust std](https://github.com/rust-lang/rust/blob/ca26ef321c44358404ef788d315c4557eb015fb2/src/liballoc/heap.rs#L238),
> we use a "dangling" pointer for ZST allocations; this involves a somewhat
I still don't understand how this relates at all to the motivation for this RFC.
text/0000-zero-page-optimization.md (Outdated)

> The recently stabilized `NonNull` type will have more strict requirements:
> the pointer must be not in the null page. `NonNull::dangling` will be
> deprecated in favor of this optimization.
If it's deprecated, what's the replacement?
In favour of the zero page optimization. That is, using an enumeration instead.
I don't understand. You propose that the current way to get a `NonNull` that is non-null and aligned is deprecated. What non-deprecated thing can current users of that method do instead to get a `NonNull` with the same properties, one that is similarly valid under the new invariant?

(Leaving aside the question of whether it's OK to change the meaning of `NonZero` like this after stabilization.)
The issue is that `NonNull::dangling()` seems to be just a hack where `Option<NonNull<T>>` should be used. `NonNull::dangling()` advocates less idiomatic coding, and `Option<NonNull<T>>` should be a perfect fit as a replacement.
That is a big claim that requires a fair bit of support, given that the API was accepted and stabilized.

Furthermore, using `Option` is not equivalent to using a dangling pointer, since it "uses up" the null value: e.g. `Vec<T>` contains a `NonNull<T>`, and this makes `Option<Vec<T>>` the same size as `Vec<T>`; if it used `Option<NonNull<T>>` instead, `Option<Vec<T>>` would be bigger.
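For concreteness, these size relationships can be checked directly; this reflects current std behaviour, not the RFC's proposed semantics:

```rust
use std::mem::size_of;
use std::ptr::NonNull;

fn main() {
    // `NonNull<T>` leaves the null value free as a niche, so wrapping it
    // in `Option` costs nothing...
    assert_eq!(size_of::<Option<NonNull<u8>>>(), size_of::<*mut u8>());
    // ...and `Vec<T>`, which stores a non-null pointer internally, gets
    // the same treatment: `Option<Vec<T>>` is no bigger than `Vec<T>`.
    assert_eq!(size_of::<Option<Vec<u8>>>(), size_of::<Vec<u8>>());
    // `Option<NonNull<T>>` has already spent that niche, so wrapping it
    // in another `Option` does grow the type.
    assert!(size_of::<Option<Option<NonNull<u8>>>>() > size_of::<Option<NonNull<u8>>>());
}
```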
@ishitatsuyuki Sorry, I don’t see how `NonNull::dangling` is related to `Option<NonNull<_>>` at all. `dangling` is for creating an arbitrary pointer that is correctly aligned without being null. It is used for zero-size allocations, for example in `Vec`: https://github.com/rust-lang/rust/blob/fb730d75d4c1c05c90419841758300b6fbf01250/src/liballoc/raw_vec.rs#L93
@rkruppe Can I suggest that a code search only showed usage for "optional allocation", either for ZST or for an absent node in a linked data structure? Also, the original intent of this addition seems to be "we need this to interact with the allocator": rust-lang/rust#45527

@SimonSapin Using `NonNull::dangling` is a convention inside the alloc-related functions, but it's not expressed through types. Using an enum makes it less error-prone, catching the cases where we may pass a dangling pointer to the underlying allocator.
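A rough sketch of what "expressing it through types" could look like. This is hypothetical, neither the RFC's wording nor std's actual `RawVec` code; the names are made up for illustration:

```rust
use std::ptr::NonNull;

/// Hypothetical replacement for the `dangling()` convention: the "no real
/// allocation" case becomes a variant instead of a magic pointer value.
enum RawPtr<T> {
    /// ZST or capacity 0: nothing was ever requested from the allocator.
    Unallocated,
    /// A pointer actually returned by the allocator.
    Allocated(NonNull<T>),
}

impl<T> RawPtr<T> {
    /// Only hand a pointer back to the allocator when one exists, which makes
    /// "deallocating" a dangling pointer a type error rather than a convention.
    fn as_allocated(&self) -> Option<NonNull<T>> {
        match self {
            RawPtr::Unallocated => None,
            RawPtr::Allocated(p) => Some(*p),
        }
    }
}
```

Thanks to the existing null niche in `NonNull`, such a two-variant enum is still pointer-sized today.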
@ishitatsuyuki If I understand correctly, you are saying that if the following occurred:

- `Vec<T>` instead stored `Option<NonNull<T>>`
- `NonNull<T>` was changed to forbid pointers in the null page

then `Option<Vec<T>>` could still receive the optimization? If this is the case, it might help to demonstrate it explicitly.

That said, I think part of the concern here is that there are places where `Vec<T>` benefits specifically from the fact that `dangling()` is aligned. E.g., a slice can be constructed directly from the pointer without having to branch on `None`. ISTM that would be impossible when using `Option<NonNull<T>>`, as it must remain possible to take a reference to the option.

Edit: Or wait... maybe it is possible. The pointer for `Some(vec![])` would be null, and the representation of `None::<Vec<T>>` would begin with `1` where the `Option<NonNull<T>>` is stored. Hm...

Edit 2: but then what about `Vec<Option<T>>`? We end up with an `Option<NonNull<Option<T>>>` whose `None` representation is 1, which is not aligned when interpreted as a pointer. Or something like that. My brain hurts.
The motivation for this RFC currently seems weak to me. None of the code cited in the motivation section actually needs a zero page reserved to be valid (if it even relates to low addresses being valid or not). More enum layout optimizations become possible (though non-portably), but how commonly will that apply? Consider that the compiler is already allowed to exploit alignment to get a few extra values for discriminants (this is not currently implemented, but it's been discussed a lot), so it would only kick in if you have both a lot of field-less variants and a pointer to a type of low alignment, AFAICT. Combine that with the very real concerns about breaking embedded use cases (or at least being incompatible with them, which would mean the small microcontrollers that could get the most value out of saving some memory don't get those optimizations) and the other smaller concerns around
Personally, I'd like to see ways for Rust to more naturally and automatically handle things that C code often does manually, which includes stunts like reusing the low bits of aligned pointers, or knowing that valid pointers can never point into a particular range. I'd much rather have those things handled automatically and consistently by the compiler. I also think, ideally, that we should not require explicit declaration of valid pointer ranges within individual structures containing pointers, nor require every use of a pointer type to include such a declaration. Having some special kind of pointer that excludes the first 4k or 1M or similar would require changing substantial amounts of code to take advantage of an optimization like this. We don't have any similar requirement to take advantage of the null-pointer optimization for things like

I don't, however, think we should do this so aggressively that we break embedded use cases. I'd like to see people able to write Rust code for platforms where you can have valid data at addresses 4 and 8. And even, with enough care, valid data at address 0, though I don't mind if that requires some special accessors known to not mind null pointers.

Given that, I have a question for the people currently objecting to this RFC: would your objections be fully addressed by a feature that was under the full control of the person invoking the compiler, such as via command-line options or optimization options (that would affect the Rust ABI) to specify the range of invalid pointers? (With some careful target-specific defaults, such as for x86 Linux or x86 Windows versus x86 ELF.) That shouldn't break any use case, embedded or otherwise. People creating a new target can determine the correct default values, with the default default being "just the null pointer". People using an embedded target should find that this optimization doesn't apply unless they specifically enable it. And people targeting a platform like x86 Linux but wanting to write code that uses pointers near or at 0 (requiring a change to
@joshtriplett I think that would be good. It's a bit annoying that this information is often already in linker scripts and configs, though...

@mark-i-m For embedded applications, kernels, and similar, yes. Standard applications, on the other hand, don't typically have such linker scripts or configs.
I have code for this, somewhat, where I use the alignment of

🥖 @rust-lang/wg-codegen
cc @ticki @steveklabnik @phil-opp @SergioBenitez (adding folks doing OS work in Rust)
I'm iffy on this; a lot of the problems that could be caused by this may not be immediately obvious in a crater run, especially for FFI-using applications that don't get tested via crater, since crater doesn't know how to build them (or if they're on crates.io). I'd rather just do a new type, period.

This optimization is pretty common in C++ codebases, done manually.
A new "optimized" reference type could have many cool benefits, not just this:
wrt embedded needing pointers to low integer addresses: The zero page size could be a target specific information, just like pointer sizes or endianess |
Introduce a new type that can benefit from more optimizations.

Based on what @oli-obk suggested, I've revamped this RFC. Basically, this now also acts as groundwork toward more optimizations we can do in the future.
text/0000-zero-page-optimization.md (Outdated)

> of an enumeration in a way similar to before, except that we will allow
> discriminants of up to the zero page size (typically 4095).
> - These types will be ZST if `T` is ZST. An arbitrary constant is returned as
>   the inner raw pointer. `0` is a good candidate here because we don't actually
ZST pointer addresses are their alignment, not `0`.
Are you saying that because we used to assign such a value? I think we no longer have to do that complicated thing; `0` makes the logic simpler.
Well.. I would have assumed that we should separate the actual value from the memory representation. While the representation is `()`, the value should still be meaningful (not `0`, because that has the "invalid" meaning for pointers).
text/0000-zero-page-optimization.md (Outdated)

> - These types will be ZST if `T` is ZST. An arbitrary constant is returned as
>   the inner raw pointer. `0` is a good candidate here because we don't actually
>   store it, we don't have to worry about it conflicting with the optimization.
> - These types will be inhabitable if `T` is inhabitable.
That can only be done for `Shared`; the other types can't have these optimizations, as that would break code.
Can you elaborate on how this can break code?
These things have been discussed in detail in #2040
text/0000-zero-page-optimization.md (Outdated)

> We should refactor the allocation related code to prefer enumerations over
> `NonNull::dangling`. Taking `RawVec` code as an example, we would use
> `Option<Shared<T>>` to store the internal pointer. For ZST, we initialize
> with an arbitrary value (as we don't store it); for zero-length vector, we make
See the comment above about ZST pointers.
Isn't it impossible to have As I understand how niche filling works today, enum discriminants can be snuggled into padding, because the contents of those bits are undefined when you just have the I guess since

Never mind me, I convinced myself that this is (probably) sound. Just treat the least significant bits of a pointer as a niche that can be filled until setting that bit might make it a valid pointer. It needs to be noted, though: there's a lot of
@CAD97 fwiw your comment seems to assume this is specifically implemented in `Option`'s code; it's not, it's a generic optimization. `is_some` is just a `match`. Anyway, this is exactly what the proposal is talking about. The double option isn't a problem because the invalid states only occur when there's a `None`; i.e. when there's nothing to point at.
I was just talking about

Is there a concrete reason

Definitely extending the niche space on

I'm going to go back to not pretending I know how pointers work now
I don't like the idea of crates being able to change the "null range" through an attribute. This is an ABI-breaking change and should only be configurable at the target level, using a field in the target JSON file.
One thing about microcontrollers is that they tend to have little memory. So on embedded we could use the high addresses instead of the low ones. This would of course not be target-specific but specific to the actual physical controller you are targeting.
So @Amanieu said that the null range shouldn't be changed by crates. I also noticed that altering the size via an attribute only affects that crate, which isn't what we want on microcontrollers. @mark-i-m Can you elaborate on what options we have for stable microcontroller runtimes? Or is it bad to hard-code the value per target inside rustc?

@oli-obk Yeah, using the high addresses is good, except it breaks the

Also, it seems that ZST references/pointers have their own troubles. I'm going to remove them from this RFC for now (this can be discussed in a further RFC).
Sorry, I didn't quite understand this. "Specific to the actual physical controller you are targeting" isn't "target-specific"?
TBH, my knowledge of pure embedded systems specifically is limited, but in terms of other bare-metal software (e.g. OS kernels), it seems that the "null range" is a property of the target platform itself. For example, an OS kernel could choose where it wants to place the "null range" in virtual address spaces (currently most choose the first page, such as 0-4096 on x86). At first glance, the target .json file seems like the ideal place for that:

```json
{
  "llvm-target": "i686-unknown-none-gnu",
  "data-layout": "e-m:e-p:32:32-f64:32:64-f80:32-n8:16:32-S128",
  "target-endian": "little",
  "target-pointer-width": "32",
  "target-c-int-width": "32",
  "os": "none",
  "arch": "x86",
  "target-env": "gnu",
  "pre-link-args": [ "-m32" ],
  "features": "",
  "disable-redzone": true,
  "eliminate-frame-pointer": false,
  "linker-is-gnu": true,
  "no-compiler-rt": true,
  "archive-format": "gnu",
  "linker-flavor": "ld",

  // Add two more options
  "null-range": "0x0-0x1000"
  // or alternately
  "null-range": "none"
}
```

The compiler can then choose to take this range into account when laying out structures. I don't think anything about the null pointer optimization is specific to the value

Of course, we could also add a
Well... I always see the targets as a specific processor, not the entire board. But for the address space, everything connected to the memory bus needs to be known.
The configuration definitely cannot be part of the target specification unless you want to have a specification per machine and not a generic target. This is true not only for embedded systems but also for your plain old computers with x86 in them, due to how e.g. PCIe works.
tl;dr I think this RFC is pretty useless as written, because of the existence of unexploited alignment bits and the nonexistence of more general layout optimizations like custom ranges, which could be exploited very cheaply for most of the same use cases this RFC is designed to cover. It could be made useful in combination with extremely aggressive size optimizations of aligned bits, but the performance cost in those cases is high enough that you'd probably need to use a new kind of representation in order to benefit from them.

Long version: People have talked about this only being useful if you only had pointers to "low alignment" types, and while that's true, I want to point out more explicitly that it's only useful if you have pointers to byte-aligned types. That's because using alignment bits for just carrying variants of an enum

Therefore, this RFC only makes sense in three contexts:

(1) More than one single variant with one pointer variant, where you need/want the single variants' numerical values to be tightly packed for some reason. I'm not really sure whether you'd really get much out of this though--in the alignment-based solution, even though the pointers are spread out, they're spread out at aligned intervals, so it's pretty cheap to turn the variants into packed versions for the purpose of using a jump table or something.

(2) More than one single variant with one pointer variant to a type with 8-bit alignment (let's say an `&[u8]`). It's true that byte slices are pretty common in Rust; however, I'm not convinced that there are a lot of use cases where this specific pattern matters for byte slices. First, this wouldn't help almost any of the cases where a

Fortunately, there is an approach that covers many such use cases. Rust guarantees that byte slices allocated from Vecs only have sizes up to isize::MAX, so your first use case should (with sufficient cleverness on Vec's part) be able to tell the type system that any larger values for length are free game (this is an example where a type system that can [unsafely] opt out of ranges on a type-by-type basis, working in tandem with a compiler that knows how to exploit them, can enable cleverer optimizations than either could do on its own). That would enable both

To give another example of where being able to opt out of particular ranges for a type is useful: I have run into situations where I had an enum with three variants: a nullary one, and two that carried integers that I knew would never exceed i32::MAX. This presents an obvious encoding into an i32, with one of the variants taking the negative range, another 0, and a third the positive range. However, because Rust doesn't provide any way to explicitly opt out of particular ranges, I couldn't do that even if I was willing to manually make the values positive, and had to resort to a manual encoding on the i32. Such an optimization would never be applied by the compiler unless it was asked to do so, because it's only safe due to the semantics of the code using the integer ranges, and it wouldn't be helped by the zero page optimization you're proposing.

It is true that such range-based solutions don't obviously help with byte slices, since those lengths are in general (I think) allowed to exceed isize::MAX.
If you are working with enums with lots of variants and just one byte slice a lot, or have large nested option sequences with byte slices, then your proposal is worthwhile; but the wins there seem low priority to me compared to properly exploiting alignment and being able to specify legal ranges explicitly.

(3) In conjunction with using alignment bits for tags with data. To me, this is by far the most interesting use case. The reason is that needing to shrink the sizes of enums with large, but not too large, numbers of data-carrying variants comes up a lot. In Rust right now, the best size you can hope for without copious amounts of unsafe code is to create a single enum with a variant for each kind of node, and box or reference the contents of each variant, which usually eats at least one word (and in practice at least two in many cases, since you often want to align AST nodes from an arena). With alignment bits used for variants, though, the size can go down to a single word, as long as the values the type points to have enough spare alignment bits that they can store all the variants. For instance, with 8-byte alignment (the usual alignment of a Box on a 64-bit system, which is already sort of mandated by the existence of a Box in one of the variants), you can hold up to 8 tagged variants containing pointers to one or more values of the same type--an incredibly common case for ASTs! That would essentially let you pack AST nodes as tight as possible outside of succinct implementations, and still keep them a single word. Not only that, but as long as you were willing to align all but 7 of the values (or whatever) at more than 8-byte boundaries (which in practice is often fine, since jemalloc likes to allocate at 16-byte boundaries), at a performance cost you could use a variable-length encoding and use different numbers of alignment bits per variant.

However, unlike the case where you're using alignment bits for nullary variants, it's quite possible that you would run out of alignment bits long before you ran out of nullary ones. Even with a cache-aligned encoding (to 64-byte boundaries, say) you'd only have at most 64 variants to use, and it would be fewer if you had to use a variable-length encoding. In this case being able to use known illegal values for nullary variants would be quite compelling, I think!

Unfortunately, using alignment bits for tags for variants with data isn't free, since [at least in most of the cases I can think of?] it would be hard to avoid having to always mask any pointer "derived" from such a type before using it. Besides the operation itself taking time, that seems like it would lead to much more register pressure, since you have to leave the original pointer untouched. Even worse, I'm not sure how "tainting" pointers from the type would actually work with mutable references, since you'd need to be careful to avoid disturbing the alignment bits--since Rust doesn't tell you whether a reference to a type is part of a structure exploiting this optimization, it would be really hard to avoid having to change all writes to pointers to explicitly be masked `|` assignments. Even if the operation itself is cheap, I imagine not using direct assignments and loads breaks a lot of optimization passes and confuses the branch predictor.
Maybe I'm wrong and these are not really issues nowadays, since many modern runtimes, like JavaScript's, use tagged pointers pervasively, but it certainly seems like a lot of mandatory overhead on pointer writes of the same sort that GCs tend to induce, and which Rust has strenuously tried to avoid adding by default. So, I conclude that you would probably only want to perform optimizations around using alignment bits for variants with data by (1) explicitly opting in, and (2) having them apply only to special pointer types (that always had their alignment bits masked out before an assignment). That way you could give them different codegen and/or semantics (for instance, you could disallow taking general references into the interiors of enums with
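For reference, a bare-bones version of the manual low-bit tagging discussed above; a sketch under the assumption of 8-byte-aligned pointees, with illustrative names and provenance subtleties ignored:

```rust
use std::marker::PhantomData;
use std::ptr::NonNull;

const TAG_MASK: usize = 0b111; // 3 spare low bits with 8-byte alignment

struct TaggedPtr<T> {
    raw: usize,
    _marker: PhantomData<NonNull<T>>,
}

impl<T> TaggedPtr<T> {
    fn new(ptr: NonNull<T>, tag: usize) -> Self {
        // Relies on T being at least 8-aligned so the low bits are free.
        assert!(std::mem::align_of::<T>() >= 8 && tag <= TAG_MASK);
        TaggedPtr {
            // Stash the tag in the bits that alignment guarantees are zero.
            raw: ptr.as_ptr() as usize | tag,
            _marker: PhantomData,
        }
    }

    fn tag(&self) -> usize {
        self.raw & TAG_MASK
    }

    fn ptr(&self) -> *mut T {
        // Every use of the pointer has to mask the tag back off -- exactly the
        // extra work and register pressure the comment above worries about.
        (self.raw & !TAG_MASK) as *mut T
    }
}
```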
I agree that exploiting alignment bits may be a more powerful solution. But as you said, it won't work on

Also, embedding data inside a pointer would violate the type system's contract that you can always take the address of a value. This means that (3) in your comment is basically not achievable without a completely new mechanism for such representations.

Please also note that although the name of this RFC primarily proposes to use the "zero page", it also provides various additions, which is why this is written as an RFC (if we just wanted to compress enums, we could just implement it in the compiler without RFC discussion). The motivation, as well, is to expose a more type-based API for embedding tag values inside a pointer.
Yes, (3) requires brand new representations. But my argument is that in its absence, the only major use this RFC would have is for

The original null pointer optimization was important mostly because

I especially think that any optimization like this should do way more than just extend the null guarantee to point a bit further; if you want to give platforms a way to opt out of bit ranges for pointers, why not go further? Generally speaking, on any platform that supports memory mapping, you should be able to guarantee that certain ranges are always unmapped, making them usable for variant data. The mapping could be either implicit (from the OS) or explicit (from the running program), and would certainly be unsafe, but at least that would be a general framework for doing this sort of optimization. You can be even more precise if you have control over the allocator. Postgres does a cute thing where data allocated by each process lives in a disjoint memory address space with the same size; in Rust, if you set things up such that thread-local types were only ever allocated in the thread's local pool, you would have extra unused values such that their number was statically known, but the mapping from their runtime value to their compile-time value was determined dynamically based on the thread id. I don't realistically expect Rust to ever support anything insane like that automatically--I'm just pointing out that if you want to allow ruling out address values based on combining implementation details about the runtime with type system knowledge, there are far more possibilities than what this RFC proposes.
The thing is, I don't really see how this RFC actually does that. It talks about some optimizations to `&` and `&mut` and wants to bring back the

A much more powerful invariant (that would actually make a meaningful difference in the kinds of uses it would have) would be something like "shared pointers may dangle, but must be assigned to an address that was a legal instance of T at some point previously in the program; failure to respect this turns their values into poison". I think that's both a much better interpretation of what

Finally: I don't think there's actually that much value in extending the zero range from a semantics perspective. The proposed solution (transmute to

For that reason, even if this optimization were implemented, I think it would work better as an optimization than an actual guarantee (vs. something like alignment, which is actually guaranteed). Making it just an optimization discourages people from trying to do unnecessary low-level bit hacking (which, as you note, isn't always portable) while at the same time practically addressing the actual issue here (enums take up more space than they need to).
This is also true for WASM, as far as I understand.
Hello! We discussed this RFC in the compiler team's backlog today, and we decided to close it. The RFC itself seems to have a few different ideas (e.g.,
I would love to see a separated version of just the notion of reserved pointer values and using those as a niche.

The notion of reserved values (not only pointers) for use as a niche would be great.
Rendered