Zero Page Optimization #2400

ishitatsuyuki · 2018-04-12T09:04:29Z

mark-i-m · 2018-04-12T15:26:04Z

We will add a target-specific constant to determine the availability and size of the zero page.

How is this constant set? Is it a language item? Is it usable on stable? Is it set in the linker config script? Does it default to just the null pointer?

I'm ok with the principle behind this RFC, but for any embedded/bare metal/kernel development, this needs to be configurable on stable. Otherwise, it will be impossible to write things like bootloaders or microcontrollers on stable rust in some cases, since it is often necessary to use the lower bytes in (for example) 8- or 16-bit modes.

Also, I think this should be independent of page size entirely. For mainstream OSes like Linux, the "null range" happens to be the first page because most MMUs cannot enforce finer-grain controls, but there is no fundamental reason why the compiler should be tied to the same constraint.

hanna-kruppe

Before I say anything on whether Rust ought to do this, I want to register my confusion about the claim that the zero page is already effectively assumed to exist by today's Rust.

hanna-kruppe · 2018-04-12T17:13:45Z

text/0000-zero-page-optimization.md

+
+Inside Rust std, we rely on the assumption that zero page exists:
+
+https://github.com/rust-lang/rust/blob/ca26ef321c44358404ef788d315c4557eb015fb2/src/liballoc/heap.rs#L238


I don't understand how this link supports the claim that std assumes a zero page. It links to ZST allocation, but ZST allocation can hand out whatever pointers it wants. It could return something as ridiculous as the address of main, if that is suitably aligned!

And besides, alignment can be much larger than the page size, so the linked line can create pointers not on the zero page.
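
A minimal sketch of the alignment point, assuming a 4096-byte zero page. `BigAlign` and `dangling_addr` are hypothetical names introduced only for illustration. Any non-null pointer aligned to 8192 has an address of at least 8192, so a well-aligned dangling pointer for such a type cannot lie in the first 4 KiB.

```rust
use std::ptr::NonNull;

/// Hypothetical zero-sized type whose alignment exceeds a 4096-byte page.
#[repr(align(8192))]
pub struct BigAlign;

/// Address of the non-null, well-aligned dangling pointer for `T`.
/// The pointer is never dereferenced; only its numeric value is inspected.
pub fn dangling_addr<T>() -> usize {
    NonNull::<T>::dangling().as_ptr() as usize
}
```

The exact address returned by `NonNull::dangling` is not specified beyond being non-null and aligned, which is all the argument above needs.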

hanna-kruppe · 2018-04-12T17:20:58Z

text/0000-zero-page-optimization.md

+To make things worse, such usage is also seen outside std, on crates that compile
+on stable Rust:
+
+https://github.com/rust-lang-nursery/futures-rs/blob/856fde847d4062f5d2af5d85d6640028297a10f1/futures-util/src/lock.rs#L157-L169


This code, too, doesn't seem to me like it assumes that a zero page exists. It stores an address in an AtomicUsize and assumes 1 can't be such an address, but that assumption is true because of alignment (it's storing the address of a Waker, which contains a pointer[1]), not because of anything about the zero page.

[1] it is theoretically conceivable to have a platform where pointers are just one byte or where pointers can be unaligned, but Rust doesn't support any such targets, and even if it did futures-util could simply add #[repr(align(2))] to Waker.
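
A hedged sketch of the pattern being discussed, not the actual futures-util code: an address is stored in an `AtomicUsize` and the odd value 1 is used as a sentinel. `Slot`, `LOCKED`, `store_addr` and `is_locked` are hypothetical names; `Slot` stands in for `Waker` and carries the explicit `#[repr(align(2))]` suggested in the footnote.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical stand-in for a pointer-containing type. The explicit
/// alignment guarantees that every `Slot` lives at an even address.
#[repr(align(2))]
pub struct Slot(pub u8);

/// Sentinel stored in place of an address. It is odd, so it can never
/// collide with the address of a `Slot`, regardless of any zero page.
pub const LOCKED: usize = 1;

/// Record the address of `slot` in the shared state word.
pub fn store_addr(state: &AtomicUsize, slot: &Slot) {
    state.store(slot as *const Slot as usize, Ordering::SeqCst);
}

/// True when the state word holds the sentinel rather than an address.
pub fn is_locked(state: &AtomicUsize) -> bool {
    state.load(Ordering::SeqCst) == LOCKED
}
```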

It's true that we can use a few bits for embedding data when alignment is involved. However, we can't call the current BiLock code sound; as you mentioned, it assumes alignment on pointers to exist, without any comments indicating that.

SimonSapin · 2018-04-12T18:04:07Z

text/0000-zero-page-optimization.md

+always true. For instance, microcontrollers without MMU doesn't implement such
+guards at all, and `0` is a valid address where the entrypoint lies. See
+[Cortex-M4](https://developer.arm.com/docs/ddi0439/latest/programmers-model/system-address-map)'s
+design as one of such example.


It is true that address 0 is valid on Cortex-M and that you can’t validly create a Rust reference &T to it, but it’s not as if arbitrary data can end up there by chance. That address is reserved for an early-boot detail that most applications don’t deal with directly. In the cortex-m-rt crate there is not even a corresponding Rust item; it is handled entirely in the linker script.

So I don’t think there is a problem here in practice.

@SimonSapin Are you suggesting that access to 0 should be strictly unsafe?

No, I’m only saying that the ARM Cortex case is not really relevant to the "Rust makes bad assumptions" argument. But then what do you mean by "access to 0"?

i.e. do I have to use *mut _ or *const _ if I want to access part of the "null range"?

joshtriplett · 2018-04-12T19:17:43Z

Personally, I would like to see the possibility of this optimization (automatically hiding enum variants or small values in the low bits of a pointer), but we also need to make sure people don't rely on non-portable assumptions.

ExpHP · 2018-04-12T21:41:10Z

text/0000-zero-page-optimization.md

+targeted at people dealing with FFI or unsafe.
+
+The recently stabilized `NonNull` type will have more strict requirements:
+the pointer must be not in the null page, and it must be valid to dereference.


Can you clarify "valid to dereference?" Surely it is not meant that the pointer must point to valid data, as dangling is also stable...

Right, this part of the RFC sounds incorrect. ptr::NonNull is a pointer that is not null. It makes no guarantee beyond that.

It seems this needs to be changed to use a new type, then, as it conflicts with how NonNull currently works. The new type would have semantics similar to a reference, in that it always points to valid data.

I understand this might be a consequence of terribly unfortunate timing, but I think it is wise to tread carefully given how recently NonNull was stabilized, and how it was clearly intended to be the de facto way to receive Rust's zero-discriminant optimizations. In particular, any of the following:

introducing a replacement for NonNull<T>

removing the promise that Option<NonNull<T>> is the same size as NonNull<T>

deprecating NonNull<T>

so early after its stabilization will send a poor message about what it means for something to become stable in Rust.
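
A small sketch of the two `NonNull` properties mentioned in this thread, as they behave on current stable Rust: `Option<NonNull<T>>` has the same size as `NonNull<T>`, and `NonNull::new` rejects nothing except the null pointer. The helper names `null_niche_in_use` and `accepts` are hypothetical.

```rust
use std::mem::size_of;
use std::ptr::NonNull;

/// True when `Option<NonNull<T>>` is no larger than `NonNull<T>`, i.e. the
/// null value serves as the `None` representation.
pub fn null_niche_in_use<T>() -> bool {
    size_of::<Option<NonNull<T>>>() == size_of::<NonNull<T>>()
}

/// `NonNull::new` returns `None` only for address 0. Any other address,
/// however small or misaligned, is accepted (it is never dereferenced here).
pub fn accepts(addr: usize) -> bool {
    NonNull::new(addr as *mut u32).is_some()
}
```

Under the stricter invariant proposed in the RFC, `accepts(1)` and `accepts(8)` would have to become false, which is the compatibility concern raised above.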

comex · 2018-04-12T22:35:59Z

On embedded systems where 0 is a valid address, 0 itself is usually some form of interrupt vector or entry point, unlikely to be a valid code or data pointer. But the same can’t be said for the entire first page – indeed, you can get tiny ARM microcontrollers with as little as 4KB of flash total, and it’s mapped at 0! So we’d have to make sure to turn this off for (even potentially) freestanding targets.

On the other end, 64-bit macOS and iOS by default reserve a whole 4GB of memory starting at 0.

ishitatsuyuki · 2018-04-12T22:53:01Z

The code mentioned here is not strictly sound, but in practice there is no way to exploit that unsoundness. This RFC just proposes a better way to express those enumerations; I'll update the wording.

nagisa · 2018-04-13T05:17:13Z

The presence of NonNull and pointers does not imply that these addresses are invalid in Rust, but rather that the addresses used in conjunction with those constructs are reserved to have an alternative meaning. As long as deriving the meaning does not involve reading memory at the addresses, it is sound and entirely fine. For an empty slice and ZST addresses, any address is correct, because those addresses will never be dereferenced and cannot possibly clash with anything else. The same is true for the heap "allocations" code. NonZero is slightly more tricky, as you end up being unable to put your reserved-but-valid addresses into one while preserving its meaning, but that's also entirely fine: on 4kB MCUs there are only a few ways to provide a view into e.g. the ISR table, and avoiding NonNull there should be a fairly easy task (but not that easy anymore when NonNull reserves the whole addressable space). With that in mind, the cost-to-benefit ratio of additional reserved addresses seems way too high to me, but I'm also sympathetic to enabling optimisations in more places, so I'm ambivalent about this RFC at this time.


hanna-kruppe · 2018-04-13T14:55:50Z

text/0000-zero-page-optimization.md

+can exploit this for ~12 bits of storage for secondary variants.
+
+[Inside Rust std](https://github.com/rust-lang/rust/blob/ca26ef321c44358404ef788d315c4557eb015fb2/src/liballoc/heap.rs#L238),
+we use a "dangling" pointer for ZST allocations; this involves a somewhat


I still don't understand how this relates at all to the motivation for this RFC.

hanna-kruppe · 2018-04-13T14:56:27Z

text/0000-zero-page-optimization.md

+
+The recently stabilized `NonNull` type will have more strict requirements:
+the pointer must be not in the null page. `NonNull::dangling` will be
+deprecated in favor of this optimization.


If it's deprecated, what's the replacement?

In favour of the zero page optimization. That is, using an enumeration instead.

I don't understand. You propose that the current way to get a NonNull that is non-null and aligned is deprecated. What non-deprecated thing can current users of that method do instead to get a NonNull ~~with the same properties?~~ that is similarly valid with the new invariant?

(Leaving aside the question of whether it's OK to change the meaning of NonZero like this after stabilization.)

The issue is that NonNull::dangling() seems to be just a hack where Option<NonNull<T>> should be used. NonNull::dangling() advocates less idiomatic coding, and Option<NonNull<T>> should be a perfect fit as a replacement.

That is a big claim that requires a fair bit of support given that the API was accepted and stabilized.

Furthermore, using Option is not equivalent to using a dangling pointer since it "uses up" the null value: e.g. Vec<T> contains a NonNull<T> and this makes Option<Vec<T>> the same size as Vec<T>, if it used Option<NonNull<T>> instead, Option<Vec<T>> would be bigger.
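
To illustrate the size argument, here is a sketch comparing `Vec<T>` with a hypothetical vector-like header, `OptPtrVec<T>`, that stores `Option<NonNull<T>>`. The struct and the helper `option_overhead` are made up for this example; the result reflects how current rustc lays out these types rather than a language guarantee for the hypothetical struct.

```rust
use std::mem::size_of;
use std::ptr::NonNull;

/// Hypothetical vector header whose pointer field already spends the null
/// value on its own `None`, leaving no spare value for an outer `Option`.
#[allow(dead_code)]
pub struct OptPtrVec<T> {
    ptr: Option<NonNull<T>>,
    cap: usize,
    len: usize,
}

/// Extra bytes that wrapping `T` in `Option` costs.
pub fn option_overhead<T>() -> usize {
    size_of::<Option<T>>() - size_of::<T>()
}
```

`option_overhead::<Vec<u8>>()` is zero, while `option_overhead::<OptPtrVec<u8>>()` is non-zero because a separate discriminant has to be stored.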

@ishitatsuyuki Sorry, I don’t see how NonNull::dangling is related to Option<NonNull<_>> at all. dangling is for creating an arbitrary pointer that is correctly aligned without being null. It is used for zero-size allocations, for example in Vec: https://github.com/rust-lang/rust/blob/fb730d75d4c1c05c90419841758300b6fbf01250/src/liballoc/raw_vec.rs#L93

@rkruppe Can I suggest that a code search only showed usage for "optional allocation", for either ZST or an absent node in the linked data structure? Also, the original intent of this addition seems to be "we need this to interact with allocator": rust-lang/rust#45527

@SimonSapin Using NonNull::dangling is a convention inside the alloc-related functions, but it's not expressed through types. Using an enum makes it less error-prone, catching the cases where we might pass a dangling pointer to the underlying allocator.

@ishitatsuyuki If I understand correctly, you are saying that if the following occurred:

Vec<T> instead stored Option<NonNull<T>>

NonNull<T> was changed to forbid pointers in the null page

Then Option<Vec<T>> could still receive optimization? If this is the case, it might help to demonstrate this explicitly.

That said, I think part of the concern here is that there are places where Vec<T> benefits specifically from the fact that dangling() is aligned. e.g., a slice can be constructed directly from the pointer without having to branch on None. ISTM that would be impossible when using Option<NonNull<T>> as it must remain possible to take a reference to the option.

Edit: Or wait... maybe it is possible. The pointer for Some(vec![]) would be null, and the representation of None::<Vec<T>> would begin with 1 where the Option<NonNull<T>> is stored. Hm...

Edit 2: but then what about Vec<Option<T>>? We end up with an Option<NonNull<Option<T>>> whose None representation is 1, which is not aligned when interpreted as a pointer. Or something like that. My brain hurts.
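
The "no branch needed" property of an aligned dangling pointer can be sketched as follows. This is an illustration, not the actual `Vec` code, and `empty_slice` is a hypothetical helper.

```rust
use std::ptr::NonNull;
use std::slice;

/// Build an empty slice straight from a dangling pointer, without checking
/// for a `None` case first.
pub fn empty_slice<'a, T>() -> &'a [T] {
    let ptr = NonNull::<T>::dangling();
    // SAFETY: a zero-length slice only requires a non-null, well-aligned
    // pointer, which `dangling` provides; no memory is ever read.
    unsafe { slice::from_raw_parts(ptr.as_ptr(), 0) }
}
```

With an `Option<NonNull<T>>` field, the same conversion would first have to map the `None` case to some aligned non-null address.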

hanna-kruppe · 2018-04-13T15:04:37Z

The motivation for this RFC currently seems weak to me. None of the code cited in the motivation section actually needs a zero page reserved to be valid (if it even relates to low addresses being valid or not). More enum layout optimizations become possible (though non-portably), but how commonly will that apply?

Consider that the compiler is already allowed to exploit alignment to get a few extra values for discriminants (this is not currently implemented but it's been discussed a lot), so it would only kick in if you have both a lot of field-less variants and a pointer to a type of low alignment AFAICT.

Combine that with the very real concerns of breaking embedded use cases (or at least being incompatible with them, which would mean small microcontrollers that could get the most value out of saving some memory don't get those optimizations) and the other smaller concerns around NonZero already mentioned and the prospect of exploiting the zero page is not very appealing.

joshtriplett · 2018-04-13T18:22:02Z

Personally, I'd like to see ways for Rust to more naturally and automatically handle things that C code often does manually, which includes stunts like reusing the low bits of aligned pointers, or knowing that valid pointers can never point into a particular range. I'd much rather have those things handled automatically and consistently by the compiler.

I also think, ideally, that we should not require explicit declaration of valid pointer ranges within individual structures containing pointers, nor require every use of a pointer type to include such a declaration. Having some special kind of pointer that excludes the first 4k or 1M or similar would require changing substantial amounts of code to take advantage of an optimization like this. We don't have any similar requirement to take advantage of the null-pointer optimization for things like Option<&T>.

I don't, however, think we should do this so aggressively that we break embedded use cases. I'd like to see people able to write Rust code for platforms where you can have valid data at addresses 4 and 8. And even, with enough care, valid data at address 0, though I don't mind if that requires some special accessors known to not mind null pointers.

Given that, I have a question for the people currently objecting to this RFC: would your objections be fully addressed by a feature that was under the full control of the person invoking the compiler, such as via command-line options or optimization options (that would affect the Rust ABI) to specify the range of invalid pointers? (With some careful target-specific defaults, such as for x86 Linux or x86 Windows versus x86 ELF.)

That shouldn't break any use case, embedded or otherwise. People creating a new target can determine the correct default values, with the default default being "just the null pointer". People using an embedded target should find that this optimization doesn't apply unless they specifically enable it. And people targeting a platform like x86 Linux but wanting to write code that uses pointers near or at 0 (requiring a change to /proc/sys/vm/mmap_min_addr for instance) could disable this optimization easily enough.

mark-i-m · 2018-04-13T18:42:14Z

@joshtriplett I think that would be good. It's a bit annoying that this information is often already in linker scripts and configs, though...

joshtriplett · 2018-04-13T19:27:58Z

@mark-i-m For embedded applications, kernels, and similar, yes. Standard applications, on the other hand, don't typically have such linker scripts or configs.

nox · 2018-04-14T11:50:30Z

Personally, I would like to see the possibility of this optimization (automatically hiding enum variants or small values in the low bits of a pointer), but we also need to make sure people don't rely on non-portable assumptions.

I have code for this, somewhat, where I use the alignment of T in &T to teach rustc that 1, 2 and 3 will never be a valid representation of &usize, for example. That breaks transmute (and other things) because of how layout is computed for generic types currently. I plan to work on that later this year, but it's definitely not an easy task. I can write some more about this and link to IRC discussions with people (where people is "just Eddy" as you could have guessed) if there is interest about this.

🥖 @rust-lang/wg-codegen

Manishearth · 2018-04-14T16:30:53Z

cc @ticki @steveklabnik @phil-opp @SergioBenitez

(adding folks doing OS work in Rust)

Manishearth · 2018-04-14T16:35:36Z

During the migration, we should migrate the impact with a crater run. If changing the behavior directly is unacceptable, then we'll have to create a new type instead.

I'm iffy on this; a lot of the problems that could be caused by this may not be immediately obvious in a crater run. Especially for FFI-using applications that don't get tested via crater since crater doesn't know how to build them (or if they're on crates.io).

I'd rather just do a new type period.

To take advantage of zero page optimization, use transmute from and to usize. This will cause compilation to fail if such optimization is not permitted on the target.

Not applicable: Null pointer optimization is Rust specific, and this enhancement is Rust specific too.

This optimization is pretty common in C++ codebases, manually done.
This is a hack. We could use target_feature for this, probably.

oli-obk · 2018-04-16T08:23:16Z

I'd rather just do a new type period.

A new "optimized" reference type could have many cool benefits, not just this:

reuse bits that are zero due to alignment for storing discriminants
references to zsts are zsts
references to uninhabited types are uninhabited

wrt embedded needing pointers to low integer addresses:

The zero page size could be target-specific information, just like pointer size or endianness.

Introduce a new type that can benefit from more optimizations.

ishitatsuyuki · 2018-04-16T09:41:35Z

Based on what @oli-obk suggested I've revamped this RFC. Basically, this now also acts as groundwork toward more optimization we can do in the future.

oli-obk · 2018-04-17T09:19:38Z

text/0000-zero-page-optimization.md

-of an enumeration in a way similar to before, except that we will allow
-discriminants of up to the zero page size (typically 4095).
+- These types will be ZST if `T` is ZST. An arbitrary constant is returned as
+the inner raw pointer. `0` is a good candidate here because we don't actually


ZST pointer addresses are their alignment, not 0.

Are you saying that because we used to assign such value? I think we no longer have to do that complicated thing, 0 makes the logic more simple.

Well.. I would have assumed that we should separate the actual value from the memory representation. While the representation is (), the value should still be meaningful (not 0, because that has the "invalid" meaning for pointers)

oli-obk · 2018-04-17T09:20:20Z

text/0000-zero-page-optimization.md

+- These types will be ZST if `T` is ZST. An arbitrary constant is returned as
+the inner raw pointer. `0` is a good candidate here because we don't actually
+store it, we don't have to worry about it conflicting with the optimization.
+- These types will be inhabitable if `T` is inhabitable.


That can only be done for Shared, the other types can't have these optimizations, as that would break code

Can you elaborate on how this can break code?

These things have been discussed in detail in #2040

oli-obk · 2018-04-17T09:21:18Z

text/0000-zero-page-optimization.md

+We should refactor the allocation related code to prefer enumerations over
+`NonNull::dangling`. Taking `RawVec` code as an example, we would use
+`Option<Shared<T>>` to store the internal pointer. For ZST, we initialize
+with an arbitrary value (as we don't store it); for zero-length vector, we make


see the comment above about zst pointers

CAD97 · 2018-04-18T02:29:16Z

Isn't it impossible to have Option<Option<&_>> be flat? Since Option::as_ref exists, there needs to exist a valid Option<&_> to be pointed at.

As I understand how niche filling works today, enum discriminants can be snuggled into padding, because the contents of those bits are undefined when you just have the T.

I guess since Option+&_ has magic on it already for the null pointer optimization, it might be possible to change Option::<&_>::is_some from (ptr) != 0 to (ptr) > ZERO_PAGE, and then you could say Option::<...<Option<&_>...>::is_some is (ptr) > (1 << nesting level), effectively treating the end of a pointer into the zero page as padding which can be filled as a niche.

Never mind me, I convinced myself that this is (probably) sound. Just treat the least significant bits of a pointer as a niche that can be filled until setting that bit might make it a valid pointer.

It needs to be noted, though: there's a lot of 1 as {ptr} out there. This'll probably open all of those up to soundness holes, even though it intends to make them unnecessary.
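
A hand-written sketch of the flat encoding being reasoned about here, using plain `usize` addresses instead of real references. `NULL_RANGE_END`, `encode` and `decode` are hypothetical, and the 4 KiB range is an assumption. The inner `None` keeps the value 0 so that an `Option<&T>` view of the same word stays valid; the outer `None` takes 1.

```rust
/// Hypothetical end of the reserved null range (one 4 KiB page).
pub const NULL_RANGE_END: usize = 0x1000;

/// Flatten a doubly nested option of an address into a single word.
/// Returns `None` if the address itself falls inside the reserved range.
pub fn encode(value: Option<Option<usize>>) -> Option<usize> {
    match value {
        Some(None) => Some(0), // inner None stays the null pointer
        None => Some(1),       // outer None uses the next reserved value
        Some(Some(addr)) if addr >= NULL_RANGE_END => Some(addr),
        Some(Some(_)) => None, // address would collide with the niche
    }
}

/// Inverse of `encode`. Values in the reserved range that no variant uses
/// are reported back as `Err(raw)`.
pub fn decode(raw: usize) -> Result<Option<Option<usize>>, usize> {
    match raw {
        0 => Ok(Some(None)),
        1 => Ok(None),
        r if r >= NULL_RANGE_END => Ok(Some(Some(r))),
        r => Err(r),
    }
}
```

The rejected case in `encode` corresponds to the `1 as {ptr}` hazard mentioned above: a small integer cast to a pointer would be indistinguishable from a discriminant.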

Manishearth · 2018-04-18T02:32:47Z

@CAD97 fwiw your comment seems to assume this is specifically implemented in Option's code; it's not, it's a generic optimization. is_some is just a match.

Anyway, this is exactly what the proposal is talking about. The double option isn't a problem because the invalid states only occur when there's a None; i.e. when there's nothing to point at.

CAD97 · 2018-04-18T02:47:30Z

I was just talking about is_some to make it easier to talk about, it's of course a match. Though the check is emitted at some level. Obviously though my understanding was a bit off, and now it's a bit less off.

Is there a concrete reason ptr::NonNull doesn't just get the added guarantee of more niche space for the optimizer to work with? ptr::NonNull::dangling could then be the first aligned pointer after the niche space. Though I suppose ptr::NonNull::new_unchecked only requires the argument is non-zero, so I've answered my own question.

Definitely extending the niche space on &_ is useful, though, as that is fully under Rust's control.

(I'm going to go back to not pretending I know how pointers work now.)

nox · 2018-04-18T06:09:59Z

@CAD97 As I mentioned earlier, using alignment to increase the niche space of &_ is not as trivial as it seems.

Amanieu · 2018-04-18T10:51:35Z

I don't like the idea of crates being able to change the "null range" through an attribute. This is an ABI-breaking change and should only be configurable at the target level using a field in the target json file.

oli-obk · 2018-04-18T11:27:04Z

One thing about microcontrollers is that they tend to have little memory. So on embedded we could use the high addresses instead of the low ones. This would of course not be target specific but specific to the actual physical controller you are targeting.

ishitatsuyuki · 2018-04-19T11:50:24Z

So @Amanieu said that the null range shouldn't be changed by crates. I also noticed that altering the size via an attribute only affects that crate, which isn't the thing we want to do on microcontrollers. @mark-i-m Can you elaborate on what options we have for stable microcontroller runtimes? Or, is it bad to hard-code the value per target inside rustc?

@oli-obk Yeah, using the high address is good, except it breaks the None is NULL convention. Do you think we should make this breaking change, or on the other hand use 0 plus a high range for the optimization?

Also, it seems that ZST references/pointers have their own troubles. I'm going to remove them from this RFC for the meanwhile (this can be discussed in a further RFC).

mark-i-m · 2018-04-19T15:57:03Z

@oli-obk

This would of course not be target specific but specific to the actual physical controller you are targeting.

Sorry, I didn't quite understand this. "specific to the actual physical controller you are targeting" isn't "target specific"?

@ishitatsuyuki

@mark-i-m Can you elaborate on what options we have for stable microcontroller runtimes? Or, is it bad to hard-code the value per target inside rustc?

TBH, my knowledge of pure embedded systems specifically is limited, but in terms of other bare metal software (e.g. OS kernels), it seems that the "null range" is a property of the target platform itself. For example, an OS kernel could choose where it wants to place the "null range" in virtual address spaces (currently most choose the first page, such as 0-4096 on x86).

At first glance, the target .json file seems like the ideal place for that:

{
    "llvm-target": "i686-unknown-none-gnu",
    "data-layout": "e-m:e-p:32:32-f64:32:64-f80:32-n8:16:32-S128",
    "target-endian": "little",
    "target-pointer-width": "32",
    "target-c-int-width": "32",
    "os": "none",
    "arch": "x86",
    "target-env": "gnu",
    "pre-link-args": [ "-m32" ],
    "features": "",
    "disable-redzone": true,
    "eliminate-frame-pointer": false,
    "linker-is-gnu": true,
    "no-compiler-rt": true,
    "archive-format": "gnu",
    "linker-flavor": "ld",

    // Proposed new option (hypothetical, not an existing target-spec key):
    "null-range": "0x0-0x1000"
    // or alternatively: "null-range": "none"
}

The compiler can then choose to take into account this range when laying out structures. I don't think anything about the null pointer optimization is specific to the value 0x0, right?
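
As a rough sketch of how such a value might be interpreted, the following parses the two string forms shown in the example above. The `null-range` key, its syntax, and the function `parse_null_range` are all hypothetical; treating `"none"` as an empty range is an assumption made for this example only.

```rust
use std::ops::Range;

/// Parse a hypothetical `null-range` value such as "0x0-0x1000" or "none".
/// Returns `None` for anything that does not match either form.
pub fn parse_null_range(s: &str) -> Option<Range<usize>> {
    if s == "none" {
        // Assumption: "none" means no address is reserved at all.
        return Some(0..0);
    }
    let (lo, hi) = s.split_once('-')?;
    let lo = usize::from_str_radix(lo.strip_prefix("0x")?, 16).ok()?;
    let hi = usize::from_str_radix(hi.strip_prefix("0x")?, 16).ok()?;
    if lo <= hi {
        Some(lo..hi)
    } else {
        None
    }
}
```

A layout pass could then treat every address inside the returned range as available for discriminants, and nothing at all when the range is empty.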

Of course, we could also add a -C flag or something, but I think this has the same problems as a per-crate attribute, right?

oli-obk · 2018-04-19T16:10:12Z

Sorry, I didn't quite understand this. "specific to the actual physical controller you are targeting" isn't "target specific"?

Well... I always see the targets as a specific processor, not the entire board. But for the address space everything connected to the memory BUS needs to be known.

nagisa · 2018-04-20T04:51:18Z

The configuration definitely cannot be part of the target specification unless you want to have a specification per machine rather than a generic target. This is true not only for embedded systems but also for your plain old computers with x86 in them, due to how e.g. PCIe works.


pythonesque · 2018-05-05T13:30:33Z

tl;dr I think this RFC is pretty useless as written, because of the existence of unexploited alignment bits and the nonexistence of more general layout optimizations like custom ranges, which could be exploited very cheaply for most of the same use cases this RFC is designed to cover. It could be made useful in combination with extremely aggressive size optimizations of aligned bits, but the performance cost in those cases is high enough that you'd probably need to use a new kind of representation in order to benefit from them.

Long version:

People have talked about this only being useful if you only had pointers to "low alignment" types, and while that's true I want to more explicitly point out that it's only useful if you have pointers to byte-aligned types. That's because using alignment bits for just carrying variants of an enum T without data gives you usize::MAX - usize::MAX / align_of::<T>() extra variants to play with, which for every type but u8 is much higher than what you get from just using the zero page on most architectures (especially those architectures where the zero page optimization would actually apply, since architectures without much memory probably wouldn't reserve the zero page this way). That is, even with 16-bit alignment you have half of all addresses available to use as tags. That should presumably apply to NonNull and Unique pointers as well. Next to that, any optimizations from using the zero page aren't exactly compelling--even if you assume the whole first 4GB is reserved on some 64-bit systems, that only gives you 2^32 variants, while exploiting tag bits on 16-bit aligned data gives you 2^63. I am pretty sure anyone with an enum with more than 2^63 variants wouldn't be satisfied with a measly 2^32 more :P.
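
The arithmetic in the paragraph above can be checked with a short sketch. It uses `u64` explicitly to model a 64-bit address space regardless of the host, and `misaligned_values` is a hypothetical helper name.

```rust
/// Number of 64-bit address values that are NOT a multiple of `align`,
/// computed as MAX - MAX / align. `align` must be a non-zero power of two.
pub fn misaligned_values(align: u64) -> u64 {
    u64::MAX - u64::MAX / align
}

/// Number of values freed by reserving the first `bytes` bytes of the
/// address space (4096 for one page, 1 << 32 for a 4GB reservation).
pub fn null_range_values(bytes: u64) -> u64 {
    bytes
}
```

For 2-byte alignment this gives 2^63 spare values and for byte alignment it gives none, compared with 2^12 for a 4 KiB zero page or 2^32 for a 4GB reservation.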

Therefore, this RFC only makes sense in three contexts:

(1) more than one single variant with one pointer variant, where you need/want the single variants' numerical values to be tightly packed for some reason. I'm not really sure whether you'd really get much out of this though--in the alignment-based solution, even though the pointers are spread out, they're spread out at aligned intervals, so it's pretty cheap to turn the variants into packed versions for the purpose of using a jump table or something: packed_variant = raw_variant - (raw_variant & ~alignment_mask) (there may even be much cheaper ways to do it). For most operations, like copying or storing data, the fact that they weren't packed would not really be relevant. Detecting the non-variant case is also easy--just check whether packed_variant & alignment_mask is zero--and is at least as cheap as doing so in the "numeric values are packed" case (since the latter requires a comparison). So I think this use case doesn't really justify including this.

(2) more than one single variant with one pointer variant to a type with 8-bit alignment (let's say a &[u8]). It's true that byte slices are pretty common in Rust; however, I'm not convinced that there are a lot of use cases where this specific pattern matters for byte slices.

First, this wouldn't help almost any of the cases where a &[u8] or Vec<u8> is returned as part of an io::Result, because it has a non-nullary error variant. For the same reason, it wouldn't help cases like Cow<str> which frustratingly takes up 32 bytes instead of 24, or frankly most of the situations I've wanted to use byte slices or Vec<u8> in enums. There are probably some where it would be beneficial, but I'm not sure there are enough that it'd be worth the effort.

Fortunately, there is an approach that covers many such use cases. Rust guarantees that byte slices allocated from Vecs only have sizes up to isize::MAX, so your first use case should (with sufficient cleverness on Vec's part) be able to tell the type system that any larger values for length are free game (this is an example where a type system that can [unsafely] opt out of ranges on a type by type basis, working in tandem with a compiler that knows how to exploit them, can enable cleverer optimizations than either could do on its own). That would enable both Result types returning Vecs and Cow types to store the variant tag in the Vec's length field for everything but the variant that actually contained the Vec, which would greatly improve the memory consumption of these types.

To give another example of where being able to opt out of particular ranges for a type is useful: I have run into situations where I had an enum with three variants: a nullary one, and two that carried integers that I knew would never exceed i32::MAX. This presents an obvious encoding into an i32, with one of the variants taking the negative range, another 0, and a third the positive range. However, because Rust doesn't provide any way to explicitly opt out of particular ranges, I couldn't do that even if I was willing to manually make the values positive, and had to resort to a manual encoding on the i32. Such an optimization would never be applied by the compiler unless it was asked to do so, because it's only safe due to the semantics of the code using the integer ranges, and it wouldn't be helped by the zero page optimization you're proposing.

It is true that such range-based solutions don't obviously help with byte slices, since those lengths are in general (I think) allowed to exceed isize::MAX. If you are working with enums with lots of variants and just one byte slice a lot, or have large nested option sequences with byte slices, then your proposal is worthwhile; but the wins there seem low priority to me compared to properly exploiting alignment and being able to specify legal ranges explicitly.

(3) In conjunction with using alignment bits for tags with data.

To me, this is by far the most interesting use case. The reason is that needing to shrink the sizes of enums with large, but not too large, numbers of data-carrying variants comes up a lot. In Rust right now, the best size you can hope for without copious amounts of unsafe code is to create a single enum with a variant for each kind of node, and box or reference the contents of each variant, which usually eats at least one word (and in practice at least two in many cases, since you often want to align AST nodes from an arena). With alignment bits used for variants, though, the size can go down to a single word, as long as the values the type points to have enough spare alignment bits that they can store all the variants. For instance, with 8-byte alignment (the usual alignment of a Box on a 64-bit system, which is already sort of mandated by the existence of a Box in one of the variants), you can hold up to 8 tagged variants containing pointers to one or more values of the same type--an incredibly common case for ASTs! That would essentially let you pack AST nodes as tight as possible outside of succinct implementations, and still keep them a single word. Not only that, but as long as you were willing to align all but 7 of the values (or whatever) at more than 8 byte boundaries (which in practice is often fine since jemalloc likes to allocate at 16-byte boundaries), at a performance cost you could use a variable length encoding and use different numbers of alignment bits per variant.
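For readers unfamiliar with the technique, here is a minimal sketch of a one-word tagged pointer that keeps a variant tag in the low three alignment bits. `Tagged` and `Node` are hypothetical names for illustration, the sketch requires at least 8-byte alignment, and it is hand-written unsafe code rather than anything the compiler does automatically.

```rust
use std::marker::PhantomData;

/// Hypothetical payload type, forced to 8-byte alignment so that the
/// low three bits of any pointer to it are always zero.
#[repr(align(8))]
pub struct Node {
    pub value: u32,
}

/// Bits that are free in a pointer to an 8-byte-aligned value.
const TAG_MASK: usize = 0b111;

/// Hypothetical owning pointer that stores a 3-bit tag in the address.
pub struct Tagged<T> {
    raw: usize,
    _owns: PhantomData<Box<T>>,
}

impl<T> Tagged<T> {
    /// Returns None if T is not aligned enough or the tag does not fit.
    pub fn new(value: Box<T>, tag: usize) -> Option<Self> {
        if std::mem::align_of::<T>() < 8 || tag > TAG_MASK {
            return None;
        }
        let addr = Box::into_raw(value) as usize;
        Some(Tagged { raw: addr | tag, _owns: PhantomData })
    }

    pub fn tag(&self) -> usize {
        self.raw & TAG_MASK
    }

    /// Every access has to mask the tag bits off first.
    pub fn get(&self) -> &T {
        unsafe { &*((self.raw & !TAG_MASK) as *const T) }
    }
}

impl<T> Drop for Tagged<T> {
    fn drop(&mut self) {
        // Rebuild the Box from the untagged address so the value is freed.
        unsafe { drop(Box::from_raw((self.raw & !TAG_MASK) as *mut T)) }
    }
}
```

With 8-byte alignment this gives 8 distinct tags in a value that is still exactly one word wide.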

However, unlike the case where you're using alignment bits for nullary variants, it's quite possible that you would run out of alignment bits long before you ran out of nullary ones. Even with a cache-aligned encoding (to 64-byte boundaries, say) you'd only have at most 64 variants to use, and it would be fewer if you had to use a variable length encoding. In this case being able to use known illegal values for nullary variants would be quite compelling, I think!

Unfortunately, using alignment bits for tags for variants with data isn't free, since [at least in most of the cases I can think of?] it would be hard to avoid having to always mask any pointer "derived" from such a type before using it. Besides the operation itself taking time, that seems like it would lead to much more register pressure, since you have to leave the original pointer untouched. Even worse, I'm not sure how "tainting" pointers from the type would actually work with mutable references, since you'd need to be careful to avoid disturbing the alignment bits--since Rust doesn't tell you whether a reference to a type is part of a structure exploiting this optimization, it would be really hard to avoid having to change all writes to pointers to explicitly be masked | assignments. Even if the operation itself is cheap, I imagine not using direct assignments and loads breaks a lot of optimization passes and confuses the branch predictor. Maybe I'm wrong and these are not really issues nowadays, since many modern runtimes, like JavaScript engines, use tagged pointers pervasively, but it certainly seems like a lot of mandatory overhead on pointer writes, of the same sort that GCs tend to induce and that Rust has strenuously tried to avoid adding by default.

So, I conclude that you would probably only want to perform optimizations around using alignment bits for variants with data by (1) explicitly opting in, and (2) having them apply only to special pointer types (that always had their alignment bits masked out before an assignment). That way you could give them different codegen and/or semantics (for instance, you could disallow taking general references into the interiors of enums with repr(packed), which would avoid having to worry about mutation through them needing to be tracked, and change the codegen to mask out the alignment bits when reading from variants). But if you need to have distinct pointer types [or at least representations] anyway in order for the alignment optimizations to kick in, why not incorporate the optimization you're describing only into that type, instead of applying it to all pointers? I think this is what @oli-obk was getting at, but I think I'm willing to make the stronger statement that a zero-page optimization isn't even useful without a "size optimized pointer type" as long as it's remotely feasible to exploit alignment bits.

ishitatsuyuki · 2018-05-05T13:43:25Z

I agree that exploiting alignment bits may be a more powerful solution. But as you said, it won't work on &[u8], which means that it's not a universal solution either. Plus, it is more complex to implement than the null page approach, because we already have some range semantics but no framework for alignment bits.

Also, embedding data inside a pointer will violate the type system's contract that you can always take the address of a value. This means that (3) in your comment is basically not achievable without a completely new mechanism for such representations.

Please also note that although the name of this RFC primarily proposes to use the "zero page", it also provides various additions, which is why this is written as an RFC (if we just wanted to compress enums, we could just implement it in the compiler without RFC discussion). The motivation, as well, is to expose a more type-based API for embedding tag values inside a pointer.

pythonesque · 2018-05-05T15:10:10Z

Yes, (3) requires brand new representations. But my argument is that in its absence, the only major use this RFC would have is for &[u8] slices (in theory it also covers things like bool slices, but most people who care about space would be using packed representations for those already). I don't think that's that compelling a use case for an optimization so fragile (in the sense that it relies on extremely implementation-specific details of the target platform to work, which are liable to change without warning). If we're going to perform dubious optimizations like that, we might as well take advantage of the fact that the upper 16 bits of 64-bit pointers can't point to valid addresses on most Intel systems (and I think the most any mainstream system allows right now is 52?), which if anything is actually less likely to change since JavaScript exploits that. That would limit the utility of the zero page optimization to enums with only one non-nullary variant and multiple nullary ones holding &[u8] slices on 32-bit platforms, I think... it feels very niche to me.
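To make the "upper 16 bits" remark concrete, here is a small sketch of the canonical-address rule on x86-64 with 48-bit virtual addresses, where bits 48 through 63 must be copies of bit 47. The function name `is_canonical_48` is hypothetical, and the 48-bit width is an assumption that does not hold on systems using wider address spaces.

```rust
/// Returns true if `addr` is a canonical address under the assumption of
/// 48-bit virtual addresses: bits 47..=63 are either all zero or all one.
pub fn is_canonical_48(addr: u64) -> bool {
    // Shifting by 47 leaves the 17 bits that must all agree.
    let upper = addr >> 47;
    upper == 0 || upper == 0x1_FFFF
}
```

Any bit pattern for which this returns false can never be a valid pointer under that assumption, so in principle it is available as a spare value in the same way the zero page would be.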

The original null pointer optimization was important mostly because Option<&T> lets you match C on space in the very common case where people use NULL to signal an error, and the main reason people felt confident in it is that in practice all modern architectures (essentially) don't use 0 for anything. But all the subsequent work around making the null pointer optimization more robust (by giving the compiler a general framework for understanding how to exploit unused values) feels much more powerful to me, and I'd rather we exploit that to the hilt before we start looking at more stuff that takes advantage of anything nonportable.
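The existing null pointer optimization can be observed directly with size_of. The helper function names below are made up for this sketch; the types involved are standard.

```rust
use std::mem::size_of;
use std::ptr::NonNull;

/// True if wrapping a non-nullable pointer type in Option costs no space,
/// because None is represented by the otherwise unused null value.
pub fn option_of_non_null_is_free() -> bool {
    size_of::<Option<&u32>>() == size_of::<&u32>()
        && size_of::<Option<Box<u32>>>() == size_of::<Box<u32>>()
        && size_of::<Option<NonNull<u32>>>() == size_of::<NonNull<u32>>()
}

/// True if Option around a raw pointer needs extra space: a raw pointer
/// may legitimately be null, so there is no spare value to hold None.
pub fn option_of_raw_pointer_needs_tag() -> bool {
    size_of::<Option<*const u32>>() > size_of::<*const u32>()
}
```

The contrast between the two functions is the "unused values" framework in miniature: the optimization applies only where the type rules out some bit pattern.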

I especially think that any optimization like this should do way more than just extend the null guarantee to point a bit further; if you want to give platforms a way to opt out of bit ranges for pointers, why not go further? Generally speaking, on any platform that supports memory mapping, you should be able to guarantee that certain ranges are always unmapped, making them usable for variant data. The mapping could be either implicit (from the OS) or explicit (from the running program), and would certainly be unsafe, but at least that would be a general framework for doing this sort of optimization.

You can be even more precise if you have control over the allocator. Postgres does a cute thing where data allocated by each process lives in a disjoint memory address space with the same size; in Rust, if you set things up such that thread-local types were only ever allocated in the thread's local pool, you would have extra unused values such that the number was statically known, but the process of determining the mapping from their runtime value to their compile time value was dynamically determined based on the thread id. I don't realistically expect Rust to ever support anything insane like that automatically--I'm just pointing out that if you want to allow ruling out address values based on combining implementation details about the runtime with type system knowledge, there are far more possibilities than what this RFC proposes.

Please also note that although the name of this RFC primarily proposes to use the "zero page", it also provides various additions which is why this is written into a RFC (if we just wanted to compress enum, we can just implement it in the compiler without RFC discussion). The motivation, as well, is to expose a more type based API for embedding tag values inside a pointer.

The thing is, I don't really see how this RFC actually does that. It talks about some optimizations to & and &mut and wants to bring back the Shared pointer, but it all seems to be for the purpose of making the zero page optimization work. In particular, is it not the case that the issue with NonNull is that it's too specific about the fact that it's ruling out null instead of insisting on something stronger? That's why I wouldn't want to add a new type that makes the same mistake, just extending from 0 to a larger range.

A much more powerful invariant (that would actually make a meaningful difference in the kinds of uses it would have) would be something like "shared pointers may dangle, but must be assigned to an address that was a legal instance of T at some point previously in the program; failure to respect this turns their values into poison". I think that's both a much better interpretation of what Shared was actually supposed to be (a shared pointer without a lifetime), and provides clear semantics for what messing up entails; in particular, if their values only turn into poison if a write is invalid, it should be fine to set them to an illegal value provided that (1) poison spreads from the variants of an enum to the enum itself, and (2) you overwrite all poisoned data before it's read again.

Finally: I don't think there's actually that much value in extending the zero range from a semantics perspective. The proposed solution (transmute to usize) means that programs could still fail depending on the target platform, since the optimization wouldn't apply universally. Moreover, because the layout optimization is supposed to be "composable" in the way that Rust's enum layout optimizations are today, you couldn't necessarily switch on whether the zero page optimization was available (or even how many values it had) in order to provide a fallback path. You'd probably just end up either failing on those architectures, or providing a custom flag that users could specify if they wanted to use a platform that didn't have the optimization. The semantic guarantee would benefit the current implementation of BiLock (since it explicitly uses address values, so it could just switch on whether zero_page_optimization was enabled) but I thought the whole point was to not have to do low level implementations like that.

For that reason, even if this optimization were implemented, I think it would work better as an optimization than an actual guarantee (vs. something like alignment, which is actually guaranteed). Making it just an optimization discourages people from trying to do unnecessary low level bit hacking (which, as you note, isn't always portable) while at the same time practically addressing the actual issue here (enums take up more space than they need to).

shepmaster · 2018-05-21T17:43:07Z

and 0 and 1 is a valid address where the entrypoint lies

This is also true for WASM, as far as I understand.

nikomatsakis · 2020-11-13T22:50:37Z

Hello! We discussed this RFC in the compiler team's backlog today, and we decided to close this RFC. The RFC itself seems to have a few different ideas (e.g., Shared<T> as a way to get a &T-like type without a lifetime, optimizing nested enums, etc.) combined, but they need more discussion, and we think the appropriate venue would be as part of the unsafe code guidelines work. As the thread has shown, there has already been discussion about a number of these goals and the ways we could achieve them. I would encourage folks in the thread to pursue these ideas, by opening issues on the https://github.com/rust-lang/unsafe-code-guidelines repository (if one doesn't already exist) or -- perhaps -- by experimentation with the implementation.

joshtriplett · 2020-11-14T00:22:28Z

I would love to see a separated version of just the notion of reserved pointer values and using those as a niche.

glandium · 2020-11-14T07:19:59Z

The notion of reserved values (not only pointers) for use as niche would be great.

Zero Page Optimization

626b582

hanna-kruppe reviewed Apr 12, 2018

SimonSapin reviewed Apr 12, 2018

Centril added the T-compiler label Apr 12, 2018

ExpHP reviewed Apr 12, 2018

ishitatsuyuki added 4 commits April 13, 2018 16:26

Reword motivation

11a5146

Expose attribute for configuring size

c82a246

Update NonNull requirements

67ef480

Add unresolved questions

3b9864c

hanna-kruppe reviewed Apr 13, 2018

ishitatsuyuki added 2 commits April 16, 2018 17:28

Reword internal refactoring

a529635

Newtype revamp

3167c54

Introduce a new type that can benefit from more optimizations.

oli-obk reviewed Apr 17, 2018

Remove ZST optimization

935d62e

Centril added the A-optimization and A-repr labels Nov 22, 2018

nikomatsakis closed this Nov 13, 2020

fstirlitz mentioned this pull request Dec 8, 2021

RFC: Alignment niches for references types. #3204

Open

