Provide a guarantee on unmapped memory region (null region) #1831

ishitatsuyuki · 2016-12-27T13:58:29Z

Most operating systems guard against null pointer arithmetic by leaving the first memory pages unmapped (reference).

It would be good to guarantee some amount of unmapped space, so that raw pointers can be combined with small enums. (By the way, you have 16 bits to play with on amd64.)

For example, this one is currently UB. A guarantee of at least one page would be enough to implement such enum-like structures.

strega-nil · 2017-01-01T20:12:10Z

I'm not sure what you're talking about. I feel examples are in order.

seanmonstar · 2017-01-01T20:28:25Z

@ubsan the line linked to shows that the futures crate is assuming that a pointer address of 1 could not possibly be anything, and @ishitatsuyuki is saying that the current Rust implementation doesn't guarantee that.
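To make the pattern under discussion concrete, here is a minimal sketch of a word that holds either a sentinel state or a heap pointer. It is not the futures crate's actual code; `Slot`, `EMPTY`, and `COMPLETE` are hypothetical names, and the sketch deliberately relies on the assumption being questioned in this thread, namely that no allocation ever lives at address 1.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Hypothetical sentinel states that share one word with a pointer.
const EMPTY: usize = 0; // null is never a valid Box address
const COMPLETE: usize = 1; // ASSUMPTION: no Box<u8> is ever allocated at address 1

/// A slot that is EMPTY, COMPLETE, or holds a heap-allocated u8.
struct Slot(AtomicUsize);

impl Slot {
    fn new() -> Self {
        Slot(AtomicUsize::new(EMPTY))
    }

    /// Stores a boxed value if the slot is EMPTY; returns false otherwise.
    fn put(&self, value: u8) -> bool {
        let raw = Box::into_raw(Box::new(value)) as usize;
        if self
            .0
            .compare_exchange(EMPTY, raw, Ordering::AcqRel, Ordering::Acquire)
            .is_ok()
        {
            return true;
        }
        // The slot was not empty: reclaim the allocation we just made.
        drop(unsafe { Box::from_raw(raw as *mut u8) });
        false
    }

    /// Takes the value out, if any, and marks the slot COMPLETE.
    fn take(&self) -> Option<u8> {
        match self.0.swap(COMPLETE, Ordering::AcqRel) {
            EMPTY | COMPLETE => None,
            raw => {
                let boxed = unsafe { Box::from_raw(raw as *mut u8) };
                Some(*boxed)
            }
        }
    }
}

impl Drop for Slot {
    fn drop(&mut self) {
        // Free a value that was stored but never taken.
        let _ = self.take();
    }
}

fn main() {
    let slot = Slot::new();
    assert!(slot.put(7));
    assert_eq!(slot.take(), Some(7));
    assert_eq!(slot.take(), None);
}
```

If an allocator ever returned address 1 for a `Box<u8>`, `put` would store a value indistinguishable from `COMPLETE`. That is the failure mode the requested guarantee would rule out.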

strega-nil · 2017-01-01T20:56:03Z

@seanmonstar ah, okay.

@ishitatsuyuki I would like to make certain that you understand that that example is not undefined behavior. It simply may not do what you'd like it to.

strega-nil · 2017-01-01T20:56:59Z

I am personally very much against any of these guarantees. 0 and 1 may be valid pointer values, for example, if you're writing an operating system.

seanmonstar · 2017-01-01T21:01:02Z

Well, rust does currently believe that 0 is indeed null: https://github.com/rust-lang/rust/blob/1.14.0/src/libcore/ptr.rs#L55

p.is_null() checks for 0 as well.
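A small check of that behavior, as a sketch using only `std::ptr::null` and `is_null`; the helper name `null_is_zero` is made up for illustration.

```rust
use std::ptr;

/// Returns true if the null pointer is reported as null and has address 0.
fn null_is_zero() -> bool {
    let p: *const u8 = ptr::null();
    p.is_null() && (p as usize) == 0
}

fn main() {
    assert!(null_is_zero());

    // A non-zero address is never reported as null, whether or not
    // anything is actually mapped there. The pointer is not dereferenced.
    let one = 1usize as *const u8;
    assert!(!one.is_null());
}
```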

strega-nil · 2017-01-01T23:14:24Z

@seanmonstar The standard (or core) library can make assumptions that other code should not make.

ishitatsuyuki · 2017-01-02T02:23:32Z

It's quite okay to implement such things, since it's implemented in almost any operating system as a security measure, and we have already guaranteed that 1 byte = 8 bits (which does not hold on some minor architectures).

nagisa · 2017-01-02T06:20:15Z

It's quite okay to implement such things, since it's implemented in almost any operating system as a security measure, and we have already guaranteed that 1 byte = 8 bit

There’s an important distinction between OS and architecture. rustc does not (and probably never will) target any architecture where 1B≠8b, but nothing prevents someone from writing an OS (possibly in Rust, even!) where the zeroth page can be mapped by the kernel or even by user code.

ishitatsuyuki · 2017-01-02T07:46:57Z

@nagisa:
NULL has been 0 for a long time, both defined in POSIX and Windows. I don't really see the point in reinventing this. Redox follows the de facto standard too.
Hence, convenient support for packing an enum into a pointer is important, especially the atomic case from the futures crate example I mentioned. We should have this flexibility in third-party crates as well; unlike Go, Rust's std is quite minimal and lacks things like async I/O or coroutines.

ticki · 2017-01-02T16:32:10Z

Note: pointer 1 is used internally in both libcore and libstd as a non-null placeholder pointer (e.g. in an empty Vec<T>).
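The placeholder can be observed from safe code. The sketch below uses `NonNull::dangling`, which std documents as non-null and well aligned; the exact address it returns is an implementation detail, so the code only checks those two properties. `placeholder_addr` is a hypothetical helper.

```rust
use std::mem::align_of;
use std::ptr::NonNull;

/// Address of the dangling placeholder pointer for `T` (never dereferenced).
fn placeholder_addr<T>() -> usize {
    NonNull::<T>::dangling().as_ptr() as usize
}

fn main() {
    // An empty Vec has not allocated, yet its pointer is non-null.
    let v: Vec<u32> = Vec::new();
    assert_eq!(v.capacity(), 0);
    assert!(!v.as_ptr().is_null());

    // The placeholder is non-null and aligned for the element type.
    assert_ne!(placeholder_addr::<u8>(), 0);
    assert_eq!(placeholder_addr::<u64>() % align_of::<u64>(), 0);
}
```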

ticki · 2017-01-02T16:37:08Z

@nagisa You're confusing virtual and physical memory.

joshtriplett · 2017-01-10T03:40:49Z

It seems fine to assume that 0 means "NULL" in the vast majority of libraries, though some low-level systems programming code may need to do otherwise. (Such code would have to handle that assumption carefully, though it does seem reasonable to require that the language itself outside of any library should not break if you attempt to write a given value to a 0 pointer.)

But expecting the availability of any particular non-zero invalid pointer seems like a non-portable assumption, albeit a relatively safe one on a large subset of environments. A more portable (if slightly slower) approach would declare a dummy object, obtain its pointer, and use that value as the placeholder.
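The dummy-object approach could look roughly like this. It is a sketch only: `COMPLETE_MARKER`, `complete_marker`, and `is_complete` are hypothetical names, and the marker is compared by address and never written through.

```rust
use std::ptr;
use std::sync::atomic::{AtomicPtr, Ordering};

// A dummy object whose only purpose is to own a unique, valid address.
static COMPLETE_MARKER: u8 = 0;

/// Placeholder pointer; it is only compared, never dereferenced or written.
fn complete_marker() -> *mut u8 {
    &COMPLETE_MARKER as *const u8 as *mut u8
}

/// True if the slot currently holds the placeholder.
fn is_complete(slot: &AtomicPtr<u8>) -> bool {
    slot.load(Ordering::Acquire) == complete_marker()
}

fn main() {
    let slot = AtomicPtr::new(ptr::null_mut());
    assert!(!is_complete(&slot));

    slot.store(complete_marker(), Ordering::Release);
    assert!(is_complete(&slot));
}
```

No heap allocation can share the address of a live static, so this works on any target. The cost is loading the marker's address instead of comparing against an immediate constant.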

I have an alternative proposal, which could provide the same effect with more readable code:

Teach rustc for any given target platform about the range of invalid pointer values; for any hosted environment, this would typically mean "any pointer in the zeroth page of memory". Then, extend the "forbidden value" optimization work in rust-lang/rust#36237 (which generalized the original null-pointer optimization) to optimize enum types containing pointers and a small number of values. That would provide a much more general and transparent memory size optimization, allowing many different enums to shrink to the size of a pointer on most platforms.

With that optimization, the following enums would each take exactly the size of a pointer on most platforms:

enum E1 {
    Invalid,
    Pointer(*const SomeType),
}

enum E2 {
    Invalid1,
    Invalid2,
    Pointer(*const SomeType),
}

enum E3 {
    Boolean(bool),
    Pointer(*const SomeType),
}

enum E4 {
    Byte(u8),
    Pointer(*const SomeType),
}

enum E5<'a> { // Existing null-pointer optimization
    Invalid,
    Reference(&'a SomeType),
}

enum E6<'a> {
    Invalid1,
    Invalid2,
    Reference(&'a SomeType),
}

enum E7<'a> {
    Byte(u8),
    Reference(&'a SomeType),
}

Consider how much code would use substantially less memory with this optimization, without requiring any manual special-cases.
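The current state can be measured with `size_of`. In this sketch, `RawEnum` and `RefEnum` are hypothetical stand-ins for E1 and E5 above. Only the `Option` sizes are documented guarantees; the size of `RefEnum` is printed rather than asserted because the layout of user-defined enums is unspecified.

```rust
use std::mem::size_of;
use std::ptr::NonNull;

#[allow(dead_code)]
enum RawEnum {
    Invalid,
    Pointer(*const u32),
}

#[allow(dead_code)]
enum RefEnum<'a> {
    Invalid,
    Reference(&'a u32),
}

fn main() {
    let word = size_of::<usize>();

    // Documented: None is stored as the null pointer.
    assert_eq!(size_of::<Option<&u32>>(), word);
    assert_eq!(size_of::<Option<NonNull<u32>>>(), word);

    // Every bit pattern of a raw pointer is a valid value, so the extra
    // variant cannot fit in one word without the proposed optimization.
    assert!(size_of::<RawEnum>() > word);

    println!(
        "RefEnum: {} bytes, RawEnum: {} bytes",
        size_of::<RefEnum<'static>>(),
        size_of::<RawEnum>()
    );
}
```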

ishitatsuyuki · 2017-01-10T07:57:29Z

The optimization is also great; however, the example I mentioned uses this trick on atomic values. Without a guarantee, that cannot be made safe.

joshtriplett · 2017-01-10T08:08:16Z

@ishitatsuyuki If you're willing to make exactly the same non-portable assumption, you could have that same guarantee. Given the optimization above, it doesn't seem too hard to add a way to assert that at compile time. The same compile-time assertion would also allow you to build your own unsafe code relying on the same assumption. (And, as a bonus, if someone attempts to compile your code on a platform that doesn't support that assumption, they'll get a compile-time error rather than runtime breakage.)
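No such built-in assertion exists for invalid pointer ranges, but the general shape can be approximated today. The sketch below combines a `const` assertion on a layout property with a `cfg` gate; the choice of `any(unix, windows)` as a proxy for "the low addresses are not mapped" is an assumption made purely for illustration, not something those targets formally promise.

```rust
use std::mem::size_of;
use std::ptr::NonNull;

// Checked at compile time: the build fails if this layout property does not hold.
const _: () = assert!(size_of::<Option<NonNull<u8>>>() == size_of::<*mut u8>());

// ASSUMPTION (illustrative only): treat hosted unix/windows targets as
// platforms where small non-zero addresses are never valid pointers.
#[cfg(not(any(unix, windows)))]
compile_error!("this code assumes a hosted target with an unmapped low address range");

/// True when compiled for a target that passed the gate above.
fn low_page_assumed() -> bool {
    cfg!(any(unix, windows))
}

fn main() {
    assert!(low_page_assumed());
}
```

On an unsupported target the crate fails to build instead of misbehaving at run time, which is the property described in the comment above.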

eddyb · 2017-01-10T10:02:28Z

It's a bit ridiculous but we can take this in the opposite direction and disable 0 as an invalid (even function?) pointer for certain targets. Not sure how much unsafe code would break.

nagisa · 2017-01-13T16:45:49Z

https://www.reddit.com/r/cpp/comments/5noem9/a_personal_tale_on_a_special_value/ here’s a case in point.

There are other considerations: for example, packing data into a pointer may make garbage collectors harder to write, in the sense that rooting becomes more complex; and page size may vary across platforms or even across OS configurations.

joshtriplett · 2017-02-23T22:03:51Z

Nominating for @rust-lang/lang discussion.

whitequark · 2017-02-23T22:43:53Z

It would be quite troublesome to treat even a single (4K) page at address zero as guaranteed unreachable, much less a few, since many microcontrollers (offhand: the entire STM32 series) map the flash or RAM there when booting from it, and on the lower end, 4K may be all you get (in the case of RAM) or a significant part of it (in the case of flash).

joshtriplett · 2017-02-23T22:49:24Z

@whitequark I definitely agree that many targets will not guarantee it. I think the reasonable question is "could, or should, Rust take into account that some targets do".

Much like the null-pointer optimization for None/Some, this seems like something where teaching the compiler about a thing C programmers do could allow them to do it safely and without hacks, by letting the compiler do it for them. I don't think we should make a universal guarantee, but a platform-specific one with optimization possibilities makes sense.

whitequark · 2017-02-23T22:52:19Z

I don't think we should make a universal guarantee, but a platform-specific one with optimization possibilities makes sense.

If this is just an internal layout optimization (i.e. part of the "Rust" ABI), I have no objection, since it will not affect any correct code. Then I also don't see the need for an RFC.

If this is an externally visible layout optimization (i.e. part of the "C" ABI, like the Option<*x>), then varying it by platform seems very troublesome, because inevitably unsafe code will be written with the assumption that the optimization is active, and then break. Worse, I can't think of ubsan-like tooling that could easily find such mistakes.

whitequark · 2017-02-23T22:55:41Z

@joshtriplett Yes, opting in sounds good.

Amanieu · 2017-02-23T22:59:02Z

You could exploit type alignment for this. For example, memory address 1 is guaranteed to not contain a valid i32.

joshtriplett · 2017-02-23T23:30:34Z

@Amanieu Only on platforms that prohibit unaligned pointers. On x86, you could have an i32 at an odd address. But yes, some means ought to exist to automatically use the low bits of an aligned pointer, as an optimization.

Amanieu · 2017-02-24T00:48:46Z

@joshtriplett Actually, dereferencing a misaligned pointer is UB in LLVM (and therefore in Rust as well). The compiler can and will make assumptions about the alignment of a pointer, especially with auto-vectorization. The only correct way to access misaligned data is through ptr::read_unaligned and ptr::write_unaligned.
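For example, a `u32` stored at an odd offset in a byte buffer can be read like this. It is a minimal sketch; `read_u32_at` is a hypothetical helper.

```rust
use std::ptr;

/// Reads a native-endian u32 starting at `offset`, whatever its alignment.
fn read_u32_at(bytes: &[u8], offset: usize) -> u32 {
    assert!(offset <= bytes.len() && bytes.len() - offset >= 4);
    // SAFETY: the four bytes are in bounds, and read_unaligned places
    // no alignment requirement on the source pointer.
    unsafe { ptr::read_unaligned(bytes.as_ptr().add(offset) as *const u32) }
}

fn main() {
    let bytes = [0u8, 1, 2, 3, 4];
    // Offset 1 is not, in general, 4-byte aligned.
    let value = read_u32_at(&bytes, 1);
    assert_eq!(value, u32::from_ne_bytes([1, 2, 3, 4]));
}
```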

Basically this means that if you have a &i32 or Box<i32> then the language guarantees that the address is a multiple of 4.
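That alignment guarantee is what makes manual low-bit tagging possible. A sketch, with hypothetical helpers `pack` and `unpack`; the mask is derived from `align_of::<u32>()`, which is 4 on common targets but is not fixed by the language.

```rust
use std::mem::align_of;

// Low bits that are always zero in the address of an aligned u32.
const TAG_MASK: usize = align_of::<u32>() - 1;

/// Stores `tag` in the unused low bits of an aligned pointer.
fn pack(ptr: *mut u32, tag: usize) -> usize {
    let addr = ptr as usize;
    assert_eq!(addr & TAG_MASK, 0, "pointer must be aligned");
    assert!(tag <= TAG_MASK, "tag does not fit in the low bits");
    addr | tag
}

/// Splits a packed word back into pointer and tag.
fn unpack(word: usize) -> (*mut u32, usize) {
    ((word & !TAG_MASK) as *mut u32, word & TAG_MASK)
}

fn main() {
    let raw = Box::into_raw(Box::new(42u32));
    let (ptr, tag) = unpack(pack(raw, 1));
    assert_eq!(ptr, raw);
    assert_eq!(tag, 1);
    // SAFETY: ptr is the original, untagged Box pointer.
    assert_eq!(unsafe { *ptr }, 42);
    drop(unsafe { Box::from_raw(ptr) });
}
```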

parched · 2017-02-24T06:22:21Z

On some architectures, notably the 64-bit ones that don't have a full 64-bit address space, there are a number of unused bits that the programmer (or the language) is free to use. If we added an option to the target specification for these bits, then rustc could use them for these optimisations as well.

I am also in favour of address 0 (or any other address) not having a special meaning unless that is specified in the target configuration, in a way the programmer can cfg on.
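Done by hand, using the high bits might look like the sketch below. `ADDR_BITS = 48` is an assumption that matches common amd64 user-space configurations but not, for instance, systems with 5-level paging. The helpers are hypothetical and operate on plain integers rather than real pointers.

```rust
// ASSUMPTION: addresses fit in the low 48 bits and the top 16 bits are zero.
const ADDR_BITS: u32 = 48;
const ADDR_MASK: u64 = (1u64 << ADDR_BITS) - 1;

/// Packs a 16-bit tag above the address; None if the address is too wide.
fn pack_high(addr: u64, tag: u16) -> Option<u64> {
    if addr & !ADDR_MASK != 0 {
        return None;
    }
    Some(addr | ((tag as u64) << ADDR_BITS))
}

/// Recovers (address, tag) from a packed word.
fn unpack_high(word: u64) -> (u64, u16) {
    (word & ADDR_MASK, (word >> ADDR_BITS) as u16)
}

fn main() {
    let packed = pack_high(0x7fff_dead_beef, 0xABCD).expect("address fits in 48 bits");
    assert_eq!(unpack_high(packed), (0x7fff_dead_beef, 0xABCD));

    // An address that needs bit 48 cannot be tagged under this assumption.
    assert_eq!(pack_high(1u64 << 48, 0), None);
}
```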

joshtriplett · 2017-04-06T21:50:15Z

We discussed this in today's @rust-lang/lang meeting, with the following resolutions:

The assumption that 0 is the null pointer, and that it can be used accordingly, is one that people on specific target platforms (e.g. embedded) may legitimately want to question. The compiler has some assumptions about this. If someone wants to target a platform that has 0 as a valid pointer, they should talk to the @rust-lang/compiler team about that, and then propose an RFC.
Regarding the use of 1 as a non-null invalid pointer value in the internals of libstd and libcore: that needs review with the people working on the "unsafe code guidelines"; libstd and libcore will go along with those guidelines once available. (Also talk with @rust-lang/libs about this.)
We'd encourage someone to teach the compiler about non-null invalid pointers on the target platform (e.g. "the first 4k of pointers are invalid"), and then let the compiler optimize enums and similar accordingly. That'll need a full RFC, which may want to work with the "scenarios" proposal, as well as the compiler team.

As always, if someone is interested in producing an RFC and would like help doing so, please post a pre-RFC on https://internals.rust-lang.org/ and say that you'd like some guidance through the RFC process.

nrc added the T-lang label Jan 4, 2017

joshtriplett added the I-nominated label Feb 23, 2017

joshtriplett closed this as completed Apr 6, 2017

joshtriplett removed the I-nominated label Apr 6, 2017
