Provide a guarantee on unmapped memory region (null region) #1831

Closed
ishitatsuyuki opened this issue Dec 27, 2016 · 26 comments
Labels
T-lang Relevant to the language team, which will review and decide on the RFC.

Comments

@ishitatsuyuki
Contributor

Most operating systems guard against accesses through null (or near-null) pointers by leaving the first pages of memory unmapped (reference).

It would be good to guarantee some amount of that space, so that raw pointers can be combined with small enums. (By the way, on amd64 you also have 16 spare high bits to play with.)

For example, this one is currently UB. Reserving at least one page would be enough to implement such enum-like structures.
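To make the pattern concrete, here is a minimal sketch of the kind of code under discussion, assuming the allocator never hands out address 1. The `Slot` type and the `NOTIFIED` constant are hypothetical illustrations, not the futures crate's actual code.

```rust
use std::sync::atomic::{AtomicPtr, Ordering};

// Hypothetical sentinel: relies on address 1 never being a real allocation,
// which is exactly the assumption this issue asks Rust to guarantee.
const NOTIFIED: *mut u32 = 1 as *mut u32;

struct Slot {
    // null = empty, NOTIFIED = flagged, anything else = a real pointer.
    ptr: AtomicPtr<u32>,
}

impl Slot {
    fn new() -> Self {
        Slot { ptr: AtomicPtr::new(std::ptr::null_mut()) }
    }

    fn notify(&self) {
        self.ptr.store(NOTIFIED, Ordering::Release);
    }

    fn is_notified(&self) -> bool {
        self.ptr.load(Ordering::Acquire) == NOTIFIED
    }
}

fn main() {
    let slot = Slot::new();
    assert!(!slot.is_notified());
    slot.notify();
    assert!(slot.is_notified());
}
```

The whole scheme stands or falls with the assumption that no real `u32` ever lives at address 1.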

@strega-nil

I'm not sure what you're talking about. I feel examples are in order.

@seanmonstar
Contributor

seanmonstar commented Jan 1, 2017

@ubsan the line linked to shows that the futures crate is assuming that a pointer address of 1 could not possibly be anything, and @ishitatsuyuki is saying that the current Rust implementation doesn't guarantee that.

@strega-nil

@seanmonstar ah, okay.

@ishitatsuyuki I would like to make certain that you understand that that example is not undefined behavior. It simply may not do what you'd like it to.

@strega-nil

I am personally very much against any of these guarantees. 0 and 1 may be valid pointer values, for example, if you're writing an operating system.

@seanmonstar
Contributor

Well, Rust does currently treat 0 as null: https://github.com/rust-lang/rust/blob/1.14.0/src/libcore/ptr.rs#L55

p.is_null() checks for 0 as well.
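A small std-only check of the behaviour described above. `null_address` is a hypothetical helper name; the assertions rely only on `ptr::null` being documented as having address 0 and on `is_null` comparing against it.

```rust
use std::ptr;

// std documents ptr::null() as a pointer with address 0.
fn null_address() -> usize {
    ptr::null::<u8>() as usize
}

fn main() {
    let p: *const u8 = ptr::null();
    assert!(p.is_null());
    assert_eq!(null_address(), 0);

    // A non-zero address is not "null", even if nothing lives there.
    let one = 1usize as *const u8;
    assert!(!one.is_null());
}
```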

@strega-nil

@seanmonstar The standard (or core) library can make assumptions that other code should not make.

@ishitatsuyuki
Contributor Author

It's quite okay to implement such things, since it's implemented in almost any operating system as a security measure, and we have already guaranteed that 1 byte = 8 bit (which doesn't hold on some minor architectures).

@nagisa
Member

nagisa commented Jan 2, 2017

It's quite okay to implement such things, since it's implemented in almost any operating system as a security measure, and we have already guaranteed that 1 byte = 8 bit

There’s an important distinction between OS and architecture. rustc does not (and probably never will) target any architecture where 1B≠8b, but nothing prevents one from writing an OS (possibly in Rust, even!) where the 0th page is mappable by the kernel or even by the user.

@ishitatsuyuki
Contributor Author

@nagisa:
NULL has been 0 for a long time, as defined by both POSIX and Windows. I don't really see the point in reinventing this. Redox follows the de facto standard too.
Hence, convenient support for packing enums into pointers is important, especially the atomic case in the futures crate example I mentioned. Third-party crates need this flexibility as well; unlike Go's, Rust's std is quite minimal and lacks async I/O and coroutines.

@ticki
Contributor

ticki commented Jan 2, 2017

Note: pointer 1 is used internally in both libcore and libstd as a non-null placeholder pointer (e.g. in an empty Vec<T>).
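In today's std the same idea is exposed as `NonNull::dangling()`, documented as returning a non-null, well-aligned, dangling pointer. Its documentation also warns that the value may coincide with a valid `T`, so it must not be used as a "not yet initialized" sentinel. The sketch below (the `dangling_ok` helper is hypothetical) checks only those documented properties, not any specific address.

```rust
use std::mem::align_of;
use std::ptr::NonNull;

/// True if NonNull::<T>::dangling() is non-null and aligned for T.
fn dangling_ok<T>() -> bool {
    let addr = NonNull::<T>::dangling().as_ptr() as usize;
    addr != 0 && addr % align_of::<T>() == 0
}

fn main() {
    assert!(dangling_ok::<u8>());
    assert!(dangling_ok::<u64>());

    // An empty Vec does not allocate, yet its buffer pointer is non-null.
    let v: Vec<u32> = Vec::new();
    assert_eq!(v.capacity(), 0);
    assert!(!v.as_ptr().is_null());
}
```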

@ticki
Contributor

ticki commented Jan 2, 2017

@nagisa You're confusing virtual and physical memory.

@nrc nrc added the T-lang Relevant to the language team, which will review and decide on the RFC. label Jan 4, 2017
@joshtriplett
Member

joshtriplett commented Jan 10, 2017

It seems fine to assume that 0 means "NULL" in the vast majority of libraries, though some low-level systems programming code may need to do otherwise. (Such code would have to handle that assumption carefully, though it does seem reasonable to require that the language itself outside of any library should not break if you attempt to write a given value to a 0 pointer.)

But expecting the availability of any particular non-zero invalid pointer seems like a non-portable assumption, albeit a relatively safe one on a large subset of environments. A more portable (if slightly slower) approach would declare a dummy object, obtain its pointer, and use that value as the placeholder.
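A minimal sketch of that portable approach, assuming a single private `static` is enough to provide a unique address (the names `SENTINEL_BYTE`, `sentinel` and `is_sentinel` are hypothetical). Because the placeholder is the address of a real object, no other live allocation can collide with it, on any target.

```rust
use std::sync::atomic::{AtomicPtr, Ordering};

// A private static whose only job is to own a unique address.
static SENTINEL_BYTE: u8 = 0;

fn sentinel() -> *mut u8 {
    // The *mut cast is only for comparison; nothing is written through it.
    &SENTINEL_BYTE as *const u8 as *mut u8
}

fn is_sentinel(p: *mut u8) -> bool {
    p == sentinel()
}

fn main() {
    let slot = AtomicPtr::new(std::ptr::null_mut::<u8>());
    slot.store(sentinel(), Ordering::Release);
    assert!(is_sentinel(slot.load(Ordering::Acquire)));

    // A live heap allocation can never share an address with the static.
    let real = Box::into_raw(Box::new(7u8));
    assert!(!is_sentinel(real));
    unsafe { drop(Box::from_raw(real)) };
}
```

The cost is one byte of static storage and a comparison against a link-time address instead of a constant.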

I have an alternative proposal, which could provide the same effect with more readable code:

Teach rustc for any given target platform about the range of invalid pointer values; for any hosted environment, this would typically mean "any pointer in the zeroth page of memory". Then, extend the "forbidden value" optimization work in rust-lang/rust#36237 (which generalized the original null-pointer optimization) to optimize enum types containing pointers and a small number of values. That would provide a much more general and transparent memory size optimization, allowing many different enums to shrink to the size of a pointer on most platforms.

With that optimization, the following enums would each take exactly the size of a pointer on most platforms:

struct SomeType;

enum E1 {
    Invalid,
    Pointer(*const SomeType),
}

enum E2 {
    Invalid1,
    Invalid2,
    Pointer(*const SomeType),
}

enum E3 {
    Boolean(bool),
    Pointer(*const SomeType),
}

enum E4 {
    Byte(u8),
    Pointer(*const SomeType),
}

enum E5<'a> { // Existing null-pointer optimization
    Invalid,
    Reference(&'a SomeType),
}

enum E6<'a> {
    Invalid1,
    Invalid2,
    Reference(&'a SomeType),
}

enum E7<'a> {
    Byte(u8),
    Reference(&'a SomeType),
}

Consider how much code would use substantially less memory with this optimization, without requiring any manual special-cases.
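For comparison, a sketch of what the compiler already does under today's layout rules, using only standard types. `Option<&T>` and `Option<NonNull<T>>` are documented to be pointer-sized; a raw pointer may be null, so an enum like `E1` has no spare value to hide its tag in and currently grows.

```rust
use std::mem::size_of;
use std::ptr::NonNull;

// E1 from the proposal, written against today's compiler.
#[allow(dead_code)]
enum E1 {
    Invalid,
    Pointer(*const u64),
}

fn main() {
    let ptr = size_of::<*const u64>();
    // E5-style: the existing null-pointer optimization for references.
    assert_eq!(size_of::<Option<&u64>>(), ptr);
    // NonNull opts a raw pointer into the same optimization.
    assert_eq!(size_of::<Option<NonNull<u64>>>(), ptr);
    // A plain raw pointer has no forbidden value, so E1 needs a separate tag.
    assert!(size_of::<E1>() > ptr);
}
```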

@ishitatsuyuki
Contributor Author

The optimization is also awesome; however, the example I mentioned uses this trick on atomic values. Without a guarantee, this cannot be made safe.

@joshtriplett
Member

@ishitatsuyuki If you're willing to make exactly the same non-portable assumption, you could have that same guarantee. Given the optimization above, it doesn't seem too hard to add a way to assert that at compile time. The same compile-time assertion would also allow you to build your own unsafe code relying on the same assumption. (And, as a bonus, if someone attempts to compile your code on a platform that doesn't support that assumption, they'll get a compile-time error rather than runtime breakage.)
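Until the compiler knows about invalid pointer ranges, a crate can approximate such an assertion with `cfg` and `compile_error!`. This is a hedged sketch: the list of operating systems is a hypothetical allow-list chosen by the crate author, not a guarantee made by Rust or by those platforms, and `PLACEHOLDER_ADDR` is an illustrative name.

```rust
// Hypothetical allow-list: targets on which this crate's authors have decided
// to rely on address 1 never being valid. An assumption, not a Rust guarantee.
#[cfg(not(all(
    any(target_os = "linux", target_os = "macos", target_os = "windows"),
    any(target_pointer_width = "32", target_pointer_width = "64")
)))]
compile_error!("this crate assumes address 1 is never a valid pointer on its target");

/// The placeholder address the rest of the crate compares against.
pub const PLACEHOLDER_ADDR: usize = 1;

// Checked at compile time: the placeholder must differ from null.
const _: () = assert!(PLACEHOLDER_ADDR != 0);

fn main() {
    println!("placeholder address: {PLACEHOLDER_ADDR}");
}
```

On any target outside the list, the build fails with that message instead of misbehaving at runtime.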

@eddyb
Member

eddyb commented Jan 10, 2017

It's a bit ridiculous, but we could take this in the opposite direction and stop treating 0 as an invalid (even function?) pointer on certain targets. Not sure how much unsafe code would break.

@nagisa
Member

nagisa commented Jan 13, 2017

https://www.reddit.com/r/cpp/comments/5noem9/a_personal_tale_on_a_special_value/ here’s a case in point.

There are other considerations: for example, packing stuff into a pointer may make garbage collectors harder to write, in the sense that rooting becomes more complex; and page size may vary across platforms or even OS configurations.

@joshtriplett
Member

Nominating for @rust-lang/lang discussion.

@whitequark
Member

It would be quite troublesome to treat even a single (4K) page at address zero as guaranteed unreachable, much less several, since many microcontrollers (offhand: the entire STM32 series) map flash or RAM there when booting from it, and on the lower end, 4K may be all you get (in the case of RAM) or a significant part of it (in the case of flash).

@joshtriplett
Member

@whitequark I definitely agree that many targets will not guarantee it. I think the reasonable question is "could, or should, Rust take into account that some targets do".

Much like the null-pointer optimization for None/Some, this seems like something where teaching the compiler about a thing C programmers do could allow them to do it safely and without hacks, by letting the compiler do it for them. I don't think we should make a universal guarantee, but a platform-specific one with optimization possibilities makes sense.

@whitequark
Member

I don't think we should make a universal guarantee, but a platform-specific one with optimization possibilities makes sense.

If this is just an internal layout optimization (i.e. part of the "Rust" ABI), I have no objection, since it will not affect any correct code. Then I also don't see the need for an RFC.

If this is an externally visible layout optimization (i.e. part of the "C" ABI, like the Option<*x>), then varying it by platform seems very troublesome, because inevitably unsafe code will be written with the assumption that the optimization is active, and then break. Worse, I can't think of ubsan-like tooling that could easily find such mistakes.

@whitequark
Member

@joshtriplett Yes, opting in sounds good.

@Amanieu
Member

Amanieu commented Feb 23, 2017

You could exploit type alignment for this. For example, memory address 1 is guaranteed to not contain a valid i32.

@joshtriplett
Member

@Amanieu Only on platforms that prohibit unaligned pointers. On x86, you could have an i32 at an odd address. But yes, some means ought to exist to automatically use the low bits of an aligned pointer, as an optimization.

@Amanieu
Member

Amanieu commented Feb 24, 2017

@joshtriplett Actually, dereferencing a misaligned pointer is UB in LLVM (and therefore in Rust as well). The compiler can and will make assumptions about the alignment of a pointer, especially with auto-vectorization. The only correct way to access misaligned data is through ptr::read_unaligned and ptr::write_unaligned.
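A minimal sketch of reading an `i32` from a byte buffer at an arbitrary offset; `read_i32_at` is a hypothetical helper. Casting the pointer and dereferencing it directly would be UB whenever the offset is misaligned, whereas `ptr::read_unaligned` has no alignment requirement. It reads in native byte order.

```rust
use std::ptr;

fn read_i32_at(bytes: &[u8], offset: usize) -> i32 {
    assert!(offset + 4 <= bytes.len(), "read out of bounds");
    // bytes.as_ptr().add(offset) need not be a multiple of 4, so
    // `*(p as *const i32)` would be UB; read_unaligned is always fine.
    unsafe { ptr::read_unaligned(bytes.as_ptr().add(offset) as *const i32) }
}

fn main() {
    let buf = [0u8, 0x78, 0x56, 0x34, 0x12, 0];
    let v = read_i32_at(&buf, 1);
    assert_eq!(v, i32::from_ne_bytes([0x78, 0x56, 0x34, 0x12]));
}
```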

Basically this means that if you have a &i32 or Box<i32> then the language guarantees that the address is a multiple of 4.
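That alignment guarantee is what makes low-bit tagging sound. The sketch below (the `Tagged` type and `TAG_MASK` are hypothetical) stores a small tag in the bits that `align_of::<u32>()` forces to zero and masks them off before the pointer is used. It uses plain `as` casts between pointers and `usize` for brevity.

```rust
/// Bits that are always zero in a valid *mut u32 (0b11 on mainstream targets).
const TAG_MASK: usize = std::mem::align_of::<u32>() - 1;

/// A heap-allocated u32 whose pointer carries a small tag in its low bits.
struct Tagged {
    bits: usize,
}

impl Tagged {
    fn new(value: u32, tag: usize) -> Self {
        assert!(tag <= TAG_MASK, "tag does not fit in the alignment bits");
        let raw = Box::into_raw(Box::new(value)) as usize;
        debug_assert_eq!(raw & TAG_MASK, 0);
        Tagged { bits: raw | tag }
    }

    fn tag(&self) -> usize {
        self.bits & TAG_MASK
    }

    fn ptr(&self) -> *mut u32 {
        // Clear the tag bits to recover the original, aligned pointer.
        (self.bits & !TAG_MASK) as *mut u32
    }

    fn get(&self) -> u32 {
        unsafe { *self.ptr() }
    }
}

impl Drop for Tagged {
    fn drop(&mut self) {
        unsafe { drop(Box::from_raw(self.ptr())) };
    }
}

fn main() {
    let t = Tagged::new(42, TAG_MASK);
    assert_eq!(t.get(), 42);
    assert_eq!(t.tag(), TAG_MASK);
}
```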

@parched

parched commented Feb 24, 2017

On some architectures, notably the 64-bit ones that don't have a full 64-bit address space, there are a number of unused bits that the programmer (or the language) is free to use. If we added an option to the target specification for these bits, then rustc could also use them for these optimisations.

I am also in favour of address 0 (or any other address) not having a special meaning unless that is specified in the target configuration and cfg-able by the programmer.
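As an illustration of those spare high bits, here is a sketch of packing a 16-bit tag above a 48-bit address. It assumes x86_64-style canonical addresses with 4-level paging (bits 48 to 63 copy bit 47); with 5-level paging or on other architectures the constants differ, which is why describing them in the target specification is attractive. The functions operate on plain `u64` values and never dereference anything.

```rust
// Assumption: 48-bit canonical virtual addresses (x86_64, 4-level paging).
const ADDR_BITS: u32 = 48;
const ADDR_MASK: u64 = (1 << ADDR_BITS) - 1;

fn pack(addr: u64, tag: u16) -> u64 {
    (addr & ADDR_MASK) | ((tag as u64) << ADDR_BITS)
}

fn unpack_tag(packed: u64) -> u16 {
    (packed >> ADDR_BITS) as u16
}

fn unpack_addr(packed: u64) -> u64 {
    // Shift the tag out, then arithmetic-shift back to sign-extend bit 47.
    (((packed << (64 - ADDR_BITS)) as i64) >> (64 - ADDR_BITS)) as u64
}

fn main() {
    let user = 0x0000_7fff_dead_beef_u64; // lower-half style address
    let kernel = 0xffff_8000_0000_1000_u64; // higher-half style address
    for addr in [user, kernel] {
        let p = pack(addr, 0xabcd);
        assert_eq!(unpack_tag(p), 0xabcd);
        assert_eq!(unpack_addr(p), addr);
    }
}
```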

@joshtriplett
Member

We discussed this in today's @rust-lang/lang meeting, with the following resolutions:

  1. The assumption that 0 is the null pointer, and that it can be used accordingly, is one that people on specific target platforms (e.g. embedded) may legitimately want to question. The compiler has some assumptions about this. If someone wants to target a platform that has 0 as a valid pointer, they should talk to the @rust-lang/compiler team about that, and then propose an RFC.
  2. Regarding the use of 1 as a non-null invalid pointer value in the internals of libstd and libcore: that needs review with the people working on the "unsafe code guidelines"; libstd and libcore will go along with those guidelines once available. (Also talk with @rust-lang/libs about this.)
  3. We'd encourage someone to teach the compiler about non-null invalid pointers on the target platform (e.g. "the first 4k of pointers are invalid"), and then let the compiler optimize enums and similar accordingly. That'll need a full RFC, which may want to work with the "scenarios" proposal, as well as the compiler team.

As always, if someone is interested in producing an RFC and would like help doing so, please post a pre-RFC on https://internals.rust-lang.org/ and say that you'd like some guidance through the RFC process.
