Provide a guarantee on unmapped memory region (null region) #1831
Comments
I'm not sure what you're talking about. I feel examples are in order. |
@ubsan the line linked to shows that the futures crate is assuming that a pointer address of |
@seanmonstar ah, okay. @ishitatsuyuki I would like to make certain that you understand that that example is not undefined behavior. It simply may not do what you'd like it to. |
I am personally very much against any of these guarantees. |
Well, rust does currently believe that
|
@seanmonstar The standard (or core) library can make assumptions that other code should not make. |
It's quite okay to implement such things, since almost every operating system already does this as a security measure, and we have already guaranteed that 1 byte = 8 bits (which doesn't hold on some minor architectures). |
There’s an important distinction between OS and architecture. rustc does not (and probably never will) target any architecture where 1B≠8b, but nothing prevents one from writing an OS (possibly in Rust, even!) where the 0th page is mappable by the kernel or even the user. |
@nagisa: |
Note: pointer |
@nagisa You're confusing virtual and physical memory. |
It seems fine to assume that 0 means "NULL" in the vast majority of libraries, though some low-level systems programming code may need to do otherwise. (Such code would have to handle that assumption carefully, though it does seem reasonable to require that the language itself, outside of any library, should not break if you attempt to write a given value to a 0 pointer.)

But expecting the availability of any particular non-zero invalid pointer seems like a non-portable assumption, albeit a relatively safe one on a large subset of environments. A more portable (if slightly slower) approach would declare a dummy object, obtain its pointer, and use that value as the placeholder.

I have an alternative proposal, which could provide the same effect with more readable code: teach rustc, for any given target platform, about the range of invalid pointer values; for any hosted environment, this would typically mean "any pointer in the zeroth page of memory". Then, extend the "forbidden value" optimization work in rust-lang/rust#36237 (which generalized the original null-pointer optimization) to optimize enum types containing pointers and a small number of values. That would provide a much more general and transparent memory size optimization, allowing many different enums to shrink to the size of a pointer on most platforms.

With that optimization, the following enums would each take exactly the size of a pointer on most platforms:

enum E1 {
    Invalid,
    Pointer(*const SomeType),
}
enum E2 {
    Invalid1,
    Invalid2,
    Pointer(*const SomeType),
}
enum E3 {
    Boolean(bool),
    Pointer(*const SomeType),
}
enum E4 {
    Byte(u8),
    Pointer(*const SomeType),
}
enum E5<'a> { // Existing null-pointer optimization
    Invalid,
    Reference(&'a SomeType),
}
enum E6<'a> {
    Invalid1,
    Invalid2,
    Reference(&'a SomeType),
}
enum E7<'a> {
    Byte(u8),
    Reference(&'a SomeType),
}

Consider how much code would use substantially less memory with this optimization, without requiring any manual special-cases. |
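As a point of comparison for the proposal above, here is a minimal sketch of what can be measured without it. The names `SomeType`, `RawEnum` and the three helper functions are made up for illustration. It relies only on two things: the documented guarantee that `Option<&T>` and `Option<NonNull<T>>` are pointer-sized, and the fact that a raw pointer can hold any bit pattern, so an enum wrapping one needs extra room for its discriminant.

```rust
use std::mem::size_of;
use std::ptr::NonNull;

// Hypothetical payload type, standing in for `SomeType` in the comment above.
#[allow(dead_code)]
struct SomeType(u64);

// Every bit pattern is a legal `*const SomeType`, so the compiler has no
// spare value in which to store `Invalid`.
#[allow(dead_code)]
enum RawEnum {
    Invalid,
    Pointer(*const SomeType),
}

fn raw_enum_size() -> usize {
    size_of::<RawEnum>()
}

fn option_ref_size() -> usize {
    size_of::<Option<&'static SomeType>>()
}

fn option_nonnull_size() -> usize {
    size_of::<Option<NonNull<SomeType>>>()
}

fn main() {
    println!("pointer:                   {} bytes", size_of::<usize>());
    println!("Option<&SomeType>:         {} bytes", option_ref_size());
    println!("Option<NonNull<SomeType>>: {} bytes", option_nonnull_size());
    println!("RawEnum:                   {} bytes", raw_enum_size());
}
```

The gap between the last line and the others is the memory the proposed optimization would recover on targets that can declare a range of invalid pointer values.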
The optimization is also awesome; however, the example I mentioned uses this trick on atomic values. Without a guarantee, this cannot be made safe. |
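For the atomic case, the more portable "dummy object" approach mentioned earlier in the thread can be sketched as follows. This is not the futures crate's code; `Node`, `Slot`, `State` and `closed_marker` are hypothetical names, and the sketch assumes nodes are only freed when the slot itself is dropped.

```rust
use std::ptr;
use std::sync::atomic::{AtomicPtr, Ordering};

struct Node {
    value: u32,
}

// Dummy object: it is never read through the marker pointer, only its
// address is used as a placeholder distinct from null and from any heap node.
static CLOSED_DUMMY: Node = Node { value: 0 };

fn closed_marker() -> *mut Node {
    &CLOSED_DUMMY as *const Node as *mut Node
}

#[derive(Debug, PartialEq)]
enum State {
    Empty,
    Closed,
    Occupied(u32),
}

struct Slot {
    ptr: AtomicPtr<Node>,
}

impl Slot {
    fn new() -> Self {
        Slot { ptr: AtomicPtr::new(ptr::null_mut()) }
    }

    /// Empty -> Occupied. Returns false if the slot was not empty.
    fn put(&self, value: u32) -> bool {
        let node = Box::into_raw(Box::new(Node { value }));
        let ok = self
            .ptr
            .compare_exchange(ptr::null_mut(), node, Ordering::AcqRel, Ordering::Acquire)
            .is_ok();
        if !ok {
            // The node was never published, so it can be reclaimed here.
            unsafe { drop(Box::from_raw(node)) };
        }
        ok
    }

    /// Empty -> Closed. Returns false if the slot was not empty.
    fn close(&self) -> bool {
        self.ptr
            .compare_exchange(ptr::null_mut(), closed_marker(), Ordering::AcqRel, Ordering::Acquire)
            .is_ok()
    }

    fn state(&self) -> State {
        let p = self.ptr.load(Ordering::Acquire);
        if p.is_null() {
            State::Empty
        } else if p == closed_marker() {
            State::Closed
        } else {
            // Sound in this sketch because nodes are only freed in `drop`.
            State::Occupied(unsafe { (*p).value })
        }
    }
}

impl Drop for Slot {
    fn drop(&mut self) {
        let p = *self.ptr.get_mut();
        if !p.is_null() && p != closed_marker() {
            unsafe { drop(Box::from_raw(p)) };
        }
    }
}

fn main() {
    let slot = Slot::new();
    println!("{:?}", slot.state());
    slot.close();
    println!("{:?}", slot.state());
}
```

The cost relative to a hard-coded address such as 1 is one static object and a comparison against its address; the benefit is that no assumption about unmapped memory is needed.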
@ishitatsuyuki If you're willing to make exactly the same non-portable assumption, you could have that same guarantee. Given the optimization above, it doesn't seem too hard to add a way to assert that at compile time. The same compile-time assertion would also allow you to build your own unsafe code relying on the same assumption. (And, as a bonus, if someone attempts to compile your code on a platform that doesn't support that assumption, they'll get a compile-time error rather than runtime breakage.) |
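The compile-time assertion suggested here is a hypothetical feature. A rough approximation in stable Rust (1.57 or later, where `assert!` works in const contexts) is a `const` assertion on the layout that the unsafe code depends on. In this sketch, `Node`, `Handle` and `handle_size` are made-up names.

```rust
use std::mem::size_of;
use std::ptr::NonNull;

#[allow(dead_code)]
struct Node {
    value: u32,
}

// `repr(transparent)` gives the wrapper the same layout as its single field.
#[allow(dead_code)]
#[repr(transparent)]
struct Handle(Option<NonNull<Node>>);

// Evaluated at compile time: the build fails if the layout assumption does
// not hold on the target being compiled for.
const _: () = assert!(size_of::<Handle>() == size_of::<*mut Node>());

fn handle_size() -> usize {
    size_of::<Handle>()
}

fn main() {
    println!("Handle occupies {} bytes", handle_size());
}
```

An assertion about a platform's invalid address range would need compiler support along the lines proposed above; this sketch only covers layout facts the compiler already knows.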
It's a bit ridiculous but we can take this in the opposite direction and disable 0 as an invalid (even function?) pointer for certain targets. Not sure how much unsafe code would break. |
https://www.reddit.com/r/cpp/comments/5noem9/a_personal_tale_on_a_special_value/ is a case in point. There are other considerations: for example, packing stuff into a pointer may make garbage collectors harder to write, in the sense that rooting becomes more complex; and page size may vary across platforms or even OS configurations. |
Nominating for @rust-lang/lang discussion. |
It would be quite troublesome to treat even a single (4K) page at address zero as guaranteed unreachable, much less a few, since many microcontrollers (offhand: the entire STM32 series) map the flash or RAM there when booting from it, and on the lower end, 4K may be all you get (in the case of RAM) or a significant part of it (in the case of flash). |
@whitequark I definitely agree that many targets will not guarantee it. I think the reasonable question is "could, or should, Rust take into account that some targets do". Much like the null-pointer optimization for None/Some, this seems like something where teaching the compiler about a thing C programmers do could allow them to do it safely and without hacks, by letting the compiler do it for them. I don't think we should make a universal guarantee, but a platform-specific one with optimization possibilities makes sense. |
If this is just an internal layout optimization (i.e. part of the "Rust" ABI), I have no objection, since it will not affect any correct code. Then I also don't see the need for an RFC. If this is an externally visible layout optimization (i.e. part of the "C" ABI, like the |
@joshtriplett Yes, opting in sounds good. |
You could exploit type alignment for this. For example, memory address 1 is guaranteed to not contain a valid |
@Amanieu Only on platforms that prohibit unaligned pointers. On x86, you could have an i32 at an odd address. But yes, some means ought to exist to automatically use the low bits of an aligned pointer, as an optimization. |
@joshtriplett Actually, dereferencing a misaligned pointer is UB in LLVM (and therefore in Rust as well). The compiler can and will make assumptions about the alignment of a pointer, especially with auto-vectorization. The only correct way to access misaligned data is through Basically this means that if you have a |
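The alignment idea discussed in the last few comments can be done by hand today. The sketch below (`Tagged` is a hypothetical type, not a standard library API) stores a small tag in the low bits that are always zero in a properly aligned pointer. It uses plain pointer-to-integer casts, which work on mainstream targets but bypass Rust's stricter provenance models.

```rust
use std::marker::PhantomData;
use std::mem::align_of;

/// A pointer to `T` with a small tag packed into its low alignment bits.
struct Tagged<T> {
    bits: usize,
    _marker: PhantomData<*mut T>,
}

impl<T> Tagged<T> {
    // Alignments are powers of two, so `align - 1` masks the spare low bits.
    const TAG_MASK: usize = align_of::<T>() - 1;

    /// Returns `None` if the pointer is misaligned or the tag does not fit.
    fn new(ptr: *mut T, tag: usize) -> Option<Self> {
        let addr = ptr as usize;
        if addr & Self::TAG_MASK != 0 || tag > Self::TAG_MASK {
            return None;
        }
        Some(Tagged { bits: addr | tag, _marker: PhantomData })
    }

    fn ptr(&self) -> *mut T {
        (self.bits & !Self::TAG_MASK) as *mut T
    }

    fn tag(&self) -> usize {
        self.bits & Self::TAG_MASK
    }
}

fn main() {
    let mut x: u32 = 5;
    let p: *mut u32 = &mut x;
    let t = Tagged::new(p, 3).expect("aligned pointer, tag fits");
    println!("tag = {}, value = {}", t.tag(), unsafe { *t.ptr() });
}
```

For a `u32` with 4-byte alignment this leaves two bits, enough for four states, without any assumption about which pages are mapped.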
On some architectures, notably the 64-bit ones that don't have a full 64-bit address space, there are a number of unused bits that the programmer (or language) is free to use. If we added an option to the target specification for these bits, then rustc could also use them for these optimisations. I am also in favour of address 0 (or any other address) not having a special meaning unless that is specified in the target configuration, cfg-able by the programmer. |
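To make the unused-high-bits idea concrete, here is a sketch that works on plain integers. It assumes a 48-bit canonical virtual address space, as on x86-64 with 4-level paging; the constant `ADDR_BITS` and the `pack`/`unpack` helpers are hypothetical, and the assumption breaks on systems that enable wider addressing.

```rust
// ASSUMPTION: addresses are 48-bit canonical, i.e. bits 48..=63 are copies
// of bit 47. This is a property of the target, not something Rust guarantees.
const ADDR_BITS: u32 = 48;
const ADDR_MASK: u64 = (1u64 << ADDR_BITS) - 1;

/// Stores `tag` in the top 16 bits of the word holding `addr`.
fn pack(addr: u64, tag: u16) -> u64 {
    (addr & ADDR_MASK) | ((tag as u64) << ADDR_BITS)
}

/// Recovers the canonical address and the tag.
fn unpack(word: u64) -> (u64, u16) {
    let tag = (word >> ADDR_BITS) as u16;
    // Shift the tag out, then shift back arithmetically so that bit 47 is
    // sign-extended into the upper 16 bits.
    let addr = (((word << 16) as i64) >> 16) as u64;
    (addr, tag)
}

fn main() {
    let word = pack(0x0000_7fff_dead_beef, 0xABCD);
    let (addr, tag) = unpack(word);
    println!("word = {:#018x}, addr = {:#018x}, tag = {:#06x}", word, addr, tag);
}
```

The sign extension in `unpack` matters for addresses in the upper half of the address space, which would otherwise come back non-canonical.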
We discussed this in today's @rust-lang/lang meeting, with the following resolutions:
As always, if someone is interested in producing an RFC and would like help doing so, please post a pre-RFC on https://internals.rust-lang.org/ and say that you'd like some guidance through the RFC process. |
Most OSes implement safeguards against null-pointer arithmetic by reserving the first memory pages as unmapped (reference).
It would be good to guarantee some amount of unmapped space, to enable packing small enums into raw pointers. (By the way, you have 16 bits to play with on amd64.)
For example, this one is currently UB. A guaranteed region at least one page wide would be enough to implement those enum-like structures.
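A sketch of the kind of enum-like structure this would enable, assuming (non-portably) that the first 4096 bytes of the address space never hold a valid object. `NULL_REGION`, `Slot`, `encode` and `decode` are hypothetical names for illustration.

```rust
// ASSUMPTION: no valid object lives below this address. This is exactly the
// guarantee the issue asks for; Rust does not provide it today.
const NULL_REGION: usize = 4096;

#[derive(Debug, PartialEq)]
enum Slot {
    Empty,
    Locked,
    Closed,
    Ptr(*mut u32),
}

/// Packs a `Slot` into one pointer-sized word. Small values inside the null
/// region stand for the data-less variants.
fn encode(slot: &Slot) -> Option<usize> {
    match *slot {
        Slot::Empty => Some(0),
        Slot::Locked => Some(1),
        Slot::Closed => Some(2),
        Slot::Ptr(p) if (p as usize) >= NULL_REGION => Some(p as usize),
        // A pointer inside the reserved region would be ambiguous.
        Slot::Ptr(_) => None,
    }
}

fn decode(word: usize) -> Option<Slot> {
    match word {
        0 => Some(Slot::Empty),
        1 => Some(Slot::Locked),
        2 => Some(Slot::Closed),
        w if w < NULL_REGION => None, // reserved but unassigned
        w => Some(Slot::Ptr(w as *mut u32)),
    }
}

fn main() {
    let word = encode(&Slot::Locked).unwrap();
    println!("Locked is stored as {:#x}, decodes to {:?}", word, decode(word));
}
```

Because the whole state fits in one word, it can be kept in an `AtomicUsize` or `AtomicPtr` and updated with a single compare-and-swap, which is what the atomic use case in this thread needs.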