When are things guaranteed to have unique addresses? #206
Comments
FWIW I'm under that impression because I can't find such a guarantee being written down anywhere (I looked in the book, the reference, and the nomicon). In this case,
let mut a = 2;
let mut b = 2;
assert_ne!(&mut a as *mut _ as usize, &mut b as *mut _ as usize); // PASS
we do guarantee that the assert never fails because we guarantee that two aliasing
let a: Freeze = val;
let b: Freeze = val;
assert_ne!(&a as *const _ as usize, &b as *const _ as usize);
I don't think it would be unsound for If we allow that, as long as the allocation cannot be modified (e.g. because the allocation is immutable and it does not contain an I have no clue whether this is worth doing, but in general I find code that relies on the relative addresses of let bindings on the stack to be brittle anyway.
Just keep in mind that the address is only unique while the let binding is alive, e.g.,
pub fn foo() -> bool { baz(bar()) }
pub fn bar() -> *const i32 {
let x = 42;
&x as *const _ // this address is only unique until the end of bar!
}
pub fn baz(p: *const i32) -> bool {
// A separate name is used for the local so it does not shadow the pointer argument.
let y = 42;
&y as *const i32 as usize != p as usize
// ^ this can be false since both addresses are not guaranteed to be different
}
Note that there are no references to constants: |
One main difference from C and C++ is that they do not have zero-sized types (although C++ has EBO /
#[derive(Copy, Clone)] struct ZST;
fn main() {
let mut xs = [ZST; 2];
assert_eq!(
&mut xs[0] as *mut _ as usize,
&mut xs[1] as *mut _ as usize
);
}
passes. I don't see why this couldn't happen for multiple |
The entire purpose of For let a: Freeze = val;
let b: Freeze = val;
assert_ne!(&a as *const _ as usize, &b as *const _ as usize);
we guarantee that addresses are different if the type has a size of at least 1. I find it hard to imagine a semantics that lets us overlap their storage here.
Operationally really |
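As a rough illustration of the claim above for types with a size of at least 1, here is a minimal sketch. The function name is made up for this example, and the result reflects the behaviour discussed in this thread and what rustc currently does, not a guarantee that is written down anywhere.

```rust
/// Hypothetical helper: compares the addresses of two locals that are both
/// in scope at the point of comparison. Both have a non-zero size.
fn live_locals_have_distinct_addresses() -> bool {
    let a: u32 = 7;
    let b: u32 = 7;

    // Take both addresses while `a` and `b` are still alive.
    let pa = &a as *const u32 as usize;
    let pb = &b as *const u32 as usize;

    // Use both values after taking the addresses so neither binding
    // has ended before the comparison.
    let sum = a + b;

    pa != pb && sum == 14
}
```

The same comparison on zero-sized values, as in the array example earlier in the thread, is allowed to report equal addresses.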
Are there any other uses of address uniqueness? Maybe instead of coming up with complicated rules for when addresses are unique and then being limited by those rules for backwards compatibility, we could have an attribute or something to opt-in to a variable having a unique address? |
I think the rules will only become even more complicated if we try to relax them. Remember, Rust is not specified axiomatically by saying "these properties hold for all program executions"; there is an Abstract Machine with an operational specification -- something you can put into an interpreter -- that explains all Rust behavior. So you'd have to propose some mechanism e.g. in Miri to actually observe overlapping addresses for such variables. |
I was thinking about this the other day, and have some thoughts. For now, I am going to assume two things 1) local variables have stable addresses (not that the alternative might not be just as interesting). 2) A strict provenance model like @Gankra proposes. If we allow some int to pointer casts, some of these options may fall away.
First thing first: I believe there is absolutely no need for distinct allocations to have distinct addresses - we have provenance to disambiguate. As I understand SB, the "provenance" of a pointer is an integer that identifies an item in the borrow stacks, and these integers never repeat. Consequently, there should be no problem saying that "when a pointer is dereferenced, we search all the bytes that have the same address as the pointer, and see if there is an item in any of the borrow stacks that makes the access legal. There can be at most one, since no pointer can have provenance to more than one allocation." This might feel a little surprising, but I don't think it's as bad as it initially sounds; it might even be a way to drive home the "memory is not flat" point. (this also plays nicely with the mental model that Gankra proposed for memory, where it's a two dimensional grid of address x provenance).
Now the probably more difficult question: What do we want to guarantee? I am for now only going to think about stack local variables; there might be interesting (different) arguments for other categories of allocations. I see at least a few possibilities, but am completely undecided myself. Because I'll be talking about some optimizations, I'll need to differentiate between "live range in the abstract machine" and "live range as reported by compiler analyses." I'll refer to the first as "scope" and the second as "liveness."
We do not guarantee that simultaneously in scope locals have distinct addresses
This has the benefit of enabling optimizations. The stack slot in this code cannot be re-used:
let mut x = 5;
foo(&x);
x = 10;
let y = 5;
foo(&y);
It would be possible and actually fairly easy for a Rust compiler to see that
let x = 5;
let y = 5;
foo(&x, &y);
That is simultaneously more powerful but also potentially more surprising. I'm not sure how much benefit the above two optimizations give. However, I could see the following optimization being potentially more useful:
let mut x = input();
foo(&x);
x += 10;
let y = x;
bar(&y);
Here, the optimization would not be to re-use a stack slot (which has relatively small benefits), but to be able to merge
We do guarantee that simultaneously in scope locals have distinct addresses
The main benefit of this is to disable the potentially surprising optimizations above. I had asked about use cases for such a guarantee on discord (besides not having to go "wtf is the compiler doing"), and something like This additionally has the downside of making MIR storage markers be statements that have significant semantics. In other words,
StorageLive(_3);
StorageDead(_4);
could not be freely re-ordered. This is related to and discussed in rust-lang/rust#68622.
Some alternative?
We could try and define the guarantees around here in terms of some analysis or other conditions. I've talked to Ralf enough that my instinctive reaction to that is now also "that's not an operational semantics," but I actually think the need for a real operational semantics might be reduced here - the values of the addresses are implementation defined anyway. @moulins had suggested the following on discord:
This would maybe allow some of the optimizations, but I have some concerns about this definition; at least as I understand it, there's an implicit requirement here of "two pointers that exist at the same time." But it's not clear to me how we should define this concept without typed memory - pointers only exist as values temporarily, most of the time there are just pointer bytes in memory that do not necessarily correspond to an actual value. In any case, there might be some idea here that I haven't thought of |
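The "two dimensional grid of address x provenance" picture from earlier in this comment can be sketched as a toy memory keyed by both coordinates. Everything below (`ToyMemory`, `allocate`, `load`, the tag counter) is a hypothetical illustration and not how Miri or Stacked Borrows is actually implemented.

```rust
use std::collections::HashMap;

// Hypothetical toy types: an address plus a provenance tag identifies a byte.
type Addr = usize;
type Tag = u64;

#[derive(Default)]
struct ToyMemory {
    // One byte per (address, provenance) cell: memory is a grid, not a line.
    cells: HashMap<(Addr, Tag), u8>,
    // Tags are handed out in increasing order and never repeat.
    next_tag: Tag,
}

impl ToyMemory {
    /// Creates an allocation at `addr`, even if another allocation already
    /// lives there, and returns a "pointer" as an (address, tag) pair.
    fn allocate(&mut self, addr: Addr, init: &[u8]) -> (Addr, Tag) {
        let tag = self.next_tag;
        self.next_tag += 1;
        for (offset, byte) in init.iter().enumerate() {
            self.cells.insert((addr + offset, tag), *byte);
        }
        (addr, tag)
    }

    /// Looks at every byte stored at the pointer's address and picks the one
    /// whose provenance matches. At most one can match.
    fn load(&self, ptr: (Addr, Tag)) -> Option<u8> {
        self.cells.get(&ptr).copied()
    }
}

/// Two allocations at the same address stay distinguishable by provenance.
fn overlapping_allocations_demo() -> (Option<u8>, Option<u8>, Option<u8>) {
    let mut mem = ToyMemory::default();
    let x = mem.allocate(0x1000, &[5]);
    let y = mem.allocate(0x1000, &[7]);
    let forged = (0x1000, 99); // right address, provenance of no allocation
    (mem.load(x), mem.load(y), mem.load(forged))
}
```

In this sketch the two pointers compare equal as integers yet read different bytes, which is the sense in which distinct addresses are not needed to tell allocations apart.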
Is there anything that would keep us from merging (But it also seems fine to say "well just use |
That does sound fine for any read-only static. |
It seems better to require const for this. I can think of cases in C++ where a static is used just to generate a value so that the pointer is used as an identity. I think it would be confusing to have this require UnsafeCell even in the case where it's never written. That said, I don't feel that strongly here... but I suspect in practice this would be a pretty low-value optimization TBH. |
The |
What I expected could be phrased as "simultaneously live locals have distinct addresses". That would still allow your first and third optimizations, but not the second. |
Well, maybe, but that would require a definition of liveness on the AM, which seems non-trivial |
It's pretty trivial to define liveness if the input MIR has |
Well, storage statements early on are no different from defining it based on scope |
Ah yes, I should have specified -- as @digama0 said, I imagine on the MIR level we have explicit liveness annotations. This basically moves the liveness analysis to Rust → MIR lowering and out of the operational semantics. Maybe that's cheating, maybe that's elegant -- I have not decided yet. ;)
Yes it is different; your first example is easy to support with this approach. The third one is tricky since assigning from |
Oh, I see, you're suggesting doing a |
It's not quite arbitrary IMO, it is sort of the minimal guarantee we can make if we want to ensure that different variables that might get used at the same time will never be on the same address. |
Maybe, but even then. What would the actual spec say? |
It would describe the Rust → MIR lowering, including the algorithm that adds the liveness statements. |
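To make the "simultaneously live locals have distinct addresses" rule concrete, here is a minimal sketch of a slot assigner driven by explicit liveness markers. The `Marker` enum and `assign_slots` function are hypothetical stand-ins for this example only; they are not rustc's MIR types or its actual stack layout algorithm.

```rust
/// Hypothetical stand-ins for MIR storage markers; the payload is a local's index.
#[derive(Clone, Copy, Debug)]
enum Marker {
    StorageLive(usize),
    StorageDead(usize),
}

/// Gives each local a stack slot when it becomes live and releases the slot
/// for reuse when it is marked dead. Returns the slot each local last held.
fn assign_slots(markers: &[Marker], num_locals: usize) -> Vec<Option<usize>> {
    let mut live_slot: Vec<Option<usize>> = vec![None; num_locals];
    let mut last_slot: Vec<Option<usize>> = vec![None; num_locals];
    let mut free: Vec<usize> = Vec::new();
    let mut next_fresh = 0usize;

    for marker in markers {
        match *marker {
            Marker::StorageLive(local) => {
                // Toy simplification: ignore a repeated StorageLive.
                if live_slot[local].is_some() {
                    continue;
                }
                // Prefer a released slot; otherwise grow the frame.
                let slot = match free.pop() {
                    Some(reused) => reused,
                    None => {
                        let fresh = next_fresh;
                        next_fresh += 1;
                        fresh
                    }
                };
                live_slot[local] = Some(slot);
                last_slot[local] = Some(slot);
            }
            Marker::StorageDead(local) => {
                // Only locals that are currently live hold a slot to release.
                if let Some(slot) = live_slot[local].take() {
                    free.push(slot);
                }
            }
        }
    }
    last_slot
}
```

With the markers `StorageLive(0), StorageDead(0), StorageLive(1)` both locals end up in slot 0, while moving `StorageLive(1)` before `StorageDead(0)` forces two slots. That difference is why, under this rule, the markers carry real semantics and cannot be reordered freely.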
So, while this was not the original intent of the question, one thing that we probably have to do is figure out how this:
fn nop(a: &mut u32) {}
fn bad(limit: usize) {
if limit == 0 {
return;
}
let mut x: u32 = 0;
let mut r = &mut x;
bad(limit - 1);
nop(r);
return;
}
fn main() {
bad(usize::MAX);
println!("unreachable");
}
Printing "unreachable" when compiled on |
I think that's a separate subject -- removing allocations (whether on the stack or on the heap) is, eh, tricky to justify in the best case if you have finite memory. So, this is the same as LLVM optimizing the following to print "unreachable":
int *x = malloc(SIZE_MAX);
int *y = malloc(SIZE_MAX);
if (x && y) {
printf("This is unreachable. I could deref a NULL pointer and this program would still be fine.");
}
That should have been a sign to create a new issue instead. :) See #328. |
Even if we emitted I say this both in the sense of "I am pretty sure LLVM would not misoptimize that" and also "well obviously we have to slap LLVM on the wrist if it does". |
References to constants are not guaranteed to have unique addresses:
Since consts are just aliases, the same holds for those:
What about statics?
static variables with interior mutability (and static mut variables) obviously must have unique addresses, but what about ones without?
And local variables? (Assuming that both variables are alive at the point of comparison, since obviously variables that have fallen out of scope can have their addresses reused.)
Currently, rustc seems to produce unique addresses in both cases. But @gnzlbg is under the impression that multiple local variables are not guaranteed to have distinct addresses.
Address uniqueness can be a useful property, e.g. if you want a unique 'sentinel' value to assign to a pointer variable. On the other hand, I'd say Rust usually avoids giving much significance to something being a variable as opposed to an expression.
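A minimal sketch of the sentinel use case, assuming the one case this issue treats as settled: statics with interior mutability must have unique addresses. The names `EMPTY`, `DELETED`, and `classify` are invented for illustration.

```rust
use std::ptr;
use std::sync::atomic::AtomicU8;

// Hypothetical sentinels. AtomicU8 gives them interior mutability, so under
// the assumption above each static must have its own address.
static EMPTY: AtomicU8 = AtomicU8::new(0);
static DELETED: AtomicU8 = AtomicU8::new(0);

/// Classifies a pointer purely by its address; the pointee is never read.
fn classify(p: *const AtomicU8) -> &'static str {
    if ptr::eq(p, &EMPTY) {
        "empty"
    } else if ptr::eq(p, &DELETED) {
        "deleted"
    } else {
        "other"
    }
}
```

Here `classify(&EMPTY)` and `classify(&DELETED)` are told apart even though both statics hold the same value. Whether a sentinel without interior mutability, or one that is a local variable, could be relied on in the same way is exactly the open question of this issue.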
A related issue is #15, which is about whether the address of something can change over time.
Compared to C and C++
In C, rvalues are not implicitly bound to addresses unless assigned to a variable (or a C99 compound literal). C appears to guarantee that distinct variables have distinct addresses.
In C++, rvalues can be implicitly bound to const references, which gives them an address: this is "temporary materialization" and creates a "temporary object". Like C, the C++ spec guarantees that distinct "objects" "compare unequal", so I think this assertion is guaranteed to pass (not sure though):
In practice, this means that the compiler always stores a copy of the constant on the stack and takes the address of that, rather than directly referencing a static allocation.