Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add hook for updating the canonical address #3787

Draft
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

dheaton-arm
Copy link

@dheaton-arm dheaton-arm commented Aug 5, 2024

Where a method modifies the address that should be considered canonical - such as via TBI or when an MTE tag has been set - Miri will need to be notified.

This adds a hook to inform Miri of the new address.


The implementation used here permits a single (mutable) alias, with the intent that user code will only use the alias (once set) rather than the original base_addr. I'm open to feedback on this approach, but ran into issues when trying to update the actual base_addr when popping the function off the stack (and thus deallocating anything created in the function), hence this implementation which allows that to work correctly.

In particular, this would allow rust-lang/stdarch#1622 to work under Miri, as it could use this hook to ensure that Miri is aware that the change to the pointer address is actually fine in context.

@RalfJung
Copy link
Member

RalfJung commented Aug 6, 2024

I'm sorry I am a bit confused here, could you explain the problem this is trying to solve?

In Rust it is generally not correct to have an allocation be accessed via two different addresses. So "only use alias once set" doesn't work; if an alias is to be used, there must not have been any accesses to the original address ever. The existing t-opsem discussions for pointer tagging schemes concluded that these schemes are fine if we can just pretend that the full pointer with the tag is the actual address that the allocation is located at. However, accessing the same allocation with pointers that use different tags is equivalent to accessing physical memory that is mapped into the virtual address space multiple times, and that's something Rust doesn't really support currently (and AFAIK neither doses LLVM).

@RalfJung
Copy link
Member

RalfJung commented Aug 6, 2024

Maybe miri_set_canonical_address should be considered a realloc-like operation, in which case it could move the memory to a new place. However, in that case it's very suspicious that this would only be done inside Miri -- such a realloc would have to occur in real executions as well, or else there is UB.

This seems to me like something that needs opsem discussions, so maybe bring this up on Zulip or at https://github.com/rust-lang/unsafe-code-guidelines/ ?

@dheaton-arm
Copy link
Author

I'm sorry I am a bit confused here, could you explain the problem this is trying to solve?

It's currently not possible to run code which modifies the top byte of a pointer (whether via TBI or MTE) under Miri, as Miri will consider the pointer to be dangling on dereference. Given that rustc itself does not currently have any issues (in practice) with this, and the only issues arise when running under Miri, the intent is to allow for updating the address Miri considers an allocation to live at.

The existing t-opsem discussions for pointer tagging schemes concluded that these schemes are fine if we can just pretend that the full pointer with the tag is the actual address that the allocation is located at.

I asked on Zulip for advice regarding adding TBI accessors and MTE intrinsics, and was referred to discussions which - to my reading - concluded what you describe.
That is fine, but I'm uncertain as to how to fully change the canonical address (ie, the actual address that Miri expects from that point on) - changing its entry in the pointer maps (base_addr, etc) works but the 'old' pointer seems to be held in a stack frame (which causes issues when said stack frame is popped off and all allocations are deallocated, in this case using the wrong/old pointer), which I haven't been able to work out, hence this implementation.
I'm happy to discuss this more if necessary, also.

Maybe miri_set_canonical_address should be considered a realloc-like operation

Yes, that would be the intent, I'm just not sure how best to handle this as realloc, for example, generates an address without the capability to set the top byte.

Rust doesn't really support currently (and AFAIK neither doses LLVM).

Could you elaborate on what you mean by this, sorry? The code that this would enable to run under Miri currently already runs under Rust, and MTE has, at least, been generally supported by LLVM for some time now; in the case of TBI, the only issues I'm aware of are that some operations (such as memcpy) are unsafe between aliasing pointers as the aliasing analysis fails, but otherwise it works.

But, in short, I would be happy to replace the current approach which uses a single, mutable, alias with actually changing the address that the allocation is considered to live at, as you describe - but advice on where such a change needs to take place would be appreciated, as it's not clear to me.

@RalfJung
Copy link
Member

RalfJung commented Aug 6, 2024

It's currently not possible to run code which modifies the top byte of a pointer (whether via TBI or MTE) under Miri, as Miri will consider the pointer to be dangling on dereference. Given that rustc itself does not currently have any issues (in practice) with this, and the only issues arise when running under Miri, the intent is to allow for updating the address Miri considers an allocation to live at.

I assume this is UB in rustc as well, you just happen to not hit any optimizations that would cause problems.
But if Miri complains about the code, it's almost certainly already UB, and you need a fix not just for Miri but for the entire approach.

What is the smallest example demonstrating the unexpected UB?

Could you elaborate on what you mean by this, sorry?

I mean that the code has UB.

The code that this would enable to run under Miri currently already runs under Rust, and MTE has, at least, been generally supported by LLVM for some time now; in the case of TBI, the only issues I'm aware of are that some operations (such as memcpy) are unsafe between aliasing pointers as the aliasing analysis fails, but otherwise it works.

A lot of code has UB but still "works", until it doesn't.

But, in short, I would be happy to replace the current approach which uses a single, mutable, alias with actually changing the address that the allocation is considered to live at, as you describe - but advice on where such a change needs to take place would be appreciated, as it's not clear to me.

It's not clear to me either as I know absolutely nothing about how pointer tags are set up in the (non-Miri) code.^^

@dheaton-arm
Copy link
Author

dheaton-arm commented Aug 6, 2024

So, in general, anything that looks like the following is UB:

// let pointer be some pointer to an allocation
let addr = pointer.addr() & 0x00ffffffffffffff;
let p = addr | ((0x70 as usize) << 56);
let pointer = pointer.with_addr(p);
// dereferencing `pointer` will now always trip Miri.

Of course, generally, we're happy to accept that this should be considered UB, but we were therefore hoping that by providing a Rust-controlled mechanism to do this instead (rust-lang/stdarch#1622), the UB could be mitigated as Rust/Miri can track the address change. That's the intent of this work, as Miri raises an error when dereferencing a pointer from with_top_byte from that PR, for example.

Put another way, the with_top_byte helper in that PR should be considered itself to be the 'kind of realloc' that this PR introduces, and indeed the plan would be to have that method call this PR's hook. I did investigate to see whether I needed to adjust rustc for that PR, but couldn't find any instance of rustc tracking such allocations at this time, and hence the focus on making it work within Miri. If rustc started tracking such allocations later down the line, then that would be the function that would need to be responsible for updating that, too. Do you disagree with that approach?

@RalfJung
Copy link
Member

RalfJung commented Aug 6, 2024

So, in general, anything that looks like the following is UB:

Can you provide a small program with a main function showing how this is used? I am still very confused about the entire context here. You are setting the topmost byte of a pointer to 0x70 to achieve... what? Why 0x70? I expected some sort of target-specific magic somewhere to ensure the pointer tag is actually checked by hardware, or so. And the tag has to be randomly generated or else an attacker can just hard-code it and then I don't see how this helps with exploit mitigation.

@RalfJung
Copy link
Member

RalfJung commented Aug 6, 2024

I did investigate to see whether I needed to adjust rustc for that PR, but couldn't find any instance of rustc tracking such allocations at this time, and hence the focus on making it work within Miri. If rustc started tracking such allocations later down the line, then that would be the function that would need to be responsible for updating that, too. Do you disagree with that approach?

LLVM tracks allocations, so this definitely needs changes in rustc and maybe LLVM to make them aware of the realloc. (The same problem exists when doing this in C, btw. If you are just porting C code here it seems likely that the original code was written without much regard for whether that is UB or not.)

@RalfJung
Copy link
Member

RalfJung commented Aug 6, 2024

To give one example, if you do something like this

fn tag_ptr(ptr: *const i32, tag: u8) -> *const i32 {
  ptr.map_addr(|a| {
    (a & 0x00ffffffffffffff) | ((tag as usize) << 56)
  })
}

let my_local_var = 42;
tag_ptr(&my_local_var, 0x70).read();
tag_ptr(&my_local_var, 0x80).read();

then LLVM might be able to see that this pointer points to 4 bytes of memory and it has been offset in two different ways that are way more than 4 bytes apart, so one of these reads must be out-of-bounds, and therefore this program has UB.

I don't know if LLVM has any optimizations today that may actually detect this (@nikic and @comex might know), but it would totally be in its right to start doing analyses like that any time, so we can't just say "yeah let's accept such code and hope nothing goes wrong".

@RalfJung
Copy link
Member

RalfJung commented Aug 6, 2024

Here's an example where LLVM seems more likely to notice and cause trouble:

fn tag_ptr(ptr: *mut i32, tag: u8) -> *mut i32 {
  ptr.map_addr(|a| {
    (a & 0x00ffffffffffffff) | ((tag as usize) << 56)
  })
}

let mut my_local_var = 0;
tag_ptr(&mut my_local_var, 0x70).write(1);
tag_ptr(&mut my_local_var, 0x80).write(2);
assert_eq!(
  tag_ptr(&mut my_local_var, 0x70).read(),
  2
);

From LLVMs perspective it should not be too hard to notice that the two writes cannot alias, and therefore the final read must return what was written by the first write.

@RalfJung RalfJung marked this pull request as draft August 6, 2024 14:27
@RalfJung RalfJung added the S-waiting-on-author Status: Waiting for the PR author to address review comments label Aug 28, 2024
@bors
Copy link
Contributor

bors commented Sep 1, 2024

☔ The latest upstream changes (presumably #3854) made this pull request unmergeable. Please resolve the merge conflicts.

Where a method modifies the address that should be considered canonical - such as via TBI or when an MTE tag has been set - Miri will need to be notified.

This adds a hook to inform Miri of the new address.
@mrkajetanp
Copy link

Hi, in line with the changes to the TBIBox helper and discussions on Zulip here's a smaller version of the address hook, mainly intended for use with the aforementioned TBIBox.

@RalfJung
Copy link
Member

RalfJung commented Sep 9, 2024

Thanks! I'll wait for the discussion about the library surface and "real rustc" implementation for this to settle before taking a closer look at this PR. (And sadly I don't have the capacity to be involved in the design there, sorry.)

TbiBox should probably be using some new Miri primitives to properly model this all.

@mrkajetanp
Copy link

Which way around should this be done then? Since the TBIBox will require this hook before it gets merged to avoid breaking miri I was thinking we should focus on merging this first. Should we then finalise stdarch#1622 first, then finalise & merge this and then add the hooks to TBIBox and merge that?

@RalfJung
Copy link
Member

RalfJung commented Sep 9, 2024

Which existing code breaks if that stdarch PR lands? It's new code, it's okay if Miri does not support it from the start.

First a design needs to be proposed for how TbiBox's API looks and how it is implemented. Once t-libs-api and t-libs are happy with that, we can look at how it should be supported in Miri. (t-opsem should likely also be involved there, due to the nature of this type.)

@mrkajetanp
Copy link

That approach makes sense to me as well, fair enough, thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-author Status: Waiting for the PR author to address review comments
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants