Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CHERI-valid pointer stuffing produces worse codegen than an implementation which does huge wrapping offsets #96152

Closed
saethlin opened this issue Apr 17, 2022 · 18 comments
Assignees
Labels
A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. A-strict-provenance Area: Strict provenance for raw pointers I-slow Issue: Problems and improvements with respect to performance of generated code.

Comments

@saethlin
Copy link
Member

saethlin commented Apr 17, 2022

I came across this here: tokio-rs/bytes#542 so I'm raising it more broadly because this seems like a portability hazard. godbolt demo

In a world where we can forget about provenance, one could set the lowest bit in a pointer with this:

pub fn old_style(a: *mut u8) -> *mut u8 {
    (a as usize | 1) as *mut u8
}

But of course we want to have a provenance model, including because we want to support architectures where pointer provenance is checked at runtime. So one might want to implement this function like so to be compatible with CHERI:

pub fn cheri_compat(a: *mut u8) -> *mut u8 {
    let old = a as usize;
    let new = old | 1;
    let diff = new.wrapping_sub(old);
    a.wrapping_add(diff)
}

But that version is slower. Instead of just mov + or, it gets compiled to mov + not + and + and. Which is very silly. We can get the original codegen back by writing this in a style which is almost certainly invalid on CHERI:

pub fn fast(a: *mut u8) -> *mut u8 {
    let old = a as usize;
    let new = old | 1;
    a.wrapping_sub(old).wrapping_add(new)
}

It doesn't make sense to me that users should have to choose between compatibility with CHERI and avoiding ptr-int-ptr casts while keeping a careful eye out for codegen regressions.

@saethlin saethlin changed the title CHERI-valid pointer stuffing is slower than an implementation which does huge wrapping offsets CHERI-valid pointer stuffing produces worse codegen than an implementation which does huge wrapping offsets Apr 17, 2022
@Urgau
Copy link
Member

Urgau commented Apr 17, 2022

I just checked with the more idiomatic way map_addr and it produce the same codegen as the cheri_compat function. godbolt.

cc @Gankra (because you're the author of the pointer provenance related stuff in std)

@Gankra
Copy link
Contributor

Gankra commented Apr 17, 2022

Yes with_addr should be a compiler intrinsic so that it can just Do The Right Thing and get properly optimized. I note as much in core::ptr, but, I am not a Compiler Person.

@nikic nikic added A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. I-slow Issue: Problems and improvements with respect to performance of generated code. labels Apr 18, 2022
@thomcc thomcc added the A-strict-provenance Area: Strict provenance for raw pointers label Apr 29, 2022
@nikic
Copy link
Contributor

nikic commented May 2, 2022

We're missing this fold: https://alive2.llvm.org/ce/z/eKRvKm This is missed both on the InstCombine and the DAGCombine level.

@nikic
Copy link
Contributor

nikic commented May 2, 2022

https://reviews.llvm.org/D124763 for the InstCombine fold (which does not actually help here, this needs to happen on DAGCombine level).

@nikic
Copy link
Contributor

nikic commented May 2, 2022

https://reviews.llvm.org/D124772 is the DAGCombine fold -- which also doesn't actually help here, because demanded bits simplification interfered and reduces the width of the xor.

@nikic
Copy link
Contributor

nikic commented May 4, 2022

https://reviews.llvm.org/D124710 finishes the InstCombine side of this, and https://reviews.llvm.org/D124856 is another step on the DAGCombine side -- this one affects the original case, but doesn't give the result we want yet.

@nikic
Copy link
Contributor

nikic commented May 4, 2022

https://reviews.llvm.org/D124930 would actually fix the original case.

@nikic nikic self-assigned this May 4, 2022
@RalfJung
Copy link
Member

RalfJung commented Jul 1, 2022

@nikic that's amazing. :-)
Looks like your patches have all landed by now?

@nikic
Copy link
Contributor

nikic commented Jul 1, 2022

Yes, everything has landed here, so this should be fixed with the next LLVM upgrade.

@RalfJung
Copy link
Member

Which is the version of LLVM that will contain these patches? Looks like that would be LLVM 15?

@RalfJung
Copy link
Member

Should we consider this fixed by #99464 ?

@saethlin can you check codegen with the latest nightly?

@saethlin
Copy link
Member Author

The codegen at the godbolt link in the top-level comment is unchanged, it's still this:

example::old_style:
        mov     rax, rdi
        or      rax, 1
        ret

example::cheri_compat:
        mov     eax, edi
        not     eax
        and     eax, 1
        add     rax, rdi
        ret

example::fast:
        mov     rax, rdi
        or      rax, 1
        ret

@RalfJung
Copy link
Member

Godbolt doesn't seem to have the update yet

rustc 1.65.0-nightly (20ffea693 2022-08-11)

@saethlin
Copy link
Member Author

Ah! Looks like the playground is probably up to date? Its short links don't capture the rest of the state, so here's the disassembly it spits out for the example code:

playground::old_style: # @playground::old_style
# %bb.0:
	movq	%rdi, %rax
	orq	$1, %rax
	retq
                                        # -- End function

playground::cheri_compat: # @playground::cheri_compat
# %bb.0:
	movq	%rdi, %rax
	orq	$1, %rax
	retq
                                        # -- End function

playground::fast: # @playground::fast
# %bb.0:
	movq	%rdi, %rax
	orq	$1, %rax
	retq
                                        # -- End function

So I think we're good here! Thanks!

@RalfJung
Copy link
Member

Is it worth adding a codegen test for this?

@saethlin
Copy link
Member Author

saethlin commented Sep 2, 2022

Possibly. But we have a small problem maybe? All three of those compile to different IR:

define nonnull ptr @_ZN7example9old_style17h5807fc970566528bE(ptr %a) unnamed_addr #0 {
  %_3 = ptrtoint ptr %a to i64
  %_2 = or i64 %_3, 1
  %0 = inttoptr i64 %_2 to ptr
  ret ptr %0
}

define ptr @_ZN7example12cheri_compat17hf4c9a96c91413294E(ptr %a) unnamed_addr #0 {
  %old = ptrtoint ptr %a to i64
  %old.not = and i64 %old, 1
  %diff = xor i64 %old.not, 1
  %0 = getelementptr i8, ptr %a, i64 %diff
  ret ptr %0
}

define ptr @_ZN7example4fast17hfe789c747db55c0eE(ptr %a) unnamed_addr #0 {
  %old = ptrtoint ptr %a to i64
  %new = or i64 %old, 1
  %count = sub i64 0, %old
  %0 = getelementptr i8, ptr %a, i64 %count
  %1 = getelementptr i8, ptr %0, i64 %new
  ret ptr %1
}

So I could write a codegen test that checks that they compile to what they do right now, but I feel like it would be valid and good for LLVM to produce the same IR for all 3, and it would be a bummer if the codegen test broke when it does.

If there's a way around this, I'm happy to PR the codegen test.

@nikic
Copy link
Contributor

nikic commented Sep 2, 2022

The codegen test in this case would be an x86 assembly codegen test. The LLVM IR cannot be the same, because all of those have different semantics (in terms of provenance).

@RalfJung
Copy link
Member

RalfJung commented Sep 2, 2022

The last two should be identical in terms of provenance?

saethlin added a commit to saethlin/rust that referenced this issue Nov 6, 2022
bors added a commit to rust-lang-ci/rust that referenced this issue Nov 7, 2022
…r=nikic

Add a codegen test for rust-lang#96152

This is a regression test for rust-lang#96152, it is intended to check that our codegen for a particular strict provenance pattern is always as good as the ptr2int2ptr/provenance-ignoring style.

r? `@nikic`
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. A-strict-provenance Area: Strict provenance for raw pointers I-slow Issue: Problems and improvements with respect to performance of generated code.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants