Skip to content

invalid shift operation on riscv32 with 128-bit wide integer and opt-level="z" #494

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
sethp opened this issue Sep 25, 2022 · 3 comments
Closed

Comments

@sethp
Copy link

sethp commented Sep 25, 2022

So here's the story so far: I'm working on an embedded rust project, and when I tried the HAL's interrupt handler example by coping it into my crate, I found that pressing the button did not service the interrupt as expected but instead triggered a panic with the message attempt to subtract with overflow[0]. The line of code indicated in the panic looked like this:

prios[prio as usize] |= 1 << i; 

where prios had been defined earlier as let mut prios = [0u128; 15];.

Despite happening every time I tried it from my project, the upstream project that I copied it from worked fine. Some diffing & debugging ensued, and we noticed that our build profile had opt-level="z" set, which was the only condition under which the panic occurred. With this observation, I was able to reproduce the same panic using a much simpler program[1] (that I am unfortunately only able to run on the physical esp32 board):

    let i = core::hint::black_box(0);
    print_u128(1u128 << i);

Which, after spending some quality time with a copy of the risc-v spec, I was able to trace back to this sequence of instructions:

42000f72:  sw    zero,-40(s0) # let i = core::hint::black_box(0); 
 ...
42000f7a:  lw    s2,-40(s0)   # s2 := i
 ...
42000fa2:  addi   a2,s2,-64   # a2 := (i - 64)
42000fa6:  li     a0,1        # a0 := 1
42000fa8:  mv     a1,s1       # a1 := 0 (because s1 is 0, LLVM decided against `mv a1, zero` or `li a1,0`)
42000faa:  auipc  ra,0x6      # ^ now (a1, a0) represents 1u64
# this function's signature is:
#    pub extern "C" fn __ashldi3(a: u64, b: u32) -> u64 {
# so `a` is passed in `(a1, a0)` and `b` is passed in `a2` 
42000fae:  jalr   -1278(ra) # 42006aac <__ashldi3> # __ashldi3(1u64, (i - 64))
42000fb2: ...

This seems like the 1u128 << i operation has been broken down into a series of 64-bit shifts that start by attempting to compute 1u64 << (i - 64), which would seem to be a problem for any i < 64 (like 0 above, or 3 in the original interrupt example)[2].

Truth be told, I'm not really sure where the appropriate place to report this would be: this repo seems to define the __ashldi3 function, but why was code emitted using that instead of a single __ashlti3 that operates "natively" over 128-bit numbers? Where does the algorithm that "knows" how to compose higher-order shifts from lower-order primitives live, and what part of that caused the emitted code sequence to incorrectly handle most of its operands?

I was hoping that I could construct a test case reproducing the problem without requiring special hardware using some combination of riscv32-flavored qemu, a custom test harness, and/or semihosting: I'm still intending to pursue that, for what it's worth, but any guidance on whether or how would definitely be appreciated.

[0]: Original issue filed against the HAL: esp-rs/esp-hal#196
[1]: The full program listing, as well as panic output and more disassembly are available here: https://gist.github.com/sethp/9cf309cfaf0a01b5e8df0245df268789
[2]: Confusingly, manual probing of the space found that, roughly, 1u128 << i works for 32 <= i < 63 and i = 96, and panics for all the other values of i that I tried. Full notes part of that same gist here.

@sethp
Copy link
Author

sethp commented Sep 25, 2022

Oh, and also: why only opt-level="z"? I suppose since 22-insn ashl is getting called twice for a 128-bit shift it might make sense from a code size perspective to leave that outboard rather than inlining (since a call site is only ~4 insns, and 44 > 22 + 4 + 4). Maybe even at the cost of another outboard lshr that only gets used once? And opt-level="s" happens to tip the other way?

That said, whether the functions are called or inlined, I wouldn't expect that choice to affect the correctness of the program in an architecture-specific way: I ran tests from this repo for both the x86_64 and aarch64 architectures, and in both cases 1u128 << core::hint::black_box(0) seemed to be computed correctly.

@Amanieu
Copy link
Member

Amanieu commented Sep 26, 2022

I've reproduced the issue here: https://rust.godbolt.org/z/P4v4cvMoo

This is clearly a bug in LLVM in how it expands i128 shifts. The __ashldi3 function (and other di 64-bit shifts) have a precondition which requires the shift amount to be >=0 and <64.

@Amanieu
Copy link
Member

Amanieu commented Sep 26, 2022

Upstream issue: llvm/llvm-project#57988

I'm going to close this issue since it is not a bug in compiler-builtins. You can follow the issue in LLVM, and once it is fixed you can request to have the fix backported to rustc's LLVM version.

@Amanieu Amanieu closed this as completed Sep 26, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants