You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
So here's the story so far: I'm working on an embedded rust project, and when I tried the HAL's interrupt handler example by coping it into my crate, I found that pressing the button did not service the interrupt as expected but instead triggered a panic with the message attempt to subtract with overflow[0]. The line of code indicated in the panic looked like this:
prios[prio asusize] |= 1 << i;
where prios had been defined earlier as let mut prios = [0u128; 15];.
Despite happening every time I tried it from my project, the upstream project that I copied it from worked fine. Some diffing & debugging ensued, and we noticed that our build profile had opt-level="z" set, which was the only condition under which the panic occurred. With this observation, I was able to reproduce the same panic using a much simpler program[1] (that I am unfortunately only able to run on the physical esp32 board):
let i = core::hint::black_box(0);print_u128(1u128 << i);
Which, after spending some quality time with a copy of the risc-v spec, I was able to trace back to this sequence of instructions:
42000f72: sw zero,-40(s0) # let i = core::hint::black_box(0);
...
42000f7a: lw s2,-40(s0) # s2 := i
...
42000fa2: addi a2,s2,-64 # a2 := (i - 64)
42000fa6: li a0,1 # a0 := 1
42000fa8: mv a1,s1 # a1 := 0 (because s1 is 0, LLVM decided against `mv a1, zero` or `li a1,0`)
42000faa: auipc ra,0x6 # ^ now (a1, a0) represents 1u64
# this function's signature is:
# pub extern "C" fn __ashldi3(a: u64, b: u32) -> u64 {
# so `a` is passed in `(a1, a0)` and `b` is passed in `a2`
42000fae: jalr -1278(ra) # 42006aac <__ashldi3> # __ashldi3(1u64, (i - 64))
42000fb2: ...
This seems like the 1u128 << i operation has been broken down into a series of 64-bit shifts that start by attempting to compute 1u64 << (i - 64), which would seem to be a problem for any i < 64 (like 0 above, or 3 in the original interrupt example)[2].
Truth be told, I'm not really sure where the appropriate place to report this would be: this repo seems to define the __ashldi3 function, but why was code emitted using that instead of a single __ashlti3 that operates "natively" over 128-bit numbers? Where does the algorithm that "knows" how to compose higher-order shifts from lower-order primitives live, and what part of that caused the emitted code sequence to incorrectly handle most of its operands?
I was hoping that I could construct a test case reproducing the problem without requiring special hardware using some combination of riscv32-flavored qemu, a custom test harness, and/or semihosting: I'm still intending to pursue that, for what it's worth, but any guidance on whether or how would definitely be appreciated.
[0]: Original issue filed against the HAL: esp-rs/esp-hal#196
[1]: The full program listing, as well as panic output and more disassembly are available here: https://gist.github.com/sethp/9cf309cfaf0a01b5e8df0245df268789
[2]: Confusingly, manual probing of the space found that, roughly, 1u128 << i works for 32 <= i < 63 and i = 96, and panics for all the other values of i that I tried. Full notes part of that same gist here.
The text was updated successfully, but these errors were encountered:
Oh, and also: why only opt-level="z"? I suppose since 22-insn ashl is getting called twice for a 128-bit shift it might make sense from a code size perspective to leave that outboard rather than inlining (since a call site is only ~4 insns, and 44 > 22 + 4 + 4). Maybe even at the cost of another outboard lshr that only gets used once? And opt-level="s" happens to tip the other way?
That said, whether the functions are called or inlined, I wouldn't expect that choice to affect the correctness of the program in an architecture-specific way: I ran tests from this repo for both the x86_64 and aarch64 architectures, and in both cases 1u128 << core::hint::black_box(0) seemed to be computed correctly.
This is clearly a bug in LLVM in how it expands i128 shifts. The __ashldi3 function (and other di 64-bit shifts) have a precondition which requires the shift amount to be >=0 and <64.
I'm going to close this issue since it is not a bug in compiler-builtins. You can follow the issue in LLVM, and once it is fixed you can request to have the fix backported to rustc's LLVM version.
So here's the story so far: I'm working on an embedded rust project, and when I tried the HAL's interrupt handler example by coping it into my crate, I found that pressing the button did not service the interrupt as expected but instead triggered a panic with the message
attempt to subtract with overflow
[0]. The line of code indicated in the panic looked like this:where
prios
had been defined earlier aslet mut prios = [0u128; 15];
.Despite happening every time I tried it from my project, the upstream project that I copied it from worked fine. Some diffing & debugging ensued, and we noticed that our build profile had
opt-level="z"
set, which was the only condition under which the panic occurred. With this observation, I was able to reproduce the same panic using a much simpler program[1] (that I am unfortunately only able to run on the physical esp32 board):Which, after spending some quality time with a copy of the risc-v spec, I was able to trace back to this sequence of instructions:
This seems like the
1u128 << i
operation has been broken down into a series of 64-bit shifts that start by attempting to compute1u64 << (i - 64)
, which would seem to be a problem for anyi < 64
(like 0 above, or 3 in the original interrupt example)[2].Truth be told, I'm not really sure where the appropriate place to report this would be: this repo seems to define the
__ashldi3
function, but why was code emitted using that instead of a single__ashlti3
that operates "natively" over 128-bit numbers? Where does the algorithm that "knows" how to compose higher-order shifts from lower-order primitives live, and what part of that caused the emitted code sequence to incorrectly handle most of its operands?I was hoping that I could construct a test case reproducing the problem without requiring special hardware using some combination of riscv32-flavored qemu, a custom test harness, and/or semihosting: I'm still intending to pursue that, for what it's worth, but any guidance on whether or how would definitely be appreciated.
[0]: Original issue filed against the HAL: esp-rs/esp-hal#196
[1]: The full program listing, as well as panic output and more disassembly are available here: https://gist.github.com/sethp/9cf309cfaf0a01b5e8df0245df268789
[2]: Confusingly, manual probing of the space found that, roughly,
1u128 << i
works for32 <= i < 63
andi = 96
, and panics for all the other values ofi
that I tried. Full notes part of that same gist here.The text was updated successfully, but these errors were encountered: