Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RISC-V/ESP32-C3] IllegalInstruction Exception on yield_now().await but only in --release #995

Closed
TheButlah opened this issue Oct 3, 2022 · 6 comments

Comments

@TheButlah
Copy link

TheButlah commented Oct 3, 2022

Minimally reproducible example: link
Embassy Versions tested:

  • cb9f0ef5b800ce4a22cde1805e0eb88425f1e07b (what I had when I initially hit this)
  • f075e624440af121da7a27a145e2acee0730c542 (tried again with latest main)

This code operates as expected when doing a regular debug build. However, in --release mode, it triggers an IllegalInstruction exception. According to my (preliminary and uninformed) research, this may be an indication of a soundness issue in embassy where we are executing an invalid opcode.

If I set a breakpoint on a yield_now().await in probe-rs, clicking play appears to resume the code at the same spot, without hitting any of the debug statements after the yield.

If I remove all the breakpoints and hit play, I immediately hit my ExceptionHandler with an IllegalInstruction exception.

sometimes the loops manage to execute a couple times, before the exception occurs. I've seen it happen as fast as on the very first yield, and as slow as 18 iterations.

Here is the log:

DEBUG Booted
└─ src\main.rs:41
DEBUG Started sensor_task
└─ src\main.rs:64
DEBUG In data(), i was 0
└─ src\main.rs:67
DEBUG Started network_task
└─ src\main.rs:52
TRACE In main(), i was 0
└─ src\main.rs:55
DEBUG After network yield
└─ src\main.rs:58
TRACE In main(), i was 1
└─ src\main.rs:55
DEBUG after sensor yield
└─ src\main.rs:70
DEBUG In data(), i was 1
└─ src\main.rs:67
DEBUG mcause bits: 0b10, mcause cause: "Exception(IllegalInstruction)"
└─ src\main.rs:77
DEBUG trap_frame: "TrapFrame { ra: 1107302208, t0: 1, t1: 1107302092, t2: 1070071856, t3: 0, t4: 0, t5: 0, t6: 0, a0: 1107302180, a1: 1095026825, a2: 1, a3: 1, a4: 0, a5: 0, a6: 1, a7: 1070071808 }"
└─ src\main.rs:82
@TheButlah
Copy link
Author

TheButlah commented Oct 3, 2022

I found that the mepc register is 0x4144C888. This is probably why the IllegalInstruction happens, all instructions are after 0x42000000 in the esp32c3, so that address does not exist in the binary.

The question is, why is it using that instruction address? I genuinely have no idea. Its either a bug in rustc itself or a bug in embassy, and I'm at a loss for how to figure it out.

If there is anything I can do to help give more info that would make it possible to solve this, let me know. This is super out of my comfort zone.

@TheButlah
Copy link
Author

TheButlah commented Oct 4, 2022

I have been able to narrow down this error. If I remove all atomics from my code (and my dependencies), and change my compilation target to riscv32imc-unknown-none-elf (note the lack of "a"), then the code no longer traps in --release or debug builds.

I think this is because riscv-atomic-emulation-trap, seems to not be handling the atomic - that is the most likely culprit.

I'm not sure if its the fault of embassy or the atomic trap.

@TheButlah
Copy link
Author

TheButlah commented Oct 4, 2022

I was unable to reproduce this bug with a minimal program of me using just critical sections, mutexes, and atomics, so I still believe it has something to do with embassy. I have made the example even more minimal by eliminating defmt, esp-alloc, and the panic handler.

Simplified code is here

I am getting this error log (note that I've added mcause return address ra):

ESP-ROM:esp32c3-api1-20210207
Build:Feb  7 2021
rst:0x16 (USB_JTAG_CHIP_RESET),boot:0xc (SPI_FAST_FLASH_BOOT)
Saved PC:0x40047dd8
Booted
Started sensor_task
In data(), i was 0
Started network_task
In main(), i was 0
After network yield
In main(), i was 1
after sensor yield
In data(), i was 1
mcause bits: 0b10, mcause cause: Exception(IllegalInstruction)
return address register (ra): 0x42002722
trap_frame: TrapFrame { ra: 1107306274, t0: 1, t1: 1107302062, t2: 1077413584, t3: 0, t4: 0, t5: 0, t6: 0, a0: 1107306246, a1: 1095026825, a2: 1, a3: 1, a4: 1006654854, a5: 0, a6: 1, a7: 1070071808 }

Note that an opt-level of 1 is enough to trigger the error, but the default debug opt level of 0 is not.

@TheButlah
Copy link
Author

TheButlah commented Oct 4, 2022

Some progress is being made finally. I compiled with opt-level = 1 and found that ra (return address) was 0x420030b6. I'm not sure if that means that ra is the line that triggered it, or if its before or after that line by 1 instruction (I'm unfamiliar with riscv)
Spacing added for emphasis:

420030b0: f5 d9         beqz    a1, 0x420030a4 <embassy_executor::raw::Executor::poll::he0ac36d825da07d4+0x12>
420030b2: 4c 45         lw      a1, 12(a0)
420030b4: 82 95         jalr    a1

420030b6: fd b7         j       0x420030a4 <embassy_executor::raw::Executor::poll::he0ac36d825da07d4+0x12>

420030b8: b2 40         lw      ra, 12(sp)
420030ba: 22 44         lw      s0, 8(sp)
420030bc: 92 44         lw      s1, 4(sp)

@TheButlah
Copy link
Author

TheButlah commented Oct 4, 2022

Here is the latest information I collected, compiled with cargo build (uses opt-level = 1, see Cargo.toml)

Observed behavior: See the log - in summary, code crashes due to an IllegalInstruction exception. Please note that this code works if compiling with opt-level = 0 or if compiling for the -imc target instead of the -imac target.
Expected behavior: The code should constantly be looping and interleaving between the two async tasks, printing each iteration of their loops.

link to code
objdump.txt
firmware.zip

Log:

ESP-ROM:esp32c3-api1-20210207
Build:Feb  7 2021
rst:0x16 (USB_JTAG_CHIP_RESET),boot:0xc (SPI_FAST_FLASH_BOOT)
Saved PC:0x40047dd8
Booted
Started sensor_task
In data(), i was 0
Started network_task
In main(), i was 0
After network yield
In main(), i was 1
after sensor yield
In data(), i was 1
mcause bits: 0b10, mcause cause: Exception(IllegalInstruction)
mepc: 0x4144c888
return address (ra): 0x4200310a
trap_frame: TrapFrame { ra: 1107308810, t0: 1, t1: 1107304598, t2: 0, t3: 0, t4: 0, t5: 0, t6: 39, a0: 1107308782, a1: 1095026825, a2: 1, a3: 1, a4: 1114112, a5: 1107300492, a6: 1, a7: 1070075904 }

Objdump (snippet is the region around the ra, formatting is as originally presented by objdump):

420030e6 <embassy_executor::raw::Executor::poll::he0ac36d825da07d4>:
420030e6: 41 11        	addi	sp, sp, -16
420030e8: 06 c6        	sw	ra, 12(sp)
420030ea: 22 c4        	sw	s0, 8(sp)
420030ec: 26 c2        	sw	s1, 4(sp)
420030ee: 2f 25 05 0e  	amoswap.w.aqrl	a0, zero, (a0)
420030f2: 09 cd        	beqz	a0, 0x4200310c <embassy_executor::raw::Executor::poll::he0ac36d825da07d4+0x26>
420030f4: 75 54        	li	s0, -3
420030f6: 19 a0        	j	0x420030fc <embassy_executor::raw::Executor::poll::he0ac36d825da07d4+0x16>
420030f8: 26 85        	mv	a0, s1
420030fa: 89 c8        	beqz	s1, 0x4200310c <embassy_executor::raw::Executor::poll::he0ac36d825da07d4+0x26>
420030fc: 44 41        	lw	s1, 4(a0)
420030fe: af 25 85 66  	amoand.w.aqrl	a1, s0, (a0)
42003102: 85 89        	andi	a1, a1, 1
42003104: f5 d9        	beqz	a1, 0x420030f8 <embassy_executor::raw::Executor::poll::he0ac36d825da07d4+0x12>
42003106: 4c 45        	lw	a1, 12(a0)
42003108: 82 95        	jalr	a1
4200310a: fd b7        	j	0x420030f8 <embassy_executor::raw::Executor::poll::he0ac36d825da07d4+0x12>
4200310c: b2 40        	lw	ra, 12(sp)
4200310e: 22 44        	lw	s0, 8(sp)
42003110: 92 44        	lw	s1, 4(sp)
42003112: 41 01        	addi	sp, sp, 16
42003114: 82 80        	ret

42003116 <embassy_executor::raw::Executor::spawner::h26681bec429e7c8d>:
42003116: 82 80        	ret

42003118 <embassy_executor::raw::util::UninitCell<T>::as_mut_ptr::h86f89d5175b3858d>:
42003118: 82 80        	ret

As you can see, atomic emulation is a transitive dependency so it should be working

PS D:\Ryan\Programming\slimevr\rust\firmware> cargo tree -i riscv-atomic-emulation-trap
riscv-atomic-emulation-trap v0.1.1
└── esp-hal-common v0.2.0
    └── esp32c3-hal v0.2.0
        └── firmware v0.0.0 (D:\Ryan\Programming\slimevr\rust\firmware)

@Dirbaio
Copy link
Member

Dirbaio commented Oct 4, 2022

closing, bug is in esp-rs/riscv-atomic-emulation-trap#3

@Dirbaio Dirbaio closed this as completed Oct 4, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants