Fix hardfault in MPU configuration #1193

Merged
merged 1 commit into from
Mar 8, 2023

@mkeeter mkeeter commented Mar 8, 2023

While adding caboose support to control-plane-agent, I noticed a dead-reproducible issue which put the system into a weird state:

humility: attached via ST-Link V3
system time = 5156
ID TASK                       GEN PRI STATE
 0 jefe                         0   0 recv, notif: fault timer(T+44)
 1 sys                          0   1 recv
 2 i2c_driver                   0   2 recv
 3 user_leds                    0   5 recv
 4 gimlet_seq                   0   2 recv
 5 pong                         0   8 recv, notif: timer(T+344)
 6 uartecho                     0   3 notif: usart-irq
 7 host_sp_comms                0   8 recv, notif: jefe-state-change(irq82) usart-irq multitimer control-plane-agent
 8 hiffy                        0   7 notif: bit31(T+116)
 9 hf                           0   6 recv
10 hash_driver                  0   2 recv
11 net                          0   3 recv, notif: eth-irq(irq117) wake-timer(T+4874)
12 udprpc                       0   6 notif: socket
13 udpecho                      0   4 notif: socket
14 udpbroadcast                 0   6 notif: bit31(T+385)
15 control_plane_agent          0   7 ready
16 sensor                       0   5 recv, notif: timer(T+844)
17 sprot                        0   5 recv
18 validate                     0   3 recv
19 idle                         0   9 ready
20 rng_driver                   0   6 recv
21 update_server                0   3 recv

Note that no task is marked as running, and system time is not advancing.

Here are the steps to reproduce:

  • Flash 7c1c470aa4c5a229bd8 (latest commit on caboose-mgs-api branch) onto a Gimletlet with a NIC
  • Plug into the same network as your home machine
  • Run faux-mgs -ltrace --interface en0 state (substitute your favorite interface for en0)
  • RIP Gimletlet

Luckily, we got a good backtrace using humility gdb --run-openocd:

cortex_m_rt::HardFault_ (ef=0x24000258) at src/lib.rs:560
560         loop {
Breakpoint 1 at 0x8004178: file src/lib.rs, line 560.
Note: automatically using hardware breakpoints for read-only addresses.
semihosting is enabled
(gdb) bt
#0  cortex_m_rt::HardFault_ (ef=0x24000258) at src/lib.rs:560
#1  <signal handler called>
#2  kern::arch::arm_m::BusFault () at sys/kern/src/arch/arm_m.rs:1241
#3  <signal handler called>
#4  core::ptr::write_volatile<u32> (src=302710801, dst=<optimized out>)
    at /rustc/95a3a7277b44bbd2dd3485703d9a05f64652b60e/library/core/src/ptr/mod.rs:1577
#5  vcell::VolatileCell<u32>::set<u32> (value=302710801, self=<optimized out>) at /crates.io/vcell-0.1.3/src/lib.rs:41
#6  volatile_register::RW<u32>::write<u32> (value=302710801, self=<optimized out>)
    at /crates.io/volatile-register-0.2.1/src/lib.rs:83
#7  kern::arch::arm_m::apply_memory_protection (task=<optimized out>) at sys/kern/src/arch/arm_m.rs:417
#8  0x08002d80 in kern::syscalls::switch_to (task=0x24000e68 <kern::startup::HUBRIS_TASK_TABLE_SPACE+2520>)
    at sys/kern/src/syscalls.rs:610
#9  0x08002d80 in kern::syscalls::syscall_entry::{closure#0} (tasks=...)
#10 kern::startup::with_task_table<(), kern::syscalls::syscall_entry::{closure_env#0}> (body=...)
    at sys/kern/src/startup.rs:116
#11 kern::syscalls::syscall_entry (nr=<optimized out>, task=<optimized out>) at sys/kern/src/syscalls.rs:73
#12 0x0800391a in kern::arch::arm_m::SVCall () at sys/kern/src/arch/arm_m.rs:811
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

Here's the relevant disassembly:

   0x08001eda <+26>:    cmp     r1, #8
   0x08001edc <+28>:    beq.n   0x8001f44 <kern::arch::arm_m::apply_memory_protection+132>
   0x08001ede <+30>:    ldr.w   r3, [r8, r1, lsl #2]
   0x08001ee2 <+34>:    ldr     r6, [r3, #4]
   0x08001ee4 <+36>:    cmp     r6, #2
   0x08001ee6 <+38>:    bcc.n   0x8001f4a <kern::arch::arm_m::apply_memory_protection+138>
   0x08001ee8 <+40>:    ldr     r4, [r3, #0]
   0x08001eea <+42>:    ldr     r3, [r3, #8]
   0x08001eec <+44>:    orrs    r4, r1
   0x08001eee <+46>:    orr.w   r4, r4, #16
   0x08001ef2 <+50>:    str     r4, [r0, #0]
   0x08001ef4 <+52>:    lsls    r4, r3, #31
   0x08001ef6 <+54>:    and.w   r2, r3, #8
   0x08001efa <+58>:    mov.w   r4, #33554432   ; 0x2000000
   0x08001efe <+62>:    it      eq
   0x08001f00 <+64>:    moveq.w r4, #16777216   ; 0x1000000
   0x08001f04 <+68>:    lsls    r5, r3, #30
   0x08001f06 <+70>:    and.w   r5, r12, r3, lsl #26
   0x08001f0a <+74>:    it      mi
   0x08001f0c <+76>:    movmi.w r4, #50331648   ; 0x3000000
   0x08001f10 <+80>:    lsls    r3, r3, #27
   0x08001f12 <+82>:    orr.w   r5, r5, r2, lsl #16
   0x08001f16 <+86>:    mov.w   r3, #262144     ; 0x40000
   0x08001f1a <+90>:    add     r4, r5
   0x08001f1c <+92>:    it      pl
   0x08001f1e <+94>:    movpl.w r3, #196608     ; 0x30000
   0x08001f22 <+98>:    cmp     r2, #0
   0x08001f24 <+100>:   it      ne
   0x08001f26 <+102>:   movne.w r3, #65536      ; 0x10000
   0x08001f2a <+106>:   adds    r2, r4, r3
   0x08001f2c <+108>:   mov     r3, r9
   0x08001f2e <+110>:   eor.w   r2, r2, r9
   0x08001f32 <+114>:   clz     r3, r6
   0x08001f36 <+118>:   adds    r1, #1
   0x08001f38 <+120>:   sub.w   r3, lr, r3, lsl #1
   0x08001f3c <+124>:   orrs    r2, r3
   0x08001f3e <+126>:   adds    r2, #1
=> 0x08001f40 <+128>:   str     r2, [r0, #4]
   0x08001f42 <+130>:   b.n     0x8001eda <kern::arch::arm_m::apply_memory_protection+26>

and registers:

(gdb) info reg
r0             0xe000ed9c          -536810084
r1             0x4                 4
r2             0x120b0011          302710801
r3             0x10                16
r4             0x2000000           33554432
r5             0x0                 0
r6             0x200               512
r7             0x240002f8          603980536
r8             0x80050f4           134238452
r9             0x10080000          268959744
r10            0x15                21
r11            0x24001258          603984472
r12            0x10000000          268435456
sp             0x240002e0          0x240002e0
lr             0x3c                60
pc             0x8001f40           0x8001f40 <kern::arch::arm_m::apply_memory_protection+128>
xpsr           0x100000b           16777227
fpscr          0x16                22
msp            0x24000258          0x24000258
psp            0x24033330          0x24033330
primask        0x0                 0
basepri        0x0                 0
faultmask      0x0                 0
control        0x1                 1

This is right about here in MPU configuration, about to write region 3's size and attributes to MPU_RASR.
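As a cross-check on the register dump: r0 holds 0xe000ed9c, which is the architectural address of MPU_RBAR on ARMv7-M, so the faulting store to [r0, #4] targets MPU_RASR, and r2 (0x120b0011) is the value being written. The sketch below decodes that value using the RASR field layout from the ARMv7-M architecture; the `Rasr` struct and `decode_rasr` function are hypothetical names for illustration, not part of the Hubris kernel.

```rust
/// Decoded fields of an ARMv7-M MPU_RASR value.
/// Hypothetical helper type for illustration; not from the Hubris kernel.
#[derive(Debug, PartialEq)]
pub struct Rasr {
    pub enable: bool,    // bit 0
    pub size_bytes: u64, // derived from SIZE, bits 5:1
    pub srd: u8,         // subregion disable, bits 15:8
    pub b: bool,         // bufferable, bit 16
    pub c: bool,         // cacheable, bit 17
    pub s: bool,         // shareable, bit 18
    pub tex: u8,         // type extension, bits 21:19
    pub ap: u8,          // access permissions, bits 26:24
    pub xn: bool,        // execute never, bit 28
}

/// Split a raw RASR word into its fields.
pub fn decode_rasr(raw: u32) -> Rasr {
    let size_field = (raw >> 1) & 0x1f;
    Rasr {
        enable: raw & 1 != 0,
        // The region covers 2^(SIZE + 1) bytes.
        size_bytes: 1u64 << (size_field + 1),
        srd: ((raw >> 8) & 0xff) as u8,
        b: raw & (1 << 16) != 0,
        c: raw & (1 << 17) != 0,
        s: raw & (1 << 18) != 0,
        tex: ((raw >> 19) & 0x7) as u8,
        ap: ((raw >> 24) & 0x7) as u8,
        xn: raw & (1 << 28) != 0,
    }
}
```

Calling `decode_rasr(0x120b_0011)` yields an enabled, execute-never region of 512 bytes with AP = 0b010 (privileged read/write, unprivileged read-only), which is consistent with a 0x200-byte read-only region.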

Here's the relevant region:

(gdb) p/x *task.descriptor.regions[3]
$9 = kern::descs::RegionDesc {base: 0x8085a00, size: 0x200, attributes: kern::descs::RegionAttributes {bits: 0x1}}

Sure enough, this is the caboose region that we just added to the MPU tables!

It's correctly aligned for a 512-byte region, so what gives?

Well, it turns out that we can't hot-configure the MPU: we're writing a 512-byte-aligned base address to RBAR, but RASR may still hold a larger size left over from a previous task, and a larger size requires a more stringent alignment. This means that the MPU region is briefly in an invalid state, which is Bad News.
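To make the alignment hazard concrete: the ARMv7-M MPU requires a region's base address to be a multiple of the region's size. The sketch below checks the caboose base address against its own 512-byte size and against a larger size that could still be configured in the same slot. The helper names are hypothetical, and the 1 KiB "stale" size is an assumed example rather than a value observed in the dump.

```rust
/// True if `base` is a multiple of `size`, where `size` is a power of two.
/// Hypothetical helper for illustration; not from the Hubris kernel.
pub fn is_aligned_for(base: u32, size: u32) -> bool {
    debug_assert!(size.is_power_of_two());
    base & (size - 1) == 0
}

/// log2 of the largest power-of-two size that `base` is naturally aligned to.
pub fn max_aligned_log2(base: u32) -> u32 {
    base.trailing_zeros()
}

/// Base address of the caboose region, taken from the RegionDesc above.
pub const CABOOSE_BASE: u32 = 0x0808_5a00;
```

With these, `is_aligned_for(CABOOSE_BASE, 0x200)` is true, while `is_aligned_for(CABOOSE_BASE, 0x400)` is false: the base has exactly nine trailing zero bits, so any leftover size of 1 KiB or more makes the new base address misaligned for as long as the old size stays in effect.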

The fix is to disable the MPU region when configuring it, then re-enable it. This means we have to go through the RNR register instead of being able to simultaneously select the region and address; alas.
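The ordering described above can be sketched as follows. This is a minimal model rather than the code in this PR: the `MpuRegs` trait, the `Recorder` mock, and `configure_region` are hypothetical names, and the real kernel writes the hardware registers directly. The point is only the sequence: select the region through RNR, clear RASR to disable it, write the new base address, then write the new size and attributes.

```rust
/// Minimal stand-in for the MPU register block.
/// Hypothetical trait for illustration; not the Hubris kernel's API.
pub trait MpuRegs {
    fn write_rnr(&mut self, value: u32);
    fn write_rbar(&mut self, value: u32);
    fn write_rasr(&mut self, value: u32);
}

/// Reprogram one region without ever pairing the new base with a stale size.
pub fn configure_region<M: MpuRegs>(mpu: &mut M, index: u32, base: u32, rasr: u32) {
    // Select the region explicitly instead of using RBAR's VALID/REGION bits.
    mpu.write_rnr(index);
    // Disable the region, which also discards the previous task's size.
    mpu.write_rasr(0);
    // Low five bits cleared: VALID = 0, so the region selected via RNR is used.
    mpu.write_rbar(base & !0x1f);
    // Finally install the new size and attributes, with the enable bit set.
    mpu.write_rasr(rasr);
}

/// Test double that records register writes in order.
#[derive(Debug, Default)]
pub struct Recorder {
    pub log: Vec<(&'static str, u32)>,
}

impl MpuRegs for Recorder {
    fn write_rnr(&mut self, value: u32) {
        self.log.push(("RNR", value));
    }
    fn write_rbar(&mut self, value: u32) {
        self.log.push(("RBAR", value));
    }
    fn write_rasr(&mut self, value: u32) {
        self.log.push(("RASR", value));
    }
}
```

Running `configure_region` on a `Recorder` with region 3, base 0x08085a00 and RASR 0x120b0011 records the writes RNR, RASR (0), RBAR, RASR in that order. The cost relative to the old sequence is two extra register writes per region.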

@mkeeter mkeeter force-pushed the fix-mpu-configuration-fault branch from e65ea84 to b745eef Compare March 8, 2023 22:51
@mkeeter mkeeter enabled auto-merge (squash) March 8, 2023 22:53
@mkeeter mkeeter merged commit bf64be1 into master Mar 8, 2023
@mkeeter mkeeter deleted the fix-mpu-configuration-fault branch March 9, 2023 13:00