arm: cortex-R & M: CONFIG_USERSPACE: intermittent Memory region write access failures #22275

pderwin · 2020-01-28T16:00:46Z

Describe the bug

I am getting intermittent run-time failures such as:

 syscall z_hdlr_k_uptime_get failed check: Memory region 0x0002ff70 (size 8) write access denied
 ***** Hardware exception *****
 Current thread ID = 0x000221b0
 Faulting instruction address = 0x1a
 Fatal fault in thread 0x000221b0! Aborting.

In this example, k_uptime_get() was called, which is going to save its 64-bit return value into memory (due to the syscall macros). The syscall gear is checking to see if the memory
address is valid. The routines wind down into mpu_buffer_validate(), which will call is_user_accessible_region() and finally into get_region_ap(). The code
is failing because get_region_ap() has a critical code section which is not wrapped with an irq_lock()/irq_unlock(). In this failure, we write the MPU index register, then
take an external interrupt, go off to another thread for a while ( which is going to re-load the MPU entries), return to the current thread (again re-loading the MPU entries), and read the attribute register (for the wrong index) and fail.

In reading the current Cortex-M code, I don't see it protecting these critical sections either, so I'm left wondering if the 'M' class will save the index register during
interrupt processing. If so, I would like to fix the 'R' code in a similar fashion.

There are several of these utility subroutines with this same critical code sections.

To Reproduce

I saw this on a Cortex-R based system. Ioannis believes the problem exists in an 'M' based system as well.

This is an intermittent failure that will be hard to reproduce.

A unit test might be to have a thread which is doing nothing more than calling k_uptime_get(); This will be calling the Z_SYSCALL_MEMORY_WRITE macro to check the destination buffer address. Asynchronous to this, do something on a timer interrupt to send a message to a higher priority thread. In my exact case, I was sending a message to a user mode work Q. The higher priority thread could just read the message/work-item and wait again. All we need to do is force the context switch to occur. I think this test should fail fairly quickly.

Expected behavior
The memory access should have passed. The location I was writing to was valid.

Impact
memory access fault. System shuts down

Screenshots or console output
n/a

Environment (please complete the following information):

OS: Linux build environment
Toolchain GCC
Commi used: V 1.14 (however, we are in the process of upgrade to v2.1.0 now )

Additional context
discussed in devel mailing list #6695

The text was updated successfully, but these errors were encountered:

jhedberg · 2020-01-28T18:21:20Z

Is this reproducible with the latest master branch? Cortex R was not supported by Zephyr 1.14.

andrewboie · 2020-01-28T18:22:48Z

Raising priority as this appears to affect Cortex M as well.

pderwin · 2020-01-28T18:35:41Z

Our code base was just updated to V2.1 today, so it will be a day or two before I can move up to that level.

I will go ahead and create the unit test that I was speaking of, and assuming that fails, post that for testing on your M systems.

andrewboie · 2020-01-28T18:40:22Z

Our code base was just updated to V2.1 today, so it will be a day or two before I can move up to that level.

I will go ahead and create the unit test that I was speaking of, and assuming that fails, post that for testing on your M systems.

Just posted a patch #22281 for Cortex-M.

We need to lock interrupts around programming the MPU index into RNR and reading values out of RASR/RBAR/RLAR, in case we get scheduled or take an interrupt in between these two operations. Addresses zephyrproject-rtos#22275 for ARM Cortex-M. Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>

pderwin · 2020-01-28T20:12:14Z

I was able to create a small contained testcase that shows a failure on R. I suspect this will fail on M as well. It is attached as phil.c.gz. Sample serial output below.

(15:03:40 169/ 0:00 0:00) uart:~$ idle 135 IDLE task pid: 0x00010fd4 priority: 15
(15:03:40 169/ 0:00 0:00) init_thread_user_mode 52 high priority: 1
(15:03:40 169/ 0:00 0:00) low priority thread in user mode, Calling k_uptime_get() forever my priority: 2
(15:03:40 169/ 0:00 0:00) LLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLinit_thread_user_mode 59 begin real test
(15:03:45 169/ 0:05 0:05) LLLLLsyscall z_hdlr_k_uptime_get failed check: Memory region 0x000163b8 (size 8) write access denied
(15:03:46 169/ 0:06 0:01) ***** Hardware exception *****
(15:03:46 169/ 0:06 0:00) Current thread ID = 0x00010a4c
(15:03:46 169/ 0:06 0:00) Faulting instruction address = 0x0
(15:03:46 169/ 0:06 0:00) Fatal fault in thread 0x00010a4c! Aborting.
(15:03:46 169/ 0:06 0:00) H
(15:03:4`

phil.c.gz

`

andrewboie · 2020-01-28T20:37:31Z

@pderwin thanks! I'll take this and adapt it to our user mode testing suite.

I talked to @ioannisg earlier, he's going to send a PR which will save/restore RNR when we take interrupts, I've marked my PR DNM meanwhile.

andrewboie · 2020-01-28T22:00:34Z

@pderwin one thing I noticed in your code:

   k_thread_create(&low_priority_thread,
		   low_priority_thread_stack,
		   sizeof(low_priority_thread_stack),
		   low_priority_thread_entry,
		   NULL, NULL, NULL,
		   2,
		   ( K_INHERIT_PERMS | K_USER ),
		   K_NO_WAIT);

Use K_THREAD_STACK_SIZEOF() and not sizeof(). Depending on architecture, there can be extra data within the stack object, aside from the actual buffer.

andrewboie · 2020-01-28T22:25:20Z

confirmed this affects Cortex-M, I can reproduce with mps2_an385 under QEMU.

andrewboie · 2020-01-28T23:05:56Z

Also reproduces on mps2_an521

I implemented a torture test inspired by @pderwin see #22288. I found that ARC is broken as well, I opened #22290. x86/x86_64 passes.

reported by zephyrproject-rtos#22290 and similar to zephyrproject-rtos#22275, the access to mpu regs in buffer validate should be atomic. Signed-off-by: Wayne Ren <wei.ren@synopsys.com>

reported by #22290 and similar to #22275, the access to mpu regs in buffer validate should be atomic. Signed-off-by: Wayne Ren <wei.ren@synopsys.com>

We need to lock interrupts around programming the MPU index into RNR and reading values out of RASR/RBAR/RLAR, in case we get scheduled or take an interrupt in between these two operations. Addresses zephyrproject-rtos#22275 for ARM Cortex-M. Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>

pderwin added the bug The issue is a bug, or the PR is fixing a bug label Jan 28, 2020

jhedberg added the priority: low Low impact/importance bug label Jan 28, 2020

andrewboie added priority: medium Medium impact/importance bug and removed priority: low Low impact/importance bug labels Jan 28, 2020

jhedberg assigned stephanosio Jan 28, 2020

andrewboie mentioned this issue Jan 28, 2020

arm: cortex-m: fix MPU region validation #22281

Closed

andrewboie assigned ioannisg Jan 28, 2020

andrewboie added the area: Memory Protection label Jan 28, 2020

This was referenced Jan 28, 2020

tests: syscalls: add torture test #22288

Closed

ARC crashes due to concurrent system calls #22290

Closed

andrewboie pushed a commit that referenced this issue Feb 5, 2020

arch: arc: fix the bug where buffer validate should be atomic

d1fa27c

reported by #22290 and similar to #22275, the access to mpu regs in buffer validate should be atomic. Signed-off-by: Wayne Ren <wei.ren@synopsys.com>

ioannisg mentioned this issue Feb 13, 2020

Arch arm cortex m swap locking #22691

Merged

ioannisg added the has-pr label Feb 18, 2020

andrewboie closed this as completed in #22691 Feb 19, 2020

ioannisg mentioned this issue Mar 17, 2020

arch: arm: mpu: protect RNR when reading RBAR, RASR in ARMv7-M driver #23010

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

arm: cortex-R & M: CONFIG_USERSPACE: intermittent Memory region write access failures #22275

arm: cortex-R & M: CONFIG_USERSPACE: intermittent Memory region write access failures #22275

pderwin commented Jan 28, 2020

jhedberg commented Jan 28, 2020

andrewboie commented Jan 28, 2020

pderwin commented Jan 28, 2020

andrewboie commented Jan 28, 2020

pderwin commented Jan 28, 2020

andrewboie commented Jan 28, 2020

andrewboie commented Jan 28, 2020

andrewboie commented Jan 28, 2020

andrewboie commented Jan 28, 2020

arm: cortex-R & M: CONFIG_USERSPACE: intermittent Memory region write access failures #22275

arm: cortex-R & M: CONFIG_USERSPACE: intermittent Memory region write access failures #22275

Comments

pderwin commented Jan 28, 2020

jhedberg commented Jan 28, 2020

andrewboie commented Jan 28, 2020

pderwin commented Jan 28, 2020

andrewboie commented Jan 28, 2020

pderwin commented Jan 28, 2020

andrewboie commented Jan 28, 2020

andrewboie commented Jan 28, 2020

andrewboie commented Jan 28, 2020

andrewboie commented Jan 28, 2020