Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

arm: cortex-R & M: CONFIG_USERSPACE: intermittent Memory region write access failures #22275

Closed
pderwin opened this issue Jan 28, 2020 · 9 comments · Fixed by #22691
Closed
Assignees
Labels
area: Memory Protection bug The issue is a bug, or the PR is fixing a bug priority: medium Medium impact/importance bug

Comments

@pderwin
Copy link
Contributor

pderwin commented Jan 28, 2020

Describe the bug

I am getting intermittent run-time failures such as:

 syscall z_hdlr_k_uptime_get failed check: Memory region 0x0002ff70 (size 8) write access denied
 ***** Hardware exception *****
 Current thread ID = 0x000221b0
 Faulting instruction address = 0x1a
 Fatal fault in thread 0x000221b0! Aborting.

In this example, k_uptime_get() was called, which is going to save its 64-bit return value into memory (due to the syscall macros). The syscall gear is checking to see if the memory
address is valid. The routines wind down into mpu_buffer_validate(), which will call is_user_accessible_region() and finally into get_region_ap(). The code
is failing because get_region_ap() has a critical code section which is not wrapped with an irq_lock()/irq_unlock(). In this failure, we write the MPU index register, then
take an external interrupt, go off to another thread for a while ( which is going to re-load the MPU entries), return to the current thread (again re-loading the MPU entries), and read the attribute register (for the wrong index) and fail.

In reading the current Cortex-M code, I don't see it protecting these critical sections either, so I'm left wondering if the 'M' class will save the index register during
interrupt processing. If so, I would like to fix the 'R' code in a similar fashion.

There are several of these utility subroutines with this same critical code sections.

To Reproduce

I saw this on a Cortex-R based system. Ioannis believes the problem exists in an 'M' based system as well.

This is an intermittent failure that will be hard to reproduce.

A unit test might be to have a thread which is doing nothing more than calling k_uptime_get(); This will be calling the Z_SYSCALL_MEMORY_WRITE macro to check the destination buffer address. Asynchronous to this, do something on a timer interrupt to send a message to a higher priority thread. In my exact case, I was sending a message to a user mode work Q. The higher priority thread could just read the message/work-item and wait again. All we need to do is force the context switch to occur. I think this test should fail fairly quickly.

Expected behavior
The memory access should have passed. The location I was writing to was valid.

Impact
memory access fault. System shuts down

Screenshots or console output
n/a

Environment (please complete the following information):

  • OS: Linux build environment
  • Toolchain GCC
  • Commi used: V 1.14 (however, we are in the process of upgrade to v2.1.0 now )

Additional context
discussed in devel mailing list #6695

@pderwin pderwin added the bug The issue is a bug, or the PR is fixing a bug label Jan 28, 2020
@jhedberg
Copy link
Member

Is this reproducible with the latest master branch? Cortex R was not supported by Zephyr 1.14.

@jhedberg jhedberg added the priority: low Low impact/importance bug label Jan 28, 2020
@andrewboie andrewboie added priority: medium Medium impact/importance bug and removed priority: low Low impact/importance bug labels Jan 28, 2020
@andrewboie
Copy link
Contributor

Raising priority as this appears to affect Cortex M as well.

@pderwin
Copy link
Contributor Author

pderwin commented Jan 28, 2020

Our code base was just updated to V2.1 today, so it will be a day or two before I can move up to that level.

I will go ahead and create the unit test that I was speaking of, and assuming that fails, post that for testing on your M systems.

@andrewboie
Copy link
Contributor

Our code base was just updated to V2.1 today, so it will be a day or two before I can move up to that level.

I will go ahead and create the unit test that I was speaking of, and assuming that fails, post that for testing on your M systems.

Just posted a patch #22281 for Cortex-M.

andrewboie pushed a commit to andrewboie/zephyr that referenced this issue Jan 28, 2020
We need to lock interrupts around programming the MPU index into
RNR and reading values  out of RASR/RBAR/RLAR, in case we
get scheduled or take an interrupt in between these two
operations.

Addresses zephyrproject-rtos#22275 for ARM Cortex-M.

Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
@pderwin
Copy link
Contributor Author

pderwin commented Jan 28, 2020

I was able to create a small contained testcase that shows a failure on R. I suspect this will fail on M as well. It is attached as phil.c.gz. Sample serial output below.


(15:03:40 169/ 0:00 0:00) uart:~$ idle 135 IDLE task pid: 0x00010fd4 priority: 15
(15:03:40 169/ 0:00 0:00) init_thread_user_mode 52 high priority: 1
(15:03:40 169/ 0:00 0:00) low priority thread in user mode, Calling k_uptime_get() forever my priority: 2
(15:03:40 169/ 0:00 0:00) LLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLinit_thread_user_mode 59 begin real test
(15:03:45 169/ 0:05 0:05) LLLLLsyscall z_hdlr_k_uptime_get failed check: Memory region 0x000163b8 (size 8) write access denied
(15:03:46 169/ 0:06 0:01) ***** Hardware exception *****
(15:03:46 169/ 0:06 0:00) Current thread ID = 0x00010a4c
(15:03:46 169/ 0:06 0:00) Faulting instruction address = 0x0
(15:03:46 169/ 0:06 0:00) Fatal fault in thread 0x00010a4c! Aborting.
(15:03:46 169/ 0:06 0:00) H
(15:03:4`

phil.c.gz

`

@andrewboie
Copy link
Contributor

@pderwin thanks! I'll take this and adapt it to our user mode testing suite.

I talked to @ioannisg earlier, he's going to send a PR which will save/restore RNR when we take interrupts, I've marked my PR DNM meanwhile.

@andrewboie
Copy link
Contributor

@pderwin one thing I noticed in your code:

   k_thread_create(&low_priority_thread,
		   low_priority_thread_stack,
		   sizeof(low_priority_thread_stack),
		   low_priority_thread_entry,
		   NULL, NULL, NULL,
		   2,
		   ( K_INHERIT_PERMS | K_USER ),
		   K_NO_WAIT);

Use K_THREAD_STACK_SIZEOF() and not sizeof(). Depending on architecture, there can be extra data within the stack object, aside from the actual buffer.

@andrewboie
Copy link
Contributor

confirmed this affects Cortex-M, I can reproduce with mps2_an385 under QEMU.

@andrewboie
Copy link
Contributor

Also reproduces on mps2_an521

I implemented a torture test inspired by @pderwin see #22288. I found that ARC is broken as well, I opened #22290. x86/x86_64 passes.

vonhust pushed a commit to foss-for-synopsys-dwc-arc-processors/zephyr that referenced this issue Feb 5, 2020
reported by zephyrproject-rtos#22290 and similar to zephyrproject-rtos#22275, the access to mpu regs
in buffer validate should be atomic.

Signed-off-by: Wayne Ren <wei.ren@synopsys.com>
andrewboie pushed a commit that referenced this issue Feb 5, 2020
reported by #22290 and similar to #22275, the access to mpu regs
in buffer validate should be atomic.

Signed-off-by: Wayne Ren <wei.ren@synopsys.com>
ioannisg pushed a commit to ioannisg/zephyr that referenced this issue Feb 12, 2020
We need to lock interrupts around programming the MPU index into
RNR and reading values  out of RASR/RBAR/RLAR, in case we
get scheduled or take an interrupt in between these two
operations.

Addresses zephyrproject-rtos#22275 for ARM Cortex-M.

Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: Memory Protection bug The issue is a bug, or the PR is fixing a bug priority: medium Medium impact/importance bug
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants