-
Notifications
You must be signed in to change notification settings - Fork 6.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
arm: cortex-R & M: CONFIG_USERSPACE: intermittent Memory region write access failures #22275
Comments
Is this reproducible with the latest master branch? Cortex R was not supported by Zephyr 1.14. |
Raising priority as this appears to affect Cortex M as well. |
Our code base was just updated to V2.1 today, so it will be a day or two before I can move up to that level. I will go ahead and create the unit test that I was speaking of, and assuming that fails, post that for testing on your M systems. |
Just posted a patch #22281 for Cortex-M. |
We need to lock interrupts around programming the MPU index into RNR and reading values out of RASR/RBAR/RLAR, in case we get scheduled or take an interrupt in between these two operations. Addresses zephyrproject-rtos#22275 for ARM Cortex-M. Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
I was able to create a small contained testcase that shows a failure on R. I suspect this will fail on M as well. It is attached as phil.c.gz. Sample serial output below. (15:03:40 169/ 0:00 0:00) uart:~$ idle 135 IDLE task pid: 0x00010fd4 priority: 15 ` |
@pderwin one thing I noticed in your code:
Use K_THREAD_STACK_SIZEOF() and not sizeof(). Depending on architecture, there can be extra data within the stack object, aside from the actual buffer. |
confirmed this affects Cortex-M, I can reproduce with mps2_an385 under QEMU. |
reported by zephyrproject-rtos#22290 and similar to zephyrproject-rtos#22275, the access to mpu regs in buffer validate should be atomic. Signed-off-by: Wayne Ren <wei.ren@synopsys.com>
We need to lock interrupts around programming the MPU index into RNR and reading values out of RASR/RBAR/RLAR, in case we get scheduled or take an interrupt in between these two operations. Addresses zephyrproject-rtos#22275 for ARM Cortex-M. Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
Describe the bug
I am getting intermittent run-time failures such as:
In this example, k_uptime_get() was called, which is going to save its 64-bit return value into memory (due to the syscall macros). The syscall gear is checking to see if the memory
address is valid. The routines wind down into mpu_buffer_validate(), which will call is_user_accessible_region() and finally into get_region_ap(). The code
is failing because get_region_ap() has a critical code section which is not wrapped with an irq_lock()/irq_unlock(). In this failure, we write the MPU index register, then
take an external interrupt, go off to another thread for a while ( which is going to re-load the MPU entries), return to the current thread (again re-loading the MPU entries), and read the attribute register (for the wrong index) and fail.
In reading the current Cortex-M code, I don't see it protecting these critical sections either, so I'm left wondering if the 'M' class will save the index register during
interrupt processing. If so, I would like to fix the 'R' code in a similar fashion.
There are several of these utility subroutines with this same critical code sections.
To Reproduce
I saw this on a Cortex-R based system. Ioannis believes the problem exists in an 'M' based system as well.
This is an intermittent failure that will be hard to reproduce.
A unit test might be to have a thread which is doing nothing more than calling k_uptime_get(); This will be calling the Z_SYSCALL_MEMORY_WRITE macro to check the destination buffer address. Asynchronous to this, do something on a timer interrupt to send a message to a higher priority thread. In my exact case, I was sending a message to a user mode work Q. The higher priority thread could just read the message/work-item and wait again. All we need to do is force the context switch to occur. I think this test should fail fairly quickly.
Expected behavior
The memory access should have passed. The location I was writing to was valid.
Impact
memory access fault. System shuts down
Screenshots or console output
n/a
Environment (please complete the following information):
Additional context
discussed in devel mailing list #6695
The text was updated successfully, but these errors were encountered: