Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

_sbrk heap & stack collision check is risky #9542

Closed
alzix opened this issue Jan 30, 2019 · 9 comments
Closed

_sbrk heap & stack collision check is risky #9542

alzix opened this issue Jan 30, 2019 · 9 comments

Comments

@alzix
Copy link
Contributor

alzix commented Jan 30, 2019

Description

Reopening old discussion #1561 as the SW and HW has changed since last revisited.

Stack and Heap collision checked in _sbrk implementation by comparing to get_MSP() is risky.

                +-----------+ 0x00000000
                |           |
                |           |
                +-----------+
                |           |
                |    heap   |
                |     |     |
                |     |     |
                |     V     |
                +-----------+ <-- MSPLIM (ARMv8-M)
                |     ^     |
                |     |     |
                |     |     |
                |   stack   |
                |           |
   get_MSP() -->|           |
                |           |
                +-----------+
                |           |
                |           |
                +-----------+ 0xFFFFFFFF

In current mbed-os implementation MSP is used mainly for during initialization sequence and by interrupt handlers. MSP size can be predicted and carefully chosen by system integrator according to the application use case.
Current implementation assumes dynamic stack limit watermark and makes system unpredictable. Big allocations may work before stack is full but fail afterwards.

In addition as mentioned by @pan- #1561 (comment) , the check is not atomic and we do know for sure that MSP is used by IRQ handlers. This can lead to corruption.

Now we have new HW which was not considered during last discussion. For ARMv8-M we have stack limit registers. These registers will trigger fault once push operations are executed. This dynamic stack limit approach makes them unusable, as regular write to memory will be permitted.

Issue request type

[ ] Question
[x] Enhancement
[ ] Bug

CC @sg- @erezlandau

@kjbracey
Copy link
Contributor

The check is not atomic and we do know for sure that MSP is used by IRQ handlers. This can lead to corruption.

On this particular point - there's no atomicity issue here. MSP is totally atomic from the point of view of anyone calling malloc/_sbrk. And actually it's totally constant if in an RTOS environment - the main stack is always empty when queried by a thread (apart from the boot callstack leading up to osKernelStart).

Yes, interrupt handlers will change MSP while they're running, but that will be invisible to a thread/process reading it, because they put it back before returning. (Just like they change the PC but put it back).

So comparison against MSP is fine in principle - the problem is just that you're not allowing space for the interrupt handlers - if you want to base the calculation on MSP you need to be checking against CurrentMSP - StackSpaceAllowedForInterrupts - it seem's that was @pan-'s point in that comment.

Does PR #9092 change anything here? That was doing something to regularise main stack sizes. Maybe platforms are better at specifying their limit explicitly now such that the MSP check is redundant anyway - they could look at a stack base symbol? And yes, that explicit limit could also be programmed into MSPLIM.

@pan-
Copy link
Member

pan- commented Jan 30, 2019

@kjbracey-arm Thanks, that was my point.

I had a quick look at #9092 and it seems like we can use mbed_stack_isr_start as it is defined globally:

if (new_heap >= (unsigned char *)__get_MSP()) can be transformed into if (new_heap >= mbed_stack_isr_start)

@ciarmcom
Copy link
Member

Internal Jira reference: https://jira.arm.com/browse/MBOCUSTRIA-813

@kjbracey
Copy link
Contributor

I don't think we necessarily to rule out dynamic behaviour in ARMv8 either - MSPLIM could be dynamically set to current sbrk position, rather than mbed_stack_isr_start, so it can use any memory not used by the heap.

Now, that's not very useful in the RTOS setup, where the main stack is just for interrupts, but conceivably it remains useful in non-RTOS builds where the so-called "isr" stack is actually where main is running. I don't know if we set "ISR_STACK_SIZE" bigger for non-RTOS builds?

You might still want the MSP-based comparison if the MSP is allowed to spill below the nominal "ISR_start".

@pan-
Copy link
Member

pan- commented Jan 30, 2019

Agreed that MSPLIM can be dynamically adjust but we may miss cases where the ISR stack goes beyond its limit. Therefore an application may appear to work when memory pressure is low and crashes when the memory pressure is high. Wouldn't it be better to have it crashing all the time ?

I don't know if we set "ISR_STACK_SIZE" bigger for non-RTOS builds?

From what I read we don't. It should be at least equal to the main thread stack size and at most it should be equal to the stack size reserved for IRQ processing and the stack size reserved for the main thread.

@deepikabhavnani
Copy link

Thanks @alzix for raising this issue, I will like to add few things here:

The default model for GCC_ARM is one region ram model, which uses this implementation of _sbrk(). Its mainly for non-RTOS version. Most partners have added their own version wrap_sbrk() and _sbrk() to have 2-region ram model because of few issues listed below.

This is good time to rethink on single RAM region model (stack and heap together) for RTOS less build reasons being:

  1. GCC allocates memory at 4K boundary. ISR stack in Mbed-2 is 4K but in case of Mbed OS 5 it is 1K. Placing stack always at the bottom or after heap, we leave 3K of RAM memory unused. malloc unexpected behaviour on GCC_ARM compiler #5386
  2. In case of separate RAM banks we cannot extend Heap to multiple banks, and hence users prefer separate region for heap memory. Enable split heap in SRAM1 and SRAM2 on Nucleo L476RG #8963

Also #9092 is breaking change for all non-RTOS builds, we will need 2 ram region model to fix this. #9523

@deepikabhavnani
Copy link

deepikabhavnani commented Jan 30, 2019

I don't know if we set "ISR_STACK_SIZE" bigger for non-RTOS builds?

ISR stack size for non-RTOS builds is 4K and 1K for RTOS build.
Default stack size is in targets.json for RTOS less build https://github.com/ARMmbed/mbed-os/blob/master/targets/targets.json#L30

Overwritten in RTOS lib.json for RTOS build https://github.com/ARMmbed/mbed-os/blob/master/rtos/mbed_lib.json#L29

@deepikabhavnani
Copy link

@alzix - Stack and heap collision is taken care of in #9571.
Stack and heap are now defined as two separate memory regions. Please review #9571 if this looks good for Armv8M?

@deepikabhavnani
Copy link

deepikabhavnani commented Mar 1, 2019

Closing this as fix is applied in #9571. Feel free to reopen if issue still exists

@ARMmbed/mbed-os-maintainers - I cannot close this issue. Did something change ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants