GCC - Add support to split heap across 2-RAM banks #9944

deepikabhavnani · 2019-03-05T20:29:22Z

Description

GCC - Add support to split heap across 2-RAM banks.
Commit messages provide more information.

Pull request type

[ ] Fix
[ ] Refactor
[ ] Target update
[X] Functionality change
[ ] Docs update
[ ] Test update
[ ] Breaking change

Reviewers

@marcuschangarm @juhoeskeli @c1728p9

Release Notes

Mbed heap split into 2-Ram banks is added which can be enabled by compiling source with MBED_SPLIT_HEAP macro. Linker symbols __mbed_sbrk_start __mbed_krbs_start __mbed_sbrk_start_0 __mbed_krbs_start_0 should be added by you in GCC linker script to state start and end of each HEAP region.

cmonr · 2019-03-05T21:47:20Z

@deepikabhavnani Please fix the trailing whitespace that astyle found.

cmonr · 2019-03-05T22:03:57Z

Holding because @sg- has a problem with this in an offline conversation.

deepikabhavnani · 2019-03-05T22:06:07Z

Need think of solution for trim operation in malloc, once the allocated memory is freed and trimmed back to region 1, it will not get allocated again.

marcuschangarm · 2019-03-06T00:35:46Z

I don't see the problem; the current sbrk implementation doesn't support malloc_trim anyway so we are not losing functionality.

If we want to support it, the easiest solution would be to wrap it:

int __wrap_malloc_trim(size_t pad) {
    return 0; // unable to release memory
}

deepikabhavnani · 2019-03-06T01:14:57Z

I need to check the free operation, if internally GCC does release of memory (not free) done at the end of last allocation which is usually done by passing -negative value to _sbrk(). We can have solution to that as well, just need to verify and add.

deepikabhavnani · 2019-03-06T04:47:04Z

@sg- I tried allocating and free entire heap memory multiple times with this example https://gist.github.com/deepikabhavnani/5a515f2bced590fa9a9688ecc864886d and it worked fine.

GCC does not make the memory available to OS when we do free operation, it internally maintains the link list of all allocations and combines adjacent chunks to coalesce them. Verified by allocating and freeing different sizes of allocations.

Reference : https://sourceware.org/glibc/wiki/MallocInternals

jeromecoutant

Maybe we can enable this MBED_SPLIT_HEAP in L476 as well ?
what happened if MBED_SPLIT_HEAP is not enabled for a L475 chip ? Should we define memory regions in the ld file depending on this MBED_SPLIT_HEAP macro ?

jeromecoutant · 2019-03-06T08:59:09Z

targets/targets.json

@@ -3936,7 +3937,7 @@
            }
        },
        "detect_code": ["0468"],
-        "macros_add": ["USBHOST_OTHER", "TWO_RAM_REGIONS"],


I understand that this TWO_RAM_REGIONS macro should have been deleted in #9571
Note it is also defined for RHOMBIO_L476DMW1K

👍 Thanks ~~will add fix for RHOMBIO_L476DMW1K in separate PR~~ Fixed

deepikabhavnani · 2019-03-06T15:58:34Z

Maybe we can enable this MBED_SPLIT_HEAP in L476 as well ?

👍 ~~Plan is to update other targets in separate PR after this is approved and tested.~~ PR moved to 5.13 - added fix for all ST targets. Please review

what happened if MBED_SPLIT_HEAP is not enabled for a L475 chip ? Should we define memory regions in the ld file depending on this MBED_SPLIT_HEAP macro ?

Adding macro in linker script will need tools update, instead shall we change. Shall I change __end__ = __mbed_sbrk_start

deepikabhavnani · 2019-03-06T16:06:35Z

Note for self -

mbed-os/TESTS/mbedmicro-rtos-mbed/heap_and_stack/main.cpp

Line 123 in ec03df4

    
           bool result = rangeinrange((uint32_t) temp, sizeof(linked_list), mbed_heap_start, mbed_heap_size);

This test should be updated for split heap support

targets/TARGET_STM/TARGET_STM32L4/TARGET_STM32L475xG/device/TOOLCHAIN_GCC_ARM/STM32L475XX.ld

bulislaw

There's no other actors in the system or "real system" for that matter, therefore releasing memory to the system is not needed.

0xc0170 · 2019-04-12T11:00:27Z

platform/mbed_retarget.cpp

+     * If the new address is outside the first region, start allocating from the second region.
+     */
+    if (once && (new_heap > (uint32_t) &__mbed_krbs_start_0)) {
+        once = false;


what does once do - either changing the name to be more clear about the intention or adding a comment. Or at least 1258 comment could include this boolean - it touches only address check

once - is indicator of allowing to enter just once. I can update the comment to add more clarity

What I meant by sbrk behavior is that if you try to allocate here more memory than is available in first region, the code moves to another region and you end up nothing being allocated from first region in the worst case. And you can't come back here either.

Actually, given the current behaviour of newlib's malloc, I don't think the once flag is needed.

The regions must be in memory address order, or newlib malloc would get confused. (Is that known/documented for this feature? I spotted it by inspection of newlib source, and it makes sense given the normal POSIX function of sbrk).

So the test could be if (heap <= (uint32_t) &__mbed_krbs_start_0 && new_heap > (uint32_t) &_mbed_krbs_start_0).

0xc0170

Changes looks fine, question:

I find the naming confusing: start_0 means it comes before _start - it is in the lower ram bank (lower address) as heap starts pointing to start_0.
This could be trivial I might be just confused with _start vs _start_0 naming - if I have a target with two RAM banks for heap and want to apply this start/0 to it - I would not know how to apply this to memory sections based on the release notes addition .
Why is it not _0 and _1 / first/second ?

deepikabhavnani · 2019-04-15T16:24:44Z

@0xc0170 - I understand the confusion, but only reason to not update both to _0 and _1 was to have compatibility with old/existing definitions. Some targets have __mbed_krbs_start and __mbed_sbrk_start in their linker files.

Let me know how I can improve to add more clarity (apart from changing to _0 and _1)

0xc0170 · 2019-04-16T08:46:37Z

CI started

mbed-ci · 2019-04-17T07:45:27Z

Test run: FAILED

Summary: 2 of 11 test jobs failed
Build number : 2
Build artifacts

Failed test jobs:

jenkins-ci/mbed-os-ci_exporter
jenkins-ci/mbed-os-ci_greentea-test

0xc0170 · 2019-04-17T07:56:08Z

Aborted build, we need CI time for 5.12.2 jobs today

deepikabhavnani · 2019-04-23T15:21:48Z

Aborted build, we need CI time for 5.12.2 jobs today

Can we start the CI on this?

bulislaw · 2019-04-24T17:07:21Z

Why is that 5.13? And not 5.12.x?

deepikabhavnani · 2019-04-24T18:03:57Z

Why is that 5.13? And not 5.12.x?

I believe because it is new feature. Initially it was 5.12.0, may be because of that?

juhoeskeli · 2019-04-25T07:27:00Z

Hello @deepikabhavnani & others. At some point I was reporting allocation problems with pelion-enablement on L475 with this PR. There the allocation/freeing is made in order of 10 /20 / 40 / 60kb. The 40kB allocation failed in that test.

There are several factors at play in here, originating from the newlib implementation. One of these factors is that the _sbrk allocation is aligned to 4kB, and it always tries to allocate one extra page. Second thing is that the allocator does not try to allocate only the missing bytes (e.g. for 40kB we would need ~20kB extra allocated, as we freed that already) but the whole deal (in this case 40kB) in addition. And this is where it fails in this case because with the extra page put on top there is not enough memory (only 63kB or so). Probably the only way to make this work in application is to preallocate (and free) the memory so that both regions are available for use (if you allocate too much on the first go, you end up skipping the first region altogether). We looked this with @kjbracey-arm and he has filed question of this behaviour in here https://answers.launchpad.net/gcc-arm-embedded/+question/680433

adbridge · 2019-04-25T15:16:06Z

ci started

mbed-ci · 2019-04-25T19:32:53Z

Test run: SUCCESS

Summary: 11 of 11 test jobs passed
Build number : 5
Build artifacts

deepikabhavnani · 2019-04-25T20:48:56Z

Probably the only way to make this work in application is to preallocate (and free) the memory so that both regions are available for use

@juhoeskeli - Agree with larger chunks of memory it will be good to pre-allocate them or follow the allocation order of larger to small. In case of single RAM bank GCC can take care of merging the allocation regions, but in 2-region model if first block is 40K and another one 60K initial different sequence will result into different behavior.

Example
RAM 1 - 40K Available RAM2 - 60K available
Allocation sizes 10K, 20k, 40K, 60K (allocate and free)

Sequence 1 - In order 10K, 20K, 40K, 60K
10K is allocated in RAM1 - followed by 20K in RAM1. Now even though 10K and 20K is freed, GCC will not coalese small chunks till memory is available. As a result 40K is allocated in RAM2 and though we have free memory available 60K will not be allocated. But 60K allocation should be successful if we free 40K as well.

Sequence 2 - In order 60K, 40K, 20K, 10K
This should allocate all memory chunks correctly without any failures

Sequence 2 - In order 60K, 20K, 40K, 10K
This should be successful as well. if we free 20K irrespective of 60K is freed or not.

@juhoeskeli - Please clarify if your observation aligns to this understanding.

adbridge · 2019-04-26T09:59:53Z

@deepikabhavnani I'm going to hold off on merging this until the question is answered

juhoeskeli · 2019-04-30T07:15:22Z

Probably the only way to make this work in application is to preallocate (and free) the memory so that both regions are available for use

@juhoeskeli - Agree with larger chunks of memory it will be good to pre-allocate them or follow the allocation order of larger to small. In case of single RAM bank GCC can take care of merging the allocation regions, but in 2-region model if first block is 40K and another one 60K initial different sequence will result into different behavior.

Citing Kevin, from the link I posted earlier: "Best current work around from outside the library I can see is to have a "heap preload loop" at startup that allocates lots of small (<pagesize) blocks forcing the allocator to make maximally-efficient small sbrk calls, then releases them all."

Example
RAM 1 - 40K Available RAM2 - 60K available
Allocation sizes 10K, 20k, 40K, 60K (allocate and free)

Sequence 1 - In order 10K, 20K, 40K, 60K
10K is allocated in RAM1 - followed by 20K in RAM1. Now even though 10K and 20K is freed, GCC will not coalese small chunks till memory is available. As a result 40K is allocated in RAM2 and though we have free memory available 60K will not be allocated. But 60K allocation should be successful if we free 40K as well.

I think the 60k allocation will not be successful, as the allocator tries to allocate that 60k from address after the 40k which was freed.

Sequence 2 - In order 60K, 40K, 20K, 10K
This should allocate all memory chunks correctly without any failures

Yes, but what happens is that the RAM1 will be left unused and no way to reclaim that (due to how sbrk works - added a comment about this to relevant code section).

Sequence 2 - In order 60K, 20K, 40K, 10K
This should be successful as well. if we free 20K irrespective of 60K is freed or not.

Yes, but what happens is that the RAM1 will be left unused and no way to reclaim that. You must free the 60k otherwise it will not work.

@juhoeskeli - Please clarify if your observation aligns to this understanding.

@deepikabhavnani @adbridge I would still go forward with this, as there is nothing wrong with the PR itself. It is the feature of newlib allocator. With all these caveats I feel it is still better system like this than without. A PR could be later made with workaround to this allocation issue.

kjbracey · 2019-04-30T07:44:51Z

In case of single RAM bank GCC can take care of merging the allocation regions,

GCC/newlib still has the same basic problem in a single RAM bank. For example if you have 100K in a single region, the sequence malloc(60K); free; malloc(80K) will fail, as it issues the calls sbrk(60K) followed by sbrk(80K), and there isn't 140K of RAM.

The problem is that the sbrk interface is very crude, and newlib uses it in a fairly dumb way. sbrk cannot move backwards to return an earlier pointer to the first region it skipped, and it can't give some sort of limited return - eg for the sbrk(80K) it can't say "I don't have another 80K, but I can give you 40K contiguous with the 60K I already gave you, is that any good?".

And newlib also doesn't try harder - when sbrk(80K) fails, it doesn't attempt sbrk(20K) with the reasonable hope that it will get something contiguous with its free 60K at the top and thus satisfy the malloc(80K) request.

We can avoid the issue by something like this on init:

typedef struct block {
    ns_list_link_t link;
    // aim for block to be around 128 bytes - the SMALL_MEMORY "page size" for newlib malloc
    // (presently SMALL_MEMORY is not defined, so the page size is 4K, meaning 4K alignment
    // will be forced anyway, but this future-proofs in case SMALL_MEMORY gets set)
    char pad[100]; 
} block_t;

NS_LIST_HEAD(block_t, link) blocks;

// Fill heap with small blocks, so it makes small `sbrk` calls that fit nicely into our
// regions, getting as much as we can into each region, avoiding `sbrk`
// having to say "no" to a too-big request, or advance prematurely
// from one region to the next.
for (;;) {
    block_t *block = malloc(sizeof(block_t));
    if (!block) {
        break;
    }

    ns_list_add_to_end(&blocks, block);
}

// And free it all again - newlib will now have populated its free list with all available
// heap regions, and the heap will operate efficiently
ns_list_foreach_safe(block_t, block, &blocks) {
    ns_list_remove(&blocks, block);
    free(block);
}

tr_debug("Heap usable = %zu bytes", mallinfo().arena);

I think I'd advocate adding that code to the pre-main startup for the GCC toolchain, on all builds. Only concern is code-size impact for blinky/bootloader.

kjbracey

Approving, despite discussion above - none of the issues are caused by this change which is sound. Would suggest removing the once flag next time it's touched though.

cmonr requested review from a team March 5, 2019 20:34

cmonr added needs: review release-version: 5.12.0-rc1 labels Mar 5, 2019

deepikabhavnani force-pushed the stm32_splitheap branch from 043b82a to 62fc0e2 Compare March 5, 2019 20:46

c1728p9 approved these changes Mar 5, 2019

View reviewed changes

cmonr removed the release-version: 5.12.0-rc1 label Mar 5, 2019

cmonr added the release-version: 5.12.0-rc1 label Mar 5, 2019

cmonr added needs: work and removed needs: review labels Mar 5, 2019

ghost added the PM_DECLINED label Mar 5, 2019

cmonr removed the release-version: 5.12.0-rc1 label Mar 6, 2019

deepikabhavnani force-pushed the stm32_splitheap branch from 62fc0e2 to c37b361 Compare March 6, 2019 04:50

jeromecoutant reviewed Mar 6, 2019

View reviewed changes

deepikabhavnani force-pushed the stm32_splitheap branch from c37b361 to ae939cb Compare March 6, 2019 19:29

deepikabhavnani commented Mar 6, 2019

View reviewed changes

targets/TARGET_STM/TARGET_STM32L4/TARGET_STM32L475xG/device/TOOLCHAIN_GCC_ARM/STM32L475XX.ld Show resolved Hide resolved

cmonr added the release-version: 5.13.0-rc1 label Mar 6, 2019

deepikabhavnani commented Mar 6, 2019

View reviewed changes

targets/TARGET_STM/TARGET_STM32L4/TARGET_STM32L475xG/device/TOOLCHAIN_GCC_ARM/STM32L475XX.ld Show resolved Hide resolved

deepikabhavnani force-pushed the stm32_splitheap branch from ae939cb to 27eed4b Compare March 6, 2019 22:33

deepikabhavnani mentioned this pull request Mar 6, 2019

stm32l475: lwip symbols in sram1 #9793

Closed

bulislaw approved these changes Mar 7, 2019

View reviewed changes

deepikabhavnani force-pushed the stm32_splitheap branch from 27eed4b to ac16f56 Compare March 7, 2019 22:59

cmonr added the needs: review label Apr 12, 2019

jeromecoutant approved these changes Apr 12, 2019

View reviewed changes

0xc0170 reviewed Apr 12, 2019

View reviewed changes

Add more information and comments

20a341d

0xc0170 approved these changes Apr 16, 2019

View reviewed changes

0xc0170 added needs: CI and removed needs: review labels Apr 16, 2019

adbridge added needs: review and removed needs: CI labels Apr 26, 2019

kjbracey approved these changes Apr 30, 2019

View reviewed changes

0xc0170 added ready for merge and removed needs: review labels Apr 30, 2019

0xc0170 merged commit 1de0712 into ARMmbed:master Apr 30, 2019

adbridge removed the ready for merge label May 1, 2019

0xc0170 mentioned this pull request Aug 14, 2019

Provide multiple RAM region support for the heap when using GCC #9518

Closed

GCC - Add support to split heap across 2-RAM banks #9944

GCC - Add support to split heap across 2-RAM banks #9944

Uh oh!

Conversation

deepikabhavnani commented Mar 5, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Pull request type

Reviewers

Release Notes

Uh oh!

cmonr commented Mar 5, 2019

Uh oh!

cmonr commented Mar 5, 2019

Uh oh!

deepikabhavnani commented Mar 5, 2019

Uh oh!

marcuschangarm commented Mar 6, 2019

Uh oh!

deepikabhavnani commented Mar 6, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

deepikabhavnani commented Mar 6, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jeromecoutant left a comment

Choose a reason for hiding this comment

Uh oh!

jeromecoutant Mar 6, 2019

Choose a reason for hiding this comment

Uh oh!

deepikabhavnani Mar 6, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

deepikabhavnani commented Mar 6, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

deepikabhavnani commented Mar 6, 2019

Uh oh!

Uh oh!

Uh oh!

bulislaw left a comment

Choose a reason for hiding this comment

Uh oh!

0xc0170 Apr 12, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

deepikabhavnani Apr 15, 2019

Choose a reason for hiding this comment

Uh oh!

juhoeskeli Apr 30, 2019

Choose a reason for hiding this comment

Uh oh!

kjbracey Apr 30, 2019

Choose a reason for hiding this comment

Uh oh!

0xc0170 left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

deepikabhavnani commented Apr 15, 2019

Uh oh!

0xc0170 commented Apr 16, 2019

Uh oh!

mbed-ci commented Apr 17, 2019

Test run: FAILED

Uh oh!

0xc0170 commented Apr 17, 2019

Uh oh!

deepikabhavnani commented Apr 23, 2019

Uh oh!

bulislaw commented Apr 24, 2019

Uh oh!

deepikabhavnani commented Apr 24, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

juhoeskeli commented Apr 25, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

deepikabhavnani commented Mar 5, 2019 •

edited

Loading

deepikabhavnani commented Mar 6, 2019 •

edited

Loading

deepikabhavnani commented Mar 6, 2019 •

edited

Loading

deepikabhavnani Mar 6, 2019 •

edited

Loading

deepikabhavnani commented Mar 6, 2019 •

edited

Loading

0xc0170 Apr 12, 2019 •

edited

Loading

0xc0170 left a comment •

edited

Loading

deepikabhavnani commented Apr 24, 2019 •

edited

Loading

juhoeskeli commented Apr 25, 2019 •

edited

Loading

juhoeskeli commented Apr 30, 2019 •

edited

Loading

kjbracey commented Apr 30, 2019 •

edited

Loading