Skip to content

GCC - Add support to split heap across 2-RAM banks #9944

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Apr 30, 2019

Conversation

deepikabhavnani
Copy link

@deepikabhavnani deepikabhavnani commented Mar 5, 2019

Description

GCC - Add support to split heap across 2-RAM banks.
Commit messages provide more information.

Pull request type

[ ] Fix
[ ] Refactor
[ ] Target update
[X] Functionality change
[ ] Docs update
[ ] Test update
[ ] Breaking change

Reviewers

@marcuschangarm @juhoeskeli @c1728p9

Release Notes

Mbed heap split into 2-Ram banks is added which can be enabled by compiling source with MBED_SPLIT_HEAP macro. Linker symbols __mbed_sbrk_start __mbed_krbs_start __mbed_sbrk_start_0 __mbed_krbs_start_0 should be added by you in GCC linker script to state start and end of each HEAP region.

@cmonr
Copy link
Contributor

cmonr commented Mar 5, 2019

@deepikabhavnani Please fix the trailing whitespace that astyle found.

@cmonr
Copy link
Contributor

cmonr commented Mar 5, 2019

Holding because @sg- has a problem with this in an offline conversation.

@deepikabhavnani
Copy link
Author

Need think of solution for trim operation in malloc, once the allocated memory is freed and trimmed back to region 1, it will not get allocated again.

@ghost ghost added the PM_DECLINED label Mar 5, 2019
@marcuschangarm
Copy link
Contributor

I don't see the problem; the current sbrk implementation doesn't support malloc_trim anyway so we are not losing functionality.

If we want to support it, the easiest solution would be to wrap it:

int __wrap_malloc_trim(size_t pad) {
    return 0; // unable to release memory
}

@deepikabhavnani
Copy link
Author

deepikabhavnani commented Mar 6, 2019

I need to check the free operation, if internally GCC does release of memory (not free) done at the end of last allocation which is usually done by passing -negative value to _sbrk(). We can have solution to that as well, just need to verify and add.

@deepikabhavnani
Copy link
Author

deepikabhavnani commented Mar 6, 2019

@sg- I tried allocating and free entire heap memory multiple times with this example https://gist.github.com/deepikabhavnani/5a515f2bced590fa9a9688ecc864886d and it worked fine.

GCC does not make the memory available to OS when we do free operation, it internally maintains the link list of all allocations and combines adjacent chunks to coalesce them. Verified by allocating and freeing different sizes of allocations.

Reference : https://sourceware.org/glibc/wiki/MallocInternals

Copy link
Collaborator

@jeromecoutant jeromecoutant left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Maybe we can enable this MBED_SPLIT_HEAP in L476 as well ?
  • what happened if MBED_SPLIT_HEAP is not enabled for a L475 chip ? Should we define memory regions in the ld file depending on this MBED_SPLIT_HEAP macro ?

@@ -3936,7 +3937,7 @@
}
},
"detect_code": ["0468"],
"macros_add": ["USBHOST_OTHER", "TWO_RAM_REGIONS"],
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand that this TWO_RAM_REGIONS macro should have been deleted in #9571
Note it is also defined for RHOMBIO_L476DMW1K

Copy link
Author

@deepikabhavnani deepikabhavnani Mar 6, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Thanks will add fix for RHOMBIO_L476DMW1K in separate PR Fixed

@deepikabhavnani
Copy link
Author

deepikabhavnani commented Mar 6, 2019

Maybe we can enable this MBED_SPLIT_HEAP in L476 as well ?

👍 Plan is to update other targets in separate PR after this is approved and tested. PR moved to 5.13 - added fix for all ST targets. Please review

what happened if MBED_SPLIT_HEAP is not enabled for a L475 chip ? Should we define memory regions in the ld file depending on this MBED_SPLIT_HEAP macro ?

Adding macro in linker script will need tools update, instead shall we change. Shall I change __end__ = __mbed_sbrk_start

@deepikabhavnani
Copy link
Author

Note for self -

bool result = rangeinrange((uint32_t) temp, sizeof(linked_list), mbed_heap_start, mbed_heap_size);

This test should be updated for split heap support

Copy link
Member

@bulislaw bulislaw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no other actors in the system or "real system" for that matter, therefore releasing memory to the system is not needed.

* If the new address is outside the first region, start allocating from the second region.
*/
if (once && (new_heap > (uint32_t) &__mbed_krbs_start_0)) {
once = false;
Copy link
Contributor

@0xc0170 0xc0170 Apr 12, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what does once do - either changing the name to be more clear about the intention or adding a comment. Or at least 1258 comment could include this boolean - it touches only address check

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

once - is indicator of allowing to enter just once. I can update the comment to add more clarity

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I meant by sbrk behavior is that if you try to allocate here more memory than is available in first region, the code moves to another region and you end up nothing being allocated from first region in the worst case. And you can't come back here either.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, given the current behaviour of newlib's malloc, I don't think the once flag is needed.

The regions must be in memory address order, or newlib malloc would get confused. (Is that known/documented for this feature? I spotted it by inspection of newlib source, and it makes sense given the normal POSIX function of sbrk).

So the test could be if (heap <= (uint32_t) &__mbed_krbs_start_0 && new_heap > (uint32_t) &_mbed_krbs_start_0).

Copy link
Contributor

@0xc0170 0xc0170 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes looks fine, question:

I find the naming confusing: start_0 means it comes before _start - it is in the lower ram bank (lower address) as heap starts pointing to start_0.
This could be trivial I might be just confused with _start vs _start_0 naming - if I have a target with two RAM banks for heap and want to apply this start/0 to it - I would not know how to apply this to memory sections based on the release notes addition .
Why is it not _0 and _1 / first/second ?

@deepikabhavnani
Copy link
Author

@0xc0170 - I understand the confusion, but only reason to not update both to _0 and _1 was to have compatibility with old/existing definitions. Some targets have __mbed_krbs_start and __mbed_sbrk_start in their linker files.

Let me know how I can improve to add more clarity (apart from changing to _0 and _1)

@0xc0170
Copy link
Contributor

0xc0170 commented Apr 16, 2019

CI started

@mbed-ci
Copy link

mbed-ci commented Apr 17, 2019

Test run: FAILED

Summary: 2 of 11 test jobs failed
Build number : 2
Build artifacts

Failed test jobs:

  • jenkins-ci/mbed-os-ci_exporter
  • jenkins-ci/mbed-os-ci_greentea-test

@0xc0170
Copy link
Contributor

0xc0170 commented Apr 17, 2019

Aborted build, we need CI time for 5.12.2 jobs today

@deepikabhavnani
Copy link
Author

Aborted build, we need CI time for 5.12.2 jobs today

Can we start the CI on this?

@bulislaw
Copy link
Member

Why is that 5.13? And not 5.12.x?

@deepikabhavnani
Copy link
Author

deepikabhavnani commented Apr 24, 2019

Why is that 5.13? And not 5.12.x?

I believe because it is new feature. Initially it was 5.12.0, may be because of that?

@juhoeskeli
Copy link
Contributor

juhoeskeli commented Apr 25, 2019

Hello @deepikabhavnani & others. At some point I was reporting allocation problems with pelion-enablement on L475 with this PR. There the allocation/freeing is made in order of 10 /20 / 40 / 60kb. The 40kB allocation failed in that test.

There are several factors at play in here, originating from the newlib implementation. One of these factors is that the _sbrk allocation is aligned to 4kB, and it always tries to allocate one extra page. Second thing is that the allocator does not try to allocate only the missing bytes (e.g. for 40kB we would need ~20kB extra allocated, as we freed that already) but the whole deal (in this case 40kB) in addition. And this is where it fails in this case because with the extra page put on top there is not enough memory (only 63kB or so). Probably the only way to make this work in application is to preallocate (and free) the memory so that both regions are available for use (if you allocate too much on the first go, you end up skipping the first region altogether). We looked this with @kjbracey-arm and he has filed question of this behaviour in here https://answers.launchpad.net/gcc-arm-embedded/+question/680433

@adbridge
Copy link
Contributor

ci started

@mbed-ci
Copy link

mbed-ci commented Apr 25, 2019

Test run: SUCCESS

Summary: 11 of 11 test jobs passed
Build number : 5
Build artifacts

@deepikabhavnani
Copy link
Author

Probably the only way to make this work in application is to preallocate (and free) the memory so that both regions are available for use

@juhoeskeli - Agree with larger chunks of memory it will be good to pre-allocate them or follow the allocation order of larger to small. In case of single RAM bank GCC can take care of merging the allocation regions, but in 2-region model if first block is 40K and another one 60K initial different sequence will result into different behavior.

Example
RAM 1 - 40K Available RAM2 - 60K available
Allocation sizes 10K, 20k, 40K, 60K (allocate and free)

Sequence 1 - In order 10K, 20K, 40K, 60K
10K is allocated in RAM1 - followed by 20K in RAM1. Now even though 10K and 20K is freed, GCC will not coalese small chunks till memory is available. As a result 40K is allocated in RAM2 and though we have free memory available 60K will not be allocated. But 60K allocation should be successful if we free 40K as well.

Sequence 2 - In order 60K, 40K, 20K, 10K
This should allocate all memory chunks correctly without any failures

Sequence 2 - In order 60K, 20K, 40K, 10K
This should be successful as well. if we free 20K irrespective of 60K is freed or not.

@juhoeskeli - Please clarify if your observation aligns to this understanding.

@adbridge
Copy link
Contributor

@deepikabhavnani I'm going to hold off on merging this until the question is answered

@juhoeskeli
Copy link
Contributor

juhoeskeli commented Apr 30, 2019

Probably the only way to make this work in application is to preallocate (and free) the memory so that both regions are available for use

@juhoeskeli - Agree with larger chunks of memory it will be good to pre-allocate them or follow the allocation order of larger to small. In case of single RAM bank GCC can take care of merging the allocation regions, but in 2-region model if first block is 40K and another one 60K initial different sequence will result into different behavior.

Citing Kevin, from the link I posted earlier: "Best current work around from outside the library I can see is to have a "heap preload loop" at startup that allocates lots of small (<pagesize) blocks forcing the allocator to make maximally-efficient small sbrk calls, then releases them all."

Example
RAM 1 - 40K Available RAM2 - 60K available
Allocation sizes 10K, 20k, 40K, 60K (allocate and free)

Sequence 1 - In order 10K, 20K, 40K, 60K
10K is allocated in RAM1 - followed by 20K in RAM1. Now even though 10K and 20K is freed, GCC will not coalese small chunks till memory is available. As a result 40K is allocated in RAM2 and though we have free memory available 60K will not be allocated. But 60K allocation should be successful if we free 40K as well.

I think the 60k allocation will not be successful, as the allocator tries to allocate that 60k from address after the 40k which was freed.

Sequence 2 - In order 60K, 40K, 20K, 10K
This should allocate all memory chunks correctly without any failures

Yes, but what happens is that the RAM1 will be left unused and no way to reclaim that (due to how sbrk works - added a comment about this to relevant code section).

Sequence 2 - In order 60K, 20K, 40K, 10K
This should be successful as well. if we free 20K irrespective of 60K is freed or not.

Yes, but what happens is that the RAM1 will be left unused and no way to reclaim that. You must free the 60k otherwise it will not work.

@juhoeskeli - Please clarify if your observation aligns to this understanding.

@deepikabhavnani @adbridge I would still go forward with this, as there is nothing wrong with the PR itself. It is the feature of newlib allocator. With all these caveats I feel it is still better system like this than without. A PR could be later made with workaround to this allocation issue.

@kjbracey
Copy link
Contributor

kjbracey commented Apr 30, 2019

In case of single RAM bank GCC can take care of merging the allocation regions,

GCC/newlib still has the same basic problem in a single RAM bank. For example if you have 100K in a single region, the sequence malloc(60K); free; malloc(80K) will fail, as it issues the calls sbrk(60K) followed by sbrk(80K), and there isn't 140K of RAM.

The problem is that the sbrk interface is very crude, and newlib uses it in a fairly dumb way. sbrk cannot move backwards to return an earlier pointer to the first region it skipped, and it can't give some sort of limited return - eg for the sbrk(80K) it can't say "I don't have another 80K, but I can give you 40K contiguous with the 60K I already gave you, is that any good?".

And newlib also doesn't try harder - when sbrk(80K) fails, it doesn't attempt sbrk(20K) with the reasonable hope that it will get something contiguous with its free 60K at the top and thus satisfy the malloc(80K) request.

We can avoid the issue by something like this on init:

typedef struct block {
    ns_list_link_t link;
    // aim for block to be around 128 bytes - the SMALL_MEMORY "page size" for newlib malloc
    // (presently SMALL_MEMORY is not defined, so the page size is 4K, meaning 4K alignment
    // will be forced anyway, but this future-proofs in case SMALL_MEMORY gets set)
    char pad[100]; 
} block_t;

NS_LIST_HEAD(block_t, link) blocks;

// Fill heap with small blocks, so it makes small `sbrk` calls that fit nicely into our
// regions, getting as much as we can into each region, avoiding `sbrk`
// having to say "no" to a too-big request, or advance prematurely
// from one region to the next.
for (;;) {
    block_t *block = malloc(sizeof(block_t));
    if (!block) {
        break;
    }

    ns_list_add_to_end(&blocks, block);
}

// And free it all again - newlib will now have populated its free list with all available
// heap regions, and the heap will operate efficiently
ns_list_foreach_safe(block_t, block, &blocks) {
    ns_list_remove(&blocks, block);
    free(block);
}

tr_debug("Heap usable = %zu bytes", mallinfo().arena);

I think I'd advocate adding that code to the pre-main startup for the GCC toolchain, on all builds. Only concern is code-size impact for blinky/bootloader.

Copy link
Contributor

@kjbracey kjbracey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving, despite discussion above - none of the issues are caused by this change which is sound. Would suggest removing the once flag next time it's touched though.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.