Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ARC privilege mode stacks waste memory due to alignment requirements #17755

Closed
andrewboie opened this issue Jul 24, 2019 · 17 comments · Fixed by #24021
Closed

ARC privilege mode stacks waste memory due to alignment requirements #17755

andrewboie opened this issue Jul 24, 2019 · 17 comments · Fixed by #24021
Assignees
Labels
area: ARC ARC Architecture area: Memory Protection bug The issue is a bug, or the PR is fixing a bug priority: medium Medium impact/importance bug

Comments

@andrewboie
Copy link
Contributor

The current scheme for ARC stacks is as follows:

+-------------------------+-----------------+
| user mode stack         | privilege stack |
+-------------------------+-----------------+

However, on ARM EM the user mode stacks must be sized and aligned to a power of two since an MPU region must cover them. Adding the privilege stack to the same area results in memory being wasted:

+-------+-----+                +----------+-----+
| stk1  |priv |  dead memory   | stk2     | priv|
+-------+-----+                +----------+-----+

If we use the same privilege stack generation mechanism as ARM, then the stacks can be laid out in memory with no gaps in between:

+-------++--------+
| stk1  || stk2   |
+-------++--------+
@andrewboie andrewboie added the bug The issue is a bug, or the PR is fixing a bug label Jul 24, 2019
@andrewboie andrewboie added the priority: medium Medium impact/importance bug label Jul 24, 2019
@vonhust
Copy link

vonhust commented Jul 25, 2019

@andrewboie it's limitted by the ARC MPUv2. For ARC MPUv3, there is no such a problem whose requirement is 32 bytes aligned.

For ARC MPUv2, it's very difficult to avoid memory waste even following ARM' approach.
For example, stk1 size is 2048 bytes, stk2 size is 8192 bytes.
Then

+-------++--------+
| stk1  || stk2   |
+-------++--------+

will become

+-------+                               +--------+
| stk1  |  dead memory (8192 - 2048 )   | stk2   |
+-------+                               +--------+

Because

  • stk1 requires start address aligined with 2048 bytes, size aligned with 2048 bytes
  • stk2 requires start address aligned with 8192 byes, size aligned with 8192 byes.

The correct address alignment should be

  • stk1 starts (8192 - 2048) with a size of 2048
  • stk2 starts 8192 with a size of 8192

Then what about stk3 with a size of 16 K?

In ARC's current solution ,the privileg stack or stack guard area does not affect the aligment of user stack,
because their size is 2048 bytes fixed, the minimum memory block required by MPU hardware.

If user stack is aligned , the privilege stack/ stack guard is aligned.

So, the better solution is improving the hardware, e.g. use MPUv3 or MPUv4 of ARC

@andrewboie
Copy link
Contributor Author

No, the memory sections are sorted in reverse order of object size:

For example, stk1 size is 2048 bytes, stk2 size is 8192 bytes.

This results in

+-------++--------+
| stk2  || stk1   |
+-------++--------+
Refs: v1.14.0-rc1-1125-gaf10d16a08
Author:     Daniel Leung <daniel.leung@intel.com>
AuthorDate: Sat Mar 9 00:58:57 2019 -0800
Commit:     Kumar Gala <kumar.gala@gmail.com>
CommitDate: Wed Mar 13 15:54:29 2019 -0500

    linker: sort sections by alignment

    This turns on the linker flag to sort sections by alignment
    in decreasing size of symbols. This helps to minimize
    padding between symbols.

    This also adds the linker options to sort common symbols by
    alignment in descending to minimize the need for padding.

    Signed-off-by: Daniel Leung <daniel.leung@intel.com>
---
 CMakeLists.txt | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/CMakeLists.txt b/CMakeLists.txt
index 4e04694a38..3ab4ef394c 100644
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -384,6 +384,13 @@ zephyr_ld_options(
   ${LINKERFLAGPREFIX},--build-id=none
   )

+# Sort the common symbols and each input section by alignment
+# in descending order to minimize padding between these symbols.
+zephyr_ld_options(
+  ${LINKERFLAGPREFIX},--sort-common=descending
+  ${LINKERFLAGPREFIX},--sort-section=alignment
+  )
+
 get_property(TOPT GLOBAL PROPERTY TOPT)
 set_ifndef(  TOPT -T)

Please, migrate ARC to use privilege stack generation.

@vonhust
Copy link

vonhust commented Jul 25, 2019

@andrewboie emm... with the help of linker, that make senses. I will take a look at it and try to catch up with zephyr 2.0

@vonhust
Copy link

vonhust commented Aug 15, 2019

It's a kind of optimization. It won't corrupt the system, but a waste of memory. Lower its priority. plan to have a fix in 2.1 release

@vonhust vonhust added priority: low Low impact/importance bug area: ARC ARC Architecture area: Memory Protection and removed priority: medium Medium impact/importance bug labels Aug 15, 2019
@carlescufi carlescufi added Enhancement Changes/Updates/Additions to existing features and removed bug The issue is a bug, or the PR is fixing a bug labels Mar 16, 2020
@carlescufi
Copy link
Member

@vonhust moved from bug to enhancement. I assume this is still an issue.

@andrewboie
Copy link
Contributor Author

@carlescufi this prevents us from shipping a product on ARC with user mode enabled. This should be treated as a bug.

@andrewboie andrewboie self-assigned this Mar 24, 2020
@andrewboie andrewboie added bug The issue is a bug, or the PR is fixing a bug priority: medium Medium impact/importance bug and removed Enhancement Changes/Updates/Additions to existing features priority: low Low impact/importance bug labels Mar 24, 2020
@nashif
Copy link
Member

nashif commented Mar 24, 2020

@ruuddw @abrodkin fyi

@carlescufi
Copy link
Member

@carlescufi this prevents us from shipping a product on ARC with user mode enabled. This should be treated as a bug.

Thanks @andrewboie, did not realize.

@andrewboie
Copy link
Contributor Author

We made it much easier to generate privilege mode stacks. There is now a Kconfig option for it, CONFIG_GEN_PRIV_STACKS.

@vonhust do you have time to implement this? It would be great if we could make the ARC stack layout similar to ARM for the V2 MPU.

@ruuddw
Copy link
Member

ruuddw commented Mar 26, 2020

@andrewboie:

this prevents us from shipping a product on ARC with user mode enabled. This should be treated as a bug

Any specific product/project in mind, or is this a general remark?
The power-of-two alignment restrictions only apply to ARC MPU v2 not v3 and later.
The alignement overhead can be somewhat controlled/reduced by carefully sizing of the stacks. Did anyone check how much memory would be saved, e.g. for some test applications?

Is the main concern indeed overhead, or a preference to align implementations over different architectures and remove the ARC optimization to dynamically combine/split the user and privilege stacks?

@andrewboie
Copy link
Contributor Author

The alignement overhead can be somewhat controlled/reduced by carefully sizing of the stacks.

Please provide detail on how this can be done.
Right now I am not seeing it. The thread stack has to have an MPU region to cover it. A privilege stack appended to the end means the next stack will have a significant gap to be properly aligned.

Is the main concern indeed overhead

Yes.

@andrewboie
Copy link
Contributor Author

andrewboie commented Mar 26, 2020

So, the better solution is improving the hardware, e.g. use MPUv3 or MPUv4 of ARC

The power-of-two alignment restrictions only apply to ARC MPU v2 not v3 and later.

@vonhust @ruuddw from my point of view, the MPU v2 implementation is unoptimized and not suitable for real-world use cases. If you don't want to fix/support it, can we remove MPU v2 support from the kernel entirely?

@ruuddw
Copy link
Member

ruuddw commented Mar 27, 2020

The carefull stack sizing I had in mind was to minimize large stacks. I don't challenge that there's an option for optimization, but trying to understand the real-world case you are worrying about. What would be the savings from optimization in real-world cases? The MPU v2 has a minimum region size of 2k so only for user stacks of 4k and larger there would be a saving (of at most the sum of (user stack-2k) for the larger stacks). For resource constrained systems I'd expect not many stacks larger than 2k. For systems with caches & DDR, the extra memory might not be a big deal.
There's a benefit in current layout asl well: the privilege stack acts as a guard area for the userspace stack, saving another 2k for an additional guard area).

@andrewboie
Copy link
Contributor Author

The carefull stack sizing I had in mind was to minimize large stacks. I don't challenge that there's an option for optimization, but trying to understand the real-world case you are worrying about. What would be the savings from optimization in real-world cases? The MPU v2 has a minimum region size of 2k so only for user stacks of 4k and larger there would be a saving (of at most the sum of (user stack-2k) for the larger stacks). For resource constrained systems I'd expect not many stacks larger than 2k.

@ruuddw please think about this for a moment.
Let's say you have five 2K stacks. An MPU region must be able to cover each one. The base address of each stack must be aligned to its own size which is 2k.
In order to align these stacks properly with the privilege stack appended to them, they are all effectively 4k in size. This wastes an incredible amount of memory.
If your stacks are 4k, they are effectively 8k. And so on.

There's a benefit in current layout asl well: the privilege stack acts as a guard area for the userspace stack, saving another 2k for an additional guard area).

What are you talking about?

First of all, MPU guard regions are not used, anywhere, on any arch, for stack buffers used by user mode threads. They are not necessary. The user thread gets granted access to the stack buffer by virtue of the MPU region programmed for it.

Second, on ARC the privilege mode stack is AFTER the user stack buffer, the layout is
[ user stack | privilege stack ]. The privilege stack isn't guarding anything!

MPU guard regions are an optional feature to catch stack overflows in supervisor mode, and on ARC that's only if the CPU is lacking the built-in support for stack checking. These MPU guards are orthogonal to user mode, userspace does not and has never required them. And since the minimum MPU size on ARC is so large hardly anyone is ever going to use it since there is 2k of memory wasted per thread.

@ruuddw
Copy link
Member

ruuddw commented Mar 31, 2020

For stacks < 2kbyte I don't see additional dead code in current implementation. The 4k per stack is fully used: 2k for user, 2k for privilege stack.
Agree on the other points, as well as that it is something to improve - just trying to understand the high priority and urgency.

@andrewboie
Copy link
Contributor Author

2k for privilege stack.

@ruuddw privilege mode stacks do not need to be 2k. They do not require an MPU region to cover them. The default is 1024 bytes. At minimum a kilobyte is being wasted per thread.

@ruuddw
Copy link
Member

ruuddw commented Apr 1, 2020

ok, we plan to address it for the 2.3 release.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: ARC ARC Architecture area: Memory Protection bug The issue is a bug, or the PR is fixing a bug priority: medium Medium impact/importance bug
Projects
None yet
5 participants