-
Notifications
You must be signed in to change notification settings - Fork 6.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Unaligned access in networking code causes unaligned exception on Nucleo-F429ZI #6307
Comments
Hi @therealprof, Can you eleaborate a bit on this issue, so we can investigate and fix it? |
@erwango I just built the default configuration for the Nucleo-F429ZI and ran it. The sample I was trying was the one in Meanwhile I've also been able to build a configuration that doesn't crash. I don't have the board here but I think the workaround was to change from Yes, it is systematic. And just looking at the exception and the executed code not at all surprising to the innocent bystander. If you'd like me to test anything specific or provide more output, I can dig up the board later today and run some stuff. |
Hi @therealprof |
@therealprof , any update? |
I am wondering about that net_ipv6_addr_create_solicited_node() function that looks like a culprit. It is missing UNALIGNED_PUT() in couple of places because those that are just u8_t (so one byte) assignments. I have not seen any issues in frdm_k64f or sam_e70_xplained board that use this code all the time, is there some more stricter alignment rules in stm32? |
One thing is I'm not able to run the sample fully (lwm2m connection PC <> board), so will be hard for me to reproduce the exact issue. Anyway, could this be linked with compiler ?
or at other places:
Did we changed something that could have impacted this? Btw, I've been able to run |
@erwango I just tested the suggested change using
and that particular crash went away after making the change. I also checked that it still occurs without patch in the latest git version. So that's good. Unfortunately I still cannot fully test the |
@therealprof , thanks for this feedback! Can you confirm you meet the issue (w/o the patch) with |
No, I don't see a crash in
|
@therealprof : I'm using version 6.2.0 which is coming from zephyr SDK. |
That seems to be only available for Linux. |
@therealprof , I had a try with your toolchain variant and I end up generating the same code as you do. I'll check what can be done about that. |
@jukkar : So according to http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0553a/BABDJJGF.html, Compiling frdm_k64f, I find same behavior So stm32 is not more strict, but @therealprof might be one of the few that run GNU 7.2.1 (Dec 2017), and hit the problem, whatever the SoC. |
@erwango And that's not going to change unless Zephyr provides SDKs for Mac. ;) |
@therealprof Not a real solution, but as a temporary work around you can download the GNU ARM Embedded Toolchain 6.3.1 for Mac OS X from here: I tested it on K64F and it seemed to work just fine (where the 7.2.1 compiler was breaking in the same way you described). |
@mike-scott Yeah, I know more ways to work around that but frankly that's not something I'd want to spend my energy on. |
@erwango ping, any solution in sight? |
@jukkar, Now that root cause is identified, would you be able to have a look? |
I thought this was compiler issue, what is there in network code to fix? |
It's most likely still a error in the code but only exploited by a newer compiler. Also note that this error may also cause troubles on other platforms but only cause real crashes if the MCU has hardware to fault on memory alignment issues. I would imagine that the proper fix is to make sure that the addresses are properly aligned in memory. |
Since it appeared on recent compiler version, this is something that more and more users will face. |
I have just hit this same issue compiling with |
I have hit this same issue.
And I think the real reason is lack memory_barrier.
and the arm-none-eabi-gcc compiler generates different code:
|
Are there instructions somewhere how to use newer compiler version with zephyr? The zephyr-sdk is having gcc 6.x version and I am not able to reproduce this. |
@jukkar : http://docs.zephyrproject.org/getting_started/getting_started.html#using-custom-and-3rd-party-cross-compilers |
The gcc 7.x is generating code for ARM that is causing memory alignment error when net_ipv6_addr_create_solicited_node() is called by IP stack. Fix this by removing the offeding UNALIGNED_PUT() calls and replacing them by simple memset(). Fixes zephyrproject-rtos#6307 Signed-off-by: Jukka Rissanen <jukka.rissanen@linux.intel.com>
Yeah, I also don't see the root cause to be pronounced here clearly. And it is: The compiler doesn't behave correctly. The macro used in the code is
Our definition of UNALIGNED_PUT is defined as (include/toolchain/gcc.h):
So, how it's expected to work that compiler does the most conservative memory access when it deals with structures which has Again, that's just theorizing. The point is that we need to fix the definition of UNALIGNED_PUT and friends to work even with gcc 7.x, etc. (And as the current version works well with gcc 6.x, and it even it "optimizes" it (without breakage I hope) where possible, we'd rather have #ifdef for classical and new-too-smart versions). |
While this macro should work, I'd just yank everything in this macro and change it to a simple memset() call. GCC will just generate the right thing (e.g. just move-reg-to-memory instructions) if the size is small (i.e. word-size) and known at runtime; if, in the worst case, it can't figure out a series of instructions to do the right thing, it'll just be a function call, but the important thing is that it'll always be correct, even if we're paying the price of a function call. |
UNALIGNED_PUT() is not about memset(), it's about storing any-type value (uint8_t, uint16_t, uint32_t, add more if you like) into an unaligned memory address. How I'd do it (and how I'll argue to do it later to drop gcc-specificness) is to have UNALIGNED_PUT8, UNALIGNED_PUT16, UNALIGNED_PUT32 with the expected implementations (storing everything by uint8_t*). But first need to figure out how to fix gcc7 with the current thing ;-). |
Sorry -- I meant memcpy(). |
Well, I guess that's truly would be too inefficient, unless a compiler is truly very smart (but they usually smartass before that). @lpereira, we can race later to try to make it work ;-). |
GCC is already pretty smart about memcpy() for small sizes that are known at compile-time. At least on x86, it just generates a few mov instructions already if the arch is known to not have performance issues with unaligned access. |
That's GCC (we aren't fixed on one compiler, do we, at least we had a simple build system (ahem), and replaced with something else because there's Windows, surely, we won't stop there?), and it requires quite a good ability for coming from (constant) rvalues to storing them into (virtual) lvalues and dereferencing them back. I'll be pleasantly surprised if GCC can do such tricks, doing those borders on the level of prohibition in the C land. Do you have a ready macro? (I bet it'll have to be inline func tho.) |
The UNALIGNED_{PUT,GET}() macros are defined in the toolchain-specific headers (e.g. Something like this will most likely work (haven't tested, writing it directly as a comment): #define UNALIGNED_PUT(v, p) \
({ \
__typeof__(v) __tmp_value__ = (v); \
memcpy((p), &__tmp_value__, sizeof(__tmp_value__)); \
}) |
Here's how GNU 7.2.1 renders that for ARM:
In full compliance with C semantics, it creates a variable in memory (on the stack) as requested, and passes its address to a function. And as your macro has a bug - you type off the value, whereas it should off the pointer, it memcpy's an int (type of an integer value). |
So, the first problem we have is lack of the baseline neutral implementation. And that will require UNALIGNED_PUT8, UNALIGNED_PUT16, UNALIGNED_PUT32, as we can't rely on |
I'm happy to report that gcc7 persists in its error. I'm making following change, to test hypothesis that being 1st structure member what causes the issue:
Buy nah, it's smart enough to compensate out that char in the beginning and decrement of pointer, and generates strd still. I.e. it looks almost as peephole-style optimization, where it replaces 4 of strh with one strd, missing the alignment constraints on the pointer involved. And it's very unstable opimization, e.g. making the following change:
It no longer uses strd, but is back to strh (and even strb in one case too). Again looks like not a well thought-out optimization, but some peephole one, going wrong. |
Ah, ok, so it's defined by the very same include/toolchain/gcc.h. Looked strange, because we usually use uppercase for such macros. And thanks for the hint, I guess that's a winning workaround for this issue. It's still a workaround, because it prohibits all coalescing and optimization of multiple UNALIGNED_PUT's. There's nothing wrong with it, e.g. in Cortex-M3/4, two strb can be optimized into one strh (but not on Cortex-M0). It's just there should be limits to it, e.g. strd should not be used if pointer can be unaligned. I'll prepare patch for this. |
P.S. I also wanted to test volatile, but I guess it can only have stronger effect than compiler_barrier(). |
Some info about compiler barrier: http://preshing.com/20120625/memory-ordering-at-compile-time/ |
compiler_barrier() is itself defined down in this file. Without adding it, newer versions of GCC (7+) for ARM Cortex-M may mistakenly coalesce multiple strb/strh/str (store byte/half-word/word) instructions, which support unaligned access on some sub-architectures (Cortex-M3 and higher, but not on Cortex-M0), into strd (store double), which doesn't support unaligned access. Fixes: zephyrproject-rtos#6307 Signed-off-by: Paul Sokolovsky <paul.sokolovsky@linaro.org>
compiler_barrier() is itself defined down in this file. Without adding it, newer versions of GCC (7+) for ARM Cortex-M may mistakenly coalesce multiple strb/strh/str (store byte/half-word/word) instructions, which support unaligned access on some sub-architectures (Cortex-M3 and higher, but not on Cortex-M0), into strd (store double), which doesn't support unaligned access. Fixes: #6307 Signed-off-by: Paul Sokolovsky <paul.sokolovsky@linaro.org>
Trying to run some networking sample code (with the default configuration) on my new Nucleo-F429ZI board yields this interesting exception:
Very much to my surprise it is somewhat obvious where that unaligned exception is coming from:
It's somewhat less obvious (to me at least) what to do about it...
The text was updated successfully, but these errors were encountered: