Compiler support for embedded targets running the Thumb instruction set. #10942

neykov · 2013-12-12T21:54:27Z

Changes to allow generating executables for embedded boards:

Add support for no operating system (bare-metal) - none OS;
Add support for the thumb* architecture family, e.g. thumbv7em;

The changes are for the compiler only. The std, extra, etc. libraries can't be cross-compiled at this point, so no changes to the build system are necessary to support the additional target.

Sample command to compile an executable for the target using a patched compiler:

rustc --target=thumbv7em-none-eabi --target-cpu=cortex-m4 main.rs \
   -o blinky.elf -A dead_code -Z hard-float -Z debug-info \
   --linker arm-none-eabi-gcc --link-args "-mcpu=cortex-m4  -mfloat-abi=hard -mfpu=fpv4-sp-d16 -Tsys/stm32_flash.ld -lstm32f4"

Tested with STM32F4DISCOVERY board.


huonw · 2013-12-12T22:38:22Z

std, extra, etc. libraries can't be cross-compiled

I thought they could? Or do you just mean "they can't be cross-compiled to the platforms in this patch"?

alexcrichton · 2013-12-13T04:37:35Z

I'm nervous about landing something like this. There are a lot of special cases for this. I don't really understand how well the compiler can target an "unknown os", because there are many things which don't make sense if you don't have an OS (e.g. dynamic linking). There are some weird defines like DLL_EXTENSION = ".so" for the "none" os, but does that really make sense? We've also been very hesitant to remove generation of segmented stacks in the past.
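One way to picture the DLL_EXTENSION concern: if the dynamic-library extension were optional per OS, a bare-metal target would simply have none, and asking for a dylib would become an error instead of silently producing a `.so`. A minimal sketch under that assumption; `Os`, `dll_extension` and `dylib_filename` are hypothetical names, not the compiler's real code.

```rust
#[derive(Debug, PartialEq, Clone, Copy)]
pub enum Os {
    Linux,
    MacOs,
    Windows,
    None, // bare metal: no dynamic loader at all
}

/// Extension of a dynamic library on `os`, or `None` if the OS cannot load one.
pub fn dll_extension(os: Os) -> Option<&'static str> {
    match os {
        Os::Linux => Some(".so"),
        Os::MacOs => Some(".dylib"),
        Os::Windows => Some(".dll"),
        Os::None => None,
    }
}

/// File name for a dynamic library, rejecting targets without dynamic linking.
pub fn dylib_filename(os: Os, crate_name: &str) -> Result<String, String> {
    let prefix = if os == Os::Windows { "" } else { "lib" };
    match dll_extension(os) {
        Some(ext) => Ok(format!("{}{}{}", prefix, crate_name, ext)),
        None => Err(format!("target OS {:?} does not support dynamic libraries", os)),
    }
}
```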

This use case seems so specialized that it would be more useful to compile to LLVM bitcode and then use llc manually to create an object file.

I'd need to think about this some more and talk it over with others before merging.

neykov · 2013-12-13T08:47:55Z

std, extra, etc. libraries can't be cross-compiled

I thought they could? Or do you just mean "they can't be cross-compiled to the platforms in this patch"?

Yes, I meant the None OS in particular. There is no POSIX API, dynamic memory allocation is optional, so large parts of the libraries don't even make sense in this context.

neykov · 2013-12-13T10:34:28Z

alexcrichton commented

As a first cut at the patch I wanted to get it working and put it up for comments before making any major changes. I followed the existing approach for adding a new operating system, which of course isn't entirely applicable to the no-OS case but works fine for normal usage.

This use case seems so specialized

Rust seems like a very good fit for such environments, from embedded systems to operating systems, kernel modules, and GPUs. I don't agree that it is specialized. That may be the case currently, but everywhere C is used, Rust has the qualities to replace it. Why limit it because of the status quo?

There are a lot of special cases for this.

You are in a better position than me to assess what parts of the compiler would need special treatment, but from my experience so far it is not that much different. Some of the cases could be served by adding new options which would be useful anyways. So far the custom changes which are needed in my specific case are:

No PIC - can be used but just doesn't make sense when the locations are static.
Disable segmented stack (see below for details)
Disable exception handling since no unwinding will be performed - no std library, no tasks. The goal here is not to pull in C++ features which would just consume memory without actually being used.
Add option for hard FP, again not a strict requirement, just taking advantage of the hardware.

We've also been very hesitant to remove generation of segmented stacks in the past.

In the embedded case, dynamic memory allocation should be optional. When static stack memory is used, we know the end of the stack and can detect when an overflow will occur, which is still a nice guarantee.

In case a memory allocator is present segmented stacks may have benefits on embedded as well, leading to reduced memory consumption because of the reduced stack requirements.

alexcrichton · 2013-12-13T16:48:01Z

Rust seems like a very good fit for such environments

I certainly hope so! I would very much love to see rust in as many places as C, but we need to be careful. I don't think that the best way to start compiling for embedded applications is to just do the first thing that works and then continue to power through. I believe that these changes all need to be thought out and considered before committing them. Some specific concerns I have are:

No PIC

Sure, it makes sense to compile code without PIC, but it doesn't make sense to compile a dynamic library that way. I don't see any verification that the output is not a dynamic library. There may also be many other complications about mixing PIC/non-PIC code that I'm not aware of, and I'd want to explore what happens with different flavors of compilation.

Disable segmented stacks

We have discussed this before, and we have previously reached the conclusion that this is not the answer. We may be wrong in our conclusion, but our reasoning is that just because you're in an embedded environment it doesn't mean that you no longer should care about overflowing your stack. Stack overflow is still a very real problem that should be addressed, and this is why we haven't added an option to disable segmented stack generation in the compiler today.

Remember that we don't actually have segmented stacks in the segmented sense. All rust tasks have one monolithic stack that they run on. We use the __morestack prologue generation to detect stack overflow.

Disable exception handling since no unwinding will be performed

I agree that this is a useful compiling option, but this is much more nuanced than "just compiling some code with no landing pads". If this is added to the compiler, then all of a sudden we're going to start seeing some libraries compiled with landing pads and others not compiled with landing pads, and these libraries cannot be safely linked together. Right now we have a -Z no-landing-pads option, and sadly we don't verify that libraries linked together all have the same mode of landing pads, but that is a bug in the compiler that should be fixed. As a small (possibly too slow) way of doing this, I have modified the LTO pass in #10916 to remove all landing pads entirely from all dependent libraries regardless of how they were compiled.
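The missing verification described above could, in principle, be a link-time comparison of each dependency's landing-pad mode with the current crate's. A minimal sketch, assuming a hypothetical per-crate metadata record (`CrateInfo`, `LandingPads` and `check_landing_pads` are illustrative names); it is deliberately conservative and rejects any mismatch, as the comment above suggests.

```rust
/// How a crate was compiled with respect to unwinding landing pads.
#[derive(Debug, PartialEq, Clone, Copy)]
pub enum LandingPads {
    Enabled,
    Disabled,
}

/// Hypothetical metadata recorded for each crate being linked.
pub struct CrateInfo {
    pub name: &'static str,
    pub landing_pads: LandingPads,
}

/// Refuse to link dependencies whose landing-pad mode differs from the local crate's.
pub fn check_landing_pads(local: LandingPads, deps: &[CrateInfo]) -> Result<(), String> {
    for dep in deps {
        if dep.landing_pads != local {
            return Err(format!(
                "crate `{}` was compiled with landing pads {:?}, but the current crate uses {:?}",
                dep.name, dep.landing_pads, local
            ));
        }
    }
    Ok(())
}
```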

The goal here is not to pull in C++ features

Yay! I very much want to be able to drop our dependency on libstdc++ (see #10469). As with landing pads, this is a tricky situation though. Your use case may not want unwinding, but there are still many many use cases for unwinding. We need to carefully think about a model for disabling unwinding instead of "just not linking to libstdc++".

Add option for hard FP

This sounds like it should be a compiler flag because all it does is affect codegen a little bit. I'm not too worried about this.

When static stack memory is used we know the end of the stack and can detect when an overflow will occur which still is a nice guarantee

If that is true, then there's no reason for the morestack prologues to be omitted. The relevant mechanisms that morestack uses should be configured to know about these bounds of the stack (so stack overflow can be detected).
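The idea of keeping the prologue but configuring its bounds can be modelled in ordinary Rust: startup code records the lowest usable stack address once, and every function entry checks that its frame stays above it on a downward-growing stack. This is only a simulation; `STACK_LIMIT`, `set_stack_limit` and `prologue_check` are hypothetical, and a real prologue is a few instructions emitted by LLVM that would call `__morestack` or abort rather than return an error.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Lowest usable stack address; 0 means "not configured yet" (hypothetical global).
static STACK_LIMIT: AtomicUsize = AtomicUsize::new(0);

/// Called once by startup code with the bottom of the stack region.
pub fn set_stack_limit(limit: usize) {
    STACK_LIMIT.store(limit, Ordering::Relaxed);
}

#[derive(Debug, PartialEq)]
pub struct StackOverflow;

/// Model of a function prologue on a descending stack: refuse to enter a
/// function whose frame would extend below the configured limit.
pub fn prologue_check(sp: usize, frame_size: usize) -> Result<(), StackOverflow> {
    let limit = STACK_LIMIT.load(Ordering::Relaxed);
    match sp.checked_sub(frame_size) {
        Some(new_sp) if new_sp >= limit => Ok(()),
        _ => Err(StackOverflow),
    }
}
```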

neykov · 2013-12-15T22:26:45Z

After the above discussion I went back and checked what the hard requirements really are for the code to run on the embedded system.

No PIC

Not required, though nice to have. Can we discuss it in another issue or pull request? Shared libraries must be PIC; static libraries and executables can be either.
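The rule stated here (shared libraries must be PIC, everything else may go either way) is simple enough to enforce as a check on the requested crate type. A minimal sketch with hypothetical `CrateType` and `check_relocation_model` names; it does not address the wider question raised earlier about mixing PIC and non-PIC objects in one link.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum CrateType {
    Executable,
    StaticLib,
    Dylib,
}

/// Enforce "shared libraries must be PIC; static libraries and executables may be either".
pub fn check_relocation_model(crate_type: CrateType, pic: bool) -> Result<(), String> {
    match (crate_type, pic) {
        (CrateType::Dylib, false) => Err("dynamic libraries must be compiled as PIC".to_string()),
        _ => Ok(()),
    }
}
```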

Disable segmented stacks

I will add support for this feature in the patch. I hadn't realized that the project has already moved to a static stack with an overflow check only. This will require customizing the emitted function prologue, since TLS is not available.

Disable exception handling since no unwinding will be performed

I mistakenly concluded that the C++ library is pulled into my build if this is enabled. This is not the case; instead libgcc provides the needed symbols at the expense of requiring dynamic memory allocation, but this can be overcome. So the executable can be compiled with exception handling, but it will never be used because the runtime is not referenced at all.
I tried compiling with -Z no-landing-pads, but the .fnstart/.fnend directives still appear in the assembly file. I suggest skipping the -arm-enable-ehabi* flags when no-landing-pads is set?
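The suggestion amounts to making the ARM EHABI flags conditional on landing pads being enabled. A minimal sketch, assuming the flags hidden behind `-arm-enable-ehabi*` are LLVM's `-arm-enable-ehabi` and `-arm-enable-ehabi-descriptors` options of that era; `llvm_args` and its boolean parameters are hypothetical.

```rust
/// Assemble extra LLVM command-line options for a target (hypothetical helper).
pub fn llvm_args(arch_is_arm: bool, no_landing_pads: bool) -> Vec<&'static str> {
    let mut args = Vec::new();
    // EHABI unwind annotations (.fnstart/.fnend) only help if something can
    // unwind through them, so leave them out when landing pads are disabled.
    if arch_is_arm && !no_landing_pads {
        args.push("-arm-enable-ehabi");
        args.push("-arm-enable-ehabi-descriptors");
    }
    args
}
```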

Add option for hard FP

The existing implementation is fine for the time being. It will trigger the default FP behavior, which in this specific case is the hard-float ABI. In the future it would be nice to be able to control it, especially when linking with existing C libraries.

To sum up, after adding the new target and none OS the compiler generated code will work as is, with the exception of the segmented stack prologue which needs adapting for embedded targets.

alexcrichton · 2013-12-16T05:52:39Z

This will require customizing the emitted function prologue since TLS is not available

Not necessarily. This code does not access TLS in the normal sense. On x86, this check happens through the segment selector registers. I would encourage you to inspect the output for the architecture you're compiling to and see if you can't arrange for the world to be in such a state that stack checks will be enabled and will work as usual. I am personally unfamiliar with how stack checks work on arm, so I do not know what the actual codegen looks like.

I suggest skipping the -arm-enable-ehabi* flags when no-landing-pads is set?

Sounds reasonable to me!

after adding the new target and none OS the compiler generated code will work as is

I am still wary of adding the concept of a "none" OS. Right now the segmented stack prologue will change depending on whatever platform (arch/os) you're compiling for, and we wouldn't know what to generate in the "none" case. I still believe that the segmented stack prologue should not be disabled at this time, and the selection of OS in your case will dictate the flavor of the segmented stack prologue that is generated.

emberian · 2013-12-25T00:26:39Z

@neykov I've written about the stack safety issue in http://cmr.github.io/blog/2013/10/21/on-stack-safety/

I think the best path forward for cases where dynamic checks (even memory mapping) are not acceptable is a whole-program stack size analysis.
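The core of such an analysis is a longest-path computation over the call graph: each function's worst case is its own frame plus the deepest of its callees, and any recursion makes the bound unknowable statically. A minimal sketch, assuming per-function frame sizes and call edges are already available (`CallGraph` and its fields are hypothetical); indirect calls through function pointers are ignored here, which a real analysis could not do.

```rust
use std::collections::HashMap;

/// Per-function frame sizes (bytes) and direct call edges; both are inputs assumed by this sketch.
pub struct CallGraph {
    pub frame_size: HashMap<&'static str, usize>,
    pub calls: HashMap<&'static str, Vec<&'static str>>,
}

impl CallGraph {
    /// Worst-case stack usage starting at `f`, or `None` if recursion makes it unbounded.
    pub fn max_stack(&self, f: &'static str) -> Option<usize> {
        let mut memo = HashMap::new();
        let mut on_path = Vec::new();
        self.visit(f, &mut memo, &mut on_path)
    }

    fn visit(
        &self,
        f: &'static str,
        memo: &mut HashMap<&'static str, usize>,
        on_path: &mut Vec<&'static str>,
    ) -> Option<usize> {
        if let Some(&depth) = memo.get(f) {
            return Some(depth); // already computed
        }
        if on_path.contains(&f) {
            return None; // cycle: recursion depth is not statically bounded
        }
        on_path.push(f);
        let mut deepest_callee = 0;
        for &callee in self.calls.get(f).map(|v| v.as_slice()).unwrap_or(&[]) {
            deepest_callee = deepest_callee.max(self.visit(callee, memo, on_path)?);
        }
        on_path.pop();
        let total = self.frame_size.get(f).copied().unwrap_or(0) + deepest_callee;
        memo.insert(f, total);
        Some(total)
    }
}
```

With frames main = 32, a = 64, b = 16, c = 128 and calls main → {a, b}, b → c, the worst case from `main` is 32 + 16 + 128 = 176 bytes.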

LLVM really needs to be taught to have a custom stack check prelude, too, for cases where that isn't acceptable and there still isn't an MMU or MMU-lite that provides memory protection.

I'm in favor of the sentiment but agree with @alexcrichton that the easiest path forward is not the best.

whitequark · 2013-12-29T12:35:56Z

Side note: there is no need to duplicate the ARM ABI as abi::Thumb, since Thumb is just a compact instruction encoding, not a distinct ABI. The only thing you might need to do is pass the -mthumb flag instead of -marm to the linker. I say "might" because, as far as I know, that flag is completely unnecessary: it would affect compiled code, but Rust only invokes cc as the linker.

alexcrichton · 2014-01-09T08:24:18Z

Closing due to a lack of activity, but if you have a rebased version with my last comment addressed, feel free to reopen!

neykov · 2014-01-09T21:48:22Z

I haven't given up on the patch, just the time available for working on it is a bit short lately. I will re-open once I have an updated version.

From what I've done so far, I can confidently say that it is not possible to use the existing code for loading the stack limit, due to the missing hardware. Neither approach used in the existing code is applicable: one depends on an MMU and the other on coprocessor 15 (system control) being present. The change needed is not big, but the LLVM code will nevertheless need modification (one case for Thumb and another for Thumb-2). The approach I am taking is to use the stack limit that is already laid out in the linker file. Since there is only one stack (no parallel processes), the limit is known at compile time. Interrupts also grow the same stack. There is a more complex case where interrupts may have a separate stack space (e.g. when using an OS), but this is out of scope for the current implementation.
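With a single stack whose bounds come from the linker script, the limit is a fixed address and the prologue needs neither TLS nor coprocessor access. The sketch below stands in for the linker-script symbols with Rust constants; `RAM_START`, `RAM_SIZE`, `STACK_SIZE` and their values are hypothetical, and on real hardware they would come from symbols defined in the linker file rather than from source.

```rust
// Hypothetical memory map standing in for symbols defined in the linker script.
const RAM_START: usize = 0x2000_0000;
const RAM_SIZE: usize = 128 * 1024;
const STACK_SIZE: usize = 8 * 1024;

// The stack occupies the top of RAM and grows downwards.
pub const STACK_TOP: usize = RAM_START + RAM_SIZE;
pub const STACK_LIMIT: usize = STACK_TOP - STACK_SIZE;

// Reject an impossible layout at compile time.
const _: () = assert!(STACK_SIZE <= RAM_SIZE);

/// Prologue check with the limit folded in as a constant. Interrupt handlers
/// run on the same stack, so they are subject to the same check.
pub fn frame_fits(sp: usize, frame_size: usize) -> bool {
    sp.checked_sub(frame_size).map_or(false, |new_sp| new_sp >= STACK_LIMIT)
}
```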

Regarding the "none" OS: if we follow the LLVM convention, there is no such OS; instead "unknown" is used as a catch-all value. My opinion is that for this and similar cases there should be a different option than the existing Linux/Windows/Android OSes. Perhaps the best way is to use the "unknown" OS and have a mechanism for requesting specific behavior from the compiler depending on the current instance of "unknown" (in my case, bare metal): for example, requesting an alternative stack-guard prologue, no unwinding support, etc. through various options. This will keep the supported triples in sync with LLVM.
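The "unknown OS plus explicit options" idea could look roughly like this: the compiler starts from conservative defaults and bare-metal behavior is switched on feature by feature. A minimal sketch; the option strings (`stack-guard=linker-symbol`, `unwinding=no`) and all type and function names are hypothetical, not real rustc flags.

```rust
/// Where the stack-overflow check gets its limit (hypothetical options).
#[derive(Debug, PartialEq, Clone, Copy)]
pub enum StackGuard {
    /// Limit read from thread-local storage, as on hosted targets.
    Tls,
    /// Fixed limit taken from a linker symbol (bare metal, single stack).
    LinkerSymbol,
}

#[derive(Debug, PartialEq)]
pub struct CodegenOptions {
    pub stack_guard: StackGuard,
    pub unwinding: bool,
}

/// Start from defaults for an "unknown" OS and apply `key=value` overrides.
pub fn options_for_unknown_os(overrides: &[&str]) -> Result<CodegenOptions, String> {
    let mut opts = CodegenOptions { stack_guard: StackGuard::Tls, unwinding: true };
    for o in overrides {
        match *o {
            "stack-guard=linker-symbol" => opts.stack_guard = StackGuard::LinkerSymbol,
            "unwinding=no" => opts.unwinding = false,
            other => return Err(format!("unrecognised option `{}`", other)),
        }
    }
    Ok(opts)
}
```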

@cmr I see one additional benefit of analyzing the stack requirements at compile time in the case of restricted memory - only the needed memory will be allocated with the rest available to dynamic allocation.

@whitequark "there is no need to duplicate the ARM ABI as abi::Thumb": from the point of view of the Rust compiler there is no difference between the two (now). From LLVM's standpoint they are different architectures. To trigger Thumb mode in LLVM, one must pass it a Thumb architecture, either through the triple or by overriding it with the -march flag (which changes the triple internally). I think it is better to have abi::Thumb than to keep only abi::Arm and add an option to pass -march=thumb to LLVM (which is different from the -mthumb option). It really depends on whether keeping the triple format in sync between Rust and LLVM is a priority, or whether Rust's triple behavior may deviate from LLVM's.
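Following whitequark's point, the compiler could map every `thumb*` architecture onto the existing ARM ABI family for calling-convention purposes, while passing the user's triple to LLVM unchanged so that LLVM itself selects Thumb code generation. A minimal sketch of that mapping; `AbiFamily` and `abi_family` are hypothetical names.

```rust
#[derive(Debug, PartialEq, Clone, Copy)]
pub enum AbiFamily {
    Arm,
    X86,
    X86_64,
}

/// Map the architecture component of a triple to the ABI family used for
/// calling-convention decisions. Thumb is an instruction encoding of ARM,
/// so "thumb*" shares the ARM ABI; the triple itself still goes to LLVM as-is.
pub fn abi_family(arch: &str) -> Option<AbiFamily> {
    if arch.starts_with("arm") || arch.starts_with("thumb") {
        Some(AbiFamily::Arm)
    } else if arch == "x86_64" {
        Some(AbiFamily::X86_64)
    } else if arch.starts_with('i') && arch.ends_with("86") {
        Some(AbiFamily::X86) // i386, i686, ...
    } else {
        None
    }
}
```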

whitequark · 2014-01-09T21:52:31Z

@neykov the following one-line patch does the job, because passing the triple is sufficient. Try it yourself.

Also, I doubt that your choice of verifying the stack limit against symbols is correct. I for one can easily see a use case for multiple stacks on bare-metal hardware: an RTOS.

neykov · 2014-01-09T22:36:30Z

@whitequark Thanks, I will use this approach instead of introducing abi::Thumb.

Also, I doubt that your choice of verifying the stack limit against symbols is correct. I for one can easily see a use case for multiple stacks on bare-metal hardware: an RTOS.

Yes, I have listed this as an option, but it is really not clear what to do in such a case. How the stack is organized is OS-dependent. And it is not really bare metal then, though still embedded. I don't believe it is possible to handle both the single-stack case and the multiple-stacks-plus-OS case in a single common prologue. My intent is to make the bare-metal case available. Afterwards, alternative prologues can be implemented as needed for concrete OS requirements.

whitequark · 2014-01-09T22:41:40Z

@neykov I think it is possible; you just need a single level of indirection. Logically it works like this: a symbol located in RW memory points to a memory block. (This is analogous to TLS, except that instead of some OS magic, the address is simply switched explicitly by the RTOS.) The prologue loads the first word from the block and assumes it's the stack limit.

Basically you would need to point the prologue to the "current task" structure, and make sure the first word is the stack limit. Not very flexible, but generic enough and allows easy retrofitting of even existing RTOSes (a global alias + some struct member rearranging).

You could even make it fall back to the simple case you have mentioned by using weak symbols in a creative way.
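The indirection described above can be simulated in ordinary Rust: a global pointer to the "current task" structure whose first word is the stack limit, switched by the scheduler, with a fixed single-stack limit used when no task is registered. The null-pointer fallback here is only a stand-in for the weak-symbol trick, and `TaskControlBlock`, `CURRENT_TASK`, `FALLBACK_STACK_LIMIT` and the address are hypothetical.

```rust
use std::ptr;
use std::sync::atomic::{AtomicPtr, Ordering};

/// Layout contract with the RTOS: the stack limit must be the first word of
/// the task structure (hypothetical type; a real RTOS keeps more fields).
#[repr(C)]
pub struct TaskControlBlock {
    pub stack_limit: usize,
    pub name: &'static str,
}

/// Stand-in for the symbol the prologue reads; the RTOS stores the running
/// task here on every context switch. Null means "no RTOS present".
pub static CURRENT_TASK: AtomicPtr<TaskControlBlock> = AtomicPtr::new(ptr::null_mut());

/// Single-stack limit used when no task is registered (weak-symbol stand-in).
pub const FALLBACK_STACK_LIMIT: usize = 0x2001_E000;

/// What the prologue would do: one extra load through the pointer.
pub fn current_stack_limit() -> usize {
    let task = CURRENT_TASK.load(Ordering::Relaxed);
    if task.is_null() {
        FALLBACK_STACK_LIMIT
    } else {
        // The first field of a #[repr(C)] struct sits at offset 0.
        unsafe { *(task as *const usize) }
    }
}
```

An existing RTOS would only need its task structure to start with the stack limit; as noted later in the thread, FreeRTOS keeps the top of stack in that slot instead, so it would need rearranging.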

My main point is that modifying LLVM, rebuilding it and rustc for multiple platforms, and updating them is a gigantic pain; we shouldn't require users to do that for everything that may require multiple threads, especially in a language which begets threads by its very structure.

neykov · 2014-01-10T16:34:00Z

@whitequark I don't mind implementing it the way you describe, but I am not convinced such an abstraction would serve even a single RTOS without any modifications. Even the port to Android required changes in the prologue generator, and it is all Linux. If even a single change is required, the whole effort becomes worthless. This is why I am wary of implementing more advanced concepts without evidence that it will be worth it.

modifying LLVM and rebuilding it and rustc for multiple platforms and updating them is a gigantic pain

I agree, and I didn't mean that users themselves should rebuild Rust for their target RTOS. What I meant is that the prologue logic will evolve, but only when a need with concrete requirements arises.

Anyway, I hope that I will get more feedback once I have a concrete patch; the difference between the two variants really isn't that much in terms of code, so why not have both for review.

Your input is much appreciated.

whitequark · 2014-01-10T16:51:33Z

@neykov I didn't expect this, but FreeRTOS actually already has exactly this arrangement!

typedef struct tskTaskControlBlock
{
    volatile portSTACK_TYPE *pxTopOfStack;      /*< Points to the location of the last item placed on the tasks stack.  THIS MUST BE THE FIRST MEMBER OF THE TCB STRUCT. */

So the only thing one would need to make Rust cooperate with FreeRTOS is to alias pxCurrentTCB to whatever symbol the prologue uses.

I would also expect other RTOSes to have compatible task structure layouts, if only to remain compatible with this technique.

(Also, I'm currently writing an RTOS in Rust, intended to closely resemble libstd's API. This is why I originally started to think about handling stack overflow. I'll publish it in a few days.)

whitequark · 2014-01-10T16:57:38Z

@neykov Actually, I misread the FreeRTOS code above, it stores the top of stack there, not stack limit, so it's not compatible.

I would still argue that the required modifications to existing code are trivial.


neykov added 4 commits December 12, 2013 23:06

Add support for bare-metal - None OS as a possible target.

5d241c2

Add support for Thumb architecture as a possible target.

33cd827

Disable inappropriate functionality for bare-metal systems - PIC, stack unwinding, segmented stacks.

4c22e3a

Add command line option to enable hard FP ABI.

f848d62

alexcrichton closed this Jan 9, 2014

This was referenced Feb 2, 2014

Disable ARM EHABI when building without landing pads #11992

Closed

Add no-pic debug flag to disable position independent code #11995

Closed
