Compiler support for embedded targets running the Thumb instruction set. #10942
I thought they could? Or do you just mean "they can't be cross-compiled to the platforms in this patch"? |
I'm nervous about landing something like this. There are a lot of special cases for this. I don't really understand how well the compiler can target an "unknown os" because there are many things which don't make sense if you don't have an OS (e.g. dynamic linking). There are some weird defines like […] This use case seems so specialized that it would be more useful to compile to LLVM bitcode and then use […] I'd need to think about this some more and talk it over with others before merging. |
Yes, I meant the "none" OS in particular. There is no POSIX API and dynamic memory allocation is optional, so large parts of the libraries don't even make sense in this context. |
As a first cut, I wanted to get the patch working and put it up for comments before making any major changes. I followed the existing approach for adding a new operating system, which of course isn't entirely applicable to the no-OS case but works fine for normal usage.
Rust seems like a very good fit for such environments, from embedded systems to operating systems, kernel modules, and GPUs. I don't agree that this use case is specialized. It might currently be the case, but everywhere C is used, Rust has the qualities to replace it. Why limit it because of the status quo?
You are in a better position than I am to assess which parts of the compiler would need special treatment, but from my experience so far it is not that different. Some of the cases could be served by adding new options, which would be useful anyway. So far, the custom changes needed in my specific case are:
In the embedded case, dynamic memory allocation should be optional. When static stack memory is used, we know the end of the stack and can detect when an overflow will occur, which is still a nice guarantee. If a memory allocator is present, segmented stacks may have benefits on embedded targets as well, reducing memory consumption through smaller stack requirements. |
I certainly hope so! I would very much love to see Rust in as many places as C, but we need to be careful. I don't think the best way to start compiling for embedded applications is to just do the first thing that works and then power through. I believe these changes all need to be thought out and considered before committing them. Some specific concerns I have are:
Sure, it makes sense to compile code without this, but it doesn't make sense to compile a dynamic library without this. I don't see any verification that the output is not a dynamic library. There may also be many other complications from mixing PIC and non-PIC code that I'm not aware of, and I'd want to explore what happens with different flavors of compilation.
We have discussed this before, and we previously reached the conclusion that this is not the answer. We may be wrong, but our reasoning is that being in an embedded environment doesn't mean you should no longer care about overflowing your stack. Stack overflow is still a very real problem that should be addressed, and this is why we haven't added an option to disable segmented stack generation in the compiler today. Remember that we don't actually have segmented stacks in the segmented sense: all Rust tasks have one monolithic stack that they run on. We use the […]
I agree that this is a useful compiler option, but this is much more nuanced than "just compiling some code with no landing pads". If this is added to the compiler, then all of a sudden we're going to start seeing some libraries compiled with landing pads and others compiled without them, and these libraries cannot be safely linked together. Right now we have a […]
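To make the linking hazard concrete, here is a minimal sketch of the kind of check a compiler could run before linking. It assumes each crate's metadata records a hypothetical `landing_pads` flag; `CrateInfo` and `check_landing_pads` are illustrative names, not rustc APIs.

```rust
/// Hypothetical per-crate metadata recorded at compile time.
#[derive(Debug, Clone, Copy)]
pub struct CrateInfo {
    pub name: &'static str,
    pub landing_pads: bool,
}

/// Refuses to link crates that were compiled with different landing-pad settings.
pub fn check_landing_pads(crates: &[CrateInfo]) -> Result<(), String> {
    let first = match crates.first() {
        Some(c) => c,
        None => return Ok(()), // nothing to link
    };
    for c in &crates[1..] {
        if c.landing_pads != first.landing_pads {
            return Err(format!(
                "cannot link `{}` (landing pads: {}) with `{}` (landing pads: {})",
                c.name, c.landing_pads, first.name, first.landing_pads
            ));
        }
    }
    Ok(())
}
```

Comparing every crate against the first one is enough here, because the property being checked is simple equality across the whole link.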
Yay! I very much want to be able to drop our dependency on libstdc++ (see #10469). As with landing pads, this is a tricky situation though. Your use case may not want unwinding, but there are still many many use cases for unwinding. We need to carefully think about a model for disabling unwinding instead of "just not linking to libstdc++".
This sounds like it should be a compiler flag because all it does is affect codegen a little bit. I'm not too worried about this.
If that is true, then there's no reason for the morestack prologues to be omitted. The relevant mechanisms that morestack uses should be configured to know about these bounds of the stack (so stack overflow can be detected). |
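A host-side model of the check described above, assuming a single stack that grows downward and a limit address known ahead of time (for example, from the linker script). `reserve_frame` is a hypothetical name and the addresses in the test are made up; a real prologue would do this comparison in a few instructions and branch to an overflow handler instead of returning an error.

```rust
/// Returned when a frame would cross the stack limit.
#[derive(Debug, PartialEq)]
pub struct StackOverflow {
    pub requested: usize,
    pub available: usize,
}

/// Models the prologue check on a single, statically allocated stack that
/// grows downward. `limit` is the lowest usable address, known ahead of time.
/// On success, returns the stack pointer after the frame is reserved.
pub fn reserve_frame(sp: usize, limit: usize, frame_size: usize) -> Result<usize, StackOverflow> {
    // Bytes left between the current stack pointer and the limit
    // (zero if the stack pointer is already below the limit).
    let available = sp.saturating_sub(limit);
    if frame_size <= available {
        Ok(sp - frame_size)
    } else {
        Err(StackOverflow { requested: frame_size, available })
    }
}
```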
After the above discussion, I went back and checked what the hard requirements really are for code to run on the embedded system.
Not required, though nice to have. Can we discuss it in another issue or pull request? Shared libraries must be PIC; static libraries and executables can be either.
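The rule above is small enough to enforce up front, which would also cover the missing verification that a non-PIC build is not producing a dynamic library. A minimal sketch; `CrateType`, `RelocModel`, and `check_reloc_model` are hypothetical names, not the compiler's actual types.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum CrateType {
    Executable,
    StaticLib,
    DynamicLib,
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum RelocModel {
    /// Position-independent code.
    Pic,
    /// Absolute addresses fixed at link time.
    Static,
}

/// Shared libraries must be PIC; static libraries and executables may be either.
pub fn check_reloc_model(crate_type: CrateType, reloc: RelocModel) -> Result<(), String> {
    match (crate_type, reloc) {
        (CrateType::DynamicLib, RelocModel::Static) => Err(String::from(
            "a dynamic library must be compiled as position-independent code",
        )),
        _ => Ok(()),
    }
}
```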
I will add support for this feature in the patch. I hadn't realized that the project had already moved to a static stack with only an overflow check. This will require customizing the emitted function prologue, since TLS is not available.
I mistakenly concluded that the C++ library is pulled into my build if this is enabled. This is not the case; instead, libgcc provides the needed symbols at the expense of requiring dynamic memory allocation, but this can be overcome. So the executable can be compiled with exception handling, but it will never be used because the runtime is not referenced at all.
The existing implementation is fine for the time being. It will trigger the default floating-point behavior, which in this specific case is the hard-float ABI. In the future it would be nice to be able to control it, mostly when linking with existing C libraries. To sum up: after adding the new target and the "none" OS, the compiler-generated code works as is, with the exception of the segmented stack prologue, which needs adapting for embedded targets. |
Not necessarily. This code does not access TLS in the normal sense. On x86, this check happens through the segment selector registers. I would encourage you to inspect the output for the architecture you're compiling to and see if you can't arrange for the world to be in such a state that stack checks will be enabled and will work as usual. I am personally unfamiliar with how stack checks work on arm, so I do not know what the actual codegen looks like.
Sounds reasonable to me!
I am still wary of adding the concept of a "none" OS. Right now the segmented stack prologue will change depending on whatever platform (arch/os) you're compiling for, and we wouldn't know what to generate in the "none" case. I still believe that the segmented stack prologue should not be disabled at this time, and the selection of OS in your case will dictate the flavor of the segmented stack prologue that is generated. |
@neykov I've written about the stack safety issue in http://cmr.github.io/blog/2013/10/21/on-stack-safety/ I think the best path forward for cases where dynamic checks (even memory mapping) are not acceptable is a whole-program stack size analysis. LLVM really needs to be taught to have a custom stack check prelude, too, for cases where that isn't acceptable and there still isn't an MMU or MMU-lite that provides memory protection. I'm in favor of the sentiment but agree with @alexcrichton that the easiest path forward is not the best. |
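A minimal sketch of the whole-program stack size analysis suggested above: given each function's frame size and its direct callees, the worst-case stack usage from an entry point is its own frame plus the deepest callee chain, and any recursion makes the bound unknown. The `Function` type, `max_stack`, and the frame sizes in the test are hypothetical; in practice the sizes would have to come from the compiler's code generator.

```rust
use std::collections::HashMap;

/// Hypothetical per-function data: frame size in bytes and direct callees.
pub struct Function {
    pub frame_size: usize,
    pub callees: Vec<&'static str>,
}

/// Worst-case stack usage starting at `entry`, or `None` if the reachable
/// call graph contains recursion (unbounded) or calls an unknown function.
pub fn max_stack(graph: &HashMap<&'static str, Function>, entry: &'static str) -> Option<usize> {
    fn visit(
        graph: &HashMap<&'static str, Function>,
        name: &'static str,
        on_path: &mut Vec<&'static str>,
        memo: &mut HashMap<&'static str, usize>,
    ) -> Option<usize> {
        if let Some(&cached) = memo.get(name) {
            return Some(cached);
        }
        if on_path.contains(&name) {
            return None; // recursion: no static bound
        }
        let f = graph.get(name)?; // unknown callee: give up
        on_path.push(name);
        let mut deepest = 0;
        for &callee in &f.callees {
            // On failure the whole analysis aborts, so `on_path` is not restored.
            deepest = deepest.max(visit(graph, callee, on_path, memo)?);
        }
        on_path.pop();
        let total = f.frame_size + deepest;
        memo.insert(name, total);
        Some(total)
    }
    visit(graph, entry, &mut Vec::new(), &mut HashMap::new())
}
```

Indirect calls (function pointers, trait objects) are not modeled, which is one reason such an analysis is hard to make complete.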
Side note: there is no need to duplicate the ARM ABI as abi::Thumb […] |
Closing due to a lack of activity, but if you have a rebased version with my last comment addressed, feel free to reopen! |
I haven't given up on the patch; the time available for working on it has just been a bit short lately. I will reopen once I have an updated version.

From what I've done so far, I can confidently say that it is not possible to use the existing code for loading the stack limit, due to the missing hardware. Neither approach used in the existing code is applicable: one depends on an MMU and the other on coprocessor 15 (system control) being present. The change needed is not big, but the LLVM code will nevertheless need modification (one case for Thumb and another for Thumb-2). The approach I am taking is to use the stack limit that is already laid out in the linker file. Since there is only one stack (no parallel processes), the limit is known at compile time. Any interrupts use the same stack. There is a more complex case where interrupts may have a separate stack space (e.g. when using an OS), but this is out of scope for the current implementation.

Regarding the "none" OS: if we follow the LLVM convention, there is no such OS; instead, "unknown" is used as a catch-all value. My opinion is that for this and similar cases there should be an option distinct from the existing Linux/Windows/Android OSes. Perhaps the best way is to use the "unknown" OS and have a mechanism for requesting specific behavior from the compiler depending on what "unknown" means in a given case (bare metal, in mine), for example an alternative stack guard function prologue or no unwinding support, each requested through its own option. This will keep the supported triples in sync with LLVM.

@cmr I see one additional benefit of analyzing the stack requirements at compile time when memory is restricted: only the needed memory will be allocated, with the rest available for dynamic allocation.

@whitequark "there is no need to duplicate the ARM ABI as abi::Thumb": from the point of view of the Rust compiler there is no difference between the two (now). From LLVM's standpoint, they are different architectures. To trigger Thumb mode in LLVM, one must pass it a Thumb architecture, either through the triple or by overriding it with the -march flag (which changes the triple internally). I think it is better to have abi::Thumb than to keep only abi::Arm and add an option to pass -march=thumb to LLVM (which is different from the -mthumb option). It really depends on whether keeping Rust's triple format in sync with LLVM's is a priority, or whether Rust's triple behavior may deviate from LLVM's. |
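The "unknown OS plus explicit options" idea above could look roughly like this. Everything here is hypothetical: the option strings, `StackLimitSource`, `TargetOptions`, and `target_options` only illustrate the shape of the mechanism. It also reflects the earlier concern that, for an "unknown"/"none" OS, the compiler would not know which prologue to generate unless told.

```rust
/// Where the function prologue gets the stack limit from (hypothetical).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum StackLimitSource {
    /// The existing per-OS prologue (TLS, segment registers, CP15, ...).
    OsSpecific,
    /// A symbol laid out in the linker script (bare metal, single stack).
    LinkerSymbol,
}

#[derive(Debug, Clone, PartialEq)]
pub struct TargetOptions {
    pub unwinding: bool,
    pub stack_limit: StackLimitSource,
}

/// Starts from per-OS defaults and applies explicitly requested overrides.
pub fn target_options(os: &str, overrides: &[&str]) -> Result<TargetOptions, String> {
    let mut opts = TargetOptions {
        unwinding: true,
        stack_limit: StackLimitSource::OsSpecific,
    };
    for opt in overrides {
        match *opt {
            "no-unwinding" => opts.unwinding = false,
            "stack-limit=linker-symbol" => opts.stack_limit = StackLimitSource::LinkerSymbol,
            other => return Err(format!("unknown target option `{}`", other)),
        }
    }
    // An "unknown" OS has no per-OS prologue to fall back on, so the user
    // must say where the stack limit lives instead of the compiler guessing.
    if os == "unknown" && opts.stack_limit == StackLimitSource::OsSpecific {
        return Err(String::from("OS `unknown` requires an explicit stack-limit source"));
    }
    Ok(opts)
}
```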
@whitequark Thanks, I will use this approach instead of introducing abi::Thumb.
Yes, I listed this as an option, but it is really not clear what to do in such a case. How the stack is organized is OS-dependent. And then it isn't really bare metal anymore, though still embedded. I don't believe it is possible to handle both the single-stack case and the multiple-stacks-plus-OS case in one common prologue. My intent is to make the bare metal case available. Afterwards, alternative prologues can be implemented as needed for the requirements of a concrete OS. |
@neykov I think it is possible; you just need a single level of indirection. Logically it works like this: a symbol located in RW memory points to a memory block. (This is analogous to TLS, except that instead of some OS magic, the RTOS simply switches the address explicitly.) The prologue loads the first word from the block and assumes it is the stack limit. Basically, you would need to point the prologue to the "current task" structure and make sure its first word is the stack limit. Not very flexible, but generic enough, and it allows easy retrofitting of even existing RTOSes (a global alias plus some struct member rearranging). You could even make it fall back to the simple case you mentioned by using weak symbols in a creative way. My main point is that modifying LLVM, rebuilding it and rustc for multiple platforms, and updating them is a gigantic pain; we shouldn't require users to do that for everything that may require multiple threads, especially in a language which begets threads by its very structure. |
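A host-side model of the single level of indirection described above, under stated assumptions: `TaskControlBlock`, `CURRENT_TASK`, `switch_to`, `current_stack_limit`, and `LINKER_STACK_LIMIT` are hypothetical names, and a null pointer stands in for the weak-symbol fallback to the single-stack case. On a real target the RTOS would update the pointer during a context switch and the prologue would perform the load in assembly.

```rust
use std::sync::atomic::{AtomicPtr, Ordering};

/// Hypothetical RTOS task control block. Only the first word matters to the
/// prologue; `#[repr(C)]` guarantees `stack_limit` sits at offset 0.
#[repr(C)]
pub struct TaskControlBlock {
    pub stack_limit: usize, // lowest usable address of this task's stack
    pub priority: u8,       // stands in for the RTOS's other fields
}

/// Hypothetical limit from the linker script (single-stack, bare-metal case).
pub const LINKER_STACK_LIMIT: usize = 0x2000_0400;

/// The one global symbol the prologue knows about. Null means "no RTOS".
pub static CURRENT_TASK: AtomicPtr<TaskControlBlock> = AtomicPtr::new(std::ptr::null_mut());

/// What a context switch would do: repoint the global at the new task.
pub fn switch_to(task: &'static mut TaskControlBlock) {
    CURRENT_TASK.store(task as *mut TaskControlBlock, Ordering::Release);
}

/// What the prologue would do: follow one pointer and read the first word.
pub fn current_stack_limit() -> usize {
    let block = CURRENT_TASK.load(Ordering::Acquire);
    if block.is_null() {
        LINKER_STACK_LIMIT
    } else {
        // SAFETY: non-null values only come from `switch_to`, which stores a
        // `'static` TCB whose first word is `stack_limit`.
        unsafe { *(block as *const usize) }
    }
}

/// The overflow check itself, on a stack that grows downward.
pub fn prologue_check(sp: usize, frame_size: usize) -> bool {
    sp.checked_sub(frame_size)
        .map_or(false, |new_sp| new_sp >= current_stack_limit())
}
```

The prologue never needs to know the rest of the RTOS's structure layout, which is what makes retrofitting an existing RTOS mostly a matter of field ordering.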
@whitequark I don't mind implementing it the way you describe, but I am not convinced such an abstraction would serve even a single RTOS without modifications. Even the port to Android required changes in the prologue generator, and that is all Linux. If even a single change is required, it makes the whole effort worthless. This is why I am wary of implementing more advanced concepts without evidence that they will be worth it.
I agree; I didn't mean that users themselves should rebuild Rust for their target RTOS. What I meant is that the prologue logic will evolve, but only when a need with concrete requirements arises. Anyway, I hope to get more feedback once I have a concrete patch. The difference between the two variants really isn't much in terms of code, so why not have both for review? Your input is much appreciated. |
@neykov I didn't expect this, but FreeRTOS actually already has exactly this arrangement!
So the only thing one would need to make Rust cooperate with FreeRTOS is to alias […] I would also expect other RTOSes to have compatible task structure layouts, if only to remain compatible with this technique. (Also, I'm currently writing an RTOS in Rust, intended to closely resemble libstd's API. This is why I originally started to think about handling stack overflow. I'll publish it in a few days.) |
@neykov Actually, I misread the FreeRTOS code above, it stores the top of stack there, not stack limit, so it's not compatible. I would still argue that the required modifications to existing code are trivial. |
Changes to allow generating executables for embedded boards:
The changes are for the compiler only. Std, Extra, etc. libraries can't be cross-compiled at this point therefore no changes to the build system are necessary to support the additional target.
Sample command to compile an executable for the target using a patched compiler:
Tested with STM32F4DISCOVERY board.