# revisit type level integers #41
Hi All,

I can see a couple of alternatives:

1. Use a `Threshold` type (as you suggest) and make a (proc) macro that generates the "ugly" type bounds. I did not try implementing this, but we have at least two choices:
   - the programmer gives info about the claim structure, or
   - the macro parses the whole function to figure out the claims in order to generate the type bounds. It does not need to change the fn code, just replicate it, but we would still want debug symbols for the replicated code (so that you may put breakpoints there and gdb has precise info about the source code lines). It would amount to parsing the AST of a general Rust fn, which might be too complicated, so perhaps too much effort at this point.
2. Help LLVM figure out that the `Threshold` can be traced. Not sure if that is possible, though. (What exactly causes LLVM to give up: some "recursion depth" for closure evaluation, or something else?)
3. Keep it as is and let LLVM do its best. Perhaps we can live with it; LLVM will likely become smarter over time, so at some point the problem may have vanished, I don't know.
4. Skip the `Threshold` token check, meaning that we always manipulate BASEPRI; it's just a few cycles in any case. Having it occasionally optimized away may improve performance in some cases but make it worse in others, which is a bit fickle. Skipping the check makes the cost VERY predictable.

If `Resource`s are used at a higher granularity, as in the Concurrent Reactive Components (CRC) abstraction, I believe 4 is the way to go at the moment. (Overhead will be dominated by messaging in any case, so the few cycles to operate BASEPRI are likely not a performance issue, and the overhead will be predictable in an intuitive way to the programmer.)

Best,
Per
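Alternative 4 (always manipulate BASEPRI, no token check) can be sketched on a host machine. This is an editorial illustration, not RTFM code: the register is modeled as a hypothetical thread-local byte and `claim` is a made-up free function.

```rust
use std::cell::Cell;

thread_local! {
    // Hypothetical stand-in for the BASEPRI register, so the sketch runs
    // on a host machine instead of a Cortex-M core.
    static BASEPRI: Cell<u8> = Cell::new(0);
}

/// Unconditional claim: always raise the threshold to (at least) the resource
/// ceiling, run the closure, then restore the previous value. There is no
/// `Threshold` token and no branch for LLVM to optimize away; the cost is a
/// fixed raise/restore pair, which makes the overhead very predictable.
fn claim<R>(ceiling: u8, f: impl FnOnce() -> R) -> R {
    let old = BASEPRI.with(|b| b.get());
    BASEPRI.with(|b| b.set(old.max(ceiling)));
    let r = f();
    BASEPRI.with(|b| b.set(old));
    r
}

fn main() {
    let seen = claim(3, || BASEPRI.with(|b| b.get()));
    assert_eq!(seen, 3); // threshold raised to the ceiling inside the claim
    assert_eq!(BASEPRI.with(|b| b.get()), 0); // and restored afterwards
    println!("ok");
}
```

Note the `old.max(ceiling)` guard: on real hardware an unconditional write could otherwise *lower* the threshold inside a nested claim.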
On 01 Aug 2017, at 04:35, Jorge Aparicio <notifications@github.com> wrote:
One of the ways in which RTFM v2 became simpler than v1 was the removal of type-level integers along with the various associated tokens. However, this simplification comes at a price: heavy reliance on LLVM for proper optimization.

Right now there's only one token that the user has to deal with: the preemption `Threshold` token. This token is actually a newtype over `u8` that tracks the preemption threshold of the system through the whole application. This token is used to unlock `Resource`s through the `borrow{,_mut}` and `claim{,_mut}` methods. All these methods have branches and assertions in them for memory safety. In a correctly written and properly optimized program the assertions in `borrow{,_mut}`, all but one branch in `claim{,_mut}` and all the `Threshold` tokens should be optimized away. However, this requires that LLVM know the exact value of the `Threshold` token in every single node of the function call graph. In complex enough call graphs LLVM "gives up" and generates code to track the value of `Threshold` at runtime; this destroys performance: panicking branches as well as all the branches in a `claim{,_mut}` call are kept in. In the worst-case scenario this can triple the size of the output `.text` section.
The only way I can think of to fix the problems outlined above is to turn `Threshold` into a type-level integer. This way the token is guaranteed not to exist at runtime; the token would become a zero-sized type. This change would turn the panicking branch in `borrow{,_mut}` into a compile error and make it very easy for LLVM to optimize away the branches in `claim{,_mut}`, because the "value" of `Threshold` would be "local" to a function rather than require the close-to-global program analysis that v2 requires.
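As an editorial aside: this issue predates const generics, but the zero-sized-token idea is easy to demonstrate with them on a host machine. The names here are hypothetical, not the proposed API.

```rust
use std::marker::PhantomData;

/// Hypothetical zero-sized preemption-threshold token. The threshold lives
/// in the *type*, so there is no runtime value for LLVM to track.
struct T<const THRESHOLD: u8> {
    _marker: PhantomData<()>,
}

impl<const THRESHOLD: u8> T<THRESHOLD> {
    // The "value" of the token is a compile-time constant, local to each use.
    const VALUE: u8 = THRESHOLD;
}

fn main() {
    // The token is a zero-sized type: it cannot exist at runtime ...
    assert_eq!(std::mem::size_of::<T<3>>(), 0);
    // ... yet its value is statically known wherever the type appears.
    assert_eq!(T::<3>::VALUE, 3);
    println!("ok");
}
```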
In principle we can move the API to type-level integers today (see appendix) by using the [typenum](https://crates.io/crates/typenum) crate, but it seems that the implementation is blocked by a rustc bug (cf. rust-lang/rust#43580).
## Downsides
The main downside of implementing this change is that generic code that uses the
Resource trait becomes much more complicated to write and to read.
This is generic code that deals with two resources, today:
```rust
fn foo<A, B>(t: &mut Threshold, a: A, b: B)
where
    A: Resource<Data = u8>,
    B: Resource<Data = u8>,
{
    a.claim(t, |a, t| {
        b.claim(t, |b, t| {
            // ..
        });
    });
}
```
This is how that generic code would look with the proposed `typenum`-based API:
```rust
fn foo<A, B, CA, CB, THRESHOLD>(t: &mut T<THRESHOLD>, a: A, b: B)
where
    A: Resource<Data = u8, Ceiling = CA>,
    B: Resource<Data = u8, Ceiling = CB>,
    CA: Unsigned,
    CB: Unsigned,
    THRESHOLD: Max<CA> + Unsigned,                 // needed for the outer claim
    Maximum<THRESHOLD, CA>: Max<CB> + Unsigned,    // needed for the inner claim
    Maximum<Maximum<THRESHOLD, CA>, CB>: Unsigned, // also for the inner claim
{
    a.claim(t, |a, t| {
        b.claim(t, |b, t| {
            // ..
        });
    });
}
```
Effectively every `claim` can add one bound to the `where` clause of the generic function. The bounds give no extra information to the reader; they are just there to please the compiler.
I think the only way to avoid this downside would be to build the API on top of proper type-level integers baked into the language (*). The problem is that they are not implemented and that the API requires a type-level `cmp::max(A, B)` operator (**) -- it's unknown when / if that operator will be implemented.

EDIT1: (*) because, in theory, in that case no bound should be required. `const N: u8` is known to be an integer and type-level operations on it (`where N1 > N2`) would be baked into the language.

EDIT1: (**) A type-level `if a > b` operator / clause is also required for `borrow`.
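Editorial note: `typenum`'s `Max`/`Maximum` is library code, not a language feature, which is exactly why it leaks bounds into every signature. A self-contained sketch of such a type-level `max` over unary (Peano) numbers, with simplified hypothetical names, shows the trick:

```rust
use std::marker::PhantomData;

// Type-level unsigned integers, Peano style: Z = 0, S<N> = N + 1.
struct Z;
struct S<N>(PhantomData<N>);

// Recover a runtime value, for demonstration only.
trait Unsigned {
    const USIZE: usize;
}
impl Unsigned for Z {
    const USIZE: usize = 0;
}
impl<N: Unsigned> Unsigned for S<N> {
    const USIZE: usize = N::USIZE + 1;
}

// A type-level `cmp::max` operator, analogous to typenum's `Max`/`Maximum`,
// defined by structural recursion on both operands.
trait Max<Rhs> {
    type Output;
}
impl Max<Z> for Z {
    type Output = Z;
}
impl<N> Max<S<N>> for Z {
    type Output = S<N>;
}
impl<N> Max<Z> for S<N> {
    type Output = S<N>;
}
impl<A: Max<B>, B> Max<S<B>> for S<A> {
    type Output = S<<A as Max<B>>::Output>;
}
type Maximum<A, B> = <A as Max<B>>::Output;

fn main() {
    type Two = S<S<Z>>;
    type One = S<Z>;
    // max(2, 1) and max(0, 2), computed entirely at the type level:
    assert_eq!(<Maximum<Two, One>>::USIZE, 2);
    assert_eq!(<Maximum<Z, Two>>::USIZE, 2);
    println!("ok");
}
```

Every generic use of `Max` forces a `where` bound, which is the source of the boilerplate shown above; a language-level operator would carry its own proof obligations.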
The other minor downside of the change is that the signature of task functions
would have to include the exact threshold level. Like this:
```rust
// priority = 3
fn exti0(t: &mut T3, r: EXTI0::Resources) { .. }

// priority = 4
fn exti1(t: &mut T4, r: EXTI1::Resources) { .. }
```
This doesn't seem too bad but changing the priority of a task will require you
to change the type of the threshold token as well.
## Appendix
The whole new `typenum`-based API:
```rust
#![no_std]

extern crate typenum;

use core::marker::PhantomData;

use typenum::{IsGreaterOrEqual, Max, Maximum, True, Unsigned};

/// A resource, a mechanism to safely share data between tasks
pub trait Resource {
    /// The data protected by the resource
    type Data: Send;

    /// The ceiling of the resource
    type Ceiling: Unsigned;

    /// Borrows the resource data for the span of the current context (which
    /// may be a critical section)
    ///
    /// The current preemption threshold must be greater than or equal to the
    /// resource ceiling for this to work; otherwise this call will cause the
    /// program to not compile.
    fn borrow<'cs, THRESHOLD>(
        &'cs self,
        t: &'cs T<THRESHOLD>,
    ) -> &'cs R<Self::Data, Self::Ceiling>
    where
        THRESHOLD: IsGreaterOrEqual<Self::Ceiling, Output = True> + Unsigned;

    // plus `borrow_mut`

    /// Grants access to the resource data for the span of the closure `f`
    ///
    /// The closure may be executed in a critical section, created by raising
    /// the preemption threshold, if required
    fn claim<RTY, THRESHOLD, F>(&self, t: &mut T<THRESHOLD>, f: F) -> RTY
    where
        F: FnOnce(
            &R<Self::Data, Self::Ceiling>,
            &mut T<Maximum<THRESHOLD, Self::Ceiling>>,
        ) -> RTY,
        THRESHOLD: Unsigned + Max<Self::Ceiling>,
        Maximum<THRESHOLD, Self::Ceiling>: Unsigned;

    // plus `claim_mut`
}

/// An unlocked resource
pub struct R<DATA, CEILING>
where
    CEILING: Unsigned,
    DATA: Send,
{
    data: DATA,
    _ceiling: PhantomData<CEILING>,
}

/// Preemption threshold token
pub struct T<THRESHOLD>
where
    THRESHOLD: Unsigned,
{
    _threshold: PhantomData<THRESHOLD>,
}
```
cc @cr1901 this would fix the misoptimization you have been seeing in your AT2XT program. Actually implementing this for MSP430 may be possible / easy because effectively there are only two priority levels in that architecture.
---
Hi japaric, perlindgren! Let me thank you for your awesome work, both of you! The v2 release looks so much better: it places less burden on the programmer, has even less overhead, etc.

This would make M0(+) support worse again. As the BASEPRI register is missing in these cores, it would translate to disabling every interrupt in every resource claim, if I understand correctly. Maybe these chips are not the most important target, but it would cripple the framework for them. japaric is already aware #1 , and I can always use v2 if things move forward in that way, but creating a proc macro to generate the boilerplate sounds better, at least to me.
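The M0(+) limitation described above can be illustrated on the host. ARMv6-M has no BASEPRI, only the all-or-nothing PRIMASK bit, so a claim cannot raise the threshold to just the resource ceiling; this hypothetical simulation (made-up names, host-only) shows the consequence:

```rust
use std::cell::Cell;

thread_local! {
    // Stand-in for PRIMASK: true means *all* interrupts are masked.
    static PRIMASK: Cell<bool> = Cell::new(false);
}

/// On a core without BASEPRI the only way to protect a resource is to mask
/// every interrupt for the span of the claim; the ceiling cannot be used to
/// mask selectively, so it is ignored here.
fn claim<R>(_ceiling: u8, f: impl FnOnce() -> R) -> R {
    let old = PRIMASK.with(|p| p.replace(true));
    let r = f();
    PRIMASK.with(|p| p.set(old));
    r
}

fn main() {
    // Even a low-ceiling claim blocks everything on this architecture.
    let masked = claim(1, || PRIMASK.with(|p| p.get()));
    assert!(masked);
    assert!(!PRIMASK.with(|p| p.get())); // unmasked again afterwards
    println!("ok");
}
```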
---
triage: v0.3.x used type-level integers but v0.4.x removed them again in favor of a task-local cell under the hood. This appears to optimize as well as the type-level integers did. If there are problems with optimizations please open an issue with steps to reproduce.