revisit type level integers #41

japaric · 2017-08-01T02:35:10Z

One of the ways how RTFMv2 became simpler than v1 was the removal of type level
integers along with the various associated tokens. However this simplification
comes at a price: heavy reliance on LLVM for proper optimization.

Right now, there's only one token that the user has to deal with: the preemption
Threshold token. This token is actually a newtype over u8 that tracks the
preemption threshold of the system through the whole application. This token is
used to unlock Resources through the borrow{,_mut} and claim{,_mut}
methods. All these methods have branches and assertions in them for memory
safety. In a correctly written and properly optimized program the assertions in
borrow{,_mut}, all but one branch in claim{,_mut} and all the Threshold
tokens should be optimized away. However this requires that LLVM knows the
exact value of the Threshold token in every single node of the function
call graph. In complex enough call graphs LLVM "gives up" and generates code to
track the value of Threshold at runtime; this destroy performance: panicking
branches as well as all the branches in a claim{,_mut} call are kept in. In
the worst case scenario this can triple the size of the output .text section.

The only way I can think of to fix the problems outline above is to turn
Threshold into a type level integer. This way the token is guaranteed to not
exist at runtime; the token would become a zero sized type. This change would
turn the panicking branch in borrow{,_mut} into a compile error and make it
very easy for LLVM to optimize away the branches in claim{,_mut} because the
"value" of Threshold would be "local" to a function rather than require
the close-to global program analysis that v2 requires.

In principle we can move the API to type level integers today (see appendix) by
using the typenum crate, but it seems that the implementation is blocked by
a rustc bug (cf. rust-lang/rust#43580).

Downsides

The main downside of implementing this change is that generic code that uses the
Resource trait becomes much more complicated to write and to read.

This is generic code that deals with two resources, today:

fn foo<A, B>(t: &mut Threshold, a: A, b: B)
where
    A: Resource<Data = u8>,
    B: Resource<Data = u8>,
{
    a.claim(t, |a, t| {
        b.claim(t, |b, t| {
            // ..
        });
    });
}

This is how that generic code would look like with the proposed typenum-based
API:

fn foo<A, B, CA, CB, THRESHOLD>(t: &mut T<THRESHOLD>, a: A, b: B)
where
    A: Resource<Data = u8, Ceiling = CA>,
    B: Resource<Data = u8, Ceiling = CB>,
    CA: Unsigned,
    CB: Unsigned,
    THRESHOLD: Max<CA> + Unsigned, // needed for the outer claim
    Maximum<THRESHOLD, CA>: Max<CB> + Unsigned, // needed for the inner claim
    Maximum<Maximum<THRESHOLD, CA>, CB>: Unsigned, // also for the inner claim
{
    a.claim(t, |a, t| {
        b.claim(t, |b, t| {
            // ..
        });
    });
}

Effectively every claim can add one bound to the where clause of the generic
function. The bounds give no extra information to the reader; they are just
there to please the compiler.

I think the only way to avoid this downside would be to build the API on top of
proper type level integers baked into the language (*). The problem is that they are
not implemented and that the API requires a type level cmp::max(A, B)
operator (**) -- it's unknown when / if that operator will be implemented.

EDIT1: (*) because, in theory, in that case no bound should be required. const N: u8 is known to be an integer and type level operations on it (where N1 > N2) would be baked into the language.

EDIT1: (**) A type level if a > b operator / clause is also required for borrow.

The other minor downside of the change is that the signature of task functions
would have to include the exact threshold level. Like this:

// priority = 3
fn exti0(t: &mut T3, r: EXTI0::Resources) { .. }

// priority = 4
fn exti1(t: &mut T4, r: EXTI0::Resources) { .. }

This doesn't seem too bad but changing the priority of a task will require you
to change the type of the threshold token as well.

Appendix

The whole new typenum-based API

#![no_std]

extern crate typenum;

use core::marker::PhantomData;

use typenum::{IsGreaterOrEqual, Max, Maximum, True, Unsigned};

/// A resource, a mechanism to safely share data between tasks
pub trait Resource {
    /// The data protected by the resource
    type Data: Send;

    /// The ceiling of the resource
    type Ceiling: Unsigned;

    /// Borrows the resource data for the span of the current context (which may
    /// be a critical section)
    ///
    /// The current preemption threshold must be greater than the resource
    /// ceiling for this to work; otherwise this call will cause the program to
    /// not compile.
    fn borrow<'cs, THRESHOLD>(
        &'cs self,
        t: &'cs T<THRESHOLD>,
    ) -> &'cs R<Self::Data, Self::Ceiling>
    where
        THRESHOLD: IsGreaterOrEqual<Self::Ceiling, Output = True> + Unsigned;

    // plus `borrow_mut`

    /// Grants access to the resource data for the span of the closure `f`
    ///
    /// The closure may be executed in a critical section, created by raising
    /// the preemption threshold, if required
    fn claim<RTY, THRESHOLD, F>(&self, t: &mut T<THRESHOLD>, f: F) -> RTY
    where
        F: FnOnce(
            &R<Self::Data, Self::Ceiling>,
            &mut Maximum<THRESHOLD, Self::Ceiling>,
        ) -> RTY,
        THRESHOLD: Unsigned + Max<Self::Ceiling>;

    // plus `claim_mut`
}

/// An unlocked resource
pub struct R<DATA, CEILING>
where
    CEILING: Unsigned,
    DATA: Send,
{
    data: DATA,
    _ceiling: PhantomData<CEILING>,
}

/// Preemption threshold token
pub struct T<THRESHOLD>
where
    THRESHOLD: Unsigned,
{
    _threshold: PhantomData<THRESHOLD>,
}

cc @cr1901 this would fix the misoptimization you have been seeing in your AT2XT program. Actually implementing this for MSP430 may be possible / easy because effectively there's only two priority levels in that architecture.

The text was updated successfully, but these errors were encountered:

japaric · 2017-08-01T02:45:25Z

Alternatives

Bring back `Threshold.raise`

We could remove Resource.claim{,_mut} and only have Resource.borrow{,_mut}.
Then users would have to manually raise the threshold to be able to access a
resource's data. Something like this:

fn exti0(t: T1, r: EXTI0::Resources) {
    t.raise::<T2>(|t| {
        let a = r.A.borrow(t);

        t.raise::<T3>(|t| {
            let b = r.B.borrow(t);
        });
    });
}

Not only this is much less ergonomic -- you have to specify what to raise the
current threshold to; this means you have to know the ceiling of each resource
-- but it also makes it possible to "overshoot" and raise the threshold more
than necessary -- e.g. you raise the threshold to 4 when only a threshold of 3
was required to access the resource -- causing more task blocking than necessary.

And I doubt this would make generic programming easier.

perlindgren · 2017-08-01T08:42:33Z

Hi All I can see a couple of alternatives. 1. Use a Threshold type (as you suggest) and make a (proc) macro that generates the “ugly” type bounds. I did not try implementing this, but we have at least two choices - the programmer gives info about the claim structure, or - the macro parses the whole function to figure out the claims in order to generate the type bounds. It does not need to change the fn code, just replicate it, but still we would want debug symbols for the replicated code (so that you may put breakpoints there and gdb should have precise info about the source code lines). It would amount to parsing the AST for a general Rust fn, which might be too complicated, so perhaps too much effort at this point. 2. Help LLVM to figure out that the Threshold could be traced. Not sure if that is possible though. (Exactly what causes LLVM to give up, some “recursion depth” for closure evaluation, or?) 3. Keep it as is, let LLVM do its best, perhaps we can live with it, LLVM will likely become smarter over time, so at some point the problem has vanished, I don’t know. 4. Skip the Threshold token check, meaning that we always manipulate BASEPRI, its in any case just a few cycles. Having it occasionally optimise it away may improve performance in cases, but make it worse in cases, perhaps a bit “frickle”. Skipping the check makes the cost VERY predictable. If Resources are used at a higher granularity, as in the Concurrent Reactive Components (CRC) abstraction, I believe 4 to be the way to go ATM. (Overhead will be dominated by messaging in any case, so the few cycles to operate BASEPRI is likely not a performance issue, and OH will predicable in an intuitive way to the programmer. Best, Per On 01 Aug 2017, at 04:35, Jorge Aparicio <notifications@github.com<mailto:notifications@github.com>> wrote: One of the ways how RTFMv2 became simpler than v1 was the removal of type level integers along with the various associated tokens. However this simplification comes at a price: heavy reliance on LLVM for proper optimization. Right now, there's only one token that the user has to deal with: the preemption Threshold token. This token is actually a newtype over u8 that tracks the preemption threshold of the system through the whole application. This token is used to unlock Resources through the borrow{,_mut} and claim{,_mut} methods. All these methods have branches and assertions in them for memory safety. In a correctly written and properly optimized program the assertions in borrow{,_mut}, all but one branch in claim{,_mut} and all the Threshold tokens should be optimized away. However this requires that LLVM knows the exact value of the Threshold token in every single node of the function call graph. In complex enough call graphs LLVM "gives up" and generates code to track the value of Threshold at runtime; this destroy performance: panicking branches as well as all the branches in a claim{,_mut} call are kept in. In the worst case scenario this can triple the size of the output .text section. The only way I can think of to fix the problems outline above is to turn Threshold into a type level integer. This way the token is guaranteed to not exist at runtime; the token would become a zero sized type. This change would turn the panicking branch in borrow{,_mut} into a compile error and make it very easy for LLVM to optimize away the branches in claim{,_mut} because the "value" of Threshold would be "local" to a function rather than require the close-to global program analysis that v2 requires. In principle we can move the API to type level integers today (see appendix) by using the typenum<https://crates.io/crates/typenum> crate, but it seems that the implementation is blocked by a rustc bug (cf. rust-lang/rust#43580<rust-lang/rust#43580>). Downsides The main downside of implementing this change is that generic code that uses the Resource trait becomes much more complicated to write and to read. This is generic code that deals with two resources, today: fn foo<A, B>(t: &mut Threshold, a: A, b: B) where A: Resource<Data = u8>, B: Resource<Data = u8>, { a.claim(t, |a, t| { b.claim(t, |b, t| { // .. }); }); } This is how that generic code would look like with the proposed typenum-based API: fn foo<A, B, CA, CB, THRESHOLD>(t: &mut T<THRESHOLD>, a: A, b: B) where A: Resource<Data = u8, Ceiling = CA>, B: Resource<Data = u8, Ceiling = CB>, CA: Unsigned, CB: Unsigned, THRESHOLD: Max<CA> + Unsigned, // needed for the outer claim Maximum<THRESHOLD, CA>: Max<CB> + Unsigned, // needed for the inner claim Maximum<Maximum<THRESHOLD, CA>, CB>: Unsigned, // also for the inner claim { a.claim(t, |a, t| { b.claim(t, |b, t| { // .. }); }); } Effectively every claim can add one bound to the where clause of the generic function. The bounds give no extra information to the reader; they are just there to please the compiler. I think the only way to avoid this downside would be to build the API on top of proper type level integers baked into the language. The problem is that they are not implemented and that the API requires a type level cmp::max(A, B) operator -- it's unknown when / if that operator will be implemented. The other minor downside of the change is that the signature of task functions would have to include the exact threshold level. Like this: // priority = 3 fn exti0(t: &mut T3, r: EXTI0::Resources) { .. } // priority = 4 fn exti1(t: &mut T4, r: EXTI0::Resources) { .. } This doesn't seem too bad but changing the priority of a task will require you to change the type of the threshold token as well. Appendix The whole new typenum-based API #![no_std] extern crate typenum; use core::marker::PhantomData; use typenum::{IsGreaterOrEqual, Max, Maximum, True, Unsigned}; /// A resource, a mechanism to safely share data between tasks pub trait Resource { /// The data protected by the resource type Data: Send; /// The ceiling of the resource type Ceiling: Unsigned; /// Borrows the resource data for the span of the current context (which may /// be a critical section) /// /// The current preemption threshold must be greater than the resource /// ceiling for this to work; otherwise this call will cause the program to /// not compile. fn borrow<'cs, THRESHOLD>( &'cs self, t: &'cs T<THRESHOLD>, ) -> &'cs R<Self::Data, Self::Ceiling> where THRESHOLD: IsGreaterOrEqual<Self::Ceiling, Output = True> + Unsigned; // plus `borrow_mut` /// Grants access to the resource data for the span of the closure `f` /// /// The closure may be executed in a critical section, created by raising /// the preemption threshold, if required fn claim<RTY, THRESHOLD, F>(&self, t: &mut T<THRESHOLD>, f: F) -> RTY where F: FnOnce( &R<Self::Data, Self::Ceiling>, &mut Maximum<THRESHOLD, Self::Ceiling>, ) -> RTY, THRESHOLD: Unsigned + Max<Self::Ceiling>; // plus `claim_mut` } /// An unlocked resource pub struct R<DATA, CEILING> where CEILING: Unsigned, DATA: Send, { data: DATA, _ceiling: PhantomData<CEILING>, } /// Preemption threshold token pub struct T<THRESHOLD> where THRESHOLD: Unsigned, { _threshold: PhantomData<THRESHOLD>, } cc @cr1901<https://github.com/cr1901> this would fix the misoptimization you have been seeing in your AT2XT program. Actually implementing this for MSP430 may be possible / easy because effectively there's only two priority levels in that architecture. — You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub<https://github.com/japaric/cortex-m-rtfm/issues/41>, or mute the thread<https://github.com/notifications/unsubscribe-auth/AD5naEtJRgLyi9W_dJ2Fdyv32RFSja3yks5sTo7fgaJpZM4OpKDj>. {"api_version":"1.0","publisher":{"api_key":"05dde50f1d1a384dd78767c55493e4bb","name":"GitHub"},"entity":{"external_key":"github/japaric/cortex-m-rtfm","title":"japaric/cortex-m-rtfm","subtitle":"GitHub repository","main_image_url":"https://cloud.githubusercontent.com/assets/143418/17495839/a5054eac-5d88-11e6-95fc-7290892c7bb5.png","avatar_image_url":"https://cloud.githubusercontent.com/assets/143418/15842166/7c72db34-2c0b-11e6-9aed-b52498112777.png","action":{"name":"Open in GitHub","url":"https://github.com/japaric/cortex-m-rtfm"}},"updates":{"snippets":[{"icon":"DESCRIPTION","message":"revisit type level integers (#41)"}],"action":{"name":"View Issue","url":"https://github.com/japaric/cortex-m-rtfm/issues/41"}}}

Samonitari · 2017-08-17T10:18:09Z

Hi japaric, perlindgren!

Let me thank you for your awesome work, both of you!
I already did thank Jorge in an e-mail about two month ago when I started looking into rust RTFM for my project, but I had less time developing my custom board than I hoped - it is summer... But it is finally coming together!

The V2 release looks so much better, it places less burden on the programmer, has even less overhead, etc.
Also, a big plus for me, as I am using them for my board: full support for Cortex-M0(+).
This brings us to the issue in a possible solution.

@perlindgren

Skip the Threshold token check, meaning that we always manipulate BASEPRI, its in any case just a few cycles. Having it occasionally optimise it away may improve performance in cases, but make it worse in cases, perhaps a bit “frickle”. Skipping the check makes the cost VERY predictable.

This would make M0(+) support worse again. As BASEPRI register is missing in these, it would translate to disabling every interrupt in every resource claim, if I understand correctly. Maybe these chips are not the most important target, but it would cripple the framework for them.
As a beginner-friendly-ish entry-point of the ARM microcontroller family (and also the recommended choice for tasks they can comfortably do, due to the package size, power-consumption, price etc.), I think M0(+) should not be mistreated.

japaric is already aware #1 , and I can always use V2 if things move forward in that way, but creating a proc macro for generating the boilerplate sounds better, at least for me.

…github.com/japaric/cortex-m-rtfm/issues/41

japaric · 2018-11-03T19:33:43Z

triage: v0.3.x used type level integers but v0.4.x removed them again in favor of a task local cell under the hood. This appears to optimize as well as the type level integers did. If there are problems with optimizations please open an issue with steps to reproduce.

41: RFC fixup after merge, template update r=korken89 a=AfoHT Rename the file, add links to relevant issues/PRs Additionally update the template to include the links (just like rust-lang) Co-authored-by: Henrik Tjäder <henrik@tjaders.com>

This is one possible solution to the stack overflow problem described in rtic-rs#34. This approach uses a linker wrapper, called [swap-ld], to generate the desired memory layout. See rtic-rs#34 for a description of the desired memory layout and rtic-rs#41 for a description of how `swap-ld` works. The observable effects of this change in cortex-m programs are: - the `_sbss` symbol is now override-able. - there is now a `.stack` linker section that denotes the span of the call stack. `.stack` won't be loaded into the program; it just exists for informative purposes (`swap-ld` uses this information). Given the following program: ``` rust fn main() { static mut X: u32 = 0; static mut Y: u32 = 1; loop { unsafe { ptr::write_volatile(&mut X, X + 1); ptr::write_volatile(&mut Y, Y + 1); } } } ``` If you link this program using the `arm-none-eabi-ld` linker, which is the cortex-m-quickstart default, you'll get the following memory layout: ``` console $ console section size addr .vector_table 0x130 0x8000000 .text 0x94 0x8000130 .rodata 0x0 0x80001c4 .stack 0x5000 0x20000000 .bss 0x4 0x20000000 .data 0x4 0x20000004 ``` Note how the space reserved for the stack (depicted by the `.stack` linker section) overlaps with the space where .bss and .data reside. If you, instead, link this program using `swap-ld` you'll get the following memory layout: ``` console $ arm-none-eabi-size -Ax app section size addr .vector_table 0x130 0x8000000 .text 0x94 0x8000130 .rodata 0x0 0x80001c4 .stack 0x4ff8 0x20000000 .bss 0x4 0x20004ff8 .data 0x4 0x20004ffc ``` Note that no overlap exists in this case and that the call stack size has reduced to accommodate the .bss and .data sections. Unlike rtic-rs#41 the addresses of static variables is now correct: ``` console $ arm-none-eabi-objdump -CD app Disassembly of section .vector_table: 08000000 <_svector_table>: 8000000: 20004ff8 strdcs r4, [r0], -r8 ; initial Stack Pointer 08000004 <cortex_m_rt::RESET_VECTOR>: 8000004: 08000131 stmdaeq r0, {r0, r4, r5, r8} 08000008 <EXCEPTIONS>: 8000008: 080001bd stmdaeq r0, {r0, r2, r3, r4, r5, r7, r8} (..) Disassembly of section .stack: 20000000 <.stack>: ... Disassembly of section .bss: 20004ff8 <cortex_m_quickstart::main::X>: 20004ff8: 00000000 andeq r0, r0, r0 Disassembly of section .data: 20004ffc <_sdata>: 20004ffc: 00000001 andeq r0, r0, r1 ``` closes rtic-rs#34 [swap-ld]: https://github.com/japaric/swap-ld

cr1901 referenced this issue in cr1901/msp430-rtfm Jan 28, 2018

Pin to specific version of RTFM syntax needed pending fix of https://…

f6163b7

…github.com/japaric/cortex-m-rtfm/issues/41

japaric closed this as completed Nov 3, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

revisit type level integers #41

revisit type level integers #41

japaric commented Aug 1, 2017 •

edited

Loading

japaric commented Aug 1, 2017

perlindgren commented Aug 1, 2017 via email

Samonitari commented Aug 17, 2017 •

edited

Loading

japaric commented Nov 3, 2018

revisit type level integers #41

revisit type level integers #41

Comments

japaric commented Aug 1, 2017 • edited Loading

Downsides

Appendix

japaric commented Aug 1, 2017

Alternatives

Bring back Threshold.raise

perlindgren commented Aug 1, 2017 via email

Samonitari commented Aug 17, 2017 • edited Loading

japaric commented Nov 3, 2018

japaric commented Aug 1, 2017 •

edited

Loading

Bring back `Threshold.raise`

Samonitari commented Aug 17, 2017 •

edited

Loading