Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

revisit type level integers #41

Closed
japaric opened this issue Aug 1, 2017 · 4 comments
Closed

revisit type level integers #41

japaric opened this issue Aug 1, 2017 · 4 comments

Comments

@japaric
Copy link
Collaborator

japaric commented Aug 1, 2017

One of the ways how RTFMv2 became simpler than v1 was the removal of type level
integers along with the various associated tokens. However this simplification
comes at a price: heavy reliance on LLVM for proper optimization.

Right now, there's only one token that the user has to deal with: the preemption
Threshold token. This token is actually a newtype over u8 that tracks the
preemption threshold of the system through the whole application. This token is
used to unlock Resources through the borrow{,_mut} and claim{,_mut}
methods. All these methods have branches and assertions in them for memory
safety. In a correctly written and properly optimized program the assertions in
borrow{,_mut}, all but one branch in claim{,_mut} and all the Threshold
tokens should be optimized away. However this requires that LLVM knows the
exact value of the Threshold token in every single node of the function
call graph. In complex enough call graphs LLVM "gives up" and generates code to
track the value of Threshold at runtime; this destroy performance: panicking
branches as well as all the branches in a claim{,_mut} call are kept in. In
the worst case scenario this can triple the size of the output .text section.

The only way I can think of to fix the problems outline above is to turn
Threshold into a type level integer. This way the token is guaranteed to not
exist at runtime; the token would become a zero sized type. This change would
turn the panicking branch in borrow{,_mut} into a compile error and make it
very easy for LLVM to optimize away the branches in claim{,_mut} because the
"value" of Threshold would be "local" to a function rather than require
the close-to global program analysis that v2 requires.

In principle we can move the API to type level integers today (see appendix) by
using the typenum crate, but it seems that the implementation is blocked by
a rustc bug (cf. rust-lang/rust#43580).

Downsides

The main downside of implementing this change is that generic code that uses the
Resource trait becomes much more complicated to write and to read.

This is generic code that deals with two resources, today:

fn foo<A, B>(t: &mut Threshold, a: A, b: B)
where
    A: Resource<Data = u8>,
    B: Resource<Data = u8>,
{
    a.claim(t, |a, t| {
        b.claim(t, |b, t| {
            // ..
        });
    });
}

This is how that generic code would look like with the proposed typenum-based
API:

fn foo<A, B, CA, CB, THRESHOLD>(t: &mut T<THRESHOLD>, a: A, b: B)
where
    A: Resource<Data = u8, Ceiling = CA>,
    B: Resource<Data = u8, Ceiling = CB>,
    CA: Unsigned,
    CB: Unsigned,
    THRESHOLD: Max<CA> + Unsigned, // needed for the outer claim
    Maximum<THRESHOLD, CA>: Max<CB> + Unsigned, // needed for the inner claim
    Maximum<Maximum<THRESHOLD, CA>, CB>: Unsigned, // also for the inner claim
{
    a.claim(t, |a, t| {
        b.claim(t, |b, t| {
            // ..
        });
    });
}

Effectively every claim can add one bound to the where clause of the generic
function. The bounds give no extra information to the reader; they are just
there to please the compiler.

I think the only way to avoid this downside would be to build the API on top of
proper type level integers baked into the language (*). The problem is that they are
not implemented and that the API requires a type level cmp::max(A, B)
operator (**) -- it's unknown when / if that operator will be implemented.

EDIT1: (*) because, in theory, in that case no bound should be required. const N: u8 is known to be an integer and type level operations on it (where N1 > N2) would be baked into the language.

EDIT1: (**) A type level if a > b operator / clause is also required for borrow.

The other minor downside of the change is that the signature of task functions
would have to include the exact threshold level. Like this:

// priority = 3
fn exti0(t: &mut T3, r: EXTI0::Resources) { .. }

// priority = 4
fn exti1(t: &mut T4, r: EXTI0::Resources) { .. }

This doesn't seem too bad but changing the priority of a task will require you
to change the type of the threshold token as well.

Appendix

The whole new typenum-based API

#![no_std]

extern crate typenum;

use core::marker::PhantomData;

use typenum::{IsGreaterOrEqual, Max, Maximum, True, Unsigned};

/// A resource, a mechanism to safely share data between tasks
pub trait Resource {
    /// The data protected by the resource
    type Data: Send;

    /// The ceiling of the resource
    type Ceiling: Unsigned;

    /// Borrows the resource data for the span of the current context (which may
    /// be a critical section)
    ///
    /// The current preemption threshold must be greater than the resource
    /// ceiling for this to work; otherwise this call will cause the program to
    /// not compile.
    fn borrow<'cs, THRESHOLD>(
        &'cs self,
        t: &'cs T<THRESHOLD>,
    ) -> &'cs R<Self::Data, Self::Ceiling>
    where
        THRESHOLD: IsGreaterOrEqual<Self::Ceiling, Output = True> + Unsigned;

    // plus `borrow_mut`

    /// Grants access to the resource data for the span of the closure `f`
    ///
    /// The closure may be executed in a critical section, created by raising
    /// the preemption threshold, if required
    fn claim<RTY, THRESHOLD, F>(&self, t: &mut T<THRESHOLD>, f: F) -> RTY
    where
        F: FnOnce(
            &R<Self::Data, Self::Ceiling>,
            &mut Maximum<THRESHOLD, Self::Ceiling>,
        ) -> RTY,
        THRESHOLD: Unsigned + Max<Self::Ceiling>;

    // plus `claim_mut`
}

/// An unlocked resource
pub struct R<DATA, CEILING>
where
    CEILING: Unsigned,
    DATA: Send,
{
    data: DATA,
    _ceiling: PhantomData<CEILING>,
}

/// Preemption threshold token
pub struct T<THRESHOLD>
where
    THRESHOLD: Unsigned,
{
    _threshold: PhantomData<THRESHOLD>,
}

cc @cr1901 this would fix the misoptimization you have been seeing in your AT2XT program. Actually implementing this for MSP430 may be possible / easy because effectively there's only two priority levels in that architecture.

@japaric
Copy link
Collaborator Author

japaric commented Aug 1, 2017

Alternatives

Bring back Threshold.raise

We could remove Resource.claim{,_mut} and only have Resource.borrow{,_mut}.
Then users would have to manually raise the threshold to be able to access a
resource's data. Something like this:

fn exti0(t: T1, r: EXTI0::Resources) {
    t.raise::<T2>(|t| {
        let a = r.A.borrow(t);

        t.raise::<T3>(|t| {
            let b = r.B.borrow(t);
        });
    });
}

Not only this is much less ergonomic -- you have to specify what to raise the
current threshold to; this means you have to know the ceiling of each resource
-- but it also makes it possible to "overshoot" and raise the threshold more
than necessary -- e.g. you raise the threshold to 4 when only a threshold of 3
was required to access the resource -- causing more task blocking than necessary.

And I doubt this would make generic programming easier.

@perlindgren
Copy link
Collaborator

perlindgren commented Aug 1, 2017 via email

@Samonitari
Copy link

Samonitari commented Aug 17, 2017

Hi japaric, perlindgren!

Let me thank you for your awesome work, both of you!
I already did thank Jorge in an e-mail about two month ago when I started looking into rust RTFM for my project, but I had less time developing my custom board than I hoped - it is summer... But it is finally coming together!

The V2 release looks so much better, it places less burden on the programmer, has even less overhead, etc.
Also, a big plus for me, as I am using them for my board: full support for Cortex-M0(+).
This brings us to the issue in a possible solution.

@perlindgren

  1. Skip the Threshold token check, meaning that we always manipulate BASEPRI, its in any case just a few cycles. Having it occasionally optimise it away may improve performance in cases, but make it worse in cases, perhaps a bit “frickle”. Skipping the check makes the cost VERY predictable.

This would make M0(+) support worse again. As BASEPRI register is missing in these, it would translate to disabling every interrupt in every resource claim, if I understand correctly. Maybe these chips are not the most important target, but it would cripple the framework for them.
As a beginner-friendly-ish entry-point of the ARM microcontroller family (and also the recommended choice for tasks they can comfortably do, due to the package size, power-consumption, price etc.), I think M0(+) should not be mistreated.

japaric is already aware #1 , and I can always use V2 if things move forward in that way, but creating a proc macro for generating the boilerplate sounds better, at least for me.

@japaric
Copy link
Collaborator Author

japaric commented Nov 3, 2018

triage: v0.3.x used type level integers but v0.4.x removed them again in favor of a task local cell under the hood. This appears to optimize as well as the type level integers did. If there are problems with optimizations please open an issue with steps to reproduce.

@japaric japaric closed this as completed Nov 3, 2018
japaric pushed a commit that referenced this issue Sep 10, 2021
41: RFC fixup after merge, template update r=korken89 a=AfoHT

Rename the file, add links to relevant issues/PRs

Additionally update the template to include the links (just like rust-lang)

Co-authored-by: Henrik Tjäder <henrik@tjaders.com>
andrewgazelka pushed a commit to andrewgazelka/cortex-m-rtic that referenced this issue Nov 3, 2021
This is one possible solution to the stack overflow problem described in rtic-rs#34. This approach uses a
linker wrapper, called [swap-ld], to generate the desired memory layout. See rtic-rs#34 for a description
of the desired memory layout and rtic-rs#41 for a description of how `swap-ld` works.

The observable effects of this change in cortex-m programs are:

- the `_sbss` symbol is now override-able.
- there is now a `.stack` linker section that denotes the span of the call stack. `.stack` won't be
  loaded into the program; it just exists for informative purposes (`swap-ld` uses this
  information).

Given the following program:

``` rust
fn main() {
    static mut X: u32 = 0;
    static mut Y: u32 = 1;

    loop {
        unsafe {
            ptr::write_volatile(&mut X, X + 1);
            ptr::write_volatile(&mut Y, Y + 1);
        }
    }
}
```

If you link this program using the `arm-none-eabi-ld` linker, which is the cortex-m-quickstart
default, you'll get the following memory layout:

``` console
$ console
section                                                                   size         addr
.vector_table                                                            0x130    0x8000000
.text                                                                     0x94    0x8000130
.rodata                                                                    0x0    0x80001c4
.stack                                                                  0x5000   0x20000000
.bss                                                                       0x4   0x20000000
.data                                                                      0x4   0x20000004
```

Note how the space reserved for the stack (depicted by the `.stack` linker section) overlaps with
the space where .bss and .data reside.

If you, instead, link this program using `swap-ld` you'll get the following memory layout:

``` console
$ arm-none-eabi-size -Ax app
section                                                                   size         addr
.vector_table                                                            0x130    0x8000000
.text                                                                     0x94    0x8000130
.rodata                                                                    0x0    0x80001c4
.stack                                                                  0x4ff8   0x20000000
.bss                                                                       0x4   0x20004ff8
.data                                                                      0x4   0x20004ffc
```

Note that no overlap exists in this case and that the call stack size has reduced to accommodate the
.bss and .data sections.

Unlike rtic-rs#41 the addresses of static variables is now correct:

``` console
$ arm-none-eabi-objdump -CD app
Disassembly of section .vector_table:

08000000 <_svector_table>:
 8000000:       20004ff8        strdcs  r4, [r0], -r8 ; initial Stack Pointer

08000004 <cortex_m_rt::RESET_VECTOR>:
 8000004:       08000131        stmdaeq r0, {r0, r4, r5, r8}

08000008 <EXCEPTIONS>:
 8000008:       080001bd        stmdaeq r0, {r0, r2, r3, r4, r5, r7, r8}
 (..)

Disassembly of section .stack:

20000000 <.stack>:
        ...

Disassembly of section .bss:

20004ff8 <cortex_m_quickstart::main::X>:
20004ff8:       00000000        andeq   r0, r0, r0

Disassembly of section .data:

20004ffc <_sdata>:
20004ffc:       00000001        andeq   r0, r0, r1
```

closes rtic-rs#34

[swap-ld]: https://github.com/japaric/swap-ld
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants