[discussion] Peripheral singletons are not multi-core friendly

## Background

### Private Peripheral Bus (PPB)

All the peripherals in the `cortex_m::Peripherals` struct sit on the Private
Peripheral Bus and have addresses of the form `0xE00x_xxxx`. Most of these
peripherals are interfaces to *internal* resources -- there's one instance of
those resources per core. Examples of internal PPB peripherals are the
`NVIC`, `MPU`, `SYST` (system timer), `ITM`, etc. The rest of the peripherals are
interfaces to *external* resources, meaning that all cores access the *same*
resources through these interfaces. An example of an external resource is the
`TPIU`.

The bottom line here is that interacting with a singleton like `CPUID` on one
core is different from interacting with it on another core, even though they
are supposed to be the same singleton. For example, the `CPUID.base.read()`
operation can return different values depending on which core executes it.

### Multi-core API

In RTFM land we are exploring multi-core applications in two modes: homogeneous
mode and heterogeneous mode. Of course, it's very early days so we still don't
know what the ecosystem will adopt but these APIs let us show the problems with
the peripheral singletons.

#### Homogeneous

In this mode a binary crate is compiled using a single (compilation) target and
a single ELF image is produced. The image contains the entry points for all
cores and `static` variables have global visibility (visible to all cores) by
default.

``` rust
#![no_std]
#![no_main]

// visible to all cores
static X: AtomicBool = AtomicBool::new(false);

// core #0 user entry point
#[no_mangle]
unsafe extern "C" fn main_0() -> ! {
    // ..
}

// core #1 user entry point
#[no_mangle]
unsafe extern "C" fn main_1() -> ! {
    // ..
}
```

If this program is compiled for the `thumbv8m.main-none-eabi` target then the
resulting image can be flashed, for example, on a 2x Cortex-M33 device.

#### Heterogeneous

In this mode a binary crate is compiled for *multiple* (compilation) targets so
multiple ELF images are produced. There's one image for each core and each image
contains the entry point and `static` variables for that core. There's a special
linker section named `.shared` used to *share* `static` variables between cores:
make them visible to all cores. One opts into this `.shared` section using the
`#[shared]` attribute; by default, `static` variables are visible only to the
core where they were defined.

``` rust
#![no_std]
#![no_main]

// visible to all cores
#[shared]
static X: AtomicBool = AtomicBool::new(false);

// visible only to core #0
#[cfg(core = "0")]
static Y: AtomicBool = AtomicBool::new(false);

// each core has a *copy* of this static variable
static Z: AtomicBool = AtomicBool::new(false);

// core #0 user entry point
#[cfg(core = "0")]
#[no_mangle]
unsafe extern "C" fn main() -> ! {
    // ..
}

// core #1 user entry point
#[cfg(core = "1")]
#[no_mangle]
unsafe extern "C" fn main() -> ! {
    // ..
}
```

If this program is compiled for the `thumbv7em-none-eabihf` and
`thumbv6m-none-eabi` targets then the resulting images can be flashed on a
Cortex-M4F + Cortex-M0+ device, for example.

## Issues

### A. `Send` is wrong

By definition, `Send` means that it's (memory) safe to transfer ownership of a
resource from one thread / core to another. In the case of these peripheral
singletons transferring them from one core to another is wrong because that
changes the *meaning* of the value.

``` rust
// homogeneous mode

use cortex_m::peripheral::DWT;

// used as a channel
static X: spin::Mutex<Option<DWT>> = spin::Mutex::new(None);

#[no_mangle]
fn main_0() -> ! {
    let p: cortex_m::Peripherals = ..;

    // this refers to core #0's DWT
    let dwt = p.DWT;
    *X.lock() = Some(dwt);

    // ..
}

#[no_mangle]
fn main_1() -> ! {
    loop {
        if let Some(x) = X.lock().take() {
            // now this refers to core _#1_'s DWT
            let dwt: DWT = x;
        }
    }

    // ..
}
```

The other issue with `Send` is that it makes it possible to break the singleton
invariant: one core can send another instance of e.g. `DWT` to a core that
already has one such instance.

### B. `take()` is unsound

The current implementation of `cortex_m::Peripherals::take` looks like this:

``` rust
static mut CORE_PERIPHERALS: bool = false;

impl Peripherals {
    pub fn take() -> Option<Self> {
        interrupt::free(|_| {
            if unsafe { CORE_PERIPHERALS } {
                None
            } else {
                Some(unsafe { Peripherals::steal() })
            }
        })
    }

}
```

This is unsound in homogeneous multi-core mode because `interrupt::free` doesn't
synchronize multi-core access to `static mut` variables; it only synchronizes
accesses from the same core.

Using `AtomicBool::compare_and_swap` or a similar API would make this multi-core
memory safe but that would not work on ARMv6-M because CAS operations are not
available on `thumbv6m-none-eabi`.
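
As a rough illustration of the atomic approach, the sketch below guards `take` with `AtomicBool::compare_exchange`, one of the standard library's CAS operations. `Peripherals` here is a stand-in type, not the real `cortex_m::Peripherals`, and the memory orderings are an assumption of this sketch. It only addresses the data race described in this section; because every core still shares one guard, only the first caller gets `Some`.

```rust
use core::sync::atomic::{AtomicBool, Ordering};

// Stand-in for `cortex_m::Peripherals`; the real struct holds the PPB peripherals.
pub struct Peripherals {
    _private: (),
}

// Shared guard: `true` once the peripherals have been handed out.
static TAKEN: AtomicBool = AtomicBool::new(false);

impl Peripherals {
    /// Returns the peripherals the first time it is called, `None` afterwards.
    pub fn take() -> Option<Self> {
        // Atomically flip `false -> true`; only one caller can win, even if
        // several cores race here. The orderings are an assumption of this sketch.
        TAKEN
            .compare_exchange(false, true, Ordering::AcqRel, Ordering::Acquire)
            .ok()
            .map(|_| Peripherals { _private: () })
    }
}
```

Since this needs a CAS instruction, it would still not build for `thumbv6m-none-eabi`.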

### C. `take()` is wrong

The following program panics in homogeneous multi-core mode but should work.

``` rust
#[no_mangle]
unsafe extern "C" fn main_0() -> ! {
    let p = cortex_m::Peripherals::take().unwrap();
    let now = p.DWT.cyccnt.read();

    // ..
}

#[no_mangle]
unsafe extern "C" fn main_1() -> ! {
    let p = cortex_m::Peripherals::take().unwrap();
    let now = p.DWT.cyccnt.read();

    // ..
}
```

This panics because both `Peripherals::take` calls check the same guard, so
whichever core runs second gets `None`. However, it is OK for each core to
access its own DWT peripheral / cycle counter instance.

## Potential countermeasures

### `!Send`

To avoid issue (A) we could remove the `Send` implementation from all the
peripheral singletons. The downside of this approach is that we would also forbid
sending a peripheral singleton from `main` or an interrupt handler to another
execution context *within the same core*.
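
On stable Rust the usual way to opt out of `Send` is a `PhantomData<*mut ()>` field, since raw pointers are neither `Send` nor `Sync`. A minimal sketch follows; `DWT` is a stand-in for the real type, and the `is_send!` macro is a hypothetical probe added only so the effect can be observed.

```rust
use core::marker::PhantomData;

// Stand-in for a peripheral singleton such as `cortex_m::peripheral::DWT`.
pub struct DWT {
    // Raw pointers are `!Send + !Sync`, and `PhantomData` propagates that.
    _not_send: PhantomData<*mut ()>,
}

// Hypothetical helper: evaluates to `true` if `$ty: Send`, `false` otherwise.
// The inherent associated const only exists when `T: Send` and takes priority
// over the blanket trait const, so the fallback is used when the bound fails.
macro_rules! is_send {
    ($ty:ty) => {{
        trait Fallback {
            const IS_SEND: bool = false;
        }
        impl<T: ?Sized> Fallback for T {}

        struct Probe<T: ?Sized>(::core::marker::PhantomData<T>);

        #[allow(dead_code)]
        impl<T: ?Sized + Send> Probe<T> {
            const IS_SEND: bool = true;
        }

        <Probe<$ty>>::IS_SEND
    }};
}
```

With such a field in place, the `static X: spin::Mutex<Option<DWT>>` channel from issue (A) should no longer compile, since `spin::Mutex<T>` is only `Sync` when `T: Send`.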

### `LocalSend`

Another alternative to avoid (A) is to remove the `Send` implementation from the
singletons and instead implement a new `LocalSend` trait that means "safe to send
between execution contexts running on the same core". Then frameworks like RTFM
can require `LocalSend` for message passing within one core and `Send` for
cross-core message passing.

The downside of this approach is that it requires using the nightly channel
because `auto trait`s, which are required to bridge `Send` and `LocalSend`,
are unstable.

``` rust
// crate: local-send
pub unsafe auto trait LocalSend {}

// all cross-core Send-safe types are also core-local Send safe
unsafe impl<T> LocalSend for T where T: Send {}

// crate: cortex-m
pub struct DWT {
    _not_send: PhantomData<*mut ()>,
}

impl !Send for DWT {}
unsafe impl LocalSend for DWT {}
```

### Core-local `take`

One way to deal with (B) and (C) is to have one guard `static` variable *per*
core. Assuming a dual core system `take` would be written as follows:

``` rust
// crate: cortex-m
// for core #0
static mut PERIPHERALS0: bool = false;
// for core #1
static mut PERIPHERALS1: bool = false;

impl Peripherals {
    pub fn take() -> Option<Self> {
        interrupt::free(|_| unsafe {
            let guard = if core_id() == 0 {
                &mut PERIPHERALS0
            } else {
                &mut PERIPHERALS1
            };

            if *guard {
                None
            } else {
                *guard = true;
                Some(Peripherals { .. })
            }
        })
    }
}

fn core_id() -> u8 {
    // returns `0` on core #0
    // returns `1` on core #1
}

// crate: app
#[no_mangle]
unsafe extern "C" fn main_0() -> ! {
    let p = Peripherals::take().unwrap();
    // ..
}

#[no_mangle]
unsafe extern "C" fn main_1() -> ! {
    let p = Peripherals::take().unwrap();
    // ..
}
```

The problem is implementing `core_id` in homogeneous mode. AFAICT, there's no
Cortex-M memory mapped register that returns a "core id" (cf. RISC-V `mhartid`);
nor is there a processor register that can be used to hold a "core id" (cf.
RISC-V registers: `x3` (global pointer) and `x4` (thread pointer)) -- though one
could use the usually unused PSP (Process Stack Pointer) register for this
purpose.
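
To make the per-core guard idea concrete without a real `core_id`, here is a host-side model in which the caller passes the core index explicitly. `take_on` and the stand-in `Peripherals` type are hypothetical names for this sketch only. It uses plain atomic loads and stores rather than CAS, on the assumption that on the target this would run inside `interrupt::free` and that guard N is only ever touched by core N.

```rust
use core::sync::atomic::{AtomicBool, Ordering};

// Stand-in for `cortex_m::Peripherals`.
pub struct Peripherals {
    _private: (),
}

// One guard per core (dual-core assumption, as in the snippet above).
static TAKEN: [AtomicBool; 2] = [AtomicBool::new(false), AtomicBool::new(false)];

impl Peripherals {
    /// Hypothetical variant of `take` where the caller supplies the core
    /// index, because this sketch has no real `core_id()` to call.
    pub fn take_on(core: usize) -> Option<Self> {
        // An out-of-range core index yields `None`.
        let guard = TAKEN.get(core)?;

        // Check-then-set with plain load / store. This is only sound under the
        // stated assumption: interrupts are masked and no other core uses this
        // guard.
        if guard.load(Ordering::Relaxed) {
            None
        } else {
            guard.store(true, Ordering::Relaxed);
            Some(Peripherals { _private: () })
        }
    }
}
```

In this model each core can take its peripherals exactly once, independently of the other core.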

### Global singletons

A completely different approach to peripheral access that avoids the three
aforementioned issues is a *global* singleton API. For example:

``` rust
// crate: cortex-m
use bare_metal::CriticalSection;

// A global singleton
pub struct DWT;

// the actual peripheral
pub struct DWT_ {
    _not_send_or_sync: PhantomData<*mut ()>,
    pub cyccnt: CYCCNT,
    // .. other registers ..
}

impl DWT_ {
    // NOTE private
    unsafe fn new() -> Self {
        ..
    }
}

impl DWT {
    /// Grants temporary, synchronized access to the DWT peripheral
    pub fn borrow(cs: &CriticalSection, f: impl FnOnce(&DWT_)) {
        unsafe { f(&DWT_::new()) }
    }

    /// Grants temporary, unsynchronized access to the DWT peripheral
    pub unsafe fn borrow_unchecked(f: impl FnOnce(&DWT_)) {
        f(&DWT_::new())
    }
}

// crate: app
#[no_mangle]
unsafe extern "C" fn main_0() -> ! {
    interrupt::free(|cs| {
        DWT::borrow(cs, |dwt| {
            // accesses its own DWT
            dwt.cyccnt.write(0);
        });
    });

    // ..
}

#[no_mangle]
unsafe extern "C" fn main_1() -> ! {
    interrupt::free(|cs| {
        DWT::borrow(cs, |dwt| {
            // accesses its own DWT
            dwt.cyccnt.write(0);
        });
    });

    // ..
}
```

The downside of this approach is that because there's no ownership it's hard to
build abstractions on top of peripherals. One could add a panicky API to seal
peripherals:

``` rust
impl DWT {
    /// Seals this peripheral; all future calls to `borrow` will panic
    ///
    /// This function panics if it's called twice
    pub fn seal(cs: &CriticalSection) {
        // ..
    }
}
```

Ownership could be emulated by first sealing the peripheral and then having the
abstraction access the peripheral exclusively through the `borrow_unchecked`
API.

``` rust
pub struct Timer {
    // not Send because this semantically owns `SYST_` which is also not `Send`
    _not_send: PhantomData<*mut ()>,
}

impl Timer {
    pub fn new() -> Self {
        // this operation effectively turns this type into an owned singleton
        // (`seal` takes a critical section token, as declared above)
        interrupt::free(|cs| SYST::seal(cs));

        Self { _not_send: PhantomData }
    }

    pub fn set_timeout(&mut self, dur: Duration) {
        unsafe {
            SYST::borrow_unchecked(|syst| {
                // ..
            })
        }
    }
}
```
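
The body of `seal` is elided above. One possible shape, written as a host-side model, is a single "sealed" flag that both `seal` and `borrow` check. `CriticalSection` and `SystRegs` below are stand-ins (for `bare_metal::CriticalSection` and `SYST_` respectively), and the single global flag is an assumption of this sketch: in homogeneous mode it would be shared by all cores, so a per-core flag would run into the same `core_id` question as core-local `take`.

```rust
use core::sync::atomic::{AtomicBool, Ordering};

// Stand-in for `bare_metal::CriticalSection`: a token proving interrupts are off.
pub struct CriticalSection {
    _private: (),
}

impl CriticalSection {
    /// # Safety
    /// Must only be created while interrupts are disabled (not checked here).
    pub unsafe fn new() -> Self {
        CriticalSection { _private: () }
    }
}

// Stand-in for the register block (`SYST_` in the text above).
pub struct SystRegs {
    _private: (),
}

// The global singleton.
pub struct SYST;

// `true` once `seal` has been called.
static SEALED: AtomicBool = AtomicBool::new(false);

impl SYST {
    /// Seals this peripheral; panics if it was already sealed.
    pub fn seal(_cs: &CriticalSection) {
        assert!(!SEALED.load(Ordering::Relaxed), "SYST is already sealed");
        SEALED.store(true, Ordering::Relaxed);
    }

    /// Grants temporary access; panics once the peripheral has been sealed.
    pub fn borrow<R>(_cs: &CriticalSection, f: impl FnOnce(&SystRegs) -> R) -> R {
        assert!(!SEALED.load(Ordering::Relaxed), "SYST is sealed");
        f(&SystRegs { _private: () })
    }

    /// Unsynchronized access that ignores the seal.
    ///
    /// # Safety
    /// The caller must be the abstraction that sealed the peripheral.
    pub unsafe fn borrow_unchecked<R>(f: impl FnOnce(&SystRegs) -> R) -> R {
        f(&SystRegs { _private: () })
    }
}
```

After `seal`, only code that is allowed to call `borrow_unchecked` (the sealing abstraction) can still reach the registers, which is what gives it ownership-like exclusivity.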

### Internal vs external resources

In the case of external resources like the `TPIU` I think we want to keep the
existing owned singleton / `take` API, with the semantics that only one core can
take these peripherals, because more than one core should not access these
resources in an unsynchronized fashion.

