Description
Background
Private Peripheral Bus (PPB)
All the peripherals in the cortex_m::Peripherals
struct sit on the Private
Peripheral Bus and have addresses of the form 0xE00x_xxxx
. Most of these
peripherals are interfaces to internal resources -- there's one instance of
those resources per core. Examples of internal PPB peripherals are the
NVIC
, MPU
, SYST
(system timer), ITM
, etc. The rest of peripherals are
interfaces to external resources meaning that all cores access the same
resources through these interfaces. An example of an external resource is the
TPIU
.
The bottom line here is that interacting with a singleton like CPUID
on one
core is different from interacting with it from another core even though they
are supposed to be the same singleton. For example, the CPUID.base.read()
operation can return different values depending on which core is executed.
Multi-core API
In RTFM land we are exploring multi-core applications in two modes: homogeneous
mode and heterogeneous mode. Of course, it's very early days so we still don't
know what the ecosystem will adopt but these APIs let us show the problems with
the peripheral singletons.
Homogeneous
In this mode a binary crate is compiled using a single (compilation) target and
a single ELF image is produced. The image contains the entry points for all
cores and static
variables have global visibility (visible to all cores) by
default.
#![no_std]
#![no_main]
// visible to all cores
static X: AtomicBool = AtomicBool::new(false);
// core #0 user entry point
#[no_mangle]
unsafe extern "C" fn main_0() -> ! {
// ..
}
// core #1 user entry point
#[no_mangle]
unsafe extern "C" fn main_1() -> ! {
// ..
}
If this program is compiled for the thumbv8m.main-none-eabi
then the resulting
image can be flashed, for example, on a 2x Cortex-M33 device.
Heterogeneous
In this mode a binary crate is compiled for multiple (compilation) targets so
multiple ELF images are produced. There's one image for each core and each image
contains the entry point and static
variables for that core. There's a special
linker section named .shared
used to share static
variables between cores:
make them visible to all cores. One opts into this .shared
section using the
#[shared]
attribute; the default is that static
variables are visible only
to the core where it was defined.
#![no_std]
#![no_main]
// visible to all cores
#[shared]
static X: AtomicBool = AtomicBool::new(false);
// visible only to core #0
#[cfg(core = "0")]
static Y: AtomicBool = AtomicBool::new(false);
// each core has a *copy* of this static variable
static Z: AtomicBool = AtomicBool::new(false);
// core #0 user entry point
#[cfg(core = "0")]
#[no_mangle]
unsafe extern "C" fn main() -> ! {
// ..
}
// core #1 user entry point
#[cfg(core = "1")]
#[no_mangle]
unsafe extern "C" fn main() -> ! {
// ..
}
If this program is compiled for the thumbv7em-none-eabihf
and
thumbv6m-none-eabi
targets then the resulting image can be flashed on a
Cortex-M4F + Cortex-M0+ device, for example.
Issues
A. Send
is wrong
By definition, Send
means that it's (memory) safe to transfer ownership of a
resource from one thread / core to another. In the case of these peripheral
singletons transferring them from one core to another is wrong because that
changes the meaning of the value.
// homogeneous mode
use cortex_m::peripheral::DWT;
// used as a channel
static X: spin::Mutex<Option<DWT>> = Mutex::new(None);
#[no_mangle]
fn main_0() -> ! {
let p: cortex_m::Peripherals = ..;
// this refers to core #0's DWT
let dwt = p.DWT;
*X.lock() = Some(dwt);
// ..
}
#[no_mangle]
fn main_1() -> ! {
loop {
if let Some(x) = X.lock().take() {
// now this refers to core _#1_'s DWT
let dwt: DWT = x;
}
}
// ..
}
The other issue with Send
is that makes it possible to break the singleton
invariant: one core can send another instance of e.g. DWT
to a core that
already has one such instance.
B. take()
is unsound
The current implementation of cortex_m::Peripherals::take
looks like this:
static mut CORE_PERIPHERALS: bool = false;
impl Peripherals {
pub fn take() -> Option<Self> {
interrupt::free(|_| {
if unsafe { CORE_PERIPHERALS } {
None
} else {
Some(unsafe { Peripherals::steal() })
}
})
}
}
This is unsound in homogeneous multi-core mode because interrupt::free
doesn't
synchronize multi-core access to static mut
variables; it only synchronizes
accesses from the same core.
Using AtomicBool.compare_swap
or a similar API would make this multi-core
memory safe but that would not work on ARMv6-M because that CAS API doesn't
exist on thumbv6m-none-eabi
.
C. take()
is wrong
The following program panics in homogeneous multi-core mode but should work.
#[no_mangle]
unsafe extern "C" fn main_0() -> ! {
let p = cortex_m::Peripherals::take().unwrap();
let now = p.DWT.cyccnt.read();
// ..
}
#[no_mangle]
unsafe extern "C" fn main_1() -> ! {
let p = cortex_m::Peripherals::take().unwrap();
let now = p.DWT.cyccnt.read();
// ..
}
This panics because both Peripherals::take
are accessing the same guard.
However, it is OK for each core to access its own DWT peripheral /
cycle counter instance.
Potential countermeasures
!Send
To avoid issue (A) we could remove the Send
implementation from all the
peripheral singletons. The downside of this approach is we would also forbid
sending a peripheral singleton from main
or a interrupt handler to another
within the same core.
LocalSend
Another alternative to avoid (A) is to remove the Send
implementation from the
singletons and instead implement a new LocalSend
trait that means safe to send
within execution contexts running on the same core. Then frameworks like RTFM
can require the LocalSend
for message passing within one core and the Send
trait for cross-core message passing.
The downside of this approach is that it requires using the nightly channel
because auto trait
s, which are required to bridge Send
and LocalSend
,
are unstable.
// crate: local-send
pub unsafe auto trait LocalSend {}
// all cross-core Send-safe types are also core-local Send safe
unsafe impl<T> LocalSend for T where T: Send {}
// crate: cortex-m
pub struct DWT {
_not_send: PhantomData<*mut ()>,
}
impl !Send for DWT {}
unsafe impl LocalSend for DWT {}
Core-local take
One way to deal with (B) and (C) is to have one guard static
variable per
core. Assuming a dual core system take
would be written as follows:
// crate: cortex-m
// for core #0
static mut PERIPHERALS0: bool = false;
// for core #1
static mut PERIPHERALS1: bool = false;
impl Peripherals {
pub fn take() -> Option<Self> {
interrupt::free(|_| unsafe {
let guard = if core_id() == 0 {
&mut PERIPHERALS0
} else {
&mut PERIPHERALS1
};
if *guard {
None
} else {
*guard = true;
Some(Peripherals { .. })
}
})
}
}
fn core_id() -> u8 {
// returns `0` on core #0
// returns `1` on core #1
}
// crate: app
#[no_mangle]
unsafe extern "C" fn main_0() -> ! {
let p = Peripherals::take().unwrap();
// ..
}
#[no_mangle]
unsafe extern "C" fn main_1() -> ! {
let p = Peripherals::take().unwrap();
// ..
}
The problem is implementing core_id
in homogeneous mode. AFAICT, there's no
Cortex-M memory mapped register that returns a "core id" (cf. RISC-V mhartid
);
nor there is a processor register that can be used to hold a "core id" (cf.
RISC-V registers: x3
(global pointer) and x4
(thread pointer)) -- though one
could use the usually unused PSP (Process Stack Pointer) register for this
purpose.
Global singletons
A completely different approach to peripheral access that avoids the three
aforementioned issues is a global singleton API. For example:
// crate: cortex-m
use bare_metal::CriticalSection;
// A global singleton
pub struct DWT;
// the actual peripheral
pub struct DWT_ {
_not_send_or_sync: PhantomData<*mut ()>,
pub cyccnt: CYCCNT,
// .. other registers ..
}
impl DWT_ {
// NOTE private
unsafe fn new() -> Self {
..
}
}
impl DWT {
/// Grants temporary, synchronized access to the DWT peripheral
pub fn borrow(cs: &CriticalSection, f: impl FnOnce(&DWT_)) {
unsafe { f(&DWT_::new()) }
}
/// Grants temporary, unsynchronized access to the DWT peripheral
pub unsafe fn borrow_unchecked(f: impl FnOnce(&DWT_)) {
f(&DWT_::new())
}
}
// crate: app
#[no_mangle]
unsafe extern "C" fn main_0() -> ! {
interrupt::free(|cs| {
DWT::borrow(cs, |dwt| {
// accesses its own DWT
dwt.cyccnt.write(0);
});
});
// ..
}
#[no_mangle]
unsafe extern "C" fn main_1() -> ! {
interrupt::free(|cs| {
DWT::borrow(cs, |dwt| {
// accesses its own DWT
dwt.cyccnt.write(0);
});
});
// ..
}
The downside of this approach is that because there's no ownership it's hard to
build abstractions on top of peripherals. One could add a panicky API to seal
peripherals:
impl DWT {
/// Seals this peripheral; all future calls to `borrow` will panic
///
/// This function panics if it's called twice
pub fn seal(cs: &CriticalSection) {
// ..
}
}
Ownership could be emulated by first sealing the peripheral and then having the
abstraction access the peripheral exclusively through the borrow_unchecked
API.
pub struct Timer {
// not Send because this semantically owns `SYST_` which is also not `Send`
_not_send: PhantomData<* mut()>,
}
impl Timer {
pub fn new() -> Self {
// this operation effectively turns this type into an owned singleton
SYST::seal();
Self { _private: () }
}
pub fn set_timeout(&mut self, dur: Duration) {
unsafe {
SYST::borrow_unchecked(|syst| {
// ..
})
}
}
}
Internal vs external resources
In the case of external resources like the TPIU
I think we want to keep the
existing owned singleton / take
API, with the semantics that only one core can
take these peripherals, because more than one core should not access these
resources in an unsynchronized fashion.