DMA based API proposal #14
To elaborate why it doesn't: I think plain references are not enough for memory safety:

```rust
impl Serial {
    // this is a mixture of the original Serial.write_all and Buffer.release
    #[async]
    fn write_all(&self, buffer: &[u8]) -> Result<(), Error> {
        let usart1 = self.0;
        let dma1 = self.1;

        // there's a transfer in progress
        if dma1.ccr4.read().en().bit_is_set() {
            return Err(Error::InUse);
        }

        let buffer: &[u8] = buffer.lock().as_ref();

        // number of bytes to write
        dma1.cndtr4.write(|w| w.ndt().bits(buffer.len() as u16));
        // from address
        dma1.cmar4.write(|w| w.bits(buffer.as_ptr() as u32));
        // to address
        dma1.cpar4.write(|w| w.bits(&usart1.dr as *const _ as u32));
        // start transfer
        dma1.ccr4.modify(|_, w| w.en().set());

        loop {
            if dma1.isr.read().teif4().is_set() {
                return Err(Error::Transfer);
            } else if dma1.isr.read().tcif4().is_set() {
                // `status` comes from the Buffer half of this mixture
                if status == Status::Locked {
                    unsafe { self.unlock() }
                } else if status == Status::MutLocked {
                    unsafe { self.unlock_mut() }
                }
                // clear flag (channel 4, so `ctcif4`, not `ctcif5`)
                dma1.ifcr.write(|w| w.ctcif4().set());
                return Ok(());
            } else {
                // transfer not over
                yield
            }
        }
    }
}
```

Here's an example of using the above API to cause memory unsafety without using `unsafe`:

```rust
fn main() {
    foo();
    bar();
}

fn foo() {
    let mut buffer = [0; 256];

    let gen = SERIAL.read_exact(&mut buffer);

    // the right thing to do is: drive the transfer to completion
    // loop { gen.resume() }

    // but nothing forces me to do that so I can return here,
    // deallocating `buffer` while the DMA transfer is ongoing
}

fn bar() {
    // the DMA transfer continues and corrupts this stack frame
    let x = 0u32;
    let y = 0u32;
}
```
I might be missing something, but what is the point of the …? Conversely, the …
Overall this seems really nice, though I'd strongly prefer just …
Why? A lot of the time (SPI, ADC, DAC, etc.) the DMA transfers `u16` or `u32`, and …
I think this is less common than static buffers but not entirely uncommon; you might well want to allocate something on the stack because you're only going to need it inside this scope, but then still want to use it with DMA before you're done with this function. I think this is much more true of multithreading where you have a stack for each thread/coroutine and so the buffer would live in that thread's stack. On a separate note, what about heap allocated Buffers for systems using dynamic allocation?
I'm aware "you can't rely on `drop` running" …
I've never needed to, but consider most DMA peripherals treat TX and RX as separate streams: you might well want to DMA from a UART, modify the data somehow, then DMA it back out. Or DMA from an ADC, modify the buffer in-place, then replay it out the DAC. That would all require using the Buffer in different channels serially. Streaming the same data out of two independent DACs would require concurrent access (though on the STM32 specifically this isn't an issue as you can use the DACs together by writing a single register). It would be a shame for this use case to be totally impossible...
Can we have a …?
They are implementation details (non user facing APIs) used to implement the safe user facing API (Serial.write_all / Buffer.release)
You can't deadlock, at least not in the Mutex sense. If you try to access a …
Actually it's modelled exactly the same as a RefCell's Ref. But if you want to use the Mutex analogy then it would be equivalent to a RwLock.
You can read the buffer while it's also being read by the DMA in parallel. Not that I can give you an example of why that's useful off the top of my head. But it's a memory safe possibility and I don't see a reason to not allow it; someone may find it useful. If the overhead of the reference count worries you we can add a …
I actually meant to type "array of bytes (e.g. …)".
You mean never-ending threads? I have mentioned before that stuff allocated in never-ending functions (…) …
At first glance it seems that it would have the same problem as stack allocated arrays, as in "there's no way to enforce that the boxed slice won't be deallocated midway through the DMA transfer". I suppose the API could be tweaked to hand over the ownership of the buffer to the DMA (?) during the transfer and then have some …
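A sketch of what that ownership-handoff variant could look like (all names here are hypothetical, added for illustration): the transfer object keeps the box alive, and `wait` gives it back.

```rust
pub struct Transfer {
    buffer: Box<[u8]>, // owned here for the duration of the transfer
}

impl Serial {
    /// Hands the buffer over to the DMA; ownership moves into `Transfer`
    pub fn write_all(&self, buffer: Box<[u8]>) -> Transfer {
        // program the DMA with buffer.as_ptr() / buffer.len() and enable it ..
        Transfer { buffer }
    }
}

impl Transfer {
    /// Blocks until the transfer finishes, then returns the buffer
    pub fn wait(self) -> Box<[u8]> {
        // busy-wait on the transfer-complete flag ..
        self.buffer
    }
}
```

Note that `mem::forget`ting the `Transfer` merely leaks the box; the buffer is never deallocated, so that stays memory safe.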
You can't rely on drop happening :-). You can safely `mem::forget` the value. A proper solution to this "you can't rely on drop" stuff is adding linear types (I think that's what they are called) to Rust. A linear type means "this value MUST be dropped / consumed in this scope (unless you return it from the scope)". Rust has affine types: "this value should be dropped / consumed in this scope but it's fine if it's not (`mem::forget`)".
OK, but if we add this kind of feature we would have to lose some static typing. We would probably have to store in the buffer (i.e. at runtime) which channel it is being used with. Which means an enum of DMA channels rather than encoding the channel as a phantom type. Seems doable but needs prototyping.
Possibly. It will probably involve some ad-hoc traits that are implementation details and that are not supposed to be used by the end user.
You can use an …
It is possible to have unwinding on larger microcontrollers (I know someone who did it on the less modestly sized STM32s) but it seems like an idea bad enough that I'm not sure if we ever want to support it...
This doesn't work. An …
Yeah, that would work but it's limited to `[T; 32]`. This really wants to be …
I was actually thinking of double / nested panics: when unwinding calls a destructor that panics (IDK if, today, that situation causes an abort in std-land). But since ARM micros default to panic = abort, that situation can't occur because the first panic will cause an abort.
Right. So, @adamgreig, we can't rely on destructors to preserve memory safety, due to `mem::forget`.
How bad would it be to use closures to ensure the "destructor" is run? (Like in the case of scoped threads.)
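For illustration, a minimal sketch of what such a closure-based API could look like (the method and helper names are made up):

```rust
impl Serial {
    /// Runs `f` while the DMA serializes `buffer`; the borrow guarantees the
    /// buffer outlives the transfer, like crossbeam's scoped threads
    pub fn with_write_all<F, R>(&self, buffer: &[u8], f: F) -> Result<R, Error>
    where
        F: FnOnce() -> R,
    {
        self.start_transfer(buffer)?; // hypothetical helper: program + enable DMA
        let r = f();                  // user code runs during the transfer
        self.block_until_done();      // hypothetical helper: wait for completion
        Ok(r)
    }
}
```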
It's pretty bad, but I guess you can always fall back to an unsafe or poisoning API.
@Kixunil that would be too limiting. With a closures based API you can't split the start / finish operations into two tasks / interrupts, or create a futures based API to …
Ok, I've gotten more of a chance to look this over. First of all, I believe that standardized shared buffers are probably one of the most critical requirements for building a microcontroller ecosystem -- so thank you for putting in the effort of crafting an API around it.

Simplicity of Data Sharing

I have written the library defrag-rs with the goal of allowing sharing and defragmentation of memory. I strongly advise the API presented there, where memory pools give out ….

The basic idea is that you might have an ecosystem of memory "pool" libraries that keep track of a whole bunch of different data buffers. You would call ….

A few key points: …
Comments on …
I'm not particularly happy about this design. It seems to lock me into dynamic memory management of sorts.
@whitequark the application specific "pools" could be statically allocated buffers where you know all the (typed) indexes at compile time (and they would be checked at compile time) -- or they could be semi-dynamic where there is a set number of known-size buffers, or they could be from a library like defrag-rs. We would have to make sure that ALL of these are supported -- but I think they can be 😄
I agree with @whitequark in that I wouldn't want the API here to be locked into dynamic memory management.

Code-wise …
Like I said above we can have two variants: the current "…" …
@japaric if …
I really think the simplest way to think about this is: we want types that work like data on the heap, but are actually statically allocated.
Also, I agree that data has to be backed statically. Maybe the type should be renamed …
On a slightly different note, a common DMA use case that I don't think is covered here is circular buffers, where a single large buffer is read/written continuously in a loop. Typically you have a "half complete" interrupt as well as a "complete" interrupt, allowing you to perform processing in one half of the buffer while the DMA deals with the other half. I use this to stream ADC data into memory and then deal with half of it while the ADC continues reading into the other half; since the ADC runs at a particular sampling frequency I can't afford to stop/start the DMA or otherwise interrupt it. As far as I can tell the whole purpose of the ….

A very similar use case is double buffered DMA, where the DMA is given two buffers and automatically swaps between them as each one is filled. This might be the easier scenario to handle: the DMA-using function would be given two separate buffers and uses the half-complete/complete interrupts to lock/unlock them in turn. Once you had that working you probably wouldn't need to support circular single buffers, as the user could just take their original buffer and split it into two consecutive buffers. There might be some devices that only support circular buffers and not double buffers: those could have the device-specific implementation require that the two buffers passed in are consecutive in memory, then hand them to the DMA as a single circular buffer and manage locking/unlocking as usual.
I need to catch up on the thread, but I always thought …
@vitiral I don't see how e.g. a memory pool being statically allocated helps here; as long as you have a dynamic memory allocation scheme you can deallocate the buffer the DMA is using and that would be Bad. The problem to solve here is how you prevent that from happening without relying on destructors running for memory safety.

@adamgreig Already on my TODO list. I was wondering if it's possible to support circular / double buffering without creating a new …

@Ericson2314 I don't immediately see how …
That's one option but @adamgreig mentioned above the use case of wanting to use a single statically allocated buffer with several peripherals. With ownership that seems to me like it would get messy really quickly, especially if your peripherals are in …
I've opened #15 with my counter proposal.

@japaric I think the key you might be missing is that you have to pass the ….

When you want to share a ….

I'm pretty sure Rust's lifetimes will prevent you from doing anything stupid here (unless you use …)
Also, it is really important to point out that ….

If libraries defined a type that implements ….

Therefore …
Rethinking Things

I've been thinking about this some more, and I think we are taking a fundamentally wrong approach here. Where we are wrong is that data needs to be locked at all. I felt like this at the beginning because I had @japaric's RTFM blog post in the back of my head, which guarantees, among other things, "deadlock free execution guaranteed at compile time."

Rust's std library gets away with avoiding a concurrency primitive (i.e. greenthreads) because: …
I get the impression that this API is trying to be "general" like the stdlib -- for use with frameworks other than RTFM (like TockOS, or with no concurrency framework). I think this is a serious mistake. RTFM allows us to have lock and deadlock freedom with no performance overhead. For the embedded standard library of the future we need to be opinionated about our concurrency primitives in order to realize such massive gains.

Proposal

In REQ-layers I have laid out what I think are the primary layers necessary for development. However, I am now thinking that the ….

In summary, the following layers should be based on RTFM; the other layers will be general and could be used in any embedded project: …
What this means for this issue

Neither the …
I don't see how that would work except for having the method block for the …
Yes, that's an explicit design goal, cf. bullet 3: callback model = RTFM tasks.
I don't think it's a mistake. If we want to have code sharing to prevent every ….

If you want a HAL tailored for RTFM tasks then you can build a …
Indeed it does that as far as concurrency and data race freedom are involved. I've tested this DMA based API with the RTFM framework and the runtime checks …
Note that RTFM tasks are not the only way to do multitasking in the RTFM ….

TIL (a few days ago actually) that you can use the `Unsize` trait:

```rust
// note: the missing `<A>` type parameter has been added here
fn foo<A>(array: &A)
where
    A: Unsize<[u8]>,
{
    let slice: &[u8] = array;
}

foo(&[0; 33]);
foo(&[0; 1024]);
```

The problem is that the ….

I was thinking about a non-stoppable circular DMA API and I can't see a way to ….

Anyhow the problem I see goes like this:

```rust
struct CircBuffer<A> {
    first_half: A,
    second_half: A,
    // plus some other state to track which half is used by the DMA or borrowed
    // by the main processor
}

// you start a circular DMA transfer that uses the CircBuffer
let circ_buffer = ..;
start_circular_dma(circ_buffer);

// now you borrow the second half while the DMA is using the first half
let half: &mut [u8] = circ_buffer.borrow_mut();

loop {
    // do stuff with half, but never break from `loop`
}
```

The problem there is that, because the DMA transfer never stops, at some point ….

The problem goes away (maybe?) if the DMA transfer is not "automatically" …
That's ironic; I knew about …
Assuming it's going to be stabilized that seems fine, since all embedded work is nightly-only anyway for the foreseeable future.
Hmm. Maybe. For my use cases I have a timer trigger the ADC, which takes a sample, and then the ADC fires a DMA request to move a word from the ADC register to memory. To avoid missing any samples you need to restart the DMA after the transfer complete interrupt, but before the next sample completes. At full sampling rate and full clock speed you have perhaps 70 cycles between samples, fewer once the DMA has actually fired an interrupt, which I think might just be doable but seems pretty tight to rely on. At lower sample rates (you might only be sampling at a few 10s of ksps) you'd have loads of time, no problem. It becomes mildly annoying to have to restart the DMA each transfer, but it can be made nice by a nice API, and the unsafe hatch is still there to use if you'd rather. I'd still worry about potentially missing a sample because the API can't ensure that your DMA-restarting code actually runs in time, but at least with a suitable interrupt priority and so forth it should be possible. At high sample rates you might just have to accept that it will be an unsafe thing and you have to be a little careful with your circular buffer?

I agree with your analysis -- we can't statically enforce that the code using the reference to one half will complete before the DMA swaps halves. Ultimately if your code hits this problem it's because it already cannot keep up with the data rate, so you were sort of lost from the beginning anyway. I'd still like to find some good way of doing circular/double buffers because it is a fairly common use case for datalogging and DSP and most things where you are running the ADCs at a serious rate, but it seems pretty tough. Perhaps the best we can do is an easy to use but unsafe API, with suitable documentation around it. It might also be possible to detect the failure condition after the fact in some cases -- when the user asks for the other half of the buffer but the DMA has already swapped to it, for instance -- and throw a panic when that happens, just to aid development/debugging.

My feeling coming from C is that I'm not that unhappy about a few unsafe blocks here and there, especially if they can be relatively benign to handle. I'd be handling everything carefully in C, instead. But I worry this blinds me to what might otherwise be possible with Rust!

PS. 👍 for keeping …
We can add a new auto trait for non-mandatory destructors and make `Rc` and …
Tail-calls down the road might somewhat complicate this, but hopefully there is a work around.

[In general low-level event-driven code is fundamentally written in a continuation-passing style, and Rust doesn't really deal well with that on many levels. I was hoping that with the focus on ….]

The DMA I've thought most about is high bandwidth NICs ("thought" as in I haven't written anything actual here, so please take with a grain of salt!). The "buffer descriptor queue" model many use makes ownership rather than borrowing more natural -- even if extra data needs to be tracked, it can be stored in a "side ring buffer" I suppose (morally equivalent to a ring buffer with bigger cells as they share head / tail state, but compatible with the NIC ABI).
I suppose since …
@japaric I think I am beyond my capability of understanding things here. I must just not be wrapping my head around the DMA. Some quick comments I think could help: …
Are you saying the method goes out of scope before it completes, without being able to store its buffer somewhere? Who is owning the …? Nobody keeps ownership of the ….

I think I'm starting to get it... it is so much lower-level than memory management -- this is really low level stuff. Most of what I've been saying has been really not helpful -- sorry about that.
Even when dealing solely with DMA initiated by the CPU (as opposed to the NIC case, where the outside world sending packets triggers DMA), I think an ownership-based model can still work, with some global data structure owning the buffer on the peripheral's behalf:
This relies on either one pending DMA request at a time, or some way to disambiguate between pending requests to key into the parking lot. I assume that won't be an onerous requirement as other stuff probably needs it anyways.
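A sketch of that "parking lot" idea (the `Mutex` type and function names here are stand-ins, not an existing API):

```rust
// owns the buffer on the peripheral's behalf while a transfer is pending
static PARKED: Mutex<Option<Box<[u8]>>> = Mutex::new(None);

fn start_transfer(buffer: Box<[u8]>) {
    // program the DMA with buffer.as_ptr() / buffer.len(), enable it ..
    *PARKED.lock() = Some(buffer); // the lot keeps the buffer alive
}

fn transfer_complete_handler() {
    // the DMA is done: reclaim ownership; the buffer can now be reused
    let buffer = PARKED.lock().take().unwrap();
    // ..
}
```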
They are semantically the same thing once you have one in your hands. The ….

Example:

```rust
static B: Mutex<Buffer> = ..;

fn main() {
    let buffer = cortex_m::interrupt::free(|cs| { // interrupts are disabled here
        // let's assume that borrow returns `&'static Buffer`
        // (the existing API returns `&'cs Buffer` where `'cs` is the lifetime of
        // this critical section)
        let buffer: &'static Buffer = B.borrow(cs);

        buffer // now buffer can escape this critical section (because of `'static`)
    }); // interrupts are re-enabled here

    // BAD the buffer can be accessed outside the critical section; this is racy
    buffer.borrow();
}
```
Visually this is more or less what happens: (sorry for the crappy diagram)

```rust
// CPU                                  DMA
let buffer = BUFFER.lock();
// ..
serial.write_all(buffer).unwrap();  // Ref1 ------------->
                                    //               Ref1
let n = buffer.borrow().len();      // Ref2          Ref1
                                    //               Ref1
block! {                            //               Ref1
    buffer                          // <------------ Ref1
        .release()                  // drop(Ref1)
}.unwrap();
```

When the DMA transfer starts the CPU conceptually sends a …
I can't see how this can be implemented efficiently. At first glance it seems ….

The important bit here is that you may want to do something like this: …
Context 1 and 2 may have different stacks (a context could be a thread or a …). The …
@japaric why doesn't …
The lot would hold an owning pointer (or …). Now getting such a pointer will also require some other abstraction, or plain unsafety, but that is nicely decoupled from the DMA interface itself.
@japaric somewhat relatedly, I wish we could label a function "call once" such that it has exclusive access to any statics declared within.
@vitiral
OK. I think this is getting too far into hypothetical territory. We want to build a HAL that we can use today, so it can't depend on or wait for features with no known timeline -- much less on features with unknown likelihood of ever being added to the language.
Sorry, do you have a concrete example / code snippet? I don't see how this would work. The first issue I see is that the DMA is not a general purpose processor and it can't run arbitrary code, so you can't literally send an owning pointer to the DMA and then have it back when the DMA transfer is over. The DMA has no memory to stash the buffer / pointer in. What has to happen is that the CPU must poll the DMA to see if the DMA transfer finished; then the CPU can mark a buffer as no longer owned by the DMA.

Secondly, I don't see how a memcpy is not required in the example I gave where the DMA transfer is split into two contexts: one that allocates the buffer and starts the transfer on that buffer, and a second context that marks the transfer as done and uses the buffer. The first context may be deallocated when the second context executes, so a memcpy is required for memory safety AFAICT.
That sounds effect system-y (*) and viral: a function must be "call once" to call another "call once" function. Or perhaps it would be ….

Anyhow, "call once"-ness sounds like it's more strict than necessary. We have ….

(*) Another feature that doesn't exist today and with an unknown timeline.
@japaric I think what I am proposing is less exotic and different from yours than it sounds? I do like to think about features we don't have, but that's not the core of it. First of all, in your ….

[For the sake of showing the whole interaction, I make this method do triple duty: it sets the handler on completion, and the hardware reads the buffer and then writes information back before interrupting. In practice things would probably be broken apart, but then I'd need to write more functions to convey my point :).]

```rust
static HANDLER: RacyCell<Option<(
    unsafe fn(usize, usize),
    usize,
    usize,
)>> = RacyCell::new(None);

pub fn read_and_write_all<RR, RW, B, F>(
    &self,
    read_buffer: RR,
    write_buffer: RW,
    new_handler: F,
) -> Result<(), Error>
where
    RR: Deref<Target = B> + 'static,
    RW: DerefMut<Target = B> + 'static, // Deref*Mut* for now, 'cause it's just bytes
    B: Unsize<[u8]> + ?Sized,
    F: fn(RW, RR), // sic -- meant as a plain function pointer
{
    // Ensure ABI of handler
    // With associated constants maybe we can do this statically!
    debug_assert_eq!(size_of::<RW>(), size_of::<usize>());
    debug_assert_eq!(size_of::<RR>(), size_of::<usize>());

    // There's a transfer in progress
    if self.dma1.ccrw4.read().en().bit_is_set() {
        return Err(Error::InUse);
    }
    debug_assert!(HANDLER.get().is_none());

    // number of bytes to read
    self.dma1.cndtr4.write(|w| w.ndt().bits(read_buffer.len() as u16));
    // read address
    self.dma1.cmar4.write(|w| w.bits(read_buffer.as_ptr() as u32));
    // number of bytes to write
    self.dma1.cndtw4.write(|w| w.ndt().bits(write_buffer.len() as u16));
    // write address
    self.dma1.cmaw4.write(|w| w.bits(write_buffer.as_ptr() as u32));

    // Store handler and buffer pointers for later.
    //
    // This is really just a lousy closure but we
    // pretend that we actually moved the buffer pointers to the peripheral and
    // got them back with the interrupt.
    *HANDLER.get() = Some((
        new_handler as unsafe fn(usize, usize),
        transmute(write_buffer),
        transmute(read_buffer),
    ));

    // start transfer
    dma1.ccrw4.modify(|_, w| w.en().set());

    Ok(())
}
```

Your …
@Ericson2314 Thanks for your example. Your implementation looks OK (*) but it doesn't inter-operate with the existing …
Neither of the existing abstractions (Local / Mutex / Resource) returns a ….

Local doesn't return a …:

```rust
// (This function can't be called by the processor because EXTI0 has no
// constructor. The hardware (the NVIC) is what will call this function and
// will never call `interrupt_handler` while an `interrupt_handler` is being
// currently executed)
fn interrupt_handler(mut task: EXTI0) {
    static FOO: Local<Thing, EXTI0> = Local::new(..);

    // NOTE this doesn't freeze `task` because there's no "connection" between
    // the lifetime of the input `&mut task` and the output
    let foo: &'static mut Thing = FOO.borrow_mut(&mut task);

    // Two `&mut -` references to the same data
    let alias_foo: &'static mut Thing = FOO.borrow_mut(&mut task);
}
```

What abstraction do you propose to use with this API to let users write zero ….

(*) I'm personally wary of putting static variables in functions as that makes ….

I was revisiting the circular buffer abstraction and I think I have come up with …. (I'm going to write this using "const generics" (…) syntax.)

```rust
struct CircBuffer<const N: usize, T>
where
    T: AtomicRead,
{
    buffer: [T; 2 * N],
    status: Cell<Status>,
}

unsafe trait AtomicRead {}

unsafe impl AtomicRead for u8 {}
unsafe impl AtomicRead for u16 {}
unsafe impl AtomicRead for u32 {}

enum Status {
    /// Not in use by the DMA
    Free,
    /// The DMA is mutating the first half of the buffer
    MutatingFirstHalf,
    /// The DMA is mutating the second half of the buffer
    MutatingSecondHalf,
}

/// Some analog pin
struct PA0;

impl PA0 {
    /// Start a conversion and store the values in `buffer`
    fn start<const N: usize>(&self, buffer: Ref<CircBuffer<N, u16>>) -> Result<(), Error> {
        buffer.lock(); // buffer.status = MutatingFirstHalf

        // Start circular DMA transfer with
        // addr = buffer.buffer.as_ptr() and len = 2 * N
    }
}
impl<const N: usize, T> CircBuffer<N, T> {
    fn read(&self) -> nb::Result<&[RO<T>], Error> {
        let status = self.status.get();
        assert_ne!(status, Status::Free); // transfer not started!

        let isr = DMA1.isr.read();
        if isr.teif1().is_set() {
            Err(nb::Error::Other(Error::Transfer)) // transfer error
        } else {
            match status {
                Status::MutatingFirstHalf => {
                    if isr.tcif1().is_set() {
                        // we didn't call `read` timely and the DMA is already
                        // mutating the first half again!
                        Err(nb::Error::Other(Error::Overrun))
                    } else if isr.htif1().is_set() {
                        // DMA done with the first half
                        // (clear flag, etc.)
                        let buffer: &[T] = unsafe { &(*self.buffer.get())[..N] };
                        // some magic here to transform &[T] into &[RO<T>]
                        // ..
                    } else {
                        // transfer not done
                        Err(nb::Error::WouldBlock)
                    }
                }
                Status::MutatingSecondHalf => {
                    if isr.htif1().is_set() {
                        Err(nb::Error::Other(Error::Overrun))
                    } else if isr.tcif1().is_set() {
                        let buffer: &[T] = unsafe { &(*self.buffer.get())[N..] };
                        // ..
                    } else {
                        Err(nb::Error::WouldBlock)
                    }
                }
                Status::Free => unreachable!(), // see `assert!` above
            }
        }
    }
}
```
```rust
// example
let buffer: Ref<CircBuffer> = ..;

PA0.start(buffer);

{
    // blocks until first half of the transfer is completed
    let first_half: &[RO<u16>] = block!(buffer.read());

    let x = first_half[0].read();
    let y = first_half[1].read();
    // ..
}

{
    // blocks until the second half of the transfer is completed
    let second_half: &[RO<u16>] = block!(buffer.read());

    let x = second_half[0].read();
    let y = second_half[1].read();
    // ..
}
```

Hopefully that's relatively straightforward except for the ….

The solution I chose is to only ever have read-only access to the circular ….

The …:

```rust
let half = block!(circ_buffer.read());
let x = half[0].read();
let y = half[0].read();

if x == y {
    // foo
} else {
    // the compiler *can't* optimize this branch away
    // this is what we want because the DMA may have mutated `half[0]` between
    // the two consecutive reads
}
```

There's a problem though: the selection of ….

```rust
// goal: memcpy `circ_buffer_half` into `other_buffer`
for (i, slot) in other_buffer.iter_mut().enumerate() {
    *slot = circ_buffer_half[i].read(); // RHS is a volatile read
}
```

The loop above doesn't lower to a `memcpy` ….

This problem could probably be fixed by returning an …
@japaric While dereferencing a lock guard gives us a borrowed pointer with a non-static lifetime, the lock guard itself should have no problem being `'static`.

With …
Right.
Hmm, how would the user safely put the …?
Instead of saying the buffer items must be sizes that can be read atomically, I think it makes more sense to require that they are sizes the DMA can handle? The STM32 DMA can read/write memory in sizes of u8, u16, and u32, and will do those atomically, but can't transfer any other sizes. Additionally it can be different between peripheral and memory, i.e. you can read a u32 from a peripheral register and write it as four consecutive u8 writes to memory. So it might make more sense to impl some DMA newtype for the sizes the device DMA operates on. The idea that this then prevents inconsistent values I'm less sure about -- e.g. if I ….

As an aside, this pattern only covers the peripheral-to-memory case (e.g. ADC) but would need extending for memory-to-peripheral (DAC etc.). Perhaps the enum variants should just be Free/FirstHalf/SecondHalf without "Mutating" for the cases where the DMA is reading? And we'd need a ….

As a more minor aside, I think N should be the total buffer length, not the length of a half, but that's a bit bikesheddy and not that big a deal.

It's a shame we lose the lowering to memcpy. I think it'd be worth having the AtomicSlice or similar wrapper to allow easier/faster copying, as it's probably a common use case for this buffer (e.g. copying it into a network buffer for transmission or something).
I think you mean LIFO? I'm not sure there is a way to force LIFO across a continuation, so there might not be a way to do that safely regardless. Moreover, it is not clear to me that one is even allowed to acquire resources across a job boundary with RTFM/SRP—though before I read your linked papers I knew next to nothing about hard real-time, so take that with a grain of salt. I'm not sure there is a safe way to get the …
OK. Whole new idea based on your last comment. Forget about the atomic reads and ….

I have changed …:

```rust
impl<const N: usize, T> CircBuffer<N, T> {
    fn read<F, R>(&self, f: F) -> nb::Result<R, Error>
    where
        F: FnOnce(&[T; N]) -> R,
    {
        let status = self.status.get();
        assert_ne!(status, Status::Free); // transfer not started!

        let isr = DMA1.isr.read();
        if isr.teif1().is_set() {
            Err(nb::Error::Other(Error::Transfer)) // transfer error
        } else {
            match status {
                Status::MutatingFirstHalf => {
                    if isr.tcif1().is_set() {
                        // we didn't call `read` timely and the DMA is already
                        // mutating the first half again!
                        Err(nb::Error::Other(Error::Overrun))
                    } else if isr.htif1().is_set() {
                        // DMA done with the first half
                        // (clear flag, etc.)
                        let half: &[T; N] =
                            unsafe { &(*self.buffer.get())[..N] };

                        let ret = f(half);

                        // NOTE: re-read the flag here; the `isr` read above is
                        // stale by the time `f` returns
                        if DMA1.isr.read().tcif1().is_set() {
                            // executing `f` took too long and the DMA is
                            // already mutating the first part again. This is
                            // an overrun error
                            Err(nb::Error::Other(Error::Overrun))
                        } else {
                            Ok(ret)
                        }
                    } else {
                        // transfer not done
                        Err(nb::Error::WouldBlock)
                    }
                }
                Status::MutatingSecondHalf => {
                    // counterpart of MutatingFirstHalf
                }
                Status::Free => unreachable!(), // see `assert!` above
            }
        }
    }
}
```

Now usage looks like this:

```rust
let buffer: Ref<CircBuffer> = ..;
PA0.start(buffer);

{
    // blocks until the first half of the transfer is completed,
    // then executes the closure on the first half of the buffer
    let average = block!(buffer.read(|half| {
        half.iter().cloned().sum::<u16>()
    })).unwrap();
    // ..
}

{
    // This copies the half DMA buffer onto the stack.
    // This does optimize into a memcpy (!)
    let second_half = block!(buffer.read(|half| *half)).unwrap();
    // ..
}
```

So what happens now is that once the transfer is done the closure passed to ….

I think this is general enough that the closure could directly operate on ….

Yeah, sorry I meant LIFO.
That looks nice @japaric, good idea. I agree that it could work with an &mut and therefore be the same for reading as writing, though I'm not sure yet what a good name would be... I still think the variant names should change from …
@japaric So from the papers I got the impression that at the beginning and end of a run of a job, no resources were held: the corresponding release to every acquire occurs within the same job. Thinking about it some more, it is probably safe to extend that model to allow acquiring a resource over a job boundary as one would for DMA. [Safe in terms of functional correctness; I'm not sure it doesn't adversely affect meeting deadlines.] The only requirement I see to continue avoiding deadlocks is to ensure that if job A initiates a DMA with continuation job B, then job A can only begin if any resources potentially needed by either A or B are unused.

Also, despite the literature, I can't see why the LIFO requirement actually matters. Due to the shared stack, jobs will actually acquire resources in a LIFO manner regardless. That alone seems sufficient to prevent deadlocks. The manipulation of priorities does become more complicated, but this is no problem and probably doesn't even have a run-time cost.

Anyways I think …
Yeah, that makes sense.
No resource that the task will use can be in use by other tasks when it starts; otherwise it shouldn't have started in the first place. That's the basic gist.
Yes, you can model it as if a process (a sequence of tasks) held the resource for its whole duration.
But the compiler doesn't know if this will be the case or not, so you still need runtime checks to panic or raise an error if this situation (A happens again before B ends -- that would indicate a scheduling problem) occurs. At least if you want to keep the API safe (no …).
You need to nest critical sections in a LIFO manner to access resources with different ceilings within a single task to keep the model sound. Using drop guards to unlock is not sound. See below:

```rust
static R1; // resource with ceiling = 3 (the original said `R3` here, but the
           // code below clearly means `R1`)
static R2; // resource with ceiling = 2

// with priority 1
fn task1() {
    // starts with preemption threshold = 1
    let r1 = R1.claim_mut(); // raises preemption threshold to 3
    let r2 = R2.claim_mut(); // no change in preemption threshold
    ..;
    drop(r1); // drops preemption threshold to 1
    ..;
    // <~ preempted by task2
    ..;
    drop(r2);
}

// with priority 2
fn task2() {
    // BAD data race. task2 has a reference to R2 while task1 also has a
    // reference to it
    let r2 = R2.claim_mut();
}
```

(preemption threshold = the priority another task must have to be able to preempt the current task / scope)

Things would have been fine if LIFO order was maintained:

```rust
fn task1() {
    // starts with preemption threshold = 1
    let r1 = R1.claim_mut(); // raises preemption threshold to 3
    let r2 = R2.claim_mut(); // no change in preemption threshold
    ..;
    drop(r2);
    ..;
    // <~ task2 tries to preempt but it can't
    ..;
    drop(r1); // drops preemption threshold to 1

    // task2 preempts task1 here
    ..;
}
```

No data race in this case. But it's better to use closures instead of drop guards to make the first situation impossible:

```rust
fn task1() {
    // starts with preemption threshold = 1
    R1.claim_mut(|r1| { // raises preemption threshold to 3
        ..;
        R2.claim_mut(|r2| { // no change in preemption threshold
            // <~ task2 tries to preempt but it can't
        }); // no change in preemption threshold
    }); // drops preemption threshold to 1

    // task2 preempts task1 here
    ..;
}
```
It was designed to work with tasks and resources.
With `Local` you can't move the buffer around across tasks because the data is pinned to a single task. The `'static` requirement can't be met by task local data, so with your API tasks can't start a DMA transfer. In the upcoming version of RTFM the only place where a `&'static mut` safely occurs is in the …
I mean that your Buffer is not an SRP resource; it requires dynamic checks to safely access. It does work with RTFM, but so could mine, as one could not get a …
The callback can take an unconstructable ZST that the caller also provides -- morally handing off the capability.
I think this is the problem, not the lack of FIFO itself. We have something very similar to an immutable borrow going: as long as at least one resource with preemption ceiling N is held, the threshold must be at least that high. [Now with a single threshold this isn't terribly interesting other than for making a safe interface with guards---one could just make their borrows FIFO and have the same behavior, as only the threshold matters. But if the implementation uses interrupt masks instead, we need not have totally ordered prevention levels: set inclusion induces a partial order, and mask traces like ….]

While in principle Rust has the power to type check this statically with the borrow checker, that is not exposed to the user, so we'd need to actually maintain counters the guards can decrement. Oh well; maybe someday. The counters would be guaranteed never to go below 0, however.
I realize now Local was never a good fit, because we need to mask the initiator task while the peripheral and then the callback owns the buffer. But with the lock-guard-and-counter approach above, I think we do have ourselves a safe and ergonomic interface.
It would be really awesome if somebody could come up with some APIs for DMA that we could add to the hal library. At least I am struggling to get a good design :)
@nikhilkalige I plan to revisit this after RFC japaric/cortex-m-rtfm#59 (safe ….

At this point I'm unsure what (traits? structs?) should eventually land in embedded-hal to promote a common DMA API across device crates.
I'm going to close this RFC since there's a simpler approach to DMA in the works. The new approach will get its own RFC posted in this issue tracker. |
For reference, said simpler approach is discussed in #37
In this issue I'm going to present the DMA API I have implemented for the blue-pill, and that I have used in my recent robotic application, with the goal of starting a discussion about DMA based APIs.
Pre-requisites

You should understand how `RefCell` works and how it achieves dynamic borrowing.

Detailed design
Core idea
The DMA peripheral behaves like an "external mutator". I like to think of it as an independent processor / core / thread that can mutate / access data in parallel to the main processor. From that POV a DMA transfer looks like a fork-join operation like the ones you can do in std-land with threads.
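To make the fork-join analogy concrete, here is the std-land equivalent (plain `std::thread` code, added purely for illustration; it won't run on a microcontroller):

```rust
use std::thread;

fn main() {
    let mut buffer = vec![0u8; 256];

    // "fork": the worker plays the role of the DMA and gets
    // (temporary, exclusive) access to the buffer
    let dma = thread::spawn(move || {
        buffer.iter_mut().for_each(|b| *b = 42); // "the transfer"
        buffer
    });

    // the "processor" is free to do other work here ..

    // "join": the transfer is over; we get access to the buffer back
    let buffer = dma.join().unwrap();
    assert!(buffer.iter().all(|b| *b == 42));
}
```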
With that mindset: in a DMA transfer you want to hand out "ownership" of the data / buffer from the processor to the DMA for the span of the transfer and then claim it back when the transfer finishes. Except that "ownership" in the full sense ("the owner is in charge of calling the destructor when the value is no longer needed") is not applicable here, because the DMA can't destroy the buffer it is using for the transfer. So instead of ownership, what the proposed API transfers is temporary read or write access to the data.
Read or write access sounds like `&-` and `&mut-` references, but the proposed API won't use those. Instead it will use buffers with `RefCell`-style dynamic borrowing.

Buffer

The main component of the proposed API is the `Buffer` abstraction:
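(The code block showing the struct was lost in this copy; below is a reconstruction based purely on the field descriptions that follow, so the exact field types are assumptions.)

```rust
// reconstruction, not the original code block
pub struct Buffer<B, CHANNEL> {
    data: UnsafeCell<B>,            // B is an array of bytes, e.g. [u8; 16]
    flag: Cell<usize>,              // RefCell-style borrow flag
    status: Cell<Status>,           // is the DMA currently using the buffer?
    _channel: PhantomData<CHANNEL>, // e.g. Dma1Channel4
}
```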
The `data` and `flag` fields effectively form a `RefCell<B>`. Although not explicitly bounded, `B` can only be an array of bytes (`[u8; N]`). The `CHANNEL` type parameter indicates which DMA channel this buffer can work with; possible values are phantom types like `Dma1Channel1`, `Dma1Channel2`, etc. Finally, the `status` field indicates if the DMA is currently in possession of the buffer.
`Buffer` exposes public `borrow` and `borrow_mut` methods that behave exactly like the ones `RefCell` has. `Buffer` also exposes private `lock` and `unlock` methods that are meant to be used only to implement DMA based APIs.
The `lock` and `unlock` pair behaves like a split `borrow` operation. A `borrow` call will check that no mutable references exist and then hand out a shared reference to the data wrapped in a `Ref` type, while increasing the `Buffer`'s shared reference counter by one. When that `Ref` value goes out of scope (is destroyed), the `Buffer`'s shared reference counter goes down by one. A `lock` call does the first part: it checks for mutable references, increases the counter and hands out a shared reference to the data. However, the return type is a plain shared reference `&T`, so when that value goes out of scope the shared reference counter won't be decremented. To decrement that counter, `unlock` must be called. Likewise, the `lock_mut` and `unlock_mut` pair behaves like a split `borrow_mut` operation.
You can see why it's called `lock` and `unlock`: once `lock`ed, the `Buffer` can no longer hand out mutable references (`borrow_mut`) to its inner data until it gets `unlock`ed. Similarly, once a `Buffer` has been `lock_mut`ed it can't hand out any reference to its inner data until it gets `unlock_mut`ed.

DMA based API
Now let's see how to build a DMA based API using the `Buffer` abstraction. As an example we'll build a `Serial.write_all` method that asynchronously serializes a buffer using a DMA transfer.
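(The code block with the implementation was lost in this copy; it is quoted in full in the first comment of this thread. For the next paragraph, the relevant part is the signature, which looks roughly like this:)

```rust
impl Serial {
    /// Starts a DMA transfer that serializes `buffer`
    pub fn write_all<'a>(
        &self,
        buffer: Ref<'a, Buffer<[u8; 16], Dma1Channel4>>,
    ) -> Result<(), Error> {
        // check the channel is free, lock the buffer,
        // program the DMA channel registers, enable it ..
        Ok(())
    }
}
```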
Aside from the `Ref` in the function signature, which is not a `cell::Ref` (it's a `static_ref::Ref`), this should look fairly straightforward. For now read `Ref<'a, T>` as `&'a T`; they are semantically equivalent. I'll cover what the `Ref` newtype is for in the next section.

Let's see how to use this method:
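(The usage example was lost in this copy; a reconstruction mirroring the diagram further up the thread, so the exact names are assumptions:)

```rust
// `BUFFER` lives in a `static`, so it can never be deallocated
static BUFFER: Buffer<[u8; 16], Dma1Channel4> = /* .. */;

let buffer: Ref<Buffer<[u8; 16], Dma1Channel4>> = BUFFER.lock();
serial.write_all(buffer).unwrap(); // returns as soon as the transfer starts
```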
At this point the transfer is ongoing and we can't mutably access the buffer while the transfer is in progress. How do we get the buffer back from the DMA?
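(The `release` implementation was also lost; here is a sketch of its shape, assuming the same STM32F103 register API and the `nb`-style signature used in the diagram further up:)

```rust
impl<B> Buffer<B, Dma1Channel4> {
    /// Checks if the DMA transfer is over and unlocks the buffer if it is
    pub fn release(&self) -> nb::Result<(), Error> {
        if DMA1.isr.read().teif4().is_set() {
            Err(nb::Error::Other(Error::Transfer))
        } else if DMA1.isr.read().tcif4().is_set() {
            DMA1.ifcr.write(|w| w.ctcif4().set()); // clear the flag
            unsafe { self.unlock() }
            Ok(())
        } else {
            Err(nb::Error::WouldBlock) // transfer still in progress
        }
    }
}
```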
`Buffer.release` is a potentially blocking operation that checks if the DMA transfer is over and `unlock`s the `Buffer` if it is. Note that the above `impl`ementation is specifically for `CHANNEL == Dma1Channel4`; similar implementations can cover the other DMA channels.

Continuing the example:
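(Another lost block; reconstructed from the diagram further up the thread:)

```rust
// shared access is still allowed while the transfer is ongoing
let n = buffer.borrow().len();

// blocks until the DMA is done, then unlocks the buffer
block!(buffer.release()).unwrap();
```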
Alternatively, using callbacks / tasks: …
static_ref::Ref

The `Ref<'a, T>` abstraction is a newtype over `&'a T` that encodes the invariant that the `T` to which the `Ref` points is actually stored in a `static` variable and thus can never be deallocated.

The reason `Ref` is used instead of just `&-` in the `write_all` method is to prevent using stack allocated `Buffer`s with that method. Stack allocated `Buffer`s are rather dangerous as there's no mechanism to prevent them from being deallocated. The `foo` / `bar` snippet quoted in the first comment of this thread shows what can happen if a `Serial.read_exact` method used `&-` instead of `Ref`.

Unresolved questions
Is there an equally flexible (note that with this approach the DMA transfer start and finish operations can live in different tasks / interrupts / contexts) alternative that involves no runtime checks?
Should we extend this to also work with `Buffer`s allocated on the stack? My first idea to allow that was to add a "drop bomb" to `Buffer`: the destructor would panic if there's an outstanding `lock`. See the sketch below. (But this is probably a bad idea since a panicking destructor is equal to an abort (?))
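(A sketch of that "drop bomb", with assumed field and variant names:)

```rust
impl<B, CHANNEL> Drop for Buffer<B, CHANNEL> {
    fn drop(&mut self) {
        // deallocating a buffer the DMA is still using would be unsound,
        // so refuse to be dropped while locked
        if self.status.get() != Status::Unlocked {
            panic!("`Buffer` dropped while the DMA still has access to it");
        }
    }
}
```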
Ref::new
exists so you can create aBuffer
on the stack and wrap a reference to it in aRef
value. This will be like asserting that the buffer will never get deallocated. Of course it's up to you to ensure that's actually the case.In the blue-pill implementation
Buffer
is defined in theblue-pill
crate itself but to turn DMA based APIs into traits we would have to move thatBuffer
abstraction into theembedded-hal
crate. That complicates things because (a)lock
et al. would have to become public and (b) e.g.Dma1Channel1
wants to be defined in thestm32f103xx-hal-impl
crate but that means one can't writeimpl Buffer<B, Dma1Channel1>
due to coherence. I have no idea how to sort out these implementation details.This API doesn't support using a single
Buffer
with more than one DMA channel (neither serially or concurrently). Is that something we want to support?Since we have the chance: should we reduce the size of the
flag
field? The core implementation usesflag: usize
but4294967295
simultaneouscell::Ref
sounds like a very unlikely scenario in a microcontroller.The text was updated successfully, but these errors were encountered: