-
Notifications
You must be signed in to change notification settings - Fork 167
Add cache maintenance functions for ARMv7-M #40
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@adamgreig Thanks for working on this! I had no idea these registers even existed. I left some comments about the Rusty-ness of the proposed API.
About your questions:
Testing that these worked as expected was as fun as you'd imagine but I'm confident they do what's expected and match the CMSIS implementation.
As I have no idea what this is really doing and have no way to test this I'll trust you ;-)
I'm not super happy at how you have to pass &cpuid into some of the SCB cache methods, but I can't see any nice way around this.
I suppose we could merge the two register blocks into a single one and have all methods on that struct.
The SCB methods currently accesses the CBP block with get() but as this whole block is WO, stateless, and just consists of memory-mapped operations, I assume this isn't a problem
Seems fine to me.
Calling invalidate_dcache while the dcache is enabled is generally a poor idea
Perhaps we can return a Result::Err
if invalidate_dcache is called when the DCache was enabled (assuming there's some way of checking that the DCache is enabled at runtime).
it's not clear when you'd actually want to call invalidate_dcache (outside of enable_dcache which calls it before enabling the cache). I included it because it's in CMSIS
In that case perhaps we can keep that method private until someone specifally requests it.
src/peripheral/mod.rs
Outdated
@@ -100,13 +104,65 @@ pub struct Cpuid { | |||
pub isar: [RO<u32>; 5], | |||
reserved1: u32, | |||
/// Cache Level ID | |||
#[cfg(armv7m)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
observation: technically breaking change even if it probably made no sense to use these on ARMv6-M.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is that a problem? I can drop this cfg gate if you want, but it will never have done anything useful on ARMv6-M and cortex-m
is still <v1.0 too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm fine with making breaking changes but I'd like to minimize the number of minor version bumps when possible, specially when those changes require breaking changes in other crates. There's already one breaking change scheduled for v0.3.0: #41 and I have another in mind. How about postponing this one so it lands with the other two?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep sure, makes sense to combine the breaking changes into one release. Want to postpone this whole set of features or just the cfg flag on these registers?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just the cfg flags
src/peripheral/mod.rs
Outdated
} | ||
|
||
#[cfg(armv7m)] | ||
const CSSELR_IND_POS: u32 = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could these constant be moved into a cfg'd module and then imported to reduce the number of cfg attributes? Something like this:
// Sub-Architecture Specific
#[cfg(armv7m)]
mod sas {
pub const FOO: u32 = 0;
}
#[cfg(not(armv7m))]
mod sas {}
use self::sas::*;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps we can just drop the cfg entirely. They're const, they're only used in methods that are behind cfg gates, they're not pub, so they simply won't appear when not being used.
src/peripheral/mod.rs
Outdated
/// * `level`: the required cache level minus 1, e.g. 0 for L1, 1 for L2 | ||
/// * `ind`: select instruction cache (1) or data/unified cache (0) | ||
#[cfg(armv7m)] | ||
pub fn select_cache(&self, level: u32, ind: u32) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should probably accept enums as inputs to avoid passing non-sensical values like level = 9_001
. Either that or assert the valid range.
Same with the functions below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll add a panic for level and make ind an enum, which I think best matches what they're used for.
src/peripheral/mod.rs
Outdated
/// | ||
/// * `level`: the required cache level minus 1, e.g. 0 for L1, 1 for L2 | ||
/// * `ind`: select instruction cache (1) or data/unified cache (0) | ||
#[cfg(armv7m)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The cfg attribute on all these methods can be converted into a single cf attribute on the enclosing impl block.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, will do. I avoided it because "what if you wanted some other non-arm7-specific methods on CPUID"? but of course they can go in another impl block.
src/peripheral/mod.rs
Outdated
|
||
/// Returns the number of sets in the selected cache | ||
#[cfg(armv7m)] | ||
pub fn cache_num_sets(&self, level: u32, ind: u32) -> u32 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps change the return type to u16 since the return value can't exceed u16::MAX AFAICT.
Same comment for the method below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've changed this to u16 but basically it's just meant doing a bunch of casting from u32 to u16 when we read it, and from u16 back to u32 when we use it (giving a set/way to the cache maintenance operations). Not sure it's really worthwhile?
src/peripheral/mod.rs
Outdated
/// `size`: size of the memory block, in number of bytes, a multiple of 32 | ||
#[cfg(armv7m)] | ||
#[inline] | ||
pub fn invalidate_dcache_by_address(&self, addr: u32, size: u32) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The requirements stated in the documentation should be asserted in the body of this method, otherwise (I suppose) Bad Stuff could happen if the user calls this with non aligned address and sizes not multiple of 32.
Alternatively you could say that size
has units of "lines" and a size
of 1 equals to 32 bytes.
Same comment for the methods below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe the architecture actually masks the lower bits of the address to enforce 32 bit alignment, though I can't find a reference for this. We could enforce this by just masking the address. You'd usually want to use this method like invalidate_dcache_by_address(&mybuf, mybuf.len())
so I think it makes sense to allow it to accept non-32-bit aligned addresses and sizes and just ensure you invalidate at least what was asked for. That behaviour would match the CMSIS version too.
src/peripheral/mod.rs
Outdated
pub struct Cbp { | ||
/// I-cache invalidate all to PoU | ||
pub iciallu: WO<u32>, | ||
reserved0: RW<u32>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: no need to wrap reserved0
in RW
it can just have type u32
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
/// I-cache invalidate by MVA to PoU | ||
#[inline(always)] | ||
pub fn icimvau(&self, mva: u32) { | ||
unsafe { self.icimvau.write(mva); } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the whole u32 range a valid input here? What happens if I call cbp.icimvau(u32::MAX)
?
Some comments about the inputs of the other methods below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All u32 are valid here, it's a "modified virtual address" aka the address of whatever you want to invalidate. Invalidating an address that's not in the cache isn't a problem (since you couldn't have known if it was cached or not anyway).
build.rs
Outdated
println!("cargo:rustc-cfg=armv7m"); | ||
} else if target.starts_with("thumbv7em-") { | ||
println!("cargo:rustc-cfg=armv7m"); | ||
println!("cargo:rustc-cfg=armv7em"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd prefer not to enable this cfg if we are not using it right now. It can be left commented out until the need for it arises.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I'll comment it out.
☔ The latest upstream changes (presumably 3951582) made this pull request unmergeable. Please resolve the merge conflicts. |
Thank you for the feedback!
I think having CPUID available separately is nice because you often want to read things from there separately to using SCB. All of CPUID is read-only except this one CSSELR, which changes what you read from the other CS* registers. We could conceivably move the CS* registers into SCB but that seems a bit weird too. I'm not sure.
Yes, on reflection I think keeping invalidate_dcache private makes sense. You can still invalidate by MVA (which is what you'd usually be doing anyway) so I don't think it's a problem. I'll respond to the other comments in their threads and push some commits with changes. |
I think that's all the points addressed (though still a somewhat outstanding question on needing to pass cpuid in, and the breaking change of hiding the registers that don't appear on ARMv6-M). Let me know if you'd rather have those commits squashed. |
If you hold off on this for a day, I'll add a couple of methods to check if the caches are enabled or not, and make enabling-while-enabled/disabling-while-disabled a no-op. At the moment if you called The rest of the changes will stay the same so do let me know if you're happy with them. |
Okay, done. That should prevent the more obvious problems and lets you just call |
src/peripheral/mod.rs
Outdated
|
||
let mut addr = addr & 0xFFFF_FFE0; | ||
|
||
for _ in 0..(size/LINESIZE) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm, if address is 32 byte aligned and size is 31 this would be a no-op which sounds like not what you want. Sounds like this should be ((size - 1) / LINESIZE) + 1
or something like that but with a check for size == 0 that should be a no-op.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, good point.
src/peripheral/mod.rs
Outdated
/// Invalidates cache starting from the lowest 32-byte aligned address represented by `addr`, | ||
/// in blocks of 32 bytes until at least `size` bytes have been invalidated. | ||
#[inline] | ||
pub fn invalidate_dcache_by_address(&self, addr: u32, size: u32) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You'd usually want to use this method like
invalidate_dcache_by_address(&mybuf, mybuf.len())
In that case these should probably be addr: usize
and size: usize
. That should require less casts. Maybe even make addr: *const T
(or directly accept a single argument: buf: &[T]
) but I'm not sure about that one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I wonder about this. How about keeping the current functions (though changing to usize), and then adding invalidate_slice(slice: &[T])
and invalidate(&T)
that uses core::mem::size_of
to get the size?
src/peripheral/mod.rs
Outdated
/// * `level`: the required cache level minus 1, e.g. 0 for L1, 1 for L2 | ||
/// * `ind`: select instruction cache or data/unified cache | ||
pub fn select_cache(&self, level: u8, ind: CsselrCacheType) { | ||
assert!(level<8); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thinking about this a bit more. Maybe the assert is not necessary given the mask below? We do it like that in the svd2rust API. All inputs get masked to make them valid and we don't have assertions to check their range. However svd2rust documentation indicates that masking is going on. If we drop the assertion here then we should add a note about level
getting masked to always be < 8.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
src/peripheral/mod.rs
Outdated
pub fn select_cache(&self, level: u8, ind: CsselrCacheType) { | ||
assert!(level<8); | ||
unsafe { self.csselr.write( | ||
(((level as u32) << CSSELR_LEVEL_POS) & CSSELR_LEVEL_MASK) | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just a comment: I prefer to mask the value before shifting it to the left as that avoids overflows (which turn into panics in debug mode). Now that may or may not occur in this case; I'm just pointing out this problem in the general case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, sure. It can't happen here because level is a u8 but it could happen for way
in the CBP methods. I would rather keep the _MASK
constants as shifted, but we can just shift them down again before masking and it'l compile to the same deal.
src/peripheral/mod.rs
Outdated
|
||
/// D-cache clean by set-way | ||
#[inline(always)] | ||
pub fn dccsw(&self, set: u16, way: u16) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps it would be worthwhile to note the vaild input range of set
and way
and that anything outside that range will be masked to return to the valid range.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
Thanks for the changes, @adamgreig. I left some more comments about things I didn't notice in the first review, sorry :-). |
This prevents these changes being breaking changes as these fields used to be exposed. To be re-added in a future breaking release.
4e3ddbb
to
73ebe63
Compare
Thanks! This definitely started life as a very C-like API and I think it will be much nicer as a Rusty one. I've updated everything except the (now marked as outdated) thread about the higher level functions:
|
Yeah, I think we can go ahead and land the usize, usize version now and then revisit later if the convenience functions for &T and &[T] make sense. Could you open a RFC issue about that?
👍 Thanks again for working on this. Let's see what the bot has to say. @homunkulus r+ |
📌 Commit 8f42e8f has been approved by |
💔 Test failed - status-travis |
Aw. You can't take the config flag off all those other constants because then they're not used in the other architectures. Could:
|
Or a mixture of options 3 and 2.
It was just an example; you could use |
How does this look? Now builds for me on all architectures. |
LGTM @homunkulus r+ |
📌 Commit 0947e74 has been approved by |
☀️ Test successful - status-travis |
🎉 thanks for the review! |
This PR adds the cache control primitives, and the higher-level cache maintenance operations that are also present in CMSIS. Cortex-M7 devices generally have a DCache and ICache which can sometimes be useful (mostly if you have external RAM) but you need to specifically clean and invalidate cache entries to ensure coherency between the CPU and DMA or other bus masters.
Specifically, this PR:
dcisw()
)All table/section references refer to the ARMv7-M architecture reference manual.
Testing that these worked as expected was as fun as you'd imagine but I'm confident they do what's expected and match the CMSIS implementation.
A few open questions...
SCB_CleanDCache()
macros from CMSIS and because the key cache enable bits are in SCB, but you could imagine moving these to methods on CBP instead.get()
but as this whole block is WO, stateless, and just consists of memory-mapped operations, I assume this isn't a problem.invalidate_dcache
while the dcache is enabled is generally a poor idea: your stack including callstack could be reset to some older state among other problems. If you build this with optimisations,invalidate_dcache
gets inlined and uses registers enough that it will complete and do what you expect, but if the cache contains anything that should have been cleaned you will get some unexpected results. If you build without optimisations, it will just get stuck in the invalidate loop as the loop counter update gets wiped out. I wrote an inlineable asm version which solves this specific problem, but still means your stack will probably not be correct, and it's a bunch messier, so I think this simple non-asm version is the one to use. Since disabling the DCache also does a clean and invalidate afterwards, it's not clear when you'd actually want to callinvalidate_dcache
(outside ofenable_dcache
which calls it before enabling the cache). I included it because it's in CMSIS.