Add cache maintenance functions for ARMv7-M #40

Merged: 16 commits merged into rust-embedded:master from cache_control on Jun 15, 2017

Conversation


@adamgreig (Member) commented Jun 9, 2017

This PR adds the cache control primitives and the higher-level cache maintenance operations that are also present in CMSIS. Cortex-M7 devices generally have a DCache and an ICache, which can be useful (mostly if you have external RAM), but you need to explicitly clean and invalidate cache entries to ensure coherency between the CPU and DMA or other bus masters.

Specifically, this PR:

  • Extends the config flags to include armv7m and armv7em (with armv7m also being set for ARMv7E-M parts)
  • Puts the existing CLIDR, CTR, CCSIDR, and CSSELR CPUID registers behind cfg(armv7m), as they are only present from ARMv7-M onwards.
    • CSSELR is changed to RW: though Table B4-1 says it's RO, the detailed description in B4.8.3 says it is RW, and it is used by writing to it.
  • Adds methods on CPUID to access those cache-related registers and retrieve the number of sets/ways in a cache
  • Adds the CBP memory-mapped operations block from Table B2-1, which contains operations to clean and invalidate the instruction and data caches and branch predictor on ARMv7-M parts that include them.
  • Adds methods for each of the memory locations in the CBP block to actually perform those operations, with appropriate arguments.
    • Though note the set/way functions are specific to Cortex-M7 and might not apply to a future ARMv7-M implementation with a different cache setup (see the comment in dcisw())
  • Adds new methods to SCB that mirror the equivalent SCB_* cache functions in CMSIS (see core_cm7.h).

All table/section references refer to the ARMv7-M architecture reference manual.
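
For a sense of how the new SCB methods are meant to be used, here is a minimal coherency sketch. It assumes the method names and u32 address/size signatures described above; the Scb type, the way it is obtained, and the DMA helpers are hypothetical placeholders, not part of this PR.

// Minimal sketch: receive data into `buf` via DMA on a Cortex-M7 with the
// DCache enabled. `start_dma_read`/`wait_for_dma_complete` are hypothetical.
fn receive_via_dma(scb: &Scb, buf: &mut [u8; 64]) {
    let addr = buf.as_ptr() as u32;
    let size = buf.len() as u32;

    start_dma_read(addr, size);   // a DMA master writes into `buf` behind the cache
    wait_for_dma_complete();

    // Drop any cache lines covering `buf` so subsequent CPU reads fetch the
    // freshly written data from RAM instead of stale cached values.
    scb.invalidate_dcache_by_address(addr, size);

    let first = buf[0];           // now guaranteed to come from main memory
    let _ = first;
}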

Testing that these worked as expected was as fun as you'd imagine, but I'm confident they do what's expected and match the CMSIS implementation.

Running cache test...
Operation               Cache  RAM
----------------------------------
Initial....................A    A
Increment..................B    B
Enable DCache..............B    B
Increment..................C    B
Clean DCache...............C    C
Increment..................D    C
Invalidate DCache..........C    C
Increment..................D    C
CleanInvalidate DCache.....D    D
Increment..................E    D
Clean DCache wrong addr....E    D
Clean DCache right addr....E    E
Increment..................F    E
Inval DCache wrong addr....F    E
Inval DCache right addr....E    E
Increment..................F    E
CI DCache wrong addr.......F    E
CI DCache right addr.......F    F
Increment..................G    F
Disable DCache.............G    G

A few open questions...

  • I'm not super happy at how you have to pass &cpuid into some of the SCB cache methods, but I can't see any nice way around this. The functions are in SCB to match the SCB_CleanDCache() macros from CMSIS and because the key cache enable bits are in SCB, but you could imagine moving these to methods on CBP instead.
  • The SCB methods currently access the CBP block with get(), but as this whole block is WO, stateless, and just consists of memory-mapped operations, I assume this isn't a problem.
  • Calling invalidate_dcache while the dcache is enabled is generally a poor idea: your stack, including the call stack, could be reset to some older state, among other problems. If you build this with optimisations, invalidate_dcache gets inlined and uses registers enough that it will complete and do what you expect, but if the cache contains anything that should have been cleaned you will get some unexpected results. If you build without optimisations, it will just get stuck in the invalidate loop as the loop counter update gets wiped out. I wrote an inlineable asm version which solves this specific problem, but it still means your stack will probably not be correct, and it's quite a bit messier, so I think this simple non-asm version is the one to use. Since disabling the DCache also does a clean and invalidate afterwards, it's not clear when you'd actually want to call invalidate_dcache (outside of enable_dcache which calls it before enabling the cache). I included it because it's in CMSIS.

@japaric (Member) left a comment:

@adamgreig Thanks for working on this! I had no idea these registers even existed. I left some comments about the Rusty-ness of the proposed API.

About your questions:

Testing that these worked as expected was as fun as you'd imagine, but I'm confident they do what's expected and match the CMSIS implementation.

As I have no idea what this is really doing and have no way to test this I'll trust you ;-)

I'm not super happy at how you have to pass &cpuid into some of the SCB cache methods, but I can't see any nice way around this.

I suppose we could merge the two register blocks into a single one and have all methods on that struct.

The SCB methods currently access the CBP block with get(), but as this whole block is WO, stateless, and just consists of memory-mapped operations, I assume this isn't a problem

Seems fine to me.

Calling invalidate_dcache while the dcache is enabled is generally a poor idea

Perhaps we can return a Result::Err if invalidate_dcache is called when the DCache was enabled (assuming there's some way of checking that the DCache is enabled at runtime).

it's not clear when you'd actually want to call invalidate_dcache (outside of enable_dcache which calls it before enabling the cache). I included it because it's in CMSIS

In that case perhaps we can keep that method private until someone specifically requests it.

@@ -100,13 +104,65 @@ pub struct Cpuid {
pub isar: [RO<u32>; 5],
reserved1: u32,
/// Cache Level ID
#[cfg(armv7m)]
@japaric (Member):

observation: technically a breaking change, even if it probably made no sense to use these on ARMv6-M.

@adamgreig (Member Author):

Is that a problem? I can drop this cfg gate if you want, but it will never have done anything useful on ARMv6-M and cortex-m is still <v1.0 too.

@japaric (Member):

I'm fine with making breaking changes, but I'd like to minimize the number of minor version bumps when possible, especially when those changes require breaking changes in other crates. There's already one breaking change scheduled for v0.3.0 (#41), and I have another in mind. How about postponing this one so it lands with the other two?

@adamgreig (Member Author):

Yep sure, makes sense to combine the breaking changes into one release. Want to postpone this whole set of features or just the cfg flag on these registers?

@japaric (Member):

just the cfg flags

}

#[cfg(armv7m)]
const CSSELR_IND_POS: u32 = 0;
@japaric (Member):

could these constants be moved into a cfg'd module and then imported, to reduce the number of cfg attributes? Something like this:

// Sub-Architecture Specific
#[cfg(armv7m)]
mod sas {
    pub const FOO: u32 = 0;
}

#[cfg(not(armv7m))]
mod sas {}

use self::sas::*;

@adamgreig (Member Author):

Perhaps we can just drop the cfg entirely. They're const, they're only used in methods that are behind cfg gates, they're not pub, so they simply won't appear when not being used.

/// * `level`: the required cache level minus 1, e.g. 0 for L1, 1 for L2
/// * `ind`: select instruction cache (1) or data/unified cache (0)
#[cfg(armv7m)]
pub fn select_cache(&self, level: u32, ind: u32) {
@japaric (Member):

This should probably accept enums as inputs to avoid passing nonsensical values like level = 9_001. Either that or assert the valid range.

Same with the functions below.

@adamgreig (Member Author):

I'll add a panic for level and make ind an enum, which I think best matches what they're used for.
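
A sketch of that direction for reference (the CsselrCacheType name matches the later diff; the variant names are illustrative, with values matching the CSSELR InD bit documented above):

// Encode `ind` as an enum so callers can't pass arbitrary integers.
pub enum CsselrCacheType {
    DataOrUnified = 0, // InD = 0: data or unified cache
    Instruction = 1,   // InD = 1: instruction cache
}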

///
/// * `level`: the required cache level minus 1, e.g. 0 for L1, 1 for L2
/// * `ind`: select instruction cache (1) or data/unified cache (0)
#[cfg(armv7m)]
@japaric (Member):

The cfg attribute on all these methods can be converted into a single cfg attribute on the enclosing impl block.

@adamgreig (Member Author):

Sure, will do. I avoided it because of "what if you wanted some other non-ARMv7-specific methods on CPUID?", but of course those can go in another impl block.


/// Returns the number of sets in the selected cache
#[cfg(armv7m)]
pub fn cache_num_sets(&self, level: u32, ind: u32) -> u32 {
@japaric (Member):

Perhaps change the return type to u16 since the return value can't exceed u16::MAX AFAICT.

Same comment for the method below.

@adamgreig (Member Author):

I've changed this to u16, but it has basically just meant a bunch of casting from u32 to u16 when we read it, and from u16 back to u32 when we use it (passing a set/way to the cache maintenance operations). Not sure it's really worthwhile?
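
For reference, a sketch of the extraction and cast under discussion (a hypothetical free function; the field layout follows the ARMv7-M CCSIDR, where NumSets occupies bits [27:13] and stores the number of sets minus one):

const CCSIDR_NUMSETS_POS: u32 = 13;
const CCSIDR_NUMSETS_MASK: u32 = 0x7FFF << 13;

// Number of sets in the currently selected cache, from a raw CCSIDR read.
fn num_sets(ccsidr: u32) -> u16 {
    (((ccsidr & CCSIDR_NUMSETS_MASK) >> CCSIDR_NUMSETS_POS) + 1) as u16
}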

/// `size`: size of the memory block, in number of bytes, a multiple of 32
#[cfg(armv7m)]
#[inline]
pub fn invalidate_dcache_by_address(&self, addr: u32, size: u32) {
@japaric (Member):

The requirements stated in the documentation should be asserted in the body of this method; otherwise (I suppose) Bad Stuff could happen if the user calls this with a non-aligned address or a size that's not a multiple of 32.

Alternatively, you could say that size has units of "lines", so a size of 1 equals 32 bytes.

Same comment for the methods below.

@adamgreig (Member Author):

I believe the architecture actually masks the lower bits of the address to enforce 32-byte alignment, though I can't find a reference for this. We could enforce it ourselves by just masking the address. You'd usually want to use this method like invalidate_dcache_by_address(&mybuf, mybuf.len()), so I think it makes sense to accept non-32-byte-aligned addresses and sizes and just ensure you invalidate at least what was asked for. That behaviour would match the CMSIS version too.

pub struct Cbp {
/// I-cache invalidate all to PoU
pub iciallu: WO<u32>,
reserved0: RW<u32>,
@japaric (Member):

nit: no need to wrap reserved0 in RW; it can just have type u32

@adamgreig (Member Author):

👍

/// I-cache invalidate by MVA to PoU
#[inline(always)]
pub fn icimvau(&self, mva: u32) {
unsafe { self.icimvau.write(mva); }
@japaric (Member):

Is the whole u32 range a valid input here? What happens if I call cbp.icimvau(u32::MAX)?

Some comments about the inputs of the other methods below.

@adamgreig (Member Author):

All u32 values are valid here; it's a "modified virtual address", aka the address of whatever you want to invalidate. Invalidating an address that's not in the cache isn't a problem (since you couldn't have known whether it was cached anyway).

build.rs Outdated
println!("cargo:rustc-cfg=armv7m");
} else if target.starts_with("thumbv7em-") {
println!("cargo:rustc-cfg=armv7m");
println!("cargo:rustc-cfg=armv7em");
@japaric (Member):

I'd prefer not to enable this cfg if we are not using it right now. It can be left commented out until the need for it arises.

@adamgreig (Member Author):

Sure, I'll comment it out.

@homunkulus (Contributor):

☔ The latest upstream changes (presumably 3951582) made this pull request unmergeable. Please resolve the merge conflicts.

@adamgreig (Member Author) commented Jun 12, 2017:

Thank you for the feedback!

I suppose we could merge the two register blocks into a single one and have all methods on that struct.

I think having CPUID available separately is nice because you often want to read things from there separately from using SCB. All of CPUID is read-only except this one CSSELR, which changes what you read from the other CS* registers. We could conceivably move the CS* registers into SCB, but that seems a bit weird too. I'm not sure.

Perhaps we can return a Result::Err if invalidate_dcache is called when the DCache was enabled (assuming there's some way of checking that the DCache is enabled at runtime).
In that case perhaps we can keep that method private until someone specifically requests it

Yes, on reflection I think keeping invalidate_dcache private makes sense. You can still invalidate by MVA (which is what you'd usually be doing anyway) so I don't think it's a problem.

I'll respond to the other comments in their threads and push some commits with changes.

@adamgreig (Member Author):

I think that's all the points addressed (though still a somewhat outstanding question on needing to pass cpuid in, and the breaking change of hiding the registers that don't appear on ARMv6-M).

Let me know if you'd rather have those commits squashed.

@adamgreig (Member Author) commented Jun 13, 2017:

If you hold off on this for a day, I'll add a couple of methods to check if the caches are enabled or not, and make enabling-while-enabled/disabling-while-disabled a no-op.

At the moment, if you called disable_dcache while it was already disabled, you'd cause the (not at all up-to-date) DCache to be cleaned back into main memory, with bad consequences, but there's no convenient way for the user to check whether it's enabled.

The rest of the changes will stay the same so do let me know if you're happy with them.

@adamgreig (Member Author):

Okay, done. That should prevent the more obvious problems and let you just call disable_dcache to ensure it's turned off, without having to care whether it was previously turned on, and vice versa.


let mut addr = addr & 0xFFFF_FFE0;

for _ in 0..(size/LINESIZE) {
@japaric (Member):

hmm, if the address is 32-byte aligned and size is 31, this would be a no-op, which sounds like not what you want. Sounds like this should be ((size - 1) / LINESIZE) + 1 or something like that, but with a check so that size == 0 is a no-op.

@adamgreig (Member Author):

Yes, good point.
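
For reference, a sketch of the fixed loop bound (standalone function; LINESIZE = 32 as in the PR):

const LINESIZE: u32 = 32;

// Number of 32-byte cache lines needed to cover `size` bytes; size == 0 is a no-op.
fn lines_to_cover(size: u32) -> u32 {
    if size == 0 { 0 } else { ((size - 1) / LINESIZE) + 1 }
}

// lines_to_cover(31) == 1, lines_to_cover(32) == 1, lines_to_cover(33) == 2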

/// Invalidates cache starting from the lowest 32-byte aligned address represented by `addr`,
/// in blocks of 32 bytes until at least `size` bytes have been invalidated.
#[inline]
pub fn invalidate_dcache_by_address(&self, addr: u32, size: u32) {
@japaric (Member):

You'd usually want to use this method like invalidate_dcache_by_address(&mybuf, mybuf.len())

In that case these should probably be addr: usize and size: usize. That should require fewer casts. Maybe even make addr: *const T (or directly accept a single argument: buf: &[T]), but I'm not sure about that one.

@adamgreig (Member Author):

Hmm, I wonder about this. How about keeping the current functions (though changing to usize), and then adding invalidate_slice(slice: &[T]) and invalidate(&T) that use core::mem::size_of to get the size?
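
Roughly along these lines (free functions purely for illustration; they assume the usize-based by-address method proposed above and use core::mem::size_of_val for the slice case):

use core::mem;

// Derive (addr, size) from a slice or a single value, then forward to the
// proposed by-address invalidate.
fn invalidate_dcache_slice<T>(scb: &Scb, slice: &[T]) {
    scb.invalidate_dcache_by_address(slice.as_ptr() as usize, mem::size_of_val(slice));
}

fn invalidate_dcache_ref<T>(scb: &Scb, val: &T) {
    scb.invalidate_dcache_by_address(val as *const T as usize, mem::size_of::<T>());
}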

/// * `level`: the required cache level minus 1, e.g. 0 for L1, 1 for L2
/// * `ind`: select instruction cache or data/unified cache
pub fn select_cache(&self, level: u8, ind: CsselrCacheType) {
assert!(level<8);
@japaric (Member):

Thinking about this a bit more: maybe the assert is not necessary given the mask below? We do it like that in the svd2rust API: all inputs get masked to make them valid and we don't have assertions to check their range. However, the svd2rust documentation indicates that masking is going on. If we drop the assertion here, then we should add a note about level getting masked to always be < 8.

@adamgreig (Member Author):

👍

pub fn select_cache(&self, level: u8, ind: CsselrCacheType) {
assert!(level<8);
unsafe { self.csselr.write(
(((level as u32) << CSSELR_LEVEL_POS) & CSSELR_LEVEL_MASK) |
@japaric (Member):

just a comment: I prefer to mask the value before shifting it to the left as that avoids overflows (which turn into panics in debug mode). Now that may or may not occur in this case; I'm just pointing out this problem in the general case.

@adamgreig (Member Author):

Hmm, sure. It can't happen here because level is a u8, but it could happen for way in the CBP methods. I would rather keep the _MASK constants as shifted, but we can just shift them down again before masking and it'll compile to the same thing.
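
A sketch of that shift-the-mask-down approach (standalone function; the constant values follow the CSSELR layout, with Level in bits [3:1]):

const CSSELR_LEVEL_POS: u32 = 1;
const CSSELR_LEVEL_MASK: u32 = 0x7 << 1;

// Mask the value within its own range first, then shift it into position, so
// an out-of-range input can never overflow during the shift.
fn csselr_level_bits(level: u8) -> u32 {
    ((level as u32) & (CSSELR_LEVEL_MASK >> CSSELR_LEVEL_POS)) << CSSELR_LEVEL_POS
}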


/// D-cache clean by set-way
#[inline(always)]
pub fn dccsw(&self, set: u16, way: u16) {
@japaric (Member):

Perhaps it would be worthwhile to note the valid input range of set and way, and that anything outside that range will be masked back into the valid range.

@adamgreig (Member Author):

👍

@japaric (Member) commented Jun 14, 2017:

Thanks for the changes, @adamgreig. I left some more comments about things I didn't notice in the first review, sorry :-).

This prevents these changes being breaking changes as these fields used to be
exposed. To be re-added in a future breaking release.
@adamgreig force-pushed the cache_control branch 2 times, most recently from 4e3ddbb to 73ebe63, on June 14, 2017 at 23:06
@adamgreig (Member Author):

Thanks! This definitely started life as a very C-like API and I think it will be much nicer as a Rusty one. I've updated everything except the (now marked as outdated) thread about the higher level functions:

Hmm, I wonder about this. How about keeping the current functions (though changing to usize), and then adding invalidate_slice(slice: &[T]) and invalidate(&T) that use core::mem::size_of to get the size?

@japaric (Member) commented Jun 15, 2017:

I've updated everything except the (now marked as outdated) thread about the higher level functions:

Yeah, I think we can go ahead and land the usize, usize version now and then revisit later whether the convenience functions for &T and &[T] make sense. Could you open an RFC issue about that?

This definitely started life as a very C-like API and I think it will be much nicer as a Rusty one

👍

Thanks again for working on this. Let's see what the bot has to say.

@homunkulus r+

@homunkulus (Contributor):

📌 Commit 8f42e8f has been approved by japaric

@homunkulus (Contributor):

⌛ Testing commit 8f42e8f with merge 8f42e8f...

@homunkulus (Contributor):

💔 Test failed - status-travis

@adamgreig (Member Author):

Aw. You can't take the cfg flag off all those other constants, because then they're unused on the other architectures.

Could:

  • Put individual cfg on each constant. Messy.
  • Put them in a flagged module as you suggest. I'm not sure about the name sas as it's a bit.. weird? Perhaps a module per peripheral for architecture-specific constants.
  • Put them as consts inside each method that actually uses them (a minimal sketch follows below). Many of these constants are only used by a single method.
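
A minimal illustration of option 3 (hypothetical free function; the constant names are the ones from the existing code):

// Constants used only by one cfg-gated function live inside it, so nothing is
// left unused on architectures where the gate is off.
#[cfg(armv7m)]
fn csselr_value(level: u32, ind: u32) -> u32 {
    const CSSELR_IND_POS: u32 = 0;
    const CSSELR_LEVEL_POS: u32 = 1;
    (level << CSSELR_LEVEL_POS) | (ind << CSSELR_IND_POS)
}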

@japaric (Member) commented Jun 15, 2017:

Or a mixture of options 3 and 2.

I'm not sure about the name sas as it's a bit.. weird?

It was just an example; you could use mod armv7m or something like that.

@adamgreig (Member Author):

How does this look? Now builds for me on all architectures.

@japaric (Member) commented Jun 15, 2017:

LGTM

@homunkulus r+

@homunkulus (Contributor):

📌 Commit 0947e74 has been approved by japaric

@homunkulus (Contributor):

⌛ Testing commit 0947e74 with merge 0947e74...

@homunkulus (Contributor):

☀️ Test successful - status-travis
Approved by: japaric
Pushing 0947e74 to master...

@homunkulus merged commit 0947e74 into rust-embedded:master on Jun 15, 2017
@adamgreig deleted the cache_control branch on June 15, 2017 at 03:21
@adamgreig (Member Author):

🎉 thanks for the review!

adamgreig pushed a commit that referenced this pull request Jan 12, 2022
drop the /DISCARD/ section