Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement a sync API for working with global mutable state. #107

Merged
merged 16 commits into from
Feb 21, 2021

Conversation

Lymia
Copy link
Contributor

@Lymia Lymia commented Feb 13, 2021

This PR is an alternative to #103 that tries to maintain correctness for all types (by disabling IRQs and doing a plain copy) and use inline assembly to provide fast paths for small (and aligned) types. This avoids the need for separate types or any runtime checks, as the only problem if you use a type that doesn't work with the fast path is that... well. You use a slow path that can cause the audio to skip. (But that isn't memory unsafely.)

The PR also adds some helper types based on Static for some common other uses of global variables:

  • RawMutex and Mutex provide a way to prevent an IRQ and normal code from running the same thing at once when this would cause safety issues.
  • InitOnce implements something pretty similar to lazy_static!, allowing you to run some computation only once, after which the result is cached.

Implementation notes:

  • Static::write and Static::read use LDR/STR (and their half-word and byte variants) for small types, and use STMIA/LDMIA to save/load the entire contents of the Static without a chance for an IRQ to interrupt it for larger types.
  • Static::replace uses the SWP and SWPB instructions for its fast path. It works for any type that is 4-byte aligned and less than 4 bytes, or any type that is only one byte.
  • InitOnce and RawMutex are both based on Static::swap, allowing them to atomically set and check status bytes, even in the presence of IRQs.
  • This messes with the target .json a little to enable thumb interworking. Some of the functions here use ARM-only instructions that can't be used in THUMB-mode.
  • Unfortunately, I don't think I can avoid the use of compiler_fence entirely. As such, I've added a stub for the __sync_synchronize function. This causes an unneeded function call to an empty function every time a compiler fence is used, but this is a minimal cost. It doesn't need to actually do anything, because there's no actual memory barriers needed on GBA.

Finally, this fixes a few minor bugs that are a bit too small to put in a separate PR:

  • Fixes an integer overflow in the hello_world example that was causing it to panic in debug builds.
  • Fixes an issue where the LLVM was optimizing out writes to the input buffer for DMA reads. (e.g. in the hello_world example, the Mode3::dma_clear_to was actually copying uninitialized memory into VRAM instead of the intended color value. It only seemed to work before because the value happened to end up in zeroed memory.)

Copy link
Member

@Lokathor Lokathor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good, as much of it as I understood.

src/sync.rs Outdated
/// Marks that a given pointer is read by volatile means without actually
/// reading it.
#[inline(always)]
pub fn volatile_mark_ro<T>(val: *const T) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand how this is doing what it claims to be doing. I'd like more of an explanation in the rustdoc comment here. Do I just need to go read the inline assembly docs? If so you can just say that and give the link to those. If the explanation is something else we should write that out.

Copy link
Contributor Author

@Lymia Lymia Feb 18, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm kinda very suspicious that this even works myself.

The documentation for asm! states that "by default, an inline assembly block is treated the same way as an external FFI function call with a custom calling convention: it may read/write memory, have observable side effects, etc." This should work well enough, since an external FFI call is about the right thing for representing e.g. what the DMA unit does. (i.e. reads from the input *const and writes out to the *mut pointer -- if in a concurrent way that's hard to represent in Rust's type system.)

The thing that makes me doubt myself a lot is that this code is basically the same as std::hint::black_box, except it uses the asm! macro instead of llvm_asm!.

... which is concerning because the documentation for that function says "black_box is only (and can only be) provided on a 'best-effort' basis. The extent to which it can block optimisations may vary depending upon the platform and code-gen backend used. Programs cannot rely on black_box for correctness in any way.". I'm not sure if that's there for future-proofing or if there's a known way that LLVM can optimize across black_box in an unwanted way already.

We could always just use compiler_fence instead of having these functions, honestly, which will work for sure at the cost of emitting calls to the empty __sync_synchronize function until that bug is fixed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know this fixes the bug in the hello_world example in practice, but I don't know if it's guaranteed to always do so in the future.

Copy link

@thomcc thomcc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Loka asked me to take a look. I know nothing about embedded or GBA, but have written a lot of concurrent and lock-free code. My comments are mostly under the assumption that the concerns between sharing data between interrupts and normal code are basically the same as with multithreaded, although I know in practice they are somewhat less. The fence I'm referring to is compiler_fence, not a real fence, since I know you can't use one of those.

* Added needed compiler fences to `InitOnce::try_get`.
* Change the error in `RawMutex::raw_unlock` to better reflect the cause.
@Lymia
Copy link
Contributor Author

Lymia commented Feb 21, 2021

What's left to do in this PR?

@Lokathor
Copy link
Member

I think we're good. @Lymia i'll merge if you're done here.

@Lymia
Copy link
Contributor Author

Lymia commented Feb 21, 2021

I can't think of anything else I want to do here. Should be good to merge.

@Lokathor Lokathor merged commit 2aa59bb into rust-console:master Feb 21, 2021
@Lokathor Lokathor removed their assignment Aug 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants