Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Compiler read/write reorderings make DMA flaky #6

Open
ayrtonm opened this issue Jan 15, 2022 · 3 comments
Open

Compiler read/write reorderings make DMA flaky #6

ayrtonm opened this issue Jan 15, 2022 · 3 comments
Labels
bug Something isn't working

Comments

@ayrtonm
Copy link
Owner

ayrtonm commented Jan 15, 2022

DMA transfers can be flaky due to compiler misoptimizations. The problem is that llvm emits a MIPS-II instruction for the compiler_fence. So rustc gives the following error when I try adding the fence. Afaiu other llvm backends implement compiler fences without emitting instructions so it should be possible to fix this codegen issue.

LLVM ERROR: Cannot select: 0x7f26902f1f88: ch = MipsISD::Sync 0x7f269009e838, Constant:i32<0>
  0x7f269026b230: i32 = Constant<0>
In function: _ZN3psx11framebuffer11Framebuffer4swap17h0b2ca510268f3486E

For now the least flaky solution seems to be calling black_box on the pointer passed to the DMA channel (even though it shouldn't be relied on for correctness). I still get builds where DMA doesn't work (usually when toggling LTO or the min-panic feature), but it seems less frequent than with the llvm_asm("":::"memory", "volatile") hack.

@ayrtonm ayrtonm added the bug Something isn't working label Jan 15, 2022
@ayrtonm ayrtonm changed the title Compiler read/write reordering make DMA flaky Compiler read/write reorderings make DMA flaky Jan 15, 2022
@ayrtonm
Copy link
Owner Author

ayrtonm commented Apr 14, 2022

The new patch impiaaa provided fixes the problem with compiler_fence and now examples/cube (which makes use of DMA) seems to be working in mednafen with any combination of build flags. I should look at the IR to see what the fence actually expanded to.

@ayrtonm
Copy link
Owner Author

ayrtonm commented May 25, 2022

Turns out the patch might not fix the underlying issue since it makes compiler_fences expand to libcalls to __sync_synchronize. This is provided by libgcc, but MIPS-I doesn't have a sync instruction so it might not be possible to implement the fence correctly this way. The proper fix would be to add support for pseudo-fences in the LLVM backend. At least some __sync_synchronize implementations in libgcc use asm("nop":::"memory") so that may also be good enough. This might be what libgcc defaults to for MIPS-I too, but I haven't gone through the source in depth.

To avoid having to build libgcc, the sdk could also provide __sync_synchronize or I could try inlining the asm("nop":::"memory") in the DMA routines. Either way it'd be nice to get one of the fixes upstreamed in LLVM since the target has the core::sync::atomic module disabled due to this problem. This means that AtomicU8 and AtomicU16 aren't available even though they don't require fences.

@ayrtonm
Copy link
Owner Author

ayrtonm commented Jun 5, 2022

After playing around with asm!("nop") fences, I'm starting to think that the biggest misoptimization issue is actually related to linked list packets. Essentially since linked list packets are separate objects, the compiler has no way of knowing that when the GPU DMA channel gets a reference to one packet it will follow the "pointer" (i.e. next: PhysAddr([u8; 3])) to the next packet. This is mostly a problem when going from a normal build to LTO since the compiler seems to be eliding the writes that initialize and modify packets. The most reliable solution seems to be calling black_box on the next packet reference when linking packets (again this is a total hack).

An alternative which may work without this hack is to attach a PhantomData<&'a ()> with the lifetime of the reference to the next packet. This won't modify the layout of Packet<T> (which must be repr(C)) but makes the packet linking API more complicated. In cases where a packet doesn't ever point to anything the lifetime can be unconstrained or set to 'static to avoid bubbling it up into whatever struct contains the Packet<'a, T>. I'm not sure how feasible this is since the current link_packets implementation won't work as-is with this lifetime.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

1 participant