PPU LLVM arm64+macOS port #12115

Merged
merged 16 commits into from
Jun 14, 2022

@sguo35 sguo35 commented May 28, 2022

Fixes the PPU LLVM recompiler to work on arm64 (only ppu_thread.elf works so far). Fixes virtual memory handling and W^X JIT restriction handling on macOS so they work on Apple Silicon (a lot of hacks 😢). Also fixes debug symbols on Mac.

Potentially breaking changes:

  • x86_pshufb renamed to pshufb
  • All calls out of JIT code to rpcs3 C++ code are marked non-tail now (as they aren't tail calls)

Build:

brew install molten-vk vulkan-headers sdl2 nasm qt@5 ninja cmake glew ffmpeg
export Qt5_DIR=$(brew --prefix)/opt/qt5
export VULKAN_SDK=$(brew --prefix)/opt/molten-vk
# taken from build-mac.sh
ln -s "$VULKAN_SDK/lib/libMoltenVK.dylib" "$VULKAN_SDK/lib/libvulkan.dylib"
export VK_ICD_FILENAMES=$VULKAN_SDK/share/vulkan/icd.d/MoltenVK_icd.json

git clone git@github.com:sguo35/rpcs3.git
cd rpcs3
git checkout arm64
git submodule update --init

# taken from build-mac.sh
sed -i '' "s/extern const double NSAppKitVersionNumber;/const double NSAppKitVersionNumber = 1343;/g" 3rdparty/hidapi/hidapi/mac/hid.c

mkdir build
cd build
cmake .. -DPNG_ARM_NEON=on -DUSE_ALSA=OFF -DUSE_PULSE=OFF -DUSE_AUDIOUNIT=ON -DUSE_NATIVE_INSTRUCTIONS=OFF -DUSE_SYSTEM_FFMPEG=on -DCMAKE_OSX_ARCHITECTURES="arm64" -DLLVM_TARGETS_TO_BUILD="X86;AArch64;ARM" -GNinja -DCMAKE_BUILD_TYPE=RelWithDebInfo -DUSE_SYSTEM_MVK=on
ninja && codesign --force --deep --sign - bin/rpcs3.app/Contents/MacOS/rpcs3

# Run it
./bin/rpcs3.app/Contents/MacOS/rpcs3

You may get some MoltenVK errors during the build. In that case, install the Vulkan SDK and run this (replace 1.3.204.1 with your actual version number):

 export VULKAN_SDK=$HOME/VulkanSDK/1.3.204.1/macOS
 export VK_ICD_FILENAMES=$HOME/VulkanSDK/1.3.204.1/macOS/share/vulkan/icd.d/MoltenVK_icd.json
 export DYLD_LIBRARY_PATH="$VULKAN_SDK/lib:${DYLD_LIBRARY_PATH:-}"
 export VK_LAYER_PATH="$VULKAN_SDK/share/vulkan/explicit_layer.d"

Tested only on ppu_thread.elf. I tried an actual game, and something in the SPU code segfaults. You may also get a PPU thread segfault the first time you run ppu_thread.elf after compiling, but it should go away if you run it again. I haven't been able to track down why yet.

Major hacks:

  • unmapping memory is done by just telling the OS to swap it out and marking it RW, because of the 16K page issue
  • overwriting existing mmap'd memory that carries the MAP_JIT tag is done by calling munmap and then mmap on the same location, since Apple bans MAP_FIXED + MAP_JIT from overwriting existing mmap entries
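
The two hacks above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `fake_unmap` and `remap_jit` are hypothetical names, `MADV_DONTNEED` is assumed as the way to tell the OS the contents can be dropped, and `MAP_JIT` is defined to 0 on non-Apple hosts purely so the sketch compiles elsewhere.

```cpp
#include <sys/mman.h>
#include <cassert>
#include <cstddef>

// MAP_JIT only exists on Apple platforms. Assumption: other hosts need no
// special flag, so it is defined away to keep the sketch portable.
#ifndef MAP_JIT
#define MAP_JIT 0
#endif

// Hypothetical helper: "unmap" a range without calling munmap. The range
// stays reserved, the OS is told the contents are no longer needed, and the
// pages are left readable and writable.
inline bool fake_unmap(void* addr, std::size_t size)
{
    // Assumption: MADV_DONTNEED stands in for "tell the OS to swap it out".
    if (::madvise(addr, size, MADV_DONTNEED) != 0)
        return false;
    return ::mprotect(addr, size, PROT_READ | PROT_WRITE) == 0;
}

// Hypothetical helper: replace an existing mapping at `addr` with a fresh
// MAP_JIT mapping. MAP_FIXED is deliberately not used, so `addr` is only a
// hint and the result has to be checked.
inline void* remap_jit(void* addr, std::size_t size, int prot)
{
    if (::munmap(addr, size) != 0)
        return nullptr;

    void* ptr = ::mmap(addr, size, prot, MAP_ANON | MAP_PRIVATE | MAP_JIT, -1, 0);
    if (ptr == MAP_FAILED)
        return nullptr;

    if (ptr != addr)
    {
        // The kernel placed the mapping elsewhere: undo it and report failure.
        ::munmap(ptr, size);
        return nullptr;
    }
    return ptr;
}
```

Because `MAP_FIXED` is not used, `remap_jit` has to compare the returned pointer with the requested address. There is also a short window between `munmap` and `mmap` in which another thread could claim the range, which is one reason this counts as a hack.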

@Megamouse Megamouse left a comment

Just looking at the code style

@@ -135,7 +140,11 @@ namespace utils
size += 0x10000;
}

#ifdef __APPLE__
auto ptr = ::mmap(use_addr, size, PROT_NONE, MAP_ANON | MAP_PRIVATE | MAP_JIT | c_map_noreserve, -1, 0);

indentation

I think you're using spaces instead of tabs, which is why it may appear as correct on your end

Weird that only these lines were wrong. I replaced the spaces with tabs and it looks good now.

sguo35 commented Jun 1, 2022

@Megamouse Not sure why GH is showing some of the indents. I checked in vim and on GitHub and they are not there (see https://github.com/RPCS3/rpcs3/blob/91add25d9cc26ec74aa5eff785d8776197759b43/rpcs3/util/vm_native.cpp for example). Fixed the rest of the formatting, sorry.

@Nekotekina I cherry-picked your commits and it seems to work.

sguo35 added 13 commits June 2, 2022 17:17
  • Use naive function pointer on Apple arm64 because ASLR breaks asmjit. See BufferUtils.cpp comment for explanation on why this happens and how to fix if you want to use asmjit.
  • Tell Qt not to strip debug symbols when we're in debug or relwithdebinfo modes.
  • Force MachO on macOS to fix LLVM being unable to patch relocations during codegen. Adds Aarch64 NEON intrinsics for x86 intrinsics used by PPUTranslator/Recompiler.
  • Temporary hack to get things working by using 16k pages instead of 4k pages in VM emulation.
  • Fixes some intrinsics usage and patches usages of asmjit to properly emit absolute jmps so ASLR doesn't cause out of bounds rel jumps. Also patches the SPU recompiler to properly work on arm64 by telling LLVM to target arm64.
  • Fixes W^X on macOS aarch64 by setting all JIT mmap'd regions to default to RW mode. For both SPU and PPU execution threads, when initialization finishes we toggle to RX mode. This exploits Apple's per-thread setting for RW/RX to let us be technically compliant with the OS's W^X enforcement while not needing to actually separate the memory allocated for code/data.
  • Implements ppu_gateway for arm64 and patches LLVM initialization to use the correct triple. Adds some fixes for macOS W^X JIT restrictions when entering/exiting JITed code.
  • Strictly speaking, rpcs3 JIT -> C++ calls are not tail calls. If you call a function inside e.g. an L2 syscall, it will clobber LR on arm64 and subtly break returns in emulated code. Only JIT -> JIT "calls" should be tail.
  • Tag mmap calls with MAP_JIT to allow W^X on macOS. Fix mmap calls to existing mmap'd addresses that were tagged with MAP_JIT on macOS. Fix memory unmapping on 16K page machines with a hack to mark "unmapped" pages as RW.
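
The per-thread RW/RX toggle described in the W^X commit message above might look roughly like this. It is a sketch under assumptions: `set_jit_exec`, `jit_write_scope` and `t_jit_exec` are hypothetical names rather than rpcs3 code, `pthread_jit_write_protect_np` is the macOS call that flips the calling thread's view of `MAP_JIT` pages, and on every other platform the helpers only update a bookkeeping flag.

```cpp
#include <cassert>

#if defined(__APPLE__) && defined(__aarch64__)
#include <pthread.h>
#endif

// Illustrative bookkeeping only: what this thread last requested.
// Real code should call set_jit_exec() explicitly when a thread starts.
thread_local bool t_jit_exec = false;

// Hypothetical helper: switch the calling thread between "MAP_JIT pages are
// writable" (exec == false) and "MAP_JIT pages are executable" (exec == true).
inline void set_jit_exec(bool exec)
{
#if defined(__APPLE__) && defined(__aarch64__)
    // 1 enables write protection (pages executable), 0 disables it (pages writable).
    pthread_jit_write_protect_np(exec ? 1 : 0);
#endif
    t_jit_exec = exec;
}

// Hypothetical RAII scope: make JIT pages writable for this thread, then
// restore whatever mode was active before.
class jit_write_scope
{
    bool m_prev;

public:
    jit_write_scope() : m_prev(t_jit_exec) { set_jit_exec(false); }
    ~jit_write_scope() { set_jit_exec(m_prev); }

    jit_write_scope(const jit_write_scope&) = delete;
    jit_write_scope& operator=(const jit_write_scope&) = delete;
};
```

A thread would call `set_jit_exec(false)` while it sets up, `set_jit_exec(true)` once initialization finishes, and wrap any later patching of JIT memory in a `jit_write_scope`. The setting is per thread, so one thread being in write mode does not make the pages writable for a thread that is executing them.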
@sguo35 sguo35 force-pushed the arm64 branch 2 times, most recently from e2e1365 to 9f9cb81 Compare June 5, 2022 04:05

sguo35 commented Jun 9, 2022

@Nekotekina Is there a blocker preventing this from being merged?

@Nekotekina
Member

Probably not, it just needs some corrections.

kd-11 commented Jun 12, 2022

I can confirm this also fixes the PPU test on Linux aarch64 (Asahi).
Trying any actual app crashes with segfaults when invoking module entry points (specifically cellVideoOutGetState), but that is outside the scope here.

Comment on lines 4249 to 4253
#ifdef ARCH_X64
SetFpr(op.frd, m_ir->CreateXor(xormask, Call(GetType<s32>(), "llvm.x86.sse2.cvtsd2si", m_ir->CreateInsertElement(GetUndef<f64[2]>(), b, u64{0}))));
#else
SetFpr(op.frd, m_ir->CreateXor(xormask, Call(GetType<s32>(), "llvm.aarch64.neon.fcvtns.i32.f64", b)));
#endif

I might be dumb, but why are you calling the x86 instruction inside ifdef ARCH_X64, and the aarch64 one inside the else? Shouldn't it be the opposite way?

@AniLeo AniLeo Jun 12, 2022

X64 is not aarch64

@nastys nastys Jun 12, 2022

It basically says "if ARCH_X64 (aka Intel x86_64), use SSE2; else, assume it's AArch64 (aka ARM64) and use NEON."
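
The same compile-time selection can be shown outside LLVM IR with plain C++ intrinsics. This is a hedged sketch, not code from the PR: `round_to_i32` is a hypothetical name, compiler-defined macros stand in for rpcs3's `ARCH_X64`, and a portable `std::lrint` fallback is added that the reviewed snippet does not have.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

#if defined(__x86_64__) || defined(_M_X64)
#include <emmintrin.h> // SSE2
#elif defined(__aarch64__)
#include <arm_neon.h>  // NEON
#endif

// Hypothetical helper: convert a double to a 32-bit integer, rounding to
// nearest with ties to even, using whichever instruction the target offers.
inline std::int32_t round_to_i32(double x)
{
#if defined(__x86_64__) || defined(_M_X64)
    // CVTSD2SI rounds according to MXCSR, which is nearest-even by default.
    return _mm_cvtsd_si32(_mm_set_sd(x));
#elif defined(__aarch64__)
    // FCVTNS rounds to nearest, ties to even. The 64-bit form is used here and
    // narrowed, so out-of-range inputs behave differently from the 32-bit form.
    return static_cast<std::int32_t>(vcvtnd_s64_f64(x));
#else
    // Portable fallback (assumes the default floating-point rounding mode).
    return static_cast<std::int32_t>(std::lrint(x));
#endif
}
```

Only one branch is compiled for a given target, which is why the x86 intrinsic sits inside the x86 check and the AArch64 one in the alternative branch.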
