M1/M2 Compatibility #56
Comments
Hi @gilaroni, I have started on M1 support, it might take a bit but it's on the roadmap, yes. |
Hello, |
Any news here? It's still not possible to install valgrind:
brew install --HEAD LouisBrunner/valgrind/valgrind
If reporting this issue please do so at (not Homebrew/brew or Homebrew/homebrew-core): valgrind's formula was built from an unstable upstream --HEAD. |
any update for the M1/M2 compatibility? |
ran into the same error as @eliesaikali on my M2 Pro MacBook Pro running macOS Ventura 13.2.1 (22D68). |
+1 |
M1 Pro MacBook Pro running macOS Ventura 13.3.1. Same issue as @eliesaikali and @hacknus. |
Any details about the roadmap of Valgrind runs in the M1 arch? |
any update for the M1/M2 compatibility? |
Still need help? |
Would really like Valgrind w/ MacOS Silicon for school purposes. We're using it at school!
|
Pleeeassseee bump this up. We're using valgrind in our cs101 class to help with memory leaks in projects related to singly linked lists and pointers. We're also going to need valgrind in our upcoming cs102 class. I don't mind using our college's ubuntu desktops with valgrind, but the labs close at 10pm, and a lot of us do our best studying after 10pm. |
Speaking from experience, a huge amount of work is needed to get things running smoothly. |
What are the alternatives to valgrind on macOS that would be usable in a continuous integration environment? |
@MartinDelille You can use leaks if you like on MacOS |
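For anyone wanting to try the leaks tool mentioned above, here is a minimal sketch of a deliberately leaky singly linked list to test it on. The names push and free_all_but_last are hypothetical, and the invocation leaks --atExit -- ./a.out is an assumption about how the tool is run on your macOS version, so check man leaks first.

```c
#include <stddef.h>
#include <stdlib.h>

struct node {
    int value;
    struct node *next;
};

/* Push a new node at the front of the list.
 * Returns the new head, or the old head if allocation fails. */
struct node *push(struct node *head, int value) {
    struct node *n = malloc(sizeof *n);
    if (n == NULL) {
        return head;
    }
    n->value = value;
    n->next = head;
    return n;
}

/* Buggy on purpose: frees every node except the last one,
 * so exactly one node is leaked for any non-empty list.
 * Returns the number of nodes that were freed. */
size_t free_all_but_last(struct node *head) {
    size_t freed = 0;
    while (head != NULL && head->next != NULL) {
        struct node *next = head->next;
        free(head);
        head = next;
        freed++;
    }
    return freed; /* the final node is never freed */
}
```

A leak checker run at process exit should report the one node that free_all_but_last skips.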
I wasn't aware of this xcode tool! Thanks a lot! 👌 |
Yes probably, but some people want to contribute like @JoonasMykkanen, but he doesn't get any answer |
@LouisBrunner OK, very interesting. I'm seeing exactly the same thing. When the assert happens the guest code is
Which is 0x15e38 + 0x1bd8 = 0x17a10
And the assert in VEX is
Initially I thought that I was getting some address range error and writing into and corrupting the generated code. If you're seeing the same thing that is less likely. I did try Linux arm64 with clang (but using libstdc++ not libc++ - don't think that makes a difference) and I didn't see any errors like this. I'll see if I can use ld.lld. |
It's difficult to pin down because I have many different scenarios.
Running with vgdb
I get SIGILLs during the early stage of dyld setup (around this area https://github.com/apple-oss-distributions/dyld/blob/d1a0f6869ece370913a3f749617e457f3b4cd7c4/dyld/dyldMain.cpp#L1195). The exact guest RIP or instruction is never the same as the crash happens in the VEX-generated code. It's the
IIRC the crash is always on the load.
Running with lldb
I get a crash on the
Running directly
A healthy mix of SIGILLs, asserts and sometimes mmap failures (not sure yet why this is so mercurial). What is so odd to me is that it's so consistent. As I am doing this testing, I only get SIGILLs but yesterday when I tried it was only asserts. This morning it was mmap failures.
Summary
While it's really likely we are encountering the same issue, I also have other problems which might be causing this. Or the evCheck problem and the assertion are related somehow. |
I also get SIGILLs, for instance if I single step using vgdb or --vex-guest-max-insns=1. This isn't code that I know at all well, unfortunately. This looks too similar for it to be a coincidence. I don't see much in the way of platform-dependent stuff in any of the files, so it's a bit of a mystery why there is no problem on Linux. I tried adding a "vassert(False)" in chainXDirect_ARM64 on Linux this morning and it triggered straight away, so the problem isn't that macOS and FreeBSD use that function and not Linux. |
If I understand the code correctly it works as follows.
Then the assert is checking that the address above corresponds to disp_cp_chain_me_EXPECTED I think that address is 0x1002991220 which contains
That address should match disp_cp_chain_me_EXPECTED, but that contains something completely different.
The function in the assert is a bit hard to follow as it doesn't extract the address and compare the two addresses, it generates opcodes from the address and compares the opcodes. My thoughts at the moment are that this is a problem of matching chainXDirect_ARM64 and unchainXDirect_ARM64. It looks to me like "chain" is being called on instructions that have been already "chained" once but not "unchained". |
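To illustrate the "generate opcodes from the address and compare the opcodes" idea, here is a rough sketch. It is not the VEX code: the function names are hypothetical, and it assumes the address is materialised with a fixed MOVZ plus three MOVK sequence, which is the standard AArch64 way to load a 64-bit immediate.

```c
#include <stdbool.h>
#include <stdint.h>

/* MOVZ Xd, #imm16, LSL #(16 * hw), 64-bit variant. */
uint32_t enc_movz(unsigned rd, uint16_t imm16, unsigned hw) {
    return 0xD2800000u | ((uint32_t)hw << 21) | ((uint32_t)imm16 << 5) | rd;
}

/* MOVK Xd, #imm16, LSL #(16 * hw), 64-bit variant. */
uint32_t enc_movk(unsigned rd, uint16_t imm16, unsigned hw) {
    return 0xF2800000u | ((uint32_t)hw << 21) | ((uint32_t)imm16 << 5) | rd;
}

/* Hypothetical helper: always emit exactly four instructions that
 * load imm into register rd, so the sequence has a fixed size. */
void emit_imm64_exactly4(uint32_t out[4], unsigned rd, uint64_t imm) {
    out[0] = enc_movz(rd, (uint16_t)(imm & 0xFFFFu), 0);
    for (unsigned hw = 1; hw < 4; hw++) {
        out[hw] = enc_movk(rd, (uint16_t)((imm >> (16 * hw)) & 0xFFFFu), hw);
    }
}

/* Hypothetical check: re-encode the expected address and compare the
 * opcodes, rather than decoding the address out of the instructions. */
bool is_imm64_exactly4(const uint32_t *p, unsigned rd, uint64_t imm) {
    uint32_t expected[4];
    emit_imm64_exactly4(expected, rd, imm);
    for (unsigned i = 0; i < 4; i++) {
        if (p[i] != expected[i]) {
            return false;
        }
    }
    return true;
}
```

With a check shaped like this, a site that was already patched to point at a different target fails the comparison, which would be consistent with the "chained twice" theory.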
And there may be something in this bugzilla https://bugs.kde.org/show_bug.cgi?id=412377 since there is some connection between chaining and the icache:
|
Hmm no Linux doesn't seem to use unchainXDirect_ARM64 but every place_to_chain address is unique. That's not the case on FreeBSD. |
Good shout,
Me neither, it's quite tough.
That's true but while using breakpoints in LLDB I was able to run
Thanks for the deep dive, I haven't looked at the assert much because the SIGILL is much more problematic for me right now.
This is also where I have to do special JIT memory permission tricks due to changes in Apple Silicon (https://developer.apple.com/documentation/apple-silicon/porting-just-in-time-compilers-to-apple-silicon#Disable-Write-Protections-Before-You-Generate-Instructions). I wonder if this plays a factor as well.
I haven't checked in detail but they looked pretty unique on macOS. What do they look like on FreeBSD? |
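For readers unfamiliar with the JIT memory permission tricks mentioned above, the pattern from the linked Apple page looks roughly like the sketch below. This is not the code from this fork: jit_region_alloc and jit_region_write are hypothetical names, and on anything other than Apple Silicon the sketch falls back to a plain read/write mapping so it still compiles.

```c
#define _DEFAULT_SOURCE 1 /* for MAP_ANON on glibc under strict -std= modes */
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#if defined(__APPLE__) && defined(__aarch64__)
#include <pthread.h> /* pthread_jit_write_protect_np */
#endif

/* Hypothetical helper: map a region intended to hold generated code.
 * Returns NULL on failure. */
void *jit_region_alloc(size_t len) {
#if defined(__APPLE__) && defined(__aarch64__)
    /* Apple Silicon: request a JIT mapping that is writable and executable. */
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE | PROT_EXEC,
                   MAP_PRIVATE | MAP_ANON | MAP_JIT, -1, 0);
#else
    /* Elsewhere: a plain read/write mapping, enough for this sketch. */
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANON, -1, 0);
#endif
    return p == MAP_FAILED ? NULL : p;
}

/* Hypothetical helper: copy instructions into the region, lifting the
 * per-thread write protection only for the duration of the copy. */
void jit_region_write(void *dst, const void *src, size_t len) {
#if defined(__APPLE__) && defined(__aarch64__)
    pthread_jit_write_protect_np(0); /* writable, not executable */
#endif
    memcpy(dst, src, len);
#if defined(__APPLE__) && defined(__aarch64__)
    pthread_jit_write_protect_np(1); /* executable again, not writable */
#endif
}
```

The write protection toggle is per thread, so a write from a thread that has not disabled it faults even though the mapping itself allows writes.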
Quick update: thanks to @paulfloyd we have found the source of the SIGILLs/assertions (which was that cache invalidation for JIT was only enabled for Linux). This particular problem is now fixed on both macOS and FreeBSD. I also fixed some syscall issues and added even more support for new arm64 instructions. While this is good news as Apple Silicon support is definitely moving forward given that Valgrind is now reliably running part of the guest binaries, it is not yet running them completely without crashing. However, we have never been this close to getting it working and I am very hopeful for the near future! |
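As background on why missing cache invalidation produces SIGILLs: on arm64 the instruction cache is not kept coherent with data writes, so freshly written code has to be flushed before it is executed. A minimal sketch of that step follows. It is not the actual fix, emit_and_flush is a hypothetical name, and it assumes a GCC or Clang compiler for __builtin___clear_cache.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__APPLE__)
#include <libkern/OSCacheControl.h> /* sys_icache_invalidate */
#endif

/* Hypothetical helper: copy n_insns 32-bit instructions into code_buf and
 * make them visible to instruction fetch.
 * Returns the number of bytes written, or 0 if they do not fit. */
size_t emit_and_flush(uint8_t *code_buf, size_t cap,
                      const uint32_t *insns, size_t n_insns) {
    size_t len = n_insns * sizeof(uint32_t);
    if (len > cap) {
        return 0;
    }
    memcpy(code_buf, insns, len);
#if defined(__APPLE__)
    sys_icache_invalidate(code_buf, len);
#else
    /* GCC/Clang builtin; a no-op on architectures with coherent caches. */
    __builtin___clear_cache((char *)code_buf, (char *)code_buf + len);
#endif
    return len;
}
```

Skipping the flush can leave the CPU executing stale bytes from the instruction cache, which would fit the symptom of crashes at varying instructions inside generated code.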
I don't have the bandwidth to contribute to this right now-- can I buy you a coffee or some takeout? You're holding all our hopes and dreams. |
FreeBSD arm64 is now working reasonably well. No signals or threads just yet. == 709 tests, 235 stderr failures, 47 stdout failures, 5 stderrB failures, 6 stdoutB failures, 0 post failures == |
Great news, I have managed to run a guest binary on M1 through Valgrind for the first time. But while I am very pleased, let me be absolutely clear: this is in no way stable and can at best be described as an experimental prototype. I just thought it was too much progress not to post it. If you'd like to test it for yourself, you can clone this repo and build the
Summary of the roadmap ahead:
|
Thanks for all of the hard work on this @LouisBrunner, getting valgrind to work for Apple Silicon is going to be so great for anyone trying to do development in C/C++ on a Mac. As it stands I switched to Ubuntu for my tasks that require memchecks, but just know that many of us cannot wait to jump back and check out any first stable version. |
And a bit of news from upstream. This morning I pushed the code for FreeBSD arm64. Other than the occasional failure related to setting up memory for new threads (I think) it works pretty well. Might help a bit with stuff like signal resumption where Darwin and FreeBSD have very similar code. |
Not sure if helpful, but FYI I got something semi-working on a complex app with 2 patches, one that adds a (hackish) DC_ZVA instruction implementation and another that avoids crashes trying to load debug info for a lot of system libraries.
With this, memcheck seems to work perfectly for my case. --tool=massif however does not... |
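For context on what a DC ZVA implementation has to do: architecturally, the instruction zeroes the whole naturally aligned block that contains the given address. A simplified sketch of that behaviour is below. It is not the patch itself, emulate_dc_zva is a hypothetical name, and the block size is passed in as a parameter (assumed to be a power of two) rather than read from the CPU.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: zero the naturally aligned block of block_bytes
 * bytes that contains addr. block_bytes must be a power of two. */
void emulate_dc_zva(uintptr_t addr, size_t block_bytes) {
    /* Round the address down to the start of its block. */
    uintptr_t base = addr & ~((uintptr_t)block_bytes - 1);
    memset((void *)base, 0, block_bytes);
}
```

Note that the address does not need to be aligned: an address in the middle of a block still zeroes the entire block, including the bytes before it.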
There are a few bugzilla items similar to this. First we need a better implementation of the mrs instructions. Are they documented for Darwin? |
Oh, very interesting! Does your app use threads at all? Do you interface with objc in any way?
Thanks, I will look into those. If you have any extra background on the reasoning behind those fixes, that would be very nice. Some specific questions I have:
If you get it working, feel free to post here. I am focusing mostly on memcheck as it's the default tool and there is still so much that is broken.
Couldn't find much personally. The OSS source code has a few mentions of the Apple-specific registers and that's kind of it. I don't even think you can run
Not sure what you mean by this. I added a few extra ones here, here and here. |
I've only started looking at this. My understanding is that the MRS instructions are kind of like a 'soft' version of CPUID. Unlike CPUID, which returns baked-in values, MRS traps to the kernel. What I don't know yet is whether it is always safe to use a dirty helper or not. It will be a problem if the kernel reports capabilities that Valgrind doesn't support. |
I don't think so, the OS parts of macOS at least seem to not bother and just assume things, like the granularity at which dc zva works. As far as I can tell the libc usages don't care what the MSR contains (understandable, the loop would be far more complex and costly if it did).
No, no threads at all (by default). Just a command-line application primarily targeting Linux, so no objc either.
Sorry, I should indeed have written more: there is no issue with my binary. The problem is with in-memory system libraries where this triggered. First I tried to skip all offending binaries by name in img_from_memory (it ONLY happens with system libraries and when img_from_memory is used), but it got a bit much and went with this more general solution.
Not really, mostly guessing/trial and error. But the DCZID_EL0 register specifies the proper value, so could use that to double-check.
Yes, makes total sense. The massif issues, whatever they are, seem a bit beyond my skills. For now I'll just hope that someone else does some fixes that just happen to fix that, too! |
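On the point above about DCZID_EL0 specifying the proper value: per the Arm architecture, bits [3:0] (BS) hold log2 of the block size in 4-byte words and bit 4 (DZP) says whether DC ZVA is prohibited. A small sketch of decoding it follows. The function names are hypothetical, and the register read is only compiled on AArch64 with a GCC or Clang style inline assembler.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* DZP (bit 4) set means DC ZVA must not be used. */
bool dczid_zva_allowed(uint64_t dczid) {
    return (dczid & 0x10u) == 0;
}

/* BS (bits 3:0) is log2 of the block size in 4-byte words,
 * so the size in bytes is 4 << BS. */
size_t dczid_block_bytes(uint64_t dczid) {
    return (size_t)4 << (dczid & 0xFu);
}

#if defined(__aarch64__)
/* Read the register on real hardware; it is readable from user space. */
uint64_t read_dczid_el0(void) {
    uint64_t value;
    __asm__ volatile("mrs %0, dczid_el0" : "=r"(value));
    return value;
}
#endif
```

A value with BS = 4, for example, decodes to a 64-byte block, which could be compared against whatever granularity an implementation hard-codes.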
macOS isn't my only concern. I also want to ensure that everything works correctly on FreeBSD and Linux. |
None of my patches should change anything for FreeBSD or Linux. |
That ticket is old and lacks reproducers, but it sounds like Android might have the same issue (optimized libraries not checking if the instruction is available before using it). |
I put my patch also on that ticket, maybe there is some user feedback on it. Anyway feel free to re-use any part of my patches in any way you want. |
Hello, I've just tried building the M1 branch on my M1 Max and I got the exact same issue as rdoeffinger related to debug info.
After applying the patch and recompiling, vg-in-place is now able to trace trivial applications such as ls. I have tried running a more advanced app of my own but it has 2 issues:
This is the output of VG when attempting to trace a custom CLI tool using a Rust dylib (everything is built with debug info including the Rust dylib):
I wonder if these errors are due to VG internal issues or if they are real and Apple really needs to fix their dyld... Are these errors expected? EDIT: I also wonder why it's calling _NSThreadPoisoned as in my test I have not used any threads. |
@Yuri6037 FYI, I edited your comment to remove the bulk of the errors (as they aren't particularly useful for this discussion and can still be accessed through the GitHub edit history otherwise) because your message made it difficult to scroll through this issue.
Good to know that it isn't an isolated issue. I have yet to reproduce it, I will have to look into it a bit more.
Valgrind should provide all DYLD env vars to the real dyld when running your binary. If you have a MRE, I would gladly look into that issue.
This is usually due to Valgrind not knowing about how the memory is laid out on macOS. In the past, they have been hidden through a copious amount of suppressions (a special Valgrind file). I haven't seen those specific ones yet, so I don't know what's causing it but the general rule I found is: if you dig down to why an issue like
As stated in my latest update, this is currently expected. I haven't had time to investigate why that happens, apart from the fact that objc is being loaded at some point. You don't have to use threads specifically, but if you use any macOS capability which relies on objc, you will have this crash.
I am not very knowledgeable about this so thanks for this background as it will make it easier to research and fix. |
I just pushed a bunch of changes upstream for arm64 concerning several mrs and dc opcodes. That includes mrs dczid_el0 and dc zva. |
will there be a version for m1?