iOS: building with LLVM causes some exception catch clauses to not work #56100
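Tagging subscribers to this area: @dotnet/ncl
Issue Details
Description
See the following test code: https://gist.github.com/rolfbjarne/f5c1a3697343c5e9bb6cb8e5b796d328#file-appdelegate-cs-L23-L73
When executed on an iOS device (AOT compiled), this produces the following output:
If I enable LLVM, the following happens:
It's a rather strange bug, because in my original test it doesn't crash this way. Instead, the `SelectHostName` code causes a lot of other test failures because `Assert.Throws<FooException>` statements don't actually catch the `FooException`; the general NUnit catch handler then catches these exceptions and turns them into test failures. The baffling effect was that when running the tests outside of our test harness (without needing to call the `SelectHostName` method), the tests passed.
In any case, let me know if you can reproduce with this information, or I'll create a test case you can use (which will likely require building a custom branch of xamarin-macios).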
|
How can I reproduce this? |
|
I can reproduce, however the failure seems random, sometimes it works. |
Did this ever work with .NET Core? |
I don't know, it's the first time we've had support for LLVM (I ran into this while implementing LLVM support). |
EDIT: Never mind, I can intermittently repro this failure. EDIT 2: I can't reproduce this using "normal" LLVM FullAOT (not statically linking everything into a single binary) on arm64 Linux. |
A note: if I build the repro with EDIT: Never mind, it doesn't. I can reproduce this in Release too. |
Doesn't seem like a consistent repro. @imhameed - if you determine consistent repro steps and have a low risk fix, we will consider backporting to 6.0 - For now moving to 7.0 |
This is blocking LLVM for us. It's 100% consistent on our main test suite, where it breaks a huge number of tests. There's no way we can release LLVM support with this bug. |
Got it @rolfbjarne ---- @imhameed has been working on narrowing down the issue, so we should be able to get in a fix. @imhameed please update this with your progress |
A smaller repro: https://gist.github.com/imhameed/cdeba5b8bb9cd879b688b97292602e58 Still fails intermittently for me, even when re-launching an already-installed copy of this via mlaunch. Exception handling ought to be deterministic, but maybe the attempt to open a socket can fail in multiple ways that all raise the same exception, and one of these failing cases has bad codegen. Haven't yet looked at the IR we're generating. Also bad: this code crashes with an "Unhandled managed exception" 100% of the time. |
Nested try clauses ought to fall back to mini instead of LLVM: runtime/src/mono/mono/mini/mini-llvm.c Lines 11051 to 11057 in 44b1cd6
Working on getting a locally-built cross-compiler shimmed into xamarin-macios right now. EDIT: Got a locally-built cross-compiler working with the existing toolchain. I see |
Ref: https://bugzilla.xamarin.com/37/37273/bug.html (since it's not obvious where the original issue was filed) |
Also related (tracking a longer-term cleaner fix): #54176 |
So LLVM used to work with previous Xamarin.iOS versions based on the mono/mono codebase. I don't think this problem is caused by BCL changes; it seems more likely to be a change in the build system, a missing flag, etc. |
Observation: The generated assembly and the "JitInfo EH clause" list for The flags I saw used when building with LLVM are:
and without:
If I hack up |
Here's the LLVM IR along with the mono-aot-cross/opt/llc flags used for compilation for this repro: https://gist.github.com/imhameed/bd2564d0df45c580294888eb9f4770c4 I tried using a cross-compiler built against LLVM 9 and rebuilding the repro using LLVM 9 opt and llc. That made no difference. And nested clauses might not be related; https://gist.github.com/imhameed/ccd542f151624c6399bd6d5c0f74b085 also crashes. Going to trace our EH code next. |
If you change the testcase to:
does it still fail? |
@vargaz Yes:
still causes:
|
Got a locally-built copy of the runtime working and I can printf debug on-device now. When built with LLVM, the runtime fails to find any associated catch instruction ranges when unwinding. I'll keep looking into why. |
When unwinding, the unwinder instruction pointer is set to an address that is not contained within any of the module ranges registered with |
JFYI I was able to reproduce this on maccatalyst-arm64. It may be slightly easier to debug than the iOS version. Curiously, when compiled as part of the FunctionalTests framework in dotnet/runtime it ran just fine. When compiled using the Xamarin SDK it failed. |
@filipnavara how did you build this with the dotnet/runtime functional test framework? And which repro .cs did you use? (Currently I'm hacking up llc to dump mono unwinding information to stdout in a somewhat readable way.) |
So, I started with an M1 MacBook and Then I replaced Program.cs of the AOT-LLVM functional test with one of your code samples above (it doesn't matter which one, but I used the first reduced repro from the gist). Finally I compiled the sample with If you need more precise steps I can send them when I get back to my computer. It never crashed though. I also copied the same source code into a |
I'll give that a shot; thanks. I'll take a look at the AOT flags we use and why the functional test has different behavior. |
If there's anything else I can test or help with, let me know. I am afraid I am still a few steps behind you though, so there's not much I can offer. I compared the AOT flags in both MSBuild logs and they seemed identical. However, different tasks are used to compose the final set, so I could have easily missed something. |
Unwinding directives for LLVM-compiled Raise and SelectHostName in https://gist.github.com/imhameed/ccd542f151624c6399bd6d5c0f74b085 look reasonable: https://gist.github.com/imhameed/aef3e164aa20cffe5d7e1e0ac85c149e (just a prologue stack adjustment followed by |
I can confirm that it fails very early on in the unwinding so it's likely not a problem in the actual unwinding information. Working unwind:
Non-working unwind:
|
I've narrowed it down a bit. When looking up the very first frame The looked up entry looks like this:
The code size looks incorrect. |
It seems that it's possible to catch the corruption early on by adding this:
to |
So, what happens is that one AOT module has both JIT code and LLVM code. When linked into the final executable these sections are quite far apart and there is other code in between. When the last JIT method is added, it (incorrectly) fills the whole gap. Later lookups for code that is actually in the gap return incorrect information. |
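The failure mode described in this comment can be illustrated with a small model. This is only a sketch, not the runtime's actual code: the addresses, the `build_ranges_naive` helper, and the `lookup` helper are all hypothetical, and the real lookup table lives in the Mono AOT runtime in C. It assumes a table that pairs each method start with the next method start as its end.

```python
from bisect import bisect_right

def build_ranges_naive(sorted_starts, last_end):
    """Pair each method start with the next start as its end.

    This mirrors the assumption that breaks: it treats all methods of a
    module as one contiguous run of code.
    """
    ends = sorted_starts[1:] + [last_end]
    return list(zip(sorted_starts, ends))

def lookup(ranges, addr):
    """Return the (start, end) range that contains addr, or None."""
    starts = [start for start, _ in ranges]
    i = bisect_right(starts, addr) - 1
    if i >= 0 and ranges[i][0] <= addr < ranges[i][1]:
        return ranges[i]
    return None

# Hypothetical layout: two "JIT" methods, a gap holding code from other
# modules, then two LLVM methods of the same module.
methods = [0x1000, 0x1200, 0x9000, 0x9100]
ranges = build_ranges_naive(methods, last_end=0x9200)

foreign_addr = 0x5000  # belongs to unrelated code linked into the gap
hit = lookup(ranges, foreign_addr)
print(hex(hit[0]), hex(hit[1]))  # the last JIT method claims the whole gap
```

In this model the lookup for `0x5000` succeeds when it should miss, which is the kind of wrong answer that would leave an unwinder with the wrong method info and no matching catch clause.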
Thank you! And it looks like the link order xamarin-macios is using now is determined here: https://github.com/xamarin/xamarin-macios/blob/4380161309528e543107ede2ea5881299495d7ef/dotnet/targets/Xamarin.Shared.Sdk.targets#L856-L866 Contrast with: https://github.com/xamarin/xamarin-macios/blob/xamarin-ios-13.18.0.21/tools/mtouch/Target.cs#L1120-L1125 |
Thanks for explaining the missing piece of what has changed! I don't necessarily care if my PR is accepted as the final solution, but I am happy to see progress on this and an explanation for the behaviour. |
…58491) Fixes #56100

Here's a little bit of background on the problem. When multiple AOT modules got linked into the same executable for iOS/MacCatalyst the LLVM and JIT code sections of different modules got interleaved. The resulting AOT module info looks something like this:

```
JIT start: 0x101002100
JIT end: 0x1010035e0
LLVM start: 0x104c027c8
LLVM end: 0x104c603e8
Sorted methods:
0x101002250
0x101002390
0x101002470
0x104c027c8
0x104c027e0
...
```

The previous code incorrectly assumed that the third method had a code size `0x104c027c8 - 0x101002470`. When inserted into lookup table any of the AOT code inside the range `0x1010035e0 (JIT code end) - 0x104c027c8 (LLVM code start)` would incorrectly end up being mapped to the last JIT method. (Note: The JIT name here is misleading.)
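To make the arithmetic in the commit message concrete, here is a minimal sketch that uses the addresses quoted there (only the five listed methods; the elided ones are not guessed at). The `section_end` and `method_size` helpers are hypothetical names, and clamping a method's end to the end of its own code section is one plausible way to avoid the bad size, not necessarily what the PR does.

```python
# Section bounds and method addresses as quoted in the commit message.
JIT_START, JIT_END = 0x101002100, 0x1010035E0
LLVM_START, LLVM_END = 0x104C027C8, 0x104C603E8
SECTIONS = [(JIT_START, JIT_END), (LLVM_START, LLVM_END)]

methods = [0x101002250, 0x101002390, 0x101002470, 0x104C027C8, 0x104C027E0]

def section_end(addr):
    """Return the end of the code section that contains addr."""
    for start, end in SECTIONS:
        if start <= addr < end:
            return end
    raise ValueError(f"{addr:#x} is not inside any known code section")

def method_size(sorted_methods, i):
    """Size of method i, never extending past the end of its own section."""
    start = sorted_methods[i]
    limit = section_end(start)
    if i + 1 < len(sorted_methods):
        # The next method only bounds this one if it starts before the
        # section ends; otherwise the section end wins.
        limit = min(limit, sorted_methods[i + 1])
    return limit - start

naive_size = methods[3] - methods[2]    # spans the gap between sections
clamped_size = method_size(methods, 2)  # stops at JIT_END
print(hex(naive_size), hex(clamped_size))
```

With these numbers the naive size of the third method is tens of megabytes, while the clamped size ends at `JIT end`, so addresses between the two sections are no longer attributed to it.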
Description
See the following test code: https://gist.github.com/rolfbjarne/f5c1a3697343c5e9bb6cb8e5b796d328#file-appdelegate-cs-L23-L73
When executed on an iOS device (AOT compiled), this produces the following output:
If I enable LLVM, the following happens:
It's a rather strange bug, because in my original test it doesn't crash this way. Instead, the `SelectHostName` code causes a lot of other test failures because `Assert.Throws<FooException>` statements don't actually catch the `FooException`; the general NUnit catch handler then catches these exceptions and turns them into test failures. The baffling effect was that when running the tests outside of our test harness (without needing to call the `SelectHostName` method), the tests passed.
In any case, let me know if you can reproduce with this information, or I'll create a test case you can use (which will likely require building a custom branch of xamarin-macios).