-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
JIT regression with Android+net6 P7 #55375
Comments
I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label. |
Tagging subscribers to 'arch-android': @steveisok, @akoeplinger Issue DetailsDescriptionContext: dotnet/android#6072 PR dotnet/android#6072 is a .NET 6 P7 SDK bump from dotnet/installer@e8b3b6be to dotnet/installer@bd5653f3, which includes dotnet/runtime changes in 02f70d0...0605bb3. Changes within this commit range breaks one of our unit tests: https://devdiv.visualstudio.com/DevDiv/_build/results?buildId=4952636&view=ms.vss-test-web.build-test-results-tab&runId=23305012&resultId=100052&paneView=debug
This failure only happens when the JIT is used. We have a "sibling" app which runs the same tests with the interpreter enabled, and those tests pass. Repro steps:
Of particular note -- revisited below -- is the call stack: unless some form of inlining or something is going on, that call stack isn't possible: the Configuration
Regression?Yes. This was previously working with 02f70d0; see also dotnet/android@d593b72. Other informationThe original failing build may disappear, and someone will want access to the assemblies used in the test. Assemblies within the Which calls:
This is where things get weird: in particular, note the body of the public XmlPullParserReader (IJavaObject source)
{
this.source = (IXmlPullParser) source;
supports_ns = this.source.GetFeature (XmlPullParser.FeatureProcessNamespaces);
supports_ns_report_as_attr = this.source.GetFeature (XmlPullParser.FeatureReportNamespaceAttributes);
} Note that the Recall again the stack trace in the failure:
Where is (Also note that this test passes when using the interpreter. Mono.Android.NET_Tests-Interpreter-Signed.zip "Fortunately", this is a linked app, so we can "just" disassemble
One of those call sites is from
Which leaves I can't easily reach a Thus, my conjecture: "something broke" in the JIT (02f70d0...0605bb3) which causes the post-linked Unfortunately, nothing jumps out to me in the src/mono changes:
(Linker change could also be in play, but since the post-linked IL for
|
/cc @SamMonoRT |
Related: https://discord.com/channels/732297728826277939/732297837953679412/862827287343136839 I re-ran the suite with inlining disabled:
Using
…which was unexpected. Turns out, disabling lots of the optimizations also triggers the same assertion. I was able to get |
Another idea which came to mind was to enable
Unfortunately, this takes significantly longer to execute: 19 minutes:
It also produced a 758M log Looking for the
we can find something that looks reasonable:
Thing is, this does look weird. We have a For context, the disassembled IL for
IL shows a
All is well and good. But then!
huh? As per the IL shown above, after the first "Verifying" this, Additionally, the Somehow, the second |
Context: dotnet/runtime#55375 There are lots of tests in `tests/Mono.Android-Tests`, but only two tests are failing. Add a `[Category("dotnet-runtime-55375")]` to the failing tests so that we can more easily execute *just* the tests in question: adb shell am instrument -e include dotnet-runtime-55375 -e loglevel Verbose -w Mono.Android.NET_Tests/xamarin.android.runtimetests.NUnitInstrumentation
Context: #6072 Context: dotnet/runtime#55375 Context: https://discord.com/channels/732297728826277939/732297837953679412/862829757460512788 @vargaz wondered if dotnet/runtime#55375 is because of the linker. Disable the linker, and lets see what happens!
@vargaz suggestion to disable the linker worked: the unit tests pass when linking is disabled. This suggests something in here is responsible for the breakage: dotnet/linker@a07cab7...f574448 This still confuses me, because it suggests that perhaps I can't rely on |
Context: dotnet/runtime#55375 (comment) This reverts commit 004725d. > Disable the linker, and lets see what happens! What happens is…all the tests pass! Which suggests that this *is* a linker-related (adjacent?) issue, *not* a JIT issue, as I had previously believed. (I still don't understand *how* it's linker-related, and now I have to wonder if I can't "believe" the output of `ikdasm`, which is an equally troubling thought…) Re-enable the linker, so that tests once again *fail*. (I'm also not able to repro the crash locally, so if I want to further explore things, exploring CI output is the only path forward.)
This reverts commit 5062dd8. Context: https://devdiv.visualstudio.com/DevDiv/_build/results?buildId=4963171&view=results Context: #6072 (comment) I *thought* that re-enabling the linker (23310da) would cause the unit tests to once again start failing. The **APKs .NET** tests are *passing*. Meaning `Android.RuntimeTests.XmlReaderPullParserTest.ToLocalJniHandle()` is now *passing*. I had thought this was due to the linker, but could it instead be due to commit 5062dd8? Revert 5062dd8; let's see if it once again breaks. *Locally*, `XmlReaderPullParserTest.ToLocalJniHandle()` fails when commit 5062dd88is not applied.
…and I'm wrong. Disabling the linker isn't what made the tests pass. Adding The fact that adding a |
I'm now able to repro locally. (Yay.). Disabling the linker does not fix the issue. I do not think that this is related to linker behavior. |
New, easier, repro steps!
Expected results: no crash! Actual results: crash! See
|
Note that the new and improved and simpler repro also fails when the linker is disabled ( |
Now that it's smaller, trace output should be more reasonable, right?
Results in a much more "reasonable" 13 MB trace file, in case that's of interest: trace.zip @vargaz had previously suggested disabling inlining. Previously, using
|
@vargaz is this something you can take a look at? |
I would suspect 5b337cb |
[ |
@vargaz : thanks! Unfortunately, My repro steps: cp $HOME/.nuget/packages/microsoft.netcore.app.runtime.mono.android-arm64/6.0.0-preview.7.21352.16/runtimes/android-arm64/native/libmonosgen-2.0.so $HOME/.nuget/packages/microsoft.netcore.app.runtime.mono.android-arm64/6.0.0-preview.7.21352.16/runtimes/android-arm64/native/libmonosgen-2.0.so.orig
cp ~/Downloads/good.so $HOME/.nuget/packages/microsoft.netcore.app.runtime.mono.android-arm64/6.0.0-preview.7.21352.16/runtimes/android-arm64/native/libmonosgen-2.0.so
# try good.so:
cd path/to/scratch.getxml
rm -Rf bin obj
~/android-toolchain/dotnet/dotnet build /t:Install
adb shell am start-activity scratch.getxml/my.Activity
## result: no crash. Yay!
# verify w/ bad.so
cp ~/Downloads/bad.so $HOME/.nuget/packages/microsoft.netcore.app.runtime.mono.android-arm64/6.0.0-preview.7.21352.16/runtimes/android-arm64/native/libmonosgen-2.0.so
rm -Rf bin obj
~/android-toolchain/dotnet/dotnet build /t:Install
adb shell am start-activity scratch.getxml/my.Activity
## result: crashes. Good sanity check!
# Try fix.so
cp ~/Downloads/fix.so $HOME/.nuget/packages/microsoft.netcore.app.runtime.mono.android-arm64/6.0.0-preview.7.21352.16/runtimes/android-arm64/native/libmonosgen-2.0.so
rm -Rf bin obj
~/android-toolchain/dotnet/dotnet build /t:Install
adb shell am start-activity scratch.getxml/my.Activity
## result: crashes. :-( |
@vargaz provided additional native libs to try: https://www.dropbox.com/s/d1k8kprxorhbb85/libs-2.tar.gz?dl=0 Results:
|
Using the latest
Which raises a terrible thought: previously, I was only checking to see if the app launched or not. I wasn't checking
|
Using the most recently provided
|
Should be fixed now, please verify with newer packages. |
Description
Context: dotnet/android#6072
Context: 02f70d0...0605bb3
Context: dotnet/linker@a07cab7...f574448
PR dotnet/android#6072 is a .NET 6 P7 SDK bump from dotnet/installer@e8b3b6be to dotnet/installer@bd5653f3, which includes dotnet/runtime changes in 02f70d0...0605bb3.
Changes within this commit range breaks one of our unit tests: https://devdiv.visualstudio.com/DevDiv/_build/results?buildId=4952636&view=ms.vss-test-web.build-test-results-tab&runId=23305012&resultId=100052&paneView=debug
This failure only happens when the JIT is used. We have a "sibling" app which runs the same tests with the interpreter enabled, and those tests pass.
Repro steps:
Configure an Android device.
Download the test app: Mono.Android.NET_Tests-Signed.zip
Rename
Mono.Android.NET_Tests-Signed.zip
toMono.Android.NET_Tests-Signed.apk
Install the app on your Android device:
In a new shell window, begin capturing
adb logcat
:Run the unit tests:
Read
log.txt
:Of particular note -- revisited below -- is the call stack: unless some form of inlining or something is going on, that call stack isn't possible: the
XmlPullParserReader(IJavaObject)
constructor doesn't call.GetNamespace()
. See the Other Information section for additional musings.Configuration
Which version of .NET is the code running on? 0605bb3
What OS and version, and what distro if applicable? Android API-29 Emulator, as created by:
What is the architecture (x64, x86, ARM, ARM64)? x64
Do you know whether it is specific to that configuration? No
Regression?
Yes. This was previously working with 02f70d0; see also dotnet/android@d593b72.
Other information
The original failing build may disappear, and someone will want access to the assemblies used in the test. Assemblies within the
.apk
are compressed. Here are the assemblies from the.apk
, uncompressed: assemblies-uncompressed.zipThe unit test: https://github.com/xamarin/xamarin-android/blob/7c5fab13329ee898fb1562f83576a7ca881f2881/tests/Mono.Android-Tests/Android.Runtime/XmlReaderPullParserTest.cs#L18
Which calls:
Android.Content.Res.Resources.GetXml()
, which is generated code:XmlResourceParserReader.FromJniHandle()
&XmlResourceParserReader.FromNative()
: https://github.com/xamarin/xamarin-android/blob/7c5fab13329ee898fb1562f83576a7ca881f2881/src/Mono.Android/Android.Runtime/XmlPullParserReader.cs#L29-L44XmlResourceParserReader(IJavaObject)
constructor: https://github.com/xamarin/xamarin-android/blob/7c5fab13329ee898fb1562f83576a7ca881f2881/src/Mono.Android/Android.Runtime/XmlPullParserReader.cs#L17-L21XmlPullParserReader(IJavaObject)
constructor: https://github.com/xamarin/xamarin-android/blob/7c5fab13329ee898fb1562f83576a7ca881f2881/src/Mono.Android/Android.Runtime/XmlPullParserReader.cs#L78-L83This is where things get weird: in particular, note the body of the
XmlPullParserReader(IJavaObject)
constructor:Note that the
XmlPullParserReader(IJavaObject)
constructor invokes a.GetFeature()
method.Recall again the stack trace in the failure:
Where is
Android.Content.Res.IXmlResourceParserInvoker.GetNamespace(string)
coming from? There is no.GetNamespace()
invocation, just.GetFeature()
.(Also note that this test passes when using the interpreter. Mono.Android.NET_Tests-Interpreter-Signed.zip
Some of the assemblies within the interpreter-based
.apk
differ from those in the JIT-based.apk
, but notMono.Android.dll
.)"Fortunately", this is a linked app, so we can "just" disassemble
Mono.Android.dll
and find all occurrences of::GetNamespace
:One of those call sites is from
XmlPullParserReader::GetAttribute(string, string)
. The other is fromIXmlPullParserInvoker::n_GetNamespace_Ljava_lang_String_()
.n_
-prefixed methods are "JNI Marshal Methods"; they're called "by" Java, not by managed code. The only "normal" way it could appear is if Java were a calling stack frame. This is possible, but doesn't feel particularly plausible in this situation.Which leaves
XmlPullParserReader::GetAttribute(string, string)
, which only has two callers:XmlPullParserReader::GetAttribute(string)
andXmlReaderPullParser::GetAttributeValue(string, string)
.I can't easily reach a
.GetNamespace()
method from the::GetAttribute()
methods. (Maybe I gave up too soon?)Thus, my conjecture: "something broke" in the JIT (02f70d0...0605bb3) which causes the post-linked
IXmlPullParser::GetFeature(string)
invocation to instead invoke aGetNamespace()
method.Unfortunately, nothing jumps out to me in the src/mono changes:
(Linker change could also be in play, but since the post-linked IL for
XmlPullParserReader::.ctor()
continues to contain a method invocation toIXmlPullParser::GetFeature(string)
, this seems less likely.)The text was updated successfully, but these errors were encountered: