[mono] Add a no-exec code manager for AOT compilation; switch Catalyst CI to JustInterp AOT mode #53197

Merged: 18 commits into dotnet:main, Jun 2, 2021

lambdageek
Member

@lambdageek lambdageek commented May 24, 2021

Since we cannot easily JIT on arm64, we need to change how we run in interp mode. The old way apparently has some JIT fallbacks, so we need to switch to the mode supported by AOT.


Don't allocate pages with execute permission if we're never going to be executing code. Also don't try to toggle per-thread write protection if we're not expecting to write to executable pages.


Don't assert on Catalyst in mono_codeman_enable_write

We defensively also toggle the page write protect bits when resolving some
AOT patch targets in mono_resolve_patch_target_ext. Instead allow the
call, but don't do anything.


Set ENABLE_MONOTOUCH for MacCatalyst arm64. Define MONOTOUCH in one place


[aot] mscorlib.dll isn't CoreLib on netcore. It's a forwarding assembly. Don't treat it specially


[testing] In JustInterp mode, only AOT System.Private.CoreLib

We only need to AOT the trampolines in System.Private.CoreLib in
interpreter-only mode. We don't need to AOT any of the user code in
other assemblies.

Side effect: fixes the System.Runtime.Loader.DefaultContext testsuite
in JustInterp mode. Still broken in Full AOT mode.
(That testsuite references the System.Runtime.Loader.Noop.Assembly
assembly, but with a different filename
System.Runtime.Loader.Noop.Assembly_test.dll which causes linking
errors due to incorrect symbols in AOT module registration in
AppleAppBuilder)


Fixes #53106

@ghost

ghost commented May 24, 2021

Tagging subscribers to this area: @directhex
See info in area-owners.md if you want to be subscribed.

Issue Details

Don't allocate pages with execute permission if we're never going to be executing code. Also don't try to toggle per-thread write protection if we're not expecting to write to executable pages.

Author: lambdageek
Assignees: -
Labels:

area-Infrastructure-mono

Milestone: -

@lambdageek
Member Author

lambdageek commented May 24, 2021

ping @steveisok @vargaz

I haven't tested it yet, but this will stop allocating executable pages on all platforms when we're just doing AOT compilation. As a side effect, it will let the AOT compiler run without special permissions on M1 Macs.

@vargaz it's just the last 5 commits that have the interesting changes.

@lambdageek lambdageek changed the title [DRAFT][mono] Add a no-exec code manager for AOT compilation [mono] Add a no-exec code manager for AOT compilation May 25, 2021
@lambdageek
Member Author

lambdageek commented May 25, 2021

Ok, System.Buffers.Tests and System.Runtime.Tests both compile and run with FullAOT MacCatalyst arm64 and with JustInterp MacCatalyst arm64. For System.Runtime.Tests I had to enable MONOTOUCH so we'd get the infinite trampoline pages - otherwise the test runs out of trampolines.

@vargaz System.Runtime.Tests crashes consistently with a suspicious stack trace that looks like some kind of gsharedvt ref types problem. It looks kind of familiar, but I couldn't recall the details.

https://gist.github.com/lambdageek/1906b3f28166cdf150f4840240caef4d

To repro, build the runtime on M1 with:

./build.sh --os maccatalyst -c Release

then in src/libraries/System.Runtime/tests/

../../../../dotnet.sh build /p:RunAOTCompilation=true /p:MonoForceInterpreter=false /t:Test /p:TargetOS=MacCatalyst -c Release

then run the AOT binary under lldb.

@lambdageek lambdageek requested a review from vargaz May 25, 2021 04:35
@lambdageek lambdageek changed the title [mono] Add a no-exec code manager for AOT compilation [mono] Add a no-exec code manager for AOT compilation; switch Catalyst CI to JustInterp AOT mode May 26, 2021
@lambdageek lambdageek marked this pull request as ready for review May 26, 2021 16:34
@lambdageek
Member Author

The catalyst arm64 tests all compiled, apparently, but some of the runs bailed out like this:

+ sudo launchctl asuser 505 sh ./xharness-runner.apple.sh --targets maccatalyst --timeout 00:30:00 --launch-timeout 00:30:00 --includes-test-runner --expected-exit-code 0 --app /tmp/helix/working/A78F091E/w/A01708C4/e/Microsoft.Extensions.Configuration.Xml.Tests.app --xharness-cli-path /tmp/helix/working/A78F091E/p/microsoft.dotnet.xharness.cli/1.0.0-prerelease.21271.1/tools/net6.0/any/Microsoft.DotNet.XHarness.CLI.dll --output-directory /tmp/helix/working/A78F091E/w/A01708C4/uploads
XHarness command issued: apple test --app /tmp/helix/working/A78F091E/w/A01708C4/e/Microsoft.Extensions.Configuration.Xml.Tests.app --output-directory /tmp/helix/working/A78F091E/w/A01708C4/uploads --targets maccatalyst --timeout 00:30:00 --launch-timeout 00:30:00 --xcode /Applications/Xcode124.app -v
[13:22:41] info: Preparing run for maccatalyst
[13:22:41] info: Getting app bundle information from '/tmp/helix/working/A78F091E/w/A01708C4/e/Microsoft.Extensions.Configuration.Xml.Tests.app'
[13:22:41] dbug: 13:22:41.9782580 Running /usr/libexec/PlistBuddy -c "Print CFBundleName" /tmp/helix/working/A78F091E/w/A01708C4/e/Microsoft.Extensions.Configuration.Xml.Tests.app/Contents/Info.plist
[13:22:42] dbug: 13:22:42.0421390 Process PlistBuddy exited with 0
[13:22:42] dbug: 13:22:42.0548320 Running /usr/libexec/PlistBuddy -c "Print CFBundleIdentifier" /tmp/helix/working/A78F091E/w/A01708C4/e/Microsoft.Extensions.Configuration.Xml.Tests.app/Contents/Info.plist
[13:22:42] dbug: 13:22:42.0834200 Process PlistBuddy exited with 0
[13:22:42] dbug: 13:22:42.0860260 Running /usr/libexec/PlistBuddy -c "Print UIRequiredDeviceCapabilities" /tmp/helix/working/A78F091E/w/A01708C4/e/Microsoft.Extensions.Configuration.Xml.Tests.app/Contents/Info.plist
[13:22:42] dbug: 13:22:42.1006400 Process PlistBuddy exited with 1
[13:22:42] dbug: 13:22:42.1037350 Property UIRequiredDeviceCapabilities not present in Info.plist, assuming 32-bit is not supported
[13:22:42] dbug: 13:22:42.1042640 Running /usr/libexec/PlistBuddy -c "Print CFBundleExecutable" /tmp/helix/working/A78F091E/w/A01708C4/e/Microsoft.Extensions.Configuration.Xml.Tests.app/Contents/Info.plist
[13:22:42] dbug: 13:22:42.1195970 Process PlistBuddy exited with 0
[13:22:42] dbug: 13:22:42.1533000 Test log server listening on: 0.0.0.0:53270
[13:22:42] dbug: 13:22:42.1562290 *** Executing 'Microsoft.Extensions.Configuration.Xml.Tests' on MacCatalyst ***
[13:22:42] dbug: 13:22:42.1739760 Running chmod +x /tmp/helix/working/A78F091E/w/A01708C4/e/Microsoft.Extensions.Configuration.Xml.Tests.app/Contents/MacOS/Microsoft.Extensions.Configuration.Xml.Tests
[13:22:42] dbug: 13:22:42.1881020 Process chmod exited with 0
[13:22:42] dbug: 13:22:42.1961090 Running open -W /tmp/helix/working/A78F091E/w/A01708C4/e/Microsoft.Extensions.Configuration.Xml.Tests.app
                 With env vars: NUNIT_AUTOEXIT=true NUNIT_HOSTPORT=53270 NUNIT_ENABLE_XML_OUTPUT=true NUNIT_XML_VERSION=xUnit NUNIT_HOSTNAME=127.0.0.1
[13:22:42] dbug: 13:22:42.2675530 The application cannot be opened for an unexpected reason, error=Error Domain=NSOSStatusErrorDomain Code=-10827 "kLSNoExecutableErr: The executable is missing" UserInfo={_LSLine=3691, _LSFunction=_LSOpenStuffCallLocal}
[13:22:42] dbug: 13:22:42.2694550 Process open exited with 1
[13:22:52] dbug: 13:22:52.3682980 Test run failed
[13:22:52] dbug: 13:22:52.4036520 Could not find pid in mtouch output.
[13:22:52] dbug: 13:22:52.4073260 Test execution started
[13:22:53] dbug: 13:22:53.5971330 Test run crashed before it started (no log file produced)
[13:22:53] dbug: 13:22:53.6061860 No crash reports, waiting 30 seconds for the crash report service...
[13:23:25] fail: Application run crashed
                 No test log file was produced
                 
                 Check logs for more information
XHarness exit code: 80 (APP_CRASH)

I will investigate

@lambdageek
Member Author

lambdageek commented May 27, 2021

@steveisok @akoeplinger I think xharness will need to do something - macOS seems flaky

Okay, I can repro the failure, and it's really weird. Basically, if I build with ./build.sh -s mono+libs+host+packs+libs.tests --os maccatalyst --arch arm64 -c Release /p:ArchiveTests=true /p:RunAOTCompilation=true /p:MonoForceInterpreter=true, then the build will leave .app files in artifacts/helix/tests/MacCatalyst.AnyCPU.Release/*.app

If I then "simulate" what helix seems to do by just doing

mkdir -p /tmp/f
cd /tmp/f
cp -R <runtime>/artifacts/helix/tests/MacCatalyst.AnyCPU.Release/Microsoft.Extensions.Configuration.Xml.Tests.app . ; open -W Microsoft.Extensions.Configuration.Xml.Tests.app

it will immediately fail with Domain=NSOSStatusErrorDomain Code=-10827 "kLSNoExecutableErr: The executable is missing"

However both of the following work:

cp -R <runtime>/artifacts/helix/tests/MacCatalyst.AnyCPU.Release/Microsoft.Extensions.Configuration.Xml.Tests.app . ; sleep 1; open -W Microsoft.Extensions.Configuration.Xml.Tests.app

or

cp -R <runtime>/artifacts/helix/tests/MacCatalyst.AnyCPU.Release/Microsoft.Extensions.Configuration.Xml.Tests.app . ; sync; open -W Microsoft.Extensions.Configuration.Xml.Tests.app

So I think the right thing is for xharness to do a sync before the open -W
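
The workaround described in this comment can be sketched as a small shell helper. This is only an illustration of the copy, sync, launch ordering, not what xharness actually does: the function name `stage_and_launch` and the `LAUNCHER` override variable are hypothetical, introduced here so the flow can be tried on machines without `open`.

```shell
# Hypothetical helper (not part of xharness): stage an .app bundle in a
# scratch directory, flush it to disk, then launch it.
# LAUNCHER is an assumed override; it defaults to `open -W` as used above.
stage_and_launch() {
    src_app=$1                      # path to the built .app bundle
    work_dir=$2                     # scratch directory to launch from
    launcher=${LAUNCHER:-open -W}

    mkdir -p "$work_dir" || return 1
    cp -R "$src_app" "$work_dir"/ || return 1

    # Flush pending writes before launching. In the experiment above, adding
    # `sync` (or `sleep 1`) between the copy and the launch avoided
    # kLSNoExecutableErr.
    sync

    # $launcher is deliberately unquoted so that "open -W" splits into
    # a command and its flag.
    $launcher "$work_dir/$(basename "$src_app")"
}
```

For the case above, this would be called with the path to Microsoft.Extensions.Configuration.Xml.Tests.app as the first argument and /tmp/f as the second.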

@lambdageek
Member Author

Added an xharness issue about it: dotnet/xharness#611

@lambdageek
Member Author

lambdageek commented May 28, 2021

Catalyst launch issue should be fixed by dotnet/xharness@6969531 -- need to wait for maestro to do its thing in #53423

@lambdageek
Member Author

@steveisok I think this is good to go. There were a couple of failures but I think they can be investigated separately - I don't think they're indicative of some underlying problem with this PR.

  1. System.IO.MemoryMappedFiles.Tests - fails some test cases and then crashes on CI and locally using xharness (but not if I run the app directly under lldb for some reason). Probably tests that won't work due to W^X protection
     • System.IO.MemoryMappedFiles.Tests.MemoryMappedViewStreamTests.ValidAccessLevelCombinations(mapAccess: ReadWriteExecute, viewAccess: ReadWriteExecute)
     • System.IO.MemoryMappedFiles.Tests.MemoryMappedViewAccessorTests.ValidAccessLevelCombinations(mapAccess: ReadWriteExecute, viewAccess: ReadWriteExecute)
  2. A failure in System.Linq.Expressions.Tests.InterpreterTests.ConstructorThrows_StackTrace. Reproduces locally. But I don't have a handy explanation why it would fail only under Catalyst with interpreter.
  3. System.Net.WebSockets.Client.Tests - can't reproduce locally with xharness or under lldb. Maybe some network issue on CI

@lambdageek
Member Author

Not sure what to make of the GC/Scenarios/LeakWheel/leakwheel/leakwheel.sh failure on Mono llvmaot Pri0 Runtime Tests Run Linux x64 release - I didn't change how mono_valloc() behaves on Linux, so it's not obvious how sgen is affected by this PR.

@steveisok
Member

@lambdageek I'd say just skip the ones you find troubling.

@lambdageek
Member Author

Aha... leakwheel is #53452

@lambdageek
Member Author

CoreCLR failure is unrelated.

Catalyst arm64 had one flaky System.IO.FileSystem.Watcher.Tests failure - but that testsuite also flakes on macOS with mono and with coreclr, so I don't think it's related to this PR.

Catalyst x64 had some flaky failures, too, but they also don't look related.

Merging.

@lambdageek lambdageek merged commit 1d9ff9e into dotnet:main Jun 2, 2021
@SamMonoRT
Member

@steveisok - should we log an issue to investigate and proactively disable the flaky tests? They may be dragging down pass rates for the corresponding lanes.

@steveisok
Member

Yep, @mdh1418 please follow up and skip what Aleksey outlined.

Successfully merging this pull request may close these issues.

MacCatalyst arm64 AOT JustInterp fails on CI
4 participants