Skip to content

[NativeAOT LLVM] Integrate dotnet.js into ILCompiler.LLVM #2453

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 52 commits into from
Dec 14, 2023

Conversation

maraf
Copy link
Member

@maraf maraf commented Dec 7, 2023

  • Include dotnet.js in ILCompiler framework folder
    • dotnet.js and dotnet.native.js built from typescript
    • dotnet.es6.pre.js, dotnet.es6.lib.js and dotnet.es6.extpost.js emscripten partials & library
  • Remove emscripten build from wasm.proj and build it as part of naot-llvm build
  • Produce dotnet.js compatible app when DotNetJsApi=true is set in users app
    • JS interop doesn't work yet
  • Changes to typescript
    • Changes are driven by NativeAOT const for now. In follow ups we might come with a way to "override" functions different for NativeAOT
    • Use emscripten's callMain at the moment
    • Remove jiterpreter initialization
    • Environment variables not supported at the moment
    • Runtime options not supported
    • Remove managed exports initialization
  • Add a smoke DotnetJs.csproj

@maraf maraf added the area-NativeAOT-LLVM LLVM generation for Native AOT compilation (including Web Assembly) label Dec 7, 2023
@maraf maraf requested review from pavelsavara and a team December 7, 2023 13:53
@maraf maraf self-assigned this Dec 7, 2023
@@ -2,5 +2,5 @@ if (_nativeModuleLoaded) throw new Error("Native module already loaded");
_nativeModuleLoaded = true;
// see https://github.com/emscripten-core/emscripten/issues/19832
Module["getMemory"] = function () { return wasmMemory; }
createDotnetRuntime = Module = createDotnetRuntime(Module);
moduleArg = Module = moduleArg(Module);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why ?

Copy link
Member Author

@maraf maraf Dec 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Emscripten 3.1.47 generates

var createDotnetRuntime = (() => {
  var _scriptDir = import.meta.url;
  
  return (
async function(moduleArg = {}) {

I dunno yet if that can be configured somehow and I'm missing that flag or whether it's in empscripten (main has 3.1.34)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

<_EmccLinkFlags Include="-s EXPORT_NAME="'createDotnetRuntime'"" />

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe it's worth replacing the whole wasm.proj with latest from main ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

<_EmccLinkFlags Include="-s EXPORT_NAME="'createDotnetRuntime'"" />

I already have this one

<_EmccLDFlags Include="-s EXPORT_NAME=&quot;'createDotnetRuntime'&quot;" />

It produces the createDotnetRuntime, but the argument passed in is called moduleArg

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is a sample dotnet.native.js that is generated by emcripten 3.1.47 dotnet.native.zip

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmmm, it seems we use createDotnetRuntime as default export in dotnet.d.ts but it seems it's a lie anyway.

@SingleAccretion
Copy link

A couple general questions, to start things off.

1) What is the general code flow (dependency graph) of the JS code? For example, here's the detailed graph for the current runtime:

Emscripten-generated code (host)
  [Emscripten ABI]
    Native entry point from bootstrapper.cpp
      <application code>
      [Emscripten ABI]
        pal_random.js; --js-library
        emscripten_get_callstack; Emscripten-provided system function

This needn't be as detailed, I am just looking for what the major components are, what their responsibilities are, and where they live (/how the functions they consist of are called).

2) What are the components in the build process of JS components? What is the build flow? How do you debug that build? Any common gotchas/known issues w.r.t. local development? We spend a very good chunk of our time fixing or altering the build in one way or another on this branch, so it's important to know how it works.

@pavelsavara
Copy link
Member

    pal_random.js; --js-library

this doesn't exist anymore. It's moved into src\mono\wasm\runtime\crypto.ts

@SingleAccretion
Copy link

this doesn't exist anymore. It's moved into src\mono\wasm\runtime\crypto.ts

Yes; we restored it in this branch for the time being.

@SingleAccretion
Copy link

Add sample running main with dotnet.js

This should be added as a smoke test under src/tests/nativeaot/SmokeTests/HelloWasm.

Comment on lines 89 to 92
<_EmccDefaultFlags Condition="'$(WasmEnableSIMD)' == 'true'">-msimd128</_EmccDefaultFlags>
<EmccStackSize Condition="'$(EmccStackSize)' == ''">5MB</EmccStackSize>
<EmccInitialHeapSize Condition="'$(EmccInitialHeapSize)' == '' and '$(_WasmCalculatedInitialHeapSize)' != '' and $(_WasmCalculatedInitialHeapSize) > 16777216">$(_WasmCalculatedInitialHeapSize)</EmccInitialHeapSize>
<EmccInitialHeapSize Condition="'$(EmccInitialHeapSize)' == ''">16777216</EmccInitialHeapSize>
Copy link

@SingleAccretion SingleAccretion Dec 8, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are a few things that already exist in our targets (e. g. we set the default stack size). Let's rationalize things as follows: there are "JS API" properties and "runtime" properties.

"JS API" properties apply to Browser + '$(DotNetJsApi)'=='true' only. They affect JS-specific things and/or expose emcc knobs. They should be used/defined here.

"Runtime" properties apply to both Browser and WASI. They will often affect core runtime semantics. They will be used/defined here:

<ItemGroup>
<NativeLibrary Include="$(IlcFrameworkNativePath)libSystem.Native.a" />
<NativeLibrary Include="$(IlcFrameworkNativePath)libSystem.Globalization.Native.a" />
<NativeLibrary Condition="$(NativeLib) == '' and '$(CustomNativeMain)' != 'true'" Include="$(IlcSdkPath)libbootstrapper.o" />
<NativeLibrary Condition="$(NativeLib) != '' or '$(CustomNativeMain)' == 'true'" Include="$(IlcSdkPath)libbootstrapperdll.o" />
<NativeLibrary Include="$(IlcSdkPath)libPortableRuntime.a" />
<NativeLibrary Include="$(IlcSdkPath)libstandalonegc-disabled$(LibFileExt)" />
<NativeLibrary Condition="'$(IlcLlvmExceptionHandlingModel)' == 'cpp'" Include="$(IlcSdkPath)libCppExceptionHandling.a" />
<NativeLibrary Condition="'$(IlcLlvmExceptionHandlingModel)' == 'wasm'" Include="$(IlcSdkPath)libWasmExceptionHandling.a" />
<NativeLibrary Condition="'$(IlcLlvmExceptionHandlingModel)' == 'emulated'" Include="$(IlcSdkPath)libEmulatedExceptionHandling.a" />
<!-- wasi-sdk clang wants forward slashes, emscripten doesn't care -->
<CustomLinkerArg Include="@(NativeLibrary->'&quot;%(Identity)&quot;'->Replace(&quot;\&quot;, &quot;/&quot;))" />
<CustomLinkerArg Include="@(NativeObjects->'&quot;%(Identity)&quot;'->Replace(&quot;\&quot;, &quot;/&quot;))" />
<CustomLinkerArg Include="-o &quot;$(NativeBinary.Replace(&quot;\&quot;, &quot;/&quot;))&quot;" />
<CustomLinkerArg Condition="'$(NativeDebugSymbols)' == 'true'" Include="-g3" />
<CustomLinkerArg Condition="'$(NativeDebugSymbols)' != 'true'" Include="-Wl,--strip-all" />
<CustomLinkerArg Condition="'$(NativeDebugSymbols)' != 'true'" Include="-Wl,--compress-relocations" />
<!-- see https://github.com/emscripten-core/emscripten/issues/16836 for the issue that means we have to force the linker to include these symbols before LTO -->
<!--<CustomLinkerArg Condition="'$(Optimize)' != 'true'" Include="$(WasmOptimizationSetting) -flto" /> -->
<CustomLinkerArg Condition="'$(IlcLlvmTarget)' != ''" Include="-target $(IlcLlvmTarget)" />
<CustomLinkerArg Condition="'$(IlcLlvmExceptionHandlingModel)' == 'wasm'" Include="-fwasm-exceptions" />
</ItemGroup>
<ItemGroup Condition = "'$(_targetOS)' == 'browser'" >
<CustomLinkerArg Include="--js-library &quot;$(IlcFrameworkNativePath)pal_random.lib.js&quot;" />
<CustomLinkerArg Condition="'$(WasmHtmlTemplate)' != ''" Include="--shell-file &quot;$(WasmHtmlTemplate)&quot;" />
<CustomLinkerArg Condition="'$(ExportsFile)' != ''" Include="-s EXPORTED_FUNCTIONS=@&quot;$(ExportsFile)&quot;" />
<CustomLinkerArg Include="-s ALLOW_MEMORY_GROWTH=1" />
<CustomLinkerArg Include="-s ERROR_ON_UNDEFINED_SYMBOLS=0" />
<CustomLinkerArg Include="-s GLOBAL_BASE=$(IlcWasmGlobalBase)" />
<CustomLinkerArg Include="-s TOTAL_STACK=$(IlcWasmStackSize)" />
<CustomLinkerArg Condition="'$(WasmEnableJSBigIntIntegration)' == 'true'" Include="-s WASM_BIGINT=1" />
<CustomLinkerArg Condition="'$(IlcLlvmExceptionHandlingModel)' == 'cpp'" Include="-s DISABLE_EXCEPTION_CATCHING=0" />
</ItemGroup>
<!-- wasm-ld only supports listing exports on the command line -->
<ReadLinesFromFile File="$(ExportsFile)" Condition="'$(_targetOS)' == 'wasi' and '$(ExportsFile)' != ''">
<Output TaskParameter="Lines" ItemName="CustomLinkerArgExport" />
</ReadLinesFromFile>
<ItemGroup Condition ="'$(_targetOS)' == 'wasi'" >
<CustomLinkerArg Include="--sysroot=&quot;$(WASI_SDK_PATH.Replace(&quot;\&quot;, &quot;/&quot;))/share/wasi-sysroot&quot;" />
<CustomLinkerArg Include="@(CustomLinkerArgExport->'-Wl,--export=%(Identity)')" />
<!-- Wasi has lots of undefined symbols currently, mostly globalization -->
<CustomLinkerArg Include="-Wl,--unresolved-symbols=ignore-all -Wl,--error-limit=0" />
<CustomLinkerArg Include="-lstdc++ -lwasi-emulated-process-clocks -lwasi-emulated-signal -lwasi-emulated-getpid" />
<CustomLinkerArg Include="-Wl,--max-memory=2147483648" />
<CustomLinkerArg Include="-Wl,--global-base=$(IlcWasmGlobalBase)" />
<CustomLinkerArg Include="-Wl,-z,stack-size=$(IlcWasmStackSize)" />
<CustomLinkerArg Include="-mexec-model=reactor" Condition="'$(NativeLib)' == 'Shared'" />
</ItemGroup>
<ItemGroup>
<CustomLinkerArg Include="@(LinkerArg)" />
</ItemGroup>
<WriteLinesToFile File="$(NativeIntermediateOutputPath)link.rsp" Lines="@(CustomLinkerArg)" Overwrite="true" Encoding="utf-8" />
<Exec Command="$(WasmLinkerPath) @$(NativeIntermediateOutputPath)link.rsp $(EmccExtraArgs)" EnvironmentVariables="@(EmscriptenEnvVars)" />
</Target>
(for now, we're planning to move them to a different file).

In this framework:

  1. EmccStackSize is a "runtime" property. It should affect IlcWasmStackSize.
  2. EmccInitialHeapSize - likewise.
  3. WasmEnableSIMD is a no-op - we don't support SIMD yet - and so should be deleted.

I am also interested which of these MSBuild properties are documented. There should a clear separation between what is public (and cannot be changed), like WasmEnableSIMD, and what is not.

I will make a number of comments below about "runtime" properties. The way to fix them is to either: move the linker flag setting "down", or add an NYI comment instead, or just delete it because it's not a public property and we don't need to support all emcc flags by hand.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll move runtime properties as suggested
Removed WasmEnableSIMD

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pavelsavara
Copy link
Member

FYI dotnet/runtime#95775 we are refactoring those proj quite often.
I'm not sure that optimizing it here makes ton of sense from merging perspective.

@SingleAccretion
Copy link

I'm not sure that optimizing it here makes ton of sense from merging perspective.

Since we decided to keep a copy (of the compiler options-related MSBuild things), there is no conflict risk. There is still the problem of keeping the copy up-to-date, but it's the same whether we simplify it or not. And a number of these options have to be rewritten in any case.

Copy link

@SingleAccretion SingleAccretion left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A bit more feedback. Major point is the wasm.proj build. The earlier two questions still stand as well.

Thank you for adding the comments!

Comment on lines +24 to +26
if not exist %__JsFilePath% (
set __JsFilePath=%1\native\main.js
)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am a bit curious on the thinking about allowing project-specific output file names. It is non-standard for .NET projects for the output binary to not be named after the project itself.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is non standard for .NET projects. On web it's up to the user how he names his entrypoint. Nothing settled IMO. Our templates has main.js as entrypoint.

I can rename main.js in the test to DotnetJs.js (as the project itself) if you prefer that

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bike_shed.js 😉

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it's fine to leave this as it is for now. I suppose it is not really the main file I am worried about, but the all of dotnet.* files. Ideally, the target there would be to have the same output as what Emscripten generates: one Project.js, one Project.wasm. It helps optics/perception. Nothing to do about that at this point, though.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have multiple files for performance and modularity reasons. We are not going back to single JS file. It's more trouble than worth to live inside emscripten's JS postprocessing tooling. Also you need extra JS file for threads.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My perspective is "pay for play" design.

We also prefer that.

(e. g.: only link in JS string marshalling if something has a JSImport/JSExport with a string in it)

linker across C#/C/JS is though. Especially with JS being very dynamic.

In this specific example, we interpret the JS side of the "marshal single argument" from metadata about the method signature. And so we have to have all marshallers available on JS side.

Alternative is to ship generated JS code for each JSImport/JSExport. It has tooling and deployment troubles. Roslyn doesn't support that either.

Copy link

@SingleAccretion SingleAccretion Dec 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are not willing to try again. If you want to have look, see Net6 codebase we had.

Just curious - was it 'simply' that the code was ugly, or there was some more fundamental problem (too many Emscripten bugs, hard to track down failure modes, ...)?

In this specific example, we interpret the JS side of the "marshal single argument" from metadata about the method signature. And so we have to have all marshallers available on JS side.

Its possible to write it in such a way so as to make the dependencies explicit. For example, you can imagine that for:

[JSImport] void func(string);

The generator would generate:

[DllImport("*")] void js_marshal_string();
[JSImport, DependsOn("js_marshal_string")] void func(string); // Imaginary attribute; just for the sake of argument

That imaginary attribute would lead to passing --undefined:js_marshal_string to the linker, which would preserve that marshalling method in a C-like static linking scenario. Admittedly, this is questionable design, but good enough for illustrative purposes.

It is fine to say that "well, this is a rather complicated thing to do, so we will settle for some middle-of-the-road design, like making trimming based in feature switches and JS modules (one module per switch)", but then you really have to be careful with the amount of JS (hence my earlier point about avoiding it).

Copy link
Member Author

@maraf maraf Dec 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternative is to ship generated JS code for each JSImport/JSExport. It has tooling and deployment troubles. Roslyn doesn't support that either.

I'm dreaming about this one ❤️

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just curious - was it 'simply' that the code was ugly, or there was some more fundamental problem (too many Emscripten bugs, hard to track down failure modes, ...)?

Just go and have look :)

[DllImport("*")] void js_marshal_string();
[JSImport, DependsOn("js_marshal_string")] void func(string); // Imaginary attribute; just for the sake of argument

And to use that to fabricate roots for JS trimmer, right ?
JSExport could be more difficult.
Would that be multi-pass ?

It is fine to say that "well, this is a rather complicated thing to do, so we will settle for some middle-of-the-road design, like making trimming based in feature switches and JS modules (one module per switch)", but then you really have to be careful with the amount of JS

That's exactly where we are.

(hence my earlier point about avoiding it).

In that sense I agree, but not as top priority or goal on it's own.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And to use that to fabricate roots for JS trimmer, right ? JSExport could be more difficult.

Right. I haven't thought about this too deeply. On the surface, both imports and exports have similar requirements, "just" the direction of marshaling is reversed.

Would that be multi-pass ?

I don't immediately see why it needs to be.

@maraf
Copy link
Member Author

maraf commented Dec 13, 2023

  1. What is the general code flow (dependency graph) of the JS code?

In browser case and the new smoke test concretely, it is

  • Browse opens index.html
  • User main.js script executes
    • Imports dotnet.js which is public API for .NET
    • Calls dotnet.create
      • Loads JS modules (dotnet.native.js and dotnet.runtime.js)
      • Loads WASM (dotnet.native.wasm)
      • Emscripten module starts
    • Calls runMain
      • Emscripten's callMain executes
  1. What are the components in the build process of JS components? What is the build flow? How do you debug that build? Any common gotchas/known issues w.r.t. local development? We spend a very good chunk of our time fixing or altering the build in one way or another on this branch, so it's important to know how it works.

Typescript is built using rollup. In debug build we get unminified version. In release they are minified. In both cases there are source maps pointing to original typescript definitions.

Does it answer the questions?

@SingleAccretion
Copy link

Does it answer the questions?

Yes, thank you.

Comment on lines +24 to +26
if not exist %__JsFilePath% (
set __JsFilePath=%1\native\main.js
)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And to use that to fabricate roots for JS trimmer, right ? JSExport could be more difficult.

Right. I haven't thought about this too deeply. On the surface, both imports and exports have similar requirements, "just" the direction of marshaling is reversed.

Would that be multi-pass ?

I don't immediately see why it needs to be.

Copy link

@SingleAccretion SingleAccretion left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes LGTM. Thank you!

@yowl
Copy link
Contributor

yowl commented Dec 14, 2023

Very interesting, great work. Is it too early to add anything to the docs, like adding this to your project file if you want to use it

<DotNetJsApi>true</DotNetJsApi>

Are people going to think that [JSImport]/[JSExport] are supported with this change in NativeAOT-LLVM ?

@maraf
Copy link
Member Author

maraf commented Dec 14, 2023

Are people going to think that [JSImport]/[JSExport] are supported with this change in NativeAOT-LLVM ?

JS interop is tracked here #2434. I'll make it work as separated PRs. All that's working now is (synchronously) calling Main

Very interesting, great work. Is it too early to add anything to the docs, like adding this to your project file if you want to use it

I would wait a bit until there are more features working

@maraf maraf merged commit 0641109 into dotnet:feature/NativeAOT-LLVM Dec 14, 2023
@maraf maraf deleted the dotnetjs branch December 14, 2023 14:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-NativeAOT-LLVM LLVM generation for Native AOT compilation (including Web Assembly)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants