Introduce WASM JITs for interpreter opcodes, do_jit_call, and interp_entry wrappers #76477

kg · 2022-10-01T02:04:14Z

This PR introduces a "jiterpreter" just-in-time WASM compiler for interpreter opcodes along with a set of specialized WASM JIT compilers to improve the performance of interp_entry and do_jit_call. The result is significantly improved performance for pure-interpreter workloads along with measurable speedups for interp->aot and aot->interp transitions in mixed mode AOT applications.

The jiterpreter can be thought of as an additional layer on top of interpreter tiering, and works by inserting "trace entry points" into interpreter opcode streams. Once an entry point is reached often enough, the JIT kicks in to compile as many following interpreter opcodes into WASM as possible. Once a jitted 'trace' is available, when the interpreter reaches the entry point it can invoke the trace to natively execute dozens of interpreter opcodes at once without loop dispatch overhead. Any trace entry point that fails to JIT is replaced with a special NOP.
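The entry-point lifecycle described above can be sketched roughly as follows. This is a minimal illustration, not the runtime's actual code: the `EntryPoint` type, `onEntryPoint`, `HIT_THRESHOLD` and its value are all hypothetical names chosen for the example.

```typescript
// Hypothetical names throughout; a sketch of the idea, not the runtime's code.
type Trace = (locals: number[]) => number;

type EntryPoint =
    | { kind: "counting"; hits: number }   // not hot yet
    | { kind: "jitted"; trace: Trace }     // compiled trace is available
    | { kind: "nop" };                     // JIT failed, entry point disabled

const HIT_THRESHOLD = 3; // assumed value for illustration only

// Called each time the interpreter reaches an entry point.
// Returns the (possibly updated) state of that entry point.
function onEntryPoint(entry: EntryPoint, tryCompile: () => Trace | null): EntryPoint {
    if (entry.kind !== "counting") return entry;
    const hits = entry.hits + 1;
    if (hits < HIT_THRESHOLD) return { kind: "counting", hits };
    // Hot enough: attempt to compile the opcodes that follow.
    const trace = tryCompile();
    return trace !== null ? { kind: "jitted", trace } : { kind: "nop" };
}
```

Once an entry point becomes `jitted` or `nop` its state no longer changes, which mirrors the description: a successful trace is invoked directly, and a failed one costs no more than a NOP.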

See https://paper.dropbox.com/doc/Jiterpreter-For-WebAssembly--BqLn88GjC3z4GDuE09vamtw4Ag-mJ5TRc7xF1PNAGMgwU49K for a more detailed overview of the jiterpreter design.

Due to browser limitations it is not possible to JIT entire methods at once, so entry points have to be placed at key locations within a given method, like immediately after a backwards branch target (for loop bodies). The jiterpreter has a simple single-pass compiler, so when it encounters an opcode or complex pattern it can't handle, it 'bails out' to return control to the interpreter. Complex control flow like exceptions, calls and returns is also handled by bailing out to the interpreter. Traces share the interpreter's state and locals instead of maintaining their own, which allows seamless bailouts and maintains existing GC/thread safety characteristics.
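A minimal sketch of the single-pass "compile until you can't, then bail out" idea, under stated assumptions: the opcode names, the `SUPPORTED` set, and the string-based "emitted code" are placeholders for illustration and do not reflect the real opcode table or WASM encoder.

```typescript
// Hypothetical opcode set and emitter, for illustration only.
type Opcode = "ldloc" | "add" | "stloc" | "call" | "ret";

const SUPPORTED: ReadonlySet<Opcode> = new Set<Opcode>(["ldloc", "add", "stloc"]);

interface CompiledTrace {
    body: string[];    // stand-in for emitted WASM instructions
    bailoutAt: number; // opcode index where the interpreter resumes
}

function compileTrace(ops: Opcode[], start: number): CompiledTrace {
    const body: string[] = [];
    let i = start;
    while (i < ops.length) {
        const op = ops[i];
        // Unsupported opcode (e.g. a call): stop compiling here.
        if (op === undefined || !SUPPORTED.has(op)) break;
        body.push(`wasm.${op}`);
        i++;
    }
    // The trace ends by handing control back to the interpreter at index i.
    body.push(`bailout ${i}`);
    return { body, bailoutAt: i };
}
```

Because the trace operates on the interpreter's own locals in this model, resuming at `bailoutAt` needs no state transfer.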

The jiterpreter infrastructure is also used to on-demand compile specialized wrappers for interp_entry and do_jit_call. All the critical reflective/dynamic work (like enumerating parameter types) is done at compile time to produce a small, branchless dispatcher that is able to set up an interpreter stack frame or invoke native code with fewer copies than the generic wrappers currently used. To determine which do_jit_call sites should be optimized, we maintain a hit counter much like we do for traces, and only JIT the most frequently executed ones. For interp_entry we optimize all of them.
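To illustrate moving the reflective work to wrapper-generation time, here is a small sketch. It assumes a toy model where parameters are plain numbers and each parameter type has a fixed copy routine; `ParamType`, `COPIERS`, and `makeWrapper` are hypothetical names, and the real wrappers are emitted as WASM rather than built as closures.

```typescript
// Toy model: the real wrappers are generated WASM, not JS closures.
type ParamType = "i32" | "f64";

// Per-type copy routines, looked up once when the wrapper is generated.
const COPIERS: Record<ParamType, (v: number) => number> = {
    i32: (v) => v | 0, // truncate to a 32-bit integer
    f64: (v) => v,
};

function makeWrapper(
    paramTypes: ParamType[],
    target: (...args: number[]) => number
): (...args: number[]) => number {
    // The "reflective" step: inspect parameter types once, up front.
    const copiers = paramTypes.map((t) => COPIERS[t]);
    // The returned dispatcher never inspects types again.
    return (...args: number[]): number => {
        const frame = copiers.map((copy, i) => copy(args[i] ?? 0));
        return target(...frame);
    };
}
```

For example, `makeWrapper(["i32", "f64"], (a, b) => a + b)` produces a dispatcher that truncates its first argument and passes the second through unchanged on every call.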

An additional optimization in this PR leverages the jiterpreter infrastructure to optimize all do_jit_call execution by generating a dedicated function (either in WASM or JS) to trap exceptions and dispatch native calls. The current implementation uses mono_llvm_cpp_catch_exception to perform an indirect call bounced through JavaScript, which adds a measurable amount of overhead. By using specialized code we can perform a direct call using WASM Exception Handling (if available) or using a specialized JavaScript function that doesn't do any indirection or table lookups. This optimization by itself speeds up System.Runtime.Tests in AOT mode by about 2 seconds. You can find the function template in the new do-jit-call.wat and do-jit-call.wasm files.
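The JavaScript fallback can be pictured as a function specialized for a single, known call target. The sketch below is an assumption-laden illustration: the callback signature, the `thrown` out-parameter shape, and the name `makeTrappingDispatcher` are hypothetical and are not taken from the PR's sources.

```typescript
// Hypothetical shapes; the real callback signature is not shown in this description.
type JitCallCallback = (argsPtr: number) => void;

function makeTrappingDispatcher(cb: JitCallCallback) {
    // The target is captured once, so each invocation calls it directly
    // instead of looking it up in a function table.
    return (argsPtr: number, thrown: { value: boolean }): void => {
        try {
            cb(argsPtr);
        } catch {
            // Record that an exception escaped the native call.
            thrown.value = true;
        }
    };
}
```

The design point is that the generic path has to accept an arbitrary function pointer, while a specialized dispatcher can hard-code the one target it will ever call.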

The jiterpreter's features are configurable using a set of new runtime options, the key ones being:

- `--jiterpreter-enable-traces`: Enables the insertion of trace entry points in interpreter functions and JIT compilation for those entry points.
- `--jiterpreter-enable-interp-entry`: Enables the generation of specialized interp_entry wrappers. Note that this option is not retroactive, and these are typically all generated at app startup.
- `--jiterpreter-enable-jit-call`: Enables the generation of specialized do_jit_call wrappers.
- `--jiterpreter-enable-stats`: Enables dumping detailed statistics on the jiterpreter, including how many WASM functions were jitted, which interpreter opcodes caused the most failures, and how much time was spent. If statistics are enabled they are automatically dumped at app exit, but they can be manually dumped using an INTERNAL api.
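As a rough picture of how such flags map onto feature switches, here is a small sketch. Only the four flag names come from the list above; the `JiterpreterOptions` interface, its field names, and `parseOptions` are hypothetical and do not mirror the runtime's option parser.

```typescript
// Field names are invented for this sketch; only the flag strings come from the PR.
interface JiterpreterOptions {
    enableTraces: boolean;
    enableInterpEntry: boolean;
    enableJitCall: boolean;
    enableStats: boolean;
}

const FLAG_TO_FIELD = new Map<string, keyof JiterpreterOptions>([
    ["--jiterpreter-enable-traces", "enableTraces"],
    ["--jiterpreter-enable-interp-entry", "enableInterpEntry"],
    ["--jiterpreter-enable-jit-call", "enableJitCall"],
    ["--jiterpreter-enable-stats", "enableStats"],
]);

function parseOptions(args: string[]): JiterpreterOptions {
    const opts: JiterpreterOptions = {
        enableTraces: false,
        enableInterpEntry: false,
        enableJitCall: false,
        enableStats: false,
    };
    for (const arg of args) {
        const field = FLAG_TO_FIELD.get(arg);
        // Unknown flags are ignored in this sketch.
        if (field !== undefined) opts[field] = true;
    }
    return opts;
}
```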

This PR also contains some scaffolding changes and interpreter optimizations that were necessary in order to finish it and test it on benchmark scenarios:

- Added interpreter opcodes for Math.Min/Max/Abs (without these, lots of compute is impossible to JIT)
- Updated a couple of the sample csprojs to make it easy to turn AOT+trimming on or off at build time
- Lowered browser-bench's default heap size (the default can cause OOM during runs, partly due to a memory leak in Chrome)
- Added a simple raytracer sample that is engineered to transition in and out of the interpreter frequently when AOT'd, to serve as a useful benchmark for various parts of the PR
- Updated the 'cwraps' TypeScript API to produce useful error messages when an exported C function is missing
- Replaced the mono_wasm_array_length C API with a new safe mono_wasm_array_length_ref variant
- Added an optimization to remove the linker placeholders that currently sit between the runtime's C and TypeScript glue (see startup.ts)

ghost · 2022-10-01T02:04:24Z

Tagging subscribers to 'arch-wasm': @lewing
See info in area-owners.md if you want to be subscribed.

Issue Details

This (draft) PR introduces a 'jiterpreter' which on-the-fly compiles 'traces' of interpreter opcodes into WebAssembly so that the interpreter can transition into and out of them for closer-to-native performance when AOT isn't possible. See https://paper.dropbox.com/doc/Jiterpreter-For-WebAssembly--BqLn88GjC3z4GDuE09vamtw4Ag-mJ5TRc7xF1PNAGMgwU49K for a detailed overview.

The main changes can be broken down into:

- A set of interpreter opcodes that control the generation and execution of jitted WASM
- Support C APIs and runtime wrappers used by the jitted WASM
- Instrumentation and a custom tiering system used to decide when to jit a method to WASM (in the long run this may be removed in favor of just using the interpreter's existing tiering system)
- A single-pass compiler that can translate a large subset of mono interpreter opcodes into wasm opcodes

This PR also contains a few small optimizations that significantly improve the performance of the interpreter (and as a result, the jiterpreter) on some compute-heavy test cases like raytracers. One of the raytracer test cases is currently included in the PR (but will probably be removed).

While the PR is not 'ready', review of the runtime/interpreter internals would be appreciated. There are definitely parts that feel like they need cleanup, and suggestions on how to do them right would be great. At present this is able to pass the full System.Runtime test suite in the console with performance roughly at parity with it disabled (including the time spent actually compiling WASM), and for the raytracer test cases I'm using, the performance improvements are in the neighborhood of 11x native -> 6x native. The biggest current limitation is the lack of support for backward branches and calls - I don't expect to ever add support for calls, but ideally I can either add backward branches or introduce a mechanism for turning loop/try bodies into their own traces. The jiterpreter also currently ignores debugger opcodes, but it's possible to implement them. The biggest performance problem is that calling through function pointers to a dynamically linked function in browsers is just plain slow, so traces have to be relatively large to overcome that.

Author:	kg
Assignees:	-
Labels:	`NO-MERGE`, `arch-wasm`, `area-Codegen-JIT-mono`, `area-Codegen-Interpreter-mono`
Milestone:	-

kg · 2022-10-07T19:38:32Z

PR description updated to cover the new 'jiterpreter trampolines', which are jitted replacements for the AOT interp_entry wrappers. The now-sabotaged raytracer sample in the PR forces aot/interp transitions during execution due to runtime generics and delegates, so it shows wildly different timings in each of the 4 possible execution modes (regular AOT, AOT+trampolines, regular interpreter, interpreter+jiterpreter+trampolines). There are opportunities to JIT other wrappers to further improve the performance of this test in AOT mode, I think - it spends a lot of time on overhead in various places instead of actually doing work, and before I sabotaged it it was MUCH faster in AOT.

Comparison timings follow (best time selected from a set of runs in each case). Note that if you are running it for measurement, do not open devtools until all your runs are done, because devtools deoptimize JS and wasm code. If you want to profile it, start the profiler and then reload the page while the profiler is running.

/// aot, no optimizations
Rendering finished in 7191 ms
// jiterpreter produced 0 traces from 0 candidates, and 0 trampolines
// time spent: 0ms generating, 0ms compiling wasm


/// aot with traces and trampolines enabled
Rendering finished in 6863 ms
// jiterpreter produced 0 traces from 2 candidates, and 2 trampolines
// time spent: 3ms generating, 2ms compiling wasm


/// interpreter, no optimizations
Rendering finished in 4868 ms
// jiterpreter produced 0 traces from 0 candidates, and 0 trampolines
// time spent: 0ms generating, 0ms compiling wasm


/// interpreter with traces and trampolines enabled
Rendering finished in 4070 ms
// jiterpreter produced 3 traces from 5 candidates, and 0 trampolines
// time spent: 6ms generating, 0ms compiling wasm

kg · 2022-10-13T02:42:13Z

I reworked the size threshold logic used to select where to insert traces, and implemented a C-side heuristic that is able to make fairly accurate guesses about where traces should go, which reduces the overhead introduced by the jiterpreter when it fails to JIT a method. The timings for the toy raytracer testcase are still roughly as above, but the addition of the heuristic means less time is spent generating code in the runtime test suite, so the jiterpreter is now a measurable perf improvement for it. Some summary timings for fresh runs of the toy raytracer, pavel's more complex raytracer sample (w/o SIMD), and the system.runtime test suite (which currently is incompatible with AOT):

| name | interpreter (msec) | JITerpreter (msec) | JITerpreter, % of interpreter baseline | AOT (msec) | AOT+JIT (msec) | AOT+JIT, % of AOT baseline |
| --- | --- | --- | --- | --- | --- | --- |
| `System.Runtime.Tests` | 148743 | 128629 | 86.47% | N/A | N/A | N/A |
| Sabotaged raytracer | 9321 | 6681 | 71.67% | 12345 | 11645 | 94.32% |
| Pavel's raytracer | 20462 | 13534 | 66.14% | 2267 | N/A | N/A |
| browser-bench large JSON serialize | 71.27 | 58.73 | 82.40% | 8.03 | 8.00 | N/A |
| browser-bench large JSON deserialize | 102.55 | 79.25 | 77.27% | 14.47 | 14.60 | N/A |

(edited to add more timings)
(edit 2: updated system.runtime test timings because a crash fix made them slower and I implemented more opcodes)

src/mono/mono/mini/interp/interp.c

src/mono/wasm/runtime/dotnet-legacy.d.ts

src/mono/wasm/runtime/jiterpreter-opcodes.ts

src/mono/wasm/runtime/jiterpreter-support.ts

lewing · 2022-11-08T21:06:00Z

@BrzVlad @vargaz is there more review needed here or are things in good shape?

pavelsavara · 2022-11-09T07:59:18Z

I would love this to be an optional part of dotnet.js behind a feature flag. It may be in the next PR tho.

kg · 2022-11-09T09:56:02Z

> I would love this to be an optional part of dotnet.js behind a feature flag. It may be in the next PR tho.

With the options interface in this PR, we can set the default to off and then you use a runtime flag to turn it on. See the .withRuntimeOptions(["--jiterpreter-enable-stats"]) in browser-bench, there are flags to enable/disable parts of the jiterpreter as well.

pavelsavara · 2022-11-09T11:14:17Z

> With the options interface in this PR, we can set the default to off and then you use a runtime flag to turn it on.

I mean, to link it out via rollup.
Or even better, this would be a great candidate for a separate ES6 module that is dynamically import()ed only when the config flag is set.

kg · 2022-11-10T13:55:34Z

Here are some updated timings for various scenarios, since some of the performance characteristics have changed since the previous timings (a few key reasons are that batching the JIT operation can cause slower code to run for a period of time - this slows down console tests but not most real applications, and that I had to revert the inlining threshold).

System.Runtime.Tests now passes with AOT enabled, but while the jiterpreter is active for it, the fact that the suite runs synchronously means that the improvement is small enough to be within the margin of error, so I omitted it along with a couple others that were ~100%. (Nothing relevant got slower)

| name | baseline (ms) | jiterpreter (ms) | relative time (%) |
| --- | --- | --- | --- |
| Json, large serialize (AOT) | 8.1799 | 7.6687 | 93.75% |
| Json, large deserialize (AOT) | 14.3750 | 14.1799 | 98.64% |
| JSInterop, LegacyExportInt (AOT) | 8.4975 | 7.9102 | 93.09% |
| JSInterop, JSExportInt (AOT) | 6.4096 | 5.3799 | 83.94% |
| JSInterop, JSImportInt (AOT) | 0.7562 | 0.6748 | 89.24% |
| mixed-mode raytracer (AOT) | 12751 | 9861 | 77.34% |
| System.Runtime.Tests (interp) | 138651.7 | 129579.6 | 93.46% |
| Json, small serialize (interp) | 0.2517 | 0.2093 | 83.15% |
| Json, small deserialize (interp) | 0.3567 | 0.2901 | 81.33% |
| Json, large serialize (interp) | 72.5972 | 57.2889 | 78.91% |
| Json, large deserialize (interp) | 97.6667 | 79.5303 | 81.43% |
| JSInterop, LegacyExportInt (interp) | 4.8936 | 4.8633 | 99.38% |
| JSInterop, JSExportInt (interp) | 3.6925 | 3.2976 | 89.31% |
| mixed-mode raytracer (interp) | 9271 | 7024 | 75.76% |
| pavel's raytracer (interp) | 21353 | 13810 | 64.67% |
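For readers checking the table, the relative-time column is simply the jiterpreter time divided by the baseline time, so lower is better. A small helper (the name `relativeTimePercent` is made up for this note) reproduces the figures:

```typescript
// Relative time of a candidate run versus its baseline, as a percentage string.
// Values below 100% mean the candidate was faster than the baseline.
function relativeTimePercent(baselineMs: number, candidateMs: number): string {
    return ((candidateMs / baselineMs) * 100).toFixed(2) + "%";
}
```

For instance, the mixed-mode raytracer (interp) row gives `relativeTimePercent(9271, 7024)`, i.e. 75.76%.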

kg · 2022-11-10T22:21:57Z

The latest commit repurposes the little toy .wasm module used for WASM EH support detection to function as an optimized replacement for mono_llvm_cpp_catch_exception when performing JIT calls. If WASM EH is available, it will be used to directly invoke jit_call_cb, reducing the cost of that to a single indirect function call (no WASM-JS-WASM transition overhead). If WASM EH isn't available, a specialized JS function will be invoked instead, which is still a bit cheaper than mono_llvm_cpp_catch_exception because we are able to skip the extra indirect function call thanks to knowing that the call target is always jit_call_cb.

I verified that all do_jit_call paths are faster with this optimization in browser-bench, but the most convincing numbers come from System.Runtime.Tests in AOT mode:
mono_llvm_cpp_catch_exception: 43.646669s
specialized js: 43.251835s
specialized wasm eh: 42.549922s (97.5% of baseline)

This new optimization works even if you disable the rest of the jiterpreter.

kg · 2022-11-10T22:39:21Z

My current plan for landing this is to disable the jiterpreter JIT features (traces, jit_call wrappers, and interp_entry wrappers) and only land it with the new optimized jit call exception handling path enabled, since it provides a good speed improvement and is lower risk. That lets us flush out any unexpected issues with JITing wasm this way along with the parts of the scaffolding that are always enabled (like the options engine, feature detection, etc). Then in a followup PR I can flip the jiterpreter JITs to enabled so that it's easy to revert that part without having to change tons of files.

src/mono/mono/mini/interp/jiterpreter.h

vargaz · 2022-11-14T16:02:24Z

src/mono/mono/mini/interp/jiterpreter.c

+	const int mtl_l = strlen(mtl);
+
+	if (!strcmp (option, "jiterpreter-enable-traces"))
+		jiterpreter_traces_enabled = TRUE;


It would be better to add these as options to utils/options-def.h. It would auto generate the argument parsing code, etc.

From what I saw while looking at the header I wouldn't be able to use it to do this, but I'll give it another try.

src/mono/mono/mini/interp/jiterpreter.h

src/mono/mono/mini/interp/transform.h

vargaz · 2022-11-14T16:41:24Z

The mono changes look ok to me otherwise.

Jiterpreter traces work (yay alignment) Use function pointers to invoke jiterpreter traces Implement binops, basic arithmetic, and some unary ops cknull Strlen ldloca_s + print statistics at exit getchr, track bailout count Use a JS dispatcher for traces because emscripten addFunction is bad. Use unchecked memory setters in traces. Enable tracing for more code. Checkpoint ldfld Checkpoint stfld Checkpoint: F32 arithmetic Add enable flag Move back to addFunction because while the overhead of adding the trace functions is higher, the invoke performance is better Checkpoint: Traces continue past brfalse/brtrue if the branch is not taken Checkpoint wasm generation Generates dummy traces Checkpoint: Works again Optimized trace generator Traces work again Fixed codegen for memmove and ldloca Use wasm memmove instruction Implement cknull Implement brfalse and brtrue Refactor null checks stloc refactoring Basic ldfld/stfld support Checkpoint ldind (broken) ldind works ldfld_o stfld_o ldfld_vt Unary math intrinsics and ldlen/ldelem_ref Fix math intrinsics, add some i8 operations Working (u|i)32 -> i64 promotion stind Add fp remainder operation, adjust thresholds to avoid making tests way slower Fix remainder codegen, implement mono_ldptr ldelema1 ldelema1 error handling ldelem for ints and floats Checkpoint trace enter optimization (broken) Generate moves instead of a memmove for small constant-sized memmoves More accurate trace length measurement, higher trace length requirement for better perf Check in simple raytracer sample code De-inline intersection test because the branches cause lots of trace bailouts add_iN_imm and mul_iN_imm ldc_i8 for small constants 64-bit relops and a few 64-bit superinsns Checkpoint relop branches Fixed relop branches Specify alignment of ldloc/stloc where possible Fix imm add/mul opcodes and improve debugging Pass locals to traces instead of frame. Hard-code data items into the trace instead of loading them. 
Pass method name to prepare_jiterpreter, include method name in trace name, add flag to trace method name when aborts happen. Add Min and Max interpreter opcodes Try to keep interpreter tiering working Add abs interp opcode, add missing float32 interp opcodes. Fix codegen for f32 pow/min/etc. Don't generate additional jiterpreter enter opcodes when inlining. Fix a warning Fix more warnings, don't pass ip to traces Checkpoint work on static fields; fix unbalanced stack on stfld Static fields work Support for generating trace entry points at the target of all backward branches. When bailing out for a branch, add the displacement to the bailout target. Restructure the simple raytracer Categorize bailouts and count each type separately when counting is active Integer divide and remainder Implement MUL_OVF_I4 and MUL_OVF_UN_I4. Fix the getelema failure check being inverted. Add support for more precise tracking of what method calls cause traces to abort Implement INITOBJ Implement STIND_REF i4 to u1/u2 conversions Add i4->i2 and i4->i1 conversions Fix min and max interp opcodes Detect when a trace is approaching the browser wasm size limit (4kb) and abort it so that it doesn't fail to compile Debugging and logging improvements Optimize generated module boilerplate for size Rework branch patch point system to be less sloppy Handle different branch depths Checkpoint new approach to branches Fix ldc failures unbalancing the stack. 
Fix some branch problems Implement GETCHR and STRLEN Fully qualified trace names Move some stuff into a new file Optimize some null checks to not perform a double memory load Cache i4 mul/div/rem operands in locals instead of re-loading them from memory after range checks Less aggressive 'a branch can never skip past this' abort for calls Fix math operand caching Fix ldlen Improved error handling and instrumentation u4 -> i4 with underflow check Shifts with immediates ldind with constant offset Re-enable stelem Add ldlen_span, fix incorrect base in trace error messages newobj_vt_inlined and newobj_inlined ldc_i8 for full value range ldobj_vt and cpobj_vt stobj_vt br(true|false)_i8_s and ldftn_addr Fix local index assignment. implement unsigned i8 div and rem intrins_span_ctor Improved instrumentation for estimated opcode importance, implement more intrinsics If a method has the Intrinsic attribute, treat it as if AggressiveInlining were set in the interpreter. This makes System.Numerics.Vectors types much faster. Fix remainder opcodes. Implement castclass and isinst Implement INTRINS_MEMORYMARSHAL_GETARRAYDATAREF. Add partial support for safepoint branches and with-immediate relop branches ldelem_vt Replace JS remainder implementation with a wrapper around C fmod. Implement atan2. Fix casing so aot-cross can be located during build of samples Fix i32/i64 type mismatch for an opcode. Rebase damage fixes. When rejecting a trace, set its abort reason to trace-too-small. Implement r8 conditional branches. 
Implement ldind_offset stfld_vt Checkpoint: Easier enable/disable for jiterpreter, mess up the raytracer sample so that it causes AOT to bail into the interp Implement MINT_UNBOX, fix counters Optimize some jiterpreter entry points by moving them into interp.c so they can be inlined Don't call stackval_from_data for pointers Checkpoint backwards branch support Backwards branch entry points seem to work now Rename various jiterpreter entry points and make sure they don't use external APIs or volatile Add some missing indirect opcodes and rework the indirect code Fix negative constant indirect offsets Implement MINT_BOX Adjust thresholds and add some missing offset opcodes Code cleanup Move various jiterpreter code to its own .c file Checkpoint Checkpoint Add heuristic filter to avoid inserting trace enter opcodes in places where they are unlikely to produce a trace Fix crash at startup with jiterpreter turned off Branch block heuristic fixes When estimating trace lengths, don't count opcodes like nop and dummy_use Lower trace length threshold since it ignores dummy opcodes now, adjust test benchmark to be slower, prevent leb being linked out when jit wrappers are enabled Repair rebase damage eslint and wasi build fixes --amend C4206 workaround Diff cleanup Diff cleanup Fix jiterp opdefs Checkpoint: Move most jiterpreter options into c so they can be accessed from everywhere Move the opcode abort counts into C and update them in the heuristic abort logic as well. More options improvements Revert inlining limit to see if it fixes the crash; fix osx build Manually enable jiterpreter stats in browser-bench lint fix; implement ld_delegate_method_ptr since it shows up in browser bench wrappers Disable jiterpreter and wrapper jit by default if threading is enabled. Update comments. Checkpoint code cleanup Code cleanup Code and project file cleanup Implement castclass_interface and isinst_interface Implement localloc and newarr. Pass frame pointer to traces. 
CI build fix Rework call target tracking so it's more useful. Move jiterpreter APIs to INTERNAL. Implement add_ovf for i4 and u4, implement leave_s, improve usefulness of hottest failed traces output Implement LDTSFLDA, improve trace list filtering Implement BOX_VT Unify overflow check and castclass/isinst helpers to reduce number of imports. Implement more conversions and cast/isinst variants Unify overflow conversions and implement a few more. Implement i8 immediate comparison branches. Fix overflow in bailout counting Checkpoint: add jit for specialized do_jit_call trampolines Checkpoint Optimized out wasmtable.get overhead when invoking jit call wrappers Revert test changes Checkpoint repairing merge damage Temporarily disable getitem_span since it changed Implement getitem_span and getitem_localspan CI build fixes Address PR feedback Address PR feedback Address PR feedback Address PR feedback Refactor import section generation; checkpoint non-indirect jit call wrappers Use direct call instead of indirect call for jit call wrappers. Optimize some of the __linker_exports generated wrappers to use .call instead of .apply so there is less runtime overhead on C-to-JS calls Improved linker export wrappers for lower c-to-js overhead Replace linker export wrappers with the actual export functions at startup to remove overhead Remove debug print and fix infinitely growing import list Revert lib.js changes since the optimized wrapper complexity is not needed anymore lint fix for CI build Fix computed goto non-wasm interpreter build due to jiterpreter opcodes Implement LEAVE and BR alongside the short versions Implement support for wasm exception handling in jit_call wrappers Aligned memops in a few key places Remove MINT_NEWARR support since it appears to break NDPin test and it's not terribly important to have. Fix instrumented method support Remove instrumented method JIT do_jit_call wrappers in groups if possible instead of one at a time. 
If multiple do_jit_call cinfos share the same target function, share a single wrapper Improve error message for missing lazy cwraps Lower initial heap size for browser-bench because the default of 512mb causes OOM during the appstart tests Also generate groups of interp_entry wrappers in one go instead of generating one module each Repair merge damage Fix name collision when generating multiple interp entry wrappers at once Inline stackval_from_data logic for common types into interp_entry wrappers Optimize jitcall hit counter Fix jitcall counter continuing to increment even if jit fails Detect whether the runtime supports WASM EH and if not, disable it automatically Improve formatting of log messages Code cleanup Fix crash due to not handling the unbox flag for the this-reference in interp_entry wrappers Rename some of the jiterpreter configuration bools Add a missing jiterpreter config switch Don't rely on atob for the wasm EH support probe, just convert a hex literal by hand instead Clean up some options, adjust some thresholds, and add a couple new comments Code cleanups, rearrangement, and add comments Add support for optimized JS and WASM EH based implementations of mono_llvm_cpp_catch_exception Formatting improvements

BrzVlad · 2022-11-17T09:47:50Z

src/mono/mono/mini/interp/jiterpreter.c

+		//  multiple times and waste some work. At present this is unavoidable because
+		//  control flow means we can end up with two traces covering different subsets
+		//  of the same method in order to handle loops and resuming
+		gboolean should_generate = enabled && should_generate_trace_here(bb, td->last_ins);


The code around here operates on the wrong assumption that interp instructions are linked together from the first all the way to the last instruction in the method. Instructions are linked together only within the same basic block. td->last_ins is pretty much useless and I should probably get rid of it at some point since it is bad design (after an interp_add_ins we can access the added instruction from td->last_ins so we set data into it).

BrzVlad · 2022-11-17T09:50:10Z

src/mono/mono/mini/interp/jiterpreter.c

+	// A preceding trace may have been in a branch block, but we only care whether the current
+	//  trace will have a branch block opened, because that determines whether calls and branches
+	//  will unconditionally abort the trace or not.
+	gboolean inside_branch_block = FALSE;


This serves no purpose currently. Once we hit a branching opcode the bblock ends and we stop the iteration. If you want to continue probing the trace after a branch you should continue with bb->next_bb and restart with first_ins from there.

You're right, this used to do something but it doesn't anymore. I'll figure out whether it's dead code to be removed or whether I need to fix it, since the heuristic is working pretty well as-is.
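The two review comments above describe the same structural fact: instructions are chained only within a basic block, and crossing into the next block means following `next_bb` and restarting from its `first_ins`. A simplified sketch of that traversal follows; the `Ins` and `BasicBlock` interfaces are a hypothetical mirror of the C structures, with `next` assumed as the name of the per-instruction link.

```typescript
// Hypothetical, simplified mirror of the interpreter's transform-time structures.
interface Ins {
    opcode: string;
    next: Ins | null; // link is only valid within the same basic block
}

interface BasicBlock {
    first_ins: Ins | null;
    next_bb: BasicBlock | null;
}

// Collects opcodes in order, crossing basic-block boundaries explicitly.
function collectOpcodes(start: BasicBlock | null): string[] {
    const result: string[] = [];
    for (let bb = start; bb !== null; bb = bb.next_bb) {
        // Restart from first_ins for every block; ins.next ends at the block's end.
        for (let ins = bb.first_ins; ins !== null; ins = ins.next) {
            result.push(ins.opcode);
        }
    }
    return result;
}
```

Walking only the inner loop from a single starting instruction would stop at the first branch, which is the behavior the reviewer points out.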

BrzVlad · 2022-11-17T10:25:58Z

src/mono/wasm/runtime/jiterpreter-opcodes.ts

+
+// Keep this file in sync with mintops.def. The order and values need to match exactly.
+
+export const enum MintOpcode {


mintops.def changes very often. Can't we do anything to simply include it? Or have this automatically generated?

I could probably include it at build time and text parse it at runtime, but that would break the typescript type checking (which has caught quite a few bugs). I could probably also figure out some way to generate this during the build process. I'll look into it.

Note that if the mintops change is actually changing the implementation of an opcode, that'll break the jiterpreter unless it's also updated. Do you have thoughts on how that should be handled? The easiest solution is to disable the opcode until someone who knows TypeScript can update it and re-enable it, but it means that modifying the interp potentially requires modifying the jiterp.

I think it should be possible for the runtime (src/mono/mono) build to create a .ts file using some C preprocessor steps (or, worst case, using an MSBuild task), and then the wasm build (src/mono/wasm/runtime) can just consume the .ts file from elsewhere.

> Note that if the mintops change is actually changing the implementation of an opcode, that'll break the jiterpreter unless it's also updated. Do you have thoughts on how that should be handled?

You mean if the semantics of an opcode change? I think we should probably just handle it ad-hoc and disable the opcode.
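One way to picture the "generate the enum during the build" idea from this thread is a small text transform. The sketch below is not what the build does: it assumes each opcode is declared on its own line as `OPDEF(MINT_NAME, ...)` and that enum values are simply the declaration order; `generateMintOpcodeEnum` is a hypothetical name.

```typescript
// Assumes one OPDEF(MINT_..., ...) entry per line; the real file layout may differ.
function generateMintOpcodeEnum(defText: string): string {
    const names: string[] = [];
    const pattern = /^\s*OPDEF\s*\(\s*(MINT_\w+)/gm;
    let m: RegExpExecArray | null;
    while ((m = pattern.exec(defText)) !== null) {
        const name = m[1];
        if (name !== undefined) names.push(name);
    }
    // Values follow declaration order so they match the C enum.
    const members = names.map((n, i) => `    ${n} = ${i},`);
    return ["export const enum MintOpcode {", ...members, "}"].join("\n");
}
```

Generating the file keeps TypeScript's type checking, which a runtime text parse of the .def file would lose, but it does not address semantic changes to an opcode; those would still need the ad-hoc handling discussed above.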

kg added the `NO-MERGE`, `arch-wasm`, `area-Codegen-JIT-mono`, and `area-Codegen-Interpreter-mono` labels Oct 1, 2022

ghost assigned kg Oct 1, 2022

teo-tsirpanis added area-Codegen-meta-mono and removed area-Codegen-JIT-mono area-Codegen-Interpreter-mono labels Oct 1, 2022

build-analysis bot mentioned this pull request Oct 6, 2022

503's dotnet/arcade#10943

Closed

2 tasks

kg changed the title ~~Introduce a WASM JIT for interpreter opcodes~~ Introduce WASM JITs for interpreter opcodes and interp_entry wrappers Oct 7, 2022

kg force-pushed the wasm-jiterpreter branch from 83b97cc to 76cb2cc Compare October 11, 2022 21:31

kg force-pushed the wasm-jiterpreter branch from 5779067 to 8395bd9 Compare October 13, 2022 03:02

This was referenced Oct 14, 2022

Tracking Nuget 429s dotnet/arcade#10885

Closed

Tracking issue for CI build timeouts #76454

Closed

Nuget restore error caused by 503s dotnet/arcade#11239

Closed

kg force-pushed the wasm-jiterpreter branch from d39209f to fa24221 Compare October 17, 2022 23:27

kg marked this pull request as ready for review October 17, 2022 23:45

kg requested review from lewing, pavelsavara, BrzVlad, vargaz, lambdageek and SamMonoRT as code owners October 17, 2022 23:45

pavelsavara reviewed Oct 18, 2022

src/mono/mono/mini/interp/interp.c (Outdated)

pavelsavara reviewed Oct 18, 2022

src/mono/wasm/runtime/dotnet-legacy.d.ts (Outdated)

pavelsavara reviewed Oct 18, 2022

src/mono/wasm/runtime/jiterpreter-opcodes.ts

pavelsavara reviewed Oct 18, 2022

src/mono/wasm/runtime/jiterpreter-support.ts

kg force-pushed the wasm-jiterpreter branch 2 times, most recently from 8d16571 to 9b93f14 Compare November 8, 2022 11:47

kg removed the `NO-MERGE` label Nov 11, 2022

kg mentioned this pull request Nov 11, 2022

Various support changes and cleanups for the jiterpreter PR #78225

Merged

vargaz reviewed Nov 14, 2022

src/mono/mono/mini/interp/jiterpreter.h

vargaz reviewed Nov 14, 2022

src/mono/mono/mini/interp/jiterpreter.h (Outdated)

vargaz reviewed Nov 14, 2022

src/mono/mono/mini/interp/transform.h

vargaz approved these changes Nov 14, 2022

kg force-pushed the wasm-jiterpreter branch from 86457a5 to 39b87ee Compare November 15, 2022 19:45

kg added 2 commits November 15, 2022 11:57

Address PR feedback

6b9da71

Code cleanup; Disable jiterpreter features by default for initial merge

1c57617

kg merged commit ce58915 into dotnet:main Nov 15, 2022

kg mentioned this pull request Nov 17, 2022

[wasm] enable JITs for interp->aot and aot->interp transitions #78493

Merged

BrzVlad reviewed Nov 17, 2022

kg mentioned this pull request Nov 18, 2022

[wasm] Jiterpreter tracking issue #78428

Open

This was referenced Nov 28, 2022

[Perf] Linux/x64: 60 Improvements on 11/15/2022 10:02:59 PM dotnet/perf-autofiling-issues#10015

Open

[Perf] Linux/x64: 14 Improvements on 11/15/2022 10:02:59 PM dotnet/perf-autofiling-issues#9976

Open

ghost locked as resolved and limited conversation to collaborators Dec 17, 2022
