[wasm][post-MVP] garbage collection proposal in .NET #94420

Open
lambdageek opened this issue Nov 6, 2023 · 3 comments

@lambdageek
Member

lambdageek commented Nov 6, 2023

Summary

This issue tracks the support for the WebAssembly post-MVP garbage collection proposal in .NET.

See #94351 for a summary of other WebAssembly proposals and their status.

Proposal

Repo: https://github.com/WebAssembly/gc
Overview: https://github.com/WebAssembly/gc/blob/main/proposals/gc/Overview.md
GC proposal post-v1 roadmap: https://github.com/WebAssembly/gc/blob/main/proposals/gc/Post-MVP.md

.NET Scenarios and User Stories

User story: front-end developer

As a .NET user interested in creating dynamic browser-based applications, I would like to deploy my .NET application using WebAssembly. I am interested in fast load times and performance. I am willing to change my application in order to achieve these goals.

Upstream dependencies: engines and toolchain

Required WebAssembly engine support

The WebAssembly v1 GC Proposal (WasmGC) has been implemented by multiple browsers. It is now enabled by default in Chromium 119-based browsers and in Firefox 120.

Expected .NET targets or supported configurations: Desktop browsers + Mobile browsers + Node + WASI

Engine      Status
Chrome      ≥ 119
Firefox     ≥ 120
Safari      ?
Node        ?
wasmtime    ?

Required WebAssembly toolchain support

We need LLVM to support this spec for AOT; we need clang to support it for the interpreter.

Friction between the v1 WasmGC spec and .NET semantics

A good summary of the tradeoffs of adopting the WasmGC proposal, and of the scope of the engineering effort required, can be found in this article: https://v8.dev/blog/wasm-gc-porting

Language Semantics
When you recompile a VM in a traditional port you get the exact language you expect, since you’re running familiar code that implements that language. That’s a major advantage! In comparison, with a WasmGC port you may end up considering compromises in semantics in return for efficiency. That is because with WasmGC we define new GC types—structs and arrays—and compile to them. As a result, we can’t simply compile a VM written in C, C++, Rust, or similar languages to that form, since those only compile to linear memory, and so WasmGC can’t help with the great majority of existing VM codebases. Instead, in a WasmGC port you typically write new code that transforms your language’s constructs into WasmGC primitives. And there are multiple ways to do that transformation, with different tradeoffs.

In order for .NET to effectively leverage the WasmGC proposal, various post-v1 features will need to be implemented by the WebAssembly hosts.

We have previously noted some of the limitations of the v1 proposal when it comes to implementing a .NET runtime. We highlight some of those issues and additional ones below:

Object layout and Interior pointers

Heap-allocated .NET objects consist of a header followed by object data. The header contains data such as:

  • the vtable/type pointer
  • a sync word for locking and hashing on the object
  • a length field for arrays/strings
  • for multi-dimensional arrays, a pointer to a struct describing the dimensions.

The type system of the current proposal doesn't support this layout, i.e. a header followed by array data. Also, .NET supports arrays of structs, e.g. an array of struct S { object o; int i; } would look like this in memory: [object, int, object, int, ...]
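
To make the layout concrete, here is a rough C sketch of how an array of S could sit in linear memory. The ObjHeader, S and ArrayOfS types and both helper functions are illustrative assumptions; they are not the actual Mono or CoreCLR definitions, and the multi-dimensional bounds pointer is left out.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified object header. The field names are illustrative
   and do not match the real runtime definitions. */
typedef struct {
    const void *vtable;   /* vtable/type pointer */
    uintptr_t   sync;     /* sync word used for locking and hashing */
} ObjHeader;

/* Element type corresponding to: struct S { object o; int i; } */
typedef struct {
    ObjHeader *o;         /* managed reference */
    int32_t    i;         /* plain integer */
} S;

/* An array object: header, then length, then the elements stored inline. */
typedef struct {
    ObjHeader header;
    uint32_t  length;
    S         data[];     /* flexible array member: [o, i, o, i, ...] */
} ArrayOfS;

/* A single allocation holds the header, the length and every element. */
ArrayOfS *array_of_s_new(uint32_t length)
{
    ArrayOfS *a = calloc(1, sizeof(ArrayOfS) + (size_t)length * sizeof(S));
    if (a != NULL)
        a->length = length;
    return a;
}

/* Byte offset of element `index` from the start of the object. */
size_t array_of_s_offset(uint32_t index)
{
    return offsetof(ArrayOfS, data) + (size_t)index * sizeof(S);
}
```

In this sketch the header, the length and the interleaved (reference, int) pairs all share one allocation, which is the shape that a separate header struct plus a separate array cannot express.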

In .NET, it's quite common to have pointers into the middle of objects (arrays), and pointers to one past the end of an array. The GC proposal doesn't support this.

This is addressed in the post-v1 struct flattening proposal.
In .NET we currently guarantee that interior pointers themselves are only ever stored on the stack, not in other heap objects - this may play well with the post-v1 proposal.
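
As a small illustration of what "interior pointer" means here, the following C sketch sums a slice of an array through a [begin, end) pointer pair, roughly the shape that ref and Span<T> code reduces to when objects live in linear memory. The function names are made up for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Sum the elements in [begin, end). `end` is never dereferenced, so it may
   legally point one past the last element of the array. */
int64_t sum_range(const int32_t *begin, const int32_t *end)
{
    int64_t total = 0;
    for (const int32_t *p = begin; p != end; ++p)
        total += *p;
    return total;
}

/* Sum `count` elements starting at `start`. Returns 0 if the requested
   slice does not fit inside the array. */
int64_t sum_slice(const int32_t *data, size_t length, size_t start, size_t count)
{
    if (start > length || count > length - start)
        return 0;
    const int32_t *begin = data + start;   /* interior pointer into the array */
    const int32_t *end = begin + count;    /* may be one past the end */
    return sum_range(begin, end);
}
```

Neither `begin` nor `end` identifies the start of an object, so a collector has to map them back to the containing array. With opaque v1 references, one possible workaround (an assumption on our part, not something the proposal prescribes) would be to carry an (array reference, index) pair instead of a single pointer.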

Currently in the .NET BCL some algorithms are implemented using Unsafe.AsPointer which converts a managed reference (to a value on the stack or to the interior of an object on the managed heap) into an unmanaged pointer. These algorithms will need to be rewritten.

Interop with C/C++ code

The .NET runtimes are written in C/C++ and assume that object references are normal C pointers which point to linear memory, and that objects can be accessed from C code as pointers to C structs. The current proposal places allocated objects outside linear memory and adds new accessors to read/write their contents. Allowing manipulation of these objects from C code would require extensions to the C compilers.

There has been some work in clang to support externref, although we will need to do extensive work on the native side of the .NET runtime as well.

A related issue is interop with C code using [DllImport] and the C# fixed expressions and pointers: since managed objects are not in linear memory, passing pointers to fixed array or string data will require copying, and the result may differ from the existing semantics of these features.
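
The copying that this implies can be sketched in C. Everything below is hypothetical: ManagedArray stands in for a GC object that native code can only reach through accessors, native_negate stands in for a function reached through [DllImport], and call_native_with_copy is one possible marshalling strategy, not a description of what the runtime does today.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Stand-in for a native function normally reached through [DllImport]. */
void native_negate(int32_t *buf, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        buf[i] = -buf[i];
}

/* Hypothetical managed array. Callers are meant to use only the accessors
   below, mimicking an object that cannot be addressed from linear memory. */
typedef struct {
    size_t   length;
    int32_t *slots;
} ManagedArray;

int32_t managed_get(const ManagedArray *a, size_t i) { return a->slots[i]; }
void managed_set(ManagedArray *a, size_t i, int32_t v) { a->slots[i] = v; }

/* Emulates `fixed` without pinning: copy out, call, copy back.
   Returns 0 on success, -1 if the scratch buffer cannot be allocated. */
int call_native_with_copy(ManagedArray *a)
{
    int32_t *scratch = malloc(a->length * sizeof *scratch);
    if (scratch == NULL && a->length != 0)
        return -1;

    for (size_t i = 0; i < a->length; ++i)
        scratch[i] = managed_get(a, i);      /* copy into linear memory */

    native_negate(scratch, a->length);       /* native code sees only the copy */

    for (size_t i = 0; i < a->length; ++i)
        managed_set(a, i, scratch[i]);       /* copy the results back */

    free(scratch);
    return 0;
}
```

Besides the cost of two copies, the observable behavior changes in this model: writes made by the native function are not visible to managed code until the call returns, and a pointer that the native side retains would refer to freed scratch memory rather than to the array.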

Finalization, dependent handles and weak references

The .NET runtime needs to be notified somehow when an object with a finalizer dies. .NET finalizers support resurrection, allowing an object to become live again when its finalizer runs.

.NET supports multiple kinds of weak references (including weak references that are not zeroed out when the target object is finalized and is resurrected in the finalizer) which might not be supported by the underlying JS GC.

.NET supports dependent handles (or ephemerons), exposed publicly through the ConditionalWeakTable class and internally as a special type of GC handle that references two objects. A dependent handle allows a target object to be kept alive for as long as a key object is alive.

These are all post-v1 features with no public spec.

The .NET base class library takes advantage of all three constructs, so to use the MVP GC proposal we would need to rewrite portions of the BCL.

Threading

Threading is a post-v1 feature of the wasm GC proposal. In particular Post-MVP includes this rationale:

Shared references have not been included in the GC MVP, because they will require engines to implement concurrent garbage collection. That requires major changes to most existing Web implementation, that will probably take a long time to implement, let alone optimise. It seems highly preferable not to gate GC support on that.

There is some WASM CG work on a further threading proposal that will include shared WebAssembly instances that will incorporate shared GC. But the v1 WasmGC work is incompatible with the v1 threading proposal.

Work to support the v1 threading proposal in .NET is currently ongoing and tracked in #68162.

Current status

In order for .NET to make use of the v1 GC proposal, we would need to significantly alter the semantics of existing .NET code or limit the use of certain features, including refs, Span<T>, and [DllImport], both in user code and in the C# base class library, including the implementation of System.Private.CoreLib. Moderating the use of ref and Span<T> would run contrary to the past several years’ efforts in the .NET runtime to increase the use of stack-allocated values and avoid heap allocations in performance-sensitive code. While there are some benefits to utilizing the WasmGC proposal, as outlined in https://v8.dev/blog/wasm-gc-porting, the v1 WasmGC semantics do not match .NET requirements.

We will continue to monitor the evolution of the post-v1 WasmGC spec, but at this time we are not planning to adopt it.

@lambdageek lambdageek added arch-wasm WebAssembly architecture area-VM-meta-mono os-wasi Related to WASI variant of arch-wasm os-browser Browser variant of arch-wasm labels Nov 6, 2023
@lambdageek lambdageek added this to the Future milestone Nov 6, 2023
@ghost

ghost commented Nov 6, 2023

Tagging subscribers to 'arch-wasm': @lewing
See info in area-owners.md if you want to be subscribed.


@lambdageek
Member Author

lambdageek commented Nov 6, 2023

Potential areas of exploration with the v1 WasmGC and related specs

  • Investigate applicability of the reference types proposal for JS interop
  • Investigate the possibility of implementing a restricted .NET runtime without finalization, weak references, or ref types, and using a uniform anyref representation for heap structures. Basically something similar to this strawman JS implementation thought experiment. (This would likely need to take the form of a community-driven runtimelab project.)
  • Explore the possibility of adding alternate finalization mechanisms for .NET (such as Java’s PhantomReference<T> and ReferenceQueue<T>, or JavaScript's FinalizationRegistry) and contribute to the WASM GC post-v1 spec.

@lambdageek
Member Author

Dynamic field access in the interpreter

The v1 syntax for structure field access uses a constant type and field index: struct.get_<sx>? $t i : [(ref null $t)] -> [t]. This means we will need a dynamic number of field getter/setter methods to implement the interpreter's getter/setter opcodes. Alternatively, we may need to use an object representation that places arrays of ref/non-ref fields inside every object and use the array opcodes: array.get_<sx>? $t : [(ref null $t) i32] -> [t]. Since arrays aren’t flattened, every .NET heap object will require at least 3 allocations: the object header, the non-blittable fields array and the blittable fields array.
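
The array-based alternative can be modeled in C to show where the three allocations come from. This is only a sketch under our own assumptions: the Obj struct, the allocation counter and the accessor names are hypothetical, and blittable fields are widened to 64 bits to keep the example short.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct Obj Obj;

/* Hypothetical interpreter object: a small header plus two out-of-line
   field arrays, because nested arrays are not flattened into the header. */
struct Obj {
    Obj    **refs;     /* non-blittable (reference) fields */
    int64_t *prims;    /* blittable fields, widened to 64 bits for brevity */
    size_t   nrefs;
    size_t   nprims;
};

static size_t alloc_count;   /* counts allocations, for illustration only */

static void *counted_calloc(size_t n, size_t size)
{
    void *p = calloc(n ? n : 1, size);   /* never request zero elements */
    if (p != NULL)
        ++alloc_count;
    return p;
}

size_t obj_alloc_count(void) { return alloc_count; }

/* Creating one object costs three allocations: header, refs, prims. */
Obj *obj_new(size_t nrefs, size_t nprims)
{
    Obj *o = counted_calloc(1, sizeof *o);
    if (o == NULL)
        return NULL;
    o->refs = counted_calloc(nrefs, sizeof *o->refs);
    o->prims = counted_calloc(nprims, sizeof *o->prims);
    if (o->refs == NULL || o->prims == NULL) {
        free(o->refs);
        free(o->prims);
        free(o);
        return NULL;
    }
    o->nrefs = nrefs;
    o->nprims = nprims;
    return o;
}

void obj_free(Obj *o)
{
    if (o != NULL) {
        free(o->refs);
        free(o->prims);
        free(o);
    }
}

/* The field index is a runtime value here, as it is for array.get/array.set. */
int64_t obj_get_prim(const Obj *o, size_t index) { return o->prims[index]; }
void obj_set_prim(Obj *o, size_t index, int64_t v) { o->prims[index] = v; }
Obj *obj_get_ref(const Obj *o, size_t index) { return o->refs[index]; }
void obj_set_ref(Obj *o, size_t index, Obj *v) { o->refs[index] = v; }
```

A single pair of getters per field kind is enough for the interpreter in this model, at the price of two extra allocations and one extra indirection on every field access.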

The post-v1 situation might be better (we might be able to flatten some of the structure and use array access to get at the field in the interpreter), although in that case we might be giving up subtyping, which means we might need specialized accessor methods for the interpreter.
