Prototype GC instructions #2935
I'm not aware of anyone planning to work on this. But I'm also not sure if it makes sense atm to experiment in multiple toolchain projects. cc @binji for thoughts. I like the idea for a pass to lower GC to non-GC, yeah, that sounds like it could be useful for polyfilling etc.!
I also still hope to use wasp in Binaryen! We'll be in a good position to evaluate that project after the current Stack IR redesign/refactoring/replacement is done. That being said, GC support would require much more work in Binaryen than just the parsing that wasp would provide. If anyone from the wider community would be interested in taking a stab at implementing experimental GC support, I'd be happy to help out with guidance and code reviews.
So I took an initial stab at this and added the instruction classes. Consequently, the next aspect to get right is how to represent struct and array types. One strategy might be to: …

Does this approach sound ok?
I'm not sure exactly how binaryen does it, but the more invasive change (in my experience) is to ValueType, which now must also support an optional index (or optional name, for the text format).
@dcodeIO The recently-redesigned …
Can you elaborate a bit why this information has to live on the types directly? So far I imagined that we'll need some sort of mapping of types (global to all modules in Binaryen, describing layouts of structs, arrays or functions) to the types section (local to each module, like function signatures) and that indices or names would be a property of the link from a global type to a …
I see, thanks, looking at it now. Hmm, perhaps separate …
There are (at least) two new value types:

```wat
(type $t ...)
(table 0 (ref $t))
(global (ref $t) ...)
(func (param (ref $t)) (result (ref null $t))
  (local (ref null $t))
  (block (param (ref null $t)) (result (ref $t))
    ...
  )
)
...
```
The map keys are used when creating a new …
Thanks guys! So, would it be viable to establish something like

```cpp
class Type {
  enum Kind {
    Default,
    Multi,
    Function,
    Struct,
    Array
  };
  uintptr_t id;
  Type::Kind kind;
  union {
    MultivalueDefinition multiDef;
    FunctionDefinition funcDef;
    StructDefinition structDef;
    ArrayDefinition arrayDef;
  };
  bool isStruct() { return kind == Type::Kind::Struct; }
  bool isNullable() {
    switch (kind) {
      ...
      case Type::Kind::Struct: return structDef.nullable;
      ...
      default: return false;
    }
  }
  ...
};

struct MultivalueDefinition {
  std::vector<Type> types;
};

struct FunctionDefinition {
  std::vector<Type> params;
  std::vector<Type> results;
  bool nullable;
};

struct Field {
  Type type;
  bool mutable_;
};

struct StructDefinition {
  std::vector<Field> fields;
  bool nullable;
};

struct ArrayDefinition {
  Field element;
  bool nullable;
};
```

with instances of … If not, what'd be a good alternative fitting into the existing code base?
Yeah, that's almost what I have in mind. The only difference is that your code snippet above adds a …

A few other notes so I don't have to repeat them later in code review: …
I think this is a reasonable way to do it, though it ends up combining the two concepts of a "value type" and the "type definition". One concern is that a type definition has an explicit index space, but value types don't. Since this model compresses all types together, you would need to have a separate mapping from a type index to your …

You'll need to be a bit careful when doing this too, since types can be recursive in different ways but should still be considered equivalent, e.g. …

Here …
Binaryen IR does not have a concept of type definitions or type indices, so I don't think this will be a problem. We reconstruct the type section and all type indices on demand when the module is emitted.
So far I came up with these:

```cpp
typedef std::vector<Type> Tuple;

struct Field {
  Type type;
  bool mutable_;
  Field(Type type, bool mutable_ = false) : type(type), mutable_(mutable_) {}
  bool operator==(const Field& other) const {
    return type == other.type && mutable_ == other.mutable_;
  }
  bool operator!=(const Field& other) const { return !(*this == other); }
};

typedef std::vector<Field> FieldList;

struct Struct {
  FieldList fields;
  bool nullable;
  Struct(FieldList fields, bool nullable = false)
    : fields(fields), nullable(nullable) {}
  bool operator==(const Struct& other) const {
    return fields == other.fields && nullable == other.nullable;
  }
  bool operator!=(const Struct& other) const { return !(*this == other); }
};

struct Array {
  Field element;
  bool nullable;
  Array(Field element, bool nullable = false)
    : element(element), nullable(nullable) {}
  bool operator==(const Array& other) const {
    return element == other.element && nullable == other.nullable;
  }
  bool operator!=(const Array& other) const { return !(*this == other); }
};

struct TypeDef { // move to wasm-type.cpp ?
  enum Kind { TupleKind, SignatureKind, StructKind, ArrayKind };
  Kind kind;
  union Def {
    Def(Tuple tuple) : tuple(tuple) {}
    Def(Signature signature) : signature(signature) {}
    Def(Struct struct_) : struct_(struct_) {}
    Def(Array array) : array(array) {}
    ~Def() {}
    Tuple tuple;
    Signature signature;
    Struct struct_;
    Array array;
  } def;
  TypeDef(Tuple tuple) : kind(TupleKind), def(tuple) {}
  TypeDef(Signature signature) : kind(SignatureKind), def(signature) {}
  TypeDef(Struct struct_) : kind(StructKind), def(struct_) {}
  TypeDef(Array array) : kind(ArrayKind), def(array) {}
  bool operator==(const TypeDef& other) const {
    if (kind != other.kind) {
      return false;
    }
    switch (kind) {
      case TupleKind:
        return def.tuple == other.def.tuple;
      case SignatureKind:
        return def.signature == other.def.signature;
      case StructKind:
        return def.struct_ == other.def.struct_;
      case ArrayKind:
        return def.array == other.def.array;
      default:
        WASM_UNREACHABLE("unexpected kind");
    }
  }
  bool operator!=(const TypeDef& other) const { return !(*this == other); }
};
```

reusing …
This looks great so far! I think it is fine to have all this in wasm-types.h so that client code can build up an arbitrarily complex …
Alright, going to dive into it :) Have been playing with the hashing a bit now, and I have been wondering if this can be simplified a bit by doing roughly what boost does?

```cpp
namespace wasm {
template<typename T> inline void rehash(std::size_t& s, const T& v) {
  std::hash<T> h;
  s ^= h(v) + 0x9e3779b9 + (s << 6) + (s >> 2);
}
} // namespace wasm
```

From what I learned so far this approach would free us from confusion between …

```cpp
size_t operator()(const vector<wasm::Type>& types) const {
  size_t res = std::hash<size_t>{}(types.size());
  for (auto t : types) {
    wasm::rehash<uint64_t>(res, t.getID());
  }
  return res;
}
```

Or is there a particular reason for choosing the djb2 approach? Something with …

Might look like this then:

```cpp
size_t hash<wasm::TypeDef>::operator()(const wasm::TypeDef& typeDef) const {
  size_t res = hash<uint32_t>{}(uint32_t(typeDef.kind));
  switch (typeDef.kind) {
    case wasm::TypeDef::Kind::TupleKind: {
      auto& tuple = typeDef.def.tuple;
      wasm::rehash_std(res, tuple.size());
      for (auto t : tuple) {
        wasm::rehash_std(res, t.getID());
      }
      return res;
    }
    case wasm::TypeDef::Kind::SignatureKind: {
      auto& signature = typeDef.def.signature;
      wasm::rehash_std(res, signature.params.getID());
      wasm::rehash_std(res, signature.results.getID());
      return res;
    }
    case wasm::TypeDef::Kind::StructKind: {
      auto& struct_ = typeDef.def.struct_;
      auto& fields = struct_.fields;
      wasm::rehash_std(res, fields.size());
      for (auto f : fields) {
        wasm::rehash_std(res, f.type.getID());
        wasm::rehash_std(res, f.mutable_);
      }
      wasm::rehash_std(res, struct_.nullable);
      return res;
    }
    case wasm::TypeDef::Kind::ArrayKind: {
      auto& array = typeDef.def.array;
      auto& element = array.element;
      wasm::rehash_std(res, element.type.getID());
      wasm::rehash_std(res, element.mutable_);
      wasm::rehash_std(res, array.nullable);
      return res;
    }
    default:
      WASM_UNREACHABLE("unexpected kind");
  }
}
```
I also worry about keeping all the types in what is essentially a global cache. While the deduplication helps to keep the memory footprint low for a single module, a long-running process reading modules, modifying them and re-emitting them will never reclaim the memory of types used in at least one module. The more distinct types there are, the more pressing the issue will become. Currently, the only way to fill up memory would be to create lots of different combinations of tuple types, which is unlikely but possible, but with structs, arrays and ultimately rtts there'll be a lot more ways to create distinct types. What do you think of only deduplicating basic types globally, and deduplicating instances of complex types per module so these are disposed of together with the module? Perhaps a concept of …
@dcodeIO I share your concern about global caches (especially for users that intend to use binaryen as a library), but even in the largest modules I doubt the cost of unique types is going to compare to the cost of expressions. Perhaps the simplest solution is to add a mechanism (if it doesn't already exist) to clear the global cache.
I would be fine changing how we do hashing, especially if we can make it simpler and more generic, but perhaps @kripken knows some design constraint that would prevent this change. In order to clean up interned type definitions, we need to ensure that there are no …
Yeah, a manual GC mechanism (e.g. given a set of Modules, clean up any type definitions not used by any of them) would also work, and probably be simpler than splitting up the cache.
Alright, going to leave it the way it is for now and look into cleanup later, thanks! I've also opened a draft PR of a type refactor according to the comments above meanwhile. Unfortunately, it raises additional questions regarding …
Closing this issue because we have good support for the GC proposal now. |
According to WebAssembly/gc#81 and the linked status document, there is early prototyping work of the GC proposal done in WABT and V8, and I'd love to start playing with it. Are there any plans on the Binaryen side already? I'd also be happy to help where I can, that is if there's anything I can do that doesn't hurt more to review than it helps. Just let me know :)
One interesting aspect also is that once the respective instructions are available in Binaryen, there also comes the opportunity to transform a GC-enabled module to a non-GC module by essentially polyfilling a runtime. I'd imagine that a transform pass like this would be useful for all sorts of tools and languages to ease the transition in the future (emit Wasm GC today, run without), and perhaps some bits of what we have at AS already can be used to make this possible.