-
Notifications
You must be signed in to change notification settings - Fork 5.3k
[RyuJIT Wasm] Initial implementation of cast operations #122301
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR adds initial support for cast operations in the RyuJIT WebAssembly backend, enabling compilation of cast-related tests. The implementation includes instruction definitions, cost estimates, lowering changes, and code generation for various type conversions.
- Adds WebAssembly cast instruction definitions (wrap, truncate, extend, convert, promote, demote, reinterpret)
- Implements
genCodeForCastfunction with support for int/long and float/double conversions - Updates cost estimation for WASM cast operations
- Excludes WASM from 32-bit TYP_LONG special handling in lowering
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| src/coreclr/jit/instrswasm.h | Adds 32 new instruction definitions for WASM cast/conversion operations with opcodes 0xA7-0xC4 |
| src/coreclr/jit/lower.cpp | Excludes WASM from 32-bit long handling since WASM has native 64-bit support |
| src/coreclr/jit/gentree.cpp | Replaces NYI with cost estimates (costEx=2, costSz=1) for WASM cast operations |
| src/coreclr/jit/codegenwasm.cpp | Implements genCodeForCast function and adds 3-argument PackOperAndType overload for handling cast operations with source and destination types |
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
|
cc @dotnet/jit-contrib |
| // TODO: Floating point conversions - we need to figure out where semantics require a helper and where they | ||
| // don't. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to decide what our "baseline ISA" is going to be and write that down somewhere.
If we're going with 1.0/MVP as the baseline, this will either need inline flow emitted, or a helper call (I think both options are fine?). If we include saturating-fp into the baseline, we can emit these as simple instructions.
In terms of reach, saturating-FP is pretty widespread in larger engines (browsers/wasmtime), but there are also engines out in the wild which implement just the bare minimum. We've had at least one such user in NAOT-LLVM.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was discussing this with Adam and I think our best starting point is to use helpers for everything that we know have the right semantics, and then switch over to generating "native" wasm for our baseline ISA as needed as an optimization. IIRC the code size of the right semantics in native wasm can be quite significant.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So you're saying to adopt 1.0 as the baseline?
This decision affects not just RyuJit but some higher-level code as well. It means we need to saturating-fp to the list of "instructions sets" in CG2 and such.
As for semantics, I am pretty sure the saturating-fp instructions (*trunc_sat*) are exactly we need for fp->int conversions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we use something like wasm-feature-detect and opportunistlically emit code like we do for other archs https://github.com/dotnet/runtime/pull/115335/files, i.e. expose a feature bitmap via API in src/native/minipal/cpufeatures.h. Then crossgen2/ilc can also expose the supported ISA levels via their existing --instruction-set arg.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and opportunistlically emit code like we do for other archs
Yes and no. We will use --instruction-set for not-in-the-baseline extensions, but we should only emit one version of the code since you'll fail to validate if the extension is not supported so there is no point in emitting a runtime check.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have nothing against using 2.0 or 3.0 as the baseline, I just meant that as a starting point we shouldn't try to generate complex code in order to get the semantics right if we could just use a helper instead. So if i.e. the saturating truncate does what we want, we can use it - but jiterp had to generate complex code for some of our conversions just like clang does.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Per @lewing baseline can be whatever is supported by the 3 major browsers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have opened #122309 about this. We can revisit it when implementing FP -> int.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for opening that. I think we can probably use the support guide here: https://webassembly.org/features to decide? For example, I see non-trapping conversions as currently implemented in Chrome, Firefox, and Safari.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for opening that. I think we can probably use the support guide here: https://webassembly.org/features to decide? For example, I see non-trapping conversions as currently implemented in Chrome, Firefox, and Safari.
Let's discuss it in #122311.
| { | ||
| // NOTE: For this, RyuJIT seems to just generate an i32 load of the i64 operand instead of a GT_CAST. | ||
| // I suspect once we implement use of wasm locals instead of the linear stack, GT_CAST will appear. | ||
| case PackOperAndType(GT_CAST, TYP_INT, TYP_LONG): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this is genCodeForCast (and we can have another method for bitcasts, which are a bit different), this opcode (GT_CAST) field is redundant.
fromType needs to be genActualType(tree->CastOp()). In general, it is almost always a mistake to use the raw node type except when dealing with indirections and you know you want the indirected (small) type instead of the produced node type.
Nit: I think it would look better if we swapping the from-type and to-type. That way it can read like so:
case (TYP_INT, TYP_SHORT)
case (TYP_INT, TYP_USHORT)
...
And so on instead of:
case (TYP_SHORT, TYP_INT)
case (TYP_USHORT, TYP_INT)
(I've always found it a bit peculiar that in the dumps we also use this to-type <- from-type notation, but in dumps there is typically only one cast)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer your suggestion too, but I wanted to match the ordering of the wasm opcode names.
| // NOTE: For this, RyuJIT seems to just generate an i32 load of the i64 operand instead of a GT_CAST. | ||
| // I suspect once we implement use of wasm locals instead of the linear stack, GT_CAST will appear. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This part needs to actually be fixed in genCodeForLclVar, under the genIsValidReg(varDsc->GetRegNum()) case, we need to insert INS_i32_wrap_i64 if (tree->TypeIs(TYP_INT) && (varDsc->TypeGet() == TYP_LONG)).
| if (tree->gtOverflow()) | ||
| NYI_WASM("Overflow checks"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can move this to the top of the method I think, to avoid adding it into each case.
| INST(f64_max, "f64.max", 0, IF_OPCODE, 0xA5) | ||
| INST(f64_copysign,"f64.copysign",0, IF_OPCODE, 0xA6) | ||
| // Unary operations | ||
| INST(i32_wrap_i64, "i32.wrap_i64", 0, IF_OPCODE, 0xA7) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The formatting (alignment) is off.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought jitformat would fix it, but it didn't.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(We do explicitly disable jit-format for these kinds of lists via // clang-format off/on, since it doesn't format them sensibly)
| // TODO-WASM: Better estimate of costs for these opcodes. Most of them are one op on x64 but may be | ||
| // multiple uops. | ||
| costEx = 2; | ||
| // TODO-WASM: 1 byte opcodes except for the int->fp saturating casts which are 2 bytes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| // TODO-WASM: 1 byte opcodes except for the int->fp saturating casts which are 2 bytes. |
| NYI_WASM("Cast costing"); | ||
| costEx = 0; | ||
| costSz = 0; | ||
| costSz = 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| costSz = 1; | |
| costSz = varTypeIsFloating(op1) && !varTypeIsFloating(tree) ? 2 : 1; |
Let's fix it for good while we're here.
| // TODO-WASM: Better estimate of costs for these opcodes. Most of them are one op on x64 but may be | ||
| // multiple uops. | ||
| costEx = 2; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| // TODO-WASM: Better estimate of costs for these opcodes. Most of them are one op on x64 but may be | |
| // multiple uops. | |
| costEx = 2; | |
| costEx = 1; |
We can just be approximately right, there is no need to make it more complex I think. We can make floats a bit more expensive like arm64, but it's not necessary. And the vast majority of casts are not FP-related, so they shouldn't be costed as 2.
cc @adamperlin
Ensures the following tests compile:
