Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[wasm] Enable v128 and PackedSimd in the interpreter #85920

Merged
merged 1 commit into from
May 18, 2023

Conversation

kg
Copy link
Member

@kg kg commented May 8, 2023

EDIT: This PR enables vector128 SIMD and PackedSimd in the interpreter and jiterpreter for wasm, and fixes a few bugs/missing opcodes. It also changes how we do trimming around SIMD.

EDIT: Original description below

Right now both v128 and packedsimd are disabled for WASM because of an issue with test results xml getting cut off at the start/end. Testing turning them back on now that some other fixes have landed.

It only seems to affect testResults.xml and it's like the first 1-8 bytes or last 1-8 bytes of the xml going missing at some point in the flow from the test runner writing it to stdout and the harness capturing it. I've been struggling to come up with a plausible theory for how we could end up with missing bytes due to vectorization being turned on, since I wouldn't expect 'copy 100085 bytes from this buffer to stdout' to depend on something like PackedSimd or Vector128 since it's largely going to be memmoves of fixed sizes.

@kg kg added the arch-wasm WebAssembly architecture label May 8, 2023
@ghost ghost assigned kg May 8, 2023
@ghost
Copy link

ghost commented May 8, 2023

Tagging subscribers to 'arch-wasm': @lewing
See info in area-owners.md if you want to be subscribed.

Issue Details

Right now both v128 and packedsimd are disabled for WASM because of an issue with test results xml getting cut off at the start/end. Testing turning them back on now that some other fixes have landed.

It only seems to affect testResults.xml and it's like the first 1-8 bytes or last 1-8 bytes of the xml going missing at some point in the flow from the test runner writing it to stdout and the harness capturing it. I've been struggling to come up with a plausible theory for how we could end up with missing bytes due to vectorization being turned on, since I wouldn't expect 'copy 100085 bytes from this buffer to stdout' to depend on something like PackedSimd or Vector128 since it's largely going to be memmoves of fixed sizes.

Author: kg
Assignees: -
Labels:

arch-wasm

Milestone: -

@kg
Copy link
Member Author

kg commented May 8, 2023

cc @stephentoub have you seen anything like this before? Perhaps when some of the BCL vectorization work was previously in progress? I'm totally stumped. If you look at the test failures you can see the test suites pass but then the result data was too short and corrupted

@kg
Copy link
Member Author

kg commented May 8, 2023

My latest theory is that the problem is https://github.com/dotnet/xharness/blob/aa434d0c7e6eb46df1ec11b3c63add37d835c4d0/src/Microsoft.DotNet.XHarness.TestRunners.Xunit/ThreadlessXunitTestRunner.cs#LL125C34-L125C48 which is why the test suites themselves are passing. It looks like the base64 implementation is vectorized if hw accelerated, which explains why this causes it to break, and it also sort of explains why only a few characters at the start/end are missing

@kg
Copy link
Member Author

kg commented May 12, 2023

The issue appears to be specifically caused by enabling the jiterpreter's implementation of Vector128.CreateScalar. Not sure what's wrong with it or why tests don't fail, but disabling those opcodes specifically seems to make the xharness problem go away.

@kg
Copy link
Member Author

kg commented May 15, 2023

Since I've determined that the PackedSimd issue is blocked on interp changes, I'm going to move all the bug fixes from that PR into this one so we can merge those bug fixes with this.

@kg kg added the NO-MERGE The PR is not ready for merge yet (see discussion for detailed reasons) label May 16, 2023
@kg
Copy link
Member Author

kg commented May 16, 2023

Due to how trimming works, it's not possible to have runtime options for interpreter SIMD support or make PackedSimd optional - the trimmer overrides Vector128 and PackedSimd support to 'true' (which is correct for AOT code!) and the interp has no ability to override that. Right now on main things work because V128 IsHardwareAccelerated is false, but once it flips to true, PackedSimd methods will just get called no matter what. So this will have to be a V128+PackedSimd all-in-one PR, and that means it's blocked on some interp bug fixes.

@lewing
Copy link
Member

lewing commented May 16, 2023

Due to how trimming works, it's not possible to have runtime options for interpreter SIMD support or make PackedSimd optional - the trimmer overrides Vector128 and PackedSimd support to 'true' (which is correct for AOT code!) and the interp has no ability to override that. Right now on main things work because V128 IsHardwareAccelerated is false, but once it flips to true, PackedSimd methods will just get called no matter what. So this will have to be a V128+PackedSimd all-in-one PR, and that means it's blocked on some interp bug fixes.

If we want to remove this limitation I think it is possible to do so

@kg
Copy link
Member Author

kg commented May 16, 2023

Due to how trimming works, it's not possible to have runtime options for interpreter SIMD support or make PackedSimd optional - the trimmer overrides Vector128 and PackedSimd support to 'true' (which is correct for AOT code!) and the interp has no ability to override that. Right now on main things work because V128 IsHardwareAccelerated is false, but once it flips to true, PackedSimd methods will just get called no matter what. So this will have to be a V128+PackedSimd all-in-one PR, and that means it's blocked on some interp bug fixes.

If we want to remove this limitation I think it is possible to do so

Radek pointed out that we can disable the substitutions for interp-only, so I'm going to do that. But it seems unfixable for AOT since AOT code can transition into the interp at any time

@kg kg changed the title [wasm] Enable v128 in the interpreter [wasm] Enable v128 and PackedSimd in the interpreter May 16, 2023
    Fix (?) the create scalar intrinsics in the jiterpreter
    Fix trimming removing SIMD even if SIMD was enabled
    Add wasm opcode mapping for extract msb
    Fix some SN_ lists in transform-simd being ordered incorrectly
    Due to mixed mode it is not possible to control interp SIMD with a runtime option, so remove the options
    Disable linker substitutions for SIMD unless AOT is active, so the interp is responsible for the properties
    Generate a MINT_NIY for unimplemented PackedSimd methods
@kg kg removed the NO-MERGE The PR is not ready for merge yet (see discussion for detailed reasons) label May 18, 2023
@kg kg requested a review from vargaz May 18, 2023 19:14
@kg
Copy link
Member Author

kg commented May 18, 2023

Now that the debug vector alignment fix has landed on main, I expect this to pass tests on CI without that 'testResults.xml's prologue goes missing' problem.

@kg
Copy link
Member Author

kg commented May 18, 2023

Failures are unrelated. Merging.

@kg kg merged commit 23ddef3 into dotnet:main May 18, 2023
@ghost ghost locked as resolved and limited conversation to collaborators Jun 18, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
arch-wasm WebAssembly architecture area-VM-meta-mono
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants