@lewing lewing commented May 7, 2025

Fall back to element access for 64x2 const elements, otherwise prefer the vectorized version.

Tagging subscribers to this area: @steveisok, @vitek-karas
See info in area-owners.md if you want to be subscribed.

@lewing lewing requested a review from kg May 7, 2025 03:09
@lewing lewing marked this pull request as ready for review May 7, 2025 03:09
@Copilot Copilot AI review requested due to automatic review settings May 7, 2025 03:09
@Copilot Copilot AI left a comment

Pull Request Overview

This PR optimizes the implementation of the OP_WASM_SIMD_SWIZZLE operation for constant indices. Key changes include a revised handling of constant versus non‐constant swizzle index vectors, the removal of an early bitcast of rhs, and an updated combination of computed index vectors using a bitwise OR instead of addition.
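Swapping addition for bitwise OR is presumably safe because the lane index is first scaled to a byte offset, which leaves its low bits clear, so OR-ing in the per-byte offsets cannot carry. A minimal Python check of that property (illustrative only, not code from the PR; the function name is made up):

```python
def or_equals_add(elem_size: int, lanes: int) -> bool:
    """Check that (lane << log2(elem_size)) | offset == ... + offset for every byte offset."""
    shift = elem_size.bit_length() - 1       # log2(elem_size): 2 for 4-byte lanes, 3 for 8-byte lanes
    for lane in range(lanes + 1):            # include the clamped out-of-range value (== lanes)
        base = (lane << shift) & 0xFF        # what i8x16.shl leaves in the lane's low byte
        for offset in range(elem_size):      # per-byte offsets 0 .. elem_size - 1
            if (base | offset) != (base + offset):
                return False
    return True
```

Both `or_equals_add(4, 4)` (the 32x4 case) and `or_equals_add(8, 2)` (the 64x2 case) hold, since the offset always fits in the bits that the shift cleared.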

@lewing lewing requested a review from radekdoulik May 7, 2025 03:53
@lewing lewing added the NO-MERGE The PR is not ready for merge yet (see discussion for detailed reasons) label May 8, 2025
lewing commented May 8, 2025

The codegen for the const case is not pretty but it is roughly equivalent to the old codegen.

Out of curiosity, I checked what the codegen looks like for the i64x2 non-const case, where LLVM has to synthesize the min:

 local $4 v128
 local.get $0
 local.get $1
 v128.load align:4    [SIMD]
 local.get $2
 v128.load align:4    [SIMD]
 local.tee $4
 v128.const 0x00000000000000020000000000000002    [SIMD]
 v128.const 0xffffffffffffffffffffffffffffffff    [SIMD]
 v128.const 0x00000000000000000000000000000000    [SIMD]
 local.get $4
 i64x2.extract.lane 0    [SIMD]
 i64.const 2
 i64.lt.u
 select
 i64.const -1
 i64.const 0
 local.get $4
 i64x2.extract.lane 1    [SIMD]
 i64.const 2
 i64.lt.u
 select
 i64x2.replace.lane 1    [SIMD]
 v128.bitselect    [SIMD]
 i32.const 3
 i8x16.shl    [SIMD]
 v128.const 0x08080808080808080000000000000000    [SIMD]
 i8x16.swizzle    [SIMD]
 v128.const 0x07060504030201000706050403020100    [SIMD]
 v128.or    [SIMD]
 i8x16.swizzle    [SIMD]
 v128.store    [SIMD]
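The compare/select/bitselect run in this listing is an unsigned min against 2, built lane by lane. A rough Python model of one 64-bit lane, assuming the usual `v128.bitselect` semantics (take bits from the first operand where the mask is 1, otherwise from the second); the function names are illustrative:

```python
MASK64 = (1 << 64) - 1

def bitselect(v1: int, v2: int, c: int) -> int:
    """One 64-bit lane of v128.bitselect: bits of v1 where c is 1, bits of v2 elsewhere."""
    return (v1 & c) | (v2 & ~c & MASK64)

def synthesized_min_u(idx: int, limit: int = 2) -> int:
    """Model of the i64.lt.u + select sequence followed by v128.bitselect."""
    mask = MASK64 if idx < limit else 0   # i64.lt.u + select: all ones when idx < limit
    return bitselect(idx, limit, mask)    # keep idx when in range, otherwise the limit
```

For any unsigned 64-bit `idx` this matches `min(idx, 2)`, so out-of-range indices all collapse to 2 before the shift.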

For comparison, when the min intrinsic exists (i32x4.min.u in this listing):

 local.get $0
 local.get $1
 v128.load align:4    [SIMD]
 local.get $2
 v128.load align:4    [SIMD]
 v128.const 0x00000004000000040000000400000004    [SIMD]
 i32x4.min.u    [SIMD]
 i32.const 2
 i8x16.shl    [SIMD]
 v128.const 0x0c0c0c0c080808080404040400000000    [SIMD]
 i8x16.swizzle    [SIMD]
 v128.const 0x03020100030201000302010003020100    [SIMD]
 v128.or    [SIMD]
 i8x16.swizzle    [SIMD]
 v128.store    [SIMD]
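To make the index expansion easier to follow, here is a small Python model of the i32x4 sequence. It is only a sketch: `i8x16_swizzle` models the WebAssembly instruction (selectors of 16 or more yield 0), and `swizzle_i32x4` is a made-up name that walks through the same steps as the listing.

```python
import struct

def i8x16_swizzle(a, s):
    """i8x16.swizzle: pick a[s[i]] for each byte; selectors >= 16 produce 0."""
    return [a[i] if i < 16 else 0 for i in s]

def swizzle_i32x4(values, indices):
    data = list(struct.pack("<4I", *values))     # source vector as 16 little-endian bytes
    clamped = [min(i, 4) for i in indices]       # i32x4.min.u against splat(4)
    idx = list(struct.pack("<4I", *clamped))     # index vector viewed as 16 bytes
    scaled = [(b << 2) & 0xFF for b in idx]      # i8x16.shl 2: lane index -> byte offset
    # Swizzle with 0x0c0c0c0c080808080404040400000000: copy each lane's low byte across the lane.
    spread = i8x16_swizzle(scaled, [0] * 4 + [4] * 4 + [8] * 4 + [12] * 4)
    # v128.or with 0x03020100 per lane: add the byte position within the element.
    selectors = [b | o for b, o in zip(spread, [0, 1, 2, 3] * 4)]
    out = i8x16_swizzle(data, selectors)         # final i8x16.swizzle on the data
    return list(struct.unpack("<4I", bytes(out)))
```

For example, `swizzle_i32x4([10, 20, 30, 40], [3, 0, 7, 1])` gives `[40, 10, 0, 20]`: the out-of-range index 7 is clamped to 4, scaled to byte offset 16, and therefore selects zeros.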

@lewing lewing changed the title [wasm][aot] Optimize OP_WASM_SIMD_SWIZZLE for constant indices [wasm][aot] Optimize 64 bit const shuffles. Otherwise prefer vector swizzle. May 8, 2025
@lewing lewing removed the NO-MERGE The PR is not ready for merge yet (see discussion for detailed reasons) label May 8, 2025
lewing commented May 8, 2025

I made it fall back to the old code only for 64x2, but that case should really just be written by hand.
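One possible reading of "written by hand" for the constant 64x2 case is to precompute the 16 byte selectors at compile time and emit a single i8x16.swizzle. The Python sketch below only illustrates that idea; `const_selectors_i64x2` is a hypothetical helper, not something in the PR, and it assumes out-of-range lanes should produce zeros, as the clamped listings do.

```python
def const_selectors_i64x2(i0: int, i1: int) -> list:
    """Hypothetical helper: byte selectors for a 64x2 shuffle with constant lane indices."""
    selectors = []
    for lane in (i0, i1):
        if 0 <= lane < 2:
            # In-range lane: select its 8 consecutive bytes.
            selectors += [lane * 8 + k for k in range(8)]
        else:
            # Assumption: any selector >= 16 makes i8x16.swizzle emit 0 for that byte.
            selectors += [0xFF] * 8
    return selectors
```

With constant indices `(1, 0)` this yields bytes 8..15 followed by 0..7, which swaps the two lanes without the min, shift, and OR steps.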

Draft Pull Request was automatically closed for 30 days of inactivity. Please let us know if you'd like to reopen it.
