This repository was archived by the owner on Dec 22, 2021. It is now read-only.
@AndrewScheidecker mentioned in his review of #1 the possibility of including vectorized rotate instructions to match the existing scalar instructions. They would have these signatures:
i8x16.rotl(x: v128, n: i32) -> v128
i16x8.rotl(x: v128, n: i32) -> v128
i32x4.rotl(x: v128, n: i32) -> v128
i64x2.rotl(x: v128, n: i32) -> v128
i8x16.rotr(x: v128, n: i32) -> v128
i16x8.rotr(x: v128, n: i32) -> v128
i32x4.rotr(x: v128, n: i32) -> v128
i64x2.rotr(x: v128, n: i32) -> v128
The semantics would be to rotate the lanes independently by the scalar n. This can be expressed in terms of the vectorized shift operators:
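The usual identity is rotl(x, n) = shl(x, n) | shr_u(x, w - n), where w is the lane width in bits. Below is a minimal Python sketch of that identity, modeling a vector as a list of unsigned lane values. The helper names and the choice to take n modulo the lane width (mirroring the scalar rotate instructions) are assumptions made for illustration, not part of the proposal text.

```python
def shl(lanes, n, width):
    """Lane-wise left shift; bits shifted out of a lane are dropped."""
    mask = (1 << width) - 1
    return [(x << n) & mask for x in lanes]

def shr_u(lanes, n, width):
    """Lane-wise logical (zero-filling) right shift.

    Lanes are assumed to hold values in the range [0, 2**width).
    """
    return [x >> n for x in lanes]

def rotl(lanes, n, width):
    """Rotate every lane left by the same scalar n, built from two shifts."""
    n %= width  # assumption: the count wraps at the lane width
    left = shl(lanes, n, width)
    right = shr_u(lanes, width - n, width)  # yields 0 when n == 0
    return [a | b for a, b in zip(left, right)]

def rotr(lanes, n, width):
    """Rotate right by n, expressed as a rotate left by (width - n)."""
    return rotl(lanes, (width - n % width) % width, width)
```

For example, `rotl([0x12345678] * 4, 8, 32)` models `i32x4.rotl` with n = 8 and moves the top byte of each lane to the bottom. A real lowering would need the same two shifts plus an OR per vector, which is the "naive lowering" cost discussed in the questions.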
Questions:
Are vectorized rotate instructions available in SIMD instruction sets we care about?
SSE+AVX does not have vectorized rotates.
AMD's XOP extension does have vectorized rotates, but it looks like it's deprecated as their new processors no longer support it.
Are there plausible applications for vectorized rotates?
Hashing is what I had in mind.
It looks like Blake2b is designed around the lack of a rotate in SSE (see section 2.2 in blake2.pdf). However, it does still have two 64x2 rotates by 63 bits every round, which it implements as a xor, a shift, and an add. Given that this is intended to be faster than the naive lowering of a rotation to a xor and two shifts, it probably wouldn't use a rotation operator. The other "rotations" it uses are by constant multiples of 8, so they can be implemented as swizzles.
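To make the two tricks concrete, here is a small Python sketch on a single 64-bit lane. It assumes the 63-bit rotate is a rotate right (equivalently a rotate left by 1), so that x + x stands in for the left shift. The function names are hypothetical, and the byte permutation is only a scalar model of what a swizzle would do per lane.

```python
MASK64 = (1 << 64) - 1

def rotr64(x, n):
    """Reference rotate right of a 64-bit value, for comparison."""
    n %= 64
    return ((x >> n) | (x << (64 - n))) & MASK64

def rotr63_xor_shift_add(x):
    """Rotate right by 63 using one add, one shift, and one xor."""
    doubled = (x + x) & MASK64  # wrapping add, equivalent to x << 1
    top_bit = x >> 63           # logical shift brings the top bit to bit 0
    return doubled ^ top_bit    # the two halves never overlap, so xor acts as or

def rotr_bytes(x, n):
    """Rotate right by a multiple of 8 bits by permuting the lane's bytes."""
    assert n % 8 == 0
    k = (n // 8) % 8
    b = x.to_bytes(8, "little")  # b[0] is the least significant byte
    return int.from_bytes(b[k:] + b[:k], "little")
```

Both helpers agree with `rotr64` on every input, for instance `rotr_bytes(0x0123456789ABCDEF, 16)` gives `0xCDEF0123456789AB`. Neither needs a dedicated rotate instruction, which is why this kind of code would gain little from one.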
Both SHA2 and SHA3 use rotates. However, x86 and ARM already have special instructions for SHA2, and SHA3 is designed for efficient hardware implementation. Maybe the right approach there is to eventually add higher-level SHA2 and SHA3 instructions that can leverage whatever hardware support there may be (or at least an efficient native software implementation).