This repository was archived by the owner on Dec 22, 2021. It is now read-only.
i64x2.min_s and i64x2.max_s instructions #417
Closed
Introduction
This is a proposal to add 64-bit variants of the existing min_s and max_s instructions. Only x86 processors with AVX512 natively support these instructions, but ARMv7 NEON, ARM64 and x86 with SSE4.2 or AVX can efficiently emulate them with 2-4 instructions.

Applications
Mapping to Common Instruction Sets
This section illustrates how the new WebAssembly instructions can be lowered on common instruction sets. These patterns are provided only for convenience; compliant WebAssembly implementations do not have to follow the same code generation patterns.
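As a baseline for comparing the lowerings in this section, here is a minimal scalar sketch of the lane-wise semantics, written as "compare a > b, then select" to mirror most of the patterns. The `i64x2` struct and the function names are hypothetical stand-ins for illustration; they are not part of the proposal or of any engine.

```c
#include <stdint.h>

/* Hypothetical stand-in for a 128-bit vector viewed as two signed 64-bit lanes. */
typedef struct { int64_t lane[2]; } i64x2;

/* Lane-wise signed minimum: where a > b take b, otherwise take a. */
static i64x2 i64x2_min_s(i64x2 a, i64x2 b) {
    i64x2 y;
    for (int i = 0; i < 2; i++)
        y.lane[i] = (a.lane[i] > b.lane[i]) ? b.lane[i] : a.lane[i];
    return y;
}

/* Lane-wise signed maximum: where a > b take a, otherwise take b. */
static i64x2 i64x2_max_s(i64x2 a, i64x2 b) {
    i64x2 y;
    for (int i = 0; i < 2; i++)
        y.lane[i] = (a.lane[i] > b.lane[i]) ? a.lane[i] : b.lane[i];
    return y;
}
```

Each lane is processed independently, so a lowering is equivalent to this sketch if it produces the same result for every pair of 64-bit inputs in each lane.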
x86/x86-64 processors with AVX512F and AVX512VL instruction sets
y = i64x2.min_s(a, b) is lowered to VPMINSQ xmm_y, xmm_a, xmm_b
y = i64x2.max_s(a, b) is lowered to VPMAXSQ xmm_y, xmm_a, xmm_b

x86/x86-64 processors with AVX instruction set
y = i64x2.min_s(a, b) (y is not a and y is not b) is lowered to:
    VPCMPGTQ xmm_y, xmm_a, xmm_b
    VPBLENDVB xmm_y, xmm_a, xmm_b, xmm_y
y = i64x2.max_s(a, b) (y is not a and y is not b) is lowered to:
    VPCMPGTQ xmm_y, xmm_a, xmm_b
    VPBLENDVB xmm_y, xmm_b, xmm_a, xmm_y

x86/x86-64 processors with SSE4.2 instruction set
y = i64x2.min_s(a, b) (y is not b and a/b/y are not in xmm0) is lowered to:
    MOVDQA xmm0, xmm_a
    MOVDQA xmm_y, xmm_a
    PCMPGTQ xmm0, xmm_b
    PBLENDVB xmm_y, xmm_b
y = i64x2.max_s(a, b) (y is not a and a/b/y are not in xmm0) is lowered to:
    MOVDQA xmm0, xmm_a
    MOVDQA xmm_y, xmm_b
    PCMPGTQ xmm0, xmm_b
    PBLENDVB xmm_y, xmm_a

x86/x86-64 processors with SSE4.1 instruction set
Based on this answer by user aqrit on Stack Overflow
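SSE4.1 lacks PCMPGTQ, so the sequence in this section builds the 64-bit signed a > b mask from 32-bit operations (PSUBQ, PCMPEQD, PAND, PCMPGTD, POR, PSHUFD). The following is a scalar sketch of that idea for a single 64-bit lane. The function name is hypothetical, and only the high dword is modelled because the 0xF5 shuffle discards the low one.

```c
#include <stdint.h>

/* Scalar model of one 64-bit lane.
   Returns all-ones if a > b (signed), otherwise zero. */
static uint64_t gt_mask_s64_via_32(int64_t a, int64_t b) {
    uint64_t ua = (uint64_t)a, ub = (uint64_t)b;
    uint32_t a_hi = (uint32_t)(ua >> 32);
    uint32_t b_hi = (uint32_t)(ub >> 32);

    /* PSUBQ: b - a, wrapping modulo 2^64; keep the high dword. */
    uint32_t d_hi = (uint32_t)((ub - ua) >> 32);

    /* PCMPEQD on the high dwords. */
    uint32_t eq_hi = (a_hi == b_hi) ? 0xFFFFFFFFu : 0u;

    /* PCMPGTD on the high dwords. Flipping the sign bit turns a signed
       32-bit comparison into an unsigned one, which keeps this portable C. */
    uint32_t gt_hi = ((a_hi ^ 0x80000000u) > (b_hi ^ 0x80000000u))
                         ? 0xFFFFFFFFu : 0u;

    /* PAND + POR: if the high dwords are equal, the borrow out of the low
       dwords (visible as d_hi == all-ones) decides; otherwise gt_hi decides. */
    uint32_t m_hi = (d_hi & eq_hi) | gt_hi;

    /* PSHUFD 0xF5: replicate the high dword across the whole 64-bit lane. */
    return ((uint64_t)m_hi << 32) | m_hi;
}
```

When the high dwords are equal, the high dword of b - a is either zero or all-ones, so the combined value is always a clean all-ones or all-zeros mask that PBLENDVB can consume.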
y = i64x2.min_s(a, b) (y is not a and y is not b and a/b/y are not in xmm0) is lowered to:
    MOVDQA xmm0, xmm_b
    MOVDQA xmm_y, xmm_a
    PSUBQ xmm0, xmm_a
    PCMPEQD xmm_y, xmm_b
    PAND xmm0, xmm_y
    MOVDQA xmm_y, xmm_a
    PCMPGTD xmm_y, xmm_b
    POR xmm0, xmm_y
    MOVDQA xmm_y, xmm_a
    PSHUFD xmm0, xmm0, 0xF5
    PBLENDVB xmm_y, xmm_b
y = i64x2.max_s(a, b) (y is not a and y is not b and a/b/y are not in xmm0) is lowered to:
    MOVDQA xmm0, xmm_b
    MOVDQA xmm_y, xmm_a
    PSUBQ xmm0, xmm_a
    PCMPEQD xmm_y, xmm_b
    PAND xmm0, xmm_y
    MOVDQA xmm_y, xmm_a
    PCMPGTD xmm_y, xmm_b
    POR xmm0, xmm_y
    MOVDQA xmm_y, xmm_b
    PSHUFD xmm0, xmm0, 0xF5
    PBLENDVB xmm_y, xmm_a

x86/x86-64 processors with SSE2 instruction set
Based on this answer by user aqrit on Stack Overflow
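SSE2 has no PBLENDVB, so the sequence in this section performs the final selection with PAND, PANDN and POR instead. Here is a minimal scalar sketch of that bitwise select on one 64-bit lane. The function name is hypothetical, and the mask is assumed to be all-ones or all-zeros, as produced by the preceding compare steps.

```c
#include <stdint.h>

/* Bitwise select: returns if_set where mask bits are 1, if_clear where they are 0.
   Assumes mask is all-ones or all-zeros for a whole-lane choice. */
static uint64_t bitselect64(uint64_t mask, uint64_t if_set, uint64_t if_clear) {
    uint64_t kept_set   = if_set & mask;     /* PAND:  keep if_set under the mask   */
    uint64_t kept_clear = ~mask & if_clear;  /* PANDN: keep if_clear outside of it  */
    return kept_set | kept_clear;            /* POR:   merge the two halves         */
}
```

With a mask that is set where a > b, the minimum corresponds to `bitselect64(mask, b, a)` and the maximum to `bitselect64(mask, a, b)`.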
y = i64x2.min_s(a, b) (y is not a and y is not b) is lowered to:
    MOVDQA xmm_y, xmm_b
    MOVDQA xmm_tmp, xmm_a
    PSUBQ xmm_y, xmm_a
    PCMPEQD xmm_tmp, xmm_b
    PAND xmm_y, xmm_tmp
    MOVDQA xmm_tmp, xmm_a
    PCMPGTD xmm_tmp, xmm_b
    POR xmm_y, xmm_tmp
    MOVDQA xmm_tmp, xmm_b
    PSHUFD xmm_y, xmm_y, 0xF5
    PAND xmm_tmp, xmm_y
    PANDN xmm_y, xmm_a
    POR xmm_y, xmm_tmp
y = i64x2.max_s(a, b) (y is not a and y is not b) is lowered to:
    MOVDQA xmm_y, xmm_b
    MOVDQA xmm_tmp, xmm_a
    PSUBQ xmm_y, xmm_a
    PCMPEQD xmm_tmp, xmm_b
    PAND xmm_y, xmm_tmp
    MOVDQA xmm_tmp, xmm_a
    PCMPGTD xmm_tmp, xmm_b
    POR xmm_y, xmm_tmp
    MOVDQA xmm_tmp, xmm_a
    PSHUFD xmm_y, xmm_y, 0xF5
    PAND xmm_tmp, xmm_y
    PANDN xmm_y, xmm_b
    POR xmm_y, xmm_tmp

ARM64 processors
y = i64x2.min_s(a, b) (y is not a and y is not b) is lowered to:
    CMGT Vy.2D, Va.2D, Vb.2D
    BSL Vy.16B, Vb.16B, Va.16B
y = i64x2.max_s(a, b) (y is not a and y is not b) is lowered to:
    CMGT Vy.2D, Va.2D, Vb.2D
    BSL Vy.16B, Va.16B, Vb.16B

ARMv7 processors with NEON instruction set
Based on this answer by user aqrit on Stack Overflow
y = i64x2.min_s(a, b) (y is not a and y is not b) is lowered to:
    VQSUB.S64 Qy, Qb, Qa
    VSHR.S64 Qy, Qy, #63
    VBSL Qy, Qb, Qa
y = i64x2.max_s(a, b) (y is not a and y is not b) is lowered to:
    VQSUB.S64 Qy, Qb, Qa
    VSHR.S64 Qy, Qy, #63
    VBSL Qy, Qa, Qb
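
The ARMv7 sequence derives its selection mask from the sign of a saturating subtraction rather than from a compare instruction. The following scalar sketch of one 64-bit lane illustrates why saturation matters: a wrapping b - a would report the wrong sign for inputs such as a = INT64_MIN, b = INT64_MAX. All function names here are hypothetical.

```c
#include <stdint.h>

/* Model of VQSUB.S64: b - a clamped to the int64_t range. */
static int64_t sat_sub_s64(int64_t b, int64_t a) {
    if (a > 0 && b < INT64_MIN + a) return INT64_MIN;  /* would underflow */
    if (a < 0 && b > INT64_MAX + a) return INT64_MAX;  /* would overflow  */
    return b - a;
}

/* Model of VSHR.S64 #63: all-ones if x is negative, otherwise zero. */
static uint64_t sign_mask_s64(int64_t x) {
    return (x < 0) ? UINT64_MAX : 0;
}

/* Minimum: the mask is set exactly when b < a, so pick b there, else a
   (mirrors VBSL Qy, Qb, Qa). */
static int64_t min_s64_neon_style(int64_t a, int64_t b) {
    uint64_t mask = sign_mask_s64(sat_sub_s64(b, a));
    return mask ? b : a;
}

/* Maximum: same mask, operands of the select swapped
   (mirrors VBSL Qy, Qa, Qb). */
static int64_t max_s64_neon_style(int64_t a, int64_t b) {
    uint64_t mask = sign_mask_s64(sat_sub_s64(b, a));
    return mask ? a : b;
}
```

Because the saturated difference is negative exactly when b < a, the shifted result is a whole-lane mask equivalent to the one CMGT produces on ARM64.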