HW intrinsics API declaration is incorrect for Sse41.Insert() that operates on vector of 32-bit floats #10383

Closed
voinokin opened this issue May 27, 2018 · 1 comment · Fixed by dotnet/coreclr#17637
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI bug

Comments

@voinokin

The [V]INSERTPS operation differs from the similarly named operations that map to [V]PINSRW (SSE2+) and [V]PINSRB/D/Q (SSE4.1+).
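For contrast, here is a minimal scalar sketch of what PINSRD does. It assumes element 0 holds bits [31:0], and the function name pinsrd_model is hypothetical. The PINSR* immediate only picks which lane receives the scalar (imm8[1:0] for PINSRD). Every other lane keeps its value and nothing is zeroed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of PINSRD xmm, r/m32, imm8 (SSE4.1).
   Element 0 holds bits [31:0]. Only imm8[1:0] is used: it selects the
   single lane that receives src; the other lanes are left unchanged. */
static void pinsrd_model(int32_t dest[4], int32_t src, uint8_t imm8)
{
    assert(dest != NULL);
    dest[imm8 & 3u] = src;  /* no source-lane selector, no zero mask */
}
```

For example, calling pinsrd_model(v, 99, 2) on v = {1, 2, 3, 4} leaves v = {1, 2, 99, 4}.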

Here's how it is declared in the API:

        /// <summary>
        /// __m128 _mm_insert_ps (__m128 a, __m128 b, const int imm8)
        ///   INSERTPS xmm, xmm/m32, imm8
        /// </summary>
        public static Vector128<float> Insert(Vector128<float> value, float data, byte index) => Insert(value, data, index);

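In fact, the operation does one of two things. It either loads a 32-bit value from memory (m32) and inserts it into the first XMM operand at the specified position, or it inserts a selected 32-bit element of the second XMM operand into the first XMM operand at the specified position.
Additionally, it can zero any or all elements of the result. The immediate therefore packs three fields:
- imm8[7:6] selects the source element (register form only).
- imm8[5:4] selects the destination element.
- imm8[3:0] is the zero mask.

A scalar float data parameter cannot express picking an element from a second vector.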
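The three control fields of the INSERTPS immediate can be packed as in the sketch below. The function name insertps_imm8 is hypothetical. The same packing is provided by the _MM_MK_INSERTPS_NDX helper macro found in common <smmintrin.h> headers, but check your compiler's header before relying on it.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: packs the INSERTPS control fields into imm8.
   bits [7:6] = source element (only honored for a register source)
   bits [5:4] = destination element
   bits [3:0] = zero mask (bit i set => result element i becomes 0.0f) */
static uint8_t insertps_imm8(unsigned src_elem, unsigned dst_elem, unsigned zmask)
{
    assert(src_elem < 4 && dst_elem < 4 && zmask < 16);
    return (uint8_t)((src_elem << 6) | (dst_elem << 4) | zmask);
}
```

For example, taking source element 2, writing it to destination element 1, and zeroing element 3 gives insertps_imm8(2, 1, 0x8) == 0x98.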

Here's how it is implemented in the CPU:

INSERTPS (128-bit Legacy SSE version)

IF (SRC = REG) THEN COUNT_S←imm8[7:6]
    ELSE COUNT_S←0
COUNT_D ←imm8[5:4]
ZMASK ←imm8[3:0]
CASE (COUNT_S) OF
    0: TMP←SRC[31:0]
    1: TMP←SRC[63:32]
    2: TMP←SRC[95:64]
    3: TMP←SRC[127:96]
ESAC;
CASE (COUNT_D) OF
    0: TMP2[31:0]←TMP
        TMP2[127:32] ←DEST[127:32]
    1: TMP2[63:32]←TMP
        TMP2[31:0] ←DEST[31:0]
        TMP2[127:64] ←DEST[127:64]
    2: TMP2[95:64]←TMP
        TMP2[63:0] ←DEST[63:0]
        TMP2[127:96] ←DEST[127:96]
    3: TMP2[127:96]←TMP
        TMP2[95:0] ←DEST[95:0]
ESAC;
IF (ZMASK[0] = 1) THEN DEST[31:0]←00000000H
    ELSE DEST[31:0]←TMP2[31:0]
IF (ZMASK[1] = 1) THEN DEST[63:32]←00000000H
    ELSE DEST[63:32]←TMP2[63:32]
IF (ZMASK[2] = 1) THEN DEST[95:64]←00000000H
    ELSE DEST[95:64]←TMP2[95:64]
IF (ZMASK[3] = 1) THEN DEST[127:96]←00000000H
    ELSE DEST[127:96]←TMP2[127:96]
DEST[MAXVL-1:128] (Unmodified)
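The legacy pseudocode above can be transcribed into a small scalar C model. It is a sketch, not a reference implementation. Assumptions:
- Element 0 holds bits [31:0].
- The name insertps_model and its src_is_reg flag are hypothetical.
- Only the low 128 bits are modeled, since the legacy form leaves the upper bits unmodified.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of legacy-SSE INSERTPS xmm1, xmm2/m32, imm8.
   dest is both the first source and the destination (destructive form).
   src points at four floats for the register form, or at the single m32
   value for the memory form (src_is_reg == 0). */
static void insertps_model(float dest[4], const float *src, int src_is_reg,
                           uint8_t imm8)
{
    assert(dest != NULL && src != NULL);

    /* COUNT_S = imm8[7:6] for a register source, otherwise 0. */
    unsigned count_s = src_is_reg ? (imm8 >> 6) & 3u : 0u;
    unsigned count_d = (imm8 >> 4) & 3u;  /* COUNT_D = imm8[5:4] */
    unsigned zmask = imm8 & 0xFu;         /* ZMASK = imm8[3:0] */

    /* TMP2: copy of DEST with element COUNT_D replaced by TMP. */
    float tmp2[4] = {dest[0], dest[1], dest[2], dest[3]};
    tmp2[count_d] = src[count_s];

    /* Apply ZMASK element by element. */
    for (unsigned i = 0; i < 4; i++)
        dest[i] = ((zmask >> i) & 1u) ? 0.0f : tmp2[i];
}
```

With dest = {1, 2, 3, 4}, a register source {10, 20, 30, 40} and imm8 = 0x98, the model produces {1, 30, 3, 0}. The same imm8 with a memory source holding 10 produces {1, 10, 3, 0}, because imm8[7:6] is ignored in that form.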

VINSERTPS (VEX.128 and EVEX encoded version)

IF (SRC = REG) THEN COUNT_S←imm8[7:6]
    ELSE COUNT_S←0
COUNT_D ← imm8[5:4]
ZMASK ← imm8[3:0]
CASE (COUNT_S) OF
    0: TMP←SRC2[31:0]
    1: TMP←SRC2[63:32]
    2: TMP←SRC2[95:64]
    3: TMP←SRC2[127:96]
ESAC;
CASE (COUNT_D) OF
    0: TMP2[31:0]←TMP
        TMP2[127:32] ← SRC1[127:32]
    1: TMP2[63:32]←TMP
        TMP2[31:0] ← SRC1[31:0]
        TMP2[127:64] ← SRC1[127:64]
    2: TMP2[95:64]←TMP
        TMP2[63:0] ← SRC1[63:0]
        TMP2[127:96] ← SRC1[127:96]
    3: TMP2[127:96]←TMP
        TMP2[95:0] ← SRC1[95:0]
ESAC;
IF (ZMASK[0] = 1) THEN DEST[31:0]←00000000H
    ELSE DEST[31:0]←TMP2[31:0]
IF (ZMASK[1] = 1) THEN DEST[63:32]←00000000H
    ELSE DEST[63:32]←TMP2[63:32]
IF (ZMASK[2] = 1) THEN DEST[95:64]←00000000H
    ELSE DEST[95:64]←TMP2[95:64]
IF (ZMASK[3] = 1) THEN DEST[127:96]←00000000H
    ELSE DEST[127:96]←TMP2[127:96]
DEST[MAXVL-1:128] ← 0
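The VEX form differs from the legacy form in two visible ways. SRC1 is a separate input, so it is not overwritten, and the bits above 127 are zeroed rather than preserved. A minimal sketch under stated assumptions:
- MAXVL = 256, i.e. an AVX machine without AVX-512, so dest is viewed as eight floats.
- The name vinsertps_model is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of VEX.128 VINSERTPS xmm1, xmm2, xmm3/m32, imm8.
   dest is viewed as 8 floats (a 256-bit YMM register, MAXVL = 256 assumed).
   src1 is read only; src2 points at four floats (register form) or at the
   single m32 value (memory form, src2_is_reg == 0). */
static void vinsertps_model(float dest[8], const float src1[4],
                            const float *src2, int src2_is_reg, uint8_t imm8)
{
    assert(dest != NULL && src1 != NULL && src2 != NULL);

    unsigned count_s = src2_is_reg ? (imm8 >> 6) & 3u : 0u;  /* imm8[7:6] */
    unsigned count_d = (imm8 >> 4) & 3u;                     /* imm8[5:4] */
    unsigned zmask = imm8 & 0xFu;                            /* imm8[3:0] */

    /* TMP2 is built from SRC1, not from DEST. */
    float tmp2[4] = {src1[0], src1[1], src1[2], src1[3]};
    tmp2[count_d] = src2[count_s];

    for (unsigned i = 0; i < 4; i++)
        dest[i] = ((zmask >> i) & 1u) ? 0.0f : tmp2[i];
    for (unsigned i = 4; i < 8; i++)  /* DEST[MAXVL-1:128] <- 0 */
        dest[i] = 0.0f;
}
```

With src1 = {1, 2, 3, 4}, a register src2 = {10, 20, 30, 40} and imm8 = 0xC2, dest becomes {40, 0, 3, 4, 0, 0, 0, 0}, and src1 is unchanged.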

voinokin changed the title from "HW intrinsics API declaration is incorrect for Sse41.Insert() that operates on vector or 32-bit floats" to "HW intrinsics API declaration is incorrect for Sse41.Insert() that operates on vector of 32-bit floats" on May 27, 2018
@fiigii
Contributor

fiigii commented May 28, 2018

Ah, good catch. Yes, SSE4.1 insertps indeed has different semantics from the other types. We need to fix this API (perhaps dotnet/coreclr#17637 is a good opportunity).

cc @CarolEidt @tannergooding

msftgits transferred this issue from dotnet/coreclr on Jan 31, 2020
msftgits added this to the 3.0 milestone on Jan 31, 2020
ghost locked this issue as resolved and limited conversation to collaborators on Dec 16, 2020
3 participants