Add support for AVX512 BF16 instructions #5483

Closed
prasun3 opened this issue May 5, 2022 · 20 comments · Fixed by #5489

@prasun3
Contributor

prasun3 commented May 5, 2022

Add a pointer to any prior users list discussion.
NA

Is your feature request related to a problem? Please describe.
Unable to run programs that use BF16 instructions.

Describe the solution you'd like
Enable programs that use BF16 instructions

Do you have any implementation in mind for this feature?
Add encoder/decoder support and tests.

Describe alternatives you've considered
None

Additional context
None

@prasun3 prasun3 self-assigned this May 5, 2022
@prasun3
Contributor Author

prasun3 commented May 5, 2022

There are three BF16 instructions. One of them writes to a "smaller" destination register. I am not sure how to encode this size information in the decode tables. I used Ve but that sets the destination reg size to be the same as the source reg size. I looked at a few other OPSZ_ enums but couldn't figure out one that works.

VCVTNEPS2BF16—Convert Packed Single Data to Packed BF16 Data

EVEX.128.F3.0F38.W0 72 /r VCVTNEPS2BF16 xmm1{k1}{z}, xmm2/m128/m32bcst
EVEX.256.F3.0F38.W0 72 /r VCVTNEPS2BF16 xmm1{k1}{z}, ymm2/m256/m32bcst
EVEX.512.F3.0F38.W0 72 /r VCVTNEPS2BF16 ymm1{k1}{z}, zmm2/m512/m32bcst

Op/En   Tuple   Operand 1       Operand 2
A       Full    ModRM:reg (w)   ModRM:r/m (r)

Example

This is the objdump output for the different operand sizes

 62 f2 7e 0b 72 d3       vcvtneps2bf16 xmm2{k3},xmm3
 62 d2 7e 2b 72 d3       vcvtneps2bf16 xmm2{k3},ymm11
 62 92 7e 4b 72 d7       vcvtneps2bf16 ymm2{k3},zmm31

This is what the DR disassembler produces if I set the destination operand to Ve. Note that instead of xmm2 and ymm2 in the last two cases, it picks ymm2 and zmm2.

 62 f2 7e 0b 72 d3    vcvtneps2bf16 xmm2 {k3}, xmm3
 62 d2 7e 2b 72 d3    vcvtneps2bf16 ymm2 {k3}, ymm11
 62 92 7e 4b 72 d7    vcvtneps2bf16 zmm2 {k3}, zmm31
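For reference, the expected destination follows from the EVEX vector-length bits alone. The sketch below is not DynamoRIO code and the helper names are hypothetical. It reads the L'L field (bits 6:5 of the fourth EVEX byte, `0b`, `2b`, and `4b` in the examples above) and derives the source width, the half-width destination, and the register class that objdump prints.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers for illustration only; not part of DynamoRIO. */

/* Source vector width in bytes from EVEX.L'L (bits 6:5 of the fourth
 * EVEX byte). Returns 0 for L'L == 0b11, which this sketch does not handle. */
static unsigned evex_src_bytes(uint8_t p2)
{
    unsigned ll = (p2 >> 5) & 0x3;
    return ll == 3 ? 0 : 16u << ll;
}

/* vcvtneps2bf16 narrows 32-bit floats to 16-bit bfloat16 values, so the
 * destination holds half as many bytes as the source. */
static unsigned vcvtneps2bf16_dst_bytes(uint8_t p2)
{
    unsigned src = evex_src_bytes(p2);
    assert(src != 0);
    return src / 2;
}

/* Register class in the Intel mnemonic: xmm for the 128-bit and 256-bit
 * forms, ymm for the 512-bit form. */
static const char *vcvtneps2bf16_dst_class(uint8_t p2)
{
    return evex_src_bytes(p2) == 64 ? "ymm" : "xmm";
}
```

With the three encodings above, this gives 8, 16, and 32 destination bytes, so only the 128-bit form has a destination register with the same name as the source.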

Any idea what I should set the operand size to? This is what I have currently

  },{ /* evex_W_ext 272 */
    {OP_vcvtneps2bf16, 0xf3387208, "vcvtneps2bf16", Ve, xx, KEd, We, xx, mrm|evex|ttfv, x, END_LIST},
    {OP_vcvtneps2bf16, 0xf3387208, "vcvtneps2bf16", Ve, xx, KEd, We, xx, mrm|evex|ttfv, x, END_LIST},
    {INVALID, 0, "(bad)", xx, xx, xx, xx, xx, no, x, NA},
    {INVALID, 0, "(bad)", xx, xx, xx, xx, xx, no, x, NA},

@khuey
Contributor

khuey commented May 5, 2022

I think you want OPSZ_half_16_vex32_evex64?

@prasun3
Contributor Author

prasun3 commented May 5, 2022

I don't see it in the enum (https://github.com/DynamoRIO/dynamorio/blob/master/core/ir/opnd_api.h#L69). Do you mean I should add it?

@khuey
Contributor

khuey commented May 5, 2022

It's in the set of weird extra OPSZ values at

OPSZ_half_16_vex32_evex64, /* 64 bits, but can be half of XMM register;

prasun3 added a commit that referenced this issue May 6, 2022
**AVX512 bfloat16 instructions**

These are the three bfloat16 instructions.

VCVTNE2PS2BF16—Convert Two Packed Single Data to One Packed BF16 Data
```
EVEX.128.F2.0F38.W0 72 /r VCVTNE2PS2BF16 xmm1{k1}{z}, xmm2, xmm3/m128/m32bcst
EVEX.256.F2.0F38.W0 72 /r VCVTNE2PS2BF16 ymm1{k1}{z}, ymm2, ymm3/m256/m32bcst
EVEX.512.F2.0F38.W0 72 /r VCVTNE2PS2BF16 zmm1{k1}{z}, zmm2, zmm3/m512/m32bcst

Op/En   Tuple   Operand 1       Operand 2       Operand 3
A       Full    ModRM:reg (w)   EVEX.vvvv (r)   ModRM:r/m (r)
```
VCVTNEPS2BF16—Convert Packed Single Data to Packed BF16 Data
```
EVEX.128.F3.0F38.W0 72 /r VCVTNEPS2BF16 xmm1{k1}{z}, xmm2/m128/m32bcst
EVEX.256.F3.0F38.W0 72 /r VCVTNEPS2BF16 xmm1{k1}{z}, ymm2/m256/m32bcst
EVEX.512.F3.0F38.W0 72 /r VCVTNEPS2BF16 ymm1{k1}{z}, zmm2/m512/m32bcst

Op/En   Tuple   Operand 1       Operand 2
A       Full    ModRM:reg (w)   ModRM:r/m (r)
```

VDPBF16PS—Dot Product of BF16 Pairs Accumulated into Packed Single Precision
```
EVEX.128.F3.0F38.W0 52 /r VDPBF16PS xmm1{k1}{z}, xmm2, xmm3/m128/m32bcst
EVEX.256.F3.0F38.W0 52 /r VDPBF16PS ymm1{k1}{z}, ymm2, ymm3/m256/m32bcst
EVEX.512.F3.0F38.W0 52 /r VDPBF16PS zmm1{k1}{z}, zmm2, zmm3/m512/m32bcst

Op/En   Tuple   Operand 1       Operand 2       Operand 3
A       Full    ModRM:reg (w)   EVEX.vvvv (r)   ModRM:r/m (r)
```

**List of places to update**

From https://github.com/DynamoRIO/dynamorio/blob/master/core/ir/x86/opcode_api.h#L53

```
 * When adding new instructions, be sure to update all of these places:
 *   1) decode_table op_instr array
 *   2) decode_table decoding table entries
 *   3) OP_ enum (here) via x86opnums.pl
 *   4) update OP_LAST at end of enum here
 *   5) decode_fast tables if necessary (they are conservative)
 *   6) instr_create macros
 *   7) suite/tests/api/ir* tests
 *   8) add binutils tests in third_party/binutils/test_decenc

```

**Step 1: update `op_instr` array**

Added entries to `op_instr`. These point directly to `evex_Wb_extensions` since these instructions only have `evex` encoding.

**Step 2: add decode_table entries**

   - updated `third_byte_38` table to point to `prefix_extensions` since these instructions have common opcodes and differ in prefix.
   - added entries in `prefix_extensions` to point to appropriate vex/evex entries
   - added entries in `evex_Wb_extensions`

The instructions `VCVTNEPS2BF16` and `VCVTNE2PS2BF16` have three byte opcodes starting with `0f 38` so the decoder looks at `third_byte_38[third_byte_38_index[opcode]]`. Since these instructions have the same opcode (`72`) and differ only in the prefix (`f2/f3`), we need to point the `third_byte_38` to `prefix_extensions` which in turn points to the appropriate `EVEX_Wb` entries.

The instruction `VDPBF16PS` has the same opcode (`52`) as the VNNI instruction `vpdpwssd`; they differ only in the prefix (`F3` vs. `66`). We need to update that entry to point to `prefix_extensions` instead of `e_vex_extensions`. This orphans the `e_vex_extensions` entry (`e_vex ext 151`). Should we remove it?

Updated opcodes for invalid entries in e_vex ext 151 and 152 for consistency.
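The prefix-based dispatch described in this step can be pictured with a simplified model. This is only a sketch: the type and table names are hypothetical, the real DynamoRIO tables have more columns and chain into further tables, and any slot not discussed here is left as NULL rather than claimed to be invalid.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified, hypothetical model of prefix-based dispatch for EVEX
 * instructions in the 0f 38 opcode map. Not DynamoRIO's actual tables. */
typedef enum { PFX_NONE, PFX_F3, PFX_66, PFX_F2, PFX_COUNT } mandatory_prefix_t;

typedef struct {
    uint8_t opcode;               /* third opcode byte, after 0f 38 */
    const char *name[PFX_COUNT];  /* NULL: not modeled in this sketch */
} prefix_row_t;

static const prefix_row_t evex_0f38_rows[] = {
    { 0x52, { [PFX_F3] = "vdpbf16ps", [PFX_66] = "vpdpwssd" } },
    { 0x72, { [PFX_F3] = "vcvtneps2bf16", [PFX_F2] = "vcvtne2ps2bf16" } },
};

/* Select the instruction by opcode byte first, then by mandatory prefix. */
static const char *lookup_evex_0f38(uint8_t opcode, mandatory_prefix_t pfx)
{
    assert(pfx < PFX_COUNT);
    size_t n = sizeof(evex_0f38_rows) / sizeof(evex_0f38_rows[0]);
    for (size_t i = 0; i < n; i++) {
        if (evex_0f38_rows[i].opcode == opcode)
            return evex_0f38_rows[i].name[pfx];
    }
    return NULL;
}
```

Looking up opcode `0x72` with `PFX_F2` yields `vcvtne2ps2bf16` and with `PFX_F3` yields `vcvtneps2bf16`, which is the reason the opcode byte alone cannot select the entry.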

**Step 3: add OP_ enums**

Done

**Step 4: update OP_LAST**

Not needed since OP_LAST already points to the last enum.

**Step 5: decode_fast tables if necessary**

Not done

**Step 6: instr_create macros**

Added `1dst_3src` macros for `VCVTNE2PS2BF16` and `VDPBF16PS` since they write to operand 1 and read from mask register, operand 2, and operand 3.

Added `1dst_2src` macro for `VCVTNEPS2BF16` since it writes to operand 1 and reads from mask register and operand 2.

**Step 7: suite/tests/api/ir tests**

Added tests in ir_x86_3args_avx512_evex_mask.h and ir_x86_4args_avx512_evex_mask_C.h.

Added manual tests in ir_x86.c but these have two issues:
   - operand size of `vcvtneps2bf16` is incorrect. I tried `OPSZ_half_16_vex32_evex64` and `OPSZ_half_16_vex32` but did not get expected values
   - not sure how to encode broadcast or zeroctl so tests like the following are failing

```
vcvtne2ps2bf16  (%r9){1to16}, %zmm29, %zmm30   #AVX512_BF16 BROADCAST_EN
vcvtne2ps2bf16  -8192(%rdx){1to16}, %zmm29, %zmm30{%k7}{z}   #AVX512_BF16 Disp8 BROADCAST_EN MASK_ENABLING ZEROCTL
```

**Step 8: binutils tests**

Pending

Issue: #5483
@prasun3
Contributor Author

prasun3 commented May 6, 2022

Thanks for the info. I tried it but I am still seeing the same output. I also tried OPSZ_half_16_vex32 but did not see the expected results.

I am also not sure how to encode broadcast or zeroctl so tests like the following are failing. Is there some API to set these attributes?

vcvtne2ps2bf16  (%r9){1to16}, %zmm29, %zmm30   #AVX512_BF16 BROADCAST_EN
vcvtne2ps2bf16  -8192(%rdx){1to16}, %zmm29, %zmm30{%k7}{z}   #AVX512_BF16 Disp8 BROADCAST_EN MASK_ENABLING ZEROCTL

prasun3 added a commit that referenced this issue May 6, 2022
Use instr_set_prefix_flag to set prefixes (need these to be defined in
public headers)

Issue: #5483
@prasun3
Contributor Author

prasun3 commented May 6, 2022

I was able to set those prefixes using instr_set_prefix_flag. The prefixes (PREFIX_EVEX_b, PREFIX_EVEX_z) are defined in decode_private.h so they are not available in ir_x86.c. Should we move these to a public header? I have defined them locally for now.

prasun3 added a commit that referenced this issue May 6, 2022
Updated destination to use OPSZ_half_16_vex32. Fixed test in ir_x86.c to
use ZMM but set opnd size to OPSZ_32. Disabled the test in
ir_x86_3args_avx512_evex_mask.h.

Set broadcast bit for VNNI/BF16 EVEX_Wb_EXT.b entries

Issue: #5483
@derekbruening
Contributor

> I was able to set those prefixes using instr_set_prefix_flag. The prefixes (PREFIX_EVEX_b, PREFIX_EVEX_z) are defined in decode_private.h so they are not available in ir_x86.c. Should we move these to a public header? I have defined them locally for now.

Some prefix flags are only used during decoding and the information they convey is, after decoding, present in another aspect of the IR. Such flags are not made public as they are private to the decoder.

@prasun3
Contributor Author

prasun3 commented May 6, 2022

So I realized that with OPSZ_half_16_vex32 it shows the right size if I use dr disas syntax even though the reg name doesn't match.

 62 f2 7e 0b 72 d3    vcvtneps2bf16 {%k3} %xmm3 -> %xmm2[8byte]
 62 d2 7e 2b 72 d3    vcvtneps2bf16 {%k3} %ymm11 -> %ymm2[16byte]
 62 92 7e 4b 72 d7    vcvtneps2bf16 {%k3} %zmm31 -> %zmm2[32byte]

The ideal output (from objdump) looks like this

 62 f2 7e 0b 72 d3       vcvtneps2bf16 xmm2{k3},xmm3
 62 d2 7e 2b 72 d3       vcvtneps2bf16 xmm2{k3},ymm11
 62 92 7e 4b 72 d7       vcvtneps2bf16 ymm2{k3},zmm31

With this change, the tests that encode, decode, and then run the instr_same check fail if I use YMM_x as the dest reg (because the re-decoded reg is ZMM_x). They also fail if I use ZMM_x as the dest reg (because the re-decoded dst size is OPSZ_32).

I was able to get the test to pass if I use ZMM_x followed by setting the dst size to OPSZ_32 but then I had to disable the tests in ir_x86_3args_avx512_evex_mask.h because I cannot set the size there.
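The three outcomes can be reproduced with a toy model of the comparison. Everything in this sketch is a hypothetical stand-in (these are not DynamoRIO's operand types or its comparison logic); it only assumes that two register operands match when the register name, the register number, and the accessed size all agree.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: hypothetical stand-ins, not DynamoRIO's opnd_t. */
typedef enum { CLASS_XMM, CLASS_YMM, CLASS_ZMM } reg_class_t;

typedef struct {
    reg_class_t cls;  /* register name used for the operand */
    unsigned num;     /* register number, e.g. 2 for zmm2 */
    unsigned bytes;   /* bytes of the register actually accessed */
} reg_opnd_t;

/* Full width of each class: 16, 32, 64 bytes. */
static unsigned class_bytes(reg_class_t cls) { return 16u << cls; }

/* Full-width register operand. */
static reg_opnd_t reg_full(reg_class_t cls, unsigned num)
{
    reg_opnd_t r = { cls, num, class_bytes(cls) };
    return r;
}

/* Register operand that touches only the low `bytes` bytes. */
static reg_opnd_t reg_partial(reg_class_t cls, unsigned num, unsigned bytes)
{
    assert(bytes <= class_bytes(cls));
    reg_opnd_t r = { cls, num, bytes };
    return r;
}

/* Destination as reported in the output above: named after the source
 * vector length, sized at half of it. */
static reg_opnd_t decoded_dst(unsigned src_bytes, unsigned num)
{
    reg_class_t cls = src_bytes == 64 ? CLASS_ZMM
                    : src_bytes == 32 ? CLASS_YMM : CLASS_XMM;
    return reg_partial(cls, num, src_bytes / 2);
}

static bool opnd_equal(reg_opnd_t a, reg_opnd_t b)
{
    return a.cls == b.cls && a.num == b.num && a.bytes == b.bytes;
}
```

In this model a full YMM2 mismatches on the register name, a full ZMM2 mismatches on the size, and only ZMM2 restricted to 32 bytes equals the re-decoded destination of the 512-bit form.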

@prasun3
Contributor Author

prasun3 commented May 6, 2022

> > I was able to set those prefixes using instr_set_prefix_flag. The prefixes (PREFIX_EVEX_b, PREFIX_EVEX_z) are defined in decode_private.h so they are not available in ir_x86.c. Should we move these to a public header? I have defined them locally for now.
>
> Some prefix flags are only used during decoding and the information they convey is, after decoding, present in another aspect of the IR. Such flags are not made public as they are private to the decoder.

So like I mentioned above I am trying to encode some binutils tests that have broadcast/zero prefixes. Is there a better way to set these prefixes?

vcvtne2ps2bf16  (%r9){1to16}, %zmm29, %zmm30   #AVX512_BF16 BROADCAST_EN
vcvtne2ps2bf16  -8192(%rdx){1to16}, %zmm29, %zmm30{%k7}{z}   #AVX512_BF16 Disp8 BROADCAST_EN MASK_ENABLING ZEROCTL

@khuey
Contributor

khuey commented May 6, 2022

Look at e.g. the EVEX encoded vfmadd213ps around evex_W_ext 63. You'll need two separate lines in the table, one for the vector version and another for the broadcast version, and OPCODE_TWOBYTES is used to indicate the value of EVEX.b

prasun3 added a commit that referenced this issue May 7, 2022
@prasun3
Contributor Author

prasun3 commented May 7, 2022

I did have those lines but I was trying to get the encode test going and I wasn't sure how to specify those flags while encoding.

I did realize that the operand size indicates whether it is a broadcast or not, so the broadcast prefix need not be specified explicitly. I updated the code to change the operand size and not set the broadcast prefix. I am still not sure how to indicate {z}.

For example consider these instructions

   0:   62 62 15 40 a8 31       vfmadd213ps zmm30,zmm29,ZMMWORD PTR [rcx]
   6:   62 62 15 50 a8 31       vfmadd213ps zmm30,zmm29,DWORD PTR [rcx]{1to16}
   c:   62 62 15 51 a8 31       vfmadd213ps zmm30{k1},zmm29,DWORD PTR [rcx]{1to16}
  12:   62 62 15 d1 a8 31       vfmadd213ps zmm30{k1}{z},zmm29,DWORD PTR [rcx]{1to16}

I can specify the first three with the INSTR_CREATE_vfmadd213ps_mask API alone. But to specify the fourth instruction I am calling instr_set_prefix_flag(instr_z, PREFIX_EVEX_z); and it seems to generate the right encoding.

367:   0x00005645a7587220  62 62 15 40 a8 31    vfmadd213ps {%k0} %zmm29 (%rcx)[64byte] %zmm30 -> %zmm30
367:   0x00005645a7587220  62 62 15 50 a8 31    vfmadd213ps {%k0} %zmm29 (%rcx)[4byte] %zmm30 -> %zmm30
367:   0x00005645a7587220  62 62 15 51 a8 31    vfmadd213ps {%k1} %zmm29 (%rcx)[4byte] %zmm30 -> %zmm30
367:   0x00005645a7587220  62 62 15 d1 a8 31    vfmadd213ps {%k1} %zmm29 (%rcx)[4byte] %zmm30 -> %zmm30
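The four encodings differ only in the fourth byte (`40`, `50`, `51`, `d1`), which is the last EVEX prefix byte. As a rough illustration of which bit carries what, here is a small helper that rebuilds that byte from its fields. The function name and signature are made up for this sketch and are not a DynamoRIO API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, not a DynamoRIO API: build the last EVEX prefix byte.
 * Layout: z (bit 7) | L'L (bits 6:5) | b (bit 4) | V' (bit 3, stored
 * inverted) | aaa (bits 2:0). */
static uint8_t evex_p2(unsigned vec_bytes, bool broadcast, bool zeroing,
                       unsigned mask_reg, unsigned vvvv_reg)
{
    assert(vec_bytes == 16 || vec_bytes == 32 || vec_bytes == 64);
    assert(mask_reg < 8 && vvvv_reg < 32);
    unsigned ll = vec_bytes == 16 ? 0 : vec_bytes == 32 ? 1 : 2;
    /* V' holds the inverted high bit of the 5-bit vvvv register number. */
    unsigned v_prime = (vvvv_reg & 0x10) ? 0 : 1;
    return (uint8_t)((zeroing ? 0x80 : 0) | (ll << 5) |
                     (broadcast ? 0x10 : 0) | (v_prime << 3) | mask_reg);
}
```

For the `{k1}{z}` broadcast form with zmm29 as the second source, `evex_p2(64, true, true, 1, 29)` gives `0xd1`; clearing `zeroing` gives `0x51`. So the only thing separating the third and fourth instructions is bit 7, which is what the z flag has to supply.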

This is my code and I am not sure how to specify vfmadd213ps zmm30{k1}{z},zmm29,DWORD PTR [rcx]{1to16} with the API alone without calling instr_set_prefix_flag

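    memarg_disp = 0;
    /* Full 64-byte memory operand: no broadcast. */
    instr_t *instr_x = INSTR_CREATE_vfmadd213ps_mask(dc, REGARG(ZMM30), REGARG(K0), REGARG(ZMM29), MEMARG(OPSZ_64));
    /* 4-byte memory operand: encoded as a {1to16} broadcast. */
    instr_t *instr_b = INSTR_CREATE_vfmadd213ps_mask(dc, REGARG(ZMM30), REGARG(K0), REGARG(ZMM29), MEMARG(OPSZ_4));
    /* Same as instr_b but with mask register k1. */
    instr_t *instr_k = INSTR_CREATE_vfmadd213ps_mask(dc, REGARG(ZMM30), REGARG(K1), REGARG(ZMM29), MEMARG(OPSZ_4));
    /* Same operands as instr_k; {z} is only set by the prefix flag below. */
    instr_t *instr_z = INSTR_CREATE_vfmadd213ps_mask(dc, REGARG(ZMM30), REGARG(K1), REGARG(ZMM29), MEMARG(OPSZ_4));
    /* Local copy of the private flag from decode_private.h. */
    uint PREFIX_EVEX_z = 0x000800000;
    instr_set_prefix_flag(instr_z, PREFIX_EVEX_z);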

@khuey
Contributor

khuey commented May 7, 2022

There is not currently any way to create a {z} instruction from scratch other than using instr_set_prefix_flag, afaik.

@prasun3
Contributor Author

prasun3 commented May 9, 2022

So should we make this flag public?

@khuey
Contributor

khuey commented May 9, 2022

API design is a @derekbruening question, I just deal with the giant decode table :)

@derekbruening
Contributor

I filed #5488 on this. It looks like something that was overlooked in the original AVX-512 work.

prasun3 added a commit that referenced this issue May 10, 2022
  - Added binutils tests that do encode and match against expected opcodes
  (this is opposite of the current binutils tests)
  - Removed 66h prefix to pass binutils test
  - Broadcast entries were not reachable. Fixed operand size for
  broadcast. (Updated VNNI tests also)

Issue: #5483
@prasun3
Contributor Author

prasun3 commented May 10, 2022

I'll summarize the major issues I came across while working on this issue. This time I have added binutils tests that encode the assembly instructions using instr_create_.. APIs and match against the opcode bytes rather than the opposite because we don't produce disassembly that can match exactly against binutils disassembly.

**zeroing support in encoder**

As mentioned above there is no straightforward way to encode zeroing instructions. As a workaround I have added the z flag to the tests until we have a solution for #5488

**need to set opnd size to OPSZ_half_16_vex32_evex64**

Described here: #5483 (comment). Basically we don't have a good way to support instructions like vcvtneps2bf16 ymm2{k3},zmm31 that write to "half" of the destination. I am using the flag OPSZ_half_16_vex32_evex64 in the decode table. I have to set the destination to ZMM2 and set destination size to OPSZ_32 to get this to work.

Similarly I am setting dest to YMMx and set size to OPSZ_16 for the following instructions

  vcvtneps2bf16 (%r9){1to8}, %xmm22  #AVX512{BF16,VL} BROADCAST_EN
  vcvtneps2bf16y  (%rcx){1to8}, %xmm2  #AVX512{BF16,VL} BROADCAST_EN
  vcvtneps2bf16y  4064(%rcx), %xmm23   #AVX512{BF16,VL} Disp8
  vcvtneps2bf16 -4096(%rdx){1to8}, %xmm27{%k7}{z}  #AVX512{BF16,VL} Disp8 BROADCAST_EN MASK_ENABLING ZEROCTL

**disassembler shows data16 prefix**

I had to move vpdpwssd from E_VEX_EXT to PREFIX_EXT since the opcode is shared with vdpbf16ps but then I started getting the test failure below. The data16 prefix is shown in the disassembly if the entry is picked up from the EVEX_Wb_EXT table but not shown if it is picked up from the PREFIX_EXT table. This is because read_prefix_ext clears di->data_prefix if 0x66 is part of the opcode. I changed the entry in PREFIX_EXT from 0x66385218 to 0x385218 to get the test to pass but I think we should fix the test (and fix the EVEX_Wb_EXT handling).

383: Test timeout computed to be: 1500
383: -- diff: 102733,102736c102733,102736
383: <  c4 c2 59 52 d4       vpdpwssd %xmm4, %xmm12, %xmm2
383: <  c4 c2 59 52 d4       vpdpwssd %xmm4, %xmm12, %xmm2
383: <  c4 e2 59 52 11       vpdpwssd %xmm4, (%rcx), %xmm2
383: <  c4 e2 59 52 11       vpdpwssd %xmm4, (%rcx), %xmm2
383: ---
383: >  c4 c2 59 52 d4       data16 vpdpwssd %xmm4, %xmm12, %xmm2
383: >  c4 c2 59 52 d4       data16 vpdpwssd %xmm4, %xmm12, %xmm2
383: >  c4 e2 59 52 11       data16 vpdpwssd %xmm4, (%rcx), %xmm2
383: >  c4 e2 59 52 11       data16 vpdpwssd %xmm4, (%rcx), %xmm2
383:
383: CMake Error at /home/prasun/dynamorio/dynamorio/suite/tests/runcmp.cmake:86 (message):
383:   output in
383:   /home/prasun/dynamorio/build/suite/tests/drdecode_decenc_x86_64.expect-out
383:   failed to match expected output in
383:   /home/prasun/dynamorio/build/suite/tests/drdecode_decenc_x86_64.expect-expect
383:
383:
3/3 Test #383: code_api|decenc.drdecode_decenc_x86_64 ...***Failed    1.39 sec
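For context on where the 0x66 comes from in these VEX encodings: the bytes `c4 c2 59` form a 3-byte VEX prefix, and the low two bits (pp) of its last byte stand in for a legacy prefix. The sketch below decodes that field and adds a toy version of the display rule described above. The function names are hypothetical and the second function is only my reading of the behavior, not the actual DynamoRIO logic.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative helper, not DynamoRIO code: return the legacy prefix byte
 * implied by the pp field (bits 1:0) of the last VEX prefix byte, or 0 if
 * pp is 00 (no implied prefix). */
static uint8_t vex_implied_prefix(uint8_t last_vex_byte)
{
    static const uint8_t implied[4] = { 0x00, 0x66, 0xf3, 0xf2 };
    return implied[last_vex_byte & 0x3];
}

/* Toy version of the display rule: the implied 0x66 shows up as "data16"
 * only if the decoder did not already consume it as part of the opcode. */
static int shows_data16(uint8_t last_vex_byte, int consumed_as_opcode)
{
    return vex_implied_prefix(last_vex_byte) == 0x66 && !consumed_as_opcode;
}
```

For the failing lines, `vex_implied_prefix(0x59)` is `0x66`, so whether `data16` is printed depends entirely on whether the table path treats that prefix as part of the opcode.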

@prasun3 prasun3 linked a pull request May 10, 2022 that will close this issue
prasun3 added a commit that referenced this issue May 10, 2022
prasun3 added a commit that referenced this issue May 10, 2022
prasun3 added a commit that referenced this issue May 11, 2022
Replaced asserts with printfs since test compares test output

Issue: #5483
prasun3 added a commit that referenced this issue May 12, 2022
@prasun3
Contributor Author

prasun3 commented May 13, 2022

@derekbruening @khuey any comments on the three issues mentioned above?

@khuey
Contributor

khuey commented May 13, 2022

> I'll summarize the major issues I came across while working on this issue. This time I have added binutils tests that encode the assembly instructions using instr_create_.. APIs and match against the opcode bytes rather than the opposite because we don't produce disassembly that can match exactly against binutils disassembly.
>
> **zeroing support in encoder**
> As mentioned above there is no straightforward way to encode zeroing instructions. As a workaround I have added the z flag to the tests until we have a solution for #5488

Right.

> **need to set opnd size to OPSZ_half_16_vex32_evex64**
> Described here: #5483 (comment). Basically we don't have a good way to support instructions like vcvtneps2bf16 ymm2{k3},zmm31 that write to "half" of the destination. I am using the flag OPSZ_half_16_vex32_evex64 in the decode table. I have to set the destination to ZMM2 and set destination size to OPSZ_32 to get this to work.
>
> Similarly I am setting dest to YMMx and set size to OPSZ_16 for the following instructions
>
>   vcvtneps2bf16 (%r9){1to8}, %xmm22  #AVX512{BF16,VL} BROADCAST_EN
>   vcvtneps2bf16y  (%rcx){1to8}, %xmm2  #AVX512{BF16,VL} BROADCAST_EN
>   vcvtneps2bf16y  4064(%rcx), %xmm23   #AVX512{BF16,VL} Disp8
>   vcvtneps2bf16 -4096(%rdx){1to8}, %xmm27{%k7}{z}  #AVX512{BF16,VL} Disp8 BROADCAST_EN MASK_ENABLING ZEROCTL

You should be able to do tests the same way that vcvtdq2pd_xlok0xlo and friends work (e.g. using REGARG_PARTIAL(XMM1, OPSZ_8), REGARG_PARTIAL(YMM1, OPSZ_16), REGARG_PARTIAL(ZMM1, OPSZ_32) for the half-width operands). Grepping for vcvtdq2pd is instructive to see how we handle this elsewhere. For the binutils test we actually have the "wrong" disassembly checked in. For c5 fe e6 e4 we have vcvtdq2pd %ymm4, %ymm4 checked in even though binutils will decode this as vcvtdq2pd %ymm4, %xmm4

> **disassembler shows data16 prefix**
>
> I had to move vpdpwssd from E_VEX_EXT to PREFIX_EXT since the opcode is shared with vdpbf16ps but then I started getting the test failure below. The data16 prefix is shown in the disassembly if the entry is picked up from the EVEX_Wb_EXT table but not shown if it is picked up from the PREFIX_EXT table. This is because read_prefix_ext clears di->data_prefix if 0x66 is part of the opcode. I changed the entry in PREFIX_EXT from 0x66385218 to 0x385218 to get the test to pass but I think we should fix the test (and fix the EVEX_Wb_EXT handling).

I think this is what the REQUIRES_PREFIX/reqp flag is for. If you look at the code that handles it in read_instruction at the very end of that branch it erases whatever prefix_var was from further consideration. If you add reqp to vpdpwssd's decode table entries does it fix this?

> 383: Test timeout computed to be: 1500
> 383: -- diff: 102733,102736c102733,102736
> 383: <  c4 c2 59 52 d4       vpdpwssd %xmm4, %xmm12, %xmm2
> 383: <  c4 c2 59 52 d4       vpdpwssd %xmm4, %xmm12, %xmm2
> 383: <  c4 e2 59 52 11       vpdpwssd %xmm4, (%rcx), %xmm2
> 383: <  c4 e2 59 52 11       vpdpwssd %xmm4, (%rcx), %xmm2
> 383: ---
> 383: >  c4 c2 59 52 d4       data16 vpdpwssd %xmm4, %xmm12, %xmm2
> 383: >  c4 c2 59 52 d4       data16 vpdpwssd %xmm4, %xmm12, %xmm2
> 383: >  c4 e2 59 52 11       data16 vpdpwssd %xmm4, (%rcx), %xmm2
> 383: >  c4 e2 59 52 11       data16 vpdpwssd %xmm4, (%rcx), %xmm2

Is this stuff even right at all? C4 is a VEX prefix but doesn't vpdpwssd require EVEX?

prasun3 added a commit that referenced this issue May 15, 2022
  - Removed unnecessary data16 prefix from test and fixed decode table entries
  - Enabled vcvtneps2bf16 tests after using REGARG_PARTIAL
  - Updated binutils encode tests to use opnd_create_reg_partial
  - Made encode tests more compact by renaming some macros

Issue: #5483
@prasun3
Contributor Author

prasun3 commented May 16, 2022

> Is this stuff even right at all? C4 is a VEX prefix but doesn't vpdpwssd require EVEX?

Both are supported. AVX-VNNI was introduced on Alder Lake.

@khuey
Contributor

khuey commented May 16, 2022

> > Is this stuff even right at all? C4 is a VEX prefix but doesn't vpdpwssd require EVEX?
>
> Both are supported. AVX-VNNI was introduced on Alder Lake.

Ah, fun. I guess they had to do that since they nerfed AVX-512 there.
