-
Notifications
You must be signed in to change notification settings - Fork 562
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for AVX512 BF16 instructions #5483
Comments
There are three BF16 instructions. One of them writes to a "smaller" destination register. I am not sure how to encode this size information in the decode tables. I used VCVTNEPS2BF16—Convert Packed Single Data to Packed BF16 Data
Example This is the objdump output for the different operand sizes
This is what the DR disassembler produces if I set the destination operand to
Any idea what I should set the operand size to? This is what I have currently
|
I think you want OPSZ_half_16_vex32_evex64? |
I don't see it in the enum (https://github.com/DynamoRIO/dynamorio/blob/master/core/ir/opnd_api.h#L69). Do you mean I should add it? |
It's in the set of weird extra OPSZ values at Line 165 in 17822fe
|
**AVX512 bfloat16 instructions** These are the three bfloat16 instructions. VCVTNE2PS2BF16—Convert Two Packed Single Data to One Packed BF16 Data ``` EVEX.128.F2.0F38.W0 72 /r VCVTNE2PS2BF16 xmm1{k1}{z}, xmm2, xmm3/m128/m32bcst EVEX.256.F2.0F38.W0 72 /r VCVTNE2PS2BF16 ymm1{k1}{z}, ymm2, ymm3/m256/m32bcst EVEX.512.F2.0F38.W0 72 /r VCVTNE2PS2BF16 zmm1{k1}{z}, zmm2, zmm3/m512/m32bcst Op/En Tuple Operand 1 Operand 2 Operand 3 A Full ModRM:reg (w) EVEX.vvvv (r) ModRM:r/m (r) ``` VCVTNEPS2BF16—Convert Packed Single Data to Packed BF16 Data ``` EVEX.128.F3.0F38.W0 72 /r VCVTNEPS2BF16 xmm1{k1}{z}, xmm2/m128/m32bcst EVEX.256.F3.0F38.W0 72 /r VCVTNEPS2BF16 xmm1{k1}{z}, ymm2/m256/m32bcst EVEX.512.F3.0F38.W0 72 /r VCVTNEPS2BF16 ymm1{k1}{z}, zmm2/m512/m32bcst Op/En Tuple Operand 1 Operand 2 A Full ModRM:reg (w) ModRM:r/m (r) ``` VDPBF16PS—Dot Product of BF16 Pairs Accumulated into Packed Single Precision ``` EVEX.128.F3.0F38.W0 52 /r VDPBF16PS xmm1{k1}{z}, xmm2, xmm3/m128/m32bcst EVEX.256.F3.0F38.W0 52 /r VDPBF16PS ymm1{k1}{z}, ymm2, ymm3/m256/m32bcst EVEX.512.F3.0F38.W0 52 /r VDPBF16PS zmm1{k1}{z}, zmm2, zmm3/m512/m32bcst Op/En Tuple Operand 1 Operand 2 Operand 3 A Full ModRM:reg (w) EVEX.vvvv (r) ModRM:r/m (r) ``` **List of places to update** From https://github.com/DynamoRIO/dynamorio/blob/master/core/ir/x86/opcode_api.h#L53 ``` * When adding new instructions, be sure to update all of these places: * 1) decode_table op_instr array * 2) decode_table decoding table entries * 3) OP_ enum (here) via x86opnums.pl * 4) update OP_LAST at end of enum here * 5) decode_fast tables if necessary (they are conservative) * 6) instr_create macros * 7) suite/tests/api/ir* tests * 8) add binutils tests in third_party/binutils/test_decenc ``` **Step 1: update `op_instr` array** Added entries to `op_instr`. These point directly to `evex_Wb_extensions` since these instructions only have `evex` encoding. **Step 2: add decode_table entries** - updated `third_byte_38` table to point to `prefix_extensions` since these instructions have common opcodes and differ in prefix. - added entries in `prefix_extensions` to point to appropriate vex/evex entries - added entries in `evex_Wb_extensions` The instructions `VCVTNEPS2BF16` and `VCVTNE2PS2BF16` have three byte opcodes starting with `0f 38` so the decoder looks at `third_byte_38[third_byte_38_index[opcode]]`. Since these instructions have the same opcode (`72`) and differ only in the prefix (`f2/f3`), we need to point the `third_byte_38` to `prefix_extensions` which in turn points to the appropriate `EVEX_Wb` entries. The instruction `VDPBF16PS` has the same opcode (52) as the VNNI instruction `vpdpwsd` and they differ only in the prefix (`F3/66`). We need to update that entry to point to `prefix_extensions` instead of `e_vex_extensions`. This causes the `e_vex_extensions` entry ( `e_vex ext 151`) to be orphaned - do we remove this entry? Updated opcodes for invalid entries in e_vex ext 151 and 152 for consistency. **Step 3: add OP_ enums** Done **Step 4: update OP_LAST** Not needed since OP_LAST already points to the last enum. **Step 5: decode_fast tables if necessary** Not done **Step 6: instr_create macros** Added `1dst_3src` macros for `VCVTNE2PS2BF16` and `VDPBF16PS` since they write to operand 1 and read from mask register, operand 2, and operand 3. Added `1dst_2src` macro for `VCVTNEPS2BF16` since it writes to operand 1 and reads from mask register and operand 2. **Step 7: suite/tests/api/ir tests** Added tests in ir_x86_3args_avx512_evex_mask.h and ir_x86_4args_avx512_evex_mask_C.h. Added manual tests in ir_x86.c but these have two issues: - operand size of `vcvtneps2bf16` is incorrect. I tried `OPSZ_half_16_vex32_evex64` and `OPSZ_half_16_vex32` but did not get expected values - not sure how to encode broadcast or zeroctl so tests like the following are failing ``` vcvtne2ps2bf16 (%r9){1to16}, %zmm29, %zmm30 #AVX512_BF16 BROADCAST_EN vcvtne2ps2bf16 -8192(%rdx){1to16}, %zmm29, %zmm30{%k7}{z} #AVX512_BF16 Disp8 BROADCAST_EN MASK_ENABLING ZEROCTL ``` **Step 8: binutils tests** Pending Issue: #5483
Thanks for the info. I tried it but still seeing the same. I also tried I am also not sure how to encode broadcast or zeroctl so tests like the following are failing. Is there some API to set these attributes?
|
Use instr_set_prefix_flag to set prefixes (need these to be defined in public headers) Issue: #5483
I was able to set those prefixes using |
Updated destination to use OPSZ_half_16_vex32. Fixed test in ir_x86.c to use ZMM but set opnd size to OPSZ_32. Disabled the test in ir_x86_3args_avx512_evex_mask.h. Set broadcast bit for VNNI/BF16 EVEX_Wb_EXT.b entries Issue: #5483
Some prefix flags are only used during decoding and the information they convey is, after decoding, present in another aspect of the IR. Such flags are not made public as they are private to the decoder. |
So I realized that with
The ideal output (from objdump) looks like this
With this change, the tests that do encode/decode followed by I was able to get the test to pass if I use |
So like I mentioned above I am trying to encode some binutils tests that have broadcast/zero prefixes. Is there a better way to set these prefixes?
|
Look at e.g. the EVEX encoded vfmadd213ps around evex_W_ext 63. You'll need two separate lines in the table, one for the vector version and another for the broadcast version, and OPCODE_TWOBYTES is used to indicate the value of EVEX.b |
I did have those lines but I was trying to get the encode test going and I wasn't sure how to specify those flags while encoding. I did realize that the operand size indicates whether it is broadcast or not, so the broadcast prefix need not be specified explicitly. I updated the code to change the operand size and not set the broadcast prefix. I am still now sure how to indicate For example consider these instructions
I can specify the first thee with the
This is my code and I am not sure how to specify
|
There is not currently any way to create a {z} instruction from scratch other than using instr_set_prefix_flag, afaik. |
So should we make this flag public? |
API design is a @derekbruening question, I just deal with the giant decode table :) |
I filed #5488 on this. It looks like something that was overlooked in the original AVX-512 work. |
- Added binutils tests that do encode and match against expected opcodes (this is opposite of the current binutils tests) - Removed 66h prefix to pass binutils test - Broadcast entries were not reachable. Fixed operand size for broadcast. (Updated VNNI tests also) Issue: #5483
I'll summarize the major issues I came across while working on this issue. This time I have added binutils tests that encode the assembly instructions using zeroing support in encoder need to set opnd size to Similarly I am setting dest to
disassembler shows data16 prefix I had to move
|
Replaced asserts with printfs since test compares test output Issue: #5483
@derekbruening @khuey any comments on the three issues mentioned above? |
Right.
You should be able to do tests the same way that vcvtdq2pd_xlok0xlo and friends work (e.g. using REGARG_PARTIAL(XMM1, OPSZ_8), REGARG_PARTIAL(YMM1, OPSZ_16), REGARG_PARTIAL(ZMM1, OPSZ_32) for the half-width operands). Grepping for vcvtdq2pd is instructive to see how we handle this elsewhere. For the binutils test we actually have the "wrong" disassembly checked in. For
I think this is what the REQUIRES_PREFIX/reqp flag is for. If you look at the code that handles it in
Is this stuff even right at all? C4 is a VEX prefix but doesn't vpdpwssd require EVEX? |
removed unnecessary data16 prefix from test and fixed decode table entries enabled vcvtneps2bf16 tests after using REGARG_PARTIAL updated binutils encode tests to use opnd_create_reg_partial made encode tests more compact by renaming some macros Issue: #5483
Both are supported. AVX-VNNI was introduced on Alder Lake. |
Ah, fun. I guess they had to do that since they nerfed AVX-512 there. |
Add a pointer to any prior users list discussion.
NA
Is your feature request related to a problem? Please describe.
Unable to run programs that use BF16 instructions.
Describe the solution you'd like
Enable programs that use BF16 instructions
Do you have any implementation in mind for this feature?
Add encoder/decoder support and tests.
Describe alternatives you've considered
None
Additional context
None
The text was updated successfully, but these errors were encountered: