Update vISA spec

Closes #312
intel · Oct 21, 2024 · 1544985
1 parent fa65306
commit 1544985

Showing 34 changed files with 570 additions and 209 deletions.
diff --git a/documentation/visa/1_introduction.md b/documentation/visa/1_introduction.md
@@ -116,12 +116,12 @@ References and Related Information
 ==================================
 
 \[1\] Intel Graphics Media Accelerator Developer's Guide,
-<http://software.intel.com/en-us/articles/intel-graphics-media-accelerator-developers-guide/>.
+<https://www.intel.com/content/dam/develop/external/us/en/documents/intel-integrated-graphics-performance-developer-s-guide-v2-6-7-166010.pdf>
 
 \[2\] United States Patent 7257695, 2007.
 
 \[3\] Intel C++ Compiler User and Reference Guides,
-<http://www.intel.com/cd/software/products/asmo-na/eng/347618.htm>.
+<https://www.intel.com/content/www/us/en/docs/cpp-compiler/developer-guide-reference/2021-10/overview.html>
 
 \[4\] Intel Media SDK Reference Manual, Supplement A: Frame VME
 Emulation Library for Intel SNB Graphics, April 8, 2009 (Version 1.21s).

diff --git a/documentation/visa/2_datatypes.md b/documentation/visa/2_datatypes.md
@@ -191,8 +191,9 @@ Float to Float with Lower Precision
 -----------------------------------
 
 Converting a floating-point number to another type with lower precision
-(DF -&gt; F, DF -&gt; HF, F -&gt; HF) uses the round to zero rounding
-mode.
+(DF -&gt; F, DF -&gt; HF, F -&gt; HF) uses the rounding mode set in the
+control register, which can be RTNE (Round to Nearest or Even), RU
+(Round Up), RD (Round Down), or RTZ (Round Toward Zero).
 
 | Source Float (F, DF) | Destination Float (HF, F) |
 | --- | --- |

diff --git a/documentation/visa/4_visa_header.md b/documentation/visa/4_visa_header.md
@@ -412,8 +412,8 @@ which have special meanings in the program.
 | | | | | | **Pause counter (tm.4): {ICLLP+}** bit0-9 stores the pause duration. Bit0-4 must be zero. |
 | | | | | | Writing to the pause counter causes the thread to pause (no new instructions issued) for approximately the cycles specified. |
 | V7(%r0) | 8 | UD | R | Yes | The r0 register. The variable consists of eight dwords that represent the R0 thread payload header.\ ** ** |
-| V8(%arg) | 256 | UD | R/W | Yes | The argument variable. It consists of up to 256 dwords and is used for argument passing between functions. The actual number of elements used by each function is specified in the vISA function object. |
-| V9(%retval) | 96 | UD | R/W | Yes | The return value variable. It consists of up to 96 dwords and is used to store the return value for function calls. The actual number of elements used by each function is specified in the vISA function object. |
+| V8(%arg) | 256/512 | UD | R/W | Yes | The argument variable. It consists of up to 32 GRFs (256 elements for PrePVC/512 elements for PVC+) and is used for argument passing between functions. The actual number of elements used by each function is specified in the vISA function object. |
+| V9(%retval) | 96/192 | UD | R/W | Yes | The return value variable. It consists of up to 12 GRFs (96 elements for PrePVC/192 elements for PVC+) and is used to store the return value for function calls. The actual number of elements used by each function is specified in the vISA function object. |
 | V10(%sp) | 1 | UD | R/W | No | The stack pointer variable. |
 | V11(%fp) | 1 | UD | R/W | No | The frame pointer varible. |
 | V12(%hw_id) | 1 | UD | R | No | The HW thread id. It is a unique identifier for all concurrent threads, with range from [0, max_num_HW_threads-1]. The maximum number of hardware threads is platform and configuration dependent. |
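
Note on the new %arg and %retval sizes in the hunk above: the element counts follow from the GRF budget the rows state (32 GRFs and 12 GRFs). A minimal sketch of that arithmetic is below. It assumes a GRF is 32 bytes before PVC and 64 bytes on PVC+, and that each UD element is 4 bytes; the register widths are not stated in this diff, and `max_elements` is a hypothetical helper name.

```python
# Sketch: derive the %arg / %retval element counts from the GRF budget.
# Assumption (not stated in the diff): a GRF holds 32 bytes before PVC
# and 64 bytes on PVC and later. UD elements are 4 bytes each.
UD_BYTES = 4
GRF_BYTES = {"PrePVC": 32, "PVC+": 64}  # assumed register widths


def max_elements(num_grfs: int, platform: str) -> int:
    """Return how many UD elements fit in num_grfs registers."""
    return num_grfs * GRF_BYTES[platform] // UD_BYTES


for name, grfs in (("%arg", 32), ("%retval", 12)):
    sizes = [max_elements(grfs, p) for p in ("PrePVC", "PVC+")]
    print(name, sizes)  # %arg [256, 512] then %retval [96, 192]
```

Under those assumptions the results match the 256/512 and 96/192 values in the updated table rows.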

diff --git a/documentation/visa/5_operands.md b/documentation/visa/5_operands.md
@@ -111,8 +111,11 @@ elements accessed by a multi-address indirect source operand:
  }
 ```
 
-The behavior is again undefined if an indirect operand attempts to
-access an out-of-bound address or data element. The behavior is also
+The addresses in an indirect operand may only be generated by the special
+ADDR_ADD instruction, and they must fall within range of the general variable
+whose address is taken.
+The behavior is undefined if an indirect operand attempts to
+access an out-of-bound address. The behavior is also
 undefined if any of the address offset is not aligned to the type of the
 indirect operand. The figure below illustrates the use of regions for
 indirect operands.
@@ -148,6 +151,16 @@ of the JIT compiler's variable alignment rules, which are as follows:**
 1. Multi-address indirect operand may not be used as the destination of
  an instruction.
 
+2. If a multi-address indirect operand is used inside divergent control
+ flow, the inactive channels must still have a valid address even if
+ they do not execute. A valid address is defined as an address that
+ is within range of a general variable and is also aligned to the
+ indirect operand's type.
+
+**Implementation Note: One possible way to satisfy rule 2 is to
+ explicitly initialize an address register to zero for all lanes
+ inside divergent control flow.**
+
 The behavior is undefined if any of the region rules are violated.
 
 Predication
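
Note on rule 2 in the hunk above: a valid address is defined there as one that is within range of a general variable and aligned to the indirect operand's type. The check can be sketched as follows. The `GeneralVar` layout, the helper names, and the reading of "within range" as "the address lies inside the variable's byte span" are assumptions made for illustration, not part of the spec.

```python
from dataclasses import dataclass


@dataclass
class GeneralVar:
    base: int  # byte offset where the variable starts (hypothetical layout)
    size: int  # size of the variable in bytes


def is_valid_address(addr: int, var: GeneralVar, type_size: int) -> bool:
    """Check the two conditions from rule 2: in range and type-aligned."""
    in_range = var.base <= addr < var.base + var.size
    aligned = addr % type_size == 0
    return in_range and aligned


def all_lanes_valid(addrs, var: GeneralVar, type_size: int) -> bool:
    # Every lane is checked, whether or not it is active, because rule 2
    # requires inactive channels to hold a valid address as well.
    return all(is_valid_address(a, var, type_size) for a in addrs)


v1 = GeneralVar(base=64, size=32)
print(all_lanes_valid([64, 68, 72, 76], v1, type_size=4))  # True
print(all_lanes_valid([64, 66, 72, 76], v1, type_size=4))  # False: 66 is unaligned
```

The point of checking all lanes is that the execution mask does not exempt an inactive channel from the validity requirement.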

diff --git a/documentation/visa/6_instructions.md b/documentation/visa/6_instructions.md
@@ -84,6 +84,8 @@ Arithmetic Instructions
 
 - [INV - reciprocal](instructions/INV.md)
 
+- [INVM - Div macro {SKL,XEHP+}](instructions/INVM.md)
+
 - [LOG - logarithm](instructions/LOG.md)
 
 - [LRP - linear interpolation {-ICLLP}](instructions/LRP.md)
@@ -100,7 +102,7 @@ Arithmetic Instructions
 
 - [MULH - multiply high](instructions/MULH.md)
 
-- [PLN - plane {-TGLLP}](instructions/PLN.md)
+- [PLN - plane {-ICLLP}](instructions/PLN.md)
 
 - [POW - power](instructions/POW.md)
 
@@ -114,6 +116,8 @@ Arithmetic Instructions
 
 - [RSQRT - inverse square root](instructions/RSQRT.md)
 
+- [RSQTM - Inverse square root macro {SKL,XEHP+}](instructions/RSQTM.md)
+
 - [SAD2 - two-wide sum of absolute {-ICLLP}](instructions/SAD2.md)
 
 - [SAD2ADD - two-wide sum of absolute differences and addition {-ICLLP}](instructions/SAD2ADD.md)
@@ -158,7 +162,7 @@ Control Flow Instructions
 
 - [SUBROUTINE - subroutine](instructions/SUBROUTINE.md)
 
-- [SWITCHJMP - switch jump table {PVC+,-TGLLP}](instructions/SWITCHJMP.md)
+- [SWITCHJMP - switch jump table {-TGLLP,PVC+}](instructions/SWITCHJMP.md)
 
 
 Data Movement Instructions
@@ -292,19 +296,19 @@ Surface-based Memory Access Instructions
 SVM - Shared Virtual Memory Access
 ----------------------------------
 
-- [SVM_BLOCK_ST - SMV Block Store](instructions/SVM_BLOCK_ST.md)
-
-- [SVM_ATOMIC - SVM atomic operations](instructions/SVM_ATOMIC.md)
+- [SVM_SCATTER - SMV scatter](instructions/SVM_SCATTER.md)
 
 - [SVM_BLOCK_LD - SMV Block Load](instructions/SVM_BLOCK_LD.md)
 
-- [SVM_GATHER4_SCALED - SVM gather4 with scaling pitch](instructions/SVM_GATHER4_SCALED.md)
+- [SVM_ATOMIC - SVM atomic operations](instructions/SVM_ATOMIC.md)
 
-- [SVM_GATHER - SMV gather](instructions/SVM_GATHER.md)
+- [SVM_GATHER4_SCALED - SVM gather4 with scaling pitch](instructions/SVM_GATHER4_SCALED.md)
 
 - [SVM_SCATTER4_SCALED - SVM scatter4 with scaling pitch](instructions/SVM_SCATTER4_SCALED.md)
 
-- [SVM_SCATTER - SMV scatter](instructions/SVM_SCATTER.md)
+- [SVM_GATHER - SMV gather](instructions/SVM_GATHER.md)
+
+- [SVM_BLOCK_ST - SMV Block Store](instructions/SVM_BLOCK_ST.md)
 
 
 Synchronization Instructions

diff --git a/documentation/visa/7_appendix_debug_information.md b/documentation/visa/7_appendix_debug_information.md
@@ -231,7 +231,7 @@ Debug information header format is as follows:
  VarLiveIntervalsGenISA retAddr;
  uw numCalleeSaveEntries;
  PhyRegSaveInfoPerIP calleeSaveEntry[numCalleeSaveEntries];
- uw numCallerSaveEntries;
+ ud numCallerSaveEntries;
  PhyRegSaveInfoPerIP callerSaveEntry[numCallerSaveEntries];
  }
 ```

diff --git a/documentation/visa/appendix_instruction_by_platform.md b/documentation/visa/appendix_instruction_by_platform.md
@@ -57,6 +57,7 @@ SPDX-License-Identifier: MIT
 |IFCALL | Y | Y | Y | Y | Y |
 |INFO | Y | Y | Y | Y | Y |
 |INV | Y | Y | Y | Y | Y |
+|INVM | | Y | | | |
 |JMP | Y | Y | Y | Y | Y |
 |LABEL | Y | Y | Y | Y | Y |
 |LIFETIME | Y | Y | Y | Y | Y |
@@ -83,7 +84,7 @@ SPDX-License-Identifier: MIT
 |OWORD_LD | Y | Y | Y | Y | Y |
 |OWORD_LD_UNALIGNED | Y | Y | Y | Y | Y |
 |OWORD_ST | Y | Y | Y | Y | Y |
-|PLN | Y | Y | Y | Y | |
+|PLN | Y | Y | Y |   | |
 |POW | Y | Y | Y | Y | Y |
 |QW_GATHER | | | | | |
 |QW_SCATTER | | | | | |
@@ -97,6 +98,7 @@ SPDX-License-Identifier: MIT
 |ROL | | | | Y | Y |
 |ROR | | | | Y | Y |
 |RSQRT | Y | Y | Y | Y | Y |
+|RSQTM | | Y | | | |
 |RT_WRITE | Y | Y | Y | Y | Y |
 |SAD2 | Y | Y | Y | | |
 |SAD2ADD | Y | Y | Y | | |

diff --git a/documentation/visa/instructions/ADDR_ADD.md b/documentation/visa/instructions/ADDR_ADD.md
@@ -96,8 +96,9 @@ ADDR_ADD (<exec_size>) <dst> <src0> <src1>
 
 
 ```
- If src0 is a general operand, the byte address of the general variable is taken and the row and column offset are added to it to produce the address value. In this scenario Src0's region must be <0;1,0>, implying that all channels receive the same value for Src0. If src0 is a state operand, the byte address of the state variable (one of surface/sampler) is taken and the offset is then added to it to produce the address value. The result value in Dst must point to the same variable as Src0 (i.e., if src0 points to an element in v1, src0 + src1 must point to another element in v1). Predication is not supported for this instruction. Src0 must neither be a pre-defined variable nor a pre-defined surface, except for V13(%arg) and V14(%retval).
+ If src0 is a general operand, the byte address of the general variable is taken and the row and column offset are added to it to produce the address value. In this scenario Src0's region must be <0;1,0>, implying that all channels receive the same value for Src0. If src0 is a state operand, the byte address of the state variable (one of surface/sampler) is taken and the offset is then added to it to produce the address value. The resulting address in Dst may be arbitrary value, but the behavior of accessing an out-of-range or unaligned address is undefined (i.e., if src0 points to an element in v1, src0 + src1 may go out of bound for v1, but using such address in an indirect operand is undefined). As far as the finalizer is concerned, the ADDR_ADD instruction is just adding two integers representing GRF byte offsets.
+
+ Predication is not supported for this instruction. Src0 must neither be a pre-defined variable nor a pre-defined surface, except for V13(%arg) and V14(%retval).
 
- It is up to the front-end compiler to ensure that resulting address has the right alignment before it is used in an indirect operand. As far as the finalizer is concerned it is just adding two integers representing GRF byte offsets.
 ```
 
diff --git a/documentation/visa/instructions/COS.md b/documentation/visa/instructions/COS.md
@@ -84,6 +84,10 @@ SPDX-License-Identifier: MIT
 - **Source Modifier:** arithmetic
 
 
+#### Operand type maps
+- **Type map**
+ - **Dst types:** F, HF
+ - **Src types:** F, HF
 
 
 ## Text

diff --git a/documentation/visa/instructions/DIV.md b/documentation/visa/instructions/DIV.md
@@ -82,11 +82,21 @@ SPDX-License-Identifier: MIT
 
 
 #### Properties
-- **Supported Types:** B,D,F,HF,UB,UD,UW,W
+- **Supported Types:** B,D,DF,F,HF,UB,UD,UW,W
 - **Saturation:** Only when type is float
 - **Source Modifier:** arithmetic
 
 
+#### Operand type maps
+- **Type map**
+ - **Dst types:** DF
+ - **Src types:** DF
+- **Type map**
+ - **Dst types:** F, HF
+ - **Src types:** F, HF
+- **Type map**
+ - **Dst types:** UD, D, UW, W, UB, B
+ - **Src types:** UD, D, UW, W, UB, B
 
 
 ## Text
@@ -100,7 +110,7 @@ SPDX-License-Identifier: MIT
 
 
 
-
+```
 
 
 Integer divide with signed inputs follow the rules below for the signs of the quotient and remainder.
@@ -116,4 +126,5 @@ Integer divide with signed inputs follow the rules below for the signs of the qu
  +------------+------------------+-----+-----+-----+-----+
 
 Floating point divide (x/y) is implemented as x * INV(y). DIVM provides the IEEE-conforming correctly rounded results.
+```
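
Note on the last sentence of the DIV.md hunk: computing x/y as x * INV(y) rounds twice, so the result can differ from a correctly rounded quotient. The sketch below illustrates the effect with Python floats, which are IEEE double precision rather than the F or HF types DIV operates on, and it assumes for illustration that the reciprocal itself is correctly rounded. `div_via_inv` is a hypothetical name.

```python
def div_via_inv(x: float, y: float) -> float:
    """Divide by multiplying with a reciprocal (two roundings)."""
    inv_y = 1.0 / y   # first rounding: the reciprocal
    return x * inv_y  # second rounding: the product


# Collect small integer pairs where the two-step result differs from
# the single, correctly rounded division.
mismatches = [
    (x, y)
    for x in range(1, 50)
    for y in range(1, 50)
    if div_via_inv(float(x), float(y)) != x / y
]

# (3, 10) is one such pair: 3 * 0.1 gives 0.30000000000000004, not 0.3.
print((3, 10) in mismatches)  # True
```

Pairs such as these are the cases where a correctly rounded divide would give a different answer from the reciprocal-based one.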