Update vISA spec
Closes #312
pszymich authored and igcbot committed Oct 21, 2024
1 parent fa65306 commit 1544985
Showing 34 changed files with 570 additions and 209 deletions.
4 changes: 2 additions & 2 deletions documentation/visa/1_introduction.md
@@ -116,12 +116,12 @@ References and Related Information
==================================

\[1\] Intel Graphics Media Accelerator Developer's Guide,
<http://software.intel.com/en-us/articles/intel-graphics-media-accelerator-developers-guide/>.
<https://www.intel.com/content/dam/develop/external/us/en/documents/intel-integrated-graphics-performance-developer-s-guide-v2-6-7-166010.pdf>

\[2\] United States Patent 7257695, 2007.

\[3\] Intel C++ Compiler User and Reference Guides,
<http://www.intel.com/cd/software/products/asmo-na/eng/347618.htm>.
<https://www.intel.com/content/www/us/en/docs/cpp-compiler/developer-guide-reference/2021-10/overview.html>

\[4\] Intel Media SDK Reference Manual, Supplement A: Frame VME
Emulation Library for Intel SNB Graphics, April 8, 2009 (Version 1.21s).
5 changes: 3 additions & 2 deletions documentation/visa/2_datatypes.md
@@ -191,8 +191,9 @@ Float to Float with Lower Precision
-----------------------------------

Converting a floating-point number to another type with lower precision
(DF -&gt; F, DF -&gt; HF, F -&gt; HF) uses the round to zero rounding
mode.
(DF -&gt; F, DF -&gt; HF, F -&gt; HF) uses the rounding mode set in the
control register, which can be RTNE (Round to Nearest or Even), RU
(Round Up), RD (Round Down), or RTZ (Round Toward Zero).

| Source Float (F, DF) | Destination Float (HF, F) |
| --- | --- |
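
The four rounding modes can be illustrated on the DF -> F case. The sketch below is a hypothetical Python model, not the finalizer's implementation. It assumes F is IEEE-754 binary32, that the host runs in the default round-to-nearest-even mode (so `struct` packing yields the RTNE result), and that inputs are finite and inside the F range. RU, RD and RTZ are then derived by stepping at most one binary32 value from the RTNE result. The function name `to_f32` and the mode strings are illustrative only; HF destinations would need the same treatment with a 16-bit format.

```python
import struct

# Hypothetical model of DF -> F conversion under four rounding modes.
# Assumes IEEE-754 binary32 for F and finite inputs inside the F range.


def _f32_bits(f: float) -> int:
    """Bit pattern of f rounded to binary32 (round to nearest even)."""
    return struct.unpack("<I", struct.pack("<f", f))[0]


def _bits_f32(bits: int) -> float:
    """binary32 value with the given bit pattern, widened to a Python float."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def _step_f32(f: float, up: bool) -> float:
    """Adjacent binary32 value above (up=True) or below (up=False) f."""
    if f == 0.0:
        # Smallest positive or negative subnormal.
        return _bits_f32(0x00000001 if up else 0x80000001)
    bits = _f32_bits(f)
    positive = bits < 0x80000000
    # For either sign, incrementing the bit pattern moves away from zero.
    bits += 1 if positive == up else -1
    return _bits_f32(bits)


def to_f32(x: float, mode: str) -> float:
    nearest = _bits_f32(_f32_bits(x))  # RTNE result from struct packing
    if mode == "RTNE":
        return nearest
    if mode == "RU":  # round toward +infinity
        return _step_f32(nearest, True) if nearest < x else nearest
    if mode == "RD":  # round toward -infinity
        return _step_f32(nearest, False) if nearest > x else nearest
    if mode == "RTZ":  # round toward zero
        return _step_f32(nearest, x < 0) if abs(nearest) > abs(x) else nearest
    raise ValueError(f"unknown rounding mode: {mode}")


for mode in ("RTNE", "RU", "RD", "RTZ"):
    print(mode, to_f32(0.1, mode))
```

For 0.1 the nearest F value lies just above the DF input, so RTNE and RU agree, while RD and RTZ both return the next F value below it.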
4 changes: 2 additions & 2 deletions documentation/visa/4_visa_header.md
@@ -412,8 +412,8 @@ which have special meanings in the program.
| | | | | | **Pause counter (tm.4): {ICLLP+}** bit0-9 stores the pause duration. Bit0-4 must be zero. |
| | | | | | Writing to the pause counter causes the thread to pause (no new instructions issued) for approximately the cycles specified. |
| V7(%r0) | 8 | UD | R | Yes | The r0 register. The variable consists of eight dwords that represent the R0 thread payload header. |
| V8(%arg) | 256 | UD | R/W | Yes | The argument variable. It consists of up to 256 dwords and is used for argument passing between functions. The actual number of elements used by each function is specified in the vISA function object. |
| V9(%retval) | 96 | UD | R/W | Yes | The return value variable. It consists of up to 96 dwords and is used to store the return value for function calls. The actual number of elements used by each function is specified in the vISA function object. |
| V8(%arg) | 256/512 | UD | R/W | Yes | The argument variable. It consists of up to 32 GRFs (256 elements for PrePVC/512 elements for PVC+) and is used for argument passing between functions. The actual number of elements used by each function is specified in the vISA function object. |
| V9(%retval) | 96/192 | UD | R/W | Yes | The return value variable. It consists of up to 12 GRFs (96 elements for PrePVC/192 elements for PVC+) and is used to store the return value for function calls. The actual number of elements used by each function is specified in the vISA function object. |
| V10(%sp) | 1 | UD | R/W | No | The stack pointer variable. |
| V11(%fp) | 1 | UD | R/W | No | The frame pointer variable. |
| V12(%hw_id) | 1 | UD | R | No | The HW thread id. It is a unique identifier for all concurrent threads, with range from [0, max_num_HW_threads-1]. The maximum number of hardware threads is platform and configuration dependent. |
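
The two element counts for V8(%arg) and V9(%retval) follow from the GRF size. Assuming 32-byte GRFs before PVC and 64-byte GRFs on PVC and later (an inference consistent with the numbers in the table, not stated in it) and 4-byte UD elements, the arithmetic can be sketched as follows; the platform labels are illustrative keys only.

```python
UD_BYTES = 4  # size of one UD element in bytes

# Assumed GRF sizes in bytes, keyed by illustrative platform labels.
GRF_BYTES = {"PrePVC": 32, "PVC+": 64}

ARG_GRFS = 32     # V8(%arg): up to 32 GRFs
RETVAL_GRFS = 12  # V9(%retval): up to 12 GRFs


def ud_elements(num_grfs: int, platform: str) -> int:
    """Number of UD elements that fit in num_grfs registers on platform."""
    return num_grfs * GRF_BYTES[platform] // UD_BYTES


for platform in GRF_BYTES:
    print(platform,
          "arg:", ud_elements(ARG_GRFS, platform),
          "retval:", ud_elements(RETVAL_GRFS, platform))
```

This reproduces the 256/512 and 96/192 element counts listed for %arg and %retval.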
17 changes: 15 additions & 2 deletions documentation/visa/5_operands.md
@@ -111,8 +111,11 @@ elements accessed by a multi-address indirect source operand:
}
```

The behavior is again undefined if an indirect operand attempts to
access an out-of-bound address or data element. The behavior is also
The addresses in an indirect operand may only be generated by the special
ADDR_ADD instruction, and they must fall within range of the general variable
whose address is taken.
The behavior is undefined if an indirect operand attempts to
access an out-of-bound address. The behavior is also
undefined if any address offset is not aligned to the type of the
indirect operand. The figure below illustrates the use of regions for
indirect operands.
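The two conditions on an indirect address (it stays within the general variable whose address was taken, and it is aligned to the operand's type) can be expressed as a simple predicate. This is a hypothetical Python model: `GeneralVar` and its `(base, size)` description in GRF byte space are assumptions for illustration, and requiring the whole element to fit inside the variable is one reasonable reading of "out-of-bound".

```python
from dataclasses import dataclass

# Element sizes in bytes for common vISA types.
TYPE_SIZE = {"B": 1, "UB": 1, "W": 2, "UW": 2, "HF": 2,
             "D": 4, "UD": 4, "F": 4, "DF": 8, "Q": 8, "UQ": 8}


@dataclass(frozen=True)
class GeneralVar:
    """Hypothetical description of a general variable in GRF byte space."""
    name: str
    base: int  # byte offset of the first element
    size: int  # total size in bytes


def is_valid_indirect(addr: int, var: GeneralVar, ty: str) -> bool:
    """True if an access of type ty at byte address addr stays inside var
    and addr is aligned to the element size."""
    elem = TYPE_SIZE[ty]
    in_range = var.base <= addr and addr + elem <= var.base + var.size
    aligned = addr % elem == 0
    return in_range and aligned


v1 = GeneralVar("V1", base=64, size=64)  # hypothetical 64-byte variable
print(is_valid_indirect(72, v1, "F"))    # in range and 4-byte aligned
print(is_valid_indirect(70, v1, "F"))    # misaligned for F
print(is_valid_indirect(128, v1, "F"))   # one past the end of V1
```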
@@ -148,6 +151,16 @@ of the JIT compiler's variable alignment rules, which are as follows:**
1. Multi-address indirect operand may not be used as the destination of
an instruction.

2. If a multi-address indirect operand is used inside divergent control
flow, the inactive channels must still have a valid address even if
they do not execute. A valid address is defined as an address that
is within range of a general variable and is also aligned to the
indirect operand's type.

**Implementation Note: One possible way to satisfy rule 2 is to
explicitly initialize an address register to zero for all lanes
inside divergent control flow.**

The behavior is undefined if any of the region rules are violated.
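Rule 2 means the validity check applies to every lane of a multi-address operand, not only to the enabled ones. The hypothetical Python sketch below models one operand as a list of per-lane byte addresses plus an execution mask that is deliberately ignored by the check. Because whether a particular placeholder value is valid depends on how variables are laid out, the sketch fills disabled lanes with the variable's own base address; the helper names and the layout values are assumptions for illustration.

```python
from typing import List


def lanes_valid(addrs: List[int], base: int, size: int, elem: int) -> bool:
    """Check every lane's address, active or not, against one variable.

    base/size describe the variable in GRF byte space and elem is the
    operand type size in bytes. No execution mask is consulted, because
    inactive lanes must also hold valid addresses.
    """
    return all(base <= a and a + elem <= base + size and a % elem == 0
               for a in addrs)


def fill_inactive(addrs: List[int], mask: List[bool],
                  placeholder: int) -> List[int]:
    """Replace the address of each disabled lane with a known-good value."""
    return [a if on else placeholder for a, on in zip(addrs, mask)]


BASE, SIZE, ELEM = 64, 64, 4      # hypothetical 64-byte variable of F
mask = [True, True, False, False]  # lanes 2 and 3 are disabled
addrs = [64, 68, 0xDEAD, 0xBEEF]   # disabled lanes hold stale values

print(lanes_valid(addrs, BASE, SIZE, ELEM))
print(lanes_valid(fill_inactive(addrs, mask, BASE), BASE, SIZE, ELEM))
```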

Predication
20 changes: 12 additions & 8 deletions documentation/visa/6_instructions.md
@@ -84,6 +84,8 @@ Arithmetic Instructions

- [INV - reciprocal](instructions/INV.md)

- [INVM - Div macro {SKL,XEHP+}](instructions/INVM.md)

- [LOG - logarithm](instructions/LOG.md)

- [LRP - linear interpolation {-ICLLP}](instructions/LRP.md)
@@ -100,7 +102,7 @@ Arithmetic Instructions

- [MULH - multiply high](instructions/MULH.md)

- [PLN - plane {-TGLLP}](instructions/PLN.md)
- [PLN - plane {-ICLLP}](instructions/PLN.md)

- [POW - power](instructions/POW.md)

@@ -114,6 +116,8 @@ Arithmetic Instructions

- [RSQRT - inverse square root](instructions/RSQRT.md)

- [RSQTM - Inverse square root macro {SKL,XEHP+}](instructions/RSQTM.md)

- [SAD2 - two-wide sum of absolute differences {-ICLLP}](instructions/SAD2.md)

- [SAD2ADD - two-wide sum of absolute differences and addition {-ICLLP}](instructions/SAD2ADD.md)
@@ -158,7 +162,7 @@ Control Flow Instructions

- [SUBROUTINE - subroutine](instructions/SUBROUTINE.md)

- [SWITCHJMP - switch jump table {PVC+,-TGLLP}](instructions/SWITCHJMP.md)
- [SWITCHJMP - switch jump table {-TGLLP,PVC+}](instructions/SWITCHJMP.md)


Data Movement Instructions
@@ -292,19 +296,19 @@ Surface-based Memory Access Instructions
SVM - Shared Virtual Memory Access
----------------------------------

- [SVM_BLOCK_ST - SVM Block Store](instructions/SVM_BLOCK_ST.md)

- [SVM_ATOMIC - SVM atomic operations](instructions/SVM_ATOMIC.md)
- [SVM_SCATTER - SVM scatter](instructions/SVM_SCATTER.md)

- [SVM_BLOCK_LD - SVM Block Load](instructions/SVM_BLOCK_LD.md)

- [SVM_GATHER4_SCALED - SVM gather4 with scaling pitch](instructions/SVM_GATHER4_SCALED.md)
- [SVM_ATOMIC - SVM atomic operations](instructions/SVM_ATOMIC.md)

- [SVM_GATHER - SVM gather](instructions/SVM_GATHER.md)
- [SVM_GATHER4_SCALED - SVM gather4 with scaling pitch](instructions/SVM_GATHER4_SCALED.md)

- [SVM_SCATTER4_SCALED - SVM scatter4 with scaling pitch](instructions/SVM_SCATTER4_SCALED.md)

- [SVM_SCATTER - SVM scatter](instructions/SVM_SCATTER.md)
- [SVM_GATHER - SVM gather](instructions/SVM_GATHER.md)

- [SVM_BLOCK_ST - SVM Block Store](instructions/SVM_BLOCK_ST.md)


Synchronization Instructions
2 changes: 1 addition & 1 deletion documentation/visa/7_appendix_debug_information.md
@@ -231,7 +231,7 @@ Debug information header format is as follows:
VarLiveIntervalsGenISA retAddr;
uw numCalleeSaveEntries;
PhyRegSaveInfoPerIP calleeSaveEntry[numCalleeSaveEntries];
uw numCallerSaveEntries;
ud numCallerSaveEntries;
PhyRegSaveInfoPerIP callerSaveEntry[numCallerSaveEntries];
}
```
4 changes: 3 additions & 1 deletion documentation/visa/appendix_instruction_by_platform.md
@@ -57,6 +57,7 @@ SPDX-License-Identifier: MIT
|IFCALL | Y | Y | Y | Y | Y |
|INFO | Y | Y | Y | Y | Y |
|INV | Y | Y | Y | Y | Y |
|INVM | | Y | | | |
|JMP | Y | Y | Y | Y | Y |
|LABEL | Y | Y | Y | Y | Y |
|LIFETIME | Y | Y | Y | Y | Y |
@@ -83,7 +84,7 @@ SPDX-License-Identifier: MIT
|OWORD_LD | Y | Y | Y | Y | Y |
|OWORD_LD_UNALIGNED | Y | Y | Y | Y | Y |
|OWORD_ST | Y | Y | Y | Y | Y |
|PLN | Y | Y | Y | Y | |
|PLN | Y | Y | Y | | |
|POW | Y | Y | Y | Y | Y |
|QW_GATHER | | | | | |
|QW_SCATTER | | | | | |
@@ -97,6 +98,7 @@ SPDX-License-Identifier: MIT
|ROL | | | | Y | Y |
|ROR | | | | Y | Y |
|RSQRT | Y | Y | Y | Y | Y |
|RSQTM | | Y | | | |
|RT_WRITE | Y | Y | Y | Y | Y |
|SAD2 | Y | Y | Y | | |
|SAD2ADD | Y | Y | Y | | |
5 changes: 3 additions & 2 deletions documentation/visa/instructions/ADDR_ADD.md
@@ -96,8 +96,9 @@ ADDR_ADD (<exec_size>) <dst> <src0> <src1>


```
If src0 is a general operand, the byte address of the general variable is taken and the row and column offset are added to it to produce the address value. In this scenario Src0's region must be <0;1,0>, implying that all channels receive the same value for Src0. If src0 is a state operand, the byte address of the state variable (one of surface/sampler) is taken and the offset is then added to it to produce the address value. The result value in Dst must point to the same variable as Src0 (i.e., if src0 points to an element in v1, src0 + src1 must point to another element in v1). Predication is not supported for this instruction. Src0 must neither be a pre-defined variable nor a pre-defined surface, except for V13(%arg) and V14(%retval).
If src0 is a general operand, the byte address of the general variable is taken and the row and column offset are added to it to produce the address value. In this scenario Src0's region must be <0;1,0>, implying that all channels receive the same value for Src0. If src0 is a state operand, the byte address of the state variable (one of surface/sampler) is taken and the offset is then added to it to produce the address value. The resulting address in Dst may be an arbitrary value, but the behavior of accessing an out-of-range or unaligned address is undefined (i.e., if src0 points to an element in v1, src0 + src1 may go out of bound for v1, but using such an address in an indirect operand is undefined). As far as the finalizer is concerned, the ADDR_ADD instruction is just adding two integers representing GRF byte offsets.
Predication is not supported for this instruction. Src0 must neither be a pre-defined variable nor a pre-defined surface, except for V13(%arg) and V14(%retval).
It is up to the front-end compiler to ensure that the resulting address has the right alignment before it is used in an indirect operand. As far as the finalizer is concerned it is just adding two integers representing GRF byte offsets.
```
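
For a general-variable src0, the description amounts to plain integer addition on GRF byte offsets, with src0 broadcast to every channel through its <0;1,0> region. The hypothetical Python model below illustrates only that arithmetic: `var_base` stands in for the byte address of the general variable plus its row and column offset, and `src1` supplies one offset per channel. No range or alignment check is made, since only a later indirect access through an out-of-range or unaligned address is undefined.

```python
from typing import List


def addr_add(exec_size: int, var_base: int, src1: List[int]) -> List[int]:
    """Model ADDR_ADD with a general-variable src0 (hypothetical).

    src0 has region <0;1,0>, so every channel sees the same var_base.
    The result is an ordinary integer sum and may point outside the
    variable; that only matters if it is later used in an indirect operand.
    """
    if len(src1) != exec_size:
        raise ValueError("src1 must supply one offset per channel")
    src0 = [var_base] * exec_size  # <0;1,0>: one value broadcast to all channels
    return [s0 + s1 for s0, s1 in zip(src0, src1)]


# Hypothetical V1 at GRF byte offset 64 with 4-byte elements, SIMD4.
dst = addr_add(4, 64, [0, 4, 8, 12])
print(dst)
```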

4 changes: 4 additions & 0 deletions documentation/visa/instructions/COS.md
@@ -84,6 +84,10 @@ SPDX-License-Identifier: MIT
- **Source Modifier:** arithmetic


#### Operand type maps
- **Type map**
- **Dst types:** F, HF
- **Src types:** F, HF


## Text
15 changes: 13 additions & 2 deletions documentation/visa/instructions/DIV.md
@@ -82,11 +82,21 @@ SPDX-License-Identifier: MIT


#### Properties
- **Supported Types:** B,D,F,HF,UB,UD,UW,W
- **Supported Types:** B,D,DF,F,HF,UB,UD,UW,W
- **Saturation:** Only when type is float
- **Source Modifier:** arithmetic


#### Operand type maps
- **Type map**
- **Dst types:** DF
- **Src types:** DF
- **Type map**
- **Dst types:** F, HF
- **Src types:** F, HF
- **Type map**
- **Dst types:** UD, D, UW, W, UB, B
- **Src types:** UD, D, UW, W, UB, B
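
The three type maps group DIV's types by class: DF only with DF, F and HF with each other, and the integer types with one another. A hypothetical Python check follows. It reads each map as "the destination and every source must come from the same map", which is an interpretation of the listing rather than a rule stated in the spec; `div_types_ok` is an illustrative name.

```python
from typing import Sequence

INT_TYPES = {"UD", "D", "UW", "W", "UB", "B"}

# DIV type maps as listed above.
DIV_TYPE_MAPS = [
    {"dst": {"DF"}, "src": {"DF"}},
    {"dst": {"F", "HF"}, "src": {"F", "HF"}},
    {"dst": INT_TYPES, "src": INT_TYPES},
]


def div_types_ok(dst: str, srcs: Sequence[str]) -> bool:
    """True if a single type map admits dst and all source types."""
    return any(dst in m["dst"] and all(s in m["src"] for s in srcs)
               for m in DIV_TYPE_MAPS)


print(div_types_ok("HF", ["F", "HF"]))
print(div_types_ok("DF", ["F", "F"]))
```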


## Text
@@ -100,7 +110,7 @@ SPDX-License-Identifier: MIT




```
Integer divide with signed inputs follows the rules below for the signs of the quotient and remainder.
@@ -116,4 +126,5 @@ Integer divide with signed inputs follow the rules below for the signs of the qu
+------------+------------------+-----+-----+-----+-----+
Floating point divide (x/y) is implemented as x * INV(y). DIVM provides the IEEE-conforming correctly rounded results.
```
