[NFC] Another attempt to fix naming of AIE dialect documentation (#1754)
keryell authored Sep 7, 2024
1 parent 8d9548d commit 31b8618
Showing 6 changed files with 61 additions and 57 deletions.
2 changes: 1 addition & 1 deletion docs/_layouts/default.html
@@ -28,7 +28,7 @@
<li class="listedpages"><a style="color:white;" href="index.html">Overview</a></li>
<li class="listedpages"><a style="color:white;" href="Building.html">Getting Started</a></li>
<li><label for="btn-4 " class="show ">References </label></li>
-<li class="listedpages"><a style="color:white;" href="AIE.html">AIE Dialect</a></li>
+<li class="listedpages"><a style="color:white;" href="AIEDialect.html">AIE Dialect</a></li>
<li class="listedpages"><a style="color:white;" href="AIEPasses.html">AIE Passes</a></li>
<li class="listedpages"><a style="color:white;" href="AIEXDialect.html">AIEX Dialect</a></li>
<li class="listedpages"><a style="color:white;" href="AIEXPasses.html">AIEX Passes</a></li>
2 changes: 2 additions & 0 deletions include/aie/Dialect/AIE/IR/AIEDocs.td
@@ -8,6 +8,8 @@
//
//===----------------------------------------------------------------------===//

+// A simplified version of AIEDialect.h used to generate the on-line
+// documentation
include "aie/Dialect/AIE/IR/AIE.td"
include "aie/Dialect/AIE/IR/AIEAttrs.td"
include "aie/Dialect/AIE/IR/AIEInterfaces.td"
4 changes: 3 additions & 1 deletion include/aie/Dialect/AIE/IR/CMakeLists.txt
@@ -6,7 +6,9 @@
# (c) Copyright 2021 Xilinx Inc.

add_mlir_dialect(AIE aie)
-add_mlir_doc(AIEDocs AIE ./ -gen-dialect-doc)
+# Use a simplified version AIEDocs.td to generate the AIEDialect.md ending up as
+# https://xilinx.github.io/mlir-aie/AIEDialect.html
+add_mlir_doc(AIEDocs AIEDialect ./ -gen-dialect-doc)

# Add AIE interfaces
set(LLVM_TARGET_DEFINITIONS AIEInterfaces.td)
38 changes: 19 additions & 19 deletions include/aie/Dialect/AIEX/IR/AIEX.td
@@ -505,11 +505,11 @@ def AIE_NpuDmaMemcpyNdOp: AIEX_Op<"npu.dma_memcpy_nd", [

#### `metadata` -- Specifying Tile, Channel, Direction and Linking a `dma_memcpy_nd` to its Other Half

The `metadata` attribute must point to a symbol referencing a
-[`aie.shim_dma_allocation` operation](AIE.html#aiedma_bd-xilinxaiedmabdop).
+[`aie.shim_dma_allocation` operation](AIEDialect.html#aiedma_bd-xilinxaiedmabdop).
The tile coordinates of the DMA to configure, the channel number and the direction (`MM2S` or `S2MM`) are taken from this operation.

To connect the DMA to its other half (i.e. a `MM2S` DMA to its receiving end and a `S2MM` to the sending end),
the user must configure a flow (`aie.flow`) between the tile and channel referenced in the `aie.shim_dma_allocation` and the corresponding other end.

When using ObjectFIFOs, the `aie.shim_dma_allocation` operations and the `aie.flows` are generated automatically.
@@ -520,13 +520,13 @@ def AIE_NpuDmaMemcpyNdOp: AIEX_Op<"npu.dma_memcpy_nd", [
When the `dma_memcpy_nd` operation executes, it immediately reprograms the buffer descriptor with ID `bd_id` on tile (`x`, `y`), even if that buffer descriptor is currently executing.
Without proper synchronization, this inevitably leads to nondeterministic results.

Programming a buffer descriptor that is not currently executing is harmless.
Thus, the first `dma_memcpy_nd` call for each `bd_id` requires no synchronization.

However, if you wish to later re-use a `bd_id` on the same tile, you must wait for the previous buffer descriptor to complete.
The `sync` or `dma_wait` operations can be used for this.

`sync` blocks until it receives a _task completion token_ (TCT).
To properly synchronize, you must thus configure your BD to issue a TCT using the `issue_token` attribute, then wait on that token before reusing the BD.

`dma_wait` is a convenience operation that lowers to the corresponding `sync` operation for the referenced symbol.
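
The buffer-descriptor reuse rule described above can be illustrated with a small checker. This is a hypothetical sketch, not part of mlir-aie: the op tuples, the `find_unsafe_reuses` name, and the simplification that each wait is known to cover exactly one `(tile, bd_id)` pair are assumptions made for illustration (the real `dma_wait` is keyed on an `aie.shim_dma_allocation` symbol).

```python
from typing import Iterable, List, Tuple

# Hypothetical op encoding: (kind, (col, row), bd_id), where kind is
# "memcpy" (a dma_memcpy_nd programming bd_id) or "wait" (a completed
# sync/dma_wait assumed to cover exactly that BD).
Op = Tuple[str, Tuple[int, int], int]


def find_unsafe_reuses(ops: Iterable[Op]) -> List[int]:
    """Return the indices of memcpy ops that reprogram a BD that may still be executing."""
    pending = set()  # (tile, bd_id) pairs programmed but not yet waited on
    unsafe = []
    for index, (kind, tile, bd_id) in enumerate(ops):
        key = (tile, bd_id)
        if kind == "memcpy":
            if key in pending:
                # Reprogramming a possibly running BD: nondeterministic results.
                unsafe.append(index)
            pending.add(key)
        elif kind == "wait":
            pending.discard(key)  # the TCT arrived, so the BD may be reused
        else:
            raise ValueError(f"unknown op kind: {kind!r}")
    return unsafe


if __name__ == "__main__":
    ops = [("memcpy", (0, 0), 0), ("memcpy", (0, 0), 1), ("memcpy", (0, 0), 0)]
    print(find_unsafe_reuses(ops))  # [2]: bd_id 0 reused without a wait
```

The first use of each `bd_id` is never flagged, matching the statement that the first `dma_memcpy_nd` per BD needs no synchronization.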
@@ -537,15 +537,15 @@ def AIE_NpuDmaMemcpyNdOp: AIEX_Op<"npu.dma_memcpy_nd", [
#### Data Layout Transformations

The `sizes` and `strides` attributes describe a data layout transformation to be performed by the DMA.
These transformations are described in more depth in the documentation for the
-[`aie.dma_bd` operation](AIE.html#aiedma_bd-xilinxaiedmabdop).
+[`aie.dma_bd` operation](AIEDialect.html#aiedma_bd-xilinxaiedmabdop).
Note that the syntax here differs from that of the `dma_bd` operation:
offsets and strides are given as separate arrays instead of tuples.

The `offsets` array is used to calculate a static offset into the memref.
Each offset in the array is understood in relation to the shape of the memref;
the lowest-dimension `offset` is a direct offset in units of memref element type, and the higher dimensions are multiplied by the size of the memref in those dimensions.
Note that this is for convenience of the user only.
The hardware only supports a single static offset, and this offset is calculated at compile time.
Thus, all offsets can be equivalently expressed with the lowest dimension only.

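
As a worked sketch of the offset rule above, the hypothetical helpers below fold per-dimension offsets into the single element offset the hardware uses, then convert it to bytes in the spirit of `getOffsetInBytes`. They assume row-major layout with the last entry as the lowest (innermost) dimension; the function names are illustrative and not part of mlir-aie.

```python
from math import prod
from typing import Sequence


def linear_offset_elements(offsets: Sequence[int], shape: Sequence[int]) -> int:
    """Fold per-dimension offsets into one offset, in units of the element type.

    Assumes row-major order: the last entry is the lowest dimension, and each
    higher dimension is scaled by the product of the sizes below it.
    """
    if len(offsets) != len(shape):
        raise ValueError("offsets and shape must have the same rank")
    total = 0
    for dim, offset in enumerate(offsets):
        total += offset * prod(shape[dim + 1:])  # prod of an empty slice is 1
    return total


def offset_in_bytes(offsets: Sequence[int], shape: Sequence[int], element_bytes: int) -> int:
    """Byte offset: the number of leading bytes of the buffer that are skipped."""
    return linear_offset_elements(offsets, shape) * element_bytes


if __name__ == "__main__":
    # memref<4x8xi32>: offsets [2, 3] and [0, 19] name the same element.
    print(linear_offset_elements([2, 3], [4, 8]))   # 19
    print(linear_offset_elements([0, 19], [4, 8]))  # 19
    print(offset_in_bytes([2, 3], [4, 8], 4))      # 76
```

The second call shows why the lowest dimension alone suffices: any multi-dimensional offset collapses to one number at compile time.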
@@ -589,7 +589,7 @@ def AIE_NpuDmaMemcpyNdOp: AIEX_Op<"npu.dma_memcpy_nd", [
/* Returns the data transfer offset in bytes, i.e. the first N bytes of the
target buffer will be skipped. In the IR, offsets are expressed in units
of memref element data type size. */
int64_t getOffsetInBytes();

bool isLinearTransferWithoutTransformation();
}];
@@ -608,7 +608,7 @@ def AIE_NpuDmaWaitOp: AIEX_Op<"npu.dma_wait", []> {
The NpuDmaWaitOp blocks until the DMA referenced through `symbol` completes execution
and issues a task-complete-token (TCT).

`symbol` is a reference to an `aie.shim_dma_allocation`, which contains information about the column, channel and channel direction on which to wait for a TCT.
The `aie.shim_dma_allocation` may be generated from an ObjectFIFO, in which case you can directly pass the ObjectFIFO symbol reference.
`npu.dma_wait` will be lowered to the corresponding `npu.sync` operation using the information from `symbol`.

@@ -644,7 +644,7 @@ def AIE_NpuWriteRTPOp: AIEX_Op<"npu.rtp_write", []> {
I32Attr:$value
);
let results = (outs );
let assemblyFormat = [{ `(` $buffer `,` $index `,` $value `)` attr-dict
}];
let description = [{
rtp write operator
@@ -1000,8 +1000,8 @@ def AIE_DMAStartBdChainOp: AIEX_Op<"dma_start_bd_chain", [HasParent<"RuntimeSequ
DefaultValuedOptionalAttr<I32Attr, "0">:$repeat_count
);

let assemblyFormat = [{
$symbol `(` $args `)` `:` `(` type($args) `)` ` ` `on` ` ` `(` $tile `,` $direction `,` $channel `)` attr-dict
}];

let hasVerifier = 1;
@@ -1031,8 +1031,8 @@ def AIE_DMAStartBdChainForOp: AIEX_Op<"dma_start_bd_chain_for", [HasParent<"Runt
DefaultValuedOptionalAttr<I32Attr, "0">:$repeat_count
);

let assemblyFormat = [{
$symbol `(` $args `)` `:` `(` type($args) `)` ` ` `for` ` ` $alloc attr-dict
}];
}

40 changes: 20 additions & 20 deletions mlir_tutorials/tutorial-1/README.md
@@ -5,16 +5,16 @@
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
// Copyright (C) 2022, Advanced Micro Devices, Inc.
//
//===----------------------------------------------------------------------===//-->

# <ins>Tutorial 1 - Modules, tile, buffer, core, lock</ins>
In the MLIR-based AI Engine representation, every physical component of the AI Engine array, including connections, is declared within the top-level block called a `module`. All parameters and customizations of these components are then elaborated within the `module`. We generally write the MLIR code in a file with the .mlir file extension, as it integrates well with LLVM's lit-based automated tests, such as those found in the `test` sub-folder. A module declaration is shown below:

```
module @module_name {
...
AI Engine array components and connections
...
}
```
@@ -26,31 +26,31 @@ AI Engine tiles are the basic building blocks of AIE designs and can be declared
%tile24 = AIE.tile(2,4)
%tile34 = AIE.tile(3,4)
```
The two major components of an AI Engine tile are

* VLIW processor core declared as `AIE.core(tileName) { body }`
* Local memory buffer declared as `AIE.buffer(tileName) : memref<depth x data_type> { body }`.

Example declarations include:
```
AIE.core(%tile14) {
...
core body
...
}
%buff0 = AIE.buffer(%tile14) : memref<256xi32>
%buff1 = AIE.buffer(%tile14) : memref<256xi32>
```
-The association between these declarations and the physical AI Engine tile components can be seen here. For more details on mlir-aie dialect syntax, you can refer to the online reference document [here](https://xilinx.github.io/mlir-aie/AIE.html).
+The association between these declarations and the physical AI Engine tile components can be seen here. For more details on mlir-aie dialect syntax, you can refer to the online reference document [here](https://xilinx.github.io/mlir-aie/AIEDialect.html).
<img src="../images/diagram1.png" width="1000">

A third key component of a tile is the `lock`, which is critical for synchronizing data between tiles, and between tiles and the host controller. While not a physically large component, it plays a critical role in facilitating efficient and correct data communication.

### <ins>Tile</ins>

For the tile, we simply declare its coordinates by column and row.
>**NOTE:** index values start at 0, with row 0 belonging to the shim which is not a regular row. The first regular row for first generation AI engines is row index 1.
Tile declaration is mainly needed so other sub components can be associated to the tile by name. Some higher level logical components may also automatically declare tiles so they are enabled (e.g. logical flows require the intermediate tiles along the path to be enabled to support stream switch routing).

>**ADF Graph NOTE:** ADF graph descriptions treat the first full row of AI Engines as row 0 and do not count the shim tiles. As such, be sure to add one to the row value when moving from ADF designs to MLIR-AIE descriptions.
@@ -63,7 +63,7 @@ The type of tiles and orientation of its associated local memory is architecture

### <ins>Buffer</ins>

When declaring a buffer, we pass in the associated AIE tile and declare the buffer parameters. Those parameters are the depth and data type width (though the local memory itself is not physically organized in this way).
> One important note about buffers is that one buffer is not strictly mapped to the entire local memory. You can declare multiple buffers that are associated with the local memory of a tile and they would, by default, be allocated sequentially in that tile's local memory.
Operators such as buffers (and some other components such as locks) also can have a symbolic name defined in the body to make it easier to refer to the component in generated host access functions. The syntax for this looks like:
@@ -91,11 +91,11 @@ Examples:
%lock13_11 = AIE.lock(%tile13, 11) { sym_name = "lock13_11" }
```
Each tile has 16 locks and each lock is in one of two states (acquired, released) and one of two values (0, 1).
> By default, we tend to assume (value=0 is a "write", value=1 is "read"). But there is no real definition of these values. The only key thing to remember is that the lock value starts at `val=0`, and is reset into the release `val=0` state. This means an `acquire=0` will always succeed first, while an `acquire=1` needs the lock state to be `release=1` to succeed. Once acquired, a lock can be released to the 0 or 1 state.
The 16 locks in a tile are accessible by its same 3 cardinal neighbors that can access the tile's local memory. This is to ensure all neighbors that can access the local memory can also access the locks.

To use the lock, we call the `use_lock` operation either inside a `core` operation or `mem`/`shim_dma` operation.
```
AIE.use_lock(%lockName, "Acquire|Release", 0|1)
```
@@ -112,9 +112,9 @@ Notice the familiar design pattern of:
* a set of operations
* release lock in some value (usually the other value)

The acquire value must match the current lock state in order for the acquire to succeed. The release value can be either 0 or 1.
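
The acquire/release rule above can be modeled in a few lines. This is a toy, single-threaded sketch of the semantics as this tutorial describes them (values 0 and 1, starting at 0); `BinaryLock` and its methods are hypothetical names, not an mlir-aie or hardware API.

```python
class BinaryLock:
    """Toy model of a lock as described in this tutorial (not an mlir-aie API)."""

    def __init__(self) -> None:
        self.value = 0  # locks start in the released state with value 0
        self.acquired = False

    def try_acquire(self, value: int) -> bool:
        """Succeed only if the lock is free and its current value matches."""
        if self.acquired or value != self.value:
            return False
        self.acquired = True
        return True

    def release(self, value: int) -> None:
        """Release the lock, leaving it with value 0 or 1."""
        if value not in (0, 1):
            raise ValueError("lock values are 0 or 1")
        # This toy model does not check which agent holds the lock.
        self.value = value
        self.acquired = False


if __name__ == "__main__":
    lock = BinaryLock()
    print(lock.try_acquire(1))  # False: a consumer must wait, the value is still 0
    print(lock.try_acquire(0))  # True: the producer acquires to "write"
    lock.release(1)             # the producer hands the buffer over
    print(lock.try_acquire(1))  # True: the consumer acquires to "read"
    lock.release(0)
```

The `__main__` block mirrors the acquire, operate, release pattern: each side acquires on one value and releases on the other.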

-We will be introducing more components and the ways these components are customized in subsequent tutorials. Additional syntax for these MLIR-based AI Engine components can be found in the github<area>.io docs [here](https://xilinx.github.io/mlir-aie/AIE.html).
+We will be introducing more components and the ways these components are customized in subsequent tutorials. Additional syntax for these MLIR-based AI Engine components can be found in the github<area>.io docs [here](https://xilinx.github.io/mlir-aie/AIEDialect.html).

## <ins>Tutorial 1 Lab</ins>

@@ -130,9 +130,9 @@ We will be introducing more components and the ways these components are customi
### <ins>MLIR-AIE Transformations</ins>
Under the hood, `make` calls `aiecc.py`, which itself calls a number of utilities that are built as part of the `mlir-aie` project (`aie-translate`, `aie-opt`). More details on these utilities can be found in [tutorial-10](../tutorial-10). These utilities perform IR transformations and lowerings. In this example, since we are already describing our design at a low physical level, we will perform the final transformation and produce an AI Engine program (core_1_4.elf).

The MLIR operations inside the core are then converted to an LLVM representation which the AMD internal compiler (currently `xchesscc`) takes to build the executable that will run on each individual AIE tile.

3. In [aie.mlir](aie.mlir), what is the variable name for tile(1,4)? <img src="../images/answer1.jpg" title="%tile14" height=25>

What about the variable name and size of the buffer that is associated with the local memory of tile(1,4)? <img src="../images/answer1.jpg" title="%buf, 256 x int32" height=25>

@@ -145,7 +145,7 @@ In first generation AI Engines, each tile has 32 kB of local data memory assigne
5. Change the size of the buffer to the size of our local memory (8192 x i32) and run `make` again. What do you expect to happen and what happens instead? <img src="../images/answer1.jpg" title="You may expect to be able to define a buffer that uses the entirety of local memory. Instead, an error occurs: Allocated buffers exceed local memory. (The next paragraph explains why this happens.)" height=25>

### <ins>AI Engine Program Memory</ins>
While we have a separate 16 kB of program memory which stores the AIE program code, the 32 kB of data memory is also used for the program stack. By default, the tool reserves 1024 bytes for the stack so all buffers are then allocated immediately after that.
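
The arithmetic behind the buffer-size limit can be checked with a short sketch. It assumes the figures quoted in this tutorial (32 kB of data memory, a 1024-byte default stack) and ignores any alignment or padding the real allocator may add; the constant and function names are hypothetical.

```python
DATA_MEMORY_BYTES = 32 * 1024  # first-generation AIE tile local data memory
DEFAULT_STACK_BYTES = 1024     # default stack reservation quoted in this tutorial


def fits_in_data_memory(buffer_bytes, stack_bytes=DEFAULT_STACK_BYTES):
    """Return True if the buffers fit beside the reserved stack (no alignment modeled)."""
    return sum(buffer_bytes) <= DATA_MEMORY_BYTES - stack_bytes


if __name__ == "__main__":
    i32 = 4  # bytes per i32 element
    print(fits_in_data_memory([8192 * i32]))  # False: 32768 > 31744
    print(fits_in_data_memory([7936 * i32]))  # True: exactly the 31744 bytes left
    print((DATA_MEMORY_BYTES - DEFAULT_STACK_BYTES) // i32)  # 7936
```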

6. Declare a horizontally adjacent tile (pay attention to which row we're in) so that tile (1,4) can access the neighbor tile's local memory. Declare a buffer in this tile that uses the entire local memory (8192 x i32) and replace the reference %buf in line 34 with the new buffer, then run `make` again. What happens? <img src="../images/answer1.jpg" title="We are able to compile successfully, since we can use our neighbor's full local memory. Only tiles that are running program code have a stack space of 1024 bytes reserved." height=25>
