[TKW] Rework vector mask generation #172
Conversation
Signed-off-by: Ivan Butygin <ivan.butygin@gmail.com>
Signed-off-by: Ivan Butygin <ivan.butygin@gmail.com>
lgtm, modulo some minor comments
# CHECK-SAME: memref<1x3xf16, strided<[3, 1], offset: ?>>
# CHECK: vector.maskedstore %[[D27]][%[[D5]], %[[D8]]], %[[D25]], %[[D26]] :
# CHECK-SAME: memref<1x3xf16, strided<[3, 1], offset: ?>>, vector<4xi1>, vector<4xf16>
# CHECK-DAG: %[[CST:.*]] = arith.constant dense<0.000000e+00> : vector<4xf16>
nice!
) -> Optional[list[IndexExpr]]:
    bounds = []
    for constraint in constraints:
        if not isinstance(constraint, (WorkgroupConstraint, TilingConstraint)):
Why do we ignore the WaveConstraints here? Does that mean we are assuming that the workgroup tile size is a multiple of the wave tile size?
Yeah, I implicitly assumed the WG size is divisible by the wave size. Do we have any potential examples where that's false?
Good question. I was just thinking in terms of generality (so workgroup tile size = 27 and wave tile size = 17), but maybe it's not such a common use case. We can ignore it for now.
As I am reading through some of the history, allow me to share a thought.
It is generally good practice to be as exhaustive and general as possible and to fail loudly.
This way, in the future we can come back and immediately understand the problem and that it is NYI.
In the current form, it seems to me the case proposed by Harsh would silently pass but generate wrong code?
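For example, a loud check along these lines would turn the silent miscompile into an explicit NYI error. This is only a sketch: the function and parameter names are hypothetical and not the actual constraint API.

```python
# Sketch only: hypothetical names, not the actual TKW constraint API.
def check_wave_divisibility(workgroup_tile_size: int, wave_tile_size: int) -> None:
    # The mask generation assumes the workgroup tile splits evenly across
    # waves; reject anything else instead of generating wrong code.
    if workgroup_tile_size % wave_tile_size != 0:
        raise NotImplementedError(
            f"NYI: workgroup tile size {workgroup_tile_size} is not a "
            f"multiple of wave tile size {wave_tile_size}"
        )
```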
pos = arith_d.ConstantOp(IndexType.get(), i)
mask = vector_d.insertelement(cond, mask, position=pos)
mask_expr = functools.reduce(
very cool!
@@ -171,6 +172,32 @@ def get_type_or_element_type(operand_type: IrType):
def gen_sympy_index(emitter: WaveEmitter, expr: sympy.Expr) -> OpResult:
    stack: list[OpResult] = []

def _broadcast(a, b):
Seems like you can refactor this to avoid the duplication for a and b.
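Something along these lines, as a sketch only (plain Python lists stand in for the MLIR vector values, and `_splat` is a hypothetical helper), so both operands go through one code path:

```python
# Sketch of the suggested refactor, not the emitter code.
def _splat(value, length):
    # Hypothetical stand-in for splatting a scalar to a vector of `length`.
    return value if isinstance(value, list) else [value] * length

def _broadcast(a, b, length=4):
    # Apply the same normalization to both operands once, instead of
    # duplicating the scalar/vector handling for `a` and `b` separately.
    a, b = (_splat(x, length) for x in (a, b))
    return a, b
```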
mask_expr = functools.reduce(
    lambda a, b: sympy.And(a, b), (new_index[dim] < dim for dim in bounds)
)
mask = gen_sympy_index(emitter, mask_expr)
No action needed, but just putting this down for the record: at some point we should evaluate generating these masks prior to codegen and measure the performance impact.
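For reference, the same mask construction can be run standalone in sympy. The symbol names below are illustrative; in the emitter, `new_index` and `bounds` come from the constraints.

```python
import functools
import sympy

# Illustrative stand-ins for the per-dimension index and bound symbols.
i, j = sympy.symbols("i j", integer=True)
M, N = sympy.symbols("M N", integer=True, positive=True)
new_index = {M: i, N: j}
bounds = [M, N]

# Conjunction of one "index < bound" comparison per masked dimension.
mask_expr = functools.reduce(
    lambda a, b: sympy.And(a, b), (new_index[dim] < dim for dim in bounds)
)
print(mask_expr)  # e.g. (i < M) & (j < N)
```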
Instead of generating individual element comparisons and doing `vector.insertelement`, generate the whole mask using vector ops. Add support for vector codegen when generating MLIR IR from sympy expressions. Add method `IndexingContext.iota` to generate special symbols which map to `(1,2 ... n-1)` vec expressions. `gen_sympy_index` will start to generate vector ops when encountering such symbols, inserting proper `splat`s between scalar vals when necessary.

---------

Signed-off-by: Ivan Butygin <ivan.butygin@gmail.com>
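As an illustration of the change in mask construction, here is a plain-Python/numpy sketch (the values are made up and this is not the emitter code; it only contrasts per-element insertion with a single vector comparison over an iota-style lane vector):

```python
import numpy as np

elements_per_thread = 4
base = 1   # illustrative per-thread start index
bound = 3  # illustrative dimension bound

# Old approach: compare lane by lane and insert each bit into the mask,
# analogous to the vector.insertelement loop.
mask_scalar = np.zeros(elements_per_thread, dtype=bool)
for i in range(elements_per_thread):
    mask_scalar[i] = base + i < bound

# New approach: build the whole mask at once from an iota-style lane
# vector, analogous to splatting the scalars and doing one vector compare.
iota = np.arange(elements_per_thread)
mask_vector = (base + iota) < bound

assert (mask_scalar == mask_vector).all()
```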