[mlir][nvvm] Introduce `elect.sync` Op #68323

grypp · 2023-10-05T15:28:42Z

The Op selects a leader thread from a set of threads.

See for more information:
https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#parallel-synchronization-and-communication-instructions-elect-sync

The Op selects a leader thread from a set of threads. See for more information: https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#parallel-synchronization-and-communication-instructions-elect-sync

llvmbot · 2023-10-05T15:29:51Z

@llvm/pr-subscribers-mlir

@llvm/pr-subscribers-mlir-llvm

Changes

The Op selects a leader thread from a set of threads.

See for more information:
https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#parallel-synchronization-and-communication-instructions-elect-sync

Full diff: https://github.com/llvm/llvm-project/pull/68323.diff

3 Files Affected:

(modified) mlir/include/mlir/Dialect/LLVMIR/NVVMOps.td (+20)
(modified) mlir/lib/Conversion/NVVMToLLVM/NVVMToLLVM.cpp (+2)
(modified) mlir/test/Conversion/NVVMToLLVM/nvvm-to-llvm.mlir (+14)

diff --git a/mlir/include/mlir/Dialect/LLVMIR/NVVMOps.td b/mlir/include/mlir/Dialect/LLVMIR/NVVMOps.td
index 0d4d734edd2b69b..ecfc5dd2e8e1d08 100644
--- a/mlir/include/mlir/Dialect/LLVMIR/NVVMOps.td
+++ b/mlir/include/mlir/Dialect/LLVMIR/NVVMOps.td
@@ -574,6 +574,26 @@ def NVVM_SyncWarpOp :
 }
 
 
+def NVVM_ElectSyncOp : NVVM_Op<"elect.sync", 
+                  [DeclareOpInterfaceMethods<BasicPtxBuilderOpInterface>]>
+{  
+  let results = (outs I1:$pred);
+  let assemblyFormat = "attr-dict `->` type(results)";  
+  let extraClassDefinition = [{        
+    std::string $cppClass::getPtx() { 
+      return std::string(
+        "{                                  \n"
+        ".reg .u32 rx;                      \n"
+        ".reg .pred px;                     \n"
+        " mov.u32 %0, 0;                    \n"
+        "    elect.sync rx | px, 0xFFFFFFFF;\n"
+        "@px mov.u32 %0, 1;                 \n"
+        "}\n"
+      ); 
+    }
+  }];
+}
+
 def LoadCacheModifierCA : I32EnumAttrCase<"CA", 0, "ca">;
 def LoadCacheModifierCG : I32EnumAttrCase<"CG", 1, "cg">;
 def LoadCacheModifierCS : I32EnumAttrCase<"CS", 2, "cs">;
diff --git a/mlir/lib/Conversion/NVVMToLLVM/NVVMToLLVM.cpp b/mlir/lib/Conversion/NVVMToLLVM/NVVMToLLVM.cpp
index 2d7a441e950045c..15703fb99339e25 100644
--- a/mlir/lib/Conversion/NVVMToLLVM/NVVMToLLVM.cpp
+++ b/mlir/lib/Conversion/NVVMToLLVM/NVVMToLLVM.cpp
@@ -63,6 +63,8 @@ class PtxBuilder {
 
   // https://docs.nvidia.com/cuda/inline-ptx-assembly/index.html#constraints
   char getRegisterType(Type type) {
+    if (type.isInteger(1))
+      return 'b';
     if (type.isInteger(16))
       return 'h';
     if (type.isInteger(32))
diff --git a/mlir/test/Conversion/NVVMToLLVM/nvvm-to-llvm.mlir b/mlir/test/Conversion/NVVMToLLVM/nvvm-to-llvm.mlir
index 7ffe1ad2bb2b111..bf10ddbb4016af4 100644
--- a/mlir/test/Conversion/NVVMToLLVM/nvvm-to-llvm.mlir
+++ b/mlir/test/Conversion/NVVMToLLVM/nvvm-to-llvm.mlir
@@ -363,3 +363,17 @@ func.func @wgmma_f32_e5m2_e4m3(%descA : i64, %descB : i64) -> !mat32f32 {
       : !mat32f32 -> !mat32f32
   return %result2 : !mat32f32
 }
+
+// -----
+
+func.func @elect_one_leader_sync() {  
+  // CHECK: llvm.inline_asm has_side_effects asm_dialect = att "{
+  // CHECK-SAME: .reg .u32 rx;
+  // CHECK-SAME: .reg .pred px;
+  // CHECK-SAME: mov.u32 $0, 0;
+  // CHECK-SAME: elect.sync rx | px, 0xFFFFFFFF;
+  // CHECK-SAME: @px mov.u32 $0, 1;
+  // CHECK-SAME: "=b"  : () -> i1
+  %cnd = nvvm.elect.sync -> i1 
+  return 
+}

Artem-B · 2023-10-10T18:35:19Z

mlir/include/mlir/Dialect/LLVMIR/NVVMOps.td

+        ".reg .u32 rx;                      \n"
+        ".reg .pred px;                     \n"
+        " mov.u32 %0, 0;                    \n"
+        "    elect.sync rx | px, 0xFFFFFFFF;\n"


The instruction is only available on sm_90/PTX8.0. Does MLIR have any constraints based on which GPU model we're compiling for?

It is a good point and thanks for the review.

NVVM dialect doesn't know SM arch. You can think that NVVM is similar to PTX, you can compile it but you should make sure that you have the SM arch to run.
A high-level dialect knows the SM arch and it's responsible of generating the right NVVM Ops. For example, a transformation from vector dialect -> nvvm should know the SM target.

Thank you for the explanation. So it would be similar to using LLVM intrinsics that may or may not be available for a particular target.

Speaking of intrinsics, it may make sense to make elect.sync an LLVM intrinsic. We'll eventually need it there in any case.

So it would be similar to using LLVM intrinsics that may or may not be available for a particular target.

Exactly, it is similar to using LLVM intrinsic. NVVM was designed to map 1to1 to LLVM intrinsic.

FWIW, generating PTX like I do here is relatively new in MLIR. We noticed that LLVM doesn't support all of the PTX (also predicates). Therefore, I've recently implemented the BasicPtxBuilderOpInterface that generates inline assembly with the given PTX. This is what I use here.

Speaking of intrinsics, it may make sense to make elect.sync an LLVM intrinsic. We'll eventually need it there in any case.

Yes it does makes sense. We would like to generate LLVM intrinsic whenever it is possible.

[mlir][nvvm] Introduce elect.sync Op

f25de29

The Op selects a leader thread from a set of threads. See for more information: https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#parallel-synchronization-and-communication-instructions-elect-sync

grypp requested review from manishucsd, nicolasvasilache and qcolombet October 5, 2023 15:28

add nl

f160a93

llvmbot added mlir:llvm mlir labels Oct 5, 2023

grypp merged commit 6e8ffab into llvm:main Oct 9, 2023

grypp deleted the nvvm-elect branch October 9, 2023 17:21

stepthomas mentioned this pull request Oct 10, 2023

AMDGPU stepthomas atomic csub no rtn forms ver2 stepthomas/llvm-project#1

Closed

Artem-B reviewed Oct 10, 2023

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[mlir][nvvm] Introduce `elect.sync` Op #68323

[mlir][nvvm] Introduce `elect.sync` Op #68323

grypp commented Oct 5, 2023

llvmbot commented Oct 5, 2023 •

edited

Loading

Artem-B Oct 10, 2023

grypp Oct 10, 2023

Artem-B Oct 10, 2023

grypp Oct 10, 2023

[mlir][nvvm] Introduce elect.sync Op #68323

[mlir][nvvm] Introduce elect.sync Op #68323

Conversation

grypp commented Oct 5, 2023

llvmbot commented Oct 5, 2023 • edited Loading

Artem-B Oct 10, 2023

Choose a reason for hiding this comment

grypp Oct 10, 2023

Choose a reason for hiding this comment

Artem-B Oct 10, 2023

Choose a reason for hiding this comment

grypp Oct 10, 2023

Choose a reason for hiding this comment

[mlir][nvvm] Introduce `elect.sync` Op #68323

[mlir][nvvm] Introduce `elect.sync` Op #68323

llvmbot commented Oct 5, 2023 •

edited

Loading