Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[mlir][nvvm] Introduce elect.sync Op #68323

Merged
merged 2 commits into from
Oct 9, 2023
Merged

[mlir][nvvm] Introduce elect.sync Op #68323

merged 2 commits into from
Oct 9, 2023

Conversation

grypp
Copy link
Member

@grypp grypp commented Oct 5, 2023

@llvmbot
Copy link
Member

llvmbot commented Oct 5, 2023

@llvm/pr-subscribers-mlir

@llvm/pr-subscribers-mlir-llvm

Changes

The Op selects a leader thread from a set of threads.

See for more information:
https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#parallel-synchronization-and-communication-instructions-elect-sync


Full diff: https://github.com/llvm/llvm-project/pull/68323.diff

3 Files Affected:

  • (modified) mlir/include/mlir/Dialect/LLVMIR/NVVMOps.td (+20)
  • (modified) mlir/lib/Conversion/NVVMToLLVM/NVVMToLLVM.cpp (+2)
  • (modified) mlir/test/Conversion/NVVMToLLVM/nvvm-to-llvm.mlir (+14)
diff --git a/mlir/include/mlir/Dialect/LLVMIR/NVVMOps.td b/mlir/include/mlir/Dialect/LLVMIR/NVVMOps.td
index 0d4d734edd2b69b..ecfc5dd2e8e1d08 100644
--- a/mlir/include/mlir/Dialect/LLVMIR/NVVMOps.td
+++ b/mlir/include/mlir/Dialect/LLVMIR/NVVMOps.td
@@ -574,6 +574,26 @@ def NVVM_SyncWarpOp :
 }
 
 
+def NVVM_ElectSyncOp : NVVM_Op<"elect.sync", 
+                  [DeclareOpInterfaceMethods<BasicPtxBuilderOpInterface>]>
+{  
+  let results = (outs I1:$pred);
+  let assemblyFormat = "attr-dict `->` type(results)";  
+  let extraClassDefinition = [{        
+    std::string $cppClass::getPtx() { 
+      return std::string(
+        "{                                  \n"
+        ".reg .u32 rx;                      \n"
+        ".reg .pred px;                     \n"
+        " mov.u32 %0, 0;                    \n"
+        "    elect.sync rx | px, 0xFFFFFFFF;\n"
+        "@px mov.u32 %0, 1;                 \n"
+        "}\n"
+      ); 
+    }
+  }];
+}
+
 def LoadCacheModifierCA : I32EnumAttrCase<"CA", 0, "ca">;
 def LoadCacheModifierCG : I32EnumAttrCase<"CG", 1, "cg">;
 def LoadCacheModifierCS : I32EnumAttrCase<"CS", 2, "cs">;
diff --git a/mlir/lib/Conversion/NVVMToLLVM/NVVMToLLVM.cpp b/mlir/lib/Conversion/NVVMToLLVM/NVVMToLLVM.cpp
index 2d7a441e950045c..15703fb99339e25 100644
--- a/mlir/lib/Conversion/NVVMToLLVM/NVVMToLLVM.cpp
+++ b/mlir/lib/Conversion/NVVMToLLVM/NVVMToLLVM.cpp
@@ -63,6 +63,8 @@ class PtxBuilder {
 
   // https://docs.nvidia.com/cuda/inline-ptx-assembly/index.html#constraints
   char getRegisterType(Type type) {
+    if (type.isInteger(1))
+      return 'b';
     if (type.isInteger(16))
       return 'h';
     if (type.isInteger(32))
diff --git a/mlir/test/Conversion/NVVMToLLVM/nvvm-to-llvm.mlir b/mlir/test/Conversion/NVVMToLLVM/nvvm-to-llvm.mlir
index 7ffe1ad2bb2b111..bf10ddbb4016af4 100644
--- a/mlir/test/Conversion/NVVMToLLVM/nvvm-to-llvm.mlir
+++ b/mlir/test/Conversion/NVVMToLLVM/nvvm-to-llvm.mlir
@@ -363,3 +363,17 @@ func.func @wgmma_f32_e5m2_e4m3(%descA : i64, %descB : i64) -> !mat32f32 {
       : !mat32f32 -> !mat32f32
   return %result2 : !mat32f32
 }
+
+// -----
+
+func.func @elect_one_leader_sync() {  
+  // CHECK: llvm.inline_asm has_side_effects asm_dialect = att "{
+  // CHECK-SAME: .reg .u32 rx;
+  // CHECK-SAME: .reg .pred px;
+  // CHECK-SAME: mov.u32 $0, 0;
+  // CHECK-SAME: elect.sync rx | px, 0xFFFFFFFF;
+  // CHECK-SAME: @px mov.u32 $0, 1;
+  // CHECK-SAME: "=b"  : () -> i1
+  %cnd = nvvm.elect.sync -> i1 
+  return 
+}

@grypp grypp merged commit 6e8ffab into llvm:main Oct 9, 2023
@grypp grypp deleted the nvvm-elect branch October 9, 2023 17:21
".reg .u32 rx; \n"
".reg .pred px; \n"
" mov.u32 %0, 0; \n"
" elect.sync rx | px, 0xFFFFFFFF;\n"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The instruction is only available on sm_90/PTX8.0. Does MLIR have any constraints based on which GPU model we're compiling for?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is a good point and thanks for the review.

NVVM dialect doesn't know SM arch. You can think that NVVM is similar to PTX, you can compile it but you should make sure that you have the SM arch to run.
A high-level dialect knows the SM arch and it's responsible of generating the right NVVM Ops. For example, a transformation from vector dialect -> nvvm should know the SM target.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the explanation. So it would be similar to using LLVM intrinsics that may or may not be available for a particular target.

Speaking of intrinsics, it may make sense to make elect.sync an LLVM intrinsic. We'll eventually need it there in any case.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So it would be similar to using LLVM intrinsics that may or may not be available for a particular target.

Exactly, it is similar to using LLVM intrinsic. NVVM was designed to map 1to1 to LLVM intrinsic.

FWIW, generating PTX like I do here is relatively new in MLIR. We noticed that LLVM doesn't support all of the PTX (also predicates). Therefore, I've recently implemented the BasicPtxBuilderOpInterface that generates inline assembly with the given PTX. This is what I use here.

Speaking of intrinsics, it may make sense to make elect.sync an LLVM intrinsic. We'll eventually need it there in any case.

Yes it does makes sense. We would like to generate LLVM intrinsic whenever it is possible.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants