Skip to content

Conversation

@clementval
Copy link
Contributor

When the stream is a scalar constant, it is lowered as i32. Stream needs to be i64 to pass the verifier. Detect and update the stream reference when it is i32.

@clementval clementval requested a review from wangzpgi September 10, 2025 21:05
@llvmbot llvmbot added flang Flang issues not falling into any other category flang:fir-hlfir labels Sep 10, 2025
@llvmbot
Copy link
Member

llvmbot commented Sep 10, 2025

@llvm/pr-subscribers-flang-fir-hlfir

Author: Valentin Clement (バレンタイン クレメン) (clementval)

Changes

When the stream is a scalar constant, it is lowered as i32. Stream needs to be i64 to pass the verifier. Detect and update the stream reference when it is i32.


Full diff: https://github.com/llvm/llvm-project/pull/157957.diff

2 Files Affected:

  • (modified) flang/lib/Lower/ConvertCall.cpp (+10-1)
  • (added) flang/test/Lower/CUDA/cuda-stream.cuf (+15)
diff --git a/flang/lib/Lower/ConvertCall.cpp b/flang/lib/Lower/ConvertCall.cpp
index 3951401ebed37..fbbc8b1fa83cb 100644
--- a/flang/lib/Lower/ConvertCall.cpp
+++ b/flang/lib/Lower/ConvertCall.cpp
@@ -639,9 +639,18 @@ Fortran::lower::genCallOpAndResult(
               caller.getCallDescription().chevrons()[2], stmtCtx)));
 
     mlir::Value stream; // stream is optional.
-    if (caller.getCallDescription().chevrons().size() > 3)
+    if (caller.getCallDescription().chevrons().size() > 3) {
       stream = fir::getBase(converter.genExprAddr(
           caller.getCallDescription().chevrons()[3], stmtCtx));
+      if (!fir::unwrapRefType(stream.getType()).isInteger(64)) {
+        auto i64Ty = mlir::IntegerType::get(builder.getContext(), 64);
+        mlir::Value newStream = fir::AllocaOp::create(builder, loc, i64Ty);
+        mlir::Value load = fir::LoadOp::create(builder, loc, stream);
+        mlir::Value conv = fir::ConvertOp::create(builder, loc, i64Ty, load);
+        fir::StoreOp::create(builder, loc, conv, newStream);
+        stream = newStream;
+      }
+    }
 
     cuf::KernelLaunchOp::create(builder, loc, funcType.getResults(),
                                 funcSymbolAttr, grid_x, grid_y, grid_z, block_x,
diff --git a/flang/test/Lower/CUDA/cuda-stream.cuf b/flang/test/Lower/CUDA/cuda-stream.cuf
new file mode 100644
index 0000000000000..a58ab4ed4235a
--- /dev/null
+++ b/flang/test/Lower/CUDA/cuda-stream.cuf
@@ -0,0 +1,15 @@
+! RUN: bbc -emit-hlfir -fcuda %s -o - | FileCheck %s
+
+attributes(global) subroutine sharedmem()
+  real, shared :: s(*)
+  integer :: t
+  t = threadIdx%x
+  s(t) = t
+end subroutine
+
+program test
+  call sharedmem<<<1, 1, 1024, 0>>>()
+end
+
+! CHECK-LABEL: func.func @_QQmain()
+! CHECK: cuf.kernel_launch @_QPsharedmem<<<%c1{{.*}}, %c1{{.*}}, %c1{{.*}}, %c1{{.*}}, %c1{{.*}}, %c1{{.*}}, %c1024{{.*}}, %{{.*}} : !fir.ref<i64>>>>()

Copy link
Contributor

@vzakhari vzakhari Sep 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder why the stream operand is made a reference while all the other operands are just values. Maybe it will be more consitent to make it an optional AnyInteger operand, then you do not have to do anything special in lowering, and instead hide all the details of the kernel launch inside the cuf.kernel_launch conversion.

Just a thought. I am not suggesting changing it in this PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stream has to be a i64 reference because it can be written to as well with different cuda API.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean it can be written by cuf.kernel_launch operation?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In CUDA Fortran stream is represented as a i64 integer. There is a special kind for it cuda_stream_kind. It can be written to by other API such as cudaStreamCreate not directly by cuf.kernel_launch. So for consistency I think it is better to keep the restriction here. It is quite rare to pass the stream as a constant like in this test. Usually the user has a local variable and use the proper API to create the stream and pass it to the launch.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. To me the consistency of the dialect operation is more important than following to the letter the source APIs, so I would prefer just loading the value of stream before cuf.kernel_launch. But this is just my preference :)

Thanks for the change!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see your point and I think I like it also to have the op taking an integer value instead of a reference. I would need to change the kernel launch API as well since it is currently taking a reference.
I'm gonna make this change in a follow up patch.

@clementval clementval merged commit 7e38793 into llvm:main Sep 11, 2025
9 checks passed
@clementval clementval deleted the cuf_stream_i64 branch September 11, 2025 05:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

flang:fir-hlfir flang Flang issues not falling into any other category

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants