-
Notifications
You must be signed in to change notification settings - Fork 15.2k
[flang][cuda] Make sure stream is a i64 reference #157957
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
@llvm/pr-subscribers-flang-fir-hlfir Author: Valentin Clement (バレンタイン クレメン) (clementval) ChangesWhen the stream is a scalar constant, it is lowered as i32. Stream needs to be i64 to pass the verifier. Detect and update the stream reference when it is i32. Full diff: https://github.com/llvm/llvm-project/pull/157957.diff 2 Files Affected:
diff --git a/flang/lib/Lower/ConvertCall.cpp b/flang/lib/Lower/ConvertCall.cpp
index 3951401ebed37..fbbc8b1fa83cb 100644
--- a/flang/lib/Lower/ConvertCall.cpp
+++ b/flang/lib/Lower/ConvertCall.cpp
@@ -639,9 +639,18 @@ Fortran::lower::genCallOpAndResult(
caller.getCallDescription().chevrons()[2], stmtCtx)));
mlir::Value stream; // stream is optional.
- if (caller.getCallDescription().chevrons().size() > 3)
+ if (caller.getCallDescription().chevrons().size() > 3) {
stream = fir::getBase(converter.genExprAddr(
caller.getCallDescription().chevrons()[3], stmtCtx));
+ if (!fir::unwrapRefType(stream.getType()).isInteger(64)) {
+ auto i64Ty = mlir::IntegerType::get(builder.getContext(), 64);
+ mlir::Value newStream = fir::AllocaOp::create(builder, loc, i64Ty);
+ mlir::Value load = fir::LoadOp::create(builder, loc, stream);
+ mlir::Value conv = fir::ConvertOp::create(builder, loc, i64Ty, load);
+ fir::StoreOp::create(builder, loc, conv, newStream);
+ stream = newStream;
+ }
+ }
cuf::KernelLaunchOp::create(builder, loc, funcType.getResults(),
funcSymbolAttr, grid_x, grid_y, grid_z, block_x,
diff --git a/flang/test/Lower/CUDA/cuda-stream.cuf b/flang/test/Lower/CUDA/cuda-stream.cuf
new file mode 100644
index 0000000000000..a58ab4ed4235a
--- /dev/null
+++ b/flang/test/Lower/CUDA/cuda-stream.cuf
@@ -0,0 +1,15 @@
+! RUN: bbc -emit-hlfir -fcuda %s -o - | FileCheck %s
+
+attributes(global) subroutine sharedmem()
+ real, shared :: s(*)
+ integer :: t
+ t = threadIdx%x
+ s(t) = t
+end subroutine
+
+program test
+ call sharedmem<<<1, 1, 1024, 0>>>()
+end
+
+! CHECK-LABEL: func.func @_QQmain()
+! CHECK: cuf.kernel_launch @_QPsharedmem<<<%c1{{.*}}, %c1{{.*}}, %c1{{.*}}, %c1{{.*}}, %c1{{.*}}, %c1{{.*}}, %c1024{{.*}}, %{{.*}} : !fir.ref<i64>>>>()
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder why the stream operand is made a reference while all the other operands are just values. Maybe it will be more consitent to make it an optional AnyInteger operand, then you do not have to do anything special in lowering, and instead hide all the details of the kernel launch inside the cuf.kernel_launch conversion.
Just a thought. I am not suggesting changing it in this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Stream has to be a i64 reference because it can be written to as well with different cuda API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean it can be written by cuf.kernel_launch operation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In CUDA Fortran stream is represented as a i64 integer. There is a special kind for it cuda_stream_kind. It can be written to by other API such as cudaStreamCreate not directly by cuf.kernel_launch. So for consistency I think it is better to keep the restriction here. It is quite rare to pass the stream as a constant like in this test. Usually the user has a local variable and use the proper API to create the stream and pass it to the launch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. To me the consistency of the dialect operation is more important than following to the letter the source APIs, so I would prefer just loading the value of stream before cuf.kernel_launch. But this is just my preference :)
Thanks for the change!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see your point and I think I like it also to have the op taking an integer value instead of a reference. I would need to change the kernel launch API as well since it is currently taking a reference.
I'm gonna make this change in a follow up patch.
eb22736 to
88c2832
Compare
When the stream is a scalar constant, it is lowered as i32. Stream needs to be i64 to pass the verifier. Detect and update the stream reference when it is i32.