Add support for @cuda fastmath #2030
Codecov Report
Patch coverage has no change; project coverage change: -0.12%.

@@            Coverage Diff             @@
##           master    #2030      +/-   ##
==========================================
- Coverage   61.04%   60.92%   -0.12%
==========================================
  Files         152      152
  Lines       13291    13297       +6
==========================================
- Hits         8113     8101      -12
- Misses       5178     5196      +18
I've added the test here: Zentrik@e87fa74. I don't think I can commit directly to this PR.
Thanks. Guess that test doesn't work on CUDA 11.0, where we generate:
I haven't been able to figure out why we don't generate a |
The reason is the definition of __nv_sqrtf in libdevice. CUDA 11.0:

define float @__nv_sqrtf(float %x) #0 {
  %1 = call float @llvm.nvvm.sqrt.f(float %x)
  ret float %1
}

... vs CUDA 11.1:

define float @__nv_sqrtf(float %x) #0 {
  %1 = call i32 @__nvvm_reflect(i8* getelementptr inbounds ([11 x i8], [11 x i8]* @.str, i32 0, i32 0)) #6
  %2 = icmp ne i32 %1, 0
  br i1 %2, label %3, label %10

3:                                                ; preds = %0
  %4 = call i32 @__nvvm_reflect(i8* getelementptr inbounds ([17 x i8], [17 x i8]* @.str.2, i32 0, i32 0)) #6
  %5 = icmp ne i32 %4, 0
  br i1 %5, label %6, label %8

6:                                                ; preds = %3
  %7 = call float @llvm.nvvm.sqrt.rn.ftz.f(float %x) #6
  br label %__nvvm_sqrt_f.exit

8:                                                ; preds = %3
  %9 = call float @llvm.nvvm.sqrt.approx.ftz.f(float %x) #6
  br label %__nvvm_sqrt_f.exit

10:                                               ; preds = %0
  %11 = call i32 @__nvvm_reflect(i8* getelementptr inbounds ([17 x i8], [17 x i8]* @.str.2, i32 0, i32 0)) #6
  %12 = icmp ne i32 %11, 0
  br i1 %12, label %13, label %15

13:                                               ; preds = %10
  %14 = call float @llvm.nvvm.sqrt.rn.f(float %x) #6
  br label %__nvvm_sqrt_f.exit

15:                                               ; preds = %10
  %16 = call float @llvm.nvvm.sqrt.approx.f(float %x) #6
  br label %__nvvm_sqrt_f.exit

__nvvm_sqrt_f.exit:                               ; preds = %6, %8, %13, %15
  %.0 = phi float [ %7, %6 ], [ %9, %8 ], [ %14, %13 ], [ %16, %15 ]
  ret float %.0
}

So yeah, this is expected.
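For readers skimming the CUDA 11.1 IR, the four-way branch can be summarised as a small decision table. The sketch below is only a model of the control flow shown above, not code from CUDA.jl or GPUCompiler.jl. The function name `select_sqrt_intrinsic` is hypothetical, and the flag names are an assumption: the IR only shows `@.str` (11 bytes) and `@.str.2` (17 bytes), which would match the NUL-terminated strings `__CUDA_FTZ` and `__CUDA_PREC_SQRT`, but the thread does not confirm this.

```python
def select_sqrt_intrinsic(ftz: bool, prec_sqrt: bool) -> str:
    """Model the branch structure of __nv_sqrtf in the CUDA 11.1 IR above.

    ftz       -- result of the first __nvvm_reflect call (@.str),
                 assumed here to be the "__CUDA_FTZ" flag.
    prec_sqrt -- result of the second __nvvm_reflect call (@.str.2),
                 assumed here to be the "__CUDA_PREC_SQRT" flag.
    """
    if ftz:
        # Blocks 6 and 8 in the IR: flush-to-zero variants.
        return "llvm.nvvm.sqrt.rn.ftz.f" if prec_sqrt else "llvm.nvvm.sqrt.approx.ftz.f"
    # Blocks 13 and 15 in the IR: non-FTZ variants.
    return "llvm.nvvm.sqrt.rn.f" if prec_sqrt else "llvm.nvvm.sqrt.approx.f"


if __name__ == "__main__":
    # Print the full decision table for the two reflect flags.
    for ftz in (False, True):
        for prec_sqrt in (False, True):
            print(f"ftz={ftz!s:5} prec_sqrt={prec_sqrt!s:5} -> "
                  f"{select_sqrt_intrinsic(ftz, prec_sqrt)}")
```

The CUDA 11.0 body has no `__nvvm_reflect` calls at all and always calls `llvm.nvvm.sqrt.f`, so there is no flag for the compiler to resolve there. That would explain why a test looking for an approximate square root only passes on the newer libdevice.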
Fixes JuliaGPU/GPUCompiler.jl#491, ref JuliaGPU/GPUCompiler.jl#492
@Zentrik Can you add a test?