Performance regression in tuples getting copied to functions that only read from them. #15277
It is possible that this has nothing to do with tuples but instead has something to do with inlining or not inlining...
Some updates: using

```julia
@noinline function get_idx_tuple(n::NTuple, i::Int)
    return 1
end
```

shows the same behavior, so it is not the actual indexing into the tuple that is the problem. Looking at the LLVM IR shows a big difference.

After the codegen rewrite:

```llvm
julia> @code_llvm bench(tuple)

define double @julia_bench_22447([10 x double]*) #0 {
top:
  %1 = alloca [10 x double], align 8
  %2 = call i64 inttoptr (i64 140084164784448 to i64 ()*)()
  <A BUNCH OF ALLOCS>
  br label %if
L.loopexit.loopexit:                        ; preds = %if5
  br label %L.loopexit
L.loopexit:                                 ; preds = %L.loopexit.loopexit, %if
  %15 = add i64 %"#s8.012", 1
  %16 = icmp eq i64 %"#s8.012", %3
  br i1 %16, label %L4.loopexit, label %if
L4.loopexit:                                ; preds = %L.loopexit
  br label %L4
L4:                                         ; preds = %L4.loopexit, %top
  %17 = call i64 inttoptr (i64 140084164784448 to i64 ()*)()
  %18 = sub i64 %17, %2
  %19 = uitofp i64 %18 to double
  %20 = fdiv double %19, 1.000000e+09
  ret double %20
if:                                         ; preds = %if.lr.ph, %L.loopexit
  %"#s8.012" = phi i64 [ 1, %if.lr.ph ], [ %15, %L.loopexit ]
  %21 = load i64, i64* inttoptr (i64 140075470594608 to i64*), align 16
  %22 = icmp eq i64 %21, 0
  br i1 %22, label %L.loopexit, label %if5.preheader
if5.preheader:                              ; preds = %if
  br label %if5
if5:                                        ; preds = %if5.preheader, %if5
  %"#s7.010" = phi i64 [ %23, %if5 ], [ 1, %if5.preheader ]
  %23 = add i64 %"#s7.010", 1
  %24 = load [10 x double], [10 x double]* %0, align 8
  %25 = extractvalue [10 x double] %24, 0
  store double %25, double* %5, align 8
  %26 = extractvalue [10 x double] %24, 1
  store double %26, double* %6, align 8
  <A BUNCH OF STORES>
  %35 = call i64 @julia_get_idx_tuple_22448([10 x double]* nonnull %1, i64 %"#s7.010") #0
  %36 = icmp eq i64 %"#s7.010", %21
  br i1 %36, label %L.loopexit.loopexit, label %if5
}
```

Before:

```llvm
julia> @code_llvm bench(tuple)

define double @julia_bench_21225([75 x double]*) {
top:
  %tuple = alloca [75 x double], align 8
  %1 = load [75 x double]* %0, align 8
  store [75 x double] %1, [75 x double]* %tuple, align 8
  %2 = call i64 inttoptr (i64 139864792730528 to i64 ()*)()
  %3 = call i64 @julia_power_by_squaring3285(i64 10, i64 5)
  %4 = icmp sgt i64 %3, 0
  %5 = select i1 %4, i64 %3, i64 0
  %6 = icmp eq i64 %5, 0
  br i1 %6, label %L7, label %L
L:                                          ; preds = %L5, %top
  %"#s4.0" = phi i64 [ %10, %L5 ], [ 1, %top ]
  br label %L2
L2:                                         ; preds = %L2, %L
  %"#s3.0" = phi i64 [ 1, %L ], [ %7, %L2 ]
  %7 = add i64 %"#s3.0", 1
  %8 = call i64 @julia_get_idx_tuple_21214([75 x double]* %tuple, i64 %"#s3.0")
  %9 = icmp eq i64 %"#s3.0", 75
  br i1 %9, label %L5, label %L2
L5:                                         ; preds = %L2
  %10 = add i64 %"#s4.0", 1
  %11 = icmp eq i64 %"#s4.0", %5
  br i1 %11, label %L7, label %L
L7:                                         ; preds = %L5, %top
  %12 = call i64 inttoptr (i64 139864792730528 to i64 ()*)()
  %13 = sub i64 %12, %2
  %14 = uitofp i64 %13 to double
  %15 = fdiv double %14, 1.000000e+09
  ret double %15
}
```

I see now that this might just be something similar to #13305, since it seems the tuple gets copied when it is passed to `get_idx_tuple`. Note that in the new IR the whole `[10 x double]` is loaded and stored element by element inside the inner loop before each call, while in the old IR the tuple is copied once at function entry and then `%tuple` is passed by pointer.
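For anyone wanting to reproduce the inspection above, here is a minimal sketch. The `get_idx_tuple` definition is the one from this thread; the caller name `touch_all` and its loop body are my own assumptions, standing in for the original (not included here) benchmark loop:

```julia
# Trivial @noinline callee that only reads its tuple argument; ideally the
# caller would pass the tuple by reference without copying it first.
@noinline function get_idx_tuple(n::NTuple, i::Int)
    return 1
end

# Hypothetical caller that repeatedly hands the tuple to the callee,
# mirroring the inner loop of the issue's benchmark.
function touch_all(t::NTuple{N,Float64}) where {N}
    s = 0
    for i in 1:N
        s += get_idx_tuple(t, i)
    end
    return s
end

t = ntuple(i -> Float64(i), 10)
touch_all(t)  # returns 10 (the callee returns 1 for each of the 10 indices)

# On an affected build, the caller's IR shows an alloca plus a
# field-by-field copy of the tuple before each call:
# @code_llvm touch_all(t)
```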
There is a performance regression in what I first believed was tuple indexing (then: function call overhead); I now believe it is tuples sometimes getting copied into functions that previously received them by reference, introduced by the giant codegen rewrite PR.

Test script (note the `@noinline`):

This typically looks like this for different sizes of the tuple:
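The test script itself did not survive this copy of the issue. From the IR above one can reconstruct a plausible sketch: the `time_ns`-style calls and the `fdiv` by `1.0e9` suggest elapsed nanoseconds converted to seconds, and the `julia_power_by_squaring3285(i64 10, i64 5)` call suggests `10^5` outer iterations. The exact tuple contents and sizes tried are assumptions:

```julia
# @noinline barrier, as in the issue: the callee only reads (here: ignores)
# the tuple, so a caller-side copy is pure overhead.
@noinline function get_idx_tuple(n::NTuple, i::Int)
    return 1
end

# Reconstructed benchmark: times 10^5 sweeps over the tuple. Any per-call
# tuple copy in the caller shows up directly in the reported seconds.
function bench(t::NTuple{N,Float64}) where {N}
    t0 = time_ns()
    for _ in 1:10^5
        for i in 1:N
            get_idx_tuple(t, i)
        end
    end
    return (time_ns() - t0) / 1e9  # elapsed seconds, as in the IR
end

bench(ntuple(i -> Float64(i), 10))
```

Running `bench` on tuples of increasing length (e.g. 10 vs 75 elements, the two sizes visible in the IR dumps) makes the size-dependent copy cost visible.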
Full bisect log: https://gist.github.com/KristofferC/1ba13fe374897e4f7f98
Relevant:
cc @vtjnash