
Performance regression in tuples getting copied to functions that only read from them. #15277

Closed · KristofferC opened this issue Feb 28, 2016 · 3 comments · Fixed by #15312

@KristofferC (Member)

There is a performance regression in what I first believed was tuple indexing, then function call overhead; I now believe it is tuples sometimes getting copied into functions that previously got passed by reference. It comes from the giant codegen rewrite PR.

Test script (note the @noinline):

# Accessor kept out-of-line so the call (and its argument passing) is measured.
@noinline function get_idx_tuple(n::NTuple, i::Int)
    @inbounds v = n[i]
    return v
end


function bench{N, T}(tuple::NTuple{N, T})
    s = 0.0
    # Time 10^5 sweeps over the tuple through the @noinline accessor.
    t_tup = @elapsed for n in 1:10^5
        for i in 1:N
            s += get_idx_tuple(tuple, i)
        end
    end
    return t_tup
end


tuple = (rand(75)...)

bench(tuple)

This typically looks like this for different sizes of the tuple:

[figure: benchmark timings across different tuple sizes]
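
A sketch of how such a size sweep might be gathered (the lengths here are illustrative, not the exact ones benchmarked):

# Time the benchmark for a few tuple lengths.
for len in (5, 10, 25, 50, 75)
    t = (rand(len)...)          # splat into an NTuple{len, Float64}
    println("N = ", len, ": ", bench(t), " seconds")
end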

Full bisect log: https://gist.github.com/KristofferC/1ba13fe374897e4f7f98

Relevant:

# possible first bad commit: [5c5d48369a1a8f9a5959699ec44cab393255d0c5] [wip codegen rewrite] fix another type information mismatch error in assignment
# possible first bad commit: [548075efe73083f6a58b97814ea558bdbc0db41d] [wip codegen rewrite] fix mixed-type intrinsic usage (such as `bswap(Float64)`)
# possible first bad commit: [322878d93879f81cdf3d6a783376ad149b10d65b] [wip codegen rewrite] fix select_value validation and missing value copy in variable assignment
# possible first bad commit: [e4dcff39221571b8fc59d25873de092edaf46768] [wip codegen rewrite] fix isVolatile marking and closure memloc allocation marking
# possible first bad commit: [8b98aa0fe4056915b69176c0eca23391a77f2f58] [wip codegen rewrite] fix marking of varinfo.value when type_is_ghost and recursive constant struct creation
# possible first bad commit: [ffc200620862e04d6e4d5b9363ea03130276deca] [wip codegen rewrite] cleanup closure env handling to occur early
# possible first bad commit: [ae1a623634aed1912b8fb12a624529f3108fe418] [wip codegen rewrite] use correct macro test for jl_is_immutable
# possible first bad commit: [fec6aa9048d2e54f3c178b379db96074f1d867be] [wip] codegen rewrite + rewrite intrinsics to accomodate

cc @vtjnash

@KristofferC (Member, Author)

It is possible that this has nothing to do with tuples and instead comes down to whether or not the call gets inlined...

@KristofferC (Member, Author)

Yes, tuples are actually faster on 0.5 than on 0.4 when the getindex is inlined, so this is probably not a tuple problem.
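
For reference, the inlined comparison presumably just drops the @noinline (a sketch; the name get_idx_tuple_inlined is illustrative):

# Same body as the test function above, but without @noinline the call is
# inlined into bench's inner loop, so no tuple argument is passed at all.
function get_idx_tuple_inlined(n::NTuple, i::Int)
    @inbounds v = n[i]
    return v
end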

Inlined:
[figure: benchmark timings with getindex inlined]

KristofferC changed the title from “Performance regression in tuple indexing from codegen rewrite” to “Performance regression in function call overhead from codegen rewrite” on Feb 28, 2016
@KristofferC (Member, Author)

Some updates:

Using

@noinline function get_idx_tuple(n::NTuple, i::Int)
    return 1
end

shows the same behavior, so it is not the actual indexing into the tuple that is the problem.

Looking at the LLVM IR shows a big difference.

After codegen rewrite:

julia> @code_llvm bench(tuple)

define double @julia_bench_22447([10 x double]*) #0 {
top:
  %1 = alloca [10 x double], align 8
  %2 = call i64 inttoptr (i64 140084164784448 to i64 ()*)()
<A BUNCH OF ALLOCS>
  br label %if

L.loopexit.loopexit:                              ; preds = %if5
  br label %L.loopexit

L.loopexit:                                       ; preds = %L.loopexit.loopexit, %if
  %15 = add i64 %"#s8.012", 1
  %16 = icmp eq i64 %"#s8.012", %3
  br i1 %16, label %L4.loopexit, label %if

L4.loopexit:                                      ; preds = %L.loopexit
  br label %L4

L4:                                               ; preds = %L4.loopexit, %top
  %17 = call i64 inttoptr (i64 140084164784448 to i64 ()*)()
  %18 = sub i64 %17, %2
  %19 = uitofp i64 %18 to double
  %20 = fdiv double %19, 1.000000e+09
  ret double %20

if:                                               ; preds = %if.lr.ph, %L.loopexit
  %"#s8.012" = phi i64 [ 1, %if.lr.ph ], [ %15, %L.loopexit ]
  %21 = load i64, i64* inttoptr (i64 140075470594608 to i64*), align 16
  %22 = icmp eq i64 %21, 0
  br i1 %22, label %L.loopexit, label %if5.preheader

if5.preheader:                                    ; preds = %if
  br label %if5

if5:                                              ; preds = %if5.preheader, %if5
  %"#s7.010" = phi i64 [ %23, %if5 ], [ 1, %if5.preheader ]
  %23 = add i64 %"#s7.010", 1
  %24 = load [10 x double], [10 x double]* %0, align 8
  %25 = extractvalue [10 x double] %24, 0
  store double %25, double* %5, align 8
  %26 = extractvalue [10 x double] %24, 1
  store double %26, double* %6, align 8
<A BUNCH OF STORES>
  %35 = call i64 @julia_get_idx_tuple_22448([10 x double]* nonnull %1, i64 %"#s7.010") #0
  %36 = icmp eq i64 %"#s7.010", %21
  br i1 %36, label %L.loopexit.loopexit, label %if5
}

Before:

julia> @code_llvm bench(tuple)

define double @julia_bench_21225([75 x double]*) {
top:
  %tuple = alloca [75 x double], align 8
  %1 = load [75 x double]* %0, align 8
  store [75 x double] %1, [75 x double]* %tuple, align 8
  %2 = call i64 inttoptr (i64 139864792730528 to i64 ()*)()
  %3 = call i64 @julia_power_by_squaring3285(i64 10, i64 5)
  %4 = icmp sgt i64 %3, 0
  %5 = select i1 %4, i64 %3, i64 0
  %6 = icmp eq i64 %5, 0
  br i1 %6, label %L7, label %L

L:                                                ; preds = %L5, %top
  %"#s4.0" = phi i64 [ %10, %L5 ], [ 1, %top ]
  br label %L2

L2:                                               ; preds = %L2, %L
  %"#s3.0" = phi i64 [ 1, %L ], [ %7, %L2 ]
  %7 = add i64 %"#s3.0", 1
  %8 = call i64 @julia_get_idx_tuple_21214([75 x double]* %tuple, i64 %"#s3.0")
  %9 = icmp eq i64 %"#s3.0", 75
  br i1 %9, label %L5, label %L2

L5:                                               ; preds = %L2
  %10 = add i64 %"#s4.0", 1
  %11 = icmp eq i64 %"#s4.0", %5
  br i1 %11, label %L7, label %L

L7:                                               ; preds = %L5, %top
  %12 = call i64 inttoptr (i64 139864792730528 to i64 ()*)()
  %13 = sub i64 %12, %2
  %14 = uitofp i64 %13 to double
  %15 = fdiv double %14, 1.000000e+09
  ret double %15
}

I see now that this might just be something similar to #13305, since it seems the tuple gets copied when it is passed to the get_idx_tuple function: in the if5 block above, the whole [10 x double] is loaded and stored element by element into a fresh stack slot before every call, and the callee is handed a pointer to that copy. However, #13305 behaves the same on 0.4.3 and 0.5 while this doesn't. Maybe something was hardcoded for tuples on 0.4.3.
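
One way to confirm that the copy is made on the caller side is to inspect the callee in isolation (a diagnostic sketch; the exact output varies by build):

# The compiled callee already takes the tuple by pointer ([75 x double]*),
# so the per-iteration load/store sequence must be emitted at the call site
# inside bench, not inside get_idx_tuple itself.
@code_llvm get_idx_tuple(tuple, 1)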

KristofferC changed the title from “Performance regression in function call overhead from codegen rewrite” to “Performance regression in tuples getting copied to functions that only read from them.” on Feb 28, 2016
carnaval added a commit that referenced this issue on Mar 1, 2016:
- Do not copy a getfield argument if it is already a pointer.
- Mark struct-as-pointer arguments immutable

Should: fix #15274, fix #13305, fix #15277