
Performance regression in tuples getting copied to functions that only read from them. #15277

Closed · KristofferC opened this issue Feb 28, 2016 · 3 comments · Fixed by #15312

@KristofferC (Member)

There is a performance regression in what I first believed was tuple indexing, then function call overhead; I now believe it is tuples sometimes getting copied into functions that previously got passed by reference. It comes from the giant codegen rewrite PR.

Test script (note the @noinline):

# Accessor kept out-of-line so the call (and its argument passing) is measured.
@noinline function get_idx_tuple(n::NTuple, i::Int)
    @inbounds v = n[i]
    return v
end


function bench{N, T}(tuple::NTuple{N, T})
    s = 0.0
    # Time 10^5 sweeps over the tuple through the @noinline accessor.
    t_tup = @elapsed for n in 1:10^5
        for i in 1:N
            s += get_idx_tuple(tuple, i)
        end
    end
    return t_tup
end


tuple = (rand(75)...)

bench(tuple)

This typically looks like this for different sizes of the tuple:

[figure: benchmark timings across different tuple sizes]
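
A sketch of how such a size sweep might be gathered (the lengths here are illustrative, not the exact ones benchmarked):

# Time the benchmark for a few tuple lengths.
for len in (5, 10, 25, 50, 75)
    t = (rand(len)...)          # splat into an NTuple{len, Float64}
    println("N = ", len, ": ", bench(t), " seconds")
end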

Full bisect log: https://gist.github.com/KristofferC/1ba13fe374897e4f7f98

Relevant:

# possible first bad commit: [5c5d48369a1a8f9a5959699ec44cab393255d0c5] [wip codegen rewrite] fix another type information mismatch error in assignment
# possible first bad commit: [548075efe73083f6a58b97814ea558bdbc0db41d] [wip codegen rewrite] fix mixed-type intrinsic usage (such as `bswap(Float64)`)
# possible first bad commit: [322878d93879f81cdf3d6a783376ad149b10d65b] [wip codegen rewrite] fix select_value validation and missing value copy in variable assignment
# possible first bad commit: [e4dcff39221571b8fc59d25873de092edaf46768] [wip codegen rewrite] fix isVolatile marking and closure memloc allocation marking
# possible first bad commit: [8b98aa0fe4056915b69176c0eca23391a77f2f58] [wip codegen rewrite] fix marking of varinfo.value when type_is_ghost and recursive constant struct creation
# possible first bad commit: [ffc200620862e04d6e4d5b9363ea03130276deca] [wip codegen rewrite] cleanup closure env handling to occur early
# possible first bad commit: [ae1a623634aed1912b8fb12a624529f3108fe418] [wip codegen rewrite] use correct macro test for jl_is_immutable
# possible first bad commit: [fec6aa9048d2e54f3c178b379db96074f1d867be] [wip] codegen rewrite + rewrite intrinsics to accomodate

cc @vtjnash

@KristofferC (Member, Author)

It is possible that this has nothing to do with tuples and instead comes down to whether or not the call gets inlined...

@KristofferC (Member, Author)

Yes, tuples are actually faster on 0.5 than on 0.4 when the getindex is inlined, so this is probably not a tuple problem.
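
For reference, the inlined comparison presumably just drops the @noinline (a sketch; the name get_idx_tuple_inlined is illustrative):

# Same body as the test function above, but without @noinline the call is
# inlined into bench's inner loop, so no tuple argument is passed at all.
function get_idx_tuple_inlined(n::NTuple, i::Int)
    @inbounds v = n[i]
    return v
end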

Inlined:
[figure: benchmark timings with getindex inlined]

KristofferC changed the title from “Performance regression in tuple indexing from codegen rewrite” to “Performance regression in function call overhead from codegen rewrite” on Feb 28, 2016
@KristofferC (Member, Author)

Some updates:

Using

@noinline function get_idx_tuple(n::NTuple, i::Int)
    return 1
end

shows the same behavior, so it is not the actual indexing into the tuple that is the problem.

Looking at the LLVM IR shows a big difference.

After codegen rewrite:

julia> @code_llvm bench(tuple)

define double @julia_bench_22447([10 x double]*) #0 {
top:
  %1 = alloca [10 x double], align 8
  %2 = call i64 inttoptr (i64 140084164784448 to i64 ()*)()
<A BUNCH OF ALLOCS>
  br label %if

L.loopexit.loopexit:                              ; preds = %if5
  br label %L.loopexit

L.loopexit:                                       ; preds = %L.loopexit.loopexit, %if
  %15 = add i64 %"#s8.012", 1
  %16 = icmp eq i64 %"#s8.012", %3
  br i1 %16, label %L4.loopexit, label %if

L4.loopexit:                                      ; preds = %L.loopexit
  br label %L4

L4:                                               ; preds = %L4.loopexit, %top
  %17 = call i64 inttoptr (i64 140084164784448 to i64 ()*)()
  %18 = sub i64 %17, %2
  %19 = uitofp i64 %18 to double
  %20 = fdiv double %19, 1.000000e+09
  ret double %20

if:                                               ; preds = %if.lr.ph, %L.loopexit
  %"#s8.012" = phi i64 [ 1, %if.lr.ph ], [ %15, %L.loopexit ]
  %21 = load i64, i64* inttoptr (i64 140075470594608 to i64*), align 16
  %22 = icmp eq i64 %21, 0
  br i1 %22, label %L.loopexit, label %if5.preheader

if5.preheader:                                    ; preds = %if
  br label %if5

if5:                                              ; preds = %if5.preheader, %if5
  %"#s7.010" = phi i64 [ %23, %if5 ], [ 1, %if5.preheader ]
  %23 = add i64 %"#s7.010", 1
  %24 = load [10 x double], [10 x double]* %0, align 8
  %25 = extractvalue [10 x double] %24, 0
  store double %25, double* %5, align 8
  %26 = extractvalue [10 x double] %24, 1
  store double %26, double* %6, align 8
<A BUNCH OF STORES>
  %35 = call i64 @julia_get_idx_tuple_22448([10 x double]* nonnull %1, i64 %"#s7.010") #0
  %36 = icmp eq i64 %"#s7.010", %21
  br i1 %36, label %L.loopexit.loopexit, label %if5
}

Before:

julia> @code_llvm bench(tuple)

define double @julia_bench_21225([75 x double]*) {
top:
  %tuple = alloca [75 x double], align 8
  %1 = load [75 x double]* %0, align 8
  store [75 x double] %1, [75 x double]* %tuple, align 8
  %2 = call i64 inttoptr (i64 139864792730528 to i64 ()*)()
  %3 = call i64 @julia_power_by_squaring3285(i64 10, i64 5)
  %4 = icmp sgt i64 %3, 0
  %5 = select i1 %4, i64 %3, i64 0
  %6 = icmp eq i64 %5, 0
  br i1 %6, label %L7, label %L

L:                                                ; preds = %L5, %top
  %"#s4.0" = phi i64 [ %10, %L5 ], [ 1, %top ]
  br label %L2

L2:                                               ; preds = %L2, %L
  %"#s3.0" = phi i64 [ 1, %L ], [ %7, %L2 ]
  %7 = add i64 %"#s3.0", 1
  %8 = call i64 @julia_get_idx_tuple_21214([75 x double]* %tuple, i64 %"#s3.0")
  %9 = icmp eq i64 %"#s3.0", 75
  br i1 %9, label %L5, label %L2

L5:                                               ; preds = %L2
  %10 = add i64 %"#s4.0", 1
  %11 = icmp eq i64 %"#s4.0", %5
  br i1 %11, label %L7, label %L

L7:                                               ; preds = %L5, %top
  %12 = call i64 inttoptr (i64 139864792730528 to i64 ()*)()
  %13 = sub i64 %12, %2
  %14 = uitofp i64 %13 to double
  %15 = fdiv double %14, 1.000000e+09
  ret double %15
}

I see now that this might just be something similar to #13305, since it seems the tuple gets copied when it is passed to the get_idx_tuple function: in the if5 block above, the whole [10 x double] is loaded and stored element by element into a fresh stack slot before every call, and the callee is handed a pointer to that copy. However, #13305 behaves the same on 0.4.3 and 0.5 while this doesn't. Maybe something was hardcoded for tuples on 0.4.3.
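
One way to confirm that the copy is made on the caller side is to inspect the callee in isolation (a diagnostic sketch; the exact output varies by build):

# The compiled callee already takes the tuple by pointer ([75 x double]*),
# so the per-iteration load/store sequence must be emitted at the call site
# inside bench, not inside get_idx_tuple itself.
@code_llvm get_idx_tuple(tuple, 1)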

KristofferC changed the title from “Performance regression in function call overhead from codegen rewrite” to “Performance regression in tuples getting copied to functions that only read from them.” on Feb 28, 2016
carnaval added a commit that referenced this issue on Mar 1, 2016:
- Do not copy a getfield argument if it is already a pointer.
- Mark struct-as-pointer arguments immutable

Should: fix #15274, fix #13305, fix #15277