
Function passed as an argument of a function is slow to broadcast() #20282

Closed
wsshin opened this issue Jan 27, 2017 · 5 comments
@wsshin (Contributor) commented Jan 27, 2017

For example, executing apply_f(f, A) = broadcast(f, A) is much slower than executing broadcast(f, A) directly:

julia> apply_f(f, A) = broadcast(f, A);

julia> f(x) = 2x;

julia> A = rand(10,10);

julia> using BenchmarkTools

julia> @benchmark apply_f($f, $A)
BenchmarkTools.Trial:
  memory estimate:  1.97 kb
  allocs estimate:  25
  median time:      18.385 μs (0.00% GC)

julia> @benchmark broadcast($f, $A)
BenchmarkTools.Trial:
  memory estimate:  896.00 bytes
  allocs estimate:  1
  median time:      118.314 ns (0.00% GC)

Here is the version info:

julia> versioninfo()
Julia Version 0.6.0-dev.2338
Commit 6f9fe0bd1 (2017-01-25 01:55 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin16.4.0)
  CPU: Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell)
  LAPACK: libopenblas64_
  LIBM: libopenlibm
  LLVM: libLLVM-3.9.1 (ORCJIT, skylake)
@KristofferC (Member) commented:
Sometimes you need to force specialization to take place:

julia> apply_f2{F}(f::F, A) = broadcast(f, A);

julia> @benchmark apply_f2($f, $A)
BenchmarkTools.Trial: 
  memory estimate:  896.00 bytes
  allocs estimate:  1
  --------------
  minimum time:     95.070 ns (0.00% GC)
  median time:      112.688 ns (0.00% GC)
  mean time:        132.656 ns (12.22% GC)
  maximum time:     845.149 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     957
  time tolerance:   5.00%
  memory tolerance: 1.00%

@Sacha0 (Member) commented Jan 28, 2017

Ah, good catch @KristofferC! This is #19137. Best!
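[Editor's note, not part of the original thread: the pattern above uses the pre-0.6 `{F}` method syntax. On current Julia the same fix is written with `where`; the type parameter in the signature is what forces the compiler to specialize on the function argument. A minimal sketch under that assumption:]

```julia
# Without a type parameter, Julia may avoid specializing on the function
# argument `f`, leading to dynamic dispatch inside broadcast:
apply_f(f, A) = broadcast(f, A)

# Exposing the function's type as a method parameter forces specialization
# (modern `where` syntax, equivalent to the old apply_f2{F}(f::F, A) form):
apply_f2(f::F, A) where {F} = broadcast(f, A)

g(x) = 2x
A = rand(10, 10)

# Both return the same result; only compilation/dispatch behavior differs.
@assert apply_f(g, A) == apply_f2(g, A) == 2 .* A
```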

Sacha0 closed this as completed Jan 28, 2017
@wsshin (Contributor, Author) commented Jan 29, 2017

Thanks! The explanation in #19137 is clear.

However, there still seems to be a problem. When the dot syntax is used instead of broadcast(), execution remains slow even when specialization is forced:

julia> apply_fdot(f, A) = f.(A)
apply_fdot (generic function with 1 method)

julia> apply_fdot2{F}(f::F, A) = f.(A)
apply_fdot2 (generic function with 1 method)

julia> @benchmark apply_fdot(f, A)
BenchmarkTools.Trial:
  memory estimate:  1.19 kb
  allocs estimate:  25
  median time:      10.514 μs (0.00% GC)

julia> @benchmark apply_fdot2(f, A)
BenchmarkTools.Trial:
  memory estimate:  1.19 kb
  allocs estimate:  25
  median time:      11.395 μs (0.00% GC)

Maybe this has been discussed already as well?

@Sacha0 (Member) commented Jan 29, 2017

Interpolating f and A into the benchmarks eliminates the discrepancy:

julia> using BenchmarkTools

julia> f(x) = 2x;

julia> A = rand(10,10);

julia> apply_fdot(f, A) = f.(A)
apply_fdot (generic function with 1 method)

julia> apply_fdot2{F}(f::F, A) = f.(A)
apply_fdot2 (generic function with 1 method)

julia> apply_fdot3{F}(f::F, A) = broadcast(f, A)
apply_fdot3 (generic function with 1 method)

julia> @benchmark apply_fdot($f, $A)
BenchmarkTools.Trial:
  samples:          10000
  evals/sample:     1
  time tolerance:   5.00%
  memory tolerance: 1.00%
  memory estimate:  1.97 kb
  allocs estimate:  25
  minimum time:     21.51 μs (0.00% GC)
  median time:      22.23 μs (0.00% GC)
  mean time:        23.46 μs (0.00% GC)
  maximum time:     129.07 μs (0.00% GC)

julia> @benchmark apply_fdot2(f, A)
BenchmarkTools.Trial:
  samples:          10000
  evals/sample:     1
  time tolerance:   5.00%
  memory tolerance: 1.00%
  memory estimate:  1.97 kb
  allocs estimate:  25
  minimum time:     22.18 μs (0.00% GC)
  median time:      23.02 μs (0.00% GC)
  mean time:        24.76 μs (0.00% GC)
  maximum time:     118.79 μs (0.00% GC)

julia> @benchmark apply_fdot2($f, $A)
BenchmarkTools.Trial:
  samples:          10000
  evals/sample:     886
  time tolerance:   5.00%
  memory tolerance: 1.00%
  memory estimate:  896.00 bytes
  allocs estimate:  1
  minimum time:     130.00 ns (0.00% GC)
  median time:      135.00 ns (0.00% GC)
  mean time:        154.72 ns (8.82% GC)
  maximum time:     1.43 μs (84.16% GC)

julia> @benchmark apply_fdot3($f, $A)
BenchmarkTools.Trial:
  samples:          10000
  evals/sample:     888
  time tolerance:   5.00%
  memory tolerance: 1.00%
  memory estimate:  896.00 bytes
  allocs estimate:  1
  minimum time:     128.00 ns (0.00% GC)
  median time:      133.00 ns (0.00% GC)
  mean time:        153.16 ns (8.80% GC)
  maximum time:     1.40 μs (85.15% GC)

julia> versioninfo()
Julia Version 0.6.0-dev.2374
Commit 876549f (2017-01-25 22:41 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin15.6.0)
  CPU: Intel(R) Core(TM) i7-3520M CPU @ 2.90GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Sandybridge)
  LAPACK: libopenblas64_
  LIBM: libopenlibm
  LLVM: libLLVM-3.9.1 (ORCJIT, ivybridge)

Best!
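[Editor's note, a sketch of why interpolation matters, not part of the original comment: without `$`, the benchmarked expression reads `f` and `A` as non-constant globals, so every evaluation pays for a global lookup and dynamic dispatch; `$` splices the values into the benchmark expression so the timed code treats them as constants.]

```julia
using BenchmarkTools

h(x) = 2x
A = rand(10, 10)
apply_fdot(f, A) = f.(A)

# Non-interpolated: `h` and `A` are non-constant globals at benchmark time,
# so the measured time includes lookup and dynamic dispatch overhead.
@benchmark apply_fdot(h, A)

# Interpolated: the values of `h` and `A` are spliced in as constants,
# so the call itself is what gets measured.
@benchmark apply_fdot($h, $A)
```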

@wsshin (Contributor, Author) commented Jan 29, 2017

Of course. I hate myself for making this same mistake again!
