
Function passed as an argument of a function is slow to broadcast() #20282

Closed
wsshin opened this issue Jan 27, 2017 · 5 comments
@wsshin (Contributor) commented Jan 27, 2017

For example, executing apply_f(f, A) = broadcast(f, A) is much slower than executing broadcast(f, A) directly:

julia> apply_f(f, A) = broadcast(f, A);

julia> f(x) = 2x;

julia> A = rand(10,10);

julia> using BenchmarkTools

julia> @benchmark apply_f($f, $A)
BenchmarkTools.Trial:
  memory estimate:  1.97 kb
  allocs estimate:  25
  median time:      18.385 μs (0.00% GC)

julia> @benchmark broadcast($f, $A)
BenchmarkTools.Trial:
  memory estimate:  896.00 bytes
  allocs estimate:  1
  median time:      118.314 ns (0.00% GC)

Here is the version info:

julia> versioninfo()
Julia Version 0.6.0-dev.2338
Commit 6f9fe0bd1 (2017-01-25 01:55 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin16.4.0)
  CPU: Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell)
  LAPACK: libopenblas64_
  LIBM: libopenlibm
  LLVM: libLLVM-3.9.1 (ORCJIT, skylake)
@KristofferC (Member) commented:
Sometimes you need to force specialization to take place:

julia> apply_f2{F}(f::F, A) = broadcast(f, A);

julia> @benchmark apply_f2($f, $A)
BenchmarkTools.Trial: 
  memory estimate:  896.00 bytes
  allocs estimate:  1
  --------------
  minimum time:     95.070 ns (0.00% GC)
  median time:      112.688 ns (0.00% GC)
  mean time:        132.656 ns (12.22% GC)
  maximum time:     845.149 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     957
  time tolerance:   5.00%
  memory tolerance: 1.00%

@Sacha0 (Member) commented Jan 28, 2017

Ah, good catch @KristofferC! This is #19137. Best!
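[Editor's note, not part of the original thread: the pattern above uses the pre-0.6 `{F}` method syntax. On current Julia the same fix is written with `where`; the type parameter in the signature is what forces the compiler to specialize on the function argument. A minimal sketch under that assumption:]

```julia
# Without a type parameter, Julia may avoid specializing on the function
# argument `f`, leading to dynamic dispatch inside broadcast:
apply_f(f, A) = broadcast(f, A)

# Exposing the function's type as a method parameter forces specialization
# (modern `where` syntax, equivalent to the old apply_f2{F}(f::F, A) form):
apply_f2(f::F, A) where {F} = broadcast(f, A)

g(x) = 2x
A = rand(10, 10)

# Both return the same result; only compilation/dispatch behavior differs.
@assert apply_f(g, A) == apply_f2(g, A) == 2 .* A
```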

Sacha0 closed this as completed Jan 28, 2017
@wsshin (Contributor, Author) commented Jan 29, 2017

Thanks! The explanation in #19137 is clear.

However, there still seems to be a problem. When the dot syntax is used instead of broadcast(), execution remains slow even when specialization is forced:

julia> apply_fdot(f, A) = f.(A)
apply_fdot (generic function with 1 method)

julia> apply_fdot2{F}(f::F, A) = f.(A)
apply_fdot2 (generic function with 1 method)

julia> @benchmark apply_fdot(f, A)
BenchmarkTools.Trial:
  memory estimate:  1.19 kb
  allocs estimate:  25
  median time:      10.514 μs (0.00% GC)

julia> @benchmark apply_fdot2(f, A)
BenchmarkTools.Trial:
  memory estimate:  1.19 kb
  allocs estimate:  25
  median time:      11.395 μs (0.00% GC)

Maybe this has been discussed already as well?

@Sacha0 (Member) commented Jan 29, 2017

Interpolating f and A into the benchmarks eliminates the discrepancy:

julia> using BenchmarkTools

julia> f(x) = 2x;

julia> A = rand(10,10);

julia> apply_fdot(f, A) = f.(A)
apply_fdot (generic function with 1 method)

julia> apply_fdot2{F}(f::F, A) = f.(A)
apply_fdot2 (generic function with 1 method)

julia> apply_fdot3{F}(f::F, A) = broadcast(f, A)
apply_fdot3 (generic function with 1 method)

julia> @benchmark apply_fdot($f, $A)
BenchmarkTools.Trial:
  samples:          10000
  evals/sample:     1
  time tolerance:   5.00%
  memory tolerance: 1.00%
  memory estimate:  1.97 kb
  allocs estimate:  25
  minimum time:     21.51 μs (0.00% GC)
  median time:      22.23 μs (0.00% GC)
  mean time:        23.46 μs (0.00% GC)
  maximum time:     129.07 μs (0.00% GC)

julia> @benchmark apply_fdot2(f, A)
BenchmarkTools.Trial:
  samples:          10000
  evals/sample:     1
  time tolerance:   5.00%
  memory tolerance: 1.00%
  memory estimate:  1.97 kb
  allocs estimate:  25
  minimum time:     22.18 μs (0.00% GC)
  median time:      23.02 μs (0.00% GC)
  mean time:        24.76 μs (0.00% GC)
  maximum time:     118.79 μs (0.00% GC)

julia> @benchmark apply_fdot2($f, $A)
BenchmarkTools.Trial:
  samples:          10000
  evals/sample:     886
  time tolerance:   5.00%
  memory tolerance: 1.00%
  memory estimate:  896.00 bytes
  allocs estimate:  1
  minimum time:     130.00 ns (0.00% GC)
  median time:      135.00 ns (0.00% GC)
  mean time:        154.72 ns (8.82% GC)
  maximum time:     1.43 μs (84.16% GC)

julia> @benchmark apply_fdot3($f, $A)
BenchmarkTools.Trial:
  samples:          10000
  evals/sample:     888
  time tolerance:   5.00%
  memory tolerance: 1.00%
  memory estimate:  896.00 bytes
  allocs estimate:  1
  minimum time:     128.00 ns (0.00% GC)
  median time:      133.00 ns (0.00% GC)
  mean time:        153.16 ns (8.80% GC)
  maximum time:     1.40 μs (85.15% GC)

julia> versioninfo()
Julia Version 0.6.0-dev.2374
Commit 876549f (2017-01-25 22:41 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin15.6.0)
  CPU: Intel(R) Core(TM) i7-3520M CPU @ 2.90GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Sandybridge)
  LAPACK: libopenblas64_
  LIBM: libopenlibm
  LLVM: libLLVM-3.9.1 (ORCJIT, ivybridge)

Best!
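[Editor's note, a sketch of why interpolation matters, not part of the original comment: without `$`, the benchmarked expression reads `f` and `A` as non-constant globals, so every evaluation pays for a global lookup and dynamic dispatch; `$` splices the values into the benchmark expression so the timed code treats them as constants.]

```julia
using BenchmarkTools

h(x) = 2x
A = rand(10, 10)
apply_fdot(f, A) = f.(A)

# Non-interpolated: `h` and `A` are non-constant globals at benchmark time,
# so the measured time includes lookup and dynamic dispatch overhead.
@benchmark apply_fdot(h, A)

# Interpolated: the values of `h` and `A` are spliced in as constants,
# so the call itself is what gets measured.
@benchmark apply_fdot($h, $A)
```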

@wsshin (Contributor, Author) commented Jan 29, 2017

Of course. I hate myself for making this same mistake again!
