Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Regression in performance of sum on Broadcasted with small Union eltype #39425

Closed
nalimilan opened this issue Jan 27, 2021 · 5 comments
Closed
Labels
broadcast Applying a function over a collection missing data Base.missing and related functionality performance Must go faster regression Regression in behavior compared to a previous version

Comments

@nalimilan
Copy link
Member

The performance of sum(Base.Broadcast.Broadcasted(*, (x, z))) when z has eltype Union{Float64, Missing} regressed dramatically between Julia 1.3 and 1.4. It improved again in 1.5 but it's still much slower on master than on 1.3.

On Julia 1.3.1:

julia> using BenchmarkTools

julia> using LinearAlgebra

julia> x = rand(10_000);

julia> z = Vector{Union{Float64, Missing}}(x);

julia> f(A, w) = sum(Base.Broadcast.Broadcasted(*, (A, w)))
f (generic function with 1 method)

julia> @btime f(x, z);
  31.150 μs (11 allocations: 240 bytes)

On 1.4.2:

julia> @btime f(x, z);
  2.070 ms (40004 allocations: 1.07 MiB)

On master:

julia> @btime f(x, z);
  941.751 μs (40002 allocations: 1.07 MiB)

Cc: @tkf. Found while investigating JuliaStats/StatsBase.jl#518 (comment).

@nalimilan nalimilan added performance Must go faster regression Regression in behavior compared to a previous version broadcast Applying a function over a collection missing data Base.missing and related functionality labels Jan 27, 2021
@nalimilan
Copy link
Member Author

Though after calling instantiate everything is very fast on master:

julia> g(A, w) = sum(Base.Broadcast.instantiate(Base.Broadcast.Broadcasted(*, (A, w))));

julia> @btime g(x, z);
  23.826 μs (1 allocation: 16 bytes)

@vtjnash
Copy link
Member

vtjnash commented Mar 31, 2021

From that observation, I suspect this is also due to an excess usage of @inline in broadcast.jl. We've observed that those may hurt performance more often than they help, so as a rough cut, we may just want to remove all of them to solve all these performance regressions, then slowly bring them back more carefully, if necessary.

@inkydragon
Copy link
Member

test code

x = rand(10_000);
z = Vector{Union{Float64, Missing}}(x);
f(A, w) = sum(Base.Broadcast.Broadcasted(*, (A, w)))
@time f(x, z);
@time f(x, z);
@time f(x, z);
  • ✔️ 1.3.1 (2019-12-30): (15 allocations: 400 bytes)
  • ✔️ 1.4.0-DEV.559 (2019-12-04) 7642e2b: (11 allocations: 240 bytes)
  • ❌ 1.4.0-DEV.560 (2019-12-04) 3c182bc: (40.00 k allocations: 1.068 MiB)
    Transducer as an optimization: map, filter and flatten #33526 cc: @tkf
  • ❌ 1.4.0-rc1 (2020-01-23): (40.00 k allocations: 1.068 MiB)
  • ❌ 1.4.2 (2020-05-23): (40.00 k allocations: 1.068 MiB)
  • ❌ 1.6.4 (2021-11-19)

@vtjnash
Copy link
Member

vtjnash commented Aug 24, 2023

This still reproduces on master

@Tortar
Copy link
Contributor

Tortar commented Jul 10, 2024

This doesn't reproduce anymore on 1.11 and nightly

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
broadcast Applying a function over a collection missing data Base.missing and related functionality performance Must go faster regression Regression in behavior compared to a previous version
Projects
None yet
Development

No branches or pull requests

5 participants