Fix performance of broadcast and collect with Union{T, Missing} #30480
Use the same pattern as in collect_to_with_first! (which is used when size is known).
@@ -926,13 +926,12 @@ function copyto_nonleaf!(dest, bc::Broadcasted, iter, state, count)
         y === nothing && break
         I, state = y
         @inbounds val = bc[I]
-        S = typeof(val)
-        if S <: T
+        if val isa T || typeof(val) === T
Whether `typeof(val) === T` is needed is not clear (#30125), but I've added it for consistency with `collect_to!`. If that's redundant, we should remove all uses at the same time.
Yes, we should remove that from everywhere
@nanosoldier
Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @ararslan
Nanosoldier confirms this improves performance dramatically with
@nanosoldier
Your benchmark job has completed - no performance regressions were detected. A full report can be found here. cc @ararslan
Looks like you're all clear.
Am I understanding the magic here correctly? Is the key that we should not assign a type that may be a small discriminated union to a variable?
That's my conclusion as well, but I have no idea why that's the case.
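For readers following the exchange above, here is a minimal sketch of the two ways the check can be written. The helper names `fits_via_typeof` and `fits_via_isa` are hypothetical and are not part of Base; the sketch only shows that both forms give the same answer for ordinary (non-type) values, so the change in the diff is about how the check is expressed, not about what it accepts. It does not try to explain the performance difference, which the thread leaves open.

```julia
# Hypothetical helpers for illustration only; not Base code.

# Old form: bind the runtime type to a local variable, then test it with `<:`.
function fits_via_typeof(val, ::Type{T}) where {T}
    S = typeof(val)
    return S <: T
end

# New form: test the value directly with `isa`, without a type-valued local.
function fits_via_isa(val, ::Type{T}) where {T}
    return val isa T
end

vals = Union{Int,Missing}[1, missing, 3]
U = Union{Int,Missing}

# Both forms agree on values that fit the destination element type...
agree_fitting = all(fits_via_typeof(v, U) == fits_via_isa(v, U) for v in vals)

# ...and on a value that does not fit (a Float64 is neither Int nor Missing).
agree_nonfitting = fits_via_typeof(1.5, U) == fits_via_isa(1.5, U)
```
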
Good to merge?
Bump. Who's making the decision here?
Most of these conditions were introduced in #25828 and #30480 for performance reasons at the time, but they now seem unnecessary, or even harmful to inferrability. There doesn't seem to be any performance difference in the benchmark used in #25828:

```julia
using BenchmarkTools
x = rand(Int, 100_000);
y = convert(Vector{Union{Int,Missing}}, x);
z = copy(y); z[2] = missing;
```

> master:

```julia
julia> @btime map(identity, x);
  57.814 μs (3 allocations: 781.31 KiB)
julia> @btime map(identity, y);
  94.040 μs (3 allocations: 781.31 KiB)
julia> @btime map(identity, z);
  127.554 μs (5 allocations: 1.62 MiB)
julia> @btime broadcast(x->x, x);
  59.248 μs (2 allocations: 781.30 KiB)
julia> @btime broadcast(x->x, y);
  74.693 μs (2 allocations: 781.30 KiB)
julia> @btime broadcast(x->x, z);
  126.262 μs (4 allocations: 1.62 MiB)
```

> this commit:

```julia
julia> @btime map(identity, x);
  58.668 μs (3 allocations: 781.31 KiB)
julia> @btime map(identity, y);
  94.013 μs (3 allocations: 781.31 KiB)
julia> @btime map(identity, z);
  126.600 μs (5 allocations: 1.62 MiB)
julia> @btime broadcast(x->x, x);
  57.531 μs (2 allocations: 781.30 KiB)
julia> @btime broadcast(x->x, y);
  69.561 μs (2 allocations: 781.30 KiB)
julia> @btime broadcast(x->x, z);
  125.578 μs (4 allocations: 1.62 MiB)
```
Use the same pattern as in `collect_to!` (which is used when size is known). Follows what was done by #25828. Fixes #30455.
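The pattern referred to here can be sketched roughly as follows. This is a simplified illustration, not Base's `collect_to!` or `copyto_nonleaf!`: the names `widen_collect` and `fill_from!` are hypothetical, the widened element type is built with a plain `Union` as an assumption, and the iterator is assumed to support `length`. The point is the shape of the loop: store the value when `val isa T`, otherwise allocate a wider destination and continue in a fresh call specialized on the new element type.

```julia
# Hypothetical sketch of a "widen on mismatch" collect loop; not Base code.

function fill_from!(dest::AbstractVector{T}, itr, i, state) where {T}
    while true
        y = iterate(itr, state)
        y === nothing && break
        val, state = y
        if val isa T
            # Fast path: the value fits the current element type.
            dest[i] = val
            i += 1
        else
            # Slow path: widen the element type (assumption: a plain Union),
            # copy what has been stored so far, and continue in a new call
            # so the rest of the loop is compiled for the wider type.
            R = Union{T, typeof(val)}
            wider = similar(dest, R)
            copyto!(wider, 1, dest, 1, i - 1)
            wider[i] = val
            return fill_from!(wider, itr, i + 1, state)
        end
    end
    return dest
end

function widen_collect(itr)
    y = iterate(itr)
    y === nothing && return Any[]
    val, state = y
    # Start with the narrowest element type: that of the first value.
    dest = Vector{typeof(val)}(undef, length(itr))
    dest[1] = val
    return fill_from!(dest, itr, 2, state)
end

narrow = widen_collect([1, 2, 3])
mixed = widen_collect(Union{Int,Missing}[1, missing, 3])
```

In this sketch `narrow` keeps the element type `Int`, while `mixed` is widened once, when the first `missing` is encountered.
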
Before:
After: