broadcast[!] over combinations of scalars and sparse vectors/matrices #19724
Conversation
Do inference failures affect common cases or only very specific ones? As long as the behavior is correct, we could merge this before the feature freeze, and improve performance later.
broadcast{Tf,T}(f::Tf, ::Type{T}, A::SparseMatrixCSC) = broadcast(y -> f(T, y), A)
broadcast{Tf,T}(f::Tf, A::SparseMatrixCSC, ::Type{T}) = broadcast(x -> f(x, T), A)
end
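A minimal sketch of what these two definitions accomplish (example values are mine, not from the PR; modern `SparseArrays` syntax rather than the 0.6-era `broadcast{Tf,T}` form): a `Type` argument is absorbed into a unary closure so the existing single-sparse-argument `broadcast` method can be reused.

```julia
using SparseArrays  # SparseMatrixCSC now lives in the SparseArrays stdlib

A = sparse([1.5 0.0; 0.0 -2.5])

# broadcast(f, T, A) reduces to broadcast(y -> f(T, y), A); here f = round, T = Int:
B = broadcast(y -> round(Int, y), A)
# B remains sparse; only the stored entries are rounded, and zeros map to zero.
```

The same trick in the second definition handles the `Type` appearing as the trailing argument.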
missing newline
_defargforcol_all(j, tail(isemptys), tail(expandsverts), tail(ks), tail(As))...)
# fusing the following defs. avoids a few branches and construction of a tuple, yielding 1-20% runtime reduction
# @inline _isactiverow(activerow, row) = row == activerow
# @inline _isactiverow_all(activerow, ::Tuple{}) = ()
What are all of these commented-out definitions?
The commented-out definitions are what I fused into _fusedupdatebc (for performance) per the comment on line 797. I left the unfused definitions in place (along with commented-out calls to those methods before the calls to _fusedupdatebc) as an easier-to-follow form of the fused version. Best! (These commits were part of #19518.)
# @inline _updaterow_all(rowsentinel, activerows, rows, ks, stopks, As) = (
#     _updaterow(rowsentinel, first(activerows), first(rows), first(ks), first(stopks), first(As)),
#     _updaterow_all(rowsentinel, tail(activerows), tail(rows), tail(ks), tail(stopks), tail(As))...)
@inline function _fusedupdatebc(rowsentinel, activerow, row, defarg, k, stopk, A)
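For readers following along, here is a simplified illustration of what "fusing" means here (names and signature are mine; the real _fusedupdatebc operates on SparseMatrixCSC internals): the row-activity test, default-argument selection, and column-pointer advance happen in one function that returns a tuple directly, rather than threading three separate tuple-returning helpers.

```julia
# Illustrative sketch of a fused per-argument update, not the actual Base code.
@inline function fusedupdate_sketch(rowsentinel, activerow, row, defarg, k, stopk, rows, vals)
    if row == activerow
        # This argument has a stored entry at the active row: take its value
        # and advance to the next stored row (or the sentinel if exhausted).
        nextk = k + 1
        nextrow = nextk < stopk ? rows[nextk] : rowsentinel
        return vals[k], nextrow, nextk
    else
        # Inactive at this row: contribute the default value; state unchanged.
        return defarg, row, k
    end
end
```

Doing all three steps in one branch avoids constructing intermediate tuples per argument per row, which is where the quoted 1-20% runtime reduction comes from.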
Some comment on what _fusedupdate is supposed to do would be helpful.
Please see my response above. Best!
# argument tuple (passedargstup) containing only the sparse vectors/matrices in mixedargs
# in their orginal order, and such that the result of broadcast(g, passedargstup...) is
# broadcast(f, mixedargs...)
@inline capturescalars(f, mixedargs) =
Is this useful to do (as an optimization) for non-sparse broadcast as well?
Possibly, if the inference and allocation issues impacting performance (see the original post, and this comment in test/sparse/higherorderfns.jl:244) were fixed. Though, as in the original post's last paragraph, I plan to head in the other direction for now. Best!
@nanosoldier
Most cases involving one or two sparse vector/matrix arguments and a small number of scalars seem to avoid the inference issue, though they do hit the (much less significant) issue (1) mentioned in this comment in test/sparse/higherorderfns.jl:244. Most cases involving three or more sparse vector/matrix arguments hit the inference issue.
:)
Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @jrevels
The nanosoldier regression seems to be the usual
The PRs that make up the first several commits should be finished first; otherwise we're splitting review of them across multiple places.
# that incorporates the scalar arguments to broadcast (and, with #19667,
# is inferable, so the overall return type from broadcast is inferred),
# in some cases inference seems unable to determine the return type of
# direct calls to that closure. This issue causes variables in in both the
in in duplicated
Fixed on push. Thanks!
# in some cases inference seems unable to determine the return type of
# direct calls to that closure. This issue causes variables in in both the
# broadcast[!] entry points (fofzeros = f(_zeros_eltypes(args...)...)) and
# the driver routines (Cx in _map_zeropres! and _broadcast_zeroprs!) to have
zeroprs missing a second e?
Fixed on push. Thanks!
Rebased and replaced the first commit with the latest version of #19723. Best!
@nanosoldier
Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @jrevels
# capturescalars takes a function (f) and a tuple of mixed sparse vectors/matrices and
# broadcast scalar arguments (mixedargs), and returns a function (parevalf) and a reduced
# argument tuple (passedargstup) containing only the sparse vectors/matrices in mixedargs
# in their orginal order, and such that the result of broadcast(g, passedargstup...) is
Is g the same as parevalf? And is that par-eval or pare-val?
Yes, and par-eval as in partial evaluation. The former fixed and the latter clarified on push. Thanks!
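To make the partial-evaluation contract concrete (a toy sketch with my own example function and values, not the actual capturescalars implementation): the scalars are baked into parevalf as captured variables, and only the container arguments are passed on.

```julia
# Toy sketch of the capturescalars contract: given f and mixed scalar/container
# arguments mixedargs = (s1, A, s2, B), produce parevalf with the scalars
# captured, so that broadcast(parevalf, A, B) == broadcast(f, s1, A, s2, B).
f = (s1, x, s2, y) -> s1 * x + s2 * y
s1, s2 = 2.0, 3.0
parevalf = (x, y) -> f(s1, x, s2, y)   # partial evaluation: scalars captured

A, B = [1.0, 2.0], [3.0, 4.0]
broadcast(parevalf, A, B) == broadcast(f, s1, A, s2, B)  # identical results
```

With the scalars gone, parevalf and (A, B) can be handed to the existing all-sparse broadcast methods unchanged.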
…and sparse vectors/matrices.
Absent objections or requests for time, I plan to merge this Monday morning PST. Best!
@inline function broadcast!{N}(f, C::AbstractArray, A, Bs::Vararg{Any,N})
@inline broadcast!{N}(f, C::AbstractArray, A, Bs::Vararg{Any,N}) =
    broadcast!_c(f, containertype(C, A, Bs...), C, A, Bs...)
@inline function broadcast!_c{N}(f, ::Type, C::AbstractArray, A, Bs::Vararg{Any,N})
What do you think of broadcast_c! instead?
No preference; I can see arguments for each form. Thoughts? Thanks!
I like broadcast_c! better, but however you prefer is fine.
Changed broadcast!_c to broadcast_c! throughout. Thanks!
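As a rough sketch of the dispatch pattern being renamed here (identifiers below are mine; the real code dispatches on the result of containertype and routes sparse arguments to dedicated sparse kernels):

```julia
using SparseArrays

# Entry point computes a container trait from the arguments and forwards to a
# trait-specific method, mirroring the broadcast! -> broadcast_c! split above.
containertrait(args...) = any(a -> a isa AbstractSparseArray, args) ? Val(:sparse) : Val(:dense)
sketch_broadcast_c!(f, ::Val{:dense}, C, args...)  = (C .= f.(args...); C)
sketch_broadcast_c!(f, ::Val{:sparse}, C, args...) = (C .= f.(args...); C)  # a real sparse kernel would go here
sketch_broadcast!(f, C, args...) = sketch_broadcast_c!(f, containertrait(C, args...), C, args...)
```

The `!` placement question is purely naming: `broadcast_c!` keeps the mutation marker at the end of the whole name, matching Julia's usual convention for mutating functions.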
I'm somewhat hesitating between following that approach and implementing some kind of optimization like the one in this PR for broadcasting methods in general. For one, this PR's mechanism is a way to avoid problems with

For now I'd go with this solution anyway and would eventually revisit other options.
…rse vectors/matrices. Makes broadcast! dispatch on container type (as broadcast), and inject generic sparse broadcast! for the appropriate container type.
Thanks all!
Master is failing tests as of this pull request (particularly those introduced by this pull request). I conjecture that #19922 (merged after this pull request's last CI run) broke this pull request. Should have a fix shortly if the former is true. Best!
(Requires #19667, #19723, and #19690, which form the first four commits / most of the diff.)

This pull request extends sparse broadcast[!] to handle combinations of scalars and sparse vectors/matrices. The tl;dr good: It works, and with #19667/#19723 the overall return el/type is inferred consistently. The tl;dr not so good: Inference seems unhappy in another way in some cases, impacting performance.

The details: This PR's general approach is to: (1) capture the scalar arguments to broadcast[!] in a closure and simultaneously collect the sparse vector/matrix arguments; and then (2) pass that closure and the sparse vector/matrix arguments to existing broadcast[!] methods that accept solely sparse vector/matrix arguments.

With #19667/#19723, _return_type/_broadcast_eltype is inferable and yields the correct result eltype when called on the closure and collected sparse vector/matrix arguments described above. Hence the overall return el/type is correct. Unfortunately, in some cases inference seems unable to determine the return type of direct calls to that closure, in those cases causing allocation in both the broadcast[!] entry points and the underlying routines (_map_zeropres!, _broadcast_zeropres!) and lackluster performance. (Will try to construct an MRE at some point.)

This solution's general finickiness drives me to the conclusion that we should instead handle scalar arguments directly in the underlying routines. I do not know whether I will have time for another rewrite of the underlying routines before feature freeze. So I suggest we incorporate this (correct but performance-suboptimal) implementation before feature freeze, allowing another rewrite of the underlying routines for performance after the freeze. Best!

(Edit: Also, the closure construction allocates a bit; not sure why. Pointers much appreciated.)
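To make the fofzeros computation mentioned above concrete (the helper below is an illustrative stand-in for _zeros_eltypes, not its actual definition): the entry points evaluate f at a zero of each argument's eltype to learn the value that fully-unstored positions take.

```julia
# Sketch of the value-at-zero computation performed by the broadcast[!] entry
# points: fofzeros = f(_zeros_eltypes(args...)...).
zeros_of_eltypes(As...) = map(A -> zero(eltype(A)), As)  # hypothetical stand-in

f = (x, y) -> x + y + 1
fofzeros = f(zeros_of_eltypes([1.0, 2.0], [3, 4])...)   # evaluates f(0.0, 0)
```

If inference cannot type the direct call f(0.0, 0) through the scalar-capturing closure, fofzeros (and Cx in the driver loops) become non-concretely typed, which is the allocation/performance problem the post describes.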