
broadcast[!] over combinations of scalars and sparse vectors/matrices #19724

Merged: 4 commits merged into JuliaLang:master from Sacha0:mixedbc on Jan 9, 2017

Conversation

@Sacha0 (Member) commented Dec 27, 2016:

(Requires #19667, #19723, and #19690, which form the first four commits / most of the diff.)

This pull request extends sparse broadcast[!] to handle combinations of scalars and sparse vectors/matrices. The tl;dr good: it works, and with #19667/#19723 the overall return eltype/type is inferred consistently. The tl;dr not so good: inference seems unhappy in another way in some cases, impacting performance. The details:

This PR's general approach is to: (1) capture the scalar arguments to broadcast[!] in a closure and simultaneously collect the sparse vector/matrix arguments; and then (2) pass that closure and the sparse vector/matrix arguments to existing broadcast[!] methods that accept solely sparse vector/matrix arguments.
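As a rough sketch of that mechanism (simplified to the single-scalar, single-sparse-argument cases; `mixedbroadcast` is a hypothetical name, not the PR's entry point):

```julia
# Sketch of the scalar-capture idea: capture scalars in a closure, collect the
# sparse arguments, and reduce to an existing sparse-only broadcast method.
capturescalars(f, x::Number, A::SparseMatrixCSC) = (y -> f(x, y), (A,))
capturescalars(f, A::SparseMatrixCSC, x::Number) = (y -> f(y, x), (A,))

function mixedbroadcast(f, args...)
    parevalf, passedargs = capturescalars(f, args...)
    return broadcast(parevalf, passedargs...)  # hits the sparse-only methods
end
```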

With #19667/#19723, _return_type/_broadcast_eltype is inferable and yields the correct result eltype when called on the closure and collected sparse vector/matrix arguments described above. Hence the overall return eltype/type is correct. Unfortunately, in some cases inference seems unable to determine the return type of direct calls to that closure, in those cases causing allocation in both the broadcast[!] entry points and the underlying routines (_map_zeropres!, _broadcast_zeropres!) and lackluster performance. (Will try to construct an MRE at some point.)
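Concretely, the destination eltype is then determined from the closure rather than from f; roughly (a sketch of the idea, not the PR's exact internals):

```julia
# The result eltype comes from inference on the scalar-capturing closure and the
# eltypes of the collected sparse arguments (rather than on f and all arguments).
Tv = Base.promote_op(parevalf, map(eltype, passedargs)...)
```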

This solution's general finickiness drives me to the conclusion that we should instead handle scalar arguments directly in the underlying routines. I do not know whether I will have time for another rewrite of the underlying routines before feature freeze. So I suggest we incorporate this (correct but performance-suboptimal) implementation before feature freeze, allowing another underlying routine rewrite for performance after feature freeze. Best!

(Edit: Also the closure construction allocates a bit, not sure why. Pointers much appreciated.)

@nalimilan (Member) commented:

Do inference failures affect common cases or only very specific ones? As long as the behavior is correct, we could merge this before the feature freeze, and improve performance later.

@kshyatt added the sparse (Sparse arrays) and broadcast (Applying a function over a collection) labels on Dec 27, 2016.
broadcast{Tf,T}(f::Tf, ::Type{T}, A::SparseMatrixCSC) = broadcast(y -> f(T, y), A)
broadcast{Tf,T}(f::Tf, A::SparseMatrixCSC, ::Type{T}) = broadcast(x -> f(x, T), A)

end
Review comment (Member):

missing newline

@Sacha0 (Member, Author) replied:

Fixed in #19690. (Commits three and four are #19690.) Thanks!
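(For context, the two methods in the excerpt above let ::Type arguments participate in sparse broadcast by capturing them in a closure; for example, assuming those definitions:)

```julia
A = sprand(5, 5, 0.4)
broadcast(round, Int, A)  # dispatches to the first method: broadcast(y -> round(Int, y), A)
```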

_defargforcol_all(j, tail(isemptys), tail(expandsverts), tail(ks), tail(As))...)
# fusing the following defs. avoids a few branches and construction of a tuple, yielding 1-20% runtime reduction
# @inline _isactiverow(activerow, row) = row == activerow
# @inline _isactiverow_all(activerow, ::Tuple{}) = ()
Review comment (Member):

What are all of these commented-out definitions?

@Sacha0 (Member, Author) replied Dec 27, 2016:

The commented-out definitions are what I fused into _fusedupdatebc (for performance) per the comment on line 797. I left the unfused definitions in place (along with commented out calls to those methods before calls to _fusedupdatebc) as an easier-to-follow form of the fused version. Best! (These commits were part of #19518.)
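A generic illustration of the fusing idea (my own minimal example, not the PR's definitions): several tuple-recursive helpers, each walking the argument tuples separately, collapse into a single traversal that computes the per-argument results together:

```julia
using Base: tail

# Unfused: each helper makes its own recursive pass over the argument tuples.
@inline _isactiverow_all(activerow, ::Tuple{}) = ()
@inline _isactiverow_all(activerow, rows) =
    (first(rows) == activerow, _isactiverow_all(activerow, tail(rows))...)

# Fused: one pass computes everything for each argument at once, avoiding
# extra branches and intermediate tuple construction.
@inline _fusedstep(activerow, row, k) = row == activerow ? (true, k + 1) : (false, k)
@inline _fused_all(activerow, ::Tuple{}, ::Tuple{}) = ()
@inline _fused_all(activerow, rows, ks) =
    (_fusedstep(activerow, first(rows), first(ks)),
     _fused_all(activerow, tail(rows), tail(ks))...)
```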

# @inline _updaterow_all(rowsentinel, activerows, rows, ks, stopks, As) = (
# _updaterow(rowsentinel, first(activerows), first(rows), first(ks), first(stopks), first(As)),
# _updaterow_all(rowsentinel, tail(activerows), tail(rows), tail(ks), tail(stopks), tail(As))...)
@inline function _fusedupdatebc(rowsentinel, activerow, row, defarg, k, stopk, A)
Review comment (Member):

Some comment on what _fusedupdate is supposed to do would be helpful.

@Sacha0 (Member, Author) replied:

Please see my response above. Best!

# argument tuple (passedargstup) containing only the sparse vectors/matrices in mixedargs
# in their orginal order, and such that the result of broadcast(g, passedargstup...) is
# broadcast(f, mixedargs...)
@inline capturescalars(f, mixedargs) =
@stevengj (Member) commented Dec 27, 2016:

Is this useful to do (as an optimization) for non-sparse broadcast as well?

@Sacha0 (Member, Author) replied:

If the inference and allocation issues impacting performance (see the original post and this comment in test/sparse/higherorderfns.jl:244) were fixed, possibly. Though, as in the original post's last paragraph, I plan to head the other direction for now. Best!

@stevengj (Member) commented:

@nanosoldier runbenchmarks(ALL, vs=":master")

@Sacha0 (Member, Author) commented Dec 27, 2016:

> Do inference failures affect common cases or only very specific ones?

Most cases involving one or two sparse vector/matrix arguments and a small number of scalars seem to avoid the inference issue, though they do hit the (much less significant) issue (1) mentioned in this comment in test/sparse/higherorderfns.jl:244. Most cases involving three or more sparse vector/matrix arguments hit the inference issue.
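Illustratively (these calls are mine, not from the PR's tests), the case split looks like:

```julia
A = sprand(10, 10, 0.2); B = sprand(10, 10, 0.2); C = sprand(10, 10, 0.2)
broadcast(+, A, 1, B)        # two sparse args plus a scalar: usually avoids the issue
broadcast(+, A, 1, B, 2, C)  # three sparse args: tends to hit the inference issue
```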

> As long as the behavior is correct, we could merge this before the feature freeze, and improve performance later.

:)

@nanosoldier (Collaborator) commented:

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @jrevels

@stevengj (Member) commented:

The nanosoldier regression seems to be the usual floatexp test noise, so it should be fine to merge.

@tkelman (Contributor) commented Dec 28, 2016:

The PRs that make up the first several commits should be finished first; otherwise we're splitting review of them across multiple places.

@Sacha0 (Member, Author) commented Dec 31, 2016:

Rebased out commits from merged pull requests. Now depends only on #19723 or #19787+#19723. Best!

# that incorporates the scalar arguments to broadcast (and, with #19667,
# is inferable, so the overall return type from broadcast is inferred),
# in some cases inference seems unable to determine the return type of
# direct calls to that closure. This issue causes variables in in both the
Review comment (Contributor):

"in in" duplicated

@Sacha0 (Member, Author) replied:

Fixed on push. Thanks!

# in some cases inference seems unable to determine the return type of
# direct calls to that closure. This issue causes variables in in both the
# broadcast[!] entry points (fofzeros = f(_zeros_eltypes(args...)...)) and
# the driver routines (Cx in _map_zeropres! and _broadcast_zeroprs!) to have
Review comment (Contributor):

zeroprs missing a second e?

@Sacha0 (Member, Author) replied:

Fixed on push. Thanks!

@Sacha0 (Member, Author) commented Jan 1, 2017:

Rebased and replaced the first commit with the latest version of #19723. Best!

@tkelman (Contributor) commented Jan 1, 2017:

@nanosoldier runbenchmarks(ALL, vs = ":master")

@nanosoldier (Collaborator) commented:

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @jrevels

# capturescalars takes a function (f) and a tuple of mixed sparse vectors/matrices and
# broadcast scalar arguments (mixedargs), and returns a function (parevalf) and a reduced
# argument tuple (passedargstup) containing only the sparse vectors/matrices in mixedargs
# in their orginal order, and such that the result of broadcast(g, passedargstup...) is
@tkelman (Contributor) commented Jan 2, 2017:

Is g the same as parevalf? Is that par-eval or pare-val?

@Sacha0 (Member, Author) replied:

Yes, and par-eval as in partial evaluation. The former fixed and the latter clarified on push. Thanks!
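A hedged illustration of the documented contract (using the names and tuple-taking signature from the comment above; this is not a test from the PR, and capturescalars is an internal helper):

```julia
f = (x, y, z) -> x + 2y * z
A = sprand(4, 4, 0.5); B = sprand(4, 4, 0.5)
# capturescalars captures the scalar 10 in a closure and collects the sparse
# arguments, so broadcasting the closure reproduces the original mixed call:
parevalf, passedargstup = capturescalars(f, (A, 10, B))
broadcast(parevalf, passedargstup...) == broadcast(f, A, 10, B)  # should hold
```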

@Sacha0 (Member, Author) commented Jan 8, 2017:

Absent objections or requests for time, I plan to merge this Monday morning PST. Best!

@inline function broadcast!{N}(f, C::AbstractArray, A, Bs::Vararg{Any,N})
@inline broadcast!{N}(f, C::AbstractArray, A, Bs::Vararg{Any,N}) =
broadcast!_c(f, containertype(C, A, Bs...), C, A, Bs...)
@inline function broadcast!_c{N}(f, ::Type, C::AbstractArray, A, Bs::Vararg{Any,N})
@pabloferz (Contributor) commented Jan 9, 2017:

What do you think of broadcast_c! instead?

@Sacha0 (Member, Author) replied:

No preference; I can see arguments for each form. Thoughts? Thanks!

Reply (Contributor):

I like broadcast_c! better, but however you prefer is fine.

@Sacha0 (Member, Author) replied:

Changed broadcast!_c to broadcast_c! throughout. Thanks!

@pabloferz (Contributor) commented:

> This solution's general finickiness drives me to the conclusion that we should instead handle scalar arguments directly in the underlying routines.

I'm hesitating between following that approach and implementing some kind of optimization like the one in this PR for broadcast methods in general. For one, this PR's mechanism is a way to avoid problems with ::Type arguments, which are somewhat problematic (see #19849 and #19879). On the other hand, the Broadcast approach (handling scalars directly in the routines) would probably avoid some of the problems you saw with inference. I'm not sure (yet) whether a mixed approach between the two solutions is plausible.

For now I'd go with this solution anyway and would eventually revisit other options.

Commit: broadcast[!] over combinations of scalars and sparse vectors/matrices.

Makes broadcast! dispatch on container type (as broadcast does), and injects generic sparse broadcast! for the appropriate container type.
@Sacha0 merged commit 1494f43 into JuliaLang:master on Jan 9, 2017.

@Sacha0 (Member, Author) commented Jan 9, 2017:

Thanks all!

@Sacha0 deleted the mixedbc branch January 9, 2017 19:26.

@Sacha0 (Member, Author) commented Jan 9, 2017:

Master is failing tests as of this pull request (particularly those introduced by this pull request). I conjecture that #19922 (merged after this pull request's last CI run) broke this pull request. Should have a fix shortly if that conjecture holds. Best!

Sacha0 added a commit to Sacha0/julia that referenced this pull request Jan 9, 2017
Sacha0 added a commit to Sacha0/julia that referenced this pull request Jan 9, 2017
vtjnash added a commit that referenced this pull request Jan 10, 2017
Fix simultaneous merge issue between #19724 and #19922