better internal interface for extending broadcast #22060

Closed
stevengj opened this issue May 25, 2017 · 16 comments · Fixed by #26891
Labels: broadcast (Applying a function over a collection), gpu (Affects running Julia on a GPU), speculative (Whether the change will be implemented is speculative)

Comments

@stevengj (Member) commented May 25, 2017

@MikeInnes suggested in FluxML/Flux.jl#31 that we change the dot-call lowering in such a way as to allow it to be disabled for custom types (which may reside on a GPU or something and support only a small set of functions).

For example, currently x .+ y .* x.^z is lowered to broadcast((x,y,z) -> x+y*x^z, x,y,z). This could instead be lowered to:

if Base.isfusing(x,y,z)
    broadcast((x,y,z) -> x+y*x^z, x, y, z)
else
    broadcast(+, x, broadcast(*, y, broadcast(^, x, z)))
end

with Base.isfusing(...) = true being the default. This would also address #22053 (cc @malmaud).
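
As a rough sketch of how a package might use these hooks if they were adopted (purely hypothetical, since this lowering was never merged; GPUTensor, gpu_add, and gpu_mul are made-up names):

# Hypothetical opt-out for a GPU container (a real rule would inspect every operand):
Base.isfusing(x::GPUTensor, rest...) = false

# Under the non-fused fallback only pairwise broadcasts occur, so a handful of
# methods covers the supported element-wise kernels:
Base.broadcast(::typeof(+), x::GPUTensor, y::GPUTensor) = gpu_add(x, y)
Base.broadcast(::typeof(*), x::GPUTensor, y::GPUTensor) = gpu_mul(x, y)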

Pro:

  • Makes life easier in the short run for containers based on external libraries that only support a few efficient operations. They could overload isfusing and broadcast(::typeof(foo), ...) for a small number of supported foo.

  • Makes life easier in specialized applications where it may not be possible to define broadcast for general functions, e.g. in Convex.jl where it needs to guarantee convexity.

Con:

  • You can no longer look at an expression like x .+ y .* x and know that it fuses into a single loop with no temporaries.

  • There is no middle ground. A single non-fusable operand will "spoil" fusion for an entire set of nested dot calls. (To fuse a subset of an expression, you'd have to explicitly assign it to a temporary array.) (We could lower to a nested set of isfusing calls, of course, but then you'd get an exponential explosion in lowered code size.)

In the long run, I really think that packages like TensorFlow.jl should exploit the underlying library's ability to define custom operations as callback functions to implement broadcast(f::Function, ...) for arbitrary f, at which point they can define isfusing(...) = true and get all the benefits of fusion.

stevengj added the speculative and broadcast labels May 25, 2017
@malmaud (Contributor) commented May 25, 2017

This seems like a good solution.

It reminds me of the solution we had for package precompilation: one package that doesn't enable precompilation prevents all its dependents from precompiling, just like one non-fusing operand spoils a whole call. That put pressure on package authors to support precompilation, but gave them some time and flexibility to do so and preserved backwards compatibility. This solution gives us a similar path forward for supporting fusion across the Julia ecosystem.

@nalimilan (Member) commented

That would also fix #19313 (though I'm less interested in it since Nullable is going to be replaced with Union{T, Null} to represent missingness).

@oxinabox (Contributor) commented

Am I right in saying that the branch in the lowered code would be optimized away during specialization on input types, if it's used within a function where x, y, z are type-stable?

@stevengj (Member, Author) commented

@oxinabox, yes.

@ylxdzsw (Contributor) commented May 26, 2017

Couldn't we do the same thing without this feature? For example, if we want to overload broadcasting for a Tensor type's element-wise operations:

# Sketch: `Tensor`, `element_wise_add`, and `element_wise_mult` are assumed to
# be provided by the wrapping library.
struct Wrapper{T}
    data::T
end

Base.broadcast(f, xs::Tensor...) = f(map(Wrapper, xs)...).data

Base.:+(x::Wrapper{Tensor}, y::Wrapper{Tensor}) = element_wise_add(x.data, y.data) |> Wrapper
Base.:*(x::Wrapper{Tensor}, y::Wrapper{Tensor}) = element_wise_mult(x.data, y.data) |> Wrapper

So for x .+ y .* z, f becomes (x,y,z) -> x + y*z and is called with the Wrappers. Then + and * also get called with Wrappers, so in the end the actual Tensors are handled by element_wise_mult and element_wise_add.

@stevengj (Member, Author) commented

@ylxdzsw, good trick! That might well work.

@KristofferC (Member) commented

Could that be seen as "lifting" the arrays so they are considered scalars from the point of view of broadcast, defining the operations on those scalars, and then extracting the arrays again?

@MikeInnes (Member) commented

It's a neat trick but I'm not sure how it will go in practice; at the least it's going to have an impact on error messages and traces.

How would you get multiple types (which could each be fusing or not) to play nicely together? It seems like you'd need some standard Wrapper type to do the dispatch on, and some way of marking what should be wrapped before broadcast – which devolves pretty quickly into a more complicated version of this approach :)

@stevengj (Member, Author) commented May 26, 2017

@MikeInnes, it is more complicated, but I think it could be done entirely with dispatch, by defining a broadcast(f, ::Union{Tensor,X,Y}...) method for all X and Y that your library supports broadcasting tensors with (presumably only a small number), rather than by changes to lowering.
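
A rough sketch of that dispatch-only variant (Tensor and Wrapper as in the earlier comment; X and Y are placeholders for the few extra types the library supports):

# Wrap every argument so the fused function f hits the operator overloads on
# Wrapper, then unwrap the resulting tensor.
function Base.broadcast(f, args::Union{Tensor, X, Y}...)
    f(map(Wrapper, args)...).data
end
# A real definition would also need to require at least one Tensor argument so
# that broadcasts involving only X and Y are not hijacked.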

@vchuravy (Member) commented

The problem with that is that we are talking about a diverse set of array-like containers that come from different libraries. I am definitely in favour of the trait-like implementation.

@oxinabox (Contributor) commented

> The problem with that is that we are talking about a diverse set of array-like containers that come from different libraries. I am definitely in favour of the trait-like implementation.

Indeed, even calling some of them containers is misleading.
E.g., TensorFlow's Tensors do not hold values as such;
they describe a computational graph in terms of tensors.
When that graph is executed, the actual containers are filled in the C++ backend.
I guess "objects with array-like semantics" is a bit of a mouthful, though.

I'm interested in @ylxdzsw's "trick"; I think we would need to try it out to get clarity.

@MikeInnes (Member) commented

cc @madeleineudell

@MikeInnes (Member) commented Jun 23, 2017

also @denizyuret

This came up quite a few times at JuliaCon so I put something together to hack around it. Please try it out and let me know how it goes.

I have to say, I'm pretty disappointed that Base has left a bunch of packages out in the cold like this. I hope it's worth it to save some characters on broadcast! in a few places.

@malmaud (Contributor) commented Aug 30, 2017

Can we add this to the 1.0 milestone?

StefanKarpinski added this to the 1.0 milestone Aug 31, 2017
@StefanKarpinski (Member) commented

While I've added this to the 1.0 milestone, we may want to consider the internals of how dot-fusion syntax works to be unstable.

JeffBezanson changed the title from "trait-like rule to disable broadcast fusion" to "better internal interface for extending broadcast" Nov 20, 2017
@JeffBezanson (Member) commented

We've decided that broadcast internals can change over 1.x.

JeffBezanson changed the milestone from 1.0 to 1.x Nov 20, 2017
mbauman added a commit that referenced this issue Apr 23, 2018
This patch represents the combined efforts of four individuals, over 60
commits, and an iterated design over (at least) three pull requests that
spanned nearly an entire year (closes #22063, #23692, #25377 by superseding
them).

This introduces a pure Julia data structure that represents a fused broadcast
expression.  For example, the expression `2 .* (x .+ 1)` lowers to:

```julia
julia> Meta.@lower 2 .* (x .+ 1)
:($(Expr(:thunk, CodeInfo(:(begin
      Core.SSAValue(0) = (Base.getproperty)(Base.Broadcast, :materialize)
      Core.SSAValue(1) = (Base.getproperty)(Base.Broadcast, :make)
      Core.SSAValue(2) = (Base.getproperty)(Base.Broadcast, :make)
      Core.SSAValue(3) = (Core.SSAValue(2))(+, x, 1)
      Core.SSAValue(4) = (Core.SSAValue(1))(*, 2, Core.SSAValue(3))
      Core.SSAValue(5) = (Core.SSAValue(0))(Core.SSAValue(4))
      return Core.SSAValue(5)
  end)))))
```

Or, slightly more readably as:

```julia
using .Broadcast: materialize, make
materialize(make(*, 2, make(+, x, 1)))
```

The `Broadcast.make` function serves two purposes. Its primary purpose is to
construct the `Broadcast.Broadcasted` objects that hold onto the function, the
tuple of arguments (potentially including nested `Broadcasted` arguments), and
sometimes a set of `axes` to include knowledge of the outer shape. The
secondary purpose, however, is to allow an "out" for objects that _don't_ want
to participate in fusion. For example, if `x` is a range in the above `2 .* (x
.+ 1)` expression, it needn't allocate an array and operate elementwise — it
can just compute and return a new range. Thus custom structures are able to
specialize `Broadcast.make(f, args...)` just as they'd specialize on `f`
normally to return an immediate result.
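
For instance, a minimal sketch of such an opt-out, using the `Broadcast.make`
spelling from this patch (the released 1.x hook is `Broadcast.broadcasted`);
the rule below is illustrative rather than the actual Base definition:

```julia
# Eagerly shift a unit range instead of building a lazy `Broadcasted` tree,
# so the inner `.+` in `2 .* ((1:5) .+ 1)` simply returns `2:6`.
Broadcast.make(::typeof(+), r::UnitRange, x::Integer) = (first(r) + x):(last(r) + x)
```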

`Broadcast.materialize` is identity for everything _except_ `Broadcasted`
objects for which it allocates an appropriate result and computes the
broadcast. It does two things: it `initialize`s the outermost `Broadcasted`
object to compute its axes and then `copy`s it.
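
For example (again with the `make` spelling used in this patch):

```julia
Broadcast.materialize(42)              # not a Broadcasted: returned as-is
bc = Broadcast.make(+, [1, 2, 3], 10)  # lazy Broadcasted expression tree
Broadcast.materialize(bc)              # allocates and computes: [11, 12, 13]
```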

Similarly, an in-place fused broadcast like `y .= 2 .* (x .+ 1)` uses the exact
same expression tree to compute the right-hand side of the expression as above,
and then uses `materialize!(y, make(*, 2, make(+, x, 1)))` to `instantiate` the
`Broadcasted` expression tree and then `copyto!` it into the given destination.

Altogether, this forms a complete API for custom types to extend and
customize the behavior of broadcast (fixes #22060). It uses the existing
`BroadcastStyle`s throughout to simplify dispatch on many arguments:

* Custom types can opt-out of broadcast fusion by specializing
  `Broadcast.make(f, args...)` or `Broadcast.make(::BroadcastStyle, f, args...)`.

* The `Broadcasted` object computes and stores the type of the combined
  `BroadcastStyle` of its arguments as its first type parameter, allowing for
  easy dispatch and specialization.

* Custom Broadcast storage is still allocated via `broadcast_similar`; however,
  instead of passing just a function as a first argument, the entire
  `Broadcasted` object is passed as a final argument. This potentially allows
  for much more runtime specialization dependent upon the exact expression
  given.

* Custom broadcast implementations for a `CustomStyle` are defined by
  specializing `copy(bc::Broadcasted{CustomStyle})` or
  `copyto!(dest::AbstractArray, bc::Broadcasted{CustomStyle})`; a minimal
  sketch follows this list.

* Fallback broadcast specializations for a given output object of type `Dest`
  (for the `DefaultArrayStyle` or another such style that hasn't implemented
  assignments into such an object) are defined by specializing
  `copyto!(dest::Dest, bc::Broadcasted{Nothing})`.
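
A minimal sketch of the `copy` extension point, written with the released 1.x
names (`MyArray` and `MyStyle` are hypothetical; a real package would dispatch
to its own kernels instead of falling back to the dense machinery):

```julia
struct MyStyle <: Broadcast.BroadcastStyle end
Base.BroadcastStyle(::Type{<:MyArray}) = MyStyle()
# Let MyStyle win when mixed with ordinary arrays:
Base.BroadcastStyle(::MyStyle, ::Broadcast.DefaultArrayStyle) = MyStyle()

function Base.copy(bc::Broadcast.Broadcasted{MyStyle})
    # Re-run the fused expression through the default dense machinery,
    # then wrap the result back up in a MyArray.
    inner = Broadcast.Broadcasted{Broadcast.DefaultArrayStyle{1}}(bc.f, bc.args, bc.axes)
    MyArray(copy(Broadcast.instantiate(inner)))
end
```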

As it fully supports range broadcasting, this now deprecates `(1:5) + 2` in
favor of `(1:5) .+ 2`, just as had been done for all `AbstractArray`s in general.

As a first-mover proof of concept, LinearAlgebra uses this new system to
improve broadcasting over structured arrays. Before, broadcasting over a
structured matrix would result in a sparse array. Now, broadcasting over a
structured matrix will _either_ return an appropriately structured matrix _or_
a dense array. This does incur a type instability (in the form of a
discriminated union) in some situations, but thanks to type-based introspection
of the `Broadcasted` wrapper, commonly used functions can be special-cased to be
type stable.  For example:

```julia
julia> f(d) = round.(Int, d)
f (generic function with 1 method)

julia> @inferred f(Diagonal(rand(3)))
3×3 Diagonal{Int64,Array{Int64,1}}:
 0  ⋅  ⋅
 ⋅  0  ⋅
 ⋅  ⋅  1

julia> @inferred Diagonal(rand(3)) .* 3
ERROR: return type Diagonal{Float64,Array{Float64,1}} does not match inferred return type Union{Array{Float64,2}, Diagonal{Float64,Array{Float64,1}}}
Stacktrace:
 [1] error(::String) at ./error.jl:33
 [2] top-level scope

julia> @inferred Diagonal(1:4) .+ Bidiagonal(rand(4), rand(3), 'U') .* Tridiagonal(1:3, 1:4, 1:3)
4×4 Tridiagonal{Float64,Array{Float64,1}}:
 1.30771  0.838589   ⋅          ⋅
 0.0      3.89109   0.0459757   ⋅
  ⋅       0.0       4.48033    2.51508
  ⋅        ⋅        0.0        6.23739
```

In addition to the issues referenced above, it fixes:

* Fixes #19313, #22053, #23445, and #24586: Literals are no longer treated
  specially in a fused broadcast; they're just arguments in a `Broadcasted`
  object like everything else.

* Fixes #21094: Since broadcasting is now represented by a pure Julia
  data structure, it can be created within `@generated` functions and serialized.

* Fixes #26097: The fallback destination-array specialization method of
  `copyto!` is specifically implemented as `Broadcasted{Nothing}` and will not
  be confused by `nothing` arguments.

* Fixes the broadcast-specific element of #25499: The default base broadcast
  implementation no longer depends upon `Base._return_type` to allocate its
  array (except in the empty or concretely-typed cases). Note that the sparse
  implementation (#19595) is still dependent upon inference and is _not_ fixed.

* Fixes #25340: Functions are treated like normal values, just like the other
  arguments, and are only evaluated once.

* Fixes #22255, and is performant with 12+ fused broadcasts. Okay, that one was
  fixed on master already, but this fixes it now, too.

* Fixes #25521.

* The performance of this patch has been thoroughly tested through its
  iterative development process in #25377. There remain [two classes of
  performance regressions](#25377) that Nanosoldier flagged.

* #25691: Propagated constant literals still lose their constant-ness upon
  going through the broadcast machinery. I believe quite a large number of
  functions would need to be marked as `@pure` to support this, including
  functions that are intended to be specialized.

(For bookkeeping, this is the squashed version of the [teh-jn/lazydotfuse](#25377)
branch as of a1d4e7e. Squashed and separated
out to make it easier to review and commit)

Co-authored-by: Tim Holy <tim.holy@gmail.com>
Co-authored-by: Jameson Nash <vtjnash@gmail.com>
Co-authored-by: Andrew Keller <ajkeller34@users.noreply.github.com>
mbauman removed the broadcast label Apr 24, 2018
Keno pushed a commit that referenced this issue Apr 27, 2018 (with the same commit message as above).