RFC: introduce runtime representation of broadcast fusion #23692

vtjnash · 2017-09-13T15:38:17Z

This preserves all dot-fusion information in the type-heirarchy, permitting complex runtime-analysis and re-dispatch overrides. Since no information is obscured in a new type, experiments with variations on #22060 and #22053 can now be implemented in pure Julia just by adding new dispatch rules. Also, since no new types are created, these can now be easily serialized, and can be returned from generated functions.

fix #21094
fix #22060
fix #22053
replaces #22063

fix #21094 fix #22060 fix #22053 replaces #22063

vchuravy · 2017-09-13T15:52:06Z

Awesome! Would be great if it had some tests and examples for actually modifying broadcast behaviour for custom container types.

vtjnash · 2017-09-13T16:11:26Z

Haha. From my description above ("experiments with variations ... can now be implemented"), I was implying that someone else should do that 😛

andyferris · 2017-09-17T12:03:25Z

Cool. This is awesome that this is looked at. :) Thanks @vtjnash

Since I'm not sure of any preceding discussions - was this approach vs others considered? For example, one might consider lazy but non-parser-fusing broadcast? This problem seems at first blush possible to deal with using Julia types, dispatch, and dead-simple dot parsing where each dot-call or dot-op makes exactly one broadcast call, and broadcasting with arrays could make a BroadcastedArray lazily evaluated view. The reason I bring this up is because I'm a fan of minimizing parser magic (but then again I'm better at Julia meta programming than at lisp so maybe that's partly me?).

stevengj · 2017-09-20T13:11:23Z

@andyferris, see e.g. #19198 for discussion of making broadcast lazy vs materializing.

stevengj · 2017-09-20T13:15:42Z

test/numbers.jl

@@ -2914,7 +2914,7 @@ Base.literal_pow(::typeof(^), ::PR20530, ::Val{p}) where {p} = 2
    p = 2
    @test x^p == 1
    @test x^2 == 2
-    @test [x,x,x].^2 == [2,2,2]
+    @test_broken [x, x, x].^2 == [2, 2, 2] # literal_pow violates referential transparency


What result does this give now?

On master, I get [x,x,x].^2 == [2,2,2] but I guess this PR breaks this? Temporarily or permanently?

I don't think literal_pow really played well with broadcast and dot syntax to begin with. It was a little unpredictable when literal_pow or regular ^ would be called...

Yes, on this PR you get [1, 1, 1].

timholy · 2017-09-20T14:34:43Z

Probably also the best path for fixing #22932 (comment)

stevengj · 2017-09-21T16:25:58Z

What happens in this PR to @. x^2 + 2x + sin(x)? Currently, that gets lowered to broadcast(x -> x^2 + 2x + sin(x), x), i.e. the multiple appearances of x are collapsed into a single argument. Is that still true?

mbauman · 2017-09-22T18:49:35Z

No, they are currently passed as three arguments:

julia> expand(Main, :(@. x^2 + 2x + sin(x)))
:((Base.broadcast)(((Core.apply_type)((Core.getfield)(Base.Broadcast, :Fusion), 3, false))(((Core.getfield)(Base.Broadcast, :FusionCall))(+, (Core.tuple)(((Core.getfield)(Base.Broadcast, :FusionCall))(^, (Core.tuple)(((Core.apply_type)((Core.getfield)(Base.Broadcast, :FusionArg), 1))(), ((Core.getfield)(Base.Broadcast, :FusionConstant))(2))), ((Core.getfield)(Base.Broadcast, :FusionCall))(*, (Core.tuple)(((Core.getfield)(Base.Broadcast, :FusionConstant))(2), ((Core.apply_type)((Core.getfield)(Base.Broadcast, :FusionArg), 2))())), ((Core.getfield)(Base.Broadcast, :FusionCall))(sin, (Core.tuple)(((Core.apply_type)((Core.getfield)(Base.Broadcast, :FusionArg), 3))()))))), x, x, x))

Are there cases where that makes a significant performance difference?

stevengj · 2017-09-22T19:33:14Z

@mbauman, I thought that it did, but now I'm trying a simple example and I'm not seeing much difference:

julia> f!(dest, x) = broadcast!(x -> x^2 + 3x - 5x^3, dest, x);
julia> g!(dest, x) = broadcast!((x,y,z) -> x^2 + 3y - 5z^3, dest, x,x,x);
julia> x = rand(10^4); y = similar(x);

julia> @btime f!($y, $x);
  7.167 μs (0 allocations: 0 bytes)

julia> @btime g!($y, $x);
  7.183 μs (0 allocations: 0 bytes)

(I wonder if the compiler got better since I last tried this?)

stevengj · 2017-09-22T19:33:28Z

It would also be nice to see a proof-of-concept implementation of a custom array type for which fusion is disabled, just to make sure that this is feasible with a reasonable amount of code given the new design.

mbauman · 2017-09-22T19:37:32Z

That's precisely what I'm working on right now. I understand this much better after playing with it for a little bit. Here's a simple example:

@. 2x + 1

Expands out into:

julia> d = Base.Broadcast.Fusion{1,false}(
               Base.Broadcast.FusionCall(+,
                   (Base.Broadcast.FusionCall(*,
                       (Base.Broadcast.FusionConstant(2),
                        Base.Broadcast.FusionArg{1}())),
                    Base.Broadcast.FusionConstant(1))))
(a_1) -> +(*(2, a_1), 1)

julia> x = 1:2;

julia> broadcast(d, x)
2-element Array{Int64,1}:
 3
 5

So, now I think you'll need to overload broadcast(::Fusion, ::UnitRange) and dig through the call graph. It doesn't look like it'll be easy…

It'd be really nice if we could pass the objects themselves to the FusionArg and then eagerly evaluate them in cases like this. But that doesn't work at all, because this call graph is defining the scalar function. Still puzzling this through…

Edit: note that it's easy to introspect these things by evaluating a partial expansion:

julia> d = eval(expand(Main, :(@. 4x^2 + 2x + 1)).args[2]) # Expr(call, :broadcast, *THIS FUNCTION*, args…)
(a_1, a_2) -> +(*(4, ^(a_1, 2)), *(2, a_2), 1)

mbauman · 2017-09-22T20:54:48Z

Ok, I don't have the complete thread here yet, but it looks like we'll want to add a recursive pre-walk to make it easy to modify the call tree and its arguments. For example, we want to recursively identify that, e.g., FusionCall(+, ::FusionConstant{Int}, ::FusionArgument{1}) is a representation of unfused_broadcast(+, ::Int, ::UnitRange).

In this case, we'd then swap the FusionCall out in exchange for simply the FusionArgument{1}(), and then eagerly evaluate the first fusion argument to r.start+i:r.stop+i.

Does this plan of attack sound reasonable and achievable? I definitely will need more time to process how this becomes an extensible interface. Others are very welcome to pitch in. :)

stevengj · 2017-09-23T12:29:24Z

@mbauman, if this analysis of the fusion call graph occurs at runtime, is there an issue with type inference? Functions that disable fusion still need to be inferrable.

vtjnash · 2017-09-25T15:02:14Z

FusionCall(+, ::FusionConstant{Int}, ::FusionArgument{1}) is a representation of unfused_broadcast(+, ::Int, ::UnitRange)

It's not – that later information is never produced in this representation.

mbauman · 2017-09-25T17:11:39Z

Right, but that's exactly what we'd need to address something like #22932 (comment). I was simply trying to identify if/how it'd be possible to get there from the current implementation in a way that's easy to extend.

This is very clever and definitely an improvement in many cases… but it blows up fast due to all the nested types. For example, this definition is no longer inferable:

julia> f(x) = @. x^5 - 3x^4 + 2x^3 + 10x^2 - 4x + 64

julia> @code_warntype f(1:10)
Variables:
  #self# <optimized out>
  x::UnitRange{Int64}
  T::Type{#s50} where #s50
…
      return #temp#@_53::Any
  end::Any

# Conflicts: # src/julia-syntax.scm # test/broadcast.jl # test/ranges.jl

timholy · 2017-12-17T12:07:17Z

I merged this into master. Here's a fun demo.

code

function Base.broadcast(f, r::UnitRange)
    if all_ops_ursafe(f)
        UnitRange(f(first(r)), f(last(r)))
    elseif all_ops_rsafe(f)
        range(f(first(r)), f(step(r)), length(r))
    else
        broadcast(f, Broadcast.DefaultArrayStyle{1}(), nothing, nothing, r)
    end
end

all_ops_ursafe(f::Broadcast.Fusion) = all_ops_ursafe(f.f)
all_ops_ursafe(f::Broadcast.FusionCall) = all_ops_ursafe(f.f) && all(all_ops_ursafe, f.args)
all_ops_ursafe(::typeof(+)) = true
all_ops_ursafe(::typeof(-)) = true
all_ops_ursafe(f::Broadcast.FusionConstant{<:Real}) = true
all_ops_ursafe(f::Broadcast.FusionArg) = true
all_ops_ursafe(arg) = false

all_ops_rsafe(f::Broadcast.Fusion) = all_ops_rsafe(f.f)
all_ops_rsafe(f::Broadcast.FusionCall) = all_ops_rsafe(f.f) && all(all_ops_rsafe, f.args)
all_ops_rsafe(::typeof(+)) = true
all_ops_rsafe(::typeof(-)) = true
all_ops_rsafe(::typeof(*)) = true
all_ops_rsafe(::typeof(/)) = true
all_ops_rsafe(f::Broadcast.FusionConstant) = true
all_ops_rsafe(f::Broadcast.FusionArg) = true
all_ops_rsafe(arg) = false

demo

julia> using Test

julia> x = 1:3
1:3

julia> @inferred(2 .+ x .+ 1)
4:6

julia> @inferred(2 .* x .+ 1)
3:2:7

julia> @inferred(sin.(x) .+ 1)
3-element Array{Float64,1}:
 1.8414709848078965
 1.9092974268256817
 1.1411200080598671

Overall I'm pretty optimistic about this and think we probably want this for 1.0. My two concerns:

this would need updating for the change in keyword arguments, and I'm a bit skeptical that I have time to tackle that within the next 48 hours (I'm in the middle of a move...)
the quality of the generated code doesn't seem great, though I didn't have time to check in detail

vtjnash · 2017-12-18T16:58:05Z

I suppose this is OK, but I don't really like this PR anymore, since it seemed like it only addressed part of the problem. I started a new project at master...vtjnash:jn/lazydotfuse to experiment with being fully lazy and it seemed better. It needs updating for the changes to the broadcast API.

Sacha0 · 2017-12-18T17:28:15Z

Here's a fun demo.

Neat demo @timholy! :)

timholy · 2017-12-18T19:04:44Z

I didn't spend a lot of time looking it over, but IIUC jn/lazydotfuse seems to generalize the approach of MappedArrays.jl. As someone whose computing life routinely features 1TB 4d arrays, I love lazy operations. But, the problem is, when do you evaluate? Will users get their hands on a Broadcast object, or is this something that gets evaled at the "end" of broadcasting? (My Scheme is nonexistent, so I can't easily tell from just looking at the code.) If so, won't the fact that a Broadcast is neither <:AbstractArray nor <:Tuple cause downstream problems?

Conversely, this PR doesn't seem to suffer from that problem at all. What specifically seemed missing to you?

vtjnash · 2017-12-18T19:14:25Z

Will users get their hands on a Broadcast object, or is this something that gets evaled at the "end" of broadcasting?

The operations are realized at the "end" of the broadcasting, so the effect of it is transparent to user code.

This PR still does an awkward amount of work in scheme. The other branch eliminates that and moves everything into Julia. The net result is that the default is for Julia to build effectively the same representation as this PR. But allowing libraries to implement more accurate control over the transform.

StefanKarpinski · 2017-12-18T19:23:26Z

So it seems like what @vtjnash has done on that branch is exactly what I had envisioned when I proposed doing this with lazy operations instead of as a purely syntactic transformation. The key insight is that we were already effectively deciding where the materialization boundary is in the syntactic approach: the materialization boundary is exactly where dot fusion stops.

bramtayl · 2017-12-27T01:29:44Z

It might be worth exporting the linked lists as a standard library package?

KristofferC · 2017-12-27T01:37:20Z

It seems the questions here are diverging further and further from the core of this PR. Please open new, focused, issues instead, and cross reference back to this one. Issues are cheap.

timholy · 2017-12-28T15:20:06Z

What do folks think I should do about this? I'm currently blocked on #25267. I could push a branch that has the requisite changes in it, but it is going to fail nanosoldier. Or if it doesn't, it's only because we don't have sufficiently comprehensive broadcasting tests.

I guess the real question is whether we want to merge something that has known performance regressions for the sake of the API change, and assume that the performance fix will come.

vchuravy · 2017-12-28T15:48:20Z

I would say do the API changes now and performance improvements during the alpha.

stevengj · 2017-12-29T15:53:00Z

Can't this be done for 1.1? Right now, there is no documented API for makingbroadcast lazy etcetera; the only documented modification is overloading broadcast itself ala #24992. So, adding an API for this later would not be breaking. I'd rather not merge something at the last minute that massively slows things down for the sake of a new API that we've barely tested.

StefanKarpinski · 2017-12-29T15:54:56Z

I agree, @stevengj – hooking into broadcast should be explicitly not a stable API in 1.0.

timholy · 2017-12-30T13:00:52Z

Agreed this is a tough call. You outline the dangers well. The flip side is that it's a bit unsatisfying to say "Julia 1.0 is here! It's got a stable API except for broadcasting!" I suspect the coming PR is the last remaining component needed to avoid having that kind of caveat, and to me that's sufficiently attractive that I'm going to keep plugging away at this, get the PR submitted, and y'all can decide what to do with it.

If the goal were only to allow broadcasting to be lazy, then I'd say that's not important enough to worry about. But as a refresher, the real problems to be fixed are:

it's pretty awful that now we return an Array for (1:5) .+ 1. We could put a bandaid on that by specializing broadcast(::typeof(+), ::UnitRange, ::Real), but due to the lisp-fusion things like 2.*(1:5) .+ 1 are beyond repair, and even the simple fix would behave differently for variables than for literals.
the lisp-fusion makes life essentially impossible for, say, the GPU folks or anyone else who calls a C library, who can't make use of a fused operation and can't easily unfuse it.

If we do this later, then my crystal ball suggests it might look something like this:

we'll have to add a broadcast_similar method that throws an error if the API documented here is used---without the lisp-fusion it's not clear what to supply for f, so I don't think we can issue a traditional deprecation warning (in the sense that code will continue to function).
There seems also to be concern about the impact on literal_pow above, though I'm not far enough along to have anything useful to say about that.
The return type of many broadcasting operations that involve ranges will change. To me that's a breaking change that goes beyond "the API for hooking into broadcasting has changed."

If none of these sound serious to others, don't let this stand in the way of the 0.7-alpha release.

stevengj · 2017-12-30T14:10:42Z

@timholy, the minimal change to address some of those issues would be something like #22063: a way to disable fusion for certain argument types, without any more complicated stuff like laziness or this PR.

If we ever change broadcast to be fully lazy or whatever, we can still implement isfusing on top of that so that container types still have an easy way to disable fusion.

(Honestly, I don't see the big deal about ranges. Just don't use .* if you want a range result; use * instead.)

StefanKarpinski · 2017-12-30T17:42:50Z

The flip side is that it's a bit unsatisfying to say "Julia 1.0 is here! It's got a stable API except for broadcasting!"

It would certainly be better to be able to say "everything is perfect and we won't change it", but we're really out of time to keep working on this and it doesn't seem ready yet. Since broadcasting and dot syntax are such new and novel features, I don't think it's the end of the world to say that:

Usage of broadcasting will continue to work from user's perspective the way it does now;
Implementation of broadcasting may change and that may break libraries that hook into it deeply.

For the vast majority of Julia users, the "broadcasting API" is how broadcasting behaves and how they use it, not how it's implemented and extended. If/when we do change the internals of broadcasting, we can work closely with packages that hook into it to ensure a smooth transition so that non-library-author Julia users are none the wiser that anything has changed.

vtjnash · 2017-12-31T20:36:37Z

There seems also to be concern about the impact on literal_pow above, though I'm not far enough along to have anything useful to say about that.

It's possible to handle, it just requires some code duplication.

andreasnoack · 2018-01-02T08:17:12Z

Do we still need all the literal_pow stuff now that we have constant propagation?

StefanKarpinski · 2018-01-02T14:42:40Z

@andreasnoack, please don't derail this conversation to rehash that. I've explained why that's needed elsewhere and will be happy to do so again but not here.

timholy · 2018-01-04T13:00:42Z

New version of this PR in #25377.

@stevengj

(Honestly, I don't see the big deal about ranges. Just don't use .* if you want a range result; use * instead.)

We're deprecating a + 1 for other AbstractVectors, so telling people to use + for ranges and .+ for all other AbstractVectors is not a satisfying solution. We seem to have missed deprecating those methods for ranges, but I do that as a separate commit in #25377.

That fact then makes the alternative less attractive:

the minimal change to address some of those issues would be something like #22063: a way to disable fusion for certain argument types, without any more complicated stuff like laziness or this PR.

So we'd have to disable fusion for all ranges, and that presumably means ending fusion for quite a few operations. Totally doable, of course, just unattractive.

stevengj · 2018-01-04T17:25:59Z

@timholy, as you mentioned above, we could define broadcast(::typeof(+), ::Range, ::Real) so that people could get a range from (1:5) .+ 1 etcetera. If they combine this with multiplication, they just need to use *, e.g. 2 * (1:5) .+ 1 and not 2 .* (1:5) .+ 1 if they want a range result. Anything more complicated with a range would still produce an an array. This doesn't seem so bad to me — how common is it to want to do complex nested broadcast arithmetic with ranges to get another range?

timholy · 2018-01-04T19:02:03Z

One problem with broadcast(::typeof(+), ::Range, ::Real) is it doesn't work for r::Range .+ n::Number when n is literal, because literals get fused into f. On Julia 0.6:

julia> Base.broadcast(::typeof(+), ::UnitRange, ::Int) = "crazy"

julia> (1:5) .+ 1
5-element Array{Int64,1}:
 2
 3
 4
 5
 6

julia> n = 1
1

julia> (1:5) .+ n
"crazy"

But I agree that deep nesting is rare, and I could live without preserving the range in such cases.

stevengj · 2018-01-04T19:54:36Z

Okay, then let's just define + number for ranges. Ranges are a special type unto themselves, not a linear-algebra object, so I don't see a problem with defining special operations on them.

timholy · 2018-01-04T20:14:20Z

We have that currently, so that wouldn't be a change. But that does take us back to

We're deprecating a + 1 for other AbstractVectors, so telling people to use + for ranges and .+ for all other AbstractVectors is not a satisfying solution.

That said, not everything in 1.0 is going to be satisfying. Still, this problem and others are fixed in #25377. It's just a question of whether people are scared by it, and of course to finish up the loose ends.

As an alternative, what about removing the literal-slurping from the lisp fusion code? Are there big negatives? That would at least make broadcasting work in simple cases.

stevengj · 2018-01-04T20:20:02Z

Still, this problem and others are fixed in #25377.

At a huge performance cost, I thought you said?

stevengj · 2018-01-04T20:37:27Z

As an alternative, what about removing the literal-slurping from the lisp fusion code? Are there big negatives? That would at least make broadcasting work in simple cases.

Last I checked, there was a significant performance penalty to simple expressions if you remove the literal-slurping. But this doesn't seem to be the case any more?

julia> f!(x) = broadcast!(x -> x^2, x, x);
julia> g!(x) = broadcast!((x,p) -> x^p, x, x,2);
julia> x = zeros(10^3);
julia> @btime f!($x); @btime g!($x);
  82.644 ns (0 allocations: 0 bytes)
  85.150 ns (0 allocations: 0 bytes)

julia> f!(x) = broadcast!(x -> 3x^2 + 4x + 5, x, x);
julia> g!(x) = broadcast!((x,c1,c2,c3,c4) -> c1*x^c2 + c3*x + c4,x, x,3,2,4,5);
julia> @btime f!($x); @btime g!($x);
  183.832 ns (0 allocations: 0 bytes)
  183.849 ns (0 allocations: 0 bytes)

I'm not sure what changed?

Another problem with removing literal-slurping is that it eliminates the type-stability of Unitful expressions involving literal powers. One possibility would be to only do literal-slurping for .^ with integer literal powers.

timholy · 2018-01-04T20:41:42Z

At a huge performance cost, I thought you said?

Not as bad now. For a = rand(2,2), master gives

julia> @btime broadcast!(+, $b, $a, 1);
  12.715 ns (0 allocations: 0 bytes)

and #25377 gives

julia> @btime broadcast!(+, $b, $a, 1);
  17.991 ns (0 allocations: 0 bytes)

timholy · 2018-01-04T20:58:24Z

One possibility would be to only do literal-slurping for literal powers.

If we don't do #25377, that would be an improvement. Then, #22063 might be enough to prevent the GPU people from hating us. You could conceivably rework it around BroadcastStyle since that already has promote_type-like behavior (see the docs about binary rules and combine_styles).

Still less flexible than #25377, but it's a much smaller change, and it automatically fixes the literal_pow problem.

vtjnash · 2018-04-19T21:11:24Z

base/broadcast.jl

+@inline (f::FusionCall)(args...) = f.f(map(a -> a(args...), f.args)...)
+# TODO: calling _apply on map _apply is not handled by inference
+# for now, we unroll some cases and generate others, to help it out
+#@inline (f::FusionApply)(args...) = Core._apply(f.f, map(a -> a(args...), f.args)...)


equivalently, I think this is also f.f(flatten(map(a -> a(args...), f.args)...)...)
where flatten(args...) = Core._apply(tuple, args...)

@lower

This patch represents the combined efforts of four individuals, over 60 commits, and an iterated design over (at least) three pull requests that spanned nearly an entire year (closes #22063, #23692, #25377 by superceding them). This introduces a pure Julia data structure that represents a fused broadcast expression. For example, the expression `2 .* (x .+ 1)` lowers to: ```julia julia> Meta.@lower 2 .* (x .+ 1) :($(Expr(:thunk, CodeInfo(:(begin Core.SSAValue(0) = (Base.getproperty)(Base.Broadcast, :materialize) Core.SSAValue(1) = (Base.getproperty)(Base.Broadcast, :make) Core.SSAValue(2) = (Base.getproperty)(Base.Broadcast, :make) Core.SSAValue(3) = (Core.SSAValue(2))(+, x, 1) Core.SSAValue(4) = (Core.SSAValue(1))(*, 2, Core.SSAValue(3)) Core.SSAValue(5) = (Core.SSAValue(0))(Core.SSAValue(4)) return Core.SSAValue(5) end))))) ``` Or, slightly more readably as: ```julia using .Broadcast: materialize, make materialize(make(*, 2, make(+, x, 1))) ``` The `Broadcast.make` function serves two purposes. Its primary purpose is to construct the `Broadcast.Broadcasted` objects that hold onto the function, the tuple of arguments (potentially including nested `Broadcasted` arguments), and sometimes a set of `axes` to include knowledge of the outer shape. The secondary purpose, however, is to allow an "out" for objects that _don't_ want to participate in fusion. For example, if `x` is a range in the above `2 .* (x .+ 1)` expression, it needn't allocate an array and operate elementwise — it can just compute and return a new range. Thus custom structures are able to specialize `Broadcast.make(f, args...)` just as they'd specialize on `f` normally to return an immediate result. `Broadcast.materialize` is identity for everything _except_ `Broadcasted` objects for which it allocates an appropriate result and computes the broadcast. It does two things: it `initialize`s the outermost `Broadcasted` object to compute its axes and then `copy`s it. Similarly, an in-place fused broadcast like `y .= 2 .* (x .+ 1)` uses the exact same expression tree to compute the right-hand side of the expression as above, and then uses `materialize!(y, make(*, 2, make(+, x, 1)))` to `instantiate` the `Broadcasted` expression tree and then `copyto!` it into the given destination. All-together, this forms a complete API for custom types to extend and customize the behavior of broadcast (fixes #22060). It uses the existing `BroadcastStyle`s throughout to simplify dispatch on many arguments: * Custom types can opt-out of broadcast fusion by specializing `Broadcast.make(f, args...)` or `Broadcast.make(::BroadcastStyle, f, args...)`. * The `Broadcasted` object computes and stores the type of the combined `BroadcastStyle` of its arguments as its first type parameter, allowing for easy dispatch and specialization. * Custom Broadcast storage is still allocated via `broadcast_similar`, however instead of passing just a function as a first argument, the entire `Broadcasted` object is passed as a final argument. This potentially allows for much more runtime specialization dependent upon the exact expression given. * Custom broadcast implmentations for a `CustomStyle` are defined by specializing `copy(bc::Broadcasted{CustomStyle})` or `copyto!(dest::AbstractArray, bc::Broadcasted{CustomStyle})`. * Fallback broadcast specializations for a given output object of type `Dest` (for the `DefaultArrayStyle` or another such style that hasn't implemented assignments into such an object) are defined by specializing `copyto(dest::Dest, bc::Broadcasted{Nothing})`. As it fully supports range broadcasting, this now deprecates `(1:5) + 2` to `.+`, just as had been done for all `AbstractArray`s in general. As a first-mover proof of concept, LinearAlgebra uses this new system to improve broadcasting over structured arrays. Before, broadcasting over a structured matrix would result in a sparse array. Now, broadcasting over a structured matrix will _either_ return an appropriately structured matrix _or_ a dense array. This does incur a type instability (in the form of a discriminated union) in some situations, but thanks to type-based introspection of the `Broadcasted` wrapper commonly used functions can be special cased to be type stable. For example: ```julia julia> f(d) = round.(Int, d) f (generic function with 1 method) julia> @inferred f(Diagonal(rand(3))) 3×3 Diagonal{Int64,Array{Int64,1}}: 0 ⋅ ⋅ ⋅ 0 ⋅ ⋅ ⋅ 1 julia> @inferred Diagonal(rand(3)) .* 3 ERROR: return type Diagonal{Float64,Array{Float64,1}} does not match inferred return type Union{Array{Float64,2}, Diagonal{Float64,Array{Float64,1}}} Stacktrace: [1] error(::String) at ./error.jl:33 [2] top-level scope julia> @inferred Diagonal(1:4) .+ Bidiagonal(rand(4), rand(3), 'U') .* Tridiagonal(1:3, 1:4, 1:3) 4×4 Tridiagonal{Float64,Array{Float64,1}}: 1.30771 0.838589 ⋅ ⋅ 0.0 3.89109 0.0459757 ⋅ ⋅ 0.0 4.48033 2.51508 ⋅ ⋅ 0.0 6.23739 ``` In addition to the issues referenced above, it fixes: * Fixes #19313, #22053, #23445, and #24586: Literals are no longer treated specially in a fused broadcast; they're just arguments in a `Broadcasted` object like everything else. * Fixes #21094: Since broadcasting is now represented by a pure Julia datastructure it can be created within `@generated` functions and serialized. * Fixes #26097: The fallback destination-array specialization method of `copyto!` is specifically implemented as `Broadcasted{Nothing}` and will not be confused by `nothing` arguments. * Fixes the broadcast-specific element of #25499: The default base broadcast implementation no longer depends upon `Base._return_type` to allocate its array (except in the empty or concretely-type cases). Note that the sparse implementation (#19595) is still dependent upon inference and is _not_ fixed. * Fixes #25340: Functions are treated like normal values just like arguments and only evaluated once. * Fixes #22255, and is performant with 12+ fused broadcasts. Okay, that one was fixed on master already, but this fixes it now, too. * Fixes #25521. * The performance of this patch has been thoroughly tested through its iterative development process in #25377. There remain [two classes of performance regressions](#25377) that Nanosoldier flagged. * #25691: Propagation of constant literals sill lose their constant-ness upon going through the broadcast machinery. I believe quite a large number of functions would need to be marked as `@pure` to support this -- including functions that are intended to be specialized. (For bookkeeping, this is the squashed version of the [teh-jn/lazydotfuse](#25377) branch as of a1d4e7e. Squashed and separated out to make it easier to review and commit) Co-authored-by: Tim Holy <tim.holy@gmail.com> Co-authored-by: Jameson Nash <vtjnash@gmail.com> Co-authored-by: Andrew Keller <ajkeller34@users.noreply.github.com>

@lower

This patch represents the combined efforts of four individuals, over 60 commits, and an iterated design over (at least) three pull requests that spanned nearly an entire year (closes #22063, #23692, #25377 by superceding them). This introduces a pure Julia data structure that represents a fused broadcast expression. For example, the expression `2 .* (x .+ 1)` lowers to: ```julia julia> Meta.@lower 2 .* (x .+ 1) :($(Expr(:thunk, CodeInfo(:(begin Core.SSAValue(0) = (Base.getproperty)(Base.Broadcast, :materialize) Core.SSAValue(1) = (Base.getproperty)(Base.Broadcast, :make) Core.SSAValue(2) = (Base.getproperty)(Base.Broadcast, :make) Core.SSAValue(3) = (Core.SSAValue(2))(+, x, 1) Core.SSAValue(4) = (Core.SSAValue(1))(*, 2, Core.SSAValue(3)) Core.SSAValue(5) = (Core.SSAValue(0))(Core.SSAValue(4)) return Core.SSAValue(5) end))))) ``` Or, slightly more readably as: ```julia using .Broadcast: materialize, make materialize(make(*, 2, make(+, x, 1))) ``` The `Broadcast.make` function serves two purposes. Its primary purpose is to construct the `Broadcast.Broadcasted` objects that hold onto the function, the tuple of arguments (potentially including nested `Broadcasted` arguments), and sometimes a set of `axes` to include knowledge of the outer shape. The secondary purpose, however, is to allow an "out" for objects that _don't_ want to participate in fusion. For example, if `x` is a range in the above `2 .* (x .+ 1)` expression, it needn't allocate an array and operate elementwise — it can just compute and return a new range. Thus custom structures are able to specialize `Broadcast.make(f, args...)` just as they'd specialize on `f` normally to return an immediate result. `Broadcast.materialize` is identity for everything _except_ `Broadcasted` objects for which it allocates an appropriate result and computes the broadcast. It does two things: it `initialize`s the outermost `Broadcasted` object to compute its axes and then `copy`s it. Similarly, an in-place fused broadcast like `y .= 2 .* (x .+ 1)` uses the exact same expression tree to compute the right-hand side of the expression as above, and then uses `materialize!(y, make(*, 2, make(+, x, 1)))` to `instantiate` the `Broadcasted` expression tree and then `copyto!` it into the given destination. All-together, this forms a complete API for custom types to extend and customize the behavior of broadcast (fixes #22060). It uses the existing `BroadcastStyle`s throughout to simplify dispatch on many arguments: * Custom types can opt-out of broadcast fusion by specializing `Broadcast.make(f, args...)` or `Broadcast.make(::BroadcastStyle, f, args...)`. * The `Broadcasted` object computes and stores the type of the combined `BroadcastStyle` of its arguments as its first type parameter, allowing for easy dispatch and specialization. * Custom Broadcast storage is still allocated via `broadcast_similar`, however instead of passing just a function as a first argument, the entire `Broadcasted` object is passed as a final argument. This potentially allows for much more runtime specialization dependent upon the exact expression given. * Custom broadcast implmentations for a `CustomStyle` are defined by specializing `copy(bc::Broadcasted{CustomStyle})` or `copyto!(dest::AbstractArray, bc::Broadcasted{CustomStyle})`. * Fallback broadcast specializations for a given output object of type `Dest` (for the `DefaultArrayStyle` or another such style that hasn't implemented assignments into such an object) are defined by specializing `copyto(dest::Dest, bc::Broadcasted{Nothing})`. As it fully supports range broadcasting, this now deprecates `(1:5) + 2` to `.+`, just as had been done for all `AbstractArray`s in general. As a first-mover proof of concept, LinearAlgebra uses this new system to improve broadcasting over structured arrays. Before, broadcasting over a structured matrix would result in a sparse array. Now, broadcasting over a structured matrix will _either_ return an appropriately structured matrix _or_ a dense array. This does incur a type instability (in the form of a discriminated union) in some situations, but thanks to type-based introspection of the `Broadcasted` wrapper commonly used functions can be special cased to be type stable. For example: ```julia julia> f(d) = round.(Int, d) f (generic function with 1 method) julia> @inferred f(Diagonal(rand(3))) 3×3 Diagonal{Int64,Array{Int64,1}}: 0 ⋅ ⋅ ⋅ 0 ⋅ ⋅ ⋅ 1 julia> @inferred Diagonal(rand(3)) .* 3 ERROR: return type Diagonal{Float64,Array{Float64,1}} does not match inferred return type Union{Array{Float64,2}, Diagonal{Float64,Array{Float64,1}}} Stacktrace: [1] error(::String) at ./error.jl:33 [2] top-level scope julia> @inferred Diagonal(1:4) .+ Bidiagonal(rand(4), rand(3), 'U') .* Tridiagonal(1:3, 1:4, 1:3) 4×4 Tridiagonal{Float64,Array{Float64,1}}: 1.30771 0.838589 ⋅ ⋅ 0.0 3.89109 0.0459757 ⋅ ⋅ 0.0 4.48033 2.51508 ⋅ ⋅ 0.0 6.23739 ``` In addition to the issues referenced above, it fixes: * Fixes #19313, #22053, #23445, and #24586: Literals are no longer treated specially in a fused broadcast; they're just arguments in a `Broadcasted` object like everything else. * Fixes #21094: Since broadcasting is now represented by a pure Julia datastructure it can be created within `@generated` functions and serialized. * Fixes #26097: The fallback destination-array specialization method of `copyto!` is specifically implemented as `Broadcasted{Nothing}` and will not be confused by `nothing` arguments. * Fixes the broadcast-specific element of #25499: The default base broadcast implementation no longer depends upon `Base._return_type` to allocate its array (except in the empty or concretely-type cases). Note that the sparse implementation (#19595) is still dependent upon inference and is _not_ fixed. * Fixes #25340: Functions are treated like normal values just like arguments and only evaluated once. * Fixes #22255, and is performant with 12+ fused broadcasts. Okay, that one was fixed on master already, but this fixes it now, too. * Fixes #25521. * The performance of this patch has been thoroughly tested through its iterative development process in #25377. There remain [two classes of performance regressions](#25377) that Nanosoldier flagged. * #25691: Propagation of constant literals sill lose their constant-ness upon going through the broadcast machinery. I believe quite a large number of functions would need to be marked as `@pure` to support this -- including functions that are intended to be specialized. (For bookkeeping, this is the squashed version of the [teh-jn/lazydotfuse](#25377) branch as of a1d4e7e. Squashed and separated out to make it easier to review and commit) Co-authored-by: Tim Holy <tim.holy@gmail.com> Co-authored-by: Jameson Nash <vtjnash@gmail.com> Co-authored-by: Andrew Keller <ajkeller34@users.noreply.github.com>

introduce runtime representation of broadcast fusion

535fefa

fix #21094 fix #22060 fix #22053 replaces #22063

ararslan requested review from JeffBezanson and stevengj September 13, 2017 18:42

ararslan added the broadcast Applying a function over a collection label Sep 13, 2017

stevengj reviewed Sep 20, 2017

View reviewed changes

vtjnash mentioned this pull request Sep 25, 2017

Fix inferrability of split #23867

Merged

vchuravy mentioned this pull request Oct 3, 2017

Rework the broadcast API and document it #23939

Merged

2 tasks

c42f mentioned this pull request Nov 13, 2017

UnitRange broacasting with literals gives a Vector #24586

Closed

gasagna mentioned this pull request Dec 10, 2017

generated function body is not pure when using dot notation #21094

Closed

Merge branch 'master' into jn/Fusion

6889aea

# Conflicts: # src/julia-syntax.scm # test/broadcast.jl # test/ranges.jl

timholy mentioned this pull request Dec 29, 2017

what should we use { } for? #8470

Open

timholy mentioned this pull request Jan 3, 2018

RFC: Customizable lazy broadcasting with options for pure-Julia fusion and eager evaluation #25377

Closed

vtjnash mentioned this pull request Jan 4, 2018

nested broadcast should evaluate non-broadcasted part once #25340

Closed

vtjnash commented Apr 19, 2018

View reviewed changes

mbauman mentioned this pull request Apr 24, 2018

Customizable lazy fused broadcasting in pure Julia #26891

Merged

mbauman closed this in #26891 Apr 26, 2018

DilumAluthge deleted the jn/Fusion branch March 25, 2021 22:08

RFC: introduce runtime representation of broadcast fusion #23692

RFC: introduce runtime representation of broadcast fusion #23692

Conversation

vtjnash commented Sep 13, 2017

vchuravy commented Sep 13, 2017

vtjnash commented Sep 13, 2017

andyferris commented Sep 17, 2017 • edited Loading

stevengj commented Sep 20, 2017

stevengj Sep 20, 2017

Choose a reason for hiding this comment

StefanKarpinski Sep 20, 2017

Choose a reason for hiding this comment

TotalVerb Sep 21, 2017

Choose a reason for hiding this comment

mbauman Sep 22, 2017

Choose a reason for hiding this comment

timholy commented Sep 20, 2017

stevengj commented Sep 21, 2017

mbauman commented Sep 22, 2017

stevengj commented Sep 22, 2017 • edited Loading

stevengj commented Sep 22, 2017

mbauman commented Sep 22, 2017 • edited Loading

mbauman commented Sep 22, 2017

stevengj commented Sep 23, 2017 • edited Loading

vtjnash commented Sep 25, 2017

mbauman commented Sep 25, 2017 • edited Loading

timholy commented Dec 17, 2017

code

demo

vtjnash commented Dec 18, 2017

Sacha0 commented Dec 18, 2017

timholy commented Dec 18, 2017

vtjnash commented Dec 18, 2017

StefanKarpinski commented Dec 18, 2017

bramtayl commented Dec 27, 2017

KristofferC commented Dec 27, 2017 • edited Loading

timholy commented Dec 28, 2017

vchuravy commented Dec 28, 2017

stevengj commented Dec 29, 2017 • edited Loading

StefanKarpinski commented Dec 29, 2017

timholy commented Dec 30, 2017

stevengj commented Dec 30, 2017

StefanKarpinski commented Dec 30, 2017

vtjnash commented Dec 31, 2017

andreasnoack commented Jan 2, 2018

StefanKarpinski commented Jan 2, 2018

timholy commented Jan 4, 2018 • edited Loading

stevengj commented Jan 4, 2018 • edited Loading

timholy commented Jan 4, 2018

stevengj commented Jan 4, 2018

timholy commented Jan 4, 2018

stevengj commented Jan 4, 2018 • edited Loading

stevengj commented Jan 4, 2018 • edited Loading

timholy commented Jan 4, 2018

timholy commented Jan 4, 2018

vtjnash Apr 19, 2018

Choose a reason for hiding this comment

andyferris commented Sep 17, 2017 •

edited

Loading

stevengj commented Sep 22, 2017 •

edited

Loading

mbauman commented Sep 22, 2017 •

edited

Loading

stevengj commented Sep 23, 2017 •

edited

Loading

mbauman commented Sep 25, 2017 •

edited

Loading

KristofferC commented Dec 27, 2017 •

edited

Loading

stevengj commented Dec 29, 2017 •

edited

Loading

timholy commented Jan 4, 2018 •

edited

Loading

stevengj commented Jan 4, 2018 •

edited

Loading

stevengj commented Jan 4, 2018 •

edited

Loading

stevengj commented Jan 4, 2018 •

edited

Loading