
Implicit broadcast harmful? #135

Closed
baggepinnen opened this issue Feb 14, 2017 · 12 comments

@baggepinnen (Contributor) commented Feb 14, 2017

I'm posting this as an issue but intend it as a suggestion you might consider.

In Julia, operators such as + and - that work element-wise on matrices and vectors do not automatically broadcast when the dimensions of the two operands differ; for that, the dot syntax (e.g., .+) must be used. I find this explicit broadcasting a nice safety feature.

TensorFlow operations, such as the ones above, use implicit broadcasting: if the dimensions do not match, broadcasting is attempted.

I'm proposing the following: in TensorFlow.jl, the dot syntax automatically calls the corresponding TensorFlow operation, but the dot-free syntax throws an error if the sizes of the operands do not match.

Any thoughts on this? :)
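
A minimal sketch of the proposed semantics; shape_of and elementwise_add are hypothetical placeholder names, not TensorFlow.jl API:

    # Hypothetical sketch only: shape_of and elementwise_add are placeholders.
    function Base.:+(a::AbstractTensor, b::AbstractTensor)
        shape_of(a) == shape_of(b) || error("size mismatch: use .+ to broadcast explicitly")
        return elementwise_add(a, b)        # same-size element-wise add
    end

    # The dot form opts in to TensorFlow's implicit broadcasting rules.
    Base.broadcast(::typeof(+), a::AbstractTensor, b::AbstractTensor) = elementwise_add(a, b)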

@oxinabox (Collaborator) commented:

You are correct, but .+ and .* etc. are all going to break in 0.6 in a way I don't (yet) see how to fix. Broadcasting will become automatic, because dot calls fuse everywhere: the meanings of .+ and .* will be given by the language itself, derived from the definitions of + and * on the elements of the Arrays.

But since Tensors are not Julia arrays, just representations of nodes in the computation graph, I'm not sure that will work out at all, unless some very clever stuff can be done.

It may be that .+ and .* etc. will need to be renamed, e.g. to ⊕ and ⊗.
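
For concreteness, here is the 0.6 behaviour on plain Julia arrays that makes per-operator .+ / .* method definitions unreachable (nothing TensorFlow-specific; the values are only illustrative):

    a = [1, 2, 3]
    b = 10
    a .+ b              # lowers to a broadcast call; there is no separate `.+` function to add methods to
    broadcast(+, a, b)  # the call the compiler actually emits, giving [11, 12, 13]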

@malmaud (Owner) commented Feb 14, 2017

I think the best we can really do with the changes in 0.6 is to define the scalar-form operators (+, -, etc.).

Longer-term, I'm thinking of having a macro like @tensorflow where all the code in its block is interpreted in a special way appropriate for making a computation graph, which can include special non-standard semantics for the scalar and broadcasting operators on tensors.
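
A sketch of what those scalar-form definitions could look like; the op names Ops.add and Ops.sub are assumptions here, not necessarily TensorFlow.jl's actual generated names:

    # Sketch only: the exact op names are assumptions.
    Base.:+(t1::AbstractTensor, t2::AbstractTensor) = Ops.add(t1, t2)
    Base.:-(t1::AbstractTensor, t2::AbstractTensor) = Ops.sub(t1, t2)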

@oxinabox (Collaborator) commented:

cf. FluxML/Flux.jl#31

@stevengj commented:

Why couldn't you solve this by defining broadcast(f::Function, ...) operations on TensorFlow containers, or mixtures of those containers and scalars?

@oxinabox (Collaborator) commented:

So rather than defining

    Base.broadcast(::typeof(*), n1::AbstractTensor, n2::AbstractTensor) = multiply(n1, n2)
    Base.broadcast(::typeof(*), n1::AbstractTensor, n2) = multiply(n1, tf_promote(n1, n2))
    Base.broadcast(::typeof(*), n1, n2::AbstractTensor) = multiply(tf_promote(n2, n1), n2)

and

    @eval Base.broadcast(::typeof($sym), t1::AbstractTensor, t2::AbstractTensor) = $func(t1, t2)
    @eval Base.broadcast(::typeof($sym), t1::AbstractTensor, t2) = $func(t1, Tensor(t2))
    @eval Base.broadcast(::typeof($sym), t1, t2::AbstractTensor) = $func(Tensor(t1), t2)

just define broadcast once for all functions, as

    Base.broadcast(func::Function, t1::AbstractTensor, ts...; kwargs...) = func(t1, Tensor.(ts)...; kwargs...)
    Base.broadcast(func::Function, t1::AbstractTensor, t2::AbstractTensor, ts...; kwargs...) = func(t1, t2, Tensor.(ts)...; kwargs...)
    Base.broadcast(func::Function, t1, t2::AbstractTensor, ts...; kwargs...) = func(Tensor(t1), t2, Tensor.(ts)...; kwargs...)

@malmaud (Owner) commented May 25, 2017

I'm not sure that's quite what we want: we need x*y to call Ops.matmul(x, y) and x.*y to call Ops.mul(x, y).
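
A sketch of the dispatch being described, using the op names from this comment (signatures illustrative only):

    Base.:*(x::AbstractTensor, y::AbstractTensor) = Ops.matmul(x, y)                   # x * y  -> matrix multiply
    Base.broadcast(::typeof(*), x::AbstractTensor, y::AbstractTensor) = Ops.mul(x, y)  # x .* y -> element-wise multiply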

@stevengj commented May 25, 2017

Can't you define a generic broadcast by defining a custom op?

Basically, .* means fusing broadcast now in Julia: when the user writes @. x += x^2 - 3*y + f(y), they get a syntactic guarantee that it is a single broadcast! call. So if you want something different, you should use a different operator or function name.
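
A small illustration of that fusion guarantee on plain Julia arrays (f is just a stand-in element-wise function):

    x = rand(3); y = rand(3)
    f(v) = 2v                          # stand-in for any element-wise function
    @. x += x^2 - 3*y + f(y)           # the whole right-hand side fuses into one in-place broadcast over x and y
    # conceptually equivalent to:
    # broadcast!((xi, yi) -> xi + (xi^2 - 3*yi + f(yi)), x, x, y)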

@malmaud (Owner) commented May 25, 2017

Not quite sure I see how a custom TensorFlow op helps with the situation: there is already one TensorFlow op for matrix multiplication and another for element-wise multiplication; it's just a matter of the syntax for exposing them to users.

Users probably expect x.*y to translate to element-wise multiplication whether x happens to be a Julia Array or a TensorFlow Tensor, but if that's not possible, we'll just have to educate users about the situation with a note in the docs. It was a design goal of TensorFlow.jl to let users be, as much as possible, agnostic of the distinction between Arrays and Tensors, so it just feels a little unfortunate.

@oxinabox (Collaborator) commented:

It is going to break this code.

Which is sad, because that is some beautiful duck typing: the function can be passed t and u as Arrays, and it will compute the result directly; or it can be passed Tensors, and it will return a node in the computation graph that gets executed when you call run(sess, ...).

Being able to duck-type Tensors as Arrays is one of the most wonderful features, because functions can be written without thinking of TensorFlow at all, and then just work with it.
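
A hedged illustration of the duck-typing pattern being described (not the actual code from the package in question):

    # Works for plain Arrays today; with Tensors the same body would build graph nodes,
    # provided the dot operators and sum still dispatch to TensorFlow ops.
    loss(t, u) = sum((u .- t.^2).^2)

    loss(collect(0.0:0.1:1.0), rand(11))   # Arrays: evaluates to a number immediately
    # loss(t_tensor, u_tensor)             # Tensors: would return a graph node, evaluated via run(sess, ...)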

@malmaud (Owner) commented May 25, 2017

I guess a workaround for now is to introduce an operator like ⊛ that is element-wise multiplication for both tensors and arrays and use that in TensorFlowDiffEq.
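
A sketch of that workaround; ⊛ is only a proposal here, and Ops.mul is the element-wise op name used earlier in this thread:

    ⊛(a::AbstractArray, b::AbstractArray) = a .* b            # plain Arrays: ordinary element-wise multiply
    ⊛(a::AbstractTensor, b::AbstractTensor) = Ops.mul(a, b)   # Tensors: the element-wise op named earlier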

@stevengj commented May 25, 2017

> Not quite sure I see how a custom TensorFlow op helps with the situation

Because then you can define broadcast(f::Function, ...) on Tensors and it will work for any f (since f will translate into a custom op that calls back to Julia). Then you will get all of your "dot ops" for free, and @. (x + 3x^2) * y will become a single fused elementwise operation. (And it will work, and fuse, for both Tensors and Arrays.)

@oxinabox (Collaborator) commented:

Closing as non-actionable/already fixed
