Can't use broadcasting on non-primitives #3

dfdx · 2018-07-26T20:49:31Z

Say we have a function logistic(::Real) and no wrapper that writes it to the tape (like the ones in scalar.jl). If we broadcast it over a TArray, it will be recorded on the tape as:

record!(tape, Bcast, logistic, (x,))

During differentiation of Bcast we run the function in question on the first element of the TArray. But logistic(x[1]) doesn't record a single logistic call to the minitape; it records the list of underlying operations (e.g. 5 of them). As a result, ops on the minitape can't be mapped back correctly to ops on the main tape, and differentiation fails.
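To make the mismatch concrete, here is a minimal sketch with a toy tracer. The Tracked type and the symbol log are hypothetical stand-ins, not Yota's actual TReal or tape; the only point is that calling a non-primitive on a traced scalar logs its building blocks rather than the function itself (this particular definition of logistic happens to produce four ops).

```julia
# Toy tracer (hypothetical, NOT Yota's types): each primitive call
# appends its name to a shared log that plays the role of the minitape.
struct Tracked
    val::Float64
    log::Vector{Symbol}
end

Base.:-(x::Tracked) = (push!(x.log, :neg); Tracked(-x.val, x.log))
Base.exp(x::Tracked) = (push!(x.log, :exp); Tracked(exp(x.val), x.log))
Base.:+(a::Real, x::Tracked) = (push!(x.log, :add); Tracked(a + x.val, x.log))
Base.:/(a::Real, x::Tracked) = (push!(x.log, :div); Tracked(a / x.val, x.log))

# A non-primitive: no method records `logistic` itself.
logistic(x) = 1 / (1 + exp(-x))

minitape = Symbol[]
y = logistic(Tracked(0.0, minitape))

# The main tape holds one entry (Bcast of logistic), while the minitape
# holds one entry per underlying primitive, so the two cannot be aligned.
println(minitape)
```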

Possible solutions are:

1. Forbid broadcasting on non-primitives. This may actually be fine for the closed system that Yota currently targets, but will most likely annoy a broader audience.
2. Push the broadcast through the function. For example, we can convert TArray{<:Real} to Array{TReal}, run the broadcast and then assemble a TArray back. This, however, sounds quite fragile.
3. Write operations to a minitape and then rewrite the calls to the corresponding broadcasts on the main tape. The disadvantage is that we will execute these ops twice, once for the first element and once for the whole tensor, which is undesirable for dynamic graphs.
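The third option could look roughly like the sketch below. Slot, Op and replay_broadcast are hypothetical names introduced only for illustration, and the minitape is written out by hand instead of being traced; this is not how Yota represents its tape.

```julia
# Hypothetical op record, not Yota's representation.
struct Slot            # reference to an earlier value; Slot(0) is the input
    id::Int
end

struct Op
    fn::Function
    args::Vector{Any}  # a Slot refers to a previous result, anything else is a constant
end

# What tracing 1 / (1 + exp(-x)) on a single element might leave on the minitape.
minitape = [
    Op(-,   Any[Slot(0)]),
    Op(exp, Any[Slot(1)]),
    Op(+,   Any[1, Slot(2)]),
    Op(/,   Any[1, Slot(3)]),
]

# Replay every scalar op as a broadcast over the whole array.
function replay_broadcast(minitape, x::AbstractArray)
    vals = Dict{Int,Any}(0 => x)
    for (i, op) in enumerate(minitape)
        args = [a isa Slot ? vals[a.id] : a for a in op.args]
        vals[i] = broadcast(op.fn, args...)
    end
    return vals[length(minitape)]
end

x = [-1.0, 0.0, 2.0]
y = replay_broadcast(minitape, x)
```

In a real tracer the scalar ops would already have been executed once on the first element before this replay, which is the double execution mentioned above.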

I'm going to start with (1), but leave this issue open for a while.


jrevels · 2018-08-13T09:47:55Z

I came up with a technique to solve this for ReverseDiff where we replace the broadcasted op with a forward-mode AD'd version, then cache those intermediary derivatives for use in the backwards pass. Some of us are writing a paper on the technique now, but until that's ready, here's a prototype implementation that gets good performance on the GPU: https://github.com/jrevels/MixedModeBroadcastAD.jl
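A rough sketch of the idea as described in this comment, under stated assumptions: the hand-written Dual type below stands in for a real forward-mode package, and bcast_forward / bcast_backward are hypothetical names. It is not taken from MixedModeBroadcastAD.jl and ignores the GPU side entirely.

```julia
# Minimal dual number (value + derivative); a stand-in for a real
# forward-mode AD type, defined here only to keep the sketch self-contained.
struct Dual
    val::Float64
    der::Float64
end

Base.:-(a::Dual) = Dual(-a.val, -a.der)
Base.exp(a::Dual) = (e = exp(a.val); Dual(e, e * a.der))
Base.:+(c::Real, a::Dual) = Dual(c + a.val, a.der)
Base.:/(c::Real, a::Dual) = Dual(c / a.val, -c * a.der / a.val^2)

logistic(x) = 1 / (1 + exp(-x))

# Forward pass: a single broadcast yields values and elementwise derivatives.
function bcast_forward(f, x::AbstractArray{<:Real})
    duals = f.(Dual.(x, 1.0))
    y  = [d.val for d in duals]
    dy = [d.der for d in duals]   # cached for the backward pass
    return y, dy
end

# Backward pass: f is not traced again, only an elementwise product is needed.
bcast_backward(dy_cache, ybar) = ybar .* dy_cache

x = [-1.0, 0.0, 2.0]
y, cache = bcast_forward(logistic, x)
xbar = bcast_backward(cache, ones(3))
```

Because the broadcasted function is treated as a black box by the reverse pass, it does not matter whether logistic is a primitive or a composition of several operations.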

(cool package, btw 🙂)

dfdx · 2018-08-13T14:24:14Z

I would love to read the paper! Fortunately, it's not much of an issue for me at the moment, so I have some time before it gets critical :)

In theory, I could detect broadcasting on non-primitives in advance and do the same trick during the forward pass as in the reverse pass: call the function on the first elements of the arrays, writing ops to a "minitape", and then rewrite the minitape to the main tape for arrays. But it doesn't sound very robust, so mixed-mode broadcasting may be the way to go.


dfdx · 2021-07-03T20:14:05Z

Many features including broadcasting are now handled by ChainRules, so closing this issue as outdated.

dfdx added a commit that referenced this issue Jan 20, 2019

Merge pull request #3 from dfdx/sum-mean-grad

425b1ab

gradient for mea() and sum() with keywords

mcabbott mentioned this issue Apr 25, 2020

Errors when broadcasting #58

Closed

dfdx closed this as completed Jul 3, 2021
