
Eagerly evaluate ::Zero * ::Any #90

Merged · 7 commits · Jan 12, 2020

Conversation

YingboMa
Member

Master behavior
```julia
julia> @scalar_rule(one(x), Zero())

julia> frule(one, 1, Zero(), [1, 2])
(1, Zero())

julia> frule(one, 1, Zero(), One())
(1, Zero())
```

Desirable behavior
```julia
julia> @scalar_rule(one(x), Zero())

julia> frule(one, 1, Zero(), [1, 2])
(1, [0, 0])

julia> frule(one, 1, Zero(), One())
(1, Thunk(var"#8#10"()))
```
YingboMa added a commit to JuliaDiff/ChainRules.jl that referenced this pull request Jan 11, 2020
@oxinabox
Member

oxinabox commented Jan 12, 2020

We don't want to replace unthunk with extern.
See #64

extern is not a well-defined operation.
It's been pending for a while to remove it, or at least demote it to "provided for testing/debugging purposes only".

You can't convert all differentials to their primal types, especially without knowing what the primal is.

Also, what is your use case?
On subtypes of AbstractThunk, extern is just a recursive unthunk.
And in the context of addition (quoting #64):

even in double-thunking cases:
1 + @thunk(@thunk(2)) becomes 1 + @thunk(2) becomes 1 + 2
so addition is still fine.
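To make that unwrapping concrete, here is a minimal self-contained sketch with a stand-in thunk type (illustrative only, not the ChainRulesCore implementation):

```julia
# Stand-in thunk type for illustration; ChainRulesCore's is more featureful.
struct DemoThunk{F}
    f::F
end
unthunk(t::DemoThunk) = t.f()

# Addition peels one layer at a time; dispatch fires again on a nested
# thunk, so extern is never needed.
Base.:+(a::Number, t::DemoThunk) = a + unthunk(t)

inner = DemoThunk(() -> 2)       # plays the role of @thunk(2)
outer = DemoThunk(() -> inner)   # plays the role of @thunk(@thunk(2))
1 + outer                        # -> 1 + @thunk(2) -> 1 + 2 -> 3
```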

@oxinabox
Member

oxinabox commented Jan 12, 2020

Your PR undoes #84,
which is something you argued for in the first place.

And it does the opposite of making frules eagerly evaluated...

@oxinabox
Member

oxinabox commented Jan 12, 2020

Ok.
Theory on what you wanted:

  • You wanted to get back 0 from the frule, rather than Zero()
  • By bringing thunking back to @scalar_rule and then externing during multiplication, the recursive externing means extern(@thunk(Zero())) becomes extern(Zero()) becomes false
  • so 1 * @thunk(Zero()) becomes 1 * false becomes 0

This isn't the way to achieve that.
The way to achieve that would be more like:

```julia
Base.:*(x::Number, ::Zero) = zero(x)
Base.:*(::Zero, x::Number) = zero(x)
```

But I don't see that this is desirable in the first place.
Zero() is a nice differential.

I think in a few places it is even mathematically necessary to get the right result,
because it is a strong zero that destroys NaNs that could show up from branches not taken (in ifelse-like scenarios).
(Cross-ref: "The double where trick" in TensorFlow tensorflow/tensorflow#30199 (comment))
Though that probably doesn't occur in forward mode.

It's also useful because it destroys computations hidden behind thunks.
Though I don't know that that can matter when it is returned from a propagator,
since one doesn't normally multiply derivatives by each other.
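For intuition, Julia's false already behaves as a strong zero for floats, and Zero() plays the analogous role for differentials (illustrative sketch, not from this thread):

```julia
0.0 * NaN    # NaN: an ordinary zero propagates the NaN from an untaken branch
false * NaN  # 0.0: Julia's built-in strong zero annihilates it

# Likewise, on master Zero() * @thunk(expensive()) returns Zero()
# without ever running the thunked computation.
```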

If ForwardDiff2 has issues with it, it should be resolved after calling frule.
Though I would be interested in knowing what those problems are.
My guess would be #53 related.

@oxinabox oxinabox left a comment

Given this reverts several past PRs, #84 and #64
(cf. also #56),
I do not think this is the right way to achieve the true goals.

I am marking this "Request Changes"
so we don't merge by mistake.
But I suspect this PR should be replaced with a totally different PR,
once we get to the bottom of the true problem.

@YingboMa
Member Author

When I have x, dx = frule(f, value(dual), Zero(), partials(dual)), I'd like dx to be something like partials(dual), not Zero().

@YingboMa
Member Author

If ForwardDiff2 has issues with it, it should be resolved after calling frule.

Yes, ForwardDiff2 has issues with it, and I don't think giving back Zero() is good: for chunk-mode AD I need to know the length of partials(dual), so Zero() is not helpful.

@YingboMa
Member Author

JuliaDiff/ChainRules.jl@34a1bbe is somewhat related.

@YingboMa
Member Author

It breaks ForwardDiff2 because

```julia
(f, args) = (setindex!, (ForwardDiff2.Dual{ForwardDiff2.Tag{Nothing},Int64,Array{Int64,1}}[#undef #undef; #undef #undef], (1 + Zero()ϵ₁), CartesianIndex(1, 1)))
```

cannot be done. I cannot know the length of partials from the type, so it fails with

```
ERROR: MethodError: Cannot `convert` an object of type Zero to an object of type Array{Int64,1}
```

@YingboMa
Member Author

which is something you argued for in the first place.

The whole reason that I said no thunks is that I want a reasonable result, not something like Zero or Thunk.

@YingboMa
Member Author

By "reasonable result", I mean

```julia
x, dx = frule(one, v, Zero(), ps)
convert(typeof(ps), dx)
```

can be done.

@ChrisRackauckas
Member

I've been having issues with Zero too. Zero of what? A zero matrix? Zero on the manifold defined by ...? With no type information you can't place what space it lives in, so it can't really be used. It relies on the assumption that everything is a simple number, and breaks whenever your space is not easily interpretable from a Float32 or whatnot. One solution would be Zero(x), i.e. give it a space to live in, but why add so much type information when Julia already has zero(x)? zero(x) should also participate in constant propagation, so it's not as if Zero is providing some optimization?

@oxinabox
Member

oxinabox commented Jan 12, 2020

no thunks is that I want a reasonable result, not something like Zero or Thunk.

So-called "reasonable results" are not, in general, guaranteed to be defined.

```julia
x, dx = frule(one, v, Zero(), ps)
convert(typeof(ps), dx)
```

This doesn't have to always work,
since the type (and size) of ps and dx don't have to match.

Example:

```julia
julia> using ChainRulesCore: Composite, NO_FIELDS

julia> dup(x) = (x, x)
dup (generic function with 1 method)

julia> function frule(::typeof(dup), x, dself, dx)
           y = dup(x)
           dy = Composite{typeof(y)}((dx, dx))
           return y, dy
       end
frule (generic function with 1 method)

julia> frule(dup, 5.0, NO_FIELDS, 1.0)
((5.0, 5.0), Composite{Tuple{Float64,Float64}}((1.0, 1.0),))

julia> y, dy = frule(dup, 5.0, NO_FIELDS, 1.0)
((5.0, 5.0), Composite{Tuple{Float64,Float64}}((1.0, 1.0),))
```

I assumed you wanted thunks gone because you didn't want results that had computation inside them.
And I removed them from these cases because they are actually redundant in the first place: for scalars, creating and removing the Thunk is potentially more work than evaluating the thing
(especially for rules with a single output, since we know that output will be used, so there is no point thunking it only to unthunk it).
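A sketch of that redundancy for a single-output scalar rule, assuming ChainRulesCore's @thunk/unthunk (the values x and dx are placeholders):

```julia
using ChainRulesCore

x, dx = 0.5, 1.0

# Deferred: allocate a closure, only for the single consumer to force it at once.
dy_thunked = unthunk(@thunk(cos(x) * dx))

# Eager: same value, with no intermediate Thunk.
dy_eager = cos(x) * dx

dy_thunked == dy_eager   # true
```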


I am still following the chunked forward mode discussion on Slack.
But wanted to post this before it is lost.

@YingboMa
Member Author

I assumed you wanted thunks gone because you didn't want results that had computation inside them.

I want thunks gone because I don't want to see them after calling frule, just as much as I don't want to see Zero coming out of frule.

@oxinabox
Member

I've been having issues with Zero too. zero of what? Zero matrix? Zero on the manifold defined by ...? With no type information you can't place what space it is in, so it can't really be used.

Yes, this could well be the case. This is why we can't be doing extern
(and why it was removed earlier in #64, which this PR reverts).

It's the one case where we thought we didn't need to know the primal type,
because we know the result of the core operation +:
the type stays the same.

It lives in the assumption that everything is a simple number, and breaks whenever your space is not easily interpretable from a Float32 or whatnot.

No? How so?

One solution would be Zero(x), i.e. give it a space to live in, but why add so much type information when Julia already has zero(x)? This should also participate in constant prop so it's not like Zero is giving some optimizations?

When used as a scalar it does give something, in that zero(x) * Thunk(...) does not constant-prop out (while Zero() * Thunk(...) short-circuits).

When used as a differential, it avoids allocation.
Much like FillArrays but more general.

Its not carrying shape information (about the primal space) rests on the assumption that it will be added to something that does have that shape information.
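A minimal sketch of that assumption, using a local stand-in type rather than the ChainRulesCore definitions:

```julia
# A shapeless zero works as long as it is only ever combined with
# something that carries the shape.
struct DemoZero end              # stand-in for ChainRulesCore's Zero
Base.:+(::DemoZero, x) = x
Base.:+(x, ::DemoZero) = x

DemoZero() + [1.0 2.0; 3.0 4.0]  # the array supplies all shape information
```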

@YingboMa
Member Author

I'd say let's merge this PR: the computational saving from Zero() is minimal, but it breaks ForwardDiff2.

@oxinabox
Member

oxinabox commented Jan 12, 2020

I'd say let's merge this PR: the computational saving from Zero() is minimal, but it breaks ForwardDiff2.

To be clear, this PR is not the right way to make this change.
You want Zero gone, but instead of doing anything with Zero you've done things with Thunks.

As I posted above, the better way to make this change is to redefine multiplication with Zero:

```julia
Base.:*(x, ::Zero) = zero(x)
Base.:*(::Zero, x) = zero(x)
```

I suspect the more restrictive:

```julia
Base.:*(x::AbstractArray{T}, ::Zero) where T = FillArrays.Zeros{T}(size(x)...)
Base.:*(::Zero, x::AbstractArray{T}) where T = FillArrays.Zeros{T}(size(x)...)
```

Might be enough.
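For reference, a hedged sketch of what the FillArrays variant would give back, assuming one of the methods above is defined (illustrative, not merged code):

```julia
using ChainRulesCore, FillArrays

# One of the restrictive methods from above, defined here for illustration:
Base.:*(::Zero, x::AbstractArray{T}) where T = FillArrays.Zeros{T}(size(x)...)

dx = Zero() * [1.0, 2.0]      # Zeros{Float64}(2): lazy and allocation-free
convert(Vector{Float64}, dx)  # [0.0, 0.0], so the convert ForwardDiff2 needs succeeds
```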

Which is separate from whether this is a good idea.
But either would be much preferred to the implementation proposed in this PR.

I would be Ok with merging either of those, as a short-term work around for your issues.

It would also be good if you could open an issue describing the chunked way of forward-propagating multiple partials at the same time,
so we can make sure to support it, since it was not something I'd considered.

@YingboMa YingboMa changed the title from "Eagerly evaluate scalar rules" to "Eagerly evaluate ::Zero * ::Any" on Jan 12, 2020
@oxinabox oxinabox left a comment

I don't love this, but if it fixes your problem in the short term and lets us get ForwardDiff2 out the door,
then let's do it.

My main concern is that we are losing the strength of our Zero:
you get that strength once, then after that it degrades to a regular zero.

Longer term we should work out a better way.

A Zero coming back should be really valuable information for ForwardDiff2,
since it means you can stop doing AD and just run the function regularly:
pushforwards are always linear, so the final sensitivity will also be Zero.
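A hedged sketch of how a caller such as ForwardDiff2 might exploit that signal; rest_of_program and run_without_ad are hypothetical names, not existing package code:

```julia
# Hypothetical caller-side short-circuit (names are illustrative):
y, dy = frule(f, x, Zero(), dx)
if dy isa Zero
    # Pushforwards are linear, so a Zero tangent stays Zero through every
    # subsequent frule; the remainder can run without tracking derivatives.
    return run_without_ad(rest_of_program, y)
end
```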
