
Eagerly evaluate ::Zero * ::Any #90

Merged · 7 commits · Jan 12, 2020

Conversation

YingboMa
Member

Master behavior
```julia
julia> @scalar_rule(one(x), Zero())

julia> frule(one, 1, Zero(), [1, 2])
(1, Zero())

julia> frule(one, 1, Zero(), One())
(1, Zero())
```

Desirable behavior
```julia
julia> @scalar_rule(one(x), Zero())

julia> frule(one, 1, Zero(), [1, 2])
(1, [0, 0])

julia> frule(one, 1, Zero(), One())
(1, Thunk(var"#8#10"()))
```
YingboMa added a commit to JuliaDiff/ChainRules.jl that referenced this pull request Jan 11, 2020
@oxinabox
Member

oxinabox commented Jan 12, 2020

We don't want to replace unthunk with extern.
See #64

extern is not a well-defined operation.
It's been pending for a while to remove it, or at least demote it to "provided for testing/debugging purposes only".

You can't convert all differentials to their primal types, especially without knowing what the primal is.

Also, what is your use case?
On subtypes of AbstractThunk, extern is just a recursive unthunk.
And in the context of addition (quoting #64):

even in double-thunking cases:
1 + @thunk(@thunk(2)) becomes 1 + @thunk(2) becomes 1 + 2
so addition is still fine.
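To make that unwrapping concrete, here is a minimal self-contained sketch with a stand-in thunk type (illustrative only, not the ChainRulesCore implementation):

```julia
# Stand-in thunk type for illustration; ChainRulesCore's is more featureful.
struct DemoThunk{F}
    f::F
end
unthunk(t::DemoThunk) = t.f()

# Addition peels one layer at a time; dispatch fires again on a nested
# thunk, so extern is never needed.
Base.:+(a::Number, t::DemoThunk) = a + unthunk(t)

inner = DemoThunk(() -> 2)       # plays the role of @thunk(2)
outer = DemoThunk(() -> inner)   # plays the role of @thunk(@thunk(2))
1 + outer                        # -> 1 + @thunk(2) -> 1 + 2 -> 3
```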

@oxinabox
Member

oxinabox commented Jan 12, 2020

Your PR undoes #84,
which is something you argued for in the first place.

And it does the opposite of making frules eagerly evaluated...

@oxinabox
Member

oxinabox commented Jan 12, 2020

Ok.
Theory on what you wanted:

  • You wanted to get back 0 from the frule, rather than Zero()
  • By bringing thunking back to @scalar_rule and then externing during multiplication, the recursive externing means extern(@thunk(Zero())) becomes extern(Zero()) becomes false
  • so 1 * @thunk(Zero()) becomes 1 * false becomes 0

This isn't the way to achieve that.
The way to achieve that would be more like:

```julia
Base.:*(x::Number, ::Zero) = zero(x)
Base.:*(::Zero, x::Number) = zero(x)
```

But I don't see that this is desirable in the first place.
Zero() is a nice differential.

I think in a few places it is even mathematically necessary to get the right result,
because it is a strong zero that destroys NaNs that could show up from branches not taken (in ifelse-like scenarios).
(Cross-ref: "The double where trick" in TensorFlow tensorflow/tensorflow#30199 (comment))
Though that probably doesn't occur in forward mode.

It's also useful because it destroys computations hidden behind thunks.
Though I don't know that that can matter when it is returned from a propagator,
since one doesn't normally multiply derivatives by each other.
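For intuition, Julia's false already behaves as a strong zero for floats, and Zero() plays the analogous role for differentials (illustrative sketch, not from this thread):

```julia
0.0 * NaN    # NaN: an ordinary zero propagates the NaN from an untaken branch
false * NaN  # 0.0: Julia's built-in strong zero annihilates it

# Likewise, on master Zero() * @thunk(expensive()) returns Zero()
# without ever running the thunked computation.
```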

If ForwardDiff2 has issues with it, it should be resolved after calling frule.
Though I would be interested in knowing what those problems are.
My guess would be #53 related.

@oxinabox oxinabox left a comment

Given this reverts several past PRs, #84 and #64
(cf. also #56),
I do not think this is the right way to achieve the true goals.

I am marking this "Request Changes"
so we don't merge by mistake.
But I suspect this PR should be replaced with a totally different PR,
once we get to the bottom of the true problem.

@YingboMa
Member Author

When I have x, dx = frule(f, value(dual), Zero(), partials(dual)), I'd like dx to be something like partials(dual), not Zero().

@YingboMa
Member Author

If ForwardDiff2 has issues with it, it should be resolved after calling frule.

Yes, ForwardDiff2 has issues with it, and I don't think giving back Zero() is good: for chunk-mode AD I need to know the length of partials(dual), so Zero() is not helpful.

@YingboMa
Member Author

JuliaDiff/ChainRules.jl@34a1bbe is somewhat related.

@YingboMa
Member Author

It breaks ForwardDiff2 because

```julia
(f, args) = (setindex!, (ForwardDiff2.Dual{ForwardDiff2.Tag{Nothing},Int64,Array{Int64,1}}[#undef #undef; #undef #undef], (1 + Zero()ϵ₁), CartesianIndex(1, 1)))
```

cannot be done. I cannot know the length of partials from the type, so it fails with

```
ERROR: MethodError: Cannot `convert` an object of type Zero to an object of type Array{Int64,1}
```

@YingboMa
Member Author

which is something you argued for in the first place.

The whole reason that I said no thunks is that I want a reasonable result, not something like Zero or Thunk.

@YingboMa
Member Author

By "reasonable result", I mean

```julia
x, dx = frule(one, v, Zero(), ps)
convert(typeof(ps), dx)
```

can be done.

@ChrisRackauckas
Member

I've been having issues with Zero too. Zero of what? A zero matrix? Zero on the manifold defined by ...? With no type information you can't place what space it lives in, so it can't really be used. It relies on the assumption that everything is a simple number, and breaks whenever your space is not easily interpretable from a Float32 or whatnot. One solution would be Zero(x), i.e. give it a space to live in, but why add so much type information when Julia already has zero(x)? zero(x) should also participate in constant propagation, so it's not as if Zero is providing some optimization?

@oxinabox
Member

oxinabox commented Jan 12, 2020

no thunks is that I want a reasonable result, not something like Zero or Thunk.

So-called "reasonable results" are not, in general, guaranteed to be defined.

```julia
x, dx = frule(one, v, Zero(), ps)
convert(typeof(ps), dx)
```

This doesn't have to always work,
since the type (and size) of ps and dx don't have to match.

Example:

```julia
julia> using ChainRulesCore: Composite, NO_FIELDS

julia> dup(x) = (x, x)
dup (generic function with 1 method)

julia> function frule(::typeof(dup), x, dself, dx)
           y = dup(x)
           dy = Composite{typeof(y)}((dx, dx))
           return y, dy
       end
frule (generic function with 1 method)

julia> frule(dup, 5.0, NO_FIELDS, 1.0)
((5.0, 5.0), Composite{Tuple{Float64,Float64}}((1.0, 1.0),))

julia> y, dy = frule(dup, 5.0, NO_FIELDS, 1.0)
((5.0, 5.0), Composite{Tuple{Float64,Float64}}((1.0, 1.0),))
```

I assumed you wanted thunks gone because you didn't want results that had computation inside them.
And I removed them from these cases because they are actually redundant in the first place: for scalars, creating and removing the Thunk is potentially more work than evaluating the thing
(especially for rules with a single output, since we know that output will be used, so there is no point thunking it only to unthunk it).
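A sketch of that redundancy for a single-output scalar rule, assuming ChainRulesCore's @thunk/unthunk (the values x and dx are placeholders):

```julia
using ChainRulesCore

x, dx = 0.5, 1.0

# Deferred: allocate a closure, only for the single consumer to force it at once.
dy_thunked = unthunk(@thunk(cos(x) * dx))

# Eager: same value, with no intermediate Thunk.
dy_eager = cos(x) * dx

dy_thunked == dy_eager   # true
```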


I am still following the chunked forward mode discussion on Slack.
But wanted to post this before it is lost.

@YingboMa
Member Author

I assumed you wanted thunks gone because you didn't want results that had computation inside them.

I want thunks gone because I don't want to see them after calling frule, just as much as I don't want to see Zero coming out of frule.

@oxinabox
Member

I've been having issues with Zero too. zero of what? Zero matrix? Zero on the manifold defined by ...? With no type information you can't place what space it is in, so it can't really be used.

Yes, this could well be the case. This is why we can't be doing extern
(and why it was removed earlier in #64, which this PR reverts).

It's the one case where we thought we didn't need to know the primal type,
because we know the result of the core operation +:
the type stays the same.

It lives in the assumption that everything is a simple number, and breaks whenever your space is not easily interpretable from a Float32 or whatnot.

No? How so?

One solution would be Zero(x), i.e. give it a space to live in, but why add so much type information when Julia already has zero(x)? This should also participate in constant prop so it's not like Zero is giving some optimizations?

When used as a scalar it does give something, in that zero(x) * Thunk(...) does not constant-prop out (while Zero() * Thunk(...) short-circuits).

When used as a differential, it avoids allocation.
Much like FillArrays but more general.

Its not carrying shape information (about the primal space) rests on the assumption that it will be added to something that does have that shape information.
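A minimal sketch of that assumption, using a local stand-in type rather than the ChainRulesCore definitions:

```julia
# A shapeless zero works as long as it is only ever combined with
# something that carries the shape.
struct DemoZero end              # stand-in for ChainRulesCore's Zero
Base.:+(::DemoZero, x) = x
Base.:+(x, ::DemoZero) = x

DemoZero() + [1.0 2.0; 3.0 4.0]  # the array supplies all shape information
```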

@YingboMa
Member Author

I'd say let's merge this PR: the computational saving from Zero() is minimal, but it breaks ForwardDiff2.

@oxinabox
Member

oxinabox commented Jan 12, 2020

I'd say let's merge this PR: the computational saving from Zero() is minimal, but it breaks ForwardDiff2.

To be clear, this PR is not the right way to make this change.
You want Zero gone, but instead of doing anything with Zero you've done things with Thunks.

As I posted above, the better way to make this change is to redefine multiplication with Zero:

```julia
Base.:*(x, ::Zero) = zero(x)
Base.:*(::Zero, x) = zero(x)
```

I suspect the more restrictive:

```julia
Base.:*(x::AbstractArray{T}, ::Zero) where T = FillArrays.Zeros{T}(size(x)...)
Base.:*(::Zero, x::AbstractArray{T}) where T = FillArrays.Zeros{T}(size(x)...)
```

Might be enough.
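For reference, a hedged sketch of what the FillArrays variant would give back, assuming one of the methods above is defined (illustrative, not merged code):

```julia
using ChainRulesCore, FillArrays

# One of the restrictive methods from above, defined here for illustration:
Base.:*(::Zero, x::AbstractArray{T}) where T = FillArrays.Zeros{T}(size(x)...)

dx = Zero() * [1.0, 2.0]      # Zeros{Float64}(2): lazy and allocation-free
convert(Vector{Float64}, dx)  # [0.0, 0.0], so the convert ForwardDiff2 needs succeeds
```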

Which is separate from whether this is a good idea.
But either would be much preferred to the implementation proposed in this PR.

I would be Ok with merging either of those, as a short-term work around for your issues.

It would also be good if you could open an issue describing the chunked way of forward-propagating multiple partials at the same time,
so we can make sure to support it, since it was not something I'd considered.

@YingboMa YingboMa changed the title from "Eagerly evaluate scalar rules" to "Eagerly evaluate ::Zero * ::Any" on Jan 12, 2020
@oxinabox oxinabox left a comment

I don't love this, but if it fixes your problem in the short term and lets us get ForwardDiff2 out the door,
then let's do it.

My main concern is that we are losing the strength of our Zero:
you get that strength once, then after that it degrades to a regular zero.

Longer term we should work out a better way.

A Zero coming back should be really valuable information for ForwardDiff2,
since it means you can stop doing AD and just run the function regularly:
pushforwards are always linear, so the final sensitivity will also be Zero.
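A hedged sketch of how a caller such as ForwardDiff2 might exploit that signal; rest_of_program and run_without_ad are hypothetical names, not existing package code:

```julia
# Hypothetical caller-side short-circuit (names are illustrative):
y, dy = frule(f, x, Zero(), dx)
if dy isa Zero
    # Pushforwards are linear, so a Zero tangent stays Zero through every
    # subsequent frule; the remainder can run without tracking derivatives.
    return run_without_ad(rest_of_program, y)
end
```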
