-
Notifications
You must be signed in to change notification settings - Fork 62
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Eagerly evaluate ::Zero * ::Any #90
Conversation
Master behavior ```julia julia> @scalar_rule(one(x), Zero()) julia> frule(one, 1, Zero(), [1, 2]) (1, Zero()) julia> frule(one, 1, Zero(), One()) (1, Zero()) ``` Desirable behavior ```julia julia> @scalar_rule(one(x), Zero()) julia> frule(one, 1, Zero(), [1, 2]) (1, [0, 0]) julia> frule(one, 1, Zero(), One()) (1, Thunk(var"#8#10"()) ) ```
We don't want to replace
You can't convert all differentials to their primal types, especially without knowing what the primal is. Also what are your use-case?
|
Your PR undoes #84, And does the opposite of making |
Ok.
This isn't the way to achieve that.
But I don't see that this is desirable in the first place. I think in a few places it is even mathematically nescisary to get the right result. Its also useful because it destroys computations hidden behind thunks. If ForwardDiff2 has issues with it, it should be resolved after calling |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given this reverts several past PRs, #84 and #64
(cf also #56)
I do not think this is the right way to achieve true goals.
I am marking this with "Request Changes."
So we don't merge by mistake.
But I suspect this PR should be replaced with a totally different PR,
once we get to the bottom of the true problem.
When I have |
Yes, ForwardDiff2 has issues with it, and I don't think to give |
JuliaDiff/ChainRules.jl@34a1bbe |
It breaks ForwardDiff2 because (f, args) = (setindex!, (ForwardDiff2.Dual{ForwardDiff2.Tag{Nothing},Int64,Array{Int64,1}}[#undef #undef; #undef #undef], (1 + Zero()ϵ₁), CartesianIndex(1, 1))) cannot be done. I cannot know the length of ERROR: MethodError: Cannot `convert` an object of type Zero to an object of type Array{Int64,1} |
The whole reason that I said no thunks is that I want a reasonable result, not something like |
By "reasonable result", I mean x, dx = frule(one, v, Zero(), ps)
convert(typeof(ps), dx) can be done. |
I've been having issues with Zero too. zero of what? Zero matrix? Zero on the manifold defined by ...? With no type information you can't place what space it is in, so it can't really be used. It lives in the assumption that everything is a simple number, and breaks whenever your space is not easily interpretable from a Float32 or whatnot. One solution would be Zero(x), i.e. give it a space to live in, but why add so much type information when Julia already has zero(x)? This should also participate in constant prop so it's not like Zero is giving some optimizations? |
So called "reasonable results" are not, in general, garentee'd to be defined.
This doesn't have to always work, since the type (and size) of Example:
I assumed you wanted thunks gone because you didn't want results that had computation inside them. I am still following the chunked forward mode discussion on Slack. |
I want thunks gone because I don't want to see them after calling |
Yes, this could well be the case. This is why we can't be doing It the one case we thought we didn't need to know the primal type.
No? How so?
When used as a scalar it does, in that When uses as a differential, it avoids allocation. It not carrying shape information (about the primal space) around with it, is on the assumption that it will be added to something that does have that shape information. |
I'd say let's merge this PR, the computational save of |
To be clear this PR is not the right way to make this change. As I posted above
I suspect the more restrictive:
Might be enough. Which is seperate from if this is a good idea. I would be Ok with merging either of those, as a short-term work around for your issues. It would also be good if you could open an issue describing the chunked way of forward propagating multiple partials at the same time, |
This reverts commit dbe7765.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't love this, but if it fixes your problem in the short-term and lets us get ForwardDiff2 out the door,
then lets do it.
Main concern is we are losing the strength of our Zero
You get that strength once, then after that it degrades to a regular zero.
Longer term we should work out a better way.
A Zero
coming back should be really valuable information for ForwardDiff2.
Since it means you can stop doing AD, and just run the function regularly.
Since pushforwards are always linear, thus the final sensistivity will also be Zero
.
Master behavior
Desirable behavior