Perturbation Confusion (Nested Differentiation Bug) #83
I wonder what a tagging system would look like. An extra integer field for every forward-mode dual number?
Possibly. I'm curious about the performance burden a tagging system might impose on non-nested differentiation. Correctness is the first priority, of course, but it would be worth figuring out whether we could resolve this issue without incurring a performance penalty on already-correct cases. Maybe there would be a way to lift some of the work into the compilation phase? I'm not really sure yet.
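To make the tradeoff concrete, here is a minimal sketch of the "extra integer field" idea. This is hypothetical illustration code, not ForwardDiff's implementation: every dual number carries the integer tag of the `derivative` call that created it, and arithmetic refuses to merge perturbations with different tags. The per-operation tag branching is exactly the kind of runtime overhead being worried about above.

```julia
# Toy tagged dual number (all names invented for illustration).
struct TDual <: Real
    val          # primal value (may itself be a TDual when nested)
    eps          # perturbation component
    tag::Int     # which derivative call owns this perturbation
end

function Base.:+(a::TDual, b::TDual)
    if a.tag == b.tag
        TDual(a.val + b.val, a.eps + b.eps, a.tag)
    elseif a.tag > b.tag   # a belongs to the more recent (inner) call
        TDual(a.val + b, a.eps, a.tag)
    else
        TDual(a + b.val, b.eps, b.tag)
    end
end
Base.:+(a::TDual, b::Real) = TDual(a.val + b, a.eps, a.tag)
Base.:+(a::Real, b::TDual) = b + a

function Base.:*(a::TDual, b::TDual)
    if a.tag == b.tag
        TDual(a.val * b.val, a.val * b.eps + a.eps * b.val, a.tag)
    elseif a.tag > b.tag
        TDual(a.val * b, a.eps * b, a.tag)
    else
        TDual(a * b.val, a * b.eps, b.tag)
    end
end
Base.:*(a::TDual, b::Real) = TDual(a.val * b, a.eps * b, a.tag)
Base.:*(a::Real, b::TDual) = b * a

# extract the perturbation belonging to one particular derivative call
partial(y, tag) = (y isa TDual && y.tag == tag) ? y.eps : 0.0

const NEXT_TAG = Ref(0)  # runtime tag generation

function derivative(f, x::Real)
    tag = NEXT_TAG[] += 1
    result = partial(f(TDual(x, 1.0, tag)), tag)
    NEXT_TAG[] -= 1
    return result
end

# nested differentiation now yields the correct answers:
derivative(x -> x * derivative(y -> x + y, 1.0), 1.0)  # == 1.0
derivative(x -> x * derivative(y -> x * y, 1.0), 1.0)  # == 2.0
```

The branching here happens on every elementary operation, which hints at why a runtime scheme could penalize the already-correct non-nested case.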
So, after some discussion with @mlubin, I think we might've figured out a semi-reasonable strategy for solving this issue.

TL;DR: The v0.5 closure rewrite (JuliaLang/julia#13412) might allow us to reach in and strip out any problematic closed-over perturbations before the closure gets called.

Essentially, the problem is that perturbations attached to arguments of the target function can leak into inner functions that close over those arguments. When a differentiation operation on the closure introduces a separate set of perturbations, the new perturbations are vulnerable to erroneous interaction with the closed-over ones. Most forward-mode implementations solve this problem by assigning matching tags to differentiation operations and their associated perturbations, so that the system can prevent interactions between perturbations that belong to separate differentiation operations. Then, when gathering results, each differentiation operation extracts only the perturbations that share its tag.

I've been playing around with implementing a tagging system, but every idea I've come up with would either cause big performance regressions (mostly due to type certainty/stability issues or excessive code generation), or force us to sacrifice some of ForwardDiff's unique features (like the ability to use tunable, stack-allocated perturbation containers).

However, we may not need a tagging system at all if we can simply remove the closed-over perturbations before the closure ever gets called by the differentiation operation. JuliaLang/julia#13412 could make intercepting and stripping external perturbations possible, since the closed-over data is explicitly stored as a field in the function type. This would allow us to recursively remove perturbations from the child elements of the closed-over data. With this strategy, for example, the following should work:

```julia
function f(x::Number)
    foo = [Foo(x), Foo(x + 1)]
    g(y) = dot(foo, y)
    # ForwardDiff.gradient would create and then operate on a
    # new instance of g where foo's elements are stripped of
    # problematic perturbations
    v = ForwardDiff.gradient(g, [2.0, 3.0])
    return sum(v)
end

ForwardDiff.derivative(f, 1)
```

We'd have to be careful not to perform unnecessary copies, but there's no reason this couldn't be accomplished in theory. The main restriction we'd have to expose to the user is that closures could not rely on mutating closed-over data. For example, this code wouldn't work properly:

```julia
function f(x::Number)
    foo = [x]
    g(y) = (foo[1] += y; return foo[1]) # the behavior of f relies on g mutating foo
    r = zero(x)
    for i in 1:10
        r += ForwardDiff.derivative(g, i)
    end
    return r
end

ForwardDiff.derivative(f, 1)
```

If anybody has any thoughts on this kind of strategy (and its potential limitations), I'd be interested to hear them.
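The stripping step described above can be sketched roughly as follows. This is a hypothetical illustration, not ForwardDiff code: `Dual`, `strip_perturbations`, and `make_g` are names invented here, and the closure-reconstruction expression is only hinted at in a comment.

```julia
# Toy dual number standing in for a perturbed value.
struct Dual <: Real
    val::Float64
    eps::Float64
end

strip_perturbations(x::Dual) = x.val
strip_perturbations(x::AbstractArray) = map(strip_perturbations, x)
strip_perturbations(x) = x  # anything else passes through untouched

# In v0.5+, a closure's captured data really is stored in fields of its
# type, so one could imagine rebuilding the closure field-by-field, e.g.:
#   typeof(g)((strip_perturbations(getfield(g, i)) for i in 1:nfields(g))...)
make_g(foo) = y -> foo[1] * y
g = make_g([Dual(2.0, 1.0)])
fieldnames(typeof(g))       # (:foo,) -- the captured array is a field
strip_perturbations(g.foo)  # [2.0]   -- perturbation removed
```

The recursive `strip_perturbations` is what would walk the child elements of the closed-over data; the "no mutation of closed-over data" restriction follows because the closure would operate on a stripped copy.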
That would be quite bad for me, at least for how I'm doing things now. In my case, I solve a set of nonlinear equations where the gradient at a point depends on results computed in the residual. In other words, the gradient and residual equations need to "talk" to each other. The way I do this is by making a closure over a parameter. Maybe there is some other way for me to solve it, but it feels like a quite significant limitation.
Technically, the restriction is more specific than my earlier statement: the proposed mutation restriction would only apply to closed-over objects that sneak external perturbations into a closure that is being differentiated. In other words, the proposed restriction is actually less restrictive than the one already imposed by this bug. If the code you're talking about currently relies on these objects entering the closure scope at all, then it's quite likely that it already suffers from perturbation confusion. Can you link to the code you're talking about?
Your last post clarified things. There are no perturbations in the data being shared; the shared values can be seen pretty much as constants while the gradient is being evaluated. So it should be fine then.
I was wondering: would you be able to use type parameterization of the dual numbers to implement a lightweight tagging system? Specifically, a third type parameter could be added to dual numbers that takes any tag. The forward differentiation operator would then map to zero any dual numbers whose tag type parameter does not match. By default, the tag parameter could be set to zero, so that differentiation treats all variables inside a function as the same variable (which it currently does). The tag parameter could then be used to specify that two dual numbers with different tags represent different variables to differentiate by.
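A minimal sketch of this type-parameter idea might look like the following. All names are invented here (this is not ForwardDiff's implementation, and it uses current `where`-style syntax rather than the old `{}` method syntax elsewhere in the thread); the tag lives in the type, so the "foreign perturbations act as constants" rule is resolved by dispatch at compile time, at the cost of recompiling the target function for every fresh tag type.

```julia
# Toy dual number whose tag is a type parameter.
struct TagDual{T} <: Real
    val
    eps
end

Base.:+(a::TagDual{T}, b::TagDual{T}) where {T} =
    TagDual{T}(a.val + b.val, a.eps + b.eps)
Base.:+(a::TagDual{Ta}, b::TagDual{Tb}) where {Ta,Tb} =  # tag mismatch:
    Ta > Tb ? TagDual{Ta}(a.val + b, a.eps) :            # higher (inner)
              TagDual{Tb}(a + b.val, b.eps)              # tag wins
Base.:+(a::TagDual{T}, b::Real) where {T} = TagDual{T}(a.val + b, a.eps)
Base.:+(a::Real, b::TagDual) = b + a

Base.:*(a::TagDual{T}, b::TagDual{T}) where {T} =
    TagDual{T}(a.val * b.val, a.val * b.eps + a.eps * b.val)
Base.:*(a::TagDual{Ta}, b::TagDual{Tb}) where {Ta,Tb} =
    Ta > Tb ? TagDual{Ta}(a.val * b, a.eps * b) :
              TagDual{Tb}(a * b.val, a * b.eps)
Base.:*(a::TagDual{T}, b::Real) where {T} = TagDual{T}(a.val * b, a.eps * b)
Base.:*(a::Real, b::TagDual) = b * a

# the caller supplies a tag; results with a foreign tag are constants here
function derivative(f, x::Real, ::Val{T}) where {T}
    y = f(TagDual{T}(x, 1.0))
    return y isa TagDual{T} ? y.eps : 0.0
end

# correct nested result, provided the caller picks distinct nested tags:
derivative(x -> x * derivative(y -> x + y, 1.0, Val(2)), 1.0, Val(1))  # == 1.0
```

Note the mismatched-tag methods still need a preference rule (here, arbitrarily, the higher tag wins), which is exactly the tricky design question discussed in the surrounding comments.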
@aaronsheldon If you're talking about removing the erroneous perturbations before executing the target function, you don't need a tagging system; see my comment here. If you want to remove the erroneous perturbations during execution of the target function, there are some tough decisions you'd have to make. I've played around with a couple of different tagging implementations, but none of them were fully satisfactory. I would love it if somebody came up with a good one.

For example, you have to have some mechanism for choosing a tag within elementary function definitions. Take this simple function:

```julia
Base.:+{tag1,tag2}(a::Dual{tag1}, b::Dual{tag2}) = # ?
```

These tags could be any valid type parameter - function types, integers, symbols, etc. But which tag do you give preference to in this operation? In other words, which of these tags should the resulting dual number carry?

My solution for this was a tagging system where each tag is an integer corresponding to the "closure level" of the dual number. Dual numbers in non-closure scope have tag `0`. The big problem with this approach is that we'd have to special-case basically every non-unary function on dual numbers.
Okay, I've settled on what I think is a feasible "solution" to this issue, and one that is way less hacky than closure interception (which seems like it would be very tricky/fragile, if possible at all). My proposal is for ForwardDiff to error out explicitly rather than fail silently. To accomplish this, we can add an extra tag to the dual numbers.

The key question is how to generate the tags. We want a unique compile-time tag for every differentiation operation, but we don't want to recompile for every call. We can't just use the target function's type as the tag, since we'd like recursively differentiated functions to work. Thus, I believe the best option is to generate a unique tag at parse time. In other words, we'll have to change the current API functions to macros (which is something I've played around with in the past anyway).
While it may be necessary to fix this issue, I typically find that macros feel a lot more obscure to use than normal functions. Could there also be an option to keep the functional form and have the tag be an optional argument or something (leaving it up to the user to ensure uniqueness)?
Definitely. That was basically my plan for the functions underlying the "safe API". For example, I was thinking something like:

```julia
immutable gradient{tag} end

macro gradient(args...)
    return :(gradient{$(Expr(:quote, gensym()))}($(args...)))
end

function (::Type{gradient{tag}}){tag}(args...)
    # basically has the same definition that `gradient` has now, but takes the tag into account
end
```

The tricky part will be ensuring the tags get handled consistently everywhere.

EDIT: Argh, this doesn't seem to work:

```julia
julia> immutable Bar{tag} end

julia> immutable Foo{tag}
           bars::Vector{Bar{tag}}
       end

julia> f = Foo([Bar{1}(), Bar{1}()])
Foo{1}(Bar{1}[Bar{1}(),Bar{1}()])

julia> reinterpret(Foo{2}, f)
ERROR: reinterpret: target type not a leaf bitstype
 in reinterpret(::Type{Foo{2}}, ::Foo{1}) at ./essentials.jl:86
```

Guess this will be more work than I thought...

EDIT2: Note that this will all still work out; I'll just have to define explicit tag conversion/reinterpretation methods myself.
Hmm... I just realized one could pretty trivially break the parse-time-tag approach. This solution is definitely incomplete, though - users generally won't have enough contextual information to properly generate unique tags. However, at this point, I'm not sure it's possible to correctly generate disjoint compile-time tags.
Should the readme be updated to reflect the fact that this bug is closed? |
Nested differentiation should not be trusted until this issue is resolved. Not all nested calculations will be affected, but I think it's better to play it safe. Non-nested differentiation should be unaffected.
EDIT:
This bug is restricted to cases where the inner function closes over the argument at which an outer function is being differentiated.
Here are some examples:
I discovered today that ForwardDiff suffers from perturbation confusion.
I noticed this when I was looking at the test suite earlier today and realized we had a test missing. The missing test should've been verifying that the following bug didn't occur:
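The exact snippet from the test suite isn't preserved in this thread. As a stand-in, here is a self-contained toy illustration of this class of bug (all names hypothetical; a minimal dual type, not ForwardDiff's API): with a single untagged perturbation slot, the inner derivative cannot tell its own perturbation apart from the closed-over one.

```julia
# Toy dual number with one untagged perturbation slot.
struct D <: Real
    val::Float64
    eps::Float64
end

Base.:+(a::D, b::D) = D(a.val + b.val, a.eps + b.eps)
Base.:+(a::D, b::Real) = D(a.val + b, a.eps)
Base.:+(a::Real, b::D) = b + a
Base.:*(a::D, b::D) = D(a.val * b.val, a.val * b.eps + a.eps * b.val)
Base.:*(a::D, b::Real) = D(a.val * b, a.eps * b)
Base.:*(a::Real, b::D) = b * a

# naive derivative operator: the seeded perturbation carries no identity
derivative(f, x::Real) = f(D(x, 1.0)).eps

derivative(x -> x * x, 3.0)  # 6.0, correct for non-nested use
# d/dx [x * d/dy(x + y)] = d/dx [x * 1] = 1, but the inner derivative
# absorbs x's perturbation as its own, so the nested result comes out 2:
derivative(x -> x * derivative(y -> x + y, 1.0), 1.0)  # 2.0, wrong!
```

Non-nested calls are fine; only the nested case, where the inner function closes over `x`, goes wrong, matching the restriction described above.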
I thought I had committed this test a while ago (at the same time that I wrote the other nested differentiation tests), but apparently I screwed up somewhere, because I can't find it in the commit history.
I won't have time to really dig into this for a week or two, and a fix might require significant changes to the package (likely implementing a tagging system). I won't be sure of the best path forward until I do some thorough testing/code review. I'll go ahead and update the docs/README to make it clear that this issue exists.
@mlubin I bet this is also an issue for a lot of nested differentiation that people do using DualNumbers.jl, unless they were wise enough to code around it. I don't think DualNumbers.jl has a responsibility to provide a tagging system or anything, but it might be worth mentioning in that package's README.