Functional AD #86
Does this mean that a user will need to make … where …?
Er, no? For the most part I'm not expecting anything else to look different, so the MNIST example would stay exactly the same. It's really no different to the current setup.

Your API is something we discussed, as it's closer to what Knet currently has. At a minimum it only scales up well if you allow the structure to define the forward pass (e.g. via call overloading). Even then it imposes a bigger burden on user-defined types and small models, and it's harder to figure out how it plays when you get to really complex models (as one example, higher-order models that take another model as input).
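(For concreteness, a minimal sketch of what "the structure defines the forward pass via call overloading" means in Julia; the `Affine` type and its fields are purely illustrative, not an actual Flux layer.)

```julia
# The layer type owns its parameters and is itself callable, so the struct
# defines the forward pass. `Affine` is an illustrative name.
struct Affine
    W
    b
end

Affine(in::Integer, out::Integer) = Affine(randn(out, in), zeros(out))

# Calling an Affine value runs the forward pass.
(a::Affine)(x) = a.W * x .+ a.b

m = Affine(10, 5)
y = m(rand(10))   # 5-element output
```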
Ah, I re-read the MNIST example. Do I understand correctly that …?
Essentially yes, it's not actually a closure in this case because it's global, but it could be. In the docs there are some examples of closing over parameters, and I expect those to work with Cassette as well.
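(As an aside, a tiny sketch of what "closing over parameters" means in this context; the names are made up for illustration.)

```julia
# The parameters live in an enclosing scope and the loss simply refers to
# them, instead of taking them as explicit arguments.
function make_loss()
    W = randn(5, 10)
    b = zeros(5)
    predict(x) = W * x .+ b
    loss(x, y) = sum(abs2, predict(x) .- y)
    return loss
end

loss = make_loss()
loss(rand(10), rand(5))   # W and b are implicit, captured by the closure
```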
How would the user go about taking the gradient of a model output with respect to a non-parameter, like the input? This is common in creating adversarial examples, linearizing dynamical models, etc.
I guess that would have to be …
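(As a rough sketch of the idea, using the Zygote-based `gradient` that #628 eventually delivered; the model and data here are arbitrary examples.)

```julia
using Flux   # Flux re-exports Zygote's `gradient`

m = Dense(10, 5)
x, y = rand(Float32, 10), rand(Float32, 5)

# Differentiate with respect to the input x rather than the weights: the
# model's parameters are just constants captured by the closure.
dx, = gradient(x -> sum(abs2, m(x) .- y), x)
```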
A year on we can do some much cooler things here. Closing in favour of #628. |
The eventual plan is to build a new compiler-level AD that better exploits Julia's compilation, provides a more functional interface, and supports nested differentiation. A question here is how to support the `grad(f, x)`-style interface while also still allowing abstraction and modularity in layers and their weights. I see this looking something like:

- `W` and `b` are treated as implicit arguments to the function; this is nice in that it's essentially the ideal functional interface, but without the mess of hundreds of explicit arguments.
- Models will implement `params`, as they do now, and whatever arrays they return will be treated as trainable parameters (`dparams = grad(model, params(model), args...)`).
- We'll also have a `Freeze` layer to treat things as constant, e.g. `m = Freeze(Dense(10, 5)); params(m) == []`. Freezing parameters is a little more coarse-grained compared to now, but that's a small loss compared to the gains.
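For context, a rough sketch of how the implicit-parameter idea ended up looking in the Zygote-based interface that grew out of #628 (the implicit-`Params` style Flux adopted afterwards, not the hypothetical `grad`/`Freeze` API sketched above):

```julia
using Flux

m = Dense(10, 5)
x, y = rand(Float32, 10), rand(Float32, 5)

# Collect the model's trainable arrays; these act as the implicit arguments.
ps = Flux.params(m)

# The loss closes over the model; gradients are taken with respect to
# whatever arrays are in `ps`, without threading them through explicitly.
gs = gradient(() -> sum(abs2, m(x) .- y), ps)

for p in ps
    println(size(gs[p]))   # look up each gradient by the parameter array itself
end
```

In this style, leaving an array out of `ps` plays roughly the role the proposed `Freeze` layer would have.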