Multiple arguments / activities? #311
Comments
That would be an improvement but we need to make sure that the decomposition of the arguments doesn't result in a materialized intermediate variable |
Can you give an example and explain why that's a bad thing? |
On mobile so link chasing is hard, but the LogDensityProblemsAD repo has a thread where I described why closures were bad. The intermediate variable, whether tuple, struct, or closure, will have the same impact if not expanded.
|
Yeah I added that link above, I guess I'm just wondering what you mean by intermediate variable in this setting. I don't see why grouping multiple arguments into one or unpacking them would require a closure |
Not sure this would work for my use case. Typically, I have a function To be clear: I don't care if DifferentiationInterface covers my use case or not. I'm fine with using Enzyme directly. I was just exploring this direction and thought I'd report back my experience 🙂 . |
Thanks for your feedback! I imagine the following people may also have opinions:
|
As for your specific use case, we might have to make a distinction between "active / inactive" and "shadowed (mutated) / not shadowed"? |
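For reference, here is a minimal sketch of what those per-argument distinctions look like when calling Enzyme.jl directly (illustrative only; the function and values are made up):

```julia
using Enzyme

g(x, c) = c * sum(abs2, x)   # x is active, c is an inactive constant

x  = [1.0, 2.0]
dx = zeros(2)                # shadow buffer that will accumulate the gradient of x

# Reverse mode with an active scalar return: x is Duplicated (active, shadowed),
# c is Const (inactive). After the call, dx holds the gradient with respect to x.
autodiff(Reverse, g, Active, Duplicated(x, dx), Const(3.0))
```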
No strong views on my end -- activity analysis isn't really a thing in Tapir.jl at present, so I'm happy to take the lead from others. |
Also note that the reason I'm mentioning ComponentArrays is because the original plan was never for DI to support an arbitrary number of arguments: that's what we agreed AbstractDifferentiation would try to achieve based on DI (ping @mohamed82008). Hence I feel like disguising multiple arguments into one (x) or two (x and c for constant) is the best we can hope for in the short term. |
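As an illustration of that disguising, here is a hedged sketch using ComponentArrays.jl to pack two logical arguments into a single differentiable input (names are made up, not an agreed-upon API):

```julia
using ComponentArrays

# Two logical arguments, a and b, packed into one vector-like input.
x = ComponentVector(a = [1.0, 2.0], b = [3.0, 4.0, 5.0])

# A function of a single argument, as DI currently expects; it unpacks the fields.
loss(x) = sum(abs2, x.a) + sum(x.b)

# gradient(loss, backend, x) would then differentiate with respect to a and b jointly.
```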
@Vaibhavdixit02 @ChrisRackauckas what kind of activity granularity would we need for Optimization.jl? |
I think Optimization.jl is the one that should be reasonable to hit with the current interfaces. We can start there, and SciMLSensitivity.jl would need the full generality 😅 |
Forcing multiple arguments into a single struct, or even into two structs (one active, one constant), will still result in the performance issues I outlined in the log density link. |
Which is why I'm suggesting to force all the active arguments together into x, and all the inactive arguments together into c |
Yes, but I'm saying that will still have performance penalties if, say, you were condensing two active args into one. |
Copying my write-up from before here; most of the arguments apply equally to non-differentiated code. Thus the "wrap multiple args together, forcing indirection" approach, be it through a closure, a struct, or whatever else, will cause performance difficulties for both the undifferentiated and the differentiated code, and may possibly prevent differentiation. Essentially, replace the use of the word "closures" below with struct or whatever else (though Box will only come up in a closure, IIRC). Closures can present several issues for Enzyme's ability to AD something, for the performance of differentiated programs, and even for the performance of programs not being differentiated. I'll write up a few concerns below, in no particular order:
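A hedged sketch of the kind of loop comparison being described (function names are made up; this is not the original snippet): the first loop sums a contiguous Vector{Float64} and is the faster of the two, while the second sums the same values stored behind one Ref per element, much like captured variables that end up behind pointers.

```julia
# Contiguous data: one float load per iteration, vectorizable, cache-friendly.
function sum_flat(xs::Vector{Float64})
    s = 0.0
    for x in xs
        s += x
    end
    return s
end

# Indirect data: each iteration loads a Ref from the array, then chases the
# pointer to load the float it wraps.
function sum_boxed(xs::Vector{Base.RefValue{Float64}})
    s = 0.0
    for r in xs
        s += r[]
    end
    return s
end
```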
There are many reasons for this. First of all, there's one less load instruction per iteration in the first loop (it only loads the float from the array, while the second loads a Ref from the array and then the float behind the Ref's pointer). Secondly, the data is laid out consecutively, which allows vector instructions to apply to the first loop, whereas that is impossible in the second (vector loads/stores require consecutive data). Thirdly, you get much better cache performance: in the second case you can get a cache miss per iteration, whereas in the first case you miss only once per cache line (since a single cache miss loads a consecutive cache line's worth of elements all at once). Beyond the extra indirection, closures can obscure aliasing information. Aliasing analysis is the ability to say that two pointers point to different memory, which is critical for optimization. Consider the following.
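A hedged stand-in for the first example (made-up names, not the original snippet): x is passed by value, so the compiler can merge its two uses even though an opaque call sits between them.

```julia
# `inner` is an opaque call the compiler cannot see through: it may write memory.
@noinline inner(r::Base.RefValue{Float64}) = (r[] += 1.0; nothing)

function use_value(x::Float64, r::Base.RefValue{Float64})
    a = 2.0 * x
    inner(r)         # opaque call between the two uses of x
    b = 2.0 * x      # x is a plain value, so this reuses the result computed above
    return a + b
end
```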
Here the two uses of x get optimized into one, fantastic. However, if we now have it go through a pointer (via a Ref):
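And a hedged stand-in for the Ref version (again made-up names): x now lives behind a pointer, so the compiler has to assume the call to inner might write to the memory x refers to.

```julia
function use_ref(x::Base.RefValue{Float64}, r::Base.RefValue{Float64})
    a = 2.0 * x[]    # first explicit load of x[]
    inner(r)         # might alias x, so the load above cannot be reused
    b = 2.0 * x[]    # second explicit load of x[]
    return a + b
end
```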
You'll note that now the two loads of x[] are there explicitly and not combined into one. This is because the call to inner could possibly write to the same memory as x, making it possible that it changed the value of x inside the function (even though here we know it doesn't). Getting rid of the inner function call does indeed let a single load replace both uses.
Introducing a closure means there is significantly worse aliasing information, since each of the captured variables is behind a pointer, and any aliasing information (e.g. that variables x and y don't overlap) is now lost. This makes the regular code slow, and can also make the generated AD code much slower. As the various Enzyme papers have shown, running optimization before AD has a compound impact on the generated code. In this case, worse aliasing information may mean that Enzyme has to cache a lot more data, dramatically slowing things down and possibly preventing previously differentiable code from being run at all (if available memory is insufficient).
|
Thanks for the clarification, I didn't realize that some of the issues with closures also apply to other forms of indirection. Do you have suggestions for a middle ground that would improve performance with Enzyme without being as permissive as Enzyme itself? I really am open to hearing your ideas because mine are running out. |
At the moment Enzyme has lots of momentum, so it makes sense for me to go a little out of my way to improve support for you. But three years ago it was Zygote everywhere, and maybe three years from now it will be Tapir everywhere, who knows. Either project may run out of funding or volunteer manpower. In case changing backends becomes necessary, people will be happy to have an interface that is designed to be a good compromise, instead of fully customized to Enzyme's very specific needs. That's why I'm kinda hesitant to go in all the way on activities and shadows and stuff. |
It's worth reiterating that this goes both ways. For me, part of the motivation for DI and the DifferentiationInterfaceTest.jl tests and benchmarks has always been to make the transition from Zygote to Enzyme on my own projects as simple as possible. And in my eyes, swapping
To add more background to this comment: |
It's obviously completely up to you how you want your package to progress. Fundamentally it's going to be a question of trade-offs. Either you choose to have less/simpler code in DifferentiationInterface, and consequently worse support and/or performance for various AD backends. Obviously this makes it easier to get things up and running, and perhaps provides an easier entry point for users. However, the trade-off you make here is that it'll be harder for bigger / more workload-critical users to adopt the package. For example, if a codebase already successfully makes use of a given AD package, it's going to be much more difficult to have them swap to DI if the use of DI itself (in contrast to direct use of the AD package) prevents code from being differentiated, or makes the differentiated code slower. I'm struggling to find the right link, but I recall that closures + Zygote were a significant reason it didn't take off. If any interface were causing such a "cannot differentiate and/or slower perf" issue for some backend, one that doesn't exist with the original tool, I would likely recommend the user use the tool directly and file a bug on the interface in the meantime.

On the other hand, you could have more/specialized code in DI. Obviously this means more code/complexity here, in exchange for more potential users. Either way, happy to try to support you.

Side note: all the code examples I gave above weren't even differentiated, so at least the performance penalties I mention would be true of most Julia code regardless of additional AD. |
Not at the moment unfortunately. My current priority is to make the single-input part good enough that it can be used by most of the SciML ecosystem (in particular Optimization and DifferentialEquations). Once I get that right, I'll turn to multiple arguments, but it is an insane amount of code to write and I'm just one dude ^^ If you want to help speed things up, the most useful contribution would be suggestions on how I could achieve this without 1) completely transforming the existing API and 2) turning into an Enzyme front end. What's hard is that a solid 2/3 of backends don't support this at all, so I need to make it work for them with a closure while allowing top performance for more sophisticated ones like Tapir and Enzyme. |
I don't think aiming for supporting multiple arguments in DI.jl for backends that don't natively support multiple arguments will be too useful. It might be better to document backend limitations and ask users to define their own closures. If DI.jl is to implement multiple argument support, we might as well deprecate AD.jl and go with DI.jl for everything. I personally have no issue with that if DI eventually supported everything AD currently supports. I would have preferred AD.jl to be the one package but development here is clearly much more active than in AD so maybe that's better for the ecosystem as a whole. Alternatively, we could keep the arrangement we agreed on but you contribute the multiple argument support to AD.jl. I think everyone here has write access to AD.jl and can contribute there instead. It's really up to you where you want to contribute this code. If we decide to make DI.jl the new AD.jl, then perhaps we should also figure out a way to give some credit to the AD.jl contributors and original designers. |
@mohamed82008 I don't think I have write access. I left comments on the AD.jl wip Enzyme PR a while ago, I'd be happy to do some minor fixups but don't have permission to xD |
Fixed :) |
Thanks for your answer Mohamed 🙏 A lot has happened in DI over the last couple of months, and it's already being used by big players like SciML and JuliaSmoothOptimizers. As a result, I honestly think it would be best for the ecosystem to put all of this stuff in one place, and aim for multiple argument support in DI as well (either for a few backends or for all with a closure fallback, not sure yet). |
That might be a reasonable way to give credit to the paper's authors but not all of the contributors, e.g David. I suppose there is no perfect solution. But the CITATION.bib suggestion + a README comment would be the best we can do. I am ok with that. |
Yeah I agree that is a problem, but how do you even cite contributors that are not authors, except by citing the repo itself? Which we can also recommend btw |
What do you think of the wording here? |
Looks good. Thanks. I think the reference to AD.jl as "the inspiration" sufficiently gives credit to the AD.jl contributors and the citation sufficiently gives credit to the AD.jl designers and paper authors. I don't think we can do much better than that. Asking users to cite the AD.jl repo as well may be a bit much but if David prefers that, it would be nice to do as well. I am happy with the current status. |
This is me trying to sum up the discussion and propose a new API.

Problem

Some backends support multiple arguments or activity specifications.

Backend abilities

The most generic backend is Enzyme's, with

Possible API

At the moment, DI supports:
For a given input argument, we can ask:
First proposition: split between:
The API could look like: jacobian(f, backend, input, params, cache)
jacobian(f!, output, backend, input, params, cache)
pushforward(f, backend, input, params, cache, input_seed)
pushforward(f!, output, backend, input, params, cache, input_seed)
pullback(f, backend, input, params, cache, output_seed)
pullback(f!, output, backend, input, params, cache, output_seed) Not sure about the name ImplementationIn practice, For most backends other than Enzyme, operators would fall back on the following closure: f(input) = f(input, params, cache)
f!(output, input) = f!(output, input, params, cache) |
The gist is that DI is a math-inspired interface: functions |
I don't fully understand what you mean by "care about initial values"? |
When you provide a cache, you don't give a damn what's there at the start of the function, you can initialize it with |
I like this API and I believe that it is near optimal given the constraint that DI should be the "intersection of all AD packages" (a job which it does really well right now). Beyond enabling the use of Enzyme, your API would also facilitate some additional performance optimization, i.e. working around the slow closure bug. I don't think that there is a much more flexible API that all packages can support. This also makes it easier to support future packages quickly in DI without them having to develop a large set of features before joining the DI family. The only alternative I see is to "wait for Enzyme to be able to handle closures reliably" and stick to the single-arg API, the simplicity of which is certainly appealing. If "one day" ™️ the slow closure bug is gone and Enzyme is able to handle closures more reliably, I would certainly prefer the single-arg + closure route. As for naming bike shedding: I feel like users with an ML background would expect the function to be differentiable in |
Something we might want to consider as well when discussing future API changes are "auxiliary" outputs à la JAX:
|
As of the newly-released v0.6, users can specify an arbitrary number of "context" arguments which are not differentiated. At the moment, there is only one context type, named Constant:

```julia
gradient(f, backend, x, Constant(c1), Constant(c2))
```

There still cannot be more than a single active (differentiated) argument.
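A minimal usage sketch of that context API, assuming DifferentiationInterface v0.6 with a ForwardDiff backend (the function and values here are made up):

```julia
using DifferentiationInterface   # re-exports the ADTypes backend structs
import ForwardDiff

# x is the single active argument; a is passed along as a non-differentiated Constant.
scaled_sq(x, a) = a * sum(abs2, x)

backend = AutoForwardDiff()
x = [1.0, 2.0, 3.0]
g = gradient(scaled_sq, backend, x, Constant(2.0))   # gradient with respect to x only
```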
At the moment the only viable way I see is to support one active and one inactive argument, and leave it to ComponentArrays to combine several arguments if necessary. It could be done by passing a tuple (x, c) instead of the input x. @wsmoses do you think that might get us 90% of the way to where we need to be?
See the following discussions: