Apply ChainRulesCore.jl's projection operators #153

Closed
wants to merge 2 commits into from

Conversation

mcabbott
Contributor

@mcabbott mcabbott commented Jul 27, 2021

ChainRules now has a projection mechanism to preserve, among other things, the structure of structured arrays. This should probably apply to FillArrays. So this PR writes a few methods.

However, I'm not sure where it should live. This package depends on nothing at all besides the standard library. My vote would be for it to live in ChainRules, which is already a bigger package. (And Zygote, which loads ChainRules, already loads FillArrays anyway.) But I'm opening the PR here just because I wrote the code on this fork, and to discuss.

CC @oxinabox, @mzgubic

Loading times, on 1.8-, before:

julia> @time using FillArrays
[ Info: Precompiling FillArrays [1a297f60-69ca-5386-bcde-b61e274b549b]
  1.058200 seconds (1.43 M allocations: 82.388 MiB, 0.42% gc time, 1.01% compilation time)

julia> @time using FillArrays  # fresh start
  0.217252 seconds (763.15 k allocations: 46.191 MiB, 64.89% compilation time)

julia> @time using ChainRulesCore
  0.242396 seconds (702.60 k allocations: 37.375 MiB, 4.25% gc time, 81.22% compilation time)

And after:

julia> @time using FillArrays
[ Info: Precompiling FillArrays [1a297f60-69ca-5386-bcde-b61e274b549b]
  1.703313 seconds (2.20 M allocations: 124.462 MiB, 0.26% gc time, 0.60% compilation time)

julia> @time using FillArrays  # fresh start
  0.341163 seconds (974.43 k allocations: 58.087 MiB, 9.82% gc time, 65.06% compilation time)

@oxinabox
Member

As a general rule, extra ProjectTo overloads, like rrules, should live in the package defining the types.
Only rules for Base and the stdlibs should live in ChainRules.jl.
The other options are too type-piratical.
And internal changes to the package can often change which rules are required.
(It is less clear that this applies to ProjectTos, but I imagine it does for types whose fields we need to think about, since fields are not public API.)

With the 0.8 release, we kicked out all the rrule code in ChainRules.jl that was for anything other than stdlibs.

One argument is that we might well want to depend on FillArrays in ChainRules.jl:
JuliaDiff/ChainRules.jl#46
However, a counterargument is that other things in the ecosystem with rules,
like DiffEq sensitivity, would probably also like to use FillArrays, for the same reason ChainRules.jl does.
And they should not have to depend on ChainRules.jl.
They should be able to play around with their rules in the REPL without having to tinker with their environment.
And won't ChainRules.jl type-pirating this lead to very different behaviour depending on whether ChainRules.jl is loaded?


Did you start Julia with --startup-file=no?
This is needed to get consistent load-time measurements, as things like Revise can significantly change load time.
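
One way to make such measurements repeatable is to run each timing in a fresh process with the startup file disabled. The following is only a sketch; `fresh_julia` is a hypothetical helper name, not something from this PR or from any package.

```julia
# Hypothetical helper: run `code` in a fresh Julia process without the startup
# file and return whatever that process prints to stdout.
function fresh_julia(code::AbstractString)
    cmd = `$(Base.julia_cmd()) --startup-file=no -e $code`
    return read(cmd, String)
end

# Usage, assuming FillArrays is installed in the environment the child process sees:
# print(fresh_julia("@time using FillArrays"))
```

Each call pays the full load cost again, so repeated calls give independent "fresh start" numbers.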

@dlfivefifty
Member

Since ChainRulesCore.jl depends on StaticArrays.jl it makes sense for it to also depend on FillArrays.jl

@oxinabox
Member

oxinabox commented Jul 27, 2021

Since ChainRulesCore.jl depends on StaticArrays.jl

It does not.
That is a test-only dependency.

@dlfivefifty
Member

OK. But StaticArrays.jl doesn't depend on ChainRulesCore.jl....

@oxinabox
Member

OK. But StaticArrays.jl doesn't depend on ChainRulesCore.jl....

Yes, because StaticArrays implements an unstructured dense array type.
Which is the default assumption (kind of the only sensible default we can have, unless we want to treat all AbstractArrays as structs, which we tried for a while but it turned out to lead to too much suffering).
Mathematically, a StaticArray and an Array represent the same space.
So no projection is needed to make sure its derivatives don't end up escaping the space.
A FillArray has structure that its users would probably like to see preserved in its derivatives.

@mcabbott
Contributor Author

mcabbott commented Jul 27, 2021

Did you start Julia with --startup-file=no?

No, but the times are pretty consistent. If Revise takes a long time to think about things, that's relevant time for actual use. Trying now with --startup-file=no, things are quicker, but it's still 50% extra:

julia> @time using FillArrays
[ Info: Precompiling FillArrays [1a297f60-69ca-5386-bcde-b61e274b549b]
  0.839034 seconds (1.51 M allocations: 87.205 MiB, 0.65% gc time, 1.30% compilation time)

julia> @time using FillArrays
  0.074092 seconds (279.86 k allocations: 19.970 MiB, 3.11% compilation time)

After:

julia> @time using FillArrays
[ Info: Precompiling FillArrays [1a297f60-69ca-5386-bcde-b61e274b549b]
  0.968634 seconds (1.57 M allocations: 90.634 MiB, 0.80% gc time, 1.09% compilation time)

julia> @time using FillArrays
  0.114498 seconds (342.79 k allocations: 23.845 MiB, 2.02% compilation time)

julia> 0.114 / 0.074
1.5405405405405406

julia> 0.341 /  0.242  # above
1.4090909090909092

Edit: that's 1.8- on an M1.

Trying 1.6 on an older Xeon, again with no startup file, I get:

julia> @time using FillArrays  # before
  0.274581 seconds (273.42 k allocations: 20.355 MiB, 2.43% compilation time)
  0.279008 seconds (274.26 k allocations: 20.424 MiB, 2.39% compilation time)

julia> @time using FillArrays  # after
  0.405631 seconds (334.10 k allocations: 24.302 MiB, 1.73% compilation time)
  0.387100 seconds (334.04 k allocations: 24.299 MiB, 1.92% compilation time)

and if I load Revise first, then these become:

julia> using Revise

julia> @time using FillArrays  # before
  0.941846 seconds (707.88 k allocations: 46.001 MiB)

julia> @time using FillArrays  # after
  1.047073 seconds (755.14 k allocations: 48.984 MiB, 2.07% gc time)

@oxinabox
Member

oxinabox commented Jul 27, 2021

50% longer but still <0.12 seconds.
Which seems acceptable.

@dlfivefifty
Member

The other angle is that FillArrays.jl may become part of Base

JuliaLang/julia#39184

@oxinabox
Member

The other angle is that FillArrays.jl may become part of Base

I would be keen on that.
In that case we would define its projectors in ChainRulesCore
(and its rrules in ChainRules.jl, if it needed any).
But that day is not today.
When (or if) that happens, we can git-filter-branch the commits over.

@dlfivefifty
Member

I think in that case it's better to put this in a new package, FillArrayChainRules.jl, for now.

@mzgubic

mzgubic commented Jul 28, 2021

Would that not be quite confusing to users of FillArrays? If they are using an AD system that uses ChainRules but does not depend on FillArrays (ForwardDiff2, Nabla, Yota) they would have to know that they need to add a new dependency on FillArraysChainRules? Might be quite hard to figure that out from an error message?

@dlfivefifty
Member

If it's confusing for users that is their problem. It's not a good reason to make a package bloated and hard to maintain.

@dlfivefifty
Member

Or... have ChainRulesCore.jl depend on FillArrays.jl and put it there! It seems you don't want your package to be bloated but you are happy to bloat other people's packages...

@mzgubic

mzgubic commented Jul 28, 2021

Apologies if this discussion has upset you in some way; that was not the intention. We are all just trying to find the best solution to this. It might be what you suggest, it might be what we suggest, it might be neither. But the best way forward is to hear out all arguments and decide collectively based on that. Of course, ultimately, as a FillArrays code owner you have the right to refuse this PR outright if you think that's what's best for the package.

@dlfivefifty
Member

That's precisely what I'm saying: the best thing for this package and its maintenance is not to accept this PR.

for d in 1:max(ndims(dx), length(project.axes))
    size(dx, d) == length(get(project.axes, d, 1)) || throw(_projection_mismatch(axes_x, size(dx)))
end
Fill(mean(dx), project.axes) # Note that mean(dx::Fill) is optimised

Do we need another rule for the constructor to multiply the mean dx by the length of the vector? Think of x -> sum(Fill(x, 3)).

Contributor Author

Yes, the equivalent of https://github.com/FluxML/Zygote.jl/blob/e6a86745d66b5974eaafa8a8f28bcd4b100374df/src/lib/array.jl#L17

If the constructor is close to where the Fill is used, then perhaps it's a little wasteful to first project like this, and then un-create. But not so serious.

yes
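
The arithmetic behind the x -> sum(Fill(x, 3)) question can be sketched in plain Julia. This is only an illustration under stated assumptions: `project_fill` and `fill_pullback` are hypothetical stand-ins, not the FillArrays or ChainRulesCore API, and the real rule in the PR may differ.

```julia
using Statistics: mean

# Hypothetical stand-ins for illustration; not the FillArrays or ChainRulesCore API.
# Projection step: keep only the mean of the incoming cotangent, plus the axes.
project_fill(dx::AbstractArray) = (value = mean(dx), axes = axes(dx))

# Constructor pullback: every entry of Fill(x, n) is x, so the gradient with
# respect to x is the projected value multiplied by the number of entries.
fill_pullback(proj) = proj.value * prod(length, proj.axes)

dx = ones(3)                               # cotangent of sum(Fill(x, 3)) w.r.t. the array
dvalue = fill_pullback(project_fill(dx))   # n * mean(dx), which equals sum(dx)
```

Here `dvalue` is 3.0, the derivative of x -> 3x. Projecting to the mean and then multiplying by the length recovers `sum(dx)`, which is why the projection alone is not enough and the constructor needs its own rule.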

@oxinabox
Member

oxinabox commented Aug 17, 2021

I think it is indeed completely legit that you don't want to have the code for ChainRulesCore in your package.
After all, if you don't care about this use case, why should you have to maintain the code for it?
(A case could be made that you should care about this use case, but I am not going to make it today.)
And FillArrays is super lightweight; imposing this on FillArrays' 800 total dependents is too much.

Simultaneously, it is also legit that ChainRulesCore can't take this dependency, since we are also super lightweight, and imposing it on our 1500 indirect dependents is also too much.

So the remaining solution is to have this in some package that does type piracy,
with all the spooky-action-at-a-distance problems that incurs.

The option of ChainRules.jl isn't great because:

  • It means loading ChainRules.jl before you can do testing, which is kinda big
  • It sets up an expectation for others (who do not have the same good reasons that FillArrays does) that rules and ProjectTo's can go into ChainRules.jl.

So the better option, I think, is indeed to make a separate ChainRules_FillArrays.jl (we can put it in JuliaDiff).
I am hopeful that we will solve Conditional Dependencies in the not so distant future.
And then we can autoload that glue package when both ChainRulesCore and FillArrays are loaded
JuliaLang/Pkg.jl#1285 (comment)

@mcabbott
Contributor Author

option of ChainRules.jl isn't great because:

Good to lay these things out. To me these reasons don't seem to obviously counteract the extra hassle of having one more package (and one more repository, possibly) to look after. It's already pretty fragmented having CR + CRC + CRTU, I mean I see why these exist but there is friction & complexity in having them separate.

  • Testing, if these rules were part of ChainRules, then so would the testing be. The advantage is that each new upgrade to the testing story can be done to these and to Base rules in one PR, in one place.
  • Expectation, sure. It puts you in a position of "blessing" certain packages as being sufficiently small/stable/central to deserve inclusion. Not ideal. I suspect that willingness to make & maintain ChainRules_XyPackage.jl also gets you that headache to a degree.

@oxinabox
Member

Testing, if these rules were part of ChainRules, then so would the testing be.

The testing problem I mean is not for this package, but rather for some other package.
For example, NNlib.jl might want to incorporate special behaviour for FillArrays.
And things will go weird in the NNlib tests if they don't have the glue loaded.
Probably they should add the glue as a main dependency.
But they probably don't want to add ChainRules.jl as a main dependency, since that takes a fair while to load.
(Actually, on retiming it, it is not so bad on Julia 1.6, though pretty bad on Julia 1.0. For v2.0 we might think about having CR and CRC be one package; we'll have to see how good Julia's TTFP is at that point. Probably a few years from now.)

The advantage is that each new upgrade to the testing story can be done to these and to Base rules in one PR, in one place.

All of the big, required changes to the testing system are now over, since we have released 1.0.
Further changes to the testing tools will add more convenience features.
Those can really just be ignored if the package's tests are working and not getting many changes.
Right now there are no nice tools at all for testing ProjectTos.
But that is fine: you don't need nice tools for that. Unlike testing rrules, testing them directly is not going to lead to any logical traps.


Expectation is the big one, along with, as you say, the blessing that goes with it.
I really don't want to end up having to make a call at that level.
FillArrays is in, Distances is in, StatsFun is in, but RMath is out.
I don't want to do that.
Much nicer to have a single clear policy of "nothing is in".
(plus the other reasons).

We don't want to end up with fly-by-night packages as dependencies of ChainRules.jl.
If it is a dependency of ChainRules.jl, we need to maintain it forever,
since it becomes part of our semver.
If a package Foo is abandoned, we can likewise abandon ChainRules_Foo.jl.
If we had to maintain its support in ChainRules.jl forever, this could become super problematic if the unmaintained package has dependencies that are no longer compatible with other things we use.

And on the topic of dependencies:
It would make the load time for ChainRules.jl go right up.
Including by loading things the end user doesn't care about.

Further, it would make ChainRules end up blocking things from updating until many other things are updated,
including things that the final user doesn't actually use.
This happens with Zygote.jl every now and again,
since Zygote did take on a ton of dependencies to write rules for them
(before ZygoteRules.jl was a thing, I think).

Finally, these ProjectTo definitions and rrules (especially ones involving structured tangents, or heavily augmented primal computations) tend to access things that are not part of the public API of the package,
e.g. fields of structs.
That makes them fragile and hard to maintain outside the package that defines the thing they are for.
And that makes it much harder, in particular, to deal with the above issues with ChainRules.jl, since it needs to drop support for old versions without the non-public API changes.

@st--

st-- commented Apr 6, 2022

Hi, was there any decision as to what's actually going to happen? As a user who would like to use FillArrays's structured arrays as well as autodiff through those arrays, is there some way for me to actually get this together?

@oxinabox
Member

I think we should:

make a separate ChainRules_FillArrays.jl (we can put it in JuliaDiff).
I am hopeful that we will solve Conditional Dependencies in the not so distant future.
And then we can autoload that glue package when both ChainRulesCore and FillArrays are loaded
JuliaLang/Pkg.jl#1285 (comment)

@dlfivefifty
Member

I think the convention is no underscores

@oxinabox
Member

oxinabox commented Apr 12, 2022

I think the convention is no underscores

Right, but this is a glue package that would be loaded automatically if Julia had conditional dependencies.
I feel that calls for a new convention.
It's not a proper package -- 100% of what it does is type-piracy.

@devmotion
Contributor

Maybe this PR could be revisited and ChainRulesCore made a weak dependency? That would make the definitions available at least on Julia >= 1.9, and should not cause any additional loading or compilation time for users that do not load ChainRulesCore.

@dlfivefifty
Member

What's a weak dependency?

@jishnub
Member

jishnub commented Jan 18, 2023

It is sometimes desirable to be able to extend some functionality of a package without having to unconditionally take on the cost (in terms of e.g. load time) of adding a full dependency on that package. A package extension is a module in a file (similar to a package) that is automatically loaded when some other set of packages are loaded into the Julia session.
https://pkgdocs.julialang.org/dev/creating-packages/#Conditional-loading-of-code-in-packages-(Extensions)
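
For the case discussed in this thread, a package extension might be laid out roughly as follows. This is a sketch only: the extension name `FillArraysChainRulesCoreExt` is hypothetical, the UUID is left as a placeholder to be looked up in the registry, and nothing here is taken from the actual FillArrays repository.

```julia
# The parent package's Project.toml would gain entries along these lines
# (shown as comments; the extension name is hypothetical):
#
#   [weakdeps]
#   ChainRulesCore = "<UUID of ChainRulesCore from the General registry>"
#
#   [extensions]
#   FillArraysChainRulesCoreExt = "ChainRulesCore"

# File: ext/FillArraysChainRulesCoreExt.jl
module FillArraysChainRulesCoreExt

using FillArrays, ChainRulesCore

# ProjectTo methods and any rrules for the FillArrays types would be defined here.
# On Julia >= 1.9 this module is loaded only once both packages are in the session,
# so users who never load ChainRulesCore pay no extra load time.

end # module
```

Because the methods live in the package that owns the types, this layout avoids both the separate glue package and the type piracy discussed earlier in the thread.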


8 participants