Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ArrayInterfaceCore #211

Closed
ParadaCarleton opened this issue Oct 7, 2021 · 102 comments
Closed

ArrayInterfaceCore #211

ParadaCarleton opened this issue Oct 7, 2021 · 102 comments

Comments

@ParadaCarleton
Copy link

ParadaCarleton commented Oct 7, 2021

As mentioned here, ArrayInterface has become a huge dependency to take on. It might make sense to separate out some ArrayInterfaceCore that can be loaded faster and contains only basic functionality.

@chriselrod
Copy link
Collaborator

We could take one or both of the following approaches:

  1. Don't even include functionality. Just stubs of the methods other libraries would need/want to overload. ArrayInterface would import these, so it won't be a breaking change (i.e., extending ArrayInterface.parent_type would be extending ArrayInterfaceCore.parent_type).
  2. Add basic functionality so that someone wanting to use ArrayInterface can actually use ArrayInterfaceCore without taking on as much weight.

Taking both would mean creating two libraries: one of the stubs, and then one of the basic functionality.

@grahamas
Copy link

grahamas commented Oct 7, 2021

@chriselrod your approaches sounds like (for shorthand):

  1. AbstractArrayInterface
  2. and ArrayInterfaceCore

I think there's a more functional divide. It seems to me like there are two kinds of users:

  1. Someone purely trying to develop their own Array type (usually the ideal use-case for AbstractX packages). However, even for this person the AbstractArrayInterface wouldn't be sufficient in many cases. The existing package provides a lot of useful low-level stuff (device etc) that a more advanced developer might want to use without the cost of importing the various high-level dependencies of this package. This is the person who probably wants something like ArrayInterfaceCore, for some definition of "basic functionality," and so is the subject of this issue.
  2. Someone trying to use one (or more) of the Array types defined in this package. This person wants more than would be included in my understanding of "basic functionality" (maybe e.g. the FieldArray) but still a lot less than the full package (e.g. does not want CUDA, CuArray etc.). This person suggest splitting up ArrayInterface into sub-packages, all depending on ArrayInterfaceCore, each contributing one set of Array implementations. This person isn't really relevant to this issue, but sometimes this person is me, so I think it'd be cool. But it should be clear/decided that "basic functionality" won't include this person. Edit: this isn't a concern because of @requires

Basically, that's all to say I don't think AbstractArrayInterface would be all that useful, and I think "basic functionality" can mean a lot of different things to different people. From my perspective, I'd like it to mean that when I subtype AbstractArray, the vast majority of necessary and optional methods have working default implementations. Someone should probably write down what you really want "basic functionality" to mean, though.

Finally, I'm curious how this ties into the long-mentioned merge into base julia.

@chriselrod
Copy link
Collaborator

chriselrod commented Oct 7, 2021

For me, basic functionality are the things required for LoopVectorization to work on a custom array type.
That set of things are what the JuliaSIMD ecosystem uses.

@ChrisRackauckas
Copy link
Member

What's the measure here? Load time?

@ParadaCarleton
Copy link
Author

ParadaCarleton commented Oct 7, 2021

What's the measure here? Load time?

I think it should be; anything like an ArrayInterfaceCore should take on the order of a tenth of a second to load, I think. Is there a breakdown of load time by file? What precisely makes ArrayInterface take so long to load?

@ChrisRackauckas
Copy link
Member

Looks like it hits that mark?

julia> @time using ArrayInterface
  0.098646 seconds (172.08 k allocations: 9.701 MiB, 40.21% compilation time)

Though it's close 😅 . Likely influenced by the @requires usage which should get upstreamed as much as possible.

@ParadaCarleton
Copy link
Author

Looks like it hits that mark?

julia> @time using ArrayInterface
  0.098646 seconds (172.08 k allocations: 9.701 MiB, 40.21% compilation time)

Though it's close sweat_smile . Likely influenced by the @requires usage which should get upstreamed as much as possible.

Are you sure you're in a clean environment, and also that this environment isn't located on a supercomputer?

julia> @time using ArrayInterface
    1.524555 seconds (1.29 M allocations: 71.782 MiB, 1.27% gc time, 94.02% compilation time)

@ChrisRackauckas
Copy link
Member

Julia v1.7?

@ParadaCarleton
Copy link
Author

ParadaCarleton commented Oct 7, 2021

Julia v1.7?

... How?!

 0.156010 seconds (147.17 k allocations: 8.357 MiB, 58.06% compilation time)

@Tokazama
Copy link
Member

Tokazama commented Oct 8, 2021

The problem with splitting things up is that the point is that everyone uses the same interface. I know that there's a lot of code related to the indexing pipeline and ArrayIndex, but not everyone is using it and that seems to cause some friction. I'm completely open to improving that code but if people just don't want to use it we should just get rid of it and hope to have related problems solved in Julio 2.0.

@ParadaCarleton
Copy link
Author

The problem with splitting things up is that the point is that everyone uses the same interface. I know that there's a lot of code related to the indexing pipeline and ArrayIndex, but not everyone is using it and that seems to cause some friction. I'm completely open to improving that code but if people just don't want to use it we should just get rid of it and hope to have related problems solved in Julio 2.0.

It might be worth splitting that code into a different package, both to reduce compile times (which are a bit on the iffy side, although definitely not a huge deal after 1.7) and to improve the code's visibility -- if the indexing features mostly aren't being used, that might be because people aren't aware of them.

@Tokazama
Copy link
Member

if the indexing features mostly aren't being used, that might be because people aren't aware of them.

I think the appropriate solution for this is documentation (which the lack of is admittedly my fault).
My personal opinion is that there is a need for all of this code in one package (perhaps a cleaner and more concise version than what currently exists), because the core array interface includes indexing and there are necessary improvements on what is currently in Base.
We could certainly take out certain components that are not array specific (like we did with Static.jl), but anything else would needlessly divide a code base that should be completely unified.

Nonetheless, I've drafted a PR that gets rid of the indexing code so people can begin to look at how this affects things.
I'm ultimately open to whatever people think is best moving forward because no matter how good this package may seem to any one person, its utility is entirely dependent on community buy in.

@ParadaCarleton
Copy link
Author

if the indexing features mostly aren't being used, that might be because people aren't aware of them.

I think the appropriate solution for this is documentation (which the lack of is admittedly my fault). My personal opinion is that there is a need for all of this code in one package (perhaps a cleaner and more concise version than what currently exists), because the core array interface includes indexing and there are necessary improvements on what is currently in Base. We could certainly take out certain components that are not array specific (like we did with Static.jl), but anything else would needlessly divide a code base that should be completely unified.

Nonetheless, I've drafted a PR that gets rid of the indexing code so people can begin to look at how this affects things. I'm ultimately open to whatever people think is best moving forward because no matter how good this package may seem to any one person, its utility is entirely dependent on community buy in.

I’d definitely agree that’s not sufficient reason to separate this code out — it’s more of an incidental benefit you might get. The main hope would be splitting the package would reduce the size of the dependency without causing many problems.

@Tokazama
Copy link
Member

We could probably move OptionallyStaticUnitRange and OptionallyStaticStepRange to Static.jl, leaving the 'known_*' methods in this package.

@ChrisRackauckas
Copy link
Member

I'm confused. Splitting to different packages and making use of Requires would only increase compile and startup times, not decrease it. I'm not sure why that's being proposed as the solution then to decreasing startup time.

@chriselrod
Copy link
Collaborator

chriselrod commented Oct 14, 2021

Why would we need Requires.jl?

Most packages don't want to use ArrayInterface.jl.
Some of them may, however, want to support it for the sake of packages that depend on it (like LoopVectorization).

So put the methods that need to be extended to support ArrayInterface into ArrayInterfaceCore.

This could reduce the use of Requires, if libraries like StaticArrays.jl are then willing to take ArrayInterfaceCore as a dependency, letting us drop the associated @requires from ArrayInterface.jl proper.

@Tokazama
Copy link
Member

I don't think anyone has actually made a serious proposal to have StaticArrays.jl use ArrayInterface.jl. Perhaps there would be push back related to this issue but a lot of code in that package could be replaced by what we have here anyway.

@ChrisRackauckas
Copy link
Member

What parts are being proposed to move out and what is the v1.7 using time drop from it?

@rafaqz
Copy link
Contributor

rafaqz commented Oct 14, 2021

Im not sure its worth doing anything about this given the 1.7 performance improvements.

Speaking to my small part in triggering this discussion, 0.1 seconds is not an issue at all for DimensionalData.jl. It will be swallowed by the improvements to load times of DD itself in 1.7.

@chriselrod
Copy link
Collaborator

chriselrod commented Oct 14, 2021

What parts are being proposed to move out and what is the v1.7 using time drop from it?

A library of stubs should load nearly instantly.

I don't think anyone has actually made a serious proposal to have StaticArrays.jl use ArrayInterface.jl. Perhaps there would be push back related to this issue but a lot of code in that package could be replaced by what we have here anyway.

Making a bunch of internal changes costs developer time and risks breaking things, while leaving them as they are does not. If someone has the time, they could make a PR, although I'd recommend asking the StaticArrays maintainers first to see if they're likely to accept such a PR.
A simpler PR that doesn't touch anything and simply adds a few extensions while having a negligible impact on load times would take very little time/effort and is also less likely to be controversial.

@ChrisRackauckas
Copy link
Member

I agree. Which things are making the loading slow, and why? I don't see anything clear here other than "yay smaller". Which functions need to move, and will it send it down 10%? Or will removing just one function reduce the using time by 99.9%? Could we just delete that function? Etc.

@Tokazama
Copy link
Member

I opened #212 to start showing what we could remove. There's a bit more that could be cut but most other changes would be clean up we should do anyway.

@ChrisRackauckas
Copy link
Member

How much using time would it cut? Can we make it a sub-package?

@Tokazama
Copy link
Member

Here's the current one for me

julia> @time using ArrayInterface
  1.419789 seconds (4.16 M allocations: 203.910 MiB, 11.25% gc time)

and that PR...

julia> @time using ArrayInterface
[ Info: Precompiling ArrayInterface [4fba245c-0d91-5ea0-9b3e-6abc04ee57a9]
  1.339895 seconds (1.56 M allocations: 91.977 MiB, 0.70% gc time, 32.98% compilation time)

Other people should probably benchmark it with more robust setups for benchmarking.

Can we make it a sub-package?

We could but I just don't really see the point. The majority of code that would stick around after cleanup is related to_indices working for traits independent of how deeply they are nested. That's only useful if we all begin gravitating towards it, so if no one wants it we should just get rid of it so we don't waist developer time on it.

Making a bunch of internal changes costs developer time and risks breaking things, while leaving them as they are does not. If someone has the time, they could make a PR, although I'd recommend asking the StaticArrays maintainers first to see if they're likely to accept such a PR.

Initial changes shouldn't be that bad. For example, SOneTo and SUnitRange should be fully covered by what we do here with aliases for OptionallyStaticUnitRange.

@ChrisRackauckas
Copy link
Member

Time to post precompile both times.

@chriselrod
Copy link
Collaborator

My suggestion with ArrayInterfaceCore was that it would just be stubs, e.g.

function parent_type end

that ArrayInterface.jl then extends, as would other libraries.
ArrayInterface itself wouldn't get faster to load, but only the libraries that want to use it (e.g. LoopVectorization) would have to pay the cost.

@Tokazama
Copy link
Member

Tokazama commented Oct 14, 2021

Wouldn't parent_type give incorrect results for all of the array types in base unless ArrayInterface.jl was loaded? At what point do we tell users they can reliably use the stubs?

Sorry, I'm not trying to be difficult. Maybe an initial PR the proposed package as a submodule or perhaps a gist would help the conversation.

@chriselrod
Copy link
Collaborator

chriselrod commented Oct 15, 2021

At what point do we tell users they can reliably use the stubs?

When ArrayInterface is loaded.
If you want to use ArrayInterface, you must using ArrayInterface.
If all you want to do is add methods, then you can using ArrayInterfaceSubs instead.
(Figured Stubs is a more appropriate suffix than Core, but ChainRulesCore was the source of the idea.)

@ChrisRackauckas
Copy link
Member

yes, that would also give explicit versioning on the requires'd pieces.

@Tokazama
Copy link
Member

Tokazama commented May 3, 2022

Gotcha. I'll try looking into it in a couple days.

Tokazama added a commit that referenced this issue May 5, 2022
This moves all of the non-`@require` code into an internal package. I
managed to get initial tests to run for `ArrayInterfaceCore` but there
will be several more combs through the tests to make sure we can
separate out tests that are independent of downstream packages. This
required addition of a simple proxy for `MArray` so we could test all of
the static/known traits.

Zero work has been done to actually test the ArrayInterace test at the
top level here so far, but hopefully it will be fairly straightforward
if there is an exhaustive list following `impor ArrayInterfaceCore:
...` (which should also prevent any noise downstream).
@Tokazama
Copy link
Member

Tokazama commented May 5, 2022

Early work on a branch for this here.

@oschulz
Copy link

oschulz commented May 7, 2022

A more lightweight version of ArrayInterface would be awsome, I think - it is nowadays a direct or indirect dependency of a large chunk of the ecosystem. I've been tracking down some package load times, and ArrayInterface often turns out to be a driving factor. With a package like FiniteDiff, for example:

julia> @time_imports using FiniteDiff
      3.2 ms  ┌ Compat
      0.5 ms  ┌ Requires
    538.4 ms  ┌ StaticArrays
      0.2 ms  ┌ IfElse
     33.1 ms    ┌ Static
    722.1 ms  ┌ ArrayInterface
   1512.6 ms  FiniteDiff

ArrayInterface makes up the biggest chunk of the load time, StaticArrays almost all of the rest, FiniteDiff itself is tiny:

julia> @time_imports using ArrayInterface, StaticArrays
      4.2 ms  ┌ Compat
      0.7 ms  ┌ Requires
      0.2 ms  ┌ IfElse
     35.8 ms  ┌ Static
    717.2 ms  ArrayInterface
    560.6 ms  StaticArrays

julia> @time_imports using FiniteDiff
     19.6 ms  FiniteDiff

FIniteDiff (just an example here, though it is itself a direct or indirect dependency of many packages) seems to mostly use ArrayInterface.allowed_getindex and ArrayInterface.allowed_setindex!, which intuitively shouldn't have to come from a heavy dependency.

I was surprised how lightweight even a package like Optim itself is, here is Optim's "load time story":

julia> @time using ArrayInterface
  0.690469 seconds (656.96 k allocations: 36.272 MiB, 2.19% gc time, 91.62% compilation time)

julia> @time using StaticArrays
  0.588908 seconds (2.28 M allocations: 179.212 MiB, 7.03% compilation time)

julia> @time using FillArrays
  0.152951 seconds (412.35 k allocations: 31.434 MiB)

julia> @time using ForwardDiff
  0.212571 seconds (548.34 k allocations: 36.217 MiB, 2.03% compilation time)

julia> @time using DataStructures
  0.068310 seconds (127.42 k allocations: 9.279 MiB)

julia> @time using Optim
  0.077643 seconds (207.88 k allocations: 13.644 MiB, 7.29% compilation time)

@ChrisRackauckas
Copy link
Member

What is @time_imports? That seems awesome.

@oschulz
Copy link

oschulz commented May 7, 2022

What is @time_imports? That seems awesome.

It's new in Julia 1.8 (I've been using v1.8.0-beta3 above). It really opened my eyes about the package load times in general in the last days.

@oschulz
Copy link

oschulz commented May 7, 2022

It really opened my eyes about the package load times in general in the last days.

I want to reduce the horrid load time of BAT.jl by reducing it's dependencies and splitting it up along dependency lines, so I started to measure load times of a bunch of diff- and opt- and stats-related packages and was quite surprised how lightweight some packages are (on top of their deps) and how heavy some others (bat/BAT.jl#351).

@ChrisRackauckas
Copy link
Member

The split is done. The core is nice. The rest needs work, but hey, that was the point of this issue so closed.

julia> @time_imports using ArrayInterfaceCore
     28.4 ms  ArrayInterfaceCore

julia> @time_imports using ArrayInterface
      2.8 ms  ┌ Compat
      1.0 ms  ┌ IfElse
     45.1 ms  ┌ Static
      2.3 ms  ┌ ArrayInterfaceCore
    398.1 ms  ArrayInterface

julia> @time_imports using ArrayInterfaceBandedMatrices
    120.3 ms    ┌ FillArrays
   1005.0 ms  ┌ ArrayLayouts
    214.9 ms  ┌ BandedMatrices
      2.3 ms  ┌ ArrayInterfaceCore
   1251.9 ms  ArrayInterfaceBandedMatrices

julia> @time_imports using ArrayInterfaceBlockBandedMatrices
    122.4 ms    ┌ FillArrays
   1017.1 ms  ┌ ArrayLayouts
    207.1 ms  ┌ BandedMatrices
    138.1 ms  ┌ MatrixFactorizations
      2.5 ms  ┌ ArrayInterfaceCore
      1.3 ms  ┌ ArrayInterfaceBandedMatrices
    154.6 ms  ┌ BlockArrays
     41.7 ms  ┌ BlockBandedMatrices
   1595.4 ms  ArrayInterfaceBlockBandedMatrices

julia> @time_imports using ArrayInterfaceCUDA
      2.6 ms  ┌ Compat
      1.5 ms  ┌ Requires
     52.3 ms    ┌ ChainRulesCore
     60.7 ms  ┌ AbstractFFTs
     15.7 ms      ┌ Preferences
     17.6 ms    ┌ JLLWrappers
    173.7 ms  ┌ LLVMExtra_jll
      1.2 ms  ┌ Reexport
     23.5 ms    ┌ RandomNumbers
     38.3 ms  ┌ Random123
      4.0 ms      ┌ CEnum
    100.4 ms    ┌ LLVM
      2.0 ms    ┌ Adapt
    799.5 ms  ┌ GPUArrays
      1.3 ms  ┌ ExprTools
      1.1 ms  ┌ IfElse
     34.2 ms    ┌ Static
      2.3 ms    ┌ ArrayInterfaceCore
    381.2 ms  ┌ ArrayInterface
     24.7 ms  ┌ TimerOutputs
      1.7 ms  ┌ CompilerSupportLibraries_jll
      6.2 ms  ┌ BFloat16s
    193.5 ms  ┌ GPUCompiler
   1059.9 ms  ┌ CUDA
   2855.1 ms  ArrayInterfaceCUDA

julia> @time_imports using ArrayInterfaceOffsetArrays
      2.8 ms  ┌ Compat
      1.3 ms  ┌ IfElse
     30.5 ms    ┌ Static
      2.2 ms    ┌ ArrayInterfaceCore
    327.2 ms  ┌ ArrayInterface
      1.6 ms  ┌ Adapt
     35.4 ms  ┌ OffsetArrays
    405.6 ms  ArrayInterfaceOffsetArrays

julia> @time_imports using ArrayInterfaceStaticArrays
      2.5 ms  ┌ Compat
    460.9 ms  ┌ StaticArrays
      1.1 ms  ┌ IfElse
     33.4 ms    ┌ Static
      2.5 ms    ┌ ArrayInterfaceCore
    333.8 ms  ┌ ArrayInterface
      1.8 ms  ┌ Adapt
    837.3 ms  ArrayInterfaceStaticArrays

julia> @time_imports using ArrayInterfaceTracker
    467.2 ms    ┌ StaticArrays
    478.1 ms  ┌ DiffResults
      2.9 ms  ┌ Compat
      1.3 ms  ┌ NaNMath
      1.3 ms  ┌ Requires
     53.7 ms    ┌ ChainRulesCore
     55.9 ms  ┌ ChangesOfVariables
      1.1 ms  ┌ OpenLibm_jll
      1.5 ms  ┌ InverseFunctions
      4.5 ms  ┌ DocStringExtensions
      5.3 ms    ┌ IrrationalConstants
      1.5 ms    ┌ CompilerSupportLibraries_jll
      2.0 ms      ┌ LogExpFunctions
     36.3 ms          ┌ Preferences
     38.6 ms        ┌ JLLWrappers
    187.3 ms      ┌ OpenSpecFun_jll
    203.6 ms    ┌ SpecialFunctions
     11.1 ms      ┌ MacroTools
     12.9 ms    ┌ CommonSubexpressions
      1.4 ms    ┌ DiffRules
    298.4 ms  ┌ ForwardDiff
      1.8 ms  ┌ Adapt
      2.4 ms  ┌ ArrayInterfaceCore
     26.4 ms  ┌ NNlib
    561.0 ms  ┌ Tracker
   1815.3 ms  ArrayInterfaceTracker

@oschulz
Copy link

oschulz commented May 21, 2022

Amazing!

@oschulz
Copy link

oschulz commented May 21, 2022

So now we just need to ask a bunch of packages to switch deps? :-)

@ChrisRackauckas
Copy link
Member

CompatHelper will tell them ArrayInterface had a breaking update, and yes most packages should then choose to update to Core.

@ChrisRackauckas
Copy link
Member

One of the biggest remaining issues is that GPUArrays.jl is still heavy:

JuliaGPU/GPUArrays.jl#409

So any package that removes Requires.jl is not going to cut down on using time until there's a better target than GPUArrays.jl. If that goes Core as well, then anything that's not static is almost instant.

@oschulz
Copy link

oschulz commented May 21, 2022

and yes most packages should then choose to update to Core.

Could reduce the load time of FiniteDiff by over 50% (the remainder will be StaticArrays, which is harder to address).

@ChrisRackauckas
Copy link
Member

Yes, I'm doing those PRs now. RecursiveArrayTools first, and a bit of shaming on StaticArrays

@oschulz
Copy link

oschulz commented May 21, 2022

and a bit of shaming on StaticArrays

Amen to that! I did created an issue regarding it's load time (JuliaArrays/StaticArrays.jl#1023) - but to be honest I have to admit that I don't have a concrete proposal on how a lightweight StaticArraysBase.jl or so could look like, without behavior changes when the full one is loaded.

@ChrisRackauckas
Copy link
Member

See my comment. We only need 7 functions out of the whole thing, and then we delete the second biggest contributor to load times after GPUArrays.jl. Just look at this:

julia> @time_imports using RecursiveArrayTools
    10.1 ms    ┌ MacroTools
    18.7 ms  ┌ ZygoteRules
    2.6 ms  ┌ Compat
123.9 ms  ┌ FillArrays
521.7 ms  ┌ StaticArrays
    18.4 ms      ┌ Preferences
    20.2 ms    ┌ JLLWrappers
173.5 ms  ┌ LLVMExtra_jll
    4.8 ms      ┌ CEnum
103.7 ms    ┌ LLVM
    2.0 ms    ┌ Adapt
815.3 ms  ┌ GPUArrays
    5.2 ms  ┌ DocStringExtensions
    40.3 ms  ┌ RecipesBase
    2.6 ms  ┌ ArrayInterfaceCore
    56.6 ms  ┌ ChainRulesCore
1822.5 ms  RecursiveArrayTools

Fix those two and we go from 1822.5 ms to 208.3 ms (since GPUArrays is what also pulls in LLVM). "StaticArrays and GPUArrays not having a core is what causes all of SciML to have an order of magnitude extra startup time" is not an exagguration, it's a proven fact with our tools and data. If those maintainers don't help out out here, than that would be depressing. I hope that we get a quick fix to this now that the cause is so obvious now!

@oschulz
Copy link

oschulz commented May 21, 2022

Fix those two and we go from 1822.5 ms to 208.3 ms

Wow! We should send a big box of whatever is his favourite to @IanButterworth for JuliaLang/julia#41612 !

@IanButterworth
Copy link

Cool! Nice to see it helping.

If you haven't already, try it in nightly. It reports compilation time and recompilation if any (due to upstream invalidations)

@oschulz
Copy link

oschulz commented May 21, 2022

It reports compilation time and recompilation if any (due to upstream invalidations)

Oh, that could really ... encourage ... package maintainers to take a closer look at a few things. :-)

@oschulz
Copy link

oschulz commented May 21, 2022

try it in nightly [...] It reports compilation time and recompilation

Will that still go into 1.8 as well?

@IanButterworth
Copy link

It was added recently and is not marked for backport, but perhaps that's reasonable.. might be worth suggesting it

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

10 participants