Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Improve ArrayOfSimilarArrays and VectorOfArrays (breaking) #20

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

oschulz
Copy link
Collaborator

@oschulz oschulz commented Dec 8, 2021

This PR is intended to accumulate breaking API changes for ArraysOfArrays v0.7 to address certain issues and add features required to support additional use cases.

Update: #26 has removed all requires

  • Supporting StatsBase.mean and friends with weights: If ArraysOfArrays can be made more lightweight, maybe it could become acceptable as a dependency for StatsBase. Semantically this would be nice, also for Distributions, since ArrayOfSimilarArrays resolved the confusion regarding which dimensions of a matrix are used for the "parameter" axis and the "n_th sample"-axis, with different defaults for rand and cov and so on. And things like Distributions.likelihood (already supports abstract vectors of vectors) could use fast matrix-backed implementations for ArrayOfSimilarArrays.

Update: StatsBase should handle this, the upcoming Base.AbstractSlices will provide a basis for specialized statistics operations on sliced arrays.

  • Supporting StaticArrays (using nestedview and flatview for zero-copy switching between matrix and vector-of-StaticArray representation). StaticArrays isn't very lightweight, but while ArraysOfArrays is very lean, it's not a super-slim API-only package either (but still lighter than StaticArrays). Could StaticArrays depend on ArraysOfArrays in principle? - @andyferris, @timholy may I pull you in here?

The new StaticArraysCore.jl allows us to support static arrays directly in ArraysOfArrays without heavy dependencies.

@codecov
Copy link

codecov bot commented Dec 8, 2021

Codecov Report

Merging #20 (20b9faf) into master (393a25d) will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##           master      #20   +/-   ##
=======================================
  Coverage   95.78%   95.78%           
=======================================
  Files           7        7           
  Lines         451      451           
=======================================
  Hits          432      432           
  Misses         19       19           

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 393a25d...20b9faf. Read the comment docs.

@devmotion
Copy link

devmotion commented Dec 8, 2021

if I may pull you in here? Could you see something like that happen (StatsBase depending on a light ArraysOfArrays)?

I would prefer if this would be solved language-wide with JuliaLang/julia#32310 and eachcol and eachrow would be used throughout the JuliaStats ecosystem. And I'm still confident that this PR will be merged at some point 🙂

@oschulz
Copy link
Collaborator Author

oschulz commented Dec 8, 2021

I would prefer if this would be solved language-wide with JuliaLang/julia#32310

I'm looking forward to that PR getting merged a lot myself. Will that one also bring a generic equivalent to flatview, do you think?

Still, JuliaLang/julia#32310 will not bring efficient vectors of similar matrices (e.g. to represent multiple samples of matrix-variate distributions) and so on, from what I understand.

@devmotion
Copy link

Still, JuliaLang/julia#32310 will not bring efficient vectors of similar matrices (e.g. to represent multiple samples of matrix-variate distributions) and so on, from what I understand.

It will support them and more generally vectors of any higher-dimensional array - e.g. you can take a four-dimensional array and with eachslice construct an AbstractVector of the slices along the fourth (or some other) dimension.

@cscherrer
Copy link

It seems to me ArraysOfArrays and JuliaLang/julia#32310 offer very different kinds of abstractions. ArraysOfArrays gives a more efficient representation for an AbstractArray{<:AbstractArray}, similar to what StructArrays does. But the underlying representation is an optimization, and things are set up so the end might never need to think about the fact that it's back by a single array.

The PR seems to me more related to views. The user starts with an array, and can slice it in different ways.

@oschulz
Copy link
Collaborator Author

oschulz commented Dec 9, 2021

It seems to me ArraysOfArrays and JuliaLang/julia#32310 offer very different kinds of abstractions.

Hm, yes and no maybe - looking a bit deeper into , it looks like Slices and ArrayOfSimilarArrays kinda do the same thing. On the other hand, Slices will not provide capabilities like append! which I need in packages like BAT to combine samples. On the other hand, it can slide in an arbitrary combination of dimensions. I think I'll need to keep ArrayOfSimilarArrays around at least in part after JuliaLang/julia#32310 .

@oschulz
Copy link
Collaborator Author

oschulz commented Dec 9, 2021

I've created #21 as a reminder regarding JuliaLang/julia#32310 .

@oschulz
Copy link
Collaborator Author

oschulz commented Dec 10, 2021

Here's a load time analysis ofArraysOfArrays, with and without support for StatsBase and StaticArrays and with and without using them (I ran everthing 22 time, discarding the first two runs):

ArraysOfArrays without Requires, no support for StatsBase and StaticArrays, no inter-dependencies:

  • mean time for using ArraysOfArrays: 0.085 s
  • mean time for using StatsBase: 0.170
  • mean time for using StaticArrays: 0.510

Requires-based support for StatsBase and StaticArrays:

  • mean time for using ArraysOfArrays: 0.095 s
  • mean time for using StatsBase, ArraysOfArrays: 0.280 s
  • mean time for using StaticArrays, ArraysOfArrays: 0.625 s
  • mean time for using StaticArrays, StatsBase, ArraysOfArray: 0.840 s

ArraysOfArrays depending on StatsBase, no StaticArrays support:

  • mean time for using StatsBase, ArraysOfArrays: 0.225 s

ArraysOfArrays depending on StaticArrays, no StatsBase support:

  • mean time for using StaticArrays, ArraysOfArrays: 0.730 s

ArraysOfArrays depending on StatsBase and StaticArrays:

  • mean time for using StaticArrays, StatsBase, ArraysOfArray: 0.775 s

So ArraysOfArrays by itself is very lightweight, but heavy enough that it would increase the load times of StatsBase noticeably. An AbstractArraysOfArrays package would solve this, but maybe JuliaLang/julia#32310 could provide an AbstractSlices type, this would do away with the need of a separate interface-only package.

StaticArrays is much heavier, and the relative increase on the load time due to ArraysOfArrays would be lower, but still too high to be acceptable, I expect - it would only add a bit of extra functionality after all. Here, an AbstractArraysOfArrays package that defines nestedview and flatview would be a good solution. Alternatively, one could have a separate ArraysOfStaticArrays package, but that would be almost empty and doesn't seem like an elegant approach.

Interestingly, the cost of using Requires is quite low in this case. I'm all for avoiding it in general - in this specific case though, it may be good to keep things as they are for a while longer until we know the final shape of JuliaLang/julia#32310 (i.e. whether it will include an abstract type for sliced arrays). @torfjelde and @cscherrer would that be acceptable for you? I doubt you'd see a significant difference in your package load times with and without Requires in ArraysOfArrays.

@oschulz
Copy link
Collaborator Author

oschulz commented Dec 10, 2021

The load time of ArraysOfArrays itself can probably not be improved much, it's already invalidation-free (in contrast to StatsBase and StaticArrays, but I those are probably already as optimized as they can be?):

julia> using SnoopCompileCore

julia> invalidations = @snoopr using StatsBase
1106-element Vector{Any}: [...]

julia> invalidations = @snoopr using StaticArrays
273-element Vector{Any}: [...]

julia> invalidations = @snoopr using ArraysOfArrays
Any[]

@devmotion
Copy link

Interestingly, the cost of using Requires is quite low in this case.

As expected, the timings show that Requires increases loading times in particular if the conditional dependencies are loaded. AFAIK this is a general observation with Requires. Some other problems are listed here: JuliaLang/Pkg.jl#1285

@oschulz
Copy link
Collaborator Author

oschulz commented Dec 10, 2021

As expected, the timings show that Requires increases loading times in particular if the conditional dependencies are loaded.

Indeed. And since the extra code activated here in that case is very compact, the overhead of using Requires compared to hard dependencies is relatively small in this case as well.

@oschulz
Copy link
Collaborator Author

oschulz commented Jul 6, 2022

Update: #26 has removed all requires. We have StaticArraysCore.jl now, and Julia v1.9 (and hopefully Compat.jl as well) will bring Base.AbstractSlices as a common basis for sliced-array-like data structures.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants