Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: Add ArrayLike #34196

Closed
wants to merge 8 commits into from
Closed

RFC: Add ArrayLike #34196

wants to merge 8 commits into from

Conversation

bramtayl
Copy link
Contributor

@bramtayl bramtayl commented Dec 25, 2019

This is going to need a bit of explanation.

There is a meaningful idea of an array without a predefined eltype. The most notable examples are generators and broadcasts over arrays. If they could be <: ArrayLike, then they could automatically inherit much of the machinery already constructed for AbstractArrays.

The next step would be to audit all uses of AbstractArray in Base and the stdlibs and decide whether they could be widened to ArrayLike instead.

One way to do this would be simply to replace every occurrence unless the method dispatches on the eltype, i.e. AbstractArray{T} where T. There are two reasons why I didn't do this:

  1. It introduced a lot of hard to fix ambiguity errors
  2. I don't understand a lot of the code, so I'm not certain whether many of these methods could be meaningfully extended to arrays without a predefined eltype.

The good news is that this is barely breaking, and I only had to change one test (show_supertypes).

Closes #32940

tv = jl_svec2(tvar("T"), tvar("N"));
jl_abstractarray_type = (jl_unionall_t*)
jl_new_abstracttype((jl_value_t*)jl_symbol("AbstractArray"), core,
jl_any_type, tv)->name->wrapper;
(jl_datatype_t*)jl_apply_type((jl_value_t*)jl_arraylike_type, jl_svec_data(jl_svec1(tv_N)), 1),
tv)->name->wrapper;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wrote this code by guess and check because I don't know C, but it seemed to work

@bramtayl bramtayl changed the title Added ArrayLike RFC: Added ArrayLike Dec 25, 2019
@bramtayl bramtayl changed the title RFC: Added ArrayLike RFC: Add ArrayLike Dec 25, 2019
@@ -3396,7 +3396,7 @@ void jl_init_serializer(void)
deser_tag[LAST_TAG+1+i] = (jl_value_t*)vals[i];
i += 1;
}
assert(LAST_TAG+1+i < 256);
assert(LAST_TAG+1+i < 257);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure it this is valid???

@timholy
Copy link
Member

timholy commented Dec 28, 2019

Can you explain why this is better than

const AnyArray{N} = AbstractArray{<:Any,N}

@bramtayl
Copy link
Contributor Author

That could also be a workable option. I think that what makes this special is that this is a meaningful subtype relationship: arrays with an eltype are a subtype of arrays

@timholy
Copy link
Member

timholy commented Dec 28, 2019

Understood, but are there methods that would be defined for ArrayLike{N} that couldn't reasonably be defined for AnyArray{N}? If not, then I'm still not sure I understand the need for the subtype.

@bramtayl
Copy link
Contributor Author

I think the main difference is:

AnyArray <: AbstractArray vs.
AbstractArray <: ArrayLike

There are methods defined for AbstractArray that make use of an eltype that we wouldn't want to be inherited by ArrayLike objects.

@timholy
Copy link
Member

timholy commented Dec 28, 2019

I understand the difference in the type hierarchy. I am just asking about methods. I am not opposed to inserting something into the type hierarchy, but I'm trying to understand what advantages accrue if we decide to do so. #32940 asserts that this would be useful without really making it clear what problem gets solved that can't be solved by alternative means. I see Jeff wondered the same thing, #32940 (comment).

For example, there's always the option to define

Base.eltype(::Type{<:MyArrayLikeType}) = error("eltype is undefined for MyArrayLikeType")

That would automatically invalidate all methods that call eltype, without having to manually go through and decide which can be defined for ArrayLike and which require the "real" AbstractArray. Since one is easy and complete, and the other is harder and likely incomplete, I am trying to understand why you want to pursue the more difficult option.

@bramtayl
Copy link
Contributor Author

I guess that could work. Bottom line, this just seems like a more elegant solution to me.

@bramtayl
Copy link
Contributor Author

Or I guess, to be more specific, that solution would work, but it would require encoding an eltype of Any into the signature of an AbstractArray, but then error when an eltype is requested, which seems paradoxical.

@tkf
Copy link
Member

tkf commented Dec 28, 2019

That would automatically invalidate all methods that call eltype, without having to manually go through and decide which can be defined for ArrayLike and which require the "real" AbstractArray. Since one is easy and complete, and the other is harder and likely incomplete, I am trying to understand why you want to pursue the more difficult option.

@timholy I think it depends on what do you mean by "complete"; i.e., if you prefer false-positive dispatch (error when a method should not be called) or false-negative dispatch (the method really needs eltype). The latter scenario may arise in code like

f(array::AbstractArray) = # code using `eltype`
f(itr) = # code using mutate-or-widen approach

(I guess I'm repeating what @bramtayl just said.)

As Julia language usually encourages duck-typing, I can see that false-positive dispatch might be preferable. You can also "fix" the code above by using f(array::AbstractArray{T}) where {T} = .... But now that Julia is in the post-v1 era I think it also makes sense to play on the safe side. For example, the "fix" similar to the above example for IteratorEltype

IteratorEltype(::Type{<:AbstractArray}) = EltypeUnknown()
IteratorEltype(::Type{<:AbstractArray{T}}) where {T} = HasEltype()

is technically breaking (ATM, IteratorEltype(AbstractArray) returns HasEltype()).


Having said that, using a trait may be a better option. For example, skipmissing(::AbstractArray) is array-like but skipmissing(::Generator) is not. See: #31020 (comment)

@bramtayl
Copy link
Contributor Author

Well anyway, if people are interested in this I can take the next step of auditing the use of AbstractArray in Base

@timholy
Copy link
Member

timholy commented Dec 29, 2019

Certainly if there are cases that need to be intercepted prior to failing on eltype, that's an advantage with this approach. I'd say that getting to the stage of seeing a practical use case would be helpful.

@tkf
Copy link
Member

tkf commented Dec 29, 2019

@bramtayl It would be nice if you clarify your view on some challenges with ArrayLike{N} approach:

  1. An useful case may be to declare Broadcasted to be a subtype of ArrayLike{N} Faster mapreduce for Broadcasted #31020. However, it is difficult to do ATM as the dimension is not known until the Broadcasted object is instantiated. (One possible solution is to define (say) InstantiatedBroadcasted{N, ...} <: ArrayLike{N}. This is a breaking change.)

  2. Defining view(::ArrayLike, ...) and reshape(::ArrayLike, ...)is somewhat tricky. Returning a SubArray from view(::ArrayLike, ...) is not an option as the element type is unknown. (One option may be to add SubArrayLike wrapper struct and then re-define all methods in terms of a union (say) SubArrayLikeImpl = Union{SubArray, SubArrayLike}. But AbstractArray{<:Any,N} approach would be much nicer here.)

@martinholters
Copy link
Member

Can someone explain the conceptual difference between leaving the eltype undefined and defining it as Any? Doesn't it both mean that the type(s) of the elements can not be known without actually looking at them individually? What I'm getting at is: What would be an ArrayLike that couldn't be an AbstractArray{Any} and how do consumers act differently on those two?

@tkf
Copy link
Member

tkf commented Dec 29, 2019

@martinholters My understanding is that the common expectation is collect(x) isa Array{eltype(x)} if IteratorEltype(x) isa HasEltype, rather than collect(x) isa Array{<:eltype(x)}. That is to say, collect-like functions are "allowed" to minimize the element types of the output container if the input IteratorEltype is EltypeUnknown; but it's expected to preserve the element type if it is HasEltype.

@bramtayl
Copy link
Contributor Author

@tkf I agree. On the two challenges you mention:

  1. Yes, it would be nice. I haven't looked at the broadcast internals, but a simple (maybe non-performant answer) would to be call instantiate as part of the size method.

  2. I like the idea of adding SubArray <: SubArrayLike and ReshappedArray <: ReshapedArrayLike and again auditing base to replace, for example, SubArray with SubArrayLike where possible.

I haven't really gotten far enough along in my thinking to plan out an interface for ArrayLike objects, but it seems to me the biggest problem will be similar, because it's a fundamental part of the AbstractArray machinery that implicitly requires an eltype.

@tkf
Copy link
Member

tkf commented Dec 30, 2019

size actually already works for Broadcasted (in the way you mentioned). The problem is rather the type parameter N which would be required if we want Broadcasted <: ArrayLike. Adding this type parameter is very tricky because it is not determined at construction time (by design, I suppose).

I don't think we can do SubArray <: SubArrayLike and ReshapedArray <: ReshapedArrayLike as we don't have multiple inheritance. That's why I proposed the Union-based approach. To be clearer, something like

SubArrayLike <: ArrayLike
ReshapedArrayLike <: ArrayLike
AbstractArray <: ArrayLike
SubArray <: AbstractArray
ReshapedArray <: AbstractArray
SubArrayLikeImpl = Union{SubArray, SubArrayLike}
ReshapedArrayLikeImpl = Union{ReshapedArray, ReshapedArrayLike}

where SubArrayLikeImpl and ReshapedArrayLikeImpl are implementation detail used only for implementation sharing.

@bramtayl
Copy link
Contributor Author

Darn you're right. I wonder how much would be lost if all views and reshapes were ArrayLike?

@tkf
Copy link
Member

tkf commented Dec 30, 2019

I don't think that's possible as linear algebra functions often need to dispatch on the element type of these arrays.

@bramtayl
Copy link
Contributor Author

I don't think that's possible as linear algebra functions often need to dispatch on the element type of these arrays.

:( oh that's too bad

The problem is rather the type parameter N which would be required if we want Broadcasted <: ArrayLike

Would it be possible to compute this at compile time?

@tkf
Copy link
Member

tkf commented Jan 1, 2020

The problem is rather the type parameter N which would be required if we want Broadcasted <: ArrayLike

Would it be possible to compute this at compile time?

If we forget backward compatibility, I think it's possible but it'd still be hard to do it without increase compilation time. I think you need to ask @mbauman to be sure. But, as it'd be a breaking change, I think it's impossible to change this in Julia 1.x.

@bramtayl
Copy link
Contributor Author

bramtayl commented Jan 1, 2020

I thought that widening type parameters from AbstractArray to ArrayLike would be non-breaking, but it turns out that it can create ambiguity errors. Is there a go-to solution for these kind of problems?

@tkf
Copy link
Member

tkf commented Jan 3, 2020

I think the go-to solution is trait. One idea I had was to use the IndexStyle by defining it for all types:

struct NonIndexable <: IndexStyle end
IndexStyle(::Type) = NonIndexable()

As you can safely call IndexStyle for any type after this, indexability of a type can be queried just by calling IndexStyle(T). Users would then define something like

f(x::MyArrayLike) = _f(IndexStyle(x), x)
_f(::IndexCartesian, x) = ...
_f(::NonIndexable, x) = ...

This way, defining MyArrayLike <: ArrayLike is equivalent to defining IndexStyle(::Type{<:MyArrayLike}) = IndexCartesian() or IndexLinear().

@bramtayl
Copy link
Contributor Author

bramtayl commented Jan 6, 2020

Ok, I've audited Base, test, and the standard library for AbstractArray, AbstractVector, and AbstractMatrix. If the {T} parameter was extracted and used, I left them alone, and otherwise, I replaced them with ArrayLike, ArrayLike{1}, and ArrayLike{2} respectively. There are a few exceptions which I noted with the # specific comment required to resolve ambiguity.

I'm managed to convince myself that this should be non-breaking (again); the only way it could introduce ambiguities in a package is if that package is conducting piracy. Is that correct?

I used this overly aggressive strategy not because I think that all these methods should be extended to ArrayLike, but only because I think that someone more knowledgeable than me should be the one to decide whether or not it makes sense.

The next step is to see if I can't get Generator{Array} and Broadcasted to be ArrayLike.

@mbauman
Copy link
Member

mbauman commented Jan 6, 2020

I definitely get the need for improvements surrounding non-AbstractArray indexability, but I'm not sure this is it. I've been following this thread closely but I remain skeptical. My reservations include:

  • Why is dimensionality included in ArrayLike but not the element type? I think it's also likely to have an indexable object that doesn't know its dimensionality in the type domain.
  • I'm still not clear what the definition of ArrayLike is. Is it exactly an AbstractArray but without the type parameter? Is the documented AbstractArray interface actually an ArrayLike interface? If subtyping AbstractArray{Any} gives you additional functionality over subtyping ArrayLike — what is that additional functionality? Edit: Or is the more specific subtype actually a detriment in functionality?

@bramtayl
Copy link
Contributor Author

bramtayl commented Jan 6, 2020

I think it's also likely to have an indexable object that doesn't know its dimensionality in the type domain as it is to not know the element type.

I guess I assumed without thinking about it too carefully. Can you give some examples?

Is it exactly an AbstractArray but without the type parameter?

That's how I'm thinking of it.

Is the documented AbstractArray interface actually an ArrayLike interface?

I need to add documentation, but yes, that's how I'm thinking of it, with two exceptions: eltype and similar(A, T) to replace similar(A).

If subtyping AbstractArray gives you additional functionality over subtyping ArrayLike — what is that additional functionality?

I'm not sure about the "should" answer (what do you think?). Currently the "is" answer is that the extra functionality provided by AbstractArrays is simply everything without an ArrayLike method in the PR, that is, every function that dispatches on the eltype. But there are two reasons this could change:

  1. I think that some of the methods in this PR currently extended to ArrayLike should not be.
  2. We might add new generic methods for ArrayLike objects.

@mbauman
Copy link
Member

mbauman commented Jan 6, 2020

It just feels far too fiddly If the only difference between <: ArrayLike{N} and <: AbstractArray{Any,N} is that the latter preserves the Any in new (similar) outputs whereas the former does concrete widening.

When to preserve eltypes is a hard problem — especially in the context of Union{Missing, T}.

If we're inserting a new supertype here, I'd rather it represent a bigger departure from the requirements of AbstractArray.

@bramtayl
Copy link
Contributor Author

bramtayl commented Jan 6, 2020

That's fair. I'm not particularly wedded to this solution, just looking for a way to get index-able iterators within multiple dispatch.

@bramtayl
Copy link
Contributor Author

bramtayl commented Jan 6, 2020

If you want to see the kind of things that are possible with this PR, see my fork of MappedArrays https://github.com/bramtayl/MappedArrays.jl which I updated to be ArrayLike

@bramtayl
Copy link
Contributor Author

bramtayl commented Jan 8, 2020

@mbauman I started to look at the broadcast.jl code to see if I could get it to be <:ArrayLike{N}. It turned out to be remarkably easy:

struct Broadcasted{N, Style<:Union{Nothing,BroadcastStyle}, Axes, F, Args<:Tuple} <: ArrayLike{N}
    f::F
    args::Args
    axes::Axes
end

Broadcasted(f::F, args::Args, axes=nothing) where {F, Args<:Tuple} =
    Broadcasted{length(axes), typeof(combine_styles(args...))}(f, args, axes)
function Broadcasted{Style}(f::F, args::Args, axes=nothing) where {Style, F, Args<:Tuple}
    Broadcasted{length(axes), Style, typeof(axes), Core.Typeof(f), Args}(f, args, axes)
end

And because ArrayLikes automatically come with (most) of the AbstractArray machinery, I suspect that much of the broadcast machinery wouldn't be necessary, or, perhaps, the methods in the broadcast file could be used as generic methods for all ArrayLike objects. It seems like even if we don't export ArrayLike, this could be a win from the standpoint of code reduction?

@timholy
Copy link
Member

timholy commented Jan 8, 2020

If you want to see the kind of things that are possible with this PR, see my fork of MappedArrays https://github.com/bramtayl/MappedArrays.jl which I updated to be ArrayLike

To me it looks like you had to delete some nice functionality. What's the benefit?

@bramtayl
Copy link
Contributor Author

bramtayl commented Jan 8, 2020

I did have to delete some stuff, but hopefully some of it can come back:

  • Offset arrays: had to delete because the OffsetArrays package has a pirate-y similar method that turns into an ambiguity error. This could be fixed by adding a disambiguation method
  • Reinterpret arrays: I'm not sure how to change it to support ArrayLike objects, but maybe it could be done?
  • setindex! automatic type conversion: without an eltype, I don't know how we could support this.

Are those the things you were referring to?

The benefit is that it removes the need for the maybe a bit brittle testvalue function:

function testvalue(data)
    if !isempty(data)
        first(data)
    else
        zero(eltype(data))
    end::eltype(data)
end

If a type don't have a zero defined, and if the array contains a small union of eltypes (e.g. Union{Thing ,Missing}, or if the array is empty, then I don't think this method would work?

@timholy
Copy link
Member

timholy commented Jan 8, 2020

Yeah, that function is brittle. See JuliaArrays/MappedArrays.jl#29 for a potential workaround.

To me this mostly illustrates why we want traits. There's no doubt that it's nice to be able to wrap a more general object, but when it's wrapping an AbstractArray I want to be able to call eltype and have it be treated like an AbstractArray; I also want to have all the printing niceties that I get to inherit/support with AbstractArray. Since MappedArray has to be a subtype of something, the type system has to choose one or the other, but with traits we can have both.

I should get back to andyferris/Traitor.jl#8 someday.

@bramtayl
Copy link
Contributor Author

bramtayl commented Jan 8, 2020

I would be very happy with a trait based solution too!

@bramtayl
Copy link
Contributor Author

bramtayl commented Jan 9, 2020

Well, I guess I'll close this in favor of a trait based solution, and hope that it comes sooner rather than later.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Arraylike: AbstractArrays without an eltype
5 participants