Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

What is the official similar interface? #11574

Closed
johnmyleswhite opened this issue Jun 4, 2015 · 30 comments
Closed

What is the official similar interface? #11574

johnmyleswhite opened this issue Jun 4, 2015 · 30 comments

Comments

@johnmyleswhite
Copy link
Member

While discussing #10525, it became clear that NullableArray{Int, 1} is an anomaly with regard to some of the assumptions made about AbstractArray{T, N}. Unlike Array{Int, 1}, which has an eltype of Int, NullableArray{Int, 1} has an eltype of Nullable{Int}. It is, to my knowledge, the only AbstractArray type whose first type parameter is not equal to its eltype. (This, of course, could be changed.)

This distinction affects how one interprets something like

x = NullableArray([1, 2, 3], [false, false, false])

similar(x, Float64, 3)

versus something like

x = NullableArray([1, 2, 3], [false, false, false])

similar(x, Nullable{Float64}, 3)

In the first variant, a call to similar(x, T, dims) produces a new object of typeof(x) <: AbstractArray{T, N} whose first type parameter is the passed argument T.

In the second variant, a call to similar(x, T, dims) produces a new object of typeof(x) <: AbstractArray{T, N} whose first type parameter is eltype(T).

More generally, this raises the issue: should the second argument to similar refer to the first type parameter of the resulting AbstractArray or should it refer to the eltype of the resulting AbstractArray? Or should we require all AbstractArray objects to match the type-parameterization assumptions that the first type is always the eltype and the second is always the order of the array?

@tkelman
Copy link
Contributor

tkelman commented Jun 4, 2015

Worth thinking about whether #11557 ties into this too? It's seeming like using eltype in as many places as we do right now isn't as general as one might like.

@mbauman
Copy link
Member

mbauman commented Jun 4, 2015

My stance on this is fairly firm: I think that what you choose to be the parameters of an AbstractArray subtype are implementation details specific to that type. They don't matter. You don't even need to parameterize an AbstractArray subtype (e.g., type BitArray{N} <: DenseArray{Bool,N}, or SparseMatrix{Tv,Ti} <: AbstractArray{Tv,2}). The fact that we typically do so in the same order as AbstractArray is just a code style convention.

What matters is what you report are the parameters to AbstractArray{T,N}; T must be its eltype, and N must be ndims. It's actually not possible to write methods that generically extract out the type parameters of an arbitrary leaf type, and that's a good thing (yes, I see the irony here with respect to #11547 😄).

So, for similar, I think that it'd be very confusing to allow specifying anything other than the AbstractArray's eltype because it's part of the generic AbstractArray interface. Going back to BitArray, similar(falses(10), Float64) is simply Array(Float64, 10) (even before #10525). I think that similar simply means "give me a chunk of mutable storage of a certain size that can support this element type."

@StefanKarpinski
Copy link
Member

I pretty much agree with everything @mbauman says. It's a little strange that

similar(NullableArray([1], [false]), Float64)

produce an Array{Float64,1} but it seems right.

@mbauman
Copy link
Member

mbauman commented Jun 4, 2015

Diagonal is an interesting case for similar in terms of its mutability (once setindex! is defined for its diagonal elements): #10001 (comment)

Is returning a Diagonal from similar mutable enough? Or does the entire array need to be mutable to satisfy the requirements of similar? What if the dimensions aren't square? Should it return a full array in that case? If so, then it'd be type unstable to return a Diagonal in the square case.

@johnmyleswhite
Copy link
Member Author

I mostly agree with what @mbauman is saying, but I think that specific behavior is really weird. In particular, I think it's odd that similar(NullableArray([1], [false]), Float64) produces an Array, but similar(NullableArray([1], [false]), Nullable{Float64}) produces a NullableArray. What should similar(NullableArray([1], [false]), Nullable{Nullable{Float64}}) produce?

Alternatively, why are we saying that Array{Nullable{Float64}, 1) should never be in the image of similar(NullableArray, ...)} It seems strange to me that you could ever get an array back, but certain arrays could never be produced.

@tkelman
Copy link
Contributor

tkelman commented Jun 4, 2015

Is returning a Diagonal from similar mutable enough?

I don't see how the thinking here could be any different than for SparseMatrixCSC, where similar returning anything other than another SparseMatrixCSC is just asking for an OutOfMemoryError.

What if the dimensions aren't square?

It looks like we don't have non-square Diagonal matrices right now, since the size is not explicitly stored as a field right now, it's assumed to always be square:

size(D::Diagonal) = (length(D.diag),length(D.diag))

#8240 would provide an equivalent though.

@mbauman
Copy link
Member

mbauman commented Jun 4, 2015

I think similar is just spelled wrong for the way we've been using it. If I remember right, I think its name led to a lot of my initial confusion on its purpose. Perhaps a better spelling would be alloc, or maybe even AbstractArray. I see it as a way to ask for an AbstractArray in such a way that an existing array can say something about what kind of array is returned. So @johnmyleswhite: similar(NullableArray([1], [false]), Nullable{Nullable{Float64}}) should yield a NullableArray{Nullable{Float64}}.

@nalimilan
Copy link
Member

Indeed, it helps to think about NullableArray{T} as being an efficient but fully compatible replacement for Array{Nullable{T}}.

@StefanKarpinski
Copy link
Member

This is a good point: when similar was first introduced, we could not call abstract types. Now it might make more sense for these to be methods of abstract types like AbstractArray.

@tkelman
Copy link
Contributor

tkelman commented Jun 4, 2015

What does call on an abstract type even mean? I guess the answer is "whatever we want it to," but that doesn't instill much confidence.

@StefanKarpinski
Copy link
Member

"Give me a reasonable value of this type". For example, you could write Integer(x) to get a reasonable kind of Integer for x – e.g. if x is Float64 then an Int, if x is BigFloat then a BigInt. Admittedly this is a little different since you're not converting a single value. I'm not saying it's what we should do, but it's worth considering as something that we now have the ability to do.

@mbauman
Copy link
Member

mbauman commented Jun 4, 2015

I was just spit-balling, but I'm liking it more and more. It certianly solves @johnmyleswhite's initial question: it'd be more obvious that AbstractArray(A, T, dims) gives you some subtype of AbstractArray{T}.

@tkelman
Copy link
Contributor

tkelman commented Jun 4, 2015

I'm liking it less and less. It's really "give me a reasonable value of a reasonably chosen concrete subtype of this abstract type" which is a bit too ambiguous for my liking. Especially if there start being odd mapping rules from the concrete type of A (combined with T in some interdependent way) to the concrete type of the output. If that mapping is always the identity then at least it's consistent and maybe not too bad, but anything else would feel fishy.

@StefanKarpinski
Copy link
Member

+1 about the identity mapping of element types (unless, of course, a different element type is asked for).

@tkelman
Copy link
Contributor

tkelman commented Jun 4, 2015

AbstractArray(A::AbstractArray) can have that property (and I actually meant return container type), but can AbstractArray(A::AbstractArray, T, dims) ?

@mbauman
Copy link
Member

mbauman commented Jun 4, 2015

Definitely not if its going to wholesale replace similar.

Once SparseVector is introduced, we'll want:

similar(::SparseMatrix, T, ::Int) -> SparseVector{T}
similar(::SparseMatrix, T, ::Int, ::Int) -> SparseMatrix{T}
similar(::SparseMatrix, T, ::Int, ::Int, ::Int) -> SparseCOO{T} # Maybe someday

This is essential for sensible indexing behavior.

@mbauman
Copy link
Member

mbauman commented Jun 5, 2015

Another thing to consider here: it'd also be nice to allow some sort of promotion to a common AbstractArray type when working with two or more arrays at the same time: #2326.

@tomasaschan
Copy link
Member

I think a few basic assumptions, valid for Arrays, makes this a lot easier also for more tricky cases:

  • similar(A) does the same as copy(A) without the actual copy of data (instead returning uninitialized memory)
  • adding arguments to similar only changes that specific aspect of the output - i.e. if the user didn't supply an argument about a specific aspect, that aspect should not change.

In practice, I guess especially the first point means that if copy(A) doesn't yield an Array, then similar needs to be extended with new methods to handle this. Exactly how T relates to eltype, etc, should IMHO depend on what is expected of a call to copy.

@mbauman
Copy link
Member

mbauman commented Jun 5, 2015

I like those rules, @tlycken, but I think I'd slightly amend each. I think similar(A) needs to return a different thing than copy(A) if A is immutable doesn't support setindex!. And I do think that there needs to be room for interaction between the container type and the element type and/or dimensionality.


Here's another idea: break similar into two parts. similar(A, Float64, (10, 10)) becomes:

T = promote_type(typeof(A), AbstractArray{Float64, 2})
T(Float64, 10, 10)

Then we make the constructor (Both T(::Type, ::Dims) and T(::Type, Int...)) a part of the required AbstractArray interface. This is now very explicit. Generic code that simply wants some sort of AbstractArray back can do so, and code specific to a type can make sure that T remains that type.

@mbauman
Copy link
Member

mbauman commented Jun 5, 2015

Hm, upon looking back at that, it doesn't quite do what we want. similar typically is used to say what the output eltype must be, whereas the above would promote all the passed eltypes.

@timholy
Copy link
Member

timholy commented Jun 5, 2015

I think similar(A) needs to return a different thing than copy(A) if A is immutable.

I'm probably missing your main point, but just because the container is immutable doesn't mean the elements are. Example:

immutable MyArray{T,N} <: AbstractArray{T,N}
    data::Array{T,N}
end
A = MyArray(rand(2,2));
A.data[1,1] = 0.5

works just fine.

Also, I'm skeptical about the promote idea, because you'll always get a non-leaf type unless people write specific promote_type methods (and then there's the ambiguity problem again...).

@mbauman
Copy link
Member

mbauman commented Jun 5, 2015

Yeah, I don't think promotion is quite the operation we want here. And see my edit — I was lazy and wrote immutable when I meant not-supporting-setindex!.

I'm still entertaining the idea that similar could be split up into two steps, though: some step to identify the type, and another to do the construction.

@timholy
Copy link
Member

timholy commented Jun 5, 2015

If a type doesn't support setindex!, then I think it's reasonable that it should provide a custom override of similar.

@tomasaschan
Copy link
Member

Also, for many AbstractArray subtypes, it's simply not possible to define a constructor T(::Type, ::Int...) - again, looking at Interpolations.jl this would make subtyping AbstractArray contra-productive.

My point for the first bullet was that similar(A) and A should be, well, similar, in all aspects. Type, eltype, dimensions, memory layout, whatever other characteristics the concrete type of A happens to have, should all be the same for similar(A). The only difference between similar(A), copy(A) and zeros(A) should be what contents they have.

My point with the second bullet, was that when the user supplies additional arguments, it is because they want to change some specific aspect of the result. Exactly which aspect that is, and what the parameters for that should be, might vary between different concrete types. similar(::Array, T) changes both the first type parameter and the eltype of the result to T, while similar(::NullableArray, T) changes the eltype to Nullable{T}. This is perfectly fine.

While writing this, I've come to realize that I don't think there even is a sane way to define what similar(::AbstractArray, args...) is supposed to do without knowing the concrete type of the array, at least not for a very general case that allows package authors and other AbstractArray-subtypers to not define a method or two for similar.

@mbauman
Copy link
Member

mbauman commented Jun 5, 2015

should be, well, similar, in all aspects

That's precisely why I think this is a naming bug. The way similar is currently used does not mean similar. :)

@mbauman
Copy link
Member

mbauman commented Jun 5, 2015

Anyone have a nice name for "ask an array (or ideally arrays) for some modifiable AbstractArray that it knows can support a given element type and dimensionality"?

@timholy
Copy link
Member

timholy commented Jun 5, 2015

To me, similar doesn't imply "the same," so I don't think it's that bad. congruent might be an alternative?

@johnmyleswhite
Copy link
Member Author

I'm ok with closing this, since I think my remaining concerns would require a very substantial reorganization of our abstractions to resolve.

@tkelman
Copy link
Contributor

tkelman commented Sep 8, 2015

defer to #10064 for more general abstraction design?

@tkelman tkelman closed this as completed Sep 8, 2015
@mbauman
Copy link
Member

mbauman commented Sep 8, 2015

👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants