Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add squeeze(A) method to squeeze all singleton dimensions #22000

Closed
wants to merge 1 commit into from

Conversation

staticfloat
Copy link
Member

@staticfloat staticfloat commented May 21, 2017

This addresses #11838, which gave me hope that this might be accepted. I'm not sure I agree with the "the user may accidentally shoot themselves in the foot" argument; it seems to me that this function does something useful (that at least a few people want) and the behavior is very straightforward, and so isn't something to cause much surprise.

The only surprising thing about this implementation is that passing in an array with nothing but singleton dimensions returns a zero-dimensional array:

julia> z = randn(1,1)
1×1 Array{Float64,2}:
 -0.454977

julia> squeeze(z)
0-dimensional Array{Float64,0}:
-0.454977

Of course, this is what happens with the old squeeze as well:

julia> squeeze(z, (1,2))
0-dimensional Array{Float64,0}:
-0.454977

```
"""
function squeeze(A::AbstractArray)
singleton_dims = [d for d in 1:ndims(A) if size(A, d) == 1]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe using a generator would be more efficient here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although it doesn't effect performance that much (we need to collect the generator output before passing it to squeeze(A, dims) in all cases) it is useless to create a temporary array, so I've done away with that, thanks.

1 3
2 4
```
"""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps add a warning re. the potential pitfalls of this method?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what I would write? This function is incredibly straightforward; it removes all singleton dimensions.

Looking back over the previous discussion, it looks like the concern is that you may pass in an NxMxL matrix expecting L to equal 1, but then being surprised when M also happened to equal 1 and thus the result is a 1-dimensional array. I'm not sure how to make the docstring more explicit that such a thing will happen; that is, after all, the entire point of this function.

@ararslan ararslan added the arrays [a, r, r, a, y, s] label May 21, 2017
@StefanKarpinski
Copy link
Member

I still don't think we should have this – it is inherently type unstable and will be an inevitable cause of buggy corner cases. Instead, I've previously proposed squeeze(r, A, dims...) which would do squeeze(r(A, dims...), dims...) for you, where r is a reducer like sum.

@staticfloat
Copy link
Member Author

The type instability is kind of the point; I have code that takes in a variable that on 0.5 ends up being a Vector, and on 0.6 ends up being an Nx1 Matrix. The operations I perform upon this variable were designed with a Vector in mind, and in a perfect world would work flawlessly upon both, but I don't want to have to go through all the library code that is explicitly expecting a Vector (often enforced with type annotations) and make sure it "does the right thing" when passed a Matrix that just so happens to look like a Vector. I want a function that takes in a Matrix, and if it can be made into a Vector, it makes it into a Vector. Yes this is inherently type-unstable, but as mentioned earlier, that's the whole point.

Half of the whole reason I'm proposing this is because the squeeze() function throws an error if you try to squeeze a dimension that doesn't exist. If it just returned the original vector when I tried to squeeze the nonexistant second dimension then I would be happy; but such a thing does not exist, so I am forced into code like this. Note that this is kind of the opposite of type-unstable; I am enforcing a common type upon varying inputs.

Personally, most of the reason I want this kind of a function is to enforce type stability, rather than create type instability. I want to accept Vectors, 1xN matrices, Nx1 matrices, etc.... and convert all of them to a Vector. Or I want to accept Matrices, NxMx1 tensors, etc..... and convert all of them to a Matrix. In a sense, I care less about what dimensions I'm starting with, and more about what dimensions I end with, so maybe we should attack this the other way around and come up with a smart way of squeezing with a deterministic return type, but in such a way that doesn't require me to perfectly determine the input dimensions so that I don't have to special-case on whether its a column vector or a row-vector, etc...

@andyferris
Copy link
Member

This is exactly what vec is for. The output dimensionality is always 1 but it accepts any shape.

@andyferris
Copy link
Member

(I'm beginning to labour this point - but since frequently Nx1 or 1xN matrices or row vectors are desired shapes in similar situations, functions for these in anology to vec might also be desirable... in general I feel functions which explicitly specify the output shape would be better than a generic squeeze whose behaviour can depend on dynamic size, leading to corner cases exposed at run time (and by a lack of inferability, of course))

@timholy
Copy link
Member

timholy commented May 22, 2017

@staticfloat, just to clarify: it's not possible to have squeeze(A) return a type that the compiler can infer---the sizes of the input matrix are a runtime variable, so there's no way for the compiler to know how many dimensions the output will have. So it can't possibly give you the type stability you seek.

In contrast, vec will give you a type-stable output, and is far clearer in any case.

@martinholters
Copy link
Member

Maybe a squeeze-like function where dims denotes the dimensions to keep would be useful. E.g.

nn(A, (1,3)) == A[:,1,:] # or error if size(A,n) ≠ 1 for any n ∉ {1, 3}

By making duplicate dimensions in dims an error, that would be type-stable as the result would always have length(dims) dimensions. (We probably have this and I just missed it, right?)

@staticfloat
Copy link
Member Author

staticfloat commented May 22, 2017

This is exactly what vec is for. The output dimensionality is always 1 but it accepts any shape.

This is close, and I will tend to use it, but vec() will happily take in a 5x5 Matrix and turn it into a 25-length Vector. That's not really what I'm going for.

Perhaps what I really want is the inverse of #21977, so that I can just give a higher-order tensor to a Vector() constructor, and it will squash things down deterministically, and error out if the operation doesn't make sense, as in the 5x5 Matrix example.

@mbauman
Copy link
Member

mbauman commented May 22, 2017

Hm, then maybe a decent spelling is something like squeeze(Vector, A) or squeeze(Matrix, A). Greedily remove singleton dimensions until you get to the specified number of dimensions (in the Array{<:Any}'s type). Type-stable and meaningful.

@staticfloat
Copy link
Member Author

Yeah, I like that a lot. Let's define it to remove dimensions from right to left, so e.g. squeeze(Matrix, ones(1, 5, 1)) would yield a 1x5 Matrix.

@nalimilan
Copy link
Member

It could also be useful for that function to add singleton dimensions when needed: squeeze(Matrix, x) would convert a vector to a one-column matrix. If we want that, then another name than squeeze should be found.

@StefanKarpinski
Copy link
Member

convert?

@nalimilan
Copy link
Member

convert would work, but currently it doesn't behave that way (and at #16633 people didn't seem OK with changing it).

@andyferris
Copy link
Member

convert?

Having a wider range of conversions or constructors that follow this behavior certainly would solve this problem. Though I'm not sure specifying the type is exactly what you want - really you want something that hooks into the similar interface and has certain rules on how to get the targetted dimensionality from the input's size (or even better, indices).

@mbauman
Copy link
Member

mbauman commented May 22, 2017

It's one thing to convert any and all iterables. It's another thing to convert something that's already an AbstractArray. In fact convert(AbstractMatrix, A) could use similar to allocate the output. I think this is well-defined enough to be a method of convert.

@andyferris
Copy link
Member

andyferris commented May 22, 2017

Right - I agree, a method for convert(::Type{AbstractMatrix}, ::AbstractArray) would be great. The thing which has concerned me in the last few days (in a few issues/PRs open at the moment) is that I'm betting people will much more frequently write Matrix(v) rather than AbstractMatrix(v), making their code less generic than if we just made a matrix(v) function (which can return any AbstractMatrix) and made this the official interface for such conversions, for instance...

@ararslan
Copy link
Member

For the sake of bikeshedding, one possible name for the inverse of squeeze(T, x) would be inflate(T, x). You squeeze out singleton dimensions, and you inflate by adding more singleton dimensions.

@StefanKarpinski
Copy link
Member

inflate is a good name; unsqueeze might be more obvious.

@timholy
Copy link
Member

timholy commented May 27, 2017

Another option for all this might be specialized index types:

  • A[!, :, :] would be shorthand for @assert size(A, 1) == 1; reshape(A, (size(A, 2), size(A, 3))). This is inferrable.
  • A[-,:,:] would drop the first dimension if it can (not type stable, and A[-,-,-] would be like squeeze(A), perhaps could be abbreviated A[-])
  • b[+, :, +] would create a 3d array from a vector b.

This is completely doable since we can dispatch on function types.

@ararslan
Copy link
Member

That's a really neat idea but it seems kind of obscure to me.

@timholy
Copy link
Member

timholy commented May 28, 2017

The main advantage is that it tells you which dimensions to "inflate." Say you want to convert a matrix into a 5-tensor, how do you specify which dimensions should be singletons?

@andyferris
Copy link
Member

APL indexing a dimension with column/row matrices also inserts a singleton dimension. Maybe we can utilise this somehow?

@mbauman
Copy link
Member

mbauman commented May 30, 2017

b[+, :, +] would create a 3d array from a vector b.

This is what numpy's newaxis() does, and we have a first-class equivalent for this one: [CartesianIndex()].

Interestingly, when I saw you passing functions to indexing as an email preview — without any other context — my first thought was that you might have been proposing a new idiom for specifying reductions. See #16606 for that digression.

mbauman added a commit that referenced this pull request Aug 29, 2017
This simple definition makes it easier to write reductions that drops the dimensions over which they reduce. Fixes #16606, addresses part of the root issue in #22000.
@staticfloat
Copy link
Member Author

I'm going to close this in favor of #23500

@ararslan ararslan deleted the sf/squeeze branch October 21, 2017 20:32
nickrobinson251 pushed a commit to invenia/julia that referenced this pull request Aug 31, 2019
This simple definition makes it easier to write reductions that drops the dimensions over which they reduce. Fixes JuliaLang#16606, addresses part of the root issue in JuliaLang#22000.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrays [a, r, r, a, y, s]
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants