embed tensor-like objects as higher-dimensional objects with trailing singleton dimensions #3262
First off, I entirely agree that there are many worrisome quirks (like …). But I'm a little unsure whether we should adopt broadcasting of scalar-like arrays. This comes close to violating one of the design principles of Julia: the types of a method's outputs should be uniquely determined by the types of its inputs. While allowing something like …
@johnmyleswhite Julia already broadcasts Numbers; rule 4 just extends this to scalar-likes. The only reason I included rule 4 was for consistency with MATLAB, where …
By hoops do you mean …? I could see an argument for changing that as well, but I'm not sure where things stop before we're universally treating scalar-like tensors as scalars.
@johnmyleswhite It's not an easy decision, but the sooner a small and consistent set of rules for working with multidimensional arrays is worked out, the better. In the proposal above, scalar-likes are indeed treated as scalars, except that they have a dimensionality. A Number would be a zero-dimensional scalar-like, a 1-Vector would be a 1D scalar-like, and so on. We should be able to do things like …
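The minimal sketch below pins down the terminology, using current Julia (1.x). The predicate isscalarlike is hypothetical and not part of Julia. It treats an object as scalar-like when every one of its dimensions has size 1, while ndims still reports the object's own dimensionality: 0 for a Number, 1 for a 1-Vector, and so on.

```julia
# Hypothetical helper (not part of Julia): an object counts as "scalar-like"
# when every one of its dimensions has size 1. Its ndims is left untouched.
isscalarlike(x) = all(==(1), size(x))

for x in (5.0, fill(5.0), [5.0], ones(1, 1), ones(1, 1, 1), ones(3))
    println(rpad(string(typeof(x)), 24), "ndims = ", ndims(x),
            ", scalar-like = ", isscalarlike(x))
end
```

A Number has the empty size tuple (), so the `all` check is vacuously true and 5.0 counts as a zero-dimensional scalar-like.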
It's definitely a non-starter to consider 1-element vectors or 1x1 matrices to be "scalar-like". Multiplying a 10x10 matrix by a scalar is fine, but multiplying a 10x10 matrix by a 1x1 matrix is, and should be, a dimension mismatch error. Likewise, … The fact that … There are only two significant problems above as I see it, both involving transposes of vectors: …
The first issue could be addressed by ignoring trailing singleton dimensions when comparing tensors for numeric equality (…). Introducing transposed tensor types solves both problems: …
This would also allow transpose to generalize to tensors by simply reversing the order of indices. The main reason we haven't done this is that I would rather the transpose be lazy (i.e. share data with the original vector), and that introduces a number of issues.
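For readers coming to this thread later: Julia 1.x ships a lazy transposed-vector wrapper of the kind described here. It is the Adjoint type from the LinearAlgebra standard library, and it shares memory with its parent. The sketch below assumes Julia 1.x. It checks that x'*x is a plain scalar, that the transpose is lazy, and that multiplying a 10x10 matrix by a 1x1 matrix is still a dimension mismatch.

```julia
using LinearAlgebra

x = [1.0, 2.0, 3.0]
xt = x'                      # lazy wrapper around x, no copy is made
println(typeof(xt))          # a LinearAlgebra.Adjoint wrapping the Vector

# A row vector times a column vector is an ordinary scalar, not a 1x1 matrix
s = x' * x
println(s, " ", typeof(s))   # 14.0 Float64

# Because the transpose is lazy, mutating x is visible through xt
x[1] = 10.0
println(xt[1])               # 10.0

# Transposing twice gives back a plain Vector
println(typeof((x')'))

# A 1x1 matrix is not a scalar: this is a dimension mismatch
try
    rand(10, 10) * rand(1, 1)
catch err
    println("error: ", typeof(err))
end
```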
This is a question of whether scalar-likes should be broadcast, not whether Numbers or 1-Vectors should be scalar-likes. I would also prefer not to do broadcasting for scalar-likes, so that … In MATLAB, …
I'm not sure what calling 1x1 matrices "scalar-like" means if it doesn't mean that they can be used like scalars. Regardless of whether you call a 1x1 matrix scalar-like or not, broadcasting is unproblematic: you broadcast along singleton dimensions and absent trailing dimensions, and all other dimension sizes must match exactly. When you broadcast two arrays a and b, the size of the result is

    map(i->max(size(a,i),size(b,i)), 1:max(ndims(a),ndims(b)))

@toivoh's new broadcasting code does this correctly and efficiently. In particular, multiplication acts as you propose:

    julia> rand(5,5)*rand(1,1)
    ERROR: *: argument shapes do not match
     in gemm_wrapper at linalg/matmul.jl:289
     in gemm_wrapper at linalg/matmul.jl:280
     in * at linalg/matmul.jl:94

    julia> rand(5,5).*rand(1,1)
    5x5 Float64 Array:
     0.633873   0.0593671  0.545569  0.438291  0.00419443
     0.590569   0.450254   0.360447  0.415297  0.200254
     0.461908   0.633632   0.406851  0.432888  0.027472
     0.109804   0.037443   0.298195  0.428459  0.455538
     0.0525897  0.554127   0.389851  0.069876  0.427595

Regardless of whatever other functionality it may have, I know that one thing should be true of the dot product: …
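The shape rule in this comment can be spelled out directly. Singleton and absent trailing dimensions stretch, every other size must match, and the result takes the larger size along each dimension. The function broadcast_size below is a hypothetical illustration of that rule, not Julia's internal implementation. The snippet assumes Julia 1.x and compares the rule against the shape that dot-broadcasting actually produces.

```julia
# Hypothetical illustration of the broadcasting shape rule. Julia's own
# broadcasting machinery computes the same shapes internally.
function broadcast_size(sa::Tuple, sb::Tuple)
    n = max(length(sa), length(sb))
    # An absent trailing dimension behaves like a dimension of size 1
    dim(s, i) = i <= length(s) ? s[i] : 1
    ntuple(n) do i
        da, db = dim(sa, i), dim(sb, i)
        if da == db || db == 1
            da
        elseif da == 1
            db
        else
            throw(DimensionMismatch("dimension $i has sizes $da and $db"))
        end
    end
end

a, b = rand(5, 5), rand(1, 1)
println(broadcast_size(size(a), size(b)))   # (5, 5)
println(size(a .* b))                       # (5, 5)
println(broadcast_size((3,), (1, 4)))       # (3, 4)
```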
The other issue that's come up here is that …
I see a scalar-like as something which has a number of dimensions but behaves as a scalar.
The fact that … The results of the following operations are all mathematically equivalent for a vector x: …
It is just wrong that the return types of these four operations differ and that some of them can't be used as scalars. What I would like is a uniform dimensionality for the return type and, whatever this dimensionality may be, for the result to be transparently usable as a scalar. The way things are now, I honestly prefer MATLAB's simple yet crude solution of removing trailing singleton dimensions. The problem is not limited to scalars. Analogously, I cannot pass …
I can see that an array of shape … Now, IIUC, mathematics does not recognize "type differences" as in "integer 2" vs. "float 2". The types are things we computer scientists put on things for our own purposes. So from a mathematical standpoint, one cannot complain about the types because they don't exist. However, one can complain about the results of functions such as … So it seems this is about allowing the scalars to be embedded in the set of matrices as …
@JeffBezanson Yes, the embedding analogy is a good way of describing the problem. I'm not sure conversion is the solution, though: if I have …
Could be of interest: the Tensor Toolbox for MATLAB explicitly stores the tensor's dimensionality, like Julia. From their TOMS paper:
Section 2.2 says:
The paper doesn't say why explicitly storing the dimensionality is a necessity. In other words, why not just drop trailing singleton dimensions, as MATLAB does (or, more precisely, have an implicit infinite number of trailing singleton dimensions)? Speaking from my experience as lead developer of Tensorlab, also a MATLAB toolbox for tensors, I don't mind MATLAB's convention so much; it has never been a hindrance in the development of Tensorlab. Only one operation comes to mind right now where the actual number of trailing singleton dimensions matters (and it is perhaps what the authors of the Tensor Toolbox had in mind): the outer product. If A and B are two n x n matrices, their outer product is an n x n x n x n tensor. If A and B are two n x n x 1 tensors, their outer product should be an n x n x 1 x n x n x 1 tensor. An optimal solution would, at least in my view, keep track of the dimensionality of tensors, yet treat things that look like scalars as scalars, things that look like vectors as vectors, and so on. Julia already does the former; the latter implies disregarding trailing singleton dimensions for most operations.
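The outer-product case is easy to check. The sketch below assumes Julia 1.x and defines the outer product elementwise as T[i..., j...] = A[i...] * B[j...]. The function name outer is hypothetical. The result's shape is the concatenation of the two input shapes, so an explicitly stored trailing singleton dimension survives into the middle of the result.

```julia
using LinearAlgebra

# Hypothetical outer (tensor) product: T[i..., j...] = A[i...] * B[j...].
# vec(A) * transpose(vec(B)) holds every pairwise product in column-major
# order, so reshaping to (size(A)..., size(B)...) lines up the indices.
outer(A::AbstractArray, B::AbstractArray) =
    reshape(vec(A) * transpose(vec(B)), size(A)..., size(B)...)

n = 2
A, B = rand(n, n), rand(n, n)
println(size(outer(A, B)))       # (2, 2, 2, 2)

# The same data, stored with an explicit trailing singleton dimension
A1, B1 = reshape(A, n, n, 1), reshape(B, n, n, 1)
println(size(outer(A1, B1)))     # (2, 2, 1, 2, 2, 1)
```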
@StefanKarpinski I'm curious about the issues introduced by a transposed vector type: what would they be? Could we make a vector a type containing both a value field and a direction field? The direction field would default to column vector, with row vector as the alternative value. Would this solve the laziness problem by having an independent value field?
@lsorber, from the way you're talking about treating things as scalar-like or vector-like, it seems that you may be thinking of some kind of automatic conversion or identification of types in Julia's type system. This isn't something Julia does or will do. We had initially considered it (in particular, identifying zero-tensors with scalars) but concluded that it's not a good idea for a variety of reasons, and that decision has turned out to be a sound one. Thus, zero-tensors, while they can be conceptually identified with scalar numbers, are not the same thing in Julia. Rather, they are just made to behave like scalars wherever possible, but they still have a different type and a different representation. Similarly, we are never going to have … [Taking a step back, the identification of vectors with column matrices is not a unique mathematical phenomenon: the integers are embedded in the rationals, which are embedded in the reals, which are embedded in the complex numbers. If you pay attention to their construction, however, that's really a sleight of hand. The integers can't really be a subset of the rationals because the very definition of the rationals depends on the existence of integers. What's really happening is that we identify the integers with a particular subset of rational numbers which happen to look and act just like the integers. The human brain is so good at dealing with ambiguous situations that this issue never really comes up; it just works. The way Julia handles this is similar: there is a subset of the Rationals that generally behaves a lot like the Integers; however, we only identify them behaviorally, and we do not try to make them actually the same type.]
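Current Julia still follows the approach described in this comment. In the sketch below, which assumes Julia 1.x, fill(x) creates a zero-dimensional array. Its type is distinct from that of the number it holds, yet it reports ndims 0 like a Number does, its value is read with empty brackets, and it takes part in broadcasting much like a scalar.

```julia
z = fill(3.0)                  # a zero-dimensional Array{Float64, 0}
println(typeof(z), " ndims = ", ndims(z))
println(typeof(3.0), " ndims = ", ndims(3.0))   # Numbers also report ndims 0

# The wrapped value is read with empty brackets
println(z[])                   # 3.0

# In broadcasting, z acts like the scalar it wraps
println(z .+ [1.0, 2.0])       # [4.0, 5.0]
```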
I can accept that more operations should tolerate trailing singleton dimensions. Yes, conversion is not a "solution", just another example of the embedding, although in some cases it can be a solution by adding definitions, as Stefan points out. The mere existence of both …
I don't understand why one would try to handle only trailing singleton dimensions. Perhaps there is something I'm missing, but take … for example. This is an assignment that currently doesn't work, and wouldn't work when handling only trailing singletons, yet it is unambiguous in intent and the array sizes differ only by singletons.
That is a separate issue (which shapes are compatible for assignment), covered by #4048. This issue is about a formal embedding, i.e., the sense in which a …
Now I see the reason why Julia treats … Let …
Let A have size n x n x n. I feel that A[1,:,:] should have size n x n, while A[1:1,:,:] should have size 1 x n x n. This is much more consistent and uniform (since 1:1 is a range while 1 is not). Currently, A[1,:,:] has size 1 x n x n in Julia. This is not intuitive and is inconsistent (since currently in Julia, A[1,1,1] is a scalar, not a 1 x 1 x 1 Array). I think A[1,1,1] being a scalar is the right behaviour, while A[1,:,:] being an Array of size 1 x n x n is not.
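For comparison, current Julia (1.x) behaves the way this comment proposes. Indexing a dimension with a scalar drops it, and indexing with a range such as 1:1 keeps it as a singleton. The quick check below assumes Julia 1.x, where dropdims is the current name of what this thread calls squeeze.

```julia
n = 3
A = rand(n, n, n)

println(size(A[1, :, :]))     # (3, 3): the scalar index drops dimension 1
println(size(A[1:1, :, :]))   # (1, 3, 3): the range index keeps it
println(size(A[:, :, 1]))     # (3, 3): the position of the scalar does not matter
println(A[1, 1, 1] isa Float64)   # true

# A singleton dimension can still be removed explicitly
println(size(dropdims(A[1:1, :, :]; dims = 1)))   # (3, 3)
```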
Intuition is subjective: if I slice the top layer off a cube (a three-dimensional array) and don't rotate it, it has the appearance (shape) of a layer of the cube which still has depth, not of the front face of the cube (a matrix).
@wenxgwen I also think that is inconsistent. In numpy these would be different shapes:
@pao I think it's hard to argue that this is intuitive: if the range is the last argument, they have different shapes!
I'm not arguing that Julia's doing the right thing, or doing it consistently, just that I don't believe that "intuition" is useful guidance here.
I think that this is one of the most critical issues to be decided before Julia gets too far along (certainly well before 1.0). I think some of the present rules are overly magic. I have slowly come to the belief that a simpler array indexing system is preferable. I would propose the following 4 rules for array indexing, reduction operations, and broadcasting:
I believe the above system would make it easier to reason about code and be less error-prone than the present system. Since it makes it easier to drop dimensions, uses of squeeze would go down drastically. However, the need to add dimensions will go up, so a simpler system for adding singleton dimensions would be helpful. There are array languages that have syntax for this, e.g., pseudo-indexes in Yorick, which uses '-' for this (but '!', '*', or '$' may be better choices). Thus:
This syntax would make broadcasting operations easy again and would make the intent clearer than in the present system. Let me know if I should move this comment into its own issue, since it addresses possible new syntax and goes beyond the present issue.
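Independently of any new syntax, singleton dimensions can already be inserted with reshape, which is enough to line arrays up for broadcasting. A small sketch, assuming Julia 1.x. The vectors x and y are made up for illustration.

```julia
x = [1, 2, 3]          # length 3
y = [10, 20]           # length 2

# Insert singleton dimensions so x runs down the rows and y along the columns
X = reshape(x, :, 1)   # 3x1
Y = reshape(y, 1, :)   # 1x2
println(size(X), " ", size(Y))   # (3, 1) (1, 2)

# Broadcasting now produces the 3x2 table of pairwise sums x[i] + y[j]
println(X .+ Y)
```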
Any proposal of dropping singleton slices really needs to address @toivoh's associativity problem, which no one has yet done.
@StefanKarpinski Are you addressing @BobPortmann? I'm not sure I understand the connection between dropping dimensions with integer indexes and reduction operations (which I fully agree with) and the dimensionality that results from matrix multiplication. @timholy What was your objection to reduction functions dropping dimensions? I know you've expressed skepticism about that, but I'm not sure why. I have a lot of code of the form

    x=randn(3,5)
    col_means = mean(x,0)
    z = x-col_means

where …
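The code above is NumPy-style Python: mean(x, 0) drops the reduced axis, and NumPy's broadcasting then aligns shapes starting from the trailing dimension. The sketch below shows why keeping the reduced dimension matters in Julia instead. It assumes Julia 1.x and its Statistics standard library. Julia's broadcasting lines dimensions up from the first one, so a dropped 5-element vector no longer fits against a 3x5 matrix.

```julia
using Statistics

x = randn(3, 5)

# Julia's reductions keep the reduced dimension as a singleton...
col_means = mean(x; dims = 1)      # 1x5 matrix
z = x .- col_means                 # ...so this broadcasts down the rows
println(size(col_means), " ", size(z))   # (1, 5) (3, 5)

# If the dimension were dropped, the 5-element vector would be matched
# against the first dimension (size 3), and broadcasting would fail
dropped = vec(col_means)
try
    x .- dropped
catch err
    println("error: ", typeof(err))
end
```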
Like #5949, this isn't a proposal to drop all singleton slices, merely a proposal to use scalar/range to encode drop/not drop. I.e., … @BobPortmann, perhaps there's a more technical meaning to "reduction" than I'm aware of, but I'd assert that "reduce" can apply at least as well to the size of an array (I've "reduced the amount of data in the array") as to its dimensionality. Personally, I'd prefer to keep the dimensions lined up when one does reductions.
@malmaud, what does … Currently both … to …
@timholy The opposite of … would be … which would do the right thing. I don't see how it is any harder. And if there were …, you wouldn't need it.
What worries me about dropping dimensions all the time is that dimensions …
As I argued in #5949, dropping dimensions in reductions would make it particularly difficult to write things like … When it comes to indexed views, I am not completely sure why the current way is less desirable than @BobPortmann's proposal. I think it is consistent and easy to understand, and it just works in many practical cases without using extra functions like …
The '0' is the dimension of the reduction (since Python is 0-based, it is the first dimension). My point was just that in Python, … edit: …
If we drop the dimension in the first place only to find that we have to add it back for the ensuing interactions with other arrays to work properly, then there is a question of why we would want to drop the dimension initially.
@StefanKarpinski It seems to me that the associativity problem you reference is different. I think that if you want …
@lindahua What if …
Yes, but it's an extra argument to every single reduction function.
@lindahua "then there is a question why we would want to drop the dimension initially". I dislike the special-casing of trailing dimensions (both Julia and IDL do this). Let x be a 3D array. I do think that if …
In my view, consistency means there is a simple principle or rule that everything conforms to. From this perspective, @BobPortmann, your proposal is consistent, treating both … The current Julian way is also a consistent framework, that is … Generally, there is more than one consistent way to define things. We have to carefully choose the one that provides the most practical benefit. I think the current Julia approach is a reasonable and sane one.
@lindahua It may be consistent, but trailing dimensions still get treated differently. In any case, I won't argue semantics; I think you know what I'm trying to say. I hope people give some thought to my proposal and don't just casually dismiss it because it is a big change. Many of the ideas in this issue are big changes, so I cannot understand why it is labeled milestone 1.0.
I mean, Matlab is also 'consistent' in that limited sense: the simple principle is that all singleton dimensions are dropped when squeeze is explicitly or implicitly called. But I think we all agree that's bad. I find it disturbing that

    function getslice(myarray::Array{Int,2}, dim, idx)
        # start from the full range along each dimension, then replace the chosen one
        indexers = Any[1:size(myarray, 1), 1:size(myarray, 2)]
        indexers[dim] = idx
        getindex(myarray, indexers...)
    end

is not type-stable, since the return type depends on whether …
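In current Julia (1.x), Base provides selectdim, which covers what getslice above is after for any dimension. An integer index drops that dimension and a range keeps it, so the number of dimensions of the result no longer depends on which dimension is sliced. A quick check, assuming Julia 1.x. Note that selectdim returns a view rather than a copy.

```julia
A = reshape(collect(1:12), 3, 4)

row = selectdim(A, 1, 2)      # like A[2, :]: dimension 1 is dropped
col = selectdim(A, 2, 3)      # like A[:, 3]: dimension 2 is dropped
blk = selectdim(A, 1, 2:2)    # like A[2:2, :]: dimension 1 is kept

println(ndims(row), " ", ndims(col), " ", ndims(blk))   # 1 1 2
println(row == A[2, :])       # true
```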
My point is not actually against @BobPortmann's proposal, just that many different ways can be seen as consistent, and therefore we may need stronger reasons for preferring one to another.
Certainly you're right. The question is, which happens more often: the current objectionable call to … I agree with Stefan that the extra argument to reduction functions is also annoying. On balance, I don't find the arguments about slicing out reduced dimensions very convincing. I also agree with @lindahua that the current scheme is consistent, and that there are multiple consistent solutions. It does, however, seem to me that the rules are simplest if you just drop scalar-indexed dimensions everywhere. @JeffBezanson, I agree that dimensions have meanings, but you can keep them by using range indexing rather than scalar indexing. However, one annoying consequence of this proposal is that any algorithm that accepts indexes but wants to keep the meaning of all the dimensions will need a call to something like … where … Remembering to insert this call will surely be a source of bugs, at least at first.
If you have a transposed vector type, then that type is precisely what taking a row slice of a matrix should give you. Which brings us back to exactly where we are right now. I think the best proposal aside from what we currently do is @toivoh's "absent" dimensions idea. This could be represented with a dimension size of -1, although that's an implementation detail. Dimensions sliced with scalars become "absent" rather than being removed entirely. Then a row vector is -1 x n and a column vector is m x -1. All dimensions are treated the same; the trailing ones are not special. Something about this bothers me, however, although I'm not entirely sure what it is. The scheme can be generalized by allowing negative dimensions of arbitrary size. Then we get something very much like "up" and "down" dimensions. But then it starts to get very complicated. Should a matrix have an up and a down dimension? Should it have two up dimensions if it's a bilinear form?
Re dropping dimensions in reduction operations, here's another reason: if we dropped dimensions, …
@timholy That doesn't have to be the case. It could do as …
Sure, if you pass the dims with a tuple, but there are good reasons to allow people to use vectors. |
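Current Julia keeps reduced dimensions. As a result, the output's ndims does not depend on which dimensions were reduced, or on whether they were passed as a tuple or a vector. A small check, assuming Julia 1.x. The dims keyword shown here is the 1.x spelling of the positional argument used in this thread, and dropdims removes the singletons when they really should go away.

```julia
A = rand(2, 3, 4)

s_tuple  = sum(A; dims = (1, 2))   # dims given as a tuple
s_vector = sum(A; dims = [1, 2])   # dims given as a vector
println(size(s_tuple), " ", size(s_vector))   # (1, 1, 4) (1, 1, 4)

# Dropping the singleton dimensions is an explicit, separate step
println(size(dropdims(s_tuple; dims = (1, 2))))   # (4,)
```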
There have been/are a multitude of issues and discussions on dimensionality compatibility between Arrays: #113, #141, #142, #231, #721, #1953, #2472, #2686, JuliaLang/LinearAlgebra.jl#8, #3253, julia-dev threads one, two, and probably many others.
The core problem is that Julia's rules for interacting with Arrays of different dimensions are just too difficult. In the following, let A be a Matrix and x be a Vector. For example, indexing can drop dimensions (e.g., A[:,i] is a Vector), yet reduction operators such as sum, min, std do not (e.g., sum(A,2) is a Matrix). Numbers are broadcasted, yet length 1 Vectors and Matrices are not (e.g., A+5 works, but A+x'*x does not). A solution to the latter example is to use A+dot(x,x), but this is neither intuitive nor convenient.

By now it is clear that the current situation is far from ideal. What Julia needs is a small and consistent set of rules for interacting with multidimensional arrays. Below I propose such a list, but I certainly do not wish to imply that this is the best solution.

First, I am of the opinion that Julia's decision to have each object have a dimensionality (a number of dimensions) is the correct one. In MATLAB, trailing singleton dimensions are ignored, yet there are many cases in which it is important to consider how many dimensions an Array has. So let's assume that each object should definitely keep track of its dimensionality.
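Several of these rules have since changed. The sketch below shows how the examples from the opening paragraph behave in current Julia (1.x), assuming the LinearAlgebra standard library is loaded. It is not part of the original proposal, and it uses the 1.x spelling sum(A; dims = 2) for what is written sum(A,2) above.

```julia
using LinearAlgebra

A = rand(3, 3)
x = rand(3)

println(A[:, 1] isa Vector)          # true: a scalar index drops the dimension
println(size(sum(A; dims = 2)))      # (3, 1): the reduced dimension is kept

# x'*x is an ordinary Float64, so it combines with A just like the literal 5
println(x' * x isa Float64)          # true
B = A .+ x' * x
println(B ≈ A .+ dot(x, x))          # true
```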
Second, let's consider broadcasting. In #1953, it was suggested that making the + and - operators broadcast for things other than Numbers is "a little too magical". I agree with this statement and think that .+ and .- are a good solution (so that + and - can throw dimension mismatch errors). As a side note, there are many other operators that would also benefit from broadcasting, such as mod, rem, max and atan2 (in general, most binary functions). The problem here is that both randn() and randn(1) are scalars, yet only the former is a Number. This is why A+randn() works, but A+randn(1) and even A+x'*x do not (but should!).

Third, trailing singleton dimensions are ignored when adding or subtracting arrays, promoting the return type of the result to the maximal dimensionality of the two arguments. This is behaviour a user would expect, and I would like to expand on this idea per #2686. If ones(3) + ones(3)'' works, then ones(3) == ones(3)'' should also be true.

Here is a small set of rules which hopefully represents the behaviour we desire:
- … ndims does not depend on the number of trailing singleton dimensions (unlike in MATLAB).
- … sum) do not change the number of dimensions. Another example is that dot(x,x) and x'*x should both have size 1 x 1. [*]
- … (… .+ and .-). [**]
- … ones(3)'*ones(3,1,1) is a 1 x 1 x 1 scalar-like and ones(3) == ones(3)'' is true.

[*] The only exception is if a function explicitly removes dimensions, such as squeeze.
[**] The only exception is that broadcasting is not enabled for equality testing with scalar-likes, e.g., ones(3,3) == 1 should be false.

For consistency, I would not even mind going so far as to remove rule 4, requiring people to write A.+5 instead of A+5. The additional required effort is small, and the former makes it clear that the developer is explicitly requesting broadcasting.
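Several of these points can be checked against current Julia (1.x). It went further than the closing suggestion here: A + 5 no longer works at all, and broadcasting is requested explicitly with a dot for any function. The sketch below assumes Julia 1.x, where atan2 from the proposal became the two-argument method atan(y, x). The matrix A is made up for illustration.

```julia
A = [1.0 2.0; 3.0 4.0]

# Adding arrays tolerates extra trailing singleton dimensions...
s = ones(3) + ones(3, 1)
println(size(s))                   # (3, 1)
# ...but equality does not: the shapes differ, so the arrays are unequal
println(ones(3) == ones(3, 1))     # false

# Equality with a number does not broadcast; .== does
println(ones(3, 3) == 1)           # false
println(all(ones(3, 3) .== 1))     # true

# Array + Number is an error; broadcasting must be requested with .+
try
    A + 5
catch err
    println("error: ", typeof(err))   # error: MethodError
end
println(A .+ 5)

# Any binary function can be broadcast with the dot syntax
println(mod.([5, 7, 9], 4))        # [1, 3, 1]
println(max.(A, 2.5))
println(atan.(A, 2.0))
```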