Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal: spec: strided slices #13253

Closed
ianlancetaylor opened this issue Nov 14, 2015 · 46 comments
Closed

proposal: spec: strided slices #13253

ianlancetaylor opened this issue Nov 14, 2015 · 46 comments

Comments

@ianlancetaylor
Copy link
Contributor

ianlancetaylor commented Nov 14, 2015

This proposal is intended as an alternative approach to the ideas discussed in https://golang.org/cl/16801 for #6282. This is for discussion only at this stage.

In Go a slice can be seen as a view into an array of some type T. The slice defines the start and length of the view. A slice also has a capacity, but that is mainly relevant when building an array. Once the size of an array is determined, a slice is a view into that array.

This is inconvenient for people who want to deal with multi-dimensional data. Arguments for this can be found in the links above, and I won't rehearse them here. What I want to propose here is a different approach to multi-dimensional data.

Go is a relatively low level language that pays attention to memory layout. The case we are discussing here is a multi-dimensional view of data that is in fact laid out linearly in memory. For cases where the data is not laid out linearly, one would naturally use a slice of slices, or some data structure involving pointers.

I propose adding a new type to the language: a strided slice. A strided slice of T is represented as [*]T. A strided slice has a backing array just as a regular slice does. However, the strided slice also has a stride, an integer. Each element of the strided slice is not a single element of the backing array, but the number of elements described by the stride. Indexing into a strided slice thus produces a slice.

A strided slice is created from an array or slice using a strided slice expression, whose syntax is *[stride]. The strided slice starts at the beginning of the array or slice, and its length is determined by dividing the length of the array or slice by the stride using a truncating division. A strided slice has no capacity (or, if you prefer, the capacity equals the length). Given an array [N]T or slice []T, a strided slice expression produces a strided slice of type [*]T.

In the type [*]T T can itself be a strided slice. For convenience we speak of the arity of a strided slice. The arity of [*]T is 1 if T is not a strided slice. Otherwise, it is the arity of T plus 1. The arity is the number of strides in the strided slice. To be clear: []int is a normal slice, which could be said to have arity 0. [*]int is a strided slice of arity 1. [*][*]int is a strided slice of arity 2.

In the following let's suppose we have an array or slice a of type T. Let's suppose we write ss := a*[S], giving us a strided slice of stride S and arity 1.

A strided slice can be used in an index expression. The result is type []T. The expression ss[i] is equivalent to a[i*S:(i+1)*S:(i+1)*S]. Bounds errors are checked just as for that slice expression. This value is not addressable and is not an lvalue. In fact, that is true of all expressions on a strided slice.

A strided slice can be used in a slice expression with two elements, producing a value of type [*]T. The expression ss[i:j] is equivalent to a[i*S:j*S]*[S].

A strided slice can be used in a strided slice expression, producing a value of type [*][*]T, which is a strided slice of arity 2. The expression ss*[S2] produces a strided slice whose first stride is S2 and whose second stride is S.

Now we discuss how these operations work on a strided slice of arity N where N > 1. Again the type is [*]T, but T is itself a strided slice. Assume we have ss := a*[S]*[S2]*[S3]... for some sequence of S2 S3....

The expression ss[i] produces a slice of strided slices of type []T. It is equivalent to a[i*S:(i+1)*S]*[S2]*[S3]....

The expression ss[i:j] produces a value of type [*]T. It is equivalent to a[i*S:j*S]*[S]*[S2]*[S3]....

The expression ss*[S0] produces a value of type [*][*]T. This produces a strided slice of arity N + 1 whose first stride is S0, with following strides S, S2, S3, ....

A strided slice may be used in a range statement. There are two forms. In the usual form, the range produces two values: an index into the strided slice and the result of the equivalent index expression (which will be a slice or strided slice). In the longer form, it may be used with exactly N + 2 values. The first value is the index using the outermost stride, then the index of the next stride, and so on. The penultimate value is the index into the normal slice that is the result of indexing into the final strided slice. The final value is the value in the underlying array.

The predeclared function len applied to a strided slice returns the number of valid indexes. For a strided slice of arity 1 and stride S and backing array a, this is len(a)/S using a truncating division.

The predeclared functions cap and append may not be applied to a strided slice.

A new predeclared function stride returns the stride of a strided slice, a value of type int.

Examples

Here are a couple of examples of using strided slices.

// A 4x3 matrix.
a := []int{1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3}
for i, j, v := range a*[4] {
   fmt.Printf("%d %d %d ", i, j, v)
}

// Matrix multiplication.
// m1 is d1 x d2, m2 is d2 x d3, producing d1 x d3 matrix.
func matmult(m1, m2 []int, d1, d2, d3 int) []int {
    r := make([]int, d1 * d3)
    ss1 := m1*[d2]
    ss2 := m2*[d3]
    ssr := r*[d3]
    for i := 0; i < d1; i++ {
        for j := 0; j < d3; j++ {
            for k := 0; k < d2; k++ {
                ssr[i][j] += ss1[i][k] * ss2[k][j]
            }
        }
    }
    return ssr
}

Implementation

A strided slice of arity 1 has a backing array, a length, and a stride. Thus the internal representation looks exactly like a normal slice, with the capacity field replaced by a stride field.

A strided slice of arity 2 has a backing array, a length, and two strides. In general, the internal representation of a strided slice of arity N is
struct {
array *T
len int
stride [N]int
}

The compiler always knows the arity of any given slice, so implementing the index and slice expressions is straightforward.

Rationale

Strided slices provide a general way to implement multi-dimensional arrays with no limit on the number of dimensions. They provide a flexible way to access linear data in multiple dimensions without requiring hard coded sizes or explicit multiplication. They are a natural extension of the slice concept into multiple dimensions.

@griesemer
Copy link
Contributor

This is an interesting proposal. Some initial questions and comments (I can't spell out "strided slice" each time, so I call them "strice" here - my apologies...):

  1. Is there a difference between [*][*]T and [*]([*]T)? If I understand correctly, the former is a strice with arity 2 which is implemented as a strice descriptor containing 2 strides and pointing to a single contiguous underlying backing array. The latter would be a strice with arity 1 holding elements that are arity-1 strice descriptors. Is there a way to differentiate the two? The latter I could produce as follows: var a [][*]T; x = a*[s]. Am I missing something? (I think an arity 2 strice is really a 2-dimensional thing laid out flat in memory, so it needs some notation [<something> , <something>]T to differentiate from [<something>][<something>]T).

  2. A strice expression can be used to go from an array or slice to a strice. Is there a way to go from strice back to a slice such that multiple strice elements are "combined" into a slice? I think this would make it possible to reshape such multidimensional structures which might be useful.

  3. Notation: I am worried about syntactic problems with strice expressions. For one a * [s] looks like a multiplication of sorts (while it really divides an array or slice into strice elements). Also, there may be syntactic ambiguities as in a * [s]T (which way does a strice expression bind). Not to mention that we already use * for both multiplication, indirection, and to denote pointer types. Instead of *, / comes to mind, except that we divide into sub-elements of length s rather than into s sub-elements. Maybe a/[s]? Or even introduce a new operator \: a * [s] could be written as a\s (as in split a into elements of length s), which would look very clean on the page.

@griesemer
Copy link
Contributor

PS: To expand on my point 1): A strided slice is really a 2-dimensional slice where the 2nd dimension has a stride (= length of 1st dimension). Or in other words: [*]T is [,]T, a 2-dimensional slice. An n-dimensional slice has n-1 stride values in the descriptor.

@nsf
Copy link

nsf commented Nov 15, 2015

Hi, I was just passing by and noticed this discussion, which is rather interesting to me (as a hobby I write voxel stuff in C++ sometimes: https://github.com/nsf/nextgame/blob/master/source/Geometry/HermiteFieldToMesh.cpp). Just wanted to point that out, because it means I work with 3 dimensional data a lot and I care about efficiency. So, I hope you don't mind some input from just a stranger who also happens to be a Go user. After I introduced myself, few comments:

  1. I agree with @griesemer, [*][*]T syntax is slightly confusing. Because it implies a "slice of slices" kind. Similar syntax dispute can be seen in C# community, where they have both arrays of arrays as int[][] and multidimensional, but contiguous arrays as int[,]. A note about implementation implies that N-arity strided slices are a single thing and not a "slice of slices" kind, hence using something like [,,,]int for N-arity makes more sense. I will use [,]int syntax from now on.

  2. While technically saying that arity is a number of strides in strided slice is correct and hence [,]int is a 1-arity strided slice, imho people will find it confusing. Essentially 1-arity slice represents a two-dimensional array. Just wait when someone replaces arity with dimensionality and starts to say 1-dimensional strided slice. Perhaps it can be seen as correct as well, but confusing. My proposal: in terminology let's keep things closer to dimensionality of arrays, this is what most people are familiar with. [,]int is a strided slice which gives access to two-dimensional data. [,,]int is a strided slice which gives access to three-dimensional data.

  3. A good focus point in explanations would be to emphasize that stride is not dimensionality, while it may seem like one for 2-dimensional arrays. For a 3-dimensional array 3x3x3, strided slices will have two strides, one is 9 and the other is 3. Of course experienced people understand the difference, but education is important. It's a small note, when explaining strides - start with a 3-dimensional arrays.

    For math geeks, offset to an element calculated using strides:
    offset = i1 * S1 + i2 * S2 + ... + iN * SN

    This is what our CPUs love the most - linear math.

  4. As for "strice expressions" as Robert calls them. We can use a similar syntax mentioned above: a[D,], which is an equivalent to a[D,len(a)/D]. For more than 2-dimensions the math gets slightly more complicated, but the idea is the same: a[D1,D2,D3,]. A good question would be: why a[D,] and not a[,D]. And given what I said in point number three, why I use D here instead of S.

    Let me elaborate on this a bit more. One of the most common things you will use this "strice expression" for is to go from slices to strided slices and back. See, we need a way to go back. I'll explain going back in point number 5, but let's talk about slice -> strided slice path first.

    People are comfortable with dimensions. The stride matches one of the dimensions of an array in 1-arity case. But in 2-arity (3 dimensions) and further, strides are different from dimensions and when you perform "stricing", doing math manually seems pointless. However, people get confused by the whole "row-major" vs "column-major" thing. And the answer to that problem is really simple, because both are invalid when we talk about computing. Computer memory is linear and terms rows/columns do not apply here. In that regard I appreciate the stride term. But let's try to combine the two and put it this way for Go: first dimension affects first stride. We don't have rows and columns, so I'm not saying that first dimension is a row or column, both would be incorrect, but it does define a first stride value and in this sense it has a very perfect low level and semantical meaning. And this is how we will define multidimensional arrays in Go, in terms of strides. The formula for calculating strides from dimensions would be (I use functional notation instead of subscript here, sorry, markdown limitations):

    S(N) = S(N-1) / D(N), when N > 0
    S(0) = len(slice)
    where S - stride, D - dimension
    

    Let's go super formal and define slices as strided slices with stride equals 1 (and in addition they return elements instead of slices). Checking our formula. Having a slice of 10 elements, S(1) = S(0) / D(1). Slice is a 1-dimensional thing and size of the dimension matches its length. Hence S(1) = 1. Seems correct.

    Let's say I have an array of 12 elements and I want a strided slice with stride 2, as a result we call it a two-dimensional array with dimensions: 6x2 (in that order). This will be the only order defined by Go semantically. What matters here is memory representation and strides.

    S(1) = S(0) / 6 = 2
    S(2) = S(1) / 2 = 1

    Because indexing a striding slice returns a strided slice (for our formality experiment nothing but strided slices exist). We can see how stride of a returned slice is 1, which makes it an ordinary slice.

    We would write this in code this way: ss := s[6,] or ss := s[6,2]. Because our notation of dimensions say that the first dimension affects the first stride and the second one affects the second stride. Of course in reality there is only one stride value, because the last one is always 1. This is also why last dimension is optional. We know the last stride is always 1. Stride of 1 means we have a normal slice. I hope you get what I mean.

    Similar for 3 dimensions. Let's say I have an array of 27 elements and I want a strided slice with strides 9 and 3, which represents a 3x3x3 array. So I can do it this way: ss := s[3,3,]. Checking math:

    S(1) = S(0) / 3 = 27 / 3 = 9
    S(2) = S(1) / 3 = 3
    S(3) = S(2) / 3 = 1

    In my opinion this way of connecting abstract term dimension with a very precise and technical term stride will yield a lot of benefits for Go programmers. Forget about rows and columns, please!

  5. And the final part. When we work with multi-dimensional arrays, we store them in memory and in other places as one-dimensional data. As a result of that process we often need to convert multi-dimensional data to a slice. For example if I have a 128x64x64 voxel array, at some point I might decide to copy a 2x64x64 chunk at the bottom of this array from another one. The question is - how would I do it with strided slices? ss[0] yields a strided slice which represents another strided slice which represents 64x64 array. Can I get a slice with length 2*64*64 directly? Can I convert arbitrary part of a strided slice of any arity to a slice? While most operations which operate on a part of N-dimensional array do it on a line level, sometimes I need that low level access to the whole data. You can even view it simpler. What if I have a strided slice of 128x64x64 3d array. Can I get a slice of length 128*64*64, because I want to pass it down to a compression algorithm? Seems like Ian mentioned some of that, but using *-based syntax is very confusing.

    I think if we just use [,,] syntax it will be easier to understand. Let me try to make a few examples, based on my 128x64x64 array above.

    ss := s[128,64,64] creates a strided slice [,,]T
    ss[0] yields a 64x64 strided slice [,]T
    ss[0:2] yields a 2x64x64 strided slice [,,]T

    Which is great for getting strided slices from strided slices, but how do we get slices from strided slices? At some point Go had implicit conversion from arrays to slices, I don't think we need to repeat that idea with strided slices and allowing implicit conversion from strided slices to slices. But we have a lot of libraries which use slices, how can we connect two worlds?

    Technically it seems it should be allowed to convert any strided slice to a slice, but what's the syntax? Perhaps explicit conversion would suffice? []T(ss)?

    Let me give you one more example why I need it:

    var a [,,]int // strided slice of 128x64x64 voxels
    var b [,,]int // strided slice of 128x64x64 voxels
    // copy a chunk of 2x64x64 at the bottom of 'b' to the top of 'a'
    copy([]int(a[len(a)-2:]), []int(b[:2])

    Explicit conversion ruins fun quite a bit, but we need to connect the two worlds somehow.

  6. And the final final part is just an idea. If we talk about efficency, can we have fixed-size strided slices? Syntax allows the type: var a [128,64,64]int. Pros: no need to store stride values anywhere, Cons: size is fixed. It seems we can allow "stricing" a fixed-size strided slice just like slicing an array.

@nsf
Copy link

nsf commented Nov 15, 2015

Oh, small correction. I mentioned syntax for fixed-size strided slices as [128,64,64]int, but it uses dimensions, not stride values. What I said about dimensions and strides applies. Perhaps for readability sake it would be a good idea to make it this way. [128,64,64]int in terms of dimensions means [4096,64,1]int in terms of strides. Another example why calculating strides by hand is not fun at all, compiler can do that for us.

@nsf
Copy link

nsf commented Nov 15, 2015

Another idea for strided slice -> slice deconstruction.

ss := s[128,64,64] 128x64x64 strided slice of type [,,]int
ss2 := ss[,] deconstruct slice to 8192x64 strided slice of type [,]int

You can think of it as taking slabs and instead of keeping them one on top of another put them into line.

E.g. ▤ becomes _ _ _ _ _ _

It's possible to apply slicing during deconstruction:

ss2 := ss[0:4,] deconstruct slice into 256x64 strided slice of type [,]int

But you can deconstruct it directly into a slice also!

s2 := ss[,,] yields 524288 slice of type []int

My example of copying 2x64x64 from bottom of b to top of a would look like this:

var a [,,]int // 128x64x64 strided slice
var b [,,]int // 128x64x64 strided slice
copy(a[len(a)-2:,,], b[:2,,])

I know it's a bit insane. But it's just reversing the process.

@somadivad
Copy link

The following are a couple of features that are important in machine learning applications, and that I would be interested to understand whether strided slices can support. (These features are supported by numpy ndarrays and torch tensors, for example.)

  • The ability to manipulate the "shape" of the multi-dimensional array on the same underlying data. For example, given a [30] slice, to construct a [5,6] slice on the same data; given a [5,6] slice, to construct a [6,5] slice which represents the transpose. These are all operations which amount to having different strides on the same data.
  • The ability to subslice along any dimension, not only the first. For example, given a [h,w] slice, to select either a [h',w] subslice or a [h,w'] subslice or a [h',w'] subslice.

In order to implement these requirements, I think the internal representation would need to be something more like this:
struct {
data []T
shape []int
stride []int
}

@nsf
Copy link

nsf commented Nov 15, 2015

The ability to subslice along any dimension, not only the first. For example, given a [h,w] slice, to select either a [h',w] subslice or a [h,w'] subslice or a [h',w'] subslice.

Seems like a much higher level abstraction than the idea behind strided slices. As far as I understand strided slices always point to a contiguous chunk of memory.

The ability to manipulate the "shape" of the multi-dimensional array on the same underlying data. For example, given a [30] slice, to construct a [5,6] slice on the same data; given a [5,6] slice, to construct a [6,5] slice which represents the transpose. These are all operations which amount to having different strides on the same data.

Similar problem. If you mean having a transposed abstraction without changing underlying data - that's a violation of being contiguous. We can however transpose data by constructing another slice and "stricing" it using right dimensions.

The main problem here is that stride slices proposed by @ianlancetaylor do not define strides per dimension and separate rules for accessing data of each dimension.

E.g. when you say ss[3][4], ss[3] must return a slice, which implies contiguous memory on last dimension. Virtually transposing something implies that last dimension becomes non-contiguous.

In a way strided slice is a very low level abstraction, which basically says how you should work with multi-dimensional data on modern CPUs by enforcing a set of strides sorted in a descending order and iterating starting from the smallest stride first (if we take "virtual" strided slices into account, it has a stride which always equals to 1).

@somadivad
Copy link

@nsf Yes, I agree that the abstraction I described goes beyond what strided slices are offering, and is slightly more difficult to implement. My point is that I think this is what users actually want. To me, strided slices as proposed don't appear to offer any very significant advantage over slices of slices, [][]T. In my eyes, the benefit does not justify the cost.

(Incidentally, it is of course possible to build a type with functionality equivalent to numpy ndarray or torch tensor in golang. The thing that I as a user want is the syntactic sugar to make such a thing more pleasant to use.)

@RalphCorderoy
Copy link

The proposal might benefit from examples of how to verbalise the suggested syntax, e.g. how is m2*[d3] spoken? Also, * does seem overworked already, as griesemer said; could range be put to use? It's more wordy, but then these aren't going to be used in most functions.

@sbinet
Copy link
Member

sbinet commented Nov 15, 2015

one of the problematic point in the alternative table proposal by @btracey et al., was the reflect representation of such types (thus, the table only addresses 2D tables.)
could you clarify how you would address this point?

(thanks for doing this, BTW)

@yiyus
Copy link

yiyus commented Nov 16, 2015

I think that a slice expression with two values should reduce the arity. For example, if ss has type [*][*]T, the expression ss[:] would return a strided slice [*]T, and ss[:][:] would return a slice with all the elements. This would make reshaping trivial: ss[:][:]*[m]*[n]. Some sugar would be needed for the usual case ss[i:j]*[stride(ss)] (which is the case covered now by the proposal).

I agree with other people that there must be a better syntax, but I like the concept and really appreciate that some thought is being given to this problem.

@btracey
Copy link
Contributor

btracey commented Nov 16, 2015

@ianlancetaylor Could I ask if there are specific problems you are addressing with my proposal? I ask to understand how my proposal could be shifted to address your concerns (assuming you have them and don't feel it is fundamentally flawed).

The main advantage in your proposal is the ability to go from a single slice into a []T. Effectively, this is allowing a "reshape" operation in matlab/numpy speak. The downside to your proposal is that I don't see how to do rectangular views in your proposal. One may slice the rows with your [] expressions, but one may not slice all of the columns. That is, the t[a:b, c:d] operation in my proposal does not seem possible.

If views are not important, then matrix multiplication can be coded as

func matmul(a, b, c [*]float64)

But slicing is a crucial property for matrix operations. Almost every Lapack function uses matrix views (though it's hard to see from the code as it's coded with the single slice mentality). As a result, matrix multiplication must be coded with (multiple) size parameters per slice, which defeats one of the major benefits of the table data structure.

You say "A slice also has a capacity, but that is mainly relevant when building an array." I don't think that's true. Capacities are also very important for append and append-like operations. This is especially important for tables/strided slices where no built-in append exists. A frequent use case of mine is building up a matrix row-by-row as new data enters. I don't see how that would be possible with strided slices without also remembering the effective number of rows in the matrix.

@nsf Fixed sized strided slices are already allowed -- arrays of arrays. You can already declare a [128][64][64]T

@nsf @somadivad : The ability to easily reshape easily and the ability to take views easily are basically mutually exclusive without some wrangling. As mentioned above, it seems like taking views on []T is impossible. With my tables proposal, you can take views easily, but because the stride is >= to the number of columns (and not =), you cannot reshape arbitrarily. A matrix with rows=m, cols = n, stride = k, the "flattened slice" has size mk. What use cases for reshaping do you have @somadivad? A reshape as you propose above is not the same as an actual transpose operation.

@ianlancetaylor Just as a note to short circuit the conversation, it's easy to extend my proposal to slices of arbitrary dimension. There are also opportunities with slicing in the last dimension(s), i.e. t[2,:] --> []T. This would have a similar effect as strided slices, and could lead to the similar simplifications.

@kortschak
Copy link
Contributor

This misses the possibility of a number of significant matrix operation optimisations.

A very clear example of this is the optimisation available when multiplying block diagonal matrices. This operation allows massive savings to be made by multiplying the non-zero blocks, but depends on the possibility of the width of the matrix not having identity with the stride. I don't see how that is possible here.

@somadivad
Copy link

@btracey The main use case I have for reshaping is as follows:
I work with neural networks, and the input is often image data, with shape (c, h, w) (c = number of channels, eg 3 for colour). It needs to be this shape to pass through convolutional layers, ending up as shape (c', h', w'). But it then needs to pass through linear layers, so it needs to be reshaped to (c' x h' x w'). And then for the backward pass, that reshaping needs to happen in reverse.

Transposing is also used in neural networks. We use a weight matrix on the forward pass, and its transpose on the backward pass. (Admittedly, there are ways round this.)

Please can you explain your comment "The ability to reshape easily and the ability to take views easily are basically mutually exclusive without some wrangling". Perhaps I'm misunderstanding what you mean, but I would say that both numpy ndarray and torch tensor offer this functionality.

@nsf
Copy link

nsf commented Nov 16, 2015

Fixed sized strided slices are already allowed -- arrays of arrays. You can already declare a [128][64][64]T

Array is not a slice. Slice is a pointer to an array. Fixed-size strided slice would still be a pointer to an array.

@kortschak
Copy link
Contributor

kortschak commented Nov 16, 2015 via email

@somadivad
Copy link

@kortschak Ah ok, I understand. But I'm still not quite sure what conclusion you or @btracey are trying to draw from this.

So in numpy ndarray or torch tensor, you can take non-contiguous views (eg column in a row major matrix), and you can reshape arbitrarily - but the price you pay is that reshape may have to take a copy of (part of) the data. (An alternative design decision would have been for reshape to throw an error when called on a non-contiguous view, thus forcing the user to explicitly clone() in that case. However, both numpy and torch have chosen the implicit copy route.)

So one option in the design space is to support non-contiguous views and reshape, and put up with this limitation in how they interact. But I suppose another option is to disallow one (or both) out of non-contiguous views and reshape. Is this what you and/or @btracey prefer?

@kortschak
Copy link
Contributor

kortschak commented Nov 16, 2015 via email

@btracey
Copy link
Contributor

btracey commented Nov 16, 2015

Similarly, @somadivad , we definitely agree transposes are important. In fact, transpose is one of the three methods in the matrix interface (https://godoc.org/github.com/gonum/matrix/mat64#Matrix). We want to support nearly cost-free transposes without having to copy.

Reshaping is not the same thing as transpose, even with a copy. Take the linear elements 0-7 as a 2 x 4 matrix

0 1 2 3 
4 5 6 7

If we reshape this to a 2x4 matrix, we get

0 1
2 3
4 5
6 7

which is not the same thing as taking a transpose of the 2x4 matrix. Transpose is a distinct operation.

@nsf [128][64][64]float64 is indeed an array, but *[128][64][64]float64 is a pointer to a fixed-size array.

@somadivad
Copy link

@btracey Agreed, reshape and transpose are different. What I was trying to say is that they are both operations which amount to using different strides on the same underlying data.

@nsf
Copy link

nsf commented Nov 16, 2015

[128][64][64]float64 is indeed an array, but *[128][64][64]float64 is a pointer to a fixed-size array

Doesn't help much. What I mean by fixed-size slices and this case comes up often in voxel work. Is this: I have a 3d array, say 256x64x64, but I also need to be able to "view" its parts, like four 64x64x64 ones. It's only possible via slices.

@btracey
Copy link
Contributor

btracey commented Nov 16, 2015

@somadivad Transpose is not the same thing as using a different stride on the same data. To do a cost-free transpose, one would have to change not the stride, but the column-major / row-major opinion on the data. Even then the interactions are quite tricky, because naively [Take View] -> [Take Transpose] -> [Extend View] does not give the answer you expect unless you're very careful.

@nsf What you're asking for isn't covered under any proposal. In my proposal, you can view it, but not to a fixed size

var fs = [128][64][64]float64
// Get data into fs
// Slice fs
sub := fs[0:64, :, :]  // type of sub is [,,]float64

There is an issue related to this #395, but that behavior won't help for your case. A [64][64][64]T is a continuous view of data (all 64x64x64 elements are in a row), while a sliced [128][64][64] is not. Under my proposal, you can either do the subslicing mentioned above, or you can copy.

@somadivad
Copy link

@btracey Suppose that we have data [6]T. We can represent it as a (row-major) 2x3 matrix by using the shape (2, 3) and strides 3, 1. The transpose of this matrix is representable on the same underlying data [6]T by using the shape (3, 2) and strides 1, 3.
(The indexing operation is m[i,j] = data[i * stride[0] + j * stride[1]]. Note that the last stride is not constrained to be 1.)

@btracey
Copy link
Contributor

btracey commented Nov 16, 2015

Yes, you can do that, but then you need n strides for n-dimensional data. All of the proposals have n-1 strides for an n-dimensional slice.

While nice in theory, you lose a lot of speed that way. You want to guarantee at least one of the strides is 1 so you can take advantage of SIMD and caches (and even simpler, range). Row vs. Column major is a very good example of 'worse-is-better'. It sounds nice to allow both, but then every higher-level has to be written to support both orderings. Much better to choose and fix one, and then find how to support common operations efficiently (like we have with Transpose).

@bjwbell
Copy link

bjwbell commented Nov 16, 2015

Pardon me for butting in, don't you want have a block oriented data layout and not row or column for best efficiency? eg Morton-order Matrices Deserve Compilers ’ Support.

Similar to graphics textures, https://fgiesen.wordpress.com/2011/01/17/texture-tiling-and-swizzling/. Typically in graphics blocks are 4kb in size, except for sparse which are 64kb.

Am I missing something or is it just a lot of work to support instead of row or column?

Again if you guys have already thought about this please excuse my ignorance.

@btracey
Copy link
Contributor

btracey commented Nov 16, 2015

Looking at the picture https://en.wikipedia.org/wiki/Z-order_curve#cite_note-12 , it's not clear to me that one could take an arbitrary view efficiently.

Morton ordering also introduces other inconsistencies. One cannot slice an [m][n]T to get a [,]T (the ordering would need to change).

It is an interesting concept though. I'd be interested in seeing a go-based parallel Dgemm computation and compare with what we have now.

@bjwbell
Copy link

bjwbell commented Nov 16, 2015

You're right, efficient arbitrary views aren't possible. Sounds like an experiment is an order. Parallel isn't the main goal of morton ordering, not thrashing the caches is why it's done.

@dr2chase
Copy link
Contributor

I worked on a similar problem (enhancing array/matrix support) for Java, but did not reach a conclusive answer. One thing similar in both Go and Java is the numerous requests for extensions along multiple feature axes. For example (I've *'d the ones I've used over the years, and arguably I also worked with matrices of constructive reals):

  • element type (Java primitives*, complex, quaternions, rationals, constructive reals, polynomials*)
  • layout (dense*, sparse, triangular, banded, k-diagonal)
  • distribution (flat memory*, numa, networked)
  • access patterns (random*, iterative*, fork-join*)
  • bulk operations (per-element*, multiplication*, eliminations*, convolution [with what edge conditions?])
  • aliasing (transposition*, sub-matrix*, glued matrix, permuted matrix*)

I may not have picked the best names for the feature axes. My goal was to come up with something that was good enough by default that "ordinary" users would pick it for their own bulk operations (basic linear algebra stuff, there was a machine learning team in Burlington happy to make feature requests along those lines) while still enabling people who knew how to do specialized stuff to use it and get good performance and then export that work to the "ordinary users" crowd. This might include "matrix on a GPU", this might include "specialized decomposition". I didn't finish -- this was exploratory, both in the direction of features and performance.

One problem I ran into for Java (which already has generics) is that the type system prevents you from writing a library that supports extensions in multiple axes -- i.e., don't think that if you add generics to Go, all will be solved. You may not be able to say what you want, and this may be part of the motivation for loading this onto slices, because slices get to be polymorphic.

The impression I get from this is that perhaps asking for language extensions in the direction of "features I want" is the wrong approach, because we'll quickly run into limits and the type system will thwart us at every turn. Instead, is it possible to do this with some sort of a generator, or a family of interacting generators? Would this work better if we added particular optimizations to the compiler (e.g., good loop invariant code motion and reduction in strength)? Is there something missing from the language that would make this work better, that we should be asking for instead of just a bullet list of features? What makes me suggest this is that specialized code generation seemed to always pop up as a necessary component of these things. In Java it had to happen at the last minute (or a little later -- in practice getting full performance required multiple recompilations) but there's no reason it could not happen ahead of time.

@btracey
Copy link
Contributor

btracey commented Nov 17, 2015

Thanks for the reply.
In the general case, generators are not a good solution, at least in the sense of creating a domain specific language solely for making matrix operations a little better. This creates language bifurcation and means that users need to deal with multiple languages simultaneously (see: arguments for translating the compiler to go).

I do not feel like the general discussion is asking for a bullet list of features. Just one feature (rectangular data support), and the rest is how that data type interacts with the language. As you say 'is there something from the language that would make this work better', and we think yes, the table data type.

Specifically to your points:

  • Types: As this is generic, it will already support all types
  • Layout: There are so many ways to implement sparse matrices (see all of the ScaLapack types). Supporting these types needs to happen at the package level. Many important sparse matrices (block dense for example) benefit highly from having a table type to use in the underlying implementation. Your other examples (all the banded types) are already implemented using a rectangular data structure (see for example the documentation about Banded matrices https://godoc.org/github.com/gonum/blas/native)
  • Distribution/Access: Goroutines go a long way to addressing these issues. A fork-join model is very simple to code (https://godoc.org/github.com/gonum/blas/native#Implementation.Dgemm is already coded this way).
  • Operators: As discussed in my proposal, most operators are just as easily implemented as functions/methods. It's a couple extra characters, but it's a price you're frequently willing to pay anyway to save allocations, etc.
  • Aliasing: Again, best implemented at the package level as we have in gonum/matrix.

The point of the above discussion is that tables go a long way to address all of these issues, and this is not a Pandora's box for requesting more features (at least on our end). Compiler optimizations are extremely useful. SSA is moving toward bounds checking alleviation, and I hope they work on SIMD next (it's already there for copy). As documented in my proposal, however, tables make it much easier to implement such optimizations.

Generators, a la go generate, are useful in dealing with the generics problem. In fact, we autogenerate the float32 BLAS implementation from the float64 implementation. Generics/Generators can only go so far though, the complex128 implementation has different behavior in many cases, so specialized code is necessary. I imagine this would be the case with a quaternion256 as well -- you'd need specialized code to say what "transpose" means (I suspect).

@bjwbell
Copy link

bjwbell commented Nov 17, 2015

Minor note regarding SIMD, the general direction to go in the near term seems along the lines of https://github.com/bjwbell/gensimd.

I'm doing some rework to use the new SSA backend for it right now. It's why I got interested in this and the table proposal.

@ianlancetaylor
Copy link
Contributor Author

This is a lot more feedback than I expected.

@griesemer You're right: the proposed syntax doesn't work. The ambiguity between a strided slice of arity > 1 and a strided slice whose slice elements are strided slices is bad. A better syntax might be what @nsf suggested: [,]T for a strided slice of arity 1 and type T. Then we can use [,,]T for a strided slice of arity 2. But now I'm not sure how to write a strided slice expression. We could use a[stride,] with a trailing comma, but that is kind of ugly.

@griesemer You're right: there is currently no obvious way to go from a strided slice back to a regular slice. At first I thought you could index into the strided slice and use a slice expression, but that doesn't work because the cap of the index expression will limit you to the stride. In order to go back based only on the strided slice, the strided slice would need to keep a copy of the cap of the original slice. That could be done, but I don't know how often people will want to do that operation.

@nsf You're right: the arity number is confusing. Perhaps better to talk about the dimension, which is the arity plus 1, as you suggest.

@nsf The suggestion of allowing an empty index to convert from the strided slice back to a regular slice is an interesting one, and it does seem implementable. I can't think of any problem with it off hand.

@somadivad As far as I can see this approach isn't going to support a transpose operation. But I'm pretty sure that strided slices would be more efficient, and easier to set up, than slices of slices, because slices of slices require additional pointer fetching.

@sbinet I think the implementation in the reflect package follows directly from the discussion of the internal implementation.

@btracey I don't think your proposal is fundamentally flawed, though I do think it should plan for multiple dimensions right from the start. I think the main advantage of this proposal is that it allows for different strides across the same data. You've indicated in the past that you don't feel this is important, and it may not be, but it seems to me that there are people commenting here who would find that feature valuable.

@btracey You're right that rectangular views are more awkward. You have to do them by dropping down to the original slice and doing a new strided slice. Something like ss[,][a:d][b,].

@btracey I agree that cap is needed for append operations, it's just not clear to me why anybody would use append on a strided slice. You're right: you can't build up a matrix row by row using a strided slice. But your proposal doesn't have append either, and I'm actually not sure why you have a capacity, must less two capacities.

@kortschak I don't know the optimization to which you are referring.

@somadivad I don't think the language should support an operation like transpose that implicitly allocates and copies. That kind of operation should be done using a function call.

@nsf
Copy link

nsf commented Nov 17, 2015

@ianlancetaylor

The suggestion of allowing an empty index to convert from the strided slice back to a regular slice is an interesting one, and it does seem implementable. I can't think of any problem with it off hand.

I actually meant it like an ordinary conversion, e.g. we can convert a string to byte slice via s := []byte(b), same way for converting strided slices to normal slices. And by empty index I guess you mean something like: s := ss[]. An interesting idea as well.

The most important parts of my post however are these:

  1. Strides vs dimensions. You mentioned [stride,], but what I actually meant is [dimension,]. Because for more than two dimensions stride values become not so obvious to compute. I mean it's nothing too complicated, but people don't think in stride values. Especially when by looking at strides you want to figure out the dimensions.
  2. And my personal wish is fixed-size strided slices. But it's more like a performance paranoia kind of feature.

@ianlancetaylor
Also why trailing comma is ugly? It's consistent, we have that on slice expressions, except it's a trailing colon there (x := y[5:]). Same logic here: s[3,3,3] can be written as s[3,3,], because last dimension can be inferred via length, just like with slices the upper bound is len(s) by default. You can write it without trailing comma, I believe it can be implemented and parsed without ambiguities.

@btracey
Copy link
Contributor

btracey commented Nov 17, 2015

@ianlancetaylor : Our notations are converging with different semantics. Below when I say [,]T, I am referring to my proposal, and I use the [*] notation for yours to distinguish.

I do think it should plan for multiple dimensions right from the start.

The message we thought we were receiving from the Go team when it was originally written is different than the one now. Now that the initial proposal is in, I'll adapt it to include arbitrary dimensions.

You have to do them by dropping down to the original slice and doing a new strided slice. Something like ss[,][a:d][b,]

I don't see how your sub-expression creates the correct view. If we have a 2-D slice, for example

0 1 2 3 
4 5 6 7
8 9 10 11
12 13 14 15

We need to be able to do t2 := t[1:3, 1:3] to give us

(0 1 2 3)
(4) 5 6 (7)
(8) 9 10 (11)
(12 13 14 15)

where the data in parenthesis exists but isn't accessible through the sliced table. t2[0,0] == 5.

In your proposal, we can do ss := a[5:13]*[4](I think that's the syntax), but that gives us

(0 1 2 3 4)
5 6 7 8
9 10 11 12
(13 14 15)

An extra parameter is still needed to remember the effective number of columns.

it's just not clear to me why anybody would use append on a strided slice

For example, building up a matrix one element at a time when the full size is not known ahead of time. In Gaussian Processes, for example, one bulids up a Kernel matrix, where K_i,j = kernel(x_i, x_j). It is very typical that a prediction is made with N points. This prediction gives you the N+1th point, and then you expand the kernel matrix given the new point. Something along the lines of:

n := len(t)[0]
if n == cap(t)[0] {
      // allocate new table and slice
      tnew = make([,]float64, n*2, n*2)
      copy(tnew, t)
      t = tnew[:n, :n]
}
t = t[:n+1, :n+1]
// Add kernel data to row and column n+1

Without slicing, either this make/copy needs to happen on each new data entry, or lots of meta data needs to be maintained about the effective size of the matrix. Capacities are necessary for both dimensions to see if you've sliced outside either bound, just like in normal slices. You can imagine code similar to the above, except extending an additional row or column. My proposal does not include behavior for append because it is not clear how to represent clean syntax for all of the possible cases, even though the behavior is useful.

One can argue that only one capacity is needed since the stride is equal to the capacity. However, it was felt that if one can do 3-element slicing on []T, so should one be able to do so with [,]T.

I think the main advantage of this proposal is that it allows for different strides across the same data. You've indicated in the past that you don't feel this is important

What I intended to say is that (copy-free) views are much more important than (copy-free) re-sizing, and that the interaction between the behaviors is not simple. I have been thinking about this since your proposal, and have been considering what could be done, if anything, about it. In particular, there are especially nice cases for resize when trying to avoid allocations. For this reason, it is worth thinking about some form of strided slice expression to allow a []T to go to a [,]T (or a [,,]T or whatever). Possibly there can be a built-in function

pack([]T, int...)

Which turns a []T into a [,]T, with the given sizes (length = capacity) and the number of integer arguments must be fixed at compile time.

Going in the other direction (the equivalent unpack function) is much trickier. The expression

pack(unpack(t), len(t)...)

does not return the original table if the table has been viewed.

In terms of other languages, Matlab copies everything. The reshape function in Numpy will either copy or not depending if the ndarray has been sliced (Go's implementation would have to be predictable). Julia does seem to be headed toward both being copy free (during their arraymageddon release). This sounds great, except in the end the type of the data may end up as ReshapedArray{SubArray{ReshapedArray{... T ...}}}. This is the opposite of simple. Additionally, this highly complicates array accesses (see discussion: https://groups.google.com/forum/#!msg/julia-dev/7M5qzmXIChM/kOTlGSIvAwAJ and PR: JuliaLang/julia#10507). To quote directly from the discussion "Specifically, reshaping does not compose with subarray-indexing, so you need two types of views".

In short, I don't see a way to implement arbitrary re-sizing and also support views. I do agree that the pack code listed above would be useful, though perhaps that's best left as a function in Reflect.

@kortschak
Copy link
Contributor

I don't know the optimization to which you are referring.

If you have a pair of block diagonal matrices A and B such that

    ( X 0 0 )
A = ( 0 Y 0 )
    ( 0 0 Z )

    ( U 0 0 )
B = ( 0 V 0 )
    ( 0 0 W )

and XU, YV and ZW are legal matrix multiplies, then AB is

     ( XU  0  0 )
AB = (  0 YV  0 )
     (  0  0 ZW )

This depends on the capacity to take a rectangular view of A and B to create X, Y, Z, U, V and W. Despite your comment to the contrary, I don't believe your proposal does allow this without significant extra work, obviating the benefits of the proposal.

You may contend that the sensible approach might have been to copy the views, perform the operations and copy back (assuming the case that the matrix cannot be treated as a collection of separable systems, in which case they should just be separate), but how? The absence of a capacity to make a rectangular view hampers this as well.

@rwcarlsen
Copy link

While I don't have anything in the way of technical suggestions for this proposal, I just thought I'd throw my hat in with @btracey and @kortschak. Coming from an (nuclear) engineering background, an approach catering a bit to fast/natural matrix ops would be much preferred.

@ianlancetaylor
Copy link
Contributor Author

I see what you mean now about views. We would need to know not just the stride, but also the length of the resulting slice.

@yiyus
Copy link

yiyus commented Nov 18, 2015

Here is an alternative proposal that tries to find a compromise between tables and strided slices. I tried to make it short, so it is not detailed at all, but I hope everything will be well understood from the examples.

Strided slices are generated slicing a slice (or array) in two or more dimensions. A slice expression in N dimensions consists of N ranges separated by commas.

s := make([]float64, 16)
m44 := s[:4,:4]  // == s[:4,:] == s[:4,] type [,]float64
m224 := m44[:2,:2,:4]  // == m44[:2,:2,] == s[:2,:2,] type [,,]float64

Indexing a strided slice of N dimensions returns a strided slice of N-1 dimensions, or a slice if N is 2. Sugar: m[i,j] == m[i][j] (may be valid for slices of slices and arrays of arrays too).

r1 := m44[1]  // == s[4:8]
f12 := m44[1,2]  // == m44[1][2] == s[6]
f112 := m224[1,0,2]  // == m224[1][0][2] == s[10]

Slicing a strided slice with the number of dimensions it has generates another strided slice with the same number of dimensions.

m22 := m44[1:3,1:3]  // type [,]float64
col1 := m44[:,1:2]  // type [,]float64
row1 := m44[1:2,:]  // type [,]float64 NB: m44[1:2,:] != m44[1]

The builtins len and cap applied to strided slices return the total number of indexable elements and the total number of contiguous elements in memory, respectively (therefore, if capacity is larger than length, it means that the strided slice has elements that cannot be accessed).

n, c := len(m44), cap(m44)  // == 16, 16 => contiguous
n, c = len(m22), cap(m22) // == 4, 6 => not contiguous

Two new builtins: dim returns []int with the dimensions and flat returns a slice with all the indexable elements (requires allocation and copying if len != cap, else it returns a slice of the original data)

d := dim(m224) // d == []int{2,2,4}
m28 := flat(m224)[:2,]  // reshape, same backing array
d = dim(m28)  // d == []int{2,8}
s4 := flat(m22)  // requires allocating a [4]float64 and copying 2 slices of length 2

Range expressions work like for slices (from 0 to dim(ss)[0]), with the addition of a new form with index variables for each dimension:

for i, m2d := range(m3d) {
    for j, row := range(m2d) {
        for k, v := range(row) {
            fmt.Printf(“m[%d,%d,%d] = %f\n”, i, j, k, v)
        }
    }
}

for i, j, k, v := range(m3d) {
    fmt.Printf(“m[%d,%d,%d] = %f\n”, i, j, k, v)
}

Implementation of strided slice of N dimensions (other representations are possible):

struct {
    array *T
    dim [N]int
    stride [N-1]int
}

What do you think?

@btracey
Copy link
Contributor

btracey commented Nov 18, 2015

Some quick thoughts.

Overview
The general idea I think is good if Go is okay with that level of complexity. It supports the most common case of resizing, which is on linear slices, and allows downslicing, which is a common case.

Of course, this also brings inconsistencies. Slicing only works upwards, not downwards. You can "downslice", but only along one of the dimensions. However, these are inherent in any proposal that keeps slices unstrided and allows views on rectangles.

Details:
Access/assignment
I think the address overloading is confusing, as it obscures the dimension of the addressee and thus the output of the address. I would rather see it as
// t is a [,,]T
t[:, :, :] --> returns [,,]T
t[0, :, :] --> returns [,]T
t[0,1,:] --> returns []T
t[0,1,2] --> returns T

This is more clear. The number of commas in the address is the number of dimensions -1, and the number of : is the dimension of the returned type. Colons must be continuous, which is also easy to asses.

Capacity:
The implementation of the type needs to be something like

struct {
    array *T
    length [N]int
    capacity [N]int
    stride [N-1]int
}

in order to support views (see my above discussion). The consequence of this is that if len and cap still return an int, both a dims and a capDims are needed. At that point, it's not clear what utility the single return len and cap serve. This is especially true since cap can be computed from the [N]int, and it's not clear what len actually means as far as views are concerned.

Range:
If the extra complexity is desired, the "N" element range is consistent. The difficulty with it is that there is a tricky difference between

for i, v := range s 

if s is a []int or a [,]int. Both loop over all of the elements, both return two integers, but v is something very different in the two cases. It seems like this could result in tricky bugs. The same is true for an N dimensional table and an N-1 dimensional []int. I can't speak for other domains, but there is probably at least a 10:1 ratio with "loop over a single dimension" vs. "loop over all elements". It's not clear to me that the extra complexity and edge case is worth it.

In single dimensions range is tricky. Currently range returns a copy of the element in all cases. In your proposal, it instead returns a reference to a subset. No other case works like that. Similarly, under your proposal, it is only possible to range over the "rows" of a slice. There does not appear to be a way to range over the columns. Lastly, constructing a table/slice header is significantly enough more expensive than doing an access, which would likely limit use in real code. For these reasons, it seems to me like my proposal for range is more pragmatic.

Slice expressions:
It feels to me like 'reshaping' expressions should look more different than slicing expressions, as the behavior is quite different. Specifically, if I have mistaken a []T for a [,]T, the expression s[:4,:4] returns a [,]T for both cases, and it again seems easy to be wrong and not told otherwise. In go, one generally has to be noisy about changing the type of a piece of data, and this case does not feel like an exception. Some alternatives are:

  • reshape([]T, int...) which returns a [,...]T where the number of integers must be >2 and known at compile time.
  • []T.([m,n]T) or []T.([m,n,p]T), like an interface assertion, which would return a [,]T with lengths/capacities of [2]int{m,n}.
    In all cases, I would prefer that the expression be a runtime panic of len(T) != m_n(_p), that is, the slice being resized must have the correct length to be resized exactly.

Conclusion:
It seems like allowing a "restriding" behavior as suggested by Ian, and a "subslicing" behavior as you suggest make a complete story for multidimensional data. They allow the common case of reshaping []T into some kind of n-d slice, and they allow for copy-free "views" of continuous data elements.

For the reasons expressed above, I would learn toward adding those features to my proposal, and keeping my semantics for len and range.

Finding a good syntax for restriding seems to be an open issue.

@yiyus
Copy link

yiyus commented Nov 19, 2015

Thanks for your comments.

Access/assignment

I do not know what you mean by "address overloading". I have left away of the proposal the case of mixing single and multiple indices for brevity.

The idea is that only one index can be used in strided slices, multiple indices separated by comas are just syntactic sugar with the rule t[i,j] == t[i][j]. The cases you propose could be valid if the norm said that i must be a single index and j can be a single index or a slice expression in one or more dimensions, such that:

t[0, :, :] == t[0][:, :] ( == t[0] )
t[0, 1, :] == t[0,1][:] == t[0][1][:] ( == t[0][1] )
t[0, 1, 2] == t[0,1][2] == t[0][1,2] = t[0][1][2]

The behavior with one index is like Ian's strided slices. Applying this syntactic rule, the tables syntax is obtained. This norm could even be extended to slices of slices and arrays of arrays. It is a feature completely orthogonal to the a new type that could be discussed later.

Capacity:

Unless I am missing something, I think views are allowed in the implementation I propose. Please, let me know what is the exact problem you see. For example:

var a [16]float64
m44 := a[:4,:4]  // {array: a, dim: [4, 4], stride: [4]}
m22 := m44[1:3,1:3]  // {array: a[5], dim: [2, 2], stride: [4]}
m22[i,j] == a[5 + i*4 + j] // with i < 2 and j < 2
n, c := len(m22), cap(m22) // == 4, 6 == dim[0] * dim[1], (dim[0] - 1) * stride[0] + dim[1]

While cap gives you the total number of elements you are storing, len tells you how many you can actually use. The utility of cap is indeed quite limited, but it is there to let you know if a flat operation will require an allocation or not. Also, in an extreme case like m[:2,:10000][:2,:2], it can be useful to know that you are storing more than 10000 elements when you are using only 4 (for example, you may decide to flat and reshape when len/cap reaches a certain ratio).

Range:

This case would return two integers in case s is a []int, but not if it is a [,]int. It would return an int and a []int. To get both indices you would use for i, j, _ := range s

I think you are misunderstanding something in my proposal. Range for strided slices works exactly like for slices of slices:

s := make([]float64, 4)
m22 := [][]int{s[0:2], s[2:4]}
for i, row := range m44 {
    for j, v := range row {
        // ...
    }
}
// using m22 := s[:2,:2] would be equivalent

In the new case proposed, all the indices must be present (for i, j, v := range m22). This special case could again be applied to slices of slices and arrays of arrays too, and it is an orthogonal feature to strided slices that could be considered separately.

To iterate over the elements in the column you would slice a column and iterate over all the elements (for i, _, v := range m[:,nc:nc+1]) or would index in a row (for i, row := range m { /* use row[nc] */ }) or the whole matrix (for i := 0; i < dim(m)[0]; i++ { /* use m[i, nc] */ }).

Slice expressions:

I was trying to mimic the behavior of slicing an array. The problems here are the same as when we do a[:5] and a can be an array or a slice. We do not do something like slice(a, 0, 5) or a.([]int)[:5] and it is not a problem you frequently see. But maybe a builtin could be a valid option too.

Conclusion:

Something I am trying with my proposal is to reduce it as much as possible. Initially, I think it could consist of only a new type, the slicing in multiple dimensions operation (which could be a reshape builtin), and minimal changes to len, maybe cap and range to support the new type. I think this would give us enough to have views. What would you be missing for gonum then?

Additional features could be added later, eventually converging to something very similar to the tables in your proposal, but with multiple dimensions. For example, it would be possible to add more builtins (like dim, flat or reshape and support in make, copy and maybe append), syntactic sugar (like multiple indices separated by comas and new range expressions) and maybe capacities in multiple dimensions.

It would be great if we could find a general agreement in the basic type first.

@btracey
Copy link
Contributor

btracey commented Nov 19, 2015

I do not know what you mean by "address overloading". I have left away of the proposal the case of mixing single and multiple indices for brevity.

I just meant I find the multiple options confusing. I think it's clearer to have the syntax be different for slices of slices and tables, but that could just be me.

While cap gives you the total number of elements you are storing, len tells you how many you can actually use. The utility of cap is indeed quite limited, but it is there to let you know if a flat operation will require an allocation or not.

Yes, but the multi-dimensional capacity is there to let you know if a multi-dimensional operation will need an allocation or not. In your example

var a [16]float64
m44 := a[:4,:4]  // {array: a, dim: [4, 4], stride: [4], capacities [4,4]}
m22 := m44[1:3,1:3]  // {array: a[5], dim: [2, 2], stride: [4], capacities [3,3]}

For example, I can now perform

m23 := m22[:, :3]  // dim [2,3], capacities[3,3], stride[4]

to expand the columns (for instance). The capacities are needed to say if this "extension slice" is within bounds. Similar, a multi-dimensional capacity is needed in order to support 3-element slicing.

This case would return two integers in case s is a []int, but not if it is a [,]int. It would return an int and a []int. To get both indices you would use for i, j, _ := range s

I agree that solves my concern, but I don't think it's tenable. Every other range action has the last element be optional, not required. This would create an inconsistency in the language.

To iterate over the elements in the column you would slice a column and iterate over all the elements (for i, _, v := range m[:,nc:nc+1]) or would index in a row

I think you mean

for _, j, v := range m[:, nc:nc+1]

Yes, I agree that that does range over a column, I did not see that trick earlier. However, it does not solve the problem highlighted in my proposal which is allowing range to work even for tables of length 0. Here, I have to slice to a particular column, so if there are no columns the operation will panic. If I want to loop over an arbitrary column, I will need something like

if len(m)[1] != 0 {
        for _, j, _ := range m[:, 0:1]
}

This may not be as big of a problem in your range syntax as you can range over a row without worry, i.e. something like

for i := range m {
   for _, j, _ := range m[:, i:i+1] {

loops over every column row-wise and works for any since. I'm not sure, especially since enabling this requires both range syntaxes.

The problems here are the same as when we do a[:5] and a can be an array or a slice.

It's true, but a slice is already a sliced array. For both arrays and slices, a[:5] gives you the first 5 elements of that entity, and thus has the same conceptual meaning. With s[:4,:4], if s is a slice it gives you all of the same elements, just now in a different looking form. If s is already a table, it gives you a shorter or longer view of the elements. The operations are conceptually different, even though there is overlapping syntax.

@yiyus
Copy link

yiyus commented Nov 19, 2015

As I said, I think we should agree in the concept before discussing the syntax. I do not think it makes much sense to have a lengthy discussion about the details of how to use range with the new type until we have reached an agreement on what the semantics of this new type should be.

As I see it, the main difference between strided slices (either in Ian's proposal or mine) and tables is that accessing a strided slice (indexing it, always with a single index) gives you a strided slice of lower rank, or a normal slice if the rank is 2. Tables, on the other hand, introduce a new concept, which we could call "multidimensional indexing", giving you direct access to the inner elements without intermediary types. Both have pros and cons.

I find the former much easier to think about, since it is the same behavior we are used to when working with slices of slices and arrays of arrays, making possible to apply many already well known rules. Multidimensional indices are however much more convenient to work with, because they match better hand-written matrix indexing. I proposed the concept of strided slices with syntax sugar for multidimensional indices to get the best of both worlds, but I think we can discuss the most convenient concept first and worry about the syntax later.

There may also be performance considerations (creating the intermediate types could be relatively expensive, as you point out in a previous comment), but I hope this is something the compiler can easily handle.

We should also decide is multiple capacities are something we really need and something we really need from the first day.

In my opinion, the most basic concept to work with multidimensional data is the strided slice, more or less as defined by Ian. In order to support what we are calling "views" (arbitrary selections with lower and upper limits in each dimension), we need to store dimensions too. This is what my proposal tries to achieve. If we want to support "up-slicing" too then, indeed, we will also need capacities, as in your proposal.

I think we all agree that strided slices as initially defined here are too basic to be useful for scientific work, and in particular for gonum. We need views. While I can imagine situations in which up-slicing may be convenient, I think it is something we may live without (it is not something I've ever missed in Fortran, for example). And we could always add it later if there is a real need. But of course, this is a very personal opinion. Would you say this is an essential feature for gonum, or is it more in the nice-to-have category? Would you accept a proposal that, at least initially, did not include this feature?

@btracey
Copy link
Contributor

btracey commented Nov 19, 2015

First of all, it has been my impression that a partial proposal is a non-starter.

Agreed that the index semantics are a second order issue.

I tried to detail it in the text of the proposal, but maybe it's better to detail our arguments here. The starting point of our proposal is that tables should match slices as much as possible. This seems like a reasonable starting point given the consistency goals of Go. What are the properties of slices?

  • Taking a "slice" is getting a continuous subset of data.
  • A slice is an array that has been sliced
  • A slice action gets any continuous subset from the start of the slice to the capacity of the slice or length of array.
  • A slice action may use 3-element syntax to change the capacity of the slice.

Given these properties, the following properties seem extremely natural for a "multi-dimensional slice"
Desiderata:

  • Taking a "view" (a.k.a. multidimensional slice) is getting a continuous subset of the data in all of the dimensions
  • A "table" (a.k.a multi-dimensional slice) is a multi-dimensional array that has been sliced
  • A view action gets any continuous subset in each dimension from the start of the dimension up until the capacity of the dimension
  • A view action may use 3-element syntax to change the capacity in any dimension

The above behavior, to me, feels like an extremely natural extension to slices and their fundamental behavior. For example, under this definition a single slice is just a special case of a table. Taking as a given that slicing in multiple dimensions should feel like slicing in a single dimension, lots of things follow.

  • Data is stored in row-major format. This must be as arrays of arrays are row-major, and a table is just a sliced array of arrays
  • Views can be "down-slicing" or "up-slicing". This is by definition since the goal is to match single slice behavior
  • The underlying data structure must have storage for data, lengths, capacities, and strides. This is the only way to support the above behavior.
  • Arbitrary reshaping between tables of different dimensions is not allowed, as this is not possible given the above desiderata.

There are lots of questions that are not answered by the desiderata. The most important are the behaviors you an Ian suggest

  • Can you also construct a table using data from a single array or slice?
  • Can you get a lower-dimensional table from a higher dimensional one?

If the answer to the above are both yes, it has one set of ramifications for the rest of the defining behavior of tables (syntax, len, range etc.). If no, it has a different set.

Go wants to be consistent, so I think the desiderata I propose are a necessary starting point (as they make "multi-dimensional slices" feel like "uni-dimensional slices"). This is what I think the concept should be.

@yiyus
Copy link

yiyus commented Nov 19, 2015

Consistency is of course a requirement, but it should not be the main goal of a new feature.

I agree that capacities add consistency with slices, but they also add quite some complexity. There should be a real need for the feature to justify that complexity (a real need would be, for example, simplifying considerably some gonum libraries). It is a feature that I have never used in other languages, and I have never missed it, but if it is essential for other users I have nothing against it.

@somadivad
Copy link

@yiyus You proposed this implementation:
struct {
array *T
dim [N]int
stride [N-1]int
}

I suppose that you are assuming that the last stride is 1. I think it would be fine for this to be true in the first version, but I think that the representation needs to be future-proof, so I think it should store N strides.

By building the N into the representation, you are assuming that [,]T is a different type to [,,]T, and so on. I'm not sure that this is essential or desirable. When working with torch tensors or numpy ndarrays, I don't think of tensors of different dimensions as belonging to different types. Indeed, the objects can be mutated in ways that change the number of dimensions. (Of course, those are both within dynamic languages, but the point stands.)

@dr2chase
Copy link
Contributor

"but I hope this is something the compiler can easily handle"

I'd love to get to that in a timely fashion, but I'm currently attending to the hairy details (signed integer overflow is a thing) of bounds check elimination.

From where I sit the interesting question -- both for performance and for capability -- is whether the last dimension is contiguous or not. (I think any requirement of row-to-row contiguity is a mistake; I think Dan Kortschak's block-diagonal example is compelling, and fork-join-parallelism generates similarly non-contiguous-row aliased submatrices.) I'm assuming that people view the potential for aliasing of sub-things as a feature (because they view not-aliasing of sub-things as a performance problem).

If you assume that the compiler is doing a modest amount of optimization (and 1.6 does not yet, and 1.7 might not reach this bar) -- and by this I mean

  1. recognizing that it does not need to reload stride variables within a loop
  2. bounds-check-elimination where obvious
  3. reduction-in-strength of indexing operations
    then within iterative-accessing loops, if you happen to have a contiguous last dimension, the cost will be low-to-zero (add loop-invariant register that contains a one versus add constant 1, and in both cases iterative access has the same cache locality).

Where this will cost is in compilers that don't yet have that much optimization in them, or for random access to tables. What would normally be
table.array[i*table.stride[0] + j]
will become
table.array[i*table.stride[0] + j*table.stride[1]]

Given that random access also requires bounds checks (more or less, sometimes we can get clever with the modulus operator) and given that if it's truly "random" there's probably something else going on, plus if it's random and large then it's not in cache, I'm not sure I want to commit to this (premature?) optimization because we're that worried about performance. There are other optimizations that deal with this cost if it turns out to be widespread and significant -- if it's a hot loop with table.stride[0] == 1, then we multiversion the compiled code. (This is a more out-there optimization because of costs if applied willy-nilly, but it's not conceptually hard at all -- easier than reduction-in-strength and bounds-check elimination for example)

It seems to me that this boils down to a tradeoff between a few features. On the one hand, we have the ability to represent both row and column-major tables, and to (for example) alias a 1-dimensional table onto the diagonal of a 2-d table. On the other hand, we have the ability to regard access to the last dimension as if it were access to a slice -- and it could be a slice -- which gives us the ability to reuse some existing slice-handling code.

@rsc rsc changed the title Proposal: strided slices proposal: spec: strided slices Nov 24, 2015
@rsc rsc added this to the Proposal milestone Nov 24, 2015
@rsc rsc added the Proposal label Nov 24, 2015
@ianlancetaylor
Copy link
Contributor Author

Thanks for the discussion. This proposal clearly does not meet the bar for a language change, and I'm withdrawing it.

@golang golang locked and limited conversation to collaborators Dec 1, 2016
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests