proposal: spec: strided slices #13253
This is an interesting proposal. Some initial questions and comments (I can't spell out "strided slice" each time, so I call them "strice" here - my apologies...):
PS: To expand on my point 1): A strided slice is really a 2-dimensional slice where the 2nd dimension has a stride (= the length of the 1st dimension). Or in other words:
Hi, I was just passing by and noticed this discussion, which is rather interesting to me (as a hobby I write voxel stuff in C++ sometimes: https://github.com/nsf/nextgame/blob/master/source/Geometry/HermiteFieldToMesh.cpp). Just wanted to point that out, because it means I work with 3-dimensional data a lot and I care about efficiency. So, I hope you don't mind some input from a stranger who also happens to be a Go user. Now that I've introduced myself, a few comments:
Oh, small correction. I mentioned syntax for fixed-size strided slices as
Another idea for
You can think of it as taking slabs and, instead of keeping them one on top of another, putting them into a line: stacked slabs become a single flat row of slabs. It's possible to apply slicing during deconstruction:
But you can deconstruct it directly into a slice also!
My example of copying 2x64x64 from the bottom:

    var a [,,]int // 128x64x64 strided slice
    var b [,,]int // 128x64x64 strided slice
    copy(a[len(a)-2:,,], b[:2,,])

I know it's a bit insane. But it's just reversing the process.
The following are a couple of features that are important in machine learning applications, and that I would be interested to understand whether strided slices can support. (These features are supported by numpy ndarrays and torch tensors, for example.)
In order to implement these requirements, I think the internal representation would need to be something more like this:
Seems like a much higher level abstraction than the idea behind strided slices. As far as I understand strided slices always point to a contiguous chunk of memory.
Similar problem. If you mean having a transposed abstraction without changing the underlying data - that's a violation of being contiguous. We can, however, transpose data by constructing another slice and "stricing" it using the right dimensions. The main problem here is that the strided slices proposed by @ianlancetaylor do not define strides per dimension or separate rules for accessing data in each dimension.

In a way a strided slice is a very low-level abstraction, which basically says how you should work with multi-dimensional data on modern CPUs: by enforcing a set of strides sorted in descending order and iterating starting from the smallest stride first (if we take "virtual" strided slices into account, there is a stride which always equals 1).
@nsf Yes, I agree that the abstraction I described goes beyond what strided slices are offering, and is slightly more difficult to implement. My point is that I think this is what users actually want. To me, strided slices as proposed don't appear to offer any very significant advantage over slices of slices, [][]T. In my eyes, the benefit does not justify the cost. (Incidentally, it is of course possible to build a type with functionality equivalent to numpy ndarray or torch tensor in Go. The thing that I as a user want is the syntactic sugar to make such a thing more pleasant to use.)
The proposal might benefit from examples of how to verbalise the suggested syntax, e.g. how is m2*[d3] spoken? Also, * does seem overworked already, as @griesemer said; could range be put to use? It's more wordy, but then these aren't going to be used in most functions.
One of the problematic points in the alternative (thanks for doing this, BTW)
I think that a slice expression with two values should reduce the arity. For example, if

I agree with other people that there must be a better syntax, but I like the concept and really appreciate that some thought is being given to this problem.
@ianlancetaylor Could I ask if there are specific problems you see with my proposal? I ask to understand how my proposal could be shifted to address your concerns (assuming you have them and don't feel it is fundamentally flawed).

The main advantage in your proposal is the ability to go from a single slice into a []T. Effectively, this is allowing a "reshape" operation in Matlab/numpy speak. The downside to your proposal is that I don't see how to do rectangular views. One may slice the rows with your [] expressions, but one may not slice all of the columns. That is, the t[a:b, c:d] operation in my proposal does not seem possible. If views are not important, then matrix multiplication can be coded as
But slicing is a crucial property for matrix operations. Almost every Lapack function uses matrix views (though it's hard to see from the code, as it's coded with the single-slice mentality). As a result, matrix multiplication must be coded with (multiple) size parameters per slice, which defeats one of the major benefits of the table data structure.

You say "A slice also has a capacity, but that is mainly relevant when building an array." I don't think that's true. Capacities are also very important for append and append-like operations. This is especially important for tables/strided slices where no built-in append exists. A frequent use case of mine is building up a matrix row by row as new data enters. I don't see how that would be possible with strided slices without also remembering the effective number of rows in the matrix.

@nsf Fixed-size strided slices are already allowed -- arrays of arrays. You can already declare a [128][64][64]T.

@nsf @somadivad: The ability to reshape easily and the ability to take views easily are basically mutually exclusive without some wrangling. As mentioned above, it seems like taking views on []T is impossible. With my tables proposal, you can take views easily, but because the stride is >= the number of columns (and not =), you cannot reshape arbitrarily. For a matrix with rows = m, cols = n, stride = k, the "flattened slice" has size mk. What use cases for reshaping do you have @somadivad? A reshape as you propose above is not the same as an actual transpose operation.

@ianlancetaylor Just as a note to short-circuit the conversation, it's easy to extend my proposal to slices of arbitrary dimension. There are also opportunities with slicing in the last dimension(s), i.e. t[2,:] --> []T. This would have a similar effect as strided slices, and could lead to similar simplifications.
This misses the possibility of a number of significant matrix operation optimisations. A very clear example of this is the optimisation available when multiplying block diagonal matrices. This operation allows massive savings to be made by multiplying the non-zero blocks, but depends on the possibility of the width of the matrix not being identical to the stride. I don't see how that is possible here.
@btracey The main use case I have for reshaping is as follows:

Transposing is also used in neural networks. We use a weight matrix on the forward pass, and its transpose on the backward pass. (Admittedly, there are ways round this.)

Please can you explain your comment "The ability to reshape easily and the ability to take views easily are basically mutually exclusive without some wrangling". Perhaps I'm misunderstanding what you mean, but I would say that both numpy ndarray and torch tensor offer this functionality.
An array is not a slice. A slice is a pointer to an array. A fixed-size strided slice would still be a pointer to an array.
@somadivad the issue is the capacity to both reshape and take views. To rephrase at a lower level, it is not possible to reshape a tensor where the stride does not match the width (for row-major matrices this means matrices where cols != stride) without copying, since there are regions in the linear representation of the tensor that are not in the tensor. I doubt numpy or torch take views in the way @btracey intends (a quick look at numpy.ndarray.view confirms this).
@kortschak Ah ok, I understand. But I'm still not quite sure what conclusion you or @btracey are trying to draw from this. So in numpy ndarray or torch tensor, you can take non-contiguous views (e.g. a column in a row-major matrix), and you can reshape arbitrarily - but the price you pay is that reshape may have to take a copy of (part of) the data. (An alternative design decision would have been for reshape to throw an error when called on a non-contiguous view, thus forcing the user to explicitly clone() in that case. However, both numpy and torch have chosen the implicit copy route.) So one option in the design space is to support non-contiguous views and reshape, and put up with this limitation in how they interact. But I suppose another option is to disallow one (or both) of non-contiguous views and reshape. Is this what you and/or @btracey prefer?
The explicit copy route is fine; in gonum/matrix this is possible already with or without a view.
However, the inability to do a copy-free reshape on a view is just another symptom of the syndrome that comes from having only a stride and no width field. Another symptom is pointed out in my original comment here. The theme is that copy-free view-based algorithms are only possible when width equals stride, blocking a whole swag of things that are very useful. Even when copying is allowed, the copy from the view is tricky, as is the copy back. In the end, you need to bolt on a width field and you end up pretty much exactly where we are right now for everything except the stride-equals-width case.
Similarly, @somadivad, we definitely agree transposes are important. In fact, transpose is one of the three methods in the matrix interface (https://godoc.org/github.com/gonum/matrix/mat64#Matrix). We want to support nearly cost-free transposes without having to copy. Reshaping is not the same thing as transpose, even with a copy. Take the linear elements 0-7 as a 2x4 matrix

    0 1 2 3
    4 5 6 7

If we reshape this to a 4x2 matrix, we get

    0 1
    2 3
    4 5
    6 7

which is not the same thing as taking the transpose of the 2x4 matrix. Transpose is a distinct operation.

@nsf [128][64][64]float64 is indeed an array, but *[128][64][64]float64 is a pointer to a fixed-size array.
@btracey Agreed, reshape and transpose are different. What I was trying to say is that they are both operations which amount to using different strides on the same underlying data. |
Doesn't help much. What I mean by fixed-size slices, and this case comes up often in voxel work, is this: I have a 3d array, say 256x64x64, but I also need to be able to "view" its parts, like four 64x64x64 ones. It's only possible via slices.
@somadivad Transpose is not the same thing as using a different stride on the same data. To do a cost-free transpose, one would have to change not the stride, but the column-major / row-major opinion on the data. Even then the interactions are quite tricky, because naively [Take View] -> [Take Transpose] -> [Extend View] does not give the answer you expect unless you're very careful.

@nsf What you're asking for isn't covered under any proposal. In my proposal, you can view it, but not to a fixed size.
There is an issue related to this, #395, but that behavior won't help for your case. A [64][64][64]T is a contiguous view of data (all 64x64x64 elements are in a row), while a sliced [128][64][64] is not. Under my proposal, you can either do the subslicing mentioned above, or you can copy.
@btracey Suppose that we have data [6]T. We can represent it as a (row-major) 2x3 matrix by using the shape (2, 3) and strides 3, 1. The transpose of this matrix is representable on the same underlying data [6]T by using the shape (3, 2) and strides 1, 3.
Yes, you can do that, but then you need n strides for n-dimensional data. All of the proposals have n-1 strides for an n-dimensional slice. While nice in theory, you lose a lot of speed that way. You want to guarantee that at least one of the strides is 1 so you can take advantage of SIMD and caches (and, even simpler, range). Row- vs. column-major is a very good example of 'worse is better'. It sounds nice to allow both, but then every higher-level library has to be written to support both orderings. Much better to choose and fix one, and then find how to support common operations efficiently (as we have with Transpose).
Pardon me for butting in, but don't you want to have a block-oriented data layout, not row or column, for best efficiency? E.g. "Morton-order Matrices Deserve Compilers' Support". Similar to graphics textures: https://fgiesen.wordpress.com/2011/01/17/texture-tiling-and-swizzling/. Typically in graphics blocks are 4kb in size, except for sparse textures, which are 64kb. Am I missing something, or is it just a lot of work to support instead of row or column? Again, if you guys have already thought about this, please excuse my ignorance.
Looking at the picture at https://en.wikipedia.org/wiki/Z-order_curve#cite_note-12, it's not clear to me that one could take an arbitrary view efficiently. Morton ordering also introduces other inconsistencies. One cannot slice an [m][n]T to get a [,]T (the ordering would need to change). It is an interesting concept though. I'd be interested in seeing a Go-based parallel Dgemm computation and comparing with what we have now.
You're right, efficient arbitrary views aren't possible. Sounds like an experiment is in order. Parallelism isn't the main goal of Morton ordering; not thrashing the caches is why it's done.
I worked on a similar problem (enhancing array/matrix support) for Java, but did not reach a conclusive answer. One thing similar in both Go and Java is the numerous requests for extensions along multiple feature axes. For example (I've *'d the ones I've used over the years, and arguably I also worked with matrices of constructive reals):
I may not have picked the best names for the feature axes. My goal was to come up with something that was good enough by default that "ordinary" users would pick it for their own bulk operations (basic linear algebra stuff; there was a machine learning team in Burlington happy to make feature requests along those lines) while still enabling people who knew how to do specialized stuff to use it and get good performance, and then export that work to the "ordinary users" crowd. This might include "matrix on a GPU"; this might include "specialized decomposition". I didn't finish -- this was exploratory, both in the direction of features and performance.

One problem I ran into for Java (which already has generics) is that the type system prevents you from writing a library that supports extensions in multiple axes -- i.e., don't think that if you add generics to Go, all will be solved. You may not be able to say what you want, and this may be part of the motivation for loading this onto slices, because slices get to be polymorphic.

The impression I get from this is that perhaps asking for language extensions in the direction of "features I want" is the wrong approach, because we'll quickly run into limits and the type system will thwart us at every turn. Instead, is it possible to do this with some sort of a generator, or a family of interacting generators? Would this work better if we added particular optimizations to the compiler (e.g., good loop invariant code motion and reduction in strength)? Is there something missing from the language that would make this work better, that we should be asking for instead of just a bullet list of features?

What makes me suggest this is that specialized code generation seemed to always pop up as a necessary component of these things. In Java it had to happen at the last minute (or a little later -- in practice getting full performance required multiple recompilations), but there's no reason it could not happen ahead of time.
Thanks for the reply. I do not feel like the general discussion is asking for a bullet list of features. Just one feature (rectangular data support), and the rest is how that data type interacts with the language. As you say 'is there something from the language that would make this work better', and we think yes, the table data type. Specifically to your points:
The point of the above discussion is that tables go a long way to address all of these issues, and this is not a Pandora's box for requesting more features (at least on our end).

Compiler optimizations are extremely useful. SSA is moving toward bounds-checking alleviation, and I hope they work on SIMD next (it's already there for copy). As documented in my proposal, however, tables make it much easier to implement such optimizations.

Generators, a la go generate, are useful in dealing with the generics problem. In fact, we autogenerate the float32 BLAS implementation from the float64 implementation. Generics/generators can only go so far though; the complex128 implementation has different behavior in many cases, so specialized code is necessary. I imagine this would be the case with a quaternion256 as well -- you'd need specialized code to say what "transpose" means (I suspect).
Minor note regarding SIMD: the general direction to go in the near term seems along the lines of https://github.com/bjwbell/gensimd. I'm doing some rework to use the new SSA backend for it right now. It's why I got interested in this and the table proposal.
This is a lot more feedback than I expected.

@griesemer You're right: the proposed syntax doesn't work. The ambiguity between a strided slice of arity > 1 and a strided slice whose slice elements are strided slices is bad. A better syntax might be what @nsf suggested:

@griesemer You're right: there is currently no obvious way to go from a strided slice back to a regular slice. At first I thought you could index into the strided slice and use a slice expression, but that doesn't work because the cap of the index expression will limit you to the stride. In order to go back based only on the strided slice, the strided slice would need to keep a copy of the cap of the original slice. That could be done, but I don't know how often people will want to do that operation.

@nsf You're right: the arity number is confusing. Perhaps it is better to talk about the dimension, which is the arity plus 1, as you suggest.

@nsf The suggestion of allowing an empty index to convert from the strided slice back to a regular slice is an interesting one, and it does seem implementable. I can't think of any problem with it off hand.

@somadivad As far as I can see this approach isn't going to support a transpose operation. But I'm pretty sure that strided slices would be more efficient, and easier to set up, than slices of slices, because slices of slices require additional pointer fetching.

@sbinet I think the implementation in the reflect package follows directly from the discussion of the internal implementation.

@btracey I don't think your proposal is fundamentally flawed, though I do think it should plan for multiple dimensions right from the start. I think the main advantage of this proposal is that it allows for different strides across the same data. You've indicated in the past that you don't feel this is important, and it may not be, but it seems to me that there are people commenting here who would find that feature valuable.
@btracey You're right that rectangular views are more awkward. You have to do them by dropping down to the original slice and doing a new strided slice. Something like

@btracey I agree that cap is needed for append operations; it's just not clear to me why anybody would use append on a strided slice. You're right: you can't build up a matrix row by row using a strided slice. But your proposal doesn't have append either, and I'm actually not sure why you have a capacity, much less two capacities.

@kortschak I don't know the optimization to which you are referring.

@somadivad I don't think the language should support an operation like transpose that implicitly allocates and copies. That kind of operation should be done using a function call.
I actually meant it like an ordinary conversion, e.g. we can convert a string to a byte slice via []byte(s).

The most important parts of my post however are these:
@ianlancetaylor
@ianlancetaylor : Our notations are converging with different semantics. Below when I say [,]T, I am referring to my proposal, and I use the [*] notation for yours to distinguish.
The message we thought we were receiving from the Go team when it was originally written is different than the one now. Now that the initial proposal is in, I'll adapt it to include arbitrary dimensions.
I don't see how your sub-expression creates the correct view. If we have a 2-D slice, for example

    0  1  2  3
    4  5  6  7
    8  9  10 11
    12 13 14 15

We need to be able to do t2 := t[1:3, 1:3] to give us

    (4)  5  6  (7)
    (8)  9  10 (11)

where the data in parentheses exists but isn't accessible through the sliced table. t2[0,0] == 5. In your proposal, we can do ss := a[5:13]*[4] (I think that's the syntax), but that gives us

    5  6  7  8
    9  10 11 12
An extra parameter is still needed to remember the effective number of columns.
For example, building up a matrix one element at a time when the full size is not known ahead of time. In Gaussian processes, for example, one builds up a kernel matrix, where K_i,j = kernel(x_i, x_j). It is very typical that a prediction is made with N points. This prediction gives you the (N+1)th point, and then you expand the kernel matrix given the new point. Something along the lines of:
Without slicing, either this make/copy needs to happen on each new data entry, or lots of meta data needs to be maintained about the effective size of the matrix. Capacities are necessary for both dimensions to see if you've sliced outside either bound, just like in normal slices. You can imagine code similar to the above, except extending an additional row or column. My proposal does not include behavior for append because it is not clear how to represent clean syntax for all of the possible cases, even though the behavior is useful. One can argue that only one capacity is needed since the stride is equal to the capacity. However, it was felt that if one can do 3-element slicing on []T, so should one be able to do so with [,]T.
What I intended to say is that (copy-free) views are much more important than (copy-free) re-sizing, and that the interaction between the behaviors is not simple. I have been thinking about this since your proposal, and have been considering what could be done, if anything, about it. In particular, there are especially nice cases for resize when trying to avoid allocations. For this reason, it is worth thinking about some form of strided slice expression to allow a []T to go to a [,]T (or a [,,]T or whatever). Possibly there can be a built-in function
which turns a []T into a [,]T, with the given sizes (length = capacity); the number of integer arguments must be fixed at compile time. Going in the other direction (the equivalent unpack function) is much trickier. The expression
does not return the original table if the table has been viewed.

In terms of other languages, Matlab copies everything. The reshape function in Numpy will either copy or not depending on whether the ndarray has been sliced (Go's implementation would have to be predictable). Julia does seem to be headed toward both being copy-free (during their arraymageddon release). This sounds great, except in the end the type of the data may end up as ReshapedArray{SubArray{ReshapedArray{... T ...}}}. This is the opposite of simple. Additionally, this highly complicates array accesses (see discussion: https://groups.google.com/forum/#!msg/julia-dev/7M5qzmXIChM/kOTlGSIvAwAJ and PR: JuliaLang/julia#10507). To quote directly from the discussion: "Specifically, reshaping does not compose with subarray-indexing, so you need two types of views".

In short, I don't see a way to implement arbitrary re-sizing and also support views. I do agree that the pack code listed above would be useful, though perhaps that's best left as a function in reflect.
If you have a pair of block diagonal matrices A and B such that

    A = | X 0 0 |      B = | U 0 0 |
        | 0 Y 0 |          | 0 V 0 |
        | 0 0 Z |          | 0 0 W |

and XU, YV and ZW are legal matrix multiplies, then AB is

    AB = | XU 0  0  |
         | 0  YV 0  |
         | 0  0  ZW |

This depends on the capacity to take a rectangular view of A and B to create X, Y, Z, U, V and W. Despite your comment to the contrary, I don't believe your proposal allows this without significant extra work, obviating the benefits of the proposal. You may contend that the sensible approach might have been to copy the views, perform the operations and copy back (assuming the case that the matrix cannot be treated as a collection of separable systems, in which case they should just be separate), but how? The absence of a capacity to make a rectangular view hampers this as well.
While I don't have anything in the way of technical suggestions for this proposal, I just thought I'd throw my hat in with @btracey and @kortschak. Coming from a (nuclear) engineering background, an approach catering a bit to fast/natural matrix ops would be much preferred.
I see what you mean now about views. We would need to know not just the stride, but also the length of the resulting slice.
Here is an alternative proposal that tries to find a compromise between tables and strided slices. I tried to make it short, so it is not detailed at all, but I hope everything will be well understood from the examples. Strided slices are generated by slicing a slice (or array) in two or more dimensions. A slice expression in N dimensions consists of N ranges separated by commas.
Indexing a strided slice of N dimensions returns a strided slice of N-1 dimensions, or a slice if N is 2. Sugar:
Slicing a strided slice with the number of dimensions it has generates another strided slice with the same number of dimensions.
The builtins len and cap applied to strided slices return the total number of indexable elements and the total number of contiguous elements in memory, respectively (therefore, if capacity is larger than length, it means that the strided slice has elements that cannot be accessed).
Two new builtins: dim returns a []int with the dimensions, and flat returns a slice with all the indexable elements (this requires allocation and copying if len != cap; otherwise it returns a slice of the original data).
Range expressions work like for slices (from 0 to dim(ss)[0]), with the addition of a new form with index variables for each dimension:
Implementation of strided slice of N dimensions (other representations are possible):
What do you think?
Some quick thoughts.

Overview: Of course, this also brings inconsistencies. Slicing only works upwards, not downwards. You can "downslice", but only along one of the dimensions. However, these are inherent in any proposal that keeps slices unstrided and allows views on rectangles.

Details: This is clearer. The number of commas in the address is the number of dimensions - 1, and the number of : is the dimension of the returned type. Colons must be consecutive, which is also easy to assess.

Capacity:
in order to support views (see my above discussion). The consequence of this is that if len and cap still return an int, both a dims and a capDims are needed. At that point, it's not clear what utility the single return len and cap serve. This is especially true since cap can be computed from the [N]int, and it's not clear what len actually means as far as views are concerned. Range:
if s is a []int or a [,]int. Both loop over all of the elements, both return two integers, but v is something very different in the two cases. It seems like this could result in tricky bugs. The same is true for an N-dimensional table and an (N-1)-dimensional []int. I can't speak for other domains, but there is probably at least a 10:1 ratio of "loop over a single dimension" vs. "loop over all elements". It's not clear to me that the extra complexity and edge case is worth it.

In single dimensions range is tricky. Currently range returns a copy of the element in all cases. In your proposal, it instead returns a reference to a subset. No other case works like that. Similarly, under your proposal, it is only possible to range over the "rows" of a slice. There does not appear to be a way to range over the columns. Lastly, constructing a table/slice header is significantly more expensive than doing an access, which would likely limit use in real code. For these reasons, it seems to me that my proposal for range is more pragmatic.

Slice expressions:
Conclusion: For the reasons expressed above, I would lean toward adding those features to my proposal, and keeping my semantics for len and range. Finding a good syntax for restriding seems to be an open issue.
Thanks for your comments.
I do not know what you mean by "address overloading". I have left the case of mixing single and multiple indices out of the proposal for brevity. The idea is that only one index can be used in strided slices; multiple indices separated by commas are just syntactic sugar with the rule
The behavior with one index is like Ian's strided slices. Applying this syntactic rule, the tables syntax is obtained. This norm could even be extended to slices of slices and arrays of arrays. It is a feature completely orthogonal to the new type that could be discussed later.
Unless I am missing something, I think views are allowed in the implementation I propose. Please, let me know what is the exact problem you see. For example:
While cap gives you the total number of elements you are storing, len tells you how many you can actually use. The utility of cap is indeed quite limited, but it is there to let you know if a flat operation will require an allocation or not. Also, in an extreme case like m[:2,:10000][:2,:2], it can be useful to know that you are storing more than 10000 elements when you are using only 4 (for example, you may decide to flat and reshape when len/cap reaches a certain ratio).
This case would return two integers if s is a []int, but not if it is a [,]int. It would return an int and a []int. To get both indices you would use

I think you are misunderstanding something in my proposal. Range for strided slices works exactly like for slices of slices:
In the new case proposed, all the indices must be present ( To iterate over the elements in the column you would slice a column and iterate over all the elements (
I was trying to mimic the behavior of slicing an array. The problems here are the same as when we do
Something I am trying with my proposal is to reduce it as much as possible. Initially, I think it could consist of only a new type, the slicing-in-multiple-dimensions operation (which could be a reshape builtin), and minimal changes to len, maybe cap, and range to support the new type. I think this would give us enough to have views. What would you be missing for gonum then? Additional features could be added later, eventually converging to something very similar to the tables in your proposal, but with multiple dimensions. For example, it would be possible to add more builtins (like dim, flat or reshape, and support in make, copy and maybe append), syntactic sugar (like multiple indices separated by commas and new range expressions) and maybe capacities in multiple dimensions. It would be great if we could find a general agreement on the basic type first.
I just meant I find the multiple options confusing. I think it's clearer to have the syntax be different for slices of slices and tables, but that could just be me.
Yes, but the multi-dimensional capacity is there to let you know whether a multi-dimensional operation will need an allocation or not. For example, with spare capacity in a dimension I can re-slice a table to expand the columns (for instance). The capacities are needed to say whether this "extension slice" is within bounds. Similarly, a multi-dimensional capacity is needed in order to support 3-element slicing.
I agree that solves my concern, but I don't think it's tenable. Everywhere else, range allows the trailing variables to be optional, not required. This would create an inconsistency in the language.
I think you mean
Yes, I agree that that does range over a column; I did not see that trick earlier. However, it does not solve the problem highlighted in my proposal, which is allowing range to work even for tables of length 0. Here, I have to slice to a particular column, so if there are no columns the operation will panic. If I want to loop over an arbitrary column, I will need something like an explicit check that the table is non-empty first.
This may not be as big of a problem in your range syntax, as you can range over a row without worry: a nested range loops over every column row-wise and works for any size. I'm not sure, especially since enabling this requires both range syntaxes.
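The length-0 concern can be made concrete with a flat, row-major backing array: an explicit loop bounded by the row count simply does nothing when there are no rows, which is the behavior the range discussion is after. The `column` helper here is an illustration, not part of either proposal:

```go
package main

import "fmt"

// column collects column j of a rows×cols matrix stored flat in
// row-major order. When rows == 0 the loop body never runs, so an
// empty table needs no special case and nothing panics.
func column(flat []int, rows, cols, j int) []int {
	var out []int
	for i := 0; i < rows; i++ {
		out = append(out, flat[i*cols+j])
	}
	return out
}

func main() {
	flat := []int{1, 2, 3, 4, 5, 6}          // 2×3, row-major
	fmt.Println(column(flat, 2, 3, 1))       // [2 5]
	fmt.Println(len(column(nil, 0, 3, 1)))   // 0: empty table, no panic
}
```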
It's true, but a slice is already a sliced array. For both arrays and slices, a[:5] gives you the first 5 elements of that entity, and thus has the same conceptual meaning. With s[:4,:4], if s is a slice it gives you all of the same elements, just now in a different looking form. If s is already a table, it gives you a shorter or longer view of the elements. The operations are conceptually different, even though there is overlapping syntax. |
As I said, I think we should agree in the concept before discussing the syntax. I do not think it makes much sense to have a lengthy discussion about the details of how to use range with the new type until we have reached an agreement on what the semantics of this new type should be. As I see it, the main difference between strided slices (either in Ian's proposal or mine) and tables is that accessing a strided slice (indexing it, always with a single index) gives you a strided slice of lower rank, or a normal slice if the rank is 2. Tables, on the other hand, introduce a new concept, which we could call "multidimensional indexing", giving you direct access to the inner elements without intermediary types. Both have pros and cons. I find the former much easier to think about, since it is the same behavior we are used to when working with slices of slices and arrays of arrays, making possible to apply many already well known rules. Multidimensional indices are however much more convenient to work with, because they match better hand-written matrix indexing. I proposed the concept of strided slices with syntax sugar for multidimensional indices to get the best of both worlds, but I think we can discuss the most convenient concept first and worry about the syntax later. There may also be performance considerations (creating the intermediate types could be relatively expensive, as you point out in a previous comment), but I hope this is something the compiler can easily handle. We should also decide is multiple capacities are something we really need and something we really need from the first day. In my opinion, the most basic concept to work with multidimensional data is the strided slice, more or less as defined by Ian. In order to support what we are calling "views" (arbitrary selections with lower and upper limits in each dimension), we need to store dimensions too. This is what my proposal tries to achieve. 
If we want to support "up-slicing" too then, indeed, we will also need capacities, as in your proposal. I think we all agree that strided slices as initially defined here are too basic to be useful for scientific work, and in particular for gonum. We need views. While I can imagine situations in which up-slicing may be convenient, I think it is something we may live without (it is not something I've ever missed in Fortran, for example). And we could always add it later if there is a real need. But of course, this is a very personal opinion. Would you say this is an essential feature for gonum, or is it more in the nice-to-have category? Would you accept a proposal that, at least initially, did not include this feature? |
First of all, it has been my impression that a partial proposal is a non-starter. Agreed that the index semantics are a second order issue. I tried to detail it in the text of the proposal, but maybe it's better to detail our arguments here. The starting point of our proposal is that tables should match slices as much as possible. This seems like a reasonable starting point given the consistency goals of Go. What are the properties of slices?
Given these properties, the following properties seem extremely natural for a "multi-dimensional slice"
The above behavior, to me, feels like an extremely natural extension to slices and their fundamental behavior. For example, under this definition a single slice is just a special case of a table. Taking as a given that slicing in multiple dimensions should feel like slicing in a single dimension, lots of things follow.
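The core desideratum (slicing changes only the view, never the data) can be demonstrated in one dimension with today's slices; the table behaviors listed above are its per-dimension generalization:

```go
package main

import "fmt"

func main() {
	// A slice expression produces a view of the same backing array,
	// so writes through the sub-slice are visible in the original.
	s := []int{10, 20, 30, 40}
	sub := s[1:3] // view of elements 1 and 2; same backing array
	sub[0] = 99
	fmt.Println(s[1]) // 99: the original sees the write
}
```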
There are lots of questions that are not answered by the desiderata. The most important are the behaviors you and Ian suggest.
If the answers to the above are both yes, it has one set of ramifications for the rest of the defining behavior of tables (syntax, len, range, etc.). If no, it has a different set. Go wants to be consistent, so I think the desiderata I propose are a necessary starting point (as they make "multi-dimensional slices" feel like "uni-dimensional slices"). This is what I think the concept should be.
Consistency is of course a requirement, but it should not be the main goal of a new feature. I agree that capacities add consistency with slices, but they also add quite some complexity. There should be a real need for the feature to justify that complexity (a real need would be, for example, simplifying considerably some gonum libraries). It is a feature that I have never used in other languages, and I have never missed it, but if it is essential for other users I have nothing against it. |
@yiyus Regarding the implementation you proposed: I suppose that you are assuming that the last stride is 1. I think it would be fine for this to be true in the first version, but I think that the representation needs to be future-proof, so I think it should store N strides. By building the N into the representation, you are assuming that [,]T is a different type to [,,]T, and so on. I'm not sure that this is essential or desirable. When working with torch tensors or numpy ndarrays, I don't think of tensors of different dimensions as belonging to different types. Indeed, the objects can be mutated in ways that change the number of dimensions. (Of course, those are both dynamic languages, but the point stands.)
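The rank-not-in-the-type alternative can be sketched with a representation that stores shape and strides as slices, in the style of numpy ndarrays. Every name here (`view`, `at`) is hypothetical; the point is only that N strides need not appear in the type:

```go
package main

import "fmt"

// view stores N strides without baking N into the type: the rank is
// len(shape), and the last stride need not be 1.
type view struct {
	data    []float64
	shape   []int // extent in each dimension
	strides []int // element step in each dimension
}

// at computes the flat offset as the dot product of index and strides.
func (v view) at(idx ...int) float64 {
	if len(idx) != len(v.shape) {
		panic("wrong number of indices")
	}
	off := 0
	for d, i := range idx {
		if i < 0 || i >= v.shape[d] {
			panic("index out of range")
		}
		off += i * v.strides[d]
	}
	return v.data[off]
}

func main() {
	// A 2×3 row-major view over a flat array: strides {3, 1}.
	v := view{
		data:    []float64{1, 2, 3, 4, 5, 6},
		shape:   []int{2, 3},
		strides: []int{3, 1},
	}
	fmt.Println(v.at(1, 2)) // 6
}
```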
"but I hope this is something the compiler can easily handle" I'd love to get to that in a timely fashion, but I'm currently attending to the hairy details (signed integer overflow is a thing) of bounds check elimination. From where I sit the interesting question -- both for performance and for capability -- is whether the last dimension is contiguous or not. (I think any requirement of row-to-row contiguity is a mistake; I think Dan Kortschak's block-diagonal example is compelling, and fork-join-parallelism generates similarly non-contiguous-row aliased submatrices.) I'm assuming that people view the potential for aliasing of sub-things as a feature (because they view not-aliasing of sub-things as a performance problem). If you assume that the compiler is doing a modest amount of optimization (and 1.6 does not yet, and 1.7 might not reach this bar) -- by which I mean the usual strength reduction and hoisting of index arithmetic in loops -- then most of the per-element cost goes away.
Where this will cost is in compilers that don't yet have that much optimization in them, or for random access to tables. What would normally be a simple indexed load acquires a multiplication by the stride. Given that random access also requires bounds checks (more or less, sometimes we can get clever with the modulus operator) and given that if it's truly "random" there's probably something else going on, plus if it's random and large then it's not in cache, I'm not sure I want to commit to this (premature?) optimization because we're that worried about performance. There are other optimizations that deal with this cost if it turns out to be widespread and significant -- if it's a hot loop with table.stride[0] == 1, then we multiversion the compiled code. (This is a more out-there optimization because of costs if applied willy-nilly, but it's not conceptually hard at all -- easier than reduction-in-strength and bounds-check elimination, for example.) It seems to me that this boils down to a tradeoff between a few features. On the one hand, we have the ability to represent both row and column-major tables, and to (for example) alias a 1-dimensional table onto the diagonal of a 2-d table. On the other hand, we have the ability to regard access to the last dimension as if it were access to a slice -- and it could be a slice -- which gives us the ability to reuse some existing slice-handling code.
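The strength-reduction point can be shown by performing the transformation by hand in today's Go: the naive version multiplies by the stride on every access, while the hoisted version computes a row base once per row. Both functions and the 2×2-of-2×3 example are illustrative only:

```go
package main

import "fmt"

// sumNaive indexes with an explicit i*stride+j on every access, the
// pattern whose per-element multiply an unoptimizing compiler emits.
func sumNaive(a []float64, rows, cols, stride int) float64 {
	s := 0.0
	for i := 0; i < rows; i++ {
		for j := 0; j < cols; j++ {
			s += a[i*stride+j]
		}
	}
	return s
}

// sumHoisted does the hoisting by hand: the row base is computed once
// per row, so the inner loop is plain slice indexing, no multiply.
func sumHoisted(a []float64, rows, cols, stride int) float64 {
	s := 0.0
	for i := 0; i < rows; i++ {
		row := a[i*stride : i*stride+cols]
		for j := range row {
			s += row[j]
		}
	}
	return s
}

func main() {
	// A 2×2 submatrix of a 2×3 row-major array (stride 3 != cols 2):
	// rows that alias the backing array but are not contiguous.
	a := []float64{1, 2, 3, 4, 5, 6}
	fmt.Println(sumNaive(a, 2, 2, 3), sumHoisted(a, 2, 2, 3)) // 12 12
}
```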
Thanks for the discussion. This proposal clearly does not meet the bar for a language change, and I'm withdrawing it. |
This proposal is intended as an alternative approach to the ideas discussed in https://golang.org/cl/16801 for #6282. This is for discussion only at this stage.
In Go a slice can be seen as a view into an array of some type T. The slice defines the start and length of the view. A slice also has a capacity, but that is mainly relevant when building an array. Once the size of an array is determined, a slice is a view into that array.
This is inconvenient for people who want to deal with multi-dimensional data. Arguments for this can be found in the links above, and I won't rehearse them here. What I want to propose here is a different approach to multi-dimensional data.
Go is a relatively low level language that pays attention to memory layout. The case we are discussing here is a multi-dimensional view of data that is in fact laid out linearly in memory. For cases where the data is not laid out linearly, one would naturally use a slice of slices, or some data structure involving pointers.
I propose adding a new type to the language: a strided slice. A strided slice of `T` is represented as `[*]T`. A strided slice has a backing array just as a regular slice does. However, the strided slice also has a stride, an integer. Each element of the strided slice is not a single element of the backing array, but the number of elements described by the stride. Indexing into a strided slice thus produces a slice.

A strided slice is created from an array or slice using a strided slice expression, whose syntax is `*[stride]`. The strided slice starts at the beginning of the array or slice, and its length is determined by dividing the length of the array or slice by the stride using a truncating division. A strided slice has no capacity (or, if you prefer, the capacity equals the length). Given an array `[N]T` or slice `[]T`, a strided slice expression produces a strided slice of type `[*]T`.

In the type `[*]T`, `T` can itself be a strided slice. For convenience we speak of the arity of a strided slice. The arity of `[*]T` is 1 if `T` is not a strided slice. Otherwise, it is the arity of `T` plus 1. The arity is the number of strides in the strided slice. To be clear: `[]int` is a normal slice, which could be said to have arity 0. `[*]int` is a strided slice of arity 1. `[*][*]int` is a strided slice of arity 2.

In the following let's suppose we have an array or slice `a` of type `T`. Let's suppose we write `ss := a*[S]`, giving us a strided slice of stride `S` and arity 1.

A strided slice can be used in an index expression. The result is of type `[]T`. The expression `ss[i]` is equivalent to `a[i*S:(i+1)*S:(i+1)*S]`. Bounds errors are checked just as for that slice expression. This value is not addressable and is not an lvalue. In fact, that is true of all expressions on a strided slice.

A strided slice can be used in a slice expression with two elements, producing a value of type `[*]T`. The expression `ss[i:j]` is equivalent to `a[i*S:j*S]*[S]`.

A strided slice can be used in a strided slice expression, producing a value of type `[*][*]T`, which is a strided slice of arity 2. The expression `ss*[S2]` produces a strided slice whose first stride is `S2` and whose second stride is `S`.

Now we discuss how these operations work on a strided slice of arity `N` where `N > 1`. Again the type is `[*]T`, but `T` is itself a strided slice. Assume we have `ss := a*[S]*[S2]*[S3]...` for some sequence of `S2 S3...`.

The expression `ss[i]` produces a slice of strided slices of type `[]T`. It is equivalent to `a[i*S:(i+1)*S]*[S2]*[S3]...`.

The expression `ss[i:j]` produces a value of type `[*]T`. It is equivalent to `a[i*S:j*S]*[S]*[S2]*[S3]...`.

The expression `ss*[S0]` produces a value of type `[*][*]T`. This produces a strided slice of arity `N + 1` whose first stride is `S0`, with following strides `S`, `S2`, `S3`, ....

A strided slice may be used in a range statement. There are two forms. In the usual form, the range produces two values: an index into the strided slice and the result of the equivalent index expression (which will be a slice or strided slice). In the longer form, it may be used with exactly `N + 2` values. The first value is the index using the outermost stride, then the index of the next stride, and so on. The penultimate value is the index into the normal slice that is the result of indexing into the final strided slice. The final value is the value in the underlying array.

The predeclared function `len` applied to a strided slice returns the number of valid indexes. For a strided slice of arity 1 and stride `S` and backing array `a`, this is `len(a)/S` using a truncating division.

The predeclared functions `cap` and `append` may not be applied to a strided slice.

A new predeclared function `stride` returns the stride of a strided slice, a value of type `int`.

Examples
Here are a couple of examples of using strided slices.
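The proposed syntax cannot be compiled today, so the following is a sketch in current Go of what the basic arity-1 operations would mean, using the equivalences stated above (this is not one of the proposal's own examples):

```go
package main

import "fmt"

// With a := make([]int, 12), the proposal's ss := a*[4] would have
// len(a)/4 == 3 elements, and ss[i] would be the slice
// a[i*4:(i+1)*4:(i+1)*4].
func main() {
	a := make([]int, 12)
	for i := range a {
		a[i] = i
	}
	const S = 4 // the stride

	// len(ss): truncating division of the backing length by the stride.
	fmt.Println(len(a) / S) // 3

	// ss[1]: a capacity-limited sub-slice of the backing array.
	row := a[1*S : 2*S : 2*S]
	fmt.Println(row) // [4 5 6 7]
}
```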
Implementation
A strided slice of arity 1 has a backing array, a length, and a stride. Thus the internal representation looks exactly like a normal slice, with the capacity field replaced by a stride field.
A strided slice of arity 2 has a backing array, a length, and two strides. In general, the internal representation of a strided slice of arity N is
```go
struct {
	array  *T
	len    int
	stride [N]int
}
```
The compiler always knows the arity of any given slice, so implementing the index and slice expressions is straightforward.
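For the arity-1 case, the representation and the index/slice equivalences can be simulated with a generic struct in today's Go. The names `strided1`, `newStrided1`, `index`, and `slice` are all hypothetical stand-ins for what the compiler would generate:

```go
package main

import "fmt"

// strided1 mirrors the arity-1 representation above: a backing array,
// a length, and a stride in place of the capacity field.
type strided1[T any] struct {
	array  []T
	length int
	stride int
}

// newStrided1 implements the expression a*[S] (truncating division).
func newStrided1[T any](a []T, s int) strided1[T] {
	return strided1[T]{array: a, length: len(a) / s, stride: s}
}

// index implements ss[i], equivalent to a[i*S:(i+1)*S:(i+1)*S].
func (ss strided1[T]) index(i int) []T {
	s := ss.stride
	return ss.array[i*s : (i+1)*s : (i+1)*s]
}

// slice implements ss[i:j], equivalent to a[i*S:j*S]*[S].
func (ss strided1[T]) slice(i, j int) strided1[T] {
	return newStrided1(ss.array[i*ss.stride:j*ss.stride], ss.stride)
}

func main() {
	a := []int{0, 1, 2, 3, 4, 5, 6, 7}
	ss := newStrided1(a, 2)
	fmt.Println(ss.length)             // 4
	fmt.Println(ss.index(1))           // [2 3]
	fmt.Println(ss.slice(1, 3).length) // 2
}
```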
Rationale
Strided slices provide a general way to implement multi-dimensional arrays with no limit on the number of dimensions. They provide a flexible way to access linear data in multiple dimensions without requiring hard coded sizes or explicit multiplication. They are a natural extension of the slice concept into multiple dimensions.