diff --git a/blog/_posts/2016-03-27-arrays-iteration.md b/blog/_posts/2016-03-27-arrays-iteration.md
new file mode 100644
index 0000000000..45cb1ddcd4
--- /dev/null
+++ b/blog/_posts/2016-03-27-arrays-iteration.md
@@ -0,0 +1,429 @@
---
layout: post
title: "Generalizing AbstractArrays: opportunities and challenges"
author: Tim Holy
---

# Introduction: generic algorithms with AbstractArrays

Somewhat unusually, this blog post is future-looking: it mostly
focuses on things that don't yet exist. Its purpose is to lay out the
background for community discussion about possible changes to the core
API for `AbstractArray`s, and to serve as background reading and
reference material for a more focused "julep" (a julia enhancement
proposal). Here I'll often use the shorthand "array" to mean
`AbstractArray`, and use `Array` when I explicitly mean julia's
concrete `Array` type.

As the reader is likely aware, in julia it's possible to write
algorithms for which one or more inputs are only assumed to be
`AbstractArray`s. This is "generic" code, meaning it should work
(i.e., produce a correct result) on any specific concrete array type.
In an ideal world---which julia approaches rather well in many
cases---the generality of code should not have a negative impact on
its performance: a generic implementation should be approximately as
fast as one restricted to specific array type(s). This implies that
generic algorithms should be written in terms of lower-level
operations that give good performance across a wide variety of array
types.
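As a concrete (if trivial) illustration, here is a minimal sketch of
such a generic algorithm; the name `mysum` is arbitrary, and the only
interface it assumes is `eltype` plus element access via `eachindex`:

```jl
function mysum(A::AbstractArray)
    s = zero(eltype(A))
    for i in eachindex(A)   # each array type supplies its fastest iterator
        s += A[i]
    end
    s
end
```

Nothing here depends on how `A` stores its elements, so the same
method works for an `Array`, a `Range`, or a `SubArray`, generally at
close to type-specific speed.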
Providing efficient low-level operations is a different kind of design
challenge than one faces with programming languages that "vectorize"
everything. When successful, it promotes much greater reuse of code,
because efficient, generic low-level parts allow you to write a wide
variety of efficient, generic higher-level functions.

Naturally, the more diverse our array types grow, the more careful we
have to be about our abstractions for these low-level operations.

# Examples of arrays

In discussing general operations on arrays, it's useful to have a
diverse collection of concrete arrays in mind.

In core julia, some types we support fairly well are:

- `Array`: the prototype for all arrays

- `Range`s: a good example of what I often consider a "computed"
  array, where essentially none of the values are stored in
  memory. Since there is no storage, these are immutable containers:
  you can't set values in individual slots.

- `BitArray`s: arrays that can only store 0 or 1 (`false` or `true`),
  and for which the internal storage is packed so that each entry
  requires only one bit.

- `SubArray`s: the problems this type introduced, and the resolution
  we adopted, probably serve as the best model for the
  generalizations considered here. Therefore, this case is discussed
  in greater detail below.

Another important class of array types in Base are sparse arrays:
`SparseMatrixCSC` and `SparseVector`, as well as other sparse
representations like `Diagonal`, `Bidiagonal`, and `Tridiagonal`.
These are good examples of array types where access patterns deserve
careful thought. Notably, despite many commonalities in "strategy"
among the five or so sparse parametrizations we have, implementations
of core algorithms (e.g., matrix multiplication) are specialized for
each sparse-like type---in other words, these mimic the "high-level
vectorized functions" strategy common to other languages.

What we lack is a "sparse iteration API" that lets you write the main
algorithms of sparse linear algebra efficiently in a generic way. Our
current model is probably fine for `SparseLike*Dense` operations, but
it becomes harder to manage if you want to efficiently compute, e.g.,
`Bidiagonal*SparseMatrixCSC`: the number of combinations you have to
support grows rapidly with the number of sparse types, and thus
represents a powerful incentive for developing efficient, generic
low-level operations.

Outside of Base, there are some other mind-stretching examples of
arrays, including:

- `DataFrames`: indexing arrays with symbols rather than
  integers. Other related types include `NamedArrays` and `AxisArrays`.

- `Interpolations`: indexing arrays with non-integer floating-point
  numbers

- `DistributedArrays`: another great example of a case in which you
  need to think through access patterns carefully

# SubArrays: a case study

For arrays of fixed dimension, one can write algorithms that index
arrays as `A[i,j,k,...]` (good examples can be found in our linear
algebra code, where everything is a vector or matrix). For algorithms
that have to support arbitrary dimensionality, for a long time our
fallback was linear indexing, `A[i]` for integer `i`. However, in
general `SubArray`s cannot be efficiently accessed by a linear index:
computing the corresponding memory location results in call(s) to
`div`, and `div` is slow. This is a CPU problem, not a Julia-specific
problem. The slowness of `div` is still true despite the [recent
addition of
infrastructure](https://github.com/JuliaLang/julia/pull/15357) to make
it much faster---now one can make it merely "really bad" rather than
["Terrible, Horrible, No Good, and Very
Bad"](https://en.wikipedia.org/wiki/Alexander_and_the_Terrible,_Horrible,_No_Good,_Very_Bad_Day).

The way we (largely) resolved this problem was to make it possible to
do cartesian indexing, `A[i,j,k,...]`, for arrays of arbitrary
dimensionality (the `CartesianIndex` type). To leverage this in
practical code, we also had to extend our iterators with the `for I in
eachindex(A)` construct. This allows one to select an iterator that
optimizes the efficiency of access to elements of `A`. In generic
algorithms, the performance gains were not small, sometimes on the
scale of ten- to fifty-fold. These types were described in a
[previous blog post](http://julialang.org/blog/2016/02/iteration).
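In case a reminder of the pattern is helpful, here is a sketch (the
name `fill2!` is hypothetical, chosen to avoid clashing with
`Base.fill!`):

```jl
function fill2!(A::AbstractArray, val)
    # For LinearFast arrays, I is an Int; for LinearSlow arrays
    # (e.g., many SubArrays), I is a CartesianIndex. Either way,
    # A[I] compiles to efficient element access without div.
    for I in eachindex(A)
        A[I] = val
    end
    A
end
```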
To my knowledge, this approach has given julia one of the most
flexible yet efficient "array view" types in any programming language.
Many languages base views on array *strides*, meaning situations in
which the memory offset is regular along each dimension. Among other
things, this requires that the underlying array is dense. In
contrast, in julia we can easily handle non-strided arrays (e.g.,
sampling at `[1,3,17,428,...]` along one dimension, or creating a view
of a `SparseMatrixCSC`). We can also handle arrays for which there is
no underlying storage (e.g., `Range`s). Being able to do this with a
common infrastructure is part of what makes different optimized array
types useful in generic programming.

It's also worth pointing out some problems:

- Most importantly, it requires that one adopt a slightly different
  programming style. Despite being well into another release cycle,
  this transition is still [not complete, even in
  Base](https://github.com/JuliaLang/julia/pull/15434#issuecomment-194991739).

- For algorithms that involve two or more arrays, there's a
  possibility that their "best" iterators will be of different
  types. *In principle*, this is a big problem. Consider matrix-vector
  multiplication, `A[i,j]*v[j]`, where `j` needs to be in sync for
  both `A` and `v`, yet you'd also like all of these accesses to be
  maximally efficient. *In practice*, right now this isn't a burning
  problem: even if our arrays don't all have efficient linear
  indexing, to my knowledge all of our (dense) array types have
  efficient cartesian indexing. Since indexing by `N` integers (where
  `N` is equal to the dimensionality of the array) is always
  performant, this serves as a reliable default for generic code.
  (It's worth noting that this isn't true for sparse arrays, and the
  lack of a corresponding generic solution is probably the main reason
  we lack a generic API for writing sparse algorithms.)

Unfortunately, I suspect that if we want to add support for certain
new operations or types (specific examples below), it will force us to
set the latter problem on fire.

# Challenging examples

Some possible new `AbstractArray` types pose novel challenges.

## `ReshapedArrays` ([#15449](https://github.com/JuliaLang/julia/pull/15449))

These are the front-and-center motivation for this post. They are
motivated by a desire to ensure that `reshape(A, dims)` always returns
a "view" of `A` rather than allocating a copy of `A`. (Much of the
urgency of this julep goes away if we decide to abandon this goal, in
which case for consistency we should always return a copy of `A`.)
It's worth noting that besides an explicit `reshape`, we have some
mechanisms for reshaping that currently cause a copy to be created,
notably `A[:]` or `A[:, :]` applied to a 3D array.

Similar to `SubArray`s, the main challenge for `ReshapedArray`s is
getting good performance. If `A` is a 3D array, and you reshape it to
a 2D array `B`, then `B[i,j]` must be expanded to `A[k,l,m]`. The
problem is that computing the correct `k,l,m` might require a call to
`div`. So `ReshapedArray`s violate a crutch of our current ecosystem,
in that indexing with `N` integers might not be the fastest way to
access elements of `B`. From a performance perspective, this problem
is substantial (see
[#15449](https://github.com/JuliaLang/julia/pull/15449); about five-
to ten-fold).
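To make the cost concrete, here is a sketch of the index translation a
reshaped view has to perform, expressed with the existing
`sub2ind`/`ind2sub` functions (the actual implementation in #15449
differs in its details; this just shows where the `div` enters):

```jl
A = rand(3, 5, 8)
B = reshape(A, 15, 8)            # shares A's memory

# Conceptually, accessing B[i,j] requires recovering A's indexes:
i, j = 7, 2
n = sub2ind(size(B), i, j)       # linear index: multiplies and adds only
k, l, m = ind2sub(size(A), n)    # inverting this is where div appears
@assert B[i,j] == A[k,l,m]
```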
In simple cases, there's an easy way to circumvent this performance
problem: define a new iterator type that (internally) iterates over
the parent `A`'s indexes directly. In other words, create an iterator
so that `B[I]` immediately expands to `A[I']`, and so that the latter
has "ideal" performance.

Unfortunately, this strategy runs into a lot of trouble when you need
to keep two arrays in sync: if you want to adopt this strategy, you
simply can't write `B[i,j]*v[j]` for matrix-vector multiplication
anymore. A potential way around *this* problem is to define a new
class of iterators that operate on specific dimensions of an array
([#15459](https://github.com/JuliaLang/julia/pull/15459)), writing
`B[ii,jj]*v[j]`. Here `jj` (whatever that is) and `j` need to be in
sync, but they don't necessarily both need to be integers. Using this
kind of strategy, matrix-vector multiplication

```jl
for j = 1:size(B, 2)
    vj = v[j]
    for i = 1:size(B, 1)
        dest[i] += B[i,j] * vj
    end
end
```

might be written in a more performant manner like this:

```jl
for (jj, vj) in zip(eachindex(B, Dimension{2}), v)
    for (i, ii) in zip(eachindex(dest), eachindex(B, (:, jj)))
        dest[i] += B[ii,jj] * vj
    end
end
```

It's not too hard to figure out what `eachindex(B, Dimension{2})` and
`eachindex(B, (:, jj))` should do: `ii`, for example, could be a
`CartesianInnerIndex` (a type that does not yet exist) that for a
particular column of `B` iterates from `A[3,7,4]` to `A[5,8,4]`, where
the `d`th index component wraps around at `size(A, d)`. The big
performance advantage of this strategy is that you only have to
compute a `div` to set the bounds of the iterator on each column; the
inner loop doesn't require a `div` on each element access. No doubt,
given a suitable definition of `jj`, one could be even more clever and
avoid calculating `div` altogether. To the author, this strategy
seems promising as a way to resolve the majority of the performance
concerns about `ReshapedArray`s---only if you needed "random access"
would you require slow (integer-based) operations.

However, a big problem is that, compared to the "naive"
implementation, this is rather ugly.

## Row-major matrices, PermutedDimensionArrays, and "taking transposes seriously"

Julia's `Array` type stores its entries in column-major order, meaning
that `A[i,j]` and `A[i+1,j]` are in adjacent memory locations. For
certain applications---or for interfacing with certain external code
bases---it might be convenient to support row-major arrays, where
instead `A[i,j]` and `A[i,j+1]` are in adjacent memory locations. More
fundamentally, this is partially related to one of the most
commented-on issues in all of julia's development history, known as
"taking transposes seriously" aka
[#4774](https://github.com/JuliaLang/julia/issues/4774). There have
been at least two attempts at implementation,
[#6837](https://github.com/JuliaLang/julia/pull/6837) and the
`mb/transpose` branch, and for the latter a summary of benefits and
challenges was
[posted](https://github.com/JuliaLang/julia/issues/4774#issuecomment-149349751).

One of the biggest challenges mentioned was the huge explosion of
methods that one would need to support. Can generic code come to the
rescue here? There are two related concerns. The first is linear
indexing: oftentimes this is conflated with "storage order," i.e.,
given two linear indexes `i` and `j` for the same array, the offset in
memory is proportional to `i-j`. For row-major arrays, this notion is
not viable, because otherwise a loop

```jl
function copy!(dest, src)
    for i = 1:length(src)
        dest[i] = src[i]  # trouble if `i` means "memory offset"
    end
    dest
end
```

would end up taking a transpose if `src` and `dest` don't use the same
storage order. Consequently, a linear index has to be defined in
terms of the corresponding cartesian (full-dimensionality) index.
This isn't much of a real problem, because it's one we know how to
solve: use `ind2sub` (which is slow) when you have to, but for
efficiency make row-major arrays belong to the category (`LinearSlow`)
of arrays that defaults to iteration with cartesian indexes. Doing so
ensures that if one uses generic constructs like `eachindex(src)`
rather than `1:length(src)`, then the loop above can be fast.
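For instance, the fixed loop might look like this sketch (`copy2!` is
a stand-in name to avoid clashing with `Base.copy!`; it assumes `dest`
and `src` have the same shape):

```jl
function copy2!(dest, src)
    # eachindex(src) yields whatever index type src handles fastest;
    # a cartesian index can never be mistaken for a memory offset.
    for i in eachindex(src)
        dest[i] = src[i]
    end
    dest
end
```

Because the index is interpreted by position rather than by memory
offset, this version is correct regardless of storage order; however,
indexing `dest` with `src`'s favorite index type may not be maximally
efficient, as the next section shows.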
The far more challenging problem concerns cache efficiency: it's much
slower to access elements of an array in anything other than [storage
order](http://julialang.org/blog/2013/09/fast-numeric). Some
reasonably fast ways to write matrix-vector multiplication are

```jl
for j = 1:size(B, 2)
    vj = v[j]
    for i = 1:size(B, 1)
        dest[i] += B[i,j] * vj
    end
end
```

for a column-major matrix `B`, and

```jl
for i = 1:size(B, 1)
    for j = 1:size(B, 2)
        dest[i] += B[i,j] * v[j]
    end
end
```

for a row-major matrix. (One can do even better than this by using a
scalar temporary accumulator, but let's not worry about that here.)
The key point to note is that the order of the loops has been
switched.

One could generalize this by defining a `RowMajorRange` iterator
that's a lot like our `CartesianRange` iterator, but traverses the
array in row-major order. `eachindex` claims to return an "efficient
iterator," and without a doubt a `RowMajorRange` is a (much) more
efficient iterator than a `CartesianRange` for row-major arrays. So
let's imagine that `eachindex` does what it says, and returns a
`RowMajorRange` iterator. Using this strategy, the two algorithms
above can be combined into a single generic implementation:

```jl
for I in eachindex(B)
    dest[I[1]] += B[I] * v[I[2]]
end
```

Yay! Score one for efficient generic implementations.

But our triumph is short-lived. Let's return to the example of
`copy!` above, and realize that `dest` and `src` might be two
different array types, and therefore might be most efficiently indexed
with different iterator types. We're tempted to write this as

```jl
function copy!(dest, src)
    for (idest, isrc) in zip(eachindex(dest), eachindex(src))
        dest[idest] = src[isrc]
    end
    dest
end
```

Up until we introduced our `RowMajorRange` return type for
`eachindex`, this implementation would have been fine. But we just
broke it, because now this will incorrectly take a transpose in
certain situations.

In other words, without careful design the goals of
"maximally-efficient iteration" and "keeping accesses in sync" are in
conflict.

## OffsetArrays and the meaning of AbstractArray

Julia's arrays are indexed starting at 1, whereas some other languages
start numbering at 0. If you take comments on various blog posts at
face value, there are vast armies of programmers out there eagerly
poised to adopt julia, but who won't even try it because of this
difference in indexing. Since recruiting those armies will lead to
world domination, this is clearly a problem of the utmost urgency.

More seriously, there *are* algorithms which simplify if you can index
outside the range `1:size(A,d)`. In my own lab's internal code, we've
long been using a `CenterIndexedArray` type, in which such arrays (all
of which have odd sizes) are indexed over the range `-n:n`, and for
which 0 refers to the "center" element. One package which generalizes
this notion is `OffsetArrays`. Unfortunately, in practice both of
these array types produce segfaults (due to built-in assumptions about
when `@inbounds` is appropriate) for many of julia's core functions;
over time my lab has had to write specialized `CenterIndexedArray`
implementations of quite a few julia functions.

`OffsetArrays` illustrates another conceptual challenge, which can
easily be demonstrated by `copy!`. When `dest` is a 1-dimensional
`OffsetArray` and `src` is a standard `Vector`, what should `copy!`
do? In particular, where does `src[1]` go? Does it go in the `first`
element of `dest`, or does it get stored in `dest[1]` (which may not
be the first element)?

Such examples force us to think a little more deeply about what an
array really is. There seem to be two potential conceptions. One is
that arrays are lists, and multidimensional arrays are
lists-of-lists-of-lists-of... In such a world view, the right thing
to do is to put `src[1]` into the first slot of `dest`, because 1 is
just a synonym for `first`. However, this world view doesn't really
endow the index-tuple of an array with any kind of "meaning," and in
that sense doesn't even include the distinction conveyed by an
`OffsetArray`. In other words, in this world an `OffsetArray` is
simply nonsensical, and shouldn't exist.

If instead one thinks `OffsetArray`s should exist, this essentially
forces one to adopt a different world view: arrays are effectively
associative containers, where each index-tuple is the "key" by which
one retrieves a value. With this mode of thinking, `src[1]` should be
stored in `dest[1]`.
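The two world views lead to two different implementations. The sketch
below assumes a hypothetical function `indexrange(A)` (it does not
exist in Base) that returns the range of valid indexes of a
1-dimensional array, e.g., `-2:2` for a 5-element zero-centered
`OffsetArray`:

```jl
# World view 1: "arrays are lists". Align positions, so src[1]
# lands in the first slot of dest, wherever that is.
function copy_aslist!(dest, src)
    offset = first(indexrange(dest)) - first(indexrange(src))
    for i in indexrange(src)
        dest[i + offset] = src[i]
    end
    dest
end

# World view 2: "arrays are associative containers". Align keys, so
# src[i] lands in dest[i] (a BoundsError if dest lacks that key).
function copy_asdict!(dest, src)
    for i in indexrange(src)
        dest[i] = src[i]
    end
    dest
end
```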
# Formalizing AbstractArray

These examples suggest a formalization of `AbstractArray`:

- `AbstractArray`s are specialized associative containers, in that the
  allowable "keys" may be restricted by more than just their julia
  type. Specifically, the allowable keys must be representable as a
  cartesian product of one-dimensional lists of values. The allowed
  keys may depend not just on the array type but also on the specific
  array (e.g., its size). Attempted access by keys that cannot be
  converted to one of the allowed keys, for that specific array,
  results in a `BoundsError`.

- For any given array, one must be able to generate a
  finite-dimensional parametrization of the full domain of valid keys
  from the array itself. This might only require knowledge of the
  array size, or the keys might depend on some internal storage (think
  `DataFrames` and `OffsetArrays`). In some cases, just the array
  type might be sufficient (e.g., `FixedSizeArrays`). By this
  definition, a `Dict{ASCII5,Int}`, where `ASCII5` is a type that
  means an ASCII string with 5 characters, would qualify as a
  5-dimensional (sparse) array, but a `Dict{ASCIIString,Int}` would
  not (because there is no length limit on an `ASCIIString`, and hence
  no finite dimensionality).

- An array may be indexed by more than one key type (i.e., keys may
  have multiple parametrizations). Different key parametrizations are
  equivalent when they refer to the same element of a given
  array. Linear indexes and cartesian indexes are simple examples of
  interconvertible representations, but specialized iterators can
  produce other key types as well.

- Arrays may support multiple iterators that produce non-equivalent
  key sequences. In other words, a row-major matrix may support both
  `CartesianRange` and `RowMajorRange` iterators that access elements
  in different orders.

# Finding a way forward

Resolving these conflicting demands is not easy. One approach might be
to decree that some of these array types simply can't be supported
with generic code. It is possible that this is the right
strategy. Alternatively, one can attempt to devise an array API that
handles all of these types (and hopefully more).

In GitHub issue
[#15648](https://github.com/JuliaLang/julia/issues/15648), we are
discussing APIs that may resolve these challenges. Readers are
encouraged to contribute to this discussion.