Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Julep: Generalize indexing with and by Associatives #24019

Open
andyferris opened this issue Oct 6, 2017 · 54 comments
Open

Julep: Generalize indexing with and by Associatives #24019

andyferris opened this issue Oct 6, 2017 · 54 comments
Labels
collections Data structures holding multiple items, e.g. sets design Design of APIs or of the language itself julep Julia Enhancement Proposal

Comments

@andyferris
Copy link
Member

andyferris commented Oct 6, 2017

I've always loved how in Julia (and MATLAB) one can create a new array from an old one, using what is now the APL indexing rules. Basically if you index a collection of values with a collection of indices, you get a new collection of the indexed values. Beautiful, simple. Indexing has also been extended by allowing arrays that don't use 1-based indexing by e.g. the OffsetArrays.jl package.

I'm not sure if this issue exists elsewhere as its own entity (cleaning up distinctions between arrays and associatives was surely mentioned in #20402 and this Julep seems to be a logical extension of #22907), but here I propose specifically that we extend indexing of and by Associative and make related changes so that the semantics are consistent across these two types of container. I prototyped ideas at https://github.com/andyferris/AssociativeArray.jl and basically came up with the ability to (with simple code):

  • Index an Associative{K,V} with an Associative{I,K} to get an Associative{I,V}. E.g. Dict(:a=>1, :b=>2, c:=>3)[Dict("a"=>:a, "c"=>:c)] == Dict("a"=>1, "c"=>3).
  • Index an Associative{K,V} with an AbstractArray{K,N} to get an AbstractArray{V,N}. E.g. Dict(:a=>1, :b=>2, c:=>3)[[:c, :a]] == [3,1].
  • Index an AbstractArray{T,N} with an Associative{K,I} to get an Associative{K,T} (where I might be Int for linear indexing, or a CartesianIndex{N} for Cartesian indexing). E.g. [11,12,13][Dict(:a=>1, :c=>3)] == Dict(:a=>11, :c=>13).

The semantics are consistent across arrays and dictionaries, and provide that for out = a[b]:

  • The output container out shares the indices of b (note: these are CartesianRange for arrays)
  • The values out[i] correspond to a[b[i]].

This is fully consistent with both the Base arrays and the OffsetArrays.jl package (We can do something similar for setindex!).

To make everything consistent, it helps to make the following associated changes:

  • Make Associatives be containers of values, not of index=>value pairs, so that arrays and dictionaries are consistent on this fundamental point. Use the existing pairs function when necessary (and ideally make it preserve indexability).
  • Make similar always return a container with the same indices, even for dictionaries. Ideally, unify similar across Associatives and Arrays (for example a dictionary which is similar to a distributed array might also be distributed) via use of the indices.
  • Have an new empty function that makes empty Dicts and Vectors to which elements should be added. (Done, Add empty and change similar(::Associative) #24390).
  • Consider whether we want to have collection of things you call getindex and setindex! with be called indices, rather than keys (and rename the current indices(::AbstractArray) to something else)
  • Have view work for the various combinations where getindex works.

The demonstration package also prototypes making AbstractArray{T, N} <: Associative{CartesianIndex{N}, T} - I don't think this is strictly necessary but it helped (me) to highlight which parts of the existing interface were inconsistent. The package does demonstrate that we can put something simple together without excessive amounts of code (some performance tuning is surely required).

Finally, a word on what motivates this: lately I've been playing with what fundamental data operations (such as mapping, grouping, joining or filtering) would be useful for both generic data structures and tables/dataframes (that iterate rows), and I found whenever I created say a grouping (using a dictionary of groups), I immediately felt the loss of ability to do complex indexing and other operations with the result (as well have to worry whether the output iterates values or key-value pairs, etc).

@andyferris andyferris added collections Data structures holding multiple items, e.g. sets design Design of APIs or of the language itself julep Julia Enhancement Proposal labels Oct 6, 2017
@andyferris
Copy link
Member Author

andyferris commented Oct 6, 2017

I should have also mentioned that "logical indexing" dict[inds], where the values of inds are Bool and the indices of inds match the indices of dict, should probably also work, just like they do for AbstractArray. That would basically be adding Dict(:a=>1, :b=>2, c:=>3)[Dict(:a=>true, :b=>false, :c=>true)] == Dict(:a=>1, :c=>3).

@TotalVerb
Copy link
Contributor

This looks really cool! I haven't had the chance to look through it thoroughly yet, but it's very appealing to me that arrays can be thought of as associative containers with a shape.

@macd
Copy link
Contributor

macd commented Oct 7, 2017

Would it be a goal of AssociativeArray to support say , if v = Vec([3,2,1]), something like v[[1,1,1,1]] (which should be [3,3,3,3])? It doesn't look like that works at present, or is there special syntax for that?

@andyferris
Copy link
Member Author

@macd Yes, but does v[Vec([1, 1, 1, 1])] do what you expect? I didn't put any effort into making Assoc interoperate with Base types like Vector - it's just for demonstration but we can add more to the package if there is more design space to explore.

@macd
Copy link
Contributor

macd commented Oct 7, 2017

Very cool, does work as I expect. I didn't get the interoperability issue, but it makes perfect sense.

@timholy
Copy link
Member

timholy commented Oct 7, 2017

As someone who has worked on this kind of stuff, I found it interesting that I was initially surprised by some of your choices, until you pointed out the guiding principle. I completely agree that out = a[b] implies out[i] = a[b[i]] is the most reliable foundation for these design decisions. 👍

@andyferris
Copy link
Member Author

Thanks @timholy. I should point out that this is 100% motivated by a few of your comments along the lines of "arrays are really just some special kind of associative container" (in regards to motivating/justifying offset arrays) and I've been trying to make sense of that since :)

@andyferris
Copy link
Member Author

andyferris commented Oct 7, 2017

OK, so far I'm seeing positive feedback here, at least on the indexing behavior. I'd like to know people's opinions on my list of suggested breaking changes that IMO would make all this come together consistently. Enumerated:

  1. Have Associative iterate values not index-value pairs. When I first saw Jeff mention this some time ago, I was pretty surprised (my initial reaction was scepticism). However, after thinking about this since then, I pretty much consider this a given now - it will help a lot for generic programming to arbitrary indexables (and we aren't going to make Array iterate pairs, now, are we?).
  2. similar preserves indices and empty creates an empty container (with no indices). This is breaking for similar(::Associative), but consistent with similar(::AbstractArray). I can already see that this will enormously help in creating code/patterns which work across Associative and AbstractArray. It's about clear semantics.
  3. Rename keys to indices and keytype to indextype. While this might be seen as pure bikeshedding, I think it might be more intuitive to work with getindex, setindex! and indices (or else I would be tempted to propose getindex -> getkey, etc).
  4. The crazy part of AssociativeArray.jl: make AbstractArray a subtype of Associative. I still don't know how to feel about this one.

Anyway, some design feedback here would help in creating an implementation.

@mbauman
Copy link
Member

mbauman commented Oct 7, 2017

Very interesting indeed. Still pondering this. One more major breaking point: Allowing non-scalar indexing on Associatives means you'll no longer be able to have arrays or associatives as keys. That is,

julia> d = Dict(:a=>1, :b=>2, :c=>3, [:c,:a]=>4)
Dict{Any,Int64} with 4 entries:
  :a             => 1
  :b             => 2
  :c             => 3
  Symbol[:c, :a] => 4

julia> d[[:c,:a]]
4

@andyferris
Copy link
Member Author

@mbauman Yes this is an interesting case. Note that for cases where the indices are strongly typed (such as Dict{Vector{Symbol}, Int}) we can catch this correctly at dispatch, so not all is lost. What to do exactly when the index type is not concrete or not known is a bit less clear. (I would tend to try and prioritize scalar indexing so this isn't a breaking change at all, but this might introduce a run-time cost in certain situations. However, if the index type is Any, well, things are probably going to be a bit slow anyway).

Also - it would help if I knew of a use case for such index types. I know we've worried about hashing of vectors and unit ranges and so-on, but I haven't come across a use for this yet...

@mschauer
Copy link
Contributor

mschauer commented Oct 8, 2017

In any case we need a nicer way to express that an array should be treated as scalar in an expression where arrays usually are splashed, broadcasted or unboxed in other ways, compare the crutch Scalar from StaticArrays to prevent broadcast. If we figure out that on a high enough level, then also this julep benefits.

@mbauman
Copy link
Member

mbauman commented Oct 8, 2017

Yes, this is also currently a problem for non-scalar indexed assignments for arrays of arrays. That is, A[:] = 1:2 is always interpreted as A[1] = 1; A[2] = 2 and not fill!(A, 1:2). It can really be a pain for generic programming.

I have a very strong distaste for changing semantics based upon eltype changes.

@andyferris
Copy link
Member Author

That does seem like a tricky case... I'm wondering what the solution is. We could always favour scalar indexing unless we can prove it doesn't work, or something...

(It seems to me that the scalar case is the more fundamental kind of indexing (at least of getindex, I'm still torn on that setindex! example). @mschauer I was more wondering if we should force users to use Base.Slice or some-such to opt-in to non-scalar indexing in ambiguous cases, rather than ask them to opt-in to scalar indexing. OTOH - using something like Scalar might be easier to use?).

@vtjnash
Copy link
Member

vtjnash commented Oct 9, 2017

Make Associatives be containers of values, not of index=>value pairs

Then this might no longer be an Associative data-structure. It’s dangerous to generalize the behavior of a concretion, Dict, to the properties of all abstract Associative data structures. An Associative Array is defined as "a collection of (key, value) pairs" (https://en.wikipedia.org/wiki/Associative_array). If you pick that collection to be an unordered Set over the keyspace, then you get a Dict, but if you pick it to be something else, you can still get an Associative subtype. Some other collections that can hold associative pairs include a binary tree (making a SortedDict), an array (making a MultiDict, or the ImmutableDict implementation in Base).

Wikipedia also notes some examples of common Associative data-structures which are not uniquely indexable, including HTML queries and header. They are ordered and can be iterated as pairs, but attempting to index them may return a non-unique result (one name for these is: ordered-multimaps).

@mbauman
Copy link
Member

mbauman commented Oct 9, 2017

It's worth noting that if I were developing non-scalar indexing from scratch right now, I'd totally detangle the scalar and non-scalar behaviors with dot-broadcasting. A[I,J,K] actually behaves very much like the indices are broadcast together, except that J is projected into a higher dimension such that it is always completely orthogonal to I, and K even higher, and so on. This is APL indexing in a nutshell.

Actually, this makes me realize that we could deprecate the non-scalar indexed assignment of many values to many indices to A[idxs] .= [values...]. Then we could reclaim A[idxs] = [value] to place the same array into many locations.

@andyferris
Copy link
Member Author

andyferris commented Oct 9, 2017

@vtjnash. It is interesting to consider all the possible data structures here. I guess in some senses I've mostly thought Associative to mean "indexable". To expand, from Wikipedia:

In computer science, an associative array, map, symbol table, or dictionary is an abstract data type composed of a collection of (key, value) pairs, such that each possible key appears at most once in the collection.

I'm pretty sure the uniqueness of the keys is pretty important here - we'll need to make some assumptions to make a good indexing interface suitable for generic programming. For my uses, I don't see any problem with using the pairs function as required - it's also useful for e.g. Arrays. The benefits to having for example map(indexable) and map(pairs(indexable)) and so-on work in a sensible (index-preserving) and consistent fashion with the same semantic for AbstractArray and Associative seem great (not only will some generic code work for both, but perhaps more importantly it reduces the cognitive load for the code author or reader). I do appreciate that it may be traditional to consider such associative maps as containers of key=>value pairs but I am questioning whether that is the best default choice for iteration for usability in Julia.

I consider the more "exotic" multi-maps and XML structures as being something a bit different to a Julia Associative (e.g. it seems to me that a MultiDict{K,V} is just as useful as a Dict{K,Vector{V}}, with perhaps a bit of an expanded interface to make life easy).

One interesting thing from the Wikipedia article that I have been thinking about is that there is a difference between updating the value corresponding to an existing index, and adding a new index to the collection. It's extremely convenient that I can add new indices to a Dict with setindex! but when reading code I do sometimes get confused which operation is being performed (and of course there is no run-time safety here either, to constrain it to one operation or the other). Not sure how/if we should deal with this somehow (I'm thinking something like push!(collection, index, value)).

@andyferris
Copy link
Member Author

Actually, this makes me realize that we could deprecate the non-scalar indexed assignment of many values to many indices to A[idxs] .= [values...]

This seems like a reasonable idea to me. (It reminds me, I've of done something like @view(A[idxs]) .= some dot expression before to take advantage of dot-fusion, so it would be nice to get the fusion for free without the @view).

@mbauman
Copy link
Member

mbauman commented Oct 9, 2017

That should work just fine without the @view:

julia> expand(:(A[idxs] .= sqrt.(B)))
:((Base.broadcast!)(sqrt, (Base.dotview)(A, idxs), B))

dotview is a view for non-scalar indexing expressions, regular getindex otherwise.

@andyferris
Copy link
Member Author

Oh, that's nice to know :)

You've now got me wondering if it should be A[i] .= sqrt.(B) for scalar i and A.[inds] .= sqrt.(B) for non-scalar inds (and A[i] vs. A.[i] more generally... I think this is what you were saying above... hugely breaking I know but still interesting... and if we get @ for field referencing from #21912 then something like A@[inds] for view(A, inds)... mind blown. Pity it's a bit of an ASCII scramble though).

@mbauman
Copy link
Member

mbauman commented Oct 9, 2017

The major issue with A.[] is logical indexing and other representations of many indices that aren't just arrays of Ints. We're getting a bit far afield from your original idea here, though. We can keep discussing it over at #22858.

@mschauer
Copy link
Contributor

mschauer commented Oct 9, 2017

Only because out[i] = a[b[i]] or out[I] = [out[i] for i in I] is the design principle in the "Bracket calculus" document I wrote last year, allow me to link it here: https://gist.github.com/mschauer/b04e000e9d0963e40058, see e.g. https://gist.github.com/mschauer/b04e000e9d0963e40058#indexing-of-arrays

@vtjnash
Copy link
Member

vtjnash commented Oct 9, 2017

map(indexable) and map(pairs(indexable)) and so-on work in a sensible (index-preserving) and consistent fashion with the same semantic for AbstractArray and Associative seem great

I think there's two operations here being described as map: I'll call them enumerable-map and associative-map. Usually in Julia, map refers to enumerable-map (e.g. map(identity, (i for i in 1:10)) is perfectly valid, and is not an indexable container). However, we can also define an associative map operation that applies the function just to the value as follows: associative-map(f, a) = map( (k => v) -> (k => f(v)), a ). In other languages, this function seems to often be called mapvalues.

We can extend this function to multiple associative collections by making the definition varargs: associative-map(f, a1, as...) = map( (k => v1) -> (k => f(v1, map(a -> a[k], as)...)), a1 )

@StefanKarpinski
Copy link
Member

StefanKarpinski commented Oct 10, 2017

I like this a lot. The disconnect between associatives and arrays has always bothered me. I like the notion of array as a specialization of associative collections. The most visible cost of changing this would be having to write for (k, v) in pairs(d) instead of for (k, v) in d but I think that's a pretty negligible cost for all the benefits it brings. In particular, I think that if you want to address the question of "how should dicts iterate?" you fundamentally have to think about "how should map work on dicts?" since those two are so deeply connected. The current map has several issues:

  1. It's unclear that map(f, dict) should return another dict at all. What if f doesn't return pairs?
  2. What if the keys returned by f are non-unique? There are several different possible behaviors in that case, none of which is obviously right.
  3. The 0.6 behavior where map(f, dict) passes key-value pairs to f as a pair of arguments is convenient but fundamentally wrong if we consider dicts to be collections of key-value pairs. The 0.7 behavior where map(f, dict) passes key-value pairs as a single pair argument is correct with respect to iteration of key-values pairs but super annoying to use.

If we change the model for associative collections from being a collection of pairs to being an indexed collection of values and accordingly iterate values instead of pairs, that would clarify the behavior of map(f, dict): it transforms values and that's all. It's obvious that it should return a dict because it leaves the indexing structure of the input untouched just as it does for arrays. We could introduce mappairs and mapkeys functions, but I'm not sure that we need to since we can just write Dict(f(k) => g(v) for (k, v) in pairs(d)) with appropriate f and g.

@mbauman
Copy link
Member

mbauman commented Oct 10, 2017

Another question here: should we continue to treat d[1,2] as d[(1,2)]? Given that we've never seen anyone even consider requesting such a monster, I'm not sure it's worth burning time on a multidimensional dictionary that would have a tuple of array-like indices and have values that span the cartesian product thereof.

But we may want to deprecate the implicit tuplization behavior here in order to make the two more similar.

@TotalVerb
Copy link
Contributor

That syntax has always seemed to me like a convenient poor man's sparse matrix.

@JeffBezanson
Copy link
Member

What if f doesn't return pairs?

That's the smoking gun. The usual structure of a map operation is (A->B) -> Collection[A] -> Collection[B]. For Dict, A is a Pair type, but since B can be anything this structure does not match. But if we take Dict{K} as the Collection type, it matches the structure of mapping over the values.

@cstjean
Copy link
Contributor

cstjean commented Oct 11, 2017

Also - it would help if I knew of a use case for such index types. I know we've worried about hashing of vectors and unit ranges and so-on, but I haven't come across a use for this yet...

Memoization is a big one.

Index an Associative{K,V} with an AbstractArray{K,N} to get an AbstractArray{V,N}. E.g. Dict(:a=>1, :b=>2, c:=>3)[[:c, :a]] == [3,1].

Mappings and functions are the same thing, so why not replace your proposed indexing with function application: Dict(:a=>1, :b=>2, :c=>3) ∘ [:c, :a] == [3,1]?

It would be error-prone to write generic code where dict[x] is either a look-up, or APL-style indexing, depending on x's type.

@andyferris
Copy link
Member Author

andyferris commented Oct 21, 2017

I've been thinking about sets and keys. Here's a relatively sane property that I'm considering which may be desirable across arrays and associatives:

a[keys(a)] == a

That is, both the keys and values will be preserved by this indexing operation. (Perhaps this will be a.[keys(a)] == a if @mbauman has his way).

To achieve this:

  1. keys(a) returns an array or associative which is an "identity mapping" (I made up that term). If b is an identity mapping then b[i] == i for all i in keys(b). Base.OneTo is an example of an identity mapping (and I see no reason why a CartesianRange can't be an array supporting getindex). From memory, I think @timholy might have made some custom unit-range objects which have this property for offset arrays.
  2. AbstractSets are identity mappings, and we should in theory be able to have AbstractSet{T} <: Associative{T,T}. Set can follow the same internal implementation as current, but support getindex and so-on. We can still talk about element in set if we have 3.
  3. AbstractSets and Associatives will iterate values, not key-value pairs. We won't need a values function.
  4. We follow the indexing property in the OP (possibly implemented later, as the triage people indicate).

I think if we have 1-3 for v1.0, things may be pretty consistent and tied together nicely and ready for 4 in the future.

The final thing I'd suggest for v1.0 is to allow parsing (not lowering) of a.[i] so that we can experiment with macros in packages on how to implement the OP without doing weird stuff to e.g. indexing of Dict{Any}s.

@vtjnash
Copy link
Member

vtjnash commented Oct 21, 2017

a[keys(a)] == a

This property is true of arrays, but is not true of all associative containers. One counter-example is a multi-dicts, but another more commonly encountered counter-example is the min-heap.

@StefanKarpinski
Copy link
Member

StefanKarpinski commented Oct 21, 2017

You keep talking about multi-dicts, but they are not, in my view, a thing. If m is a "multidict" then what does m[k] return? Multiset, sure, that's a thing – it's an unordered collection of values, which can have repeats, in which one can test containment/count in O(1) time – also known as a Dict{K,Int}. If by "multi-dict" you mean something that maps keys to sets of values, then that's a Dict{K,Set{V}}.

@vtjnash
Copy link
Member

vtjnash commented Oct 21, 2017

but they are not, in my view, a thing

You also keep trying to claim this, despite that Julia provides an implementation of one (https://github.com/JuliaLang/julia/blob/master/base/dict.jl#L747), and which also fails to conform to your suggestion that it's just a "Dict{K, Set{V}}"

@JeffBezanson
Copy link
Member

An ImmutableDict is "persistent" (I believe that term is used in Clojure) in that previous mappings aren't destroyed. But I think the fact that this is exposed by iteration is more of a bug.

@andyferris
Copy link
Member Author

another more commonly encountered counter-example is the min-heap

What does indexing mean here? Wouldn't the user-facing interface be that of a priority queue (i.e. deque- or bag-like) rather than as an indexable?

The OP is specifically referring to collections which provide mappings from a unique set of keys to a single value, which is what I thought Associative{K,V} tries to encapsulate. You can of course use that to represent more complex data structures if the values are themselves containers, or you can make your own abstract type which isn't an Associative with an appropriate interface for non-unique keys.

@vtjnash
Copy link
Member

vtjnash commented Oct 29, 2017

But I think the fact that this is exposed by iteration is more of a bug.

It's entirely reasonable to additionally define and create another persistent dict type without needing to first accuse a different data structure of being a bug. It's a feature that we can also implement a ordered multi-dict with the same interface as other associative mappings.

which is what I thought Associative{K,V} tries to encapsulate

By restricting the classification, it'll be a tradeoff. As with any classification system, making the definition more specific ("a unique mapping from a key to a single value") sometimes makes it more useful (you can assume indexing is possible). But it also then excludes other possibilities (the counter-examples I gave), which now will form some new categories. In these cases, these perhaps might be a Queue{ElementT} (or perhaps more generally an Iterable{ElementT}) – which currently is just represented as an interface (push/pop or iterate) over other structures – and a MultiMap{KeyT, ValueT}. We can change the definition of these classes pretty much however we want – there isn’t one right or wrong way to define them – but we should at least acknowledge there are repercussions of the changes: both pros and cons.

The OP is specifically referring to collections which provide mappings from a unique set of keys to a single value

There are alternative ways of writing the OP which don’t make this choice of definition. Here’s one generic implementation option that instead assumes the existence of an appropriate definition for similar and of pair iteration:

function getindex.(iterable, keys::Associative)
    newmap = similar(first(eltype(keys)) => last(eltype(pairs(iterable)), iterable)
    for (newkey, oldkey) in keys
        for (key, value) in pairs(iterable)
            if key == oldkey
                push!(newmap, newkey => value)
            end
       end
    end
    return newmap
end
function getindex.(iterable, keys::AbstractArray)
    newarray = similar(last(eltype(pairs(iterable))), iterable)
    for oldkey in keys
        for (key, value) in pairs(iterable)
            if key == oldkey
                push!(newarray, value)
            end
       end
    end
    return newarray
end

@StefanKarpinski StefanKarpinski removed the triage This should be discussed on a triage call label Nov 16, 2017
@andyferris
Copy link
Member Author

FYI I've started some prototyping at https://github.com/andyferris/Indexing.jl

@andyferris
Copy link
Member Author

andyferris commented Dec 19, 2017

I think the Indexing.jl prototype is working pretty well now. It's not too much code (it probably needs a bit more code to make it really fast) so it doesn't seem too scary to port to Base at some point.

It's probably a good time to think about future syntax, in case depredations are necessary. I would suggest ending up with something like:

a[i]               --> getindex(a, i)       # scalar only
a.[inds]           --> getindices(a, inds)  # or view(a, inds)?
a[i] = v           --> setindex!(a, v, i)   # scalar only
a.[inds] = v       --> setindices!(a, v, inds)
a[i] .= v          --> broadcast!(identity, getindex(a, i), v)
a.[inds] .= values --> broadcast!(identity, view(a, inds), values)

Note the lack of dotview and maybeview. The last two could support dot-fusion on the RHS. Also, the default for a.[inds] could potentially move to view rather than getindices (or we can make it dovetail with lazy broadcasting a la #23692 (comment) nicely).

IIUC we'd need to deprecate dotview and perhaps a semantically-ambiguous setindex! method on arrays (this one noted by @mbauman) in v0.7, if we want the above to appear in v1.x. Thoughts?

@davidavdav
Copy link
Contributor

FYI, the second bullet point has been implemented in NamedArrays:

n = NamedArrays.NamedArray(rand(3), ([:a, :b, :c],))
@show n[[:c, :a]];
n[[:c, :a]] = 2-element Named Array{Float64,1}
A  │ 
───┼─────────
:c0.410735
:a0.253642

(ended up here searching for a way to extend REPL completion for Associative-like structures)

@andyferris
Copy link
Member Author

andyferris commented Feb 5, 2018

the second bullet point has been implemented

Technically, the rules described further in the OP might imply that indexing with a Vector would return a Vector, so you'd need to index with a named array like:

n[NamedArray([:a, :c], ([:a, :c]))

which isn't exactly ergonomic for your purposes!

(The way I think about it is the rules described in the OP is basically equivalent to getindices(a, inds) = broadcast(i -> a[i], inds), so the output shares a lot of similarity with inds - in fact, if broadcast on Associative worked on values not pairs then we could simply make a.[inds] lower to the RHS above and not have a getindices function at all).

@davidavdav
Copy link
Contributor

Technically, the rules described further in the OP might imply that indexing with a Vector would return a Vector

I wasn't really suggesting that NamedArray is a satisfying solution to your OP. In fact I struggled with how to implement n[[:a :c; :b :a]]. Currently, this still gives a NamedArray, but I had to make up the names of the indices. In the 1D-index case, the names could be obtained from the original NamedArray, but with 2D indexing this no longer makes sense.

I suppose that for type stability reasons (I probably am the most type instable programmer ever) NamedArrays (taking the role of the Associative) should implement your OP, and always produce the type of the indexing array with elements of the type of the indexed array.

@andyferris
Copy link
Member Author

I thought it might be relevant to reference Dictionaries.jl which was recently released, and implements a system for dictionary non-scalar indexing in combination with Indexing.jl.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
collections Data structures holding multiple items, e.g. sets design Design of APIs or of the language itself julep Julia Enhancement Proposal
Projects
None yet
Development

No branches or pull requests