Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NegatedIndex type #1032

Closed
StefanKarpinski opened this issue Jul 10, 2012 · 31 comments
Closed

NegatedIndex type #1032

StefanKarpinski opened this issue Jul 10, 2012 · 31 comments
Labels
status:help wanted Indicates that a maintainer wants help on an issue or pull request

Comments

@StefanKarpinski
Copy link
Sponsor Member

Idea spawned by this discussion. The idea is that just as writing v[b], v[!b] where b is a logical indexing vector partitions v into two disjoint sets of elements, writing v[x], x[!x] should also partition v into two disjoint sets of elements when x is a single index or a range of indices. This requires some new NegatedIndex type, together with ! methods for integers and ranges and the appropriate method for indexing arrays with NegatedIndex objects.

@JeffBezanson
Copy link
Sponsor Member

What does it do on integer vectors?

@StefanKarpinski
Copy link
Sponsor Member Author

I'm basically thinking that the structure looks like this:

type NegatedIndex{T}
    idx::T
end

So you just keep the original object until its time to do the actual indexing, at which point you know what you're "subtracting" the index object from, which lets you compute the actual index set and then do regular indexing with that. So ![2,5] would produce NegatedIndex([2,4]). In v[![2,5]], the indexing operations would compute [1:length(v)]-[2,5] as [1,3,5,...,length(v)] (supposing length(v) ≥ 5) and return [v[1],v[3],v[5],...,v[end]]. Make sense?

@ViralBShah
Copy link
Member

We do not define ! on non-booleans. That makes it possible for us to give it a different meaning, at least for Index types. But, ~ in matlab does have a defined behaviour, and that could lead to confusion.

That said, I do like the idea, and here are some thoughts.

ComplementIndex is an alternative name. We could alternatively allow \ to be used as a unary operator, to mirror the set complement operator.

A complement operator may even be useful outside of indexing, like in loop iterations and such.

@ViralBShah
Copy link
Member

Looking at the names in http://en.wikipedia.org/wiki/Complement_(set_theory), we could even just have a functional interface, using except, which returns the new Index type.

@StefanKarpinski
Copy link
Sponsor Member Author

I really think that the ! operator makes sense in this case, although I would be ok with writing except(k) as well. Actually, calling the type Except would make a fair amount of sense.

@ViralBShah
Copy link
Member

ExceptIndex would be clearer for the type name.

-viral

On 12-Jul-2012, at 11:40 PM, Stefan Karpinski wrote:

I really think that the ! operator makes sense in this case, although I would be ok with writing except(k) as well. Actually, calling the type Except would make a fair amount of sense.


Reply to this email directly or view it on GitHub:
#1032 (comment)

@toivoh
Copy link
Contributor

toivoh commented Jul 31, 2012

I think it seems a bit fishy to define v[!integer_array]:
would v[![1,2]] == v[![2,1]] == v[![1,2,1]] == v[3] == 3 for v=[1,2,3]?
The trouble seems to be to negate a sequence (array) rather than a set.
Perhaps better to allow v[int_set] and v[!int_set] for subsetting, (and keep v[!int_index])?

@StefanKarpinski
Copy link
Sponsor Member Author

It is a little weird, but I was thinking that you resolve, e.g. NegatedIndex([2,1]) when indexed into an indexing slot with range 1:n by iterating 1:n and then "emitting" the index value if and only if !contains([2,1],i). Does that make sense?

@toivoh
Copy link
Contributor

toivoh commented Aug 7, 2012

I'm a bit worried about surprises that might come out of this. E g you might expect a[!inds] to always give you a subsequence, but what if someone passed an inds::NegatedIndex argument into your function?

The way interpret it, !index is a set complement operation, with sets represented as strictly increasing sequences.
Perhaps one could refuse to create/use a NegatedIndex from a sequence that is not strictly increasing?
Then common cases like a[!2:5] would work, ! would be its own inverse, and

x, y = a[inds], a[!inds]

would always partition the elements of a.

@johnmyleswhite
Copy link
Member

This seems to have fizzled out, but I still really, really want this functionality to be added. It's one of the most convenient tricks in R for describing data resampling. See the jackknife for the simplest example of a statistical method in which this notation helps.

I just discovered that one of the other DataFrame developers hacked together an implementation of this style of indexing for the columns DataFrames simply because it was so unnatural to articulate one algorithm without it.

@punkrockpolly
Copy link

If I'm understanding this correctly, the negated index [!x] would strip the member whose position has the same value as the index [x], creating a vector slice with the member removed.

Can someone please explain the following:

"For example, the following creates a vector slice with the third member removed.![2,5] would produce NegatedIndex([2,4]) ...
and return [v[1],v[3],v[5],...,v[end]]."

Why wouldn't ![2,5] produce NegatedIndex([2,5]), and v[5] be removed instead of v[4]?? Is there something I'm misunderstanding?

@StefanKarpinski
Copy link
Sponsor Member Author

Yes, that's absolutely right. I just made a bunch of mistakes jumbled together in that post :-)

@punkrockpolly
Copy link

I'm looking into this.
Quick question: how do i define the ! operator to reference either the original array or it's size to figure out which indices are not idx in a[!idx]?

@timholy
Copy link
Sponsor Member

timholy commented Nov 8, 2013

You might not have to; define ! for the type of indexes you want to support (e.g., integers, ranges, and Vector{Int}) to produce a new type like NegatedIndex (or ComplementIndex). Then define the function getindex(a, idx::NegatedIndex).

@timholy
Copy link
Sponsor Member

timholy commented Nov 8, 2013

Of course, you'll also need to look at setindex! and consider how to handle multidimensional arrays. You may find that you need to add more than a couple of function definitions :-).

@JeffBezanson
Copy link
Sponsor Member

We might have to change the indexing code to explicitly combine dimension sizes with indexes to allow implementing things like this (and Colon). For example instead of

for i in idx
  ...

we'd write

for i in fullindex(idx, size(A,n))
  ...

@StefanKarpinski
Copy link
Sponsor Member Author

Jeff, can you elaborate on that? I'm failing to see the need for it – not that there isn't one, I'm just not seeing why.

@JeffBezanson
Copy link
Sponsor Member

Because there is no way to implement iteration of something like NegatedIndex([2]) without knowing the size of the indexed dimension.

@StefanKarpinski
Copy link
Sponsor Member Author

Yes, obviously you need a context for that. The part I needed clarification of was "change the indexing code" – what indexing code are you referring to?

punkrockpolly added a commit to punkrockpolly/Playing-with-Julia that referenced this issue Nov 10, 2013
Mostly works on the following index types: (Int, Range1{Int}, Range{Int})
Does not yet work for index of Array{Int}
@StefanKarpinski
Copy link
Sponsor Member Author

After looking at some code with @punkrockpolly, I'm wondering if the best approach here isn't to have a Complement{T} type that primarily provides an in method:

immutable Complement{T}
  collection::T
end

in(x,c::Complement) = !in(x,c.collection)

Then you can also do things like this:

getindex(v::AbstractVector, c::Complement) = v[filter!(i->in(i,c), 1:end)]

Of course, that doesn't cover everything, but I think the limited scope of the Complement abstraction clarifies things a bit.

StefanKarpinski added a commit that referenced this issue Nov 22, 2013
Paired on this with @punkrockpolly, see:

https://github.com/punkrockpolly/Playing-with-Julia/blob/master/negatedindex.jl

The premise of the Complement type is that it abstracts the
complement of a collection – primarily  that

	`x in c` <=> `!(x in c.collection)`

This commit punts on indexing for more than two dimensions because
that turns out to be an incredibly invasive change to huge amounts
the multidimensional array indexing code.
@JeffBezanson JeffBezanson mentioned this issue Aug 10, 2014
15 tasks
@ViralBShah
Copy link
Member

I am closing this, since this is something that can easily live in a package. Please reopen if you think otherwise.

@nalimilan
Copy link
Member

Well, if it was to be implemented, it would make much more sense in Base than in a package, since it would affect all indexing.

Would the recent changes in array indexing make this easier to implement efficiently for any number of dimensions? @timholy @mbauman?

@timholy
Copy link
Sponsor Member

timholy commented Apr 16, 2015

I'd suggest putting this into SubArray as a valid index type---eventually A[negated(1:15,3), :] would return a SubArray anyway.

@mbauman
Copy link
Sponsor Member

mbauman commented Apr 16, 2015

Well, #10525 would allow us to define this once and transform the complement to direct indices (or potentially even doing so without any intermediates).

One thing I realized the other day is that we almost have a negated index type already in base. It's the complement IntSet. If IntSets were not in the range 0:(typemax(Int)-1), but instead 1:typemax(Int), it'd actually simplify quite a bit of their implementation. And it'd make IntSet much more useful for indexing… especially if we start using BitArrays as their bit-storage.

@nalimilan
Copy link
Member

Cool. But AFAIK the order of elements is not defined for IntSet, while for indexing it really matters.

@mbauman
Copy link
Sponsor Member

mbauman commented Apr 16, 2015

It's documented as being sorted, which seems sensible.

@mbauman
Copy link
Sponsor Member

mbauman commented Apr 16, 2015

Ah, and yes, my changes in #10331 for indexing with Colon fix Jeff's concerns above.

@tpapp
Copy link
Contributor

tpapp commented Apr 5, 2016

Would #13157 allow "negated" indexes for array-like constructs with named indexes? (which map keys to indexes, like NamedArray, which already has its Not construct)

@mbauman
Copy link
Sponsor Member

mbauman commented Apr 5, 2016

Not directly, no. Both SubArray and fallback non-scalar indexing have converged to simply re-index into the stored or passed indices. The difficulty with using IntSets and complement types as indices is that they do not directly support fast indexed access. I don't want more special cases, so that means doing what we do with logical vectors: transform them to normal index vectors after checking their bounds.

Perhaps the only action item for Base is to change the Base.to_index API to accept the array and index dimension so there's enough context to transform a negated index. That would allow this to live in packages and work uniformly. It still requires some funny business, though, since a NegatedIndex must be an AbstractArray to dispatch properly, but it'd really only implement checkbounds and to_index.

@StefanKarpinski
Copy link
Sponsor Member Author

See https://github.com/mbauman/InvertedIndices.jl for an implementation of this idea as a package.

@nalimilan
Copy link
Member

Is there any chance this would be included in Base? People keep asking about it:
https://stackoverflow.com/questions/52378061/julias-negative-complement-indexing-like-r
https://stackoverflow.com/questions/42382210/array-range-complement

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
status:help wanted Indicates that a maintainer wants help on an issue or pull request
Projects
None yet
Development

No branches or pull requests

10 participants