Skip to content
This repository has been archived by the owner on May 4, 2019. It is now read-only.

Clean up indexing, take 2 #104

Merged
merged 6 commits into from
Jul 11, 2014
Merged

Clean up indexing, take 2 #104

merged 6 commits into from
Jul 11, 2014

Conversation

simonster
Copy link
Member

This is a reimplementation getindex/setindex! based on the implementation in Base. nd indexing now works, as does setting part of a DataArray to be another DataArray that contains NAs.

Performance of a[1:end] = a[1:end] is roughly 2x Base. I think this could still be improved by hoisting loads of the fields of the DataArray/PooledDataArray, but I need to see if that's possible without making the code substantially more complicated.

Indexing with a DataArray that contains NAs now throws. I removed some tests that were testing indexing operations that now throw a BoundsError. (These indexing operations also throw a BoundsError on Arrays.) I also removed some comments, many of which referred to methods that are no longer needed. Most forms of getindex/setindex! are now implemented with a single method.

This will fix #69, fix #39, and close #47.

This is a reimplementation getindex/setindex! based on the
implementation in Base. nd indexing now works, as does setting part
of a DataArray to be another DataArray that contains NAs.

Performance of a[1:end] = a[1:end] is roughly 2x Base. I think this
could still be improved by hoisting loads of the fields of the
DataArray/PooledDataArray, but I need to see if that's possible without
making the code substantially more complicated.

Indexing with a DataArray that contains NAs now throws. I removed
some tests that were testing indexing operations that now throw a
BoundsError. (These indexing operations also throw a BoundsError on
Arrays.)
@simonster simonster mentioned this pull request Jul 9, 2014
@simonster
Copy link
Member Author

There are some further optimizations I should do here to make use of the pool for getindex for PooledDataArrays and setindex! of a PooledDataArray into a PooledDataArray.

@garborg
Copy link
Member

garborg commented Jul 9, 2014

That's a ream of red 👍

@johnmyleswhite
Copy link
Member

This is pretty inspiring work.

@simonster
Copy link
Member Author

The optimizations mentioned above turned out to be fairly straightforward, but now I have a question. When indexing into a PooledDataArray with an AbstractVector of indices, should the returned PooledDataArray always have the same pool, or should the pool contain only the elements that are actually present in the indexed subset? I suppose this depends in part on how we deal with #73, but for now, which behavior is preferable?

@johnmyleswhite
Copy link
Member

I've been thinking about this a lot. I suspect we're going to need to have the pool track the expected levels of a factor, rather than the observed levels. So I'd keep the whole pool around.

@coveralls
Copy link

Coverage Status

Coverage decreased (-0.16%) when pulling 96eea8c on sjk/indexing2 into aca6c87 on master.

@garborg
Copy link
Member

garborg commented Jul 9, 2014

That was my thought, too.

- Extract fields from DataArray and PooledDataArray before looping.
  This avoids checking repeatedly for undefined references.
- Keep the whole pool for getindex for PooledDataArray.
- Faster setindex! for a PooledDataArray into another PooledDataArray.
@coveralls
Copy link

Coverage Status

Coverage decreased (-0.45%) when pulling abc0061 on sjk/indexing2 into aca6c87 on master.

@simonster
Copy link
Member Author

@johnmyleswhite What would you like to do about the documentation here? My feeling is that it's not really necessary as long as indexing a DataArray does the same thing as indexing an Array. But since I had to add back a couple methods that you'd previously documented in order to fix a dispatch issue (the indexing implementation in Base was getting called instead), I can copy/paste the docs from the old implementation if you'd like.

@johnmyleswhite
Copy link
Member

I think we can ditch the documentation since it's easy enough to rebuild.

@coveralls
Copy link

Coverage Status

Coverage increased (+0.25%) when pulling e7ebe30 on sjk/indexing2 into aca6c87 on master.

@coveralls
Copy link

Coverage Status

Coverage increased (+0.26%) when pulling f2a403a on sjk/indexing2 into aca6c87 on master.

@nalimilan
Copy link
Member

Preserving the PooledDataArray pool when indexing is terribly frustrating and IMHO quite useless for the user, but it's required for fast arrays views where you don't want to traverse the whole array to check which levels should be kept. If array views are to become the default in the future, then I agree we should keep the pool as-is.

@coveralls
Copy link

Coverage Status

Coverage increased (+0.22%) when pulling 5b4bb42 on sjk/indexing2 into aca6c87 on master.

@simonster
Copy link
Member Author

Any final thoughts before I merge this?

@johnmyleswhite
Copy link
Member

I'd say go ahead.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Implement logical indexing with nd arrays Don't allow NA inside indices
5 participants