Clean up indexing, take 2 #104

simonster · 2014-07-09T02:39:10Z

This is a reimplementation of getindex/setindex! based on the implementation in Base. N-dimensional indexing now works, as does setting part of a DataArray to another DataArray that contains NAs.

`a[1:end] = a[1:end]` takes roughly twice as long as it does for a Base Array. I think this could still be improved by hoisting loads of the fields of the DataArray/PooledDataArray out of the loops, but I need to see if that's possible without making the code substantially more complicated.

Indexing with a DataArray that contains NAs now throws. I removed some tests of indexing operations that now throw a BoundsError. (These indexing operations also throw a BoundsError on Arrays.) I also removed some comments, many of which referred to methods that are no longer needed. Most forms of getindex/setindex! are now implemented with a single method.

This will fix #69, fix #39, and close #47.
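The NA behaviour described above can be illustrated with a small Python sketch. It assumes a DataArray-like vector is a data buffer plus a parallel Boolean NA mask; the names `NA`, `MaskedVector`, `getindex` and `setindex` are hypothetical, indices are 0-based, and this is not the PR's Julia code. Assignment copies the NA bit along with the value, and indexing with a vector that contains NA raises (a `ValueError` here, which is an arbitrary choice for the sketch).

```python
# Hypothetical sketch: a DataArray-like vector as a data buffer plus an NA mask.
# Not the DataArrays.jl API; indices are 0-based as in Python.


class _NAType:
    """Singleton marker standing in for a missing value."""

    def __repr__(self):
        return "NA"


NA = _NAType()


class MaskedVector:
    def __init__(self, data, na):
        self.data = list(data)  # stored values (placeholder where missing)
        self.na = list(na)      # True where the element is NA

    @classmethod
    def from_items(cls, items, fill=0):
        # NA items set the mask bit; their data slot gets a placeholder value.
        return cls([fill if x is NA else x for x in items],
                   [x is NA for x in items])

    def __len__(self):
        return len(self.data)

    def to_list(self):
        return [NA if m else d for d, m in zip(self.data, self.na)]

    def getindex(self, inds):
        # Indexing with a vector that contains NA is refused outright.
        if isinstance(inds, MaskedVector):
            if any(inds.na):
                raise ValueError("cannot index with a vector containing NA")
            inds = inds.data
        # Load the fields once, outside the loop.
        data, na = self.data, self.na
        return MaskedVector([data[i] for i in inds], [na[i] for i in inds])

    def setindex(self, src, inds):
        # Copy the NA bit along with the value, so NAs in src carry over.
        if len(src) != len(inds):
            raise ValueError("dimension mismatch")
        for j, i in enumerate(inds):
            self.data[i] = src.data[j]
            self.na[i] = src.na[j]
```

For example, assigning `MaskedVector.from_items([10, NA])` into positions `[1, 2]` of `MaskedVector.from_items([1, 2, 3, 4])` leaves the vector as `[1, 10, NA, 4]`.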


simonster · 2014-07-09T12:34:55Z

There are some further optimizations I should do here to make use of the pool for getindex for PooledDataArrays and setindex! of a PooledDataArray into a PooledDataArray.

garborg · 2014-07-09T15:16:03Z

That's a ream of red 👍

johnmyleswhite · 2014-07-09T16:04:33Z

This is pretty inspiring work.

simonster · 2014-07-09T22:38:44Z

The optimizations mentioned above turned out to be fairly straightforward, but now I have a question. When indexing into a PooledDataArray with an AbstractVector of indices, should the returned PooledDataArray always have the same pool, or should the pool contain only the elements that are actually present in the indexed subset? I suppose this depends in part on how we deal with #73, but for now, which behavior is preferable?

johnmyleswhite · 2014-07-09T22:40:17Z

I've been thinking about this a lot. I suspect we're going to need to have the pool track the expected levels of a factor, rather than the observed levels. So I'd keep the whole pool around.

coveralls · 2014-07-09T22:45:32Z

Coverage decreased (-0.16%) when pulling 96eea8c on sjk/indexing2 into aca6c87 on master.

garborg · 2014-07-09T22:45:42Z

That was my thought, too.
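The two pool behaviours discussed above can be contrasted in a Python sketch. It assumes a pooled vector stores integer refs into a pool of levels, with ref 0 encoding NA; that layout is an assumption for illustration, not taken from the PR. `PooledVector`, `getindex` and `dropunused` are hypothetical names, `None` stands in for NA, and indices are 0-based.

```python
# Hypothetical sketch of a pooled vector: integer refs into a pool of levels.
# Ref 0 encodes NA; None stands in for NA at the Python level.


class PooledVector:
    def __init__(self, refs, pool):
        self.refs = list(refs)  # 1-based positions into pool, 0 = NA
        self.pool = pool        # list of distinct levels

    @classmethod
    def from_items(cls, items):
        pool, lookup, refs = [], {}, []
        for x in items:
            if x is None:
                refs.append(0)
                continue
            if x not in lookup:
                pool.append(x)
                lookup[x] = len(pool)
            refs.append(lookup[x])
        return cls(refs, pool)

    def to_list(self):
        return [None if r == 0 else self.pool[r - 1] for r in self.refs]

    def getindex(self, inds):
        # Keep the whole pool: cost is proportional to len(inds) and the
        # result reuses the parent's pool list unchanged.
        return PooledVector([self.refs[i] for i in inds], self.pool)

    def dropunused(self):
        # Alternative: keep only levels actually referenced, in pool order.
        # This needs an extra pass over every ref.
        used = sorted({r for r in self.refs if r != 0})
        remap = {old: new for new, old in enumerate(used, start=1)}
        return PooledVector([remap.get(r, 0) for r in self.refs],
                            [self.pool[old - 1] for old in used])
```

Selecting positions `[0, 3]` from `["a", "b", "c", "a"]` gives `["a", "a"]` with pool `["a", "b", "c"]` under the keep-the-pool behaviour, and pool `["a"]` after `dropunused()`.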

- Extract fields from DataArray and PooledDataArray before looping. This avoids checking repeatedly for undefined references.
- Keep the whole pool for getindex for PooledDataArray.
- Faster setindex! for a PooledDataArray into another PooledDataArray.
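One plausible way to make assigning one pooled array into another fast, sketched here in Python and not necessarily what the commit does, is to translate the source pool into destination pool positions once and then copy refs through that table; the field loads are hoisted out of the loop in the spirit of the first bullet. `PooledVector` and `setindex_pooled` are hypothetical names, ref 0 encodes NA, `None` stands in for NA, and indices are 0-based.

```python
# Hypothetical sketch: pool-to-pool assignment via a one-time ref translation table.


class PooledVector:
    def __init__(self, refs, pool):
        self.refs = list(refs)  # 1-based positions into pool, 0 = NA
        self.pool = list(pool)  # list of distinct levels

    def to_list(self):
        return [None if r == 0 else self.pool[r - 1] for r in self.refs]


def setindex_pooled(dest, src, inds):
    """Assign src into dest at positions inds, translating refs between pools."""
    if len(src.refs) != len(inds):
        raise ValueError("dimension mismatch")
    # Hoist field loads out of the loops.
    dest_refs, dest_pool, src_refs = dest.refs, dest.pool, src.refs
    lookup = {level: k for k, level in enumerate(dest_pool, start=1)}
    # Translate each source level once, appending levels dest lacks.
    # Slot 0 of the table maps NA to NA.
    remap = [0]
    for level in src.pool:
        if level not in lookup:
            dest_pool.append(level)
            lookup[level] = len(dest_pool)
        remap.append(lookup[level])
    # The per-element loop is now a plain integer table lookup.
    for j, i in enumerate(inds):
        dest_refs[i] = remap[src_refs[j]]
```

This version appends every source level missing from the destination, including levels not present among the assigned elements, which matches the keep-the-whole-pool choice above.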

coveralls · 2014-07-09T22:53:30Z

Coverage decreased (-0.45%) when pulling abc0061 on sjk/indexing2 into aca6c87 on master.

simonster · 2014-07-09T23:39:40Z

@johnmyleswhite What would you like to do about the documentation here? My feeling is that it's not really necessary as long as indexing a DataArray does the same thing as indexing an Array. But since I had to add back a couple of methods that you'd previously documented in order to fix a dispatch issue (the indexing implementation in Base was getting called instead), I can copy/paste the docs from the old implementation if you'd like.

johnmyleswhite · 2014-07-09T23:41:28Z

I think we can ditch the documentation since it's easy enough to rebuild.

coveralls · 2014-07-09T23:53:48Z

Coverage increased (+0.25%) when pulling e7ebe30 on sjk/indexing2 into aca6c87 on master.

coveralls · 2014-07-10T00:04:39Z

Coverage increased (+0.26%) when pulling f2a403a on sjk/indexing2 into aca6c87 on master.

nalimilan · 2014-07-10T07:21:51Z

Preserving the PooledDataArray pool when indexing is terribly frustrating and IMHO quite useless for the user, but it's required for fast array views, where you don't want to traverse the whole array to check which levels should be kept. If array views are to become the default in the future, then I agree we should keep the pool as-is.

coveralls · 2014-07-10T17:06:07Z

Coverage increased (+0.22%) when pulling 5b4bb42 on sjk/indexing2 into aca6c87 on master.

simonster · 2014-07-10T23:09:25Z

Any final thoughts before I merge this?

johnmyleswhite · 2014-07-11T02:54:19Z

I'd say go ahead.

Clean up indexing, take 2

simonster mentioned this pull request Jul 9, 2014

WIP: Clean up indexing #62

Closed

Fix indexing bugs and add more comprehensive tests

dc31c09

Reorganize indexing functions, remove inline getindex/setindex! docs

e7ebe30

One more PooledDataArray optimization

f2a403a

Use setindex! to implement append!

5b4bb42

simonster added a commit that referenced this pull request Jul 11, 2014

Merge pull request #104 from JuliaStats/sjk/indexing2

157b34c

Clean up indexing, take 2

simonster merged commit 157b34c into master Jul 11, 2014

simonster deleted the sjk/indexing2 branch July 11, 2014 03:24
