This is a reimplementation of getindex/setindex! based on the implementation in Base. N-d indexing now works, as does setting part of a DataArray to be another DataArray that contains NAs. Performance of a[1:end] = a[1:end] is roughly 2x Base. I think this could still be improved by hoisting loads of the fields of the DataArray/PooledDataArray, but I need to see if that's possible without making the code substantially more complicated. Indexing with a DataArray that contains NAs now throws. I removed some tests that were testing indexing operations that now throw a BoundsError. (These indexing operations also throw a BoundsError on Arrays.)
There are some further optimizations I should do here to make use of the pool for
That's a ream of red 👍
This is pretty inspiring work.
The optimizations mentioned above turned out to be fairly straightforward, but now I have a question. When indexing into a PooledDataArray with an AbstractVector of indices, should the returned PooledDataArray always have the same pool, or should the pool contain only the elements that are actually present in the indexed subset? I suppose this depends in part on how we deal with #73, but for now, which behavior is preferable?
I've been thinking about this a lot. I suspect we're going to need to have the pool track the expected levels of a factor, rather than the observed levels. So I'd keep the whole pool around.
That was my thought, too.
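To make the two options concrete, here is a minimal sketch in plain Julia. `ToyPooled`, `subset`, and `observed` are hypothetical names invented for illustration; they are not the DataArrays types or methods, and the real implementation may differ. The sketch assumes a pooled array stores integer references into a pool of levels, with 0 marking a missing value.

```julia
# Hypothetical stand-in for a pooled array; NOT the actual PooledDataArray type.
struct ToyPooled{T}
    refs::Vector{Int}   # refs[i] indexes into pool; 0 marks a missing value (assumption)
    pool::Vector{T}     # expected levels, whether or not each one is observed
end

# Indexing with a vector of indices copies the selected refs and reuses
# the whole pool, so levels absent from the subset are still retained.
subset(p::ToyPooled, inds::AbstractVector{<:Integer}) =
    ToyPooled(p.refs[inds], p.pool)

# Levels that actually occur in the refs, skipping the missing marker.
observed(p::ToyPooled) = [p.pool[r] for r in sort(unique(p.refs)) if r != 0]

p = ToyPooled([1, 2, 3, 1, 0], ["a", "b", "c"])
s = subset(p, 1:2)
```

Here `s.pool` still holds all three levels, while `observed(s)` reports only the two that appear in the subset. Keeping the whole pool means the subset carries the expected levels rather than the observed ones.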
- Extract fields from DataArray and PooledDataArray before looping. This avoids checking repeatedly for undefined references.
- Keep the whole pool for getindex for PooledDataArray.
- Faster setindex! for a PooledDataArray into another PooledDataArray.
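The first item can be sketched roughly as follows. `ToyData` and `copy_range!` are hypothetical names used only for illustration, and the field names `data` and `na` are assumptions; this is not the code from the commit. The point is only the pattern of loading the fields into locals once, before the loop, rather than on every iteration.

```julia
# Hypothetical stand-in for a data array with a missingness mask.
mutable struct ToyData{T}
    data::Vector{T}
    na::Vector{Bool}    # na[i] == true means element i is missing (assumption)
end

function copy_range!(dest::ToyData, src::ToyData, r::AbstractUnitRange{<:Integer})
    # Load the fields once, outside the loop.
    ddata, dna = dest.data, dest.na
    sdata, sna = src.data, src.na
    for i in r
        dna[i] = sna[i]           # always copy the missingness flag
        if !sna[i]
            ddata[i] = sdata[i]   # copy the value only when it is present
        end
    end
    return dest
end

a = ToyData([1, 2, 3], [false, true, false])
b = ToyData([0, 0, 0], [false, false, false])
copy_range!(b, a, 1:3)
```

After the call, `b.na` matches `a.na`, and the value at the missing position in `b.data` is left untouched.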
@johnmyleswhite What would you like to do about the documentation here? My feeling is that it's not really necessary as long as indexing a DataArray does the same thing as indexing an Array. But since I had to add back a couple methods that you'd previously documented in order to fix a dispatch issue (the indexing implementation in Base was getting called instead), I can copy/paste the docs from the old implementation if you'd like.
I think we can ditch the documentation since it's easy enough to rebuild.
Preserving the
Any final thoughts before I merge this?
I'd say go ahead.
I also removed some comments, many of which referred to methods that are no longer needed. Most forms of getindex/setindex! are now implemented with a single method. This will fix #69, fix #39, and close #47.