-
-
Notifications
You must be signed in to change notification settings - Fork 403
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add .iloc and .ndloc integer indexing methods for Datasets #1435
Conversation
iloc
integer indexing interface for Datasets76db7c4
to
fd350c3
Compare
Still looking through this PR but my initial impression is that it looks good. Here are my first two questions:
|
Could do I suppose, not sure that's a better name though.
They match although I realized I still need to add boolean array indexing support (or at least add tests for it). |
I think naming it the same as the method (but capitalized) makes it instantly clear what the class is for. I agree that We already have a |
fd350c3
to
9efc3d9
Compare
I've now added a section to our |
I've now also added an .ndloc interface for gridded datasets. It works by applying integer indexing to the canonical value dimension orientations. Since our Image ndarray interface is flipped along the y-axis the indexing behavior there is probably unintuitive but it has to be consistent. Here's an example of creating an Image using the Image ndarray, xarray and gridded dictionary interfaces and applying arr = np.random.randn(5, 10)
ds = hv.Dataset({'x': range(10), 'y': range(5), 'z': arr}, kdims=['x', 'y'], vdims=['z'],
datatype=['grid'])
dict_img = hv.Image(ds, label='Gridded Dictionary')
xr_img = img.clone(datatype=['xarray'], label='XArray')
arr_img = hv.Image(arr[::-1], bounds=img.bounds, label='NdArray+Bounds')
(dict_img + dict_img.ndloc[1:3, 0:5] +
xr_img + xr_img.ndloc[1:3, 0:5] +
arr_img + arr_img.ndloc[1:3, 0:5]).cols(2) |
b91554c
to
af4353d
Compare
With my latest commit Image.sample now uses sheet2matrixidx and ndloc to efficiently sample the underlying array in continuous coordinates, this addresses issues with floating point precision on sampling and is considerably more efficient (addressing #1450), all existing sampling unit tests pass. I'll have to add some more direct unit tests and docstrings for ndloc though. |
7729e63
to
922accd
Compare
@jlstevens Ready for review. |
@@ -550,7 +550,7 @@ | |||
"source": [ | |||
"print(rgb_parrot)\n", | |||
"print(rgb_parrot[0,0])\n", | |||
"print(rgb_parrot[0,0][0])" | |||
"print(rgb_parrot[0,0].iloc[0, 0])" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was left over from when we returned tuples when indexing RGBs, so I ended up updating it, suppose rgb_parrot[0, 0, 'R']
would have been clearer but we're probably throwing this notebook out right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right.
holoviews/core/data/__init__.py
Outdated
Allow selection by integer index, slice and list of integer | ||
indices and boolean arrays, e.g.: | ||
|
||
Examples: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bit redundant after 'e.g:' (which is sufficient)
""" | ||
Dask does not support iloc, therefore iloc will execute | ||
the call graph and lose the laziness of the operation. | ||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if there could be optional performance warnings we could issue when laziness is lost. Not for this PR though.
Other than a very minor comment about the docstrings, this PR looks good: the API is nice and clean and thanks for adding all those unit tests. I don't think people will need integer indexing often but there are certainly times when you need it and having this API will really help. Happy to merge when the docstring is updated. The tests are passing except for one transient in the Python2 pr build. |
Thanks! Travis/conda is having issues at the moment but the tests were passing before the docstring changes so I'll just merge. |
This pull request has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs. |
Adds a tabular integer indexing interface for all data interfaces, allowing slicing and indexing by row and column indices. In the case of gridded ndarray data this operates on the flattened arrays. The main issue is that dask does not support integer indexing so
iloc
actually has to evaluate the graph and load a whole column at a time into memory, we should probably warn about this.Valid signatures:
And any combination of these.
This also allows for the following optimizations:
decimate
operation implementationTabular
indexing and implement truncating of bokeh table output