Vectorized lazy indexing #1899
Changes from 9 commits
@@ -24,18 +24,22 @@ def dtype(self):
         return self.array.dtype

     def __getitem__(self, key):
-        key = indexing.unwrap_explicit_indexer(
-            key, target=self, allow=indexing.BasicIndexer)
+        key, np_inds = indexing.decompose_indexer(key, self.shape,
+                                                  mode='basic')

         # pull the data from the array attribute if possible, to avoid
         # downloading coordinate data twice
         array = getattr(self.array, 'array', self.array)
-        result = robust_getitem(array, key, catch=ValueError)
+        result = robust_getitem(array, key.tuple, catch=ValueError)
         # pydap doesn't squeeze axes automatically like numpy
-        axis = tuple(n for n, k in enumerate(key)
+        axis = tuple(n for n, k in enumerate(key.tuple)
                      if isinstance(k, integer_types))
         if len(axis) > 0:
             result = np.squeeze(result, axis)

+        for ind in np_inds:
+            result = indexing.NumpyIndexingAdapter(result)[ind]
+
         return result

Comment (on the for loop): F821 undefined name 'ind'
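The diff above passes `key.tuple` to `robust_getitem`, which retries flaky remote reads (pydap servers can fail transiently). Here is a minimal sketch of such a retry helper with exponential backoff; the real `robust_getitem` in xarray's backend code may differ in signature and default delays, so treat the parameter names and values below as assumptions:

```python
import time

def robust_getitem(array, key, catch=Exception, max_retries=6,
                   initial_delay=0.1):
    """Index `array` with `key`, retrying on transient errors.

    Sketch of the idea behind xarray's robust_getitem; the exact
    signature and backoff policy here are assumptions.
    """
    for n in range(max_retries):
        try:
            return array[key]
        except catch:
            if n == max_retries - 1:
                raise
            time.sleep(initial_delay * 2 ** n)  # exponential backoff
```

A flaky server then only costs a few retries instead of failing the whole download.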
@@ -27,14 +27,18 @@ def get_array(self):
         return self.datastore.ds.variables[self.variable_name]

     def __getitem__(self, key):
-        key = indexing.unwrap_explicit_indexer(
-            key, target=self, allow=indexing.BasicIndexer)
+        key, np_inds = indexing.decompose_indexer(key, self.shape,
+                                                  mode='outer')
Comment: PyNIO actually has a very robust indexing interface. We might be over-simplifying here, and this could be leading to bugs. I am not exactly sure we even need to do all of this indirection. PyNIO allows for the creation of a "selection object" that can be passed to the pynio array objects. Perhaps we should be creating that object instead of forcing the pynio array object into a numpy index adapter?

Reply: Thanks for the suggestion.

Reply: I'm not sure how selection objects would really help us here. I guess they can be used to do a form of outer indexing, but putting integer indices into a string seems very inefficient. I agree that we should save it for later. Are we actually sure that PyNIO even supports "outer" indexing? I had it marked down as only supporting basic indexing before. "Outer" indexing would include indexing with 1D arrays like
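The distinction between "outer" and numpy's default vectorized indexing discussed above can be illustrated with plain numpy, where `np.ix_` produces the cross-product behaviour that outer indexing refers to:

```python
import numpy as np

a = np.arange(16).reshape(4, 4)
rows = np.array([0, 2])
cols = np.array([1, 3])

# numpy's default "vectorized" behaviour pairs the arrays elementwise:
pointwise = a[rows, cols]      # picks a[0, 1] and a[2, 3] -> shape (2,)

# "outer" indexing takes the full cross product of the 1D index arrays:
outer = a[np.ix_(rows, cols)]  # rows {0, 2} x cols {1, 3} -> shape (2, 2)
```

A backend that only supports basic indexing would accept slices and scalars, but not the 1D-array keys used here.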

         with self.datastore.ensure_open(autoclose=True):
             array = self.get_array()
             if key == () and self.ndim == 0:
                 return array.get_value()
             return array[key]
+
+        for ind in np_inds:
+            array = indexing.NumpyIndexingAdapter(array)[ind]
+        return array
Comment: Ah-hah! I see the bug!

Comment: So, this should probably be something like:

Reply: Thanks for debugging this!!
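The bug spotted here appears to be an early `return`, which leaves the residual-indexer loop unreachable, so the decomposed numpy indexers are never applied. A minimal numpy sketch of the broken versus fixed control flow (the helper names are hypothetical, not xarray's API):

```python
import numpy as np

def getitem_buggy(array, key, np_inds):
    # bug: early return makes the loop below dead code
    return np.asarray(array)[key]
    for ind in np_inds:
        array = np.asarray(array)[ind]
    return array

def getitem_fixed(array, key, np_inds):
    array = np.asarray(array)[key]      # apply the backend key...
    for ind in np_inds:
        array = np.asarray(array)[ind]  # ...then the residual indexers
    return array
```

With `key = slice(2, 8)` and a residual indexer `[0, 3]`, the fixed version returns elements 2 and 5, while the buggy version returns the whole slice untouched by the residual indexers.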

 class NioDataStore(AbstractDataStore, DataStorePickleMixin):
@@ -41,8 +41,10 @@ def shape(self):
         return self._shape

     def __getitem__(self, key):
-        key = indexing.unwrap_explicit_indexer(
-            key, self, allow=(indexing.BasicIndexer, indexing.OuterIndexer))
+        # TODO support indexing.decompose_indexer
Comment: I think I need someone's help to update the indexing scheme for the rasterio backend.

Reply: rasterio is strictly a slicing approach for fetching data, or pulling out individual data points. In this case, you want to avoid converting the slices into numpy arrays of indexes. Maybe it would make sense to have a function that could re-compose numpy arrays of indexes into groups of slices? So, if we were indexing

Reply: @WeatherGod Thanks for the review. Agreed, it would be an improvement. But I am wondering whether the number of combinations will explode as the number of dimensions increases. Some heuristics would also be necessary for how many chunks we use. EDIT: the chunking would also improve the efficiency if we want to get a full diagonal from a large 2d-array. In my current implementation, we need to load a full slice, but it could be reduced by a factor of 2 if we use 2x2 chunks.

Reply: Yeah, we certainly have time to think about it. This feature doesn't seem to change anything for indexing that worked before, so I don't think we have to worry about regressions. Improvements to indexing can be thought of holistically as a separate task. For our immediate needs though, for rasterio, we probably should just create a fallback indexer that would just load up that entire dimension and index it as given if the original indexer is anything but scalar integers or slices.

Reply: @fmaussion can you help me with this? These two indexers (one for the backend and one for np.ndarray) can be computed by

    key, np_inds = indexing.decompose_indexer(
        key, self.shape, indexing.IndexingSupport.OUTER)

I am not familiar with the rasterio data structure (I do not yet understand what is going on below...).

Reply: Rasterio (as wrapped by xarray) supports a strange form of indexing:
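The idea floated earlier in this thread, re-composing integer index arrays into groups of slices so that a slicing-only backend like rasterio can read contiguous windows, could look something like the following illustrative helper (not part of xarray or rasterio):

```python
def indices_to_slices(indices):
    """Group a sorted 1-D sequence of integer indexes into contiguous slices.

    Illustrative helper only: each run of consecutive integers becomes
    one slice, so a slicing-only backend can fetch it in a single read.
    """
    slices = []
    start = prev = indices[0]
    for i in indices[1:]:
        if i == prev + 1:
            prev = i          # extend the current run
        else:
            slices.append(slice(start, prev + 1))
            start = prev = i  # begin a new run
    slices.append(slice(start, prev + 1))
    return slices

indices_to_slices([0, 1, 2, 7, 8, 20])
# -> [slice(0, 3), slice(7, 9), slice(20, 21)]
```

As noted above, the number of such groups can grow quickly in higher dimensions, so some heuristic for when to fall back to one big covering read would still be needed.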
Reply: I think the simplest fix would be to extend our rasterio wrapper to support arbitrary outer indexing. This would entail changing the final clause below, where rasterio currently raises an error (xarray/backends/rasterio_.py, lines 72 to 78 at e0621c7). Instead, we would want to create a slice object based on the minimum/maximum elements, and then add another indexing to
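The min/max-based fallback suggested here can be sketched in plain numpy: read one covering window per dimension, then re-index the loaded block down to the exact elements requested. The function and variable names below are hypothetical, not xarray's API:

```python
import numpy as np

def decompose_to_window(key, shape):
    """Turn an outer indexer into (window_slices, residual_indexers).

    window_slices covers the min..max range of each index array, so a
    slicing-only backend can read it in one go; residual_indexers then
    pick out the requested elements from the loaded block in memory.
    """
    window, residual = [], []
    for k, size in zip(key, shape):
        if isinstance(k, slice):
            window.append(k)
            residual.append(slice(None))
        else:
            k = np.asarray(k)
            lo, hi = int(k.min()), int(k.max()) + 1
            window.append(slice(lo, hi))  # contiguous read for the backend
            residual.append(k - lo)       # local re-indexing afterwards
    return tuple(window), tuple(residual)

data = np.arange(100).reshape(10, 10)
window, residual = decompose_to_window((np.array([2, 7, 3]), slice(0, 5)),
                                       data.shape)
loaded = data[window]      # what the backend would actually read
result = loaded[residual]  # rows 2, 7, 3 and columns 0..4
```

This trades extra I/O (the whole covering window is read) for never passing fancy indexes to the backend.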
+        if isinstance(key, indexing.VectorizedIndexer):
+            raise NotImplementedError
+        key = key.tuple

         # bands cannot be windowed but they can be listed
         band_key = key[0]
Comment: What's the reasoning behind doing `asarray` here, but not for other backends?