DataArray.drop_isel / .drop_sel with duplicated initial time stamp - InvalidIndexError #6605

Open
pohleric opened this issue May 13, 2022 · 4 comments


@pohleric

What is your issue?

I have a DataArray ds with 7098 time steps in cftime format. One of these time steps is duplicated (positions 7087 and 7088).
I tried to drop the first instance using both ds.drop_isel(time=[7087]) and ds.drop_sel(ds.time[7087]), but both raise:
pandas.errors.InvalidIndexError: Reindexing only valid with uniquely valued Index objects.
Both methods work when the duplicate is not present. Maybe this is the wrong method in the first place, but it seemed to be the way to go. I am unsure how to get rid of the duplicate and, if this is the correct method, whether this behavior is a bug.

ds
Out[247]: 
<xarray.DataArray 'LST_Night_CMG' (time: 3, y: 2, x: 2)>
array([[[14357., 14357.],
        [14342., 14358.]],
       [[14409., 14409.],
        [14435., 14388.]],
       [[14409., 14409.],
        [14435., 14388.]]])
Coordinates:
  * time         (time) object 2019-09-18 00:00:00 ... 2019-09-19 00:00:00
  * x            (x) float64 67.98 68.03
  * y            (y) float64 40.12 40.07
    spatial_ref  int32 0
Attributes:
    coordinates:   spatial_ref band
    scale_factor:  1.0
    add_offset:    0.0


ds.time
Out[249]: 
<xarray.DataArray 'time' (time: 3)>
array([cftime.DatetimeProlepticGregorian(2019, 9, 18, 0, 0, 0, 0, has_year_zero=True),
       cftime.DatetimeProlepticGregorian(2019, 9, 19, 0, 0, 0, 0, has_year_zero=True),
       cftime.DatetimeProlepticGregorian(2019, 9, 19, 0, 0, 0, 0, has_year_zero=True)],
      dtype=object)
Coordinates:
  * time         (time) object 2019-09-18 00:00:00 ... 2019-09-19 00:00:00
    spatial_ref  int32 0



ds.drop_isel(time=1)
Traceback (most recent call last):
  File "C:\Users\einfa\anaconda3\envs\climate\lib\site-packages\IPython\core\interactiveshell.py", line 3444, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-248-0f89b7589e90>", line 1, in <module>
    ds.drop_isel(time=1)
  File "C:\Users\einfa\anaconda3\envs\climate\lib\site-packages\xarray\core\dataarray.py", line 2413, in drop_isel
    dataset = dataset.drop_isel(indexers=indexers, **indexers_kwargs)
  File "C:\Users\einfa\anaconda3\envs\climate\lib\site-packages\xarray\core\dataset.py", line 4565, in drop_isel
    ds = ds.loc[dimension_index]
  File "C:\Users\einfa\anaconda3\envs\climate\lib\site-packages\xarray\core\dataset.py", line 563, in __getitem__
    return self.dataset.sel(key)
  File "C:\Users\einfa\anaconda3\envs\climate\lib\site-packages\xarray\core\dataset.py", line 2505, in sel
    self, indexers=indexers, method=method, tolerance=tolerance
  File "C:\Users\einfa\anaconda3\envs\climate\lib\site-packages\xarray\core\coordinates.py", line 422, in remap_label_indexers
    obj, v_indexers, method=method, tolerance=tolerance
  File "C:\Users\einfa\anaconda3\envs\climate\lib\site-packages\xarray\core\indexing.py", line 120, in remap_label_indexers
    idxr, new_idx = index.query(labels, method=method, tolerance=tolerance)
  File "C:\Users\einfa\anaconda3\envs\climate\lib\site-packages\xarray\core\indexes.py", line 240, in query
    indexer = get_indexer_nd(self.index, label, method, tolerance)
  File "C:\Users\einfa\anaconda3\envs\climate\lib\site-packages\xarray\core\indexes.py", line 142, in get_indexer_nd
    flat_indexer = index.get_indexer(flat_labels, method=method, tolerance=tolerance)
  File "C:\Users\einfa\anaconda3\envs\climate\lib\site-packages\pandas\core\indexes\base.py", line 3442, in get_indexer
    raise InvalidIndexError(self._requires_unique_msg)
pandas.errors.InvalidIndexError: Reindexing only valid with uniquely valued Index objects

@pohleric pohleric added the needs triage Issue that has not been reviewed by xarray team member label May 13, 2022
@dcherian
Contributor

Can you try drop_duplicates please?
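
A minimal sketch of that suggestion, assuming an xarray version that provides DataArray.drop_duplicates. The toy array below is an illustrative stand-in for the real data (an integer coordinate instead of cftime), and the names da and deduped are made up for the example.

```python
import numpy as np
import xarray as xr

# Toy array with a repeated coordinate label (x == 2 appears twice).
da = xr.DataArray(
    np.arange(5.0), dims="x", coords={"x": np.array([1, 2, 2, 3, 4])}
)

# keep="first" retains the first occurrence of each label;
# keep="last" would retain the last one instead.
deduped = da.drop_duplicates("x", keep="first")
print(deduped["x"].values)

# With a unique index, positional dropping goes through without the error.
trimmed = deduped.drop_isel(x=1)
print(trimmed["x"].values)
```

Note that drop_duplicates compares only the coordinate labels, not the data values, so it is worth checking that the duplicated time steps really hold the same data before discarding one of them.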

@pohleric
Author

This works, thanks.

@dcherian
Contributor

dcherian commented May 13, 2022

Thanks, this looks like a bug to me.

import numpy as np
import xarray as xr

da = xr.DataArray(np.ones(5), dims="x", coords={"x": np.array([1, 2, 2, 3, 4])})
da.sel(x=2)  # works
da.isel(x=2)  # works
da.drop_isel(x=2)  # error

I think the issue is that we should not treat unindexed dims the same as indexed dims.

in Dataset.drop_isel(self, indexers, **indexers_kwargs)
   4565     new_index = index.delete(pos_for_dim)
   4566     dimension_index[dim] = new_index
-> 4567 ds = ds.loc[dimension_index]
   4568 return ds

...
...

pos_indexers, new_indexes = indexing.remap_label_indexers(
    422     obj, v_indexers, method=method, tolerance=tolerance
    423 )
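
The two steps in the excerpt above (a positional index.delete followed by a label-based lookup) can be reproduced with pandas alone. This is a rough sketch of the mechanism as it appears in the traceback, not a copy of xarray's code; the variable names are illustrative.

```python
import pandas as pd
from pandas.errors import InvalidIndexError

index = pd.Index([1, 2, 2, 3, 4])  # label 2 is duplicated
new_index = index.delete([2])      # positional delete leaves labels [1, 2, 3, 4]

# Mapping the remaining labels back to positions is the step that fails:
# get_indexer refuses to run on an index that is not unique.
try:
    index.get_indexer(new_index)
    raised = False
except InvalidIndexError as err:
    raised = True
    print(err)

# The same round trip is fine when every label is unique.
unique_index = pd.Index([1, 2, 3, 4, 5])
positions = unique_index.get_indexer(unique_index.delete([2]))
print(positions)
```

So the positional information passed to drop_isel is lost as soon as it is converted back into labels, which is why the duplicate matters even though the request itself is unambiguous.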

Also, da.sel(x=2) works, so it makes sense that da.drop_sel(x=2) should work too, though it would drop both elements with x=2.
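
Until this is fixed, one possible workaround is to stay purely positional and select the complement with isel, which never converts positions back into labels. The helpers drop_positions and drop_labels below are hypothetical names written for this sketch; they are not part of xarray.

```python
import numpy as np
import xarray as xr

da = xr.DataArray(
    np.arange(5.0), dims="x", coords={"x": np.array([1, 2, 2, 3, 4])}
)


def drop_positions(obj, dim, positions):
    """Hypothetical stand-in for drop_isel: keep every position except `positions`."""
    keep = np.delete(np.arange(obj.sizes[dim]), positions)
    return obj.isel({dim: keep})


def drop_labels(obj, dim, labels):
    """Hypothetical stand-in for drop_sel: drop every element whose label is in `labels`."""
    keep = np.flatnonzero(~np.isin(obj[dim].values, labels))
    return obj.isel({dim: keep})


by_position = drop_positions(da, "x", [2])  # removes only the second x == 2
by_label = drop_labels(da, "x", [2])        # removes both x == 2 elements

print(by_position["x"].values)
print(by_label["x"].values)
```

The label-based helper shows the behavior described above: because the label is ambiguous, every matching element is removed, while the positional helper removes exactly one.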

@dcherian dcherian added bug topic-indexing and removed needs triage Issue that has not been reviewed by xarray team member labels May 13, 2022
@max-sixty
Collaborator

Great find, @dcherian!
