Ability to rename items inside a coordinate, matching pandas rename() functionality #5048

Closed
Jeitan opened this issue Mar 18, 2021 · 6 comments

@Jeitan

Jeitan commented Mar 18, 2021

Out of necessity I have been converting a bunch of code that used pandas Panels to xarray, and I have run into the fact that xarray's "renaming" functionality is not related at all to the pandas "rename". There appears to be no equivalent of the latter in xarray.

Describe the solution you'd like
I want to be able to rename specific items inside of a coordinate, which for 2D DataArrays is analogous to renaming a column in a pandas DataFrame.

For example, let's say I have the following DataArray in variable da:

import numpy as np
import xarray as xr

da = xr.DataArray(np.random.randn(3, 5),
                  dims=['one', 'two'],
                  coords={'one': [1, 2, 3],
                          'two': ['a', 'b', 'c', 'dd', 'e']})

Gives me

<xarray.DataArray (one: 3, two: 5)>
array([[-0.06079764, -0.54747953,  0.30818265, -1.66491362,  0.73121399],
       [-1.0309981 ,  1.54785819, -1.23288457,  0.30912773,  1.24241736],
       [-0.5355933 ,  0.08441669,  0.70498245,  0.1723775 , -1.06150325]])
Coordinates:
  * one      (one) int32 1 2 3
  * two      (two) <U2 'a' 'b' 'c' 'dd' 'e'

Let's pretend that the 'dd' should be 'd' and is a typo from something outside of my control. If that were a pandas DataFrame, I could just do df.rename({'dd': 'd'}, axis=1). I want to be able to do something similar in xarray, e.g.

da.rename({'two': {'dd': 'd'}})
or
da.rename({'dd': 'd'}, dim='two')
or anything similar, whatever it may be called.

I found issue #4825 on renaming/resetting/setting, etc., but everything listed there deals with a whole coordinate, not the values inside one. The only thing I've found that allows what I want to do is assign_coords, but that requires me to iterate over the coordinate myself to build the new coordinate values (and, in the case of strings, call .item() on each element while I'm doing it, but that's a separate annoyance). This is what I have done to make this work:

rdict = {'dd': 'd'}  # old label -> new label
coordlist = [k.item() for k in da.coords['two']]  # unwrap each 0-d element to a plain str
newlist = [rdict.get(k, k) for k in coordlist]  # labels not in rdict are kept as-is
da = da.assign_coords(dict(two=newlist))

It's exceedingly cumbersome. If there is already a way to do this and I'm just missing it, please just let me know and I will be super happy to go off and eat my crow and use the existing method. If not, I think this is actually a really important functionality. It may even be a deal-breaker for me ... I have to rename columns a LOT.

@andersy005
Member

@Jeitan, have you looked into where() yet?

In [16]: da['two'] = da.two.where(da.two != 'dd', 'd')

In [17]: da
Out[17]: 
<xarray.DataArray (one: 3, two: 5)>
array([[-0.01774725,  0.42453644, -0.80758257,  0.18659729, -1.14712204],
       [ 2.06813654,  1.32605605, -1.49284031,  0.5929485 , -2.29315181],
       [ 0.30243348,  0.05863335,  2.09869485,  1.79054292, -0.99844954]])
Coordinates:
  * one      (one) int64 1 2 3
  * two      (two) <U2 'a' 'b' 'c' 'd' 'e'

@Jeitan
Author

Jeitan commented Mar 18, 2021

I have not ... that does work! Do you have a nice way to do that for a whole set? I only used one here, but I'm typically renaming something like 5-8 at a time, so "rdict" would be much bigger.

@max-sixty
Collaborator

Agree with @andersy005 's suggestion (and it takes a lambda fwiw)

Though I also find myself reaching for a method like pandas' rename occasionally.

Aside from the rename name, which I think is not good, what do others think about a method that takes a mapping and replaces values?

@Jeitan
Author

Jeitan commented Mar 22, 2021

@andersy005 's method definitely works, and I can iterate to do it for multiple names, so I've already gone away happy. A dedicated method would be nice though :).
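
A minimal sketch of that iteration, assuming the same da as in the original post. The second entry in rdict is made up for illustration, and the replacements are deliberately no longer than the labels they replace, since behaviour with longer strings has not been checked here.

```python
import numpy as np
import xarray as xr

da = xr.DataArray(
    np.random.randn(3, 5),
    dims=["one", "two"],
    coords={"one": [1, 2, 3], "two": ["a", "b", "c", "dd", "e"]},
)

# Old label -> new label. The second entry is made up for illustration.
rdict = {"dd": "d", "e": "f"}

labels = da["two"]
for old, new in rdict.items():
    # Keep each label unless it equals `old`, in which case use `new`.
    labels = labels.where(labels != old, new)

# Pass the plain label values so only the coordinate changes, not the data.
da = da.assign_coords(two=labels.values)
```

Because the replacements are applied one after another, a new label that matches a later key in rdict would be replaced again, so this only suits mappings where old and new labels do not overlap.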

"rename" is understandably overloaded ... would something like "update_coords" be workable? It sort of matches the existing "assign_coords".

@shoyer
Member

shoyer commented Mar 22, 2021

NumPy has "select" and "choose" for this, but I don't really love either of those APIs, either.

Maybe replace_in_data or update_data would be appropriate names for a DataArray method that takes a dictionary as an argument? Then updating a coordinate would be solved with something like:

da['two'] = da['two'].replace_in_data({'dd': 'd'})
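
Until something like that exists, the same effect can be sketched with a small helper built on assign_coords. The name replace_labels and its signature are hypothetical, not part of xarray; the sketch assumes a one-dimensional coordinate along dim and applies all replacements in a single pass.

```python
import numpy as np
import xarray as xr


def replace_labels(da, dim, mapping):
    """Hypothetical helper (not an xarray API): return a copy of `da`
    whose labels along `dim` are replaced according to `mapping`."""
    # tolist() turns NumPy scalars into plain Python objects,
    # so no per-element .item() call is needed.
    old_labels = da[dim].values.tolist()
    # Labels that are not keys of `mapping` are kept unchanged.
    new_labels = [mapping.get(label, label) for label in old_labels]
    return da.assign_coords({dim: new_labels})


da = xr.DataArray(
    np.random.randn(3, 5),
    dims=["one", "two"],
    coords={"one": [1, 2, 3], "two": ["a", "b", "c", "dd", "e"]},
)

renamed = replace_labels(da, "two", {"dd": "d", "a": "alpha"})
```

Since every label is looked up once in the mapping, overlapping old and new names (for example swapping 'a' and 'b') are handled without chained replacements, and da itself is left untouched.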

@dcherian
Contributor

Closing in favour of #6377
