Basic multiIndex support and stack/unstack methods #702
I'd like to get this into the next release in something close to its current state. It's not as full-featured as I would eventually like (see the checklist above), but it's enough to be useful, and I'd like to get v0.7 (with the new name) out next week.
Big 👍 from me. This seems like a feature with enormous potential.
I have an example notebook for doing SVD on a sea-surface-temperature field, which should be pretty easy to adapt to these new methods. (Currently I just switch over to numpy for the actual SVD.)
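A rough sketch of that workflow: stack the spatial dimensions into one, then hand the resulting 2-D array to NumPy for the SVD. This uses a plain NumPy `reshape` as a stand-in for xarray's `stack`, and the array sizes and random data are hypothetical placeholders for a real SST field.

```python
import numpy as np

# Hypothetical stand-in for a sea-surface-temperature field with dims (time, lat, lon).
rng = np.random.default_rng(0)
n_time, n_lat, n_lon = 24, 5, 8
sst = rng.standard_normal((n_time, n_lat, n_lon))

# "Stack" lat and lon into a single space dimension: (time, lat, lon) -> (time, space).
# Conceptually this is what stacking does; here it is just a C-order reshape.
stacked = sst.reshape(n_time, n_lat * n_lon)

# Remove the time mean at each grid point before the SVD (usual EOF/PCA practice).
anomalies = stacked - stacked.mean(axis=0)

# Thin SVD: rows of vt are spatial patterns (EOFs); columns of u * s are their time series (PCs).
u, s, vt = np.linalg.svd(anomalies, full_matrices=False)

# "Unstack" the leading spatial pattern back to (lat, lon), e.g. for plotting.
leading_eof = vt[0].reshape(n_lat, n_lon)

# Fraction of total variance explained by each mode.
explained = s**2 / np.sum(s**2)
```

With real labelled data, stacking instead of reshaping would keep the (lat, lon) labels attached to the space dimension, so the unstack step could not silently mix up axis order.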
Any opinions, even on the API here? I'd like to merge this this week...
I think the API is great. Stack/unstack is a nice way to describe the operation of aggregating coordinates.
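To make the "aggregating coordinates" idea concrete, here is a minimal pure-Python sketch (not xarray's implementation) of what stacking means: two dimensions `x` and `y` are combined into one dimension whose labels are `(x, y)` tuples, which is the role a MultiIndex plays, and unstacking reverses it. The helpers `stack_2d` and `unstack_2d` are hypothetical names used only for illustration.

```python
from itertools import product


def stack_2d(values, x_labels, y_labels):
    """Flatten a 2-D grid (list of rows indexed by x, then y) into one
    dimension labelled by (x, y) tuples, in x-major order."""
    index = list(product(x_labels, y_labels))
    data = [values[i][j] for i in range(len(x_labels)) for j in range(len(y_labels))]
    return index, data


def unstack_2d(index, data):
    """Rebuild the 2-D grid from (x, y)-labelled data.

    Label combinations that are absent become None here; pandas and
    xarray typically fill such gaps with NaN instead."""
    x_labels = list(dict.fromkeys(x for x, _ in index))  # unique, order-preserving
    y_labels = list(dict.fromkeys(y for _, y in index))
    lookup = dict(zip(index, data))
    grid = [[lookup.get((x, y)) for y in y_labels] for x in x_labels]
    return grid, x_labels, y_labels


values = [[1, 2, 3], [4, 5, 6]]
index, data = stack_2d(values, ["a", "b"], [10, 20, 30])
# index == [("a", 10), ("a", 20), ("a", 30), ("b", 10), ("b", 20), ("b", 30)]
# data == [1, 2, 3, 4, 5, 6]
grid, xs, ys = unstack_2d(index, data)
# grid == values: stack followed by unstack round-trips
```

Note that unstacking a stacked dimension with some entries removed reintroduces the full (x, y) product, with gaps filled in.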
A couple of comments:
@jreback thanks for the comments!
Agreed -- this is part of my "better repr" TODO.
This is true, and definitely worth noting as a compatibility break. But I do think we have a good reason for this: pandas's stack uses
I'm not quite sure what you mean here --
hmm, is
OK, makes sense.
We have a dropna in xarray. The problem is that for dask arrays, you need to know the shape of the result. With dropna, you don't know the shape until you've actually done the computation, so it can't be done lazily.
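A tiny pure-Python illustration of why this blocks laziness: `dropna` below is a hypothetical helper, not xarray's or dask's, but it shows that two inputs with the same shape can produce outputs of different lengths. A lazy framework that must declare output shapes before computing therefore cannot know the result's shape without first looking at the values.

```python
import math


def dropna(values):
    """Drop NaN entries from a 1-D sequence (stand-in for dropna along one dimension)."""
    return [v for v in values if not math.isnan(v)]


# Two inputs with identical shape (length 4)...
a = [1.0, float("nan"), 3.0, 4.0]
b = [float("nan"), float("nan"), 3.0, 4.0]

# ...give outputs of different lengths: the result shape depends on the data,
# so it is only known after the values have been computed.
len_a = len(dropna(a))  # 3
len_b = len(dropna(b))  # 2
```

By contrast, stacking dimensions of sizes `n` and `m` always yields a dimension of size `n * m`, which is known from the input shape alone; this is what lets `stack` stay lazy while returning the full index.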
Makes sense about dask.array.dropna, though I think you should dropna if at all possible (or at least have an option); it IS a bit surprising to get back the full index. Finally, think about only supporting sequential stacking, as it conceptually makes more sense.
Thanks for taking a look. I'm writing some docs on reshaping today, then will merge this and issue the new release / rename if I have time.
On Sat, Jan 16, 2016 at 11:23 PM, Joe Hamman notifications@github.com
The docs say (http://xarray.pydata.org/en/stable/data-structures.html#creating-a-dataarray)
Is that sentence still accurate given this PR?
No, that should be updated. Thanks for pointing it out!
Fixes #164, #700
Example usage:
TODO (maybe not necessary yet, but eventually):
- `.loc` and `.sel()`
- `stack`/`unstack`
- `ds['time']` can pull out the `'time'` level of a multi-index)
- `isel_points`/`sel_points` return objects with a MultiIndex? (probably after the previous TODO, so we can preserve basic backwards compatibility)
- `set_index`/`reset_index`/`swaplevel` to make it easier to create and manipulate multi-indexes

It would be nice to eventually build a full example showing how `stack` can be combined with lazy loading / dask to do out-of-core PCA on a large geophysical dataset (e.g., to identify El Nino).

cc @MaximilianR @jreback @jhamman
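The `set_index`/`reset_index`/`swaplevel` item in the TODO list borrows its names from pandas. For reference, here is a small sketch of what those operations do on a pandas `DataFrame`, plus pulling out a single level by name (roughly the behaviour the `ds['time']` item describes). This is the existing pandas API, not the xarray methods proposed here, and the column names are hypothetical.

```python
import pandas as pd

# Hypothetical long-format data with two coordinate columns.
df = pd.DataFrame({
    "time": [2000, 2000, 2001, 2001],
    "space": ["a", "b", "a", "b"],
    "sst": [1.0, 2.0, 3.0, 4.0],
})

# set_index: promote two columns to a two-level MultiIndex.
indexed = df.set_index(["time", "space"])

# Pull out one level of the MultiIndex by name.
times = indexed.index.get_level_values("time")

# swaplevel: exchange the order of the two index levels.
swapped = indexed.swaplevel("time", "space")

# reset_index: turn the index levels back into ordinary columns.
flat = swapped.reset_index()

# Selecting a label on the outer level drops that level from the result.
year_2000 = indexed.loc[2000]
```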