Basic multiIndex support and stack/unstack methods #702

Merged
merged 13 commits into from
Jan 18, 2016
44 changes: 31 additions & 13 deletions doc/api.rst
Original file line number Diff line number Diff line change
Expand Up @@ -5,7 +5,7 @@ API reference
#############

This page provides an auto-generated summary of xray's API. For more details
and examples, refer to the relevant chapter in the main part of the
and examples, refer to the relevant chapters in the main part of the
documentation.

Top-level functions
Expand Down Expand Up @@ -110,10 +110,7 @@ Computation
Dataset.reduce
Dataset.groupby
Dataset.resample
Dataset.transpose
Dataset.diff
Dataset.shift
Dataset.roll

**Aggregation**:
:py:attr:`~Dataset.all`
Expand Down Expand Up @@ -155,6 +152,18 @@ Computation
:py:attr:`~core.groupby.DatasetGroupBy.fillna`
:py:attr:`~core.groupby.DatasetGroupBy.where`

Reshaping and reorganizing
--------------------------

.. autosummary::
:toctree: generated/

Dataset.transpose
Dataset.stack
Dataset.unstack
Dataset.shift
Dataset.roll

DataArray
=========

Expand Down Expand Up @@ -218,6 +227,16 @@ Indexing
DataArray.reindex
DataArray.reindex_like

Comparisons
-----------

.. autosummary::
:toctree: generated/

DataArray.equals
DataArray.identical
DataArray.broadcast_equals

Computation
-----------

Expand All @@ -227,11 +246,8 @@ Computation
DataArray.reduce
DataArray.groupby
DataArray.resample
DataArray.transpose
DataArray.get_axis_num
DataArray.diff
DataArray.shift
DataArray.roll

**Aggregation**:
:py:attr:`~DataArray.all`
Expand Down Expand Up @@ -273,16 +289,18 @@ Computation
:py:attr:`~core.groupby.DataArrayGroupBy.fillna`
:py:attr:`~core.groupby.DataArrayGroupBy.where`

Comparisons
-----------

Reshaping and reorganizing
--------------------------

.. autosummary::
:toctree: generated/

DataArray.equals
DataArray.identical
DataArray.broadcast_equals

DataArray.transpose
DataArray.stack
DataArray.unstack
DataArray.shift
DataArray.roll

.. _api.ufuncs:

Expand Down
16 changes: 8 additions & 8 deletions doc/computation.rst
Expand Up @@ -140,6 +140,13 @@ This means, for example, that you can always subtract an array from its transpose:

c - c.T

You can explicitly broadcast xray data structures by using the
:py:func:`~xray.broadcast` function:

.. ipython:: python

    a2, b2 = xray.broadcast(a, b)
    a2
    b2
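
As a rough analogy in plain NumPy (a sketch, not xray's implementation), explicit broadcasting expands each array to the common shape. The arrays ``a`` and ``b`` here are illustrative stand-ins for one array along ``x`` and one along ``y``:

```python
import numpy as np

# Illustrative stand-ins: `a` varies along x (length 2), `b` along y (length 3).
a = np.array([1, 2])
b = np.array([-1, -2, -3])

# Insert length-1 axes so both arrays are laid out as (x, y), then expand them
# to the common shape. xray matches dimensions by name; plain NumPy needs the
# explicit reshaping.
a2, b2 = np.broadcast_arrays(a[:, np.newaxis], b[np.newaxis, :])

print(a2.shape, b2.shape)
```

Both results share the full ``(x, y)`` shape, which is the same outcome the explicit broadcast described above produces for labeled arrays.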

.. _math automatic alignment:

Automatic alignment
Expand Down Expand Up @@ -223,13 +230,6 @@ Datasets support most of the same methods found on data arrays:
ds.mean(dim='x')
abs(ds)

:py:meth:`~xray.Dataset.transpose` can also be used to reorder dimensions on
all variables:

.. ipython:: python

ds.transpose('y', 'x')

Unfortunately, a limitation of the current version of numpy means that we
cannot override ufuncs for datasets, because datasets cannot be written as
a single array [1]_. :py:meth:`~xray.Dataset.apply` works around this
Expand All @@ -256,5 +256,5 @@ Arithmetic between two datasets matches data variables of the same name:
Similarly to index based alignment, the result has the intersection of all
matching variables, and ``ValueError`` is raised if the result would be empty.

.. [1] When numpy 1.10 is released, we should be able to override ufuncs for
.. [1] When numpy 1.12 is released, we should be able to override ufuncs for
datasets by making use of ``__numpy_ufunc__``.
46 changes: 2 additions & 44 deletions doc/data-structures.rst
Expand Up @@ -436,8 +436,8 @@ dataset variables:

ds.rename({'temperature': 'temp', 'precipitation': 'precip'})

Finally, you can use :py:meth:`~xray.Dataset.swap_dims` to swap dimension and
non-dimension variables:
The related :py:meth:`~xray.Dataset.swap_dims` method allows you to swap
dimension and non-dimension variables:

.. ipython:: python

Expand Down Expand Up @@ -535,48 +535,6 @@ dimension and whose values are ``Index`` objects:

ds.indexes

Converting datasets and arrays
------------------------------

To convert from a Dataset to a DataArray, use :py:meth:`~xray.Dataset.to_array`:

.. ipython:: python

arr = ds.to_array()
arr

This method broadcasts all data variables in the dataset against each other,
then concatenates them along a new dimension into a new array while preserving
coordinates.

To convert back from a DataArray to a Dataset, use
:py:meth:`~xray.DataArray.to_dataset`:

.. ipython:: python

arr.to_dataset(dim='variable')

The broadcasting behavior of ``to_array`` means that the resulting array
includes the union of data variable dimensions:

.. ipython:: python

ds2 = xray.Dataset({'a': 0, 'b': ('x', [3, 4, 5])})

# the input dataset has 4 elements
ds2

# the resulting array has 6 elements
ds2.to_array()

Otherwise, the result could not be represented as an orthogonal array.

If you use ``to_dataset`` without supplying the ``dim`` argument, the DataArray will be converted into a Dataset of one variable:

.. ipython:: python

arr.to_dataset(name='combined')


.. [1] Latitude and longitude are 2D arrays because the dataset uses
`projected coordinates`__. ``reference_time`` refers to the reference time
Expand Down
1 change: 1 addition & 0 deletions doc/index.rst
Expand Up @@ -36,6 +36,7 @@ Documentation
indexing
computation
groupby
reshaping
combining
time-series
pandas
Expand Down
125 changes: 125 additions & 0 deletions doc/reshaping.rst
@@ -0,0 +1,125 @@
.. _reshape:

###############################
Reshaping and reorganizing data
###############################

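These methods allow you to reorganize your data by reordering dimensions,
converting between datasets and arrays, stacking and unstacking dimensions,
and shifting or rolling values.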

.. ipython:: python
    :suppress:

    import numpy as np
    import pandas as pd
    import xray
    np.random.seed(123456)

Reordering dimensions
---------------------

To reorder dimensions on a :py:class:`~xray.DataArray` or across all variables
on a :py:class:`~xray.Dataset`, use :py:meth:`xray.DataArray.transpose` or the
``.T`` property:

.. ipython:: python

    ds = xray.Dataset({'foo': (('x', 'y', 'z'), [[[42]]]), 'bar': (('y', 'z'), [[24]])})
    ds.transpose('y', 'z', 'x')
    ds.T
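
The idea of reordering by dimension name can be sketched in plain NumPy. This is an illustration only, not xray's implementation: ``variables`` and ``transpose_all`` are hypothetical names, and the sketch assumes each variable is reordered using only the requested dimensions it actually has:

```python
import numpy as np

# Hypothetical stand-in for a dataset: variable name -> (dims, values).
variables = {
    'foo': (('x', 'y', 'z'), np.zeros((2, 3, 4))),
    'bar': (('y', 'z'), np.zeros((3, 4))),
}

def transpose_all(variables, *new_order):
    """Reorder the axes of every variable by dimension name."""
    result = {}
    for name, (dims, values) in variables.items():
        # Assumption: dimensions a variable does not have are simply skipped.
        target = tuple(d for d in new_order if d in dims)
        # Translate dimension names into positional axes for NumPy.
        axes = [dims.index(d) for d in target]
        result[name] = (target, values.transpose(axes))
    return result

transposed = transpose_all(variables, 'y', 'z', 'x')
for name, (dims, values) in transposed.items():
    print(name, dims, values.shape)
```

Working with names rather than axis numbers is what lets one call apply to variables with different numbers of dimensions.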

Converting between datasets and arrays
--------------------------------------

To convert from a Dataset to a DataArray, use :py:meth:`~xray.Dataset.to_array`:

.. ipython:: python

    arr = ds.to_array()
    arr

This method broadcasts all data variables in the dataset against each other,
then concatenates them along a new dimension into a new array while preserving
coordinates.

To convert back from a DataArray to a Dataset, use
:py:meth:`~xray.DataArray.to_dataset`:

.. ipython:: python

    arr.to_dataset(dim='variable')

The broadcasting behavior of ``to_array`` means that the resulting array
includes the union of data variable dimensions:

.. ipython:: python

    ds2 = xray.Dataset({'a': 0, 'b': ('x', [3, 4, 5])})

    # the input dataset has 4 elements
    ds2

    # the resulting array has 6 elements
    ds2.to_array()

Otherwise, the result could not be represented as an orthogonal array.
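
The element counts above can be reproduced with a plain NumPy sketch (an analogy under the assumption that broadcasting followed by concatenation along a new axis is all that matters here; it is not xray's code):

```python
import numpy as np

# Stand-ins for the two data variables: a scalar and a length-3 vector.
a = np.array(0)
b = np.array([3, 4, 5])
n_input = a.size + b.size  # 1 + 3 elements

# Broadcast both variables to the common shape, then stack them along a
# new leading axis that plays the role of the new dimension.
a_b, b_b = np.broadcast_arrays(a, b)
combined = np.stack([a_b, b_b])

print(n_input, combined.shape, combined.size)
```

The scalar is repeated three times so that every variable has the same shape, which is why the combined array holds more elements than the inputs.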

If you use ``to_dataset`` without supplying the ``dim`` argument, the DataArray will be converted into a Dataset of one variable:

.. ipython:: python

    arr.to_dataset(name='combined')

.. _reshape.stack:

Stack and unstack
-----------------

As part of xray's nascent support for :py:class:`pandas.MultiIndex`, we have
implemented :py:meth:`~xray.DataArray.stack` and
:py:meth:`~xray.DataArray.unstack` methods for combining or splitting dimensions:

.. ipython:: python

    array = xray.DataArray(np.random.randn(2, 3),
                           coords=[('x', ['a', 'b']), ('y', [0, 1, 2])])
    stacked = array.stack(z=('x', 'y'))
    stacked
    stacked.unstack('z')

These methods are modeled on the :py:class:`pandas.DataFrame` methods of the
same name, although in xray they always create new dimensions rather than
adding to the existing index or columns.
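
Conceptually, stacking two dimensions amounts to flattening the values and labeling each element with a pair of the original coordinates. A minimal NumPy sketch of that idea, assuming row-major ordering (the names ``z_labels``, ``stacked`` and ``unstacked`` are illustrative, and this is not xray's implementation):

```python
from itertools import product

import numpy as np

x = ['a', 'b']
y = [0, 1, 2]
values = np.arange(6).reshape(len(x), len(y))  # dims ('x', 'y')

# Stacking: flatten in row-major order and label each element with an
# (x, y) pair, much like the levels of a MultiIndex.
z_labels = list(product(x, y))
stacked = values.reshape(-1)

# Unstacking: split the combined dimension back into ('x', 'y').
unstacked = stacked.reshape(len(x), len(y))

print(z_labels[4], stacked[4])
```

Because the new dimension is always the full product of the stacked ones, the round trip through ``reshape`` recovers the original array.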

Like :py:meth:`DataFrame.unstack<pandas.DataFrame.unstack>`, xray's ``unstack`` always succeeds, even
if the multi-index being unstacked does not contain all possible levels. Missing
levels are filled in with ``NaN`` in the resulting object:

.. ipython:: python

    stacked2 = stacked[::2]
    stacked2
    stacked2.unstack('z')
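
For comparison, the pandas behavior this mirrors can be seen with a ``Series`` that carries an incomplete ``MultiIndex``. This is a small sketch using made-up values, not the random data above:

```python
import numpy as np
import pandas as pd

z = pd.MultiIndex.from_product([['a', 'b'], [0, 1, 2]], names=['x', 'y'])
full = pd.Series(np.arange(6.0), index=z)

# Keep every other element: ('a', 0), ('a', 2) and ('b', 1).
partial = full[::2]

# Unstacking still succeeds; combinations that are absent become NaN.
table = partial.unstack('y')
print(table)
```

Three of the six ``(x, y)`` combinations are absent from ``partial``, so three cells of the unstacked table are filled with ``NaN``.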

However, xray's ``stack`` differs from pandas in one important way: it does
not automatically drop missing values. Compare:

.. ipython:: python

    array = xray.DataArray([[np.nan, 1], [2, 3]], dims=['x', 'y'])
    array.stack(z=('x', 'y'))
    array.to_pandas().stack()

We departed from pandas's behavior here because predictable shapes for new
array dimensions are necessary for :ref:`dask`.
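
The shape argument can be illustrated with plain NumPy. This is a sketch of the reasoning only, with ``kept`` and ``dropped`` as illustrative names:

```python
import numpy as np

values = np.array([[np.nan, 1.0], [2.0, 3.0]])  # dims ('x', 'y')

# Keeping missing values: the stacked length depends only on the input shape.
kept = values.reshape(-1)

# Dropping missing values: the length depends on the data itself, so it
# cannot be known without inspecting every element.
dropped = kept[~np.isnan(kept)]

print(kept.shape, dropped.shape)
```

A lazily evaluated array needs to know its output shape before any values are computed, which only the first variant allows.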

Shift and roll
--------------

To adjust coordinate labels, you can use the :py:meth:`~xray.Dataset.shift` and
:py:meth:`~xray.Dataset.roll` methods:

.. ipython:: python

    array = xray.DataArray([1, 2, 3, 4], dims='x')
    array.shift(x=2)
    array.roll(x=2)
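
The difference between the two operations can be sketched on a bare NumPy array. This is an analogy covering the data values only, assuming that slots vacated by a shift are filled with ``NaN``; it is not xray's implementation:

```python
import numpy as np

values = np.array([1.0, 2.0, 3.0, 4.0])

# Shift by 2: values move toward higher indices and the vacated slots
# are filled with NaN; values pushed past the end are discarded.
shifted = np.full_like(values, np.nan)
shifted[2:] = values[:-2]

# Roll by 2: values pushed past the end wrap around to the start.
rolled = np.roll(values, 2)

print(shifted, rolled)
```

A shift loses the values that move past the edge, while a roll preserves every value and only changes their positions.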