Multi-index indexing #802

Merged: 24 commits, Jul 19, 2016. Changes shown from 16 commits.
54 changes: 54 additions & 0 deletions doc/indexing.rst

@@ -294,6 +294,60 @@ elements that are fully masked:

    arr2.where(arr2.y < 2, drop=True)

.. _multi-level indexing:

Multi-level indexing
--------------------

The ``loc`` and ``sel`` methods of ``Dataset`` and ``DataArray`` both accept
dictionaries for label-based indexing on multi-index dimensions:

.. ipython:: python

    idx = pd.MultiIndex.from_product([list('abc'), [0, 1]],
                                     names=('one', 'two'))
    da_midx = xr.DataArray(np.random.rand(6, 3),
                           [('x', idx), ('y', range(3))])
    da_midx
    da_midx.sel(x={'one': 'a', 'two': 0})
    da_midx.loc[{'one': 'a'}, ...]

As shown in the last example above, xarray handles partial selection on a
pandas multi-index; it automatically renames the dimension and replaces the
coordinate when a single index is returned (level drop).

Review comment (Member): nit: I would use a new sentence instead of the semicolon.
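A minimal illustration of the level-drop behavior just described (a sketch,
assuming the ``da_midx`` array defined earlier; per this PR's tests, the ``x``
dimension is renamed to the remaining level ``'two'``):

.. ipython:: python

    da_midx.sel(x={'one': 'a'})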

As in pandas, it is also possible to slice a multi-indexed dimension by providing
a tuple of multiple indexers (i.e., slices, labels, lists of labels, or any
selector allowed by pandas). Note that for now xarray doesn't fully handle
partial selection in that case (no level drop is done):

.. ipython:: python

    da_midx.sel(x=(list('ab'), [0]))

Lists or slices of tuples can be used to select several combinations of
multi-index labels:

.. ipython:: python

    da_midx.sel(x=[('a', 0), ('b', 1)])

A single, flat tuple can be used to select a given combination of
multi-index labels:

.. ipython:: python

    da_midx.sel(x=('a', 0))

Unlike pandas, xarray can't make the distinction between index levels and
dimensions when using ``loc`` in some ambiguous cases. For example, for
``da_midx.loc[{'one': 'a', 'two': 0}]`` and ``da_midx.loc['a', 0]`` xarray
always interprets ``('one', 'two')`` and ``('a', 0)`` as the names and
labels of the 1st and 2nd dimension, respectively. You must specify all
dimensions or use the ellipsis in the ``loc`` specifier, e.g. in the example
above, ``da_midx.loc[{'one': 'a', 'two': 0}, :]`` or
``da_midx.loc[('a', 0), ...]``.

Review comment (Member): Instead of "can't make the distinction", let's say "does not guess".
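For instance, spelling out both dimensions removes the ambiguity (these are the
two forms suggested in the paragraph above):

.. ipython:: python

    da_midx.loc[{'one': 'a', 'two': 0}, :]
    da_midx.loc[('a', 0), ...]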

Multi-dimensional indexing
--------------------------

6 changes: 5 additions & 1 deletion doc/whats-new.rst

@@ -39,10 +39,14 @@ Enhancements
  attributes are retained in the resampled object. By
  `Jeremy McGibbon <https://github.com/mcgibbon>`_.

- DataArray and Dataset methods :py:meth:`sel` and :py:meth:`loc` now
  accept dictionaries or nested tuples for indexing on multi-index dimensions.
  By `Benoit Bovy <https://github.com/benbovy>`_.

Review comment (Member): Can you also please add a note about the changed behavior (we now drop levels, which is consistent with pandas) in the "Breaking changes" section above?

Also, add a reference here to the documentation section you added: :ref:`multi-level indexing`


- New (experimental) decorators :py:func:`~xarray.register_dataset_accessor` and
  :py:func:`~xarray.register_dataarray_accessor` for registering custom xarray
  extensions without subclassing. They are described in the new documentation
-  page on :ref:`internals`. By `Stephan Hoyer <https://github.com/shoyer>`
+  page on :ref:`internals`. By `Stephan Hoyer <https://github.com/shoyer>`_.

- Round trip boolean datatypes. Previously, writing boolean datatypes to netCDF
  formats would raise an error since netCDF does not have a `bool` datatype.
30 changes: 15 additions & 15 deletions xarray/core/dataarray.py

@@ -86,24 +86,22 @@ def __init__(self, data_array):
        self.data_array = data_array

    def _remap_key(self, key):
-        def lookup_positions(dim, labels):
-            index = self.data_array.indexes[dim]
-            return indexing.convert_label_indexer(index, labels)
-
-        if utils.is_dict_like(key):
-            return dict((dim, lookup_positions(dim, labels))
-                        for dim, labels in iteritems(key))
-        else:
-            # expand the indexer so we can handle Ellipsis
-            key = indexing.expanded_indexer(key, self.data_array.ndim)
-            return tuple(lookup_positions(dim, labels) for dim, labels
-                         in zip(self.data_array.dims, key))
+        if not utils.is_dict_like(key):
+            # expand the indexer so we can handle Ellipsis
+            labels = indexing.expanded_indexer(key, self.data_array.ndim)
+            key = dict(zip(self.data_array.dims, labels))
+        return indexing.remap_label_indexers(self.data_array, key)
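For orientation, ``indexing.expanded_indexer`` expands a user-supplied key into
a tuple with one entry per dimension, replacing ``Ellipsis`` with the right
number of full slices. A rough behavior sketch (illustrative only, not the
actual implementation):

    expanded_indexer(('a', Ellipsis), ndim=3)  # -> ('a', slice(None), slice(None))
    expanded_indexer('a', ndim=2)              # -> ('a', slice(None))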

    def __getitem__(self, key):
-        return self.data_array[self._remap_key(key)]
+        pos_indexers, new_indexes = self._remap_key(key)
+        ds = self.data_array[pos_indexers]._to_temp_dataset()
+        return self.data_array._from_temp_dataset(
+            ds._replace_indexes(new_indexes)
+        )

    def __setitem__(self, key, value):
-        self.data_array[self._remap_key(key)] = value
+        pos_indexers, new_indexes = self._remap_key(key)
+        self.data_array[pos_indexers] = value

Review comment (Member): Can we avoid creating the temporary dataset here? _to_temp_dataset()/_from_temp_dataset() is a little expensive (and also a bit of a hack).

Reply (Member Author): Yeah, I agree this is not very nice. I did this to avoid duplicating the _replace_indexes method, but I agree that it may actually be the least bad option.


class _ThisArray(object):
@@ -599,8 +597,10 @@ def sel(self, method=None, tolerance=None, **indexers):
        Dataset.sel
        DataArray.isel
        """
-        return self.isel(**indexing.remap_label_indexers(
-            self, indexers, method=method, tolerance=tolerance))
+        ds = self._to_temp_dataset().sel(
+            method=method, tolerance=tolerance, **indexers
+        )
+        return self._from_temp_dataset(ds)

    def isel_points(self, dim='points', **indexers):
        """Return a new DataArray whose dataset is given by pointwise integer
27 changes: 22 additions & 5 deletions xarray/core/dataset.py

@@ -419,6 +419,18 @@ def _replace_vars_and_dims(self, variables, coord_names=None,
        obj = self._construct_direct(variables, coord_names, dims, attrs)
        return obj

    def _replace_indexes(self, indexes):
        variables = OrderedDict()
        for k, v in iteritems(self._variables):
            if k in indexes.keys():
                idx = indexes[k]
                variables[k] = Coordinate(idx.name, idx)
            else:
                variables[k] = v
        obj = self._replace_vars_and_dims(variables)
        dim_names = {dim: idx.name for dim, idx in iteritems(indexes)}
        return obj.rename(dim_names)

Review comment (Member): If idx.name != k above, then this could be constructing an invalid dataset. I think we should create Coordinate(k, idx) and then remap back to the original names below, if necessary.

Review comment (Member): Can we make the rename only done if necessary? I think this can be kind of expensive. Putting things together:

    variables = self._variables.copy()
    for name, idx in indexes.items():
        variables[name] = Coordinate(name, idx)
    obj = self._replace_vars_and_dims(variables)

    # switch from dimension to level names, if necessary
    dim_names = {}
    for dim, idx in indexes.items():
        if idx.name != dim:
            dim_names[dim] = idx.name
    if dim_names:
        obj = obj.rename(dim_names)

Reply (Member Author): Seems much nicer!

What about

    if not len(indexes):
        return self

at the top of the function, given that in many use cases indexes will be empty? (I don't know if self._replace_vars_and_dims is expensive.)

Reply (Member): Yes, that's even better!

self._replace_vars_and_dims is pretty cheap (it skips all validation) but not free.
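Putting the thread's suggestions together, the method could end up looking
roughly like this (a sketch only, untested; it combines the reviewer's snippet
with the early return proposed above, using Coordinate and iteritems as
imported in this module):

    def _replace_indexes(self, indexes):
        if not len(indexes):
            return self
        variables = self._variables.copy()
        for name, idx in indexes.items():
            variables[name] = Coordinate(name, idx)
        obj = self._replace_vars_and_dims(variables)

        # switch from dimension to level names, if necessary
        dim_names = {}
        for dim, idx in indexes.items():
            if idx.name != dim:
                dim_names[dim] = idx.name
        if dim_names:
            obj = obj.rename(dim_names)
        return obj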


    def copy(self, deep=False):
        """Returns a copy of this dataset.

@@ -954,7 +966,9 @@ def sel(self, method=None, tolerance=None, **indexers):
            Requires pandas>=0.17.
        **indexers : {dim: indexer, ...}
            Keyword arguments with names matching dimensions and values given
-            by scalars, slices or arrays of tick labels.
+            by scalars, slices or arrays of tick labels. For dimensions with
+            multi-index, the indexer may also be a dict-like object with keys
+            matching index level names.

        Returns
        -------

@@ -972,8 +986,10 @@ def sel(self, method=None, tolerance=None, **indexers):
        Dataset.isel_points
        DataArray.sel
        """
-        return self.isel(**indexing.remap_label_indexers(
-            self, indexers, method=method, tolerance=tolerance))
+        pos_indexers, new_indexes = indexing.remap_label_indexers(
+            self, indexers, method=method, tolerance=tolerance
+        )
+        return self.isel(**pos_indexers)._replace_indexes(new_indexes)
Review comment (Member): Does this handle the case where new_indexes is None?

Reply (Member): Nevermind, that can't happen.


    def isel_points(self, dim='points', **indexers):
        """Returns a new dataset with each array indexed pointwise along the

@@ -1114,8 +1130,9 @@ def sel_points(self, dim='points', method=None, tolerance=None,
        Dataset.isel_points
        DataArray.sel_points
        """
-        pos_indexers = indexing.remap_label_indexers(
-            self, indexers, method=method, tolerance=tolerance)
+        pos_indexers, new_indexes = indexing.remap_label_indexers(
+            self, indexers, method=method, tolerance=tolerance
+        )
        return self.isel_points(dim=dim, **pos_indexers)

Review comment (Member): If we ignore new_indexes here (which is fine, certainly for now), use the variable name _ instead to indicate that it's unused.

    def reindex_like(self, other, method=None, tolerance=None, copy=True):
61 changes: 54 additions & 7 deletions xarray/core/indexing.py

@@ -4,7 +4,7 @@

from . import utils
from .pycompat import iteritems, range, dask_array_type, suppress
-from .utils import is_full_slice
+from .utils import is_full_slice, is_dict_like


def expanded_indexer(key, ndim):

@@ -135,11 +135,27 @@ def _asarray_tuplesafe(values):

    return result

def _is_nested_tuple(tup, index):
    """Check for a compatible nested tuple and multiindex (taken from
    pandas.core.indexing.is_nested_tuple).
    """
    if not isinstance(tup, tuple):
        return False

    # are we a nested tuple of: tuple, list, slice
    for i, k in enumerate(tup):
        if isinstance(k, (tuple, list, slice)):
            return isinstance(index, pd.MultiIndex)

    return False

Review comment (Member): I'm still trying to wrap my head around exactly what this check does :).

Reply (Member Author): So am I! I've just stolen this from pandas without much modification :).

Review comment (Member): This is such a weird function the way it's currently written. Why not make this:

    def _is_nested_tuple(possible_tuple):
        return (isinstance(possible_tuple, tuple)
                and any(isinstance(value, (tuple, list, slice))
                        for value in possible_tuple))

The isinstance(index, pd.MultiIndex) check should be outside the function, since it's totally unrelated.
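For concreteness, a few illustrative calls against the function as written
above (assuming pandas is imported as pd, as in this module):

    midx = pd.MultiIndex.from_product([list('ab'), [0, 1]])
    _is_nested_tuple((list('ab'), [0]), midx)              # True: tuple holding a list, MultiIndex
    _is_nested_tuple(('a', 0), midx)                       # False: flat tuple
    _is_nested_tuple((list('ab'), [0]), pd.Index([1, 2]))  # False: not a MultiIndex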


def convert_label_indexer(index, label, index_name='', method=None,
                          tolerance=None):
    """Given a pandas.Index and labels (e.g., from __getitem__) for one
    dimension, return an indexer suitable for indexing an ndarray along that
-    dimension
+    dimension. If label is a dict-like object and a pandas.MultiIndex is given,
+    also return a new pandas.Index, otherwise return None.
    """
    # backwards compatibility for pandas<0.16 (method) or pandas<0.17
    # (tolerance)

@@ -152,6 +168,8 @@ def convert_label_indexer(index, label, index_name='', method=None,

            'the tolerance argument requires pandas v0.17 or newer')
        kwargs['tolerance'] = tolerance

    new_index = None

    if isinstance(label, slice):
        if method is not None or tolerance is not None:
            raise NotImplementedError(

@@ -166,6 +184,17 @@ def convert_label_indexer(index, label, index_name='', method=None,

        raise KeyError('cannot represent labeled-based slice indexer for '
                       'dimension %r with a slice over integer positions; '
                       'the index is unsorted or non-unique')

    elif is_dict_like(label):
        if not isinstance(index, pd.MultiIndex):
            raise ValueError('cannot use a dict-like object for selection on a '
                             'dimension that does not have a MultiIndex')
        indexer, new_index = index.get_loc_level(tuple(label.values()),
                                                 level=tuple(label.keys()))

    elif _is_nested_tuple(label, index):
        indexer = index.get_locs(label)
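For reference, ``MultiIndex.get_locs`` takes one selector per level (labels,
lists, or slices) and returns the matching integer positions; a small
illustration (it requires a lexsorted index, which ``from_product`` yields):

    idx = pd.MultiIndex.from_product([list('ab'), [0, 1]])
    idx.get_locs((['a', 'b'], [0]))  # -> positions of ('a', 0) and ('b', 0)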

Review comment (Member): I think we could reproduce what pandas does in terms of collapsing tuple levels if we call get_loc_level for non-nested tuples:

    # untested!

    elif isinstance(label, tuple) and isinstance(index, pd.MultiIndex):
        if _is_nested_tuple(label):
            indexer = index.get_locs(label)
        else:
            indexer, new_index = index.get_loc_level(label, level=range(len(label)))

Reply (Member Author): Yes, it works!

However, using non-nested tuples here consists of selecting single elements and raises the question of how we handle returned scalar values. In that specific case we should drop the dimension but keep the 0-d (multi-level) coordinate, so that data.sel(x=('a', 0)) is equivalent to data.isel(x=0) in the example above.

More generally, I think we definitely need to carefully address level drop in all cases.

Reply (Member): Doing some tests, it seems like get_loc_level properly drops levels from a MultiIndex, but doesn't handle dropping the level entirely. For that, we need to use get_loc.

Good:

    In [11]: idx = pd.MultiIndex.from_product([['a', 'b'], [1, 2], [-1, -2]])

    In [12]: idx.get_loc_level(('a', 1), [0, 1])
    Out[12]:
    (array([ True,  True, False, False, False, False, False, False], dtype=bool),
     Int64Index([-1, -2], dtype='int64'))

    In [13]: idx.get_loc_level(('a', -1), [0, 2])
    Out[13]:
    (array([ True, False,  True, False, False, False, False, False], dtype=bool),
     Int64Index([1, 2], dtype='int64'))

    In [15]: idx.get_loc_level(('a',), [0])
    Out[15]:
    (array([ True,  True,  True,  True, False, False, False, False], dtype=bool),
     MultiIndex(levels=[[1, 2], [-2, -1]],
                labels=[[0, 0, 1, 1], [1, 0, 1, 0]]))

    In [16]: idx.get_loc_level((1,), [1])
    Out[16]:
    (array([ True,  True, False, False,  True,  True, False, False], dtype=bool),
     MultiIndex(levels=[['a', 'b'], [-2, -1]],
                labels=[[0, 0, 1, 1], [1, 0, 1, 0]]))

Bad:

    In [14]: idx.get_loc_level(('a', 1, -1), [0, 1, 2])
    Out[14]:
    (array([ True, False, False, False, False, False, False, False], dtype=bool),
     MultiIndex(levels=[['a', 'b'], [1, 2], [-2, -1]],
                labels=[[0], [0], [1]]))

Good:

    In [17]: idx.get_loc(('a', 1, -1))
    Out[17]: 0

Reply (Member): So I guess we need to check the length of the tuple (probably also in the dict path above):

    elif isinstance(label, tuple) and isinstance(index, pd.MultiIndex):
        if _is_nested_tuple(label):
            indexer = index.get_locs(label)
        elif len(label) == index.nlevels:
            indexer = index.get_loc(label)
        else:
            indexer, new_index = index.get_loc_level(label, level=range(len(label)))

Reply (Member Author, benbovy, Jun 8, 2016): (EDIT: forget about this comment, it is complete nonsense :) ) I was thinking about something like this:

    def _maybe_drop_levels(index):
        drop_levels = [i for i, lab in enumerate(index.labels)
                       if not np.ptp(lab.values())]
        if len(drop_levels) < len(index.labels):
            return index.droplevel(drop_levels)
        else:
            return index

    def convert_label_indexer(...):

        # ...

        if isinstance(new_index, pd.MultiIndex):
            new_index = _maybe_drop_levels(new_index)

        return indexer, new_index

Reply (Member): The advantage of doing something like my proposed logic (which I think is similar to what pandas does) is that whether a level is dropped depends only on the indexer type and the number of multi-index levels, as opposed to dropping levels in a way that also depends on the particular values in the indexer and index. Code that depends only on type information rather than values is generally easier to understand and less error prone.

Reply (Member): Also, np.ptp only works on numbers, not strings.

Reply (Member Author): Yep, I used np.ptp here as pandas MultiIndex.labels elements are integers by definition.

Anyway, I get your logic. It is also much more efficient!

    else:
        label = _asarray_tuplesafe(label)
        if label.ndim == 0:

Review comment (Member): This is where scalars end up -- probably need to add a clause here to handle MultiIndex.

@@ -177,18 +206,36 @@ def convert_label_indexer(index, label, index_name='', method=None,
        if np.any(indexer < 0):
            raise KeyError('not all values found in index %r'
                           % index_name)
-    return indexer
+    return indexer, new_index


def remap_label_indexers(data_obj, indexers, method=None, tolerance=None):
    """Given an xarray data object and label based indexers, return a mapping
-    of equivalent location based indexers.
+    of equivalent location based indexers. Also return a mapping of pandas'
+    single index objects returned from multi-index objects.
    """
    if method is not None and not isinstance(method, str):
        raise TypeError('``method`` must be a string')
-    return dict((dim, convert_label_indexer(data_obj[dim].to_index(), label,
-                                            dim, method, tolerance))
-                for dim, label in iteritems(indexers))
+
+    pos_indexers, new_indexes = {}, {}
+    for dim, label in iteritems(indexers):
+        index = data_obj[dim].to_index()
        if isinstance(index, pd.MultiIndex):
            # set default names for multi-index unnamed levels so that
            # we can safely rename dimension / coordinate later
            valid_level_names = [name or '{}_level_{}'.format(dim, i)
                                 for i, name in enumerate(index.names)]
            index = index.copy()
            index.names = valid_level_names

        idxr, new_idx = convert_label_indexer(index, label,
                                              dim, method, tolerance)
        pos_indexers[dim] = idxr
        if new_idx is not None and not isinstance(new_idx, pd.MultiIndex):
            new_indexes[dim] = new_idx

    return pos_indexers, new_indexes

Review comment (Member, on the default level names): This looks great! We might also consider moving this logic to around this line of core.variable.as_compatible_data. All data passed to xarray objects goes through this method.

Reply (Member Author, benbovy, Jun 9, 2016): Wouldn't it be better to move this logic to indexing.PandasIndexAdapter, wrap the names (and/or name) property, and return a PandasIndexAdapter object from convert_label_indexer?

This is because I worry about implicit copy or in-place renaming.

The problem would be to set default level names that are unique across dimensions, but maybe we can pass the variable name in the PandasIndexAdapter constructor (as an additional name kwarg).

Reply (Member Author): Or maybe moving this here (to_index()) would be simpler...

Reply (Member, shoyer, Jun 10, 2016): PandasIndexAdapter is really for handling aspects of pandas indexes that we can't change by converting into a new index. For example, PeriodIndex claims to have dtype=int, when it's really a sort of object dtype.

pandas.Index is actually immutable, so we don't need to worry about changing the data. Calling .copy() just copies the metadata and reuses the same data. That said, calling .set_names() (which returns a new MultiIndex) is probably a more obvious way of handling this.

It does look like we already have some similar logic in to_index(), so that could be a reasonable place to put this, too. But I think doing the coercion in as_compatible_data is also reasonable.

Reply (Member Author, benbovy, Jun 10, 2016): Compared to as_compatible_data, moving this to to_index() would create a copy of the multi-index (almost free, I guess) each time it is called, but it would also allow tracking level name changes on the original MultiIndex:

    >>> idx = pd.MultiIndex.from_product([['a', 'b'], [1, 2], [-1, -2]])
    >>> y = xr.DataArray(np.random.rand(2 * 2 * 2), [('x', idx)])
    >>> y.x.to_index()
    MultiIndex(levels=[['a', 'b'], [1, 2], [-2, -1]],
               labels=[[0, 0, 0, 0, 1, 1, 1, 1], [0, 0, 1, 1, 0, 0, 1, 1], [1, 0, 1, 0, 1, 0, 1, 0]],
               names=['x_level_0', 'x_level_1', 'x_level_2'])
    >>> idx.names = ('one', 'two', 'three')
    >>> y.x.to_index()
    MultiIndex(levels=[['a', 'b'], [1, 2], [-2, -1]],
               labels=[[0, 0, 0, 0, 1, 1, 1, 1], [0, 0, 1, 1, 0, 0, 1, 1], [1, 0, 1, 0, 1, 0, 1, 0]],
               names=['one', 'two', 'three'])

and we also have direct access to the dimension/coordinate name, so we can set default level names that are unique (as shown above).

Reply (Member): > and we also have direct access to the dimension/coordinate name, so we can set default level names that are unique (as shown above).

True, but unless we allow directly accessing levels as variables, level_0 on dim_0 is as unambiguous as dim_0_level_0.

Review comment (Member, on the isinstance(new_idx, pd.MultiIndex) check): What should happen if new_idx is a MultiIndex? Right now it looks like it gets thrown away?

Reply (Member): We definitely need to add a test for this situation (e.g., 3-level index -> 2-level index).

Reply (Member Author): Yes, multi-indexes are not updated, but maybe we should do so (see my comment below on level drop).
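A quick sketch of how the two return values are consumed (mirroring Dataset.sel
above; ds here is a hypothetical Dataset with a multi-indexed dimension 'x'
with levels 'one' and 'two'):

    pos_indexers, new_indexes = remap_label_indexers(ds, {'x': {'one': 'a'}})
    # pos_indexers: positional indexers per dimension, passed on to isel
    # new_indexes: e.g. {'x': <Index for the remaining level 'two'>} when a
    # level was dropped, passed on to _replace_indexes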


def slice_slice(old_slice, applied_slice, size):
33 changes: 25 additions & 8 deletions xarray/test/test_dataarray.py

@@ -486,7 +486,8 @@ def test_loc_single_boolean(self):
        self.assertEqual(data.loc[False], 1)

    def test_multiindex(self):
-        idx = pd.MultiIndex.from_product([list('abc'), [0, 1]])
+        idx = pd.MultiIndex.from_product([list('abc'), [0, 1]],
+                                         names=('one', 'two'))
        data = DataArray(range(6), [('x', idx)])

        self.assertDataArrayIdentical(data.sel(x=('a', 0)), data.isel(x=0))

@@ -495,6 +496,22 @@ def test_multiindex(self):
        self.assertDataArrayIdentical(data.sel(x=[('a', 0), ('c', 1)]),
                                      data.isel(x=[0, -1]))
        self.assertDataArrayIdentical(data.sel(x='a'), data.isel(x=slice(2)))
        self.assertVariableNotEqual(data.sel(x={'one': slice(None)}), data)
        self.assertDataArrayIdentical(data.isel(x=[0]),
                                      data.sel(x={'one': 'a', 'two': 0}))
        self.assertDataArrayIdentical(data.isel(x=[0, 1]), data.sel(x='a'))
        self.assertVariableIdentical(
            data.sel(x={'one': 'a'}),
            data.unstack('x').sel(one='a').dropna('two')
        )
        self.assertDataArrayIdentical(data.sel(x=('a', slice(None))),
                                      data.isel(x=[0, 1]))

        self.assertDataArrayIdentical(data.loc['a'], data[:2])
        self.assertDataArrayIdentical(data.loc[{'one': 'a', 'two': 0}, ...],
                                      data[[0]])
        self.assertDataArrayIdentical(data.loc[{'one': 'a'}, ...],
                                      data.sel(x={'one': 'a'}))

    def test_time_components(self):
        dates = pd.date_range('2000-01-01', periods=10)

@@ -1818,29 +1835,29 @@ def test_full_like(self):
        actual = _full_like(DataArray([1, 2, 3]), fill_value=np.nan)
        self.assertEqual(actual.dtype, np.float)
        np.testing.assert_equal(actual.values, np.nan)

    def test_dot(self):
        x = np.linspace(-3, 3, 6)
        y = np.linspace(-3, 3, 5)
        z = range(4)
        da_vals = np.arange(6 * 5 * 4).reshape((6, 5, 4))
        da = DataArray(da_vals, coords=[x, y, z], dims=['x', 'y', 'z'])

        dm_vals = range(4)
        dm = DataArray(dm_vals, coords=[z], dims=['z'])

        # nd dot 1d
        actual = da.dot(dm)
        expected_vals = np.tensordot(da_vals, dm_vals, [2, 0])
        expected = DataArray(expected_vals, coords=[x, y], dims=['x', 'y'])
        self.assertDataArrayEqual(expected, actual)

        # all shared dims
        actual = da.dot(da)
        expected_vals = np.tensordot(da_vals, da_vals, axes=([0, 1, 2], [0, 1, 2]))
        expected = DataArray(expected_vals)
        self.assertDataArrayEqual(expected, actual)

        # multiple shared dims
        dm_vals = np.arange(20 * 5 * 4).reshape((20, 5, 4))
        j = np.linspace(-3, 3, 20)

@@ -1849,7 +1866,7 @@ def test_dot(self):

        expected_vals = np.tensordot(da_vals, dm_vals, axes=([1, 2], [1, 2]))
        expected = DataArray(expected_vals, coords=[x, j], dims=['x', 'j'])
        self.assertDataArrayEqual(expected, actual)

        with self.assertRaises(NotImplementedError):
            da.dot(dm.to_dataset(name='dm'))
        with self.assertRaises(TypeError):
27 changes: 27 additions & 0 deletions xarray/test/test_dataset.py

@@ -840,6 +840,33 @@ def test_loc(self):
        with self.assertRaises(TypeError):
            data.loc[dict(dim3='a')] = 0

    def test_multiindex(self):
        idx = pd.MultiIndex.from_product([list('abc'), [0, 1]],
                                         names=('one', 'two'))
        data = Dataset(data_vars={'var': ('x', range(6))}, coords={'x': idx})

        self.assertDatasetIdentical(data.sel(x=('a', 0)), data.isel(x=0))
        self.assertDatasetIdentical(data.sel(x=('c', 1)), data.isel(x=-1))
        self.assertDatasetIdentical(data.sel(x=[('a', 0)]), data.isel(x=[0]))
        self.assertDatasetIdentical(data.sel(x=[('a', 0), ('c', 1)]),
                                    data.isel(x=[0, -1]))
        self.assertDatasetIdentical(data.sel(x=(['a', 'c'], [0, 1])),
                                    data.isel(x=[0, 1, -2, -1]))
        self.assertDatasetIdentical(data.sel(x='a'), data.isel(x=slice(2)))
        self.assertVariableNotEqual(data.sel(x={'one': slice(None)})['var'],
                                    data['var'])
        self.assertDatasetIdentical(data.isel(x=[0]),
                                    data.sel(x={'one': 'a', 'two': 0}))
        self.assertDatasetIdentical(data.isel(x=[0, 1]), data.sel(x='a'))
        self.assertVariableIdentical(
            data.sel(x={'one': 'a'})['var'],
            data.unstack('x').sel(one='a').dropna('two')['var']
        )

        self.assertDatasetIdentical(data.loc[{'x': 'a'}], data.sel(x='a'))
        self.assertDatasetIdentical(data.loc[{'x': {'one': 'a', 'two': 0}}],
                                    data.sel(x={'one': 'a', 'two': 0}))

    def test_reindex_like(self):
        data = create_test_data()
        data['letters'] = ('dim3', 10 * ['a'])