ENH/BUG: Fix names, levels and labels handling in MultiIndex #4039

Merged: 4 commits, Aug 11, 2013
148 changes: 86 additions & 62 deletions doc/source/indexing.rst
@@ -868,66 +868,6 @@ convert to an integer index:
df_new[(df_new['index'] >= 1.0) & (df_new['index'] < 2)]


.. _indexing.class:

Index objects
-------------

The pandas Index class and its subclasses can be viewed as implementing an
*ordered set* in addition to providing the support infrastructure necessary for
lookups, data alignment, and reindexing. The easiest way to create one directly
is to pass a list or other sequence to ``Index``:

.. ipython:: python

index = Index(['e', 'd', 'a', 'b'])
index
'd' in index

You can also pass a ``name`` to be stored in the index:


.. ipython:: python

index = Index(['e', 'd', 'a', 'b'], name='something')
index.name

Starting with pandas 0.5, the name, if set, will be shown in the console
display:

.. ipython:: python

index = Index(list(range(5)), name='rows')
columns = Index(['A', 'B', 'C'], name='cols')
df = DataFrame(np.random.randn(5, 3), index=index, columns=columns)
df
df['A']


Set operations on Index objects
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. _indexing.set_ops:

The three main operations are ``union (|)``, ``intersection (&)``, and ``diff
(-)``. These can be directly called as instance methods or used via overloaded
operators:

.. ipython:: python

a = Index(['c', 'b', 'a'])
b = Index(['c', 'e', 'd'])
a.union(b)
a | b
a & b
a - b

``isin`` method of Index objects
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

One additional operation is the ``isin`` method that works analogously to the
``Series.isin`` method found :ref:`here <indexing.boolean>`.

.. _indexing.hierarchical:

Hierarchical indexing (MultiIndex)
@@ -1189,7 +1129,7 @@ are named.

.. ipython:: python

s.index.names = ['L1', 'L2']
s.index.set_names(['L1', 'L2'], inplace=True)
s.sortlevel(level='L1')
s.sortlevel(level='L2')

@@ -1229,7 +1169,9 @@ However:
::

>>> s.ix[('a', 'b'):('b', 'a')]
Exception: MultiIndex lexsort depth 1, key was length 2
Traceback (most recent call last):
...
KeyError: Key length (2) was greater than MultiIndex lexsort depth (1)

Swapping levels with ``swaplevel``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -1274,6 +1216,88 @@ not check (or care) whether the levels themselves are sorted. Fortunately, the
constructors ``from_tuples`` and ``from_arrays`` ensure that this is true, but
if you compute the levels and labels yourself, please be careful.

.. _indexing.class:

Index objects
-------------

The pandas Index class and its subclasses can be viewed as implementing an
*ordered set* in addition to providing the support infrastructure necessary for
lookups, data alignment, and reindexing. The easiest way to create one directly
is to pass a list or other sequence to ``Index``:

.. ipython:: python

index = Index(['e', 'd', 'a', 'b'])
index
'd' in index

You can also pass a ``name`` to be stored in the index:


.. ipython:: python

index = Index(['e', 'd', 'a', 'b'], name='something')
index.name

Starting with pandas 0.5, the name, if set, will be shown in the console
display:

.. ipython:: python

index = Index(list(range(5)), name='rows')
columns = Index(['A', 'B', 'C'], name='cols')
df = DataFrame(np.random.randn(5, 3), index=index, columns=columns)
df
df['A']


Set operations on Index objects
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. _indexing.set_ops:

The three main operations are ``union (|)``, ``intersection (&)``, and ``diff
(-)``. These can be directly called as instance methods or used via overloaded
operators:

.. ipython:: python

a = Index(['c', 'b', 'a'])
b = Index(['c', 'e', 'd'])
a.union(b)
a | b
a & b
a - b

``isin`` method of Index objects
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

One additional operation is the ``isin`` method that works analogously to the
``Series.isin`` method found :ref:`here <indexing.boolean>`.

Setting index metadata (``name(s)``, ``levels``, ``labels``)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. _indexing.set_metadata:

Indexes are "mostly immutable", but it is possible to set and change their
metadata, like the index ``name`` (or, for ``MultiIndex``, ``levels`` and
``labels``).

You can use the ``rename``, ``set_names``, ``set_levels``, and ``set_labels``
methods to set these attributes directly. They return a copy by default;
pass ``inplace=True`` to modify the index in place instead.

.. ipython:: python

ind = Index([1, 2, 3])
ind.rename("apple")
ind
ind.set_names(["apple"], inplace=True)
ind.name = "bob"
ind
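The copy-versus-in-place behaviour described above can be sketched without pandas. The ``ToyIndex`` class below is a hypothetical stand-in, not the pandas implementation; only the method names ``rename`` and ``set_names`` and the ``inplace`` keyword are taken from the text, and returning ``None`` from an in-place call is an assumption of this sketch.

```python
import copy


class ToyIndex:
    """Hypothetical stand-in for an index: fixed values plus a settable name."""

    def __init__(self, values, name=None):
        self._values = tuple(values)  # the data itself is never modified
        self.name = name

    def set_names(self, names, inplace=False):
        # Takes a list of names, one per level; a flat index has one level.
        if len(names) != 1:
            raise ValueError("a flat index takes exactly one name")
        target = self if inplace else copy.copy(self)
        target.name = names[0]
        # Assumption of this sketch: in-place calls return None.
        return None if inplace else target

    def rename(self, name):
        # Convenience wrapper that always returns a renamed copy.
        return self.set_names([name])


ind = ToyIndex([1, 2, 3])
renamed = ind.rename("apple")
print(ind.name, renamed.name)   # None apple

ind.set_names(["bob"], inplace=True)
print(ind.name, renamed.name)   # bob apple
```

The first call leaves ``ind`` untouched and hands back a renamed copy; only the ``inplace=True`` call changes the object it is called on.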

Adding an index to an existing DataFrame
----------------------------------------

26 changes: 26 additions & 0 deletions doc/source/release.rst
@@ -47,6 +47,12 @@ pandas 0.13
- Added a more informative error message when plot arguments contain
overlapping color and style arguments (:issue:`4402`)
- Significant table writing performance improvements in ``HDFStore``
- ``Index.copy()`` and ``MultiIndex.copy()`` now accept keyword arguments to
change attributes (i.e., ``names``, ``levels``, ``labels``)
(:issue:`4039`)
- Add ``rename`` and ``set_names`` methods to ``Index`` as well as
``set_names``, ``set_levels``, ``set_labels`` to ``MultiIndex``.
(:issue:`4039`)

**API Changes**

@@ -66,6 +72,7 @@ pandas 0.13
an alias of iteritems used to get around ``2to3``'s changes).
(:issue:`4384`, :issue:`4375`, :issue:`4372`)
- ``Series.get`` with negative indexers now returns the same as ``[]`` (:issue:`4390`)

- ``HDFStore``

- added an ``is_open`` property to indicate if the underlying file handle is_open;
@@ -83,6 +90,21 @@ pandas 0.13
be raised if you try to use ``mode='w'`` with an OPEN file handle (:issue:`4367`)
- allow a passed locations array or mask as a ``where`` condition (:issue:`4467`)

- ``Index`` and ``MultiIndex`` changes (:issue:`4039`):

- Setting ``levels`` and ``labels`` directly on ``MultiIndex`` is now
deprecated. Instead, you can use the ``set_levels()`` and
``set_labels()`` methods.
- ``levels``, ``labels`` and ``names`` properties no longer return lists,
but instead return containers that do not allow setting of items
('mostly immutable')
- ``levels``, ``labels`` and ``names`` are validated upon setting and are
either copied or shallow-copied.
- ``__deepcopy__`` now returns a shallow copy (currently: a view) of the
data - allowing metadata changes.
- ``MultiIndex.astype()`` now only allows ``np.object_``-like dtypes and
now returns a ``MultiIndex`` rather than an ``Index``. (:issue:`4039`)
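
As a rough illustration of "validated upon setting" and "do not allow setting of items", here is a minimal pure-Python sketch. ``ToyMultiIndex`` is hypothetical and is not the pandas class; a plain ``tuple`` stands in for the item-immutable container, and the length check is only one plausible validation rule.

```python
class ToyMultiIndex:
    """Hypothetical index with validated, item-immutable ``names``."""

    def __init__(self, levels, names):
        # Copy the levels on the way in so the caller's lists are not shared.
        self._levels = tuple(tuple(level) for level in levels)
        self._names = None
        self.names = names  # goes through the validating setter below

    @property
    def names(self):
        # A tuple stands in for the container that rejects item assignment.
        return self._names

    @names.setter
    def names(self, values):
        values = tuple(values)  # copy: later edits to the caller's list do not leak in
        if len(values) != len(self._levels):
            raise ValueError(
                "expected %d names, got %d" % (len(self._levels), len(values))
            )
        self._names = values


caller_names = ["L1", "L2"]
mi = ToyMultiIndex([[1, 2], ["a", "b"]], caller_names)
caller_names[0] = "changed"      # has no effect on the index
print(mi.names)                  # ('L1', 'L2')

try:
    mi.names[0] = "oops"         # item assignment is rejected
except TypeError as exc:
    print("rejected:", exc)

try:
    mi.names = ["only-one"]      # wrong length fails validation
except ValueError as exc:
    print("rejected:", exc)
```

Replacing the whole attribute still works, but every replacement passes through the same check, which is the property the bullets above describe.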

**Experimental Features**

**Bug Fixes**
@@ -136,6 +158,10 @@ pandas 0.13
- frozenset objects now raise in the ``Series`` constructor (:issue:`4482`,
:issue:`4480`)
- Fixed issue with sorting a duplicate multi-index that has multiple dtypes (:issue:`4516`)
- Fixed bug in ``DataFrame.set_values`` which was causing name attributes to
be lost when expanding the index. (:issue:`3742`, :issue:`4039`)
- Fixed issue where individual ``names``, ``levels`` and ``labels`` could be
set on ``MultiIndex`` without validation (:issue:`3714`, :issue:`4039`)

pandas 0.12
===========
18 changes: 18 additions & 0 deletions doc/source/v0.13.0.txt
@@ -72,6 +72,24 @@ API changes
import os
os.remove(path)

- Changes to how ``Index`` and ``MultiIndex`` handle metadata (``levels``,
  ``labels``, and ``names``) (:issue:`4039`):

  .. code-block:: python

      # previously, you would have set levels or labels directly
      index.levels = [[1, 2, 3, 4], [1, 2, 4, 4]]

      # now, you use the set_levels or set_labels methods
      index = index.set_levels([[1, 2, 3, 4], [1, 2, 4, 4]])

      # similarly, for names, you can rename the object
      # but setting names is not deprecated.
      index = index.set_names(["bob", "cranberry"])

      # and all methods take an inplace kwarg
      index.set_names(["bob", "cranberry"], inplace=True)

Enhancements
~~~~~~~~~~~~

88 changes: 87 additions & 1 deletion pandas/core/base.py
@@ -1,7 +1,8 @@
"""
Base class(es) for all pandas objects.
Base and utility classes for pandas objects.
"""
from pandas import compat
import numpy as np

class StringMixin(object):
"""implements string methods so long as object defines a `__unicode__` method.
@@ -56,3 +57,88 @@ def __unicode__(self):
"""
# Should be overwritten by base classes
return object.__repr__(self)

class FrozenList(PandasObject, list):
    """
    Container that doesn't allow setting item *but*
    because it's technically non-hashable, will be used
    for lookups, appropriately, etc.
    """
    # Sidenote: This has to be of type list, otherwise it messes up PyTables typechecks

    def __add__(self, other):
        if isinstance(other, tuple):
            other = list(other)
        return self.__class__(super(FrozenList, self).__add__(other))

    __iadd__ = __add__

    # Python 2 compat
    def __getslice__(self, i, j):
        return self.__class__(super(FrozenList, self).__getslice__(i, j))

    def __getitem__(self, n):
        # Python 3 compat
        if isinstance(n, slice):
            return self.__class__(super(FrozenList, self).__getitem__(n))
        return super(FrozenList, self).__getitem__(n)

    def __radd__(self, other):
        if isinstance(other, tuple):
            other = list(other)
        return self.__class__(other + list(self))

    def __eq__(self, other):
        if isinstance(other, (tuple, FrozenList)):
            other = list(other)
        return super(FrozenList, self).__eq__(other)

    __req__ = __eq__

    def __mul__(self, other):
        return self.__class__(super(FrozenList, self).__mul__(other))

    __imul__ = __mul__

    def __hash__(self):
        return hash(tuple(self))

    def _disabled(self, *args, **kwargs):
        """This method will not function because object is immutable."""
        raise TypeError("'%s' does not support mutable operations." %
                        self.__class__)

    def __unicode__(self):
        from pandas.core.common import pprint_thing
        return "%s(%s)" % (self.__class__.__name__,
                           pprint_thing(self, quote_strings=True,
                                        escape_chars=('\t', '\r', '\n')))

    __setitem__ = __setslice__ = __delitem__ = __delslice__ = _disabled
    pop = append = extend = remove = sort = insert = _disabled


class FrozenNDArray(PandasObject, np.ndarray):

    # no __array_finalize__ for now because no metadata
    def __new__(cls, data, dtype=None, copy=False):
        if copy is None:
            copy = not isinstance(data, FrozenNDArray)
        res = np.array(data, dtype=dtype, copy=copy).view(cls)
        return res

    def _disabled(self, *args, **kwargs):
        """This method will not function because object is immutable."""
        raise TypeError("'%s' does not support mutable operations." %
                        self.__class__)

    __setitem__ = __setslice__ = __delitem__ = __delslice__ = _disabled
    put = itemset = fill = _disabled

    def _shallow_copy(self):
        return self.view()

    def values(self):
        """returns *copy* of underlying array"""
        arr = self.view(np.ndarray).copy()
        return arr
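
To see how a container in the style of ``FrozenList`` behaves from the caller's side, here is a trimmed-down, Python 3 only sketch. ``MiniFrozenList`` is a hypothetical name and drops the ``PandasObject`` base, the Python 2 slice hooks, and the pretty-printing; it is meant to illustrate the pattern, not to reproduce the class in this diff.

```python
class MiniFrozenList(list):
    """Hypothetical, simplified relative of FrozenList (Python 3 only)."""

    def _disabled(self, *args, **kwargs):
        raise TypeError(
            "%r does not support mutable operations." % type(self).__name__
        )

    # Block the in-place mutators, mirroring the list in the diff above.
    __setitem__ = __delitem__ = _disabled
    pop = append = extend = remove = sort = insert = _disabled

    def __add__(self, other):
        # Concatenation returns a new frozen list, also when given a tuple.
        return type(self)(list(self) + list(other))

    __iadd__ = __add__  # "+=" rebinds to a new object instead of mutating

    def __getitem__(self, n):
        result = super().__getitem__(n)
        return type(self)(result) if isinstance(n, slice) else result

    def __eq__(self, other):
        if isinstance(other, (list, tuple)):
            return list(self) == list(other)
        return NotImplemented

    def __hash__(self):
        return hash(tuple(self))


names = MiniFrozenList(["L1", "L2"])
combined = names + ("L3",)
print(type(combined).__name__, list(combined))  # MiniFrozenList ['L1', 'L2', 'L3']
print(names == ("L1", "L2"))                    # True
print({names: "usable as a dict key"}[names])   # usable as a dict key

try:
    names[0] = "oops"
except TypeError as exc:
    print("rejected:", exc)

# Callers that need to edit the names take an explicit list copy first.
mutable = list(names)
mutable[0] = "first"
```

The explicit ``list(...)`` copy at the end is the same pattern the ``to_records`` change in ``pandas/core/frame.py`` uses once ``index.names`` stops being a plain list.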
4 changes: 1 addition & 3 deletions pandas/core/common.py
@@ -8,18 +8,16 @@

from numpy.lib.format import read_array, write_array
import numpy as np

import pandas.algos as algos
import pandas.lib as lib
import pandas.tslib as tslib

from pandas import compat
from pandas.compat import StringIO, BytesIO, range, long, u, zip, map


from pandas.core.config import get_option
from pandas.core import array as pa


# XXX: HACK for NumPy 1.5.1 to suppress warnings
try:
np.seterr(all='ignore')
2 changes: 1 addition & 1 deletion pandas/core/frame.py
@@ -1150,7 +1150,7 @@ def to_records(self, index=True, convert_datetime64=True):
arrays = ix_vals+ [self[c].values for c in self.columns]

count = 0
index_names = self.index.names
index_names = list(self.index.names)
if isinstance(self.index, MultiIndex):
for i, n in enumerate(index_names):
if n is None:
2 changes: 1 addition & 1 deletion pandas/core/generic.py
@@ -404,7 +404,7 @@ def drop(self, labels, axis=0, level=None):
new_axis = axis.drop(labels)
dropped = self.reindex(**{axis_name: new_axis})
try:
dropped.axes[axis_].names = axis.names
dropped.axes[axis_].set_names(axis.names, inplace=True)
except AttributeError:
pass
return dropped