Cannot subclass MultiIndex #11267

Closed
janmedlock opened this issue Oct 9, 2015 · 7 comments · Fixed by #37180
Labels
Enhancement Index Related to the Index class or subclasses MultiIndex

Comments

@janmedlock

MultiIndex cannot be subclassed because in several spots, including __new__, the output class is hardcoded as MultiIndex rather than cls or self.__class__ or the like.

This code illustrates the problem:

$ python
Python 2.7.10 (default, Sep 13 2015, 20:30:50) 
[GCC 5.2.1 20150911] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas
>>> class MyMultiIndex(pandas.MultiIndex):
...     pass
... 
>>> multiindex = MyMultiIndex([['a'], ['b']],
...                           [[0], [0]])
>>> print type(multiindex)
<class 'pandas.core.index.MultiIndex'>

The last line should instead read

<class '__main__.MyMultiIndex'>

I have a patch in the works.
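
For readers unfamiliar with the pattern, here is a minimal sketch of why a hardcoded class in `__new__` defeats subclassing. The classes below (`HardcodedBase`, `CooperativeBase`, and their children) are hypothetical toy classes, not pandas code; they only mirror the shape of the problem described above and the kind of change (using `cls`) that addresses it.

```python
# Toy classes for illustration only; none of these names exist in pandas.

class HardcodedBase(object):
    def __new__(cls):
        # Problem pattern: the class is hardcoded, so cls is ignored
        # and every subclass instance comes back as HardcodedBase.
        return object.__new__(HardcodedBase)


class CooperativeBase(object):
    def __new__(cls):
        # Fix pattern: build the instance from cls so subclasses survive.
        return object.__new__(cls)


class HardcodedChild(HardcodedBase):
    pass


class CooperativeChild(CooperativeBase):
    pass


hardcoded = HardcodedChild()
cooperative = CooperativeChild()

print(type(hardcoded).__name__)    # HardcodedBase
print(type(cooperative).__name__)  # CooperativeChild
```

The same reasoning applies to instance methods that build a new object: constructing it via `type(self)` or `self.__class__` instead of a fixed class name keeps the subclass intact.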

>>> pandas.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.10.final.0
python-bits: 64
OS: Linux
OS-release: 4.1.0-2-amd64
machine: x86_64
processor: 
byteorder: little
LC_ALL: None
LANG: en_US.utf8

pandas: 0.16.2.dev
nose: 1.3.6
Cython: 0.23.2
numpy: 1.9.2
scipy: 0.14.1
statsmodels: 0.6.1
IPython: 2.3.0
sphinx: 1.3.1
patsy: 0.4.0
dateutil: 2.2
pytz: 2012c
bottleneck: None
tables: 3.1.1
numexpr: 2.4.3
matplotlib: 1.4.2
openpyxl: None
xlrd: 0.9.4
xlwt: 0.7.5
xlsxwriter: None
lxml: None
bs4: 4.4.0
html5lib: 0.999
httplib2: 0.9.1
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
janmedlock added a commit to janmedlock/pandas that referenced this issue Oct 9, 2015
MultiIndex had several places where the output class was hard-coded to
MultiIndex rather than cls, self.__class__, or the like.  These have
been replaced.
@janmedlock
Author

I just submitted pull request #11268 to fix this.

@jreback
Contributor

jreback commented Oct 9, 2015

can you provide a use-case for this (not objecting to the change though)

@jreback jreback added Indexing Related to indexing on series/frames, not to indexes themselves MultiIndex Compat pandas objects compatability with Numpy or Python functions Difficulty Intermediate labels Oct 9, 2015
@janmedlock
Author

Sure. I don't want to need to know the order of the indices in the MultiIndex. E.g.

mi = MultiIndex((('a', 'b'), ('A', 'B')), ((0, 0, 1, 1), (0, 1, 0, 1)), names=('foo', 'bar'))
df = Series(..., index = mi)

I want to be able to write something like

df[mi(foo = 'a', bar = 'B')]

or similar to get the correct Series elements.

Maybe this functionality already exists in MultiIndex, but I didn't see it and decided to subclass.

@jreback
Contributor

jreback commented Oct 9, 2015

we've had discussions here

mi = pd.MultiIndex.from_product([list('ab'),list('AB')],names=['foo','bar'])

In [5]: s = Series(range(4),index=mi)

In [6]: s
Out[6]: 
foo  bar
a    A      0
     B      1
b    A      2
     B      3
dtype: int64

to allow:

s.loc[{'foo' : 'a', 'bar' : 'B'}]

xray calls this .sel, see here

or maybe
s.loc[S(foo='a',bar='B')] where S is simply

class S(object):
    def __call__(self, **kwargs):
         return kwargs

if we allow the dict syntax

you wouldn't normally want to have to do:

s[s.index(foo='a', bar='B')], although not terrible

so, you don't need to sub-class at all, simply patch it in

In [8]: MultiIndex.__call__ = lambda self, **kwargs: tuple([ (kwargs.get(n,slice(None))) for n in self.names ])

In [9]: s.loc[s.index(foo='b',bar='B')]
Out[9]: 3

In [10]: s.loc[s.index(foo='b')]
Out[10]: 
foo  bar
b    A      2
     B      3
dtype: int64

In [11]: s.loc[s.index(bar='B')]
Out[11]: 
foo
a    1
b    3
dtype: int64

In [12]: s.loc[s.index(bar=['B'])]
Out[12]: 
foo  bar
a    B      1
b    B      3
dtype: int64

kind of like that actually :)
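
The patched-in `__call__` above boils down to one step: turn keyword arguments into a tuple ordered by level name, filling unspecified levels with `slice(None)`. A rough sketch of that step as a standalone function, without monkeypatching, might look like the following. The name `make_selector` and its unknown-name check are assumptions added for illustration and are not part of pandas.

```python
def make_selector(names, **kwargs):
    """Build a .loc-style tuple from level names (hypothetical helper).

    Levels not mentioned in kwargs are filled with slice(None),
    which selects everything on that level.
    """
    unknown = set(kwargs) - set(names)
    if unknown:
        # Assumption: fail loudly on misspelled level names rather than
        # silently ignoring them, as the one-line lambda would.
        raise KeyError("unknown level name(s): %s" % sorted(unknown))
    return tuple(kwargs.get(name, slice(None)) for name in names)


selector = make_selector(['foo', 'bar'], bar='B')
print(selector)  # (slice(None, None, None), 'B')
```

With the series from the session above, `s.loc[make_selector(s.index.names, foo='b', bar='B')]` would pass the same tuple to `.loc` as `s.index(foo='b', bar='B')` does in `In [9]`.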

cc @shoyer

@janmedlock
Author

Thanks for the answer!

@jrderuiter

So this never got changed? Was there any special reason for not pursuing this further?

@jreback
Contributor

jreback commented Sep 23, 2017

the PR went stale. in principle it is ok if you'd like to pick it up; needs more testing

@jreback jreback added this to the Next Major Release milestone Sep 23, 2017
@toobaz toobaz added Index Related to the Index class or subclasses and removed Indexing Related to indexing on series/frames, not to indexes themselves labels Jun 28, 2019
@mroeschke mroeschke added Enhancement and removed Compat pandas objects compatability with Numpy or Python functions labels Apr 10, 2020
6 participants