Skip to content

BUG: Allow MultiIndex to be subclassed #11267 #11268

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

janmedlock
Copy link

closes #11267
MultiIndex had several places where the output class was hard-coded to
MultiIndex rather than cls, self.class, or the like. These have
been replaced.

MultiIndex had several places where the output class was hard-coded to
MultiIndex rather than cls, self.__class__, or the like.  These have
been replaced.
@jreback
Copy link
Contributor

jreback commented Oct 9, 2015

  • needs some tests
  • test/fix ensure_index (though not sure you can make this more general)

@jreback jreback added Indexing Related to indexing on series/frames, not to indexes themselves MultiIndex Compat pandas objects compatability with Numpy or Python functions labels Oct 9, 2015
@max-sixty
Copy link
Contributor

Should this use ._constructor rather than .__class__? I had thought that was the pandas standard

@jreback
Copy link
Contributor

jreback commented Oct 9, 2015

@MaximilianR yes _constructor is defined in the super class (and is the correct method here)

@janmedlock
Copy link
Author

I added a simple test of MultiIndex subclassing.

As for _constructor, it doesn't look so simple. MultiIndex._constructor is currently set to MultiIndex.from_tuple() and many of the places where I replaced MultiIndex with self.__class__ call the base class MultiIndex() and other methods like MultiIndex.from_array().

I have another branch with MultiIndex._constructor redefined to MultiIndex itself, and then all of my self.__class__ uses replaced with self._constructor. It passes all the tests in test_index.py, but I haven't tested it further. I can push this if you're interested.

@janmedlock
Copy link
Author

I don't understand the comment about ensure_index. There's _ensure_index in index.py, but it's a function, not a method of MultiIndex, so there's nothing obvious to do, and I'm unsure of exactly what that function is supposed to do.

@jreback
Copy link
Contributor

jreback commented Oct 9, 2015

If you actually were to sub-class MultiIndex, the second you do ops on it, it will likely get wiped out and replaced with a base MultiIndex. This is all completely fixed for Index, so it should be straightforward, but _ensure_index is called a lot, and should return the same type.

Simply sub-class testing is not nearly enough. You have to use the sub-classed index in operations and assert that they are propogated properly.

@max-sixty
Copy link
Contributor

@janmedlock I'm not the authority here, but to try and answer your Q on _constructor - I think the pandas standard is to use ._constructor(...) rather than .__class__(...). So all the references to MultiIndex should be replaced with _constructor, and then children of MultiIndex would implement that method, probably with a reference to their class.

Although I'm not sure why pandas uses that standard rather than the direct reference to the class.

@janmedlock
Copy link
Author

@jreback The biggest issue was MultiIndex.__new__() calls object.__new__(MultiIndex) rather than object.__new__(cls) so you can't even create an instance of a subclass.

I changed all the MultiIndex calls in MultiIndex.append() and the like, so those should work, depending on whether the method of an instance of the subclass or of the parent class is called. I agree my simple test does nothing to check this.

@janmedlock
Copy link
Author

@MaximilianR Yes, I understood that, but MultiIndex._constructor is currently set to MultiIndex.from_tuples(), so that e.g. in a place inside the MultiIndex class definition where MultiIndex() is called I cannot replace it with self._constructor(): I'm stuck with self.__class__(). If that's an intended place to use _constructor, it MultiIndex._constructor needs to be redefined.

@max-sixty
Copy link
Contributor

@janmedlock Ah, good point.
I'll leave it to @jreback to give a verdict on the overall principle, then

class MyMultiIndex(MultiIndex):
pass
mi = MyMultiIndex([['a'], ['b']], [[0], [0]])
self.assertTrue(isinstance(mi, MyMultiIndex))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this needs a fair bit of testing, e.g. run thru the some typical operations of a MI, just with a sub-class one.

@jreback
Copy link
Contributor

jreback commented Nov 18, 2015

closing, but if you'd like to update, pls reopen

@jreback jreback closed this Nov 18, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Compat pandas objects compatability with Numpy or Python functions Indexing Related to indexing on series/frames, not to indexes themselves MultiIndex
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Cannot subclass MultiIndex
3 participants