-
-
Notifications
You must be signed in to change notification settings - Fork 18.5k
ENH: Add optional argument index to pd.melt to maintain index values #33659
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Does the CI always have this many issues? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @Rik-de-Kort almost there. generally lgtm. just a couple more comments.
pandas/core/reshape/melt.py
Outdated
result = frame._constructor(mdata, columns=mcolumns) | ||
|
||
if not ignore_index: | ||
new_index = np.tile(frame.index, K) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
both MI and Index already have a .repeat() method, I think we could add a .tile() method to make this easier. (or just use repeat)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Index(["foo", "bar"]).repeat(2)
yields Index(['foo', 'foo', 'bar', 'bar'], dtype='object')
, where as np.tile(["foo", "bar"])
yields array(['foo', 'bar', 'foo', 'bar', 'foo', 'bar'], dtype=object)
. The latter corresponds to the layout used in melt
so it's very not trivial to use repeat instead of tile.
I tried having a look at implementing tile
on indices but then I would also have to do it for multiindices and document it and tests, and argument validation which I've never even looked at before and I think it's a big hassle that I will not undertake.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok fair enough
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we already have _tile_compat in pandas\core\reshape\util.py. This may allow futher simplification here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This converts to object dtype. Can you use pandas.core.reshape.util._tile_compat
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should be able to remove the next section as well, since you won't be converting to an ndarray.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, that simplifies the code a lot! Build is failing but that's due to a worker crashing.
Co-authored-by: Simon Hawkins <simonjayhawkins@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm. a doc comment, pls ping on green.
pandas/core/reshape/melt.py
Outdated
result = frame._constructor(mdata, columns=mcolumns) | ||
|
||
if not ignore_index: | ||
new_index = np.tile(frame.index, K) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok fair enough
@jreback I though I gave you a ping, but I don't see it, so here it is! |
thanks @Rik-de-Kort |
You're welcome |
Finishing up a stale PR idea: #28859 and #17459
Has some tests and better code.
I think it's fair to duplicate the index values and not bend over backwards to maintain uniqueness like in previous iterations.
Apologies for the mess, it was a quick job and I didn't want to spend an hour fiddling with the commits.
Finally, I deleted some ignore type comments for mypy because the commits weren't going on my system. Is there some other fix for that? Other than that I think it's good to go.
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff