ENH: Add optional argument index to pd.melt to maintain index values #33659

Merged
merged 83 commits into from
Jul 9, 2020

Rik-de-Kort
Contributor

@Rik-de-Kort Rik-de-Kort commented Apr 19, 2020

Finishing up a stale PR idea: #28859 and #17459

Has some tests and better code.
I think it's fair to duplicate the index values and not bend over backwards to maintain uniqueness like in previous iterations.

Apologies for the mess, it was a quick job and I didn't want to spend an hour fiddling with the commits.

Finally, I deleted some "type: ignore" comments for mypy because the commits weren't going through on my system. Is there some other fix for that? Other than that, I think it's good to go.
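For readers skimming the thread, the behaviour described above (index values kept and duplicated rather than made unique) can be sketched as follows. This is only an illustration: it assumes the keyword is spelled ignore_index, as in the diff excerpt quoted in the review, and the frame and its labels are made up.

```python
import pandas as pd

# Hypothetical two-row frame with a meaningful (non-default) index.
df = pd.DataFrame({"A": [1, 2], "B": [3, 4]}, index=["x", "y"])

# Default behaviour: the original index is discarded and replaced by 0..n-1.
default = pd.melt(df)

# With ignore_index=False the original labels are kept, so each label
# appears once per melted column (duplicates are expected).
melted = pd.melt(df, ignore_index=False)

print(default.index.tolist())
print(melted.index.tolist())
```

Because the labels repeat, the resulting index is not unique; callers that need uniqueness would have to reset or extend it themselves.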

@Rik-de-Kort Rik-de-Kort reopened this May 24, 2020
@Rik-de-Kort
Contributor Author

Does the CI always have this many issues?

Member

@simonjayhawkins simonjayhawkins left a comment

Thanks @Rik-de-Kort almost there. generally lgtm. just a couple more comments.

result = frame._constructor(mdata, columns=mcolumns)

if not ignore_index:
    new_index = np.tile(frame.index, K)
Contributor

both MI and Index already have a .repeat() method, I think we could add a .tile() method to make this easier. (or just use repeat)

Contributor Author

Index(["foo", "bar"]).repeat(2) yields Index(['foo', 'foo', 'bar', 'bar'], dtype='object'), whereas np.tile(Index(["foo", "bar"]), 3) yields array(['foo', 'bar', 'foo', 'bar', 'foo', 'bar'], dtype=object). The latter corresponds to the layout used in melt, so it's not at all trivial to use repeat instead of tile.

I had a look at implementing tile on indices, but then I would also have to implement it for MultiIndex, document it, write tests, and add argument validation, which I've never even looked at before. I think it's a big hassle that I will not undertake.
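The difference between the two orderings can be checked with a short sketch. The index labels and the three-column frame are made up for illustration, and the ignore_index keyword is assumed from the diff excerpt above.

```python
import numpy as np
import pandas as pd

idx = pd.Index(["foo", "bar"])

# repeat works element by element: each label is emitted K times in a row.
repeated = idx.repeat(3)

# tile works on the whole sequence: the full index is emitted K times over.
tiled = np.tile(idx, 3)

# melt stacks the value columns one after another, so the row labels cycle
# through the original index once per column, which is the tiled order.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]}, index=idx)
melted = pd.melt(df, ignore_index=False)

print(repeated.tolist())
print(tiled.tolist())
print(melted.index.tolist())
```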

Contributor

ok fair enough

Member

We already have _tile_compat in pandas\core\reshape\util.py. This may allow further simplification here.

Contributor

@TomAugspurger TomAugspurger Jul 7, 2020

This converts to object dtype. Can you use pandas.core.reshape.util._tile_compat?

Contributor

You should be able to remove the next section as well, since you won't be converting to an ndarray.
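The dtype concern can be reproduced with a small sketch. It does not call the private _tile_compat helper; instead it tiles integer positions and uses the public Index.take, which is one way to keep the index type and is an assumption about the general approach, not a statement of what the helper does internally. The timezone-aware index and the value of K are made up.

```python
import numpy as np
import pandas as pd

# A timezone-aware DatetimeIndex has no lossless plain-NumPy representation.
idx = pd.date_range("2020-01-01", periods=2, tz="UTC")
K = 2  # hypothetical number of melted value columns

# np.tile first converts the index to an ndarray, which falls back to
# object dtype (an array of Timestamp objects).
as_array = np.tile(idx, K)

# Tiling the integer positions and taking from the index keeps the dtype.
taker = np.tile(np.arange(len(idx)), K)
tiled_index = idx.take(taker)

print(as_array.dtype)
print(tiled_index.dtype)
```

Since the result of take is already an Index of the right type, no conversion back from an ndarray is needed afterwards.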

Contributor Author

@Rik-de-Kort Rik-de-Kort Jul 7, 2020

Thanks, that simplifies the code a lot! Build is failing but that's due to a worker crashing.

Rik-de-Kort and others added 3 commits June 25, 2020 17:44
Co-authored-by: Simon Hawkins <simonjayhawkins@gmail.com>
Contributor

@jreback jreback left a comment

lgtm. a doc comment, pls ping on green.

@jreback jreback added this to the 1.1 milestone Jun 25, 2020
@Rik-de-Kort
Contributor Author

@jreback I thought I gave you a ping, but I don't see it, so here it is!

@jreback jreback merged commit c8d85e2 into pandas-dev:master Jul 9, 2020
@jreback
Contributor

jreback commented Jul 9, 2020

thanks @Rik-de-Kort

@Rik-de-Kort
Contributor Author

You're welcome

Labels: API Design, Enhancement, Reshaping (Concat, Merge/Join, Stack/Unstack, Explode)
Development

Successfully merging this pull request may close these issues.

Index gets lost when DataFrame melt method is used
5 participants