Melt enhance #17677

Closed
wants to merge 6 commits into from
57 changes: 46 additions & 11 deletions doc/source/reshaping.rst
@@ -265,24 +265,59 @@ the right thing:
Reshaping by Melt
-----------------

The top-level :func:`melt` and :func:`~DataFrame.melt` functions are useful to
The top-level :func:`melt` function and the equivalent :meth:`DataFrame.melt`
method are useful to
massage a DataFrame into a format where one or more columns are identifier variables,
while all other columns, considered measured variables, are "unpivoted" to the
row axis, leaving just two non-identifier columns, "variable" and "value". The
names of those columns can be customized by supplying the ``var_name`` and
row axis, leaving just two non-identifier columns, "variable" and "value".

For instance, it is possible to unpivot the fruit columns (``Mango``, ``Orange``, and ``Watermelon``) into a single column
with their corresponding values in another.

.. ipython:: python

   df = pd.DataFrame({'State': ['Texas', 'Florida', 'Alabama'],
                      'Mango': [4, 10, 90],
                      'Orange': [10, 8, 14],
                      'Watermelon': [40, 99, 43]},
Contributor

generally like to keep docs to 80 chars or less (readability)

                     columns=['State', 'Mango', 'Orange', 'Watermelon'])

   df

   df.melt(id_vars='State', value_vars=['Mango', 'Orange', 'Watermelon'])

The resulting names of the unpivoted columns can be customized by supplying strings to the ``var_name`` and
``value_name`` parameters.

For instance:

.. ipython:: python

   df.melt(id_vars='State', value_vars=['Mango', 'Orange', 'Watermelon'],
           var_name='Fruit', value_name='Pounds')

.. versionadded:: 0.22.0

Passing a list of lists to ``value_vars`` allows you to simultaneously melt
independent column groups. The following DataFrame contains an additional
column grouping of drinks (``Gin`` and ``Vodka``) that may be unpivoted along
with the fruit columns. The groups need not be the same size. Additionally,
the ``var_name`` and ``value_name`` parameters may each be passed a list of
strings to name the returned variable and value columns.

.. ipython:: python

cheese = pd.DataFrame({'first' : ['John', 'Mary'],
'last' : ['Doe', 'Bo'],
'height' : [5.5, 6.0],
'weight' : [130, 150]})
cheese
cheese.melt(id_vars=['first', 'last'])
cheese.melt(id_vars=['first', 'last'], var_name='quantity')
   df = pd.DataFrame({'State': ['Texas', 'Florida', 'Alabama'],
                      'Mango': [4, 10, 90],
                      'Orange': [10, 8, 14],
                      'Watermelon': [40, 99, 43],
                      'Gin': [16, 200, 34],
                      'Vodka': [20, 33, 18]},
                     columns=['State', 'Mango', 'Orange', 'Watermelon',
                              'Gin', 'Vodka'])

   df

   df.melt(id_vars='State',
           value_vars=[['Mango', 'Orange', 'Watermelon'], ['Gin', 'Vodka']],
           var_name=['Fruit', 'Drink'],
           value_name=['Pounds', 'Ounces'])

Another way to transform is to use the ``wide_to_long`` panel data convenience
function.
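
The single-group call shown in this hunk relies only on parameters that released pandas already provides (``id_vars``, ``value_vars``, ``var_name``, ``value_name``). As a minimal standalone sketch, assuming nothing beyond a standard pandas install, it can be run outside the docs build like this; the variable name ``long_df`` is illustrative only:

```python
import pandas as pd

df = pd.DataFrame({'State': ['Texas', 'Florida', 'Alabama'],
                   'Mango': [4, 10, 90],
                   'Orange': [10, 8, 14],
                   'Watermelon': [40, 99, 43]},
                  columns=['State', 'Mango', 'Orange', 'Watermelon'])

# Unpivot the three fruit columns into a 'Fruit'/'Pounds' pair,
# keeping 'State' as the identifier column.
long_df = df.melt(id_vars='State',
                  value_vars=['Mango', 'Orange', 'Watermelon'],
                  var_name='Fruit', value_name='Pounds')

print(long_df)
```

Each of the three states appears once per fruit, so the result has nine rows and the three columns ``State``, ``Fruit``, and ``Pounds``.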
40 changes: 39 additions & 1 deletion doc/source/whatsnew/v0.22.0.txt
@@ -8,12 +8,50 @@ deprecations, new features, enhancements, and performance improvements along
with a large number of bug fixes. We recommend that all users upgrade to this
version.

Highlights include:

- The :meth:`DataFrame.melt` method and top-level :func:`melt` function can now simultaneously unpivot independent groups of columns. See :ref:`here <whatsnew_0220.enhancements.melt>`.

.. contents:: What's new in v0.22.0
:local:
:backlinks: none
:depth: 2

.. _whatsnew_0220.enhancements:

New features
~~~~~~~~~~~~

-
.. _whatsnew_0220.enhancements.melt:

Simultaneous unpivoting of independent groups of columns with ``melt``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Previously, ``melt`` was only able to unpivot a single group of columns. This was done by passing all the column names in the group as a list to the ``value_vars`` parameter.
Contributor

add the issue number(s)


In the following DataFrame, there are two groups, fruits (``Mango``, ``Orange``, ``Watermelon``) and drinks (``Gin``, ``Vodka``) that can each be unpivoted into their own column. Previously, ``melt`` could only unpivot a single column grouping:

.. ipython:: python

   df = pd.DataFrame({'State': ['Texas', 'Florida', 'Alabama'],
                      'Mango': [4, 10, 90],
                      'Orange': [10, 8, 14],
                      'Watermelon': [40, 99, 43],
                      'Gin': [16, 200, 34],
                      'Vodka': [20, 33, 18]},
                     columns=['State', 'Mango', 'Orange',
                              'Watermelon', 'Gin', 'Vodka'])

   df.melt(id_vars='State', value_vars=['Mango', 'Orange', 'Watermelon'],
           var_name='Fruit', value_name='Pounds')

Now, ``melt`` can unpivot any number of column groups by passing a list of lists to the ``value_vars`` parameter. The resulting unpivoted columns can be named by passing a list to ``var_name``. The corresponding values of each group may also be named by passing a list to ``value_name``. Notice that the column groups need not be equal in length:

.. ipython:: python

   df.melt(id_vars='State',
           value_vars=[['Mango', 'Orange', 'Watermelon'], ['Gin', 'Vodka']],
           var_name=['Fruit', 'Drink'],
           value_name=['Pounds', 'Ounces'])
-
-
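
The list-of-lists form of ``value_vars`` is what this pull request proposes, and the page marks the pull request as Closed, so an installed pandas may not accept it. As a rough sketch of the same idea using only calls that exist in released pandas (``DataFrame.melt``, ``DataFrame.drop``, ``pd.concat``), each group can be melted on its own and the pieces placed side by side. The helper name ``melt_groups`` is hypothetical: it is not part of pandas and is not the implementation from this pull request.

```python
import pandas as pd


def melt_groups(frame, id_vars, groups, var_names, value_names):
    """Hypothetical helper: melt each column group and join the results.

    Assumes the first group is at least as long as the others, so the
    identifier columns taken from it cover every output row.
    """
    id_vars = [id_vars] if isinstance(id_vars, str) else list(id_vars)
    pieces = []
    for i, (cols, var, val) in enumerate(zip(groups, var_names, value_names)):
        melted = frame.melt(id_vars=id_vars, value_vars=cols,
                            var_name=var, value_name=val)
        # Keep the identifier columns only from the first group.
        pieces.append(melted if i == 0 else melted.drop(columns=id_vars))
    # Align the melted groups row by row on their default integer index.
    return pd.concat(pieces, axis=1)


df2 = pd.DataFrame({'City': ['Houston', 'Miami'],
                    'Mango': [4, 10],
                    'Orange': [10, 8],
                    'Gin': [16, 200],
                    'Vodka': [20, 33]},
                   columns=['City', 'Mango', 'Orange', 'Gin', 'Vodka'])

result = melt_groups(df2, 'City',
                     groups=[['Mango', 'Orange'], ['Gin', 'Vodka']],
                     var_names=['Fruit', 'Drink'],
                     value_names=['Pounds', 'Ounces'])
print(result)
```

For the two equal-sized groups above, this yields the columns ``City``, ``Fruit``, ``Pounds``, ``Drink``, and ``Ounces`` in the same row order as the docstring example in this pull request. With groups of unequal size, the shorter group's columns are padded with ``NaN`` by ``pd.concat``; whether the proposed ``melt`` behaves the same way is not stated on this page.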

46 changes: 36 additions & 10 deletions pandas/core/frame.py
@@ -4587,10 +4587,10 @@ def unstack(self, level=-1, fill_value=None):
leaving identifier variables set.

This function is useful to massage a DataFrame into a format where one
or more columns are identifier variables (`id_vars`), while all other
columns, considered measured variables (`value_vars`), are "unpivoted" to
the row axis, leaving just two non-identifier columns, 'variable' and
'value'.
or more columns are identifier variables (`id_vars`), while other groups of
columns, considered measured variables (`value_vars`), are "unpivoted" so
that each group consists of two new columns, a 'variable', labeled by
`var_name`, and its corresponding 'value', labeled by `value_name`.

%(versionadded)s
Parameters
@@ -4599,13 +4599,14 @@ def unstack(self, level=-1, fill_value=None):
id_vars : tuple, list, or ndarray, optional
Column(s) to use as identifier variables.
value_vars : tuple, list, or ndarray, optional
Column(s) to unpivot. If not specified, uses all columns that
are not set as `id_vars`.
var_name : scalar
Name to use for the 'variable' column. If None it uses
Column(s) to unpivot. If list of lists, simultaneously unpivot
each sublist into its own variable column. If not specified, uses all
Contributor

add a mention where this changes as appropriate (meaning in 0.22.0) in the doc-string

columns that are not set as `id_vars`.
var_name : scalar or list
Name(s) to use for the 'variable' column(s). If None it uses
``frame.columns.name`` or 'variable'.
value_name : scalar, default 'value'
Name to use for the 'value' column.
value_name : scalar or list, default 'value'
Name(s) to use for the 'value' column(s).
col_level : int or string, optional
If columns are a MultiIndex then use this level to melt.

@@ -4673,6 +4674,31 @@ def unstack(self, level=-1, fill_value=None):
1 b B E 3
2 c B E 5

.. versionadded:: 0.22.0

Simultaneously melt multiple groups of columns:

>>> df2 = pd.DataFrame({'City': ['Houston', 'Miami'],
Contributor

simple examples first

Contributor Author

This is as simple as it gets: one id column, two column groups of length two each, and only two rows of data.

...                     'Mango': [4, 10],
...                     'Orange': [10, 8],
...                     'Gin': [16, 200],
...                     'Vodka': [20, 33]},
...                    columns=['City', 'Mango', 'Orange', 'Gin', 'Vodka'])
>>> df2
      City  Mango  Orange  Gin  Vodka
0  Houston      4      10   16     20
1    Miami     10       8  200     33

>>> %(caller)sid_vars='City',
...     value_vars=[['Mango', 'Orange'], ['Gin', 'Vodka']],
...     var_name=['Fruit', 'Drink'],
...     value_name=['Pounds', 'Ounces'])
      City   Fruit  Pounds  Drink  Ounces
0  Houston   Mango       4    Gin      16
1    Miami   Mango      10    Gin     200
2  Houston  Orange      10  Vodka      20
3    Miami  Orange       8  Vodka      33

""")

@Appender(_shared_docs['melt'] %