Melt enhance #17677
Changes from all commits
ce2499a
b4f3a30
d570a71
755c3db
68e55d9
614fc01
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -8,12 +8,50 @@ deprecations, new features, enhancements, and performance improvements along | |
with a large number of bug fixes. We recommend that all users upgrade to this | ||
version. | ||
|
||
Highlights include: | ||
|
||
- The :meth:`DataFrame.melt` method and top-level :func:`melt` function can now simultaneously unpivot independent groups of columns, see :ref:`here <whatsnew_0220.enhancements.melt>`. | ||
|
||
.. contents:: What's new in v0.22.0 | ||
:local: | ||
:backlinks: none | ||
:depth: 2 | ||
|
||
.. _whatsnew_0220.enhancements: | ||
|
||
New features | ||
~~~~~~~~~~~~ | ||
|
||
- | ||
.. _whatsnew_0220.enhancements.melt: | ||
|
||
Simultaneous unpivoting of independent groups of columns with ``melt`` | ||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | ||
Previously, ``melt`` was only able to unpivot a single group of columns. This was done by passing all the column names in the group as a list to the ``value_vars`` parameter. | ||
Review comment: add the issue number(s) |
||
|
||
In the following DataFrame, there are two groups, fruits (``Mango``, ``Orange``, ``Watermelon``) and drinks (``Gin``, ``Vodka``) that can each be unpivoted into their own column. Previously, ``melt`` could only unpivot a single column grouping: | ||
|
||
.. ipython:: python | ||
|
||
df = pd.DataFrame({'State': ['Texas', 'Florida', 'Alabama'], | ||
'Mango':[4, 10, 90], | ||
'Orange': [10, 8, 14], | ||
'Watermelon':[40, 99, 43], | ||
'Gin':[16, 200, 34], | ||
'Vodka':[20, 33, 18]}, | ||
columns=['State', 'Mango', 'Orange', | ||
'Watermelon', 'Gin', 'Vodka']) | ||
|
||
df.melt(id_vars='State', value_vars=['Mango', 'Orange', 'Watermelon'], | ||
var_name='Fruit', value_name='Pounds') | ||
|
||
Now, ``melt`` can unpivot any number of column groups by passing a list of lists to the ``value_vars`` parameter. The resulting unpivoted columns can be named by passing a list to ``var_name``. The corresponding values of each group may also be named by passing a list to ``value_name``. Notice that the column groups need not be equal in length: | ||
|
||
.. ipython:: python | ||
|
||
df.melt(id_vars='State', | ||
value_vars=[['Mango', 'Orange', 'Watermelon'], ['Gin', 'Vodka']], | ||
var_name=['Fruit', 'Drink'], | ||
value_name=['Pounds', 'Ounces']) | ||
- | ||
- | ||
|
||
|
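For reference, the two groups in the whatsnew example can already be unpivoted one at a time with the existing single-group ``melt``. The sketch below uses only the released ``DataFrame.melt`` signature (no list-of-lists ``value_vars``); the names ``fruits`` and ``drinks`` are illustrative and do not come from the PR. It shows that the two results have different lengths, which is why the note about unequal group lengths matters.

```python
import pandas as pd

df = pd.DataFrame({'State': ['Texas', 'Florida', 'Alabama'],
                   'Mango': [4, 10, 90],
                   'Orange': [10, 8, 14],
                   'Watermelon': [40, 99, 43],
                   'Gin': [16, 200, 34],
                   'Vodka': [20, 33, 18]})

# One melt call per column group, using the single-group API only.
fruits = df.melt(id_vars='State',
                 value_vars=['Mango', 'Orange', 'Watermelon'],
                 var_name='Fruit', value_name='Pounds')
drinks = df.melt(id_vars='State',
                 value_vars=['Gin', 'Vodka'],
                 var_name='Drink', value_name='Ounces')

print(fruits.shape)  # 3 states x 3 fruit columns -> (9, 3)
print(drinks.shape)  # 3 states x 2 drink columns -> (6, 3)
```

Because the fruit result has nine rows and the drink result has six, combining them into one frame requires a decision about how to pad or align the shorter group.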
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -4587,10 +4587,10 @@ def unstack(self, level=-1, fill_value=None): | |
leaving identifier variables set. | ||
|
||
This function is useful to massage a DataFrame into a format where one | ||
- or more columns are identifier variables (`id_vars`), while all other | ||
- columns, considered measured variables (`value_vars`), are "unpivoted" to | ||
- the row axis, leaving just two non-identifier columns, 'variable' and | ||
- 'value'. | ||
+ or more columns are identifier variables (`id_vars`), while other groups of | ||
+ columns, considered measured variables (`value_vars`), are "unpivoted" so | ||
+ that each group consists of two new columns, a 'variable', labeled by | ||
+ `var_name`, and its corresponding 'value', labeled by `value_name`. | ||
|
||
%(versionadded)s | ||
Parameters | ||
|
@@ -4599,13 +4599,14 @@ def unstack(self, level=-1, fill_value=None): | |
id_vars : tuple, list, or ndarray, optional | ||
Column(s) to use as identifier variables. | ||
value_vars : tuple, list, or ndarray, optional | ||
- Column(s) to unpivot. If not specified, uses all columns that | ||
- are not set as `id_vars`. | ||
- var_name : scalar | ||
- Name to use for the 'variable' column. If None it uses | ||
+ Column(s) to unpivot. If list of lists, simultaneously unpivot | ||
+ each sublist into its own variable column. If not specified, uses all | ||
Review comment: add a mention of where this changes (meaning in 0.22.0), as appropriate, in the docstring |
||
+ columns that are not set as `id_vars`. | ||
+ var_name : scalar or list | ||
+ Name(s) to use for the 'variable' column(s). If None it uses | ||
``frame.columns.name`` or 'variable'. | ||
- value_name : scalar, default 'value' | ||
- Name to use for the 'value' column. | ||
+ value_name : scalar or list, default 'value' | ||
+ Name(s) to use for the 'value' column(s). | ||
col_level : int or string, optional | ||
If columns are a MultiIndex then use this level to melt. | ||
|
||
|
@@ -4673,6 +4674,31 @@ def unstack(self, level=-1, fill_value=None): | |
1 b B E 3 | ||
2 c B E 5 | ||
|
||
.. versionadded:: 0.22.0 | ||
|
||
Simultaneously melt multiple groups of columns: | ||
|
||
>>> df2 = pd.DataFrame({'City': ['Houston', 'Miami'], | ||
Review comment: simple examples first
Reply: This is as simple as it gets: one id column, two column groups, each of length two, and only two rows of data. |
||
'Mango':[4, 10], | ||
'Orange': [10, 8], | ||
'Gin':[16, 200], | ||
'Vodka':[20, 33]}, | ||
columns=['City','Mango', 'Orange', 'Gin', 'Vodka']) | ||
>>> df2 | ||
City Mango Orange Gin Vodka | ||
0 Houston 4 10 16 20 | ||
1 Miami 10 8 200 33 | ||
|
||
>>> %(caller)sid_vars='City', | ||
value_vars=[['Mango', 'Orange'], ['Gin', 'Vodka']], | ||
var_name=['Fruit', 'Drink'], | ||
value_name=['Pounds', 'Ounces']) | ||
City Fruit Pounds Drink Ounces | ||
0 Houston Mango 4 Gin 16 | ||
1 Miami Mango 10 Gin 200 | ||
2 Houston Orange 10 Vodka 20 | ||
3 Miami Orange 8 Vodka 33 | ||
|
||
""") | ||
|
||
@Appender(_shared_docs['melt'] % | ||
|
Review comment: generally like to keep docs to 80 chars or less (readability)
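For comparison, the docstring example with two equal-length groups can be reproduced with the existing single-group ``melt`` by melting each group separately and joining the pieces column-wise. This is only a sketch under stated assumptions: ``melt_groups`` is a hypothetical helper, not the PR's implementation, and it assumes every group has the same number of columns so the melted pieces line up row for row.

```python
import pandas as pd

df2 = pd.DataFrame({'City': ['Houston', 'Miami'],
                    'Mango': [4, 10],
                    'Orange': [10, 8],
                    'Gin': [16, 200],
                    'Vodka': [20, 33]})


def melt_groups(frame, id_vars, groups, var_names, value_names):
    """Hypothetical helper: melt each group, then join side by side.

    Assumes all groups have the same number of columns.
    """
    pieces = []
    for i, cols in enumerate(groups):
        part = frame.melt(id_vars=id_vars, value_vars=cols,
                          var_name=var_names[i],
                          value_name=value_names[i])
        if i > 0:
            # Keep the identifier column from the first piece only.
            part = part.drop(columns=id_vars)
        pieces.append(part)
    # Each piece has the same default integer index, so a column-wise
    # concat aligns the rows of every group.
    return pd.concat(pieces, axis=1)


result = melt_groups(df2, id_vars='City',
                     groups=[['Mango', 'Orange'], ['Gin', 'Vodka']],
                     var_names=['Fruit', 'Drink'],
                     value_names=['Pounds', 'Ounces'])
print(result)
```

With equal-length groups this yields the same five columns as the docstring output (``City``, ``Fruit``, ``Pounds``, ``Drink``, ``Ounces``). Groups of unequal length would need an explicit padding rule, which this sketch does not attempt.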