BUG: coerce pd.wide_to_long suffixes to ints #17628

tdpetrou · 2017-09-22T13:54:31Z

closes wide_to_long does not convert integer suffixes to int #17627
I had to change nearly all the tests which had the suffixes as strings to integers. I also added a few other tests including one for string suffixes
passes git diff upstream/master -u -- "*.py" | flake8 --diff

I also cleaned up the finding of the var_names and substituted in some list comprehensions.

jreback · 2017-09-22T15:12:34Z

pandas/tests/reshape/test_reshape.py

@@ -991,3 +991,54 @@ def test_non_unique_idvars(self):
        })
        with pytest.raises(ValueError):
            wide_to_long(df, ['A_A', 'B_B'], i='x', j='colname')
+
+    def test_cast_j_int(self):


can you add an example where the cast fails? e.g. columns are ['A_1', 'A_foo']...
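A minimal sketch of the case being asked for, assuming a pandas version that already includes this change. The frame and column names are made up for illustration. With one suffix that looks numeric and one that does not, the whole suffix level should be left as strings.

```python
import pandas as pd

# Hypothetical frame: one numeric-looking suffix and one that is not.
df = pd.DataFrame({'id': [0, 1], 'A_1': [1.0, 2.0], 'A_foo': [3.0, 4.0]})

# suffix='.+' is needed so that 'foo' is picked up at all; the default
# r'\d+' would only match 'A_1'.
result = pd.wide_to_long(df, stubnames='A', i='id', j='j', sep='_', suffix='.+')

# The suffix level cannot be converted as a whole, so it stays as strings.
suffixes = set(result.index.get_level_values('j'))
print(suffixes)
```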

jreback · 2017-09-22T15:13:21Z

pandas/core/reshape/reshape.py


    def melt_stub(df, stub, i, j, value_vars, sep):
        newdf = melt(df, id_vars=i, value_vars=value_vars,
                     value_name=stub.rstrip(sep), var_name=j)
        newdf[j] = Categorical(newdf[j])
        newdf[j] = newdf[j].str.replace(re.escape(stub + sep), "")
+        newdf[j] = newdf[j].astype('int', errors='ignore')


cast to int64 (int is platform specific)
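For context, a small sketch of why spelling out the width matters. It only inspects NumPy dtypes. Which width the builtin `int` resolves to depends on the platform and NumPy version, so no output is shown.

```python
import numpy as np

# 'int64' always means an 8-byte integer.
fixed = np.dtype('int64')

# The builtin int maps to a platform integer; its width is chosen by NumPy
# for the platform it runs on.
platform = np.dtype(int)

print(fixed.itemsize, platform.itemsize)
```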

jreback

add a whatsnew note (in api breaking)

jreback · 2017-09-22T15:14:34Z

pandas/core/reshape/reshape.py

@@ -852,7 +852,7 @@ def lreshape(data, groups, dropna=True, label=None):


 def wide_to_long(df, stubnames, i, j, sep="", suffix='\d+'):
-    r"""
+    """
    Wide panel to long format. Less flexible but more user-friendly than melt.



need a note that the casting will occur (with a versionadded tag)

I added a one line explanation, followed by the versionadded tag. Not sure if that's correct

jreback · 2017-09-22T15:15:13Z

pandas/tests/reshape/test_reshape.py

@@ -991,3 +991,54 @@ def test_non_unique_idvars(self):
        })
        with pytest.raises(ValueError):
            wide_to_long(df, ['A_A', 'B_B'], i='x', j='colname')
+
+    def test_cast_j_int(self):
+        df = pd.DataFrame({


add the issue as a comment

jreback · 2017-09-22T15:15:51Z

pandas/tests/reshape/test_reshape.py

+                      'Pirates of the Caribbean',
+                      'Spectre',
+                      'Avatar',
+                      'Pirates of the Caribbean',


can you use

result =
expected =
tm.assert_frame_equal(result, expected)

codecov · 2017-09-22T17:20:47Z

Codecov Report

❗ No coverage uploaded for pull request base (master@1355df6).
The diff coverage is 100%.

@@            Coverage Diff            @@
##             master   #17628   +/-   ##
=========================================
  Coverage          ?   91.58%           
=========================================
  Files             ?      153           
  Lines             ?    51275           
  Branches          ?        0           
=========================================
  Hits              ?    46961           
  Misses            ?     4314           
  Partials          ?        0

Flag	Coverage Δ
#multiple	`89.45% <100%> (?)`
#single	`40.68% <14.28%> (?)`

Impacted Files	Coverage Δ
pandas/core/reshape/melt.py	`97.24% <100%> (ø)`

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 1355df6...b9d3e62.

jreback · 2017-09-22T20:18:53Z

pandas/core/reshape/reshape.py


    def melt_stub(df, stub, i, j, value_vars, sep):
        newdf = melt(df, id_vars=i, value_vars=value_vars,
                     value_name=stub.rstrip(sep), var_name=j)
        newdf[j] = Categorical(newdf[j])
        newdf[j] = newdf[j].str.replace(re.escape(stub + sep), "")

+        # GH17627 Cast numerics suffixes to int/float
+        newdf[j] = newdf[j].astype('int64', errors='ignore')


just use to_numeric(..., errors='ignore') and you will get what you need here. This is a soft conversion.

Thanks. Changed.
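A rough sketch of what "soft conversion" means here: convert when every value parses as a number, otherwise hand the input back untouched. The helper name `soft_to_numeric` is hypothetical, and it uses try/except instead of the `errors='ignore'` keyword so that it does not depend on that keyword being available.

```python
import pandas as pd


def soft_to_numeric(values):
    """Return numeric values if every element converts, else the input unchanged."""
    try:
        return pd.to_numeric(values)
    except (ValueError, TypeError):
        # At least one element is not number-like: keep the original values.
        return values


ints = soft_to_numeric(pd.Series(['1', '2']))
floats = soft_to_numeric(pd.Series(['1.5', '2']))
mixed = soft_to_numeric(pd.Series(['1', 'foo']))
print(ints.dtype, floats.dtype, list(mixed))
```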

pep8speaks · 2017-09-22T21:33:36Z

Hello @tdpetrou! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on December 10, 2017 at 16:20 Hours UTC

jreback · 2017-09-22T21:35:46Z

doc/source/whatsnew/v0.21.0.txt

@@ -434,6 +434,7 @@ Other API Changes
 - :class:`Period` is now immutable, and will now raise an ``AttributeError`` when a user tries to assign a new value to the ``ordinal`` or ``freq`` attributes (:issue:`17116`).
 - :func:`to_datetime` when passed a tz-aware ``origin=`` kwarg will now raise a more informative ``ValueError`` rather than a ``TypeError`` (:issue:`16842`)
 - Renamed non-functional ``index`` to ``index_col`` in :func:`read_stata` to improve API consistency (:issue:`16342`)
+- :func:`wide_to_long` previously kept interger-only suffixes as ``object`` dtype. Now they are casted to ``int64`` if possible (:issue:`17627`)


s/interger/integer/. Now they are casted to a numeric dtype if possible.

jreback · 2017-09-22T21:36:13Z

pandas/core/reshape/reshape.py

@@ -1347,4 +1353,4 @@ def make_axis_dummies(frame, axis='minor', transform=None):
    values = np.eye(len(items), dtype=float)
    values = values.take(labels, axis=0)

-    return DataFrame(values, columns=items, index=frame.index)
+    return DataFrame(values, columns=items, index=frame.index)


need a return at the end

jreback · 2017-09-22T21:36:19Z

pandas/core/reshape/reshape.py

@@ -9,11 +9,12 @@

 from pandas.core.dtypes.common import (
    _ensure_platform_int,
-    is_list_like, is_bool_dtype,
+    is_list_like, is_bool_dtype, is_object_dtype,


jreback · 2017-09-22T21:37:11Z

pandas/tests/reshape/test_reshape.py

+                              i='A', j='colname', suffix='.+', sep='_')
+        tm.assert_frame_equal(result, expected)
+
+    def test_float_suffix(self):


try to use parametrize here to avoid adding lots of code

you can also use fixtures if appropriate

Ok, not sure where I can use it here. All DataFrames are different.
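One possible way to parametrize even though the frames differ, sketched under the assumption that only the suffixes and the expected kind of dtype vary between cases. The test name and the tiny frame are made up; this is not the test that ended up in the PR.

```python
import pandas as pd
import pytest


@pytest.mark.parametrize("suffixes, is_numeric", [
    (["1", "2"], True),        # integer-like suffixes
    (["1.5", "2.5"], True),    # float-like suffixes
    (["one", "two"], False),   # plain strings stay as they are
])
def test_suffix_dtype(suffixes, is_numeric):
    # Build the frame from the parameters instead of hard-coding it.
    columns = {"A_" + s: [1, 2] for s in suffixes}
    df = pd.DataFrame(dict(id=[0, 1], **columns))
    result = pd.wide_to_long(df, stubnames="A", i="id", j="j",
                             sep="_", suffix=".+")
    level = result.index.get_level_values("j")
    assert pd.api.types.is_numeric_dtype(level) == is_numeric
```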

jreback · 2017-09-23T14:16:38Z

can you rebase.

@TomAugspurger @jorisvandenbossche comments.

tdpetrou · 2017-09-23T22:35:00Z

edited to add: I did the following and then squashed all the commits into 1 and changed a word in the commit message

First time rebasing. I did...

git fetch upstream
git rebase upstream/master

Resolved conflicts then did

git add <files>
git rebase --continue

It gave some message about forgetting to do git add, which I had just done. Then I did

git rebase --skip
git push origin wide-to-long-int -f

Which did something but I don't know if this is what you wanted.

jreback · 2017-09-24T12:37:00Z

it looks rebased

TomAugspurger

I'm +0 on this. I think it's the preferable outcome, but I'm unsure about breaking API. Overall, it's probably OK to skip a keyword / deprecation cycle.

Perhaps make the API breakage slightly more obvious in the release notes by creating a mini-section with code examples detailing the old and new behavior.
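A sketch of what such a release-note example might show, assuming a pandas version with this change applied. The frame is hypothetical. Per this PR, the suffix level was previously `object` (strings such as '1' and '2'), so integer keys like the ones below would not have matched.

```python
import pandas as pd

# Hypothetical wide frame with integer-like suffixes.
df = pd.DataFrame({'id': [0, 1], 'A_1': [10, 20], 'A_2': [30, 40]})

result = pd.wide_to_long(df, stubnames='A', i='id', j='j', sep='_')

# New behavior: the suffix level is numeric, so integer keys work directly.
level = result.index.get_level_values('j')
value = result.loc[(1, 2), 'A']
print(level.dtype, value)
```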

jreback · 2017-09-28T14:16:00Z

maybe add downcast=True (as the default), consistent with other functions that we have.

tdpetrou · 2017-09-30T14:34:15Z

@jreback Unless there is a different to_numeric, there is no option for downcast=True. And why would you want to downcast it to something other than the default 64 bit types?
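For reference, a short sketch of the `downcast` keyword that `pd.to_numeric` does have: it takes a string such as 'integer' rather than a boolean, and picks the smallest dtype that holds the values. The input list is made up.

```python
import pandas as pd

values = ['1', '2', '3']

# Default: 64-bit result.
default = pd.to_numeric(values)

# downcast takes a string such as 'integer', not True.
small = pd.to_numeric(values, downcast='integer')

print(default.dtype, small.dtype)
```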

jreback · 2017-11-12T19:00:05Z

needs a rebase. this is orthogonal to #17677 ?

tdpetrou · 2017-11-13T17:48:40Z

Yes, this is mutually exclusive. I think pd.wide_to_long can be rewritten too. It's quite slow.

jreback · 2017-11-17T01:17:36Z

ok, rebase and move whatsnew to 0.22

jreback

tiny comments. ping on green.

jreback · 2017-11-19T16:27:30Z

doc/source/whatsnew/v0.21.0.txt

@@ -1172,4 +1172,3 @@ Other
 ^^^^^
 - Bug where some inplace operators were not being wrapped and produced a copy when invoked (:issue:`12962`)
 - Bug in :func:`eval` where the ``inplace`` parameter was being incorrectly handled (:issue:`16732`)


pls revert this file

jreback · 2017-11-19T16:28:08Z

pandas/core/reshape/melt.py

-        regex = "^{stub}{sep}{suffix}".format(
-            stub=re.escape(stub), sep=re.escape(sep), suffix=suffix)
-        return df.filter(regex=regex).columns.tolist()
+        regex = '^{0}{1}{2}$'.format(re.escape(stub), re.escape(sep), suffix)


slightly prefer the names in formatting

tdpetrou · 2017-11-22T14:56:27Z

@jreback I attempted to revert whatsnew 0.21. Not sure if I did it correctly. I changed the names in the format as well.

jreback · 2017-11-23T16:17:36Z

pandas/core/reshape/melt.py

@@ -197,6 +198,10 @@ def wide_to_long(df, stubnames, i, j, sep="", suffix=r'\d+'):

        .. versionadded:: 0.20.0

+        When all suffixes are numeric, they are cast to int64/float64.
+
+        .. versionadded:: 0.21.0


change to 0.22

jreback · 2017-11-23T16:18:02Z

pandas/core/reshape/melt.py

@@ -335,22 +340,25 @@ def wide_to_long(df, stubnames, i, j, sep="", suffix=r'\d+'):
    -----
    All extra variables are left untouched. This simply uses
    `pandas.melt` under the hood, but is hard-coded to "do the right thing"
-    in a typicaly case.
+    in a typical case.
    """
    def get_var_names(df, stub, sep, suffix):
        regex = "^{stub}{sep}{suffix}".format(


use re.compile here

jreback · 2017-11-23T16:18:55Z

pandas/tests/reshape/test_reshape.py

@@ -764,12 +764,12 @@ def test_simple(self):
        exp_data = {"X": x.tolist() + x.tolist(),
                    "A": ['a', 'b', 'c', 'd', 'e', 'f'],


these need to move to test_melt.py as well.

jreback · 2017-11-23T16:22:46Z

this should be after #18428, you can rebase on top (of that one after you have pushed)

tdpetrou · 2017-11-25T14:12:01Z

@jreback changes made. Should I add examples to whatsnew?

jreback · 2017-11-25T14:33:32Z

pandas/core/reshape/melt.py

@@ -199,6 +200,10 @@ def wide_to_long(df, stubnames, i, j, sep="", suffix=r'\d+'):

        .. versionadded:: 0.20.0



I would add an example here (in the doc-string); I think all the other examples are resulting as strings anyhow (could also modify an example to avoid making this longer). you can check docs in reshape.rst to see if anything needs updating.

tdpetrou · 2017-12-06T02:39:19Z

@jreback Since all the examples had integers as suffixes, I added one with strings.

jreback

small changes & you have a lint issue. ping on green.

jreback · 2017-12-06T11:23:00Z

doc/source/whatsnew/v0.22.0.txt

@@ -127,6 +127,7 @@ Other API Changes
 - :func:`pandas.DataFrame.merge` no longer casts a ``float`` column to ``object`` when merging on ``int`` and ``float`` columns (:issue:`16572`)
 - The default NA value for :class:`UInt64Index` has changed from 0 to ``NaN``, which impacts methods that mask with NA, such as ``UInt64Index.where()`` (:issue:`18398`)
 - Building pandas for development now requires ``cython >= 0.24`` (:issue:`18613`)
+- :func:`wide_to_long` previously kept suffixes as ``object`` dtype. Now they are cast to numeric if possible (:issue:`17627`)


same numeric-like suffixes

jreback · 2017-12-06T11:23:53Z

pandas/core/reshape/melt.py

    """
    def get_var_names(df, stub, sep, suffix):
-        regex = "^{stub}{sep}{suffix}".format(
+        regex = '^{stub}{sep}{suffix}$'.format(


can you use r here
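Putting the regex comments together (named format fields, a raw string, `re.compile`, and the trailing `$`), a standalone sketch could look like this. It takes a plain list of column names instead of a DataFrame, which is a simplification for illustration and not the signature used in the PR.

```python
import re


def get_var_names(columns, stub, sep, suffix):
    # Raw string plus named fields; stub and sep are escaped so that
    # characters like '.' are matched literally. suffix is left as a regex.
    regex = r'^{stub}{sep}{suffix}$'.format(
        stub=re.escape(stub), sep=re.escape(sep), suffix=suffix)
    pattern = re.compile(regex)
    return [col for col in columns if pattern.match(col)]


columns = ['A2010', 'A2011', 'A2010_note', 'A.B1', 'AxB1']

# The trailing '$' keeps 'A2010_note' out of the match.
years = get_var_names(columns, 'A', '', r'\d+')

# Escaping the stub keeps 'AxB1' out when the stub is 'A.B'.
dotted = get_var_names(columns, 'A.B', '', r'\d+')
print(years, dotted)
```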

tdpetrou · 2017-12-06T19:55:54Z

@jreback

jreback · 2017-12-08T11:29:26Z

can you rebase

tdpetrou · 2017-12-08T17:46:07Z

@jreback Got lost in rebase hell. Will self-flagellate 10 times if I did this wrong.

jorisvandenbossche

Got lost in rebase hell.

Can you do it once more? (will try to merge as soon as possible to prevent new conflicts)

BTW, if you have multiple commits, I personally find merging master into the branch easier

jorisvandenbossche · 2017-12-10T14:13:16Z

pandas/core/reshape/melt.py

@@ -199,6 +200,10 @@ def wide_to_long(df, stubnames, i, j, sep="", suffix=r'\d+'):

        .. versionadded:: 0.20.0

+        When all suffixes are numeric, they are cast to int64/float64.
+
+        .. versionadded:: 0.22.0


Can you use versionchanged instead of versionadded and format it like this:

.. versionchanged:: 0.22.0
    When all suffixes are numeric, they are cast to int64/float64.

(see http://www.sphinx-doc.org/en/stable/markup/para.html#directive-versionadded)

Sure, I just pushed this change. It looks a little weird now though because there is a versionadded directly above it.

...
    suffix : str, default '\\d+'
        A regular expression capturing the wanted suffixes. '\\d+' captures
        numeric suffixes. Suffixes with no numbers could be specified with the
        negated character class '\\D+'. You can also further disambiguate
        suffixes, for example, if your wide variables are of the form
        Aone, Btwo,.., and you have an unrelated column Arating, you can
        ignore the last one by specifying `suffix='(!?one|two)'`

        .. versionadded:: 0.20.0

        .. versionchanged:: 0.22.0
            When all suffixes are numeric, they are cast to int64/float64.
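To illustrate the disambiguation described in that docstring, here is a sketch with a made-up frame. It uses a plain alternation, `(one|two)`, as the suffix pattern. That is my own choice for the example and not the exact pattern quoted in the docstring.

```python
import pandas as pd

# Hypothetical frame: 'Arating' shares the stub 'A' but is not a wide variable.
df = pd.DataFrame({
    'id': [0, 1],
    'Aone': [1, 2],
    'Atwo': [3, 4],
    'Arating': [9, 8],
})

# Only suffixes 'one' and 'two' are melted; 'Arating' is left as a column.
result = pd.wide_to_long(df, stubnames='A', i='id', j='which',
                         suffix='(one|two)')

suffixes = sorted(set(result.index.get_level_values('which')))
print(suffixes, list(result.columns))
```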

edit: oops didn't rebase... will do so now.

tdpetrou · 2017-12-10T21:31:17Z

@jorisvandenbossche

jorisvandenbossche · 2017-12-10T21:36:25Z

Thanks!

jreback reviewed Sep 22, 2017

jreback added the Dtype Conversions and Reshaping labels Sep 22, 2017

jreback requested changes Sep 22, 2017

jreback added this to the 0.21.0 milestone Sep 23, 2017

TomAugspurger reviewed Sep 25, 2017

jreback removed this from the 0.21.0 milestone Sep 28, 2017

tdpetrou mentioned this pull request Oct 30, 2017

Melt enhance #17677

Closed

2 tasks

jreback requested changes Nov 19, 2017

jreback requested changes Nov 23, 2017

jreback requested changes Nov 25, 2017

jreback requested changes Dec 6, 2017

jorisvandenbossche added this to the 0.22.0 milestone Dec 10, 2017

jorisvandenbossche reviewed Dec 10, 2017

tdpetrou added 4 commits December 10, 2017 11:18

BUG: coerce pd.wide_to_long suffixes to numeric

3d18d42

reverting whatsnew 0.21.0.txt attempt2

54c69a7

small changes

e1c2204

used versionchanged in docstring

b9d3e62

jorisvandenbossche merged commit a259b64 into pandas-dev:master Dec 10, 2017

tdpetrou deleted the wide-to-long-int branch December 10, 2017 21:51

jreback added a commit to jreback/pandas that referenced this pull request Dec 11, 2017

STYLE: linting issue, xref pandas-dev#17628

b4f866a

jreback added a commit that referenced this pull request Dec 11, 2017

STYLE: linting issue, xref #17628 (#18722)

9a99df4
