Skip to content

fixed bug in DataFrame.diff - issue #10907 #10930

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 2 commits into from
Closed

fixed bug in DataFrame.diff - issue #10907 #10930

wants to merge 2 commits into from

Conversation

terrytangyuan
Copy link
Contributor

Fixed issue #10907

@terrytangyuan
Copy link
Contributor Author

This is my first PR here. Could someone help me pass this Travis test? After I made the changes in the code, what should I do?

@jreback
Copy link
Contributor

jreback commented Aug 29, 2015

try running the tests locally first

eg

nosetests pandas/test_frame.py

@jreback
Copy link
Contributor

jreback commented Aug 29, 2015

the consolidate has to go in core/internals.py (where diff is defined)

@@ -10771,6 +10771,11 @@ def test_diff(self):
assert_series_equal(the_diff['A'],
tf['A'] - tf['A'].shift(1))

df = pd.DataFrame({'y': pd.Series([2]), 'z': pd.Series([3])})
df.insert(0, 'x', 1)
self.assertEqual(df.diff(axis=1)['x']==np.nan, False)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

construct an expected frame the use
assert_frame_equal

@terrytangyuan
Copy link
Contributor Author

When I run tests by calling nosetests pandas/tests/test_frame.py, the following error occurred: ImportError: No module named test_frame.

Is there anything else I should do before test it?

@jreback
Copy link
Contributor

jreback commented Aug 29, 2015

@@ -2513,7 +2513,7 @@ def putmask(self, **kwargs):
return self.apply('putmask', **kwargs)

def diff(self, **kwargs):
return self.apply('diff', **kwargs)
return self.consolidate().apply('diff', **kwargs)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of this

add to .apply an argument consolidate=True, and:

if consolidate:
    self._consolidate_inplace()

This should fix this issue and not break anything else.

@jreback jreback added Bug Internals Related to non-user accessible pandas implementation labels Aug 30, 2015
@jreback jreback added this to the 0.17.0 milestone Aug 30, 2015
@@ -10771,6 +10771,13 @@ def test_diff(self):
assert_series_equal(the_diff['A'],
tf['A'] - tf['A'].shift(1))

df = pd.DataFrame({'y': pd.Series([2]), 'z': pd.Series([3])})
df.insert(0, 'x', 1)
the_diff = df.consolidate().diff(axis=1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just do

result = df.diff(axis=1)
expected = DataFrame(....)
assert_frame_equal(result, expected)

@terrytangyuan
Copy link
Contributor Author

Thanks so much for your help! I have made the changes. Now I understand the code structure better. Hopefully I'll be able to contribute more in the future.

@@ -10771,6 +10771,12 @@ def test_diff(self):
assert_series_equal(the_diff['A'],
tf['A'] - tf['A'].shift(1))

df = pd.DataFrame({'x': pd.Series([1]),'y': pd.Series([2]), 'z': pd.Series([3])})
result = df.diff(axis=1).astype(float)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't modify result

the astype might be hiding something

@terrytangyuan
Copy link
Contributor Author

Done. Should be able to merge now since it passes tests locally. There are too many Travis build tasks pending now.

@jreback
Copy link
Contributor

jreback commented Aug 30, 2015

needs a whatsnew note
pls squash

@terrytangyuan
Copy link
Contributor Author

This should do it now, right?

@@ -10771,6 +10771,12 @@ def test_diff(self):
assert_series_equal(the_diff['A'],
tf['A'] - tf['A'].shift(1))

df = pd.DataFrame({'x': pd.Series([1]),'y': pd.Series([2]), 'z': pd.Series([3])})
result = df.diff(axis=1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add the issue number as a comment here.

@jreback
Copy link
Contributor

jreback commented Aug 31, 2015

pls squash as well.

…t consolidated (+1 squashed commit)

Squashed commits:
[6fe71d3] moved changes to correct place and fixed test_diff (+1 squashed commit)
Squashed commits:
[2bf3c2b] moved change to where diff is defined and updated test (+1 squashed commit)
Squashed commits:
[6715d7f] added unit test to test this fix (+1 squashed commit)
Squashed commits:
[f06fa5e] fixed bug in DataFrame.diff
@terrytangyuan
Copy link
Contributor Author

Thanks. Done.

@jreback
Copy link
Contributor

jreback commented Aug 31, 2015

ok, pls squash to a single commit. ping when green.

@terrytangyuan
Copy link
Contributor Author

It has conflicts that I don't know how to solve if I squash...could we leave it as it is this time? Both commits are descriptive to me.

@jreback
Copy link
Contributor

jreback commented Aug 31, 2015

ping me when green. typically we use just a single commit on things like this. this is just the way pandas has always worked.

@terrytangyuan
Copy link
Contributor Author

@jreback could you merge?

@jreback
Copy link
Contributor

jreback commented Aug 31, 2015

will get to soon

@@ -10771,6 +10771,13 @@ def test_diff(self):
assert_series_equal(the_diff['A'],
tf['A'] - tf['A'].shift(1))

# issue 10907
df = pd.DataFrame({'x': pd.Series([1]),'y': pd.Series([2]), 'z': pd.Series([3])})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is doesn't actually test anything (and it works), but it doesn't replicate the original issue problem. IIRC you had that their, but it has changed.

@terrytangyuan
Copy link
Contributor Author

This is the same as the original. Now in the testing code, I've created the same DataFrame that's originally created via DataFrame.insert. The core problem rises from DataFrame.diff instead of DataFrame.insert. Please take a quick comparison between the outdated diff in this PR and the new push in this PR. Thanks.

@terrytangyuan
Copy link
Contributor Author

If you do a assertDataFrameEqual to the original test code and the current test code. You'll see those DataFrames are equivalent.

@jreback
Copy link
Contributor

jreback commented Aug 31, 2015

your test passes on 0.16.2
so you have not replicated it exactly

you haven't proven that you fixed the bug (you did) but your test doesn't work

@terrytangyuan
Copy link
Contributor Author

Ok didn't know that. I'll revise the test soon. Thanks for double checking.
On Aug 31, 2015 3:44 PM, "Jeff Reback" notifications@github.com wrote:

your test passes on 0.16.2
so you have not replicated it exactly

you haven't proven that you fixed the bug (you did) but your test doesn't
work


Reply to this email directly or view it on GitHub
#10930 (comment).

…+1 squashed commit)

Squashed commits:
[810cbda] DOC: Added a note in whatsnew and doc-string for fixing issue 10907 (+1 squashed commit)
Squashed commits:
[f9220a2] DOC: Added a note in whatsnew and doc-string for fixing issue 10907 (+1 squashed commit)
Squashed commits:
[0f1836f] added consolidate as param in doc-string
@terrytangyuan
Copy link
Contributor Author

@jreback Could you double check new push?

@jreback
Copy link
Contributor

jreback commented Sep 1, 2015

merged via 219ad75

thanks!

generally you want to create a branch each time you make a change (and not do it on master)

see here

@hhuuggoo
Copy link

@jreback I don't think this fix works if you have mixed types, since consolidate can't consolidate those blocks. This is much less common, but I could see someone with mixed ints and floats that wants to shift

@jreback
Copy link
Contributor

jreback commented Sep 10, 2015

@hhuuggoo yes, this is prob just wrong. it needs to create the indexer then do a take.

Can you create another issue with an example?

thxs

@hhuuggoo
Copy link

yup!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Internals Related to non-user accessible pandas implementation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants