
ENH: NDFrame.equals #5183


Closed
jreback opened this issue Oct 11, 2013 · 10 comments · Fixed by #5283

Comments

@jreback
Contributor

jreback commented Oct 11, 2013

seems we need a public version of tm.assert_frame_equal

maybe equals on generic NDFrame objects

http://stackoverflow.com/questions/19322506/pandas-nan-comparison

seems good on the values comparison (should compare axes/types as well of course)

((df1 == df2) | ((df1 != df1) & (df2 != df2))).values.all()
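
A minimal sketch of the expression above applied to two small frames. The sample data and the names `plain` and `nan_aware` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Two frames with NaN in the same positions (illustrative data).
df1 = pd.DataFrame({"a": [1.0, np.nan], "b": [np.nan, 4.0]})
df2 = df1.copy()

# Plain elementwise equality: NaN never compares equal to itself.
plain = bool((df1 == df2).values.all())

# Also count positions where both sides are NaN as equal.
nan_aware = bool(((df1 == df2) | ((df1 != df1) & (df2 != df2))).values.all())
```

Here `plain` is False while `nan_aware` is True, which is the gap the proposed `equals` is meant to close.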
@jtratner
Contributor

Well, there's eq, lt, etc. on NDFrame now, so why not allow an argument to eq
called something like fillna=True or comparena=True? Then it'd just be
NDFrame.eq(fillna=True).all()

This would also be a good opportunity for a recursive=True kwarg to all
that lets you always get to a bool regardless of the NDFrame type you have
(so all() for Series, all().all() for DataFrame and, I think,
all().all().all() for Panel).
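
One way the recursive reduction could look, written as a standalone helper rather than a kwarg on `all`. `recursive_all` is a hypothetical name, not a pandas API, and this sketch only covers Series and DataFrame.

```python
import pandas as pd


def recursive_all(obj):
    """Hypothetical helper: call .all() until a single bool remains."""
    result = obj
    # DataFrame.all() yields a Series, Series.all() yields a scalar.
    while isinstance(result, (pd.Series, pd.DataFrame)):
        result = result.all()
    return bool(result)


series_result = recursive_all(pd.Series([True, True]))
frame_result = recursive_all(pd.DataFrame({"a": [True, False], "b": [True, True]}))
```

The caller gets a plain bool either way, without needing to know how many `.all()` calls the container requires.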


@jreback
Contributor Author

jreback commented Oct 11, 2013

eq and such are elementwise operators

equals would signify an object comparator

@jtratner
Contributor

okay, that's reasonable, so it's just what you have + df1.index.equals(df2.index)? I still hanker to add a recursive all :P (plus you then would not need to copy to values).

Also, would it make more sense to do:

df1 = df1.values
df2 = df2.values
((df1 == df2) | ((df1 != df1) & (df2 != df2))).all()

instead? seems like that could be simpler. Also, I'm assuming you'd want to check if their shapes are compatible or even just the same (so that you can always safely use equals), so maybe: check shape (len(columns), len(index)), then check dtypes, etc. And then would you want to have a check_metadata argument too? (i.e., column names, index names, etc)
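
The checks listed above could be ordered roughly as follows. This is a sketch, not what pandas ships: `frames_equal` and its `check_metadata` flag are hypothetical names, and only DataFrame inputs are handled.

```python
import numpy as np
import pandas as pd


def frames_equal(left, right, check_metadata=False):
    """Hypothetical sketch: NaN-aware equality for two DataFrames."""
    # Anything that is not a DataFrame is treated as not comparable.
    if not (isinstance(left, pd.DataFrame) and isinstance(right, pd.DataFrame)):
        return False
    # Shape first, so the value comparison can never broadcast.
    if left.shape != right.shape:
        return False
    # Axis labels must match.
    if not (left.index.equals(right.index) and left.columns.equals(right.columns)):
        return False
    # Column dtypes must match.
    if list(left.dtypes) != list(right.dtypes):
        return False
    # Optionally compare axis names as well.
    if check_metadata and (
        list(left.index.names) != list(right.index.names)
        or list(left.columns.names) != list(right.columns.names)
    ):
        return False
    a, b = left.values, right.values
    # Equal values, or NaN on both sides.
    return bool(((a == b) | ((a != a) & (b != b))).all())


base = pd.DataFrame({"a": [1.0, np.nan], "b": [np.nan, 4.0]})
renamed = base.copy()
renamed.index.name = "idx"
```

With this ordering the cheap structural checks run before any values are touched, and `check_metadata=True` is the only case where `base` and `renamed` differ.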

@jreback
Contributor Author

jreback commented Oct 11, 2013

yep, in fact these might be a nice replacement for assert_frame_equal (and its friends)
and should be much faster too

@jtratner
Contributor

what about non-comparable things?

@jreback
Contributor Author

jreback commented Oct 12, 2013

I think they should return False
eg series.equals(frame)
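
For reference, the `equals` method available in later pandas releases behaves this way for mismatched types. The sample objects below are made up.

```python
import pandas as pd

series = pd.Series([1, 2])
frame = pd.DataFrame({"a": [1, 2]})

# Comparing a Series against a DataFrame gives False instead of raising.
mismatch = series.equals(frame)
```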

@jtratner
Contributor

and gotta check for matching shape first b/c of broadcasting
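
A small NumPy illustration of why the shape check has to come first. The arrays and variable names are invented for the example.

```python
import numpy as np

grid = np.array([[1.0, 2.0], [1.0, 2.0]])  # shape (2, 2)
row = np.array([1.0, 2.0])                 # shape (2,)

# Broadcasting stretches `row` across both rows of `grid`, so every
# elementwise comparison is True even though the shapes differ.
broadcast_equal = bool((grid == row).all())

# Comparing shapes first avoids the false positive.
safe_equal = grid.shape == row.shape and bool((grid == row).all())
```

Without the guard, `broadcast_equal` comes out True for two arrays that are clearly not the same object.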

@ghost

ghost commented Oct 12, 2013

I'm a little skeptical of the speed benefit for tests. When I profiled things 6 months ago,
I came up with <10% improvement potential. Small potatoes compared
to all the network tests that have found their way into the "fast" test suite.

I may have been wrong though.

@jreback
Contributor Author

jreback commented Oct 12, 2013

@y-p oh...that's not really the reason for this (that's a side benefit)...

it's the issue of comparing two objects that have NaNs in them: if the NaN locations match, the result should be True (not False), so a typical x == y fails on this

@ghost

ghost commented Oct 12, 2013

Fair enough, just apropos #3150.
