
assert_almost_equal / equals should allow access to np.allclose #9457

Closed
rockg opened this issue Feb 10, 2015 · 7 comments · Fixed by #30562
Labels
API Design Enhancement Testing pandas testing functions or related to the test suite
Milestone

Comments

@rockg
Contributor

rockg commented Feb 10, 2015

I am testing the equivalence of two large DataFrames (9084x367). The two are the same up to 1x10^-13, but when np.array_equal fails there is a much slower code path (comparing these two frames takes upwards of 20 seconds). If I'm not mistaken, when the arrays aren't equivalent it does a more complicated version of np.allclose. I think a good intermediate step would be to check for array equivalence and then, as the second step, call np.allclose (or maybe just do this at the outset). If that fails, which it will if there are any NaNs or if the tolerance is not met, it would fall back to the current logic. Alternatively, we could use np.isclose, which can treat NaNs as equivalent.

https://github.com/pydata/pandas/blob/master/pandas/src/testing.pyx#L85
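A minimal sketch of the staged check proposed above. It works on plain NumPy arrays rather than pandas' internal testing code, and the function name `arrays_match` is hypothetical. The check tries exact equality first, then makes a single vectorised `np.allclose` call with `equal_nan=True`. Only if both fail does it return `None`, so the caller can fall back to the slower element-by-element path that reports where values differ.

```python
import numpy as np


def arrays_match(a, b, rtol=1e-5, atol=1e-8):
    """Hypothetical staged comparison: exact check, then one tolerance check.

    Returns True when a fast path shows the arrays match, False on a shape
    mismatch, and None when the caller should fall back to its slower,
    element-by-element comparison (which can report where values differ).
    """
    a = np.asarray(a)
    b = np.asarray(b)
    if a.shape != b.shape:
        return False
    # Step 1: exact equality (NaNs never compare equal here)
    if np.array_equal(a, b):
        return True
    # Step 2: numeric data gets a single np.allclose call; equal_nan=True
    # treats NaNs in the same positions as matching
    numeric = np.issubdtype(a.dtype, np.number) and np.issubdtype(b.dtype, np.number)
    if numeric and np.allclose(a, b, rtol=rtol, atol=atol, equal_nan=True):
        return True
    # Step 3: undecided (non-numeric data or values outside tolerance)
    return None
```

The function returns `None` rather than `False` so that the failing case still goes through the existing detailed comparison. That keeps the per-element diagnostics the testing helpers are meant to provide.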

@jreback
Contributor

jreback commented Feb 10, 2015

this is not possible
the entire point is to do a simple element comparison for testing purposes to show where the elements are not equal considering any dtype

this is for testing only and should be used at your own risk in production code

the Numpy routines do not consider nan positions correctly in any event

you can use np.allclose if you wish on the numeric subset and it will work
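Following that suggestion, here is a sketch that applies `np.allclose` only to the numeric columns of two DataFrames, selected with `select_dtypes`. The helper name is hypothetical. Non-numeric columns are ignored here and still need their own comparison, for example with `DataFrame.equals`.

```python
import numpy as np
import pandas as pd


def numeric_frames_close(left, right, rtol=1e-5, atol=1e-8):
    """Hypothetical helper: compare only the numeric columns with np.allclose."""
    left_num = left.select_dtypes(include="number")
    right_num = right.select_dtypes(include="number")
    # Row and column labels must line up before comparing raw values
    if not (left_num.columns.equals(right_num.columns)
            and left_num.index.equals(right_num.index)):
        return False
    # equal_nan=True treats NaNs in the same positions as equal
    return bool(np.allclose(left_num.to_numpy(dtype=float),
                            right_num.to_numpy(dtype=float),
                            rtol=rtol, atol=atol, equal_nan=True))
```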

@rockg
Contributor Author

rockg commented Feb 10, 2015

I understand the other datatypes, but I'm thinking we can replace array_equal with all(isclose(a, b)). That would make the comparison of numpy types faster and would not have an issue with NaNs (this is essentially what is done anyway; numerics are compared at 5x10^-6). This should make no difference to existing tests. I feel like the pandas testing API should provide a reliable way to test the equivalence of two DataFrames. I am using it for my own unit tests and found it to be quite slow.
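A sketch of the `all(isclose(a, b))` idea in NumPy; the helper name is hypothetical. `np.isclose` returns an element-wise mask, so the same result can also report where values differ, which is what the detailed comparison is for. The tolerances shown are NumPy's defaults, not the 5x10^-6 threshold mentioned above.

```python
import numpy as np


def isclose_mismatches(a, b, rtol=1e-5, atol=1e-8):
    """Hypothetical helper: return the indices where a and b are NOT close.

    mask.all() is the all(isclose(a, b)) check, and np.argwhere(~mask)
    locates the differing elements. equal_nan=True counts NaNs in matching
    positions as equal.
    """
    mask = np.isclose(a, b, rtol=rtol, atol=atol, equal_nan=True)
    return np.argwhere(~mask)


a = np.array([[1.0, np.nan], [3.0, 4.0]])
b = a + 1e-13
b[1, 1] = 5.0
print(isclose_mismatches(a, b).tolist())  # → [[1, 1]]
```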

@jreback
Contributor

jreback commented Feb 10, 2015

ok that could be reasonable
further I think passing thru keywords to .equals() would allow something similar
which is an exposed equivalence method (it uses array_equivalent under the hood)

@rockg
Contributor Author

rockg commented Feb 10, 2015

I agree that more control over .equals would be nice.

@jreback jreback changed the title assert_almost_equal should use numpy allclose for numpy arrays assert_almost_equal / equals should allow access to np.allclose Feb 10, 2015
@jreback
Contributor

jreback commented Feb 10, 2015

so signature could be:

def equals(self, other, strict=False, close=False, **kwargs)

strict would map to strict_nan in com.array_equivalent (currently it's an internal parameter)
and close=True would trigger a call to np.allclose rather than np.array_equal, and pass
thru **kwargs (e.g. if one wanted to modify rtol/atol from the defaults)
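A sketch of the proposed `close` behaviour as a free function, since `DataFrame.equals` itself takes no such keywords. `frame_equals` and its handling of mixed dtypes are assumptions. The proposed `strict` flag is left out because `strict_nan` is an internal parameter of `array_equivalent`.

```python
import numpy as np
import pandas as pd


def frame_equals(left, right, close=False, **kwargs):
    """Hypothetical stand-in for the proposed equals(other, close=..., **kwargs).

    close=False defers to the existing DataFrame.equals. close=True compares
    numeric columns with np.allclose (forwarding kwargs such as rtol/atol)
    and all other columns exactly.
    """
    if not close:
        return left.equals(right)
    # Require identical labels and dtypes, as DataFrame.equals does
    if not (left.index.equals(right.index) and left.columns.equals(right.columns)):
        return False
    if list(left.dtypes) != list(right.dtypes):
        return False
    numeric = left.select_dtypes(include="number").columns
    others = left.columns.difference(numeric, sort=False)
    # Non-numeric columns: exact comparison via the existing method
    if not left[others].equals(right[others]):
        return False
    # Numeric columns: one vectorised tolerance check; NaNs in the same
    # positions count as equal unless the caller overrides equal_nan
    kwargs.setdefault("equal_nan", True)
    return bool(np.allclose(left[numeric].to_numpy(dtype=float),
                            right[numeric].to_numpy(dtype=float), **kwargs))
```

Forwarding `**kwargs` straight to `np.allclose` keeps the wrapper thin. Callers can tighten or loosen `rtol`/`atol` without the wrapper having to know NumPy's defaults.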

@jreback jreback added this to the 0.16.0 milestone Feb 10, 2015
@jreback jreback modified the milestones: 0.16.0, Next Major Release Mar 5, 2015
@ian-contiamo

It seems like this change has not been made as of pandas 0.20.3:
https://github.com/pandas-dev/pandas/blob/v0.20.3/pandas/core/generic.py#L865-L872

Did I miss something? Like the OP, I think it would be nice to have direct access to numpy.isclose.

@jreback jreback added Difficulty Intermediate Testing pandas testing functions or related to the test suite labels Jul 10, 2017
@jreback
Contributor

jreback commented Jul 10, 2017

@ian-contiamo the 'Open' issue indicator is a good one!

assert_*_equal already have the check_less_precise kwarg, so I'm not sure this is necessary for the testing functions. Would not be averse to adding an isclose=False kwarg to .equals(). This API needs to be fleshed out a bit.
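For the testing helpers, later pandas releases (1.1 and newer, to our understanding) deprecated `check_less_precise` in favour of explicit `rtol`/`atol` keywords on `assert_frame_equal` and related functions. A short example, assuming such a pandas version is installed:

```python
import numpy as np
import pandas as pd

left = pd.DataFrame({"x": [1.0, 2.0, np.nan], "s": ["a", "b", "c"]})
right = left.copy()
right["x"] = right["x"] + 1e-13

# Passes: ~1e-13 differences are within these tolerances, and NaNs in the
# same position are treated as equal.
pd.testing.assert_frame_equal(left, right, check_exact=False, rtol=1e-5, atol=1e-8)

# Tightening the tolerances makes the same comparison fail.
try:
    pd.testing.assert_frame_equal(left, right, check_exact=False, rtol=0, atol=1e-15)
    tight_failed = False
except AssertionError:
    tight_failed = True
print(tight_failed)
```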

@jreback jreback modified the milestones: Contributions Welcome, 1.0.0 Jan 20, 2020
@TomAugspurger TomAugspurger modified the milestones: 1.0.0, 1.1 Jan 21, 2020

5 participants