TST/PERF: Re-write assert_almost_equal() in cython #4398 #5219
Conversation
You might be able to use the null machinery (for scalars) that's already in cython, e.g. util._checknull (good except for dates and such).
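For reference, a minimal sketch of how that scalar null check might be reused from pandas' util.pxd; the cimport path and the _checknull helper name/signature are assumed from that file, not taken from this PR:

```cython
cimport util  # pandas/src/util.pxd; include path assumed

cpdef bint both_null(object a, object b):
    # Treat two scalar nulls (None, NaN) as equivalent.
    # As noted above, dates and similar objects still need special handling.
    return util._checknull(a) and util._checknull(b)
```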
As noted in the commit message, this brings a measurable but modest gain to the "not slow" test suite, 10-15% on my machine. It also beefs up the tests for assert_almost_equal(), which unearthed a couple of edge cases that I fixed. While the entire test suite doesn't improve that much, this does significantly speed up assert_almost_equal() itself. The main performance gain comes from replacing np.testing.assert_almost_equal(float_value1, float_value2, decimal=decimal) with a simplified, very fast cython function. As noted in the commit message, a test file like test_expressions.py, which heavily uses this function, runs over 3x faster with the cython version. There are other ways to squeeze out more performance (moving isnull(obj) into cython, for example), but given that assert_almost_equal() isn't that much of a bottleneck, I erred on the side of maintaining readability instead of converting everything into cython.
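For context, a minimal sketch of the kind of scalar fast path described here; the name, signature and tolerance rule are illustrative, not the actual code added in this PR:

```cython
from libc.math cimport fabs

cpdef bint approx_equal(double a, double b, int decimal=5):
    # Simple absolute-tolerance check derived from `decimal`, roughly in the
    # spirit of np.testing.assert_almost_equal (whose exact rule has varied
    # between numpy versions). Stays in C for the common scalar-float case.
    return fabs(a - b) < 0.5 * 10.0 ** (-decimal)
```

The speedup comes from avoiding numpy's Python-level assertion machinery entirely for plain float comparisons.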
Just tried naively replacing
I'm glad you're starting on this (and that you've fixed weird cases in assert_almost_equal). I have a few points I'd like to consider for this:
Finally, could we change all the uses of np's assert_equal in the test suite to use this instead?
My plan was to fix this in a separate change, but I can fix it here. My main reason for keeping it simple was for the sake of the reviewer and my general development philosophy (described below).
This may come down to a development philosophy issue (where I am happy to align my style to the project), but my preferred way of development is making changes as small as possible, getting them merged as quickly as possible, and then continuing to iterate in that way to add complexity. So for an example like this, instead of going from "python function --> fixed and updated cython function", I prefer "python function --> cython function --> fixed and updated cython function", with a commit and merge at each step of the way. I find this makes the changes easier for other people to read and makes it easier to locate where bugs were added (i.e., did I add the bug when going from python --> cython, or did I add it while updating the function?). However, the overhead of reviewing the changes falls on you guys, so in that respect I am happy to develop in whatever way you prefer.
@danbirken you're right, I'm okay with merging sooner. I've been on the other side and it is a heck of a pain to maintain (though I've also been responsible for some huge PRs recently). That said, we need to fix the type checks and effective failure messages before this can be merged. How about you have one commit that moves to Cython, then add a second commit that fixes type checks and other elements like that (so we can see the changes you've made on top of the Cython code)? Also, there have been a number of bugs in the error messages in this module, mostly stemming from the fact that the argument names are just a and b.
I think these commits should handle all the above (other than the a/b naming thing -- which is pretty straightforward to fix afterwards).
ok with merging this... @wesm ?
@danbirken can you add a 1-liner in the release notes as well... thanks
Added release notes into the first commit. Also, I spent a few hours playing with test performance yesterday and I didn't really find any other places where you could get a huge performance gain just by refactoring (other than this change). I did find a couple of areas where you could shave off a few percent of the test time (mostly by making test cases smaller, plus other assorted minor performance tweaks), but I didn't see any other low-hanging fruit. It seems the tests take a while to run because there are a lot of them and because matplotlib is slow, neither of which is really solvable.
ok...perfect...
It's somewhat important to have longer frames in some places (especially
This looks good to me - any comments or critiques?
ok by me
Can you rebase this on top of master one more time to make sure everything passes? Then I'll look to merge it tonight.
Add a testing.pyx cython file, and port assert_almost_equal() from python to cython. On my machine this brings a modest gain to the suite of "not slow" tests (160s -> 140s), but on assert_almost_equal() heavy tests, like test_expressions.py, it shows a large improvement (14s -> 4s).
Many of the edge cases were related to ordering of the items, but in some cases there were also issues with type checking. This fixes both of those issues and massively expands the testing for this function.
Should be good. Luckily no rebase issues.
can you take a look at #5283? I believe you are just going to call these routines, yes?
The "np.array_equal" part can be replaced by this function, yes. Might provide some speed up in the average case. |
Maybe we merge this first, then consider whether we reuse the code for equals/array_equivalent/whatever (and then move the code, obviously)
Works for me. This change is pretty independent of that one, in that the two main things assert_almost_equal() does are: provide useful error messages about why two things are not equal, and compare floats while allowing some small margin of difference. If there is a function that is better at quickly determining whether two arrays are exactly equal (like np.array_equal or array_equivalent), that is a useful optimization for this function, but it doesn't actually replace any of the functionality here.
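To illustrate the distinction, a small sketch assuming the import path of the day (pandas.util.testing) and default tolerances:

```python
import numpy as np
import pandas.util.testing as tm

a = np.array([0.1 + 0.2, 1.0])
b = np.array([0.3, 1.0])

print(np.array_equal(a, b))   # False: 0.1 + 0.2 != 0.3 exactly
tm.assert_almost_equal(a, b)  # passes: values agree within the default tolerance
```

Exact-equality helpers can short-circuit the common case quickly, but the tolerance comparison and the descriptive failure messages still have to live in assert_almost_equal().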
ok....just going to do a quick build on windows and then merge
TST/PERF: Re-write assert_almost_equal() in cython #4398
thank you sir!
I was confused - nvm
closes #4398
Add a testing.pyx cython file, and port assert_almost_equal() from
python to cython. This also fixes a few minor bugs that were in the
python version of assert_almost_equal() and adds more test cases to
test_tests.py
On my machine this brings a modest gain to the suite of "not slow" tests
(160s -> 140s), but on assert_almost_equal() heavy tests, like
test_expressions.py, it shows a large improvement (14s -> 4s).