[WIP] Improve benchmarks #1142
Should the next step be adding more benchmarks?
It has been moved to solve a cyclic import due to the following dependencies:
I proposed to move |
It looks complicated but it's done to make the whole |
Codecov Report
@@ Coverage Diff @@
## master #1142 +/- ##
=======================================
Coverage 92.81% 92.82%
=======================================
Files 93 94 +1
Lines 9181 9193 +12
=======================================
+ Hits 8521 8533 +12
Misses 660 660
Continue to review full report at Codecov.
class Hist:
    params = (True, False)
    param_names = "Numba"
This has to be a single-element tuple; the string is interpreted as an iterable and only the first letter is used: https://dev.azure.com/ArviZ/ArviZ/_build/results?buildId=2274&view=logs&j=273a7ac9-cc40-5d2f-3776-d61d22a0ef9c&t=8fe29edb-6726-5e00-142e-7c337980bc77&l=60
See `Atleast_Nd` for an example.
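A small sketch of why the bare string misbehaves, in plain Python. The `label_params` helper is hypothetical and is not asv's actual code; it only assumes that names are paired with parameter sets by iterating over `param_names`, which is consistent with the behaviour described above.

```python
def label_params(param_names, params):
    """Hypothetical helper: pair each parameter set with its name via zip."""
    # A flat `params` such as (True, False) counts as a single parameter set.
    param_sets = [params]
    # Iterating over a string yields its characters, so "Numba" offers
    # "N" as the first name, while ("Numba",) offers the whole word.
    return [name for name, _ in zip(param_names, param_sets)]


print(label_params("Numba", (True, False)))     # ['N']
print(label_params(("Numba",), (True, False)))  # ['Numba']
```

The trailing comma is what makes `("Numba",)` a tuple; `("Numba")` is still just a string.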
        histogram(self.data, bins=100)


class Variance:
variance benchmark is failing, could you look into it? https://dev.azure.com/ArviZ/ArviZ/_build/results?buildId=2274&view=logs&j=273a7ac9-cc40-5d2f-3776-d61d22a0ef9c&t=8fe29edb-6726-5e00-142e-7c337980bc77&l=99
I was trying to debug this, so I ran the benchmarks with the `--show-stderr` flag, but it only showed:
asv: benchmark timed out (timeout 60.0s)
When I ran the benchmarks with both the `--quick` and `--show-stderr` flags, it worked, but the function took 31s to run with Numba=False, whereas when I used the function in a notebook with the same data and numba disabled it ran in milliseconds. I think it's some issue with the benchmarks configuration; I'll continue looking.
My bad, the data was too large so the function was actually really slow without Numba speed-up (taking ~30s and not milliseconds after I rechecked) and it caused the benchmark to fail :(
I'll reduce the data.
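For reference, a sketch of the two knobs involved, assuming asv's per-benchmark `timeout` attribute (60 seconds by default, which matches the message above). The array size and the use of `np.var` as a stand-in for the ArviZ variance function are illustrative assumptions, not the values used in this PR.

```python
import numpy as np


class Variance:
    # asv aborts a benchmark that runs longer than `timeout` seconds
    # (60 by default); raising it is an alternative to shrinking the data.
    timeout = 120
    params = (True, False)
    param_names = ("Numba",)

    def setup(self, numba_flag):
        # Illustrative size only: small enough that the slow path
        # (Numba disabled) should finish well inside the timeout.
        self.data = np.random.randn(1000, 100)
        # The real benchmark would enable or disable Numba here,
        # depending on numba_flag.

    def time_variance(self, numba_flag):
        # np.var stands in for the ArviZ variance function under test.
        np.var(self.data, axis=1)
```

Shrinking the data keeps the whole suite fast, so it is probably the better default; the longer timeout is only worth it if the large input is the case of interest.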
What about having a |
LGTM
I proposed some minor changes to try and make benchmarks more informative.
        else:
            az.Numba.disable_numba()

    def time_variance(self, numba_flag):
we should probably name this one time_variance_2d
        _fast_kde(self.x)


class Fast_KDE_2d:
    params = [(True, False), (10**5, 10**6)]
I have realized that with these shapes there is no speedup when using numba. I checked whether the same issue as with fast_kde_1d had happened, where the speedup was lost in a later PR, and it looks like the speed-up is still there but is only seen for larger sizes.
Could we try something like this and see how the dimensions affect performance?
class Fast_KDE_2d:
    params = [
        (True, False),
        ((100, 10**4), (10**4, 100), (1000, 1000))
    ]
    param_names = ("Numba", "shape")

    def setup(self, numba_flag, shape):
        self.x = np.random.randn(*shape)
        self.y = np.random.randn(n//10, 10)
        if numba_flag:
            ...
Ooh, I'll implement this.
Also, in #1088 you mentioned that more functions need to be covered by benchmarks and that memory benchmarks and line profiling should be added later on. Should I start adding all this now in this PR or sometime later?
Maybe it's better to wait for another PR
It gives the following error:
ValueError: all the input array dimensions for the concatenation axis must match exactly, but along dimension 1, the array at index 0 has size 1000000 and the array at index 1 has size 100
Do `x` and `y` always have to be the same size?
Also, I did `self.y = np.random.randn(shape[0]//10, 10)` instead of `self.y = np.random.randn(n//10, 10)`. Is this what you meant?
I forgot to update `y`; it should be created like `x`, since they must be the same shape. Also, there should be no `n` instances anymore.
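Applied to the suggested class, that correction would look roughly like the sketch below. It covers only the data creation; the Numba toggling and the timed method are left out, as they were in the suggestion.

```python
import numpy as np


class Fast_KDE_2d:
    params = [
        (True, False),
        ((100, 10**4), (10**4, 100), (1000, 1000)),
    ]
    param_names = ("Numba", "shape")

    def setup(self, numba_flag, shape):
        # x and y are both drawn with the full `shape`, so they always
        # match and no leftover `n` is needed.
        self.x = np.random.randn(*shape)
        self.y = np.random.randn(*shape)
        # Numba enabling/disabling based on numba_flag is omitted here.
```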
Yup, I had removed `n`. The only problem was the different sizes of `x` and `y`. You'll see after the tests finish that there's still no speed-up :(
I was thinking that benchmarks should also be formatted with black. I think 3 lines have to be changed
scripts/lint.sh
Outdated
python -m black -l 100 --check ${SRC_DIR}/arviz/ ${SRC_DIR}/examples/
=======
python -m black -l 100 --check ${SRC_DIR}/arviz/ ${SRC_DIR}/examples/ ${SRC_DIR}/asv_benchmarks/
>>>>>>> update fast_kde_2d benchmark and changelog
there has been some merge issue
Description
Fixes #782 and builds on top of the work done in #1088.
I've moved the following functions to resolve cyclic imports that occurred when `histogram` was imported for `_fast_kde`:
- `_fast_kde`, `_fast_kde_2d`, `get_bins` and `_sturges_formula` --> numeric_utils
- `get_coords` --> utils

I've also modified the benchmarks and will cover more functions soon.
cc @OriolAbril, @ahartikainen
Checklist