[WIP] Improve benchmarks #1142

nitishp25 · 2020-04-07T15:24:41Z

Description

Fixes #782 and builds on top of the work done in #1088.

I've moved the following functions to resolve cyclic imports that occurred when histogram was imported for _fast_kde:

_fast_kde, _fast_kde_2d, get_bins and _sturges_formula --> numeric_utils
get_coords --> utils

I've also modified the benchmarks and will cover more functions soon.

cc @OriolAbril, @ahartikainen

Checklist

Follows official PR format
Includes new or updated tests to cover the new feature
Code style correct (follows pylint and black guidelines)
Changes are listed in changelog

nitishp25 · 2020-04-08T04:44:41Z

Should the next step be adding more benchmarks?

aloctavodia · 2020-04-08T12:16:26Z

get_bins and _sturges_formula are mainly helper functions to compute plots, not to compute statistics (inside stats). That's why they are inside plot_utils. Any reason to move them to stats_utils?

OriolAbril · 2020-04-08T13:33:02Z

It has been moved to solve a cyclic import due to the following dependencies:

_fast_kde should import _histogram from stats_utils, it currently does not because of this cyclic import which means that _fast_kde does not use numba anymore -> imports stats_utils
calculate_point_estimate needs _fast_kde -> imports fast_kde (assuming it is moved to its own file) -> imports stats_utils
hpd (multimodal only) needs get_bins -> imports plot_utils

I proposed to move _sturges_formula too (even though it is not necessary) to have it in the same file as get_bins

nitishp25 · 2020-04-08T14:12:57Z

It looks complicated but it's done to make the whole stats module independent of plots so that plots can import functions from stats module easily
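The dependency problem discussed above can be sketched as a small import graph. This is a simplified illustration and not ArviZ's actual import structure: the edges are assumptions reduced from the comments in this thread, and `find_cycle` is a hypothetical helper written for this example.

```python
def find_cycle(graph):
    """Return one import cycle as a list of module names, or None."""
    visiting, done = set(), set()

    def visit(node, path):
        if node in done:
            return None
        if node in visiting:
            # Back edge: the cycle starts where `node` first appears in the path.
            return path[path.index(node):] + [node]
        visiting.add(node)
        for dep in graph.get(node, ()):
            cycle = visit(dep, path + [node])
            if cycle:
                return cycle
        visiting.discard(node)
        done.add(node)
        return None

    for module in graph:
        cycle = visit(module, [])
        if cycle:
            return cycle
    return None


# Assumed, simplified edges: stats needs get_bins from plot_utils,
# and plot_utils needs the histogram helper from the stats package.
before = {"stats": ["plot_utils"], "plot_utils": ["stats"]}

# After moving the shared helpers into numeric_utils, stats no longer
# depends on anything plot-related.
after = {
    "stats": ["numeric_utils"],
    "plot_utils": ["stats", "numeric_utils"],
    "numeric_utils": [],
}

print(find_cycle(before))
print(find_cycle(after))
```

With the shared helpers in a module that imports neither stats nor plots, plots can import from stats freely without closing a loop.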

codecov · 2020-04-08T14:50:27Z

Codecov Report

Merging #1142 into master will increase coverage by 0.00%.
The diff coverage is 96.47%.

@@           Coverage Diff           @@
##           master    #1142   +/-   ##
=======================================
  Coverage   92.81%   92.82%           
=======================================
  Files          93       94    +1     
  Lines        9181     9193   +12     
=======================================
+ Hits         8521     8533   +12     
  Misses        660      660

Impacted Files	Coverage Δ
arviz/numeric_utils.py	`95.18% <95.18%> (ø)`
arviz/utils.py	`91.02% <95.45%> (+1.02%)`	⬆️
arviz/plots/__init__.py	`100.00% <100.00%> (ø)`
arviz/plots/backends/bokeh/densityplot.py	`94.28% <100.00%> (ø)`
arviz/plots/backends/bokeh/distplot.py	`85.18% <100.00%> (ø)`
arviz/plots/backends/bokeh/forestplot.py	`92.88% <100.00%> (ø)`
arviz/plots/backends/bokeh/posteriorplot.py	`98.09% <100.00%> (+0.01%)`	⬆️
arviz/plots/backends/bokeh/ppcplot.py	`99.12% <100.00%> (ø)`
arviz/plots/backends/bokeh/violinplot.py	`94.64% <100.00%> (-0.10%)`	⬇️
arviz/plots/backends/matplotlib/densityplot.py	`96.42% <100.00%> (+0.06%)`	⬆️
... and 24 more

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3352a8a...7e4e638. Read the comment docs.

OriolAbril · 2020-04-08T14:55:56Z

asv_benchmarks/benchmarks/benchmarks.py

+
+class Hist:
+    params = (True, False)
+    param_names = "Numba"


This has to be a single-element tuple. The string is interpreted as an iterable, so only the first letter is used: https://dev.azure.com/ArviZ/ArviZ/_build/results?buildId=2274&view=logs&j=273a7ac9-cc40-5d2f-3776-d61d22a0ef9c&t=8fe29edb-6726-5e00-142e-7c337980bc77&l=60

See Atleast_Nd for an example
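A small sketch of why the bare string misbehaves. It only mimics the effect with plain Python iteration and does not reproduce asv's internals; the variable names are made up for the example.

```python
# How a bare string behaves when code iterates over it as a sequence of names.
as_string = "Numba"
as_tuple = ("Numba",)

names_from_string = list(as_string)  # one entry per character
names_from_tuple = list(as_tuple)    # one entry for the whole name

# Pairing names with a single parameter axis keeps only the first entry.
params = [(True, False)]
print(dict(zip(names_from_string, params)))
print(dict(zip(names_from_tuple, params)))
```

The trailing comma in `("Numba",)` is what makes it a tuple; `("Numba")` is still just a string.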

OriolAbril · 2020-04-08T14:57:28Z

asv_benchmarks/benchmarks/benchmarks.py

+        histogram(self.data, bins=100)
+
+
+class Variance:


variance benchmark is failing, could you look into it? https://dev.azure.com/ArviZ/ArviZ/_build/results?buildId=2274&view=logs&j=273a7ac9-cc40-5d2f-3776-d61d22a0ef9c&t=8fe29edb-6726-5e00-142e-7c337980bc77&l=99

I was trying to debug this so I ran the benchmarks with --show-stderr flag but it only showed

asv: benchmark timed out (timeout 60.0s)

When I ran the benchmarks with both the --quick and --show-stderr flags, it worked, but the function took 31s to run with Numba=False. When I used the function in a notebook with the same data and numba disabled, it ran in milliseconds. I think it's some issue with the benchmark configuration; I'll continue looking.

My bad, the data was too large so the function was actually really slow without Numba speed-up (taking ~30s and not milliseconds after I rechecked) and it caused the benchmark to fail :(

I'll reduce the data.
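A minimal sketch of the two knobs available for a benchmark that hits the 60s limit: shrinking the input in `setup`, or raising the per-benchmark `timeout` attribute that asv reads from the class. The array size, the timeout value and the use of `np.var` as a stand-in for the ArviZ variance function are all assumptions for illustration.

```python
import numpy as np


class Variance:
    # asv benchmark attributes; `timeout` is in seconds (asv's default is 60).
    params = (True, False)
    param_names = ("Numba",)
    timeout = 120

    def setup(self, numba_flag):
        # Smaller array than the one that timed out; the size is illustrative.
        self.data = np.random.randn(1000, 100)
        # The real benchmark toggles Numba here (e.g. az.Numba.disable_numba()).
        self.numba_flag = numba_flag

    def time_variance_2d(self, numba_flag):
        # Stand-in for the ArviZ variance function being benchmarked.
        np.var(self.data, axis=1)
```

Reducing the data keeps the whole suite fast, so raising the timeout is probably better kept for cases where the large input is the point of the benchmark.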

aloctavodia · 2020-04-08T17:53:34Z

What about having a numerics.py (or some other name), and put there histogram, _fast_kde, _fast_kde_2d, get_bins and _sturges_formula?

OriolAbril

LGTM

I proposed some minor changes to try and make benchmarks more informative.

OriolAbril · 2020-04-11T14:20:18Z

asv_benchmarks/benchmarks/benchmarks.py

+        else:
+            az.Numba.disable_numba()
+
+    def time_variance(self, numba_flag):


we should probably name this one time_variance_2d

OriolAbril · 2020-04-11T14:40:09Z

asv_benchmarks/benchmarks/benchmarks.py

+        _fast_kde(self.x)
+
+class Fast_KDE_2d:
+    params = [(True, False), (10**5, 10**6)]


I have realized that with these shapes there is no speedup when using numba. I checked whether the same thing had happened as with fast_kde_1d, where the speedup was lost in a later PR, and it looks like the speed-up is still there, but it is only seen for larger sizes.

Could we try something like this and see how the dimensions affect performance?

class Fast_KDE_2d:
    params = [
        (True, False),
        ((100, 10**4), (10**4, 100), (1000, 1000))
    ]
    param_names = ("Numba", "shape")

    def setup(self, numba_flag, shape):
        self.x = np.random.randn(*shape)
        self.y = np.random.randn(n//10, 10)
        if numba_flag:
            ...
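For reference, a sketch of the parameter combinations this grid produces. asv runs a benchmark once per element of the Cartesian product of `params`; the loop below only enumerates those combinations with `itertools.product` and does not call asv.

```python
from itertools import product

params = [
    (True, False),
    ((100, 10**4), (10**4, 100), (1000, 1000)),
]
param_names = ("Numba", "shape")

# One (numba_flag, shape) pair per benchmark run.
combinations = list(product(*params))
for numba_flag, shape in combinations:
    print(dict(zip(param_names, (numba_flag, shape))))
```

All three shapes hold 10**6 elements, so the grid varies the array layout while keeping the total amount of data fixed.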

Ooh, I'll implement this.

Also, in #1088 you mentioned that more functions need to be covered by the benchmarks, and that memory benchmarks and line profiling should be added later on. Should I start adding all this now in this PR, or sometime later?

Maybe it's better to wait for another PR

I have realized that with these shapes there is no speedup when using numba. I checked whether the same thing had happened as with fast_kde_1d, where the speedup was lost in a later PR, and it looks like the speed-up is still there, but it is only seen for larger sizes.

Could we try something like this and see how the dimensions affect performance?

class Fast_KDE_2d:
    params = [
        (True, False),
        ((100, 10**4), (10**4, 100), (1000, 1000))
    ]
    param_names = ("Numba", "shape")

    def setup(self, numba_flag, shape):
        self.x = np.random.randn(*shape)
        self.y = np.random.randn(n//10, 10)
        if numba_flag:
            ...

It gives the following error:

ValueError: all the input array dimensions for the concatenation axis must match exactly, but along dimension 1, the array at index 0 has size 1000000 and the array at index 1 has size 100

Do x and y have to be of the same size always?

Also, I did self.y = np.random.randn(shape[0]//10, 10) instead of self.y = np.random.randn(n//10, 10). Is this what you meant?

I forgot to update y; it should be created like x, since they must have the same shape. Also, there should be no n instances anymore.

Yup, I had removed n. The only problem was the different sizes of x and y. You'll see after the tests finish that there's still no speed-up :(
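The shape mismatch can be reproduced in isolation. This is a sketch under an assumption: the thread does not show how _fast_kde_2d combines its inputs, so `stack_flat` below is a hypothetical stand-in that stacks the two flattened arrays, which is enough to trigger the same kind of `ValueError`. The shape is also smaller than in the benchmark to keep the example quick.

```python
import numpy as np

shape = (100, 100)
x = np.random.randn(*shape)
y_mismatched = np.random.randn(shape[0] // 10, 10)  # far fewer elements than x
y_matched = np.random.randn(*shape)                 # created like x


def stack_flat(a, b):
    """Stand-in for a 2D KDE's input handling: stack two flattened arrays."""
    return np.vstack([a.ravel(), b.ravel()])


try:
    stack_flat(x, y_mismatched)
    mismatch_error = None
except ValueError as err:
    mismatch_error = err

stacked = stack_flat(x, y_matched)
print(type(mismatch_error).__name__, stacked.shape)
```

Creating y with `np.random.randn(*shape)`, exactly like x, guarantees the two arrays always have the same number of elements for every shape in the grid.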

OriolAbril · 2020-04-12T14:06:13Z

I was thinking that benchmarks should also be formatted with black. I think 3 lines have to be changed

black arviz/ examples/ asv_benchmarks/ in contributing guide (in PR checklist)
python -m black --check arviz examples asv_benchmarks
similarly in

arviz/scripts/lint.sh

Line 12 in 3352a8a

python -m black -l 100 --check ${SRC_DIR}/arviz/ ${SRC_DIR}/examples/

OriolAbril · 2020-04-12T15:41:34Z

scripts/lint.sh

 python -m black -l 100 --check ${SRC_DIR}/arviz/ ${SRC_DIR}/examples/
+=======
+python -m black -l 100 --check ${SRC_DIR}/arviz/ ${SRC_DIR}/examples/ ${SRC_DIR}/asv_benchmarks/
+>>>>>>> update fast_kde_2d benchmark and changelog


there has been some merge issue
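Leftover markers like the ones in this hunk are easy to catch mechanically. The following is a minimal sketch, not part of the ArviZ tooling: `find_conflict_markers` is a hypothetical helper, and its check is deliberately simple, so a legitimate line made only of `=======` would also be flagged.

```python
def find_conflict_markers(text):
    """Return 1-based line numbers that look like Git conflict markers."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        if line.startswith(("<<<<<<<", ">>>>>>>")) or line.rstrip() == "=======":
            hits.append(number)
    return hits


# The lines from the hunk above, without the diff prefixes.
sample = "\n".join([
    "python -m black -l 100 --check ${SRC_DIR}/arviz/ ${SRC_DIR}/examples/",
    "=======",
    "python -m black -l 100 --check ${SRC_DIR}/arviz/ ${SRC_DIR}/examples/ ${SRC_DIR}/asv_benchmarks/",
    ">>>>>>> update fast_kde_2d benchmark and changelog",
])
print(find_conflict_markers(sample))
```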

OriolAbril mentioned this pull request Apr 7, 2020: [WIP] Benchmark main ArviZ code #1088 (Closed)

OriolAbril reviewed Apr 8, 2020

nitishp25 requested a review from OriolAbril April 10, 2020 10:12

OriolAbril reviewed Apr 11, 2020

OriolAbril and others added 18 commits April 12, 2020 20:53

4081b5b start rewriting benchmarks
a32d31c azure update
4b4cefa fix typo
90bbe43 git use jitted histogram again in fast_kde
d094b1a continue benchmark rewrite
446d936 move plot_kde and get_bins
0f01cf9 add kde_utils
9bd8fd9 modify benchmarks
b38619a minor changes
9ea2f95 minor modifications to benchmarks
40f844c remove comma
5662efb move get_coords to utils
08ca77b move _sturges_formula
17de0ba fix pydocstyle
e9adf6c black changes
bc77761 fix benchmarks
a1a0f27 create numerical_utils
e5414b5 update fast_kde_2d benchmark and changelog

nitishp25 force-pushed the benchmarks branch from 9319999 to e5414b5 Compare April 12, 2020 15:24

OriolAbril reviewed Apr 12, 2020

7e4e638 update lint.sh

ahartikainen approved these changes Apr 12, 2020

ahartikainen merged commit 8d11040 into arviz-devs:master Apr 12, 2020
