fill() performance #57

belm0 · 2019-04-15T07:40:39Z

For the use case of collecting execution time samples during a program run (and ultimately reporting quantiles), I'd like fill() to be fairly fast.

physt fill() execution time seems to be independent of binning strategy (my tests trivially use a constant data value). I was surprised that bin search is implemented via np.searchsorted() in all cases, even for fixed_width binning.

$ python -m timeit -s 'from physt import h1; h = h1(None, "exponential", 100, range=(1e-6, 1))' 'h.fill(.1)'
10000 loops, best of 5: 38 usec per loop

$ python -m timeit -s 'from physt import h1; h = h1(None, "fixed_width", .01, range=(0, .5))' 'h.fill(.1)'
10000 loops, best of 5: 36.9 usec per loop

Comparing to (unmaintained) https://github.com/carsonfarmer/streamhist:

python -m timeit -s 'from streamhist import StreamHist; h = StreamHist()' 'h.update(.1)'
50000 loops, best of 5: 7.52 usec per loop

(aside: streamhist is quite nice about managing binning and being able to report arbitrary quantiles. Perhaps some of it could be adopted.)


janpipek · 2019-04-15T10:32:57Z

This can be solved by moving find_bin from Histogram1D to BinningBase (and including more efficient variants in daughter classes).
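As a rough illustration of what a "more efficient variant" could look like for fixed-width bins, here is a minimal sketch. The class and method names are hypothetical and are not physt's actual API; the standard-library `bisect` stands in for the generic sorted-edge search that `np.searchsorted()` currently performs, and the arithmetic variant computes the bin index directly from the bin width.

```python
import bisect
import math


class FixedWidthBinningSketch:
    """Hypothetical stand-in for a fixed-width binning class (not physt's)."""

    def __init__(self, start, bin_width, bin_count):
        self.start = start
        self.bin_width = bin_width
        self.bin_count = bin_count
        # Explicit edges are only needed by the generic search below.
        self.edges = [start + i * bin_width for i in range(bin_count + 1)]

    def find_bin_generic(self, value):
        """Generic lookup: binary search over the sorted edges, O(log n)."""
        ix = bisect.bisect_right(self.edges, value) - 1
        if ix < 0 or ix >= self.bin_count:
            return None  # underflow or overflow
        return ix

    def find_bin_arithmetic(self, value):
        """Fixed-width lookup: index computed directly, O(1)."""
        ix = math.floor((value - self.start) / self.bin_width)
        if ix < 0 or ix >= self.bin_count:
            return None  # underflow or overflow
        return ix


binning = FixedWidthBinningSketch(start=0.0, bin_width=0.01, bin_count=50)
print(binning.find_bin_generic(0.105), binning.find_bin_arithmetic(0.105))
```

Both variants should agree for values inside a bin; values lying exactly on an edge may land in neighbouring bins because of floating-point rounding, and NaN is not handled here. A real implementation would need to decide how to treat those cases.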

belm0 · 2019-04-16T00:55:05Z

I don't believe that StreamHist uses fixed-width bins, yet it still manages a roughly 5x faster update in pure Python. The README credits https://github.com/grantjenks/sorted_containers, if I understand correctly.

janpipek · 2019-04-16T13:21:32Z

I probably won't be able to do a significant refactoring soon, but in any case I'd recommend using the "fill_n" method if you can.

In [26]: data = np.random.randn(100000)

In [27]: HA = physt.h1(None, "fixed_width", 0.1, adaptive=True)

In [28]: %time for d in data: HA.fill(d)
CPU times: user 3.86 s, sys: 56.4 ms, total: 3.92 s
Wall time: 3.84 s

In [29]: HA = physt.h1(None, "fixed_width", 0.1, adaptive=True)

In [30]: %time HA.fill_n(data)
CPU times: user 16.2 ms, sys: 4.01 ms, total: 20.2 ms
Wall time: 19.2 ms

Or, more realistically (simulating that the data come from somewhere one by one):

In [48]: HA = physt.h1(None, "fixed_width", 0.1, adaptive=True)

In [49]: %time l = []; [l.append(i) for i in data]; HA.fill_n(l)
CPU times: user 36.9 ms, sys: 4.04 ms, total: 40.9 ms
Wall time: 40.3 ms
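The collect-then-batch pattern above could be wrapped in a small buffer, roughly as sketched here. `BufferedFiller`, `flush_fn` and `max_size` are hypothetical names for illustration, not part of physt; the idea is that `flush_fn` would be something like `HA.fill_n`.

```python
from typing import Callable, List


class BufferedFiller:
    """Hypothetical helper: collect single values, hand them over in batches."""

    def __init__(self, flush_fn: Callable[[List[float]], None], max_size: int = 1000):
        self._flush_fn = flush_fn  # e.g. a histogram's batch-fill method
        self._max_size = max_size
        self._buffer: List[float] = []

    def add(self, value: float) -> None:
        """Store one value; flush automatically once the buffer is full."""
        self._buffer.append(value)
        if len(self._buffer) >= self._max_size:
            self.flush()

    def flush(self) -> None:
        """Pass any buffered values to flush_fn and start a fresh buffer."""
        if self._buffer:
            self._flush_fn(self._buffer)
            self._buffer = []


# Usage with a plain list as the sink, so the sketch runs without physt.
batches: List[List[float]] = []
filler = BufferedFiller(batches.append, max_size=3)
for value in [1.0, 2.0, 3.0, 4.0]:
    filler.add(value)
filler.flush()  # push the remaining partial batch
print(batches)
```

Note that this moves the cost rather than removing it: each `add()` is cheap, but every flush does a whole batch of work at once.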

belm0 · 2019-04-16T14:16:10Z

My use case is real time, and spikes from fill_n() batches would be unwanted. Also fill_n() is very slow for small arrays (probably because numpy is).

python -m timeit -s 'from physt import h1; h = h1(None, "fixed_width", .01, range=(.0, .5))' 'h.fill(.1)'
10000 loops, best of 5: 35.3 usec per loop

python -m timeit -s 'from physt import h1; h = h1(None, "fixed_width", .01, range=(0, .5)); d=[.1]*10' 'h.fill_n(d)'
500 loops, best of 5: 740 usec per loop

janpipek · 2019-04-17T08:43:39Z

Ok, I'll try to optimize the single-value fill soon-ish.

belm0 · 2019-04-25T13:16:28Z

Here is a fairer timing of streamhist. Since my previous test filled with a constant value, the compute-intensive merging of bins was never triggered.

$ python -m timeit -s 'from random import random; from streamhist import StreamHist; h = StreamHist();' 'h.update(random())'
10000 loops, best of 5: 35.4 usec per loop

That result is with just the default max bin count of 64. It gets worse quickly as the max bin count is increased. (Note: the overhead of random() is negligible, about 50 ns.)

However, physt is not off the hook so easily... I have an implementation working at 12 usec for the same max bin count. More at #58 (comment).

janpipek self-assigned this Apr 15, 2019

janpipek added the performance label Apr 15, 2019
