[4.1 Introduction]: why add_python is faster than add_numpy for vectorization add #74
In my test, add_python is faster than add_numpy for vectorization add.
Interesting. I re-tested it using Python 3.7 and I got:
|
Same here, using standard Python arrays (Python 3.7, macOS Mojave):
Using np.arrays instead, the timings change in an interesting way:
Interestingly, the Python call overhead really starts to show in micro-benchmarks like this. So to summarize:
I'd say that is about as expected, so maybe that is what should be compared in the example instead of trying to do both compute paths with native python lists first? |
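A minimal sketch of the comparison suggested above. It assumes the book's functions look roughly like a list comprehension over `zip` for `add_python` and a call to `np.add` for `add_numpy`; those definitions are not quoted in this thread, so treat them, the `best_time` helper, and the input size as illustrative assumptions. The sketch times `add_numpy` once on plain lists, where NumPy has to convert the inputs on every call, and once on arrays built ahead of time.

```python
import random
import timeit

import numpy as np


# Assumed definitions; the thread does not quote the book's versions.
def add_python(Z1, Z2):
    return [z1 + z2 for z1, z2 in zip(Z1, Z2)]


def add_numpy(Z1, Z2):
    return np.add(Z1, Z2)


def best_time(func, *args, repeat=3, number=5):
    """Return the best per-call time in seconds over `repeat` runs."""
    timer = timeit.Timer(lambda: func(*args))
    return min(timer.repeat(repeat=repeat, number=number)) / number


n = 100_000  # arbitrary size, chosen only for illustration
Z1 = random.sample(range(n), n)
Z2 = random.sample(range(n), n)
Z1_np, Z2_np = np.array(Z1), np.array(Z2)  # converted once, outside the timing

timings = {
    "add_python, lists": best_time(add_python, Z1, Z2),
    "add_numpy, lists": best_time(add_numpy, Z1, Z2),      # includes list -> array conversion
    "add_numpy, arrays": best_time(add_numpy, Z1_np, Z2_np),
}
for label, seconds in timings.items():
    print(f"{label:>18}: {seconds * 1e3:.3f} ms")
```

The absolute numbers depend on the machine and on the Python and NumPy versions, so only the relative ordering on a single setup is meaningful.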
I'd say the examples are just way too small to make the differences really visible. When upscaling the input a bit, I get this:
|
Nice. Could you make a PR for the book? |
Sure, but it will probably take me until Christmas time. |
(Also my english is shit, so you will have to improve that probably. Sorry) |
Mine is the same, not sure I can correct :) |
Hi @dwt, getting similar results. Can you explain why this is about as expected (due to recent python optimizations on arrays)? |
The way I think about it, a numpy operation has three parts: switching from the Python layer to the C layer, doing the actual computation, and then switching back to Python. The actual computation is pretty much always faster in C than the same computation in Python. BUT if the switches take more time than you save by doing the computation faster, then the pure Python solution can still be faster. This is why larger lists / arrays / vectors make the switch to C more worthwhile: the savings in the computation can dominate the cost of switching to the C layer. |
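The three-part picture above can be checked with a small size sweep. This is a sketch under assumptions: `add_python` and `add_numpy` are stand-ins for the book's functions, `per_call` is a hypothetical helper, and the sizes are arbitrary. The inputs are converted to arrays ahead of time so that only the call overhead and the computation itself are measured.

```python
import timeit

import numpy as np


# Assumed stand-ins for the book's functions.
def add_python(Z1, Z2):
    return [z1 + z2 for z1, z2 in zip(Z1, Z2)]


def add_numpy(Z1, Z2):
    return np.add(Z1, Z2)


def per_call(func, *args, number=200):
    """Best per-call time in seconds over 3 repeats of `number` calls."""
    timer = timeit.Timer(lambda: func(*args))
    return min(timer.repeat(repeat=3, number=number)) / number


rows = []
for n in (1, 10, 100, 1_000, 10_000):
    Z1, Z2 = list(range(n)), list(range(n, 2 * n))
    Z1_np, Z2_np = np.array(Z1), np.array(Z2)  # conversion kept out of the timing
    t_py = per_call(add_python, Z1, Z2)
    t_np = per_call(add_numpy, Z1_np, Z2_np)
    rows.append((n, t_py, t_np))
    print(f"n={n:>6}  python={t_py * 1e6:9.2f} us  numpy={t_np * 1e6:9.2f} us")
```

For very small `n` the fixed cost of the NumPy call tends to dominate, while for large `n` the per-element cost of the Python loop does. Where the crossover falls depends on the machine and the Python and NumPy versions.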
Thank you for the explanation! |
I've been playing around with this more today, and it seems that most of the time the python version is faster. My assumption is that addition is already fairly heavily optimized in python, leaving the time dominated by the numpy overhead.
vec_length = 1_000_000
Z1, Z2 = random.sample(range(vec_length), vec_length), random.sample(range(vec_length), vec_length)
# %timeit add_python(Z1, Z2)
# 253 ms ± 4.55 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
# %timeit add_numpy(Z1, Z2)
# 501 ms ± 19 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
I got similar results at different sizes. It might be worth swapping out this example for something more convoluted to make a point:
def add_python(Z1, Z2):
    return [((z1**2 + z2**2)**0.5) + ((z1 + z2)**3) for z1, z2 in zip(Z1, Z2)]
def add_numpy(Z1, Z2):
    return np.sqrt(Z1**2 + Z2**2) + (Z1 + Z2)**3
vec_length = 1_000_000
Z1, Z2 = random.sample(range(vec_length), vec_length), random.sample(range(vec_length), vec_length)
Z1_np, Z2_np = np.array(Z1, dtype=np.float64), np.array(Z2, dtype=np.float64)
%timeit add_python(Z1, Z2)
# 665 ms ± 20.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit add_numpy(Z1_np, Z2_np)
# 54.2 ms ± 2.7 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
|
I tried again with the simple add version and 1,000,000 elements, and I get:
%timeit add_python(Z1, Z2)
54.6 ms ± 331 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit add_numpy(Z1_np, Z2_np)
645 µs ± 3.91 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
|
Interesting. My example is running on Python 3.11, Windows 10, and NumPy 1.24.3. Your results not only show a much clearer difference, they are also much faster overall. |
OSX, macbook M1, Python 3.11, Numpy 1.26.0 |
I reached the opposite conclusion when running the example code in 4.1 Introduction. The following are my results, tested in IPython 6.4.0 with Python 3.6.5 and NumPy 1.14.3: