-
-
Notifications
You must be signed in to change notification settings - Fork 338
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
anatomy of an array: multidimensional array start and stop from view #108
Comments
Good question and I don't remember exactly :) I think the last -1 is to have the offset stop just at the last start of last byte and not the byte after. This would need to be tested with a very simple array and check for the value of the two bounds. |
Many thanks for your reply. Your online book has been a great help; I
realise that I don't even know 10% of numpy!
I am trying to demonstrate that if we reshape an array it is faster to go
through its elements by preserving index
precedence. Unfortunately I am not having much luck and performance is
pretty much equal (I have copied the code below). Is there some other way
to show that index precedence matters for performance?
```
import numpy as np
import timeit
a = np.arange(1,37).astype(np.int16).reshape((3,2,2,3))
def without_precedence(a):
sum=0
n=0
for l in range(a.shape[3]):
for k in range(a.shape[2]):
for j in range(a.shape[1]):
for i in range(a.shape[0]):
sum+=a[i,j,k,l]
n=n+1
return sum, n
def with_precedence(a):
sum=0
n=0
for i in range(a.shape[0]):
for j in range(a.shape[1]):
for k in range(a.shape[2]):
for l in range(a.shape[3]):
sum+=a[i,j,k,l]
n=n+1
return sum, n
if __name__ == "__main__":
print(timeit.timeit("with_precedence(a)", "from __main__ import with_precedence, a",number=100000))
print(timeit.timeit("without_precedence(a)","from __main__ import without_precedence, a",number=100000))
```
…On Thu, 2 Feb 2023 at 12:05, Nicolas P. Rougier ***@***.***> wrote:
Good question and I don't remember exactly :) I think the last -1 is to
have the offset stop just at the last start of last byte and not the byte
after. This would need to be tested with a very simple array and check for
the value of the two bounds.
—
Reply to this email directly, view it on GitHub
<#108 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/ATE2KDUH5SLAMHGMXRGFJQLWVOPHTANCNFSM6AAAAAATZV5JWA>
.
You are receiving this because you authored the thread.Message ID:
***@***.***>
|
Why do you think it would make a difference? |
I am thinking in terms of latency and the way cache memory operates in the
CPU. Since the array is stored as a one-dimensional block of memory, if the
program has to use a stride of multiple bytes to move along
the axis=0 direction, it will have to restart the read operation which
typically requires more clock cycles. The nice thing about the axis=1
direction is that as soon as the program starts reading values every next
memory block is consecutive so there is no latency. Then there is the cache
size (this effect is more debatable); for a level2 cache and a large array
above its size, the system stores successive blocks of the array and again
if I use a stride of a few thousand bytes to move along axis=0 I will not
be able to utilize the values stored in the more efficient level2 cache.
This will generate a cache miss and the system will have to retrieve values
from less efficient memory. By the way, I realise that my theory might be
missing additional enhancements from SIMD that summing along axis=0
generates (I have no idea if numpy actually does this).
…On Fri, 3 Feb 2023 at 21:04, Nicolas P. Rougier ***@***.***> wrote:
Why do you think it would make a difference?
—
Reply to this email directly, view it on GitHub
<#108 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/ATE2KDVXZFY2W7AIIGXZXDTWVVXEXANCNFSM6AAAAAATZV5JWA>
.
You are receiving this because you authored the thread.Message ID:
***@***.***>
|
Oh right, cache! I'm not sure the difference is measurable since python loops would eat most of the CPU time. You might have a better chance with numba that will reduce Python overcost. Or you can try to use nditer. |
At the end of the section "Anatomy of an array" you give an example of how given a view of a multidimensional array we can infer the start:stop:step structure of each index. I have compiled the code and it works fine. But there is a line of code that I have trouble following:
offset_stop = (np.byte_bounds(view)[-1] - np.byte_bounds(base)[-1]-1)//itemsize
Following the one-dimensional array example we are trying to find the offset at the end of the two arrays ; for this reason we use the address of the last byte in the view array minus the address of the last byte in the base array. This difference is converted into a difference in terms of memory blocks by dividing with itemsize.
However there is a '-1' in the expression; initially I thought it was a mistake but if I remove it the code does not work. Still I cannot make sense of it since it does not correspond to an offset in memory blocks. I am sure I am missing something trivial that would be immediately recognized by someone with more experience. Would you be able to help?
The text was updated successfully, but these errors were encountered: