C++ refactoring: sort #1072
Conversation
Two new kernels have been added that may replace the old ones, because some of the parameters are no longer needed once the type is known. Actually, there could be just one kernel if a `ListArray` were converted to a `ListOffsetArray`.

Here is a one-dimensional case:

```python
nparray = np.random.rand(1000000)
v2_array = ak._v2.contents.NumpyArray(nparray)
```

```
In [9]: %timeit v2_array.sort()
18.5 ms ± 310 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [10]: %timeit nparray.sort()
17.3 ms ± 297 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
```

The same for another one-dimensional case:

```
In [10]: %timeit v2_array.sort()
1.26 s ± 11.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [11]: %timeit nparray.sort()
17.3 ms ± 127 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
```

Using a non-contiguous (strided) array:

```
In [3]: matrix = np.arange(64).reshape(8, -1)

In [4]: v2_array = ak._v2.contents.NumpyArray(matrix[:, 0])

In [5]: nparray = matrix[:, 0]

In [11]: %timeit v2_array.sort()
1.05 ms ± 10.4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [12]: %timeit nparray.sort()
249 ns ± 1.56 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
```
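The idea of collapsing the two layouts into one kernel can be sketched in plain NumPy. This is a hypothetical illustration, not the PR's code: `to_listoffset` is an invented name, and awkward's actual conversion routines differ. The point is that a `ListArray` carries independent `starts`/`stops` pointing anywhere into `content`, while a `ListOffsetArray` needs one monotonically increasing `offsets` buffer, so compacting the content makes one kernel signature enough.

```python
import numpy as np

# Hypothetical sketch (not the PR's implementation): rewrite a
# ListArray's (starts, stops, content) into the equivalent
# ListOffsetArray's (offsets, content), copying each list into a
# contiguous buffer in list order.
def to_listoffset(starts, stops, content):
    counts = stops - starts
    offsets = np.zeros(len(starts) + 1, dtype=np.int64)
    np.cumsum(counts, out=offsets[1:])
    pieces = [content[a:b] for a, b in zip(starts, stops)]
    new_content = np.concatenate(pieces) if pieces else content[:0]
    return offsets, new_content
```

After this conversion, a sort kernel only needs `offsets` and `content`; it no longer has to handle separate `starts` and `stops` parameters.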
It's not a question of how fast NumPy can sort one large array; it's a question of how fast it can sort all the little arrays in a ListArray/ListOffsetArray. For instance, this would be a good measure of wall time:

```python
import time
import numpy as np

def test(average_size):
    counts = np.random.poisson(average_size, 1000000)
    offsets = np.empty(1000000 + 1, np.int64)
    offsets[0] = 0
    offsets[1:] = np.cumsum(counts)
    content = np.random.normal(0, 1, offsets[-1])
    starts, stops = offsets[:-1], offsets[1:]
    starttime = time.time()
    for i in range(1000000):
        content[starts[i]:stops[i]].sort()
    return time.time() - starttime
```

This sorts a ListOffsetArray of length 1000000 that contains irregular-length lists of average size `average_size`:

```
>>> test(10.0)
0.5590884685516357
```

but some use cases are going to involve very short lists:

```
>>> test(10.0)
0.5590884685516357
>>> test(3.0)
0.43043017387390137
>>> test(2.0)
0.3896183967590332
>>> test(1.5)
0.3658483028411865
>>> test(1.0)
0.33797550201416016
>>> test(0.5)
0.30232715606689453
>>> test(0.1)
0.2877922058105469
>>> test(0.01)
0.2890009880065918
```

In the last case, very few of the lists have even 1 element, let alone the 2 elements at which sorting starts to be meaningful. That's why I suggested the mask:

```python
def test2(average_size):
    counts = np.random.poisson(average_size, 1000000)
    offsets = np.empty(1000000 + 1, np.int64)
    offsets[0] = 0
    offsets[1:] = np.cumsum(counts)
    content = np.random.normal(0, 1, offsets[-1])
    starts, stops = offsets[:-1], offsets[1:]
    # computing the mask counts toward the total time
    starttime = time.time()
    mask = stops - starts >= 2
    starts = starts[mask]
    stops = stops[mask]
    for i in range(len(starts)):
        content[starts[i]:stops[i]].sort()
    return time.time() - starttime
```

Now, rather than Python spinning its wheels over a bunch of nearly empty lists, it just skips them, and short lists become a good thing, rather than a bad thing:

```
>>> test2(10.0)
0.5271399021148682
>>> test2(3.0)
0.37287163734436035
>>> test2(2.0)
0.2766532897949219
>>> test2(1.5)
0.20734500885009766
>>> test2(1.0)
0.12771391868591309
>>> test2(0.5)
0.045095205307006836
>>> test2(0.1)
0.004815340042114258
>>> test2(0.01)
0.0021109580993652344
```

I was also thinking of hard-coding a special kernel for the length-2 lists:

```python
starts, stops = offsets[:-1], offsets[1:]
starts_2 = starts[stops - starts == 2]
mask = stops - starts > 2
starts = starts[mask]
stops = stops[mask]
nplike["awkward_sort_length_2_kernel"](content, starts_2, len(starts_2))
for i in range(len(starts)):
    content[starts[i]:stops[i]].sort()
```

where the kernel would be

```cpp
for (int64_t i = 0; i < len_starts_2; i++) {
  T content_0 = content[starts_2[i]];
  T content_1 = content[starts_2[i] + 1];
  if (content_0 > content_1) {
    content[starts_2[i]] = content_1;
    content[starts_2[i] + 1] = content_0;
  }
}
```

A compiler would love to optimize that kernel, and I think the optimization of the many-small-sortings problem hasn't been investigated, certainly not as much as sorting one big array. The closest thing NumPy has to this is sorting along an axis, which is not the same because those list sizes are regular, not variable. For this case, however, NumPy is 5‒40× faster than iteration:

```python
def test3(exact_size):
    content = np.random.normal(0, 1, 1000000 * exact_size)
    offsets = np.arange(0, 1000000 * exact_size, exact_size)
    starts, stops = offsets[:-1], offsets[1:]
    starttime = time.time()
    for i in range(len(starts)):
        content[starts[i]:stops[i]].sort()
    return time.time() - starttime

def test3_numpy(exact_size):
    content = np.random.normal(0, 1, (1000000, exact_size))
    starttime = time.time()
    content.sort(axis=1)
    return time.time() - starttime
```

```
>>> test3(10)
0.5516564846038818
>>> test3_numpy(10)
0.09245777130126953
>>> test3(4)
0.49339795112609863
>>> test3_numpy(4)
0.02732539176940918
>>> test3(3)
0.4491863250732422
>>> test3_numpy(3)
0.018769264221191406
>>> test3(2)
0.4407010078430176
>>> test3_numpy(2)
0.011630535125732422
```

It suggests that even though iteration in Python can be surprisingly fast sometimes, it's still dominating over the actual sorting of data by an order of magnitude. They don't start to get close until the list sizes get to be about 100 or so:

```
>>> test3(100)
2.723106622695923
>>> test3_numpy(100)
2.1994996070861816
```

including for irregularly sized lists:

```
>>> test(100)
2.717956304550171
```

So any ListArray/ListOffsetArray sorting based on Python iteration will spend much more time going through Python overhead than actually sorting if the user's lists are much shorter than 100, and that is typical for physics cases, which often enough have average sizes of 0.1, 1.0, 2.0, up to maybe 10.0. So Python iteration, using NumPy's built-in sorting, isn't a good idea.
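The masking and length-2 ideas above can be combined into a runnable NumPy prototype. This is only a sketch: `sort_lists` is an invented name, and the vectorized compare-and-swap stands in for the suggested (not yet existing) `awkward_sort_length_2_kernel`.

```python
import numpy as np

def sort_lists(content, offsets):
    # Sketch of the strategy discussed above (not the PR's code):
    # skip lists shorter than 2, handle all length-2 lists with one
    # vectorized compare-and-swap, and fall back to a per-list sort
    # only for the longer lists.
    content = content.copy()
    starts, stops = offsets[:-1], offsets[1:]
    counts = stops - starts

    # vectorized compare-and-swap for every length-2 list at once
    starts_2 = starts[counts == 2]
    lo = content[starts_2]
    hi = content[starts_2 + 1]
    swap = lo > hi
    content[starts_2[swap]] = hi[swap]
    content[starts_2[swap] + 1] = lo[swap]

    # Python-level loop only for lists of length 3 or more
    for a, b in zip(starts[counts > 2], stops[counts > 2]):
        content[a:b].sort()
    return content
```

For typical physics-sized lists (average length well under 10), almost all of the work then happens in the vectorized branch rather than in the Python loop.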
@jpivarski - I get the following error on builds with Python 2.7:

```
self = <ListArray len='3'>
    <starts><Index dtype='int64' len='3'>[0 3 5]</Index></...]
    </NumpyArray></content>
    </ListOffsetArray></content>
</ListArray>
offsets = <Index dtype='int64' len='4'>[0 3 3 5]</Index>

    def _broadcast_tooffsets64(self, offsets):
>       return ListOffsetArray._broadcast_tooffsets64(self, offsets)
E       TypeError: unbound method _broadcast_tooffsets64() must be called with ListOffsetArray instance as first argument (got ListArray instance instead)

C:\hostedtoolcache\windows\Python\2.7.18\x86\lib\site-packages\awkward\_v2\contents\listarray.py:184: TypeError
```

Shall I ignore it or fix it? Thanks
You can copy the blanket. On the other hand, I know what it is: in Python 3, methods are just plain functions, which we use to consolidate implementations so that ListOffsetArray can just use ListArray's implementation of some methods (also for methods common to Content and Form). Python 2 instead wraps them as unbound methods that type-check their first argument. This could be handled more carefully in Python 2 by constructing an instance of the class whose method is being borrowed.
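The difference can be shown with a minimal sketch (hypothetical classes, not awkward's). In Python 3, a function looked up on a class is a plain function, so it accepts any first argument; Python 2 wrapped it as an unbound method that type-checked `self`, which is exactly the `TypeError` reported above.

```python
# Hypothetical minimal reproduction of the Python 2 failure mode
# (invented classes, not awkward's ListArray/ListOffsetArray).
class Shared:
    def describe(self):
        # works with any object that has a `.length` attribute
        return "length=" + str(self.length)

class Other:
    def __init__(self, length):
        self.length = length

# Python 3: Shared.describe is a plain function, so calling it with an
# Other instance is fine.
print(Shared.describe(Other(3)))  # prints "length=3"

# Python 2 would instead raise:
# TypeError: unbound method describe() must be called with Shared
# instance as first argument (got Other instance instead)
```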
build on MacOS with Python 2.7 took longer than expected :-(
I've just re-run it and it looks fine
```python
# FIXME: AttributeError: 'ListOffsetArray' object has no attribute 'mask'
# v2_array = v1_to_v2(v1_array.layout)
# assert v2_array.mask[is_valid].tolist() == [[0, 1, 2, None], [None, None, None, 2, 1]]
#
# assert ak.to_list(v2_array.mask[is_valid]) == [
#     [0, 1, 2, None],
#     [1, 2, None, None, None],
# ]
```
So that we don't forget about it, I'm calling this out here: it looks like a slicing bug unrelated to sorting.
BTW, I'm looking into an `argsort` now and in another test I'm running into the following FIXME:

```
    # FIXME: might infinite-loop before simplify_optiontype is implemented
>   return prepare_tuple_nested(next.simplify_optiontype())
E   UnboundLocalError: local variable 'next' referenced before assignment

awkward/_v2/_slicing.py:218: UnboundLocalError
```

The test is:

```python
assert ak.to_list(array[ak.argsort(array, axis=1)]) == ak.to_list(
    ak.sort(array, axis=1)
)
```
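The identity this test asserts (gathering with argsort indices reproduces the sorted array) can be illustrated in plain NumPy. This is only an analogy for regular arrays, not the awkward v2 code path being debugged:

```python
import numpy as np

# Illustration only: applying the indices from argsort along an axis
# reproduces sorting along that axis.
array = np.array([[3.3, 2.2, 1.1],
                  [5.5, 4.4, 6.6]])
indices = np.argsort(array, axis=1)
gathered = np.take_along_axis(array, indices, axis=1)
assert np.array_equal(gathered, np.sort(array, axis=1))
```

In awkward, `array[ak.argsort(array, axis=1)]` plays the role of `take_along_axis` here, gathering within each variable-length list.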
It is v2; I've added the following to convert it - not sure if it's a good idea though:

```diff
diff --git a/src/awkward/operations/convert.py b/src/awkward/operations/convert.py
index 3a6983b3..17f73533 100644
--- a/src/awkward/operations/convert.py
+++ b/src/awkward/operations/convert.py
@@ -1920,6 +1920,10 @@ def to_layout(
     elif isinstance(array, Iterable):
         return from_iter(array, highlevel=False)
 
+    # FIXME?
+    elif isinstance(array, ak._v2.contents.Content):
+        return array
+
     elif not allow_other:
         raise TypeError(
             "{0} cannot be converted into an Awkward Array".format(array)
```
This looks great!
I suppose there are old kernels associated with the original implementation of sorting. We'll find those when we scan through for kernels that aren't called by any v2 code (and anyway, we can't get rid of them now because v1 still needs to work!).
(Sorry that I didn't get a chance to review this before releasing 1.5.0, though it probably doesn't matter if v2 code gets into an official release or not...)