C++ refactoring: sort #1072
Conversation
Two new kernels have been added that may replace the old ones, because some of the parameters are no longer needed once the type is known. Actually, there could be just one kernel if a `ListArray` were converted to a `ListOffsetArray`.

Here is a one-dimensional case:

```python
nparray = np.random.rand(1000000)
v2_array = ak._v2.contents.NumpyArray(nparray)
```

```
In [9]: %timeit v2_array.sort()
18.5 ms ± 310 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [10]: %timeit nparray.sort()
17.3 ms ± 297 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
```

The same for another one-dimensional case:

```
In [10]: %timeit v2_array.sort()
1.26 s ± 11.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [11]: %timeit nparray.sort()
17.3 ms ± 127 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
```

Using a non-contiguous (strided) array:

```
In [3]: matrix = np.arange(64).reshape(8, -1)

In [4]: v2_array = ak._v2.contents.NumpyArray(matrix[:, 0])

In [5]: nparray = matrix[:, 0]

In [11]: %timeit v2_array.sort()
1.05 ms ± 10.4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [12]: %timeit nparray.sort()
249 ns ± 1.56 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
```
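The idea of collapsing the two layouts into one kernel can be sketched in plain NumPy. This is a hypothetical illustration, not the PR's code: `to_listoffset` is an invented name, and awkward's actual conversion routines differ. The point is that a `ListArray` carries independent `starts`/`stops` pointing anywhere into `content`, while a `ListOffsetArray` needs one monotonically increasing `offsets` buffer, so compacting the content makes one kernel signature enough.

```python
import numpy as np

# Hypothetical sketch (not the PR's implementation): rewrite a
# ListArray's (starts, stops, content) into the equivalent
# ListOffsetArray's (offsets, content), copying each list into a
# contiguous buffer in list order.
def to_listoffset(starts, stops, content):
    counts = stops - starts
    offsets = np.zeros(len(starts) + 1, dtype=np.int64)
    np.cumsum(counts, out=offsets[1:])
    pieces = [content[a:b] for a, b in zip(starts, stops)]
    new_content = np.concatenate(pieces) if pieces else content[:0]
    return offsets, new_content
```

After this conversion, a sort kernel only needs `offsets` and `content`; it no longer has to handle separate `starts` and `stops` parameters.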
It's not a question of how fast NumPy can sort one large array; it's a question of how fast it can sort all the little arrays in a ListArray/ListOffsetArray. For instance, this would be a good measure of wall time:

```python
import time
import numpy as np

def test(average_size):
    counts = np.random.poisson(average_size, 1000000)
    offsets = np.empty(1000000 + 1, np.int64)
    offsets[0] = 0
    offsets[1:] = np.cumsum(counts)
    content = np.random.normal(0, 1, offsets[-1])
    starts, stops = offsets[:-1], offsets[1:]
    starttime = time.time()
    for i in range(1000000):
        content[starts[i]:stops[i]].sort()
    return time.time() - starttime
```

This sorts a ListOffsetArray of length 1000000 that contains irregular-length lists of average size `average_size`:

```
>>> test(10.0)
0.5590884685516357
```

but some use cases are going to involve very short lists:

```
>>> test(10.0)
0.5590884685516357
>>> test(3.0)
0.43043017387390137
>>> test(2.0)
0.3896183967590332
>>> test(1.5)
0.3658483028411865
>>> test(1.0)
0.33797550201416016
>>> test(0.5)
0.30232715606689453
>>> test(0.1)
0.2877922058105469
>>> test(0.01)
0.2890009880065918
```

In the last case, very few of the lists have even 1 element, let alone the 2 elements at which sorting starts to be meaningful. That's why I suggested the mask:

```python
def test2(average_size):
    counts = np.random.poisson(average_size, 1000000)
    offsets = np.empty(1000000 + 1, np.int64)
    offsets[0] = 0
    offsets[1:] = np.cumsum(counts)
    content = np.random.normal(0, 1, offsets[-1])
    starts, stops = offsets[:-1], offsets[1:]
    # computing the mask counts toward the total time
    starttime = time.time()
    mask = stops - starts >= 2
    starts = starts[mask]
    stops = stops[mask]
    for i in range(len(starts)):
        content[starts[i]:stops[i]].sort()
    return time.time() - starttime
```

Now, rather than Python spinning its wheels over a bunch of nearly empty lists, it just skips them, and short lists become a good thing, rather than a bad thing:

```
>>> test2(10.0)
0.5271399021148682
>>> test2(3.0)
0.37287163734436035
>>> test2(2.0)
0.2766532897949219
>>> test2(1.5)
0.20734500885009766
>>> test2(1.0)
0.12771391868591309
>>> test2(0.5)
0.045095205307006836
>>> test2(0.1)
0.004815340042114258
>>> test2(0.01)
0.0021109580993652344
```

I was also thinking of hard-coding a special kernel for the length-2 lists:

```python
starts, stops = offsets[:-1], offsets[1:]
starts_2 = starts[stops - starts == 2]
mask = stops - starts > 2
starts = starts[mask]
stops = stops[mask]
nplike["awkward_sort_length_2_kernel"](content, starts_2, len(starts_2))
for i in range(len(starts)):
    content[starts[i]:stops[i]].sort()
```

where the kernel would be

```cpp
for (int64_t i = 0; i < len_starts_2; i++) {
  T content_0 = content[starts_2[i]];
  T content_1 = content[starts_2[i] + 1];
  if (content_0 > content_1) {
    content[starts_2[i]] = content_1;
    content[starts_2[i] + 1] = content_0;
  }
}
```

A compiler would love to optimize that kernel, and I think the optimization of the many-small-sortings problem hasn't been investigated, certainly not as much as sorting one big array. The closest thing NumPy has to this is sorting along an axis, which is not the same because those list sizes are regular, not variable. For this case, however, NumPy is 5‒40× faster than iteration:

```python
def test3(exact_size):
    content = np.random.normal(0, 1, 1000000 * exact_size)
    offsets = np.arange(0, 1000000 * exact_size, exact_size)
    starts, stops = offsets[:-1], offsets[1:]
    starttime = time.time()
    for i in range(len(starts)):
        content[starts[i]:stops[i]].sort()
    return time.time() - starttime

def test3_numpy(exact_size):
    content = np.random.normal(0, 1, (1000000, exact_size))
    starttime = time.time()
    content.sort(axis=1)
    return time.time() - starttime
```

```
>>> test3(10)
0.5516564846038818
>>> test3_numpy(10)
0.09245777130126953
>>> test3(4)
0.49339795112609863
>>> test3_numpy(4)
0.02732539176940918
>>> test3(3)
0.4491863250732422
>>> test3_numpy(3)
0.018769264221191406
>>> test3(2)
0.4407010078430176
>>> test3_numpy(2)
0.011630535125732422
```

It suggests that even though iteration in Python can be surprisingly fast sometimes, it's still dominating over the actual sorting of data by an order of magnitude. They don't start to get close until the list sizes get to be about 100 or so:

```
>>> test3(100)
2.723106622695923
>>> test3_numpy(100)
2.1994996070861816
```

including for irregularly sized lists:

```
>>> test(100)
2.717956304550171
```

So any ListArray/ListOffsetArray sorting based on Python iteration will spend much more time going through Python overhead than actually sorting if the user's lists are much shorter than 100, and that is typical for physics cases, which often enough have average sizes of 0.1, 1.0, 2.0, up to maybe 10.0. So Python iteration, using NumPy's built-in sorting, isn't a good idea.
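The masking and length-2 ideas above can be combined into a runnable NumPy prototype. This is only a sketch: `sort_lists` is an invented name, and the vectorized compare-and-swap stands in for the suggested (not yet existing) `awkward_sort_length_2_kernel`.

```python
import numpy as np

def sort_lists(content, offsets):
    # Sketch of the strategy discussed above (not the PR's code):
    # skip lists shorter than 2, handle all length-2 lists with one
    # vectorized compare-and-swap, and fall back to a per-list sort
    # only for the longer lists.
    content = content.copy()
    starts, stops = offsets[:-1], offsets[1:]
    counts = stops - starts

    # vectorized compare-and-swap for every length-2 list at once
    starts_2 = starts[counts == 2]
    lo = content[starts_2]
    hi = content[starts_2 + 1]
    swap = lo > hi
    content[starts_2[swap]] = hi[swap]
    content[starts_2[swap] + 1] = lo[swap]

    # Python-level loop only for lists of length 3 or more
    for a, b in zip(starts[counts > 2], stops[counts > 2]):
        content[a:b].sort()
    return content
```

For typical physics-sized lists (average length well under 10), almost all of the work then happens in the vectorized branch rather than in the Python loop.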
@jpivarski - I get the following error on builds with Python 2.7:

```
self = <ListArray len='3'>
    <starts><Index dtype='int64' len='3'>[0 3 5]</Index></...]
    </NumpyArray></content>
    </ListOffsetArray></content>
</ListArray>
offsets = <Index dtype='int64' len='4'>[0 3 3 5]</Index>

    def _broadcast_tooffsets64(self, offsets):
>       return ListOffsetArray._broadcast_tooffsets64(self, offsets)
E       TypeError: unbound method _broadcast_tooffsets64() must be called with ListOffsetArray instance as first argument (got ListArray instance instead)

C:\hostedtoolcache\windows\Python\2.7.18\x86\lib\site-packages\awkward\_v2\contents\listarray.py:184: TypeError
```

Shall I ignore it or fix it? Thanks
You can copy the blanket. On the other hand, I know what it is: in Python 3, methods are just plain functions, which we use to consolidate implementations so that ListOffsetArray can just use ListArray's implementation of some methods (also for methods common to Content and Form). Python 2 instead wraps them as unbound methods that type-check their first argument. This could be handled more carefully in Python 2 by constructing an instance of the class whose method is being borrowed.
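The difference can be shown with a minimal sketch (hypothetical classes, not awkward's). In Python 3, a function looked up on a class is a plain function, so it accepts any first argument; Python 2 wrapped it as an unbound method that type-checked `self`, which is exactly the `TypeError` reported above.

```python
# Hypothetical minimal reproduction of the Python 2 failure mode
# (invented classes, not awkward's ListArray/ListOffsetArray).
class Shared:
    def describe(self):
        # works with any object that has a `.length` attribute
        return "length=" + str(self.length)

class Other:
    def __init__(self, length):
        self.length = length

# Python 3: Shared.describe is a plain function, so calling it with an
# Other instance is fine.
print(Shared.describe(Other(3)))  # prints "length=3"

# Python 2 would instead raise:
# TypeError: unbound method describe() must be called with Shared
# instance as first argument (got Other instance instead)
```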
build on MacOS with Python 2.7 took longer than expected :-(
I've just re-run it and it looks fine
```python
# FIXME: AttributeError: 'ListOffsetArray' object has no attribute 'mask'
# v2_array = v1_to_v2(v1_array.layout)
# assert v2_array.mask[is_valid].tolist() == [[0, 1, 2, None], [None, None, None, 2, 1]]
#
# assert ak.to_list(v2_array.mask[is_valid]) == [
#     [0, 1, 2, None],
#     [1, 2, None, None, None],
# ]
```
So that we don't forget about it, I'm calling this out here: it looks like a slicing bug unrelated to sorting.
BTW, I'm looking into an `argsort` now and in another test I'm running into the following FIXME:

```
    # FIXME: might infinite-loop before simplify_optiontype is implemented
>   return prepare_tuple_nested(next.simplify_optiontype())
E   UnboundLocalError: local variable 'next' referenced before assignment

awkward/_v2/_slicing.py:218: UnboundLocalError
```

The test is:

```python
assert ak.to_list(array[ak.argsort(array, axis=1)]) == ak.to_list(
    ak.sort(array, axis=1)
)
```
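The identity this test asserts (gathering with argsort indices reproduces the sorted array) can be illustrated in plain NumPy. This is only an analogy for regular arrays, not the awkward v2 code path being debugged:

```python
import numpy as np

# Illustration only: applying the indices from argsort along an axis
# reproduces sorting along that axis.
array = np.array([[3.3, 2.2, 1.1],
                  [5.5, 4.4, 6.6]])
indices = np.argsort(array, axis=1)
gathered = np.take_along_axis(array, indices, axis=1)
assert np.array_equal(gathered, np.sort(array, axis=1))
```

In awkward, `array[ak.argsort(array, axis=1)]` plays the role of `take_along_axis` here, gathering within each variable-length list.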
It is v2; I've added the following to convert it - not sure if it's a good idea though:

```diff
diff --git a/src/awkward/operations/convert.py b/src/awkward/operations/convert.py
index 3a6983b3..17f73533 100644
--- a/src/awkward/operations/convert.py
+++ b/src/awkward/operations/convert.py
@@ -1920,6 +1920,10 @@ def to_layout(
     elif isinstance(array, Iterable):
         return from_iter(array, highlevel=False)
 
+    # FIXME?
+    elif isinstance(array, ak._v2.contents.Content):
+        return array
+
     elif not allow_other:
         raise TypeError(
             "{0} cannot be converted into an Awkward Array".format(array)
```
This looks great!
I suppose there are old kernels associated with the original implementation of sorting. We'll find those when we scan through for kernels that aren't called by any v2 code (and anyway, we can't get rid of them now because v1 still needs to work!).
(Sorry that I didn't get a chance to review this before releasing 1.5.0, though it probably doesn't matter if v2 code gets into an official release or not...)