Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

C++ refactoring: sort #1072

Merged
merged 32 commits into from
Sep 13, 2021
Merged

C++ refactoring: sort #1072

merged 32 commits into from
Sep 13, 2021

Conversation

ianna
Copy link
Collaborator

@ianna ianna commented Aug 24, 2021

No description provided.

@ianna ianna marked this pull request as draft August 24, 2021 20:43
@ianna ianna force-pushed the ianna/refactoring-sort branch from 11f40b9 to c8a3e13 Compare August 25, 2021 15:15
Copy link
Collaborator Author

@ianna ianna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two new kernels have been added that may replace the old ones. This is because there is no need for some of the parameters since the type in known. Actually, there could be just one if a ListArray converted to a ListOffsetArray.

src/awkward/_v2/contents/regulararray.py Outdated Show resolved Hide resolved
src/awkward/_v2/contents/numpyarray.py Outdated Show resolved Hide resolved
@ianna
Copy link
Collaborator Author

ianna commented Aug 31, 2021

nparray = np.random.rand(1000000)
v2_array = ak._v2.contents.NumpyArray(nparray)

Here is a one dimensional v2_array NumpyArray sorted by numpy.sort() compared with sorting the same numpy array:

In [9]: %timeit v2_array.sort()
18.5 ms ± 310 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [10]: %timeit nparray.sort()
17.3 ms ± 297 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

The same for a one dimensional v2_array NumpyArray sorted by std::stable_sort:

In [10]: %timeit v2_array.sort()
1.26 s ± 11.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [11]: %timeit nparray.sort()
17.3 ms ± 127 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Using std::sort on non contiguous NumpArray compared with numpy:

In [3]: matrix = np.arange(64).reshape(8, -1)

In [4]: v2_array = ak._v2.contents.NumpyArray(matrix[:, 0])

In [5]: nparray = matrix[:, 0]

In [11]: %timeit v2_array.sort()
1.05 ms ± 10.4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [12]: %timeit nparray.sort()
249 ns ± 1.56 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

@jpivarski
Copy link
Member

It's not a question of how fast NumPy can sort one large array; it's a question of how fast it can sort all the little arrays in a ListArray/ListOffsetArray. For instance, this would be a good measure of wall time:

import time
import numpy as np

def test(average_size):
    counts = np.random.poisson(average_size, 1000000)
    offsets = np.empty(1000000 + 1, np.int64)
    offsets[0] = 0
    offsets[1:] = np.cumsum(counts)
    content = np.random.normal(0, 1, offsets[-1])
    starts, stops = offsets[:-1], offsets[1:]
    starttime = time.time()
    for i in range(1000000):
        content[starts[i]:stops[i]].sort()
    return time.time() - starttime

This sorts a ListOffsetArray of length 1000000 that contains irregular-length lists of average_size. Some physics problems are going to involve reasonably large lists:

>>> test(10.0)
0.5590884685516357

but some are going to involve very short lists:

>>> test(10.0)
0.5590884685516357
>>> test(3.0)
0.43043017387390137
>>> test(2.0)
0.3896183967590332
>>> test(1.5)
0.3658483028411865
>>> test(1.0)
0.33797550201416016
>>> test(0.5)
0.30232715606689453
>>> test(0.1)
0.2877922058105469
>>> test(0.01)
0.2890009880065918

In the last case, very few of the lists have 1 element, let alone the 2 elements at which sorting starts to be meaningful. That's why I suggested the starts - stops >= 2 mask:

def test2(average_size):
    counts = np.random.poisson(average_size, 1000000)
    offsets = np.empty(1000000 + 1, np.int64)
    offsets[0] = 0
    offsets[1:] = np.cumsum(counts)
    content = np.random.normal(0, 1, offsets[-1])
    starts, stops = offsets[:-1], offsets[1:]
    # computing the mask counts toward the total time
    starttime = time.time()
    mask = stops - starts >= 2
    starts = starts[mask]
    stops = stops[mask]
    for i in range(len(starts)):
        content[starts[i]:stops[i]].sort()
    return time.time() - starttime

Now, rather than Python spinning its wheels over a bunch of nearly empty lists, it just skips them, and short lists become a good thing, rather than a bad thing:

>>> test2(10.0)
0.5271399021148682
>>> test2(3.0)
0.37287163734436035
>>> test2(2.0)
0.2766532897949219
>>> test2(1.5)
0.20734500885009766
>>> test2(1.0)
0.12771391868591309
>>> test2(0.5)
0.045095205307006836
>>> test2(0.1)
0.004815340042114258
>>> test2(0.01)
0.0021109580993652344

I was also thinking that hard-coding a special kernel for the stops - starts == 2 cases could be a good thing (and maybe even stops - starts == 3, but eventually you hit diminishing returns):

starts, stops = offsets[:-1], offsets[1:]
starts_2 = starts[stops - starts == 2]
mask = stops - starts > 2
starts = starts[mask]
stops = stops[mask]
nplike["awkward_sort_length_2_kernel"](content, starts_2, len(starts_2))
for i in range(len(starts)):
    content[starts[i]:stops[i]].sort()

where the awkward_sort_length_2_kernel does

for (int64_t i = 0;  i < len_starts_2;  i++) {
    T content_0 = content[starts_2[i]];
    T content_1 = content[starts_2[i] + 1];
    if (content_0 > content_1) {
        content[starts_2[i]] = content_1;
        content[starts_2[i] + 1] = content_0;
    }
}

A compiler would love to optimize that kernel, and I think the optimization of the many-small-sortings problem hasn't been investigated—certainly not as much as sorting one big array.

The closest thing NumPy has to this is sorting along an axis, which is not the same because those list sizes are regular, not variable. For this case, however, NumPy is 5‒40× faster than iteration:

def test3(exact_size):
    content = np.random.normal(0, 1, 1000000 * exact_size)
    offsets = np.arange(0, 1000000 * exact_size, exact_size)
    starts, stops = offsets[:-1], offsets[1:]
    starttime = time.time()
    for i in range(len(starts)):
        content[starts[i]:stops[i]].sort()
    return time.time() - starttime

def test3_numpy(exact_size):
    content = np.random.normal(0, 1, (1000000, exact_size))
    starttime = time.time()
    content.sort(axis=1)
    return time.time() - starttime
>>> test3(10)
0.5516564846038818
>>> test3_numpy(10)
0.09245777130126953

>>> test3(4)
0.49339795112609863
>>> test3_numpy(4)
0.02732539176940918

>>> test3(3)
0.4491863250732422
>>> test3_numpy(3)
0.018769264221191406

>>> test3(2)
0.4407010078430176
>>> test3_numpy(2)
0.011630535125732422

It suggests that even though iteration in Python can be surprisingly fast sometimes, it's still dominating over the actual sorting of data by an order of magnitude. They don't start to get close until the list sizes get to be about 100 or so:

>>> test3(100)
2.723106622695923
>>> test3_numpy(100)
2.1994996070861816

including for irregularly sized lists:

>>> test(100)
2.717956304550171

So any ListArray/ListOffsetArray sorting based on Python iteration will be spending much more time going through Python overhead than actually sorting if the user's lists are much shorter than 100, and that is typical for physics cases, which often enough have average sizes of 0.1, 1.0, 2.0, up to maybe 10.0. So Python iteration, using NumPy's built-in sorting, isn't a good idea.

@ianna ianna force-pushed the ianna/refactoring-sort branch from 2c4d837 to f396e6c Compare September 3, 2021 13:50
@ianna
Copy link
Collaborator Author

ianna commented Sep 10, 2021

@jpivarski - I get the following error on builds with Python 2.7:

self = <ListArray len='3'>
    <starts><Index dtype='int64' len='3'>[0 3 5]</Index></...]
        </NumpyArray></content>
    </ListOffsetArray></content>
</ListArray>
offsets = <Index dtype='int64' len='4'>[0 3 3 5]</Index>

    def _broadcast_tooffsets64(self, offsets):
>       return ListOffsetArray._broadcast_tooffsets64(self, offsets)
E       TypeError: unbound method _broadcast_tooffsets64() must be called with ListOffsetArray instance as first argument (got ListArray instance instead)

C:\hostedtoolcache\windows\Python\2.7.18\x86\lib\site-packages\awkward\_v2\contents\listarray.py:184: TypeError

Shall I ignore it or fix it? Thanks

@jpivarski
Copy link
Member

@jpivarski - I get the following error on builds with Python 2.7:
Shall I ignore it or fix it?

You can copy the blanket pytestmark = skip Python 2.7 that we have on the other v2 test files. We'll be dropping Python 2 support (#1010) on a similar timescale to releasing Awkward version 2, so there isn't much point in making v2 work for Python 2.

On the other hand, I know what it is: Python 3 methods are just plain functions, which we use to consolidate implementations so that ListOffsetArray can just use ListArray's implementation of some methods. (Also for methods common to Content and Form.) This could be handled more carefully in Python 2 by constructing an instance of types.MethodType that rebinds a copy of a ListArray method for ListOffsetArray, and that would probably work just as well in Python 3. But if we're dropping Python 2 support anyway, why bother?

@ianna ianna marked this pull request as ready for review September 10, 2021 18:41
@ianna
Copy link
Collaborator Author

ianna commented Sep 10, 2021

build on MacOS with Python 2.7 took longer then expected :-(

@ianna ianna requested a review from jpivarski September 10, 2021 18:42
@ianna
Copy link
Collaborator Author

ianna commented Sep 10, 2021

build on MacOS with Python 2.7 took longer then expected :-(

I've just re-run it and it looks fine

Comment on lines +72 to +79
# FIXME: AttributeError: 'ListOffsetArray' object has no attribute 'mask'
# v2_array = v1_to_v2(v1_array.layout)
# assert v2_array.mask[is_valid].tolist() == [[0, 1, 2, None], [None, None, None, 2, 1]]
#
# assert ak.to_list(v2_array.mask[is_valid]) == [
# [0, 1, 2, None],
# [1, 2, None, None, None],
# ]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So that we don't forget about it, I'm calling this out here: it looks like a slicing bug unrelated to sorting.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, I'm looking into an argsort now and in another test I'm running into the following FIXME:

                     # FIXME: might infinite-loop before simplify_optiontype is implemented
     >           return prepare_tuple_nested(next.simplify_optiontype())
     E           UnboundLocalError: local variable 'next' referenced before assignment
    
     awkward/_v2/_slicing.py:218: UnboundLocalError

     assert ak.to_list(array[ak.argsort(array, axis=1)]) == ak.to_list(
         ak.sort(array, axis=1)
     )

It is v2, I've added the following to covert it - not sure if it's a good idea though:

diff --git a/src/awkward/operations/convert.py b/src/awkward/operations/convert.py
index 3a6983b3..17f73533 100644
--- a/src/awkward/operations/convert.py
+++ b/src/awkward/operations/convert.py
@@ -1920,6 +1920,10 @@ def to_layout(
     elif isinstance(array, Iterable):
         return from_iter(array, highlevel=False)
 
+    # FIXME?
+    elif isinstance(array, ak._v2.contents.Content):
+        return array
+
     elif not allow_other:
         raise TypeError(
             "{0} cannot be converted into an Awkward Array".format(array)

Copy link
Member

@jpivarski jpivarski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great!

I suppose there are old kernels associated with the original implementation of sorting. We'll find those when we scan through for kernels that aren't called by any v2 code (and anyway, we can't get rid of them now because v1 still needs to work!).

(Sorry that I didn't get a chance to review this before releasing 1.5.0, though it probably doesn't matter if v2 code gets into an official release or not...)

@jpivarski jpivarski merged commit 665a187 into main Sep 13, 2021
@jpivarski jpivarski deleted the ianna/refactoring-sort branch September 13, 2021 14:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants