
Conversation


@dg-pb dg-pb commented Sep 15, 2025

V1 Info (outdated)

Currently, adaptivity is simple.

  1. Record the index of the last insorted value (last).
  2. Record the difference between the last two insertion indices: diff = abs(new_idx - last).
  3. Start each search with:
    1. Take the first midpoint to be last.
    2. Take the next midpoint to be last += diff.
    3. Repeat (2) once more.
  4. It always finishes off with a simple binarysort.

It is primarily targeted at data that is already sorted to a significant degree (e.g. stock price data).
However, it happens to handle some other patterns as well.

e.g. [-1, 1, -2, 2, -3, 3, ...]:
diff will always be the full length of the sorted part, so the search jumps from one end to the other in a single step.
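The steps above can be sketched in Python. This is a hedged illustration only: the function name, and the way `last`/`diff` are threaded through, are illustrative, not the actual C variables in the PR.

```python
import bisect

def adaptive_insort_index(sorted_part, value, last, diff):
    """Sketch of the V1 strategy: probe near the previous insertion
    point (last, then last + diff, then last + 2*diff) to narrow the
    search range, then finish with a plain binary search."""
    lo, hi = 0, len(sorted_part)
    probe = last
    for _ in range(3):
        if not (lo <= probe < hi):
            break
        if value < sorted_part[probe]:
            hi = probe        # insertion point is left of the probe
        else:
            lo = probe + 1    # insertion point is right of the probe
        probe += diff
    # step 4: always finish off with a simple binary search
    return bisect.bisect_right(sorted_part, value, lo, hi)
```

On mostly-sorted data the first probe usually lands next to the true insertion point, so the fallback search runs on a tiny range; on the oscillating pattern a probe of size diff reaches the far end in one step.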

Microbenchmarks

PYMAIN=/Users/Edu/local/code/cpython/main/python.exe
PYNEW=/Users/Edu/local/code/cpython/wt1/python.exe

S="
import random
import itertools as itl
RND = [random.random() for _ in range(100_000)]
RWK = [random.randint(-1, 3) for _ in range(100_000)]
RWK = list(itl.accumulate(RWK))

RNDW = [[i] for i in RND]
RWKW = [[i] for i in RWK]
"
# RAW SMALL
$PYMAIN -m timeit -s "$S" "sorted(RND[:30])"  # 0.72 µs
$PYNEW -m timeit -s "$S" "sorted(RND[:30])"   # 0.85 µs
$PYMAIN -m timeit -s "$S" "sorted(RWK[:30])"  # 0.65 µs
$PYNEW -m timeit -s "$S" "sorted(RWK[:30])"   # 0.58 µs

# WRAPPED SMALL
$PYMAIN -m timeit -s "$S" "sorted(RNDW[:30])" # 4.3 µs
$PYNEW -m timeit -s "$S" "sorted(RNDW[:30])"  # 4.6 µs
$PYMAIN -m timeit -s "$S" "sorted(RWKW[:30])" # 2.8 µs
$PYNEW -m timeit -s "$S" "sorted(RWKW[:30])"  # 1.6 µs


# RAW
$PYMAIN -m timeit -s "$S" "sorted(RND)"   # 16.0 ms
$PYNEW -m timeit -s "$S" "sorted(RND)"    # 16.0 ms
$PYMAIN -m timeit -s "$S" "sorted(RWK)"   #  2.5 ms
$PYNEW -m timeit -s "$S" "sorted(RWK)"    #  2.3 ms

# WRAPPED
$PYMAIN -m timeit -s "$S" "sorted(RNDW)"  # 104 ms
$PYNEW -m timeit -s "$S" "sorted(RNDW)"   # 102 ms
$PYMAIN -m timeit -s "$S" "sorted(RWKW)"  #  14.5 ms
$PYNEW -m timeit -s "$S" "sorted(RWKW)"   #   8.3 ms

For optimised comparisons this has little effect.
As can be seen, the worst case is small random data.
But just as small data feels the biggest adverse effect, it also sees the largest positive effect, since a greater (or all) portion of the data is sorted using binarysort alone.

For costly comparisons, however, the impact is non-trivial.
list.__lt__ is probably the fastest of all the possible ones.
For a pure-Python user-implemented __lt__, the impact would be greater.
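To see why pure-Python comparisons amplify the effect, a small counting wrapper (illustrative, not part of the PR) makes the comparison cost visible: every element ordering goes through a Python-level __lt__ call.

```python
class Counted:
    """Value wrapper whose pure-Python __lt__ counts comparisons."""
    calls = 0

    def __init__(self, v):
        self.v = v

    def __lt__(self, other):
        Counted.calls += 1  # every comparison pays Python-call overhead
        return self.v < other.v

data = [Counted(v) for v in [5, 1, 4, 2, 3, 0, 6]]
result = sorted(data)
print([c.v for c in result], Counted.calls)
```

Any reduction in comparison count translates almost directly into wall-clock savings here, which is why the wrapped benchmarks above show a much larger improvement than the raw ones.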

V3 Getting closer to the desirable result.

Raw integers & floats (specialised comparison functions):

[benchmark chart: unwrapped]

Above wrapped into lists:

[benchmark chart: wrapped]

  1. Any tips for low-level optimisation are welcome.
  2. Any ideas on a better adaptivity strategy are welcome as well.

@tim-one tim-one self-assigned this Sep 17, 2025
@pochmann3 commented

Since you asked for more ideas... Tim and I once talked about things like this here: #116939


dg-pb commented Sep 17, 2025

> And another idea was to use statistics and switch between strategies, similar to what you do in galloping-or-not. Like tracking the insertion point averages, and if they're usually in the middle, then use raw binary searches, but if they're usually towards the end, then use the optimistic or exponential variation, and if they're usually near the start, then do optimistic/exponential from the start. The strategy could be chosen either per new pivot element or just per binarysearch invocation.

This is pretty much what I have done to incorporate it so as not to damage performance of non-target cases. In many ways it resembles the galloping approach: it switches on/off and grows a "time-off" parameter on failed attempts.
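The on/off switching with a growing "time-off" parameter can be pictured with a small backoff controller. This is a schematic sketch; the class name and thresholds are illustrative, not the PR's actual C logic.

```python
class AdaptivitySwitch:
    """Try the adaptive probe; after each failed attempt, stay
    switched off for a growing number of insertions before retrying."""

    def __init__(self):
        self.penalty = 1    # how long the next "off" period will last
        self.cooldown = 0   # insertions left before the next attempt

    def should_try(self):
        if self.cooldown > 0:
            self.cooldown -= 1
            return False
        return True

    def record(self, success):
        if success:
            self.penalty = max(1, self.penalty - 1)  # reward: shrink time-off
        else:
            self.penalty += 1                        # punish: grow time-off
            self.cooldown = self.penalty
```

This mirrors the spirit of merge sort's min_gallop: successes make the optimistic path cheaper to re-enter, while repeated failures make it progressively rarer.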

@AlanCristhian commented

I have a simpler Python implementation of the adaptive algorithm. I made it look like the C implementation, kind of.

def adaptative_binary_insertion_sort(a, n=0, ok=0):
    n = n or len(a)
    last = 0
    for ok in range(ok + 1, n):
        pivot = a[ok]
        L = 0
        R = ok - 1   # Ensures that pivot will not compare with itself.

        # M is the index of the element that will be compared
        # with the pivot. So start from the last moved element.
        M = last
        while L <= R:
            if pivot < a[M]:
                R = M - 1
                last = M  # Stores the index of the last moved element.
            else:
                L = M + 1
                last = L  # Stores the index of the last moved element.
            M = (L + R) >> 1
        if last < ok:  # Don't move the element to its existing location
            for M in range(ok, last, -1):
                a[M] = a[M - 1]
            a[last] = pivot  # Move pivot to its last position.

It's so simple that I think it can be implemented by modifying a few lines of the original binarysort. But I have zero real-life experience with C.


dg-pb commented Sep 20, 2025

> I have a simpler Python implementation of the adaptive algorithm.

Used your idea of taking the expectation to simply be the last value.
I was overcomplicating things a bit there.
This also shaved off some operations, which is exactly what I was looking for.

Comparison count is up a bit, but that is because my old expected-value calculation was adapting to some patterns that are not the target. Performance is slightly better, although this turned out not to be as impactful as I expected.

Results with this change:

Unwrapped (optimised types):

[benchmark chart: unwrapped]

Wrapped (list.__lt__):

[benchmark chart: wrapped]


dg-pb commented Sep 21, 2025

Discrepancies of more than 1-2% are most likely a fluke related to my machine or something similar.

I took my PR, reverted it to main, then cleaned everything up and recompiled.

Although the two versions are now identical, I am still getting similar discrepancies in timings:

┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃  macOS-11.7.10-x86_64-i386-64bit-Mach-O | CPython: 3.15.0a0   ┃
┃        50 repeats, 1,000 times | 2025-09-21T14:09:59          ┃
┣━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┫
┃                      Units: ns                           main ┃
┃                                ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┫
┃                sorted(WORST30) ┃                  557 ±     7 ┃
┃               sorted(WORST100) ┃                5,520 ±    21 ┃
┃               sorted(WORST640) ┃               66,458 ±   445 ┃
┃              sorted(WORST6400) ┃              728,827 ± 4,817 ┃
┃                   sorted(BEST) ┃               79,658 ±   699 ┃
┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┻━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛
┃                      Units: ns      code exactly matches main ┃
┃                                ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┫
┃                sorted(WORST30) ┃                  557 ±    13 ┃
┃               sorted(WORST100) ┃                6,037 ±   104 ┃
┃               sorted(WORST640) ┃               68,031 ±   414 ┃
┃              sorted(WORST6400) ┃              760,257 ± 8,552 ┃
┃                   sorted(BEST) ┃               82,632 ±   656 ┃
┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┻━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛
┃                      Units: ns                     adaptivity ┃
┃                                ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┫
┃                sorted(WORST30) ┃                  591 ±     5 ┃
┃               sorted(WORST100) ┃                5,437 ±    31 ┃
┃               sorted(WORST640) ┃               69,969 ±   431 ┃
┃              sorted(WORST6400) ┃              764,258 ± 7,773 ┃
┃                   sorted(BEST) ┃               42,973 ±   362 ┃
┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┻━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛

Which one should the 3rd table be compared against: the 1st or the 2nd?
I think benchmarks need to be done on something more reliable than my 10+ year old Mac laptop...


dg-pb commented Oct 5, 2025

Not sufficiently happy with this.

@dg-pb dg-pb closed this Oct 5, 2025
@dg-pb dg-pb deleted the adaptive_binary_sort branch October 5, 2025 00:08

tim-one commented Oct 5, 2025

But let it percolate in the background. I tried to stay out of this for a change, to let you find your way through the minefield. You did good! And it is probably best put on hold for now. But it can also be picked up again. Hope sprints eternal 😄.


dg-pb commented Oct 5, 2025

Yeah, let's keep this open.

For the time being, the only improvement that would not cause any harm is shaving ~1% of comparisons off binarysort by acknowledging that the first element it inserts is always going to be less than the last element of the run. This is assured by count_run.

Thus, the first "insort" can be factored out and put before the loop.
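The idea can be sketched in Python (a hedged illustration, not the C patch): since count_run stops an ascending run at the first a[run_len] < a[run_len - 1], the first element binarysort inserts is already known to lose the comparison against the run's last slot, so the search can exclude that slot.

```python
import bisect

def insort_first_after_run(a, run_len):
    """Insert a[run_len] into the sorted prefix a[:run_len].
    Assumes the count_run guarantee a[run_len] < a[run_len - 1],
    so the search excludes the run's last slot, saving one comparison."""
    pivot = a[run_len]
    # search only a[:run_len - 1]; the last slot is known to be greater
    idx = bisect.bisect_right(a, pivot, 0, run_len - 1)
    a[idx + 1:run_len + 1] = a[idx:run_len]  # shift right by one
    a[idx] = pivot
```

The remaining elements would then be handled by the ordinary loop, which cannot assume anything about their relation to the run's last element.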

--------------------------------------------------------------
               |   inv / osc | cmp_base | cmp_new | cmp diff %
--------------------------------------------------------------
    wi_best_30 | 0.21 / 0.01 |       96 |      95 |      -1.04
   wi_best_100 | 0.25 / 0.02 |      399 |     397 |       -0.5
   wi_best_640 | 0.23 / 0.01 |     2305 |    2300 |      -0.22
 uf_dgworst_30 | 0.32 / 0.02 |       96 |      95 |      -1.04
uf_dgworst_100 | 0.43 / 0.34 |      431 |     430 |      -0.23
uf_dgworst_640 | 0.59 / 0.54 |     4112 |    4101 |      -0.27
 wi_acworst_30 | 1.00 / 1.00 |      108 |     108 |        0.0
wi_acworst_100 | 1.00 / 1.00 |      460 |     460 |        0.0
wi_acworst_640 | 1.00 / 1.00 |     3001 |    3001 |        0.0
  uf_random_30 | 0.64 / 0.71 |      112 |     111 |      -0.89
 uf_random_100 | 0.68 / 0.70 |      528 |     526 |      -0.38
 uf_random_640 | 0.67 / 0.65 |     5155 |    5142 |      -0.25
  wf_random_30 | 0.71 / 0.68 |      115 |     114 |      -0.87
 wf_random_100 | 0.58 / 0.64 |      536 |     534 |      -0.37
 wf_random_640 | 0.64 / 0.64 |     5145 |    5139 |      -0.12
    neils_6400 | 0.62 / 0.60 |    68122 |   68040 |      -0.12
     AAPL_6289 | 0.50 / 0.16 |    43770 |   43689 |      -0.19
AAPLSMA10_6280 | 0.13 / 0.05 |    39552 |   39491 |      -0.15
--------------------------------------------------------------

Detecting such a tiny difference is close to impossible, but nevertheless:

100 repeats, 400 loops each
macOS-11.7.10-x86_64-i386-64bit-Mach-O | 3.15.0a0
---------------------------
      units: s |     diff %
---------------------------
    wi_best_30 |  0.5 ± 0.0
   wi_best_100 | -0.6 ± 0.0
   wi_best_640 |  1.6 ± 0.1
 uf_dgworst_30 |  5.5 ± 0.0
uf_dgworst_100 |  6.1 ± 0.0
uf_dgworst_640 |  1.6 ± 0.1
 wi_acworst_30 | -0.1 ± 0.0
wi_acworst_100 |  1.3 ± 0.1
wi_acworst_640 | -0.4 ± 1.6
  uf_random_30 | -0.0 ± 0.0
 uf_random_100 | -3.6 ± 0.0
 uf_random_640 | -3.2 ± 0.1
  wf_random_30 | -0.5 ± 0.1
 wf_random_100 |  0.8 ± 0.3
 wf_random_640 |  0.6 ± 3.9
    neils_6400 |  1.5 ± 2.4
     AAPL_6289 | -0.3 ± 1.3
AAPLSMA10_6280 | -1.4 ± 1.8
---------------------------
