bpo-37229: Add compare_function to bisect functions #13970

FortStatement · 2019-06-11T10:40:33Z

All of bisect's functions (insort_{left,right}, bisect_{left,right}) can only compare objects via __lt__.
They should support a custom comparison function.

https://bugs.python.org/issue37229
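A minimal pure-Python sketch of what a comparison-callback bisect could look like. The function name `bisect_right_cmp` and its `less` parameter are hypothetical illustrations, not the PR's actual C implementation (which named the parameter compare_function); `less(a, b)` is assumed to behave like `a < b`, mirroring the single `__lt__` test that bisect performs.

```python
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")


def bisect_right_cmp(
    a: Sequence[T],
    x: T,
    less: Callable[[T, T], bool],
    lo: int = 0,
    hi: Optional[int] = None,
) -> int:
    """Return the rightmost insertion point for x in a, ordered by less(a, b)."""
    if hi is None:
        hi = len(a)
    while lo < hi:
        mid = (lo + hi) // 2
        # Same test bisect.bisect_right does (x < a[mid]), but via the callback.
        if less(x, a[mid]):
            hi = mid
        else:
            lo = mid + 1
    return lo


# Usage: order (key, label) pairs by their first field only.
pairs = [(1, "a"), (3, "b"), (5, "c")]
pos = bisect_right_cmp(pairs, (3, "z"), lambda p, q: p[0] < q[0])  # 2
```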

asvetlov · 2019-06-11T10:45:44Z

Please fix the bpo number.

remilapeyre · 2019-06-11T10:55:01Z

Hi @GPery, thanks for taking the time to improve Python!

As @tirkarthi mentioned, #11781 is already open to add this functionality, though.

FortStatement · 2019-06-11T11:00:40Z

#11781 adds a key parameter, which I believe is inferior. A key is a special case of a comparison callback, and in my use case it would force me to create another class to serve as a comparator for my objects, at which point I might as well wrap them and add __lt__.
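The wrap-and-add-__lt__ workaround mentioned here can be sketched as follows. The `Record` and `ByN` classes are hypothetical illustrations, not code from either PR; the sketch relies only on the fact that bisect and insort compare elements with `<`.

```python
import bisect


class Record:
    """Example payload type with no ordering of its own (hypothetical)."""

    def __init__(self, n, name):
        self.n = n
        self.name = name


class ByN:
    """Wrapper that gives a Record an ordering by its .n attribute."""

    __slots__ = ("record",)

    def __init__(self, record):
        self.record = record

    def __lt__(self, other):
        # insort/bisect only ever compare elements with <, so __lt__ is enough.
        return self.record.n < other.record.n


queue = []
for r in [Record(3, "c"), Record(1, "a"), Record(2, "b")]:
    bisect.insort(queue, ByN(r))

names = [w.record.name for w in queue]
```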

Furthermore, I believe this is more in line with similar standard functions in other languages such as C++ (std::sort), Java (PriorityQueue) or Rust (slice::sort_by).

asvetlov · 2019-06-11T11:02:57Z

key is more in line with Python's list.sort() and sorted().
Use cmp_to_key() to handle the __lt__ case.

remilapeyre · 2019-06-11T11:04:50Z

@GPery, both are actually two ways to describe the same behavior, do you know functools.cmp_to_key (https://docs.python.org/3/library/functools.html#functools.cmp_to_key) to go from one to the other?
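A small illustration of the equivalence: an old-style three-way cmp function wrapped with functools.cmp_to_key yields the same order as a plain key function. The comparison function and word list are illustrative only.

```python
from functools import cmp_to_key


def by_length_then_alpha(a: str, b: str) -> int:
    """Old-style three-way comparison: negative, zero, or positive."""
    if len(a) != len(b):
        return len(a) - len(b)
    return (a > b) - (a < b)


words = ["pear", "fig", "banana", "kiwi"]

# The same ordering expressed as a cmp function and as a key function.
via_cmp = sorted(words, key=cmp_to_key(by_length_then_alpha))
via_key = sorted(words, key=lambda w: (len(w), w))
```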

FortStatement · 2019-06-11T11:11:18Z

You're right, both cases are about the same.
The key extraction syntax still seems less intuitive to me than a compare function. Since compare functions have been adopted by so many other languages, I think they should be the preferred behaviour, despite not matching list (after all, bisect is a separate library, even though it's built in).

FortStatement · 2019-06-11T11:16:46Z

I feel it might be relevant to note that this isn't reverting to the old-style (C-style) comparison.
It's a boolean comparison, parallel to how __lt__ behaves.
Not the old style, but a different alternative to it.

asvetlov · 2019-06-11T11:21:34Z

list.sort() uses key because it is much faster than the old cmp method (which is very similar to your current proposal).
The reason is: key is calculated only once per item but cmp is called for every compared pair.
This doesn't matter for fast compiled languages like C++ or Rust, but in Python a Python function call is relatively expensive.
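The call-count difference for sorting can be made concrete by counting invocations: with key=, the Python key function runs exactly once per element, while a cmp function wrapped with cmp_to_key runs once per comparison. The counters and random data are illustrative only.

```python
import random
from functools import cmp_to_key

calls = {"key": 0, "cmp": 0}


def key(x: float) -> float:
    calls["key"] += 1
    return x


def cmp(a: float, b: float) -> int:
    calls["cmp"] += 1
    return (a > b) - (a < b)


random.seed(0)
data = [random.random() for _ in range(1000)]

by_key = sorted(data, key=key)               # key runs once per element
by_cmp = sorted(data, key=cmp_to_key(cmp))   # cmp runs once per comparison
print(calls)
```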

FortStatement · 2019-06-11T11:59:18Z

In bisect, "every compared pair" is also (at most) once per item.
If we're on the subject of function calls, the key flow calls the key function and then the result's __lt__ per item.
The compare_function flow only calls the callback.

jdemeyer · 2019-06-11T12:06:06Z

> The reason is: key is calculated only once per item but cmp is called for every compared pair.

What's your point exactly? Even with custom keys, you still need to compare every compared pair with __lt__.

(Note: I do think that key is the way to go but only for compatibility with sorted(), not for performance).

asvetlov · 2019-06-11T12:36:52Z

@jdemeyer IIRC, at least for list sorting, the point is that a Python-provided comparison function is usually much slower than the built-in compare methods for int, str or tuple.

jdemeyer · 2019-06-11T12:40:35Z

> at least for list sorting the point is that python-provided comparison function is usually much slower than built-in compare methods for int, str or tuple.

OK, so the performance argument becomes "in certain use cases, using key would be faster than a Python 2 style cmp". Fair enough. But there are certainly also use cases (arguably less common) where the opposite is true, for example when using cmp_to_key.
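A rough way to measure the two cases discussed here with the standard timeit module: a Python key function whose int results are compared by the built-in int.__lt__, versus a Python cmp function wrapped by cmp_to_key that is called for every compared pair. The data size and functions are illustrative assumptions; timings vary by machine and build, so none are quoted.

```python
import random
import timeit
from functools import cmp_to_key


def neg_key(x: int) -> int:
    # Python-level key: called once per element; results compared with int.__lt__.
    return -x


def desc_cmp(a: int, b: int) -> int:
    # Python-level three-way cmp: called once per compared pair.
    return (b > a) - (b < a)


random.seed(0)
data = [random.randrange(10**6) for _ in range(10_000)]

# Both produce the same descending order.
assert sorted(data, key=neg_key) == sorted(data, key=cmp_to_key(desc_cmp))

t_key = timeit.timeit(lambda: sorted(data, key=neg_key), number=5)
t_cmp = timeit.timeit(lambda: sorted(data, key=cmp_to_key(desc_cmp)), number=5)
print(f"key: {t_key:.4f}s  cmp_to_key: {t_cmp:.4f}s")
```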

FortStatement · 2019-06-11T12:42:46Z

@asvetlov I'm not sure what the original concerns were, but I checked this PR and the key PR (#11781) with timeit: compare_function is very slightly, yet consistently, faster.

```
python -m timeit -s "from bisect import bisect
class C:
    def __init__(self, n):
        self.n = n
data = [C(n) for n in range(1_000_000)]
cmp = C(25)" "bisect(data, cmp, key=lambda x: x.n)"
50000 loops, best of 5: 6.93 usec per loop
```

```
python -m timeit -s "from bisect import bisect
class C:
    def __init__(self, n):
        self.n = n
data = [C(n) for n in range(1_000_000)]
cmp = C(25)" "bisect(data, cmp, compare_function=lambda a,b: a.n < b.n)"
50000 loops, best of 5: 6.79 usec per loop
```

I think this is a rather fair comparison.
Anyway, the real issue is syntax compatibility, which I think is subjective.

remilapeyre · 2019-06-11T13:19:07Z

I just tried it and got the opposite result, both with and without pydebug. I wonder what exactly might be causing that.

cirosantilli · 2020-09-03T12:37:16Z

Hi there, why was this one closed? Was it superseded/WONTFIXED? This addition would be amazing, I keep hitting it. Let me know if there's any way I can help. Thanks to all.

FortStatement requested a review from rhettinger as a code owner June 11, 2019 10:40

the-knights-who-say-ni added the CLA signed label Jun 11, 2019

bedevere-bot added the awaiting review label Jun 11, 2019

FortStatement changed the title from "bpo-345216: Add compare_function to bisect functions" to "bpo-37229: Add compare_function to bisect functions" Jun 11, 2019

Commit 056e27a: bpo-345216: Add compare_function to bisect functions

FortStatement force-pushed the bisect-custom-compare branch from d2f526e to 056e27a June 11, 2019 10:53

rhettinger self-assigned this Jun 11, 2019

rhettinger closed this Jun 11, 2019
