Skip to content

bpo-37229: Add compare_function to bisect functions #13970

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

FortStatement
Copy link

@FortStatement FortStatement commented Jun 11, 2019

All of bisect's functions (insort_{left,right}, bisect_{left,right}) can only use comparison of objects via __lt__.
They should support providing a custom comparison function.

https://bugs.python.org/issue37229

@asvetlov
Copy link
Contributor

Please fix bpo number

@FortStatement FortStatement changed the title bpo-345216: Add compare_function to bisect functions bpo-37229: Add compare_function to bisect functions Jun 11, 2019
@FortStatement FortStatement force-pushed the bisect-custom-compare branch from d2f526e to 056e27a Compare June 11, 2019 10:53
@remilapeyre
Copy link
Contributor

remilapeyre commented Jun 11, 2019

Hi @GPery, thanks for taking the time to improve Python!

As @tirkarthi mentionned, #11781 is already open to add this functionnality thought.

@FortStatement
Copy link
Author

#11781 adds a key parameter, which I believe is inferior. It's a specific case of a comparison callback, and in my use case would force me to create another class to serve as a comparator for my objects, at which point I might as well wrap them and add lt.

Furthermore, I believe this is more in-line with similar standard functions in other languages such as C++ (std::sort), Java (PriorityQueue) or Rust (slice.sort_by).

@asvetlov
Copy link
Contributor

key is more in line with Python list.sort() and sorted().
Use cmp_to_key() for handling __lt__ case.

@remilapeyre
Copy link
Contributor

remilapeyre commented Jun 11, 2019

@GPery, both are actually two ways to describe the same behavior, do you know functools.cmp_to_key (https://docs.python.org/3/library/functools.html#functools.cmp_to_key) to go from one to the other?

@FortStatement
Copy link
Author

FortStatement commented Jun 11, 2019

You're right, both cases are about the same.
The key extraction syntax still seems less intuitive to me than a compare function. Having been adopted by so many other languages, I think it should be the preferred behaviour, despite not matching list (after all, it's a separate library, even though it's builtin).

@FortStatement
Copy link
Author

I feel it might be relevant to note that this isn't reverting to the old-style (C-style) comparison.
It's a boolean comparison, parallel to how __lt__ behaves.
Not the old style, but a different alternative to it.

@asvetlov
Copy link
Contributor

list sort uses key because this is much faster than old cmp method (very similar to your current proposal).
The reason is: key is calculated only once per item but cmp is called for every compared pair.
This is doesn't matter for fast compiled languages like C++ or Rust but in Python the python function call is relatively expensive.

@FortStatement
Copy link
Author

FortStatement commented Jun 11, 2019

"every compared pair" is also (up to) once per item.
If we're on the subject of function calls, the key flow calls the key function, then the result's __lt__ function per item.
The compare_function flow only calls the callback.

@jdemeyer
Copy link
Contributor

jdemeyer commented Jun 11, 2019

The reason is: key is calculated only once per item but cmp is called for every compared pair.

What's your point exactly? Even with custom keys, you still need to compare every compared pair with __lt__.

(Note: I do think that key is the way to go but only for compatibility with sorted(), not for performance).

@asvetlov
Copy link
Contributor

@jdemeyer IIRC at least for list sorting the point is that python-provided comparison function is usually much slower than built-in compare methods for int, str or tuple.

@jdemeyer
Copy link
Contributor

at least for list sorting the point is that python-provided comparison function is usually much slower than built-in compare methods for int, str or tuple.

OK, so the performance argument becomes "in certain use cases, using key would be faster than a Python 2 style cmp". Fair enough. But there are certainly also uses cases (arguably less common) where the opposite is true, for example when using cmp_to_key.

@FortStatement
Copy link
Author

@asvetlov I'm not sure what were the original concerns, but I checked this PR and the key PR (#11781) with timeit, compare_function is extremely slightly, yet consistently, faster.

timeit -s "from bisect import bisect
                          class C:
                           def __init__(self, n):
                            self.n = n
                          data = [C(n) for n in range(1_000_000)]
                          cmp = C(25)" "bisect(data, cmp, key=lambda x: x.n)"
50000 loops, best of 5: 6.93 usec per loop
timeit -s "from bisect import bisect
                          class C:
                           def __init__(self, n):
                            self.n = n
                          data = [C(n) for n in range(1_000_000)]
                          cmp = C(25)" "bisect(data, cmp, compare_function=lambda a,b: a.n < b.n)"
50000 loops, best of 5: 6.79 usec per loop

I think this is a rather fair comparison.
Anyway, the real issue is syntax compatibility, which I think is subjective.

@remilapeyre
Copy link
Contributor

I just tried and had the opposite result, both with and without pydebug, I wonder what exactly might be doing that.

@rhettinger rhettinger self-assigned this Jun 11, 2019
@rhettinger rhettinger closed this Jun 11, 2019
@cirosantilli
Copy link

cirosantilli commented Sep 3, 2020

Hi there, why was this one closed? Was it superseded/WONTFIXED? This addition would be amazing, I keep hitting it. Let me know if there's any way I can help. Thanks to all.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants