bpo-4356: Add key function support to the bisect module #20556

rhettinger · 2020-05-31T17:50:15Z

https://bugs.python.org/issue4356

remilapeyre · 2020-05-31T18:03:14Z

Hi @rhettinger, was there anything wrong with #11781 ?

rhettinger · 2020-05-31T20:22:45Z

Hmm, I didn't see that PR until after this one was made.

Looking at it now, I prefer this one which has cleaner code, better docs updates, and focuses on key and leaves out reversed which was not requested by the OP (if that turns out to be needed, it can be added later; for now, it is better to avoid doubling the code complexity).

remilapeyre · 2020-05-31T20:36:53Z

I don't care which version gets merged but regarding the reversed argument you said in the review:

FWIW, if key-function support is added, it is inevitable that reversed support will be requested (since sorted() and bisect() are often used together).

rhettinger · 2020-05-31T20:40:10Z

Also, some of the tests should be combined. Both PRs have tests that the others don't.

Am unsure about reversed. It seems like a nice to have, but looking at the other PR, it doubles the code complexity and that spills into the docs/tests as well. What do you think?

remilapeyre · 2020-05-31T20:54:30Z

For reversed, most users that know they need bisect should be able to write the key function so that it gives the correct order directly and don't need the help of reversed. It's not part of the original request and complicates both the Python and C code.

If it's ever needed you could add it later but since it increases the complexity without adding functionality, it may be best not to add it.

~~There is a note recommending lru_cache() (maybe just cache() now) which I probably would not have thought of and may be worth keeping:~~

When specifying a custom key function, you should wrap it with
:func:functools.lru_cache if the key function is not already fast.

Edit: Nevermind, I read the doc better and it's actually here.

rhettinger · 2020-05-31T20:56:39Z

BTW, do you know how to appease the buildbots for the doc build. I'm unclear on what they deem "suspicious" and how to clear the blocker.

remilapeyre · 2020-05-31T21:22:40Z

You can fix the spurious error os "suspicious" by removing lines 114 and 115 of Doc/tools/susp-ignored.csv:

library/bisect,32,:hi,all(val >= x for val in a[i:hi])
library/bisect,42,:hi,all(val > x for val in a[i:hi])

rhettinger · 2020-05-31T22:58:43Z

Thanks.

Apologies again for duplicating work. I completely forget about the other PR.

remilapeyre

Apologies again for duplicating work. I completely forget about the other PR.

Don't worry, at least it's great the issue can be closed!

Doc/library/bisect.rst

Modules/_bisectmodule.c

remilapeyre · 2020-05-31T23:49:26Z

Lib/bisect.py

+    if key is None:
+        lo = bisect_right(a, x, lo, hi)
+    else:
+        lo = bisect_right(a, key(x), lo, hi, key=key)


I find suspicious that key(x) is called here and that it's not handled by bisect_right(), e.g. I would have expected this to work:

>>> from bisect import bisect_left >>> scores = [('Alice', 30), ('Bob', 20), ('Charles', 10)] >>> bisect_left(scores, ('David', 25), key=lambda s: -s[1]) Traceback (most recent call last): File "<stdin>", line 1, in <module> TypeError: '<' not supported between instances of 'int' and 'tuple'

Is this on purpose?

I recall the discussion in the other PR whether the "searched for item" should be an argument to key or a value produced by key.

It's a bit clearer in the case of insort_* functions because array member is needed for insertion, thus, key will be applied to this value, thus the latter above.

Perhaps it's easier to disambiguate in the docs by mentioning how the key will be used, and/or talk about type or shape of key and array members?

I think the API is correct in that it supports typical usage patterns. insort() will insert entire records and bisect() will scan the key field for the first match in a range at or above the key value.

In SQL, the pattern would look like this:

CREATE INDEX age_ndx INDEX ON People (age); INSERT INTO People Values ("raymond", 50); /* Find the youngest person at or over 42 */ SELECT name, age FROM People WHERE age >= 42 ORDER BY age LIMIT 1;

In Python, the example would look like this:

from bisect import bisect, insort from operator import attrgetter from collections import namedtuple from pprint import pp Person = namedtuple('Person', ('name', 'age')) people = [ Person('tom', 30), Person('sue', 40), Person('mark', 35), Person('peter', 55), ] people.sort(key=attrgetter('age')) new_person = Person('randy', 50) insort(people, new_person, key=attrgetter('age')) pp(people) print(people[bisect(people, 42, key=attrgetter('age'))])

If bisect() required an entire record as an input, it would preclude the ability to do searches like like this.

Here's another example that we would want to support:

# https://www.quickenloans.com/blog/federal-income-tax-brackets Bracket = namedtuple('Bracket', ('rate', 'single', 'married_joint', 'married_sep', 'head')) brackets_2019 = [ Bracket(0.10, 9_700, 19_400, 9_700, 13_850), Bracket(0.12, 39_475, 78_950, 39_475, 52_850), Bracket(0.22, 84_200, 168_400, 84_200, 84_200), Bracket(0.24, 160_725, 321_450, 160_725, 160_700), ] taxable_income = 60_000 for status in ('single', 'married_joint', 'married_sep', 'head'): i = bisect(brackets_2019, taxable_income, key=attrgetter(status)) bracket = brackets_2019[i] print(status, bracket.rate)

I'll echo what Raymond explained. It would be more useful to a library like sortedcontainers if the argument to bisect was the field value rather than the record. After all, the key function can be applied to the record to get the field value if need be. And there are many cases where constructing the record may be expensive or impossible just to determine rank within a list. The field value API is compatible with having the record and key function but not the other way around.

The SortedKeyList data type provides both bisect_left and bisect_key_left variants. The bisect_left variant simply applies the key function and calls bisect_key_left so I think the API specifying the field value is more generically useful.

Doc/library/bisect.rst

zzl0 · 2020-11-09T05:16:19Z

Modules/_bisectmodule.c

@@ -16,7 +16,8 @@ module _bisect
 _Py_IDENTIFIER(insert);

 static inline Py_ssize_t
-internal_bisect_right(PyObject *list, PyObject *item, Py_ssize_t lo, Py_ssize_t hi)
+internal_bisect_right(PyObject *list, PyObject *item, Py_ssize_t lo, Py_ssize_t hi,
+                      PyObject* key)


@rhettinger thank you so much for adding key function parameter. Just a small typo here: PyObject* key -> PyObject *key

wpk- · 2022-05-03T15:25:41Z

Somehow bisect(lst, fn(x), key=fn) feels very uncomfortable. After all, we're looking to find the insertion point for x in lst, rather than fn(x). I'd really expect the syntax to be bisect(lst, x, key=fn). Would the Python devs be open for such a change?

rhettinger added 3 commits May 31, 2020 09:12

Python version, C version, and tests. Docs still needed

2d1a404

Add documentation

9c9b275

Add a blurb

5f2e4e5

the-knights-who-say-ni added the CLA signed label May 31, 2020

bedevere-bot added the awaiting core review label May 31, 2020

Add spacing around the colon in the in-line code fragments

92c40cb

Silence spurious warning from the doc build

bdfc074

remilapeyre reviewed May 31, 2020

View reviewed changes

csabella mentioned this pull request Aug 23, 2020

bpo-4356: Add key parameter to functions in bisect module #11781

Closed

iritkatriel reviewed Sep 4, 2020

View reviewed changes

Doc/library/bisect.rst Outdated Show resolved Hide resolved

rhettinger added 2 commits October 19, 2020 14:17

Fix typo

5e3e5b9

Fix typo

899767c

dimaqq approved these changes Oct 20, 2020

View reviewed changes

rhettinger merged commit 871934d into python:master Oct 20, 2020

bedevere-bot removed the awaiting core review label Oct 20, 2020

zzl0 reviewed Nov 9, 2020

View reviewed changes

adorilson pushed a commit to adorilson/cpython that referenced this pull request Mar 13, 2021

bpo-4356: Add key function support to the bisect module (pythonGH-20556)

ac8ef7f

nineteendo mentioned this pull request Apr 13, 2024

TypeError in bisect when called with key argument #117848

Closed

Uh oh!

bpo-4356: Add key function support to the bisect module #20556

bpo-4356: Add key function support to the bisect module #20556

Uh oh!

Conversation

rhettinger commented May 31, 2020 • edited by bedevere-bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

remilapeyre commented May 31, 2020

Uh oh!

rhettinger commented May 31, 2020 • edited by gpshead Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

remilapeyre commented May 31, 2020

Uh oh!

rhettinger commented May 31, 2020

Uh oh!

remilapeyre commented May 31, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

rhettinger commented May 31, 2020

Uh oh!

remilapeyre commented May 31, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

rhettinger commented May 31, 2020

Uh oh!

remilapeyre left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

remilapeyre May 31, 2020

Choose a reason for hiding this comment

Uh oh!

dimaqq Jun 1, 2020

Choose a reason for hiding this comment

Uh oh!

rhettinger Oct 19, 2020

Choose a reason for hiding this comment

Uh oh!

grantjenks Oct 20, 2020

Choose a reason for hiding this comment

Uh oh!

Uh oh!

zzl0 Nov 9, 2020

Choose a reason for hiding this comment

Uh oh!

wpk- commented May 3, 2022

Uh oh!

Uh oh!

rhettinger commented May 31, 2020 •

edited by bedevere-bot

Loading

rhettinger commented May 31, 2020 •

edited by gpshead

Loading

remilapeyre commented May 31, 2020 •

edited

Loading

remilapeyre commented May 31, 2020 •

edited

Loading