Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bpo-4356: Add key function support to the bisect module #20556

Merged
merged 7 commits into from
Oct 20, 2020

Conversation

rhettinger
Copy link
Contributor

@rhettinger rhettinger commented May 31, 2020

@remilapeyre
Copy link
Contributor

Hi @rhettinger, was there anything wrong with #11781 ?

@rhettinger
Copy link
Contributor Author

rhettinger commented May 31, 2020

Hmm, I didn't see that PR until after this one was made.

Looking at it now, I prefer this one which has cleaner code, better docs updates, and focuses on key and leaves out reversed which was not requested by the OP (if that turns out to be needed, it can be added later; for now, it is better to avoid doubling the code complexity).

@remilapeyre
Copy link
Contributor

I don't care which version gets merged but regarding the reversed argument you said in the review:

FWIW, if key-function support is added, it is inevitable that reversed support will be requested (since sorted() and bisect() are often used together).

@rhettinger
Copy link
Contributor Author

Also, some of the tests should be combined. Both PRs have tests that the others don't.

Am unsure about reversed. It seems like a nice to have, but looking at the other PR, it doubles the code complexity and that spills into the docs/tests as well. What do you think?

@remilapeyre
Copy link
Contributor

remilapeyre commented May 31, 2020

For reversed, most users that know they need bisect should be able to write the key function so that it gives the correct order directly and don't need the help of reversed. It's not part of the original request and complicates both the Python and C code.

If it's ever needed you could add it later but since it increases the complexity without adding functionality, it may be best not to add it.

There is a note recommending lru_cache() (maybe just cache() now) which I probably would not have thought of and may be worth keeping:

When specifying a custom key function, you should wrap it with
:func:functools.lru_cache if the key function is not already fast.

Edit: Nevermind, I read the doc better and it's actually here.

@rhettinger
Copy link
Contributor Author

BTW, do you know how to appease the buildbots for the doc build. I'm unclear on what they deem "suspicious" and how to clear the blocker.

@remilapeyre
Copy link
Contributor

remilapeyre commented May 31, 2020

You can fix the spurious error os "suspicious" by removing lines 114 and 115 of Doc/tools/susp-ignored.csv:

library/bisect,32,:hi,all(val >= x for val in a[i:hi])
library/bisect,42,:hi,all(val > x for val in a[i:hi])

@rhettinger
Copy link
Contributor Author

Thanks.

Apologies again for duplicating work. I completely forget about the other PR.

Copy link
Contributor

@remilapeyre remilapeyre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apologies again for duplicating work. I completely forget about the other PR.

Don't worry, at least it's great the issue can be closed!

if key is None:
lo = bisect_right(a, x, lo, hi)
else:
lo = bisect_right(a, key(x), lo, hi, key=key)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find suspicious that key(x) is called here and that it's not handled by bisect_right(), e.g. I would have expected this to work:

>>> from bisect import bisect_left
>>> scores = [('Alice', 30), ('Bob', 20), ('Charles', 10)]
>>> bisect_left(scores, ('David', 25), key=lambda s: -s[1])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: '<' not supported between instances of 'int' and 'tuple'

Is this on purpose?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I recall the discussion in the other PR whether the "searched for item" should be an argument to key or a value produced by key.

It's a bit clearer in the case of insort_* functions because array member is needed for insertion, thus, key will be applied to this value, thus the latter above.

Perhaps it's easier to disambiguate in the docs by mentioning how the key will be used, and/or talk about type or shape of key and array members?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the API is correct in that it supports typical usage patterns. insort() will insert entire records and bisect() will scan the key field for the first match in a range at or above the key value.

In SQL, the pattern would look like this:

CREATE INDEX age_ndx INDEX ON People (age);
INSERT INTO People Values ("raymond", 50);
/* Find the youngest person at or over 42 */
SELECT name, age FROM People WHERE age >= 42 ORDER BY age LIMIT 1;

In Python, the example would look like this:

from bisect import bisect, insort
from operator import attrgetter
from collections import namedtuple
from pprint import pp

Person = namedtuple('Person', ('name', 'age'))

people = [
    Person('tom', 30),
    Person('sue', 40),
    Person('mark', 35),
    Person('peter', 55),
]

people.sort(key=attrgetter('age'))
new_person = Person('randy', 50)
insort(people, new_person, key=attrgetter('age'))
pp(people)
print(people[bisect(people, 42, key=attrgetter('age'))])

If bisect() required an entire record as an input, it would preclude the ability to do searches like like this.

Here's another example that we would want to support:

# https://www.quickenloans.com/blog/federal-income-tax-brackets
Bracket = namedtuple('Bracket', ('rate', 'single', 'married_joint', 'married_sep', 'head'))
brackets_2019 = [
    Bracket(0.10, 9_700, 19_400, 9_700, 13_850),
    Bracket(0.12, 39_475, 78_950, 39_475, 52_850),
    Bracket(0.22, 84_200, 168_400, 84_200, 84_200),
    Bracket(0.24, 160_725, 321_450, 160_725, 160_700),
]
taxable_income = 60_000
for status in ('single', 'married_joint', 'married_sep', 'head'):
    i = bisect(brackets_2019, taxable_income, key=attrgetter(status))
    bracket = brackets_2019[i]
    print(status, bracket.rate)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll echo what Raymond explained. It would be more useful to a library like sortedcontainers if the argument to bisect was the field value rather than the record. After all, the key function can be applied to the record to get the field value if need be. And there are many cases where constructing the record may be expensive or impossible just to determine rank within a list. The field value API is compatible with having the record and key function but not the other way around.

The SortedKeyList data type provides both bisect_left and bisect_key_left variants. The bisect_left variant simply applies the key function and calls bisect_key_left so I think the API specifying the field value is more generically useful.

@rhettinger rhettinger merged commit 871934d into python:master Oct 20, 2020
@@ -16,7 +16,8 @@ module _bisect
_Py_IDENTIFIER(insert);

static inline Py_ssize_t
internal_bisect_right(PyObject *list, PyObject *item, Py_ssize_t lo, Py_ssize_t hi)
internal_bisect_right(PyObject *list, PyObject *item, Py_ssize_t lo, Py_ssize_t hi,
PyObject* key)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rhettinger thank you so much for adding key function parameter. Just a small typo here: PyObject* key -> PyObject *key

adorilson pushed a commit to adorilson/cpython that referenced this pull request Mar 13, 2021
@wpk-
Copy link

wpk- commented May 3, 2022

Somehow bisect(lst, fn(x), key=fn) feels very uncomfortable. After all, we're looking to find the insertion point for x in lst, rather than fn(x). I'd really expect the syntax to be bisect(lst, x, key=fn). Would the Python devs be open for such a change?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants