-
-
Notifications
You must be signed in to change notification settings - Fork 31.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
bpo-4356: Add key function support to the bisect module #20556
Conversation
Hi @rhettinger, was there anything wrong with #11781 ? |
Hmm, I didn't see that PR until after this one was made. Looking at it now, I prefer this one which has cleaner code, better docs updates, and focuses on key and leaves out reversed which was not requested by the OP (if that turns out to be needed, it can be added later; for now, it is better to avoid doubling the code complexity). |
I don't care which version gets merged but regarding the reversed argument you said in the review:
|
Also, some of the tests should be combined. Both PRs have tests that the others don't. Am unsure about reversed. It seems like a nice to have, but looking at the other PR, it doubles the code complexity and that spills into the docs/tests as well. What do you think? |
For reversed, most users that know they need If it's ever needed you could add it later but since it increases the complexity without adding functionality, it may be best not to add it.
Edit: Nevermind, I read the doc better and it's actually here. |
BTW, do you know how to appease the buildbots for the doc build. I'm unclear on what they deem "suspicious" and how to clear the blocker. |
You can fix the spurious error os "suspicious" by removing lines 114 and 115 of
|
Thanks. Apologies again for duplicating work. I completely forget about the other PR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apologies again for duplicating work. I completely forget about the other PR.
Don't worry, at least it's great the issue can be closed!
if key is None: | ||
lo = bisect_right(a, x, lo, hi) | ||
else: | ||
lo = bisect_right(a, key(x), lo, hi, key=key) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find suspicious that key(x)
is called here and that it's not handled by bisect_right()
, e.g. I would have expected this to work:
>>> from bisect import bisect_left
>>> scores = [('Alice', 30), ('Bob', 20), ('Charles', 10)]
>>> bisect_left(scores, ('David', 25), key=lambda s: -s[1])
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: '<' not supported between instances of 'int' and 'tuple'
Is this on purpose?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I recall the discussion in the other PR whether the "searched for item" should be an argument to key
or a value produced by key
.
It's a bit clearer in the case of insort_*
functions because array member is needed for insertion, thus, key will be applied to this value, thus the latter above.
Perhaps it's easier to disambiguate in the docs by mentioning how the key
will be used, and/or talk about type or shape of key and array members?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the API is correct in that it supports typical usage patterns. insort() will insert entire records and bisect() will scan the key field for the first match in a range at or above the key value.
In SQL, the pattern would look like this:
CREATE INDEX age_ndx INDEX ON People (age);
INSERT INTO People Values ("raymond", 50);
/* Find the youngest person at or over 42 */
SELECT name, age FROM People WHERE age >= 42 ORDER BY age LIMIT 1;
In Python, the example would look like this:
from bisect import bisect, insort
from operator import attrgetter
from collections import namedtuple
from pprint import pp
Person = namedtuple('Person', ('name', 'age'))
people = [
Person('tom', 30),
Person('sue', 40),
Person('mark', 35),
Person('peter', 55),
]
people.sort(key=attrgetter('age'))
new_person = Person('randy', 50)
insort(people, new_person, key=attrgetter('age'))
pp(people)
print(people[bisect(people, 42, key=attrgetter('age'))])
If bisect() required an entire record as an input, it would preclude the ability to do searches like like this.
Here's another example that we would want to support:
# https://www.quickenloans.com/blog/federal-income-tax-brackets
Bracket = namedtuple('Bracket', ('rate', 'single', 'married_joint', 'married_sep', 'head'))
brackets_2019 = [
Bracket(0.10, 9_700, 19_400, 9_700, 13_850),
Bracket(0.12, 39_475, 78_950, 39_475, 52_850),
Bracket(0.22, 84_200, 168_400, 84_200, 84_200),
Bracket(0.24, 160_725, 321_450, 160_725, 160_700),
]
taxable_income = 60_000
for status in ('single', 'married_joint', 'married_sep', 'head'):
i = bisect(brackets_2019, taxable_income, key=attrgetter(status))
bracket = brackets_2019[i]
print(status, bracket.rate)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll echo what Raymond explained. It would be more useful to a library like sortedcontainers if the argument to bisect was the field value rather than the record. After all, the key function can be applied to the record to get the field value if need be. And there are many cases where constructing the record may be expensive or impossible just to determine rank within a list. The field value API is compatible with having the record and key function but not the other way around.
The SortedKeyList data type provides both bisect_left
and bisect_key_left
variants. The bisect_left
variant simply applies the key function and calls bisect_key_left
so I think the API specifying the field value is more generically useful.
@@ -16,7 +16,8 @@ module _bisect | |||
_Py_IDENTIFIER(insert); | |||
|
|||
static inline Py_ssize_t | |||
internal_bisect_right(PyObject *list, PyObject *item, Py_ssize_t lo, Py_ssize_t hi) | |||
internal_bisect_right(PyObject *list, PyObject *item, Py_ssize_t lo, Py_ssize_t hi, | |||
PyObject* key) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rhettinger thank you so much for adding key
function parameter. Just a small typo here: PyObject* key
-> PyObject *key
Somehow |
https://bugs.python.org/issue4356