Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Using a key for the heapq functions #106570

Closed
amrear opened this issue Jul 9, 2023 · 1 comment
Closed

Using a key for the heapq functions #106570

amrear opened this issue Jul 9, 2023 · 1 comment
Labels
type-feature A feature request or enhancement

Comments

@amrear
Copy link

amrear commented Jul 9, 2023

Feature or enhancement

The ability to use a key in order to heappop, heappush, heapify, etc. can come in handy in situations where the objects that are being pushed to the heap are not comparable. This functionality is similar to the key argument of the sorted function. It is up to the developer to keep the key argument consistent where more than one heapq function is called on a single heap. This functionality can be propagated to the PriorityQueue of the queue library which uses heapq under the hood.

@amrear amrear added the type-feature A feature request or enhancement label Jul 9, 2023
@rhettinger
Copy link
Contributor

rhettinger commented Jul 9, 2023

Internally, the heapq functions do many comparisons per call and we would end-up calling the key function many times for the same key even in a sing heapify call. This isn't efficient and is at odds with the expectation (from sorted, min, max, nsmallest, nlargest, merge, and groupby) that a key function will be called at most once per key.

Instead of a key function, the docs recommend, "the solution to the problem of non-comparable tasks is to create a wrapper class that ignores the task item and only compares the priority field". The docs then provides this simple recipe:

@dataclass(order=True)
class PrioritizedItem:
    priority: int
    item: Any=field(compare=False)

That allows the priorities to be made explicit, for the priority to be computed only once, for the priority to be made visible via the __repr__, and for the basic heap algorithms to be used as-is.

If we went down the path of adding a key-function, it creates an unexpected efficiency trap and creates a need to add caching to the key-function which has its own challenges.

Also, by having the user create heaps with prioritized inputs, it is easy to have multiple heaps, each with a different key (heapify by date or heapify by cost). In contrast, a key-function would be error-prone, creating the risk of heapifying with one key-function and popping/pushing with another.

Note that we did add key functions for nsmallest, nlargest, and merge because it was possible to avoid the above issues by pre-computing the key function exactly once per key. We intentionally excluded the heappush and heappop functions because they are too fine-grained and low level to incomporate compute-once logic.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type-feature A feature request or enhancement
Projects
None yet
Development

No branches or pull requests

2 participants