Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Title: Add Optional Comparator Support to heapq for Enhanced Flexibility #127328

Open
Agent-Hellboy opened this issue Nov 27, 2024 · 5 comments
Open
Labels
extension-modules C modules in the Modules dir pending The issue will be closed if no feedback is provided stdlib Python modules in the Lib dir type-feature A feature request or enhancement

Comments

@Agent-Hellboy
Copy link
Contributor

Agent-Hellboy commented Nov 27, 2024

Feature or enhancement

Proposal:

Currently, the heapq module in CPython uses a fixed implementation based on the default comparison behavior of Python objects. However, this design restricts the usability of heapq in scenarios where users need a custom ordering for their data.

Proposal
Introduce an optional comparator parameter to the heapq module to allow greater flexibility in heap operations. This would eliminate the need for users to create additional wrapper objects or manage unnecessary cognitive overhead for simple use cases (e.g., custom ordering for coordinates, tuples, or complex data types).

Motivation
1 Reduce Cognitive Load: Users currently must wrap data in objects with custom dunder methods (lt, etc.) or use workarounds like storing tuples with artificially ordered keys. This is an unnecessary complexity for scenarios that require only custom sorting logic.
2 Consistency: Other priority queue implementations in programming languages (e.g., Java's PriorityQueue or C++'s std::priority_queue) support custom comparators directly.
3 Maybe Flexibility
4 Instead of explicitly including extra information like distance in the tuple (as in the Dijkstra example), we could pass the raw data along with a comparator. The comparator would determine which element to pop and where to place it in the heap based on the desired priority.

Add an optional key or cmp argument to the following functions in the heapq module:

1 heapify
2 heappush
3 heappop
4 heappushpop
5 heapreplace
6 merge
The comparator could function similarly to the key argument in sorted(), where users provide a callable to define custom ordering logic.

Example Usage: Coordinates

import heapq

data = [(3, 4), (1, 2), (5, 6)]

heap = []
heapq.heapify(data, key=lambda x: x[0] + x[1])
heapq.heappush(heap, (2, 3), key=lambda x: x[0] + x[1])
min_item = heapq.heappop(heap)
print(min_item)  # Expected: (1, 2)

Example Usage: Dijkstra’s Algorithm
Dijkstra’s algorithm for finding the shortest path in a graph often requires a priority queue with custom ordering based on the distance from the source node. The lack of a built-in comparator in heapq makes the implementation less intuitive.

import heapq

def dijkstra(graph, start):
    # Priority queue for nodes based on their current shortest distance
    heap = []
    heapq.heappush(heap, (0, start))  # (distance, node)
    
    distances = {node: float('inf') for node in graph}
    distances[start] = 0

    while heap:
        current_distance, current_node = heapq.heappop(heap)

        # Skip processing if a shorter path has already been found
        if current_distance > distances[current_node]:
            continue

        for neighbor, weight in graph[current_node].items():
            distance = current_distance + weight
            if distance < distances[neighbor]:
                distances[neighbor] = distance
                heapq.heappush(heap, (distance, neighbor))
    
    return distances

graph = {
    'A': {'B': 1, 'C': 4},
    'B': {'A': 1, 'C': 2, 'D': 5},
    'C': {'A': 4, 'B': 2, 'D': 1},
    'D': {'B': 5, 'C': 1}
}

distances = dijkstra(graph, 'A')
print(distances)  # Output: {'A': 0, 'B': 1, 'C': 3, 'D': 4}

In this example:
The heap uses a tuple of (distance, node) to prioritize nodes based on their distance from the source.
Adding an optional comparator would eliminate the need for tuples and make the code cleaner and more intuitive.

The discussion on Discourse is quite messy, so I created this issue here to gather more focused feedback from developers involved in CPython development, especially the professors(I guess they will agree to it :)). If there’s agreement on this approach, I’m happy to work on implementing it, as I’ve already tested it by modifying the .py file.

Has this already been discussed elsewhere?

I have already discussed this feature proposal on Discourse

Links to previous discussion of this feature:

Discussion happened here
https://discuss.python.org/t/create-new-package-similar-to-heapq-but-be-able-to-pass-custom-comparator-through-a-constructor/72136

@Agent-Hellboy Agent-Hellboy added the type-feature A feature request or enhancement label Nov 27, 2024
@picnixz
Copy link
Contributor

picnixz commented Nov 27, 2024

I think we rejected this idea in #106570 (I didn't go through the dpo post though so maybe we can revisit this now).

cc @rhettinger

@picnixz picnixz added stdlib Python modules in the Lib dir extension-modules C modules in the Modules dir pending The issue will be closed if no feedback is provided labels Nov 27, 2024
@Agent-Hellboy
Copy link
Contributor Author

Oh, okay. I read his argument for rejection, and I have a few questions.

the heapq functions do many comparisons per call and we would end-up calling the key function many times for the same key even in a sing heapify call

It would be an optional argument and used only by those who are ready to accept the performance penalty.

Also, by having the user create heaps with prioritized inputs, it is easy to have multiple heaps, each with a different key (heapify by date or heapify by cost). In contrast, a key-function would be error-prone, creating the risk of heapifying with one key-function and popping/pushing with another.

Why do we care? It's an optional parameter, and people are worried about the invalidation of the heap invariant on Discourse as well. However, this is not a language construct used by everyone where we need to worry about code flow invalidation. It’s an optional thing, and users should know what it will do to their code.

We intentionally excluded the heappush and heappop functions because they are too fine-grained and low level to incomporate compute-once logic.

What does "too low-level and fine-grained" mean here? These are public APIs of heapq.

@picnixz
Copy link
Contributor

picnixz commented Nov 27, 2024

It would be an optional argument and used only by those who are ready to accept the performance penalty.

This kind of mindset could work for third party libs but I'm not sure the standard library aims to be flexible here. Note that heaps in other languages such as C++ do allow you to specify your own comparison operator so I personally would like this feature in Python as well.

Why do we care? It's an optional parameter, and people are worried about the invalidation of the heap invariant on Discourse as well.

Keep in mind that the code to modify is not only the pure Python code but also the C code which then requires additional handling. We could say "ok here's the very flexible API and deal with it" but it appears that we don't want to be too flexible.

What does "too low-level and fine-grained" mean here

Either it's because it's meant to rely on C accelerators or it's because it's to say that the operation is a low-level one in the heapq algorithm (think of it as an "basic operation" like an addition or a multiplication when you're doing complex arithmetic).

I'll let Raymond confirm my understandings.

@Agent-Hellboy
Copy link
Contributor Author

Agent-Hellboy commented Nov 27, 2024

This kind of mindset could work for third party libs but I'm not sure the standard library aims to be flexible here.

fair enough. got you.

it's because it's meant to rely on C accelerators

can I see one example? is it an extension-level thing or an interpreter/runtime?
did you mean things like https://github.com/python/cpython/blob/main/Modules/_heapqmodule.c#L315

@picnixz
Copy link
Contributor

picnixz commented Nov 27, 2024

Yes this is what I meant. Maintenance of the C code is something we also consider when adding new features. Usually the pure Python code is only there as a fallback and the C part is usually the one that dictates whether there will be a maintenance burden or not.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
extension-modules C modules in the Modules dir pending The issue will be closed if no feedback is provided stdlib Python modules in the Lib dir type-feature A feature request or enhancement
Projects
None yet
Development

No branches or pull requests

2 participants