Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use dict keys for order-preserving dedupes instead of set + list #15105

Merged
merged 8 commits into from
Apr 23, 2023

Conversation

adriangb
Copy link
Contributor

@adriangb adriangb commented Apr 23, 2023

I figured this might be faster and less code. Also just curious what the process is like to contribute to mypy.

Comment on lines 823 to 827
# Use a dict for O(1) lookups that preserve order; values are ignored
seen: dict[RType, Any] = {}
for item in items:
if item not in seen:
new_items.append(item)
seen.add(item)
if len(new_items) > 1:
return RUnion(new_items)
seen[item] = None
Copy link
Member

@AlexWaygood AlexWaygood Apr 23, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This for loop could be written more simply as:

seen = dict.fromkeys(items)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just pushed this, I swear you read my mind :)

return RUnion(new_items)
seen[item] = None
if len(seen) > 1:
return RUnion(list(seen.keys()))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to call .keys() here, since dictionaries are perfectly iterable :)

Suggested change
return RUnion(list(seen.keys()))
return RUnion(list(seen))

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yep I'll blame the time

mypyc/ir/rtypes.py Outdated Show resolved Hide resolved
Comment on lines 823 to 827
unique_items = list(dict.fromkeys(items))
if len(unique_items) > 1:
return RUnion(unique_items)
else:
return new_items[0]
return unique_items[0]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No idea if it actually makes a difference in terms of performance, but I feel like I mildly preferred your earlier idea of only casting it to a list if it's actually necessary, i.e.:

        unique_items = dict.fromkeys(items)
        if len(unique_items) > 1:
            return RUnion(list(unique_items))
        else:
            return next(iter(unique_items))

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems to indeed have a small impact:
image

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So yeah reverted to that

Copy link
Collaborator

@hauntsaninja hauntsaninja left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great, thank you!

(note the make_simplified_union I mention in #15104 is the one in typeops.py, specifically _remove_redundant_union_items. The same kind of idea in #15104 can be applied, but the truthiness logic adds an extra wrinkle that is poorly tested e.g. #15098)

@adriangb
Copy link
Contributor Author

I figured I might be in the wrong place altogether but this was low-hanging fruit so I decided to pick it. I'll try to take a look at _remove_redundant_union_items tomorrow.

Copy link
Member

@AlexWaygood AlexWaygood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great to me!

@hauntsaninja hauntsaninja merged commit 315b466 into python:master Apr 23, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants