-
Notifications
You must be signed in to change notification settings - Fork 54
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Switch Index to be a frozenset #257
Switch Index to be a frozenset #257
Conversation
pangeo_forge_recipes/patterns.py
Outdated
Index = FrozenSet[DimIndex] | ||
else: | ||
# But if we just do this, it won't pass mypy 😖 | ||
Index = frozenset[DimIndex] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tests have shown that this does not work on python 3.8, even with from __future__ import annotations
Ok I think I found the solution: class Index(FrozenSet[DimIndex]):
pass |
The upstream-dev tests have surfaced a new issue
Apparently the way we are constructing high-level graphs is not compatible with the latest dask release. I'm inclined to merge this, let the master tests fail, and open a new issue for it. |
/run-test-tutorials |
👍 upstream-dev tests ftw |
Out of curiosity, is there a one-ish sentence way to encapsulate the advantage of a frozenset vs. a tuple here? I wasn't able to immediately grasp the motivation from the linked xarray-beam thread. |
class Index(tuple): | ||
"""A tuple of ``DimIndex`` objects. | ||
The order of the indexes doesn't matter for comparision.""" | ||
|
||
def __new__(self, args: Iterable[DimIndex]): | ||
# This validation really slows things down because we call Index a lot! | ||
# if not all((isinstance(a, DimIndex) for a in args)): | ||
# raise ValueError("All arguments must be DimIndex.") | ||
# args_set = set(args) | ||
# if len(set(args_set)) < len(tuple(args)): | ||
# raise ValueError("Duplicate argument detected.") | ||
return tuple.__new__(Index, args) | ||
|
||
def __str__(self): | ||
return ",".join(str(dim) for dim in self) | ||
|
||
def __eq__(self, other): | ||
return (set(self) == set(other)) and (len(self) == len(other)) | ||
|
||
def __hash__(self): | ||
return hash(frozenset(self)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This class was already basically doing the same thing as a frozenset! So this change simplifies our Index type--a key data structure--to use a built-in type. Simpler and less code to maintain.
For context, this is a tiny part of the changes that will move us in the direction of #256.
Simplification suggested by Stephan in google/xarray-beam#32 (comment).
Update:
I was amazed how simple it was to make this change. I just swapped our our
Index
class withand mostly everything just worked, requiring no change to how indexes are created (e.g.
idx = Index((some, stuff))
). That was a nice surprise.However, this PR has turned up some nasty mypy-related stuff, and also showed that that simple approach doesn't work in 3.8. In 3.8, we would need to do something like
However, if we do that, we get the error
TypeError: Type FrozenSet cannot be instantiated; use frozenset() instead
when we try to instantiate the Index.I would like to say we will just drop python 3.8. However, afaik, beam does not yet work with 3.9. So it looks like we will need some workarounds.
Does anyone know the best way to solve this?