{full,zeros,ones}_like typing #6611
Conversation
Looks great, thanks a lot @headtr1ck. I couldn't seem to push to the PR, but I added some annotations to our tests so we can also test the type annotations:
commit cb4c5dd2dac49230220beda3fa72b286c8849ab4
Author: Maximilian Roos <m@maxroos.com>
Date: Sun May 15 14:41:46 2022 -0700
Type check more tests
diff --git a/xarray/tests/test_dataset.py b/xarray/tests/test_dataset.py
index 263237d9..950f15e9 100644
--- a/xarray/tests/test_dataset.py
+++ b/xarray/tests/test_dataset.py
@@ -5559,7 +5559,7 @@ def test_binary_op_join_setting(self):
actual = ds1 + ds2
assert_equal(actual, expected)
- def test_full_like(self):
+ def test_full_like(self) -> None:
# For more thorough tests, see test_variable.py
# Note: testing data_vars with mismatched dtypes
ds = Dataset(
@@ -5572,8 +5572,9 @@ def test_full_like(self):
actual = full_like(ds, 2)
expected = ds.copy(deep=True)
- expected["d1"].values = [2, 2, 2]
- expected["d2"].values = [2.0, 2.0, 2.0]
+ # https://github.com/python/mypy/issues/3004
+ expected["d1"].values = [2, 2, 2] # type: ignore
+ expected["d2"].values = [2.0, 2.0, 2.0] # type: ignore
assert expected["d1"].dtype == int
assert expected["d2"].dtype == float
assert_identical(expected, actual)
@@ -5581,8 +5582,8 @@ def test_full_like(self):
# override dtype
actual = full_like(ds, fill_value=True, dtype=bool)
expected = ds.copy(deep=True)
- expected["d1"].values = [True, True, True]
- expected["d2"].values = [True, True, True]
+ expected["d1"].values = [True, True, True] # type: ignore
+ expected["d2"].values = [True, True, True] # type: ignore
assert expected["d1"].dtype == bool
assert expected["d2"].dtype == bool
assert_identical(expected, actual)
@@ -5788,7 +5789,7 @@ def test_ipython_key_completion(self):
ds.data_vars[item] # should not raise
assert sorted(actual) == sorted(expected)
- def test_polyfit_output(self):
+ def test_polyfit_output(self) -> None:
ds = create_test_data(seed=1)
out = ds.polyfit("dim2", 2, full=False)
@@ -5801,7 +5802,7 @@ def test_polyfit_output(self):
out = ds.polyfit("time", 2)
assert len(out.data_vars) == 0
- def test_polyfit_warnings(self):
+ def test_polyfit_warnings(self) -> None:
ds = create_test_data(seed=1)
with warnings.catch_warnings(record=True) as ws:
diff --git a/xarray/tests/test_variable.py b/xarray/tests/test_variable.py
index 0168f19b..886b0360 100644
--- a/xarray/tests/test_variable.py
+++ b/xarray/tests/test_variable.py
@@ -2480,7 +2480,7 @@ def test_datetime(self):
assert np.ndarray == type(actual)
assert np.dtype("datetime64[ns]") == actual.dtype
- def test_full_like(self):
+ def test_full_like(self) -> None:
# For more thorough tests, see test_variable.py
orig = Variable(
dims=("x", "y"), data=[[1.5, 2.0], [3.1, 4.3]], attrs={"foo": "bar"}
@@ -2503,7 +2503,7 @@ def test_full_like(self):
full_like(orig, True, dtype={"x": bool})
@requires_dask
- def test_full_like_dask(self):
+ def test_full_like_dask(self) -> None:
orig = Variable(
dims=("x", "y"), data=[[1.5, 2.0], [3.1, 4.3]], attrs={"foo": "bar"}
).chunk(((1, 1), (2,)))
@@ -2534,14 +2534,14 @@ def check(actual, expect_dtype, expect_values):
else:
assert not isinstance(v, np.ndarray)
- def test_zeros_like(self):
+ def test_zeros_like(self) -> None:
orig = Variable(
dims=("x", "y"), data=[[1.5, 2.0], [3.1, 4.3]], attrs={"foo": "bar"}
)
assert_identical(zeros_like(orig), full_like(orig, 0))
assert_identical(zeros_like(orig, dtype=int), full_like(orig, 0, dtype=int))
- def test_ones_like(self):
+ def test_ones_like(self) -> None:
orig = Variable(
dims=("x", "y"), data=[[1.5, 2.0], [3.1, 4.3]], attrs={"foo": "bar"}
)
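The `# type: ignore` comments in the diff above work around the long-standing mypy limitation where assignments to a property are checked against the getter's return type rather than a wider setter type (python/mypy#3004). A minimal, self-contained reproduction sketch — the `Box` class and its names are hypothetical, not xarray code:

```python
from typing import List, Sequence, Union


class Box:
    """Toy container whose `values` setter accepts more types than its getter returns."""

    def __init__(self) -> None:
        self._values: List[float] = []

    @property
    def values(self) -> List[float]:
        return self._values

    @values.setter
    def values(self, new: Sequence[Union[int, float]]) -> None:
        # Coerce anything numeric to float, loosely mirroring how an
        # array container's `.values` setter accepts mixed array-likes.
        self._values = [float(x) for x in new]


b = Box()
# mypy (before issue 3004 was addressed) checks this assignment against
# the getter type List[float], so a list of ints triggers a false
# positive that the tests above silence with `# type: ignore`.
b.values = [2, 2, 2]
print(b.values)  # [2.0, 2.0, 2.0]
```

At runtime the assignment is perfectly fine; the ignore comments exist only to appease the type checker.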
@overload
def zeros_like(other: Variable, dtype: DTypeLikeSave) -> Variable:
For these overloads, which are generic over DataArray and Variable, we could define a TypeVar for them (but no need in this PR, and I'm not even sure whether it's a broader issue than just the *_like functions).
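To illustrate the suggestion, here is a minimal sketch; the stub classes and the TypeVar name are hypothetical stand-ins, not xarray's actual definitions:

```python
from typing import TypeVar

# Stand-ins for xarray.Variable / xarray.DataArray, for illustration only
class Variable: ...
class DataArray: ...

# A constrained TypeVar: at each call site it resolves to exactly one
# of the listed classes
T_VarOrArray = TypeVar("T_VarOrArray", Variable, DataArray)


def zeros_like_sketch(other: T_VarOrArray) -> T_VarOrArray:
    # One generic signature replaces two near-identical overloads:
    # a Variable in gives a Variable out, a DataArray gives a DataArray.
    return type(other)()


print(type(zeros_like_sketch(Variable())).__name__)  # Variable
```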
@overload
def zeros_like(
    other: Dataset | DataArray, dtype: DTypeMaybeMapping = None
) -> Dataset | DataArray:
Do you know why this overload is required? I had thought that a DataArray couldn't take a DTypeMaybeMapping? (Same question for the next two.)
I tried to adjust these locally. An issue I hit is that so much of our typing is on Dataset rather than T_Dataset, and changing one seems to require changing its dependencies, which makes it difficult to change gradually. Twenty minutes into trying to change this, I had 31 errors in 6 files!

To the extent you know which parts are perfect vs. not-perfect-but-required-to-pass: if you add a comment on the latter, that will make it easier for future travelers to know why things are as they are, and hopefully to change them.
I was struggling a lot with the TypeVar definitions. Some other functions (like polyval) call zeros_like with DataArray | Dataset. This means that a "simple" TypeVar("T", DataArray, Dataset) will not work.

I tried TypeVar("T", bound=DataArray | Dataset) (recommended by some mypy people), but then the isinstance(x, Dataset) checks were causing problems (still not sure whether that is a mypy bug or intended behavior; my TypeVar knowledge is not good enough to tell)...

So this was the only solution I could get to work.
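A minimal sketch of the failure mode described above (stub classes and function names are illustrative, not xarray code). The runtime behavior is fine; the error is reported only by mypy:

```python
from typing import TypeVar, Union

class Dataset: ...
class DataArray: ...

# Constrained TypeVar: must bind to exactly Dataset or exactly DataArray
T = TypeVar("T", Dataset, DataArray)


def zeros_like_generic(other: T) -> T:
    return type(other)()


def polyval_sketch(coeffs: Union[Dataset, DataArray]) -> Union[Dataset, DataArray]:
    # mypy rejects this call: a value typed as the union matches neither
    # constraint on its own, so T cannot be solved. Hence the explicit
    # Dataset | DataArray overload in this PR instead of a TypeVar.
    return zeros_like_generic(coeffs)  # type: ignore
```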
Additionally, the code will actually create a plain e.g. DataArray, so TypeVars with bounds are actually wrong here: for a subclass input, the return value would not be of the subclass type the bound TypeVar promises.
Yeah. I think the proposed code is a big upgrade, and we can refine towards perfection in the future...
FWIW I found this SO answer helpful in clarifying the difference (I saw that I had upvoted it before), but I'm still not confident in how we should design these methods.
@@ -1917,7 +1917,7 @@ def polyval(
     return res

-def _ensure_numeric(data: T_Xarray) -> T_Xarray:
+def _ensure_numeric(data: Dataset | DataArray) -> Dataset | DataArray:
This is the kind of function that would be nice to make generic at some point; with the proposed code we lose whether the result is a Dataset vs. a DataArray. (Fine to add as a comment / TODO, though.)
I failed to make it work with TypeVars since it is called with DataArray | Dataset :(
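For reference, a generic version could look like the sketch below (stub classes; the TypeVar name echoes the T_Xarray in the diff above). A bound TypeVar does accept union-typed call sites, but, as noted in the discussion, the isinstance narrowing inside the body is where mypy tends to complain:

```python
from typing import TypeVar, Union

class Dataset: ...
class DataArray: ...

# Bound (not constrained) TypeVar: accepts any subtype of the union,
# including a value typed as Dataset | DataArray itself
T_Xr = TypeVar("T_Xr", bound=Union[Dataset, DataArray])


def ensure_numeric_sketch(data: T_Xr) -> T_Xr:
    # The bound TypeVar lets callers holding a Dataset | DataArray call
    # this function while preserving the precise type for callers that
    # hold a concrete class. The snag reported above: after
    # isinstance(data, Dataset), mypy narrows `data` to Dataset, and
    # returning the narrowed value as T_Xr can produce errors.
    if isinstance(data, Dataset):
        return data
    return data


print(type(ensure_numeric_sketch(DataArray())).__name__)  # DataArray
```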
Thanks a lot @headtr1ck! I will hit the auto-merge button!
One thought for any future travelers — if we want to move from
commit 398f1b6
Author: dcherian <deepak@cherian.net>
Date: Fri May 20 08:47:56 2022 -0600

    Backward compatibility dask

commit bde40e4
Merge: 0783df3 4cae8d0
Author: dcherian <deepak@cherian.net>
Date: Fri May 20 07:54:48 2022 -0600

    Merge branch 'main' into dask-datetime-to-numeric

    * main:
      concatenate docs style (pydata#6621)
      Typing for open_dataset/array/mfdataset and to_netcdf/zarr (pydata#6612)
      {full,zeros,ones}_like typing (pydata#6611)

commit 0783df3
Merge: 5cff4f1 8de7061
Author: dcherian <deepak@cherian.net>
Date: Sun May 15 21:03:50 2022 -0600

    Merge branch 'main' into dask-datetime-to-numeric

    * main: (24 commits)
      Fix overflow issue in decode_cf_datetime for dtypes <= np.uint32 (pydata#6598)
      Enable flox in GroupBy and resample (pydata#5734)
      Add setuptools as dependency in ASV benchmark CI (pydata#6609)
      change polyval dim ordering (pydata#6601)
      re-add timedelta support for polyval (pydata#6599)
      Minor Dataset.map docstr clarification (pydata#6595)
      New inline_array kwarg for open_dataset (pydata#6566)
      Fix polyval overloads (pydata#6593)
      Restore old MultiIndex dropping behaviour (pydata#6592)
      [docs] add Dataset.assign_coords example (pydata#6336) (pydata#6558)
      Fix zarr append dtype checks (pydata#6476)
      Add missing space in exception message (pydata#6590)
      Doc Link to accessors list in extending-xarray.rst (pydata#6587)
      Fix Dataset/DataArray.isel with drop=True and scalar DataArray indexes (pydata#6579)
      Add some warnings about rechunking to the docs (pydata#6569)
      [pre-commit.ci] pre-commit autoupdate (pydata#6584)
      terminology.rst: fix link to Unidata's "netcdf_dataset_components" (pydata#6583)
      Allow string formatting of scalar DataArrays (pydata#5981)
      Fix mypy issues & reenable in tests (pydata#6581)
      polyval: Use Horner's algorithm + support chunked inputs (pydata#6548)
      ...

commit 5cff4f1
Merge: dfe200d 6144c61
Author: Maximilian Roos <5635139+max-sixty@users.noreply.github.com>
Date: Sun May 1 15:16:33 2022 -0700

    Merge branch 'main' into dask-datetime-to-numeric

commit dfe200d
Author: dcherian <deepak@cherian.net>
Date: Sun May 1 11:04:03 2022 -0600

    Minor cleanup

commit 35ed378
Author: dcherian <deepak@cherian.net>
Date: Sun May 1 10:57:36 2022 -0600

    Support dask arrays in datetime_to_numeric
(partial) typing for functions
full_like
,zeros_like
,ones_like
.I could not figure out how to properly use TypeVars so many things are "hardcoded" with overloads.
I have added a
DTypeLikeSave
tonpcompat
, not sure that this file is supposed to be edited.Problem1:
TypeVar["T", Dataset, DataArray, Variable]
can only be one of these three, but never withUnion[Dataset, DataArray]
which is used in several other places in xarray.Problem2: The official mypy support says: use
TypeVar["T", bound=Union[Dataset, DataArray, Variable]
but the theisinstance(obj, Dataset)
could not be correctly resolved (is that a mypy issue?).So if anyone can get it to work with TypeVars, feel free to change it. :)