Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable pandas type checking #9213

Merged
merged 28 commits into from
Jul 17, 2024
Merged

Enable pandas type checking #9213

merged 28 commits into from
Jul 17, 2024

Conversation

headtr1ck
Copy link
Collaborator

@headtr1ck headtr1ck commented Jul 7, 2024

Since pandas-stubs is making good progress, I would say it is slowly advantageous to enable type checking for it.
About 200 mypy errors have to be fixed....

  • User visible changes (including notable bug fixes) are documented in whats-new.rst

@headtr1ck
Copy link
Collaborator Author

It seems that mypy is now resolving the operators more precisely and also it seems that this causes some issues with the current implementation.

It looks like we have to completely rework them... :/
This is gonna take some time to get it right!

@headtr1ck
Copy link
Collaborator Author

After some debugging I noticed that DatasetGroupby + Dataset-subclass will return Dataset and not Dataset-subclass.
So our current typing of Self is wrong.

I think refactoring the code to actually return the subclass is blocked by the Coordinates refactor...

@max-sixty
Copy link
Collaborator

It seems that mypy is now resolving the operators more precisely and also it seems that this causes some issues with the current implementation.

It looks like we have to completely rework them... :/

For the pandas items or for all? 🫨

FWIW IIRC the latest mypy fails on main (though that could be a much much smaller & separate change...)

@max-sixty
Copy link
Collaborator

After some debugging I noticed that DatasetGroupby + Dataset-subclass will return Dataset and not Dataset-subclass.
So our current typing of Self is wrong.

I remember hitting similar things in the past. Ideally we'd return the subclass; but to the extent we don't I would heavily deprioritize any big effort to change the typing to state we return Dataset — instead I would vote to push it off until we can return the subclass...

@headtr1ck
Copy link
Collaborator Author

For the pandas items or for all? 🫨

I think I have fixed all mypy issues related to pandas.

FWIW IIRC the latest mypy fails on main (though that could be a much much smaller & separate change...)

I though of quickly fixing these issues here as well, turns out not to be so easy...

@headtr1ck
Copy link
Collaborator Author

Issue seems to be in xarray.core.types with

    try:
        from dask.array import Array as DaskArray
    except ImportError:
        DaskArray = np.ndarray

Somehow that gets evaluated to Any, which in turn makes mypy think that DataArray + anything = DataArray, even when this "anything" equals a Dataset/DatasetGroupby...

If I try to recreate this locally, I don't get Any, so no idea what is going on there...

Done for today, lets see how one solves this issue.

@headtr1ck
Copy link
Collaborator Author

After hours of trying to figure out what is going on, I am close to giving up :/
If I create a file xarray.core.types2.py with the exact same content as xarray.core.types.py, the DaskArray gets correctly resolved in types2 but not types.... WTH is going on?
Is that a mypy bug???? :/

@max-sixty
Copy link
Collaborator

That doesn't sound fun...

If I create a file xarray.core.types2.py with the exact same content as xarray.core.types.py, the DaskArray gets correctly resolved in types2 but not types.... WTH is going on?

Where are the errors from types.py? I don't see them in https://github.com/pydata/xarray/actions/runs/9845800832/job/27182145496?pr=9213#step:8:58

@headtr1ck
Copy link
Collaborator Author

Where are the errors from types.py? I don't see them in https://github.com/pydata/xarray/actions/runs/9845800832/job/27182145496?pr=9213#step:8:58

As I was saying some comments up, the mypy errors are the ones in _typed_ops. But these are only a symptom not the root of the error...

Mainly: Variable + Any = Variable, but should not be the case for e.g. Dataset.
This is because we type the "other" as Union[ArrayLike, Variable, DaskArray], but since DaskArray = Any thats what happens...

@max-sixty
Copy link
Collaborator

Something does seem confusing with DaskArray.

One thing that does seem to fix almost all errors is just excluding DaskArray. Arguably even if we can't improve from here this would be a quite worthwhile tradeoff. In particular there is a lot of code here it would be great to merge before we get conflicts etc, to the extent we can push dask typing precision off...

diff --git a/xarray/core/types.py b/xarray/core/types.py
index 20d94de5..5b231780 100644
--- a/xarray/core/types.py
+++ b/xarray/core/types.py
@@ -178,7 +178,8 @@ def copy(
 T_ExtensionArray = TypeVar("T_ExtensionArray", bound=pd.api.extensions.ExtensionArray)
 
 
-ScalarOrArray = Union["ArrayLike", np.generic, np.ndarray, "DaskArray"]
+# ScalarOrArray = Union[ArrayLike, np.generic, np.ndarray, DaskArray]
+ScalarOrArray = Union[ArrayLike, np.generic, np.ndarray]
 VarCompatible = Union["Variable", "ScalarOrArray"]
 DaCompatible = Union["DataArray", "VarCompatible"]
 DsCompatible = Union["Dataset", "DaCompatible"]

FWIW I also get weirdness around using dmypy with these — after a few changes it reports quite different results than mypy. Maybe it's a version thing...

@headtr1ck
Copy link
Collaborator Author

@max-sixty thank you! That was the solution, and it does not even matter, because a DaskArray is ArrayLike anyway!
Wow, that was some wasted hours.
Still I believe that this is some kind of mypy bug, but too difficult to trace!

@headtr1ck headtr1ck marked this pull request as ready for review July 12, 2024 19:23
@headtr1ck
Copy link
Collaborator Author

remaining mypy errors are from numpy2, see #9231

Copy link
Collaborator

@max-sixty max-sixty left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very impressive @headtr1ck ! Big lift!

@headtr1ck headtr1ck added the plan to merge Final call for comments label Jul 15, 2024
@headtr1ck headtr1ck enabled auto-merge (squash) July 16, 2024 18:40
@dcherian dcherian disabled auto-merge July 17, 2024 04:00
@dcherian dcherian merged commit 71fce9b into pydata:main Jul 17, 2024
26 of 28 checks passed
@dcherian
Copy link
Contributor

Force-merged since the auto-merge won't work till mypy on main is fixed.

dcherian added a commit to dcherian/xarray that referenced this pull request Jul 17, 2024
* main:
  Enable pandas type checking (pydata#9213)
  Per-variable specification of boolean parameters in open_dataset (pydata#9218)
  test push
  Added a space to the documentation (pydata#9247)
  Fix typing for test_plot.py (pydata#9234)
  Allow mypy to run in vscode (pydata#9239)
  Revert "Test main push"
  Test main push
  Revert "Update _typing.py"
  Update _typing.py
  Add a `.drop_attrs` method (pydata#8258)
@headtr1ck headtr1ck deleted the pandas-stubs branch July 18, 2024 18:51
dcherian added a commit that referenced this pull request Jul 22, 2024
* main:
  add backend intro and how-to diagram (#9175)
  Fix copybutton for multi line examples in double digit ipython cells (#9264)
  Update signature for _arrayfunction.__array__ (#9237)
  Add encode_cf_datetime benchmark (#9262)
  groupby, resample: Deprecate some positional args (#9236)
  Delete ``base`` and ``loffset`` parameters to resample (#9233)
  Update dropna docstring (#9257)
  Grouper, Resampler as public api (#8840)
  Fix mypy on main (#9252)
  fix fallback isdtype method (#9250)
  Enable pandas type checking (#9213)
  Per-variable specification of boolean parameters in open_dataset (#9218)
  test push
  Added a space to the documentation (#9247)
  Fix typing for test_plot.py (#9234)
dcherian added a commit that referenced this pull request Jul 24, 2024
* main: (54 commits)
  Adding `open_datatree` backend-specific keyword arguments (#9199)
  [pre-commit.ci] pre-commit autoupdate (#9202)
  Restore ability to specify _FillValue as Python native integers (#9258)
  add backend intro and how-to diagram (#9175)
  Fix copybutton for multi line examples in double digit ipython cells (#9264)
  Update signature for _arrayfunction.__array__ (#9237)
  Add encode_cf_datetime benchmark (#9262)
  groupby, resample: Deprecate some positional args (#9236)
  Delete ``base`` and ``loffset`` parameters to resample (#9233)
  Update dropna docstring (#9257)
  Grouper, Resampler as public api (#8840)
  Fix mypy on main (#9252)
  fix fallback isdtype method (#9250)
  Enable pandas type checking (#9213)
  Per-variable specification of boolean parameters in open_dataset (#9218)
  test push
  Added a space to the documentation (#9247)
  Fix typing for test_plot.py (#9234)
  Allow mypy to run in vscode (#9239)
  Revert "Test main push"
  ...
dcherian added a commit to JoelJaeschke/xarray that referenced this pull request Jul 25, 2024
…monotonic-variable

* main: (995 commits)
  Adding `open_datatree` backend-specific keyword arguments (pydata#9199)
  [pre-commit.ci] pre-commit autoupdate (pydata#9202)
  Restore ability to specify _FillValue as Python native integers (pydata#9258)
  add backend intro and how-to diagram (pydata#9175)
  Fix copybutton for multi line examples in double digit ipython cells (pydata#9264)
  Update signature for _arrayfunction.__array__ (pydata#9237)
  Add encode_cf_datetime benchmark (pydata#9262)
  groupby, resample: Deprecate some positional args (pydata#9236)
  Delete ``base`` and ``loffset`` parameters to resample (pydata#9233)
  Update dropna docstring (pydata#9257)
  Grouper, Resampler as public api (pydata#8840)
  Fix mypy on main (pydata#9252)
  fix fallback isdtype method (pydata#9250)
  Enable pandas type checking (pydata#9213)
  Per-variable specification of boolean parameters in open_dataset (pydata#9218)
  test push
  Added a space to the documentation (pydata#9247)
  Fix typing for test_plot.py (pydata#9234)
  Allow mypy to run in vscode (pydata#9239)
  Revert "Test main push"
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
plan to merge Final call for comments topic-typing
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants