-
Notifications
You must be signed in to change notification settings - Fork 9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
🔧 fix typing issues from checking untyped defs, fixes #509 #510
base: main
Are you sure you want to change the base?
Conversation
clouddrift/wavelet.py
Outdated
gamma: float, | ||
beta: float, | ||
radian_frequency: np.ndarray, | ||
gamma: float | np.ndarray, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gamma and beta parameters cannot be np.ndarray the way the function is written
@@ -660,13 +664,12 @@ def cartesian_to_spherical( | |||
return lon, lat | |||
|
|||
|
|||
T = TypeVar("T", bound=float | np.ndarray) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not finding this alias very readable. Why does this help?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't an alias, its a generic type variable: https://mypy.readthedocs.io/en/stable/generics.html
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, the whole typing thing is confusing to me
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I read more about TypeVar
and actually like it. If I understand correctly, it assigns the type to T
dynamically, correct?
68148f2
to
37113e7
Compare
This code also fixes a bug surfaced by the new type strictness. The bug itself was an issue where the download logic would never check if the upstream source was updated or the same resource when compared to the local data-store always resulting in files being downloaded. Our test code should have caught this but a mismatch in type checking introduced a bit of an oversight. Increase the type strictness should prevent code like where we call functions with the wrong types. Had this been type checked I would have updated to use the right type and realized the function would no longer check for the local modification time since it only handled buffers. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here are my comments. You will see that I am confused in many places, we need to discuss.
clouddrift/ragged.py
Outdated
@@ -11,17 +12,19 @@ | |||
import pandas as pd | |||
import xarray as xr | |||
|
|||
_Out = t.TypeVar("_Out", bound=tuple[np.ndarray, np.ndarray] | np.ndarray) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a user, I look at the function input and output arguments to understand how to use it. When I see t.Any
and _Out
, I have no idea what it means. In the same way, what is typing.TypeVar("_Out", bound=tuple[np.ndarray, np.ndarray] | np.ndarray)
. Are we making things harder for the user? Look at the scipy library. Do they have any of this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand that the type _Out
makes reading the function signature more difficult. I wish we could use the direct syntax support for generics that was released in python 3.12 but given were currently supporting python 3.10+ so we'll have to wait a bit before being able to leverage that syntax for generics.
The purpose of using this generic type variable is to annotate that the output type of both the func
passed to apply_ragged
and apply_ragged
itself are both the same. As an example if the output type of the func
variable is a tuple[ndarray, ndarray]
then the output type of apply_ragged
will also be a tuple[ndarray, ndarray]
.
Maybe I'm misunderstanding how this function works but the output type of the function passed to apply_ragged (func
) and apply_ragged
itself should match right?
Regarding t.Any: I used it in places where that was the implicit type and no more specific type could be determined like np.int or something similar. Examples are type annotations such as list
which would implicitly be interpret as list[Any]
by the type checker. In places where I could determine a more specific type I would use that but there were cases where I couldn't and so I would just annotate it with t.Any
.
My hope with the type annotations and stricter type checking is never to make users life harder but to try and improve the libraries robustness to change.
We could take a different approach to typing similar to numpy where they leverage .pyi
or stub files to define types like this
A quick code search on the usage of TypeVar is different per project please see:
Taking a look at the above I see moderate usage throughout the different projects. Some use it to some extent other projects have little use.
clouddrift/ragged.py
Outdated
rows: int | Iterable[int] = None, | ||
ragged_array: np.ndarray | xr.DataArray, | ||
rowsize: np.ndarray[int] | xr.DataArray, | ||
rows: int | np.int_ | Iterable[int] | None = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it necessary to have both int
and np.int_?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can check on this
clouddrift/signal.py
Outdated
@@ -10,7 +10,7 @@ def analytic_signal( | |||
x: np.ndarray | xr.DataArray, | |||
boundary: str = "mirror", | |||
time_axis: int = -1, | |||
) -> np.ndarray | tuple[np.ndarray, np.ndarray]: | |||
) -> np.ndarray: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The output CAN be a tuple depending on the input. Why remove it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it, will update it back.
@@ -660,13 +664,12 @@ def cartesian_to_spherical( | |||
return lon, lat | |||
|
|||
|
|||
T = TypeVar("T", bound=float | np.ndarray) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, the whole typing thing is confusing to me
clouddrift/wavelet.py
Outdated
import numpy as np | ||
from scipy.special import gamma as _gamma | ||
from scipy.special import gammaln as _lgamma | ||
|
||
|
||
@t.overload |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is this for? @t.overload then a repeat of the definition?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These likely fit better in a stub file but the purpose of it is two fold.
- Document that the output type is directly dependent on the the value of the
complex
parameter - Helps the type checker understand the functions output type is dependent on the
complex
parameters value.
Without this, the type checker will think function calls made to morse_wavelet_transform
can return either type (tuple or a single ndarray) and so will warn us to write code to check the types and handle the two scenarios separately.
clouddrift/wavelet.py
Outdated
def morse_wavelet_transform( | ||
x: np.ndarray, | ||
gamma: float, | ||
beta: float, | ||
radian_frequency: np.ndarray, | ||
complex: bool = False, | ||
complex: bool, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a boolean is False by default? Why remove the default?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can revert this change.
) | ||
|
||
return wtx | ||
return wavelet_transform(x, wavelet, boundary=boundary, time_axis=time_axis) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok i understand how it is the same logic but to me it is less clear.
clouddrift/wavelet.py
Outdated
@@ -346,7 +367,7 @@ def morse_wavelet( | |||
length: int, | |||
gamma: float, | |||
beta: float, | |||
radian_frequency: np.ndarray, | |||
radian_frequency: float | np.ndarray, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am pretty sure the code needs the radian frequency to be an array, not a float, even with a single element.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it, I will correct this.
@@ -22,7 +22,7 @@ def test_morse_wavelet_transform_real(self): | |||
length = 1023 | |||
radian_frequency = 2 * np.pi / np.logspace(np.log10(10), np.log10(100), 50) | |||
x = np.random.random(length) | |||
wtx = morse_wavelet_transform(x, 3, 10, radian_frequency) | |||
wtx = morse_wavelet_transform(x, 3, 10, radian_frequency, False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why can't we have the complex=False a default for this function. You seem to have it changed to a complusory argument?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will revert to have this default put back
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this was forgotten.
This is generally not necessary, but can be helpful for developers (e.g. if you want to require mypy to pass with certain settings on). I didn't look at the full diff, but from the description it seems that the changes are only to the type hints. Python ignores type hints, so the code itself is unchanged in functionality. So, two questions to ask are:
IMO:
|
7c76e65
to
6ebf753
Compare
Codecov ReportAttention: Patch coverage is
Additional details and impacted files📢 Thoughts on this report? Let us know! |
@selipot this is ready for re-review |
What's the advantage of using And if it's really required, I would suggest doing it uniformly. I see some places you have:
and I think we are only using NDArray in the code from this module. |
and I would suggest also doing :
to remove some verbosity in the docstring. |
longitude: float | np.ndarray, | ||
latitude: float | np.ndarray, | ||
) -> tuple[float, float] | tuple[np.ndarray, np.ndarray]: | ||
u: T, v: T, w: T, longitude: V, latitude: V |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but I'm confused then what is the purpose of the ArrayTypes
from clouddrift.typing
if we use TypeVar
here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here we define bounds to the type variable so while the type is "variable" we can define some bounds to constrain it a bit
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am curious as to why we have both T and V here when they seem to have the same definition.
The main reason I swapped the ExampleUsing np.ndarray import numpy as np
from typing import Any
def foo() -> np.ndarray[Any, np.dtype[np.int64]]:
... Using np.typing.NDArray from numpy.typing import NDArray
def foo() -> NDArray[np.int64]:
... |
@@ -725,12 +729,12 @@ def cartesian_to_tangentplane( | |||
return u_projected, v_projected | |||
|
|||
|
|||
T = TypeVar("T", bound=float | np.ndarray) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this defined a second time? I see it is also defined in line 667.
func: Callable[..., _ArrayOutput], | ||
arrays: ArrayTypes, | ||
rowsize: ArrayTypes, | ||
*args: typing.Any, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might help to keep things consistent to also update the docstrings here.
One thing we should enable is the feature to disallow untyped definitions. This option was previously turned off but I've enabled it and fixed all of the issues listed (100+)
I still want to add some more changes here regarding standardizing on the array types we want to support as well as leveraging generics more with bounds to achieve some of the more dynamic behavior of the library.
This also follows the guidance from mypy to turn these feature on as soon as possible: article