🔧 fix typing issues from checking untyped defs, fixes #509 #510

Open · wants to merge 21 commits into main from 509-untyped-defs

Conversation

kevinsantana11
Contributor

One thing we should enable is the option to disallow untyped definitions. It was previously turned off; I've enabled it and fixed all of the issues it surfaced (100+).

I still want to add some more changes here: standardizing on the array types we want to support, and leveraging bounded generics to express some of the library's more dynamic behavior.

This also follows mypy's guidance to turn this feature on as early as possible: article
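For context, mypy's disallow-untyped-defs option rejects any function definition that lacks annotations; it can be switched on via disallow_untyped_defs = true in the project's mypy configuration. A minimal illustration of what it catches (hypothetical functions, not code from this PR):

import numpy as np


def unscaled(x):  # error under disallow_untyped_defs: the def has no annotations
    return x * 2


def scaled(x: np.ndarray, factor: float = 2.0) -> np.ndarray:  # accepted: fully annotated
    return x * factor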

gamma: float,
beta: float,
radian_frequency: np.ndarray,
gamma: float | np.ndarray,
Member

Gamma and beta parameters cannot be np.ndarray the way the function is written

@@ -660,13 +664,12 @@ def cartesian_to_spherical(
return lon, lat


T = TypeVar("T", bound=float | np.ndarray)
Member

I am not finding this alias very readable. Why does this help?

Contributor Author

This isn't an alias, it's a generic type variable: https://mypy.readthedocs.io/en/stable/generics.html
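For illustration, here is a minimal sketch of the difference between a plain alias and a type variable (hypothetical function, not code from this PR):

from typing import TypeVar

import numpy as np

# An alias just renames the union; it says nothing about how inputs and outputs relate.
ScalarOrArray = float | np.ndarray

# A type variable is solved per call site: whatever type goes in (within the
# bound) is the type that comes back out.
T = TypeVar("T", bound=float | np.ndarray)


def passthrough(value: T) -> T:
    return value

With this, passthrough(1.0) is inferred as float and passthrough(np.zeros(3)) as np.ndarray, whereas an alias-annotated signature would report float | np.ndarray for both calls.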

Member

Sorry, the whole typing thing is confusing to me

Contributor
@philippemiron, Oct 17, 2024

I read more about TypeVar and actually like it. If I understand correctly, it assigns the type to T dynamically, correct?

@kevinsantana11 force-pushed the 509-untyped-defs branch 2 times, most recently from 68148f2 to 37113e7 on August 24, 2024 03:52
@kevinsantana11
Contributor Author

This code also fixes a bug surfaced by the new type strictness.

The bug was that the download logic never checked whether the upstream source was newer than, or identical to, the resource in the local data-store, so files were always re-downloaded.

Our test code should have caught this, but the lack of type checking let the oversight slip through.

Increasing the type strictness should prevent this kind of code, where we call functions with the wrong types.

Had this been type checked, I would have updated the call to use the right type and realized the function no longer checked the local modification time, since it only handled buffers.
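As a contrived sketch of the kind of mismatch described above (hypothetical names, not the actual clouddrift adapter code): the function below only accepts an open buffer, so it has no path whose modification time it could check, and an untyped caller passing a path string goes unnoticed until strict checking is turned on.

from io import BufferedIOBase


def download(url: str, output: BufferedIOBase) -> None:
    """Write the remote resource into an already-open buffer.

    Because it never sees a file path, it cannot compare the local
    modification time against the upstream source.
    """
    ...


def update_datastore() -> None:
    # With untyped defs this call slipped through silently; once annotated,
    # mypy reports roughly: Argument 2 to "download" has incompatible type
    # "str"; expected "BufferedIOBase"
    download("https://example.org/drifters.nc", "local/drifters.nc")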

Member
@selipot left a comment

Here are my comments. You will see that I am confused in many places; we need to discuss.

@@ -11,17 +12,19 @@
import pandas as pd
import xarray as xr

_Out = t.TypeVar("_Out", bound=tuple[np.ndarray, np.ndarray] | np.ndarray)
Member

As a user, I look at the function's input and output arguments to understand how to use it. When I see t.Any and _Out, I have no idea what they mean. In the same way, what is typing.TypeVar("_Out", bound=tuple[np.ndarray, np.ndarray] | np.ndarray)? Are we making things harder for the user? Look at the scipy library. Do they have any of this?

Contributor Author

I understand that the type _Out makes reading the function signature more difficult. I wish we could use the direct syntax support for generics that was released in Python 3.12, but given that we currently support Python 3.10+, we'll have to wait a bit before we can leverage that syntax.

The purpose of this generic type variable is to annotate that the output type of the func passed to apply_ragged and the output type of apply_ragged itself are the same. As an example, if func returns a tuple[ndarray, ndarray], then apply_ragged will also return a tuple[ndarray, ndarray].

Maybe I'm misunderstanding how this function works, but the output type of the function passed to apply_ragged (func) and that of apply_ragged itself should match, right?
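A trimmed sketch of the idea (simplified signature, not the full apply_ragged definition): because the same type variable appears in both the callback annotation and the return annotation, the checker infers the wrapper's return type from whatever callback it receives.

import typing as t

import numpy as np

_Out = t.TypeVar("_Out", bound=tuple[np.ndarray, np.ndarray] | np.ndarray)


def apply_ragged_sketch(
    func: t.Callable[..., _Out],
    arrays: list[np.ndarray],
    rowsize: list[int],
) -> _Out:
    # Simplified to a single row; the real function iterates over every ragged
    # row and aggregates the per-row results.
    return func(arrays[0][: rowsize[0]])

If func returns tuple[np.ndarray, np.ndarray], the whole call is inferred as a tuple; if it returns a single np.ndarray, the call is inferred as a single array.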

Regarding t.Any: I used it in places where that was already the implicit type and no more specific type (like np.int or something similar) could be determined. Examples are annotations such as a bare list, which the type checker implicitly interprets as list[Any]. Where I could determine a more specific type I used it, but in the cases where I couldn't I annotated it with t.Any.

My hope with the type annotations and stricter type checking is never to make users' lives harder but to improve the library's robustness to change.

We could take a different approach to typing, similar to numpy, where .pyi stub files are used to define types like this.

A quick code search shows that the usage of TypeVar differs per project; please see:

Looking at the above, I see moderate usage across the different projects: some use it to a fair extent, others very little.

rows: int | Iterable[int] = None,
ragged_array: np.ndarray | xr.DataArray,
rowsize: np.ndarray[int] | xr.DataArray,
rows: int | np.int_ | Iterable[int] | None = None,
Member

is it necessary to have both int and np.int_?

Contributor Author

I can check on this

@@ -10,7 +10,7 @@ def analytic_signal(
x: np.ndarray | xr.DataArray,
boundary: str = "mirror",
time_axis: int = -1,
) -> np.ndarray | tuple[np.ndarray, np.ndarray]:
) -> np.ndarray:
Member

The output CAN be a tuple depending on the input. Why remove it?

Contributor Author

Got it, will update it back.

import numpy as np
from scipy.special import gamma as _gamma
from scipy.special import gammaln as _lgamma


@t.overload
Member

What is this for? @t.overload then a repeat of the definition?

Contributor Author

These likely fit better in a stub file, but the purpose is twofold:

  1. Document that the output type is directly dependent on the value of the complex parameter.
  2. Help the type checker understand that the function's output type depends on the complex parameter's value.

Without this, the type checker will think any call to morse_wavelet_transform can return either type (a tuple or a single ndarray) and will warn us to write code that checks the type and handles the two scenarios separately.
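A shortened sketch of the overload pattern (using only the arguments visible in the diff below; the real function may take more):

import typing as t

import numpy as np


@t.overload
def morse_wavelet_transform(
    x: np.ndarray,
    gamma: float,
    beta: float,
    radian_frequency: np.ndarray,
    complex: t.Literal[False] = ...,
) -> np.ndarray: ...


@t.overload
def morse_wavelet_transform(
    x: np.ndarray,
    gamma: float,
    beta: float,
    radian_frequency: np.ndarray,
    complex: t.Literal[True],
) -> tuple[np.ndarray, np.ndarray]: ...


def morse_wavelet_transform(
    x: np.ndarray,
    gamma: float,
    beta: float,
    radian_frequency: np.ndarray,
    complex: bool = False,
) -> np.ndarray | tuple[np.ndarray, np.ndarray]:
    # Placeholder body; the real implementation is in clouddrift and omitted here.
    ...

With these overloads, a call with complex=True is typed as a tuple, and a call with complex=False (or omitting it) as a single array.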

def morse_wavelet_transform(
x: np.ndarray,
gamma: float,
beta: float,
radian_frequency: np.ndarray,
complex: bool = False,
complex: bool,
Member

a boolean is False by default? Why remove the default?

Contributor Author

I can revert this change.

)

return wtx
return wavelet_transform(x, wavelet, boundary=boundary, time_axis=time_axis)
Member

Ok, I understand that it is the same logic, but to me it is less clear.

@@ -346,7 +367,7 @@ def morse_wavelet(
length: int,
gamma: float,
beta: float,
radian_frequency: np.ndarray,
radian_frequency: float | np.ndarray,
Member

I am pretty sure the code needs the radian frequency to be an array, not a float, even with a single element.

Contributor Author

Got it, I will correct this.

@@ -22,7 +22,7 @@ def test_morse_wavelet_transform_real(self):
length = 1023
radian_frequency = 2 * np.pi / np.logspace(np.log10(10), np.log10(100), 50)
x = np.random.random(length)
wtx = morse_wavelet_transform(x, 3, 10, radian_frequency)
wtx = morse_wavelet_transform(x, 3, 10, radian_frequency, False)
Member

Why can't we have complex=False as a default for this function? You seem to have changed it to a compulsory argument.

Contributor Author

I will revert this and put the default back.

Contributor

I think this was forgotten.

@selipot requested a review from milancurcic on September 5, 2024 17:43
@milancurcic
Member

This is generally not necessary, but can be helpful for developers (e.g. if you want to require mypy to pass with certain settings on).

I didn't look at the full diff, but from the description it seems that the changes are only to the type hints. Python ignores type hints, so the code itself is unchanged in functionality.

So, two questions to ask are:

  1. Are these hints helpful for the contributors?
  2. Are these hints helpful for the end user?

IMO:

  1. They can be, in moderation, if I can understand the types. But if you develop a type system only to satisfy a strict type-checking tool, e.g. mypy, it can easily escalate into a type-system nightmare. There are some type definitions here that I don't understand at first look. That doesn't mean I couldn't understand them if I studied them, but they're complex enough that they're not obvious.
  2. Probably not; the users will mostly read the docstrings.


codecov bot commented Oct 9, 2024

Codecov Report

Attention: Patch coverage is 95.00000% with 3 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
clouddrift/adapters/gdp/gdpsource.py 90.00% 1 Missing ⚠️
clouddrift/adapters/utils.py 83.33% 1 Missing ⚠️
clouddrift/typing.py 92.30% 1 Missing ⚠️

@kevinsantana11
Contributor Author

@selipot this is ready for re-review

@philippemiron self-requested a review on October 17, 2024 14:01
@philippemiron
Contributor

philippemiron commented Oct 17, 2024

What's the advantage of using from numpy.typing import NDArray instead of np.ndarray ?

And if it's really required, I would suggest doing it uniformly. I see some places you have:

from numpy.typing import NDArray
import numpy.typing as np_typing

and I think we are only using NDArray in the code from this module.

@philippemiron
Contributor

and I would suggest also doing:

from clouddrift.typing import ArrayTypes

to remove some verbosity in the docstring.

longitude: float | np.ndarray,
latitude: float | np.ndarray,
) -> tuple[float, float] | tuple[np.ndarray, np.ndarray]:
u: T, v: T, w: T, longitude: V, latitude: V
Contributor
@philippemiron, Oct 17, 2024

But then I'm confused: what is the purpose of ArrayTypes from clouddrift.typing if we use TypeVar here?

Contributor Author

Here we define bounds on the type variable, so while the type is "variable" it is still constrained to the bound.
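For illustration, a minimal sketch of bounded type variables (hypothetical function name, not the clouddrift signature):

from typing import TypeVar

import numpy as np

# Each type variable can only be solved to float or np.ndarray (the bound),
# and T and V are solved independently of one another.
T = TypeVar("T", bound=float | np.ndarray)
V = TypeVar("V", bound=float | np.ndarray)


def rotate_sketch(u: T, v: T, longitude: V, latitude: V) -> tuple[T, T]:
    # u and v must share one type and longitude/latitude another; passing,
    # say, strings for any of them is rejected because of the bound.
    return u, v

So float velocities combined with array coordinates type-check, while anything outside the bound does not.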

Collaborator

I am curious as to why we have both T and V here when they seem to have the same definition.

@kevinsantana11
Contributor Author

kevinsantana11 commented Oct 18, 2024

What's the advantage of using from numpy.typing import NDArray instead of np.ndarray ?

And if it's really required, I would suggest doing it uniformly. I see some places you have:

from numpy.typing import NDArray
import numpy.typing as np_typing

and I think we are only using NDArray in the code from this module.

The main reason I swapped the ndarray class out for NDArray is that it makes the type annotations less verbose. Using np.ndarray would be preferable if not for the extra verbosity required when using it as an annotation: it's one less import and more familiar to users. What made me consider NDArray is that it only requires the dtype as a type variable. To provide a fully concrete type with the ndarray class, a type for the shape must also be provided, and the dtype (np.int64, np.float64, etc.) must be wrapped in np.dtype. An example of what I mean in code can be found below:

Example

Using np.ndarray

import numpy as np
from typing import Any

def foo() -> np.ndarray[Any, np.dtype[np.int64]]:
    ...

Using np.typing.NDArray

import numpy as np
from numpy.typing import NDArray

def foo() -> NDArray[np.int64]:
    ...

@@ -725,12 +729,12 @@ def cartesian_to_tangentplane(
return u_projected, v_projected


T = TypeVar("T", bound=float | np.ndarray)
Collaborator

Why is this defined a second time? I see it is also defined in line 667.

func: Callable[..., _ArrayOutput],
arrays: ArrayTypes,
rowsize: ArrayTypes,
*args: typing.Any,
Collaborator

It might help to keep things consistent to also update the docstrings here.
