-
Notifications
You must be signed in to change notification settings - Fork 169
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Updated setup.cfg mypy flags and resolved related errors. #703
Conversation
@Sanketh7 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just questions... I'll also test for if np.nan
does get seen as a float
instead of Any
on my end.
Need to review more too all the cast()
statements
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
comments
Head branch was pushed to by a user without write access
) | ||
bin_counts_impose = bin_counts_impose_pos + bin_counts_impose_neg | ||
|
||
median_inds = np.abs(bin_counts_impose - 0.5) < 1e-10 | ||
if np.sum(median_inds) > 1: | ||
return np.mean(bin_edges_impose[median_inds]) | ||
return cast(float, np.mean(bin_edges_impose[median_inds])) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this would be a float64
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm getting a similar issue as before:
"median_absolute_deviation": utils.find_diff_of_numbers(
self.median_abs_deviation, other_profile.median_abs_deviation
),
dataprofiler/profilers/numerical_column_stats.py:381: error: Value of type variable "T" of "find_diff_of_numbers" cannot be "object"
Looking into it more, I believe it's because we're trying to assign a Union
to a TypeVar
with bound=Subtractable
and the Union
itself does not follow the Subtractable
protocol. So far the only reasonable solution I can think of is doing:
"median_absolute_deviation": utils.find_diff_of_numbers(
cast(float, self.median_abs_deviation), cast(float, other_profile.median_abs_deviation)
),
which isn't very ideal. I'll look into mypy generics more to see if there's a workaround.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed. See #703 (comment)
dataprofiler/labelers/base_model.py
Outdated
cls, clsname, bases, attrs | ||
) | ||
new_class._register_subclass() | ||
new_class._register_subclass() # type: ignore |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not 100% positive, but I think the bound above should be BaseModel
which would fix this.
We should really try to avoid using these bandaid fixes with #type: ignore
unless absolutely necessary
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Sanketh7 can we get rid of this #type: ignore
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changing the typevar to BaseModel
didn't end up fixing it because mypy doesn't detect the relationship between BaseModel
and AutoSubRegistrationMeta
(this caused super(AutoSubRegistrationMeta, cls)
to error).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This has some interesting suggestions:
https://stackoverflow.com/questions/66121127/calling-new-on-an-any
However, if we can't do this, ultimately we should cast referring to the output as a derivation of BaseModel of some type. Returning it as AutoSubRegistrationMeta
is not truly what matter since it is just a mixin.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This answer looks similar to what we're doing but they still end up doing a cast at the end. They also have a return type of the metaclass and not the derived class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be clear, I think that should go in a separate PR if we have to add anything to further validate. For now, I think as implemented is okay with regards to this PR getting merged.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not that it should be BaseModel, T should be identified as T ultimately.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd recommend in order to get the rest of this merged, @Sanketh7, revert this change for the AutoSubRegistration
and get it working to pass mypy
checks. Then in a follow-up PR, we can resolve this specific issue more fully.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reverted these changes so we can tackle it in another PR.
@@ -1842,7 +1857,7 @@ def is_int(x: str) -> bool: | |||
return a == b | |||
|
|||
@staticmethod | |||
def np_type_to_type(val: Any) -> Union[int, float]: | |||
def np_type_to_type(val: Any) -> Union[int, float, Any]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if this can be any, we technically don't need union, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
However, will this error then bc we are using Any?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess the Union was redundant. Changing to Any actually doesn't cause issues with our mypy flag warn_return_any = True
because that only complains when we return Any and the function return type is not Any.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice
@@ -412,7 +413,19 @@ def __sub__(self: T, other: T) -> Any: | |||
T = TypeVar("T", bound=Subtractable) | |||
|
|||
|
|||
@overload | |||
def find_diff_of_numbers( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ended up going with an overload here. The issue is that there are a couple cases where we pass in a Union[float, np.float64]
to find_diff_of_numbers
and Unions don't satisfy the typevar bound.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice -- I would recommend formatting slightly different based on the docs I was seeing here, but nice improvement
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually, disregard this. I'm seeing it both ways
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried to format it like that but the pre-commit formatter would add the extra new lines.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, an object shouldn't be casted as Union[float, np.float64]
but has the option to return as one or the other.
I wonder if this meant we did those returns wrong and should have done overloads on those funcs where it could return a float in one or np.float64 in the other.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The main reason we have the float
case is because np.nan
is float
and some of our methods could return np.nan
if certain conditions aren't met. From what I can see, a lot of these conditions can't be solved by typing overloads (for example, some are based on class fields and not method parameters).
At least for finding the diff between 2 Union[float, np.float64]
, we shouldn't run into issues because float
gets promoted to float64
automatically.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -412,7 +413,19 @@ def __sub__(self: T, other: T) -> Any: | |||
T = TypeVar("T", bound=Subtractable) | |||
|
|||
|
|||
@overload | |||
def find_diff_of_numbers( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice -- I would recommend formatting slightly different based on the docs I was seeing here, but nice improvement
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM -- let's create a follow-up issue for more fully resolving some of these comments in a subsequent, smaller PR
Head branch was pushed to by a user without write access
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
follow-up PRs as discussed
Added the following flags to
setup.cfg
. Runpre-commit install
thenpre-commit run -a
to test these changes locally.All of the warnings that needed to be resolved were because of
warn_return_any
. Some of my solutions are more elegant than others so let me know if you find a better way.