-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
TYP: pandas/core/missing.py #38339
TYP: pandas/core/missing.py #38339
Conversation
f3a0fea
to
830fa00
Compare
pandas/core/missing.py
Outdated
) | ||
if isna(order): | ||
raise ValueError(f"order needs to be specified; got order: {order}") | ||
assert isinstance(order, int) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should not use asserts to please mypy if there is any risk for it to bubble up to the user (without it being a panda bug). I didn't look in enough detail to be sure here, but it looks like a user can pass eg a float here, and it should not raise an AssertionError.
xref #37350
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm guessing this is done because isna won't narrow the type. If order is typed correct, order is None
should be used instead of isna: order must either be a python integer or None. This eliminates the need for the assert.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm guessing this is done because isna won't narrow the type.
Yes that's the reason for this
If order is typed correct,
order is None
should be used instead of isna: order must either be a python integer or None. This eliminates the need for the assert.
Ok!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should not use asserts to please mypy if there is any risk for it to bubble up to the user (without it being a panda bug). I didn't look in enough detail to be sure here, but it looks like a user can pass eg a float here, and it should not raise an AssertionError.
xref #37350
Fair point, noted
pandas/core/missing.py
Outdated
method = alt_methods[method] | ||
new_y = method(x, y, new_x, **kwargs) | ||
|
||
assert isinstance(method, str) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here: is there any guarantee we already checked this is a string by here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i'm not sure (don't think so) but if it's not then we'll raise with/without the assert because in the line below method is a key to a dict where all keys are strings
Lines 308 to 314 in 862cd05
# ignores some kwargs that could be passed along. | |
alt_methods = { | |
"barycentric": interpolate.barycentric_interpolate, | |
"krogh": interpolate.krogh_interpolate, | |
"from_derivatives": _from_derivatives, | |
"piecewise_polynomial": _from_derivatives, | |
} |
I can add a more informative error message here, though
pandas/core/missing.py
Outdated
xi: ArrayLike, | ||
yi: ArrayLike, | ||
x: Union[ArrayLike, Scalar], | ||
axis: Optional[int] = 0, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How is axis optional? (it has a default of 0 already, so it's always specified?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AFAIK Optional
means it could be None
.
That said, the docstring lists this arg as optional so I'll fix that while I'm here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Optional[T]
in type-hinting is not the same as optional in docstrings. The former is an alias for Union[T, None]
while the latter means doesn't need to be specified.
pandas/core/missing.py
Outdated
): | ||
x, | ||
y, | ||
new_x, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we know anything about x, y, new_x?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes - added
pandas/core/missing.py
Outdated
@@ -61,7 +63,7 @@ def mask_missing(arr: ArrayLike, values_to_mask) -> np.ndarray: | |||
return mask | |||
|
|||
|
|||
def clean_fill_method(method, allow_nearest: bool = False): | |||
def clean_fill_method(method: str, allow_nearest: bool = False) -> Optional[str]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks like method can be None?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good match (mypy missed this)
start_nans = set() if start_nan_idx is None else set(range(start_nan_idx)) | ||
|
||
end_nan_idx = find_valid_index(yvalues, "last") | ||
end_nans = set() if end_nan_idx is None else set(range(1 + end_nan_idx, len(valid))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this fixing a bug in the case where end_nan_idx is None?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think so because we special case all nans/no nans at the top of the the method
pandas/core/missing.py
Outdated
@@ -324,7 +338,7 @@ def _interpolate_scipy_wrapper( | |||
elif method == "cubicspline": | |||
alt_methods["cubicspline"] = _cubicspline_interpolate | |||
|
|||
interp1d_methods = [ | |||
interp1d_methods: List[str] = [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mypy cant figure this out on its own?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think maybe I needed this at some point but yes mypy gets this without the annotation with the file as is
pandas/core/missing.py
Outdated
yi: ArrayLike, | ||
x: Union[Scalar, ArrayLike], | ||
der: Optional[int] = 0, | ||
axis: Axis = 0, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think just int; Axis
includes "index" and "columns" which dont get resolved here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
right, changed to int
pandas/core/missing.py
Outdated
def _akima_interpolate(xi, yi, x, der=0, axis=0): | ||
def _akima_interpolate( | ||
xi: ArrayLike, | ||
yi: ArrayLike, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the docstring says ArrayLike, but im guessing this wouldnt work on EAs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess not since this is a wrapper for scipy.interpolate.Akima1DInterpolator
(and mypy works with both of these annotated asnp.ndarray
)
pandas/core/missing.py
Outdated
xi: ArrayLike, | ||
yi: ArrayLike, | ||
x: Union[Scalar, ArrayLike], | ||
der: Optional[int] = 0, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is None really allowed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no, don't think so. Changed to plain int
pandas/core/missing.py
Outdated
@@ -572,12 +612,12 @@ def _interpolate_with_limit_area( | |||
|
|||
|
|||
def interpolate_2d( | |||
values, | |||
values: np.ndarray, | |||
method: str = "pad", | |||
axis: Axis = 0, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
guessing just int
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changed this to int
pandas/core/missing.py
Outdated
@@ -447,7 +480,14 @@ def _akima_interpolate(xi, yi, x, der=0, axis=0): | |||
return P(x, nu=der) | |||
|
|||
|
|||
def _cubicspline_interpolate(xi, yi, x, axis=0, bc_type="not-a-knot", extrapolate=None): | |||
def _cubicspline_interpolate( | |||
xi: ArrayLike, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can be np.ndarray here as well, I think?
pandas/core/missing.py
Outdated
@@ -23,7 +23,9 @@ | |||
from pandas import Index | |||
|
|||
|
|||
def mask_missing(arr: ArrayLike, values_to_mask) -> np.ndarray: | |||
def mask_missing( | |||
arr: ArrayLike, values_to_mask: Union[List, Tuple, Scalar] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
values_to_mask
is passed to infer_dtype_from_array
so the type of values_to_mask
should be the same as the arr
argument of infer_dtype_from_array
the arr
argument of infer_dtype_from_array
is not yet typed. (sometimes better to type the lower level functions first, but while np.ndarray resolves to Any
, mypy won't report any inconsistencies)
looking quickly through the body of infer_dtype_from_array, Series and EA are handled, so passing an array-like to values_to_mask
should be fine.
To handle Series, could either add to Union, or could maybe (if no perf implications) also update infer_dtype_from_array
to handle Index as well, and use AnyArrayLike
instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding the Index check to infer_dtype_from_array
diff --git a/pandas/core/dtypes/cast.py b/pandas/core/dtypes/cast.py
index 165e63e23d..676ab572a0 100644
--- a/pandas/core/dtypes/cast.py
+++ b/pandas/core/dtypes/cast.py
@@ -83,6 +83,7 @@ from pandas.core.dtypes.generic import (
ABCDatetimeArray,
ABCDatetimeIndex,
ABCExtensionArray,
+ ABCIndexClass,
ABCPeriodArray,
ABCPeriodIndex,
ABCSeries,
@@ -877,7 +878,7 @@ def infer_dtype_from_array(
if pandas_dtype and is_extension_array_dtype(arr):
return arr.dtype, arr
- elif isinstance(arr, ABCSeries):
+ elif isinstance(arr, ABCSeries) or isinstance(arr, ABCIndexClass):
return arr.dtype, np.asarray(arr)
# don't force numpy coerce with nan's
seems to degrade some of the benchmarks that hit this
asv continuous -f 1.1 upstream/master HEAD -b ^algorithms -b ^replace
before after ratio
[d0db0098] [9b685722]
<master~1> <master>
+ 3.41±0.04ms 5.40±0.7ms 1.59 algorithms.Factorize.time_factorize(False, False, 'boolean')
+ 4.50±0.06ms 6.98±0.9ms 1.55 algorithms.Factorize.time_factorize(False, True, 'boolean')
+ 3.27±0.01μs 3.89±0.5μs 1.19 algorithms.Duplicated.time_duplicated(True, 'first', 'float')
SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.
PERFORMANCE DECREASED.
so I guess a Union
with Series
is the way to go unless there's a more efficient patch for infer_dtype_from_array
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for looking into this. Looking again, I see that Index is already handled and falls through the the final return after np.asarray(arr)
. so could probably use Union[AnyArrayLike, Sequence[Any], Scalar]
after all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right - done
pandas/core/dtypes/cast.py
Outdated
@@ -815,8 +822,8 @@ def dict_compat(d: Dict[Scalar, Scalar]) -> Dict[Scalar, Scalar]: | |||
|
|||
|
|||
def infer_dtype_from_array( | |||
arr, pandas_dtype: bool = False | |||
) -> Tuple[DtypeObj, ArrayLike]: | |||
arr: "Union[ArrayLike, Series, PandasScalar]", pandas_dtype: bool = False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think for these just use AnyArrayLike
pandas/core/missing.py
Outdated
@@ -23,7 +33,9 @@ | |||
from pandas import Index | |||
|
|||
|
|||
def mask_missing(arr: ArrayLike, values_to_mask) -> np.ndarray: | |||
def mask_missing( | |||
arr: ArrayLike, values_to_mask: Union[AnyArrayLike, Scalar, Sequence[Any]] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AnyArrayLike
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
pandas/core/missing.py
Outdated
How many derivatives to extract; None for all potentially nonzero | ||
derivatives (that is a number equal to the number of points), or a | ||
list of derivatives to extract. This number includes the function | ||
value as 0th derivative. | ||
extrapolate : bool, optional | ||
extrapolate : bool, default False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the alignmnet here is off
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
pandas/core/missing.py
Outdated
def _akima_interpolate( | ||
xi: np.ndarray, | ||
yi: np.ndarray, | ||
x: Union[Scalar, ArrayLike], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think these are just np.ndarray right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and similar in almost all places that you are using Union[Scalar, ArrayLike] unless you can reveal_type that this is not the case
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right, they're always ndarray
except in _interpolate_scipy_wrapper
. It's because in _interpolate_scipy_wrapper
we cast any scalar input to ndarray
and these functions always get called from there:
Line 306 in e91cbd7
new_x = np.asarray(new_x) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed these, and docstrings too
@arw2019 this is somewhat old, so maybe simpler to start a new PR, or merge master and update; as always on typing if can do the unambiguous ones first |
I'm thinking I'll factor out the easy ones into a separate PR so this is easier to look at and maybe get that merged first (FWIW this got a lot of review so i'm confident the hard ones here are mostly right at this point) |
@@ -39,7 +44,9 @@ | |||
from pandas import Index | |||
|
|||
|
|||
def mask_missing(arr: ArrayLike, values_to_mask) -> np.ndarray: | |||
def mask_missing( | |||
arr: AnyArrayLike, values_to_mask: Union[AnyArrayLike, Scalar, Sequence[Any]] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the ArrayLike
for arr
was correct here?
(or if not, the docstring needs to be updated as well)
@arw2019 can you merge master and fixup mypy failures |
@arw2019 if you add an ignore, cast or assert, or change/delete an existing annotation or change any actual code, put those changes a minimally scoped PR. |
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff