-
-
Notifications
You must be signed in to change notification settings - Fork 18.5k
_ensure_type
should use issubclass
#32416
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…1925) Commit pandas-dev/pandas@6fd326d in pull request pandas-dev#30613 added `_ensure_type`, which utilizes `isinstance`. However, it is reasonable to assume that someone may want to create a DataFrame subclass. Therefore, `_ensure_type` should use `issubclass`.
I don't completely follow the changes here. isinstance typically considers subclasses to be instances of the parent's type. In [1]: class A:
...: pass
...:
In [2]: class B(A):
...: pass
...:
In [3]: isinstance(B(), A)
Out[3]: True but not the other way around. Why do we have this check in the first place? If it's just for static typing, then I'd prefer to remove the assert and handle it some other way. cc @topper-123 @simonjayhawkins |
to avoid adopting Literal and overloads until the inplace parameter of Series/DataFrame methods is deprecated or repetitively adding asserts following calls with inplace=False so that the return type is either DataFrame or Series and not None.
agreed |
Going from pandas 0.25.* to >= 1.0 breaks any libraries that create subclasses of |
Agreed. if the type annotations, in places, do not use typevars with generic self then this is another issue (but should be fixed). i.e. can you give an example of where you have errors |
I don't totally follow the issue or the fix; can someone provide a full traceback of what the actual issue is? I didn't see that in the OP FWIW the tests are subclassing non-public classes, and the OP doesn't correctly subclass a DataFrame per the guide: https://pandas.pydata.org/pandas-docs/stable/development/extending.html#extending-subclassing-pandas Wondering if the fact that the subclasses don't define |
this approach always assumes _constructor is type(self) |
@@ -93,7 +93,7 @@ def _ensure_type(self: T, obj) -> T: | |||
|
|||
Used by type checkers. | |||
""" | |||
assert isinstance(obj, type(self)), type(obj) | |||
assert issubclass(type(obj), type(self)), type(obj) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The purpose of ensure_type is to ensure that we have a type and not None for methods with an inplace parameter where the return type is Optional. does this work?
assert issubclass(type(obj), type(self)), type(obj) | |
assert obj is not None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will probably also need to change
def _ensure_type(self: T, obj) -> T:
to def _ensure_type(self, obj: Optional[T]) -> T:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If @simonjayhawkins 's suggestion works, I think that would be the mest fool-proof, though I'm not sure I understand the issue.
Another thing we probably should consider is never return multiple types, so e.g. no Optional[FrameOrSeries]
until we get to python 3.8 as the lowest python version. If we did that _ensure_type
could be pulled from the code base.
So just to confirm, if the OP is modified to provide a import pandas as pd
import pandas.testing as pdt
class MyDataFrame(pd.DataFrame):
@property
def _constructor(self):
return MyDataFrame
mdf = MyDataFrame()
df = pd.DataFrame()
mdf.reindex_like(df)
pdt.assert_frame_equal(mdf, df, check_like=True) Without the assert wouldn't we be returning a DataFrame instead of the subclass? I think this is just a red herring |
@WillAyd Yes, the library doesn't override
Here's the traceback.
|
from that output, it looks like the return type of DataFrame.reindex should use a typevar to allow subclassing #32416 (comment) |
changing that, and running mypy gives none the wiser atm and can't seem to find where the DataFrame object is created instead of the subclass the AssertionError implies that super().reindex(**kwargs) is a DataFrame not the subclass |
I don't think this fix actually works, but probably not ideal anyway. Maybe we should just get rid of I will admit I'm not entirely clear on the purpose of the function, even reading through the PR that introduced it, so might just be a lack of understanding on my end |
I'm fine with that, as hinted above, though functions that return multiple types will have their return annotation removed (which I'm ok with if it solves this annoying problem). |
Actually... looking back at the original issue, isn't it example invalid? Do we expect that for an NDFrame subclass, that the type of import pandas as pd
class MyDataFrame(pd.DataFrame):
pass That said, I still don't think that we should have runtime consequences for static type-checking stuff. But it'd be nice to find a better example for a test case than something that's (potentially) flawed). |
We could use a decision here. IIUC, #32416 (comment) provides a more realistic example where things fail. Two options:
diff --git a/pandas/core/frame.py b/pandas/core/frame.py
index 106128004f..0718860e82 100644
--- a/pandas/core/frame.py
+++ b/pandas/core/frame.py
@@ -404,7 +404,8 @@ class DataFrame(NDFrame):
@property
def _constructor(self) -> Type["DataFrame"]:
- return DataFrame
+ return type(self)
+ # return DataFrame
_constructor_sliced: Type[Series] = Series
_deprecations: FrozenSet[str] = NDFrame._deprecations | frozenset([])
I think 1 may be a worthwhile change on its own, but I'm not comfortable doing it in a bugfix release (subclasses may have a different signature in their init). For now I'd prefer removing ensure_type until we can get @simonjayhawkins's suggestion in #32416 (comment) working. |
So I am still +1 to remove at least for 1.0.2 . Can always add back in later if there is a more comprehensive solution found |
Yes, I would also just remove the type checking for 1.0.2, we can discuss further changes related to |
Can someone more familiar with mypy (@simonjayhawkins?) remove the ensure_type stuff? I'm getting mypy errors with the following diff from master diff --git a/pandas/core/base.py b/pandas/core/base.py
index 478b83f538..40ff0640a5 100644
--- a/pandas/core/base.py
+++ b/pandas/core/base.py
@@ -9,7 +9,6 @@ from typing import Dict, FrozenSet, List, Optional, Union
import numpy as np
import pandas._libs.lib as lib
-from pandas._typing import T
from pandas.compat import PYPY
from pandas.compat.numpy import function as nv
from pandas.errors import AbstractMethodError
@@ -87,15 +86,6 @@ class PandasObject(DirNamesMixin):
# no memory_usage attribute, so fall back to object's 'sizeof'
return super().__sizeof__()
- def _ensure_type(self: T, obj) -> T:
- """
- Ensure that an object has same type as self.
-
- Used by type checkers.
- """
- assert isinstance(obj, type(self)), type(obj)
- return obj
-
class NoNewAttributesMixin:
"""
diff --git a/pandas/core/frame.py b/pandas/core/frame.py
index 30abcafb56..d391f1be2c 100644
--- a/pandas/core/frame.py
+++ b/pandas/core/frame.py
@@ -3637,7 +3637,7 @@ class DataFrame(NDFrame):
# Pop these, since the values are in `kwargs` under different names
kwargs.pop("axis", None)
kwargs.pop("labels", None)
- return self._ensure_type(super().reindex(**kwargs))
+ return super().reindex(**kwargs)
def drop(
self,
@@ -3955,8 +3955,8 @@ class DataFrame(NDFrame):
@Appender(_shared_docs["shift"] % _shared_doc_kwargs)
def shift(self, periods=1, freq=None, axis=0, fill_value=None) -> "DataFrame":
- return self._ensure_type(
- super().shift(periods=periods, freq=freq, axis=axis, fill_value=fill_value)
+ return super().shift(
+ periods=periods, freq=freq, axis=axis, fill_value=fill_value
)
def set_index(
@@ -8409,14 +8409,12 @@ Wild 185.0
from pandas.core.reshape.concat import concat
values = collections.defaultdict(list, values)
- return self._ensure_type(
- concat(
- (
- self.iloc[:, [i]].isin(values[col])
- for i, col in enumerate(self.columns)
- ),
- axis=1,
- )
+ return concat(
+ (
+ self.iloc[:, [i]].isin(values[col])
+ for i, col in enumerate(self.columns)
+ ),
+ axis=1,
)
elif isinstance(values, Series):
if not values.index.is_unique:
diff --git a/pandas/core/generic.py b/pandas/core/generic.py
index f531351747..388a0c0b46 100644
--- a/pandas/core/generic.py
+++ b/pandas/core/generic.py
@@ -8442,9 +8442,7 @@ class NDFrame(PandasObject, SelectionMixin, indexing.IndexingMixin):
)
if method is not None:
- left = self._ensure_type(
- left.fillna(method=method, axis=fill_axis, limit=limit)
- )
+ left = left.fillna(method=method, axis=fill_axis, limit=limit)
right = right.fillna(method=method, axis=fill_axis, limit=limit)
# if DatetimeIndex have different tz, convert to UTC
@@ -9977,9 +9975,7 @@ class NDFrame(PandasObject, SelectionMixin, indexing.IndexingMixin):
if fill_method is None:
data = self
else:
- data = self._ensure_type(
- self.fillna(method=fill_method, axis=axis, limit=limit)
- )
+ data = self.fillna(method=fill_method, axis=axis, limit=limit)
rs = data.div(data.shift(periods=periods, freq=freq, axis=axis, **kwargs)) - 1
if freq is not None:
diff --git a/pandas/core/reshape/pivot.py b/pandas/core/reshape/pivot.py
index 61aa34f724..0f3b8737cc 100644
--- a/pandas/core/reshape/pivot.py
+++ b/pandas/core/reshape/pivot.py
@@ -150,7 +150,7 @@ def pivot_table(
table = table.sort_index(axis=1)
if fill_value is not None:
- table = table._ensure_type(table.fillna(fill_value, downcast="infer"))
+ table = table.fillna(fill_value, downcast="infer")
if margins:
if dropna:
diff --git a/pandas/io/formats/format.py b/pandas/io/formats/format.py
index c879eaeda6..f011293273 100644
--- a/pandas/io/formats/format.py
+++ b/pandas/io/formats/format.py
@@ -283,9 +283,7 @@ class SeriesFormatter:
series = series.iloc[:max_rows]
else:
row_num = max_rows // 2
- series = series._ensure_type(
- concat((series.iloc[:row_num], series.iloc[-row_num:]))
- )
+ series = concat((series.iloc[:row_num], series.iloc[-row_num:]))
self.tr_row_num = row_num
else:
self.tr_row_num = None
|
@ericchansen #32633 removed |
@TomAugspurger No, nothing else to do. This will fix the issue I was experiencing. I can't speak to why |
Commit 6fd326d in pull request #30613 added
_ensure_type
, which utilizesisinstance
. However, it is reasonable to assume that someone may want to create a DataFrame subclass. Therefore,_ensure_type
should useissubclass
.black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry isn't necessary?