BUG: list-like objects are broadcast to each row (1.3 regression) #42549
Comments
I see the same issue when using pandas 1.3 from within Goldencheetah |
|
Would |
I can confirm that it does. The following prints the result of PySequence_Check for the protobuf container:
from cpython.sequence cimport PySequence_Check
from data_pb2 import Data
proto_data = Data(values=range(3))
print(PySequence_Check(proto_data.values)) |
When I change the check from |
@jbrockmendel I came across this while looking for any issues that may be related to #43373. Modifying
cdef inline bint c_is_list_like(object obj, bint allow_sets) except -1:
    cdef object iterfunc = getattr(obj, "__iter__", ...)
    if iterfunc is None:  # Explicitly not iterable
        return False
    return (
        (iterfunc is not ... or PySequence_Check(obj))
        and not isinstance(obj, type)
        # we do not count strings/unicode/bytes as list-like
        and not isinstance(obj, (str, bytes))
        # exclude zero-dimensional numpy arrays, effectively scalars
        and not cnp.PyArray_IsZeroDim(obj)
        # exclude sets if allow_sets is False
        and not (allow_sets is False and isinstance(obj, abc.Set))
    )
Edit: Realized I forgot to explain myself. Using When using the builtin |
For what it's worth, the change proposed in #42549 (comment) does fix my Goldencheetah use case. |
is_listlike has been changed to use |
Unfortunately no, because the object in the original issue does not have an As far as I can tell, |
@jbrockmendel That change is for when |
@erik-hasse we'll need to check the perf impact but it sounds like PySequence_Check is the way to go here (definitely not @aiudirog it is not obvious to me that we should see an item as listlike if |
You shouldn't. It's explicitly not list like if Here you can see the source code for https://github.com/python/cpython/blob/v3.9.7/Lib/_collections_abc.py#L83 |
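To illustrate the point about explicit opt-out, here is a minimal pure-Python sketch. The class names LegacySequence and ExplicitlyNotIterable are hypothetical and used only for this illustration. It relies on documented CPython behavior: setting __iter__ to None disables iteration, and iter() then does not fall back to __getitem__.

```python
from collections.abc import Iterable


class LegacySequence:
    """Iterable only through the old __getitem__/__len__ protocol."""

    def __getitem__(self, key):
        return range(3)[key]

    def __len__(self):
        return 3


class ExplicitlyNotIterable(LegacySequence):
    # Setting __iter__ to None opts out of iteration, even though
    # __getitem__ is still inherited from LegacySequence.
    __iter__ = None


legacy = LegacySequence()
blocked = ExplicitlyNotIterable()

# iter() falls back to __getitem__ for the legacy sequence.
legacy_items = list(legacy)

# iter() refuses the object that set __iter__ = None.
try:
    iter(blocked)
    blocked_is_iterable = True
except TypeError:
    blocked_is_iterable = False

# abc.Iterable only looks for __iter__, so neither object is an instance.
legacy_is_abc_iterable = isinstance(legacy, Iterable)
blocked_is_abc_iterable = isinstance(blocked, Iterable)

print(legacy_items, blocked_is_iterable)
print(legacy_is_abc_iterable, blocked_is_abc_iterable)
```

This is why a check that treats a missing __iter__ and __iter__ = None the same way cannot tell a legacy sequence apart from an object that has deliberately opted out of iteration.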
changing milestone to 1.3.5 |
good to fix but not needed for 1.3.x |
as an MRE maybe...
import pandas as pd
print(pd.__version__)
class MySequence:
    def __getitem__(self, key):
        return range(3)[key]
    def __len__(self):
        return 3
my_sequence = MySequence()
df = pd.DataFrame(index=range(3), data={"a": my_sequence})
print(df)
on 1.2.5
and on pandas 1.3.x/master
|
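The MRE class can also be inspected without pandas. The sketch below is only an illustration and assumes nothing beyond the class from the MRE. It shows that the object has no __iter__ attribute, which is what a hasattr-based check looks at, yet it can still be iterated through __getitem__.

```python
class MySequence:
    def __getitem__(self, key):
        return range(3)[key]

    def __len__(self):
        return 3


my_sequence = MySequence()

# A check based on the presence of __iter__ sees nothing here.
has_iter = hasattr(my_sequence, "__iter__")

# Iteration still works: iter() calls __getitem__(0), __getitem__(1), ...
# until IndexError is raised.
items = list(my_sequence)

print(has_iter)
print(items)
```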
tbc the code diff for the proposed change would be...
diff --git a/pandas/_libs/lib.pyx b/pandas/_libs/lib.pyx
index f527882a9d..ecd2906744 100644
--- a/pandas/_libs/lib.pyx
+++ b/pandas/_libs/lib.pyx
@@ -1098,9 +1098,12 @@ def is_list_like(obj: object, allow_sets: bool = True) -> bool:
 cdef inline bint c_is_list_like(object obj, bint allow_sets) except -1:
+    cdef object iterfunc = getattr(obj, "__iter__", ...)
+    if iterfunc is None:  # Explicitly not iterable
+        return False
     return (
-        # equiv: `isinstance(obj, abc.Iterable)`
-        getattr(obj, "__iter__", None) is not None and not isinstance(obj, type)
+        (iterfunc is not ... or PySequence_Check(obj))
+        and not isinstance(obj, type)
         # we do not count strings/unicode/bytes as list-like
         and not isinstance(obj, (str, bytes))
         # exclude zero-dimensional numpy arrays, effectively scalars
but this causes an infinite loop when testing with 1 core and node down for multicore for All other tests pass. |
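For readers without a Cython toolchain, the logic of the proposed check can be approximated in pure Python. This is a rough sketch, not the pandas implementation. The name is_list_like_sketch is hypothetical, PySequence_Check is reached through ctypes.pythonapi (an assumption that limits it to CPython), and the zero-dimensional NumPy array exclusion is left out to keep the example dependency-free.

```python
import ctypes
from collections import abc

# Bind the C-API function; CPython only.
_PySequence_Check = ctypes.pythonapi.PySequence_Check
_PySequence_Check.argtypes = [ctypes.py_object]
_PySequence_Check.restype = ctypes.c_int


def is_list_like_sketch(obj, allow_sets=True):
    """Pure-Python approximation of the proposed c_is_list_like logic."""
    # Ellipsis is the sentinel for "no __iter__ attribute at all".
    iterfunc = getattr(obj, "__iter__", ...)
    if iterfunc is None:  # explicitly not iterable
        return False
    return (
        (iterfunc is not ... or bool(_PySequence_Check(obj)))
        and not isinstance(obj, type)
        # strings and bytes are not counted as list-like
        and not isinstance(obj, (str, bytes))
        # NOTE: the zero-dimensional numpy array check is omitted here
        and not (allow_sets is False and isinstance(obj, abc.Set))
    )


class MySequence:
    def __getitem__(self, key):
        return range(3)[key]

    def __len__(self):
        return 3


class NotIterable(MySequence):
    __iter__ = None


print(is_list_like_sketch(MySequence()))   # legacy sequence protocol
print(is_list_like_sketch(NotIterable()))  # explicit opt-out
print(is_list_like_sketch("abc"))          # strings are excluded
```

One caveat: according to the CPython documentation, PySequence_Check returns 1 for any Python class that defines __getitem__ unless it is a dict subclass, so mapping-like classes without __iter__ would also pass this check.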
moving off milestone |
first bad commit: [cec2f5f] REF: handle non-list_like cases upfront in sanitize_array (#38563)
so it appears that the regression could have been caused by the change from also tbc
import pandas as pd
print(pd.__version__)
class MySequence:
    def __getitem__(self, key):
        return range(3)[key]
    def __len__(self):
        return 3
my_sequence = MySequence()
print(pd._libs.lib.is_list_like(my_sequence))
tbc i've moved off the milestone since IMO the proposed change is not suitable for backport since not only is tested behavior changed (i.e. If we want to fix this for 1.3.x, we could probably consider re-instating the PRs welcome either way. |
What is the earliest version of pandas that has this issue fixed? Does it affect 1.4.x? |
It is still an issue with pandas 2.0.0. It was never fixed. |
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
(optional) I have confirmed this bug exists on the master branch of pandas.
Code Sample, a copy-pastable example
This requires both pandas and grpcio-tools.
Problem description
On 1.3.0 and the master branch this code prints:
The issue seems to arise from #41592. In 1.2.x this object was handled by _try_cast in this else clause. After that change, it's handled by construct_1d_arraylike_from_scalar because is_list_like(data) returns False. Note that RepeatedScalarContainer implements PyTypeObject.tp_as_sequence but not PyTypeObject.tp_iter, so list(proto_data.values) works fine, but the hasattr(obj, "__iter__") check in is_list_like is False.
Based on all this, I suspect that this same issue will occur on any object which implements PyTypeObject.tp_as_sequence but not PyTypeObject.tp_iter. However, this protobuf object is the only example I have right now, so I can't test further.
I'm not familiar enough with Cython to provide a full fix, but is there some way to examine the struct fields of obj in is_list_like? If so, that function could be amended to check hasattr(obj, "__iter__") or hasattr(obj, "tp_as_sequence"). If not, I think the logic of sanitize_array needs to be amended.
Expected Output
On 1.2.x this prints:
Output of pd.show_versions()
pandas : 1.3.0
numpy : 1.20.3
pytz : 2021.1
dateutil : 2.8.1
pip : 21.0.1
setuptools : 53.0.0
Cython : 0.29.22
pytest : 6.2.2
hypothesis : 6.3.4
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.11.3
IPython : 7.20.0
pandas_datareader: None
bs4 : None
bottleneck : None
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : 3.3.4
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 3.0.0
pyxlsb : None
s3fs : None
scipy : 1.6.0
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
numba : None