I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
(optional) I have confirmed this bug exists on the master branch of pandas.
xref #35014
Creating a separate issue because the `dropna=True` case requires a different fix from the `dropna=False` case (resolved by #35078).
The setup is:
```python
In [1]: import pandas as pd

In [2]: df = pd.DataFrame({"A": [0, 0, 1, None], "B": [1, 2, 3, None]})

In [3]: gb = df.groupby("A", dropna=True)
```
All three of these commands:
```python
In [4]: gb['B'].transform(len)

In [5]: gb[['B']].transform(len)

In [6]: gb.transform(len)
```
generate a variant of this error:

```python-traceback
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-4-3bae7d67a46f> in <module>
----> 1 gb['B'].transform(len)

/workspaces/pandas-arw2019/pandas/core/groupby/generic.py in transform(self, func, engine, engine_kwargs, *args, **kwargs)
    487
    488         if not isinstance(func, str):
--> 489             return self._transform_general(
    490                 func, *args, engine=engine, engine_kwargs=engine_kwargs, **kwargs
    491             )

/workspaces/pandas-arw2019/pandas/core/groupby/generic.py in _transform_general(self, func, engine, engine_kwargs, *args, **kwargs)
    556
    557         result.name = self._selected_obj.name
--> 558         result.index = self._selected_obj.index
    559         return result
    560

/workspaces/pandas-arw2019/pandas/core/generic.py in __setattr__(self, name, value)
   5167         try:
   5168             object.__getattribute__(self, name)
-> 5169             return object.__setattr__(self, name, value)
   5170         except AttributeError:
   5171             pass

/workspaces/pandas-arw2019/pandas/_libs/properties.pyx in pandas._libs.properties.AxisProperty.__set__()
     64
     65     def __set__(self, obj, value):
---> 66         obj._set_axis(self.axis, value)

/workspaces/pandas-arw2019/pandas/core/series.py in _set_axis(self, axis, labels, fastpath)
    422         if not fastpath:
    423             # The ensure_index call above ensures we have an Index object
--> 424             self._mgr.set_axis(axis, labels)
    425
    426     # ndarray compatibility

/workspaces/pandas-arw2019/pandas/core/internals/managers.py in set_axis(self, axis, new_labels)
    214
    215         if new_len != old_len:
--> 216             raise ValueError(
    217                 f"Length mismatch: Expected axis has {old_len} elements, new "
    218                 f"values have {new_len} elements"

ValueError: Length mismatch: Expected axis has 3 elements, new values have 4 elements
```
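The "3 elements" versus "4 elements" in the message can be reproduced without calling `transform` at all. The sketch below is my reading of the traceback rather than a statement about pandas internals: it only compares how many rows land in a group with how many rows the original index has. The variable names `rows_in_groups` and `rows_in_frame` are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"A": [0, 0, 1, None], "B": [1, 2, 3, None]})
gb = df.groupby("A", dropna=True)

# Rows that belong to some group: the row whose key is null is excluded.
rows_in_groups = int(gb.size().sum())

# Rows in the original frame, i.e. the length of the index that the
# traceback shows being assigned onto the transformed result.
rows_in_frame = len(df.index)

print(rows_in_groups, rows_in_frame)  # 3 4
```

If that reading is right, any per-group result concatenated back together has three rows, so assigning the four-element original index onto it fails.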
All three should return:
```python
Out[9]:
   B
0  2
1  2
2  1
```
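Until this is fixed, one possible way to get values of this shape is to drop the rows with a null key before grouping, so that every remaining row belongs to a group. This is a workaround sketch under that assumption, not the fix proposed for pandas. The names `kept` and `result` are illustrative, and the dtype of the result may differ from the integer output shown above.

```python
import pandas as pd

df = pd.DataFrame({"A": [0, 0, 1, None], "B": [1, 2, 3, None]})

# Remove rows whose grouping key is null up front, which is what
# dropna=True is meant to do implicitly.
kept = df.dropna(subset=["A"])

# Every remaining row is in a group, so the transformed result and the
# index of `kept` have the same length.
result = kept.groupby("A")["B"].transform(len)

print(result)
```

The result keeps the original row labels 0, 1 and 2, so it can still be aligned back against `df` if needed.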
Output of `pd.show_versions()`

```
INSTALLED VERSIONS
commit : 9843926
python : 3.8.3.final.0
python-bits : 64
OS : Linux
OS-release : 5.4.0-42-generic
Version : #46-Ubuntu SMP Fri Jul 10 00:24:02 UTC 2020
machine : x86_64
processor :
byteorder : little
LC_ALL : C.UTF-8
LANG : C.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.2.0.dev0+54.g9843926e3
numpy : 1.18.5
pytz : 2020.1
dateutil : 2.8.1
pip : 20.1.1
setuptools : 49.1.0.post20200704
Cython : 0.29.21
pytest : 5.4.3
hypothesis : 5.19.0
sphinx : 3.1.1
blosc : None
feather : None
xlsxwriter : 1.2.9
lxml.etree : 4.5.2
html5lib : 1.1
pymysql : None
psycopg2 : 2.8.5 (dt dec pq3 ext lo64)
jinja2 : 2.11.2
IPython : 7.16.1
pandas_datareader: None
bs4 : 4.9.1
bottleneck : 1.3.2
fsspec : 0.7.4
fastparquet : 0.4.0
gcsfs : 0.6.2
matplotlib : 3.2.2
numexpr : 2.7.1
odfpy : None
openpyxl : 3.0.4
pandas_gbq : None
pyarrow : 0.17.1
pytables : None
pyxlsb : None
s3fs : 0.4.2
scipy : 1.5.0
sqlalchemy : 1.3.18
tables : 3.6.1
tabulate : 0.8.7
xarray : 0.15.1
xlrd : 1.2.0
xlwt : 1.3.0
numba : 0.50.1
```