BUG: DataFrameGroupBy.__getitem__ fails to propagate dropna=True #35612

Closed
3 tasks done
arw2019 opened this issue Aug 7, 2020 · 0 comments · Fixed by #35751
Labels
Bug, Groupby, Missing-data (np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate)
Milestone

Comments

@arw2019
Member

arw2019 commented Aug 7, 2020

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.

xref #35014

Creating a separate issue because the dropna=True case requires a different fix from the dropna=False case (resolved by #35078).

Problem description

The setup is:

In [1]: import pandas as pd
In [2]: df = pd.DataFrame({"A": [0, 0, 1, None], "B": [1, 2, 3, None]})
In [3]: gb = df.groupby("A", dropna=True)
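As a quick check on this setup (a sketch, not part of the original report), comparing the grouping with and without dropna shows where the row counts diverge. With dropna=True, the row whose key is missing belongs to no group, so the groups cover only 3 of the 4 rows. The names gb_drop, gb_keep, and rows_in_groups are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"A": [0, 0, 1, None], "B": [1, 2, 3, None]})

# Same frame grouped two ways: dropping vs. keeping the missing key.
gb_drop = df.groupby("A", dropna=True)
gb_keep = df.groupby("A", dropna=False)

n_groups_drop = gb_drop.ngroups
n_groups_keep = gb_keep.ngroups

# Total number of rows that fall into some group when the missing key is dropped.
rows_in_groups = int(gb_drop.size().sum())

print(n_groups_drop, n_groups_keep, rows_in_groups, len(df))
```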

All three of these commands:

In [4]: gb['B'].transform(len)
In [5]: gb[['B']].transform(len)
In [6]: gb.transform(len)

raise a variant of this error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-4-3bae7d67a46f> in <module>
----> 1 gb['B'].transform(len)

/workspaces/pandas-arw2019/pandas/core/groupby/generic.py in transform(self, func, engine, engine_kwargs, *args, **kwargs)
    487 
    488         if not isinstance(func, str):
--> 489             return self._transform_general(
    490                 func, *args, engine=engine, engine_kwargs=engine_kwargs, **kwargs
    491             )

/workspaces/pandas-arw2019/pandas/core/groupby/generic.py in _transform_general(self, func, engine, engine_kwargs, *args, **kwargs)
    556 
    557         result.name = self._selected_obj.name
--> 558         result.index = self._selected_obj.index
    559         return result
    560 

/workspaces/pandas-arw2019/pandas/core/generic.py in __setattr__(self, name, value)
   5167         try:
   5168             object.__getattribute__(self, name)
-> 5169             return object.__setattr__(self, name, value)
   5170         except AttributeError:
   5171             pass

/workspaces/pandas-arw2019/pandas/_libs/properties.pyx in pandas._libs.properties.AxisProperty.__set__()
     64 
     65     def __set__(self, obj, value):
---> 66         obj._set_axis(self.axis, value)

/workspaces/pandas-arw2019/pandas/core/series.py in _set_axis(self, axis, labels, fastpath)
    422         if not fastpath:
    423             # The ensure_index call above ensures we have an Index object
--> 424             self._mgr.set_axis(axis, labels)
    425 
    426     # ndarray compatibility

/workspaces/pandas-arw2019/pandas/core/internals/managers.py in set_axis(self, axis, new_labels)
    214 
    215         if new_len != old_len:
--> 216             raise ValueError(
    217                 f"Length mismatch: Expected axis has {old_len} elements, new "
    218                 f"values have {new_len} elements"

ValueError: Length mismatch: Expected axis has 3 elements, new values have 4 elements
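The last frames of the traceback come down to assigning a 4-element index to a 3-element result. A minimal sketch of that final step, independent of groupby, is below. The variable names are illustrative, and the values simply mirror the lengths reported in the message.

```python
import pandas as pd

# Stand-in for the transformed result: one value per row that belongs to a group.
result = pd.Series([2, 2, 1])

# Stand-in for the index of the original 4-row selection.
full_index = pd.RangeIndex(4)

try:
    # Mirrors `result.index = self._selected_obj.index` from the traceback.
    result.index = full_index
except ValueError as exc:
    message = str(exc)
else:
    message = ""

print(message)
```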

Expected Output

All three should return:

Out[9]: 
   B
0  2
1  2
2  1
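One way to reproduce this expected output without relying on dropna handling inside transform is to drop the rows with a missing key first and group the remainder. This is only a sketch of the expectation stated above, not the fix applied in #35751. The names clean and expected are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"A": [0, 0, 1, None], "B": [1, 2, 3, None]})

# Remove rows whose grouping key is missing, then transform per group.
clean = df.dropna(subset=["A"])

# Series transform, converted to a one-column frame to match the output above.
expected = clean.groupby("A")["B"].transform(len).to_frame()

print(expected)
```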

Output of pd.show_versions()

INSTALLED VERSIONS

commit : 9843926
python : 3.8.3.final.0
python-bits : 64
OS : Linux
OS-release : 5.4.0-42-generic
Version : #46-Ubuntu SMP Fri Jul 10 00:24:02 UTC 2020
machine : x86_64
processor :
byteorder : little
LC_ALL : C.UTF-8
LANG : C.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.2.0.dev0+54.g9843926e3
numpy : 1.18.5
pytz : 2020.1
dateutil : 2.8.1
pip : 20.1.1
setuptools : 49.1.0.post20200704
Cython : 0.29.21
pytest : 5.4.3
hypothesis : 5.19.0
sphinx : 3.1.1
blosc : None
feather : None
xlsxwriter : 1.2.9
lxml.etree : 4.5.2
html5lib : 1.1
pymysql : None
psycopg2 : 2.8.5 (dt dec pq3 ext lo64)
jinja2 : 2.11.2
IPython : 7.16.1
pandas_datareader: None
bs4 : 4.9.1
bottleneck : 1.3.2
fsspec : 0.7.4
fastparquet : 0.4.0
gcsfs : 0.6.2
matplotlib : 3.2.2
numexpr : 2.7.1
odfpy : None
openpyxl : 3.0.4
pandas_gbq : None
pyarrow : 0.17.1
pytables : None
pyxlsb : None
s3fs : 0.4.2
scipy : 1.5.0
sqlalchemy : 1.3.18
tables : 3.6.1
tabulate : 0.8.7
xarray : 0.15.1
xlrd : 1.2.0
xlwt : 1.3.0
numba : 0.50.1

@arw2019 added the Bug and Needs Triage labels Aug 7, 2020
@simonjayhawkins added the Groupby and Missing-data labels and removed the Needs Triage label Aug 11, 2020
@simonjayhawkins added this to the Contributions Welcome milestone Aug 11, 2020
@jreback modified the milestones: Contributions Welcome, 1.2 Sep 2, 2020
@jreback modified the milestones: 1.2, Contributions Welcome Nov 19, 2020
@jreback modified the milestones: Contributions Welcome, 1.3 Dec 18, 2020
3 participants