Ambiguous behaviour when transform groupby with NaNs #17093
Comments
@chbrandt: Thanks for doing this! This does look a little weird to me, though perhaps @jreback or @jorisvandenbossche might have more information about this than I do.
Perhaps the same problem:
import numpy as np
import pandas as pd
df = pd.DataFrame({'A': [1, np.nan], 'B': [1, 1]})
df.groupby('A').apply(lambda x: x)  # works
df.groupby('A').transform(lambda x: x)  # ValueError: Length mismatch
In my case, I hit this error on group-resample-aggregate. The relevant lines seem to be: pandas/pandas/core/resample.py Lines 442 to 444 in 3e9e947
If the series is empty, is there any point in setting a potentially non-empty index on it?
Edit: This was because there were None values in the grouping column,
This looks to work on master now. Could use a test
Side note: the related Stack Overflow post was updated to account for this progress. Thank you all very much.
@mroeschke Although this case works fine, it is still not working for
But the situation below does work. Seems weird.
This issue was closed based on #17093 (comment), but this output disagrees with what was expected in OP; namely that null keys lead to null values in the output rather than no value in the output. It also disagrees with the transform docs, which say in general that the result of a transform should either be the same length or have the same index (there is inconsistency in the docs here). E.g. from DataFrameGroupBy.transform:
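The expectation described in the comment above (null keys give null values, and the output keeps the input's length and index) can be sketched without calling transform at all. This is only an illustration under assumptions: the data, the column names key and value, and the aggregate-then-map approach are made up for the example and are not how pandas implements transform.

```python
import numpy as np
import pandas as pd

# Hypothetical data for illustration; not taken from this issue.
df = pd.DataFrame({
    "key": ["a", "a", np.nan, "b", "b"],
    "value": [1.0, 3.0, 5.0, 7.0, 9.0],
})

# Aggregate first: groupby drops null keys by default,
# so only the groups "a" and "b" remain.
group_means = df.groupby("key")["value"].mean()

# Broadcast back by key lookup. The NaN key matches no group,
# so that row becomes NaN while the original index is preserved.
expected = df["key"].map(group_means)
print(expected)
```

The result has one entry per input row, which is the "same length, same index" reading of the transform contract; whether transform itself should behave this way is the question under discussion here.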
Similar issues: #10923, #9697, #9941
Please, consider the following data:
Now, the two code lines below do the same thing: output the group averages as the new row values. The first one uses a string function name, the second one a lambda function. The first one works; the second doesn't.
The first one, using 'mean', is what I was expecting. By all means, it looks strange to me that we have two different behaviours for the same operation.
Note: The second one, with the lambda function, used to work on pandas version 0.19.1.
I first posted this question to SO: https://stackoverflow.com/questions/45333681/handling-na-in-groupby-transform . After some discussion there I started to think that there is a bug here.
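A minimal sketch of the comparison described above, for anyone who wants to check their own pandas version. The data, the column names key and value, and the helper try_transform are all assumptions made for illustration; they are not the frame or the code lines from this report. The outcome depends on the installed pandas version, so the sketch only reports what happens instead of asserting a result.

```python
import numpy as np
import pandas as pd

# Hypothetical data for illustration; one row has a NaN grouping key.
df = pd.DataFrame({
    "key": ["a", "a", np.nan, "b", "b"],
    "value": [1.0, 3.0, 5.0, 7.0, 9.0],
})


def try_transform(func):
    """Run groupby-transform; return the result, or the exception if it raises."""
    try:
        return df.groupby("key")["value"].transform(func)
    except Exception as exc:  # kept broad on purpose: the error type varies by version
        return exc


by_name = try_transform("mean")                # string function name
by_lambda = try_transform(lambda s: s.mean())  # equivalent lambda

for label, result in [("'mean'", by_name), ("lambda", by_lambda)]:
    print(label, "->", type(result).__name__)
    print(result)
```

If the two printed results differ, or one of them is an exception, the installed version shows the inconsistency reported here.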
Thanks
commit: None
python: 3.6.1.final.0
python-bits: 64
OS: Linux
OS-release: 3.16.0-38-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
pandas: 0.20.3
pytest: None
pip: 9.0.1
setuptools: 27.2.0
Cython: None
numpy: 1.12.1
scipy: None
xarray: None
IPython: 6.1.0
sphinx: None
patsy: None
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
s3fs: None
pandas_gbq: None
pandas_datareader: None