groupby w/ only NULL-groups crashes since 0.23 #21624

crepererum · 2018-06-25T14:22:48Z

Code Sample, a copy-pastable example if possible

import numpy as np
import pandas as pd
import pandas.testing as pdt


# works:
df1 = pd.DataFrame({
    'g': [None, 'a'],
    'x': 1,
})
actual1 = df1.groupby('g')['x'].transform('sum')
expected1 = pd.Series([np.nan, 1.], name='x')

pdt.assert_series_equal(
    actual1,
    expected1,
)


# crashes:
df2 = pd.DataFrame({
    'g': [None],
    'x': 1,
})
# crashes with "ValueError: Length of passed values is 1, index implies 0"
actual2 = df2.groupby('g')['x'].transform('sum')
expected2 = pd.Series([np.nan], name='x')

pdt.assert_series_equal(
    actual2,
    expected2,
)

Problem description

groupby ignores groups that contain NULL-elements in any of the group columns. In that case, results of transform are NULL (NaN for floats, NaT for time, None for objects). The question is what happens if there are only "NULL groups":

before pandas 0.23: an object column with None objects is created
pandas 0.23: crash (ValueError: Length of passed values is 1, index implies 0)
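
To make the mixed case concrete, here is a minimal sketch of the behaviour described above, assuming a recent pandas with the default handling of NULL keys. The frame and variable names are illustrative only and are not part of the original report.

```python
import numpy as np
import pandas as pd

# Illustrative data: one row with a NULL key, two rows in group 'a'.
df = pd.DataFrame({"g": [None, "a", "a"], "x": [1, 2, 3]})

grouped = df.groupby("g")

# The NULL key does not form a group of its own; only 'a' is counted.
n_groups = grouped.ngroups

# transform keeps the original index, so the NULL-key row is still present
# in the result, but its value is NULL (NaN here, since 'x' is numeric).
result = grouped["x"].transform("sum")

print(n_groups)
print(result)
```

The all-NULL case from the report is the same situation with zero remaining groups, which is where 0.23 raises.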

Expected Output

A NULL-column according to the input data type.

Output of `pd.show_versions()`

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.13.final.0
python-bits: 64
OS: Linux
OS-release: 4.9.49-moby
machine: x86_64
processor:
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: None.None

pandas: 0.23.1
pytest: 3.4.0
pip: 10.0.1
setuptools: 39.1.0
Cython: 0.28.2
numpy: 1.14.3
scipy: 1.0.0
pyarrow: 0.9.0
xarray: None
IPython: 5.7.0
sphinx: 1.6.7
patsy: 0.5.0
dateutil: 2.7.3
pytz: 2018.4
blosc: None
bottleneck: 1.2.1
tables: None
numexpr: 2.6.5
feather: None
matplotlib: 2.2.2
openpyxl: 2.5.0
xlrd: None
xlwt: None
xlsxwriter: 0.8.4
lxml: 3.8.0
bs4: None
html5lib: 1.0.1
sqlalchemy: 1.2.7
pymysql: None
psycopg2: 2.7.4 (dt dec pq3 ext lo64)
jinja2: 2.8.1
s3fs: 0.1
fastparquet: None
pandas_gbq: None
pandas_datareader: None

WillAyd · 2018-06-25T16:10:30Z

Thanks for the report - investigation and PRs are always welcome!

tobycheese · 2019-03-12T09:31:17Z

I came upon this issue when I got a SIGSEGV after a groupby where one column contained some NaN values. Removing them avoided the crash.

However, I cannot see a difference in behaviour between 0.22.0 and master, both raise the ValueError for me.

@crepererum Are you sure your example does not raise that error with 0.22.0?

crepererum · 2019-03-12T10:23:02Z

@tobycheese yes.

tobycheese · 2019-03-12T10:30:08Z

Indeed. With Python 2.7.10 and Pandas 0.22.0 there is no error; with Python 3.6.5 and Pandas 0.22.0 there is the ValueError.

mroeschke · 2019-10-16T04:44:00Z

This works on master. Could use a test.

In [83]: df2 = pd.DataFrame({
    ...:     'g': [None],
    ...:     'x': 1,
    ...: })
    ...: actual2 = df2.groupby('g')['x'].transform('sum')
    ...: expected2 = pd.Series([np.nan], name='x')  # crashes with "ValueError: Length of passed values is 1, index implies 0"
    ...:
    ...: pdt.assert_series_equal(
    ...:     actual2,
    ...:     expected2,
    ...: )

In [84]: pd.__version__
Out[84]: '0.26.0.dev0+576.gde67bb72e'
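
A regression test for this could look roughly like the following. This is only a sketch built from the reproduction in this thread: the function name is hypothetical, and it is not necessarily the test that ends up in the pandas test suite.

```python
import numpy as np
import pandas as pd
import pandas.testing as pdt


def check_groupby_only_none_group():
    # Hypothetical test name; the data is the crashing example from this issue.
    # On 0.23 this raised
    # "ValueError: Length of passed values is 1, index implies 0".
    df = pd.DataFrame({"g": [None], "x": 1})

    actual = df.groupby("g")["x"].transform("sum")
    expected = pd.Series([np.nan], name="x")

    # The single row has a NULL key, so its transformed value should be NaN.
    pdt.assert_series_equal(actual, expected)


check_groupby_only_none_group()
```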

WillAyd added Bug Groupby labels Jun 25, 2018

jorisvandenbossche added Regression and removed Bug labels Jun 26, 2018

jorisvandenbossche added this to the 0.23.2 milestone Jun 26, 2018

jreback modified the milestones: 0.23.2, 0.23.3 Jun 26, 2018

lopez86 added a commit to lopez86/pandas that referenced this issue Jul 10, 2018

BUG: Fix groupby bug pandas-dev#21624.

3543449

Fixes bug where operations such as transform('sum') raise errors when only a single null group exists.

lopez86 added a commit to lopez86/pandas that referenced this issue Jul 11, 2018

Improving fix to bug pandas-dev#21624.

eddcbd1

lopez86 mentioned this issue Jul 11, 2018

BUG: groupby with no non-empty groups, #21624 #21849

Closed

jreback modified the milestones: 0.23.4, 0.24.0 Jul 20, 2018

jreback modified the milestones: 0.23.4, 0.23.5 Aug 2, 2018

HuntJSparra mentioned this issue Aug 6, 2018

mi.drop(x).get_loc_level(x) returns empty slice (rather than raising KeyError) #22221

Closed

dpalamuri mentioned this issue Oct 8, 2018

Pandas DataFrame groupby().Size() giving 'Value Error : Length of passed values is 65, index implies 0' #23050

Closed

jreback modified the milestones: 0.23.5, 0.24.0 Oct 23, 2018

jreback modified the milestones: 0.24.0, Contributions Welcome Dec 2, 2018

mroeschke added good first issue and removed Groupby, Regression labels Oct 16, 2019

mroeschke added the Needs Tests label Oct 16, 2019

jbrockmendel added the Groupby label Oct 16, 2019

crepererum added a commit to crepererum/pandas that referenced this issue Oct 17, 2019

TST: add regression test for all-none-groupby

b4c9d43

Closes pandas-dev#21624

crepererum added a commit to crepererum/pandas that referenced this issue Oct 18, 2019

TST: add regression test for all-none-groupby

cec322a

Closes pandas-dev#21624

crepererum mentioned this issue Oct 18, 2019

TST: add regression test for all-none-groupby #29067

Merged

5 tasks

jreback modified the milestones: Contributions Welcome, 1.0 Oct 18, 2019

jreback closed this as completed in #29067 Oct 18, 2019

jreback pushed a commit that referenced this issue Oct 18, 2019

TST: add regression test for all-none-groupby (#29067)

2683954

Closes #21624

HawkinsBA pushed a commit to HawkinsBA/pandas that referenced this issue Oct 29, 2019

TST: add regression test for all-none-groupby (pandas-dev#29067)

009ffc4

Closes pandas-dev#21624

proost pushed a commit to proost/pandas that referenced this issue Dec 19, 2019

TST: add regression test for all-none-groupby (pandas-dev#29067)

2671bd9

Closes pandas-dev#21624

proost pushed a commit to proost/pandas that referenced this issue Dec 19, 2019

TST: add regression test for all-none-groupby (pandas-dev#29067)

ebcd8bf

Closes pandas-dev#21624

bongolegend pushed a commit to bongolegend/pandas that referenced this issue Jan 1, 2020

TST: add regression test for all-none-groupby (pandas-dev#29067)

4fcc4c6

Closes pandas-dev#21624
