Unclear ValueError on core.groupby #15082

Closed
ghost opened this issue Jan 8, 2017 · 6 comments
Labels: Error Reporting (incorrect or improved errors from pandas), Groupby, Usage Question

Comments


ghost commented Jan 8, 2017

Code Sample

    def test_groupby_aggregate_item_by_item(self):
        def test_df():
            s = pd.DataFrame(np.array([[13, 14, 15, 16]]),
                             index=[0],
                             columns=['b', 'c', 'd', 'e'])
            num = np.array([[s, s, s, datetime.strptime('2016-12-28', "%Y-%m-%d"), 'asdf', 24],
                            [s, s, s, datetime.strptime('2016-12-28', "%Y-%m-%d"), 'asdf', 6]])
            columns = ['a', 'b', 'c', 'd', 'e', 'f']
            idx = [x for x in xrange(0, len(num))]
            return pd.DataFrame(num, index=idx, columns=columns)
        c = [test_df().sort_values(['d', 'e', 'f']),
             test_df().sort_values(['d', 'e', 'f'])]
        df = pd.concat(c)
        df = df[["e", "a"]].copy().reset_index(drop=True)
        df["e_idx"] = df["e"]
        what = [0, 0.5, 0.5, 1]

        def x():
            df.groupby(["e_idx", "e"])["a"].quantile(what)
        self.assertRaisesRegexp(ValueError,
                                "'SeriesGroupBy' object has no attribute '_aggregate_item_by_item'",
                                x)

Problem description

The ValueError raised in _GroupBy._aggregate_item_by_item carries no message, so the user cannot tell what went wrong.

                except (AttributeError):
>                   raise ValueError
E                   ValueError

core/groupby.py:592: ValueError

The proposed change passes the message of the underlying AttributeError to the ValueError so the user can see it.

Expected Output

                except (AttributeError) as e:
>                   raise ValueError(e)
E                   ValueError: 'SeriesGroupBy' object has no attribute '_aggregate_item_by_item'

core/groupby.py:592: ValueError
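
The difference between the two tracebacks can be reproduced outside pandas. The following is only a minimal sketch of the pattern, not the pandas source: `SeriesGroupByLike`, `aggregate_bare`, `aggregate_with_message`, and `message_of` are hypothetical names made up for illustration.

```python
class SeriesGroupByLike(object):
    """Hypothetical stand-in that deliberately lacks _aggregate_item_by_item."""


def aggregate_bare(obj):
    try:
        return obj._aggregate_item_by_item()
    except AttributeError:
        # Behaviour reported in this issue: the original message is dropped.
        raise ValueError


def aggregate_with_message(obj):
    try:
        return obj._aggregate_item_by_item()
    except AttributeError as e:
        # Proposed behaviour: forward the AttributeError text to the caller.
        raise ValueError(e)


def message_of(func, obj):
    """Return the text of the ValueError raised by func(obj), or None."""
    try:
        func(obj)
    except ValueError as err:
        return str(err)
    return None


obj = SeriesGroupByLike()
print(repr(message_of(aggregate_bare, obj)))          # empty message
print(repr(message_of(aggregate_with_message, obj)))  # names the missing attribute
```

On Python 3 only, `raise ValueError(str(e)) from e` would additionally keep the original exception chained to the new one; the environment reported below is Python 2.7, where that syntax is not available.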

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: b895968
python: 2.7.11.final.0
python-bits: 64
OS: Darwin
OS-release: 15.6.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: None.None

pandas: 0.19.0+311.gb895968.dirty
nose: 1.3.7
pip: 9.0.1
setuptools: 32.3.1
Cython: 0.25.2
numpy: 1.11.3
scipy: 0.18.1
statsmodels: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.6.0
pytz: 2016.10
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
s3fs: None
pandas_datareader: None

@ghost ghost mentioned this issue Jan 8, 2017
4 tasks
Author

ghost commented Jan 8, 2017

Possibly related:

Contributor

jreback commented Jan 9, 2017

this looks like a duplicate of #11759 ?

@jreback jreback added the Error Reporting (incorrect or improved errors from pandas) and Groupby labels Jan 9, 2017
Author

ghost commented Jan 9, 2017

I merged that branch and tested again. The same issue still seems to occur: the ValueError "'SeriesGroupBy' object has no attribute '_aggregate_item_by_item'" is still raised, so the test that I made still passes.

Any suggestions?

Also, will do some cleanup on the test case, based on the reviews.

Thanks.

Contributor

jreback commented Jan 9, 2017

@jmunsch you could have simply updated that original PR, but now that it's closed GH won't let you re-open it, so you can open a new one.

Author

ghost commented Jan 11, 2017

So after doing some more research, I tried moving _aggregate_item_by_item into series, but then I noticed some TypeErrors like the one in #8472.

When trying:

> /Users/jmunsch/Desktop/dev/pandas/pandas/core/groupby.py(2631)_aggregate_item_by_item()
-> cannot_agg.append(item)
(Pdb) item
   num1  num2
0    13    14
(Pdb) e
TypeError('Indexing a Series with DataFrame is not supported, use the appropriate DataFrame column',)
(Pdb) 

I followed the linked issue #11759 more closely, and retried with:

    def test_groupby_aggregate_item_by_item(self):
        s = pd.DataFrame(np.array([[13, 14]]),
                         index=[0],
                         columns=['num1', 'num2'])
        c = [pd.DataFrame([[s.copy(), 'zzz']], index=range(2), columns=['nums', 'node']),
             pd.DataFrame([[s.copy(), 'xxx']], index=range(2), columns=['nums', 'node'])]
        
        df = pd.concat(c)
        df = df[["node", "nums"]].copy().reset_index(drop=True)
        df["node_idx"] = df["node"]

        y = df.set_index(["node_idx", "node"]).groupby("nums").quantile([0.25, 1])

But it returns an empty set, so I must be using this incorrectly.
Closing.

@ghost ghost closed this as completed Jan 11, 2017
Contributor

jreback commented Jan 13, 2017

@jmunsch you are doing some really odd things in your example and seem to have embedded dataframes within dataframes.

please show the input data construction and what you are after.
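
For illustration only, here is a sketch of what a flat input could look like, with one scalar per cell instead of a DataFrame stored inside a cell. The column names `node`, `num1`, and `num2` come from the snippets above; the values are made up, and the call assumes a recent pandas release in which `SeriesGroupBy.quantile` accepts a list of quantiles.

```python
import pandas as pd

# Made-up values: one plain number per cell, no nested DataFrames.
df = pd.DataFrame({
    "node": ["zzz", "zzz", "xxx", "xxx"],
    "num1": [13, 15, 20, 24],
    "num2": [14, 16, 22, 26],
})

# Quantiles of num1 within each node; the result is a Series indexed by
# (node, quantile).
result = df.groupby("node")["num1"].quantile([0.25, 1.0])
print(result)
```

With the data laid out this way, the grouping key and the aggregated column are ordinary columns, so no object-dtype cells are involved in the aggregation.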

@jreback jreback added this to the No action milestone Jan 13, 2017
This issue was closed.