
ENH: Allow to group by an empty list #35366


Closed
tomjaguarpaw opened this issue Jul 21, 2020 · 14 comments
Labels: Closing Candidate, Enhancement, Groupby

Comments

@tomjaguarpaw

tomjaguarpaw commented Jul 21, 2020

Is your feature request related to a problem?

Yes, I cannot group by an empty list.

Describe the solution you'd like

I would like to be able to group by an empty list of columns. At the moment I get an error.

>>> DataFrame([{"a":1, "b":2}, {"a":3, "b":4}]).groupby([]).agg(aagg=('a', 'sum'), bagg=('b', 'sum'))
Traceback (most recent call last):
...
ValueError: No group keys passed!

I would prefer to see

>>> DataFrame([{"a":1, "b":2}, {"a":3, "b":4}]).groupby([]).agg(aagg=('a', 'sum'), bagg=('b', 'sum'))
aagg  bagg
   4     6

which would generalise grouping by a non-empty list, which works fine.

>>> DataFrame([{"a":1, "b":2}, {"a":3, "b":4}]).groupby(["a", "b"]).agg(aagg=('a', 'sum'), bagg=('b', 'sum'))
     aagg  bagg
a b
1 2     1     2
3 4     3     4
>>> DataFrame([{"a":1, "b":2}, {"a":3, "b":4}]).groupby(["a"]).agg(aagg=('a', 'sum'), bagg=('b', 'sum'))
   aagg  bagg
a
1     1     2
3     3     4

It seems to me this would be reasonable behaviour. Is there a particular reason it is not supported?

API breaking implications

None that I know of.

Describe alternatives you've considered

A partial workaround is to cook up a grouping function for simulating the behaviour I want, but this lacks uniformity.

>>> DataFrame([{"a":1, "b":2}, {"a":3, "b":4}]).groupby(lambda _: "Only row").agg(aagg=('a', 'sum'), bagg=('b', 'sum'))
          aagg  bagg
Only row     4     6

But I don't really want "Only row" to be there. I just want

>>> DataFrame([{"a":1, "b":2}, {"a":3, "b":4}]).groupby([]).agg(aagg=('a', 'sum'), bagg=('b', 'sum'))
aagg  bagg
   4     6
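
The constant-key workaround above can be brought closer to the desired shape by dropping the synthetic group label afterwards. A minimal sketch, assuming pandas 1.0 or later; the constant key `0` is an arbitrary choice, and the result still carries a default integer index rather than no index at all:

```python
import pandas as pd

df = pd.DataFrame([{"a": 1, "b": 2}, {"a": 3, "b": 4}])

# Put every row in one group via a constant key, then discard that key.
result = (
    df.groupby(lambda _: 0)
    .agg(aagg=("a", "sum"), bagg=("b", "sum"))
    .reset_index(drop=True)
)
print(result)
```

This gives a single row with `aagg` and `bagg` columns, but it is still a workaround: the caller has to know that the key list was empty.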
tomjaguarpaw added the Enhancement and Needs Triage labels Jul 21, 2020
@gurukiran07
Contributor

gurukiran07 commented Jul 21, 2020

If you don't want to group by anything (why use DataFrame.groupby in the first place) then you can use pandas.DataFrame.agg

df.agg({'a':'sum', 'b':'sum'}) # gives Series as output, if DataFrame is needed use `to_frame`

But this doesn't support named aggregation AFAIK.
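
A minimal sketch of the `to_frame` route mentioned above, assuming the output names are applied with a separate `rename` step (the names `aagg` and `bagg` are taken from the original request):

```python
import pandas as pd

df = pd.DataFrame([{"a": 1, "b": 2}, {"a": 3, "b": 4}])

# Dict-style aggregation returns a Series indexed by column name.
totals = df.agg({"a": "sum", "b": "sum"})

# Turn the Series into a one-row DataFrame, then rename the columns.
row = totals.to_frame().T
renamed = row.rename(columns={"a": "aagg", "b": "bagg"})
print(renamed)
```

The renaming has to be done by hand here, which is the gap that named aggregation is meant to close.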

@simonjayhawkins
Member

It seems to me this would be reasonable behaviour. Is there a particular reason it is not supported?

seems reasonable. PRs and contributions welcome.

may want to redo the OP and use the feature request template (available when opening issue but repeated in details below)

Is your feature request related to a problem?

[this should provide a description of what the problem is, e.g. "I wish I could use pandas to do [...]"]

Describe the solution you'd like

[this should provide a description of the feature request, e.g. "DataFrame.foo should get a new parameter bar that [...]", try to write a docstring for the desired feature]

API breaking implications

[this should provide a description of how this feature will affect the API]

Describe alternatives you've considered

[this should provide a description of any alternative solutions or features you've considered]

Additional context

[add any other context, code examples, or references to existing implementations about the feature request here]

# Your code here, if applicable

simonjayhawkins added the Groupby label and removed the Needs Triage label Jul 21, 2020
@simonjayhawkins
Member

But this doesn't support named aggregation AFAIK.

1.1.0 gives

>>> pd.__version__
'1.1.0rc0+2.g3e88e170a'
>>>
>>> df = pd.DataFrame([{"a": 1, "b": 2}, {"a": 3, "b": 4}])
>>> df
   a  b
0  1  2
1  3  4
>>>
>>> df.agg(aagg=("a", "sum"), bagg=("b", "sum"))
        a    b
aagg  4.0  NaN
bagg  NaN  6.0
>>>

but 1.0.1 raises TypeError: aggregate() missing 1 required positional argument: 'func'

The output seems iffy; I have not looked at it in detail. This may be an issue with 1.1.0.
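
If the diagonal layout above is what `DataFrame.agg` returns, it can be collapsed into a single row after the fact. A rough sketch, assuming pandas 1.1 or later and assuming each output row has exactly one populated cell; it is only one way to reshape the result, not something the pandas API provides directly:

```python
import pandas as pd

df = pd.DataFrame([{"a": 1, "b": 2}, {"a": 3, "b": 4}])

# Named aggregation on the frame: one row per output name, NaN elsewhere.
out = df.agg(aagg=("a", "sum"), bagg=("b", "sum"))

# Back-fill along each row so the first column holds the populated value,
# then keep only that column as a Series indexed by output name.
collapsed = out.bfill(axis=1).iloc[:, 0].rename(None)

# Present the values as a single row.
row = collapsed.to_frame().T
print(row)
```

The values come back as floats because the intermediate frame contains NaN.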

@tomjaguarpaw
Author

may want to redo the OP and use the feature request template (available when opening issue but repeated in details below)

I see. I didn't realise that was to be taken as the literal structure. I have edited my OP.

1.1.0 gives

Interesting. That output looks like a bug to me. I am on 1.0.3.

@gurukiran07
Contributor

gurukiran07 commented Jul 21, 2020

1.1.0 gives

Great, good to know.

I did not test in my development version; I tested it in 1.0.3. Cool feature to have.
Related issue on named aggregation in DataFrame.agg and Series.agg: #26513

@tomjaguarpaw
Author

If you don't want to group by anything (why use DataFrame.groupby in the first place) then you can use pandas.DataFrame.agg

Firstly, because allowing an empty list would be more uniform (perhaps it's a parameter passed in by someone else), and secondly, that's what I tried first, but it doesn't support what I want (what I think you refer to as "named aggregation"):

>>> DataFrame([{"a":1, "b":2}, {"a":3, "b":4}]).agg(aagg=('a', 'sum'), bagg=('b', 'sum'))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: aggregate() missing 1 required positional argument: 'func'

@WillAyd
Member

WillAyd commented Jul 21, 2020

Generic support for named aggregations was added very recently in #29116, so I think you can achieve the result you want without the groupby on master, though to @simonjayhawkins's point maybe there are some bugs to be worked out.

FWIW I am -1 on changes to groupby here, as I agree with @gurukiran07's comment that agg should handle this generically.

@gurukiran07
Contributor

Here's a related question on SO, "How to do a groupby on an empty set of columns pandas", where Wes McKinney posted:

Having an analogous DataFrame.aggregate method is a good idea.

This may be one of the reasons why DataFrame.agg and Series.agg are in API.

As @WillAyd and @simonjayhawkins pointed out, named aggregations are now allowed with Series.agg and DataFrame.agg (though buggy as of now).

it's a parameter passed in by someone else

A workaround can be:

if group_list:
    result = df.groupby(group_list).agg(...)  # grouped path
else:
    result = df.agg(...)  # no keys: aggregate the whole frame
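
The branch above can be wrapped in a small helper so that callers pass the key list through unchanged. This is a sketch only: `agg_optional_groups` is a hypothetical name, not a pandas function, and the no-key branch computes each aggregation directly on the column instead of relying on named aggregation in `DataFrame.agg`.

```python
import pandas as pd


def agg_optional_groups(df, group_list, **named):
    """Hypothetical helper: named aggregation with or without group keys."""
    if group_list:
        # Non-empty keys: regular grouped named aggregation.
        return df.groupby(group_list).agg(**named)
    # Empty keys: aggregate each column over the whole frame, one row out.
    return pd.DataFrame(
        {name: [df[col].agg(func)] for name, (col, func) in named.items()}
    )


df = pd.DataFrame([{"a": 1, "b": 2}, {"a": 3, "b": 4}])
grouped = agg_optional_groups(df, ["a"], bagg=("b", "sum"))
ungrouped = agg_optional_groups(df, [], aagg=("a", "sum"), bagg=("b", "sum"))
print(grouped)
print(ungrouped)
```

Note that the two branches still return differently indexed frames (group keys versus a default integer index), which is part of the non-uniformity discussed in this thread.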

@simonjayhawkins
Member

Generic support for named aggregations was added very recently in #29116, so I think you can achieve the result you want without the groupby on master, though to @simonjayhawkins's point maybe there are some bugs to be worked out.

#29116 didn't add documentation on the enhancement (a one-line whatsnew entry was added later in #35220). I'm not sure that it's a bug; the tests suggest this was intentional. IMO the output should maybe be a Series; the output above is confusing.

@tomjaguarpaw
Author

A workaround can be

Sure, but why work around it when it's a perfectly coherent thing for the pandas API to support?

@douglas-raillard-arm

+1 for that feature, it currently makes using pandas as a backend of another library difficult.

I'm currently writing functions that expose an optional group_on parameter to the user, which obviously defaults to None. In order to support that, I'm forced to have a separate path in my code, with different data types (a GroupBy object on one side, a DataFrame on the other). Given the subtle differences between various same-looking APIs in pandas, this is a recipe for bugs, in addition to gratuitous code complexity. For example, DataFrame.first() is totally different from GroupBy.first(). Similarly, GroupBy does not support iat etc.

elashrry mentioned this issue Sep 8, 2023
@elashrry

elashrry commented Sep 8, 2023

I found myself in the same situation, where I wanted groupby to behave uniformly even if the user passed an empty array. I dug into the code base and found that a one-line change allows this behaviour and still passes all the tests, except for the test that checks an empty array is rejected, of course. I created PR #55068 with my changes. Please let me know if it works.

@rhshadrach
Member

FWIW I am -1 on changes to groupby here

I am -1 as well: #55068 (comment)

From that PR, users can do this if they want without much effort:

if len(keys) == 0:
    keys = np.zeros(len(df))  # one constant key per row, so all rows share a group
result = df.groupby(keys)...
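
Spelled out as a runnable sketch, assuming `keys` arrives as a possibly empty list and reusing the aggregation from the original request:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([{"a": 1, "b": 2}, {"a": 3, "b": 4}])
keys = []  # e.g. supplied by a caller; empty in this example

# With no keys, substitute one constant key per row so every row
# lands in the same group.
if len(keys) == 0:
    keys = np.zeros(len(df))

result = df.groupby(keys).agg(aagg=("a", "sum"), bagg=("b", "sum"))
print(result)
```

The result has a single row whose index label is the constant key `0.0`; call `reset_index(drop=True)` if that label is unwanted.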

rhshadrach added the Closing Candidate label Sep 9, 2023
@mroeschke
Member

Since there hasn't been much support for this, closing
