We need to formalise the rules around df.group_by(keys).agg(expr.sum()) when expr isn't a simple single-column expression.
The rules aren't totally clear in Polars either, see
where I noted that agg(pl.col('a', 'b').sum()) differs from agg(pl.nth(0, 1).sum()).
It looks to me like the rule may be:
When expressions in agg are expanded out, group-by keys are excluded, unless they are selected explicitly by name.
For example:
- group_by('a').agg(nw.all().sum()): column 'a' should be excluded from the all expansion
- group_by('a').agg(nw.selectors.by_type(nw.Float32).sum()): column 'a' should be excluded from the selector expansion
- group_by('a').agg(nw.col('a', 'b').sum()): column 'a' should be included in the expansion, as it is selected explicitly by name
- group_by('a').agg(nw.nth(0, 1).sum()): column 'a' should be excluded from the nth expansion
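To make the candidate rule concrete, here is a minimal pure-Python sketch of the expansion step for the cases above. It is not how Narwhals or Polars implement this: the `All`, `ByName`, and `Nth` classes and the `expand` function are hypothetical names made up for illustration. It also assumes that `nth` indexes into the full column list (keys included) before keys are dropped, which is itself one of the points the rule would need to settle.

```python
from dataclasses import dataclass


# Hypothetical stand-ins for the kinds of multi-column expression discussed above.
@dataclass(frozen=True)
class All:
    """Stands in for nw.all()."""


@dataclass(frozen=True)
class ByName:
    """Stands in for nw.col('a', 'b'): columns selected explicitly by name."""
    names: tuple


@dataclass(frozen=True)
class Nth:
    """Stands in for nw.nth(0, 1): columns selected by position."""
    indices: tuple


def expand(selection, columns, keys):
    """Return the output column names for `selection` inside group_by(keys).agg(...).

    Candidate rule: group-by keys are excluded from the expansion,
    unless they are selected explicitly by name.
    """
    if isinstance(selection, ByName):
        # Explicit names are kept as written, even if they are keys.
        return list(selection.names)
    if isinstance(selection, All):
        candidates = list(columns)
    elif isinstance(selection, Nth):
        # Assumption: positions refer to the full frame, keys included.
        candidates = [columns[i] for i in selection.indices]
    else:
        raise TypeError(f"unsupported selection: {selection!r}")
    # Implicit selections drop the group-by keys.
    return [name for name in candidates if name not in keys]


columns = ["a", "b", "c"]
keys = ["a"]
print(expand(All(), columns, keys))
print(expand(ByName(("a", "b")), columns, keys))
print(expand(Nth((0, 1)), columns, keys))
```

A type-based selector such as nw.selectors.by_type would follow the same path as `All` in this sketch: build the candidate list first, then drop the keys.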
Regardless of what Polars does, we may want to decide for ourselves what a good rule would look like, as there's no guarantee that Polars will remain stable here anyway.