
api: formalise expression expansion in group-by #2225

@MarcoGorelli


We need to formalise the rules around

df.group_by(keys).agg(expr.sum())

when expr isn't a simple single-column expression.
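To make the question concrete, here is a toy model in plain Python (not how Narwhals or Polars implement it) of what `agg(expr.sum())` computes once `expr` has already been expanded to several columns: one summed output column per expanded input column, next to the key. The row format (a list of dicts), the function name `group_by_sum`, and passing in the already-expanded column list are all assumptions made for illustration.

```python
from collections import defaultdict


def group_by_sum(rows, keys, columns):
    """Hypothetical model of df.group_by(keys).agg(<expr over columns>.sum()).

    `rows` is a list of dicts, `keys` the group-by key names, and `columns`
    the names the aggregation expression has already been expanded to.
    Each expanded column yields one summed output column per group.
    """
    # One accumulator dict per group, created on first sight of the key.
    groups = defaultdict(lambda: {c: 0 for c in columns})
    for row in rows:
        key = tuple(row[k] for k in keys)
        acc = groups[key]
        for c in columns:
            acc[c] += row[c]
    result = []
    for key, sums in groups.items():
        out = dict(zip(keys, key))
        # If a key is also an expanded column, its sum overwrites the key value.
        out.update(sums)
        result.append(out)
    return result


rows = [
    {"a": 1, "b": 2.0, "c": 3},
    {"a": 1, "b": 4.0, "c": 5},
    {"a": 2, "b": 1.0, "c": 1},
]
print(group_by_sum(rows, ["a"], ["b", "c"]))
# [{'a': 1, 'b': 6.0, 'c': 8}, {'a': 2, 'b': 1.0, 'c': 1}]
```

In this toy model, calling `group_by_sum(rows, ["a"], ["a", "b"])` silently replaces each group's key value with the sum of `a`, so both groups report `a == 2`. That name clash between a key and an aggregated column is one reason the expansion rule needs to be pinned down.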

The rules aren't totally clear in Polars either, see

where I noted that agg(pl.col('a', 'b').sum()) differs from agg(pl.nth(0, 1).sum()).

It looks to me like the rule might be:

When expressions in agg are expanded out, group-by keys are excluded, unless they are selected explicitly by name.

For example:

  • group_by('a').agg(nw.all().sum()): column 'a' should be excluded from the expansion
  • group_by('a').agg(nw.selectors.by_dtype(nw.Float32).sum()): column 'a' should be excluded from the expansion, even if it is Float32
  • group_by('a').agg(nw.col('a', 'b').sum()): column 'a' should be included, since it is selected explicitly by name
  • group_by('a').agg(nw.nth(0, 1).sum()): column 'a' should be excluded from the expansion
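The proposed rule can be sketched as a small pure-Python function over a schema. This is a hypothetical model, not Narwhals code: the selector kinds (`"all"`, `"by_dtype"`, `"col"`, `"nth"`), dtypes written as plain strings, and the name `expand_in_agg` are all made up for illustration, and `"nth"` is assumed to index into the full schema (keys included) before the keys are dropped.

```python
def expand_in_agg(kind, args, schema, keys):
    """Hypothetical model of the proposed expansion rule inside agg.

    `schema` maps column name -> dtype name, in column order; `keys` are the
    group-by keys. Only an explicit selection by name ("col") may include a
    key column; every other kind of expansion drops the keys.
    """
    names = list(schema)
    if kind == "col":
        # Explicit selection by name: keys are kept if asked for.
        return list(args)
    if kind == "all":
        selected = names
    elif kind == "by_dtype":
        selected = [n for n in names if schema[n] in args]
    elif kind == "nth":
        # Assumption: positions count every column, keys included.
        selected = [names[i] for i in args]
    else:
        raise ValueError(f"unknown selector kind: {kind!r}")
    return [n for n in selected if n not in keys]


# The four examples above, with group-by key 'a'.
schema = {"a": "Float32", "b": "Float32", "c": "Int64"}
print(expand_in_agg("all", (), schema, ["a"]))                  # ['b', 'c']
print(expand_in_agg("by_dtype", ("Float32",), schema, ["a"]))   # ['b']
print(expand_in_agg("col", ("a", "b"), schema, ["a"]))          # ['a', 'b']
print(expand_in_agg("nth", (0, 1), schema, ["a"]))              # ['b']
```

Whether `nth` positions should count key columns at all is something the rule would also need to settle; the sketch simply picks one option.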

Regardless of what Polars does, we may want to decide for ourselves what a good rule would look like, as there's no guarantee that Polars' behaviour here will remain stable anyway.
