[Enh]: Ability to use nw.col / Expr for .group_by #1385

@srivarra

We would like to learn about your use case. For example, if this feature is needed to adopt Narwhals in an open source project, could you please enter the link to it below?

I'm writing a groupby method for annsel that can group by either or both of the DataFrames in an AnnData object: Variables (var) and Observations (obs).

Please describe the purpose of the new feature or describe the problem to solve.

My filtering and selecting methods both use wrapper classes tied to the DataFrame the query applies to. These wrap and return nw.col expressions. That works, but group_by only accepts strings. It would be nice if I could pass nw.col(*names) instead.

For example:

import narwhals as nw
from narwhals.group_by import GroupBy as NwGroupby
from narwhals.typing import IntoDataFrame, IntoExpr

@nw.narwhalify
def _groupby_observation_df(df: IntoDataFrame, expr: IntoExpr) -> NwGroupby:
    return df.group_by(expr)


_groupby_observation_df(obs, nw.col("Cluster_ID"))

gives me the following error: TypeError: 'Expr' object is not callable

This would help me keep my internal API consistent.

Suggest a solution if possible.

Maybe a special case for nw.col? Or perhaps a way to make it hashable?

Another option is to make nw.col an instance of its own class, Col, as Polars does.
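To make the suggestion concrete, here is a minimal sketch of the "callable instance" pattern in plain Python. It is not Narwhals or Polars code: `ColumnRef`, `Col`, and `col` below are hypothetical stand-ins, and the frozen dataclass is only one assumed way to get a hashable column reference.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnRef:
    """Hypothetical hashable record of the requested column names."""

    names: tuple[str, ...]


class Col:
    """Hypothetical class whose single instance plays the role of `col`."""

    def __call__(self, *names: str) -> ColumnRef:
        # col("a", "b") -> ColumnRef(("a", "b"))
        return ColumnRef(tuple(names))

    def __getattr__(self, name: str) -> ColumnRef:
        # col.a -> ColumnRef(("a",)); dunder lookups are left alone
        if name.startswith("__"):
            raise AttributeError(name)
        return ColumnRef((name,))


# Module-level instance, so callers can write col("x") or col.x
col = Col()
```

Because `col` is an instance of a dedicated class, code that receives it can be checked with `isinstance(col, Col)`, and the references it produces can serve as dictionary keys or set members.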

If you have tried alternatives, please describe them below.

The alternative that I've tried is to strictly use strings, which works, but isn't ideal.
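A possible middle ground, sketched below in plain Python, is to have the wrapper remember the column names so that strings can still be handed to group_by. Everything here is an assumption for illustration: `NamedSelection`, `obs_names`, and `group_keys` are hypothetical helpers, not part of Narwhals or annsel.

```python
from __future__ import annotations

from collections.abc import Iterable
from dataclasses import dataclass


def _flatten_names(names: Iterable[str | Iterable[str]]) -> tuple[str, ...]:
    """Flatten a mix of strings and iterables of strings into one tuple."""
    flat: list[str] = []
    for name in names:
        if isinstance(name, str):
            flat.append(name)
        else:
            flat.extend(name)
    return tuple(flat)


@dataclass(frozen=True)
class NamedSelection:
    """Hypothetical wrapper recording the target frame and column names."""

    frame: str  # "obs" or "var"
    names: tuple[str, ...]


def obs_names(*names: str | Iterable[str]) -> NamedSelection:
    """Hypothetical counterpart of obs_col() that keeps the names as strings."""
    return NamedSelection("obs", _flatten_names(names))


def group_keys(selections: Iterable[NamedSelection], frame: str) -> list[str]:
    """Collect the string keys of every selection that targets `frame`."""
    keys: list[str] = []
    for selection in selections:
        if selection.frame == frame:
            keys.extend(selection.names)
    return keys
```

The list returned by `group_keys(selections, "obs")` could then be unpacked into the string-based group_by call that already works, while the public API keeps an expression-like look.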

Additional information that may help us understand your needs.

Here is some additional context to my workflow.

from collections.abc import Callable, Iterable
from typing import Any
import narwhals as nw
from narwhals.utils import flatten

def _with_names(func: Callable) -> Callable:
    def wrapper(plx: Any, *names: str | Iterable[str]) -> Any:
        return plx.col(*flatten(names))

    return wrapper


@_with_names
def _func(plx: Any, *names: str | Iterable[str]) -> Any:
    return plx.col(*names)


class ObsExpr(nw.Expr):
    """An Obs DataFrame wrapper for the `narwhals.Expr` class."""

    def __init__(self, call: Callable[[Any], Any]) -> None:
        super().__init__(call)


class ObsCol:
    """Select columns from the :obj:`~anndata.AnnData.obs` DataFrame of an :obj:`~anndata.AnnData` object."""

    def __call__(self, *names: str | Iterable[str]) -> ObsExpr:
        """Select columns from the :obj:`~anndata.AnnData.obs` DataFrame of an :obj:`~anndata.AnnData` object.

        This is a wrapper around the `narwhals.col` function.


        Parameters
        ----------
        names
            The names of the obs columns to select.

        Returns
        -------
        A `narwhals.Expr` object representing the selected columns.
        """
        return ObsExpr(lambda plx: _func(plx, *names))

I then run a match-case on each expression's Expr subtype and collect the expressions for var, obs, and other DataFrames within AnnData. Depending on the operation, some are executed on their respective DataFrame.

obs_col = ObsCol()

exprs = [obs_col(["Cluster_ID"])]


for expr in exprs:
    match expr:
        case ObsExpr(): ...  # run the obs_col() expression on the Obs DataFrame
        case VarExpr(): ...  # run the var_col() expression on the Var DataFrame (VarExpr is the var counterpart of ObsExpr, not shown)
