Description
We would like to learn about your use case. For example, if this feature is needed to adopt Narwhals in an open source project, could you please enter the link to it below?
I'm writing a groupby method for annsel where we can group by one, or both of the following dataframes in the AnnData object: Variables (var) and Observations (obs).
Please describe the purpose of the new feature or describe the problem to solve.
I currently have both filtering and selecting, each of which uses classes tied to the dataframe the query is applied to. These classes wrap and return nw.col expressions. That works for filtering and selecting, but group_by only accepts strings. It would be nice if I could pass nw.col(*names) instead.
For example:
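import narwhals as nw
from narwhals.group_by import GroupBy as NwGroupby
from narwhals.typing import IntoDataFrame, IntoExpr

@nw.narwhalify
def _groupby_observation_df(df: IntoDataFrame, expr: IntoExpr) -> NwGroupby:
    # The grouping key is passed as an expression rather than a string.
    return df.group_by(expr)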
_groupby_observation_df(obs, nw.col("Cluster_ID"))
gives me the following error: TypeError: 'Expr' object is not callable
This would help me keep my internal API consistent.
Suggest a solution if possible.
Maybe a special case for nw.col? Or perhaps a way to make it hashable?
Another option might be to make nw.col an instance of its own class, Col, similar to how Polars does it.
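To make the Col idea concrete, here is a rough sketch in plain Python. It is not how Narwhals or Polars implement this: `NamedExpr` and this `Col` are hypothetical stand-ins, and the only point is that a callable instance can return a hashable object that still remembers its column names, which is what a group_by key would need.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NamedExpr:
    """Hypothetical stand-in for an expression that remembers its column names."""

    names: tuple[str, ...]


class Col:
    """Hypothetical callable column factory; `col` below is an instance of it."""

    def __call__(self, *names: str) -> NamedExpr:
        return NamedExpr(tuple(names))

    def __getattr__(self, name: str) -> NamedExpr:
        # Attribute access (col.Cluster_ID) is shorthand for col("Cluster_ID").
        if name.startswith("__"):
            raise AttributeError(name)
        return NamedExpr((name,))


col = Col()

key = col("Cluster_ID")
same_key = col.Cluster_ID
```

Because `NamedExpr` is a frozen dataclass, `key` and `same_key` compare equal and hash the same, so either could be used as a dictionary key or checked against a set of grouping columns.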
If you have tried alternatives, please describe them below.
The alternative that I've tried is to strictly use strings, which works, but isn't ideal.
Additional information that may help us understand your needs.
Here is some additional context to my workflow.
from collections.abc import Callable, Iterable
from typing import Any

import narwhals as nw
from narwhals.utils import flatten


def _with_names(func: Callable) -> Callable:
    def wrapper(plx: Any, *names: str | Iterable[str]) -> Any:
        return plx.col(*flatten(names))

    return wrapper


@_with_names
def _func(plx: Any, *names: str | Iterable[str]) -> Any:
    return plx.col(*names)


class ObsExpr(nw.Expr):
    """An Obs DataFrame wrapper for the `narwhals.Expr` class."""

    def __init__(self, call: Callable[[Any], Any]) -> None:
        super().__init__(call)


class ObsCol:
    """Select columns from the :obj:`~anndata.AnnData.obs` DataFrame of an :obj:`~anndata.AnnData` object."""

    def __call__(self, *names: str | Iterable[str]) -> ObsExpr:
        """Select columns from the :obj:`~anndata.AnnData.obs` DataFrame of an :obj:`~anndata.AnnData` object.

        This is a wrapper around the `narwhals.col` function.

        Parameters
        ----------
        names
            The names of the obs columns to select.

        Returns
        -------
        A `narwhals.Expr` object representing the selected columns.
        """
        return ObsExpr(lambda plx: _func(plx, *names))

I then run a match-case against the Expr subtype and collect the expressions for var, obs, and the other DataFrames within AnnData. Depending on the operation, some are executed on their respective DataFrame.
obs_col = ObsCol()
exprs = [obs_col(["Cluster_ID"])]
for expr in exprs:
    match expr:
        case ObsExpr(): ...  # run the obs_col() expression on the Obs DataFrame
        case VarExpr(): ...  # run the var_col() expression on the Var DataFrame (assumes a VarExpr subclass analogous to ObsExpr)